
Fuseji lookup (eg. マ○ドナ○ド) #2437

Draft
noatdk wants to merge 5 commits into yomidevs:master from noatdk:fuseji-lookup

noatdk commented Jun 3, 2026

Summary

Adds support for looking up fuseji (伏せ字), words in which one or more characters are masked by a placeholder symbol (per #2436).

マ◯ド◯ルド    → マクドナルド
打〇込む      → 打ち込む
〇ち込む      → 打ち込む   (leading mask)
打ち込〇      → 打ち込む   (trailing mask)

Screenshots

[fuseji] anchor "マ" (prefix) skip-scan steps expr=323 read=300; 171 survivors -> 171 rows; scan 56.7ms, gets 6.0ms
[fuseji] "マ○ド」 は おそらく 「マッド" via prefix "マ": 171 match(es) in 64.30ms

[fuseji] anchor "マ" (prefix) skip-scan steps expr=337 read=315; 172 survivors -> 172 rows; scan 31.3ms, gets 6.0ms
[fuseji] "マ○ドナ○ドよりモスバーガ○のが" via prefix "マ": 172 match(es) in 37.70ms

[fuseji] anchor "マ" (prefix) skip-scan steps expr=2840 read=2720; 1031 survivors -> 1031 rows; scan 362.9ms, gets 34.0ms
[fuseji] "マ〇〇ド」 は おそらく 「マイ" via prefix "マ": 1031 match(es) in 399.90ms

[fuseji] anchor "お" (prefix) skip-scan steps expr=633 read=272; 256 survivors -> 256 rows; scan 59.1ms, gets 8.6ms
[fuseji] "お◯子" via prefix "お": 256 match(es) in 69.10ms

[fuseji] anchor "ちゃんとお .." (suffix) cursor scanned 0 keys; 0 survivors, scan 8.1ms
[fuseji] anchor "ちゃんとお ." (suffix) cursor scanned 0 keys; 0 survivors, scan 0.5ms
[fuseji] anchor "ちゃんとお " (suffix) cursor scanned 0 keys; 0 survivors, scan 2.5ms
[fuseji] anchor "ちゃんとお" (suffix) cursor scanned 0 keys; 0 survivors, scan 3.3ms
[fuseji] anchor "ちゃんと" (suffix) cursor scanned 7 keys; 0 survivors, scan 5.9ms
[fuseji] anchor "ちゃん" (suffix) cursor scanned 6139 keys; 1092 survivors -> 1092 rows; scan 1068.7ms, gets 60.2ms
[fuseji] "〇〇ちゃんとお ...Read " via suffix "ちゃん": 1092 match(es) in 1153.30ms

[fuseji] anchor "ちゃん•君」と呼" (suffix) cursor scanned 0 keys; 0 survivors, scan 4.4ms
[fuseji] anchor "ちゃん•君」と" (suffix) cursor scanned 0 keys; 0 survivors, scan 2.6ms
[fuseji] anchor "ちゃん•君」" (suffix) cursor scanned 0 keys; 0 survivors, scan 2.7ms
[fuseji] anchor "ちゃん•君" (suffix) cursor scanned 0 keys; 0 survivors, scan 1.3ms
[fuseji] anchor "ちゃん•" (suffix) cursor scanned 0 keys; 0 survivors, scan 0.9ms
[fuseji] anchor "ちゃん" (suffix) cursor scanned 6139 keys; 1092 survivors -> 1092 rows; scan 386.9ms, gets 49.6ms
[fuseji] "〇〇ちゃん•君」と呼ぶのは普通な" via suffix "ちゃん": 1092 match(es) in 451.10ms

Environment: Chrome on macOS, with only the PixivLight dictionary installed.

How it works

Triggers are single-character wildcards. IndexedDB indexes can only be queried by key range, which in practice means prefix scans, so the lookup is anchored on the unmasked text:

  1. Anchor on the unmasked run before the first trigger (prefix scan). If the text starts with a trigger, anchor after the last trigger instead (suffix scan via the reversed indices).
  2. Scan for matches. For a prefix anchor, a skip-scan (loose index scan) walks the index driven by the masked pattern: at literal positions it seeks straight to the required character, at mask positions it lets the cursor enumerate only the characters that actually occur, and it skips subtrees that cannot match. Cost therefore scales with the number of distinct mask-position characters plus the number of matches, not with the size of the (low-selectivity) anchor range. Suffix anchors use a plain key cursor.
  3. Build entries from the surviving records only, matching each against the pattern to record the matched length (code-point matcher, one wildcard per trigger).
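
To make step 2 more concrete, here is a minimal TypeScript sketch of a skip-scan over a sorted in-memory array that stands in for the IndexedDB index. It is not the code from this PR: the names skipScan, lowerBound and successor are hypothetical, it compares UTF-16 code units, it handles prefix anchors only, and it assumes that any key matching the pattern for its whole length counts as a survivor.

```typescript
// Default trigger set from the settings in this PR; everything else here is illustrative.
const TRIGGERS = new Set(Array.from('◯○〇●'));

// Index of the first key >= target, searching from `from` (stands in for a cursor seek).
function lowerBound(keys: readonly string[], target: string, from: number): number {
    let lo = from;
    let hi = keys.length;
    while (lo < hi) {
        const mid = (lo + hi) >>> 1;
        if (keys[mid]! < target) { lo = mid + 1; } else { hi = mid; }
    }
    return lo;
}

// Lowest key that sorts after every key starting with `prefix`, or null if there is none.
function successor(prefix: string): string | null {
    let end = prefix.length;
    while (end > 0 && prefix.charCodeAt(end - 1) === 0xffff) { --end; }
    if (end === 0) { return null; }
    return prefix.slice(0, end - 1) + String.fromCharCode(prefix.charCodeAt(end - 1) + 1);
}

function skipScan(sortedKeys: readonly string[], pattern: string): {survivors: string[]; seeks: number} {
    const isMask: boolean[] = [];
    for (let i = 0; i < pattern.length; ++i) { isMask.push(TRIGGERS.has(pattern.charAt(i))); }
    // Anchor = literal run before the first trigger (leading masks are out of scope here).
    const firstMask = isMask.indexOf(true);
    const anchor = pattern.slice(0, firstMask < 0 ? pattern.length : firstMask);

    const survivors: string[] = [];
    let seeks = 1;
    let pos = lowerBound(sortedKeys, anchor, 0);
    while (pos < sortedKeys.length) {
        const key = sortedKeys[pos]!;
        if (!key.startsWith(anchor)) { break; } // left the anchor range
        const limit = Math.min(key.length, pattern.length);
        let i = 0;
        while (i < limit && (isMask[i] || key.charCodeAt(i) === pattern.charCodeAt(i))) { ++i; }

        let target: string | null;
        if (i === limit) {
            if (key.length <= pattern.length) { survivors.push(key); ++pos; continue; }
            // Key is longer than the pattern: skip everything sharing its pattern-length prefix.
            target = successor(key.slice(0, pattern.length));
        } else if (key.charCodeAt(i) < pattern.charCodeAt(i)) {
            // Literal mismatch below the required character: seek straight to it.
            target = key.slice(0, i) + pattern.charAt(i);
        } else {
            // Literal mismatch above the required character: skip the whole subtree.
            target = successor(key.slice(0, i));
        }
        if (target === null) { break; }
        pos = lowerBound(sortedKeys, target, pos + 1);
        ++seeks;
    }
    return {survivors, seeks};
}

const demoKeys = ['マイク', 'マイクロ', 'マクド', 'マクドナルド', 'マクロ', 'マッド', 'マッハ', 'マド', 'ミルク'].sort();
const demo = skipScan(demoKeys, 'マ○ドナ○ド');
console.log(demo.survivors.join(', ')); // マクド, マクドナルド, マッド, マド
```

Against a real index, each lowerBound call would correspond to a cursor seek such as IDBCursor.continue(key), which advances to the first key at or after the given key. Shorter keys like マド appear in the result because they fit entirely inside the masked span, which is the partial-match behaviour listed under Limitations.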

Settings

Two new rows under Advanced translation settings:

  • Enable fuseji lookup: translation.enableFusejiLookup (toggle, default false)
  • Fuseji trigger characters: translation.fusejiTriggers (text, default ◯○〇●)

Trigger characters act as one-character wildcards during lookup; the default set covers the common circle variants used as masks.
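
As a rough illustration of how a trigger set can act as one-character wildcards, the sketch below checks whether a dictionary term matches the start of the scanned text and returns the matched length in code points. The function name fusejiMatchLength and its signature are hypothetical and not taken from this PR; only the default trigger string comes from the setting above.

```typescript
// Default value of translation.fusejiTriggers as described above.
const DEFAULT_TRIGGERS = '◯○〇●';

// Returns the number of code points of `text` covered by `term`, or 0 if `term`
// does not match the start of `text`. Each trigger in `text` matches exactly one code point.
function fusejiMatchLength(text: string, term: string, triggers: string = DEFAULT_TRIGGERS): number {
    const triggerSet = new Set(Array.from(triggers));
    // Array.from splits by code point, so surrogate pairs count as one character.
    const textChars = Array.from(text);
    const termChars = Array.from(term);
    if (termChars.length === 0 || termChars.length > textChars.length) { return 0; }
    for (let i = 0; i < termChars.length; ++i) {
        const c = textChars[i];
        if (c === undefined) { return 0; }
        if (c !== termChars[i] && !triggerSet.has(c)) { return 0; }
    }
    return termChars.length;
}

console.log(fusejiMatchLength('マ○ドナ○ドよりモス', 'マクドナルド')); // 6
console.log(fusejiMatchLength('打〇込む', '打ち込む')); // 4
console.log(fusejiMatchLength('打〇込む', '打ち切る')); // 0
```

Passing a different triggers string models a user who has changed the setting, for example fusejiMatchLength('a*c', 'abc', '*').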

Limitations

  • Consecutive masks before a literal (e.g. マ◯◯ド) are slower and noisier: every word that fits the all-masked span counts as a partial match, so each extra adjacent trigger multiplies the number of results (and the time to build them). In the logs above, マ〇〇ド yields 1031 matches in about 400 ms, against 171 matches in about 64 ms for マ○ド.
  • Deinflection is skipped. When the mask covers the stem (e.g. ◯んでる for 死んでる), the visible tail (んでる) is not a dictionary-form headword, and that stripped tail is the anchor that gets looked up, so the lookup cannot resolve to 死ぬ. Supporting this would require deinflecting the visible tail and reconstructing the pattern (◯んでる → ◯ぬ/◯む) before lookup.
