Fix ordering of words with special chars #916

Merged
FyreByrd merged 3 commits into main from fix/special-chars-index
Dec 3, 2025

Collaborator

@FyreByrd FyreByrd commented Dec 2, 2025

Words with smart quotes were being sorted ahead of words without. As far as I can tell, this problem only affected reversals.

Questions:

  • Are smart quotes the only characters we should be worried about?

Summary by CodeRabbit

  • Bug Fixes
    • Fixed sorting of lexicon entries to properly handle special characters, including smart quotes
    • Enhanced word ordering to maintain language context through improved locale-aware comparison


@FyreByrd FyreByrd requested a review from chrisvire December 2, 2025 21:43
Contributor

coderabbitai Bot commented Dec 2, 2025

Walkthrough

Smart quote characters are normalized during word sorting by removing them before locale-aware comparison. A new letter field is added to ReversalWord objects, replacing the previous approach of deriving the sort letter from the word's first character, enabling more explicit and consistent sorting control.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Reverse Index Sorting Normalization (convert/convertReverseIndex.ts) | Introduces noCompareRE regex to identify smart quotes and updates per-letter entry sorting to compare sanitized strings via a.localeCompare(b) after removing smart quote characters, preserving language context. |
| Lexicon Component & Type Updates (src/routes/lexicon/+page.svelte, src/lib/data/stores/lexicon.svelte) | Changes sorting logic to use a.letter field directly instead of deriving it from a.word[0].toLowerCase(). Adds letter: string property to ReversalWord type; loadLetterData now populates this field during data initialization. |
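
The conversion-side change summarized above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the exact pattern behind noCompareRE, the helper names sortKey and sortEntries, and the way the language tag is passed to localeCompare are all assumptions. The point it shows is that the smart quotes are dropped only from the comparison key, so the stored display strings keep them.

```typescript
// Assumed pattern: only the double smart quotes U+201C and U+201D.
// The real noCompareRE in convert/convertReverseIndex.ts may differ.
const noCompareRE = /[\u201C\u201D]/g;

// Hypothetical helper: build a comparison key with the ignored characters removed.
function sortKey(word: string): string {
  return word.replace(noCompareRE, '');
}

// Hypothetical helper: sort a copy of the entries using the sanitized keys.
// The original strings are returned untouched, quotes included.
function sortEntries(words: string[], lang: string): string[] {
  return [...words].sort((a, b) => sortKey(a).localeCompare(sortKey(b), lang));
}

const sorted = sortEntries(['zebra', '\u201Cmango\u201D', 'apple'], 'en');
console.log(sorted); // the quoted entry sorts under "m" and keeps its quotes
```

Keeping the sanitizing step inside the comparator, rather than rewriting the entries, matches the later commit note that the characters are ignored for sorting and not removed from the string.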

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Verify that noCompareRE regex correctly targets all intended smart quote variants
  • Confirm letter field is consistently populated across all loadLetterData code paths
  • Test sort order with mixed smart quote and standard quote scenarios in different locales

Poem

🐰 Smart quotes once tripped our sorting song,
But now they vanish, nothing wrong!
With letter fields so clear and bright,
Our words align in order right! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Fix ordering of words with special chars' is directly related to the main change: fixing sort order issues with special characters (smart quotes) in reversals. |

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 40a6310 and 723440f.

📒 Files selected for processing (2)
  • convert/convertReverseIndex.ts (1 hunks)
  • src/routes/lexicon/+page.svelte (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: lint
  • GitHub Check: test
🔇 Additional comments (2)
src/routes/lexicon/+page.svelte (1)

82-83: LGTM! Improved sorting approach.

Using the pre-computed letter field is more efficient than deriving it from the first character of each word during every sort operation. This works correctly with the build-time letter assignment performed in convertReverseIndex.ts.
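
To make the difference concrete, here is a small sketch comparing the two comparators. The alphabet, the sample words, and the trimmed-down ReversalWord shape are made up for illustration; only the two sort expressions follow the description in this review. With the old approach the first character of a quoted word is the smart quote itself, which is not in the alphabet, so indexOf returns -1 and the word jumps ahead of every real letter.

```typescript
// Trimmed-down, hypothetical shape: the real ReversalWord has more fields.
type ReversalWord = { word: string; letter: string };

const alphabet = ['a', 'm', 'z'];
const words: ReversalWord[] = [
  { word: 'zebra', letter: 'z' },
  { word: '\u201Cmango\u201D', letter: 'm' },
  { word: 'apple', letter: 'a' },
];

// Old approach: derive the letter from the first character of the display string
// (charAt(0) is equivalent to word[0] for non-empty strings).
const firstChar = (w: ReversalWord): string => w.word.charAt(0).toLowerCase();
const byFirstChar = [...words].sort(
  (a, b) => alphabet.indexOf(firstChar(a)) - alphabet.indexOf(firstChar(b)),
);

// New approach: use the letter assigned at build time.
const byLetter = [...words].sort(
  (a, b) => alphabet.indexOf(a.letter) - alphabet.indexOf(b.letter),
);

console.log(byFirstChar.map((w) => w.word)); // quoted word first (index -1)
console.log(byLetter.map((w) => w.word)); // quoted word under "m"
```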

convert/convertReverseIndex.ts (1)

70-72: Verify whether the regex should include additional typographic characters.

The current regex only removes double smart quotes (“ and ”) before sorting. Consider whether single smart quotes (‘ and ’), em/en dashes (—/–), or other typographic punctuation should also be normalized to ensure consistent sort order across all potential input data.

Comment thread convert/convertReverseIndex.ts Outdated
@chrisvire chrisvire force-pushed the fix/special-chars-index branch 2 times, most recently from 88f14e8 to 92e813a Compare December 3, 2025 19:11
Instead of first char of display string
The chars are not removed from the string, only ignored for sorting during conversion process.
@chrisvire chrisvire force-pushed the fix/special-chars-index branch from 92e813a to b15de39 Compare December 3, 2025 19:12
@chrisvire
Member

Waiting for feedback on whether any characters other than smart quotes should be ignored.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/routes/lexicon/+page.svelte (2)

73-78: Sorting by letter looks correct for bucket ordering; consider edge cases and intra-letter order

Using alphabet.indexOf(a.letter) instead of deriving from a.word[0] is a good way to avoid leading smart-quote issues and centralizes normalization in the index/builder.

Two follow-ups to consider:

  • If any ReversalWord has a missing/invalid letter, indexOf will be -1, and those entries will sort before all valid letters. Either ensure letter is always valid for this alphabet (preferred) or add a fallback (e.g., treat -1 as the end of the alphabet).
  • This comparator only orders by letter, so intra-letter ordering is left to whatever order the words currently have plus engine sort stability. If you ever need deterministic per-letter sorting here (beyond what the generator gives you), you might add a secondary key (e.g., a normalized sortKey from the index / word.localeCompare).
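
The two follow-ups above could look something like the sketch below. It is only one possible shape, under stated assumptions: makeComparator, rank, the sample alphabet, and the trimmed-down ReversalWord type are hypothetical, and the secondary key uses plain word.localeCompare for brevity, whereas a real implementation would probably want a key with smart quotes already ignored.

```typescript
// Trimmed-down, hypothetical shape: the real ReversalWord has more fields.
type ReversalWord = { word: string; letter: string };

// Hypothetical factory: primary key is the alphabet position of the letter,
// with unknown letters pushed to the end; secondary key is the word itself.
function makeComparator(alphabet: string[], lang: string) {
  const rank = (letter: string): number => {
    const i = alphabet.indexOf(letter);
    return i === -1 ? alphabet.length : i; // treat -1 as "after the last letter"
  };
  return (a: ReversalWord, b: ReversalWord): number =>
    rank(a.letter) - rank(b.letter) || a.word.localeCompare(b.word, lang);
}

const sampleAlphabet = ['a', 'b'];
const ordered: ReversalWord[] = [
  { word: 'bee', letter: 'b' },
  { word: '?', letter: '' }, // invalid letter: sorts last instead of first
  { word: 'axe', letter: 'a' },
  { word: 'ant', letter: 'a' },
].sort(makeComparator(sampleAlphabet, 'en'));

console.log(ordered.map((w) => w.word));
```

Whether the fallback belongs in the comparator at all is a design choice; ensuring every entry carries a valid letter at load time, as the first bullet prefers, would make it unnecessary.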

101-127: letter population is sensible; tweak typing and naming for clarity

Stamping letter from the loadLetterData(letter) parameter makes each ReversalWord self-describing, which works well with the new sort behavior.

A couple of small improvements:

  • The type at Line 101 uses name: 'string', which is a literal type; this is almost certainly meant to be name: string:
-const data: Record<string, { index: number; name: 'string' }[]> =
+const data: Record<string, { index: number; name: string }[]> =
     await response.json();
  • If letter is conceptually the bucket key (not necessarily the literal first character in word), a slightly more explicit name like bucketLetter or a short comment could help future readers understand that this already incorporates the smart-quote / special-char normalization done when building the index.
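
A rough sketch of the typing fix and the letter stamping discussed above, kept free of any fetch call so it can run on its own. Everything here beyond the letter: string field and the { index: number; name: string } entry shape is an assumption: the field names word and indexes, the helper toReversalWords, and the reading of the JSON keys as display words are illustrative, not taken from lexicon.svelte.

```typescript
// Entry shape from the suggested fix: name is a string, not the literal type 'string'.
type IndexEntry = { index: number; name: string };

// Hypothetical ReversalWord shape; only `letter: string` is confirmed by the PR summary.
type ReversalWord = { word: string; indexes: IndexEntry[]; letter: string };

// Hypothetical helper: stamp the bucket letter onto every word loaded for that letter.
// Assumes the parsed JSON maps each display word to its list of entries.
function toReversalWords(
  letter: string,
  data: Record<string, IndexEntry[]>,
): ReversalWord[] {
  return Object.entries(data).map(([word, indexes]) => ({ word, indexes, letter }));
}

const parsed: Record<string, IndexEntry[]> = {
  '\u201Cmango\u201D': [{ index: 12, name: 'mango' }],
  melon: [{ index: 40, name: 'melon' }],
};

const stamped = toReversalWords('m', parsed);
console.log(stamped.map((w) => `${w.letter}: ${w.word}`));
```

With name typed as the literal 'string', assigning an entry such as { index: 12, name: 'mango' } to IndexEntry would fail type checking, which is why the one-character fix matters once the data is constructed in typed code rather than cast from response.json().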
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 88f14e8 and b15de39.

📒 Files selected for processing (2)
  • convert/convertReverseIndex.ts (1 hunks)
  • src/routes/lexicon/+page.svelte (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • convert/convertReverseIndex.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test

Collaborator Author

FyreByrd commented Dec 3, 2025

Based on discussion with @chrisvire and an independent investigation of the app-builders codebase, it should be enough for now to ignore only the smart quotes for sorting reversal entries during the conversion process. In the future, we may also need to rework how we do reversals in general to more closely align with the Android app.

@FyreByrd FyreByrd merged commit 57588a9 into main Dec 3, 2025
4 checks passed
@FyreByrd FyreByrd deleted the fix/special-chars-index branch December 3, 2025 22:26
2 participants