
feat(fts): strip whitespace in tokenization & remove row_id field in lucene++ #139

Merged
lszskye merged 4 commits into alibaba:main from lxy-9602:rm-rowid-field-in-lucene
Feb 12, 2026

Conversation

@lxy-9602
Collaborator

@lxy-9602 lxy-9602 commented Feb 11, 2026

Purpose

  1. Strip whitespace during tokenization so that empty or whitespace-only tokens are not indexed.
  2. Remove the row_id field from the Lucene document schema to reduce index size and
    minimize I/O during search.
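
The first point can be illustrated with a minimal sketch. This is not the code from this PR: `StripWhitespace` and `NormalizeTokens` are hypothetical names, the input is assumed to be the raw token list produced by a segmenter, and only ASCII whitespace is handled (a full-width space such as U+3000 would need separate treatment).

```cpp
#include <cctype>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not from the PR): trims leading and trailing
// ASCII whitespace from a token without copying it.
std::string_view StripWhitespace(std::string_view s) {
    auto is_space = [](char c) {
        // Cast to unsigned char: std::isspace is undefined for negative values.
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    };
    while (!s.empty() && is_space(s.front())) s.remove_prefix(1);
    while (!s.empty() && is_space(s.back())) s.remove_suffix(1);
    return s;
}

// Hypothetical post-processing step (not from the PR): strips every raw
// token and drops the ones that end up empty, so that they never reach
// the index.
std::vector<std::string> NormalizeTokens(const std::vector<std::string>& raw_tokens) {
    std::vector<std::string> result;
    result.reserve(raw_tokens.size());
    for (const std::string& token : raw_tokens) {
        std::string_view stripped = StripWhitespace(token);
        if (!stripped.empty()) {
            result.emplace_back(stripped);
        }
    }
    return result;
}
```

Under these assumptions, a raw token list such as `{"hello", " ", "", "  world\t"}` would be reduced to `{"hello", "world"}` before indexing. The sketch requires C++17 for `std::string_view`.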

Linked issue: #69

Tests

JiebaAnalyzerTest, TestNormalize

API and Format

Documentation

@lucasfang lucasfang requested a review from Copilot February 11, 2026 09:24
Collaborator

@lucasfang lucasfang left a comment

+1

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@lszskye lszskye merged commit 2cb98dc into alibaba:main Feb 12, 2026
8 checks passed


4 participants