A note on using Intl.Segmenter #2

Open
andrew--r opened this issue May 2, 2023 · 2 comments
Comments

andrew--r commented May 2, 2023

Hi! Intl.Segmenter is a native API for locale-aware text segmentation. While it is not supported everywhere yet, it would be nice to mention it in the README and perhaps compare the library not only to regexps but also to the native API.

hyrious commented May 6, 2023

Here's the sample code to use that if it helps:

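function countWordsViaIntl(text) {
  // Passing undefined as the locale uses the runtime's default locale.
  const segmenter = new Intl.Segmenter(undefined, { granularity: "word" });
  let count = 0;
  // Count only word-like segments; whitespace and punctuation are skipped.
  for (const segment of segmenter.segment(text)) {
    if (segment.isWordLike) count++;
  }
  return count;
}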

Note that it may not return the same result as the countWords function provided in this repo, since the two may handle edge cases such as emojis differently.
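Since Intl.Segmenter is not available in every runtime, a feature-detected variant might be useful. The following is only a sketch: the name countWordsWithFallback and the whitespace-splitting fallback are assumptions for illustration, not part of this repo, and the fallback will count differently for languages that do not separate words with spaces.

```javascript
// Cached so the segmenter is constructed at most once (illustrative choice).
let cachedSegmenter = null;

// Hypothetical helper: prefers Intl.Segmenter, falls back to whitespace splitting.
function countWordsWithFallback(text) {
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    if (cachedSegmenter === null) {
      cachedSegmenter = new Intl.Segmenter(undefined, { granularity: "word" });
    }
    let count = 0;
    for (const segment of cachedSegmenter.segment(text)) {
      if (segment.isWordLike) count++;
    }
    return count;
  }
  // Assumed fallback: count runs of non-whitespace characters.
  return text.split(/\s+/).filter(Boolean).length;
}

console.log(countWordsWithFallback("Hello, world!"));
```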

thecodrr (Owner) commented May 6, 2023

@hyrious @andrew--r I tried adding a benchmark for Intl.Segmenter, but unfortunately it consistently fails with a JavaScript out-of-memory error. The issue is either with Tinybench or with the implementation of Intl.Segmenter.
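One way to narrow this down might be to time the segmenter in a plain loop, with no benchmarking library involved. This is a rough sketch rather than a diagnosis: timeWordCount is a hypothetical helper, the sample text and iteration count are arbitrary, and it assumes a runtime that exposes a global performance object (recent Node.js versions and browsers do).

```javascript
// Hypothetical helper: time word counting in a plain loop, without Tinybench.
function timeWordCount(text, iterations) {
  // Build the segmenter once so each iteration only pays for segmentation.
  const segmenter = new Intl.Segmenter(undefined, { granularity: "word" });
  let words = 0;
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    words = 0;
    for (const segment of segmenter.segment(text)) {
      if (segment.isWordLike) words++;
    }
  }
  const elapsedMs = performance.now() - start;
  return { words, elapsedMs };
}

// Arbitrary sample input: one nine-word sentence repeated 100 times.
const sample = "The quick brown fox jumps over the lazy dog. ".repeat(100);
const result = timeWordCount(sample, 50);
console.log(result.words, "words,", result.elapsedMs.toFixed(2), "ms for 50 runs");
```

If a loop like this completes without exhausting memory, the benchmark harness or the way the segmenter is set up inside it becomes the more likely suspect; if it also fails, the runtime's Intl.Segmenter is worth a closer look.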
