Cache cl100k_base locally + retry on network errors (cherry-pick upstream #183)#2

Merged: zhujunsan merged 2 commits into main from cherry/tokenizer-cache-183 on Jun 17, 2026

@zhujunsan

Owner

Summary

Cherry-pick of upstream PR jwadow/kiro-gateway#183 (commit 11703f3 by @1c3z).

_get_encoding() used to mark tiktoken permanently unusable on ANY exception — including a transient IncompleteRead while pulling cl100k_base.tiktoken from the Azure CDN on first use. A single dropped connection silently demoted the whole process to the length-based fallback heuristic until restart, losing tokenization precision for every later request.

  • Pins TIKTOKEN_CACHE_DIR to a repo-local .tiktoken_cache/ (via setdefault, so operator overrides win) → the ~1.6 MB BPE blob is fetched at most once and reused across restarts.
  • Only caches the ImportError (tiktoken truly missing) as permanent; network/transient errors leave _encoding=None so the next call retries.
  • .gitignore: ignore .tiktoken_cache/.

Valuable in containers where first-call network flakiness shouldn't permanently degrade token accounting.
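
As a rough illustration of the first bullet, the sketch below shows how `setdefault` can pin the cache directory while still letting an operator-provided value win. The helper name `pin_tiktoken_cache` and the `repo_root` parameter are hypothetical and not taken from the PR; the only facts assumed are that tiktoken reads `TIKTOKEN_CACHE_DIR` from the environment and that the variable must be set before the first encoding load.

```python
import os
from pathlib import Path
from typing import MutableMapping


def pin_tiktoken_cache(
    repo_root: Path,
    env: MutableMapping[str, str] = os.environ,
) -> str:
    """Point tiktoken's on-disk cache at <repo_root>/.tiktoken_cache.

    Hypothetical helper: the real module may compute the path differently.
    Returns the value that ends up in the environment.
    """
    default_dir = str(repo_root / ".tiktoken_cache")
    # setdefault writes only when the key is absent, so a value the operator
    # exported before startup is left untouched.
    return env.setdefault("TIKTOKEN_CACHE_DIR", default_dir)
```

Calling this once at import time, before anything asks tiktoken for an encoding, should be enough for the cached BPE file to be reused on later starts.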

Version

Bumps APP_VERSION to 2.4.dev.13-fork.3.

Merge order

Fork-number series. Intended merge order: fork.2 (jwadow#215) → fork.3 (this) → fork.4 (jwadow#182). Out-of-order merges produce a one-line APP_VERSION conflict — keep the highest fork number.
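
The "keep the highest fork number" rule can be sketched as a small comparison on the `-fork.N` suffix. The helper names below are hypothetical and the parsing assumes every version string ends in `-fork.<integer>`, as the one in this PR does.

```python
import re

# Assumes versions look like "2.4.dev.13-fork.3" (suffix format taken from this PR).
_FORK_RE = re.compile(r"-fork\.(\d+)$")


def fork_number(version: str) -> int:
    """Extract the trailing fork number from an APP_VERSION string."""
    match = _FORK_RE.search(version)
    if match is None:
        raise ValueError(f"no -fork.N suffix in {version!r}")
    return int(match.group(1))


def resolve_app_version(ours: str, theirs: str) -> str:
    """Pick the side of the conflict that carries the higher fork number."""
    return max(ours, theirs, key=fork_number)
```

Comparing the parsed integer rather than the raw string matters once the series reaches two digits, since `"fork.10"` sorts before `"fork.9"` lexicographically.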

Test plan

  • Full suite green locally: pytest -q → 1695 passed (includes the PR's 4 new tests)
  • CI green on the fork

1c3z and others added 2 commits June 17, 2026 17:51
`_get_encoding` used to mark tiktoken unusable on ANY exception, including a
transient `IncompleteRead` while pulling cl100k_base.tiktoken from the Azure
blob CDN. A single dropped connection on first use permanently demoted the
whole process to the length-based fallback heuristic, silently losing
tokenization precision for every later request until restart.

Two changes:

1. Pin `TIKTOKEN_CACHE_DIR` (via `setdefault`, so operator overrides win) to a
   repo-local `.tiktoken_cache/` directory. The ~1.6 MB BPE blob is fetched at
   most once per working copy and reused across restarts, eliminating the
   per-process re-download that exposed us to CDN flakiness in the first place.
   `.tiktoken_cache/` added to .gitignore.

2. Separate `ImportError` (tiktoken truly missing → cache the disabled state
   permanently) from runtime errors (network/IO failure during BPE download →
   do NOT cache, return None this call, retry on next call). One transient
   blip no longer kills tokenization for the lifetime of the process.
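
A minimal sketch of the split described in change 2, under stated assumptions: the PR keeps this state in a module-level `_encoding` inside `_get_encoding()`, whereas the `EncodingCache` class, its `_disabled` flag, and the injectable `loader` below are hypothetical stand-ins chosen so the behavior can be exercised without tiktoken installed.

```python
import importlib
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def _load_cl100k_base() -> Any:
    """Import tiktoken and load cl100k_base (may download the BPE file)."""
    tiktoken = importlib.import_module("tiktoken")
    return tiktoken.get_encoding("cl100k_base")


class EncodingCache:
    """Hypothetical holder for the encoding; not the PR's actual structure."""

    def __init__(self, loader: Callable[[], Any] = _load_cl100k_base) -> None:
        self._loader = loader
        self._encoding: Optional[Any] = None  # None means "not loaded yet"
        self._disabled = False                # True only when tiktoken is missing

    def get(self) -> Optional[Any]:
        if self._disabled:
            return None
        if self._encoding is not None:
            return self._encoding
        try:
            self._encoding = self._loader()
        except ImportError:
            # Package is absent: remember that, so we do not re-import per call.
            self._disabled = True
            logger.warning("tiktoken not installed; using fallback estimate")
            return None
        except Exception as exc:
            # Network or IO failure: cache nothing, so the next call retries.
            logger.warning("tiktoken load failed, will retry: %s", exc)
            return None
        return self._encoding
```

A caller that receives `None` would fall back to the length-based heuristic for that request only, and the next request gets another chance at the real encoding.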

Tests (tests/unit/test_tokenizer.py::TestEncodingCacheAndRetry):
- TIKTOKEN_CACHE_DIR is set to repo-local `.tiktoken_cache` on module import.
- Network failure does not poison the `_encoding` sentinel.
- After a failure, the next call retries and a recovered encoding is cached.
- `ImportError` is still cached so we don't re-import a missing package on
  every call (regression guard for the previous behavior).

Bump for the tokenizer-cache cherry-pick (upstream jwadow#183).

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
@zhujunsan zhujunsan force-pushed the cherry/tokenizer-cache-183 branch from 36c36e8 to 0c9bb70 Compare June 17, 2026 09:52
@zhujunsan zhujunsan merged commit 3ef8efa into main Jun 17, 2026
1 check passed
2 participants