postpone timezone regex evaluation until first use - shaves off time from package import #1181

beda42 · 2023-09-07T12:08:15Z

This PR is related to issue #533, which is caused mainly by time-intensive parsing of the regular expressions used for timezone matching. It introduces a global object, TzRegexCache, and moves preparation of the regexps into it. The regexps are no longer parsed on startup but on first use instead. On my machine this shaves about 200 ms off the import time of dateparser, reducing it to less than 20 % of the original import time.
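A minimal sketch of the lazy approach, assuming a hypothetical `TIMEZONES` list and a hypothetical `find_offset` helper. Only the name `TzRegexCache` comes from this PR; the interface shown here is an assumption, not the PR's code.

```python
import re

# Hypothetical timezone data: (name, UTC offset in seconds). The real data set
# is far larger, which is what makes eager compilation at import time slow.
TIMEZONES = [("UTC", 0), ("CET", 3600), ("EST", -18000), ("PST", -28800)]


class TzRegexCache:
    """Compile timezone regexes on first access rather than at import time."""

    def __init__(self):
        self._offsets = None  # nothing compiled yet, so importing stays cheap

    def _build(self):
        # The expensive step that would otherwise run when the module is imported.
        return [
            (name, re.compile(r"\b%s\b" % re.escape(name)), offset)
            for name, offset in TIMEZONES
        ]

    @property
    def offsets(self):
        # First access pays the compilation cost; later accesses reuse the list.
        if self._offsets is None:
            self._offsets = self._build()
        return self._offsets


# Global object created at import time, but still empty.
tz_regex_cache = TzRegexCache()


def find_offset(text):
    """Return the UTC offset in seconds of the first timezone name found, else None."""
    for _name, pattern, offset in tz_regex_cache.offsets:
        if pattern.search(text):
            return offset
    return None
```

Calling `find_offset("10:00 CET")` triggers the build on the first call and returns `3600`; later calls reuse the compiled list. Without a lock, two threads racing on first use may both build the list; the result is the same, only the work is duplicated.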

tobymao · 2025-01-30T06:28:39Z

Is this library still actively maintained? I ran into this issue and would like to fix it. My idea is instead to cache the regexes so that the overall time is reduced. This PR's solution still incurs the regex parsing time, although it does speed up import.

But I only want to open a PR if it actually has a shot at being merged.

Gallaecio · 2025-01-30T09:15:52Z

We are reviewing PRs, and may even have a release “soon” (see the most recent PRs).

tobymao · 2025-01-30T15:42:56Z

> We are reviewing PRs, and may even have a release “soon” (see the most recent PRs).

Amazing! Would you accept my idea? Is there somewhere to discuss this approach with the maintainers before I start, so that iterations go faster?

Gallaecio · 2025-01-30T15:47:59Z

You can either start a draft PR with little code and start the discussion there, or discuss it here.

tobymao · 2025-01-30T22:17:34Z

I've taken another stab at this, @beda42: #1250

Instead of lazy evaluation, I use caching so that we pay the regex compilation cost only once. A version bump invalidates the cache.
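A minimal sketch of a version-keyed on-disk cache, assuming a hypothetical `VERSION` string, cache path, and timezone list (none of these are taken from #1250). With the standard `re` module, pickling a compiled pattern stores only its source string and flags and recompiles it on load, so this sketch caches the output of the pattern-building step rather than compiled pattern objects.

```python
import os
import pickle
import re
import tempfile

# Hypothetical version string; a real package would use its own __version__.
VERSION = "1.0.0"
# Hypothetical cache location, chosen only for this sketch.
CACHE_PATH = os.path.join(tempfile.gettempdir(), "tz_patterns_cache.pkl")

# Hypothetical timezone data: (name, UTC offset in seconds).
TIMEZONES = [("UTC", 0), ("CET", 3600), ("EST", -18000)]


def build_tz_patterns():
    """Expensive step (in a real library): build one pattern source per timezone."""
    return [(r"\b%s\b" % re.escape(name), offset) for name, offset in TIMEZONES]


def load_tz_patterns(path=CACHE_PATH, version=VERSION):
    """Return cached pattern sources, rebuilding them if the cache is missing or stale."""
    try:
        with open(path, "rb") as f:
            cached_version, patterns = pickle.load(f)
        if cached_version == version:
            return patterns  # cache hit: the build step is skipped
    except (OSError, EOFError, pickle.UnpicklingError, ValueError, TypeError):
        pass  # no usable cache file: fall through and rebuild
    patterns = build_tz_patterns()  # cache miss or version bump
    with open(path, "wb") as f:
        pickle.dump((version, patterns), f)
    return patterns
```

The caller would then compile the sources, for example `[(re.compile(src), off) for src, off in load_tz_patterns()]`. Storing the version next to the data is what makes a version bump invalidate the cache: a mismatch simply triggers a rebuild and overwrites the file.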

This is different from PR scrapinghub#1181. That PR only makes import faster but still incurs the cost on first use; this one leverages an optional cache. Closes scrapinghub#533.

This is different from PR scrapinghub#1181. It builds a cache at install time which can be distributed. Closes scrapinghub#533.

#1250)

* feat: add caching for timezone offsets, significantly speeds up import

  This is different from PR #1181. It builds a cache at install time which can be distributed. Closes #533.

* Upgrade the minimum version of regex

---------

Co-authored-by: Adrián Chaves <[email protected]>

postpone timezone regex evaluation until first use - shaves off time from package import

92fb484

nijel added a commit to nijel/weblate that referenced this pull request Oct 2, 2024

feat(search): lazily import dateparser

bf9ff64

It can be expensive to import due to building thousands of regexps at import time (see scrapinghub/dateparser#1181).
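A minimal sketch of a function-local (deferred) import, the general technique named in this commit title. The function name `parse_search_date` is hypothetical and this is not Weblate's actual change; `dateparser.parse` is dateparser's public parsing function.

```python
def parse_search_date(text):
    """Parse a date from a search query, importing dateparser only when needed."""
    # Deferred import: dateparser's import-time cost is paid on the first call
    # rather than when this module is imported. Later calls find the module
    # already cached in sys.modules, so the import statement is cheap.
    import dateparser

    return dateparser.parse(text)
```

A deferred import only moves the cost to the first call; it does not reduce it, which is why the lazy regex evaluation in this PR still matters for code paths that do parse dates.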

nijel mentioned this pull request Oct 2, 2024

feat(search): lazily import dateparser WeblateOrg/weblate#12655

Merged

nijel added a commit to WeblateOrg/weblate that referenced this pull request Oct 2, 2024

feat(search): lazily import dateparser

404a3c1

It can be expensive to import due to building thousands of regexps at import time (see scrapinghub/dateparser#1181).

tobymao mentioned this pull request Jan 30, 2025

feat: add caching for timezone offsets, significantly speeds up import #1250

Merged

wRAR closed this in #1250 Mar 25, 2025
