postpone timezone regex evaluation until first use - shaves off time from package import #1181
It can be expensive to import due to building thousands of regexps at import time (see scrapinghub/dateparser#1181).
Is this library still actively maintained? I ran into this issue and would like to fix it. My idea is to cache the regexes instead, so that the overall time gets faster. The solution in this PR still incurs the regex parsing time on first use, although it does speed up the import. But I only want to open a PR if it actually has a shot at making it in.
We are reviewing PRs, and may even have a release “soon” (see the most recent PRs).
Amazing! Would you accept my idea? Is there somewhere I can discuss this approach with the maintainers before I start, to make iterations faster?
You can either start a draft PR with little code and start the discussion there, or discuss it here.
this is different from pr scrapinghub#1181. that pr only makes import faster but still incurs cost on the first usage. this one leverages an optional cache. closes scrapinghub#533
this is different from pr scrapinghub#1181. it builds a cache at install time which can be distributed. closes scrapinghub#533
This MR is related to issue #533, which is caused mainly by time-intensive parsing of regular expressions for timezone matching. This MR introduces a global object, `TzRegexCache`, and moves preparation of the regexps into it. The regexps are no longer parsed on startup, but during first use instead. On my machine this shaves about 200 ms off the import time of `dateparser`, reducing it to less than 20 % of the original import time.
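
To illustrate the general idea of deferring regex compilation until first use, here is a minimal sketch. It is not the code from this MR: the class name `LazyTzRegexCache`, its methods, the sample timezone list, and the pattern shape are all assumptions made for illustration only.

```python
import re
from typing import List, Optional, Pattern

# Hypothetical sample data; the real timezone table is far larger.
_TIMEZONE_NAMES = ["UTC", "EST", "PST", "CET"]


class LazyTzRegexCache:
    """Illustrative cache that compiles its regexes on first access,
    not when the module is imported."""

    def __init__(self, names: List[str]) -> None:
        self._names = names
        # None means "not compiled yet"; nothing expensive happens here.
        self._patterns: Optional[List[Pattern[str]]] = None

    @property
    def compiled(self) -> bool:
        return self._patterns is not None

    def patterns(self) -> List[Pattern[str]]:
        # The compilation cost is paid once, by the first caller.
        if self._patterns is None:
            self._patterns = [
                re.compile(r"\b%s\b" % re.escape(name), re.IGNORECASE)
                for name in self._names
            ]
        return self._patterns

    def find(self, text: str) -> Optional[str]:
        """Return the first timezone name found in text, or None."""
        for name, pattern in zip(self._names, self.patterns()):
            if pattern.search(text):
                return name
        return None


# Creating the module-level object is cheap because no regex is compiled yet.
tz_cache = LazyTzRegexCache(_TIMEZONE_NAMES)
```

With this shape, importing the module only stores the list of names. The first call to `tz_cache.find(...)` pays the compilation cost, and later calls reuse the stored patterns. The total work is not reduced, only moved from import time to first use.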