OK to create a PR for increased accuracy of en_US zip codes? #3532
verdantburrito
started this conversation in
Ideas
Replies: 3 comments 5 replies
Single zip codes could just be static strings; they don't have to follow the previous patterns. Whatever is the most compact way to represent them makes sense. If we did this, we should also modify the plain zipCode (without state) to return only valid zip codes, as it is likely used more than the state version.
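To make the suggestion concrete, here is a minimal sketch of one way the mixed representation could look: a single zip code is a plain string and a contiguous run is a min/max pair. The names ZipEntry, entrySize, and pickZip are hypothetical and are not part of Faker; the sample values are arbitrary placeholders, not USPS data.

```typescript
// Hypothetical shape, not Faker's actual data format:
// a lone zip code is a string, a contiguous run is a numeric range.
type ZipEntry = string | { min: number; max: number };

// Number of zip codes a single entry covers.
function entrySize(entry: ZipEntry): number {
  return typeof entry === "string" ? 1 : entry.max - entry.min + 1;
}

// Pick one zip code so that every code across all entries is equally likely.
// `rand` must return a float in [0, 1); it is injectable to keep the sketch testable.
function pickZip(entries: ZipEntry[], rand: () => number = Math.random): string {
  const total = entries.reduce((sum, e) => sum + entrySize(e), 0);
  if (total === 0) {
    throw new Error("no zip codes defined");
  }
  let index = Math.floor(rand() * total);
  for (let i = 0; i < entries.length; i++) {
    const entry = entries[i];
    const size = entrySize(entry);
    if (index < size) {
      // Left-pad range values to five digits so leading zeros survive.
      return typeof entry === "string"
        ? entry
        : ("00000" + (entry.min + index)).slice(-5);
    }
    index -= size;
  }
  throw new Error("unreachable: index exceeded total size");
}

// Arbitrary placeholder entries for illustration only.
const sampleEntries: ZipEntry[] = ["12345", { min: 501, max: 503 }];
console.log(pickZip(sampleEntries));
```

Weighting by entry size keeps a one-code string from being picked as often as a range that covers many codes.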
5 replies
I created an "Issues" post to formally propose the change: #3534
0 replies
Pull request is up: #3539
0 replies
I was wondering what the appetite would be for updating the /locales/en_US/location/postcode_by_state.ts file data to only describe valid US postal codes.
Faker currently generates values using integer ranges by state, like so:
However, some of the values it generates (e.g. 02803) are simply random 5-digit integers; the United States Postal Service (USPS) doesn't recognize them:

I suspect a significant percentage of the US zipcodes generated by Faker are invalid for this reason. For example, here is the single line of code Faker currently uses to generate Rhode Island (RI) zip codes:

However, of those 140 zipcodes, 55 of them (39%) are invalid & not recognized by the USPS:
Here is an updated set of code that describes only the 85 valid Rhode Island (RI) zipcodes:

The only reason I hesitate to create a pull request is that it'd require 10,006 lines of code, instead of the 93 lines of code in use today, to describe the set of valid, contiguous, non-gapped zipcode ranges for all US states. Part of that is because, in trying to follow the existing code's convention, a range containing a single zipcode -- like the RI one below -- would need to be defined by its own line of code:

Not a big deal by itself. But when applied to all United States zipcodes, it adds up very quickly; there are 3,359 ranges that contain only a single US zipcode.
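For readers who want to see how a list of valid zipcodes turns into contiguous, non-gapped ranges, and how the single-zipcode ranges get counted, here is a rough sketch. It is not the script mentioned in this post; ZipRange, toRanges, and countSingles are hypothetical names, and the input values are arbitrary placeholders rather than USPS data.

```typescript
// Hypothetical range shape used only for this sketch.
interface ZipRange {
  min: number;
  max: number;
}

// Collapse a list of valid zip codes (as numbers) into contiguous ranges.
// A gap of even one missing code starts a new range; duplicates are absorbed.
function toRanges(zips: number[]): ZipRange[] {
  const sorted = zips.slice().sort((a, b) => a - b);
  const ranges: ZipRange[] = [];
  for (let i = 0; i < sorted.length; i++) {
    const zip = sorted[i];
    const last = ranges[ranges.length - 1];
    if (last !== undefined && zip <= last.max + 1) {
      // Adjacent to (or a duplicate within) the current range: extend it.
      last.max = Math.max(last.max, zip);
    } else {
      ranges.push({ min: zip, max: zip });
    }
  }
  return ranges;
}

// Count the ranges that hold exactly one zip code.
function countSingles(ranges: ZipRange[]): number {
  return ranges.filter((r) => r.min === r.max).length;
}

// Arbitrary placeholder input for illustration only.
const exampleRanges = toRanges([501, 503, 504, 505, 510]);
console.log(exampleRanges, countSingles(exampleRanges));
```

Every range where min equals max is one that the existing convention would spend a full line on, which is where the line count grows.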
Anyway, please let me know what you think. If the number of lines of code in the file is a non-issue, then I'm worrying about nothing & I'd be happy to create the PR with the code I already have sitting locally. And if there's a more concise way I should be describing those single-zipcode ranges, please let me know -- I'm more than happy to tweak my script as needed.