Commit 46dc0cd

EmberCraze, arthurdejong, cclauss, vovavili and petr.prikryl authored
Fetches the latest upstream (#46)
* Switch from nose to pytest Nose hasn't seen a release since 2015 and sadly doesn't work with Python 3.10. See nose-devs/nose#1099 * Upgrade to CodeQL Action v2 https://github.blog/changelog/2022-04-27-code-scanning-deprecation-of-codeql-action-v1/ * Fix flake8 error This stops using not as a function and hopefully also makes the logic clearer. * Upgrade GitHub Actions Update checkout to v3 (no relevant changes) and setup-python to v4 (changes the names for pypy versions). * Add support for Python 3.10 * Put long line flake8 ignores in files instead of globally We have some long URLs in the code (mostly in docstrings) and wrapping them does not improve readability (and is difficult in docstrings) so the E501 ignore is now put inside each file instead of globally. Closes arthurdejong/python-stdnum#302 * Fix small typo Improper inflection of plurals. Closes arthurdejong/python-stdnum#299 * Add Czech bank account numbers Closes arthurdejong/python-stdnum#295 Closes arthurdejong/python-stdnum#296 * Use str.zfill() for padding leading zeros * Add extra court alias for german Handelsregisternummer Charlottenburg (Berlin) is a valid court representation for Berlin (Charlottenburg). See https://www.northdata.com/VRB+Service+GmbH,+Berlin/Amtsgericht+Charlottenburg+%28Berlin%29+HRB+103587+B Closes arthurdejong/python-stdnum#298 * Remove redundant steps with tox_job This also switches the other Tox jobs to use the latest Python 3.x interpreter. Closes arthurdejong/python-stdnum#305 * Update ISIL download URL * Provide a timeout to all download scripts * Update names of Wikipedia pages with IMSI codes * Ignore invalid downloaded country codes The page currently lists a country without a country code (is listed as "-"). This also ensures that lists of country codes are handled consistently. 
* Do not print trailing space * Update database files * Fix German OffeneRegister company registry URL * Update EU VAT Vies test with new number The number used before was apparently no longer valid. * Add support for Tunisia TIN Closes arthurdejong/python-stdnum#317 Closes arthurdejong/python-stdnum#309 * Add Kenyan TIN Closes arthurdejong/python-stdnum#300 Closes arthurdejong/python-stdnum#310 * Add support for Morocco TIN Closes arthurdejong/python-stdnum#226 Closes arthurdejong/python-stdnum#312 * Add Algerian NIF number This currently only checks the length and whether it only contains digits because little could be found on the structure of the number of whether there are any check digits. Closes arthurdejong/python-stdnum#313 Closes arthurdejong/python-stdnum#307 * Fix a couple typos found by codespell Closes arthurdejong/python-stdnum#333 * Add North Macedonian ЕДБ Note that this is implementation is mostly based on unofficial sources describing the format, which match the hundreds of examples found online. https://forum.it.mk/threads/modularna-kontrola-na-embg-edb-dbs-itn.15663/?__cf_chl_tk=Op2PaEIauip6Z.ZjvhP897O8gRVAwe5CDAVTpjx1sEo-1663498930-0-gaNycGzNCRE#post-187048 Also note that the algorithm for the check digit was tested on all found examples, and it doesn't work for all of them, despite those failing examples don't seem to be valid according to the official online search. Closes arthurdejong/python-stdnum#330 Closes arthurdejong/python-stdnum#222 * Add Faroe Islands V-number Closes arthurdejong/python-stdnum#323 Closes arthurdejong/python-stdnum#219 * Add support for Montenegro TIN Closes arthurdejong/python-stdnum#331 Closes arthurdejong/python-stdnum#223 * Add CAS Registry Number * Add support for Ghana TIN Closes arthurdejong/python-stdnum#326 Closes arthurdejong/python-stdnum#262 * Support running tests with PyPy 2.7 This also applies the fix from cfc80c8 from Python 2.7 to PyPy. 
* Update Fødselsnummer test case for date in future The future was now. This problem was pushed forwards to October 2039. * Remove duplicate CAS Registry Number The recently added stdnum.cas module was already available as teh stdnum.casrn module. Reverts acb6934 * Improve validation of CAS Registry Number This ensures that a leading 0 is treated as invalid. * Remove unused import Fixes 09d595b * Switch to parse_qs() from urllib.parse The function was removed from the cgi module in Python 3.8. * Switch to escape() from html The function was removed from the cgi module in Python 3.8. * Support "I" and "O" in CUSIP number It is unclear why these letters were considered invalid at the time of the implementation. This also reduces the test set a bit while still covering most cases. Closes arthurdejong/python-stdnum#337 * Add a check_uid() function to the stdnum.ch.uid module This function can be used to performa a lookup of organisation information by the Swiss Federal Statistical Office web service. Related to arthurdejong/python-stdnum#336 * Make all exceptions inherit from ValueError All the validation exceptions (subclasses of ValidationError) are raised when a number is provided with an inappropriate value. * Pad with zeroes in a more readable manner Closes arthurdejong/python-stdnum#340 * Use HTTPS in URLs where possible * Ensure we always run flake8-bugbear This assumes that we no longer use Python 2.7 for running the flake8 tests any more. * Add support for Slovenian EMŠO (Unique Master Citizen Number) Closes arthurdejong/python-stdnum#338 * Add Pakistani ID card number Based on the implementation provided by Quantum Novice (Syed Haseeb Shah). Closes arthurdejong/python-stdnum#306 Closes arthurdejong/python-stdnum#304 * vatin: Add a few more tests for is_valid See arthurdejong/python-stdnum#316 * Pick up custom certificate from script path This ensures that the script can be run from any directory. 
Fixes c4ad714 * Increase timeout for CN Open Data download It seems that raw.githubusercontent.com can be extremely slow. * Update German OffeneRegister lookup data format It appears that the data structure at OffeneRegister has changed which requires a different query. Data is returned in a different structure. * Update database files * Get files ready for 1.18 release * Avoid newer flake8 The new 6.0.0 contains a number of backwards incompatible changes for which plugins need to be updated and configuration needs to be updated. Sadly the maintainer no longer accepts contributions or discussion See PyCQA/flake8#1760 * Fix a typo Clocses arthurdejong/python-stdnum#341 * Run Python 3.5 and 3.6 GitHub tests on older Ubuntu The ubuntu-latest now points to ubuntu-22.04 instead of ubuntu-20.04 before. This also switches the PyPy version to test with to 3.9. * Fix typos found by codespell Closes arthurdejong/python-stdnum#344 * Add initial CONTRIBUTING.md file Initial description of the information needed for adding new number formats and some coding and testing guidelines. * Add support for Egypt TIN This also convertis Arabic digits to ASCII digits. Closes arthurdejong/python-stdnum#225 Closes arthurdejong/python-stdnum#334 * Extend number properties to show in online check This also ensures that flake8 is run on the WSGI script. * Fix typo in UEN docstring * Fix Albanian tax number validation This extends the description of the Albanian NIPT (NUIS) number with information on the structure of the number. The first character was previously limited between J and L but this letter indicates a decade and the number is also used for individuals to where it indicates a birth date. Thanks Julien Launois for pointing this out. 
Source: https://www.oecd.org/tax/automatic-exchange/crs-implementation-and-assistance/tax-identification-numbers/Albania-TIN.pdf Fixes 3db826c Closes arthurdejong/python-stdnum#402 * Update IBAN database file Closes arthurdejong/python-stdnum#409 * Extend date parsing in GS1-128 Some new AIs have new date formats or have changed the way optional components of formats are defined. * Fix date formatting on PyPy 2.7 The original way of calling strftime was likely an artifact of Python 2.6 support. Fixes 7e84c05 * Add support for Python 3.11 * Ensure flake8 is run on all Python files This also fixes code style fixes in the Sphinx configuration file. * Add get_county() function to Romanian CNP This also validates the county part of the number. Closes arthurdejong/python-stdnum#407 * Add functionality to get gender from Belgian National Number This also extends the documentation for the number. Closes https://github.com/arthurdejong/python-stdnum/pull/347/files * Add support for Finland HETU new century indicating signs More information at https://dvv.fi/en/reform-of-personal-identity-code Cloess arthurdejong/python-stdnum#396 * Add Spanish postcode validator Closes arthurdejong/python-stdnum#401 * Add support for Guinea TIN Closes arthurdejong/python-stdnum#384 Closes arthurdejong/python-stdnum#386 * Add automated checking for correct license header * Minor ISSN and ISBN documentation fixes Fix a comment that claimed incorrect ISSN length and use slightly more consistent terminology around check digits in ISSN and ISBN. Closes arthurdejong/python-stdnum#415 * Handle (partially) unknown birthdate of Belgian National Number This adds documentation for the special cases regarding birth dates embedded in the number, allows for date parts to be unknown and adds functions for getting the year and month. 
Closes arthurdejong/python-stdnum#416 * Run Python 2.7 tests in a container for GitHub Actions See actions/setup-python#672 * Add Belgian BIS Number Closes arthurdejong/python-stdnum#418 * Validate first digit of Canadian SIN See http://www.straightlineinternational.com/docs/vaildating_canadian_sin.pdf See https://lists.arthurdejong.org/python-stdnum-users/2023/msg00000.html * Fix file headers This improves consistency across files and fixes some files that had an incorrect file name reference. * Extend license check to file header check This also checks that the file name referenced in the file header is correct. * Add Slovenian Corporate Registration Number Closes arthurdejong/python-stdnum#414 * Validate European VAT numbers with EU or IM prefix Closes arthurdejong/python-stdnum#417 * Remove EU NACE update script The website that publishes the NACE catalogue has changed and a complete re-write of the script would be necessary. The data file hasn't changed since 2017 so is also unlikely to change until it is going to be replaced by NACE rev. 2.1 in 2025. See https://ec.europa.eu/eurostat/web/nace The NACE rev 2 specification can now be found here: https://showvoc.op.europa.eu/#/datasets/ESTAT_Statistical_Classification_of_Economic_Activities_in_the_European_Community_Rev._2/data The NACE rev 2.1 specification can now be found here: https://showvoc.op.europa.eu/#/datasets/ESTAT_Statistical_Classification_of_Economic_Activities_in_the_European_Community_Rev._2.1._%28NACE_2.1%29/data In both cases a ZIP file with RDF metadata can be downloaded (but the web applciation also exposes some simpler JSON APIs). * Update database files This also modifies the OUI update script because the website has changed to HTTPS and is sometimes very slow. The Belgian Commerzbank no longer has a registration and a bank account number in the tests used that bank. * Replace test number for German company registry The number seems to be no longer valid breaking the online tests. 
* Update Belarusian UNP online check The API for the online check for Belarusian UNP numbers at https://www.portal.nalog.gov.by/grp/getData has changed some small details of the API. * Rename license_file option in setup.cfg It seems the old option wasn't working with all versions of setuptools anyway. See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html * Avoid the deprecated assertRegexpMatches function * Use importlib.resource in place of deprecated pkg_resources Closes arthurdejong/python-stdnum#412 Closes arthurdejong/python-stdnum#413 * Remove obsolete intermediate certificate The portal.nalog.gov.by web no longer has an incomplete certificate chain. * Ensure all files are included in source archive Fixes b1dc313 Fixes 90044e2 * Get files ready for 1.19 release * Add support for Python 3.12 * Fix typo (thanks Александр Кизеев) * Ensure EU VAT numbers don't accept duplicate country codes * Add British Columbia PHN Closes arthurdejong/python-stdnum#421 * Add European Community (EC) Number Closes arthurdejong/python-stdnum#422 * Fix vatin number compacting for "EU" VAT numbers Thanks Davide Walder for finding this. Closes arthurdejong/python-stdnum#427 * Imporve French NIF validation (checksum) The last 3 digits are a checksum. % 511 https://ec.europa.eu/taxation_customs/tin/specs/FS-TIN%20Algorithms-Public.docx Closes arthurdejong/python-stdnum#426 * Fix Ukrainian EDRPOU check digit calculation This fixes the case where the weighted sum woud be 10 which should result in a check digit of 0. Closes arthurdejong/python-stdnum#429 * Add Indian virtual identity number Closes arthurdejong/python-stdnum#428 * Use HTTPS in URLs where possible * Switch to using openpyxl for parsing XLSX files The xlrd has dropped support for parsing XLSX files. We still use xlrd for update/be_banks.py because they use the classic XLS format and openpyxl does not support that format. 
* Add update-dat tox target for convenient data file updating * Update database files The Belgian bpost bank no longer has a registration and a few bank account numbers in the tests that used that bank were removed. Also updates the update/gs1_ai.py script to handle the new format of the data published by GS1. Also update the GS1-128 module to handle some different date formats. The Pakistan entry was kept in the stdnum/iban.dat file because the PDF version of the IBAN Registry still contains the country. fix db * Get files ready for 1.20 release * Drop support for Python 3.5 We don't have an easy way to test with Python 3.5 any more. * Add support for Indonesian NIK * Fix a typo Closes arthurdejong/python-stdnum#443 * Update Irish PPS validator to support new numbers See https://www.charteredaccountants.ie/News/b-range-pps-numbers Closes arthurdejong/python-stdnum#440 Closes arthurdejong/python-stdnum#441 * Update Czech database files Closes arthurdejong/python-stdnum#439 Closes arthurdejong/python-stdnum#435 * Adjust Swiss uid module to accept numbers without CHE prefix Closes arthurdejong/python-stdnum#437 Closes arthurdejong/python-stdnum#423 * Support 16 digit Indonesian NPWP numbers The Indonesian NPWP is being switched from 15 to 16 digits. The number is now the NIK for Indonesian citizens and the old format with a leading 0 for others (organisations and non-citizens). See https://www.grantthornton.co.id/insights/global-insights1/updates-regarding-the-format-of-indonesian-tax-id-numbers/ Closes arthurdejong/python-stdnum#432 * Replace use of deprecated inspect.getargspec() Use the inspect.signature() function instead. The inspect.getargspec() function was removed in Python 3.11. * Add Belgian SSN number Closes arthurdejong/python-stdnum#438 * Fix zeep client timeout parameter The timeout parameter of the zeep transport class is not responsable for POST/GET timeouts. The operational_timeout parameter should be used for that. 
See mvantellingen/python-zeep#140 Closes arthurdejong/python-stdnum#444 Closes arthurdejong/python-stdnum#445 * Customise certificate validation for web services This adds a `verify` argument to all functions that use network services for lookups. The option is used to configure how certificate validation works, the same as in the requests library. For SOAP requests this is implemented properly when using the Zeep library. The implementations using Suds and PySimpleSOAP have been updated on a best-effort basis but their use has been deprecated because they do not seem to work in practice in a lot of cases already. Related to arthurdejong/python-stdnum#452 Related to arthurdejong/python-stdnum#453 * Add Dutch identiteitskaartnummer Closes arthurdejong/python-stdnum#449 * Add Belgian eID card number Closes arthurdejong/python-stdnum#448 * Ensure get_soap_client() caches with verify This fixes the get_soap_client() function to cache SOAP clients taking the verify argument into account. Fixes 3fcebb2 * Ignore deprecation warnings in flake8 target This silences a ton of ast deprecation warnings that we can't fix in python-stdnum anyway. * Add more tests for Verhoeff implementation See arthurdejong/python-stdnum#456 * Use older Github runner for Python 3.7 tests * Add missing music industry ISRC country codes Closes arthurdejong/python-stdnum#455 Closes arthurdejong/python-stdnum#454 * Allow Uruguay RUT number starting with 22 * Drop Python 2 support This deprecates the stdnum.util.to_unicode() function because we no longer have to deal with bytestrings. * Add International Standard Name Identifier Closes arthurdejong/python-stdnum#463 * Support Ecuador public RUC with juridical format It seems that numbers with a format used for juridical RUCs have been issued to companies. 
Closes arthurdejong/python-stdnum#457 * Add Spanish CAE Number Closes arthurdejong/python-stdnum#446 * Add Russian ОГРН Closes arthurdejong/python-stdnum#459 * Add support for Python 3.13 * Fix Czech Rodné číslo check digit validation It seems that a small minority of numbers assigned with a checksum of 10 are still valid and expected to have a check digit value of 0. According to https://www.domzo13.cz/sw/evok/help/born_numbers.html this practice even happended (but less frequently) after 1985. Closes arthurdejong/python-stdnum#468 * Drop more Python 2.7 compatibility code * Ignore test failures from www.dgii.gov.do There was a change in the SOAP service and there is a new URL. However, the API has changed and seems to require authentication. We ignore test failures for now but unless a solution is found the DGII validation will be removed. See: arthurdejong/python-stdnum#462 See: arthurdejong/python-stdnum#461 --------- Co-authored-by: Arthur de Jong <[email protected]> Co-authored-by: Christian Clauss <[email protected]> Co-authored-by: vovavili <[email protected]> Co-authored-by: petr.prikryl <[email protected]> Co-authored-by: Romuald R <[email protected]> Co-authored-by: Leandro Regueiro <[email protected]> Co-authored-by: Dimitri Papadopoulos <[email protected]> Co-authored-by: Blaž Bregar <[email protected]> Co-authored-by: valeriko <[email protected]> Co-authored-by: Ali-Akber Saifee <[email protected]> Co-authored-by: RaduBorzea <[email protected]> Co-authored-by: Jeff Horemans <[email protected]> Co-authored-by: mjturt <[email protected]> Co-authored-by: Victor <[email protected]> Co-authored-by: Chales Horn <[email protected]> Co-authored-by: Blaž Bregar <[email protected]> Co-authored-by: Ömer Boratav <[email protected]> Co-authored-by: Daniel Weber <[email protected]> Co-authored-by: Kevin Dagostino <[email protected]> Co-authored-by: Atul Deolekar <[email protected]> Co-authored-by: vanderkoort <[email protected]> Co-authored-by: Olly Middleton 
<[email protected]> Co-authored-by: Joris Makauskis <[email protected]> Co-authored-by: Victor Sordoillet <[email protected]> Co-authored-by: Quique Porta <[email protected]> Co-authored-by: nvmbrasserie <[email protected]>
1 parent a457382 commit 46dc0cd

237 files changed: +15133 -2694 lines changed


.github/workflows/test.yml

Lines changed: 32 additions & 24 deletions
@@ -9,46 +9,54 @@ on:
     - cron: '9 0 * * 1'
 
 jobs:
-  test:
-    runs-on: ubuntu-latest
+  test_legacy:
+    runs-on: ubuntu-20.04
     strategy:
-      matrix:
-        python-version: [2.7, 3.5, 3.6, 3.7, 3.8, 3.9, pypy-2.7, pypy-3.6]
       fail-fast: false
+      matrix:
+        python-version: [3.6, 3.7]
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v3
       - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v2
+        uses: actions/setup-python@v4
         with:
           python-version: ${{ matrix.python-version }}
       - name: Install dependencies
         run: python -m pip install --upgrade pip tox
       - name: Run tox
-        run: tox -e "$(echo py${{ matrix.python-version }} | sed -e 's/[.-]//g;s/pypypy/pypy/')" --skip-missing-interpreters false
-  docs:
+        run: tox -e "$(echo py${{ matrix.python-version }} | sed -e 's/[.]//g;s/pypypy/pypy/')" --skip-missing-interpreters false
+  test:
     runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: [3.8, 3.9, '3.10', 3.11, 3.12, 3.13, pypy3.9]
     steps:
-      - uses: actions/checkout@v2
-      - name: Set up Python 3.8
-        uses: actions/setup-python@v2
+      - uses: actions/checkout@v3
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v4
         with:
-          python-version: 3.8
+          python-version: ${{ matrix.python-version }}
       - name: Install dependencies
         run: python -m pip install --upgrade pip tox
       - name: Run tox
-        run: tox -e docs --skip-missing-interpreters false
-  flake8:
+        run: tox -e "$(echo py${{ matrix.python-version }} | sed -e 's/[.]//g;s/pypypy/pypy/')" --skip-missing-interpreters false
+  tox_job:
     runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        tox_job: [docs, flake8, headers]
     steps:
-      - uses: actions/checkout@v2
-      - name: Set up Python 3.8
-        uses: actions/setup-python@v2
+      - uses: actions/checkout@v3
+      - name: Set up Python
+        uses: actions/setup-python@v4
         with:
-          python-version: 3.8
+          python-version: 3.x
       - name: Install dependencies
         run: python -m pip install --upgrade pip tox
-      - name: Tox
-        run: tox -e flake8 --skip-missing-interpreters false
+      - name: Run tox ${{ matrix.tox_job }}
+        run: tox -e ${{ matrix.tox_job }} --skip-missing-interpreters false
   CodeQL:
     runs-on: ubuntu-latest
     permissions:
@@ -57,12 +65,12 @@ jobs:
       security-events: write
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v2
+        uses: actions/checkout@v3
       - name: Initialize CodeQL
-        uses: github/codeql-action/init@v1
+        uses: github/codeql-action/init@v2
         with:
           languages: python
       - name: Build
-        uses: github/codeql-action/autobuild@v1
+        uses: github/codeql-action/autobuild@v2
       - name: Perform CodeQL Analysis
-        uses: github/codeql-action/analyze@v1
+        uses: github/codeql-action/analyze@v2
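
The `Run tox` steps in this workflow turn a matrix entry into a tox environment name with `echo py<version> | sed -e 's/[.]//g;s/pypypy/pypy/'`. The sketch below mirrors that pipeline in Python to show what each matrix entry maps to. The function name `tox_env_name` is made up for illustration and is not part of the repository.

```python
def tox_env_name(python_version: str) -> str:
    """Mirror `echo py<version> | sed -e 's/[.]//g;s/pypypy/pypy/'`."""
    name = "py" + python_version
    name = name.replace(".", "")              # s/[.]//g: drop every dot
    name = name.replace("pypypy", "pypy", 1)  # s/pypypy/pypy/: first match only
    return name


for version in ["3.8", "3.10", "3.13", "pypy3.9"]:
    print(version, "->", tox_env_name(version))
```

The matrix quotes `'3.10'` because an unquoted `3.10` is read by YAML as the float 3.1, which would select the wrong interpreter.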

CONTRIBUTING.md

Lines changed: 160 additions & 0 deletions
@@ -0,0 +1,160 @@
Contributing to python-stdnum
=============================

This document describes general guidelines for contributing new formats or
other enhancements to python-stdnum.


Adding number formats
---------------------

Basically any number or code that has some validation mechanism available or
some common formatting is eligible for inclusion into this library. If the
only specification of the number is "it consists of 6 digits" implementing
validation may not be that useful.

Contributions of new formats or requests to implement validation for a format
should include the following:

* The format name and short description.
* References to (official) sources that describe the format.
* A one- or two-paragraph description containing more details of the number
  (e.g. purpose and issuer and possibly format information that might be
  useful to end users).
* If available, a link to an (official) validation service for the number,
  reference implementations or similar sources that allow validating the
  correctness of the implementation.
* A set of around 20 to 100 "real" valid numbers for testing (more is better
  during development but only around 100 will be retained for regression
  testing).
* If the validation depends on some (online) list of formats, structures or
  parts of the identifier (e.g. a list of region codes that are part of the
  number) a way to easily update the registry information should be
  available.


Code contributions
------------------

Improvements to python-stdnum are most welcome. Integrating contributions
will be done on a best-effort basis and can be made easier if the following
are considered:

* Ideally contributions are made as GitHub pull requests, but contributions
  by email (privately or through the python-stdnum-users mailing list) can
  also be considered.
* Submitted contributions will often be reformatted and sometimes
  restructured for consistency with other parts.
* Contributions will be acknowledged in the release notes.
* Contributions should add or update a copyright statement if you feel the
  contribution is significant.
* All contributions should be made under compatible copyright terms.
* There is no need to modify the NEWS, README.md or files under docs for new
  formats; these files will be updated on release.
* Marking valid numbers as invalid should be avoided and is much worse than
  marking invalid numbers as valid. Since the primary use case for
  python-stdnum is to validate entered data, having an implementation that
  results in "computer says no" should be avoided.
* Number format implementations should include links to sources of
  information: generally useful links (e.g. more details about the number
  itself) should be in the module docstring, if it relates more to the
  implementation (e.g. pointer to reference implementation, online API
  documentation or similar) a comment in the code is better.
* Country-specific numbers and codes go in a country or region package (e.g.
  stdnum.eu.vat or stdnum.nl.bsn) while global numbers go in the top-level
  namespace (e.g. stdnum.isbn).
* All code should be well tested and achieve 100% code coverage.
* Existing code structure conventions (e.g. see README for interface) should
  be followed.
* Git commit messages should follow the usual 7 rules.
* Declarative or functional constructs are preferred over an iterative
  approach, e.g.::

    s = sum(int(c) for c in number)

  over::

    s = 0
    for c in number:
        s += int(c)
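
As an illustration of how several of these points can fit together, here is a minimal sketch of a module for a made-up 6-digit format. Everything in it is an assumption for the sake of the example: the format, the weights and the check digit rule are invented, and the exception classes are local stand-ins defined so that the block is self-contained. The function names follow the `compact()`, `validate()` and `is_valid()` layout, and the check digit is computed declaratively.

```python
class ValidationError(ValueError):
    """Local stand-in for a validation exception base class."""


class InvalidFormat(ValidationError):
    """The number contains unexpected characters."""


class InvalidLength(ValidationError):
    """The number has the wrong length."""


class InvalidChecksum(ValidationError):
    """The check digit does not match."""


def compact(number):
    """Strip separators and surrounding whitespace."""
    return number.replace(' ', '').replace('-', '').strip()


def calc_check_digit(number):
    """Hypothetical rule: weighted sum of the first five digits modulo 10."""
    weights = (7, 3, 1, 7, 3)
    return str(sum(w * int(c) for w, c in zip(weights, number)) % 10)


def validate(number):
    """Return the compact number or raise a ValidationError subclass."""
    number = compact(number)
    if not (number.isascii() and number.isdigit()):
        raise InvalidFormat()
    if len(number) != 6:
        raise InvalidLength()
    if number[-1] != calc_check_digit(number[:-1]):
        raise InvalidChecksum()
    return number


def is_valid(number):
    """Return True or False instead of raising."""
    try:
        return bool(validate(number))
    except ValidationError:
        return False


print(validate('12-345 9'))   # 123459
print(is_valid('123450'))     # False
```

Because the exceptions derive from `ValueError`, callers that only care about "bad input" can catch that single built-in type.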


Testing
-------

Tests can be run with `tox`. Some basic code style tests can be run with
`tox -e flake8` and most other targets run the test suite with various
supported Python interpreters.

Module implementations have a couple of smaller test cases that also serve as
basic documentation of the happy flow.
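
A small sketch of what such in-module test cases can look like, using the standard `doctest` module. The `compact()` function and the sample number are hypothetical and only serve to show the pattern of documenting the happy flow in a docstring.

```python
import doctest


def compact(number):
    """Strip separators and surrounding whitespace (hypothetical format).

    >>> compact(' 12-345 9 ')
    '123459'
    >>> compact('123459')
    '123459'
    """
    return number.replace(' ', '').replace('-', '').strip()


if __name__ == '__main__':
    # testmod() only reports failures; no output means every example passed.
    doctest.testmod()
```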

More extensive tests are available, per module, in the tests directory. These
tests (also doctests) cover more corner cases and should include a set of
valid numbers that demonstrate that the module works correctly for real
numbers.

The normal tests should never require online sources for execution. All
functions that deal with online lookups (e.g. the EU VIES service for VAT
validation) should only be tested using conditional unit tests.
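
One way to make such tests conditional is to skip them unless an environment variable is set, as in the sketch below. The variable name `ONLINE_TESTS` and the `lookup_online()` stub are assumptions made for this example; a real test would call the module's actual lookup function.

```python
import os
import unittest


def lookup_online(number):
    """Hypothetical stand-in for a function that queries a web service."""
    return {'number': number, 'valid': True}


@unittest.skipIf(
    not os.environ.get('ONLINE_TESTS'),  # assumed opt-in variable name
    'Do not run online tests')
class TestOnlineLookup(unittest.TestCase):
    """Tests that only run when online testing is explicitly enabled."""

    def test_lookup(self):
        result = lookup_online('123459')
        self.assertTrue(result['valid'])


if __name__ == '__main__':
    unittest.main()
```

With this layout a plain test run stays offline, and the online checks only run when the variable is set in the environment.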


Finding test numbers
--------------------

Some company numbers are commonly published on a company's website contact
page (e.g. VAT or other registration numbers, bank account numbers). Doing a
web search limited to a country and some key words generally turns up a lot
of pages with this information.

Another approach is to search for spreadsheet-type documents with some
keywords that match the number. This sometimes turns up lists of companies
(also occasionally works for personal identifiers).

For information that is displayed on ID cards or passports it is sometimes
useful to do an image search.

For dealing with numbers that point to individuals it is important to:

* Only keep the data that is needed to test the implementation.
* Ensure that no other data relating to a person or other personal
  information is kept or can be inferred from the kept data.
* The presence of a number in the test set should not provide any information
  about the person (other than that there is a person with the number or
  information that is present in the number itself).

Sometimes numbers are part of a data leak. If this data is used to pick a few
sample numbers from, the selection should be random and the leak should not
be identifiable from the picked numbers. For example, if the leaked numbers
pertain only to people with a certain medical condition, membership of some
organisation or other specific property the leaked data should not be used.


Reverse engineering
-------------------

Sometimes a number format clearly has a check digit but the algorithm is not
publicly documented. It is sometimes possible to reverse engineer the used
check digit algorithm from a large set of numbers.

For example, numbers that, apart from the check digit, differ in only one
digit will often expose the weights used. This works reasonably well if the
algorithm computes a weighted sum over the digits modulo 11.

See https://github.com/arthurdejong/python-stdnum/pull/203#issuecomment-623188812
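
The idea can be sketched as follows, under the assumption that the check digit is the last digit and equals the weighted sum of the other digits modulo 11. If two numbers differ in one payload digit, the difference in check digits equals the weight times the difference in that digit (modulo 11), so the weight follows from a modular inverse. The function names, the sample numbers and the weights `(2, 7, 6, 5)` are invented for the demonstration. A real scheme may use a different mapping (for example 11 minus the sum), in which case the recovered values need to be interpreted accordingly.

```python
def make_number(payload, weights, modulus=11):
    """Append a check digit under the assumed rule (demo helper only)."""
    check = sum(w * int(d) for w, d in zip(weights, payload)) % modulus
    if check == 10:
        raise ValueError('no single check digit for this payload')
    return payload + str(check)


def recover_weight(a, b, modulus=11):
    """Return (position, weight) from two numbers differing in one digit.

    Assumes check = sum(w_i * d_i) % modulus with the check digit last.
    """
    payload_a, payload_b = a[:-1], b[:-1]
    diffs = [i for i, (x, y) in enumerate(zip(payload_a, payload_b)) if x != y]
    if len(a) != len(b) or len(diffs) != 1:
        raise ValueError('payloads must differ in exactly one position')
    i = diffs[0]
    digit_delta = (int(payload_a[i]) - int(payload_b[i])) % modulus
    check_delta = (int(a[-1]) - int(b[-1])) % modulus
    # The modulus is prime, so the non-zero digit difference has an inverse.
    # pow() with a negative exponent needs Python 3.8 or newer.
    return i, check_delta * pow(digit_delta, -1, modulus) % modulus


secret = (2, 7, 6, 5)  # pretend these weights are unknown
base = make_number('1235', secret)
print(recover_weight(base, make_number('3235', secret)))  # (0, 2)
print(recover_weight(base, make_number('1236', secret)))  # (3, 5)
```

Repeating this for each position, and cross-checking against more pairs, gives a candidate weight vector that can then be tested on the full sample.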


Registries
----------

Some numbers or parts of numbers use validation based on a registry of known
good prefixes, ranges or formats. It is only useful to fully base validation
on these registries if the update frequency to these registries is very low.

If there is a registry that is used (a list of known values, ranges or
otherwise) the downloaded information should be stored in a data file (see
the stdnum.numdb module). Only the minimal amount of data should be kept (for
validation or identification).

The data files should be able to be created and updated using a script in the
`update` directory.
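
To make the idea of a minimal registry data file concrete, here is a self-contained sketch that parses a small list of prefixes and ranges and looks up a number in it. The file layout, the sample entries and the function names are invented for this example; this is not the format or the API of the stdnum.numdb module.

```python
import io

# Hypothetical registry: a prefix or a low-high range, then a description.
SAMPLE_DATA = """\
# region prefixes
10 North
20-29 Central
35 South
"""


def read_registry(fileobj):
    """Parse '<prefix or low-high> <description>' lines into tuples."""
    entries = []
    for line in fileobj:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        prefix, description = line.split(None, 1)
        low, _, high = prefix.partition('-')
        entries.append((low, high or low, description))
    return entries


def lookup(entries, number):
    """Return the description of the first range covering the prefix."""
    return next(
        (description for low, high, description in entries
         if low <= number[:len(low)] <= high),
        None)


registry = read_registry(io.StringIO(SAMPLE_DATA))
print(lookup(registry, '2512345'))  # Central
print(lookup(registry, '9912345'))  # None
```

Keeping only prefixes and short descriptions matches the guideline of storing the minimal amount of data needed for validation or identification.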
