Commit af0e6a6

Merge bitcoin/bitcoin#33702: contrib: Remove brittle, confusing and redundant UTF8 encoding from Python IO
fad6118 test: Fix "typo" in written invalid content (MarcoFalke)
fab085c contrib: Use text=True in subprocess over manual encoding handling (MarcoFalke)
fa71c15 scripted-diff: Bump copyright headers after encoding changes (MarcoFalke)
fae6124 contrib: Remove confusing and redundant encoding from IO (MarcoFalke)
fa7d72b lint: Drop check to enforce encoding to be specified in Python scripts (MarcoFalke)
faf39d8 test: Clarify that Python UTF-8 mode is the default today for most systems (MarcoFalke)
fa83e3a lint: Do not allow locale dependent shell scripts (MarcoFalke)

Pull request description:

  Historically, there was an attempt via `test/lint/lint-python-utf8-encoding.py` to enforce explicit UTF8 in every Python IO statement (`open`, `subprocess`, ...). However, the lint check has many problems:

  * The check is incomplete and many IO statements lack the explicit UTF8 specification.
  * It was added at a time when some systems were not UTF8 by default.
  * The check is brittle, as it depends on a fragile regex.

  In theory, now that the minimum Python version is 3.10 (since commit 2123c94), the check could be replaced by `PYTHONWARNDEFAULTENCODING=1` from https://docs.python.org/3/whatsnew/3.10.html#optional-encodingwarning-and-encoding-locale-option. However, this comes with many other problems:

  * All our Python scripts already assume and require UTF8 to be set externally. On almost all modern systems, this is already the default. Some Windows versions do not have UTF8 by default and require `PYTHONUTF8=1` to be set for the tests to run already today (with or without the changes in this pull). Also, the CI and many other Bash scripts force UTF8 via `LC_ALL`. Finally, Python 3.15 will likely enable UTF8 on *all* systems by default, per https://peps.python.org/pep-0686/#abstract.
  * So adding UTF8 to every single IO call is redundant, verbose, and confusing, given that it is the expected default.

  So fix all issues, by:

  * Removing the `test/lint/lint-python-utf8-encoding.py` check.
  * Removing the encoding on the individual IO calls.
  * Clarifying the existing docs around the existing UTF8 requirement and assumption.

  Obviously, every IO call is still free to specify UTF8 or any other encoding explicitly, if there is a documented need for it in the future.

ACKs for top commit:
  theStack:
    re-ACK fad6118
  laanwj:
    Re-ACK fad6118

Tree-SHA512: 78025ea3508597d2299490347614f0ee3e4c66e3ba559ff50e498045a9c8bbd92f3a5ced18719d8fcebbd1e47bdbb56a0c85a5b73b425adb0ea4f02fe69c3149
2 parents 4c784b2 + fad6118 commit af0e6a6

63 files changed: +217 -299 lines changed

ci/test/02_run_container.py

Lines changed: 1 addition & 2 deletions
@@ -26,7 +26,6 @@ def main():
         ["bash", "-c", "grep export ./ci/test/00_setup_env*.sh"],
         stdout=subprocess.PIPE,
         text=True,
-        encoding="utf8",
     ).stdout.splitlines()
     settings = set(l.split("=")[0].split("export ")[1] for l in settings)
     # Add "hidden" settings, which are never exported, manually. Otherwise,
@@ -42,7 +41,7 @@ def main():
         u=os.environ["USER"],
         c=os.environ["CONTAINER_NAME"],
     )
-    with open(env_file, "w", encoding="utf8") as file:
+    with open(env_file, "w") as file:
         for k, v in os.environ.items():
             if k in settings:
                 file.write(f"{k}={v}\n")

contrib/devtools/circular-dependencies.py

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-# Copyright (c) 2018-2020 The Bitcoin Core developers
+# Copyright (c) 2018-present The Bitcoin Core developers
 # Distributed under the MIT software license, see the accompanying
 # file COPYING or http://www.opensource.org/licenses/mit-license.php.

@@ -49,7 +49,7 @@ def module_name(path):
 # TODO: implement support for multiple include directories
 for arg in sorted(files.keys()):
     module = files[arg]
-    with open(arg, 'r', encoding="utf8") as f:
+    with open(arg, 'r') as f:
         for line in f:
             match = RE.match(line)
             if match:

contrib/devtools/clang-format-diff.py

Lines changed: 1 addition & 1 deletion
@@ -169,7 +169,7 @@ def main():
             sys.exit(p.returncode)

         if not args.i:
-            with open(filename, encoding="utf8") as f:
+            with open(filename) as f:
                 code = f.readlines()
             formatted_code = StringIO(stdout).readlines()
             diff = difflib.unified_diff(

contrib/devtools/copyright_header.py

Lines changed: 9 additions & 9 deletions
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-# Copyright (c) 2016-2022 The Bitcoin Core developers
+# Copyright (c) 2016-present The Bitcoin Core developers
 # Distributed under the MIT software license, see the accompanying
 # file COPYING or http://www.opensource.org/licenses/mit-license.php.

@@ -54,12 +54,12 @@ def applies_to_file(filename):
 GIT_TOPLEVEL_CMD = 'git rev-parse --show-toplevel'.split(' ')

 def call_git_ls(base_directory):
-    out = subprocess.check_output([*GIT_LS_CMD, base_directory])
-    return [f for f in out.decode("utf-8").split('\n') if f != '']
+    out = subprocess.check_output([*GIT_LS_CMD, base_directory], text=True)
+    return [f for f in out.split('\n') if f != '']

 def call_git_toplevel():
     "Returns the absolute path to the project root"
-    return subprocess.check_output(GIT_TOPLEVEL_CMD).strip().decode("utf-8")
+    return subprocess.check_output(GIT_TOPLEVEL_CMD, text=True).strip()

 def get_filenames_to_examine(base_directory):
     "Returns an array of absolute paths to any project files in the base_directory that pass the include/exclude filters"
@@ -140,7 +140,7 @@ def file_has_without_c_style_copyright_for_holder(contents, holder_name):
 ################################################################################

 def read_file(filename):
-    return open(filename, 'r', encoding="utf8").read()
+    return open(filename, 'r').read()

 def gather_file_info(filename):
     info = {}
@@ -298,8 +298,8 @@ def report_cmd(argv):
 GIT_LOG_CMD = "git log --pretty=format:%%ai %s"

 def call_git_log(filename):
-    out = subprocess.check_output((GIT_LOG_CMD % filename).split(' '))
-    return out.decode("utf-8").split('\n')
+    out = subprocess.check_output((GIT_LOG_CMD % filename).split(' '), text=True)
+    return out.split('\n')

 def get_git_change_years(filename):
     git_log_lines = call_git_log(filename)
@@ -316,12 +316,12 @@ def get_most_recent_git_change_year(filename):
 ################################################################################

 def read_file_lines(filename):
-    with open(filename, 'r', encoding="utf8") as f:
+    with open(filename, 'r') as f:
         file_lines = f.readlines()
     return file_lines

 def write_file_lines(filename, file_lines):
-    with open(filename, 'w', encoding="utf8") as f:
+    with open(filename, 'w') as f:
         f.write(''.join(file_lines))

 ################################################################################
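
The hunks in this file swap a manual `.decode("utf-8")` for `text=True`. A small sketch of how the two differ is given here, under the assumption that the child process emits UTF-8 bytes. `CMD` and the three helper functions are hypothetical stand-ins for the `git` calls. With `text=True` alone, decoding uses the interpreter's default text encoding, so the result matches the manual decode only where that default is UTF-8, which is the assumption the commit documents.

```python
import subprocess
import sys

# Hypothetical child command: writes one non-ASCII character as UTF-8 bytes.
CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write('\\u00e9\\n'.encode('utf-8'))",
]


def manual_decode() -> str:
    """Old style: capture bytes, then decode them explicitly."""
    out = subprocess.check_output(CMD)
    return out.decode("utf-8")


def text_mode() -> str:
    """New style: let subprocess decode with the default text encoding."""
    return subprocess.check_output(CMD, text=True)


def text_mode_explicit() -> str:
    """Still available when a call has a documented need for a fixed encoding."""
    return subprocess.check_output(CMD, text=True, encoding="utf-8")


if __name__ == "__main__":
    # ascii() keeps the output printable even on a non-UTF-8 terminal.
    print(ascii(manual_decode()))
    print(ascii(text_mode_explicit()))
    print("text=True matches manual decode:", text_mode() == manual_decode())
```

On a system where the last line prints `False`, setting `PYTHONUTF8=1` is the override that the pull request description points to.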

contrib/filter-lcov.py

Lines changed: 3 additions & 3 deletions
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-# Copyright (c) 2017-2020 The Bitcoin Core developers
+# Copyright (c) 2017-present The Bitcoin Core developers
 # Distributed under the MIT software license, see the accompanying
 # file COPYING or http://www.opensource.org/licenses/mit-license.php.

@@ -16,8 +16,8 @@
 outfile = args.outfile

 in_remove = False
-with open(tracefile, 'r', encoding="utf8") as f:
-    with open(outfile, 'w', encoding="utf8") as wf:
+with open(tracefile, 'r') as f:
+    with open(outfile, 'w') as wf:
         for line in f:
             for p in pattern:
                 if line.startswith("SF:") and p in line:

contrib/linearize/linearize-data.py

Lines changed: 3 additions & 3 deletions
@@ -2,7 +2,7 @@
 #
 # linearize-data.py: Construct a linear, no-fork version of the chain.
 #
-# Copyright (c) 2013-2022 The Bitcoin Core developers
+# Copyright (c) 2013-present The Bitcoin Core developers
 # Distributed under the MIT software license, see the accompanying
 # file COPYING or http://www.opensource.org/licenses/mit-license.php.
 #
@@ -34,7 +34,7 @@ def get_blk_dt(blk_hdr):
 # When getting the list of block hashes, undo any byte reversals.
 def get_block_hashes(settings):
     blkindex = []
-    with open(settings['hashlist'], "r", encoding="utf8") as f:
+    with open(settings['hashlist'], "r") as f:
         for line in f:
             line = line.rstrip()
             if settings['rev_hash_bytes'] == 'true':
@@ -267,7 +267,7 @@ def run(self):
         print("Usage: linearize-data.py CONFIG-FILE")
         sys.exit(1)

-    with open(sys.argv[1], encoding="utf8") as f:
+    with open(sys.argv[1]) as f:
         for line in f:
             # skip comment lines
             m = re.search(r'^\s*#', line)

contrib/linearize/linearize-hashes.py

Lines changed: 3 additions & 3 deletions
@@ -2,7 +2,7 @@
 #
 # linearize-hashes.py: List blocks in a linear, no-fork version of the chain.
 #
-# Copyright (c) 2013-2022 The Bitcoin Core developers
+# Copyright (c) 2013-present The Bitcoin Core developers
 # Distributed under the MIT software license, see the accompanying
 # file COPYING or http://www.opensource.org/licenses/mit-license.php.
 #
@@ -87,7 +87,7 @@ def get_block_hashes(settings, max_blocks_per_call=10000):

 def get_rpc_cookie():
     # Open the cookie file
-    with open(os.path.join(os.path.expanduser(settings['datadir']), '.cookie'), 'r', encoding="ascii") as f:
+    with open(os.path.join(os.path.expanduser(settings['datadir']), '.cookie'), 'r') as f:
         combined = f.readline()
         combined_split = combined.split(":")
         settings['rpcuser'] = combined_split[0]
@@ -98,7 +98,7 @@ def get_rpc_cookie():
         print("Usage: linearize-hashes.py CONFIG-FILE")
         sys.exit(1)

-    with open(sys.argv[1], encoding="utf8") as f:
+    with open(sys.argv[1]) as f:
         for line in f:
             # skip comment lines
             m = re.search(r'^\s*#', line)
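
The `get_rpc_cookie` hunk in this file drops `encoding="ascii"` instead of `encoding="utf8"`. A brief sketch of why that should not change the result for ASCII-only content is given here: ASCII is a strict subset of UTF-8, so both decoders return the same text. The helpers `read_cookie` and `compare_encodings` and the sample cookie value are hypothetical, and the `user:password` layout is an assumption based on the `split(":")` in the surrounding code.

```python
import os
import tempfile


def read_cookie(path, encoding=None):
    """Read 'user:password' from the first line of a cookie file."""
    with open(path, "r", encoding=encoding) as f:
        combined = f.readline().rstrip("\n")
    user, _, password = combined.partition(":")
    return user, password


def compare_encodings(cookie_bytes):
    """Write cookie_bytes to a temp file, then read it as ASCII and as UTF-8."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(cookie_bytes)
        return read_cookie(path, "ascii"), read_cookie(path, "utf-8")
    finally:
        os.remove(path)


if __name__ == "__main__":
    # Hypothetical cookie content, ASCII only.
    as_ascii, as_utf8 = compare_encodings(b"__cookie__:0123abcd\n")
    print("same result:", as_ascii == as_utf8)
```

The one behavioral difference is in error handling: a file containing non-ASCII bytes would raise `UnicodeDecodeError` under `"ascii"`, while a UTF-8 default accepts any valid UTF-8 sequence.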

contrib/message-capture/message-capture-parser.py

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-# Copyright (c) 2020-2022 The Bitcoin Core developers
+# Copyright (c) 2020-present The Bitcoin Core developers
 # Distributed under the MIT software license, see the accompanying
 # file COPYING or http://www.opensource.org/licenses/mit-license.php.
 """Parse message capture binary files. To be used in conjunction with -capturemessages."""
@@ -205,7 +205,7 @@ def main():

     jsonrep = json.dumps(messages)
     if output:
-        with open(str(output), 'w+', encoding="utf8") as f_out:
+        with open(str(output), 'w+') as f_out:
             f_out.write(jsonrep)
     else:
         print(jsonrep)

contrib/seeds/generate-seeds.py

Lines changed: 5 additions & 5 deletions
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-# Copyright (c) 2014-2021 The Bitcoin Core developers
+# Copyright (c) 2014-present The Bitcoin Core developers
 # Distributed under the MIT software license, see the accompanying
 # file COPYING or http://www.opensource.org/licenses/mit-license.php.
 '''
@@ -168,16 +168,16 @@ def main():
     g.write(' *\n')
     g.write(' * Each line contains a BIP155 serialized (networkID, addr, port) tuple.\n')
     g.write(' */\n')
-    with open(os.path.join(indir,'nodes_main.txt'), 'r', encoding="utf8") as f:
+    with open(os.path.join(indir,'nodes_main.txt'), 'r') as f:
         process_nodes(g, f, 'chainparams_seed_main')
     g.write('\n')
-    with open(os.path.join(indir,'nodes_signet.txt'), 'r', encoding="utf8") as f:
+    with open(os.path.join(indir,'nodes_signet.txt'), 'r') as f:
         process_nodes(g, f, 'chainparams_seed_signet')
     g.write('\n')
-    with open(os.path.join(indir,'nodes_test.txt'), 'r', encoding="utf8") as f:
+    with open(os.path.join(indir,'nodes_test.txt'), 'r') as f:
         process_nodes(g, f, 'chainparams_seed_test')
     g.write('\n')
-    with open(os.path.join(indir,'nodes_testnet4.txt'), 'r', encoding="utf8") as f:
+    with open(os.path.join(indir,'nodes_testnet4.txt'), 'r') as f:
         process_nodes(g, f, 'chainparams_seed_testnet4')
     g.write('#endif // BITCOIN_CHAINPARAMSSEEDS_H\n')


contrib/seeds/makeseeds.py

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-# Copyright (c) 2013-2022 The Bitcoin Core developers
+# Copyright (c) 2013-present The Bitcoin Core developers
 # Distributed under the MIT software license, see the accompanying
 # file COPYING or http://www.opensource.org/licenses/mit-license.php.
 #
@@ -211,7 +211,7 @@ def main():
     print('Done.', file=sys.stderr)

     print('Loading and parsing DNS seeds…', end='', file=sys.stderr, flush=True)
-    with open(args.seeds, 'r', encoding='utf8') as f:
+    with open(args.seeds, 'r') as f:
         lines = f.readlines()
     ips = [parseline(line) for line in lines]
     random.shuffle(ips)
