'''Reading and Writing Files'''
# Persistence
# -----------------------------------------------------------------------------
# In the context of storing data in a computer system, persistence means that
# the data survives after the process that created it has ended. In other
# words, for data to be considered persistent, it must be written to
# non-volatile storage such as a disk (as opposed to memory).
# The simplest kind of persistence is a plain old file, sometimes called a flat
# file. This is just a sequence of bytes stored under a filename. You read from
# a file into memory and write from memory to a file.
# File Handling Basics
# -----------------------------------------------------------------------------
# Before reading or writing a file, you need to open it:
fileobject = open('filename.txt', 'w')
# The arg after the filename is the mode. The first letter indicates the
# operation:
# r     read   - default mode if not specified
# w     write  - if the file doesn't exist, it's created. If it exists, it's
#                overwritten
# x     write  - but only if the file does not already exist
# a     append - write after the end if the file exists, otherwise create it
# An optional second letter after the mode indicates the file type:
# t     text   - default type if not specified
# b     binary
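# To see the modes side by side, here is a minimal sketch. It assumes a
# throwaway temporary directory and a hypothetical file name (modes_demo.txt)
# so that no real file is touched:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical scratch file used only for this illustration.
    path = os.path.join(tmpdir, 'modes_demo.txt')

    with open(path, 'w') as fob:      # 'w' creates the file
        fob.write('first\n')
    with open(path, 'w') as fob:      # 'w' again replaces the old contents
        fob.write('second\n')
    with open(path, 'a') as fob:      # 'a' adds to the end
        fob.write('third\n')

    try:
        with open(path, 'x') as fob:  # 'x' refuses to open an existing file
            fob.write('never written\n')
        x_mode_refused = False
    except FileExistsError:
        x_mode_refused = True

    with open(path, 'r') as fob:
        modes_demo_contents = fob.read()

print(repr(modes_demo_contents))  # 'second\nthird\n'
print(x_mode_refused)             # True
```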
# After opening a file, you call functions to read or write data, then you need
# to close the file:
fileobject.close()
# Close files automatically
# -----------------------------------------------------------------------------
# using: with expression as variable
# If you forget to close a file that you've opened, you can end up wasting
# system resources or accidentally overwriting a file. Python has 'context
# managers' to deal with things such as open files:
with open('filename.txt', 'r') as fileobject:
    pass
# After the block of code completes (normally or by a raised exception),
# the file is closed automatically. This is the preferred way of
# opening/closing files.
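# A small sketch of the "closed even after an exception" behavior. The file
# name and the simulated ValueError are assumptions made for illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'closed_demo.txt')  # hypothetical file name
    try:
        with open(path, 'w') as fob:
            fob.write('partial data')
            raise ValueError('simulated failure inside the block')
    except ValueError:
        pass
    # The name fob is still bound here, but the file behind it is closed.
    closed_after_error = fob.closed

print(closed_after_error)  # True
```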
# write()
# -----------------------------------------------------------------------------
text1 = '...Some content...'
text2 = """
The Gashlycrumb Tinies - by Edward Gorey
A is for Amy who fell down the stairs.
B is for Basil assaulted by bears.
C is for Clair who wasted away.
D is for Desmond thrown out of the sleigh.
E is for Ernest who choked on a peach.
F is for Fanny, sucked dry by a leech.
G is for George, smothered under a rug.
H is for Hector, done in by a thug.
I is for Ida who drowned in the lake.
J is for James who took lye, by mistake.
K is for Kate who was struck with an axe.
L is for Leo who swallowed some tacks.
M is for Maud who was swept out to sea.
N is for Nevil who died of ennui.
O is for Olive, run through with an awl.
P is for Prue, trampled flat in a brawl.
Q is for Quinton who sank in a mire.
R is for Rhoda, consumed by a fire.
S is for Susan who perished of fits.
T is for Titas who blew into bits.
U is for Una who slipped down a drain.
V is for Victor, squashed under a train.
W is for Winie, embedded in ice.
X is for Xerxes, devoured by mice.
Y is for Yoric whose head was bashed in.
Z is for Zilla who drank too much gin.
The end
"""
# This writes the contents of text1 to the file testfile1.txt:
with open('testfile1.txt', 'w') as fob:
    fob.write(text1)
# If you have a very large source string, you can write it in chunks
# (writing a very large source in one call can be quite taxing on memory, or
# impossible if the data source was, say, 25GB):
size = len(text2)
offset = 0
chunk = 100
with open('testfile1.txt', 'w') as fob:
    while offset < size:
        # Slicing past the end is safe: the last chunk is simply shorter.
        fob.write(text2[offset:offset + chunk])
        offset += chunk
# Test 'x' with our own exception handler:
try:
    with open('testfile1.txt', 'x') as fob:
        fob.write('stuff')
except FileExistsError:
    print('testfile1 file already exists!')
# print() to a file
# -----------------------------------------------------------------------------
# You can also print to a text file. Note: when typing out file=fileobject,
# the convention is to NOT put spaces around the equals sign because
# these are keyword arguments as opposed to variable assignments.
with open('testfile2.txt', 'w') as fob:
    print(text1, file=fob)
# When printing additional data to a file you'll get a space between each
# argument and a newline at the end. These are due to the following arguments:
# sep (separator, which defaults to a space ' ')
# end (end string, which defaults to a newline '\n')
# If you want to change these print() defaults:
with open('testfile2.txt', 'w') as fob:
    print(text1, file=fob, sep='', end='')
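# The effect of sep and end is easier to see with several arguments. This
# sketch assumes an in-memory io.StringIO buffer as a stand-in for an open
# text file, so nothing is written to disk:

```python
import io

buffer = io.StringIO()  # in-memory stand-in for an open text file

print('a', 'b', 'c', file=buffer)                    # default sep=' ', end='\n'
print('a', 'b', 'c', file=buffer, sep='-', end='!')  # custom sep and end

printed = buffer.getvalue()
print(repr(printed))  # 'a b c\na-b-c!'
```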
# read()
# -----------------------------------------------------------------------------
# read() reads all contents of the file and returns a single string
with open('testfile1.txt', 'r') as fob:
    poem = fob.read()
print(type(poem))  # <class 'str'>
# You can provide a max character count for how much is read in at a time.
# The following will read 100 characters at a time and append each chunk to
# the string:
poem = ''
with open('testfile1.txt', 'r') as fob:
    chunk = 100
    while True:
        fragment = fob.read(chunk)
        # As you read, Python keeps track of where the pointer is in the file.
        if not fragment:
            break
        poem += fragment
# After you've read all the way to the end, further calls to read() will
# return an empty string (''), which is treated as False in 'if not fragment'.
# This breaks out of the while True loop.
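# The same "read until an empty string comes back" idea can be sketched with
# the two-argument form of iter(), which keeps calling a function until it
# returns the sentinel value. The io.StringIO source and its 250 characters
# are assumptions standing in for a real file:

```python
import io
from functools import partial

source = io.StringIO('abcdefghij' * 25)  # 250 characters, stands in for a file

pieces = []
# iter(callable, sentinel) calls source.read(100) until it returns ''.
for fragment in iter(partial(source.read, 100), ''):
    pieces.append(fragment)

chunk_lengths = [len(piece) for piece in pieces]
rebuilt = ''.join(pieces)  # one join instead of repeated string concatenation

print(chunk_lengths)  # [100, 100, 50]
```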
# readline()
# -----------------------------------------------------------------------------
# This example does the same as above but reads one line at a time instead of
# chunks of 100 characters:
poem = ''
with open('testfile1.txt', 'r') as fob:
    while True:
        line = fob.readline()
        if not line:
            break
        poem += line
print(type(line))  # <class 'str'>
# Another approach:
with open("testfile1.txt", 'r') as fob:
    line = fob.readline()
    while line:
        print(line, end='')
        line = fob.readline()  # moves to the next line
# Read a file by iterating
# -----------------------------------------------------------------------------
# The easiest way to read a text file is by using an iterator. This returns
# one line at a time... similar to previous examples but less code:
poem = ''
with open('testfile1.txt', 'r') as fob:
    for line in fob:
        poem += line
# a variation:
with open("testfile1.txt", 'r') as fob:
    for line in fob:
        if "by" in line.lower():
            print(line, end='')
# The Gashlycrumb Tinies - by Edward Gorey
# B is for Basil assaulted by bears.
# F is for Fanny, sucked dry by a leech.
# H is for Hector, done in by a thug.
# J is for James who took lye, by mistake.
# R is for Rhoda, consumed by a fire.
# X is for Xerxes, devoured by mice.
# Example: read(), write() and iteration
# -----------------------------------------------------------------------------
# This example opens each file in a list and writes its content to one new
# file named with today's date and time. It assumes that file1.txt, file2.txt
# and file3.txt already exist in the current directory.
import datetime
filename = datetime.datetime.now().strftime('%m-%d-%Y-%H-%M-%S')
files = ['file1.txt', 'file2.txt', 'file3.txt']
with open(filename + '.txt', 'w') as fob1:
    for f in files:
        with open(f) as fob2:
            fob1.write(fob2.read())
# readlines() *plural
# -----------------------------------------------------------------------------
# readlines() - the previous examples read and build up a single string.
# This call reads the whole file and returns a list of one-line strings:
with open('testfile1.txt', 'r') as fob:
    lines = fob.readlines()
print('Lines read: ', len(lines))  # Lines read:  29
for line in lines:
    print(line, end='')
print(type(lines))  # <class 'list'>
# with readlines() you can go from the last line to the first:
with open("testfile1.txt", 'r') as fob:
    lines = fob.readlines()
for line in lines[::-1]:
    print(line, end='')
# NOTE: If you tried using fob.read() or fob.readline() in the above, all the
# letters would be printed in reverse, not just the lines.
# eval()
# -----------------------------------------------------------------------------
# Problems can arise when trying to read data structures from files. Example:
unkle = ('The Road, Pt. 1', 'UNKLE', '2017', (
    (1, 'Inter 1'),
    (2, 'Farewell'),
    (3, 'Looking for the Rain'),
    (4, 'Cowboys or Indians')))
with open('music.txt', 'w') as music_file:
    print(unkle, file=music_file)
# The problem here is that there's no easy way to read the data in the file
# back in as a tuple because it's now just a string with brackets. That's when
# eval() can help. Be aware that eval() runs whatever expression the string
# contains, so only use it on files you trust:
with open('music.txt', 'r') as music_file:
    music_contents = music_file.readline()
unkle = eval(music_contents)
print(type(unkle))  # <class 'tuple'>
album, artist, year, tracks = unkle # tuple unpacking
print(album) # The Road, Pt. 1
print(tracks[3]) # (4, 'Cowboys or Indians')
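# Because eval() evaluates any expression it is given, a commonly used
# alternative for plain literals is ast.literal_eval() from the standard
# library, which only accepts literals such as strings, numbers, tuples,
# lists and dicts. A minimal sketch, using a shortened, hypothetical copy of
# the album string rather than the file written above:

```python
import ast

# Hypothetical sample text, shaped like the line print() wrote to music.txt.
album_text = "('The Road, Pt. 1', 'UNKLE', '2017', ((1, 'Inter 1'), (2, 'Farewell')))"

album_data = ast.literal_eval(album_text)
print(type(album_data))  # <class 'tuple'>
print(album_data[3][1])  # (2, 'Farewell')

# Anything that is not a plain literal (here, a function call) is rejected.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```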
# Binary Files
# -----------------------------------------------------------------------------
# Write a Binary file:
bdata = bytes(range(0, 256))
with open('testbinary', 'wb') as fob:
    fob.write(bdata)
# As with text, you can write binary in chunks:
size = len(bdata)
offset = 0
chunk = 100
with open('testbinary', 'wb') as fob:
    while offset < size:
        fob.write(bdata[offset:offset + chunk])
        offset += chunk
# read() a Binary file:
with open('testbinary', 'rb') as fob:
    bdata = fob.read()
# seek(), tell()
# -----------------------------------------------------------------------------
# Reminder: As you read and write, Python keeps track of where you are in
# the file. The tell() function returns your current offset position in
# bytes. The seek() function lets you jump to another offset in the file.
# This means you don't have to read every byte in a file to read the last one.
# Note: seek() also returns the current offset.
with open('testbinary', 'rb') as fob:
    fob.tell()          # returns 0
    fob.seek(255)       # moves to one byte before the end of the file
    bdata = fob.read()  # reads the last byte
    len(bdata)          # returns 1
# You can call seek() with a second argument: seek(offset, origin)
# If origin is 0 (the default), move offset bytes from the start
# If origin is 1, move offset bytes from the current position
# If origin is 2, move offset bytes relative to the end
# So to get to the last byte we could also do:
with open('testbinary', 'rb') as fob:
    fob.seek(-1, 2)
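# The origin values 0, 1 and 2 also have named constants in the os module
# (os.SEEK_SET, os.SEEK_CUR, os.SEEK_END), which can make the intent easier
# to read. This sketch assumes an in-memory io.BytesIO object holding the
# same 256 bytes as a stand-in for the testbinary file:

```python
import io
import os

stream = io.BytesIO(bytes(range(256)))  # in-memory stand-in for 'testbinary'

stream.seek(10, os.SEEK_SET)   # origin 0: 10 bytes from the start
pos_start = stream.tell()

stream.seek(5, os.SEEK_CUR)    # origin 1: 5 bytes on from the current position
pos_current = stream.tell()

stream.seek(-1, os.SEEK_END)   # origin 2: 1 byte back from the end
pos_end = stream.tell()
last_byte = stream.read()

print(pos_start, pos_current, pos_end, last_byte)  # 10 15 255 b'\xff'
```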
# These functions are most useful for binary files. Though you can use them
# with text files, you would have a hard time calculating offsets as the
# most popular encoding (UTF-8) uses varying numbers of bytes per character.
# That being said, a simple fob.seek(0) can be useful for moving your pointer
# back to the beginning of the file.
# truncate()
# -----------------------------------------------------------------------------
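# fileobject.truncate(size) - Resizes the file to size bytes. If size is
# omitted, the current position is used, so calling truncate() right after
# opening a file in 'r+' mode (position 0) empties it.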
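# A short sketch of truncate() with and without a size argument. The
# temporary directory and the file name truncate_demo.bin are assumptions
# made for illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'truncate_demo.bin')  # hypothetical file name
    with open(path, 'wb') as fob:
        fob.write(b'0123456789')

    with open(path, 'r+b') as fob:
        fob.seek(4)
        fob.truncate()    # no size given: cut the file at the current position
        fob.seek(0)
        kept = fob.read()

    with open(path, 'r+b') as fob:
        fob.truncate(0)   # an explicit size of 0 empties the file
    size_after_empty = os.path.getsize(path)

print(kept)              # b'0123'
print(size_after_empty)  # 0
```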
# Read, Write, Append chart
# -----------------------------------------------------------------------------
#                  | r  | r+ | w  | w+ | a  | a+ |
# -----------------+----+----+----+----+----+----+
# read             | X  | X  |    | X  |    | X  |
# write            |    | X  | X  | X  | X  | X  |
# create           |    |    | X  | X  | X  | X  |
# truncate         |    |    | X  | X  |    |    |
# position: start  | X  | X  | X  | X  |    |    |
# position: end    |    |    |    |    | X  | X  |
# -----------------+----+----+----+----+----+----+
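# Two rows of the chart that often surprise people are the starting positions
# of 'a+' and 'r+'. A minimal sketch, assuming a temporary directory and a
# hypothetical file name (plus_modes_demo.txt):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'plus_modes_demo.txt')  # hypothetical name
    with open(path, 'w') as fob:
        fob.write('abc')

    with open(path, 'a+') as fob:
        read_at_open = fob.read()     # position starts at the end: nothing left
        fob.seek(0)
        read_after_seek = fob.read()  # after rewinding, the content is readable

    with open(path, 'r+') as fob:
        fob.write('X')                # position starts at 0: overwrites 'a'

    with open(path, 'r') as fob:
        after_r_plus = fob.read()

print(repr(read_at_open))     # ''
print(repr(read_after_seek))  # 'abc'
print(repr(after_r_plus))     # 'Xbc'
```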