File: _episodes/02-file_parsing.md
Lines changed: 71 additions & 40 deletions
@@ -50,7 +50,7 @@ or similar to this if you are on Windows
50
50
~~~
51
51
{: .output}
52
52
53
-
Notice that the file paths are different for these two systems. The Windows system uses a forward slash ('\\'), while Mac and Linux use a backslash ('/') for filepaths.
53
+
Notice that the file paths are different for these two systems. The Windows system uses a backslash ('\\'), while Mac and Linux use a forward slash ('/') for filepaths.
54
54
55
55
When we write a script, we want it to be usable on any operating system. We will therefore use a Python module called `os.path` that allows us to define file paths in a general way.
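
As a minimal sketch of the idea above, the snippet below builds a path from its parts and compares it with the same path assembled by hand using `os.sep`, the separator for whichever system runs the code. The directory and file names are the ones used in this lesson; the variable `expected` is only there for illustration.

```python
import os

# Build a path from its parts; os.path.join inserts the separator
# that is appropriate for the operating system running the script.
ethanol_file = os.path.join('data', 'outfiles', 'ethanol.out')
print(ethanol_file)

# os.sep holds that separator: '/' on Mac and Linux, '\\' on Windows.
expected = 'data' + os.sep + 'outfiles' + os.sep + 'ethanol.out'
print(ethanol_file == expected)
```

Because the separator is never typed by hand, the same script should run unchanged on either kind of system.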
56
56
@@ -69,7 +69,7 @@ data/outfiles/ethanol.out
69
69
~~~
70
70
{: .output}
71
71
72
-
Here, we have specified that our filepath contains the 'data' and 'outfiles' directory, and the `os.path` module has made this into a filepath that is usable by our system. If you are on Windows, you will instead see that a forward slash is used.
72
+
Here, we have specified that our filepath contains the 'data' and 'outfiles' directory, and the `os.path` module has made this into a filepath that is usable by our system. If you are on Windows, you will instead see that a backslash is used.
73
73
74
74
> ## Absolute and relative paths
75
75
> File paths can be *absolute*, or *relative*.
@@ -108,6 +108,18 @@ outfile.close()
108
108
~~~
109
109
{: .language-python}
110
110
111
+
> ## An alternative way to open a file.
112
+
> Alternatively, you can open a file using a *context manager*, which automatically handles closing the file. To use a context manager, you use the keyword `with` and put everything you want done while the file is open in an indented block.
113
+
> ~~~
114
+
> with open(ethanol_file,"r") as outfile:
115
+
>     data = outfile.readlines()
116
+
> ~~~
117
+
> {: .language-python}
118
+
>
119
+
> This is often the preferred way to deal with files because you do not have to remember to close the file.
120
+
{: .callout}
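
To see that a context manager really does close the file, here is a small self-contained sketch. It does not use the lesson's `ethanol.out`; instead it writes a hypothetical two-line file (`example.out`) into a temporary directory so that it can run anywhere, then checks the file object's `closed` attribute after the `with` block ends.

```python
import os
import tempfile

# Hypothetical stand-in for the lesson's output file: a small temporary file.
with tempfile.TemporaryDirectory() as tmp_dir:
    example_file = os.path.join(tmp_dir, 'example.out')
    with open(example_file, 'w') as outfile:
        outfile.write('line one\nline two\n')

    # Read the file back using a context manager.
    with open(example_file, 'r') as outfile:
        data = outfile.readlines()

    # The variable still exists after the block, but the file has been closed.
    file_is_closed = outfile.closed

print(data)
print(file_is_closed)
```

The list `data` remains available after the block, so the rest of a script can keep working with the lines even though the file itself is closed.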
121
+
122
+
111
123
> ## Check Your Understanding
112
124
> Check that your file was read in correctly by determining how many lines are in the file.
113
125
>> ## Answer
@@ -129,7 +141,7 @@ Let's take a look at what's in the file.
129
141
130
142
~~~
131
143
for line in data:
132
-
print(line)
144
+
    print(line)
133
145
~~~
134
146
{: .language-python}
135
147
@@ -196,12 +208,24 @@ print(words)
196
208
197
209
From this `print` statement, we now see that we have a list called words, where we have split `energy_line`. The energy is actually the fourth element of this list, so we can now save it as a new variable.
198
210
199
-
```
211
+
```python
200
212
energy = words[3]
201
213
print(energy)
202
214
```
203
215
{: .language-python}
204
216
217
+
> ## Python negative indexing
218
+
> We also recognize that the energy is the last element of the list. Therefore, an alternative way to assign `energy` is:
219
+
> ```python
220
+
> energy = words[-1]
221
+
> print(energy)
222
+
> ```
223
+
>
224
+
> In the example above, the index value of `-1` gives the last element, `-2` would give the second-to-last element of the list, and so on. An excellent tutorial on accessing Python list elements by index can be found [here](https://realpython.com/python-lists-tuples/#list-elements-can-be-accessed-by-index).
225
+
{: .callout}
226
+
227
+
228
+
205
229
```
206
230
-154.09130176573018
207
231
```
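
The two ways of indexing described above can be compared side by side. This sketch assumes a hypothetical `energy_line` with four whitespace-separated fields, modeled on the value printed in the output above; the exact text of the line in your file may differ.

```python
# Hypothetical line, modeled on the energy line discussed in this lesson.
energy_line = '  @DF-RHF Final Energy:   -154.09130176573018'
words = energy_line.split()

# The list has four elements, so index 3 and index -1 point to the same one.
print(words[3] == words[-1])

# Index -2 is the element just before the last.
print(words[-2])

# Cast to float so the value can be used in calculations.
energy = float(words[-1])
print(energy)
```

Negative indexing keeps working if extra words appear earlier in the line, as long as the energy stays last.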
@@ -237,48 +261,48 @@ energy = float(words[3])
237
261
>## Exercise on File Parsing (should we move this to the end?)
238
262
Use the provided sapt.out file. In this output file, the program calculates the interaction energy for an ethene-ethyne complex. The output reports four interaction energy components: electrostatics, induction, exchange, and dispersion. Parse each of these energies, in kcal/mole, from the output file. (Hint: study the file in a text editor to help you decide what to search for.) Calculate the total interaction energy by adding the four components together. Your code's output should look something like this:
239
263
> ~~~
240
-
> Electrostatics : -2.25850118 kcal/mole
241
-
> Exchange : 2.27730198 kcal/mole
242
-
> Induction : -0.5216933 kcal/mole
243
-
> Dispersion : -0.9446677 kcal/mole
244
-
> Total Energy : 1.4475602000000003 kcal/mole
264
+
> Electrostatics : -2.25850118 kcal/mol
265
+
> Exchange : 2.27730198 kcal/mol
266
+
> Induction : -0.5216933 kcal/mol
267
+
> Dispersion : -0.9446677 kcal/mol
268
+
> Total Energy : 1.4475602000000003 kcal/mol
245
269
> ~~~
246
270
> {: .language-python}
247
271
>
248
272
> > ## Solution
249
273
>>
250
274
>> This is one possible solution for the SAPT parsing exercise
>> print('Total Energy : {} kcal/mol'.format(total_energy))
282
306
>> ~~~
283
307
>> {: .language-python}
284
308
> {: .solution}
@@ -300,6 +324,13 @@ for linenum, line in enumerate(list_name):
300
324
301
325
In this notation, there are now *two* variables you can use in your loop commands. The variable `linenum` (which can be named something else) keeps track of which iteration of the loop you are on, in this case which line of the file you are on. The variable `line` (which could be named something else) functions exactly as it did before, holding the actual information from the list. Finally, instead of just giving the list name, you use `enumerate(list_name)`.
302
326
327
+
> ## `enumerate` with a starting index other than 0
328
+
> By default, `enumerate(list_name)` starts counting at 0, so the first line is labeled '0'. To change this behavior, use the `start` argument of `enumerate`. For example, to start counting at 1 you can do:
329
+
> ```python
330
+
> for linenum, line in enumerate(data, start=1):
331
+
>     # do something with 'linenum' and 'line'
> ```
332
+
{: .callout}
333
+
303
334
This block of code searches our file for the line that contains "Center" and reports the line number.
304
335
```
305
336
for linenum, line in enumerate(data):
@@ -313,7 +344,7 @@ for linenum, line in enumerate(data):
313
344
Center X Y Z Mass
314
345
```
315
346
{: .output}
316
-
Now we know that this is line 77 in our file (remember that you start counting at zero!).
347
+
Now we know that this is line 77 in our file (remember that you start counting at zero!).
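
The difference between counting from zero and counting from one can be checked on a small example. The list below is a hypothetical stand-in for the lines read from the output file, not the real file contents; it searches for "Center" twice, once with the default count and once with `start=1`.

```python
# Hypothetical stand-in for the list of lines read from the output file.
data = [
    'Header\n',
    'Geometry\n',
    '  Center   X   Y   Z   Mass\n',
    'C 0.0 0.0 0.0 12.0\n',
]

# Default behavior: counting starts at zero.
for index, line in enumerate(data):
    if 'Center' in line:
        zero_based = index

# With start=1 the count matches how a text editor numbers lines.
for linenum, line in enumerate(data, start=1):
    if 'Center' in line:
        one_based = linenum

print(zero_based, one_based)
```

The zero-based value is the one to use when indexing back into `data`, while the one-based value matches the line number shown in a text editor.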
317
348
318
349
>## Check Your Understanding
319
350
>What would be printed if you entered the following:
@@ -345,6 +376,6 @@ Now we know that this is line 77 in our file (remember that you start counting a
345
376
{: .challenge}
346
377
347
378
## A final note about regular expressions
348
-
Sometimes you will need to match something more complex than just a particular word or phrase in your output file. Sometimes you will need to match a particular word, but only if it is found at the beginning of a line. Or perhaps you will need to match a particular pattern of data, like a capital letter followed by a number, but you won't know the exact letter and number you are looking for. These types of matching situations are handled with something called *regular expressions* which is accessed through the python module `re`. While using regular expressions is outside the scope of this tutorial, they are very useful and you might want to learn more about them in the future. A tutorial can be found at _______.
379
+
Sometimes you will need to match something more complex than just a particular word or phrase in your output file. Sometimes you will need to match a particular word, but only if it is found at the beginning of a line. Or perhaps you will need to match a particular pattern of data, like a capital letter followed by a number, but you won't know the exact letter and number you are looking for. These types of matching situations are handled with something called *regular expressions*, which are accessed through the Python module `re`. While using regular expressions (regex) is outside the scope of this tutorial, they are very useful and you might want to learn more about them in the future. A tutorial can be found in the book [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/2e/chapter7/). A great site for testing regular expressions is [regex101](https://regex101.com/).
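
As a brief taste of the two situations described above, the sketch below uses the `re` module on a few made-up lines (they are not taken from any output file in this lesson). It first keeps only lines that begin with a particular word, then finds every capital letter followed by one or more digits.

```python
import re

# Made-up example lines, for illustration only.
lines = ['Center of mass', 'The Center is here', 'Atom C1 and atom H12']

# '^' anchors the pattern to the beginning of the line.
starts_with_center = [line for line in lines if re.search(r'^Center', line)]
print(starts_with_center)

# '[A-Z]' matches any capital letter and '[0-9]+' matches one or more digits.
labels = re.findall(r'[A-Z][0-9]+', lines[2])
print(labels)
```

Here `re.search` returns a match object (or `None`) for each line, and `re.findall` returns a list of all matching pieces of text.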