README.md
+16-4
Original file line number
Diff line number
Diff line change
@@ -1,8 +1,19 @@
1
-
# FIXME Lesson title
1
+
# Python Scripting for the Computational Molecular Sciences
2
2
3
-
[](https://swc-slack-invite.herokuapp.com/)
3
+
This is the GitHub repository for the Python Data and Scripting Workshop developed by [The Molecular Sciences Software Institute](https://molssi.org). This website template is based on a template developed by [The Software Carpentries](https://software-carpentry.org). You can find the rendered website for this material [here](https://molssi-education.github.io/python_scripting_cms/).
4
4
5
-
FIXME
5
+
The MolSSI Python Data and Scripting workshop is designed for students who are currently involved in, or planning to start, computational chemistry research. It helps students develop practical programming skills that will benefit their undergraduate research, taking them from introductory programming and scripting with Python through version control and sharing their code with others. NO prior programming experience is required.
6
+
7
+
### Workshop Topics
8
+
- Basic Python syntax and control structures
9
+
- Reading and writing files
10
+
- File manipulation and parsing
11
+
- Analyzing and graphing data
12
+
- Writing functions
13
+
- Creating command line programs from Python scripts
_episodes/01-introduction.md
+18-14
@@ -48,7 +48,7 @@ Now that our notebook is set-up, we're ready to start learning some Python!
48
48
Any python interpreter can work just like a calculator. This is not very useful. Type the following into the next cell of your Jupyter notebook.
49
49
50
50
```
51
-
3+7
51
+
3 + 7
52
52
```
53
53
{: .language-python}
54
54
@@ -68,7 +68,7 @@ Let's see this in action with a calculation. Type the following into the next ce
68
68
deltaH = -541.5 #kJ/mole
69
69
deltaS = 10.4 #kJ/(mole K)
70
70
temp = 298 #Kelvin
71
-
deltaG = deltaH - temp*deltaS
71
+
deltaG = deltaH - temp * deltaS
72
72
```
73
73
{: .language-python}
74
74
@@ -93,7 +93,7 @@ In the previous code block, we introduced the `print()` function. Often, we wil
93
93
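Note that performing a calculation with a variable does not change the value stored in that variable unless you assign the result to a name. Numbers in Python are *immutable*: an expression such as `deltaG * 1000` produces a new value and leaves `deltaG` unchanged. For example if we typed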
94
94
```
95
95
print(deltaG)
96
-
deltaG*1000
96
+
deltaG * 1000
97
97
print(deltaG)
98
98
```
99
99
{: .language-python}
@@ -106,7 +106,7 @@ print(deltaG)
106
106
Nothing happened to the value of `deltaG`. If we wanted to change the value of `deltaG` we would have to re-save the variable using the same name to overwrite the existing value.
107
107
```
108
108
print(deltaG)
109
-
deltaG = deltaG*1000
109
+
deltaG = deltaG * 1000
110
110
print(deltaG)
111
111
```
112
112
{: .language-python}
@@ -119,7 +119,7 @@ print(deltaG)
119
119
There are situations where it is reasonable to overwrite a variable with a new value, but you should always think carefully about this. Usually it is a better practice to give the variable a new name and leave the existing variable as is.
120
120
```
121
121
print(deltaG)
122
-
deltaG_joules = deltaG*1000
122
+
deltaG_joules = deltaG * 1000
123
123
print(deltaG)
124
124
print(deltaG_joules)
125
125
```
@@ -137,7 +137,7 @@ Python can do what is called multiple assignment where you assign several variab
137
137
```
138
138
#I can assign all these variables at once
139
139
deltaH, deltaS, temp = -541.5, 10.4, 298
140
-
deltaG = deltaH - temp*deltaS
140
+
deltaG = deltaH - temp * deltaS
141
141
print(deltaG)
142
142
```
143
143
{: .language-python}
@@ -201,7 +201,7 @@ print(energy_kcal[0])
201
201
You can use an element of a list as a variable in a calculation.
202
202
```
203
203
# Convert the second list element to kilojoules.
204
-
energy_kilojoules = energy_kcal[1]*4.184
204
+
energy_kilojoules = energy_kcal[1] * 4.184
205
205
print(energy_kilojoules)
206
206
```
207
207
{: .language-python}
@@ -263,7 +263,7 @@ energy_kcal[0:2]
263
263
print(energy_kcal)
264
264
```
265
265
{: .language-python}
266
-
nothing happens to `energy_kcal.
266
+
nothing happens to `energy_kcal`.
267
267
```
268
268
[-13.4, -2.7, 5.4, 42.1]
269
269
[-13.4, -2.7, 5.4, 42.1]
@@ -279,10 +279,14 @@ for variable in list:
279
279
```
280
280
{: .language-python}
281
281
282
-
Indentation is very important in python. There is nothing like an `end` or `exit` statement that tells you that you are finished with the loop. The indentation shows you what statements are in the loop. Let's use a loop to change all of our energies in kcal to kJ.
282
+
There are two very important pieces of syntax for the `for` loop. Notice the colon `:` after the word `list`. You will always have a colon at the end of a `for` statement. If you forget the colon, you will get an error when you try to run your code.
283
+
284
+
The second thing to notice is that the lines of code under the `for` statement (the things you want to do several times) are indented. Indentation is very important in Python. There is nothing like an `end` or `exit` statement that tells you that you are finished with the loop; the indentation shows which statements are in the loop. By convention, each level of indentation is 4 spaces. If you are using an editor that understands Python, it will insert the correct indentation when you press the tab key. In fact, the Jupyter notebook will notice that you ended the previous line with a colon (`:`) and will indent for you, so you will not need to press tab.
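The two syntax rules described above can be seen in a minimal sketch. The list `temperatures` and the counter `inside_count` are made-up names used only for illustration; they are not part of the lesson.

```python
# Hypothetical values, chosen only for illustration.
temperatures = [298, 310, 350]

inside_count = 0
for temp in temperatures:  # the colon ends the `for` statement
    # These lines are indented four spaces, so they run once per list element.
    inside_count = inside_count + 1
    print("inside the loop:", temp)

# This line is not indented, so it runs once, after the loop has finished.
print("the loop body ran", inside_count, "times")
```

Removing the indentation from a line moves it out of the loop, which changes how often it runs.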
285
+
286
+
Let's use a loop to change all of our energies in kcal to kJ.
283
287
```
284
288
for number in energy_kcal:
285
-
    kJ = number*4.184
289
+
    kJ = number * 4.184
286
290
    print(kJ)
287
291
```
288
292
{: .language-python}
@@ -304,7 +308,7 @@ list_name.append(new_thing)
304
308
Try running this block of code. See if you can figure out why it doesn't work.
305
309
```
306
310
for number in energy_kcal:
307
-
    kJ = number*4.184
311
+
    kJ = number * 4.184
308
312
    energy_kJ.append(kJ)
309
313
310
314
print(energy_kJ)
@@ -329,7 +333,7 @@ This code doesn't work because on the first iteration of our loop, the list `ene
329
333
```
330
334
energy_kJ = []
331
335
for number in energy_kcal:
332
-
    kJ = number*4.184
336
+
    kJ = number * 4.184
333
337
    energy_kJ.append(kJ)
334
338
335
339
print(energy_kJ)
@@ -347,7 +351,7 @@ Within your code, you may need to evaluate a variable and then do something if t
347
351
negative_energy_kJ = []
348
352
349
353
for number in energy_kJ:
350
-
    if number<0:
354
+
    if number < 0:
351
355
        negative_energy_kJ.append(number)
352
356
353
357
print(negative_energy_kJ)
@@ -369,7 +373,7 @@ You can also use `and`, `or`, and `not` to check more than one condition.
_episodes/02-file_parsing.md
+71-40
@@ -50,7 +50,7 @@ or similar to this if you are on Windows
50
50
~~~
51
51
{: .output}
52
52
53
-
Notice that the file paths are different for these two systems. The Windows system uses a forward slash ('\\'), while Mac and Linux use a backslash ('/') for filepaths.
53
+
Notice that the file paths are different for these two systems. The Windows system uses a backslash ('\\'), while Mac and Linux use a forward slash ('/') for filepaths.
54
54
55
55
When we write a script, we want it to be usable on any operating system, thus we will use a python module called `os.path` that will allow us to define file paths in a general way.
56
56
@@ -69,7 +69,7 @@ data/outfiles/ethanol.out
69
69
~~~
70
70
{:. .output}
71
71
72
-
Here, we have specified that our filepath contains the 'data' and 'outfiles' directory, and the `os.path` module has made this into a filepath that is usable by our system. If you are on Windows, you will instead see that a forward slash is used.
72
+
Here, we have specified that our filepath contains the 'data' and 'outfiles' directory, and the `os.path` module has made this into a filepath that is usable by our system. If you are on Windows, you will instead see that a backslash is used.
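As a minimal sketch of the behavior described above, the snippet below builds the same path with `os.path.join` and compares it with a path assembled by hand from `os.sep`, the separator for the current operating system. The variable name `expected` is introduced here only for illustration.

```python
import os

# Build the path from its parts; os.path.join inserts the separator
# that the current operating system uses.
ethanol_file = os.path.join('data', 'outfiles', 'ethanol.out')
print(ethanol_file)

# os.sep is '/' on Mac and Linux and '\\' on Windows.
expected = 'data' + os.sep + 'outfiles' + os.sep + 'ethanol.out'
print(ethanol_file == expected)
```

Because the separator is never typed by hand, the same script runs unchanged on Windows, Mac, and Linux.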
73
73
74
74
> ## Absolute and relative paths
75
75
> File paths can be *absolute*, or *relative*.
@@ -108,6 +108,18 @@ outfile.close()
108
108
~~~
109
109
{: .language-python}
110
110
111
+
> ## An alternative way to open a file.
112
+
> Alternatively, you can open a file using a *context manager*. In this case, the context manager automatically closes the file for you. To use a context manager, start with the word `with` and put everything you want to do while the file is open in an indented block.
113
+
> ~~~
114
+
> with open(ethanol_file, "r") as outfile:
115
+
>     data = outfile.readlines()
116
+
> ~~~
117
+
> {: .language-python}
118
+
>
119
+
> This is often the preferred way to deal with files because you do not have to remember to close the file.
120
+
{: .callout}
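To see that the context manager really does close the file, here is a small self-contained sketch. It writes a throwaway file first so that it can run anywhere; the file name `example.out` and its contents are made up for illustration and are not part of the lesson data.

```python
import os
import tempfile

# Create a small throwaway file (name and contents are hypothetical).
tmp_dir = tempfile.mkdtemp()
example_file = os.path.join(tmp_dir, "example.out")
with open(example_file, "w") as outfile:
    outfile.write("line one\nline two\n")

# Read the file back using a context manager.
with open(example_file, "r") as outfile:
    data = outfile.readlines()
    print(outfile.closed)  # False: we are still inside the block

print(outfile.closed)  # True: the file was closed when the block ended
print(len(data))       # 2
```

The `closed` attribute of a file object reports whether the file has been closed, so no explicit call to `outfile.close()` is needed here.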
121
+
122
+
111
123
> ## Check Your Understanding
112
124
> Check that your file was read in correctly by determining how many lines are in the file.
113
125
>> ## Answer
@@ -129,7 +141,7 @@ Let's take a look at what's in the file.
129
141
130
142
~~~
131
143
for line in data:
132
-
print(line)
144
+
    print(line)
133
145
~~~
134
146
{: .language-python}
135
147
@@ -196,12 +208,24 @@ print(words)
196
208
197
209
From this `print` statement, we now see that we have a list called words, where we have split `energy_line`. The energy is actually the fourth element of this list, so we can now save it as a new variable.
198
210
199
-
```
211
+
```python
200
212
energy = words[3]
201
213
print(energy)
202
214
```
203
215
{: .language-python}
204
216
217
+
> ## Python negative indexing
218
+
> We also recognize that the energy is the last element of the list. Therefore, an alternative way to assign `energy` is:
219
+
> ```python
220
+
> energy = words[-1]
221
+
> print(energy)
222
+
> ```
223
+
>
224
+
> In the example above, the index `-1` gives the last element of a list, `-2` gives the second-to-last element, and so on. An excellent tutorial on accessing Python list elements by index can be found [here](https://realpython.com/python-lists-tuples/#list-elements-can-be-accessed-by-index).
225
+
{: .callout}
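The sketch below compares positive and negative indexing on a short list. The string assigned to `energy_line` is a made-up stand-in for the line parsed from the output file, arranged so that the energy is both the fourth and the last element.

```python
# A hypothetical line; the real line in the output file may look different.
energy_line = "Total Energy = -154.09130176573018"
words = energy_line.split()
print(words)

# Index 3 counts from the front, index -1 counts from the back.
print(words[3] == words[-1])  # True: both refer to the last element here
print(words[-2])              # the second-to-last element, '='
```

Negative indexing is convenient when the value you want is always at the end of the line but the number of words before it can vary.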
226
+
227
+
228
+
205
229
```
206
230
-154.09130176573018
207
231
```
@@ -237,48 +261,48 @@ energy = float(words[3])
237
261
>## Exercise on File Parsing (should we move this to the end?)
238
262
Use the provided sapt.out file. In this output file, the program calculates the interaction energy for an ethene-ethyne complex. The output reports four interaction energy components: electrostatics, induction, exchange, and dispersion. Parse each of these energies, in kcal/mole, from the output file. (Hint: study the file in a text editor to help you decide what to search for.) Calculate the total interaction energy by adding the four components together. Your code's output should look something like this:
239
263
> ~~~
240
-
> Electrostatics : -2.25850118 kcal/mole
241
-
> Exchange : 2.27730198 kcal/mole
242
-
> Induction : -0.5216933 kcal/mole
243
-
> Dispersion : -0.9446677 kcal/mole
244
-
> Total Energy : 1.4475602000000003 kcal/mole
264
+
> Electrostatics : -2.25850118 kcal/mol
265
+
> Exchange : 2.27730198 kcal/mol
266
+
> Induction : -0.5216933 kcal/mol
267
+
> Dispersion : -0.9446677 kcal/mol
268
+
> Total Energy : 1.4475602000000003 kcal/mol
245
269
> ~~~
246
270
> {: language.python}
247
271
>
248
272
> > ## Solution
249
273
>>
250
274
>> This is one possible solution for the SAPT parsing exercise
>> print('Total Energy : {} kcal/mol'.format(total_energy))
282
306
>> ~~~
283
307
>> {: .language-python}
284
308
> {: .solution}
@@ -300,6 +324,13 @@ for linenum, line in enumerate(list_name):
300
324
301
325
In this notation, there are now *two* variables you can use in your loop commands, `linenum` (which can be named something else) will keep up with what iteration you are on in the loop, in this case what line you are on in the file. The variable `line` (which could be named something else) functions exactly as it did before, holding the actual information from the list. Finally, instead of just giving the list name you use `enumerate(list_name)`.
302
326
327
+
> ## `Enumerate` with index other than 0:
328
+
> `enumerate(list_name)` starts counting at 0, so the first line is labeled '0'. To change this behavior, pass the `start` argument to `enumerate`. For example, to start counting at 1:
329
+
> ```python
330
+
> for linenum, line in enumerate(data, start=1):
331
+
>     # do something with 'linenum' and 'line'
> ```
332
+
{: .callout}
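A runnable version of the `start` argument, using a short made-up list in place of the lines read from a file (the contents of `data` and the list `numbered` are illustrative only):

```python
# Hypothetical stand-in for the lines read from a file.
data = ["first line\n", "second line\n", "third line\n"]

numbered = []
for linenum, line in enumerate(data, start=1):
    # strip() removes the trailing newline before printing.
    numbered.append((linenum, line.strip()))
    print(linenum, line.strip())
```

With `start=1` the numbers match what a text editor shows, which makes it easier to find the reported line by eye.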
333
+
303
334
This block of code searches our file for the line that contains "Center" and reports the line number.
304
335
```
305
336
for linenum, line in enumerate(data):
@@ -313,7 +344,7 @@ for linenum, line in enumerate(data):
313
344
Center X Y Z Mass
314
345
```
315
346
{: .output}
316
-
Now we know that this is line 77 in our file (remember that you start counting at zero!).
347
+
Now we know that this is line 77 in our file (remember that you start counting at zero!).
317
348
318
349
>## Check Your Understanding
319
350
>What would be printed if you entered the following:
@@ -345,6 +376,6 @@ Now we know that this is line 77 in our file (remember that you start counting a
345
376
{: .challenge}
346
377
347
378
## A final note about regular expressions
348
-
Sometimes you will need to match something more complex than just a particular word or phrase in your output file. Sometimes you will need to match a particular word, but only if it is found at the beginning of a line. Or perhaps you will need to match a particular pattern of data, like a capital letter followed by a number, but you won't know the exact letter and number you are looking for. These types of matching situations are handled with something called *regular expressions* which is accessed through the python module `re`. While using regular expressions is outside the scope of this tutorial, they are very useful and you might want to learn more about them in the future. A tutorial can be found at _______.
379
+
Sometimes you will need to match something more complex than just a particular word or phrase in your output file. Sometimes you will need to match a particular word, but only if it is found at the beginning of a line. Or perhaps you will need to match a particular pattern of data, like a capital letter followed by a number, but you won't know the exact letter and number you are looking for. These types of matching situations are handled with something called *regular expressions*, which are accessed through the Python module `re`. While using regular expressions (regex) is outside the scope of this tutorial, they are very useful and you might want to learn more about them in the future. A tutorial can be found in the book [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/2e/chapter7/). A good site for testing regular expressions is [regex101](https://regex101.com/).
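For a taste of what the two situations described above look like in code, here is a minimal sketch using the standard `re` module. The example strings in `lines` are made up for illustration and do not come from the lesson's output files.

```python
import re

# Hypothetical lines, invented for this example.
lines = ["Center X Y Z", "  the Center of mass", "atom C1 bonded to H4"]

# '^' anchors the pattern to the beginning of the string, so only
# lines that start with "Center" are kept.
starts_with_center = [line for line in lines if re.search(r"^Center", line)]
print(starts_with_center)

# '[A-Z]' matches one capital letter and '[0-9]' matches one digit.
labels = re.findall(r"[A-Z][0-9]", lines[2])
print(labels)
```

`re.search` returns a match object (or `None`), while `re.findall` returns a list of every non-overlapping match in the string.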