Commit 491d3f4

Merge branch 'gh-pages' into documentation
2 parents 6215da3 + 2a2a4d4 commit 491d3f4

8 files changed

+349
-85
lines changed

Diff for: README.md

+16-4
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,19 @@
1-
# FIXME Lesson title
1+
# Python Scripting for the Computational Molecular Sciences
22

3-
[![Create a Slack Account with us](https://img.shields.io/badge/Create_Slack_Account-The_Carpentries-071159.svg)](https://swc-slack-invite.herokuapp.com/)
3+
This is the GitHub repository for the Python Data and Scripting Workshop developed by [The Molecular Sciences Software Institute](https://molssi.org). The website is based on a template developed by [Software Carpentry](https://software-carpentry.org). You can find the rendered website for this material [here](https://molssi-education.github.io/python_scripting_cms/).
44

5-
FIXME
5+
The MolSSI Python Data and Scripting workshop is designed for students who are currently involved in, or planning to start, computational chemistry research. It helps students develop practical programming skills that will benefit their undergraduate research, taking them from introductory programming and scripting with Python through version control and sharing their code with others. No prior programming experience is required.
6+
7+
### Workshop Topics
8+
- Basic Python syntax and control structures
9+
- Reading and writing files
10+
- File manipulation and parsing
11+
- Analyzing and graphing data
12+
- Writing functions
13+
- Creating command line programs from Python scripts
14+
- Basic testing using PyTest
15+
- Version control with git
16+
- Sharing code on GitHub
617

718
## Contributing
819

@@ -15,7 +26,8 @@ how to write new episodes.
1526

1627
## Maintainer(s)
1728

18-
* FIXME
29+
* Jessica A. Nash ([email protected], GitHub: janash)
30+
* Ashley Ringer McDonald
1931

2032
## Authors
2133

Diff for: _episodes/01-introduction.md

+18-14
Original file line numberDiff line numberDiff line change
@@ -48,7 +48,7 @@ Now that our notebook is set-up, we're ready to start learning some Python!
4848
Any python interpreter can work just like a calculator. This is not very useful. Type the following into the next cell of your Jupyter notebook.
4949

5050
```
51-
3+7
51+
3 + 7
5252
```
5353
{: .language-python}
5454

@@ -68,7 +68,7 @@ Let's see this in action with a calculation. Type the following into the next ce
6868
deltaH = -541.5 #kJ/mole
6969
deltaS = 10.4 #kJ/(mole K)
7070
temp = 298 #Kelvin
71-
deltaG = deltaH - temp*deltaS
71+
deltaG = deltaH - temp * deltaS
7272
```
7373
{: .language-python}
7474

@@ -93,7 +93,7 @@ In the previous code block, we introduced the `print()` function. Often, we wil
9393
Note that if you do not specify a new name for a variable, then it doesn't automatically change the value of the variable; this is called being *immutable*. For example if we typed
9494
```
9595
print(deltaG)
96-
deltaG*1000
96+
deltaG * 1000
9797
print(deltaG)
9898
```
9999
{: .language-python}
@@ -106,7 +106,7 @@ print(deltaG)
106106
Nothing happened to the value of `deltaG`. If we wanted to change the value of `deltaG` we would have to re-save the variable using the same name to overwrite the existing value.
107107
```
108108
print(deltaG)
109-
deltaG = deltaG*1000
109+
deltaG = deltaG * 1000
110110
print(deltaG)
111111
```
112112
{: .language-python}
@@ -119,7 +119,7 @@ print(deltaG)
119119
There are situations where it is reasonable to overwrite a variable with a new value, but you should always think carefully about this. Usually it is a better practice to give the variable a new name and leave the existing variable as is.
120120
```
121121
print(deltaG)
122-
deltaG_joules = deltaG*1000
122+
deltaG_joules = deltaG * 1000
123123
print(deltaG)
124124
print(deltaG_joules)
125125
```
@@ -137,7 +137,7 @@ Python can do what is called multiple assignment where you assign several variab
137137
```
138138
#I can assign all these variables at once
139139
deltaH, deltaS, temp = -541.5, 10.4, 298
140-
deltaG = deltaH - temp*deltaS
140+
deltaG = deltaH - temp * deltaS
141141
print(deltaG)
142142
```
143143
{: .language-python}
@@ -201,7 +201,7 @@ print(energy_kcal[0])
201201
You can use an element of a list as a variable in a calculation.
202202
```
203203
# Convert the second list element to kilojoules.
204-
energy_kilojoules = energy_kcal[1]*4.184
204+
energy_kilojoules = energy_kcal[1] * 4.184
205205
print(energy_kilojoules)
206206
```
207207
{: .language-python}
@@ -263,7 +263,7 @@ energy_kcal[0:2]
263263
print(energy_kcal)
264264
```
265265
{: .language-python}
266-
nothing happens to `energy_kcal.
266+
nothing happens to `energy_kcal`.
267267
```
268268
[-13.4, -2.7, 5.4, 42.1]
269269
[-13.4, -2.7, 5.4, 42.1]
@@ -279,10 +279,14 @@ for variable in list:
279279
```
280280
{: .language-python}
281281
282-
Indentation is very important in python. There is nothing like an `end` or `exit` statement that tells you that you are finished with the loop. The indentation shows you what statements are in the loop. Let's use a loop to change all of our energies in kcal to kJ.
282+
There are two very important pieces of syntax for the `for` loop. Notice the colon `:` after the word list. You will always have a colon at the end of a `for` statement. If you forget the colon, you will get an error when you try to run your code.
283+
284+
The second thing to notice is that the lines of code under the `for` loop (the things you want to do several times) are indented. Indentation is very important in Python. There is nothing like an `end` or `exit` statement that tells you that you are finished with the loop. The indentation shows you which statements are in the loop. Each level of indentation is 4 spaces by convention in Python 3. However, if you are using an editor that understands Python, it will insert the correct indentation for you when you press the tab key on your keyboard. In fact, the Jupyter notebook will notice that you used a colon (`:`) in the previous line, and will indent for you (so you will not need to press tab).
285+
286+
Let's use a loop to change all of our energies in kcal to kJ.
283287
```
284288
for number in energy_kcal:
285-
    kJ = number*4.184
289+
    kJ = number * 4.184
286290
    print(kJ)
287291
```
288292
{: .language-python}
@@ -304,7 +308,7 @@ list_name.append(new_thing)
304308
Try running this block of code. See if you can figure out why it doesn't work.
305309
```
306310
for number in energy_kcal:
307-
    kJ = number*4.184
311+
    kJ = number * 4.184
308312
    energy_kJ.append(kJ)
309313
310314
print(energy_kJ)
@@ -329,7 +333,7 @@ This code doesn't work because on the first iteration of our loop, the list `ene
329333
```
330334
energy_kJ = []
331335
for number in energy_kcal:
332-
    kJ = number*4.184
336+
    kJ = number * 4.184
333337
    energy_kJ.append(kJ)
334338
335339
print(energy_kJ)
@@ -347,7 +351,7 @@ Within your code, you may need to evaluate a variable and then do something if t
347351
negative_energy_kJ = []
348352
349353
for number in energy_kJ:
350-
    if number<0:
354+
    if number < 0:
351355
        negative_energy_kJ.append(number)
352356
353357
print(negative_energy_kJ)
@@ -369,7 +373,7 @@ You can also use `and`, `or`, and `not` to check more than one condition.
369373
```
370374
negative_numbers = []
371375
for number in energy_kJ:
372-
    if number<0 or number==0:
376+
    if number < 0 or number == 0:
373377
        negative_numbers.append(number)
374378
375379
print(negative_numbers)

Diff for: _episodes/02-file_parsing.md

+71-40
Original file line numberDiff line numberDiff line change
@@ -50,7 +50,7 @@ or similar to this if you are on Windows
5050
~~~
5151
{: .output}
5252

53-
Notice that the file paths are different for these two systems. The Windows system uses a forward slash ('\\'), while Mac and Linux use a backslash ('/') for filepaths.
53+
Notice that the file paths are different for these two systems. The Windows system uses a backslash ('\\'), while Mac and Linux use a forward slash ('/') for filepaths.
5454

5555
When we write a script, we want it to be usable on any operating system, thus we will use a python module called `os.path` that will allow us to define file paths in a general way.
5656

@@ -69,7 +69,7 @@ data/outfiles/ethanol.out
6969
~~~
7070
{: .output}
7171

72-
Here, we have specified that our filepath contains the 'data' and 'outfiles' directory, and the `os.path` module has made this into a filepath that is usable by our system. If you are on Windows, you will instead see that a forward slash is used.
72+
Here, we have specified that our filepath contains the 'data' and 'outfiles' directory, and the `os.path` module has made this into a filepath that is usable by our system. If you are on Windows, you will instead see that a backslash is used.
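
A minimal sketch of how `os.path.join` could build this path, assuming the `data/outfiles/ethanol.out` layout shown in the output above. The variable name `ethanol_file` follows the lesson; printing `os.sep` is added here only to show which separator your system uses.

```python
import os

# Join the directory names and the file name using this system's separator.
ethanol_file = os.path.join('data', 'outfiles', 'ethanol.out')
print(ethanol_file)

# os.sep is '/' on Mac and Linux and '\\' on Windows.
print(os.sep)
```

Because the separator is chosen by `os.path`, the same script should produce a usable path on either kind of system.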
7373

7474
> ## Absolute and relative paths
7575
> File paths can be *absolute*, or *relative*.
@@ -108,6 +108,18 @@ outfile.close()
108108
~~~
109109
{: .language-python}
110110

111+
> ## An alternative way to open a file.
112+
> Alternatively, you can open a file using a *context manager*. In this case, the context manager automatically closes the file for you. To use one, write the keyword `with` and put everything you want done while the file is open in an indented block.
113+
> ~~~
114+
> with open(ethanol_file, "r") as outfile:
115+
>     data = outfile.readlines()
116+
> ~~~
117+
> {: .language-python}
118+
>
119+
> This is often the preferred way to deal with files because you do not have to remember to close the file.
120+
{: .callout}
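
The sketch below illustrates the claim that the context manager closes the file for you. It is only an illustration: the temporary file, its name `example.out`, and its contents are hypothetical stand-ins so that the code runs without the lesson's data files.

```python
import os
import tempfile

# Create a small throwaway file so the example runs anywhere (hypothetical content).
tmp_dir = tempfile.mkdtemp()
example_file = os.path.join(tmp_dir, 'example.out')
with open(example_file, 'w') as outfile:
    outfile.write('first line\nsecond line\n')

# Read the file back; the file is closed when the indented block ends.
with open(example_file, 'r') as outfile:
    data = outfile.readlines()

print(len(data))       # number of lines read
print(outfile.closed)  # reports whether the file has been closed
```

No explicit `outfile.close()` call is needed in either `with` block.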
121+
122+
111123
> ## Check Your Understanding
112124
> Check that your file was read in correctly by determining how many lines are in the file.
113125
>> ## Answer
@@ -129,7 +141,7 @@ Let's take a look at what's in the file.
129141
130142
~~~
131143
for line in data:
132-
print(line)
144+
    print(line)
133145
~~~
134146
{: .language-python}
135147
@@ -196,12 +208,24 @@ print(words)
196208
197209
From this `print` statement, we now see that we have a list called words, where we have split `energy_line`. The energy is actually the fourth element of this list, so we can now save it as a new variable.
198210
199-
```
211+
```python
200212
energy = words[3]
201213
print(energy)
202214
```
203215
{: .language-python}
204216
217+
> ## Python negative indexing
218+
> We also recognize that the energy is the last element of the list. Therefore, an alternative way to assign `energy` is:
219+
> ```python
220+
> energy = words[-1]
221+
> print(energy)
222+
> ```
223+
>
224+
> In the example above, the index `-1` gives the last element of a list, `-2` gives the second-to-last element, and so on. An excellent tutorial on accessing Python list elements by index can be found [here](https://realpython.com/python-lists-tuples/#list-elements-can-be-accessed-by-index).
225+
{: .callout}
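
To see positive and negative indices side by side, here is a small sketch. The text of `energy_line` is an assumption modeled on the energy value printed in this lesson; your actual line may contain different words.

```python
# Hypothetical energy line, modeled on the output shown in this lesson.
energy_line = '    @DF-RHF Final Energy:  -154.09130176573018'
words = energy_line.split()

# The last element can be reached by counting from either end of the list.
print(words[3])
print(words[-1])

# -2 counts one further back from the end.
print(words[-2])
```

Negative indexing is convenient when the value you want is always last on the line but the number of words before it may vary.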
226+
227+
228+
205229
```
206230
-154.09130176573018
207231
```
@@ -237,48 +261,48 @@ energy = float(words[3])
237261
>## Exercise on File Parsing (should we move this to the end?)
238262
Use the provided sapt.out file. In this output file, the program calculates the interaction energy for an ethene-ethyne complex. The output reports four interaction energy components: electrostatics, induction, exchange, and dispersion. Parse each of these energies, in kcal/mole, from the output file. (Hint: study the file in a text editor to help you decide what to search for.) Calculate the total interaction energy by adding the four components together. Your code's output should look something like this:
239263
> ~~~
240-
> Electrostatics : -2.25850118 kcal/mole
241-
> Exchange : 2.27730198 kcal/mole
242-
> Induction : -0.5216933 kcal/mole
243-
> Dispersion : -0.9446677 kcal/mole
244-
> Total Energy : 1.4475602000000003 kcal/mole
264+
> Electrostatics : -2.25850118 kcal/mol
265+
> Exchange : 2.27730198 kcal/mol
266+
> Induction : -0.5216933 kcal/mol
267+
> Dispersion : -0.9446677 kcal/mol
268+
> Total Energy : 1.4475602000000003 kcal/mol
245269
> ~~~
246270
> {: language.python}
247271
>
248272
> > ## Solution
249273
>>
250274
>> This is one possible solution for the SAPT parsing exercise
251275
>> ~~~
252-
>> saptout = open('SAPT.out','r')
253-
>> saptlines = saptout.readlines()
254-
>> important_lines=[]
255-
>> energies=[]
256-
>> for line in saptlines:
257-
>>     if 'Electrostatics ' in line:
258-
>>         electro_line = line
259-
>>         important_lines.append(electro_line)
260-
>>     if 'Exchange ' in line:
261-
>>         exchange_line = line
262-
>>         important_lines.append(exchange_line)
263-
>>     if 'Induction ' in line:
264-
>>         induction_line = line
265-
>>         important_lines.append(induction_line)
266-
>>     if 'Dispersion ' in line:
267-
>>         dispersion_line = line
268-
>>         important_lines.append(dispersion_line)
269-
>>
270-
>> #print(important_lines)
271-
>>
276+
>> important_lines = []
277+
>> energies = []
278+
>>
279+
>> with open('SAPT.out','r') as saptout:
280+
>>     for line in saptout:
281+
>>         if 'Electrostatics ' in line:
282+
>>             electro_line = line
283+
>>             important_lines.append(electro_line)
284+
>>         if 'Exchange ' in line:
285+
>>             exchange_line = line
286+
>>             important_lines.append(exchange_line)
287+
>>         if 'Induction ' in line:
288+
>>             induction_line = line
289+
>>             important_lines.append(induction_line)
290+
>>         if 'Dispersion ' in line:
291+
>>             dispersion_line = line
292+
>>             important_lines.append(dispersion_line)
293+
>>
294+
>> # print(important_lines)
295+
>>
272296
>> for line in important_lines:
273-
>>     words = line.split()
274-
>>     #print(words)
275-
>>     energy_type = words[0]
276-
>>     energy_kcal = float(words[3])
277-
>>     energies.append(energy_kcal)
278-
>>     print(energy_type, ':', energy_kcal, 'kcal/mole')
279-
>>
280-
>> total_energy=energies[0]+energies[1]+energies[2]+energies[3]
281-
>> print('Total Energy', ':', total_energy, 'kcal/mole')
297+
>>     words = line.split()
298+
>>     # print(words)
299+
>>     energy_type = words[0]
300+
>>     energy_kcal = float(words[3])
301+
>>     energies.append(energy_kcal)
302+
>>     print('{} : {} kcal/mol'.format(energy_type, energy_kcal))
303+
>>
304+
>> total_energy = sum(energies)
305+
>> print('Total Energy : {} kcal/mol'.format(total_energy))
282306
>> ~~~
283307
>> {: .language-python}
284308
> {: .solution}
@@ -300,6 +324,13 @@ for linenum, line in enumerate(list_name):
300324
301325
In this notation, there are now *two* variables you can use in your loop commands. `linenum` (which can be named something else) keeps track of which iteration of the loop you are on, in this case which line of the file. The variable `line` (which could also be named something else) functions exactly as it did before, holding the actual information from the list. Finally, instead of just giving the list name, you use `enumerate(list_name)`.
302326
327+
> ## `Enumerate` with index other than 0:
328+
> `enumerate(list_name)` starts counting at 0, so the first line is labeled `0`. To change this behavior, use the `start` argument of `enumerate`. For example, to start counting at 1 you can do:
329+
> ```python
330+
> for linenum, line in enumerate(data, start=1):
331+
>     # do something with 'linenum' and 'line'
> ```
332+
{: .callout}
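
A short runnable sketch of the `start` argument follows. The list `data` here is a hypothetical stand-in for lines read from a file, and `numbered` is a helper list introduced only for this illustration.

```python
# Hypothetical stand-in for the lines read from a file.
data = ['first line', 'second line', 'third line']

numbered = []
for linenum, line in enumerate(data, start=1):
    # linenum begins at 1 instead of 0 because of start=1.
    numbered.append((linenum, line))
    print(linenum, line)
```

Starting at 1 can make reported line numbers agree with what a text editor shows, since most editors number lines from 1.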
333+
303334
This block of code searches our file for the line that contains "Center" and reports the line number.
304335
```
305336
for linenum, line in enumerate(data):
@@ -313,7 +344,7 @@ for linenum, line in enumerate(data):
313344
Center X Y Z Mass
314345
```
315346
{: .output}
316-
Now we know that this is line 77 in our file (remember that you start counting at zero!).
347+
Now we know that this is line 77 in our file (remember that you start counting at zero!).
317348
318349
>## Check Your Understanding
319350
>What would be printed if you entered the following:
@@ -345,6 +376,6 @@ Now we know that this is line 77 in our file (remember that you start counting a
345376
{: .challenge}
346377
347378
## A final note about regular expressions
348-
Sometimes you will need to match something more complex than just a particular word or phrase in your output file. Sometimes you will need to match a particular word, but only if it is found at the beginning of a line. Or perhaps you will need to match a particular pattern of data, like a capital letter followed by a number, but you won't know the exact letter and number you are looking for. These types of matching situations are handled with something called *regular expressions* which is accessed through the python module `re`. While using regular expressions is outside the scope of this tutorial, they are very useful and you might want to learn more about them in the future. A tutorial can be found at _______.
379+
Sometimes you will need to match something more complex than just a particular word or phrase in your output file. Sometimes you will need to match a particular word, but only if it is found at the beginning of a line. Or perhaps you will need to match a particular pattern of data, like a capital letter followed by a number, but you won't know the exact letter and number you are looking for. These types of matching situations are handled with something called *regular expressions*, which are accessed through the Python module `re`. While using regular expressions (regex) is outside the scope of this tutorial, they are very useful and you might want to learn more about them in the future. A tutorial can be found in the book [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/2e/chapter7/). A great site for testing regex is [regex101](https://regex101.com/).
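
As a small taste of the two situations described above, here is a hedged sketch using the standard `re` module. The contents of `lines` are hypothetical and stand in for an output file; the patterns are just one way to express these matches.

```python
import re

# Hypothetical lines standing in for an output file.
lines = [
    'Center              X                  Y                   Z       Mass',
    'The Center of mass is reported above',
    'Atoms C1 and H2 are bonded',
]

# '^Center' matches the word only when it is at the beginning of a line.
starts_with_center = [line for line in lines if re.search(r'^Center', line)]
print(len(starts_with_center))

# '[A-Z][0-9]' matches any capital letter followed by a single digit.
labels = re.findall(r'[A-Z][0-9]', lines[2])
print(labels)
```

A plain `'Center' in line` test would match the second line as well, which is why the `^` anchor is useful here.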
349380
350381
{% include links.md %}
