Commit c1aac26

Merge pull request #58 from janash/argparse
switch command line lesson to use argparse
2 parents 467b32e + fb47108 commit c1aac26

1 file changed

+191
-46
lines changed

_episodes/07-command_line.md

Lines changed: 191 additions & 46 deletions
Original file line number | Diff line number | Diff line change
@@ -6,17 +6,18 @@ questions:
66
- "How do I move my code from the interactive jupyter notebook to run from the Linux command line?"
77
objectives:
88
- "Make code executable from the Linux command line."
9-
- "Use sys.argv() to accept user inputs."
9+
- "Use argparse to accept user inputs."
1010
keypoints:
11-
- "You must `import sys` in your code to accept user arguments."
12-
- "The name of the script itself is always `sys.argv[0]` so the first user input is normally `sys.argv[1]`."
11+
- "You must `import argparse` in your code to accept user arguments."
12+
- "You must first create an argument parser using `parser = argparse.ArgumentParser()`"
13+
- "You add arguments using `parser.add_argument`"
1314
---
1415
## Creating and running a python input file
1516

1617
We are now going to move our geometry analysis code out of the Jupyter notebook and into a format that can be run from the Linux command line. Open your favorite text editor and create a new file called "geom_analysis.py" (or choose another filename, just make sure the extension is .py). Paste in your geometry analysis code (the version with your functions) from your Jupyter notebook and save your file.
1718

1819
The best practice is to put all your functions at the top of the file, right after your import statements. Your file will look something like this.
19-
```
20+
~~~
2021
import numpy
2122
import os
2223
@@ -49,81 +50,133 @@ for num1 in range(0,num_atoms):
4950
bond_length_12 = calculate_distance(coord[num1], coord[num2])
5051
if bond_check(bond_length_12) is True:
5152
print(F'{symbols[num1]} to {symbols[num2]} : {bond_length_12:.3f}')
52-
```
53+
~~~
5354
{: .language-python}
5455

5556
Exit your text editor and go back to the command line. Now all you have to do to run your code is type
5657

57-
```
58+
~~~
5859
$ python geom_analysis.py
59-
```
60+
~~~
6061
{: .language-bash}
6162
in your Terminal window. Your code should either print the output to the screen or write it to a file, depending on what you have it set up to do. (The code example given prints to the screen.)
6263

6364
## Changing your code to accept user inputs
64-
In your current code, the name of the xyzfile to analyze, "water.xyz", is hardcoded; in order to change it, you have to open your code and change the name of the file that is read in. If you were going to use this code to analyze geometries in your research, you would probably want to be able to specify the name of the input file when you run the code, so that you don't have to change it every single time. These types of user inputs are called *arguments* and to make our code accept arguments, we have to import a new python library in our code.
65+
In your current code, the name of the xyz file to analyze, "water.xyz", is hardcoded; to change it, you have to open your code and edit the name of the file that is read in. If you were going to use this code to analyze geometries in your research, you would probably want to specify the name of the input file when you run the code, so that you don't have to edit the script every time. You might want to use the script like this:
66+
67+
~~~
68+
$ python geom_analysis.py water.xyz
69+
~~~
70+
{: .language-bash}
71+
72+
These types of user inputs are called *arguments*. To make our code accept arguments, we have to import a new Python library in our code.
73+
6574

6675
Open your geometry analysis code in your text editor and add this line at the top.
6776

6877
~~~
69-
import sys
78+
import argparse
7079
~~~
7180
{: .language-python}
7281

73-
Now that you have imported the `sys` library, you can use its functions. The library has a function called `sys.argv()` which creates a list of all the arguments the user enters at the command line. Everything after *python* is an argument, so `sys.argv[0]` is always the name of your script. We would like our code to accept the name of the xyz file we want to analyze as an argument. Add this line to your code.
74-
```
75-
xyzfilename = sys.argv[1]
76-
```
82+
We are importing a library called [argparse](https://docs.python.org/3/library/argparse.html), which makes it easy to write scripts that take command line arguments. `argparse` also lets us document our scripts as we define their arguments.
83+
84+
We tell argparse that we want to add a command line interface. The syntax for this is
85+
86+
~~~
87+
parser = argparse.ArgumentParser(description="This script analyzes a user given xyz file and outputs the length of the bonds.")
88+
~~~
7789
{: .language-python}
7890

79-
Then you need to go the part of your code where you read in the data from the xyz file and change the name of the file to read to `xyzfilename`.
80-
```
91+
We've included a description of the script for our users using `description=`. This description does not need to explain what the arguments are; `argparse` will document those for us automatically in the next steps.
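
As a rough sketch of where that description ends up, the snippet below builds the same parser on its own and asks for the help text as a string. It uses `format_help()`, the `argparse` method that returns what `--help` would print, and it does not read the real command line.

```python
import argparse

# Same description string as in the lesson.
parser = argparse.ArgumentParser(
    description="This script analyzes a user given xyz file and outputs the length of the bonds."
)

# format_help() returns the help message as a string instead of printing it.
help_text = parser.format_help()
print(help_text)
```

The printed text starts with a usage line, followed by the description and the automatically added `-h, --help` option.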
92+
93+
Next, we have to tell `argparse` what arguments it should expect. In general, the syntax for this is
94+
95+
~~~
96+
parser.add_argument("argument_name", help="Your help message for this argument.")
97+
~~~
98+
{: .language-python}
99+
100+
Let's add one for the xyz file the user should specify.
101+
102+
~~~
103+
parser.add_argument("xyz_file", help="The filepath for the xyz file to analyze.")
104+
~~~
105+
106+
Next, we have to get the arguments.
107+
108+
~~~
109+
args = parser.parse_args()
110+
~~~
111+
{: .language-python}
112+
113+
Our arguments are in the `args` variable. We can get the value of an argument by using `args.argument_name`, so to get the xyz file the user puts in, we use `args.xyz_file`. Notice that what follows the dot is the same name we put in quotation marks when using `add_argument`.
114+
115+
~~~
116+
xyzfilename = args.xyz_file
81117
symbols, coord = open_xyz(xyzfilename)
82-
```
118+
~~~
83119
{: .language-python}
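
If you would like to experiment with the parser before going back to the terminal (for example in a notebook), `parse_args` also accepts a list of strings in place of the real command line. The sketch below assumes that usage; the list stands in for whatever the user would type after the script name, and the `description` text is a placeholder.

```python
import argparse

parser = argparse.ArgumentParser(description="Placeholder description for this sketch.")
parser.add_argument("xyz_file", help="The filepath for the xyz file to analyze.")

# The list plays the role of the command line: python geom_analysis.py data/water.xyz
args = parser.parse_args(["data/water.xyz"])

# The attribute name matches the name given to add_argument.
xyzfilename = args.xyz_file
print(xyzfilename)
```

In the actual script you would keep calling `parser.parse_args()` with no list, so that the values come from the command line.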
84120

85121
Save your code and go back to the Terminal window. Make sure you are in the directory where your code is saved and type
86-
```
87-
$ python geom_analysis.py data/water.xyz
88-
```
122+
123+
~~~
124+
$ python geom_analysis.py --help
125+
~~~
126+
{: .language-bash}
127+
128+
This will print a help message. The `argparse` library has written this help message for us based on the descriptions and arguments we added.
129+
130+
~~~
131+
usage: geom_analysis.py [-h] xyz_file
132+
133+
This script analyzes a user given xyz file and outputs the length of the
134+
bonds.
135+
136+
positional arguments:
137+
xyz_file The filepath for the xyz file to analyze.
138+
139+
optional arguments:
140+
-h, --help show this help message and exit
141+
~~~
142+
{: .output}
143+
144+
Now try running your script with an xyz file.
145+
146+
~~~
147+
$ python geom_analysis.py data/water.xyz
148+
~~~
149+
{: .language-bash}
150+
89151
Check that the output of your code is what you expected.
90152

153+
What would happen if the user forgot to specify the name of the xyz file?
91154

92-
What would happen if the user forgot to specify the name of the xyz file? The way the code is written now, it would give an error message.
93-
```
94-
Traceback (most recent call last):
95-
File "geom_analysis.py", line 22, in <module>
96-
xyzfilename = sys.argv[1]
97-
IndexError: list index out of range
98-
```
155+
~~~
156+
usage: geom_analysis.py [-h] xyz_file
157+
geom_analysis.py: error: the following arguments are required: xyz_file
158+
~~~
99159
{: .error}
100-
The reason it says the list index is out of range is because `sys.argv[1]` does not exist. Since the user forgot to specify the name of the xyz file, the `sys.argv` list only has one element, `sys.argv[0]`. It would be better to print an error message and let the user know that they didn't enter the input correctly. Our code is expecting exactly two inputs: the script name and the xyz file name. The easiest way to add an error message is to check the length of the sys.argv list and print an error message and exit if it does not equal the expected length.
101-
102-
While you have practiced coding, you have probably seen many error messages. We can actually raise errors in our code and write error messages to our users.
103-
```
104-
if len(sys.argv) < 2:
105-
raise NameError("Incorrect input! Please specify a file to analyze.")
106-
```
107-
{: .language-python}
108160

109-
This will exit the code and print our error message if the user does not specify a filename.
161+
`argparse` handles this for us and prints an error message telling us that we must specify an xyz file.
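
To see this behavior without leaving Python, here is a minimal sketch that passes an empty argument list to the parser. When a required argument is missing, `argparse` prints the usage and error message and then exits by raising `SystemExit` with status 2; the `try`/`except` is only there so the sketch can show the exit status. The `prog` name is an assumption made for the illustration.

```python
import argparse

# prog is set explicitly here only so the message shows a script name.
parser = argparse.ArgumentParser(prog="geom_analysis.py")
parser.add_argument("xyz_file", help="The filepath for the xyz file to analyze.")

try:
    # An empty list simulates a user who forgot to give the xyz file.
    parser.parse_args([])
except SystemExit as exit_info:
    exit_code = exit_info.code

print(exit_code)
```

In a real script you would normally not catch `SystemExit`; letting the program stop with the error message is the intended behavior.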
110162

111-
There are different types of errors you can raise. For example, you may want to raise a `TypeError` if you have data that is not the right type. If you want to learn more about raising errors, [see the official documenation from Python](https://docs.python.org/3/tutorial/errors.html)
163+
Try out your program with other XYZ files in your `data` folder.
112164

113-
We need to add one more thing to our code. When you write a code that includes function definitions and a main script, you need to tell python which part is the main script. (This becomes very important later when we are talking about testing.) *After* your import statements and function definitions and *before* you check the length of the `sys.argv` list add this line to your code.
165+
## The "main" part of our script
166+
We need to add one more thing to our code. When you write code that includes function definitions and a main script, you need to tell Python which part is the main script. (This becomes very important later when we are talking about testing.) *After* your import statements and function definitions and *before* you use `argparse`, add this line to your code.
114167
```
115168
if __name__ == "__main__":
116169
```
117170
{: .language-python}
118171

119-
Since this is an `if` statement, you now need to indent each line of your main script below this if statement. Be very careful with your indentation! Don't use a mixture of tabs and spaces!
172+
Since this is an `if` statement, you now need to indent each line of your main script below this if statement. Be very careful with your indentation! Don't use a mixture of tabs and spaces! A good way to indent multiple lines in many text editors is to highlight the lines you would like to indent, then press `tab`.
120173

121174
Save your code and run it again. It should work exactly as before. If you now get an error message, it is probably due to inconsistent indentation.
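
The effect of this line can be sketched with a much smaller file. The `bond_check` below is a shortened stand-in for the lesson's function, written only for this illustration. When the file is run directly, Python sets `__name__` to `"__main__"` and the indented block runs; when the file is imported from another script, the block is skipped and only the function definitions are loaded.

```python
def bond_check(atom_distance, minimum_length=0, maximum_length=1.5):
    # Shortened stand-in for the lesson's bond_check function.
    return minimum_length < atom_distance <= maximum_length


if __name__ == "__main__":
    # Runs only when this file is executed directly, not when it is imported.
    print(bond_check(0.96))
```

This separation is what lets a test file import `bond_check` without also triggering the argument parsing and the analysis loop.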
122175

123-
```
176+
~~~
124177
import os
125178
import numpy
126-
import sys
179+
import argparse
127180
128181
def calculate_distance(atom1_coord, atom2_coord):
129182
x_distance = atom1_coord[0] - atom2_coord[0]
@@ -148,22 +201,114 @@ def open_xyz(filename):
148201
149202
if __name__ == "__main__":
150203
151-
if len(sys.argv) < 2:
152-
raise NameError("Incorrect input! Please specify a file to analyze.")
204+
## Get the arguments.
205+
parser = argparse.ArgumentParser(description="This script analyzes a user given xyz file and outputs the length of the bonds.")
206+
parser.add_argument("xyz_file", help="The filepath for the xyz file to analyze.")
153207
208+
args = parser.parse_args()
154209
155-
xyz_file = sys.argv[1]
156-
symbols, coord = open_xyz(xyz_file)
210+
symbols, coord = open_xyz(args.xyz_file)
157211
num_atoms = len(symbols)
158212
159213
for num1 in range(0,num_atoms):
160214
for num2 in range(0,num_atoms):
161215
if num1<num2:
162216
bond_length_12 = calculate_distance(coord[num1], coord[num2])
163-
if bond_check(bond_length_12) is True:
217+
if bond_check(bond_length_12) is True:
164218
print(F'{symbols[num1]} to {symbols[num2]} : {bond_length_12:.3f}')
219+
~~~
220+
{: .language-python}
165221

166-
```
222+
## Extension - Optional Arguments
223+
What's another argument we might want to include? We also might want to let the user specify a minimum and maximum bond length on the command line. We would want these to be optional, just like they are in our function.
224+
225+
We can add optional arguments by putting a dash (`-`) or two dashes (`--`) in front of the argument name when we add an argument. Add this line below where you added the first argument. Note that all `add_argument` lines should be above the line with `parse_args`.
226+
227+
~~~
228+
parser.add_argument('-minimum_length', help='The minimum distance to consider atoms bonded.', type=float, default=0)
229+
~~~
230+
{: .language-python}
231+
232+
We've added two new things to our argument as well. We have told `argparse` that the argument people will pass for this will be a decimal number (float) and that the default value will be zero. If a user does not use `-minimum_length`, the value will be zero.
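
A small sketch of how `type` and `default` behave, again passing lists to `parse_args` in place of a real command line (the file path is just an example value):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("xyz_file")
parser.add_argument("-minimum_length", type=float, default=0)

# Without the flag, the default is used as given.
without_flag = parser.parse_args(["data/water.xyz"])

# With the flag, the string "1" from the command line is converted by float().
with_flag = parser.parse_args(["data/water.xyz", "-minimum_length", "1"])

print(without_flag.minimum_length)
print(with_flag.minimum_length)
```

Note that `type` is applied to strings coming from the command line; a non-string default such as `0` is stored unchanged.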
233+
234+
Now we will change our code to use this value. Find the line where you call `bond_check` and change it to use the value from `argparse`.
235+
236+
~~~
237+
if bond_check(bond_length_12, minimum_length=args.minimum_length) is True:
238+
~~~
167239
{: .language-python}
168240

241+
~~~
242+
$ python geom_analysis.py data/water.xyz -minimum_length 1
243+
~~~
244+
{: .language-bash}
245+
246+
Now we can override the minimum length when running our script. If we don't specify it, it will default to using zero.
247+
248+
We can do the same thing for our maximum bond length.
249+
250+
~~~
251+
parser.add_argument('-maximum_length', help='The maximum distance to consider atoms bonded.', type=float, default=1.5)
252+
~~~
253+
{: .language-python}
254+
255+
And don't forget to update your call to `bond_check`.
256+
~~~
257+
if bond_check(bond_length_12, minimum_length=args.minimum_length, maximum_length=args.maximum_length) is True:
258+
~~~
259+
{: .language-python}
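
To illustrate how these two values change which pairs get reported, the sketch below calls `bond_check` directly with a few made-up distances (the numbers are illustrative, not taken from a data file):

```python
def bond_check(atom_distance, minimum_length=0, maximum_length=1.5):
    if atom_distance > minimum_length and atom_distance <= maximum_length:
        return True
    else:
        return False


# Hypothetical distances, chosen only to show the effect of the bounds.
default_bounds = bond_check(0.969)
raised_minimum = bond_check(0.969, minimum_length=1)
raised_maximum = bond_check(1.6, maximum_length=2.0)

print(default_bounds, raised_minimum, raised_maximum)
```

A distance of 0.969 counts as a bond with the default bounds but is rejected once the minimum is raised to 1, which is what happens when you pass `-minimum_length 1` on the command line.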
260+
261+
Our final program looks like this:
262+
263+
~~~
264+
import os
265+
import numpy
266+
import argparse
267+
268+
def calculate_distance(atom1_coord, atom2_coord):
269+
x_distance = atom1_coord[0] - atom2_coord[0]
270+
y_distance = atom1_coord[1] - atom2_coord[1]
271+
z_distance = atom1_coord[2] - atom2_coord[2]
272+
bond_length_12 = numpy.sqrt(x_distance**2+y_distance**2+z_distance**2)
273+
return bond_length_12
274+
275+
def bond_check(atom_distance, minimum_length=0, maximum_length=1.5):
276+
if atom_distance > minimum_length and atom_distance <= maximum_length:
277+
return True
278+
else:
279+
return False
280+
281+
def open_xyz(filename):
282+
xyz_file = numpy.genfromtxt(fname=filename, skip_header=2, dtype='unicode')
283+
symbols = xyz_file[:,0]
284+
coord = (xyz_file[:,1:])
285+
coord = coord.astype(float)
286+
return symbols, coord
287+
288+
289+
if __name__ == "__main__":
290+
291+
## Get the arguments.
292+
parser = argparse.ArgumentParser(description="This script analyzes a user given xyz file and outputs the length of the bonds.")
293+
parser.add_argument("xyz_file", help="The filepath for the xyz file to analyze.")
294+
295+
parser.add_argument('-minimum_length', help='The minimum distance to consider atoms bonded.', type=float, default=0)
296+
parser.add_argument('-maximum_length', help='The maximum distance to consider atoms bonded.', type=float, default=1.5)
297+
298+
args = parser.parse_args()
299+
300+
symbols, coord = open_xyz(args.xyz_file)
301+
num_atoms = len(symbols)
302+
303+
for num1 in range(0,num_atoms):
304+
for num2 in range(0,num_atoms):
305+
if num1<num2:
306+
bond_length_12 = calculate_distance(coord[num1], coord[num2])
307+
if bond_check(bond_length_12, minimum_length=args.minimum_length, maximum_length=args.maximum_length) is True:
308+
print(F'{symbols[num1]} to {symbols[num2]} : {bond_length_12:.3f}')
309+
~~~
310+
{: .language-python}
311+
312+
313+
169314
{% include links.md %}
