@@ -10,7 +10,7 @@ File System -- os, os.path, shutil
  The *os* and *os.path* modules include many functions to interact
  with the file system. The *shutil* module can copy files.

- - [os module docs](https://docs.python.org/2.7/library/os.html)
+ - [os module docs](https://docs.python.org/3/library/os.html)
  - `filenames = os.listdir(dir)` -- list of filenames in that directory
    path (not including . and ..). The filenames are just the names in
    the directory, not their absolute paths.
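As a minimal sketch of the point about bare filenames, the following joins each name from `os.listdir()` with its directory and copies the regular files with `shutil.copy()`. The helper name `copy_files` and the temporary directories are hypothetical, used only for illustration.

```python
import os
import shutil
import tempfile


def copy_files(src_dir, dst_dir):
  """Copy every regular file in src_dir to dst_dir; return the copied names."""
  copied = []
  for filename in sorted(os.listdir(src_dir)):
    # listdir() gives bare names, so join with the directory first.
    path = os.path.join(src_dir, filename)
    if os.path.isfile(path):
      shutil.copy(path, dst_dir)
      copied.append(filename)
  return copied


if __name__ == '__main__':
  with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    for name in ('a.txt', 'b.txt'):
      with open(os.path.join(src, name), 'w') as f:
        f.write('hello\n')
    os.mkdir(os.path.join(src, 'subdir'))  # directories are skipped
    print(copy_files(src, dst))
    print(sorted(os.listdir(dst)))
```

Passing the bare name straight to `shutil.copy()` would only work when the current directory happens to be `src_dir`, which is why the join comes first.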
@@ -34,9 +34,9 @@ import os
  def printdir(dir):
    filenames = os.listdir(dir)
    for filename in filenames:
-     print filename
-     print os.path.join(dir, filename)
-     print os.path.abspath(os.path.join(dir, filename))
+     print(filename)
+     print(os.path.join(dir, filename))
+     print(os.path.abspath(os.path.join(dir, filename)))
  ```

  Exploring a module works well with the built-in python `help()` and `dir()`
@@ -57,7 +57,7 @@ Running External Processes -- subprocess
  The `subprocess` module is a simple way to run an external command and
  capture its output.

- - [subprocess module docs](https://docs.python.org/2.7/library/subprocess.html)
+ - [subprocess module docs](https://docs.python.org/3/library/subprocess.html)
  - `subprocess.check_output(["cmd", "argument1"])` -- Run command with
    arguments and return its output as a byte string. If the return code was
    non-zero it raises a `CalledProcessError`. The `CalledProcessError` object will
@@ -76,7 +76,7 @@ import subprocess
  def listdir(dir):
    args = shlex.split('ls -l ' + dir)
    output = subprocess.check_output(args)
-   print output
+   print(output)
  ```
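The `CalledProcessError` case can be handled with try/except. The sketch below is illustrative only: the helper name `run` is hypothetical, and it runs the current Python interpreter (`sys.executable`) rather than `ls` so that it behaves the same on any platform. Because `check_output()` returns bytes in Python 3, the output is decoded before use.

```python
import subprocess
import sys


def run(args):
  """Run a command; return (returncode, decoded standard output)."""
  try:
    output = subprocess.check_output(args)
    return 0, output.decode()
  except subprocess.CalledProcessError as e:
    # The exception carries the exit status and whatever was captured.
    return e.returncode, e.output.decode()


if __name__ == '__main__':
  print(run([sys.executable, '-c', 'print("hello")']))
  print(run([sys.executable, '-c', 'import sys; print("oops"); sys.exit(3)']))
```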

  Exceptions
@@ -89,112 +89,93 @@ run-time error might be that a variable used in the program does not
  have a value (`NameError` .. you've probably seen that one a few times),
  or a file open operation error because a file does not exist (`IOError`).
  Learn more in [the exceptions
- tutorial](https://docs.python.org/2.7/tutorial/errors.html) and see [the entire
- exception list](https://docs.python.org/2.7/library/exceptions.html).
+ tutorial](https://docs.python.org/3/tutorial/errors.html) and see [the entire
+ exception list](https://docs.python.org/3/library/exceptions.html).

  Without any error handling code (as we have done thus far), a run-time
  exception just halts the program with an error message. That's a good
  default behavior, and you've seen it many times. You can add a
  "try/except" structure to your code to handle exceptions, like this:

  ```python
- import io
-
  filename = 'does_not_exist.txt'
  try:
-   f = io.open(filename)
+   with open(filename) as f:
+     for line in f:
+       print(line, end="")
  except IOError as e:
-   print e.strerror
-   print e.filename
- else:
-   for line in f:
-     print line,
-   f.close()
+   print(e.strerror)
+   print(e.filename)
  ```
+ Or, if you prefer to keep the error handling next to the call that can
+ raise the exception, you could write it this way:

+ ```python
+ try:
+   f = open(filename)
+ except IOError:
+   print('error')
+ else:
+   with f:
+     print(f.readlines())
+ ```
  The try: section includes the code which might throw an exception. The except:
  section holds the code to run if there is an exception. If there is no
  exception, the except: section is skipped (that is, that code is for error
  handling only, not the "normal" case for the code). The optional `else` section
  is useful for code that must be executed if the try clause does not raise an
  exception.

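To make the control flow concrete, here is a small sketch that records which sections run for a successful call and for a failing one. The function name `parse_int` and the `steps` list are hypothetical, and the `ValueError` raised by `int()` stands in for any exception.

```python
def parse_int(text):
  """Return (value, steps), where steps lists the sections that ran."""
  steps = []
  value = None
  try:
    steps.append('try')
    value = int(text)  # raises ValueError for non-numeric text
  except ValueError:
    steps.append('except')  # runs only when int() failed
  else:
    steps.append('else')  # runs only when the try section succeeded
  return value, steps


if __name__ == '__main__':
  print(parse_int('42'))
  print(parse_int('forty-two'))
```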
- HTTP -- urllib2 and urlparse
+ HTTP -- requests
  ---------------------------

- The module *urllib2* provides url fetching -- making a url look like a
- file you can read from. The *urlparse* module can take apart and put
- together urls.
+ The *requests* module fetches urls and returns a response object that
+ holds the status code, headers, and body, while *requests-html* makes
+ parsing HTML as simple as possible.

- - [urllib2 module
-   docs](https://docs.python.org/2/library/urllib2.html)
- - `ufile = urllib2.urlopen(url)` -- returns a file like object for that
+ - [requests module](http://docs.python-requests.org/en/master/#)
+   [docs](http://docs.python-requests.org/en/master/api/)
+ - [requests-html module](https://html.python-requests.org/)
+   [docs](https://html.python-requests.org/#api-documentation)
+ - `r = requests.get('https://api.github.com/events')` -- returns a response object for that
    url
- - `text = ufile.read()` -- can read from it, like a file (readlines()
-   etc. also work)
- - `info = ufile.info()` -- the meta info for that request.
-   `info.gettype()` is the mime type, e.g. 'text/html'
- - `baseurl = ufile.geturl()` -- gets the "base" url for the request,
-   which may be different from the original because of redirects
- - `urllib2.urlretrieve(url, filename)` -- downloads the url data to the
-   given file path
- - `urlparse.urljoin(baseurl, url)` -- given a url that may or may not be
-   full, and the baseurl of the page it comes from, return a full url.
-   Use `geturl()` above to provide the base url.
+ - `text = r.text` -- get the contents of the page as text.
+ - `content = r.content` -- get the **binary** contents of the response.
+ - `json = r.json()` -- get the response body parsed from JSON.
+ - `r.status_code` -- the HTTP status code (200=all good).
+ - `r.headers` -- the headers of the response.

  ```python
  ## Given a url, try to retrieve it.
  ## print its base url and its text.

- import urllib2
+ import requests

  def wget(url):
-   f = urllib2.urlopen(url)
-   info = f.info()
-   print info.gettype()
-   print 'base url:' + f.geturl()
-   text = f.read()
-   print text
+   r = requests.get(url)
+   if r.status_code == requests.codes.ok:
+     print(r.headers['Content-Type'])
+     print(r.url)
+     text = r.text
+     print(text)
+   else:
+     print(f"Problem opening URL ({url}) gave {r.status_code}")
+

  wget('http://httpbin.org/ip')
+ # Fails due to HTTP 418 (teapot)
+ wget('http://httpbin.org/status/418')
  ```
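A non-200 status is not the only way a fetch can fail: `requests.get()` raises an exception when the url is malformed or the connection cannot be made. Here is a minimal sketch of handling that, assuming the third-party `requests` package is installed. The name `fetch_text` and the 10-second timeout are illustrative choices, not part of the original page.

```python
import requests


def fetch_text(url, timeout=10):
  """Return the body of url as text, or None if the request fails."""
  try:
    r = requests.get(url, timeout=timeout)
    r.raise_for_status()  # turn 4xx/5xx statuses into HTTPError
  except requests.exceptions.RequestException as e:
    # Base class for connection errors, timeouts, bad urls and HTTP errors.
    print(f'problem reading url: {url} ({e})')
    return None
  return r.text


if __name__ == '__main__':
  # Malformed url (no scheme), so this fails before any network access.
  print(fetch_text('not-a-url'))
```

Catching the `RequestException` base class keeps the handler short. Catching narrower classes such as `requests.exceptions.Timeout` would let the caller react differently to each kind of failure.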

- The above code works fine, but does not include error handling if a url
- does not work for some reason. Here's a version of the function which
- adds try/except logic to print an error message if the url operation
- fails.
-
- ```python
- ## Version that uses try/except to print an error message if the
- ## urlopen() fails.
-
- def wget2(url):
-   try:
-     f = urllib2.urlopen(url)
-     info = f.info()
-     print info.gettype()
-     print 'base url:' + f.geturl()
-     text = f.read()
-     print text
-   except IOError as e:
-     print 'problem reading url:', url
-     print e.code
-     print e.read()
-
- # Success
- wget2('http://httpbin.org/ip')

- # Fails due to HTTP 418 (teapot)
- wget2('http://httpbin.org/status/418')
- ```

  Exercise
  --------

  To practice the file system and external-commands material, see the
  [Copy Special
  Exercise](copy-special).
- To practice the urllib2 material, see the [Log Puzzle
+ To practice the requests material, see the [Log Puzzle
  Exercise](log-puzzle).

  ----
@@ -203,5 +184,4 @@ Except as otherwise noted, the content of this page is licensed under
  the [Creative Commons Attribution 3.0
  License](http://creativecommons.org/licenses/by/3.0/), and code samples
  are licensed under the [Apache 2.0
- License](http://www.apache.org/licenses/LICENSE-2.0). For details, see
- our [Site Policies](https://developers.google.com/terms/site-policies).
+ License](http://www.apache.org/licenses/LICENSE-2.0).