
Commit 89d7f99

all text updated, problems need checking
1 parent 65d4dd4 commit 89d7f99

File tree

4 files changed, +82 -109 lines changed


copy-special.md

Lines changed: 9 additions & 10 deletions
@@ -13,18 +13,18 @@ your code in copyspecial.py.
 
 The copyspecial.py program takes one or more directories as its
 arguments. We'll say that a "special" file is one where the name
-contains the pattern \_\_w\_\_ somewhere, where the w is one or more
-word chars. The provided main() includes code to parse the command line
+contains the pattern `__w__` somewhere, where the w is one or more
+word chars. The provided `main()` includes code to parse the command line
 arguments, but the rest is up to you. Write functions to implement the
-features below and modify main() to call your functions.
+features below and modify `main()` to call your functions.
 
 Suggested functions for your solution (details below):
 
-- get\_special\_paths(dir) -- returns a list of the absolute paths of
+- `get_special_paths(dir)` -- returns a list of the absolute paths of
   the special files in the given directory
-- copy\_to(paths, dir) given a list of paths, copies those files into
+- `copy_to(paths, dir)` given a list of paths, copies those files into
   the given directory
-- zip\_to(paths, zippath) given a list of paths, zip those files up
+- `zip_to(paths, zippath)` given a list of paths, zip those files up
   into the given zipfile
 
 Part A (manipulating file paths)
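As a concrete sketch of the first suggested function (one possible implementation, not the official solution -- the `r'__\w+__'` pattern is an assumption drawn from the `__w__` description above):

```python
import os
import re

def get_special_paths(dir):
  # Return absolute paths of files whose name contains __w__,
  # where w is one or more word chars (assumed pattern: r'__\w+__').
  result = []
  for filename in os.listdir(dir):
    if re.search(r'__\w+__', filename):
      result.append(os.path.abspath(os.path.join(dir, filename)))
  return result
```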
@@ -45,7 +45,7 @@ We'll assume that names are not repeated across the directories
 Part B (file copying)
 ---------------------
 
-If the "--todir dir" option is present at the start of the command line,
+If the "`--todir dir`" option is present at the start of the command line,
 do not print anything and instead copy the files to the given directory,
 creating it if necessary. Use the python module "shutil" for file
 copying.
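A minimal sketch of `copy_to()` using `shutil`, as the text suggests (illustrative only):

```python
import os
import shutil

def copy_to(paths, dir):
  # Copy each file in paths into dir, creating dir first if necessary.
  if not os.path.exists(dir):
    os.makedirs(dir)
  for path in paths:
    shutil.copy(path, dir)
```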
@@ -57,7 +57,7 @@ copying.
 Part C (writing zip files)
 ------------------------------------
 
-If the "--tozip zipfile" option is present at the start of the command
+If the "`--tozip zipfile`" option is present at the start of the command
 line create a zip file using the `zipfile` package.
 
     $ ./copyspecial.py --tozip tmp.zip .
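One way `zip_to()` might use the `zipfile` package (a sketch, not the reference solution; storing each file under its basename is an assumption):

```python
import os
import zipfile

def zip_to(paths, zippath):
  # Zip the given files; store each under its basename so the archive
  # does not reproduce the absolute directory structure.
  with zipfile.ZipFile(zippath, 'w') as zf:
    for path in paths:
      zf.write(path, arcname=os.path.basename(path))
```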
@@ -68,5 +68,4 @@ Except as otherwise noted, the content of this page is licensed under
 the [Creative Commons Attribution 3.0
 License](http://creativecommons.org/licenses/by/3.0/), and code samples
 are licensed under the [Apache 2.0
-License](http://www.apache.org/licenses/LICENSE-2.0). For details, see
-our [Site Policies](https://developers.google.com/terms/site-policies).
+License](http://www.apache.org/licenses/LICENSE-2.0).

log-puzzle.md

Lines changed: 10 additions & 11 deletions
@@ -21,10 +21,10 @@ The slice urls are hidden inside apache log files (the open source
 [apache](http://httpd.apache.org/) web server is the most widely used
 server on the internet). Each log file is from some server, and the
 desired slice urls are hidden within the logs. The log file encodes what
-server it comes from like this: the log file animal\_code.google.com is
+server it comes from like this: the log file `animal_code.google.com` is
 from the code.google.com server (formally, we'll say that the server
 name is whatever follows the first underbar). The
-animial\_code.google.com log file contains the data for the "animal"
+`animal_code.google.com` log file contains the data for the "animal"
 puzzle image. Although the data in the log files has the syntax of a
 real apache web server, the data beyond what's needed for the puzzle is
 randomized data from a real log file.
@@ -38,20 +38,20 @@ what apache log files look like):
 The first few numbers are the address of the requesting browser. The
 most interesting part is the "GET *path* HTTP" showing the path of a web
 request received by the server. The path itself never contains spaces,
-and is separated from the GET and HTTP by spaces (regex suggestion: \\S
+and is separated from the GET and HTTP by spaces (regex suggestion: `\S`
 (upper case S) matches any non-space char). Find the lines in the log
 where the string "puzzle" appears inside the path, ignoring the many
 other lines in the log.
 
 Part A - Log File To Urls
 -------------------------
 
-Complete the read\_urls(filename) function that extracts the puzzle urls
+Complete the `read_urls(filename)` function that extracts the puzzle urls
 from inside a logfile. Find all the "puzzle" path urls in the logfile.
 Combine the path from each url with the server name from the filename to
 form a full url, e.g.
 "http://www.example.com/path/puzzle/from/inside/file". Screen out urls
-that appear more than once. The read\_urls() function should return the
+that appear more than once. The `read_urls()` function should return the
 list of full urls, sorted into alphabetical order and without
 duplicates. Taking the urls in alphabetical order will yield the image
 slices in the correct left-to-right order to re-create the original
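One plausible shape for `read_urls()` (an illustration only; the exact regex and the use of `os.path.basename` to isolate the log name are this sketch's assumptions, not part of the exercise text):

```python
import os
import re

def read_urls(filename):
  # The server name is whatever follows the first underbar in the
  # log's filename, e.g. animal_code.google.com -> code.google.com.
  fname = os.path.basename(filename)
  host = fname[fname.index('_') + 1:]
  with open(filename) as f:
    log = f.read()
  # \S matches any non-space char, so this grabs the whole GET path.
  paths = re.findall(r'GET (\S+) HTTP', log)
  # Keep only "puzzle" paths; a set screens out duplicates.
  urls = set('http://' + host + path for path in paths if 'puzzle' in path)
  return sorted(urls)
```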
@@ -66,7 +66,7 @@ one per line.
 Part B - Download Images Puzzle
 -------------------------------
 
-Complete the download\_images() function which takes a sorted list of
+Complete the `download_images()` function which takes a sorted list of
 urls and a directory. Download the image from each url into the given
 directory, creating the directory first if necessary (see the "os"
 module to create a directory, and "urllib.urlretrieve()" for downloading
@@ -78,8 +78,8 @@ working. Each image is a little vertical slice from the original. How to
 put the slices together to re-create the original? It can be solved
 nicely with a little html (knowledge of HTML is not required).
 
-The download\_images() function should also create an index.html file in
-the directory with an \*img\* tag to show each local image file. The img
+The `download_images()` function should also create an index.html file in
+the directory with an *img* tag to show each local image file. The img
 tags should all be on one line together without separation. In this way,
 the browser displays all the slices together seamlessly. You do not need
 knowledge of HTML to do this; just create an index.html file that looks
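A sketch of the index-writing step (the helper name `write_index` is hypothetical; the exercise folds this into `download_images()`):

```python
import os

def write_index(img_names, dest_dir):
  # Write index.html with every <img> tag on one line, no separation,
  # so the browser displays the slices seamlessly side by side.
  with open(os.path.join(dest_dir, 'index.html'), 'w') as f:
    f.write('<html><body>\n')
    f.write(''.join('<img src="%s">' % name for name in img_names))
    f.write('\n</body></html>\n')
```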
@@ -116,7 +116,7 @@ sorting a list of urls each ending with the word-word.jpg pattern should
 order the urls by the second word.
 
 Extend your code to order such urls properly, and then you should be
-able to decode the second place\_code.google.com puzzle which shows a
+able to decode the second `place_code.google.com` puzzle which shows a
 famous place. What place does it show?
 
 CC Attribution: the images used in this puzzle were made available by
@@ -132,5 +132,4 @@ Except as otherwise noted, the content of this page is licensed under
 the [Creative Commons Attribution 3.0
 License](http://creativecommons.org/licenses/by/3.0/), and code samples
 are licensed under the [Apache 2.0
-License](http://www.apache.org/licenses/LICENSE-2.0). For details, see
-our [Site Policies](https://developers.google.com/terms/site-policies).
+License](http://www.apache.org/licenses/LICENSE-2.0).

regular-expressions.md

Lines changed: 11 additions & 16 deletions
@@ -126,10 +126,7 @@ in the pattern
 
 First the search finds the leftmost match for the pattern, and second it
 tries to use up as much of the string as possible -- i.e. `+` and `*` go as
-far as possible (the `+` and `*` are said to be "greedy"). If you ever need them
-to be less greedy you can use `+?` and `*?` instead and the pattern will only
-match the minimum pattern. For example with a string `<a>b<c>`, `<.*>` will
-match the whole string but `<.*?>` will only match `<a>`.
+far as possible (the `+` and `*` are said to be "greedy").
 
 Repetition Examples
 -------------------
@@ -287,22 +284,21 @@ for tuple in tuples:
   print(tuple[0])
   print(tuple[1])
 ```
-##TODO - here
 Once you have the list of tuples, you can loop over it to do some
 computation for each tuple. If the pattern includes no parenthesis, then
 `findall()` returns a list of found strings as in earlier examples. If the
-pattern includes a single set of parenthesis, then findall() returns a
+pattern includes a single set of parenthesis, then `findall()` returns a
 list of strings corresponding to that single group. (Obscure optional
 feature: Sometimes you have paren `( )` groupings in the pattern, but
 which you do not want to extract. In that case, write the parens with a
-?: at the start, e.g. `(?: )` and that left paren will not count as a
+`?:` at the start, e.g. `(?: )` and that left paren will not count as a
 group result.)
 
 RE Workflow and Debug
 ---------------------
 
 Regular expression patterns pack a lot of meaning into just a few
-characters , but they are so dense, you can spend a lot of time
+characters, but they are so dense, you can spend a lot of time
 debugging your patterns. Set up your runtime so you can run a pattern
 and print what it matches easily, for example by running it on a small
 test text and printing the result of `findall()`. If the pattern matches
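A small example of `findall()` with capturing groups and the non-capturing `(?: )` form (the email pattern here is just an illustration):

```python
import re

text = 'purple alice@google.com, blah monkey bob@abc.com blah'
# Two groups -> findall() returns a list of (username, host) tuples.
print(re.findall(r'([\w.-]+)@([\w.-]+)', text))
# (?: ) groups without capturing, so only the one real group is returned.
print(re.findall(r'(?:[\w.-]+)@([\w.-]+)', text))
```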
@@ -321,14 +317,14 @@ match. The option flag is added as an extra argument to the `search()` or
 
 - `IGNORECASE` -- ignore upper/lowercase differences for matching, so
   'a' matches both 'a' and 'A'.
-- `DOTALL` -- allow dot (.) to match newline -- normally it matches
-  anything but newline. This can trip you up -- you think .\* matches
+- `DOTALL` -- allow dot (`.`) to match newline -- normally it matches
+  anything but newline. This can trip you up -- you think `.*` matches
   everything, but by default it does not go past the end of a line.
-  Note that \\s (whitespace) includes newlines, so if you want to
+  Note that `\s` (whitespace) includes newlines, so if you want to
   match a run of whitespace that may include a newline, you can just
-  use \\s\*
-- `MULTILINE` -- Within a string made of many lines, allow \^ and \$ to
-  match the start and end of each line. Normally \^/\$ would just
+  use `\s*`
+- `MULTILINE` -- Within a string made of many lines, allow `^` and `$` to
+  match the start and end of each line. Normally `^/$` would just
+  match the start and end of the whole string.
 
 Greedy vs. Non-Greedy (optional)
@@ -407,5 +403,4 @@ Except as otherwise noted, the content of this page is licensed under
 the [Creative Commons Attribution 3.0
 License](http://creativecommons.org/licenses/by/3.0/), and code samples
 are licensed under the [Apache 2.0
-License](http://www.apache.org/licenses/LICENSE-2.0). For details, see
-our [Site Policies](https://developers.google.com/terms/site-policies).
+License](http://www.apache.org/licenses/LICENSE-2.0).

utilities.md

Lines changed: 52 additions & 72 deletions
@@ -10,7 +10,7 @@ File System -- os, os.path, shutil
 The *os* and *os.path* modules include many functions to interact
 with the file system. The *shutil* module can copy files.
 
-- [os module docs](https://docs.python.org/2.7/library/os.html)
+- [os module docs](https://docs.python.org/3/library/os.html)
 - `filenames = os.listdir(dir)` -- list of filenames in that directory
   path (not including . and ..). The filenames are just the names in
   the directory, not their absolute paths.
@@ -34,9 +34,9 @@ import os
 def printdir(dir):
   filenames = os.listdir(dir)
   for filename in filenames:
-    print filename
-    print os.path.join(dir, filename)
-    print os.path.abspath(os.path.join(dir, filename))
+    print(filename)
+    print(os.path.join(dir, filename))
+    print(os.path.abspath(os.path.join(dir, filename)))
 ```
 
 Exploring a module works well with the built-in python `help()` and `dir()`
@@ -57,7 +57,7 @@ Running External Processes -- subprocess
 The `subprocess` module is a simple way to run an external command and
 capture its output.
 
-- [subprocess module docs](https://docs.python.org/2.7/library/subprocess.html)
+- [subprocess module docs](https://docs.python.org/3/library/subprocess.html)
 - `subprocess.check_output(["cmd", "argument1"])` -- Run command with
   arguments and return its output as a byte string. If the return code was
   non-zero it raises a `CalledProcessError`. The `CalledProcessError` object will
@@ -76,7 +76,7 @@ import subprocess
 def listdir(dir):
   args = shlex.split('ls -l ' + dir)
   output = subprocess.check_output(args)
-  print output
+  print(output)
 ```
 
 Exceptions
@@ -89,112 +89,93 @@ run-time error might be that a variable used in the program does not
 have a value (`ValueError` .. you've probably seen that one a few times),
 or a file open operation error because a file does not exist (`IOError`).
 Learn more in [the exceptions
-tutorial](https://docs.python.org/2.7/tutorial/errors.html) and see [the entire
-exception list](https://docs.python.org/2.7/library/exceptions.html).
+tutorial](https://docs.python.org/3/tutorial/errors.html) and see [the entire
+exception list](https://docs.python.org/3/library/exceptions.html).
 
 Without any error handling code (as we have done thus far), a run-time
 exception just halts the program with an error message. That's a good
 default behavior, and you've seen it many times. You can add a
 "try/except" structure to your code to handle exceptions, like this:
 
 ```python
-import io
-
 filename = 'does_not_exist.txt'
 try:
-  f = io.open(filename)
+  with open(filename) as f:
+    for line in f:
+      print(line, end=" ")
 except IOError as e:
-  print e.strerror
-  print e.filename
-else:
-  for line in f:
-    print line,
-  f.close()
+  print(e.strerror)
+  print(e.filename)
 ```
+Or you could write it this way, if you prefer to keep the error handling
+near the call that raises the exception:
 
+```python
+try:
+  f = open(filename)
+except IOError:
+  print('error')
+else:
+  with f:
+    print(f.readlines())
+```
 The try: section includes the code which might throw an exception. The except:
 section holds the code to run if there is an exception. If there is no
 exception, the except: section is skipped (that is, that code is for error
 handling only, not the "normal" case for the code). The optional `else` section
 is useful for code that must be executed if the try clause does not raise an
 exception.
 
-HTTP -- urllib2 and urlparse
+HTTP -- requests
 ---------------------------
 
-The module *urllib2* provides url fetching -- making a url look like a
-file you can read from. The *urlparse* module can take apart and put
-together urls.
+The module *requests* provides url fetching -- making a url look like a
+file you can read from. The *requests-html* module makes parsing HTML
+as simple as possible.
 
-- [urllib2 module
-  docs](https://docs.python.org/2/library/urllib2.html)
-- `ufile = urllib2.urlopen(url)` -- returns a file like object for that
+- [requests module](http://docs.python-requests.org/en/master/#)
+  [docs](http://docs.python-requests.org/en/master/api/)
+- [requests-html module](https://html.python-requests.org/)
+  [docs](https://html.python-requests.org/#api-documentation)
+- `r = requests.get('https://api.github.com/events')` -- returns a response object for that
   url
-- `text = ufile.read()` -- can read from it, like a file (readlines()
-  etc. also work)
-- `info = ufile.info()` -- the meta info for that request.
-  `info.gettype()` is the mime type, e.g. 'text/html'
-- `baseurl = ufile.geturl()` -- gets the "base" url for the request,
-  which may be different from the original because of redirects
-- `urllib2.urlretrieve(url, filename)` -- downloads the url data to the
-  given file path
-- `urlparse.urljoin(baseurl, url)` -- given a url that may or may not be
-  full, and the baseurl of the page it comes from, return a full url.
-  Use `geturl()` above to provide the base url.
+- `text = r.text` -- get the contents of the page.
+- `content = r.content` -- get the **binary** contents of the response.
+- `json = r.json()` -- get the response parsed into a JSON object.
+- `r.status_code` -- the HTTP status code (200=all good).
+- `r.headers` -- the headers of the response.
 
 ```python
 ## Given a url, try to retrieve it.
 ## print its base url and its text.
 
-import urllib2
+import requests
 
 def wget(url):
-  f = urllib2.urlopen(url)
-  info = f.info()
-  print info.gettype()
-  print 'base url:' + f.geturl()
-  text = f.read()
-  print text
+  r = requests.get(url)
+  if r.status_code == requests.codes.ok:
+    print(r.headers['Content-Type'])
+    print(r.url)
+    text = r.text
+    print(text)
+  else:
+    print(f"Problem opening URL ({url}) gave {r.status_code}")
+
 
 wget('http://httpbin.org/ip')
+# Fails due to HTTP 418 (teapot)
+wget('http://httpbin.org/status/418')
 ```
 
-The above code works fine, but does not include error handling if a url
-does not work for some reason. Here's a version of the function which
-adds try/except logic to print an error message if the url operation
-fails.
-
-```python
-## Version that uses try/except to print an error message if the
-## urlopen() fails.
-
-def wget2(url):
-  try:
-    f = urllib2.urlopen(url)
-    info = f.info()
-    print info.gettype()
-    print 'base url:' + f.geturl()
-    text = f.read()
-    print text
-  except IOError as e:
-    print 'problem reading url:', url
-    print e.code
-    print e.read()
-
-# Success
-wget2('http://httpbin.org/ip')
 
-# Fails due to HTTP 418 (teapot)
-wget2('http://httpbin.org/status/418')
-```
 
 Exercise
 --------
 
 To practice the file system and external-commands material, see the
 [Copy Special
 Exercise](copy-special).
-To practice the urllib2 material, see the [Log Puzzle
+To practice the requests material, see the [Log Puzzle
 Exercise](log-puzzle).
 
 ----
@@ -203,5 +184,4 @@ Except as otherwise noted, the content of this page is licensed under
 the [Creative Commons Attribution 3.0
 License](http://creativecommons.org/licenses/by/3.0/), and code samples
 are licensed under the [Apache 2.0
-License](http://www.apache.org/licenses/LICENSE-2.0). For details, see
-our [Site Policies](https://developers.google.com/terms/site-policies).
+License](http://www.apache.org/licenses/LICENSE-2.0).
