
Commit 89d7f99

all text updated, problems need checking
1 parent 65d4dd4 commit 89d7f99

File tree

4 files changed, +82 -109 lines changed


copy-special.md

Lines changed: 9 additions & 10 deletions
@@ -13,18 +13,18 @@ your code in copyspecial.py.
 
 The copyspecial.py program takes one or more directories as its
 arguments. We'll say that a "special" file is one where the name
-contains the pattern \_\_w\_\_ somewhere, where the w is one or more
-word chars. The provided main() includes code to parse the command line
+contains the pattern `__w__` somewhere, where the w is one or more
+word chars. The provided `main()` includes code to parse the command line
 arguments, but the rest is up to you. Write functions to implement the
-features below and modify main() to call your functions.
+features below and modify `main()` to call your functions.
 
 Suggested functions for your solution (details below):
 
-- get\_special\_paths(dir) -- returns a list of the absolute paths of
+- `get_special_paths(dir)` -- returns a list of the absolute paths of
   the special files in the given directory
-- copy\_to(paths, dir) given a list of paths, copies those files into
+- `copy_to(paths, dir)` given a list of paths, copies those files into
   the given directory
-- zip\_to(paths, zippath) given a list of paths, zip those files up
+- `zip_to(paths, zippath)` given a list of paths, zip those files up
   into the given zipfile
 
 Part A (manipulating file paths)
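As a concrete sketch of the first suggested function (one possible implementation, not the official solution -- the `r'__\w+__'` pattern is an assumption drawn from the `__w__` description above):

```python
import os
import re

def get_special_paths(dir):
  # Return absolute paths of files whose name contains __w__,
  # where w is one or more word chars (assumed pattern: r'__\w+__').
  result = []
  for filename in os.listdir(dir):
    if re.search(r'__\w+__', filename):
      result.append(os.path.abspath(os.path.join(dir, filename)))
  return result
```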
@@ -45,7 +45,7 @@ We'll assume that names are not repeated across the directories
 Part B (file copying)
 ---------------------
 
-If the "--todir dir" option is present at the start of the command line,
+If the "`--todir dir`" option is present at the start of the command line,
 do not print anything and instead copy the files to the given directory,
 creating it if necessary. Use the python module "shutil" for file
 copying.
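A minimal sketch of `copy_to()` using `shutil`, as the text suggests (illustrative only):

```python
import os
import shutil

def copy_to(paths, dir):
  # Copy each file in paths into dir, creating dir first if necessary.
  if not os.path.exists(dir):
    os.makedirs(dir)
  for path in paths:
    shutil.copy(path, dir)
```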
@@ -57,7 +57,7 @@ copying.
 Part C (writing zip files)
 ------------------------------------
 
-If the "--tozip zipfile" option is present at the start of the command
+If the "`--tozip zipfile`" option is present at the start of the command
 line create a zip file using the `zipfile` package.
 
     $ ./copyspecial.py --tozip tmp.zip .
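One way `zip_to()` might use the `zipfile` package (a sketch, not the reference solution; storing each file under its basename is an assumption):

```python
import os
import zipfile

def zip_to(paths, zippath):
  # Zip the given files; store each under its basename so the archive
  # does not reproduce the absolute directory structure.
  with zipfile.ZipFile(zippath, 'w') as zf:
    for path in paths:
      zf.write(path, arcname=os.path.basename(path))
```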
@@ -68,5 +68,4 @@ Except as otherwise noted, the content of this page is licensed under
 the [Creative Commons Attribution 3.0
 License](http://creativecommons.org/licenses/by/3.0/), and code samples
 are licensed under the [Apache 2.0
-License](http://www.apache.org/licenses/LICENSE-2.0). For details, see
-our [Site Policies](https://developers.google.com/terms/site-policies).
+License](http://www.apache.org/licenses/LICENSE-2.0).

log-puzzle.md

Lines changed: 10 additions & 11 deletions
@@ -21,10 +21,10 @@ The slice urls are hidden inside apache log files (the open source
 [apache](http://httpd.apache.org/) web server is the most widely used
 server on the internet). Each log file is from some server, and the
 desired slice urls are hidden within the logs. The log file encodes what
-server it comes from like this: the log file animal\_code.google.com is
+server it comes from like this: the log file `animal_code.google.com` is
 from the code.google.com server (formally, we'll say that the server
 name is whatever follows the first underbar). The
-animial\_code.google.com log file contains the data for the "animal"
+`animal_code.google.com` log file contains the data for the "animal"
 puzzle image. Although the data in the log files has the syntax of a
 real apache web server, the data beyond what's needed for the puzzle is
 randomized data from a real log file.
@@ -38,20 +38,20 @@ what apache log files look like):
 The first few numbers are the address of the requesting browser. The
 most interesting part is the "GET *path* HTTP" showing the path of a web
 request received by the server. The path itself never contains spaces,
-and is separated from the GET and HTTP by spaces (regex suggestion: \\S
+and is separated from the GET and HTTP by spaces (regex suggestion: `\S`
 (upper case S) matches any non-space char). Find the lines in the log
 where the string "puzzle" appears inside the path, ignoring the many
 other lines in the log.
 
 Part A - Log File To Urls
 -------------------------
 
-Complete the read\_urls(filename) function that extracts the puzzle urls
+Complete the `read_urls(filename)` function that extracts the puzzle urls
 from inside a logfile. Find all the "puzzle" path urls in the logfile.
 Combine the path from each url with the server name from the filename to
 form a full url, e.g.
 "http://www.example.com/path/puzzle/from/inside/file". Screen out urls
-that appear more than once. The read\_urls() function should return the
+that appear more than once. The `read_urls()` function should return the
 list of full urls, sorted into alphabetical order and without
 duplicates. Taking the urls in alphabetical order will yield the image
 slices in the correct left-to-right order to re-create the original
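One plausible shape for `read_urls()` (an illustration only; the exact regex and the use of `os.path.basename` to isolate the log name are this sketch's assumptions, not part of the exercise text):

```python
import os
import re

def read_urls(filename):
  # The server name is whatever follows the first underbar in the
  # log's filename, e.g. animal_code.google.com -> code.google.com.
  fname = os.path.basename(filename)
  host = fname[fname.index('_') + 1:]
  with open(filename) as f:
    log = f.read()
  # \S matches any non-space char, so this grabs the whole GET path.
  paths = re.findall(r'GET (\S+) HTTP', log)
  # Keep only "puzzle" paths; a set screens out duplicates.
  urls = set('http://' + host + path for path in paths if 'puzzle' in path)
  return sorted(urls)
```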
@@ -66,7 +66,7 @@ one per line.
 Part B - Download Images Puzzle
 -------------------------------
 
-Complete the download\_images() function which takes a sorted list of
+Complete the `download_images()` function which takes a sorted list of
 urls and a directory. Download the image from each url into the given
 directory, creating the directory first if necessary (see the "os"
 module to create a directory, and "urllib.urlretrieve()" for downloading
@@ -78,8 +78,8 @@ working. Each image is a little vertical slice from the original. How to
 put the slices together to re-create the original? It can be solved
 nicely with a little html (knowledge of HTML is not required).
 
-The download\_images() function should also create an index.html file in
-the directory with an \*img\* tag to show each local image file. The img
+The `download_images()` function should also create an index.html file in
+the directory with an *img* tag to show each local image file. The img
 tags should all be on one line together without separation. In this way,
 the browser displays all the slices together seamlessly. You do not need
 knowledge of HTML to do this; just create an index.html file that looks
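A sketch of the index-writing step (the helper name `write_index` is hypothetical; the exercise folds this into `download_images()`):

```python
import os

def write_index(img_names, dest_dir):
  # Write index.html with every <img> tag on one line, no separation,
  # so the browser displays the slices seamlessly side by side.
  with open(os.path.join(dest_dir, 'index.html'), 'w') as f:
    f.write('<html><body>\n')
    f.write(''.join('<img src="%s">' % name for name in img_names))
    f.write('\n</body></html>\n')
```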
@@ -116,7 +116,7 @@ sorting a list of urls each ending with the word-word.jpg pattern should
 order the urls by the second word.
 
 Extend your code to order such urls properly, and then you should be
-able to decode the second place\_code.google.com puzzle which shows a
+able to decode the second `place_code.google.com` puzzle which shows a
 famous place. What place does it show?
 
 CC Attribution: the images used in this puzzle were made available by
@@ -132,5 +132,4 @@ Except as otherwise noted, the content of this page is licensed under
 the [Creative Commons Attribution 3.0
 License](http://creativecommons.org/licenses/by/3.0/), and code samples
 are licensed under the [Apache 2.0
-License](http://www.apache.org/licenses/LICENSE-2.0). For details, see
-our [Site Policies](https://developers.google.com/terms/site-policies).
+License](http://www.apache.org/licenses/LICENSE-2.0).

regular-expressions.md

Lines changed: 11 additions & 16 deletions
@@ -126,10 +126,7 @@ in the pattern
 
 First the search finds the leftmost match for the pattern, and second it
 tries to use up as much of the string as possible -- i.e. `+` and `*` go as
-far as possible (the `+` and `*` are said to be "greedy"). If you ever need them
-to be less greedy you can use `+?` and `*?` instead and the pattern will only
-match the minimum pattern. For example with a string `<a>b<c>`, `<.*>` will
-match the whole string but `<.*?>` will only match `<a>`.
+far as possible (the `+` and `*` are said to be "greedy").
 
 Repetition Examples
 -------------------
@@ -287,22 +284,21 @@ for tuple in tuples:
   print(tuple[0])
   print(tuple[1])
 ```
-##TODO - here
 Once you have the list of tuples, you can loop over it to do some
 computation for each tuple. If the pattern includes no parenthesis, then
 `findall()` returns a list of found strings as in earlier examples. If the
-pattern includes a single set of parenthesis, then findall() returns a
+pattern includes a single set of parenthesis, then `findall()` returns a
 list of strings corresponding to that single group. (Obscure optional
 feature: Sometimes you have paren `( )` groupings in the pattern, but
 which you do not want to extract. In that case, write the parens with a
-?: at the start, e.g. `(?: )` and that left paren will not count as a
+`?:` at the start, e.g. `(?: )` and that left paren will not count as a
 group result.)
 
 RE Workflow and Debug
 ---------------------
 
 Regular expression patterns pack a lot of meaning into just a few
-characters , but they are so dense, you can spend a lot of time
+characters, but they are so dense, you can spend a lot of time
 debugging your patterns. Set up your runtime so you can run a pattern
 and print what it matches easily, for example by running it on a small
 test text and printing the result of `findall()`. If the pattern matches
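A small example of `findall()` with capturing groups and the non-capturing `(?: )` form (the email pattern here is just an illustration):

```python
import re

text = 'purple alice@google.com, blah monkey bob@abc.com blah'
# Two groups -> findall() returns a list of (username, host) tuples.
print(re.findall(r'([\w.-]+)@([\w.-]+)', text))
# (?: ) groups without capturing, so only the one real group is returned.
print(re.findall(r'(?:[\w.-]+)@([\w.-]+)', text))
```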
@@ -321,14 +317,14 @@ match. The option flag is added as an extra argument to the `search()` or
 
 - `IGNORECASE` -- ignore upper/lowercase differences for matching, so
   'a' matches both 'a' and 'A'.
-- `DOTALL` -- allow dot (.) to match newline -- normally it matches
-  anything but newline. This can trip you up -- you think .\* matches
+- `DOTALL` -- allow dot (`.`) to match newline -- normally it matches
+  anything but newline. This can trip you up -- you think `.*` matches
   everything, but by default it does not go past the end of a line.
-  Note that \\s (whitespace) includes newlines, so if you want to
+  Note that `\s` (whitespace) includes newlines, so if you want to
   match a run of whitespace that may include a newline, you can just
-  use \\s\*
-- `MULTILINE` -- Within a string made of many lines, allow \^ and \$ to
-  match the start and end of each line. Normally \^/\$ would just
+  use `\s*`
+- `MULTILINE` -- Within a string made of many lines, allow `^` and `$` to
+  match the start and end of each line. Normally `^/$` would just
+  match the start and end of the whole string.
 
 Greedy vs. Non-Greedy (optional)
@@ -407,5 +403,4 @@ Except as otherwise noted, the content of this page is licensed under
 the [Creative Commons Attribution 3.0
 License](http://creativecommons.org/licenses/by/3.0/), and code samples
 are licensed under the [Apache 2.0
-License](http://www.apache.org/licenses/LICENSE-2.0). For details, see
-our [Site Policies](https://developers.google.com/terms/site-policies).
+License](http://www.apache.org/licenses/LICENSE-2.0).

utilities.md

Lines changed: 52 additions & 72 deletions
@@ -10,7 +10,7 @@ File System -- os, os.path, shutil
 The *os* and *os.path* modules include many functions to interact
 with the file system. The *shutil* module can copy files.
 
-- [os module docs](https://docs.python.org/2.7/library/os.html)
+- [os module docs](https://docs.python.org/3/library/os.html)
 - `filenames = os.listdir(dir)` -- list of filenames in that directory
   path (not including . and ..). The filenames are just the names in
   the directory, not their absolute paths.
@@ -34,9 +34,9 @@ import os
 def printdir(dir):
   filenames = os.listdir(dir)
   for filename in filenames:
-    print filename
-    print os.path.join(dir, filename)
-    print os.path.abspath(os.path.join(dir, filename))
+    print(filename)
+    print(os.path.join(dir, filename))
+    print(os.path.abspath(os.path.join(dir, filename)))
 ```
 
 Exploring a module works well with the built-in python `help()` and `dir()`
@@ -57,7 +57,7 @@ Running External Processes -- subprocess
 The `subprocess` module is a simple way to run an external command and
 capture its output.
 
-- [subprocess module docs](https://docs.python.org/2.7/library/subprocess.html)
+- [subprocess module docs](https://docs.python.org/3/library/subprocess.html)
 - `subprocess.check_output(["cmd", "argument1"])` -- Run command with
   arguments and return its output as a byte string. If the return code was
   non-zero it raises a `CalledProcessError`. The `CalledProcessError` object will
@@ -76,7 +76,7 @@ import subprocess
 def listdir(dir):
   args = shlex.split('ls -l ' + dir)
   output = subprocess.check_output(args)
-  print output
+  print(output)
 ```
 
 Exceptions
@@ -89,112 +89,93 @@ run-time error might be that a variable used in the program does not
 have a value (`ValueError` .. you've probably seen that one a few times),
 or a file open operation error because a file does not exist (`IOError`).
 Learn more in [the exceptions
-tutorial](https://docs.python.org/2.7/tutorial/errors.html) and see [the entire
-exception list](https://docs.python.org/2.7/library/exceptions.html).
+tutorial](https://docs.python.org/3/tutorial/errors.html) and see [the entire
+exception list](https://docs.python.org/3/library/exceptions.html).
 
 Without any error handling code (as we have done thus far), a run-time
 exception just halts the program with an error message. That's a good
 default behavior, and you've seen it many times. You can add a
 "try/except" structure to your code to handle exceptions, like this:
 
 ```python
-import io
-
 filename = 'does_not_exist.txt'
 try:
-  f = io.open(filename)
+  with open(filename) as f:
+    for line in f:
+      print(line, end=" ")
 except IOError as e:
-  print e.strerror
-  print e.filename
-else:
-  for line in f:
-    print line,
-  f.close()
+  print(e.strerror)
+  print(e.filename)
 ```
+Or you could write it this way, if you prefer to keep the error handling
+near the call that raises the exception:
 
+```python
+try:
+  f = open(filename)
+except IOError:
+  print('error')
+else:
+  with f:
+    print(f.readlines())
+```
 The try: section includes the code which might throw an exception. The except:
 section holds the code to run if there is an exception. If there is no
 exception, the except: section is skipped (that is, that code is for error
 handling only, not the "normal" case for the code). The optional `else` section
 is useful for code that must be executed if the try clause does not raise an
 exception.
 
-HTTP -- urllib2 and urlparse
+HTTP -- requests
 ---------------------------
 
-The module *urllib2* provides url fetching -- making a url look like a
-file you can read from. The *urlparse* module can take apart and put
-together urls.
+The module *requests* provides url fetching -- making a url look like a
+file you can read from. The *requests-html* module makes parsing HTML
+as simple as possible.
 
-- [urllib2 module
-  docs](https://docs.python.org/2/library/urllib2.html)
-- `ufile = urllib2.urlopen(url)` -- returns a file like object for that
+- [requests module](http://docs.python-requests.org/en/master/#)
+  [docs](http://docs.python-requests.org/en/master/api/)
+- [requests-html module](https://html.python-requests.org/)
+  [docs](https://html.python-requests.org/#api-documentation)
+- `r = requests.get('https://api.github.com/events')` -- returns a response object for that
   url
-- `text = ufile.read()` -- can read from it, like a file (readlines()
-  etc. also work)
-- `info = ufile.info()` -- the meta info for that request.
-  `info.gettype()` is the mime type, e.g. 'text/html'
-- `baseurl = ufile.geturl()` -- gets the "base" url for the request,
-  which may be different from the original because of redirects
-- `urllib2.urlretrieve(url, filename)` -- downloads the url data to the
-  given file path
-- `urlparse.urljoin(baseurl, url)` -- given a url that may or may not be
-  full, and the baseurl of the page it comes from, return a full url.
-  Use `geturl()` above to provide the base url.
+- `text = r.text` -- get the contents of the page.
+- `content = r.content` -- get the **binary** contents of the response.
+- `json = r.json()` -- get the response parsed into a JSON object.
+- `r.status_code` -- the HTTP status code (200=all good).
+- `r.headers` -- the headers of the response.
 
 ```python
 ## Given a url, try to retrieve it.
 ## print its base url and its text.
 
-import urllib2
+import requests
 
 def wget(url):
-  f = urllib2.urlopen(url)
-  info = f.info()
-  print info.gettype()
-  print 'base url:' + f.geturl()
-  text = f.read()
-  print text
+  r = requests.get(url)
+  if r.status_code == requests.codes.ok:
+    print(r.headers['Content-Type'])
+    print(r.url)
+    text = r.text
+    print(text)
+  else:
+    print(f"Problem opening URL ({url}) gave {r.status_code}")
+
 
 wget('http://httpbin.org/ip')
+# Fails due to HTTP 418 (teapot)
+wget('http://httpbin.org/status/418')
 ```
 
-The above code works fine, but does not include error handling if a url
-does not work for some reason. Here's a version of the function which
-adds try/except logic to print an error message if the url operation
-fails.
-
-```python
-## Version that uses try/except to print an error message if the
-## urlopen() fails.
-
-def wget2(url):
-  try:
-    f = urllib2.urlopen(url)
-    info = f.info()
-    print info.gettype()
-    print 'base url:' + f.geturl()
-    text = f.read()
-    print text
-  except IOError as e:
-    print 'problem reading url:', url
-    print e.code
-    print e.read()
-
-# Success
-wget2('http://httpbin.org/ip')
 
-# Fails due to HTTP 418 (teapot)
-wget2('http://httpbin.org/status/418')
-```
 
 Exercise
 --------
 
 To practice the file system and external-commands material, see the
 [Copy Special
 Exercise](copy-special).
-To practice the urllib2 material, see the [Log Puzzle
+To practice the requests material, see the [Log Puzzle
 Exercise](log-puzzle).
 
 ----
@@ -203,5 +184,4 @@ Except as otherwise noted, the content of this page is licensed under
 the [Creative Commons Attribution 3.0
 License](http://creativecommons.org/licenses/by/3.0/), and code samples
 are licensed under the [Apache 2.0
-License](http://www.apache.org/licenses/LICENSE-2.0). For details, see
-our [Site Policies](https://developers.google.com/terms/site-policies).
+License](http://www.apache.org/licenses/LICENSE-2.0).
