
Fix 403 when downloading data for mnist tutorial #66


Merged: 6 commits into numpy:main, Mar 15, 2021

Conversation

@rossbar (Collaborator) commented Mar 9, 2021

Uses header spoofing to circumvent problems with downloading the MNIST digit data. Also adds data caching to the CircleCI builds so that, in principle, the MNIST data will be cached between CI runs.

Closes #63
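For context, a minimal sketch of what header spoofing plus a local cache can look like with `requests`. The URL, filename, and User-Agent string here are illustrative, not necessarily what this PR uses:

```python
import os
import requests

data_dir = "_data"
base_url = "http://yann.lecun.com/exdb/mnist/"  # original MNIST host; illustrative
fname = "train-images-idx3-ubyte.gz"            # illustrative filename

# Some servers return 403 Forbidden for clients without a browser-like
# User-Agent header; supplying one works around this.
headers = {"User-Agent": "Mozilla/5.0"}

os.makedirs(data_dir, exist_ok=True)
fpath = os.path.join(data_dir, fname)
if not os.path.exists(fpath):  # the cache: skip the download if the file exists
    print(f"Downloading {fname}...")
    response = requests.get(base_url + fname, headers=headers, stream=True)
    with open(fpath, "wb") as fh:
        for chunk in response.iter_content(chunk_size=8192):
            fh.write(chunk)
```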

@rossbar changed the title from "WIP: Fix 403 when downloading data for mnist tutorial" to "Fix 403 when downloading data for mnist tutorial" on Mar 10, 2021
@rossbar (Collaborator, Author) commented Mar 12, 2021

The caching of the _data dir between CI runs seems to have worked: the MNIST data was not re-downloaded during the build step for #67.

@mattip (Member) commented Mar 12, 2021

LGTM, and will improve CI performance. Just one nit about educating people to spoof headers responsibly.

@melissawm (Member) commented

Unfortunately, this is not working for me: I get files that apparently are not in gzip format. Maybe this is why the CI is still failing too?

@rossbar (Collaborator, Author) commented Mar 12, 2021

> Maybe this is why the CI is still failing too?

It looks like the CI failure was just related to the execution timeout again (not related to downloading). The build artifact suggests the cache is working properly: there would be print statements like "Downloading xyz" in the code cell output if the data were being re-downloaded.

The gzip failure is indeed strange: the cached files are likely still compressed, otherwise execution would have been failing on the decompression step too. The docs for iter_content do say that compressed content will automatically be decompressed, but I'm not sure that behavior is always consistent; otherwise I don't understand how the cache was originally built with still-compressed files.

Some possible solutions that come to mind are:

  1. Use response.raw instead of response.iter_content, which shouldn't auto-decompress the content (see the sketch just below).
  2. Wrap the local data loading in a try/except for gzip.
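A sketch of option 1, reusing base_url, fname, fpath, and headers from the earlier sketch. response.raw exposes the undecoded socket bytes, so gzip files land on disk still compressed:

```python
import shutil
import requests

# stream=True leaves the body unread so response.raw can be consumed directly;
# copyfileobj writes the bytes to disk without any content decoding.
response = requests.get(base_url + fname, headers=headers, stream=True)
with open(fpath, "wb") as fh:
    shutil.copyfileobj(response.raw, fh)
```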

I'm partial to the second option - it adds a minor amount of boilerplate but should work regardless of whether the data was decompressed when it was downloaded (also sketched below).
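A sketch of option 2 with an illustrative path. gzip.BadGzipFile (Python 3.8+) subclasses OSError, so catching OSError covers older Pythons too:

```python
import gzip

fpath = "_data/train-images-idx3-ubyte.gz"  # illustrative path

try:
    # Normal case: the cached file is still gzip-compressed.
    with gzip.open(fpath, "rb") as fh:
        raw = fh.read()
except OSError:
    # Fallback: the file was already decompressed at download time.
    with open(fpath, "rb") as fh:
        raw = fh.read()
```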

@rossbar (Collaborator, Author) commented Mar 13, 2021

Ah - another potential problem is that the download is simply failing :). Testing locally, I'm now getting 503 errors (instead of the 403 Forbidden from before). There currently isn't a check of the response status, so that should be updated as well.
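The status check could be as simple as requests' built-in raise_for_status(), again reusing the names from the first sketch:

```python
import requests

response = requests.get(base_url + fname, headers=headers, stream=True)
# Raise requests.HTTPError for 4xx/5xx responses (e.g. 403, 503) instead of
# silently writing an error page into the cache.
response.raise_for_status()
```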

Ultimately I think this PR puts the necessary infrastructure in place to cache the data, but we may also need to change the data source, as the original website does not seem very reliable.

@melissawm (Member) commented

Sounds reasonable - thanks, @rossbar!

@melissawm merged commit 0e5b8d4 into numpy:main on Mar 15, 2021
@8bitmp3 (Contributor) commented Mar 15, 2021

Thank you @rossbar

Note / to-do from the meeting: add a warning sign.

cc @melissawm

Development

Successfully merging this pull request may close these issues:

- MNIST dataset can't be downloaded automatically