Commit 9358880

Merge pull request #17 from cal-itp/feat-14-version1
Feat 14 version1
2 parents 23436b2 + 8a317d6 commit 9358880

File tree

12 files changed: +180 additions, -126 deletions

.github/workflows/main.yml

Lines changed: 48 additions & 0 deletions
```yaml
name: CI

on:
  push:
  release:
    types: [ published ]

jobs:
  checks:
    name: "Run Tests"
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.9'
      - name: Set up Pre-commit
        uses: pre-commit/[email protected]

  release:
    name: "Release to PyPI"
    runs-on: ubuntu-latest
    needs: checks
    if: "github.event_name == 'release' && startsWith(github.event.release.tag_name, 'v')"
    steps:
      - uses: actions/checkout@v2
      - name: "Set up Python"
        uses: actions/setup-python@v2
        with:
          python-version: '3.9'
      - name: "Build package"
        run: |
          python setup.py build sdist
      - name: "TEST Upload to PyPI"
        uses: pypa/gh-action-pypi-publish@release/v1
        if: github.event.release.prerelease
        with:
          user: __token__
          password: ${{ secrets.PYPI_TEST_API_TOKEN }}
          repository_url: https://test.pypi.org/legacy/

      - name: "Upload to PyPI"
        uses: pypa/gh-action-pypi-publish@release/v1
        if: "!github.event.release.prerelease"
        with:
          user: __token__
          password: ${{ secrets.PYPI_API_TOKEN }}
```

.pre-commit-config.yaml

Lines changed: 1 addition & 0 deletions
```diff
@@ -5,6 +5,7 @@ repos:
     - id: flake8
       types:
         - python
+      args: ["--max-line-length=88"]
     - id: trailing-whitespace
     - id: end-of-file-fixer
     - id: check-yaml
```

README.md

Lines changed: 33 additions & 30 deletions
````diff
@@ -1,51 +1,54 @@
-# Feed Checker
+# GTFS Aggregator Checker
 
 This repo is to verify that a given list of feeds is listed in feed aggregators.
 Currently it checks transit.land and transitfeeds.com to verify that feeds are
 listed in an aggregator.
 
+## Installation
 
-## Requirements
+```
+pip install gtfs-aggregator-checker
+```
 
-* `.env` - Acquire an [api key from transitland][1] and save it to a `.env` file
-  like `TRANSITLAND_API_KEY=SECRET`. Alternatively you can prefix commands with
-  the api key like `TRANSITLAND_API_KEY=SECRET python feed_checker.py [...]`.
+## Configure
 
-* `agencies.yml` - This file can have any structure as the feed checker just
-  looks for any urls (strings starting with `'http://'`), but the intended usage
-  is a [Cal-ITP agencies.yml file][2]. (to run the program without an
-  `agencies.yml` file, see the "Options" section below)
+The following env variables can be set in a `.env` file, set to the environment,
+or inline like `TRANSITLAND_API_KEY=SECRET python -m gtfs_aggregator_checker`.
 
-## Getting Started
+* `TRANSITLAND_API_KEY` An [api key from transitland][1].
 
-To install requirments and check urls run the following. The first time you run
-this it will take a while since the cache is empty.
+* `GTFS_CACHE_DIR` Folder to save cached files to. Defaults to
+  `~/.cache/gtfs-aggregator-checker`
 
-``` bash
-pip install -r requirements.txt
-python feed_checker.py
-```
+## Getting Started
 
-The final line of stdout will tell how many urls were in `agencies.yml` and how
-many of those were matched in a feed. Above that it will list the domains for
-each url (in alphabetical order) as well group paths based on if the path was
-matched (in both `agencies.yml` and aggregator), missing (in `agencies.yml` but
-not aggregator) or unused (in aggregator but not in `agencies.yml`). An ideal
-outcome would mean the missing column is empty for all domains.
+## CLI Usage
 
+`python -m gtfs_aggregator_checker [YAML_FILE] [OPTIONS]`
 
-## CLI Usage
+`python -m gtfs_aggregator_checker` or `python -m gtfs_aggregator_checker
+/path/to/yml` will search a [Cal-ITP agencies.yml file][2] for any urls and see
+if they are present in any of the feed aggregators. Alternatively you can use a
+`--csv-file` or `--url` instead of an `agencies.yml` file.
 
-`python feed_checker.py` or `python feed_checker.py /path/to/yml` will search a
-[Cal-ITP agencies.yml file][2] for any urls and see if they are present in any
-of the feed aggregators.
+The final line of stdout will tell how many urls were in `agencies.yml` and how
+many of those were matched in a feed.
 
 ### Options
-* `python feed_checker.py --help` print the help
-* `--csv-file agencies.csv` load a csv instead of a Cal-ITP agencies yaml file (one url per line)
-* `--url http://example.com` Check a single url instead of a Cal-ITP agencies yaml file
-* `--verbose` Print a table of all results (organized by domain)
+* `python -m gtfs_aggregator_checker --help` print the help
+* `--csv-file agencies.csv` load a csv instead of a Cal-ITP agencies yaml file
+  (one url per line)
+* `--url http://example.com` Check a single url instead of a Cal-ITP agencies
+  yaml file
 * `--output /path/to/file.json` Save the results as a json file
 
 [1]: https://www.transit.land/documentation/index#signing-up-for-an-api-key
 [2]: https://github.com/cal-itp/data-infra/blob/main/airflow/data/agencies.yml
+
+## Development
+
+Clone this repo and `pip install -e /path/to/feed-checker` to develop locally.
+
+By default, downloaded files (raw html files, api requests) will be saved to
+`~/.cache/calitp_gtfs_aggregator_checker`. This greatly reduces the time
+required to run the script. Delete this folder to reset the cache.
````

cache.py

Lines changed: 0 additions & 78 deletions
This file was deleted.

feed_checker.py renamed to gtfs_aggregator_checker/__init__.py

Lines changed: 5 additions & 15 deletions
```diff
@@ -1,15 +1,15 @@
 from collections import OrderedDict
 import json
-import typer
 import urllib.error
 import urllib.parse
 import urllib.request
 import yaml
 
-from transitland import get_transitland_urls
-from transitfeeds import get_transitfeeds_urls
+from .transitland import get_transitland_urls
+from .transitfeeds import get_transitfeeds_urls
 
 
+__version__ = "1.0.0"
 SECRET_PARAMS = ["api_key", "token", "apiKey", "key"]
 
 
@@ -26,13 +26,7 @@ def clean_url(url):
     return urllib.parse.urlunparse(url)
 
 
-def main(
-    yml_file=typer.Argument("agencies.yml", help="A yml file containing urls"),
-    csv_file=typer.Option(None, help="A csv file (one url per line)"),
-    url=typer.Option(None, help="URL to check instead of a file",),
-    output=typer.Option(None, help="Path to a file to save output to."),
-    verbose: bool = typer.Option(False, help="Print a result table to stdout"),
-):
+def check_feeds(yml_file=None, csv_file=None, url=None, output=None):
     results = {}
 
     if url:
@@ -96,7 +90,7 @@ def main(
         if "present" not in statuses:
             missing.append(url)
 
-    if missing and verbose:
+    if missing:
         print(f"Unable to find {len(missing)}/{len(results)} urls:")
         for url in missing:
             print(url)
@@ -108,7 +102,3 @@ def main(
         with open(output, "w") as f:
             f.write(json.dumps(results, indent=4))
         print(f"Results saved to {output}")
-
-
-if __name__ == "__main__":
-    typer.run(main)
```
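The diff above only shows `SECRET_PARAMS` and the closing `urllib.parse.urlunparse(url)` line of `clean_url`, so the body in between is not visible here. A minimal sketch of what stripping those secret query parameters could look like, using only the standard library (the filtering logic is an assumption, not the commit's actual implementation):

```python
import urllib.parse

SECRET_PARAMS = ["api_key", "token", "apiKey", "key"]


def clean_url(url):
    # Hypothetical body: drop any query parameter whose name is a
    # known secret, then rebuild the url with urlunparse as the
    # diff's visible return statement does.
    parts = urllib.parse.urlparse(url)
    query = [
        (k, v)
        for k, v in urllib.parse.parse_qsl(parts.query)
        if k not in SECRET_PARAMS
    ]
    parts = parts._replace(query=urllib.parse.urlencode(query))
    return urllib.parse.urlunparse(parts)


print(clean_url("https://example.com/gtfs?api_key=SECRET&route=1"))
# https://example.com/gtfs?route=1
```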

gtfs_aggregator_checker/__main__.py

Lines changed: 15 additions & 0 deletions
```python
import typer

from . import check_feeds


def main(
    yml_file=typer.Argument("agencies.yml", help="A yml file containing urls"),
    csv_file=typer.Option(None, help="A csv file (one url per line)"),
    url=typer.Option(None, help="URL to check instead of a file",),
    output=typer.Option(None, help="Path to a file to save output to."),
):
    check_feeds(yml_file=yml_file, csv_file=csv_file, url=url, output=output)


typer.run(main)
```

gtfs_aggregator_checker/cache.py

Lines changed: 44 additions & 0 deletions
```python
import os
from pathlib import Path
import urllib.error
import urllib.request

from .utils import url_split


def get_cache_dir():
    if "GTFS_CACHE_DIR" in os.environ:
        path = Path(os.environ["GTFS_CACHE_DIR"])
    else:
        path = Path.home() / ".cache/gtfs-aggregator-checker"
    path.mkdir(exist_ok=True, parents=True)
    return path


def get_cached(key, func, directory=None):
    if not directory:
        directory = get_cache_dir()
    path = directory / key
    if not path.exists():
        content = func()
        with open(path, "w") as f:
            f.write(content)
    with open(path, "r") as f:
        return f.read()


def curl_cached(url, key=None):
    domain, path = url_split(url)
    if key is None:
        key = path.replace("/", "__")
    if len(key) > 255:
        key = key[:255]  # max filename length is 255

    def get():
        req = urllib.request.Request(url)
        r = urllib.request.urlopen(req)
        return r.read().decode()

    path = get_cache_dir() / domain
    path.mkdir(exist_ok=True, parents=True)
    return get_cached(key, get, directory=path)
```
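The caching contract in `get_cached` is the heart of this module: the wrapped function runs only on a cache miss, and every later call reads the file back. A standalone run (with the helper copied out and a temp directory standing in for `get_cache_dir`) shows the fetch firing exactly once:

```python
from pathlib import Path
import tempfile


def get_cached(key, func, directory):
    # Same logic as gtfs_aggregator_checker/cache.py above,
    # with the directory passed explicitly.
    path = directory / key
    if not path.exists():
        content = func()
        with open(path, "w") as f:
            f.write(content)
    with open(path, "r") as f:
        return f.read()


calls = []


def fetch():
    calls.append(1)  # count how many times the "network" is hit
    return "feed data"


tmp = Path(tempfile.mkdtemp())
first = get_cached("example.com__gtfs", fetch, tmp)
second = get_cached("example.com__gtfs", fetch, tmp)
print(first, len(calls))  # feed data 1
```

This is why the README can say deleting the cache folder resets everything: the filesystem is the only state.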
File renamed without changes.

transitfeeds.py renamed to gtfs_aggregator_checker/transitfeeds.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,7 +1,7 @@
 from bs4 import BeautifulSoup
 from urllib.error import HTTPError
 
-from cache import curl_cached
+from .cache import curl_cached
 
 LOCATION = "67-california-usa"
 ROOT = "https://transitfeeds.com"
```
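The module pins a site root and a California location slug; a page URL for that listing would plausibly be built like this (the exact path scheme is an assumption, since the scraping code is not shown in this hunk):

```python
# Constants from gtfs_aggregator_checker/transitfeeds.py; the /l/
# location-page path is a guess at how the scraper combines them.
LOCATION = "67-california-usa"
ROOT = "https://transitfeeds.com"

location_url = f"{ROOT}/l/{LOCATION}"
print(location_url)  # https://transitfeeds.com/l/67-california-usa
```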

transitland.py renamed to gtfs_aggregator_checker/transitland.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -1,7 +1,7 @@
 import json
 
-from config import env
-from cache import curl_cached
+from .config import env
+from .cache import curl_cached
 
 API_KEY = env["TRANSITLAND_API_KEY"]
 BASE_URL = f"https://transit.land/api/v2/rest/feeds?apikey={API_KEY}"
```
File renamed without changes.

setup.py

Lines changed: 31 additions & 0 deletions
```python
#!/usr/bin/env python

import re
from setuptools import setup, find_namespace_packages

_version_re = re.compile(r"__version__\s+=\s+(.*)")

with open("gtfs_aggregator_checker/__init__.py", "r") as f:
    version = _version_re.search(f.read()).group(1).strip("'\"")

with open("README.md", "r") as f:
    long_description = f.read()

setup(
    name="gtfs_aggregator_checker",
    version=version,
    packages=find_namespace_packages(),
    install_requires=[
        "beautifulsoup4",
        "python-dotenv",
        "PyYAML",
        "requests",
        "typer",
    ],
    description="Tool for checking if transit urls are on aggregator websites",
    long_description=long_description,
    long_description_content_type="text/markdown",
    author="",
    author_email="",
    url="https://github.com/cal-itp/gtfs-aggregator-checker",
)
```
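The regex in `setup.py` extracts the version from the package's `__init__.py` without importing it (which would pull in the package's dependencies at build time). In isolation, against a stand-in source string:

```python
import re

# Same pattern as setup.py: capture everything after "__version__ = "
# and strip the surrounding quotes.
_version_re = re.compile(r"__version__\s+=\s+(.*)")

source = 'SECRET_PARAMS = ["api_key"]\n__version__ = "1.0.0"\n'
version = _version_re.search(source).group(1).strip("'\"")
print(version)  # 1.0.0
```

This keeps `__version__ = "1.0.0"` in `gtfs_aggregator_checker/__init__.py` as the single source of truth for the release number.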
