Datasets Data URLs and API generally #6

rufuspollock · 2016-04-29T17:27:51Z

From @rgrp on February 24, 2013 18:26

This issue is about the URL / API structure for accessing data (and metadata) from the data packages.

Current Situation

For stuff under /data/: /data/{dataset}/datapackage.json and /data/{dataset}.csv
For other stuff either at /tools/view/ or /community/ via: http://data.okfn.org/tools/dataproxy/?url={path-to-csv} (though this is not much different from datapipes.okfnlabs.org/csv/raw/?url=.... and leaves much to be desired)

Proposal

/data/ + /community/ data packages

For /data/ and /community/ data packages:

/.../{dataset}/datapackage.json     # the datapackage.json file

## data urls
/.../{dataset}/r/{resource-name-or-order}.{format}  

so e.g.

/.../gdp/r/annual.csv   # resource name
/.../gdp/r/0.csv           # resource by index

Formats that we should support would be:

{format} = csv | json | html | raw (by default)
{resource-name} = name as in resources entry. (Also allow order, i.e. the 0-based position in the resources list: 0 for first resource, 1 for second resource etc, as in the 0.csv example above).
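
As a rough illustration of how a server might take the proposed resource URLs apart, here is a minimal Python sketch. The regular expression, the function name `parse_resource_url` and the rule that an all-digit segment means a position rather than a name are assumptions for illustration, not part of any existing implementation.

```python
import re

# Hypothetical pattern for the proposed .../{dataset}/r/{resource}.{format} URLs.
RESOURCE_URL = re.compile(
    r"/(?P<dataset>[^/]+)/r/(?P<resource>[^/.]+)(?:\.(?P<format>csv|json|html|raw))?$"
)


def parse_resource_url(path):
    """Split a resource URL into dataset, resource (name or position) and format."""
    match = RESOURCE_URL.search(path)
    if match is None:
        raise ValueError(f"not a resource URL: {path}")
    resource = match.group("resource")
    return {
        "dataset": match.group("dataset"),
        # Assumption: an all-digit segment is a position, anything else a name.
        "resource": int(resource) if resource.isdigit() else resource,
        # No extension means the raw file, which the proposal lists as the default.
        "format": match.group("format") or "raw",
    }


print(parse_resource_url("/data/gdp/r/annual.csv"))
```

One consequence of this sketch is that a resource whose name consists only of digits could not be addressed by name, so the real design would need a rule for that case.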

Addressing individual elements

Longer-term we could support addressing individual elements, e.g. addressing rows or cells in a dataset:

.../gdp/r/annual/5/        # row 5 of this dataset, rendered as HTML by default
.../gdp/r/annual/5.csv  # in CSV format
.../gdp/r/annual/5/year/  # cell in row 5, field year (in HTML form by default)

.../{dataset}/r/{resource-name-or-index}/{row-index-or-primary-key}[.html | .csv | .json]
.../{dataset}/r/{resource-name-or-index}/{row-index}/{field-name-or-index}[.html | .csv | .json]

Questions:

How do we distinguish row index from primary key when both are numerical (which takes precedence?) - I'd argue PK should take precedence and we have an explicit form for the index, e.g. i:{number}
- That said index is always possible whereas primary key may be absent ...
Support for ranges - see approach to this in datapipes
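
One possible answer to the precedence question can be sketched in Python. This is only an illustration under stated assumptions: `resolve_row` is a hypothetical helper, `i:{number}` is read as an explicit row-index prefix, indexes are taken to be 0-based, and rows are plain dicts.

```python
def resolve_row(rows, selector, primary_key=None):
    """Return the row addressed by a URL segment such as "5" or "i:5".

    Assumed rules (not a settled design):
    - "i:{number}" always means a 0-based row index;
    - a bare value is looked up as a primary key when the resource has one;
    - without a primary key, a bare number falls back to a row index.
    """
    if selector.startswith("i:"):
        return rows[int(selector[2:])]
    if primary_key is not None:
        for row in rows:
            # URL segments are strings, so compare on the string form.
            if str(row[primary_key]) == selector:
                return row
        raise KeyError(selector)
    return rows[int(selector)]


rows = [{"year": 1, "value": "a"}, {"year": 0, "value": "b"}]
print(resolve_row(rows, "1", primary_key="year"))    # primary key lookup
print(resolve_row(rows, "i:1", primary_key="year"))  # explicit row index
```

With these rules the index form stays available even when a primary key exists, which addresses the concern that the primary key may be absent while an index is always possible.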

Data packages somewhere online

We follow something similar to the other case but instead of data package name in the url we move the data package url to the query string:

/api/datapackage.json?url={datapackage-url}
/api/data/{resource-name-or-index}.{format}?url={datapackage-url}

# e.g. this returns first resource as CSV
/api/data/0.csv?url=https://raw.github.com/datasets/browser-stats/master/datapackage.json
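
To make the query-string variant concrete, here is a small Python sketch that builds such a URL and takes it apart again. The helper names `api_data_url` and `split_api_data_url` are hypothetical, and percent-encoding the `url` parameter is a choice made here rather than something the proposal specifies.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def api_data_url(datapackage_url, resource=0, fmt="csv", base="/api/data"):
    """Build /api/data/{resource}.{format}?url={datapackage-url}."""
    # urlencode percent-encodes the data package URL so it is safe in a query string.
    query = urlencode({"url": datapackage_url})
    return f"{base}/{resource}.{fmt}?{query}"


def split_api_data_url(url):
    """Recover (resource, format, datapackage_url) from an /api/data/ URL."""
    parts = urlsplit(url)
    filename = parts.path.rsplit("/", 1)[-1]
    resource, _, fmt = filename.partition(".")
    # parse_qs undoes the percent-encoding applied by urlencode.
    datapackage_url = parse_qs(parts.query)["url"][0]
    return resource, fmt or "raw", datapackage_url


url = api_data_url(
    "https://raw.github.com/datasets/browser-stats/master/datapackage.json"
)
print(url)
print(split_api_data_url(url))
```

The unencoded form used in the example above parses the same way with `split_api_data_url`, so a server written along these lines would accept both.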

Discussion

data.json is the serialization in the most obvious way - i.e. convert to a hash
- alternative provide this in a results style format (and include the schema)
Should we use download attribute to set filename ...?
- Not needed in above
~~(Now supported) How do we handle multiple data resources / files?~~
- ~~worry about that in the future - so only support first resource for the moment (this is good as it privileges single resource data packages ...)~~
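
The two JSON serializations mentioned at the top of this discussion (a list of hashes versus a results-style format that includes the schema) could look roughly like this in Python. The function names and the `schema` / `data` keys are assumptions for illustration; the issue does not fix a layout.

```python
import csv
import io


def rows_as_hashes(csv_text):
    """The "obvious" serialization: one hash (dict) per row, keyed by header."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def rows_as_result(csv_text):
    """Results-style alternative: the schema once, then rows as plain arrays."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return {
        # Hypothetical layout; only field names are known from the CSV itself.
        "schema": {"fields": [{"name": name} for name in header]},
        "data": [row for row in reader],
    }


sample = "year,gdp\n2000,1.5\n2001,1.7\n"
print(rows_as_hashes(sample))
print(rows_as_result(sample))
```

The first form repeats every field name in every row, while the second states the fields once, which matters more as the number of rows grows.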

Appendix

Alternatives

Alternatively could be:

{dataset}/{filename}.csv
{dataset}/{filename}.json (CORS enabled ...)

Or

{dataset}/data.csv

Think the former is better ...

Copied from original issue: frictionlessdata/frictionlessdata.io#19


rufuspollock · 2016-04-29T17:27:52Z

@mihi-tr wrote in #83:

I do think we'll need to think along the lines of having CORS enabled access for the datasets. Based on the dataset.json format (which allows relative urls) the api should look like

.../dataset/datapackage.json

and then have

.../dataset/path-to-data/filename

for the data files - this way it doesn't matter which package url I got pointed at

Alternatively: modify datapackage.json - this is very ugly IMO
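
The relative-URL layout described in this comment can be sketched in Python: data file locations are resolved against wherever the datapackage.json was fetched from. The helper name `resource_urls`, the `example.org` host and the use of a `path` key on each resource are assumptions for illustration.

```python
from urllib.parse import urljoin


def resource_urls(datapackage_url, datapackage):
    """Resolve each resource's (possibly relative) path against the package URL."""
    return [
        urljoin(datapackage_url, resource["path"])
        for resource in datapackage["resources"]
        if "path" in resource  # assumption: resources carry a "path" entry
    ]


package = {"resources": [{"path": "path-to-data/filename.csv"}]}
print(resource_urls("http://example.org/dataset/datapackage.json", package))
```

Because resolution is relative to the package URL, the same datapackage.json works unchanged at any location that serves the data files alongside it.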

rufuspollock · 2016-04-29T17:27:53Z

@mihi-tr I don't know if you saw the extensive refactor of this proposal about a month ago. Please look at proposal above. As part of #73 I actually implemented most of the proposal at least for "core" datasets.

Please let me know if it addresses your proxy need - and if there can be an even better API (I guess your biggest concern is datasets which are not in core, but note I propose a way to handle these - though it is not yet implemented).

rufuspollock · 2016-04-29T17:27:53Z

From @mihi-tr on November 30, 2013 19:22

It would (I think). Would need to test this in a practical environment.

rufuspollock · 2016-04-29T17:27:54Z

@mihi-tr here's an example of current api style: http://data.okfn.org/data/s-and-p-500-companies/r/constituents.csv

rufuspollock · 2016-04-29T17:27:54Z

Based on convo with @mihi-tr today downgrading priority to one star:

Unix philosophy - one tool for one job - if we need a data API tool for data packages why not make separate and small
- only counter: convenience here of standard structure (but that's minor)
The "data package" data api tool coming soon

@mihi-tr it would still be nice to know what exactly should be in that tool ...

rufuspollock · 2016-04-29T17:27:55Z

Have updated the proposal to flesh out the case for general online data packages - which I now think is the priority (given that we plan not to do much cataloging on this site and in this app).

rufuspollock · 2016-04-29T17:27:55Z

@mihi-tr could you look at the proposal in the main part of the issue about data packages online and let me know if this solves your requirements?

rufuspollock added question Priority: ★ labels Apr 29, 2016

rufuspollock added Status: In Progress Status: Next up labels Apr 29, 2016

rufuspollock mentioned this issue Apr 29, 2016

Datasets Data URLs and API generally frictionlessdata/frictionlessdata.io#19

Closed

rufuspollock mentioned this issue Jul 31, 2017

Frontend data "API" datahub-v2/frontend#40

Closed

7 tasks
