Olaf Alders edited this page Apr 24, 2014 · 33 revisions

API Docs: v0

For an introduction to the MetaCPAN API which requires no previous knowledge of MetaCPAN or ElasticSearch, see the slides for "Abusing MetaCPAN for Fun and Profit" or watch the actual talk.

There is also a repository of examples you can play with to get up and running in a hurry. Rather than editing this wiki page, please send pull requests to the metacpan-examples repository. You are welcome to edit the wiki instead, but pull requests with working code are probably the most helpful contribution.

All of these URLs can be tested using TOKUHIROM's excellent MetaCPAN Explorer.

To learn more about the ElasticSearch query DSL, check out Clinton Gormley's [Terms of Endearment - ES Query DSL Explained](http://www.slideshare.net/clintongormley/terms-of-endearment-the-elasticsearch-query-dsl-explained) slides.

The query syntax is explained on ElasticSearch's reference page.

Being polite

Currently, the only rule for using the API is to "be polite". We enforce an upper limit of 5000 on the size of search requests. If you need to fetch more than 5000 items, use the scrolling API. Search this page for "scroll" to find an example using ElasticSearch.pm, or see the ElasticSearch scroll docs if you are connecting in some other way.

You can also scroll when fetching fewer than 5000 items. This is useful when the data set is large and you would otherwise need many separate requests to retrieve all of it.

Be aware that when you scroll, your docs will come back unsorted, as noted in the ElasticSearch scan documentation.
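If you are not using ElasticSearch.pm, the scroll loop can be sketched roughly as follows. This is only an illustration: `fetch` is a hypothetical callable you supply (it takes a URL and returns the decoded JSON), and the `/_search/scroll` continuation path is assumed from stock ElasticSearch rather than confirmed for this API.

```python
from urllib.parse import urlencode

BASE = "http://api.metacpan.org"  # host used by the examples on this page


def scan_url(doc_type, size=100, scroll="5m"):
    """URL that opens a scan-type scroll over one document type."""
    query = urlencode({"search_type": "scan", "scroll": scroll, "size": size})
    return f"{BASE}/v0/{doc_type}/_search?{query}"


def scroll_url(scroll_id, scroll="5m"):
    """URL that fetches the next page of an open scroll.

    Assumption: the continuation endpoint is /_search/scroll, as in stock
    ElasticSearch. The MetaCPAN wrapper may expose it differently.
    """
    query = urlencode({"scroll": scroll, "scroll_id": scroll_id})
    return f"{BASE}/_search/scroll?{query}"


def scroll_hits(fetch, doc_type, size=100):
    """Yield every hit until a page comes back empty.

    `fetch` is a hypothetical callable: URL in, decoded JSON response out.
    """
    page = fetch(scan_url(doc_type, size))  # a scan's first reply carries no hits
    while True:
        page = fetch(scroll_url(page["_scroll_id"]))
        hits = page["hits"]["hits"]
        if not hits:
            return
        yield from hits
```

Leaving the HTTP client to the caller keeps the loop easy to try out against canned responses before pointing it at the live API.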

Identifying Yourself

Part of being polite is letting us know who you are and how to reach you. This is not mandatory, but please do consider adding your app to the API-Consumers page.

Available fields

Available fields can be found by accessing the corresponding _mapping endpoint.
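As a rough sketch, assuming the `_mapping` response has the usual ElasticSearch shape (the type name, then nested `properties`), the field names could be flattened like this. The helper names and the sample fragment are made up for illustration and are not real MetaCPAN output.

```python
BASE = "http://api.metacpan.org/v0"  # base URL used by the examples on this page


def mapping_url(doc_type):
    """URL of the _mapping endpoint for one document type, e.g. 'release'."""
    return f"{BASE}/{doc_type}/_mapping"


def field_names(properties, prefix=""):
    """Flatten a nested 'properties' mapping into dotted field names."""
    names = []
    for name, spec in sorted(properties.items()):
        path = prefix + name
        if "properties" in spec:  # object field: recurse into its children
            names.extend(field_names(spec["properties"], path + "."))
        else:
            names.append(path)
    return names


# Made-up fragment in the assumed response shape, not real MetaCPAN output.
sample = {"release": {"properties": {
    "name": {"type": "string"},
    "stat": {"properties": {"size": {"type": "integer"}}},
}}}

print(mapping_url("release"))
print(field_names(sample["release"]["properties"]))
```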

Field documentation

Fields are documented in the API codebase at https://github.com/CPAN-API/cpan-api/tree/master/lib/MetaCPAN/Document (check the Pod for discussion of what the various fields represent). Be sure to have a look at https://github.com/CPAN-API/cpan-api/blob/master/lib/MetaCPAN/Document/File.pm in particular, as results for /module are really a thin wrapper around the file type.

Search without constraints

Performing a search without any constraints is an easy way to get sample data.
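For instance, an unconstrained search is just the bare `_search` endpoint of a type. The sketch below builds those URLs for the types that appear on this page and shows one way to peel the ElasticSearch envelope off a response. The helper names are hypothetical and the sample response is invented.

```python
BASE = "http://api.metacpan.org/v0"  # base URL used by the examples on this page

# Document types that appear in the examples on this page.
TYPES = ("author", "release", "distribution", "file", "favorite")


def search_url(doc_type):
    """Unconstrained search endpoint for one document type."""
    return f"{BASE}/{doc_type}/_search"


def sources(response):
    """Drop the ElasticSearch envelope and return just the documents."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


# Invented response in the standard ElasticSearch shape.
sample = {"hits": {"total": 1, "hits": [{"_id": "x", "_source": {"name": "Foo-0.01"}}]}}

for doc_type in TYPES:
    print(search_url(doc_type))
print(sources(sample))
```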

Joins

ElasticSearch itself doesn't support joining data across multiple types. The API server can, however, handle a join query parameter if the underlying type was set up accordingly. Browse https://github.com/CPAN-API/cpan-api/blob/master/lib/MetaCPAN/Server/Controller/ to see all join conditions. Here are some examples.

Joins on documents:

Joins on search results are a work in progress.

Restricting the joined results can be done by using the boolean "should" occurrence type:

curl -XPOST 'http://api.metacpan.org/v0/author/PERLER?join=release' -d '
{
    "query": {
        "bool": {
            "should": [{
                "term": {
                    "release.status": "latest"
                }
            }]
        }
    }
}'
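The same request can be assembled programmatically. This is a minimal sketch that only builds the URL and JSON body and leaves sending them to whatever HTTP client you prefer. `join_request` is a hypothetical helper, not part of any MetaCPAN library.

```python
import json
from urllib.parse import urlencode

BASE = "http://api.metacpan.org/v0"  # base URL used by the examples on this page


def join_request(path, join, should_terms=None):
    """Return (url, body) for a document lookup joined with another type.

    `should_terms` maps field names to the values the joined results are
    restricted to. With no restriction, the body is None.
    """
    url = f"{BASE}/{path}?{urlencode({'join': join})}"
    if not should_terms:
        return url, None
    should = [{"term": {field: value}} for field, value in should_terms.items()]
    return url, json.dumps({"query": {"bool": {"should": should}}})


url, body = join_request("author/PERLER", "release", {"release.status": "latest"})
print(url)
print(body)
```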

JSONP

Simply add a callback query parameter with the name of your callback, and you'll get a JSONP response.
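JSONP is mostly of interest to browser code, but the shape of the exchange can be sketched as below. The exact wrapper the server emits is an assumption here (`name(...)` with an optional trailing semicolon), and both helper names and the sample payload are invented.

```python
import json
import re
from urllib.parse import urlencode

BASE = "http://api.metacpan.org/v0"  # base URL used by the examples on this page


def jsonp_url(path, callback):
    """Add the callback query parameter to an API path."""
    return f"{BASE}/{path}?{urlencode({'callback': callback})}"


def unwrap_jsonp(text, callback):
    """Extract the JSON payload from 'callback({...});' style text."""
    pattern = rf"\s*{re.escape(callback)}\((.*)\);?\s*"
    match = re.fullmatch(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError("not a JSONP response for this callback")
    return json.loads(match.group(1))


print(jsonp_url("author/DOY", "handleAuthor"))
# Invented response body in the assumed wrapper format.
print(unwrap_jsonp('handleAuthor({"pauseid": "DOY"});', "handleAuthor"))
```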

GET convenience URLs

You should be able to run most POST queries, but very few GET URLs are currently exposed. These convenience endpoints can get you started. Note that they behave differently from the POST queries: they return the latest version of a module or dist, and they strip much of the verbose ElasticSearch data that wraps results.

/distribution/{distribution}

The /distribution endpoint accepts the name of a distribution (e.g. /distribution/Moose), which returns information about the distribution which is not specific to a version (like RT bug counts).

/release/{distribution}

/release/{author}/{release}

The /release endpoint accepts either the name of a distribution (e.g. /release/Moose), in which case it returns the most recent release of that distribution, or the full path consisting of the author and the name of the release (e.g. /release/DOY/Moose-2.0001).

/author/{author}

author refers to the PAUSE ID (pauseid) of the author. It must be uppercase (e.g. /author/DOY).

/module/{module}

Returns the corresponding file of the latest version of the module. Considering that Moose-2.0001 is the latest release, the result of /module/Moose is the same as /file/DOY/Moose-2.0001/lib/Moose.pm.
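The convenience paths described so far can be put together with a few small helpers. This is a sketch only: the function names are hypothetical, and the only rules encoded are the ones stated above (upper-case PAUSE IDs, and a release addressed either by distribution or by author plus release name).

```python
BASE = "http://api.metacpan.org/v0"  # base URL used by the examples on this page


def distribution_url(distribution):
    """Version-independent information about a distribution."""
    return f"{BASE}/distribution/{distribution}"


def author_url(pauseid):
    """Author lookup. The PAUSE ID is upper-cased, as the API requires."""
    return f"{BASE}/author/{pauseid.upper()}"


def release_url(distribution=None, author=None, release=None):
    """Latest release of a distribution, or one exact release by author and name."""
    if author and release:
        return f"{BASE}/release/{author.upper()}/{release}"
    if distribution:
        return f"{BASE}/release/{distribution}"
    raise ValueError("pass a distribution, or both author and release")


def module_url(module):
    """File of the latest version of a module."""
    return f"{BASE}/module/{module}"


print(author_url("doy"))
print(release_url(distribution="Moose"))
print(release_url(author="DOY", release="Moose-2.0001"))
print(module_url("Moose"))
```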

/pod/{module}

/pod/{author}/{release}/{path}

Returns the POD of the given module. You can change the output format either by passing a content-type query parameter (e.g. /pod/Moose?content-type=text/plain) or by adding an Accept header to the HTTP request. Valid content types are:

  • text/html (default)
  • text/plain
  • text/x-pod
  • text/x-markdown
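The two ways of selecting a format can be sketched with the standard library. The snippet only builds the request objects and does not send them. `pod_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://api.metacpan.org/v0"  # base URL used by the examples on this page
POD_TYPES = ("text/html", "text/plain", "text/x-pod", "text/x-markdown")


def pod_request(module, content_type="text/html", use_header=False):
    """Build (but do not send) a request for a module's POD in a given format."""
    if content_type not in POD_TYPES:
        raise ValueError(f"unsupported content type: {content_type}")
    url = f"{BASE}/pod/{module}"
    if use_header:
        # Variant 1: ask for the format through the Accept header.
        return Request(url, headers={"Accept": content_type})
    # Variant 2: ask for the format through the content-type query parameter.
    query = urlencode({"content-type": content_type}, safe="/")
    return Request(f"{url}?{query}")


by_param = pod_request("Moose", "text/plain")
by_header = pod_request("Moose", "text/x-markdown", use_header=True)
print(by_param.full_url)
print(by_header.full_url, by_header.get_header("Accept"))
```

Either request object can then be passed to `urllib.request.urlopen` when you actually want to fetch the document.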

GET Searches

Names of latest releases by OALDERS:

http://api.metacpan.org/v0/release/_search?q=author:OALDERS%20AND%20status:latest&fields=name,status&size=100

All CPAN Authors:

http://api.metacpan.org/v0/author/_search?pretty=true&q=*&size=100000

All CPAN Authors Who Have Provided Twitter IDs:

http://api.metacpan.org/v0/author/_search?pretty=true&q=author.profile.name:twitter

All CPAN Authors Who Have Updated MetaCPAN Profiles:

http://api.metacpan.org/v0/author/_search?q=updated:*&sort=updated:desc

First 100 distributions which SZABGAB has given a ++:

http://api.metacpan.org/v0/favorite/_search?q=user:sWuxlxYeQBKoCQe1f-FQ_Q&size=100&fields=distribution

The 100 most recent releases (similar to https://metacpan.org/recent):

http://api.metacpan.org/v0/release/_search?q=status:latest&fields=name,status,date&sort=date:desc&size=100

Number of ++'es that DOY's dists have received:

http://api.metacpan.org/v0/favorite/_search?q=author:DOY&size=0

List of users who have ++'ed DOY's dists and the dists they have ++'ed:

http://api.metacpan.org/v0/favorite/_search?q=author:DOY&fields=user,distribution

Last 50 dists to get a ++:

http://api.metacpan.org/v0/favorite/_search?size=50&fields=author,user,release,date&sort=date:desc
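URLs like the ones above can also be generated rather than typed by hand. The sketch below uses a hypothetical `get_search_url` helper. It keeps `:`, `,` and `*` unescaped so the output stays as readable as the examples, and escapes everything else.

```python
from urllib.parse import quote, urlencode

BASE = "http://api.metacpan.org/v0"  # base URL used by the examples on this page


def get_search_url(doc_type, q=None, **params):
    """Build a GET search URL from a query string and extra parameters."""
    query = {}
    if q is not None:
        query["q"] = q
    query.update(params)
    # quote (not quote_plus) turns spaces into %20, as in the examples above.
    encoded = urlencode(query, quote_via=quote, safe=":,*")
    return f"{BASE}/{doc_type}/_search?{encoded}"


print(get_search_url("release", q="author:OALDERS AND status:latest",
                     fields="name,status", size=100))
print(get_search_url("favorite", size=50,
                     fields="author,user,release,date", sort="date:desc"))
```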

Querying the API with MetaCPAN::API

Perhaps the easiest way to get started using MetaCPAN is with MetaCPAN::API.

use MetaCPAN::API;

my $mcpan  = MetaCPAN::API->new();
my $author = $mcpan->author('XSAWYERX');
my $dist   = $mcpan->release( distribution => 'MetaCPAN-API' );

Querying the API with ElasticSearch.pm

The API server at api.metacpan.org is a wrapper around an ElasticSearch instance. It adds support for the convenient GET URLs, handles authentication and does some access control. Therefore you can use the powerful API of ElasticSearch.pm to query MetaCPAN:

use ElasticSearch;

my $es = ElasticSearch->new( servers => 'api.metacpan.org', no_refresh => 1 );

my $scroller = $es->scrolled_search(
    query       => { match_all => {} },
    search_type => 'scan',
    scroll      => '5m',
    index       => 'v0',
    type        => 'release',
    size        => 100,
);

while ( my $result = $scroller->next ) {
    print $result->{_source}->{author}, $/;
}

POST Searches

Please feel free to add queries here as you use them in your own work, so that others can learn from you.

Downstream Dependencies

This query returns a list of all releases which list MooseX::NonMoose as a dependency.

curl -XPOST api.metacpan.org/v0/release/_search -d '{
  "query": {
    "match_all": {}
  },
  "size": 5000,
  "fields": [ "distribution" ],
  "filter": {
    "and": [
      { "term": { "release.dependency.module": "MooseX::NonMoose" } },
      { "term": {"release.maturity": "released"} },
      { "term": {"release.status": "latest"} }
    ]
  }
}'

Note it is also possible to use these queries in GET requests (useful for cross-domain JSONP requests) by appropriately encoding the JSON query into the source parameter of the URL. For example the query above would become:

curl 'api.metacpan.org/v0/release/_search?source=%7B%22query%22%3A%7B%22match_all%22%3A%7B%7D%7D%2C%22size%22%3A5000%2C%22fields%22%3A%5B%22distribution%22%5D%2C%22filter%22%3A%7B%22and%22%3A%5B%7B%22term%22%3A%7B%22release.dependency.module%22%3A%22MooseX%3A%3ANonMoose%22%7D%7D%2C%7B%22term%22%3A%7B%22release.maturity%22%3A%22released%22%7D%7D%2C%7B%22term%22%3A%7B%22release.status%22%3A%22latest%22%7D%7D%5D%7D%7D'
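Encoding by hand is error-prone, so here is a minimal sketch of producing such a URL from the query itself. `source_url` is a hypothetical helper. It serialises the query as compact JSON and percent-encodes every reserved character.

```python
import json
from urllib.parse import quote, unquote

BASE = "http://api.metacpan.org/v0"  # base URL used by the examples on this page


def source_url(doc_type, query):
    """Encode a JSON query into the source parameter of a GET search URL."""
    compact = json.dumps(query, separators=(",", ":"))  # no optional whitespace
    return f"{BASE}/{doc_type}/_search?source={quote(compact, safe='')}"


# The downstream-dependencies query from above.
query = {
    "query": {"match_all": {}},
    "size": 5000,
    "fields": ["distribution"],
    "filter": {"and": [
        {"term": {"release.dependency.module": "MooseX::NonMoose"}},
        {"term": {"release.maturity": "released"}},
        {"term": {"release.status": "latest"}},
    ]},
}

url = source_url("release", query)
print(url)

# Decoding the parameter again gives back the original query.
decoded = json.loads(unquote(url.split("source=", 1)[1]))
print(decoded == query)
```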

The size of the CPAN unpacked

curl -XPOST api.metacpan.org/v0/file/_search -d '{
  "query": { "match_all": {} },
  "facets": { 
    "size": {
      "statistical": {
        "field": "stat.size"
  } } },
  "size":0
}'

Get license types of all releases in an arbitrary time span:

curl -XPOST 'api.metacpan.org/v0/release/_search?size=100' -d '{
  "query": {
    "range" : {
        "release.date" : {
            "from" : "2010-06-05T00:00:00",
            "to" : "2011-06-05T00:00:00"
        }
    }
  },
  "fields": ["release.license", "release.name", "release.distribution", "release.date", "release.version_numified"]
}'

Aggregate by license:

curl -XPOST api.metacpan.org/v0/release/_search -d '{
    "query": {
        "match_all": {}
    },
    "facets": {
        "license": {
            "terms": {
                "field": "release.license"
            }
        }
    },
    "size": 0
}'
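Facet queries like this one all follow the same pattern, and their results come back under a `facets` key. The sketch below builds the request body and turns a terms-facet result into a plain dictionary. The helper names are hypothetical and the sample response is invented; its shape follows the standard ElasticSearch terms facet.

```python
import json


def terms_facet_body(name, field, size=None):
    """Body for a match_all search that returns only one terms facet."""
    terms = {"field": field}
    if size is not None:
        terms["size"] = size
    return {"query": {"match_all": {}}, "facets": {name: {"terms": terms}}, "size": 0}


def facet_counts(response, name):
    """Map each term of a terms facet to its count."""
    return {entry["term"]: entry["count"] for entry in response["facets"][name]["terms"]}


body = terms_facet_body("license", "release.license")
print(json.dumps(body, indent=2))

# Invented response fragment with made-up counts.
sample = {"facets": {"license": {"_type": "terms", "terms": [
    {"term": "perl_5", "count": 3},
    {"term": "mit", "count": 1},
]}}}
print(facet_counts(sample, "license"))
```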

Most used file names in the root directory of releases:

curl -XPOST api.metacpan.org/v0/file/_search -d '{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": { "term": { "level": 0 } }
    }
  },
  "facets": {
    "file_names": {
      "terms": {
        "size": 100,
        "field": "file.name"
      }
    }
  },
  "size": 0
}'

Find all releases that contain a particular version of a module:

curl -XPOST api.metacpan.org/v0/file/_search -d '{
  "query": { "filtered":{
      "query":{"match_all":{}},
      "filter":{"and":[
          {"term":{"file.module.name":"DBI::Profile"}},
          {"term":{"file.module.version":"2.014123"}}
      ]}
  }},
  "fields":["release"]
}'


Find all authors with github-meets-cpan in their profiles

Because this profile name contains dashes, we need to use a term filter, which matches the value exactly.

curl -XPOST api.metacpan.org/v0/author/_search -d '{
  "query": {
    "match_all": {}
  },
  "filter": {
    "term": {
      "author.profile.name": "github-meets-cpan"
    }
  }
}'

Get a leaderboard of ++'ed distributions

curl -XPOST api.metacpan.org/v0/favorite/_search -d '{
  "query": { "match_all": {}
   },
  "facets": { 
    "leaderboard": {
      "terms": {
        "field":"distribution",
        "size" : 100
  } } },
  "size":0
}'

Get a leaderboard of Authors with Most Uploads

curl -XPOST api.metacpan.org/v0/release/_search -d '{
    "query": {
        "match_all": {}
    },
    "facets": {
        "author": {
            "terms": {
                "field": "author",
                "size": 100
            }
        }
    },
    "size": 0
}'

Search for a release by name

curl -XPOST api.metacpan.org/v0/release/_search -d '{ 
  "query" : { "match_all" : {  } },
  "filter" : { "term" : { "release.name" : "YAML-Syck-1.07_01" } }
}'

Get the latest version numbers of your favorite modules

Note that "size" should be the number of distributions you are looking for.

lynx --dump --post_data http://api.metacpan.org/v0/release/_search <<EOL 
{
    "query" : { "terms" : { "release.distribution" : [
        "Mojolicious",
        "MetaCPAN-API",
        "DBIx-Class"
    ] } },
    "filter" : { "term" : { "release.status" : "latest" } },
    "fields" : [ "distribution", "version" ],
    "size"   : 3
}
EOL
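To keep "size" in step with the list of distributions, the body can be generated instead of edited by hand. This is a small sketch with a hypothetical helper name. It produces the same JSON as the example above.

```python
import json


def latest_versions_body(distributions):
    """Query body for the latest release of each named distribution."""
    names = list(distributions)
    return {
        "query": {"terms": {"release.distribution": names}},
        "filter": {"term": {"release.status": "latest"}},
        "fields": ["distribution", "version"],
        "size": len(names),  # one hit expected per distribution
    }


body = latest_versions_body(["Mojolicious", "MetaCPAN-API", "DBIx-Class"])
print(json.dumps(body, indent=4))
```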

Get a list of all files where the directory is false and the path is blank

curl -XPOST api.metacpan.org/v0/file/_search -d '{
  "query": {
    "match_all": {}
  },
  "size": 1000,
  "fields": [ "name", "status", "directory", "path", "distribution" ],
  "filter": {
    "and": [
      { "term": { "directory": false } }, { "term" : { "path" : "" } }
    ]
  }
}'

List releases which have an email address for a bugtracker, but not a URL

curl -XPOST api.metacpan.org/v0/release/_search -d '{
  "query": {
    "match_all": {}
  },
  "size": 10,
  "fields": [ "release.name", "release.resources.bugtracker.mailto" ],
  "filter": {
    "and": [
      { "term": {"release.maturity": "released"} },
      { "term": {"release.status": "latest"} },
      {  "exists" : { "field" : "release.resources.bugtracker.mailto" } },
      {  "missing" : { "field" : "release.resources.bugtracker.web" } }
    ]
  }
}'

List distributions for which we have a bugtracker URL

curl -XPOST api.metacpan.org/v0/distribution/_search -d '{
  "query": {
    "match_all": {}
  },
  "size": 1000,
  "filter": {
    "exists" : { "field" : "distribution.bugs.source" }
  }
}'

Search the current PDL documentation for the string axisvals

curl -XPOST api.metacpan.org/v0/file/_search -d '{
    "query" : { "filtered" : {
      "query" : { 
        "query_string" : { 
          "query" : "axisvals", 
          "fields" : [ "pod.analyzed", "module.name" ] }
      },
      "filter" : { "and" : [
        { "term" : { "distribution" : "PDL" } },
        { "term" : { "status" : "latest" } }
      ]}
    }},
    "fields" : [ "documentation", "abstract", "module" ],
    "size" : 20
  }'
