
Commit 5923f7f (parent 49df2dd): added new vignettes/articles

7 files changed (+1388, -5 lines)

vignettes/articles/api-changes.Rmd.orig (+381; large diff not rendered)
@@ -0,0 +1,84 @@
---
title: "Converting an Existing Script"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Converting an Existing Script}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)
```

If you have a script that worked with the original version of the R package and the API, chances are it will need changes, possibly substantial ones, before it will work with the new versions of both.

## Required API Key
First, you'll need to [request an API key](https://patentsview-support.atlassian.net/servicedesk/customer/portals) and then set the environment variable `PATENTSVIEW_API_KEY` to the value of your API key; from a Windows command prompt, for example, `set PATENTSVIEW_API_KEY=My_api_key`. Without a valid API key, the API will reject all of your calls.

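You can also set the key from within R, as sketched below. `"My_api_key"` is a placeholder, and keeping the key in `~/.Renviron` is just one common way (an assumption about your setup, not a package requirement) to have it set in every session.

```r
# Set the key for the current R session only; "My_api_key" is a placeholder.
Sys.setenv(PATENTSVIEW_API_KEY = "My_api_key")

# To have it set in every session, you could instead add this line to ~/.Renviron:
# PATENTSVIEW_API_KEY=My_api_key

# Stop early if the key is missing, rather than having every API call rejected.
if (!nzchar(Sys.getenv("PATENTSVIEW_API_KEY"))) {
  stop("Please set the PATENTSVIEW_API_KEY environment variable")
}
```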

## The New Throttling Limit
Another new thing is a throttling limit. The new version of the API only allows an individual API key to make 45 calls per minute. A call that exceeds that limit is rejected, but the response does include the number of seconds to wait before calls will be allowed again. Fortunately, the R package handles this for you! Your script will be chugging along, and if the API should return a throttling response, the R package will sleep for the required number of seconds before automatically resending your query! The only thing you may notice, besides a warning message, is that the script will pause when throttled before it picks right back up again.

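Since the package takes care of this, you don't need code like the following. It's only a rough, hypothetical sketch of the sleep-and-resend idea, not the package's implementation: `send_request()` stands in for the real HTTP call and is assumed to return a list with a `status` and, when throttled (assumed here to be HTTP status 429), a `retry_after` wait in seconds.

```r
# Hypothetical sketch: resend a request whenever the API says we've been throttled.
# `send_request` must return a list with `status` and, if throttled, `retry_after`.
send_with_throttle_retry <- function(send_request, max_tries = 5, sleep = Sys.sleep) {
  for (attempt in seq_len(max_tries)) {
    resp <- send_request()
    if (resp$status != 429) {
      return(resp)  # not throttled, hand the response back
    }
    warning("Throttled by the API; sleeping ", resp$retry_after, " seconds", call. = FALSE)
    sleep(resp$retry_after)  # wait as instructed, then loop around and resend
  }
  stop("Still throttled after ", max_tries, " attempts", call. = FALSE)
}

# A fake request that is throttled once (with a 0 second wait), then succeeds.
calls <- 0
fake_request <- function() {
  calls <<- calls + 1
  if (calls == 1) list(status = 429, retry_after = 0) else list(status = 200, body = "ok")
}

resp <- suppressWarnings(send_with_throttle_retry(fake_request))
resp$status
```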

## Philosophical Change
The new version of the API's endpoints are less Swiss Army knife-like than before, when you could get nearly any data field from any endpoint. Now their responses are substantially lighter and generally focus on data pertinent to that endpoint. For example, you can now get USPC fields only from the USPC endpoints or from the patent endpoint. This may mean you'll have to make multiple calls to different endpoints to get the same data the old version of the API returned in a single call.

Take a look at the [top assignees application](top-assignees.html). It has to blend together information from separate calls that used to be returned by a single call. This may push your dplyr skills to the limit.

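Blending the results usually comes down to joining data frames on a shared ID. Here's a minimal sketch with made-up data standing in for the results of two separate calls; the column names are illustrative assumptions, not necessarily the API's field names. Base R's `merge()` keeps it dependency-free; `dplyr::left_join()` does the same job.

```r
# Made-up stand-ins for the results of two calls to different endpoints.
patents <- data.frame(
  patent_id   = c("1000001", "1000002", "1000003"),
  assignee_id = c("a1", "a2", "a1")
)
assignees <- data.frame(
  assignee_id  = c("a1", "a2"),
  organization = c("Acme Corp", "Globex")
)

# Left join: keep every patent row and attach the matching assignee details.
combined <- merge(patents, assignees, by = "assignee_id", all.x = TRUE)

# With the data blended, per-organization summaries are straightforward.
table(combined$organization)
```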

## Changed Field Names and Types
The fields requested by the original script, or used in its query, may not be available from the new version's endpoints. The nber attributes are no longer available, as the nber_subcategories endpoint was removed. Also, some attributes have new names, like name_last in the nested inventor object returned by the patent endpoint. In the fields parameter it would now be specified as "inventor.name_last", where formerly it was "inventor_last_name" when using the patent endpoint. This also demonstrates how nested fields need to be fully qualified in the query parameter.

Also note that some fields' types have changed, meaning you'll need to use different operators within your query. For example, assignees.assignee_organization is now a full text field; formerly, assignee_organization was a string.

```{r}
library(patentsview)

# With the original API you could do
qry_funs$contains(assignee_organization = "Rice University")

# Now you have to use a full text operator on the fully qualified nested field
qry_funs$text_phrase(assignees.assignee_organization = "Rice University")
```

Check out the [API documentation](https://search.patentsview.org/docs/docs/Search%20API/SearchAPIReference/#endpoints) and the [Swagger UI page](https://search.patentsview.org/swagger-ui/) to see the
returned fields and their types. The same information is available in the `fieldsdf` data frame, but it is
harder to read.

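If you'd still rather look things up from R, something like this lists the patent endpoint's fields and their types, then narrows them to the inventor-related ones. It assumes `fieldsdf` has `endpoint`, `field` and `data_type` columns and that the endpoint is named `"patent"`.

```r
library(patentsview)

# Every field the patent endpoint can return, with its data type.
patent_fields <- fieldsdf[fieldsdf$endpoint == "patent", c("field", "data_type")]

# Just the inventor-related fields, to see their new (nested) names.
patent_fields[grepl("inventor", patent_fields$field), ]
```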

## Singular Endpoints

The endpoints are now singular, for example "patent", where previously it was "patents".

```{r}
get_endpoints()
```

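When converting a script, a small lookup table can help update the endpoint names it uses. The mapping below is a hypothetical helper assembled from this release's endpoint changes (nber_subcategories removed, cpc_subsections now cpc_group, the remaining original endpoints made singular); check it against `get_endpoints()` before relying on it.

```r
# Hypothetical lookup from original endpoint names to new ones; NA marks an
# endpoint that was removed. Verify against get_endpoints() before relying on it.
endpoint_map <- c(
  patents            = "patent",
  inventors          = "inventor",
  assignees          = "assignee",
  locations          = "location",
  uspc_mainclasses   = "uspc_mainclass",
  cpc_subsections    = "cpc_group",
  nber_subcategories = NA
)

# Translate old endpoint names, failing loudly on removed or unknown ones.
new_endpoint <- function(old) {
  new <- unname(endpoint_map[old])
  if (anyNA(new)) {
    stop("No replacement endpoint for: ", paste(old[is.na(new)], collapse = ", "))
  }
  new
}

new_endpoint(c("patents", "cpc_subsections"))
```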

## Additions to the R Package

Some of the endpoints now return HATEOAS links, URLs you can use to make a call back to the API
to retrieve additional data. The new function `retrieve_linked_data()` does just that.
There is a lot more about this [here](api-changes.html#hateoas-links).

## Conclusion

So there you have it: our attempt at listing what's changed and what to do about it. [Request](https://patentsview-support.atlassian.net/servicedesk/customer/portals) an API key and get going with the new version of the R package! The two API versions will coexist for a while, but the API team plans to shut down the original version of the API in February 2025.

@@ -0,0 +1,134 @@
---
slug: patentsview-breaking-release
title: "Breaking Release of the Patentsview Package"
package_version: 1.0.0
author:
- Russ Allen
- Chris Baker
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
tags:
- Software Peer Review
- packages
- R
- community
- tech notes
- Patents
- PatentsView
- API
- API client
- USPTO
- r-universe
description: "Breaking Release of the Patentsview Package"
editor:
vignette: >
  %\VignetteIndexEntry{Breaking Release of the Patentsview Package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

*This is a proposed Tech Note to be submitted to rOpenSci. It's here as an Rmd so it will be
knitted by the build process, but it can be submitted as an md file.*

The PatentsView API team has released a new version of their API, which is used by a
correspondingly new version of the [patentsview](https://docs.ropensci.org/patentsview/) R package.
The problem for users is that the API team has made **breaking changes**: existing programs will not run
with the new version of the R package. Please don't shoot the messenger!

The new version of the R package handles some of the API team's changes where possible;
however, an API key is now required. The PatentsView API team plans to shut down
the original version of the API in the near future. At that
time the original version of the R package will stop working.

The original version of the R package is available on CRAN, with the new version available on
[r-universe](https://ropensci.r-universe.dev/patentsview).
After the original version of the API is shut down, the updated R package will be submitted to CRAN.

## User-impacting API changes:
1. Users will need to [request an API key](https://patentsview-support.atlassian.net/servicedesk/customer/portals) and set the environment variable PATENTSVIEW_API_KEY to its value.
2. Endpoint changes:
   - The nber_subcategories endpoint, one of the original seven endpoints, was removed.
   - cpc_subsections is now cpc_group.
   - The remaining five original endpoints went from plural to singular; for example, "patents" is now "patent".
     Interestingly, the returned data structures are still plural for the most part.
   - There are now 27 endpoints. Some endpoints' responses are lighter than before, so additional endpoints
     may need to be called to retrieve fields that are currently (soon to be formerly) available from the original endpoints.
   - Some of the endpoints now return HATEOAS (Hypermedia as the Engine of Application State) links to retrieve more data
     (URLs for additional calls back to the API).
3. Some fields are now nested and need to be fully qualified when used in a query,
   for instance, `search_pv('{"cpc_current.cpc_group_id":"A01B1/00"}')` when using the patent endpoint.

   In the fields parameter, nested fields can be fully qualified, or a new API shorthand can be used, where
   group names can be specified. When group names are used, all of the group's nested fields will be returned
   by the API. For example, the new version of the API and R package will accept `fields = c("assignees")` when
   using the patent endpoint, and all of the nested assignee fields will be returned by the API.
4. Some fields' names have changed (most significantly, patent_number is now patent_id),
   and some fields were removed entirely, for instance, rawinventor_first_name and rawinventor_last_name.
5. The original version of the API had queryable fields and additional fields that could be
   retrieved but couldn't be part of a conditional query. That notion does not apply to the
   new version of the API, as all fields are now queryable. You may be able
   to simplify your code if you found yourself post-processing returned data
   because a field you were interested in was not queryable.
6. There is no longer supposed to be a difference between the
   operators used on strings and those used on full text fields, as there was in the original
   version of the API. See the tip below the [Syntax section](https://search.patentsview.org/docs/docs/Search%20API/SearchAPIReference/#syntax).
7. Result set paging has changed significantly. This matters only if users implemented their own
   paging; the R package continues to handle result set paging when search_pv's `all_pages = TRUE`.
   There is a new result set paging vignette that explains how the API now pages,
   using the `size` and `after` parameters rather than `per_page` and `page`.
8. Result set sizes are seemingly unbounded now. The original version of the API capped result sets at
   100,000 rows.

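Items 3 and 4 might look like this in practice. This is a sketch that assumes your key is already in `PATENTSVIEW_API_KEY` (the API call is skipped otherwise) and that `patent_title` is a valid patent endpoint field. It builds the same query as the JSON string in item 3 with `qry_funs`, and uses the `"assignees"` group shorthand in `fields`.

```r
library(patentsview)

# The nested field is fully qualified in the query; this is equivalent to
# '{"cpc_current.cpc_group_id":"A01B1/00"}'.
query <- qry_funs$eq(cpc_current.cpc_group_id = "A01B1/00")

# patent_id replaces the original patent_number; "assignees" is a group name,
# so all of the nested assignee fields are returned.
fields <- c("patent_id", "patent_title", "assignees")

# Only call the API when a key is available.
if (nzchar(Sys.getenv("PATENTSVIEW_API_KEY"))) {
  res <- search_pv(query, fields = fields, endpoint = "patent")
  str(res$data, max.level = 2)
}
```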

The API team also [renamed the API](https://search.patentsview.org/docs/#naming-update):
PatentsView's Search API is now the PatentSearch API.
Note that the R package will retain its name, so continue to use `library(patentsview)`.

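For anyone who wrote their own paging (item 7 in the list above), the idea behind `size`/`after` paging is roughly this: request `size` rows, then ask for the rows that come after the last one received, until a short page comes back. The sketch below is hypothetical and works offline: `fetch_page()` fakes an API call over made-up, sorted patent IDs, and it assumes results are sorted on the field whose last value is passed as `after`. The R package already does this for you when `all_pages = TRUE`.

```r
# Made-up, already sorted patent IDs standing in for the API's result set.
all_ids <- sprintf("%07d", 1:25)

# Hypothetical stand-in for one API call: up to `size` rows after `after`.
fetch_page <- function(size, after = NULL) {
  ids <- if (is.null(after)) all_ids else all_ids[all_ids > after]
  data.frame(patent_id = head(ids, size))
}

fetch_all_pages <- function(size = 10) {
  pages <- list()
  after <- NULL
  repeat {
    page <- fetch_page(size, after)
    if (nrow(page) > 0) pages[[length(pages) + 1]] <- page
    if (nrow(page) < size) break           # a short page means we've reached the end
    after <- page$patent_id[nrow(page)]    # resume after the last row received
  }
  do.call(rbind, pages)
}

res <- fetch_all_pages(size = 10)
nrow(res)
```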

## Highlights of the R package:

1. Throttling is now enforced by the API and handled by the R package (it sleeps as specified by the throttle response before retrying).
2. There are four new vignettes:
   - There is a new "Converting an existing script" vignette.
   - The [rOpenSci post](/blog/2017/09/19/patentsview/) that announced the original version of the R package has been changed to work with the new version of the API and is now a new vignette.
   - Understanding the API: the API team's Jupyter notebook, converted to R.
   - On writing custom result set paging.
3. The R package changed internally from using httr to httr2. This only affects users who
   passed additional arguments (`...`) to `search_pv()`. Previously, if they passed `config = httr::timeout(40)`,
   they'd now pass `timeout = 40` (name-value pairs of valid curl options, as found in `curl::curl_options()`; see [req_options](https://httr2.r-lib.org/reference/req_options.html)).
4. Now that the R package is using httr2, users can make use of its `last_request()` function to see what was sent to the API. This could be useful when trying to fix an invalid request. It can also be fun to see the raw API response.

   ```r
   httr2::last_request()
   httr2::last_response()
   httr2::last_response() |> httr2::resp_body_json()
   ```

5. A new function was added:
   - `retrieve_linked_data()` to retrieve data from a HATEOAS link the API sent back, retrying if throttled.
6. An existing function was removed. With the API changes, there is less of a need for
   `cast_pv_data()`, which was previously part of the R package. The API now returns most fields as appropriate
   types (boolean, numeric, etc.) instead of always returning strings.

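To illustrate item 3, a call that used to pass `config = httr::timeout(40)` would now pass the curl option directly. The `search_pv()` call is only attempted when an API key is set, and patent ID `"10000000"` is just an example value. The last lines build, but don't send, a bare httr2 request to show how a curl option such as `timeout` is attached with `req_options()`; the URL there is an assumption and is never contacted.

```r
library(patentsview)

# Original package (httr):  search_pv(query, config = httr::timeout(40))
# New package (httr2): pass curl options as name-value pairs.
if (nzchar(Sys.getenv("PATENTSVIEW_API_KEY"))) {
  res <- search_pv('{"patent_id":"10000000"}', timeout = 40)
}

# The same kind of option on a plain httr2 request (built here, not sent).
req <- httr2::request("https://search.patentsview.org/api/v1/patent/")
req <- httr2::req_options(req, timeout = 40)
req$options$timeout
```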

## Online Documentation

The API team has thoughtfully provided a Swagger UI page for the new version of the API at https://search.patentsview.org/swagger-ui/.
Think of it as an online version of Postman already loaded with the API’s new endpoints and returns.
The Swagger UI page documents what fields are returned by each endpoint on a successful call
(HTTP response code 200).
You can even send in requests and see actual API responses if you enter your API key and press
an endpoint's "Try it out" and "Execute" buttons. Even error responses can be informative,
usually pointing out what went wrong.

In a similar format, the [updated API documentation](https://search.patentsview.org/docs/docs/Search%20API/SearchAPIReference/#endpoints)
lists what each endpoint does. Additionally, the R package's `fieldsdf` data frame has been updated,
now listing the new set of endpoints and fields that can be queried and/or returned. The R package's
reference pages have also been updated.

## Final Thoughts
As shown in the updated Top Assignees vignette, there will now be occasions where multiple API calls are needed to retrieve the same data that a single API call returned in the original version of the API and R package.
Additionally, the reworked rOpenSci post explains what changes had to be made, since assignee latitude
and longitude are no longer available from the patent endpoint.

Issues or questions about the API itself can be raised in the [API's portal](https://patentsview-support.atlassian.net/servicedesk/customer/portals) or in the
API's [forum](https://patentsview.org/forum). Issues for the R package can be raised in the [patentsview repo](https://github.com/ropensci/patentsview/issues).
