---
slug: patentsview-breaking-release
title: "Breaking Release of the Patentsview Package"
package_version: 1.0.0
author:
  - Russ Allen
  - Chris Baker
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
tags:
  - Software Peer Review
  - packages
  - R
  - community
  - tech notes
  - Patents
  - PatentsView
  - API
  - API client
  - USPTO
  - r-universe
description: "Breaking Release of the Patentsview Package"
editor:
vignette: >
  %\VignetteIndexEntry{Breaking Release of the Patentsview Package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

*This is a proposed Tech Note to be submitted to rOpenSci. It's here as an Rmd so that it will be
knitted by the build process, but it can be submitted as an md file.*

The PatentsView API team has released a new version of their API, which is used by a
correspondingly new version of the [patentsview](https://docs.ropensci.org/patentsview/) R package.
The problem for users is that the API team has made **breaking changes**: existing programs will not run
unchanged with the new version of the R package. Please don't shoot the messenger!

The new version of the R package handles some of the API team's changes where possible;
however, an API key is now required. The PatentsView API team plans to shut down
the original version of the API in the near future, at which
point the original version of the R package will stop working.

The original version of the R package is available on CRAN, and the new version is available on
[r-universe](https://ropensci.r-universe.dev/patentsview).
After the original version of the API is shut down, the updated R package will be submitted to CRAN.
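
If you'd like to try the new version now, the usual r-universe pattern should work: list rOpenSci's r-universe first and a CRAN mirror second so that dependencies can still be found. The particular CRAN mirror URL is our choice, and the `interactive()` guard is only there so that knitting this file doesn't try to install anything.

```r
# rOpenSci's r-universe hosts the new (1.0.0) patentsview; the CRAN entry
# (our choice of mirror) is there so dependencies can be resolved.
repos <- c(
  ropensci = "https://ropensci.r-universe.dev",
  CRAN = "https://cloud.r-project.org"
)

# Only install from an interactive session, never while knitting.
if (interactive()) {
  install.packages("patentsview", repos = repos)
}
```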

## User-Impacting API Changes

1. Users will need to [request an API key](https://patentsview-support.atlassian.net/servicedesk/customer/portals) and set the environment variable `PATENTSVIEW_API_KEY` to its value.
2. Endpoint changes:
    - The `nber_subcategories` endpoint, one of the original seven, was removed.
    - `cpc_subsections` is now `cpc_group`.
    - The remaining five original endpoints went from plural to singular; for example, `patents` is now `patent`.
      Interestingly, the returned data structures are still mostly plural.
    - There are now 27 endpoints. Some of them may need to be called to retrieve fields that the original endpoints
      return today, because some endpoints now return less data, requiring additional calls.
    - Some endpoints now return HATEOAS (Hypermedia as the Engine of Application State) links to retrieve more data,
      i.e., URLs for additional calls back to the API.
3. Some fields are now nested and need to be fully qualified when used in a query,
   for instance, `search_pv('{"cpc_current.cpc_group_id":"A01B1/00"}')` when using the patent endpoint.

   In the `fields` parameter, nested fields can be fully qualified, or a new API shorthand can be used in which
   group names are specified. When a group name is used, all of the group's nested fields will be returned
   by the API. For example, the new version of the API and R package will accept `fields = c("assignees")` when
   using the patent endpoint, and all of the nested assignee fields will be returned.
4. Some field names have changed; most significantly, `patent_number` is now `patent_id`.
   Some fields were removed entirely, for instance, `rawinventor_first_name` and `rawinventor_last_name`.
5. The original version of the API had queryable fields plus additional fields that could be
   retrieved but couldn't be part of a conditional query. That distinction does not apply to the
   new version of the API, as all fields are now queryable. You may be able
   to simplify your code if you were post-processing returned data
   because a field you were interested in was not queryable.
6. There is no longer supposed to be a difference between the
   operators used on string fields and those used on full-text fields, as there was in the original
   version of the API. See the tip below the [Syntax section](https://search.patentsview.org/docs/docs/Search%20API/SearchAPIReference/#syntax).
7. Result set paging has changed significantly. This matters only if you implemented your own
   paging; the R package continues to handle result set paging when `search_pv()` is called with `all_pages = TRUE`.
   A new result set paging vignette explains how the API now pages,
   using the `size` and `after` parameters rather than `per_page` and `page`.
8. Result set sizes now appear to be unbounded. The original version of the API capped result sets at
   100,000 rows.
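
The `size`/`after` paging in item 7 can be illustrated offline. As we understand it, results are sorted and `after` carries the sort value of the last row already received, so each request depends on the previous page rather than on an offset computed from `page` and `per_page`. In this sketch `fake_api()` and `fetch_all()` are hypothetical stand-ins for the API and a custom paging loop, and `patent_id` is simply assumed to be the sort key; see the paging vignette for the real request format.

```r
# Hypothetical in-memory "endpoint" with 25 rows keyed by patent_id.
fake_rows <- data.frame(
  patent_id = sprintf("%07d", 1:25),
  stringsAsFactors = FALSE
)

# Return up to `size` rows, sorted by patent_id, whose patent_id sorts
# after `after` (all rows when `after` is NULL).
fake_api <- function(size, after = NULL) {
  rows <- fake_rows[order(fake_rows$patent_id), , drop = FALSE]
  if (!is.null(after)) {
    rows <- rows[rows$patent_id > after, , drop = FALSE]
  }
  head(rows, size)
}

# Keep requesting pages, passing the last row's sort key as the next
# `after`, until a page comes back with fewer than `size` rows.
fetch_all <- function(size = 10) {
  pages <- list()
  after <- NULL
  repeat {
    page <- fake_api(size = size, after = after)
    if (nrow(page) > 0) {
      pages[[length(pages) + 1]] <- page
    }
    if (nrow(page) < size) break
    after <- page$patent_id[nrow(page)]
  }
  do.call(rbind, pages)
}

all_rows <- fetch_all(size = 10)
nrow(all_rows)  # 25
```

Here the loop makes three requests (10, 10 and 5 rows). With `all_pages = TRUE`, the package does this bookkeeping for you.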

The API team also [renamed the API](https://search.patentsview.org/docs/#naming-update):
PatentsView's Search API is now the PatentSearch API.
Note that the R package will retain its name, so you will continue to use `library(patentsview)`.
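
Several of the API changes listed above show up together in a before-and-after sketch of a `search_pv()` call. It assumes the new package version is installed and `PATENTSVIEW_API_KEY` is set (the `.Renviron` line is one common way to do that); otherwise the guard simply skips the request. The old call is shown only for comparison, and the example patent number and `patent_title` field are our own choices.

```r
# Assumes patentsview 1.0.0 and an API key, e.g. a line like this in
# ~/.Renviron (then restart R):
#   PATENTSVIEW_API_KEY=your-key-here
# or, for the current session only:
#   Sys.setenv(PATENTSVIEW_API_KEY = "your-key-here")

# Original API and package, for comparison:
#   search_pv(
#     query = '{"patent_number":"10000000"}',
#     endpoint = "patents",
#     fields = c("patent_number", "patent_title")
#   )

# New API: singular endpoint name, patent_id instead of patent_number,
# a fully qualified nested field in the query, and a group name
# ("assignees") in `fields` to get all of that group's nested fields.
query <- '{"cpc_current.cpc_group_id":"A01B1/00"}'
fields <- c("patent_id", "patent_title", "assignees")

has_key <- nzchar(Sys.getenv("PATENTSVIEW_API_KEY"))
if (has_key && requireNamespace("patentsview", quietly = TRUE)) {
  res <- patentsview::search_pv(
    query = query,
    fields = fields,
    endpoint = "patent"
  )
  # Inspect the top of the returned structure (mostly still plural names).
  str(res$data, max.level = 2)
}
```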

## Highlights of the R Package

1. Throttling is now enforced by the API and handled by the R package: when a request is throttled,
   the package sleeps for the time specified in the throttle response and then retries.
2. There are four new vignettes:
    - A new "Converting an existing script" vignette.
    - The [rOpenSci post](/blog/2017/09/19/patentsview/) that announced the original version of the R package has been updated to work with the new version of the API and is now a vignette.
    - Understanding the API: the API team's Jupyter notebook, converted to R.
    - A vignette on writing custom result set paging.
3. Internally, the R package switched from httr to httr2. This only affects users who
   passed additional arguments (`...`) to `search_pv()`. If you previously passed `config = httr::timeout(40)`,
   you would now pass `timeout = 40` (name-value pairs of valid curl options, as listed by `curl::curl_options()`;
   see [req_options](https://httr2.r-lib.org/reference/req_options.html)).
4. Now that the R package uses httr2, you can call httr2's `last_request()` to see what was sent to the API,
   which can be useful when trying to fix an invalid request. It can also be fun to look at the raw API response:

   ```r
   # The request most recently sent to the API
   httr2::last_request()
   # The raw response that came back
   httr2::last_response()
   # The response body, parsed from JSON into an R list
   httr2::last_response() |> httr2::resp_body_json()
   ```

5. A new function was added:
    - `retrieve_linked_data()` retrieves data from a HATEOAS link the API sent back, retrying if throttled.
6. An existing function was removed. With the API changes, there is less need for
   `cast_pv_data()`, which was previously part of the R package. The API now returns most fields as appropriate
   types (boolean, numeric, etc.) instead of always returning strings.
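
The package handles throttling for you, but a small sketch shows what "sleep as specified by the throttle response, then retry" amounts to. It assumes the throttle response is an HTTP 429 whose `Retry-After` header gives the wait in seconds; `with_throttle_retry()` and `fake_send()` are hypothetical and are not part of patentsview. For real httr2 requests, httr2's `req_retry()` offers similar behaviour.

```r
# Hypothetical sketch, not the package's implementation: call `send()`,
# and while it reports HTTP 429 (too many requests), wait for the number
# of seconds in its Retry-After header before trying again.
with_throttle_retry <- function(send, max_tries = 3, sleep = Sys.sleep) {
  for (i in seq_len(max_tries)) {
    resp <- send()
    if (resp$status != 429) {
      return(resp)
    }
    if (i < max_tries) {
      retry_after <- resp$headers[["Retry-After"]]
      # Assumed fallback of 1 second if the header is missing.
      wait <- if (is.null(retry_after)) 1 else as.numeric(retry_after)
      sleep(wait)
    }
  }
  stop("Still throttled after ", max_tries, " attempts")
}

# A fake sender that is throttled once and then succeeds.
calls <- 0
fake_send <- function() {
  calls <<- calls + 1
  if (calls == 1) {
    list(status = 429, headers = list(`Retry-After` = "2"))
  } else {
    list(status = 200, headers = list())
  }
}

# Record the requested waits instead of actually sleeping.
waits <- numeric(0)
resp <- with_throttle_retry(fake_send, sleep = function(s) waits <<- c(waits, s))
resp$status  # 200
waits        # 2
```

Injecting `sleep` keeps the retry logic testable without real delays.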

## Online Documentation

The API team has thoughtfully provided a Swagger UI page for the new version of the API at https://search.patentsview.org/swagger-ui/.
Think of it as an online version of Postman, already loaded with the API's new endpoints and responses.
The Swagger UI page documents which fields each endpoint returns on a successful call
(HTTP response code 200).
You can even send requests and see actual API responses if you enter your API key and press
an endpoint's "Try it out" and "Execute" buttons. Even error responses can be informative,
usually pointing out what went wrong.

In a similar format, the [updated API documentation](https://search.patentsview.org/docs/docs/Search%20API/SearchAPIReference/#endpoints)
lists what each endpoint does. Additionally, the R package's `fieldsdf` data frame has been updated
and now lists the new set of endpoints and fields that can be queried and/or returned. The R package's
reference pages have also been updated.
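
To browse the updated `fieldsdf` from R, something like the following should work. It assumes only that `fieldsdf` is available as package data with `endpoint` and `field` columns; `fields_for()` is a small hypothetical helper, and the package's own `get_fields()` is the more complete tool for building a `fields` argument.

```r
# Hypothetical helper: the sorted field names one endpoint offers, given a
# data frame shaped like patentsview::fieldsdf (endpoint and field columns).
fields_for <- function(endpoint, df) {
  sort(df$field[df$endpoint == endpoint])
}

if (requireNamespace("patentsview", quietly = TRUE)) {
  fdf <- patentsview::fieldsdf
  # Endpoints known to the installed version of the package
  print(sort(unique(fdf$endpoint)))
  # The first few fields available from the patent endpoint
  print(head(fields_for("patent", fdf), 20))
}
```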

## Final Thoughts

As shown in the updated Top Assignees vignette, there will now be occasions where multiple API calls are needed to retrieve the same data that a single API call returned in the original version of the API and R package.
Additionally, the reworked rOpenSci post explains what changes had to be made because assignee latitude
and longitude are no longer available from the patent endpoint.

Issues or questions about the API itself can be raised in the [API's portal](https://patentsview-support.atlassian.net/servicedesk/customer/portals) or in the
API's [forum](https://patentsview.org/forum). Issues with the R package can be raised in the [patentsview repo](https://github.com/ropensci/patentsview/issues).