Add `fetch_entity_names` method #230

jm-rivera · 2025-03-31T08:36:06Z

This PR is part of a group of PRs that bring some key features from the Data Commons website API to the client library (e.g., #229, #231).

Fetch entity names: this implements a few DC website features:

- The name endpoint of the internal API (/api/place/name).
- The internal code that fetches English names from the name property.
- The internal code that fetches the i18n name using the nameWithLanguage property. The website automatically resolves the user's locale; this method takes the language as an argument.

In short, this PR:

Adds a fetch_entity_names method to NodeEndpoint. This method takes one or more entity_dcids and fetches their names. It defaults to English, but other languages (fr, es, etc.) can be selected using the optional language argument. Optionally, users can request to fall back to a particular language if the requested language is not available (to avoid sparse results in some languages).

Example usage:

from datacommons_client import DataCommonsClient

dc = DataCommonsClient(dc_instance="datacommons.one.org")

names = dc.node.fetch_entity_names(
    entity_dcids=[
        "africa",
        "country/GTM",
        "country/USA",
        "wikidataId/Q2608785",
    ],
    language="de",
)


{'africa': 'Afrika',
 'country/GTM': 'Guatemala',
 'country/USA': 'Vereinigte Staaten'}

With the optional fallback to English, wikidataId/Q2608785 (which has no German name in the result above) is returned with its English name:

names = dc.node.fetch_entity_names(
    entity_dcids=[
        "africa",
        "country/GTM",
        "country/USA",
        "wikidataId/Q2608785",
    ],
    language="de",
    fallback_language="en",
)

{'africa': 'Afrika',
 'country/GTM': 'Guatemala',
 'country/USA': 'Vereinigte Staaten',
 'wikidataId/Q2608785': 'La Democracia'}

Behind the scenes, this method uses name or nameWithLanguage depending on what the user requests. name is much cheaper than nameWithLanguage since it only contains one value per entity. The method only requests and parses nameWithLanguage if a non-English language is selected.
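The language selection and fallback described above could be sketched roughly as follows. This is only an illustration, not the code in this PR: `pick_name` is a hypothetical helper, and the sketch assumes each nameWithLanguage value is a string of the form `<name>@<language code>` (for example `Afrika@de`).

```python
from typing import Optional


def pick_name(values: list[str], language: str,
              fallback_language: Optional[str] = None) -> Optional[tuple[str, str]]:
    """Return (name, language), trying `language` first, then the fallback."""
    by_language: dict[str, str] = {}
    for raw in values:
        # Assumed format: "<name>@<language code>"; skip anything else.
        if "@" not in raw:
            continue
        name, lang = raw.rsplit("@", 1)
        # Keep the first name seen for each language.
        by_language.setdefault(lang, name)
    for candidate in (language, fallback_language):
        if candidate is not None and candidate in by_language:
            return by_language[candidate], candidate
    return None  # nothing available: leave the entity out of the result


values = ["Africa@en", "Afrika@de", "Afrique@fr"]
print(pick_name(values, "de"))  # ('Afrika', 'de')
print(pick_name(["La Democracia@en"], "de"))  # None
print(pick_name(["La Democracia@en"], "de", fallback_language="en"))  # ('La Democracia', 'en')
```

Returning the language alongside the name keeps it possible to tell a requested-language hit from a fallback hit.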

A previous version used English as the fallback, enabled via a boolean. The latest version implements a more generic, optional fallback_language argument, which takes a string (defaults to None, meaning no fallback).

dwnoble

Thanks Jorge!

I think this feature is useful, and is a cleaner implementation than we have on the website today.

But there's still some ambiguity around:

- If I request a particular language and have English as a fallback, how do I tell from the response whether I'm seeing the requested language or the fallback?
- Some smaller places may not have a "name" field in the KG today, for example https://datacommons.org/browser/wikidataId/Q4104190 . This is arguably a data issue (maybe in this example we should be setting "name" to the Bulgarian name), but the issue remains. I think on the website we return the DCID instead of the name, but I'm not sure that's the best behavior.

What do you think?

datacommons_client/endpoints/node.py

jm-rivera · 2025-04-01T14:27:38Z

Thanks @dwnoble!

> If I request a particular language and have English as a fallback, how do I tell from the response whether I'm seeing the requested language or the fallback?

It's a very fair question. I don't have a simple suggestion, but I can see two potential routes:

- It's on my list to implement logging. Right now everything happens a bit too silently for my liking, so one solution could be logging an info or warning message when the fallback is used.
- We add an additional key to the response that tracks any entities for which the fallback was used. We would still keep a convenient mapping of entity_id -> name, but if the 'fallback' key is present (or if it has items, depending on how we implement it), then the fallback was used for the entities in that list.

What do you think of either option? Or maybe you see another route?
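To make the two routes concrete, here is a rough sketch that combines them. It is not the implementation in this PR: `merge_with_fallback` is a hypothetical helper, and it assumes the names in the requested language and in the fallback language have already been fetched into two plain dictionaries.

```python
import logging

logger = logging.getLogger(__name__)


def merge_with_fallback(primary: dict[str, str], fallback: dict[str, str],
                        fallback_language: str) -> dict:
    """Fill gaps in `primary` from `fallback` and record which entities were filled."""
    merged: dict = dict(primary)
    used_fallback = [dcid for dcid in fallback if dcid not in primary]
    for dcid in used_fallback:
        merged[dcid] = fallback[dcid]
    if used_fallback:
        # Route 1: make the fallback visible in the logs.
        logger.warning("Used fallback language %r for: %s",
                       fallback_language, ", ".join(used_fallback))
        # Route 2: track the affected entities under an extra key.
        merged["fallback"] = used_fallback
    return merged


german = {"africa": "Afrika", "country/GTM": "Guatemala"}
english = {"africa": "Africa", "country/GTM": "Guatemala",
           "wikidataId/Q2608785": "La Democracia"}
result = merge_with_fallback(german, english, "en")
print(result["fallback"])  # ['wikidataId/Q2608785']
```

One drawback of the extra key is that it shares a namespace with the DCIDs, so callers iterating over the mapping need to skip it.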

> Some smaller places may not have a "name" field in the KG today, for example https://datacommons.org/browser/wikidataId/Q4104190 . This is arguably a data issue (maybe in this example we should be setting "name" to the Bulgarian name), but the issue remains. I think on the website we return the DCID instead of the name, but I'm not sure that's the best behavior.

In most of the REST API, if something doesn't exist it comes back empty. We don't currently infer or fill missing data on the client library side. I think I'd be in favour of keeping that behaviour consistent.

dwnoble

Agreed that things happen a bit too silently right now, and that our website API currently makes some assumptions that are probably not obvious to users. Extra logging sounds good, but I'd lean toward adding more metadata to the response. What do you think about something like this?

{
  "africa": {
    "value": "Afrika",
    "property": "nameWithLanguage", // Could be 'name' if English was requested or used as fallback
    "language": "de" // The actual language of the value returned
  },
  "wikidataId/Q2608785": { // Example using fallback
    "value": "La Democracia", // The fallback name
    "property": "name", // Assuming 'name' is English and was the fallback
    // Optional: Explicitly note the request vs result?
    // "requestedLanguage": "de"
  }
  // ... other entities
}
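On the Python side, a response shaped like this could be modelled with a small dataclass. The sketch below is an assumption for illustration only: `NameResult` and its fields mirror the JSON above, not necessarily the class that ends up in the library, and the `"en"` language on the fallback entry is assumed since the JSON leaves it out.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class NameResult:
    """One resolved name plus where it came from (hypothetical shape)."""
    value: str      # the name itself
    language: str   # the language actually returned
    property: str   # "name" or "nameWithLanguage"


results = {
    "africa": NameResult("Afrika", "de", "nameWithLanguage"),
    # Fallback case: assumed to be English, taken from the name property.
    "wikidataId/Q2608785": NameResult("La Democracia", "en", "name"),
}

print(asdict(results["africa"]))
# {'value': 'Afrika', 'language': 'de', 'property': 'nameWithLanguage'}
print(results["wikidataId/Q2608785"].language)  # en
```

Comparing `language` against the requested language is then enough to tell whether the fallback was used for a given entity.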

jm-rivera · 2025-04-07T17:41:02Z

@dwnoble Here's an implementation (with additional tests) for the type of return you suggested.

> Agreed that things happen a bit too silently right now, and that our website API currently makes some assumptions that are probably not obvious to users. Extra logging sounds good, but I'd lean toward adding more metadata to the response. What do you think about something like this?

It's definitely more explicit, and overall I think it's the right way to go. At first, I wanted something more directly usable for analysts who may want to quickly add names to the data they got. With this more complete dictionary, they would have to unpack it with {k: v['value'] for k, v in result.items()} to get a dictionary that maps each DCID directly to its name. It's a small inconvenience, but at least there would be no ambiguity.
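For illustration, the unpacking step looks like this on a plain-dictionary version of the response. The sample data is copied from the examples in this thread, and the `"language": "en"` entry for the fallback case is an assumption.

```python
# Richer response: one metadata dictionary per DCID.
result = {
    "africa": {"value": "Afrika", "property": "nameWithLanguage", "language": "de"},
    "wikidataId/Q2608785": {"value": "La Democracia", "property": "name", "language": "en"},
}

# Collapse it back to a simple DCID -> name mapping.
names = {dcid: entry["value"] for dcid, entry in result.items()}
print(names)
# {'africa': 'Afrika', 'wikidataId/Q2608785': 'La Democracia'}
```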

Thanks for the suggestion, Dan!

datacommons_client/endpoints/node.py

dwnoble · 2025-04-07T18:47:57Z

Accidentally hit close- apologies :)

Fetch names from properties

92223a7

jm-rivera requested review from dwnoble and keyurva March 31, 2025 08:36

jm-rivera self-assigned this Mar 31, 2025

This was referenced Mar 31, 2025

Add fetch_available_statistical_variables #229

Merged

Add parents/ancestry methods #231

Merged

dwnoble reviewed Mar 31, 2025

jm-rivera added 2 commits April 1, 2025 15:46

Merge branch 'master' into add-fetch-names

bdd0535

property constants and default changes

8c27cf8

- magic strings to constants
- change fallback logic to any language (not en boolean)

jm-rivera requested a review from dwnoble April 4, 2025 06:23

dwnoble reviewed Apr 4, 2025

jm-rivera added 2 commits April 7, 2025 18:58

Merge branch 'master' into add-fetch-names

6ead002

A more complete return object

bb8c0a3

jm-rivera requested a review from dwnoble April 7, 2025 17:41

dwnoble reviewed Apr 7, 2025

dwnoble closed this Apr 7, 2025

dwnoble reopened this Apr 7, 2025

Add Name class

39ce7f1

jm-rivera requested a review from dwnoble April 7, 2025 19:13

Merge remote-tracking branch 'upstream/master' into add-fetch-names

44d78a2

dwnoble approved these changes Apr 7, 2025

jm-rivera merged commit eb6adef into datacommonsorg:master Apr 8, 2025
2 checks passed

jm-rivera deleted the add-fetch-names branch April 8, 2025 08:27
