wrong population size (EA202) for Koreans (Ed1) #321

dorshilton · 2022-11-01T13:31:18Z

The value for this variable is 200, based on the population of a specific island in 1947. Surely this should be modified (the population was closer to 20 million around that time).

dorshilton · 2022-11-10T09:19:14Z

A similar issue affects Brazilians (Cf4): population size (EA202) equals 2700, presumably because this is the population of a village near Sao Paulo in the ethnographic source. It clearly should be larger (also, EA031 is 50000+).
There appear to be more cases with a similar problem. Here are more ea_ids for which EA031 is 50000+ but population size is smaller:
Ca7
Ch1
Ch5
Ea13
Ed6
Ef11
Eg10
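
For anyone who wants to reproduce this kind of check, here is a minimal sketch in Python. It assumes the data has already been loaded into plain dictionaries; the field names (`soc_id`, `EA202`, `EA031_min`) and the `Zz1`/`Zz2` rows are hypothetical and only illustrate the idea. Only the Cf4 values come from this thread, and `EA031_min` stands for the lower bound of the EA031 category, which would have to be derived from the actual coding.

```python
# Hypothetical rows: society id, EA202 population, and the lower bound of the
# EA031 category. Only the Cf4 values are taken from this thread; Zz1 and Zz2
# are made up for illustration.
ROWS = [
    {"soc_id": "Cf4", "EA202": 2700, "EA031_min": 50000},
    {"soc_id": "Zz1", "EA202": 120000, "EA031_min": 50000},
    {"soc_id": "Zz2", "EA202": None, "EA031_min": 50000},
]


def inconsistent_societies(rows):
    """Return ids whose EA202 population is below the EA031 lower bound."""
    flagged = []
    for row in rows:
        population = row["EA202"]
        community_min = row["EA031_min"]
        if population is None or community_min is None:
            continue  # missing values cannot be compared
        if population < community_min:
            flagged.append(row["soc_id"])
    return flagged


print(inconsistent_societies(ROWS))
```

Rows with a missing value are skipped rather than flagged, so the result only lists cases where both values are present and contradict each other.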

SimonGreenhill · 2022-11-11T00:32:15Z

Thanks Dor, these look like something we should check in more detail. I'll add it to the to-do list.

hrncirv · 2023-11-16T17:31:25Z

I looked in the EA to check these values.

According to Murdock (1967), Ethnographic Atlas: A Summary, the population sizes are:
Koreans (Ed1): 30,000,000
Amhara (Ca7): 2,000,000
Bulgarians (Ch5): 7,000,000

Other population data I found:
Brazilians (Cf4): 2,700 in Cruz das Almas in 1940. SOURCE: Ethnology 1963, Vol. 2, No. 1
Uttar Pradesh (Ef11): 1,400 in 1941 (the village of Madhopur or Senapur). SOURCE: Ethnology 1971, Vol. 10, No. 1.

For Serbs (Ch1), Punjabi (Ea13), Min Chinese (Ed6) and Telugu (Eg10), I found no population data in the EA.

@SimonGreenhill Do you know what (other) sources were used for this variable?

SimonGreenhill · 2023-11-18T02:12:02Z

I suspect the number recorded here is for this subcase, i.e. Kanghwa Island. Perhaps we should think about renaming these cases to, e.g., Korean (Kanghwa Island) to be more transparent.

I'm not sure that updating this value to the modern population is the right thing to do (because where do we stop - do we recode other variables that are different?).

hrncirv · 2023-11-22T08:53:35Z

I looked at the description of the EA202 variable and this is probably not a bug, but a feature:

"Population of ethnic group as a whole, unless otherwise noted in Comments. Note that source differs by society; EA bibliography is source where possible, otherwise Ember (1992)."
https://d-place.org/parameters/EA202#1/30/152

So I'd leave it as it is, as long as the culture in question has a disclaimer that the values relate to a specific location.

What could theoretically be done is to split this variable into two: "Population of ethnic group" and "Population of subcase". Each would have a bunch of NAs, but it wouldn't be as confusing.
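
A rough sketch of what that split could look like, in Python. This is not how D-PLACE stores the data; the `PopulationRecord` class, its field names, and the `split_population` helper are all hypothetical. The Ed1 value of 200 and the Kanghwa Island label are taken from this thread (the latter is only suspected above), and `Zz1` is a made-up society.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PopulationRecord:
    """Hypothetical two-field replacement for a single EA202 value."""
    soc_id: str
    ethnic_group_population: Optional[int]  # None stands in for NA
    subcase_population: Optional[int]       # None stands in for NA
    subcase_note: str = ""


def split_population(soc_id, value, refers_to_subcase, note=""):
    """Place one EA202 value into the field it actually describes."""
    if refers_to_subcase:
        return PopulationRecord(soc_id, None, value, note)
    return PopulationRecord(soc_id, value, None, note)


# Ed1: the recorded value appears to describe a single island, not all Koreans.
korean = split_population("Ed1", 200, True, "Kanghwa Island")
# Zz1: made-up society whose value describes the whole ethnic group.
other = split_population("Zz1", 15000, False)

print(korean)
print(other)
```

Each record fills exactly one of the two fields, which matches the "bunch of NAs" trade-off: more missing values, but no single column that means two different things.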

xrotwang · 2023-11-22T09:03:11Z

I'd even go so far as saying that confusion is a feature and not a bug here. It basically signals that one needs to do a bit of research before using these numbers in any analysis. Preferring the smaller numbers also seems to be useful, because extrapolating D-PLACE data to - essentially - the level of big national languages is burdened with all sorts of theoretical problems, and getting the speaker numbers for these is probably the smallest issue here :)

dorshilton · 2023-11-22T09:34:33Z

I'd recommend signaling these exceptions more clearly to researchers. Generally speaking, I don't think it's a good idea for one variable to refer to two things, and one can't expect all researchers to review each datapoint.

xrotwang · 2023-11-22T09:49:57Z

I agree that a variable being interpreted differently for different datapoints is generally bad. But since d-place only aggregates data from existing sources, the question becomes whether this is something the editors can/should change or whether we simply document this issue in the variable description.

SimonGreenhill mentioned this issue Nov 1, 2022

Not updated: geographic coordinates for societies in 'tdwg societies' file #280

Closed

xrotwang assigned angela-mc and hrncirv Nov 14, 2023
