review mappings in CILI from PWN30 to PWN31 #17
@@ -198,6 +198,7 @@ i199 00039507-s
 i200 00039705-a
 i201 00040060-s
 i202 00040189-s
+i202 00040305-s
 i203 00040548-a
 i204 00040757-s
 i205 00040908-a
@@ -211,7 +212,7 @@ i212 00042063-a
 i213 00042258-a
 i214 00042449-a
 i215 00042677-a
-i216 00035037-s
+i216 00042912-s
 i217 00043057-s
 i218 00043202-s
 i219 00043345-a
@@ -3755,7 +3756,7 @@ i3760 00678636-s
 i3761 00678741-s
 i3762 00678855-s
 i3763 00678969-a
-i3764 00679361-s
+i3764 00679196-s
 i3765 00679361-s
 i3766 00679539-s
 i3767 00679725-a
@@ -4224,7 +4225,7 @@ i4230 00769908-s
 i4231 00770017-a
 i4232 00770517-s
 i4233 00770693-s
-i4234 00766556-s
+i4234 00770909-s
 i4235 00771186-s
 i4236 00771658-s
 i4237 00771957-s
@@ -4294,7 +4295,7 @@ i4300 00783570-s
 i4301 00783911-s
 i4302 00784134-s
 i4303 00784271-s
-i4304 00805750-s
+i4304 00784503-s
 i4305 00784620-s
 i4306 00784727-a
 i4307 00785098-s
@@ -4409,7 +4410,7 @@ i4415 00805262-s
 i4416 00805445-s
 i4417 00805518-s
 i4418 00805591-s
-i4419 00805871-s
+i4419 00805750-s
 i4420 00805871-s
 i4421 00805968-s
 i4422 00806085-s
@@ -5081,7 +5082,7 @@ i5088 00931766-a
 i5089 00932022-s
 i5090 00932115-s
 i5091 00932405-a
-i5092 00041424-s
+i5092 00932684-s
 i5093 00932808-a
 i5094 00933056-s
 i5095 00933157-s
@@ -10599,7 +10600,7 @@ i10612 01943615-s
 i10613 01943804-s
 i10614 01944007-s
 i10615 01944376-s
-i10616 01939402-a
+i10616 01944611-a
 i10617 01944939-s
 i10618 01945125-s
 i10619 01945276-a
@@ -12781,7 +12782,7 @@ i12796 02318870-s
 i12797 02318973-s
 i12798 02319122-s
 i12799 02319224-a
-i12800 02319740-a
+i12800 02319740-s
 i12801 02319930-s
 i12802 02320034-s
 i12803 02320130-s
@@ -34652,7 +34653,7 @@ i34684 02605001-v
 i34685 02605322-v
 i34686 02605525-v
 i34687 02605633-v
-i34688 00680696-v
+i34688 02605751-v
 i34689 02605875-v
 i34690 02606079-v
 i34691 02606252-v
@@ -40360,6 +40361,8 @@ i40392 00950684-n
 i40393 00950858-n
 i40394 00950950-n
 i40395 00951332-n
+i40396 00951435-n
+i40396 00951878-n
 i40397 00952059-n
 i40398 00952181-n
 i40399 00952328-n
@@ -49994,6 +49997,7 @@ i50029 02716223-n
 i50030 02716355-n
 i50031 02716453-n
 i50033 02716628-n
+i50034 02716785-n
This is a genuine change in PWN 3.1. A synset was split into two new synsets, and both should get a new ILI ID.

If I understood the comment correctly, you are talking about:
WN30 02713992-n {'roundel%1:06:01::', 'annulet%1:06:02::'}
WN31 02716785-n {'roundel%1:06:01::'}
WN31 02716929-n {'annulet%1:06:02::'}
In this PR, I was not sure whether I could propose new ILIs. We have several options here. We could leave it unmapped: the WN30 concept was split into more specialized ones in PWN31, as you confirmed. But then what is i50034? It points to 02713992-n, a synset whose existence we are rejecting according to the changes in WN31. Isn't it odd to have i50034 in the CILI at all? If we map i50034 to WN31 02716785-n, we are saying that WN30 02713992-n corresponds to WN31 02716785-n... It seems we need to think harder about the real meaning of the mappings, or go back to a more elaborate schema for mapping semantic networks (e.g. the ones used in EuroWordNet or by SUMO from @apease).

Considering the limited expressivity of the current mappings adopted by CILI, we can mark i50034 as deprecated and keep it pointing only to WN30 02713992-n, then create two new ILIs for 02716785-n and 02716929-n. The unfortunate limitation is that we lose the knowledge that both WN31 concepts are specializations of 02713992-n.

Yes, we are quite inflexible about the mapping but can easily create more identifiers. The ILI does not capture semantic relations anyway; that is the responsibility of individual wordnets. We should introduce two new ILI identifiers for these more specific senses.
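The proposal in this thread (deprecate i50034, keep its WN30 target, and give each WN31 synset its own identifier) could be sketched roughly as follows. This is only an illustration: the record layout, the `status` field, the function name `deprecate_and_split`, and the placeholder IDs `i_new_1` and `i_new_2` are all assumptions, not part of CILI. Real identifiers would have to be allocated by the ILI maintainers.

```python
def deprecate_and_split(table, old_ili, replacements):
    """Mark old_ili as deprecated and add one new entry per WN31 synset.

    table:        dict of ILI id -> {"status", "wn30", "wn31"} (hypothetical layout)
    replacements: dict of new (placeholder) ILI id -> WN31 synset id
    """
    old = table[old_ili]
    old["status"] = "deprecated"
    old["wn31"] = []  # the deprecated entry keeps only its WN30 target
    for new_ili, synset in replacements.items():
        if new_ili in table:
            raise ValueError(f"{new_ili} already exists")
        table[new_ili] = {"status": "active", "wn30": [], "wn31": [synset]}
    return table


# Synset IDs are the ones quoted in the thread; the new ILI ids are placeholders.
table = {"i50034": {"status": "active", "wn30": ["02713992-n"], "wn31": []}}
deprecate_and_split(
    table,
    "i50034",
    {"i_new_1": "02716785-n", "i_new_2": "02716929-n"},
)
for ili, record in table.items():
    print(ili, record)
```

As noted in the thread, a flat table like this cannot record that both new concepts specialize 02713992-n; that information would have to live in the individual wordnets.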
 i50035 02717050-n
 i50036 02717226-n
 i50037 02717446-n
@@ -55752,7 +55756,7 @@ i55790 03690633-n
 i55791 03690812-n
 i55792 03690966-n
 i55793 03691146-n
-i55794 03872233-n
+i55794 03691288-n
 i55795 03691456-n
 i55796 03691689-n
 i55797 03691796-n
@@ -55780,7 +55784,7 @@ i55818 03694673-n
 i55819 03694769-n
 i55820 03694896-n
 i55821 03695026-n
-i55822 03872586-n
+i55822 03695166-n
 i55823 03695331-n
 i55824 03695494-n
 i55825 03695605-n
@@ -56278,7 +56282,7 @@ i56316 03782816-n
 i56317 03783101-n
 i56318 03783287-n
 i56319 03783494-n
-i56320 03872233-n
+i56320 03783668-n
 i56321 03783835-n
 i56322 03783992-n
 i56323 03784133-n
@@ -56588,7 +56592,7 @@ i56626 03835103-n
 i56627 03835397-n
 i56628 03835494-n
 i56629 03835651-n
-i56630 03872233-n
+i56630 03835818-n
 i56631 03835988-n
 i56632 03836122-n
 i56633 03836375-n
@@ -72310,6 +72314,7 @@ i72351 06836139-n
 i72352 06836320-n
 i72353 06836441-n
 i72354 06836640-n
+i72354 06836790-n
I think this is a distinct and novel sense.
 i72355 06836975-n
 i72356 06837091-n
 i72357 06837277-n
@@ -90652,6 +90657,7 @@ i90719 10229489-n
 i90720 10229738-n
 i90721 10230113-n
 i90722 10230249-n
+i90722 10230422-n
This is a distinct and novel sense.
 i90723 10230581-n
 i90724 10230706-n
 i90725 10230873-n
@@ -114599,14 +114605,14 @@ i114669 14800682-n
 i114670 14800845-n
 i114671 14800963-n
 i114672 14801083-n
-i114673 14802098-n
-i114674 14802098-n
+i114673 14801263-n
+i114674 14801347-n
 i114675 14801436-n
-i114676 14802098-n
-i114677 14802098-n
+i114676 14801600-n
+i114677 14801682-n
 i114678 14801765-n
-i114679 14802098-n
-i114680 14802098-n
+i114679 14801927-n
+i114680 14802015-n
 i114681 14802098-n
 i114682 14802178-n
 i114683 14802595-n
This creates two synsets in PWN31 with the same ID
Indeed, the PWN30 synset was split into two:
WN30 00040058-s {'supine%5:00:00:passive:01', 'unresisting%5:00:00:passive:01', 'resistless%5:00:00:passive:01'}
offering no resistance; "resistless hostages"; "No other colony showed such supine, selfish helplessness in allowing
her own border citizens to be mercilessly harried"- Theodore Roosevelt
WN31 00040189-s {'unresisting%5:00:00:passive:01', 'resistless%5:00:00:passive:01'}
offering no resistance; "resistless hostages"
WN31 00040305-s {'supine%5:00:00:passive:01'}
passive as a result of indolence or indifference; "No other colony showed such supine, selfish helplessness in allowing her own border citizens to be mercilessly harried"- Theodore Roosevelt
If we consider the definition only, we can say that WN30 00040058-s maps to WN31 00040189-s. But one of its senses and one of its examples are now in another synset. There are some other cases similar to that, so let us first discuss that case, ok? @fcbond @jmccrae
WN30 00040058-s has only one similarTo relation, with 00039592-a. This relation was projected onto WN31 00040305-s and WN31 00040189-s, which are both similarTo WN31 00039705-a. Moreover, both WN31 synsets also have an antonym relation to 00038863-a. This means they cannot be differentiated by their relations in WN31, so the split is suspicious: they are indistinguishable (by their relations) in both WN31 and WN30. Yes, the glosses and examples differ, but relations are the real WordNet criterion for defining and distinguishing a synset.
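The relation-based argument above can be checked mechanically. The following is a minimal sketch that hard-codes only the two relations mentioned in the comment; the dictionary layout and the helper name `distinguishing_relations` are assumptions made for illustration, and a real check would read the relations from the wordnet itself.

```python
# Relations of the two WN31 synsets, exactly as described in the comment above.
relations = {
    "00040189-s": {("similarTo", "00039705-a"), ("antonym", "00038863-a")},
    "00040305-s": {("similarTo", "00039705-a"), ("antonym", "00038863-a")},
}


def distinguishing_relations(a, b, relations):
    """Return the relations held by exactly one of the two synsets."""
    return relations[a] ^ relations[b]  # symmetric difference


diff = distinguishing_relations("00040189-s", "00040305-s", relations)
print(diff if diff else "indistinguishable by relations")
```

An empty result means that nothing but gloss, examples, and member senses separates the two synsets, which is the situation described for this split.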
BTW, the ILI need not be 1-1 with PWN31, right? I am assuming that one ILI can map to more than one synset in the same wordnet. So if we consider i202 to be a concept that covers both 00040189-s and 00040305-s in PWN31, that is fine, right?
It would seem that 00040058-s and 00040189-s are the same and should both be mapped to i202. However, 00040305-s is a novel sense and will need to be assigned a new ILI.
The gist at https://gist.githubusercontent.com/ekaf/8cd78cce7005abd923c7ed2af47238e2 pretty prints the wordnet splits dictionary from NLTK, with information about how many senses are carried over into each part of the split. With WN 3.1 it outputs this file:
out-wnsplits.txt, listing the 33 splits since WN 3.0. The first line is:
00040058-s -> 00040305-s (1 sensekey/s) + 00040189-s (2 sensekey/s)
This shows that 00040305-s contains one sense from the source synset, while 00040189-s contains two.
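To make the counting explicit, here is a small self-contained sketch that derives a line of that shape from the sense keys quoted earlier in this thread. It does not use NLTK or the gist's code; the function name `describe_split` and the input layout are assumptions, and the parts are listed simply in the order the targets are given.

```python
def describe_split(source, source_keys, targets):
    """Format how many of the source's sense keys each target synset carries."""
    parts = []
    for target, keys in targets.items():
        carried = len(source_keys & keys)  # sense keys shared with the source
        parts.append(f"{target} ({carried} sensekey/s)")
    return f"{source} -> " + " + ".join(parts)


# Sense keys as quoted in the comment on WN30 00040058-s.
wn30_keys = {
    "supine%5:00:00:passive:01",
    "unresisting%5:00:00:passive:01",
    "resistless%5:00:00:passive:01",
}
wn31_targets = {
    "00040305-s": {"supine%5:00:00:passive:01"},
    "00040189-s": {
        "unresisting%5:00:00:passive:01",
        "resistless%5:00:00:passive:01",
    },
}

line = describe_split("00040058-s", wn30_keys, wn31_targets)
print(line)
# 00040058-s -> 00040305-s (1 sensekey/s) + 00040189-s (2 sensekey/s)
```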
While the previous ILI mappings contained no splits, this PR introduces the following 5:
i202 00040189-s,00040305-s
i40396 00951435-n,00951878-n
i63228 07059027-n,07059160-n
i72354 06836640-n,06836790-n
i90722 10230249-n,10230422-n
So it seems that until now, mappers have made an effort to select only the single most adequate target for each source. I think there is a good reason to avoid creating splits: having two targets yields both a true and a false positive for each sense involved.
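A list like the one above can be produced by grouping the mapping lines by ILI identifier. This is a minimal sketch, assuming the mapping file consists of whitespace-separated `ILI synset` pairs, one per line, as in the diff; the function name `find_splits` is hypothetical and the sample uses a few lines from the first hunk.

```python
from collections import defaultdict


def find_splits(lines):
    """Return {ili: [synsets]} for every ILI mapped to more than one synset."""
    targets = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        ili, synset = parts
        if synset not in targets[ili]:
            targets[ili].append(synset)
    return {ili: syns for ili, syns in targets.items() if len(syns) > 1}


sample = """\
i201 00040060-s
i202 00040189-s
i202 00040305-s
i203 00040548-a
""".splitlines()

splits = find_splits(sample)
for ili, synsets in splits.items():
    print(ili, ",".join(synsets))
# i202 00040189-s,00040305-s
```

A check of this kind could be run before merging to flag any one-to-many mapping for explicit discussion.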