review mappings in CILI from PWN30 to PWN31 #17

Open
wants to merge 4 commits into
base: master
44 changes: 25 additions & 19 deletions ili-map-pwn31.tab
@@ -198,6 +198,7 @@ i199 00039507-s
i200 00039705-a
i201 00040060-s
i202 00040189-s
+i202 00040305-s
Member

This creates two synsets in PWN31 with the same ID

Member Author

@arademaker arademaker Nov 26, 2022

Indeed, the PWN30 synset was split into two:

% rg "i202\t" ili-map-pwn3*
ili-map-pwn31.tab
201:i202	00040189-s
202:i202	00040305-s

ili-map-pwn30.tab
202:i202	00040058-s

WN30 00040058-s {'supine%5:00:00:passive:01', 'unresisting%5:00:00:passive:01', 'resistless%5:00:00:passive:01'}
offering no resistance; "resistless hostages"; "No other colony showed such supine, selfish helplessness in allowing
her own border citizens to be mercilessly harried"- Theodore Roosevelt

WN31 00040189-s {'unresisting%5:00:00:passive:01', 'resistless%5:00:00:passive:01'}
offering no resistance; "resistless hostages"

WN31 00040305-s {'supine%5:00:00:passive:01'}
passive as a result of indolence or indifference; "No other colony showed such supine, selfish helplessness in allowing her own border citizens to be mercilessly harried"- Theodore Roosevelt

If we consider the definition only, we can say that WN30 00040058-s maps to WN31 00040189-s. But one of its senses and one of its examples are now in another synset. There are other similar cases, so let us discuss this one first, ok? @fcbond @jmccrae

WN30 00040058-s has only one similarTo relation, with 00039592-a. This relation was projected to WN31 00040305-s and WN31 00040189-s, which are both similarTo WN31 00039705-a. Moreover, both WN31 synsets also have an antonym relation to 00038863-a. This means the two cannot be differentiated by their relations in WN31, so the split is suspicious: they are indistinguishable (by their relations) in both WN31 and WN30. Yes, the glosses and examples differ, but relations are the real WordNet criterion for defining and distinguishing a synset.
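The argument above can be checked mechanically. The following is only a minimal sketch: the relation data is typed in by hand from this comment (it is not read from the WordNet database), and the function name `distinguishable_by_relations` is a hypothetical helper.

```python
# Outgoing relations of the two WN31 synsets, copied from the comment above.
# Assumption: only the relations mentioned in the comment are listed here.
relations = {
    "00040189-s": {("similarTo", "00039705-a"), ("antonym", "00038863-a")},
    "00040305-s": {("similarTo", "00039705-a"), ("antonym", "00038863-a")},
}


def distinguishable_by_relations(a, b, rels):
    """Return True if synsets a and b differ in at least one listed relation."""
    return rels[a] != rels[b]


# The two synsets carry identical relation sets, so this prints False.
print(distinguishable_by_relations("00040189-s", "00040305-s", relations))
```

With identical relation sets, only the glosses, examples and sense keys tell the two synsets apart.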

Member Author

BTW, the ILI need not be 1-1 with PWN31, right? I am assuming that one ILI can map to more than one synset in the same wordnet. So if we consider i202 to be a concept that covers both 00040189-s and 00040305-s in PWN31, that is fine, right?

Member

It would seem that 00040058-s and 00040189-s are the same and should both be mapped to i202. However, 00040305-s is a novel sense and will need to be assigned a new ILI.

@ekaf ekaf Jan 31, 2023

The gist at https://gist.githubusercontent.com/ekaf/8cd78cce7005abd923c7ed2af47238e2 pretty-prints the wordnet splits dictionary from NLTK, with information about how many senses are carried over into each part of the split. With WN 3.1 it outputs the file out-wnsplits.txt, listing the 33 splits since WN 3.0. The first line is:

00040058-s -> 00040305-s (1 sensekey/s) + 00040189-s (2 sensekey/s)

This shows that 00040305-s contains one sense from the source synset, while 00040189-s contains two.
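For illustration, a line of that shape can be derived from the sense keys quoted earlier in this thread. This is a rough sketch, not the code from the gist: the sense-key sets are copied by hand from the WN30/WN31 listings above, and `describe_split` is a hypothetical helper that does not use NLTK.

```python
# Sense keys copied from the synset listings earlier in this thread.
wn30_source_keys = {
    "supine%5:00:00:passive:01",
    "unresisting%5:00:00:passive:01",
    "resistless%5:00:00:passive:01",
}
wn31_parts = {
    "00040305-s": {"supine%5:00:00:passive:01"},
    "00040189-s": {
        "unresisting%5:00:00:passive:01",
        "resistless%5:00:00:passive:01",
    },
}


def describe_split(source_id, source_keys, parts):
    """Summarise how many source sense keys ended up in each target synset."""
    pieces = []
    for target_id, target_keys in parts.items():
        shared = len(source_keys & target_keys)  # sense keys carried over
        if shared:
            pieces.append(f"{target_id} ({shared} sensekey/s)")
    return f"{source_id} -> " + " + ".join(pieces)


print(describe_split("00040058-s", wn30_source_keys, wn31_parts))
# 00040058-s -> 00040305-s (1 sensekey/s) + 00040189-s (2 sensekey/s)
```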

While the previous ILI mappings contained no splits, this PR introduces the following 5:

i202 00040189-s,00040305-s
i40396 00951435-n,00951878-n
i63228 07059027-n,07059160-n
i72354 06836640-n,06836790-n
i90722 10230249-n,10230422-n

So it seems that until now, mappers have made an effort to select only the single most adequate target for each source. I think there is a good reason to avoid creating splits: having two targets yields both a true and a false positive for each sense involved.
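A list like the one above can be produced with a few lines of Python. This is only a sketch under the assumption that each mapping line holds an ILI identifier and a synset ID separated by whitespace; `ilis_with_multiple_targets` is a hypothetical helper and the sample lines are taken from this diff.

```python
from collections import defaultdict


def ilis_with_multiple_targets(lines):
    """Return {ili: [synset, ...]} for every ILI that has more than one target."""
    targets = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        ili, synset = fields
        targets[ili].append(synset)
    return {ili: syns for ili, syns in targets.items() if len(syns) > 1}


# A few lines from the PWN31 mapping as changed in this PR.
sample = [
    "i201\t00040060-s",
    "i202\t00040189-s",
    "i202\t00040305-s",
    "i203\t00040548-a",
]

for ili, syns in ilis_with_multiple_targets(sample).items():
    print(ili, ",".join(syns))
# i202 00040189-s,00040305-s
```

To check the whole file, the same function could be given `open("ili-map-pwn31.tab")` instead of the sample list.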

i203 00040548-a
i204 00040757-s
i205 00040908-a
@@ -211,7 +212,7 @@ i212 00042063-a
i213 00042258-a
i214 00042449-a
i215 00042677-a
-i216 00035037-s
+i216 00042912-s
i217 00043057-s
i218 00043202-s
i219 00043345-a
@@ -3755,7 +3756,7 @@ i3760 00678636-s
i3761 00678741-s
i3762 00678855-s
i3763 00678969-a
-i3764 00679361-s
+i3764 00679196-s
i3765 00679361-s
i3766 00679539-s
i3767 00679725-a
@@ -4224,7 +4225,7 @@ i4230 00769908-s
i4231 00770017-a
i4232 00770517-s
i4233 00770693-s
-i4234 00766556-s
+i4234 00770909-s
i4235 00771186-s
i4236 00771658-s
i4237 00771957-s
@@ -4294,7 +4295,7 @@ i4300 00783570-s
i4301 00783911-s
i4302 00784134-s
i4303 00784271-s
-i4304 00805750-s
+i4304 00784503-s
i4305 00784620-s
i4306 00784727-a
i4307 00785098-s
@@ -4409,7 +4410,7 @@ i4415 00805262-s
i4416 00805445-s
i4417 00805518-s
i4418 00805591-s
-i4419 00805871-s
+i4419 00805750-s
i4420 00805871-s
i4421 00805968-s
i4422 00806085-s
@@ -5081,7 +5082,7 @@ i5088 00931766-a
i5089 00932022-s
i5090 00932115-s
i5091 00932405-a
-i5092 00041424-s
+i5092 00932684-s
i5093 00932808-a
i5094 00933056-s
i5095 00933157-s
@@ -10599,7 +10600,7 @@ i10612 01943615-s
i10613 01943804-s
i10614 01944007-s
i10615 01944376-s
-i10616 01939402-a
+i10616 01944611-a
i10617 01944939-s
i10618 01945125-s
i10619 01945276-a
@@ -12781,7 +12782,7 @@ i12796 02318870-s
i12797 02318973-s
i12798 02319122-s
i12799 02319224-a
-i12800 02319740-a
+i12800 02319740-s
i12801 02319930-s
i12802 02320034-s
i12803 02320130-s
@@ -34652,7 +34653,7 @@ i34684 02605001-v
i34685 02605322-v
i34686 02605525-v
i34687 02605633-v
-i34688 00680696-v
+i34688 02605751-v
i34689 02605875-v
i34690 02606079-v
i34691 02606252-v
@@ -40360,6 +40361,8 @@ i40392 00950684-n
i40393 00950858-n
i40394 00950950-n
i40395 00951332-n
+i40396 00951435-n
+i40396 00951878-n
i40397 00952059-n
i40398 00952181-n
i40399 00952328-n
@@ -49994,6 +49997,7 @@ i50029 02716223-n
i50030 02716355-n
i50031 02716453-n
i50033 02716628-n
+i50034 02716785-n
Member

This is a genuine change in PWN3.1. A synset was split into two new synsets, each of which should get a new ILI ID.

Member Author

If I got the comment right, you are talking about

WN30 02713992-n {'roundel%1:06:01::', 'annulet%1:06:02::'}
(heraldry) a charge in the shape of a circle; "a hollow roundel"

WN31 02716785-n {'roundel%1:06:01::'}
(heraldry) a charge in the shape of a filled circle; "a hollow roundel"

WN31 02716929-n {'annulet%1:06:02::'}
(heraldry) a charge in the shape of a small ring

In this PR, I was not sure whether I could suggest new ILIs. We have several options here. One is not to map at all, since the concept in WN30 was split into more specialized ones in PWN31, as you confirmed. But what is i50034, then? It points to 02713992-n, a synset whose existence we are rejecting according to the changes made in WN31. Isn't it odd to have i50034 in the CILI at all? If we map i50034 to WN31 02716785-n, we are saying that WN30 02713992-n corresponds to WN31 02716785-n... It seems we need to think more carefully about the real meaning of the mappings, or go back to a more elaborate schema for mapping semantic networks (e.g. the ones used in EuroWordNet or by SUMO, from @apease).

Member Author

@arademaker arademaker Nov 26, 2022

Considering the limited expressivity of the current mappings adopted by CILI, we can mark i50034 as deprecated, keep it pointing only to WN30 02713992-n, and create two new ILIs for 02716785-n and 02716929-n. The unfortunate limitation is that we lose the knowledge that both WN31 concepts are specializations of 02713992-n.

Member

Yes, we are quite inflexible about the mapping but can easily create more identifiers. The ILI does not capture semantic relations anyway; that is the responsibility of individual wordnets. We should introduce two new ILI identifiers for these more specific senses.

i50035 02717050-n
i50036 02717226-n
i50037 02717446-n
@@ -55752,7 +55756,7 @@ i55790 03690633-n
i55791 03690812-n
i55792 03690966-n
i55793 03691146-n
-i55794 03872233-n
+i55794 03691288-n
i55795 03691456-n
i55796 03691689-n
i55797 03691796-n
@@ -55780,7 +55784,7 @@ i55818 03694673-n
i55819 03694769-n
i55820 03694896-n
i55821 03695026-n
-i55822 03872586-n
+i55822 03695166-n
i55823 03695331-n
i55824 03695494-n
i55825 03695605-n
@@ -56278,7 +56282,7 @@ i56316 03782816-n
i56317 03783101-n
i56318 03783287-n
i56319 03783494-n
-i56320 03872233-n
+i56320 03783668-n
i56321 03783835-n
i56322 03783992-n
i56323 03784133-n
@@ -56588,7 +56592,7 @@ i56626 03835103-n
i56627 03835397-n
i56628 03835494-n
i56629 03835651-n
-i56630 03872233-n
+i56630 03835818-n
i56631 03835988-n
i56632 03836122-n
i56633 03836375-n
@@ -72310,6 +72314,7 @@ i72351 06836139-n
i72352 06836320-n
i72353 06836441-n
i72354 06836640-n
+i72354 06836790-n
Member

I think this is a distinct and novel sense

i72355 06836975-n
i72356 06837091-n
i72357 06837277-n
@@ -90652,6 +90657,7 @@ i90719 10229489-n
i90720 10229738-n
i90721 10230113-n
i90722 10230249-n
+i90722 10230422-n
Member

This is a distinct and novel sense

i90723 10230581-n
i90724 10230706-n
i90725 10230873-n
@@ -114599,14 +114605,14 @@ i114669 14800682-n
i114670 14800845-n
i114671 14800963-n
i114672 14801083-n
-i114673 14802098-n
-i114674 14802098-n
+i114673 14801263-n
+i114674 14801347-n
i114675 14801436-n
-i114676 14802098-n
-i114677 14802098-n
+i114676 14801600-n
+i114677 14801682-n
i114678 14801765-n
-i114679 14802098-n
-i114680 14802098-n
+i114679 14801927-n
+i114680 14802015-n
i114681 14802098-n
i114681 14802098-n
i114682 14802178-n
i114683 14802595-n
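
Several of the old lines in the hunks above pointed at a synset that another ILI already targets (for instance, the old i114673 and i114674 lines both pointed at 14802098-n, which i114681 also maps to). A check for such shared targets might look like the sketch below. It assumes whitespace-separated lines; `synsets_with_multiple_ilis` is a hypothetical helper, and the sample holds only a few of the old lines from the last hunk.

```python
from collections import defaultdict


def synsets_with_multiple_ilis(lines):
    """Return {synset: [ili, ...]} for every synset targeted by more than one ILI."""
    sources = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        ili, synset = fields
        sources[synset].append(ili)
    return {syn: ilis for syn, ilis in sources.items() if len(ilis) > 1}


# Old (pre-PR) lines taken from the last hunk.
old_lines = [
    "i114673\t14802098-n",
    "i114674\t14802098-n",
    "i114675\t14801436-n",
    "i114681\t14802098-n",
]

shared = synsets_with_multiple_ilis(old_lines)
for synset, ilis in shared.items():
    print(synset, ",".join(ilis))
# 14802098-n i114673,i114674,i114681
```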