Added data release
abenton committed Mar 21, 2015
0 parents commit 9d37751
Showing 4 changed files with 38,786 additions and 0 deletions.
21 changes: 21 additions & 0 deletions README
@@ -0,0 +1,21 @@
This directory contains data collected for "Entity Linking for Spoken
Language" (NAACL 2015, forthcoming). Also included are all named entity
annotations used for evaluation in "OOV Sensitive Named-Entity Recognition
in Speech" (INTERSPEECH, 2011), of which a subset were annotated with
knowledge base entities. The utterances were drawn from the HUB4
dataset https://catalog.ldc.upenn.edu/LDC98S71 , and the knowledge base is
the same Wikipedia dump used in the TAC 2009 KBP track.

"folds.txt": Mapping from each entity linking query ID to fold

"el_queries.txt": Entity linking queries. Format:
QUERY_ID ENTITY_MENTION HUB4_UTTERANCE_ID ENTITY_TYPE KB_ID
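
A minimal sketch of reading this format, in Python. The README does not state the field separator; since entity mentions may contain spaces, the sketch assumes tab-separated fields (pass a different `sep` if that is wrong). `ELQuery` and `parse_el_queries` are hypothetical names and are not part of the release.

```python
from typing import Iterable, List, NamedTuple


class ELQuery(NamedTuple):
    """One entity linking query; field order follows the README."""
    query_id: str
    mention: str
    utterance_id: str
    entity_type: str
    kb_id: str


def parse_el_queries(lines: Iterable[str], sep: str = "\t") -> List[ELQuery]:
    """Parse lines of el_queries.txt.

    ASSUMPTION: fields are separated by `sep` (tab by default); the
    README does not specify the delimiter.
    """
    queries = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(sep)
        if len(fields) != 5:
            raise ValueError(
                f"line {line_no}: expected 5 fields, got {len(fields)}"
            )
        queries.append(ELQuery(*fields))
    return queries
```

To use it, open the file and pass the file object, for example `parse_el_queries(open("el_queries.txt", encoding="utf-8"))`.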

"ne_el_labels.txt": All named entities annotated in the HUB4 transcripts.
Any entity that does not have a linking annotation to the Wikipedia KB
is marked as "NOT_ANNOTATED". Format:
HUB4_UTTERANCE_ID ENTITY_MENTION START_SPAN END_SPAN ENTITY_TYPE KB_ID
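
A minimal sketch of reading this format, in Python, that maps the "NOT_ANNOTATED" marker to `None`. Two things are assumed rather than stated in the README: fields are tab-separated, and START_SPAN and END_SPAN are integers (their unit and inclusivity are not specified here). `NELabel` and `parse_ne_el_labels` are hypothetical names and are not part of the release.

```python
from typing import Iterable, List, NamedTuple, Optional

NOT_ANNOTATED = "NOT_ANNOTATED"  # marker described in the README


class NELabel(NamedTuple):
    """One named entity annotation; field order follows the README."""
    utterance_id: str
    mention: str
    start_span: int
    end_span: int
    entity_type: str
    kb_id: Optional[str]  # None when the entity has no KB link


def parse_ne_el_labels(lines: Iterable[str], sep: str = "\t") -> List[NELabel]:
    """Parse lines of ne_el_labels.txt.

    ASSUMPTIONS: fields are separated by `sep` (tab by default) and
    the span fields are integers; neither is stated in the README.
    """
    labels = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(sep)
        if len(fields) != 6:
            raise ValueError(
                f"line {line_no}: expected 6 fields, got {len(fields)}"
            )
        utt_id, mention, start, end, etype, kb_id = fields
        labels.append(
            NELabel(
                utt_id,
                mention,
                int(start),
                int(end),
                etype,
                None if kb_id == NOT_ANNOTATED else kb_id,
            )
        )
    return labels
```

Entities with a KB link can then be selected with `[l for l in labels if l.kb_id is not None]`.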

If you have any questions, please send them to:

adrian dot benton at gmail dot com