LEXB corpus

The LEXB corpus is a bilingual (Italian-German) collection of South Tyrolean legislation.

The corpus comes in three versions:

  • LEXB_full: a full version of the corpus, annotated with contextual, structural and linguistic information.
  • LEXB_tm: a raw version of the corpus to be used as a translation memory.
  • LEXB_mt: a fully cleaned and filtered version of the corpus, intended for machine translation (MT) training and/or adaptation.