
Twitter Eval Confusion Matrices


Twitter Results

Twitter: Evaluating language identification performance

The datasets used below come from Twitter's blog post "Evaluating language identification performance".

Precision dataset

Note that the precision_oriented dataset had 69000 tweets, but we could only download 51567 of them.

Train/Test 85/15 split, all labels, no text normalization, minimum 500 examples per label

Labels with fewer than 500 examples were excluded.
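A minimal sketch of this preparation step, assuming the tweets are held as parallel texts/labels lists (illustrative only; this is not the project's actual script, and the page does not say whether the split was stratified):

```python
# Sketch: drop labels with fewer than 500 tweets, then split 85/15.
# Assumes scikit-learn; the function and variable names are illustrative.
from collections import Counter
from sklearn.model_selection import train_test_split

def filter_and_split(texts, labels, min_per_label=500, test_size=0.15, seed=42):
    counts = Counter(labels)
    kept = [(t, l) for t, l in zip(texts, labels) if counts[l] >= min_per_label]
    X, y = zip(*kept)
    # Stratification keeps per-label proportions similar in train and test;
    # it is an assumption here, not something stated on this page.
    return train_test_split(list(X), list(y), test_size=test_size,
                            stratify=list(y), random_state=seed)
```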

Info Value
Train 44352
Test 6654
Labels(44) ar,bn,ckb,de,el,en,es,fa,fr,gu,he,hi,hi-Latn,hy,id,it,ja,ka,km,kn,lo,ml,mr,my,ne,nl,pa,pl,ps,pt,ru,sd,si,sr,sv,ta,te,th,und,ur,vi,zh-CN,zh-TW
Accuracy 0.85813046
Confusion Matrix for the 5 least accurate classes
am ar bn ckb de el en es fa fr gu he hi hi-Latn hy id it ja ka km kn lo ml mr my ne nl pa pl ps pt ru sd si sr sv ta te th und ur vi zh-CN zh-TW N class %
en 0 0 0 0 2 0 173 1 0 1 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 2 0 0 0 1 0 0 0 205 0 0 0 0 388 0.445876
zh-TW 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 37 0 0 7 58 102 0.568627
hi-Latn 0 0 0 0 0 0 2 1 0 0 0 0 0 55 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 33 0 0 0 0 92 0.597826
id 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 61 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0 0 0 0 98 0.622449
zh-CN 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 23 0 0 62 5 91 0.681319
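In these tables, N is the number of test tweets whose true label is the row's language, and class % is the per-class accuracy: the diagonal count divided by N (for en above, 173 / 388 ≈ 0.4459). A rough sketch of how such a report can be produced with scikit-learn (not the project's actual evaluation code; names are illustrative):

```python
# Sketch: per-class N and accuracy ("class %") from true and predicted labels.
import numpy as np
from sklearn.metrics import confusion_matrix

def per_class_report(y_true, y_pred, labels):
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    n = cm.sum(axis=1)                  # N: test tweets per true label
    correct = np.diag(cm)               # correctly classified tweets per label
    class_pct = correct / np.maximum(n, 1)
    # Sort ascending so the least accurate classes come first, as in the tables here.
    return sorted(zip(labels, n, class_pct), key=lambda row: row[2])
```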
Train/Test 85/15 split, all labels, no text normalization, skipping the und label

The und label marks undefined tweets, whose text could match several languages.
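Skipping und amounts to dropping those tweets before the filter-and-split step sketched above, e.g. (illustrative, reusing the same texts/labels lists):

```python
# Sketch: remove und-labelled tweets before filtering and splitting.
kept = [(t, l) for t, l in zip(texts, labels) if l != "und"]
texts, labels = (list(x) for x in zip(*kept))
```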

Info Value
Train 34546
Test 5183
Labels(43) ar,bn,ckb,de,el,en,es,fa,fr,gu,he,hi,hi-Latn,hy,id,it,ja,ka,km,kn,lo,ml,mr,my,ne,nl,pa,pl,ps,pt,ru,sd,si,sr,sv,ta,te,th,ur,vi,zh-CN,zh-TW
Accuracy 0.9488713
Confusion Matrix for the 5 least accurate classes
am ar bn ckb de el en es fa fr gu he hi hi-Latn hy id it ja ka km kn lo ml mr my ne nl pa pl ps pt ru sd si sr sv ta te th ur vi zh-CN zh-TW N class %
hi-Latn 0 0 0 0 0 0 11 6 0 0 0 0 1 71 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 92 0.771739
sd 0 15 0 0 0 0 1 0 2 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 83 0 0 0 0 0 0 0 0 0 0 102 0.813725
pt 0 0 0 0 1 0 3 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 107 0 0 0 0 0 0 0 0 0 0 0 0 127 0.842520
es 0 0 0 0 0 0 15 166 0 1 0 0 0 4 0 3 2 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 195 0.851282
fr 0 0 0 0 0 0 8 7 0 89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 104 0.855769

This model can be found in the release directory if you want to try it yourself.

Train/Test 85/15 split, all labels, with text normalization, minimum 500 examples per label

A Python script was run in an attempt to normalize the tweet text by removing markup, hashtags, etc.
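The exact rules the script applies are not documented on this page; a rough sketch of this kind of tweet normalization might look like the following (illustrative only):

```python
# Sketch: strip URLs, @mentions, hashtag markers, and a leading retweet marker,
# then collapse whitespace. The actual script's rules may differ.
import re

def normalize_tweet(text: str) -> str:
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"@\w+", " ", text)           # remove @mentions
    text = text.replace("#", " ")               # keep the hashtag word, drop the '#'
    text = re.sub(r"^\s*RT\b", " ", text)       # drop a leading retweet marker
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace
```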

Info Value
Train 44080
Test 6614
Labels(44) ar,bn,ckb,de,el,en,es,fa,fr,gu,he,hi,hi-Latn,hy,id,it,ja,ka,km,kn,lo,ml,mr,my,ne,nl,pa,pl,ps,pt,ru,sd,si,sr,sv,ta,te,th,und,ur,vi,zh-CN,zh-TW
Accuracy 0.86815846
Confusion Matrix for the 5 least accurate classes
am ar bn ckb de el en es fa fr gu he hi hi-Latn hy id it ja ka km kn lo ml mr my ne nl pa pl ps pt ru sd si sr sv ta te th und ur vi zh-CN zh-TW N class %
en 0 0 0 0 2 0 172 1 0 1 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 3 0 2 0 0 1 0 0 0 1 0 0 0 203 0 0 0 0 388 0.443299
hi-Latn 0 1 0 0 0 0 1 2 0 0 0 0 0 60 0 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 24 0 0 0 0 92 0.652174
id 0 0 0 0 0 0 2 0 0 0 0 0 0 1 0 69 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 26 0 0 0 0 98 0.704082
zh-CN 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 18 0 0 68 5 91 0.747253
es 0 0 0 0 0 0 5 148 0 1 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 36 0 0 0 0 195 0.758974
Train/Test 85/15 split, all labels, with text normalization, skipping the und label
Info Value
Train 34540
Test 5183
Labels(43) ar,bn,ckb,de,el,en,es,fa,fr,gu,he,hi,hi-Latn,hy,id,it,ja,ka,km,kn,lo,ml,mr,my,ne,nl,pa,pl,ps,pt,ru,sd,si,sr,sv,ta,te,th,ur,vi,zh-CN,zh-TW
Accuracy 0.95504534
Confusion Matrix for the 5 least accurate classes
am ar bn ckb de el en es fa fr gu he hi hi-Latn hy id it ja ka km kn lo ml mr my ne nl pa pl ps pt ru sd si sr sv ta te th ur vi zh-CN zh-TW N class %
sd 0 12 0 0 0 0 0 0 2 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 87 0 0 0 0 0 0 0 0 0 0 102 0.852941
pt 0 0 0 0 0 0 4 11 0 0 0 0 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 109 0 0 0 0 0 0 0 0 0 0 0 0 127 0.858268
hi-Latn 0 1 0 0 0 0 5 2 0 0 0 0 0 79 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 1 0 0 0 0 0 1 0 0 92 0.858696
zh-CN 0 0 0 0 0 0 3 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 79 7 91 0.868132
zh-TW 0 0 0 0 0 0 2 0 0 0 2 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 89 102 0.872549

Recall dataset

Train/Test 85/15 split, all labels, no text normalization, minimum 500 examples per label

The initial data contained 72000 of the 87585 tweets in the recall_oriented dataset.

Info Value
Train 71196
Test 10682
Labels (67) am, ar, bg, bn, bo, bs, ca, ckb, cs, cy, da, de, dv, el, en, es, et, eu, fa, fi, fr, gu, he, hi, hi-Latn, hr, ht, hu, hy, id, is, it, ja, ka, km, kn, ko, lo, lv, ml, mr, my, ne, nl, no, pa, pl, ps, pt, ro, ru, sd, si, sk, sl, sr, sv, ta, te, th, tl, tr, uk, ur, vi, zh-CN, zh-TW
Accuracy 0.9246396
Confusion Matrix for the 5 least accurate classes
am ar bg bn bo bs ca ckb cs cy da de dv el en es et eu fa fi fr gu he hi hi-Latn hr ht hu hy id is it ja ka km kn ko lo lv ml mr my ne nl no pa pl ps pt ro ru sd si sk sl sr sv ta te th tl tr uk ur vi zh-CN zh-TW N class %
hr 0 0 1 0 0 62 0 0 0 0 0 1 0 0 3 0 0 0 0 2 0 0 0 0 0 136 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 6 2 0 0 0 0 2 0 0 0 0 0 0 221 0.615385
bs 0 0 0 0 0 133 0 0 0 0 0 0 0 0 0 0 0 3 0 1 1 0 0 0 0 59 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 12 1 0 0 1 0 1 0 0 0 0 0 0 214 0.621495
da 0 0 0 0 0 0 0 0 0 0 53 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 24 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 79 0.670886
sr 0 0 2 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 2 97 0 0 0 0 1 0 0 0 0 0 0 132 0.734848
zh-TW 0 0 0 0 0 0 1 0 0 0 0 2 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 14 92 118 0.779661

This model can also be found in the releases.

Train/Test 85/15 split, all labels, with text normalization, minimum 500 examples per label

The initial data contained 72000 of the 87585 tweets in the recall_oriented dataset.

Info Value
Train 71187
Test 10680
Labels (67) am, ar, bg, bn, bo, bs, ca, ckb, cs, cy, da, de, dv, el, en, es, et, eu, fa, fi, fr, gu, he, hi, hi-Latn, hr, ht, hu, hy, id, is, it, ja, ka, km, kn, ko, lo, lv, ml, mr, my, ne, nl, no, pa, pl, ps, pt, ro, ru, sd, si, sk, sl, sr, sv, ta, te, th, tl, tr, uk, ur, vi, zh-CN, zh-TW
Accuracy 0.9238764
Confusion Matrix for the 5 least accurate classes
am ar bg bn bo bs ca ckb cs cy da de dv el en es et eu fa fi fr gu he hi hi-Latn hr ht hu hy id is it ja ka km kn ko lo lv ml mr my ne nl no pa pl ps pt ro ru sd si sk sl sr sv ta te th tl tr uk ur vi zh-CN zh-TW N class %
hr 0 0 0 0 0 78 0 0 0 0 0 0 0 0 0 0 2 0 0 2 0 1 0 0 1 116 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 10 5 0 0 0 0 1 0 0 0 0 0 0 221 0.524887
bs 0 0 0 0 0 125 0 0 1 0 0 0 0 0 0 0 0 1 0 2 1 0 0 0 0 50 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 2 0 0 0 0 3 14 12 0 0 1 0 1 0 0 0 0 0 0 214 0.584112
da 0 0 0 0 0 0 0 0 0 0 55 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 79 0.696203
zh-TW 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 30 82 117 0.700855
sr 0 0 1 0 0 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 2 98 0 0 0 0 1 0 0 0 0 0 0 132 0.742424

Uniform dataset

Train/Test 85/15 split, all labels, no text normalization, minimum 500 examples per label
Info Value
Train 76442
Test 11467
Labels (13) ar, en, es, fr, id, ja, ko, pt, ru, th, tl, tr, und
Accuracy 0.91174674
Confusion Matrix for the 5 least accurate classes
ar en es fr id ja ko pt ru th tl tr und N class %
und 5 284 76 10 88 22 2 37 5 3 14 9 510 1065 0.478873
tl 0 13 1 0 2 0 0 1 0 0 60 0 3 80 0.750000
fr 0 19 2 185 0 1 0 0 0 0 1 0 2 210 0.880952
pt 0 20 38 0 0 0 0 588 0 0 1 0 12 659 0.892261
tr 0 5 1 0 5 0 0 1 0 0 1 146 0 159 0.918239
Train/Test 85/15 split, all labels, no text normalization, minimum 500 examples per label, no und label
Info Value
Train 76442
Test 10402
Labels (12) ar, en, es, fr, id, ja, ko, pt, ru, th, tl, tr
Accuracy 0.9699096
Confusion Matrix for the 5 least accurate classes
ar en es fr id ja ko pt ru th tl tr N class %
tl 0 14 0 0 4 0 0 1 0 0 61 0 80 0.762500
fr 0 23 1 183 0 2 0 0 0 0 1 0 210 0.871429
pt 0 26 41 0 0 0 0 591 0 0 1 0 659 0.896813
tr 0 5 1 0 3 0 0 2 0 0 1 147 159 0.924528
ko 0 4 1 1 1 1 99 0 0 0 0 0 107 0.925234
Train/Test 85/15 split, all labels, with text normalization, minimum 500 examples per label
Info Value
Train 74854
Test 11228
Labels (13) ar, en, es, fr, id, ja, ko, pt, ru, th, tl, tr, und
Accuracy 0.9291058
Confusion Matrix for the 5 least accurate classes
ar en es fr id ja ko pt ru th tl tr und N class %
und 7 139 75 11 87 25 4 32 8 2 19 12 406 827 0.490931
tl 0 6 0 0 3 0 0 0 0 0 67 0 4 80 0.837500
pt 0 7 40 0 0 0 0 605 0 0 1 0 6 659 0.918058
fr 0 8 1 194 0 0 0 2 0 0 0 1 4 210 0.923810
tr 0 4 1 0 3 0 0 0 0 0 1 148 2 159 0.930818
Train/Test 85/15 split, all labels, with text normalization, minimum 500 examples per label, no und label
Info Value
Train 69338
Test 10401
Labels (12) ar, en, es, fr, id, ja, ko, pt, ru, th, tl, tr
Accuracy 0.9808672
Confusion Matrix for the 5 least accurate classes
ar en es fr id ja ko pt ru th tl tr N class %
tl 0 9 0 0 4 0 0 0 0 0 67 0 80 0.837500
fr 0 11 2 194 0 0 0 2 0 0 0 1 210 0.923810
pt 0 9 38 0 1 1 0 609 0 0 1 0 659 0.924127
tr 0 3 1 0 4 0 0 0 0 0 1 150 159 0.943396
th 0 4 0 0 0 0 1 0 0 96 0 0 101 0.950495