Twitter Eval Confusion Matrices
Gordon Vidaver edited this page May 11, 2016 · 8 revisions
From Evaluating language identification performance
Note that the precision_oriented dataset lists 69000 tweets, but only 51567 of them could actually be downloaded.
Labels with fewer than 500 examples were excluded.
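
The minimum-examples filter described above can be sketched as follows. This is an illustration only, not the code used for these runs: the function name `filter_rare_labels` and the `(label, text)` pair layout are assumptions, and the toy call lowers the threshold so the effect is visible on a few examples.

```python
from collections import Counter


def filter_rare_labels(examples, min_count=500):
    """Keep only examples whose label occurs at least min_count times.

    `examples` is assumed to be a list of (label, text) pairs.
    """
    counts = Counter(label for label, _ in examples)
    return [(label, text) for label, text in examples if counts[label] >= min_count]


# Toy data: three "en" tweets and a single "am" tweet.
toy = [("en", "hello")] * 3 + [("am", "selam")]
kept = filter_rare_labels(toy, min_count=2)
print(sorted({label for label, _ in kept}))  # ['en']
```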
Info | Value |
---|---|
Train | 44352 |
Test | 6654 |
Labels (44) | am, ar, bn, ckb, de, el, en, es, fa, fr, gu, he, hi, hi-Latn, hy, id, it, ja, ka, km, kn, lo, ml, mr, my, ne, nl, pa, pl, ps, pt, ru, sd, si, sr, sv, ta, te, th, und, ur, vi, zh-CN, zh-TW |
Accuracy | 0.85813046 |
Label | am | ar | bn | ckb | de | el | en | es | fa | fr | gu | he | hi | hi-Latn | hy | id | it | ja | ka | km | kn | lo | ml | mr | my | ne | nl | pa | pl | ps | pt | ru | sd | si | sr | sv | ta | te | th | und | ur | vi | zh-CN | zh-TW | N | class % |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
en | 0 | 0 | 0 | 0 | 2 | 0 | 173 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 205 | 0 | 0 | 0 | 0 | 388 | 0.445876 |
zh-TW | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 37 | 0 | 0 | 7 | 58 | 102 | 0.568627 |
hi-Latn | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 55 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 33 | 0 | 0 | 0 | 0 | 92 | 0.597826 |
id | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 61 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 36 | 0 | 0 | 0 | 0 | 98 | 0.622449 |
zh-CN | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 23 | 0 | 0 | 62 | 5 | 91 | 0.681319 |
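
The `N` and `class %` columns can be reproduced from the counts in a row. The sketch below assumes that each row holds the counts for one true label, that the columns are the predicted labels, that `N` is the row total, and that `class %` is the diagonal count divided by `N`; this reading matches the numbers in the table but is not stated on the page. The helper name `class_fraction` is hypothetical, and the dictionary holds only the non-zero cells of the `en` row above.

```python
def class_fraction(true_label, row_counts):
    """Return the fraction of a row's examples predicted as the row's own label."""
    n = sum(row_counts.values())
    if n == 0:
        return 0.0
    return row_counts.get(true_label, 0) / n


# Non-zero cells of the "en" row (predicted label -> count).
en_row = {"de": 2, "en": 173, "es": 1, "fr": 1, "id": 2,
          "pl": 1, "ru": 2, "sv": 1, "und": 205}

print(sum(en_row.values()))                    # 388
print(round(class_fraction("en", en_row), 6))  # 0.445876
```

Under this reading, 205 of the 388 English test tweets were predicted as `und`, which accounts for most of the errors in that row.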
The und label marked undefined tweets which could match several languages.
Info | Value |
---|---|
Train | 34546 |
Test | 5183 |
Labels (43) | am, ar, bn, ckb, de, el, en, es, fa, fr, gu, he, hi, hi-Latn, hy, id, it, ja, ka, km, kn, lo, ml, mr, my, ne, nl, pa, pl, ps, pt, ru, sd, si, sr, sv, ta, te, th, ur, vi, zh-CN, zh-TW |
Accuracy | 0.9488713 |
Label | am | ar | bn | ckb | de | el | en | es | fa | fr | gu | he | hi | hi-Latn | hy | id | it | ja | ka | km | kn | lo | ml | mr | my | ne | nl | pa | pl | ps | pt | ru | sd | si | sr | sv | ta | te | th | ur | vi | zh-CN | zh-TW | N | class % |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
hi-Latn | 0 | 0 | 0 | 0 | 0 | 0 | 11 | 6 | 0 | 0 | 0 | 0 | 1 | 71 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 92 | 0.771739 |
sd | 0 | 15 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 83 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 102 | 0.813725 |
pt | 0 | 0 | 0 | 0 | 1 | 0 | 3 | 15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 107 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 127 | 0.842520 |
es | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 166 | 0 | 1 | 0 | 0 | 0 | 4 | 0 | 3 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 195 | 0.851282 |
fr | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 7 | 0 | 89 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 104 | 0.855769 |
This model can be found in the release directory if you want to try it yourself.
For this run, a Python script first normalized the tweet text by removing markup, hashtags, etc.
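
The normalization script itself is not shown on this page. As a rough sketch of what such a step might look like, the code below strips a leading retweet marker, URLs, @mentions, and hashtags, and decodes HTML entities. Every rule and name here (`normalize_tweet` and the regular expressions) is an assumption made for illustration, not the script that produced these results.

```python
import html
import re

# Hypothetical cleanup rules; the original script may differ.
RT_RE = re.compile(r"^RT\s+@\w+:?\s*")   # leading "RT @user:" marker
URL_RE = re.compile(r"https?://\S+")     # links
MENTION_RE = re.compile(r"@\w+")         # @mentions
HASHTAG_RE = re.compile(r"#\w+")         # hashtags
WHITESPACE_RE = re.compile(r"\s+")       # runs of whitespace


def normalize_tweet(text):
    """Remove tweet markup and collapse whitespace."""
    text = html.unescape(text)           # "&amp;" -> "&"
    text = RT_RE.sub("", text)
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE):
        text = pattern.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


print(normalize_tweet("RT @user: Bonjour &amp; bienvenue #paris http://t.co/abc"))
# Bonjour & bienvenue
```

Dropping hashtags entirely, rather than keeping the word after the `#`, is a design choice: hashtags are often in a different language from the rest of the tweet.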
Info | Value |
---|---|
Train | 44080 |
Test | 6614 |
Labels (44) | am, ar, bn, ckb, de, el, en, es, fa, fr, gu, he, hi, hi-Latn, hy, id, it, ja, ka, km, kn, lo, ml, mr, my, ne, nl, pa, pl, ps, pt, ru, sd, si, sr, sv, ta, te, th, und, ur, vi, zh-CN, zh-TW |
Accuracy | 0.86815846 |
Label | am | ar | bn | ckb | de | el | en | es | fa | fr | gu | he | hi | hi-Latn | hy | id | it | ja | ka | km | kn | lo | ml | mr | my | ne | nl | pa | pl | ps | pt | ru | sd | si | sr | sv | ta | te | th | und | ur | vi | zh-CN | zh-TW | N | class % |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
en | 0 | 0 | 0 | 0 | 2 | 0 | 172 | 1 | 0 | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 203 | 0 | 0 | 0 | 0 | 388 | 0.443299 |
hi-Latn | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 60 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 24 | 0 | 0 | 0 | 0 | 92 | 0.652174 |
id | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 69 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 26 | 0 | 0 | 0 | 0 | 98 | 0.704082 |
zh-CN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 18 | 0 | 0 | 68 | 5 | 91 | 0.747253 |
es | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 148 | 0 | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 36 | 0 | 0 | 0 | 0 | 195 | 0.758974 |
Info | Value |
---|---|
Train | 34540 |
Test | 5183 |
Labels (43) | am, ar, bn, ckb, de, el, en, es, fa, fr, gu, he, hi, hi-Latn, hy, id, it, ja, ka, km, kn, lo, ml, mr, my, ne, nl, pa, pl, ps, pt, ru, sd, si, sr, sv, ta, te, th, ur, vi, zh-CN, zh-TW |
Accuracy | 0.95504534 |
Label | am | ar | bn | ckb | de | el | en | es | fa | fr | gu | he | hi | hi-Latn | hy | id | it | ja | ka | km | kn | lo | ml | mr | my | ne | nl | pa | pl | ps | pt | ru | sd | si | sr | sv | ta | te | th | ur | vi | zh-CN | zh-TW | N | class % |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
sd | 0 | 12 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 87 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 102 | 0.852941 |
pt | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 109 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 127 | 0.858268 |
hi-Latn | 0 | 1 | 0 | 0 | 0 | 0 | 5 | 2 | 0 | 0 | 0 | 0 | 0 | 79 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 92 | 0.858696 |
zh-CN | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 79 | 7 | 91 | 0.868132 |
zh-TW | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 89 | 102 | 0.872549 |
The initial data contained 72000 of the 87585 tweets in the recall_oriented dataset.
Info | Value |
---|---|
Train | 71196 |
Test | 10682 |
Labels (67) | am, ar, bg, bn, bo, bs, ca, ckb, cs, cy, da, de, dv, el, en, es, et, eu, fa, fi, fr, gu, he, hi, hi-Latn, hr, ht, hu, hy, id, is, it, ja, ka, km, kn, ko, lo, lv, ml, mr, my, ne, nl, no, pa, pl, ps, pt, ro, ru, sd, si, sk, sl, sr, sv, ta, te, th, tl, tr, uk, ur, vi, zh-CN, zh-TW |
Accuracy | 0.9246396 |
Label | am | ar | bg | bn | bo | bs | ca | ckb | cs | cy | da | de | dv | el | en | es | et | eu | fa | fi | fr | gu | he | hi | hi-Latn | hr | ht | hu | hy | id | is | it | ja | ka | km | kn | ko | lo | lv | ml | mr | my | ne | nl | no | pa | pl | ps | pt | ro | ru | sd | si | sk | sl | sr | sv | ta | te | th | tl | tr | uk | ur | vi | zh-CN | zh-TW | N | class % |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
hr | 0 | 0 | 1 | 0 | 0 | 62 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 136 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 6 | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 221 | 0.615385 |
bs | 0 | 0 | 0 | 0 | 0 | 133 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 59 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 12 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 214 | 0.621495 |
da | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 53 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 79 | 0.670886 |
sr | 0 | 0 | 2 | 0 | 0 | 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 2 | 97 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 132 | 0.734848 |
zh-TW | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 14 | 92 | 118 | 0.779661 |
See model in releases.
The initial data contained 72000 of the 87585 tweets in the recall_oriented dataset.
Info | Value |
---|---|
Train | 71187 |
Test | 10680 |
Labels (67) | am, ar, bg, bn, bo, bs, ca, ckb, cs, cy, da, de, dv, el, en, es, et, eu, fa, fi, fr, gu, he, hi, hi-Latn, hr, ht, hu, hy, id, is, it, ja, ka, km, kn, ko, lo, lv, ml, mr, my, ne, nl, no, pa, pl, ps, pt, ro, ru, sd, si, sk, sl, sr, sv, ta, te, th, tl, tr, uk, ur, vi, zh-CN, zh-TW |
Accuracy | 0.9238764 |
Label | am | ar | bg | bn | bo | bs | ca | ckb | cs | cy | da | de | dv | el | en | es | et | eu | fa | fi | fr | gu | he | hi | hi-Latn | hr | ht | hu | hy | id | is | it | ja | ka | km | kn | ko | lo | lv | ml | mr | my | ne | nl | no | pa | pl | ps | pt | ro | ru | sd | si | sk | sl | sr | sv | ta | te | th | tl | tr | uk | ur | vi | zh-CN | zh-TW | N | class % |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
hr | 0 | 0 | 0 | 0 | 0 | 78 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 1 | 0 | 0 | 1 | 116 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 10 | 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 221 | 0.524887 |
bs | 0 | 0 | 0 | 0 | 0 | 125 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 3 | 14 | 12 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 214 | 0.584112 |
da | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 55 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 79 | 0.696203 |
zh-TW | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 30 | 82 | 117 | 0.700855 |
sr | 0 | 0 | 1 | 0 | 0 | 18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 2 | 98 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 132 | 0.742424 |
Info | Value |
---|---|
Train | 76442 |
Test | 11467 |
Labels (13) | ar, en, es, fr, id, ja, ko, pt, ru, th, tl, tr, und |
Accuracy | 0.91174674 |
Label | ar | en | es | fr | id | ja | ko | pt | ru | th | tl | tr | und | N | class % |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
und | 5 | 284 | 76 | 10 | 88 | 22 | 2 | 37 | 5 | 3 | 14 | 9 | 510 | 1065 | 0.478873 |
tl | 0 | 13 | 1 | 0 | 2 | 0 | 0 | 1 | 0 | 0 | 60 | 0 | 3 | 80 | 0.750000 |
fr | 0 | 19 | 2 | 185 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 210 | 0.880952 |
pt | 0 | 20 | 38 | 0 | 0 | 0 | 0 | 588 | 0 | 0 | 1 | 0 | 12 | 659 | 0.892261 |
tr | 0 | 5 | 1 | 0 | 5 | 0 | 0 | 1 | 0 | 0 | 1 | 146 | 0 | 159 | 0.918239 |
Train/Test 85/15 split all labels, no text normalization, minimum 500 examples per label, no und label
Info | Value |
---|---|
Train | 76442 |
Test | 10402 |
Labels (12) | ar, en, es, fr, id, ja, ko, pt, ru, th, tl, tr |
Accuracy | 0.9699096 |
Label | ar | en | es | fr | id | ja | ko | pt | ru | th | tl | tr | N | class % |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
tl | 0 | 14 | 0 | 0 | 4 | 0 | 0 | 1 | 0 | 0 | 61 | 0 | 80 | 0.762500 |
fr | 0 | 23 | 1 | 183 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 0 | 210 | 0.871429 |
pt | 0 | 26 | 41 | 0 | 0 | 0 | 0 | 591 | 0 | 0 | 1 | 0 | 659 | 0.896813 |
tr | 0 | 5 | 1 | 0 | 3 | 0 | 0 | 2 | 0 | 0 | 1 | 147 | 159 | 0.924528 |
ko | 0 | 4 | 1 | 1 | 1 | 1 | 99 | 0 | 0 | 0 | 0 | 0 | 107 | 0.925234 |
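
The 85/15 train/test split named in this section's description could be sketched as below. Whether the original split was made per label or over the whole dataset is not stated on this page, so the per-label (stratified) shuffle here is an assumption, as are the function name `split_per_label`, the `(label, text)` pair layout, and the fixed seed.

```python
import random
from collections import defaultdict


def split_per_label(examples, train_percent=85, seed=0):
    """Shuffle each label's examples and send train_percent of them to train."""
    by_label = defaultdict(list)
    for label, text in examples:
        by_label[label].append((label, text))

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        cut = len(items) * train_percent // 100  # integer math avoids float rounding
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test


# Toy data: 20 examples for each of two labels.
toy = [("en", f"tweet {i}") for i in range(20)]
toy += [("fr", f"gazouillis {i}") for i in range(20)]
train, test = split_per_label(toy)
print(len(train), len(test))  # 34 6
```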
Info | Value |
---|---|
Train | 74854 |
Test | 11228 |
Labels (13) | ar, en, es, fr, id, ja, ko, pt, ru, th, tl, tr, und |
Accuracy | 0.9291058 |
Label | ar | en | es | fr | id | ja | ko | pt | ru | th | tl | tr | und | N | class % |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
und | 7 | 139 | 75 | 11 | 87 | 25 | 4 | 32 | 8 | 2 | 19 | 12 | 406 | 827 | 0.490931 |
tl | 0 | 6 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 67 | 0 | 4 | 80 | 0.837500 |
pt | 0 | 7 | 40 | 0 | 0 | 0 | 0 | 605 | 0 | 0 | 1 | 0 | 6 | 659 | 0.918058 |
fr | 0 | 8 | 1 | 194 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 4 | 210 | 0.923810 |
tr | 0 | 4 | 1 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 1 | 148 | 2 | 159 | 0.930818 |
Train/Test 85/15 split all labels, with text normalization, minimum 500 examples per label, no und label
Info | Value |
---|---|
Train | 69338 |
Test | 10401 |
Labels (12) | ar, en, es, fr, id, ja, ko, pt, ru, th, tl, tr |
Accuracy | 0.9808672 |
Label | ar | en | es | fr | id | ja | ko | pt | ru | th | tl | tr | N | class % |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
tl | 0 | 9 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 67 | 0 | 80 | 0.837500 |
fr | 0 | 11 | 2 | 194 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 210 | 0.923810 |
pt | 0 | 9 | 38 | 0 | 1 | 1 | 0 | 609 | 0 | 0 | 1 | 0 | 659 | 0.924127 |
tr | 0 | 3 | 1 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 1 | 150 | 159 | 0.943396 |
th | 0 | 4 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 96 | 0 | 0 | 101 | 0.950495 |