The tests #6

Open: wants to merge 2 commits into base: master
57 changes: 57 additions & 0 deletions test.py
@@ -0,0 +1,57 @@
import os
import sys
import unittest

from chardet.universaldetector import UniversalDetector


class TestCase(unittest.TestCase):
    def __init__(self, file_name, encoding):
        unittest.TestCase.__init__(self)
        self.file_name = file_name
        encoding = encoding.lower()
        for postfix in [
            '-arabic',
            '-bulgarian',
            '-cyrillic',
            '-greek',
            '-hebrew',
            '-hungarian',
            '-turkish',
        ]:
            if encoding.endswith(postfix):
                encoding, _, _ = encoding.rpartition(postfix)
        self.encoding = encoding

    def runTest(self):
        u = UniversalDetector()
        for line in open(self.file_name, 'rb'):
            u.feed(line)
            if u.done:
                break
        u.close()
        self.assertEqual((u.result['encoding'] or '').lower(), self.encoding,
                         "Expected %s, but got %r in %s" % (
                             self.encoding, u.result, self.file_name))


def main():
    suite = unittest.TestSuite()
    if len(sys.argv) > 1:
        base_path = sys.argv[1]
    else:
        base_path = os.path.join(
            os.path.dirname(os.path.abspath(__file__)), 'tests')
    for encoding in os.listdir(base_path):
        path = os.path.join(base_path, encoding)
        if not os.path.isdir(path):
            continue
        for file_name in os.listdir(path):
            _, ext = os.path.splitext(file_name)
            if ext not in ['.html', '.txt', '.xml']:
                continue
            suite.addTest(TestCase(os.path.join(path, file_name), encoding))
    sys.exit(not unittest.TextTestRunner().run(suite).wasSuccessful())


main()
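The suffix-stripping loop in TestCase.__init__ turns a test directory name into the encoding name chardet is expected to report. As a minimal sketch, the same logic can be pulled out into a standalone function and checked without chardet installed. The function name expected_encoding() and the directory names in the example calls are hypothetical illustrations, not taken from the tests tree.

```python
# Hypothetical helper mirroring the suffix-stripping loop in TestCase.__init__.
LANGUAGE_SUFFIXES = (
    '-arabic', '-bulgarian', '-cyrillic', '-greek',
    '-hebrew', '-hungarian', '-turkish',
)


def expected_encoding(dir_name):
    """Map a test-directory name to the encoding chardet should report."""
    encoding = dir_name.lower()
    for suffix in LANGUAGE_SUFFIXES:
        if encoding.endswith(suffix):
            # rpartition splits on the last occurrence, so only the
            # trailing language suffix is dropped.
            encoding = encoding.rpartition(suffix)[0]
    return encoding


if __name__ == '__main__':
    print(expected_encoding('Big5'))                    # big5
    print(expected_encoding('windows-1251-bulgarian'))  # windows-1251
```

Keeping the mapping in a pure function makes it easy to unit-test separately from the detector itself.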
325 changes: 325 additions & 0 deletions tests/Big5/0804.blogspot.com.xml

Large diffs are not rendered by default.

16 changes: 16 additions & 0 deletions tests/Big5/_chromium_Big5_with_no_encoding_specified.html
@@ -0,0 +1,16 @@
<html>
<head>
<title> Big5 </title>
</head>

<body>
�x�_�ݦu�ҬQ�ѩ�d�e�`�γ����󪺩Ъ١A�ިӫ󪺱j�P�����C����Ҥ�Q�ΥL���ٷ|�Ȯɹ�L½�c���d�A�h�ìO�Ө뱴�L�����p�A�L�w�z�L�߮v�V�_�Ҫ��F�u�Y����ij�v�C

�x�_�ݦu�ҬQ�ѥߧY��M�A�u�d�Сv�O�Ҧ椽�ơA�å��b������Ф����ҿת��u½�c���d�v�F��d���G���o�{�H�W���~�A�]���a��������A�٩Ф����ʵ������v�A�@�����������i�d�C

�_�ҰƩҪ����j�˱j�աA�����@���e�H�w���A�C�ѳ��|���w�ɡB���w�I��d�٩СA�Q�ѤU�ȩ�d���e�`�Ϊ٩ЮɡA�]�����󥿻P�ߩe�\�����S�O�����A���۰��W�S�O�߮v�|���A�@�ɨӤ��Χi���������d���ơC

�߮v�G���s�h���A�������ij�u�d�Сv�I�����p�A��ߧ��G��ѷ|�n���Υ��ܡA�Q�ѥL�]��������ѭ��𨾸�ƥ浹��A�קK���e�n���F�L�ë�ij����Q�E��X�x�����W�f�n�A���F�u�I�q��ij�v�A������ܷ|�Ҽ{�C
</body>
</html>
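The Big5 fixture text most likely renders here as runs of U+FFFD replacement characters because its Big5 bytes are not valid UTF-8. This file also carries no charset declaration, so the detector has to infer the encoding from the bytes alone. The sketch below illustrates the effect with Python's built-in big5 codec and a sample string of its own (not taken from the fixtures). The looks_like_big5() function is a hypothetical, deliberately crude heuristic; it is not how chardet's UniversalDetector works, which relies on per-encoding probers.

```python
# Sample text (the word "Chinese"); not taken from the test fixtures.
SAMPLE = '中文'


def looks_like_big5(data):
    """Crude, hypothetical heuristic: True if data is invalid UTF-8 but valid Big5.

    Many non-Big5 byte strings also pass this check; it only illustrates why
    undeclared legacy-encoded text needs a statistical detector such as chardet.
    """
    try:
        data.decode('utf-8')
        return False  # valid UTF-8 (including plain ASCII): not treated as Big5
    except UnicodeDecodeError:
        pass
    try:
        data.decode('big5')
        return True
    except UnicodeDecodeError:
        return False


if __name__ == '__main__':
    raw = SAMPLE.encode('big5')
    # Mis-reading Big5 bytes as UTF-8 yields U+FFFD replacement characters.
    print(raw.decode('utf-8', errors='replace'))
    # Decoding with the right codec round-trips the original text.
    print(raw.decode('big5') == SAMPLE)  # True
    print(looks_like_big5(raw))          # True
```

Because byte sequences from other double-byte encodings can also decode as Big5, a check like this can only rule candidates in or out; choosing among them is what chardet's probers are for.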

14 changes: 14 additions & 0 deletions tests/Big5/_ude_1.txt
@@ -0,0 +1,14 @@
����j��]Wikipedia�^�̡A������¦�F���ѤU���B�|�����B���H�ӡA�Ѧʬ�j�C�l�@�̡A����C�����|�]�C

��F�G�����~�Q�G��ܤ@�A�Τv���~�G��Q�E�A���U��y���G�ʤ��Q�A�X�O�C�ʸU�ءF�G�Q�j�����K���A��^�����L�G�ʸU�C�x��D�ѤU���Ӧ@���Ӧ��F���N�U���A�X�����B�Hġ�@�A�j��_�j�C

����@���A�X�j�ոܺ���C�m���l�����n��J�u���Aô�]�v�A�m����n��J�u��A��l�]�v�A��H�X���A���Nô�����l�]�C��´�����A�H���ڥ��A�����]�C

�j��o���A�P�D�ݵo�F�p�s�D�B�Х��B�����B�ӾǡB�y���B��w���C����ئh�i�w�\�ש��s�������F�\�����׽֡A�ѱo�Pġ��A����_��ɨ�a��H�A�M�h��Ҹ��չ�A���K���G�C

�Z���򤧵��A�Ҿڪ̡A�ꭲ���ۥѤ��ɳ\�i��ij�A�G�i�ۥѼs�ǤѤU�աC

�娥����l������~�C�i�A�����o��G�d�@�ʤK�Q�E�C

commons:����
�n�H�M�T�A��������@�ɡJ����j��C
296 changes: 296 additions & 0 deletions tests/Big5/blog.worren.net.xml
@@ -0,0 +1,296 @@
<?xml version="1.0" encoding="Big5"?>
<!--
Source: http://blog.worren.net/wp-atom.php
Expect: Big5
-->
<feed version="0.3"
xmlns="http://purl.org/atom/ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xml:lang="en">
<title>Worren's Blog</title>
<link rel="alternate" type="text/html" href="http://blog.worren.net" />
<tagline>Worren's Blog</tagline>
<modified>2005-12-27T11:12:12Z</modified>
<copyright>Copyright 2005</copyright>
<generator url="http://wordpress.org/" version="1.2.1">WordPress</generator>

<entry>
<author>
<name>Worren</name>
</author>
<title>Keil C Compiler ���_���g�k</title>
<link rel="alternate" type="text/html" href="http://blog.worren.net/index.php?p=80" />
<id>http://blog.worren.net?p=80</id>
<modified>2005-12-27T11:11:09Z</modified>
<issued>2005-12-27T11:09:50Z</issued>

<dc:subject>LCD/�q�l�q�� �⥾</dc:subject>
<dc:subject>8051 Note</dc:subject> <summary type="text/html" mode="escaped">[���] http://ehome.hifly.to/showthread.php?threadid=1972


static void TF0_ISR(void) interrupt TF0_VECTOR using REG_BANK_1;


interrupt �᭱���ۤ@��number�A�o��number�N��8051���@�Ӥ��_�C

�Ьd�A���{�����@TF0_VECTOR �w�q��number�A�M��dKeil C ��menu��interrupt number������A�Y���O���@�Ӥ��_�C

8051��register���|��bank�Ausing�᭱�O���winterrupt routine �n�Ψ��@�� bank�C�@�ˬdREG_BANK_1���w�q�Y���n�ϥΨ��@�� bank�C

�q�w�q���W�٦r�q�Ӭ�
TF0_VECTOR �@���Otimer 0
REG_BANK_1�@���O 1 </summary>
</entry>
<entry>
<author>
<name>Worren</name>
</author>
<title>C ���򥻦줸�ާ@�Ÿ� vs byte��ƫ��A</title>
<link rel="alternate" type="text/html" href="http://blog.worren.net/index.php?p=79" />
<id>http://blog.worren.net?p=79</id>
<modified>2005-12-27T11:12:12Z</modified>
<issued>2005-12-27T11:07:56Z</issued>

<dc:subject>LCD/�q�l�q�� �⥾</dc:subject>
<dc:subject>8051 Note</dc:subject> <summary type="text/html" mode="escaped">i=24 , �H�G�i����� 0000000000011000
i=16, �H�G�i����� 0000000000010000


(1) &amp;#038; �B
0000000000011000
&amp;#038; 0000000000010000
------------------------------
0000000000010000=16

(2) | ��
0000000000010000
| 0000000000011000
------------------------------
0000000000011000=24

(3) ~ ��
~ 0000000000010000
-------------------------------
0000000000011000
1111111111100111 =-25

item 3 ...</summary>
</entry>
<entry>
<author>
<name>Worren</name>
</author>
<title>haha ~ .. �ۺq����</title>
<link rel="alternate" type="text/html" href="http://blog.worren.net/index.php?p=78" />
<id>http://blog.worren.net?p=78</id>
<modified>2005-12-13T10:57:20Z</modified>
<issued>2005-12-13T10:56:53Z</issued>

<dc:subject>�@��</dc:subject>
<dc:subject>Life</dc:subject> <summary type="text/html" mode="escaped">���ѭn�i�J�M�ɤF! �����컡���ɨ�15�W, �J�����Ӥ��� ! ... �O�p�w����a�J��F !

���ѴN�n�ۨM�ɤF, ���Ѥ��M���ۧڤ��䪺�u�@ . �U�Z�ɶ���F, �b��EE �Q�פ@�q�������D , ���ɳB���Ѥj�q�����@�ݨ���ڸ�e, ��F�@�U�ڮ�e���j�O�� : �K~ .. ���ѧA�n�h���ɰ�~?���[�o�ζ�? �o�O�@�w�n����~~

ha ~ ... �B�����U���ʦh�H, �]��z�U�� , �٥H���L���|���D�ڳo�Ӥp�@ ~ .. ���G�L���L�e�}�l�b�j���ݤ��G�榳�ݨ�ڪ��W�l�N�Q"�K~?�o���O�ڮa���H��~~" .... ha ~ ..... ����o�L��|�O�o�ڳo�ܤ֦b�L��e�o�����H! ... ^_^"

���ѱߤW�������}��, �j���u���U�z�|�L�h���ڧa ! ... :P .... �B���n�ڱo�@�ӳ̨γy���� .. ~"~ .... pls ~..�ڳo�رo�ʪ��H���X�̨γy����~?�j���o��B�r�Ǥ~�����|�a! ... �i�H�o�̸Ӧ��y����! .... ...</summary>
</entry>
<entry>
<author>
<name>Worren</name>
</author>
<title>�βy�z��</title>
<link rel="alternate" type="text/html" href="http://blog.worren.net/index.php?p=77" />
<id>http://blog.worren.net?p=77</id>
<modified>2005-12-13T07:44:04Z</modified>
<issued>2005-12-13T07:44:04Z</issued>

<dc:subject>�@��</dc:subject>
<dc:subject>Heart</dc:subject> <summary type="text/html" mode="escaped">�q�B�� FW ���ڪ� mail �ݨ쪺 ....

�`�� start .......
���즳�@�Ѭݴβy�A�ک��M�⮩�F�@���������z�סG

�p�G�ڬO�ӥ����⪺�ܡA�`���Ӥ���y�����a�I

���ӭn��n�y�~���A�p�G�o��X���O�a�y�A

����ڷF���@�����ΩO�H
�`�� end ........

�H�M�H�������۳B�]�O ,
���Q�M�ҩd�����N���O���y���нm�M������
¾�����P��/�󳡪��N���O"�Ķ�"�����M�ۤv(������) ...
�Y�O��軡�F or ���F����O�ۤv���ΪA���� or ��`..
���... �I�q(������)�γ\�i�H���\�h�����n�����D�קK��!

�N���O��X�F�a�y, �Y�w�O�h���i��O�����y����, �]�i��O
�u�a�y����........

�C�Ӥ��P�զX����񳣦���S�w���۳B�Ҧ�, ���ɭԧڷ|���g�N
��X�a�y, �Y������o�ԤU ~... ���N���|������! �۹�a�Y���
�]�O��X�a�y�F ~ .... �ڤ]�n�A�צa�ԤU, ..... �����Y�੼����
�u�I, �������L�ˤj�������I(or���w��誺���I!) .... �@����
�|�ܦn go .... :)
</summary>
</entry>
<entry>
<author>
<name>Worren</name>
</author>
<title>�k�j18��?</title>
<link rel="alternate" type="text/html" href="http://blog.worren.net/index.php?p=75" />
<id>http://blog.worren.net?p=75</id>
<modified>2005-11-26T16:24:33Z</modified>
<issued>2005-11-26T16:13:36Z</issued>

<dc:subject>�@��</dc:subject>
<dc:subject>Heart</dc:subject> <summary type="text/html" mode="escaped">ha ~ .. ����F�]�Q�A�s�M�誺 VCD ,
(�٥H���OKTV�� ~...:P) ...


�ݤF�o���s�y��, �h�F�@�Ѧ������k�H��,
�]�⤧�e�����M�����F ! .... �[�W�o���q�n ~..
��, �k�j�u�O18�ܣ�! ... �ܬ��F, �����]���
���֤@�I(but �٬O���ܦh�d�s��!...@@a)

�M���ѽ�X�۪�"�A�O�ڪ��Ŭu" �ٯu�O���椧
�e������! ....MV�����y��ı�o�W�Ŭ��� ~ .. :P


���ѥh�������դͷ|...-"- ...�Q�P�ǩԥh��! ..
���e�Ѯv���q�ܵ��ڧڳ��Q��k�����F ~... ���G
�٬O�Q�P�ǩԥh�F ! ... :P ...btw, �O�X�۹�
�̰ߤ@�P����~... �o, �H�e�ٯu�O����_��, �N
�ܳ�ª��@�Ӥk��! �{�b ~...�z��, �X�~����, ��
�F�j��, �X�F���|.... �o���S�ܦ�, ���O��ۤW�o
�h�F�n�@���k�H��~...�o�]�O�M�کf�@�˧b�Ȧ檺!
but �N���Ӫ��D�Oԣ����, �u���D�o�ܦ�~ ..always
on line ! .. @@a ...�j���O�ߤ@�P����, ...</summary>
</entry>
<entry>
<author>
<name>Worren</name>
</author>
<title>�c�@�@���k</title>
<link rel="alternate" type="text/html" href="http://blog.worren.net/index.php?p=74" />
<id>http://blog.worren.net?p=74</id>
<modified>2005-11-26T16:12:45Z</modified>
<issued>2005-11-26T16:12:45Z</issued>

<dc:subject>�@��</dc:subject>
<dc:subject>Life</dc:subject> <summary type="text/html" mode="escaped">�{�b�S�b���c�@�@���k�x�W�� ....
�L�̱�n�i�R ~....���� ~.....
�L�׬O����y��, �𶢸�, �Q�k��,....�s�Φ糣�n�i�R !.. :P


�Y�ڦ��Ӱ��l���o����i�R�N�n�F ! ..... XD
�@���k�D����o�u�� ! ....

�i�O,...... �ڦۤv�]�ܥ� ~...
�i�R�k�����ӷ|�Q���~�]�a ! ....
ha ~ ... ���פ@�U�ۤv������a !

�� ~...���H��, �S�Q��ڬO�@�ӷ|�h�ݰ����@���H !
but �ڤ]���O�C������~...�����@��~...���M�N�O�̭�
���H�i�R���ܴN�ݰ� ! .... :P </summary>
</entry>
<entry>
<author>
<name>Worren</name>
</author>
<title>�p�Ĥl�t</title>
<link rel="alternate" type="text/html" href="http://blog.worren.net/index.php?p=73" />
<id>http://blog.worren.net?p=73</id>
<modified>2005-11-26T16:12:04Z</modified>
<issued>2005-11-26T16:11:43Z</issued>

<dc:subject>�@��</dc:subject>
<dc:subject>Life</dc:subject> <summary type="text/html" mode="escaped">���X��� ~ .. �ڳ������D�ڨ��򦳤p�Ĥl�t !



���w�g�ѰO�O����ɭԤF, ���ӬO�ڰ������e(�t)�a ~?
�p�Ĥl, �S�O�O5���H�U���p�B�� ~.. ��ڦ��خ��߷P ~...
�F�~���p�Ĩ�ڮa��, �ݨ��, �q�`�W�Ĥ@��ƴN�O�e��!
�M��ڤW�e����, �N�O����a�W&amp;#038;�j��!

���ѥh�ǩn�a�M�n�ҳܤ@�M(�ƹ�W�O�]�������o�̧ˤ@�x�s�q��
���ѥh�w��!)... �ܨ�11�I�h, �ڨ���2�Ӥp�Ĥl���}�l�R�t�F~~...
�N���X�n���}�F ~....�Y�ϰs�٨S�ܧ��� ~.......

���G ~... ha ` ... �ڦb���fť��Ѥj(����)�b������n ~....
��ӬO�Q�n�ڵ��L��� ~...:P ..... ���M�ڴN�V�e��L��_��
���L���F�@�Ǧn�� ~..����"�����̨ĤF, ���|���������U�f�f...."
"�Ϋe�n���~.." ....

�L�~�֩�ڨ� ~... ha ~ ...


����o�O����ɭ� ~..�ڶ}�l���p�Ĥl�t ~.... ���ӬO�]���h�F���
��p�a ~ ...�q���ɭԭ���p�B�ͪ��𤣦A�O�@������, �����O��j
�H�@��, �b���, ���X�骺�p�B�ͨ���, �ٷ|����ڨ��W�� ~..�]�|�n�ک�
�L/�o�� ~ ....ha ~...�u�O���H�Q��! ... ���Ѩ���&amp;#038;�ǩn�a ~�j���ڪ�
�ɩѤ���֤F�ܦh(�ݫeXX ...</summary>
</entry>
<entry>
<author>
<name>Worren</name>
</author>
<title>�̷R���H�����o�̫�N�O�|�b�@�_ &#8230;</title>
<link rel="alternate" type="text/html" href="http://blog.worren.net/index.php?p=72" />
<id>http://blog.worren.net?p=72</id>
<modified>2005-11-23T16:29:18Z</modified>
<issued>2005-11-23T16:29:18Z</issued>

<dc:subject>�@��</dc:subject>
<dc:subject>Heart</dc:subject> <summary type="text/html" mode="escaped">
���P�ӵo

�@���ƴN�O�o��!
�̷R���H�����o�N�O�̫�b�@�_�����ӤH!
�γ\�O�ͩR��Q�����g��t�ۦA���`�]���L�����⪺����~

���h��~�Pı��, �A���_�����p��l, ���q��ӬO����a��! </summary>
</entry>
<entry>
<author>
<name>Worren</name>
</author>
<title>�Ѯv, �藍�_ &#8230;</title>
<link rel="alternate" type="text/html" href="http://blog.worren.net/index.php?p=71" />
<id>http://blog.worren.net?p=71</id>
<modified>2005-11-22T02:15:42Z</modified>
<issued>2005-11-22T02:15:42Z</issued>

<dc:subject>�@��</dc:subject>
<dc:subject>Life</dc:subject> <summary type="text/html" mode="escaped">�ܻ��Q�� .... �ȥ�ɶ� ... �����S�̪��a�n�T�_ ! .. �z��, �֦b�o�خɭԥ��q�ܨӰ�!?
�ݸ��X? ... �S�ݹL! ���_��ť?... �n���ܼ��x! but �����D�O��!


���: �A�b��ı��?
��: �!
���: �A�Q�ѽ��Z��?
��: �S�� , �{�b�O�ȥ�ɶ�! �p���@�U�A�����ڰ�!
���: ��~~ ... �S���Y, �Ať�N�n�F~... �ժ���..........
��: ... ||| ...... �p... �O�Ѯv! �藍�_, �ڨSť�X��... ��~... ���󨺥��, �ڭn�ݧڪ� schedule .......
�Ѯv: ���, �ڬO�Ѯv��! �n��, ���A�n���ڦ^����! ... ���n�ѰO�K!

..........�ǤF, �����O�ǧ̦b�p������ ~ ... �j���ɶ������, ...</summary>
</entry>
<entry>
<author>
<name>Worren</name>
</author>
<title>�L�F�o�ӧ�,�N�S�o�ө��F?</title>
<link rel="alternate" type="text/html" href="http://blog.worren.net/index.php?p=70" />
<id>http://blog.worren.net?p=70</id>
<modified>2005-11-22T04:27:24Z</modified>
<issued>2005-11-20T16:13:02Z</issued>

<dc:subject>�@��</dc:subject>
<dc:subject>Life</dc:subject> <summary type="text/html" mode="escaped">�̪��@�ӪB�Ͳ�_�o�y��! ���ӬO���ڦb��L���o�y��!

���L, �ڷQ, ���Ӥ]�o��ڦۤv�����o�y��! ���|�@���Ȩ��F, �N���|�O�A���F! �٬O�ַQ�Q�O���Ƨa ! </summary>
</entry>
</feed>
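This fixture records its origin and expected result in a header comment (Source: and Expect: lines), and its XML declaration also names Big5, while test.py derives the expected encoding from the directory name only. A possible cross-check is to read both declarations from a fixture's leading bytes. This is a minimal sketch, assuming other fixtures follow the same Source:/Expect: layout as blog.worren.net.xml (only this file confirms it); declared_encodings(), EXPECT_RE and XML_DECL_RE are hypothetical names.

```python
import re

# Hypothetical helpers; the "Expect:" convention is inferred from the header
# comment of tests/Big5/blog.worren.net.xml.
EXPECT_RE = re.compile(rb'^\s*Expect:\s*(\S+)', re.MULTILINE)
XML_DECL_RE = re.compile(rb'<\?xml[^>]*\bencoding=["\']([^"\']+)["\']')


def declared_encodings(head):
    """Return (expect, xml_encoding) found in a fixture's leading bytes.

    Either value is None when the corresponding declaration is missing.
    """
    expect = EXPECT_RE.search(head)
    decl = XML_DECL_RE.search(head)
    return (
        expect.group(1).decode('ascii') if expect else None,
        decl.group(1).decode('ascii') if decl else None,
    )


if __name__ == '__main__':
    head = (b'<?xml version="1.0" encoding="Big5"?>\n'
            b'<!--\nSource: http://blog.worren.net/wp-atom.php\n'
            b'Expect: Big5\n-->\n')
    print(declared_encodings(head))  # ('Big5', 'Big5')
```

The helper works on bytes, matching how test.py opens fixtures in 'rb' mode, so it never needs to know the fixture's encoding in advance.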