Commit f8ed5f5

chore: add teds score & fix setup package
1 parent 49ac513 commit f8ed5f5

5 files changed: +20 -15 lines changed

README.md

+14-12
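@@ -23,6 +23,7 @@
 - Aligned input/output format with RapidTable
 - Added support for automatic model download
 - Added a new table classification model from paddle
+- Added evaluation scores for the latest PaddleX table recognition model
 
 ### Introduction
 💖 This repository is an inference library for structured recognition of tables in documents. It includes the wired and wireless table recognition models from Alibaba Duguang, a wired table model contributed by llaipython (WeChat), and the table classification model built into NetEase Qanything.\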
@@ -56,18 +57,19 @@
 Surya-Tabled uses its built-in OCR module; its table model is a row-column recognition model that cannot recognize merged cells, which results in lower scores
 
 | Method | TEDS | TEDS-only-structure |
-|:---------------------------------------------------------------------------------------------------------|:-----------:|:-------------------:|
-| [surya-tabled(--skip-detect)](https://github.com/VikParuchuri/tabled) | 0.33437 | 0.65865 |
-| [surya-tabled](https://github.com/VikParuchuri/tabled) | 0.33940 | 0.67103 |
-| [deepdoctection(table-transformer)](https://github.com/deepdoctection/deepdoctection?tab=readme-ov-file) | 0.59975 | 0.69918 |
-| [ppstructure_table_master](https://github.com/PaddlePaddle/PaddleOCR/tree/main/ppstructure) | 0.61606 | 0.73892 |
-| [ppsturcture_table_engine](https://github.com/PaddlePaddle/PaddleOCR/tree/main/ppstructure) | 0.67924 | 0.78653 |
-| [StructEqTable](https://github.com/UniModal4Reasoning/StructEqTable-Deploy) | 0.67310 | 0.81210 |
-| [RapidTable(SLANet)](https://github.com/RapidAI/RapidTable) | 0.71654 | 0.81067 |
-| table_cls + wired_table_rec v1 + lineless_table_rec | 0.75288 | 0.82574 |
-| table_cls + wired_table_rec v2 + lineless_table_rec | 0.77676 | 0.84580 |
-| [RapidTable(SLANet-plus)](https://github.com/RapidAI/RapidTable) | 0.84481 | 0.91369 |
-| [RapidTable(unitable)](https://github.com/RapidAI/RapidTable) | **0.86200** | **0.91813** |
+|:---------------------------------------------------------------------------------------------------------|:-----------:|:-----------------:|
+| [surya-tabled(--skip-detect)](https://github.com/VikParuchuri/tabled) | 0.33437 | 0.65865 |
+| [surya-tabled](https://github.com/VikParuchuri/tabled) | 0.33940 | 0.67103 |
+| [deepdoctection(table-transformer)](https://github.com/deepdoctection/deepdoctection?tab=readme-ov-file) | 0.59975 | 0.69918 |
+| [ppstructure_table_master](https://github.com/PaddlePaddle/PaddleOCR/tree/main/ppstructure) | 0.61606 | 0.73892 |
+| [ppsturcture_table_engine](https://github.com/PaddlePaddle/PaddleOCR/tree/main/ppstructure) | 0.67924 | 0.78653 |
+| [StructEqTable](https://github.com/UniModal4Reasoning/StructEqTable-Deploy) | 0.67310 | 0.81210 |
+| [RapidTable(SLANet)](https://github.com/RapidAI/RapidTable) | 0.71654 | 0.81067 |
+| table_cls + wired_table_rec v1 + lineless_table_rec | 0.75288 | 0.82574 |
+| table_cls + wired_table_rec v2 + lineless_table_rec | 0.77676 | 0.84580 |
+| [PaddleX(SLANetXt+RT-DERT)](https://github.com/PaddlePaddle/PaddleX) | 0.79900 | **0.92222** |
+| [RapidTable(SLANet-plus)](https://github.com/RapidAI/RapidTable) | 0.84481 | 0.91369 |
+| [RapidTable(unitable)](https://github.com/RapidAI/RapidTable) | **0.86200** | 0.91813 |
 
 ### Usage Recommendations
 wired_table_rec_v2 (highest accuracy for wired tables): general-purpose wired tables (papers, magazines, journals, receipts, invoices, bills)

README_en.md

+3
@@ -67,7 +67,10 @@ Surya-Tabled uses its built-in OCR module, which is a row-column recognition mod
 | [RapidTable(SLANet)](https://github.com/RapidAI/RapidTable) | 0.71654 | 0.81067 |
 | table_cls + wired_table_rec v1 + lineless_table_rec | 0.75288 | 0.82574 |
 | table_cls + wired_table_rec v2 + lineless_table_rec | 0.77676 | 0.84580 |
+| [PaddleX(SLANetXt+RT-DERT)](https://github.com/PaddlePaddle/PaddleX) | 0.79900 | **0.92222** |
 | [RapidTable(SLANet-plus)](https://github.com/RapidAI/RapidTable) | **0.84481** | **0.91369** |
+| [RapidTable(unitable)](https://github.com/RapidAI/RapidTable) | **0.86200** | 0.91813 |
+
 
 ### Usage Recommendations
 wired_table_rec_v2 (highest precision for wired tables): General scenes for wired tables (papers, magazines, journals, receipts, invoices, bills)
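
For readers unfamiliar with the metric reported in the table: TEDS (Tree-Edit-Distance-based Similarity) is commonly defined as below, where $T_a$ and $T_b$ are the predicted and ground-truth tables represented as HTML trees, $\mathrm{EditDist}$ is the tree edit distance between them, and $|T|$ is the number of nodes in $T$. This is the standard definition of the metric, not something stated in this commit. The TEDS-only-structure column is assumed here to be the same score computed on the tag structure alone, with cell text ignored.

```latex
\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max\left(|T_a|, |T_b|\right)}
```

A score of 1 means the two trees are identical, and lower scores mean more edits are needed to turn one into the other.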

setup_lineless.py

+1-1
@@ -53,7 +53,7 @@ def read_txt(txt_path: Union[Path, str]) -> List[str]:
     license="Apache-2.0",
     install_requires=read_txt("requirements.txt"),
     include_package_data=True,
-    packages=[MODULE_NAME],
+    packages=[MODULE_NAME, f"{MODULE_NAME}.utils"],
     keywords=["tsr,ocr,table-recognition"],
     classifiers=[
         "Programming Language :: Python :: 3.6",

setup_table_cls.py

+1-1
@@ -46,7 +46,7 @@ def read_txt(txt_path: Union[Path, str]) -> List[str]:
     license="Apache-2.0",
     install_requires=read_txt("requirements.txt"),
     include_package_data=True,
-    packages=[MODULE_NAME],
+    packages=[MODULE_NAME, f"{MODULE_NAME}.utils"],
     keywords=["table-classifier", "wired", "wireless", "table-recognition"],
     classifiers=[
         "Programming Language :: Python :: 3.6",

setup_wired.py

+1-1
@@ -53,7 +53,7 @@ def read_txt(txt_path: Union[Path, str]) -> List[str]:
     license="Apache-2.0",
     install_requires=read_txt("requirements.txt"),
     include_package_data=True,
-    packages=[MODULE_NAME],
+    packages=[MODULE_NAME, f"{MODULE_NAME}.utils"],
     keywords=["tsr,ocr,table-recognition"],
     classifiers=[
         "Programming Language :: Python :: 3.6",
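
The three `setup_*.py` changes share one cause: a list passed to `packages` is taken literally by setuptools, so naming only the top-level package leaves subpackages such as `utils` out of the built distribution. The sketch below illustrates the gap using only the standard library. The `discover_packages` helper and the `demo_pkg` layout are hypothetical, made up for this example, and are not part of the repository.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def discover_packages(root: Path, top_level: str) -> List[str]:
    """Return dotted names for `top_level` and every nested directory
    that contains an __init__.py (hypothetical helper for illustration)."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root / top_level):
        if "__init__.py" not in filenames:
            dirnames[:] = []  # not a package, so do not descend further
            continue
        rel = Path(dirpath).relative_to(root)
        found.append(".".join(rel.parts))
    return sorted(found)


def build_demo_tree(root: Path) -> None:
    """Create demo_pkg/ and demo_pkg/utils/, each with an empty __init__.py."""
    for sub in ("demo_pkg", "demo_pkg/utils"):
        pkg_dir = root / sub
        pkg_dir.mkdir(parents=True)
        (pkg_dir / "__init__.py").touch()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_demo_tree(root)
    explicit = ["demo_pkg"]  # mirrors the old packages=[MODULE_NAME]
    discovered = discover_packages(root, "demo_pkg")
    missing = [name for name in discovered if name not in explicit]
    print(missing)  # subpackages the explicit list would leave out
```

The commit lists the subpackage explicitly. `setuptools.find_packages()` is a common alternative that avoids updating the list by hand when further subpackages are added, though whether it suits this repository's multi-`setup.py` layout is not something the diff settles.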
