Commit 9884a60

committed
docs: add flowchart and faq
1 parent dad3a9d commit 9884a60

2 files changed

+61
-171
lines changed

README.md

Lines changed: 61 additions & 52 deletions
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,6 @@
11
<div align="center">
22
<div align="center">
3-
<h1><b>📊 Table Structure Recognition</b></h1>
3+
<h1><b>📊 Table Structure Recognition</b></h1>
44
</div>
55
<a href=""><img src="https://img.shields.io/badge/Python->=3.6,<3.12-aff.svg"></a>
66
<a href=""><img src="https://img.shields.io/badge/OS-Linux%2C%20Mac%2C%20Win-pink.svg"></a>
@@ -10,51 +10,43 @@
1010
<a href="https://semver.org/"><img alt="SemVer2.0" src="https://img.shields.io/badge/SemVer-2.0-brightgreen"></a>
1111
<a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg"></a>
1212
<a href="https://github.com/RapidAI/TableStructureRec/blob/c41bbd23898cb27a957ed962b0ffee3c74dfeff1/LICENSE"><img alt="GitHub" src="https://img.shields.io/badge/license-Apache 2.0-blue"></a>
13-
14-
[简体中文](./docs/README_zh.md) | English
1513
</div>
1614

17-
### Introduction
18-
19-
This repository is a library for structured recognition of tables in documents.
20-
It includes table recognition models from Paddle, Alibaba's DocLight wired and wireless table recognition models,
21-
wired table models contributed by others, and the built-in table classification model from NetEase QAnything.
22-
15+
### Introduction
2316

17+
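💖 This repository is an inference library for structured recognition of tables in documents. It includes table recognition models from Paddle,
18+
Alibaba DuGuang (读光) wired and lineless table recognition models, a wired table model contributed by llaipython (WeChat), the built-in table classification model from NetEase QAnything, and others.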
2419

25-
#### Features
26-
**Fast**: Uses ONNXRuntime as the inference engine, achieving 1-7 second inference times on CPU.
20+
#### Features
21+
**Fast**: Uses ONNXRuntime as the inference engine; single-image inference takes 1-7 s on CPU.
2722

28-
🎯 **Accurate**: Combines table type classification models to distinguish between wired and wireless tables, leading to more specialized tasks and higher accuracy.
23+
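🎯 **Accurate**: Combines a table type classification model to distinguish wired tables from lineless tables, so each task is more specialized and accuracy is higher.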
2924

30-
🛡️ **Stable**: Does not depend on any third-party training frameworks, uses specialized ONNX models, and completely solves memory leak issues.
25+
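🛡️ **Stable**: Does not depend on any third-party training framework and uses small, specialized ONNX models, which completely resolves the memory leak issue.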
3126

32-
### Results Demonstration
27+
### Results Demonstration
3328
<div align="center">
3429
<img src="https://github.com/RapidAI/TableStructureRec/releases/download/v0.0.0/demo_img_output.gif" alt="Demo" width="100%" height="100%">
3530
</div>
3631

3732
### Benchmark Results
38-
[TableRecognitionMetric](https://github.com/SWHL/TableRecognitionMetric)
39-
40-
[dataset](https://huggingface.co/datasets/SWHL/table_rec_test_dataset)
41-
42-
[Rapid OCR](https://github.com/RapidAI/RapidOCR)
43-
44-
| model |TEDS|
45-
|:---------------------------------------------------------------------------------------------------------------------------|:-|
46-
| lineless_table_rec |0.50054|
47-
| [RapidTable](https://github.com/RapidAI/RapidStructure/blob/b800b156015bf5cd6f5429295cdf48be682fd97e/docs/README_Table.md) |0.58786|
48-
| wired_table_rec v1 |0.70279|
49-
| table_cls + wired_table_rec v1 + lineless_table_rec |0.74692|
50-
| table_cls + wired_table_rec v2 + lineless_table_rec |0.80235|
51-
52-
### Install
33+
[TableRecognitionMetric evaluation tool](https://github.com/SWHL/TableRecognitionMetric) [Evaluation dataset](https://huggingface.co/datasets/SWHL/table_rec_test_dataset) [Rapid OCR](https://github.com/RapidAI/RapidOCR)
34+
35+
| Method | TEDS |
36+
|:---------------------------------------------------------------------------------------------------------------------------|:----:|
37+
| lineless_table_rec | 0.53561 |
38+
| [RapidTable](https://github.com/RapidAI/RapidStructure/blob/b800b156015bf5cd6f5429295cdf48be682fd97e/docs/README_Table.md) | 0.58786 |
39+
| wired_table_rec v1 | 0.70279 |
40+
| wired_table_rec v2 | 0.78007 |
41+
| table_cls + wired_table_rec v1 + lineless_table_rec | 0.74692 |
42+
| table_cls + wired_table_rec v2 + lineless_table_rec | 0.80235 |
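
One way to read the added rows is to compare them pairwise. The sketch below is plain arithmetic on the TEDS values listed in the new table; it is not output from the evaluation tool, and the helper name `gain` is only illustrative.

```python
# TEDS scores copied from the new table (higher is better).
teds = {
    "wired_table_rec v1": 0.70279,
    "wired_table_rec v2": 0.78007,
    "table_cls + wired_table_rec v1 + lineless_table_rec": 0.74692,
    "table_cls + wired_table_rec v2 + lineless_table_rec": 0.80235,
}


def gain(baseline: str, candidate: str) -> float:
    """Absolute TEDS difference between two rows, rounded to 5 decimals."""
    return round(teds[candidate] - teds[baseline], 5)


# Effect of upgrading the wired model on its own.
v1_to_v2 = gain("wired_table_rec v1", "wired_table_rec v2")
# Effect of adding classification and the lineless model on top of v2.
cls_on_v2 = gain(
    "wired_table_rec v2",
    "table_cls + wired_table_rec v2 + lineless_table_rec",
)

print(f"v1 -> v2: +{v1_to_v2:.5f}")
print(f"v2 -> table_cls + v2 + lineless: +{cls_on_v2:.5f}")
```

On these numbers, most of the improvement comes from the v2 wired model, with the classification step adding a smaller gain on top.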
43+
44+
### Install
5345
``` python {linenos=table}
5446
pip install wired_table_rec lineless_table_rec table_cls
5547
```
5648

57-
### Quick Start
49+
### Quick Start
5850
``` python {linenos=table}
5951
import os
6052

@@ -88,38 +80,55 @@ print(f"elasp: {elasp}")
8880
# # Visualize the OCR recognition boxes
8981
# plot_rec_box(img_path, f"{output_dir}/ocr_box.jpg", ocr_res)
9082
```
91-
### TODO List
92-
- [ ] rotate img fix before rec
93-
- [ ] Increase dataset size
94-
- [ ] Lineless table rec optimization
95-
-
96-
### Acknowledgements
9783

98-
[PaddleOCR Table](https://github.com/PaddlePaddle/PaddleOCR/blob/4b17511491adcfd0f3e2970895d06814d1ce56cc/ppstructure/table/README_ch.md)
84+
## FAQ (Frequently Asked Questions)
85+
86+
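1. **Q: Can skewed images be processed?**
87+
- A: The project does not currently support recognition of skewed images. Please correct the image first. PRs that address this are welcome.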
88+
89+
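2. **Q: The recognized boxes are missing the text inside them.**
90+
- A: The small rapidocr model is used by default. If you need higher accuracy, download a higher-accuracy OCR model from the [model list](https://rapidai.github.io/RapidOCRDocs/model_list/#_1)
91+
and pass in ocr_result at run time.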
92+
93+
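3. **Q: Do the models support GPU acceleration?**
94+
- A: Table model inference is already very fast: on the order of 100 ms for wired tables and 500 ms for lineless tables.
95+
Most of the time is spent in the OCR stage; see [rapidocr_paddle](https://rapidai.github.io/RapidOCRDocs/install_usage/rapidocr_paddle/usage/#_3) to speed up OCR.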
96+
97+
### TODO List
98+
- [ ] Correct image skew before recognition
99+
- [ ] Enlarge the dataset and add more benchmark comparisons
100+
- [ ] Optimize the lineless table model
101+
102+
### Processing Pipeline
103+
```mermaid
104+
flowchart TD
105+
A[/Table image/] --> B([Table classification])
106+
B --> C([Wired table recognition]) & D([Lineless table recognition]) --> E([Text recognition rapidocr_onnxruntime])
107+
E --> F[/Structured HTML output/]
108+
```
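
The routing in the flowchart can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's code: `classify_table`, `recognize_wired`, `recognize_lineless`, and `run_ocr` are hypothetical stubs standing in for table_cls, wired_table_rec, lineless_table_rec, and rapidocr_onnxruntime, and they return canned values instead of running any model.

```python
from typing import Callable, Dict, List


# Hypothetical stand-ins: NOT the real table_cls / wired_table_rec /
# lineless_table_rec / rapidocr_onnxruntime APIs, only placeholders.
def classify_table(img_path: str) -> str:
    """Pretend classifier: returns "wired" or "lineless"."""
    return "wired" if "wired" in img_path else "lineless"


def recognize_wired(img_path: str) -> str:
    """Pretend wired recognizer: returns an HTML template with empty cells."""
    return "<table><tr><td>{}</td><td>{}</td></tr></table>"


def recognize_lineless(img_path: str) -> str:
    """Pretend lineless recognizer: returns an HTML template with empty cells."""
    return "<table><tr><td>{}</td></tr><tr><td>{}</td></tr></table>"


def run_ocr(img_path: str) -> List[str]:
    """Pretend OCR: returns one text string per cell."""
    return ["a", "b"]


RECOGNIZERS: Dict[str, Callable[[str], str]] = {
    "wired": recognize_wired,
    "lineless": recognize_lineless,
}


def table_to_html(img_path: str) -> str:
    """Classify the table, dispatch to the matching recognizer, then add OCR text."""
    table_type = classify_table(img_path)
    structure = RECOGNIZERS[table_type](img_path)
    texts = run_ocr(img_path)
    return structure.format(*texts)


print(table_to_html("wired_sample.png"))
print(table_to_html("lineless_sample.png"))
```

Keeping the recognizers behind a small dispatch table mirrors the branch in the diagram: the classifier picks exactly one structure model, and the OCR step is shared by both paths.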
99109

100-
[Cycle CenterNet](https://www.modelscope.cn/models/damo/cv_dla34_table-structure-recognition_cycle-centernet/summary)
110+
### Acknowledgements
101111

102-
[LORE](https://www.modelscope.cn/models/damo/cv_resnet-transformer_table-structure-recognition_lore/summary)
112+
[PaddleOCR Table Recognition](https://github.com/PaddlePaddle/PaddleOCR/blob/4b17511491adcfd0f3e2970895d06814d1ce56cc/ppstructure/table/README_ch.md)
103113

104-
[Qanything-RAG](https://github.com/netease-youdao/QAnything)
114+
[DuGuang table structure recognition: wired tables](https://www.modelscope.cn/models/damo/cv_dla34_table-structure-recognition_cycle-centernet/summary)
105115

106-
llaipython (WeChat, commercial support for table extraction) provides high-precision wired table models.
116+
[DuGuang table structure recognition: lineless tables](https://www.modelscope.cn/models/damo/cv_resnet-transformer_table-structure-recognition_lore/summary)
107117

108-
### Contributing
118+
[Qanything-RAG](https://github.com/netease-youdao/QAnything)
109119

110-
Pull requests are welcome. For major changes, please open an issue first
111-
to discuss what you would like to change.
120+
Many thanks to llaipython (WeChat; offers a full paid high-accuracy table extraction service) for providing the high-accuracy wired table model.
112121

113-
Please make sure to update tests as appropriate.
122+
### Contributing
114123

115-
### [Sponsor](https://rapidai.github.io/Knowledge-QA-LLM/docs/sponsor/)
124+
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
116125

117-
If you want to sponsor the project, you can directly click the **Buy me a coffee** image, please write a note (e.g. your github account name) to facilitate adding to the sponsorship list below.
126+
Please make sure to update tests as appropriate.
118127

119-
<div align="left">
120-
<a href="https://www.buymeacoffee.com/SWHL"><img src="https://raw.githubusercontent.com/RapidAI/.github/main/assets/buymeacoffe.png" width="30%" height="30%"></a>
121-
</div>
128+
### [Sponsor](https://rapidai.github.io/Knowledge-QA-LLM/docs/sponsor/)
129+
130+
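If you would like to sponsor the project, click the Sponsor button at the top of this page and include a note (**your GitHub account name**) so that you can be added to the sponsor list.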
122131

123-
### License
132+
### License
124133

125-
This project is released under the [Apache 2.0 license](https://github.com/RapidAI/TableStructureRec/blob/c41bbd23898cb27a957ed962b0ffee3c74dfeff1/LICENSE).
134+
This project is released under the [Apache 2.0](https://github.com/RapidAI/TableStructureRec/blob/c41bbd23898cb27a957ed962b0ffee3c74dfeff1/LICENSE) open-source license.

docs/README_zh.md

Lines changed: 0 additions & 119 deletions
This file was deleted.
