
[Paper Reading] Hybrid model for Chinese character recognition based on Tesseract


Updated: 2025-10-05 12:54:22



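Summary: OpenCV (image preprocessing) + KNN (phrase processing) + Tesseract-OCR engine. In my opinion this paper is of low quality: the experimental details are not discussed, the results come with no statistical analysis, the wording is repetitive, and it contains elementary errors.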

Introduction

Chinese OCR is more difficult

English has only 26 letters, whereas about 2,500 Chinese characters are in common use.
The strokes of Chinese characters are complex, and many characters look similar.
Different Chinese fonts differ greatly from one another.

OCR engines

Tesseract-OCR engine

One of the earliest OCR engines (described in the source as "the first"); supports more than 100 languages (tesseract-ocr/tessdata, github/tesseract-ocr/tessdata).
Tesseract version 4.0 uses an OCR engine based on Long Short-Term Memory (LSTM).
In the Tesseract-OCR Simplified Chinese language library, recognition of individual characters is based on the features of standard Chinese characters.
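As a rough illustration of how the LSTM engine and the Simplified Chinese language data might be selected, the sketch below builds a call to the `tesseract` command-line tool. The function names `build_tesseract_command` and `recognise` are hypothetical, and the choices `chi_sim`, `--oem 1` (LSTM engine only) and `--psm 6` (treat the image as a single uniform block of text) are assumptions made for this example, not settings reported in the paper.

```python
import shutil
import subprocess
from typing import List, Optional


def build_tesseract_command(image_path: str, lang: str = "chi_sim",
                            oem: int = 1, psm: int = 6) -> List[str]:
    """Build the argument list for the tesseract CLI (hypothetical helper).

    "stdout" as the output base makes tesseract print the text instead of
    writing a file. --oem 1 selects the LSTM engine; --psm 6 assumes a
    single uniform block of text. Both values are illustrative choices.
    """
    return ["tesseract", image_path, "stdout",
            "-l", lang, "--oem", str(oem), "--psm", str(psm)]


def recognise(image_path: str) -> Optional[str]:
    """Run tesseract if it is installed; return the text, or None if it is not."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(build_tesseract_command(image_path),
                            capture_output=True, encoding="utf-8", check=True)
    return result.stdout
```

Calling `recognise("page.png")` would return the recognised text when Tesseract and the `chi_sim` language data are installed, and `None` when the executable is not found.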

OCRopus

Also an OCR engine based on LSTM.

Ocular OCR engine

Mostly used for recognising historical artefacts.

Swift OCR

A simple and fast OCR library written in Swift.

Simple-ocr-openCV

A simple Python OCR engine based on OpenCV and NumPy.

Background

Process of OCR

The main work of this study includes image preprocessing and phrase processing.

2.4.1 Image preprocessing

Image preprocessing methods include binarisation, noise reduction, and image tilt correction, among others.
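To make the binarisation step concrete, here is a minimal dependency-free sketch using Otsu's method, which picks the threshold that maximises the variance between the dark and light pixel classes. The notes do not say which binarisation method the paper uses, so Otsu's method is an assumption, and the names `otsu_threshold` and `binarise` are hypothetical. In practice the same step is usually done with OpenCV, for example `cv2.threshold` with the `cv2.THRESH_OTSU` flag.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, pixel values in 0..255


def otsu_threshold(image: Image) -> int:
    """Return the threshold that maximises between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in image:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]           # number of pixels with value <= t
        sum_bg += t * hist[t]     # sum of those pixel values
        if w_bg == 0:
            continue              # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break                 # every pixel is already in the background
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarise(image: Image, threshold: int) -> Image:
    """Map pixels above the threshold to 255 (white) and the rest to 0 (black)."""
    return [[255 if px > threshold else 0 for px in row] for row in image]
```

A typical use is `binarise(img, otsu_threshold(img))`, after which noise reduction and tilt correction would operate on the two-level image.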

3 OCR Hybrid recognition model

3.1 Image correction

3.2 KNN phrase detection and correction
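The notes do not record how the paper defines its KNN features or distance, so the following is only a sketch of one plausible reading: treat each recognised phrase as a query, find its nearest neighbours in a lexicon of known phrases by Levenshtein edit distance, and replace the phrase when the closest neighbour is near enough. The lexicon, the distance measure, the `max_distance` cut-off and all function names are assumptions for illustration.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution
        prev = curr
    return prev[-1]


def k_nearest_phrases(phrase: str, lexicon: List[str],
                      k: int = 3) -> List[Tuple[int, str]]:
    """Return the k lexicon entries closest to `phrase` as (distance, entry) pairs."""
    scored = sorted((edit_distance(phrase, entry), entry) for entry in lexicon)
    return scored[:k]


def correct_phrase(phrase: str, lexicon: List[str], max_distance: int = 1) -> str:
    """Replace `phrase` with its nearest lexicon entry if it is close enough."""
    neighbours = k_nearest_phrases(phrase, lexicon, k=1)
    if neighbours and neighbours[0][0] <= max_distance:
        return neighbours[0][1]
    return phrase
```

With a lexicon such as `["识别", "字符", "模型"]`, a misrecognised phrase like "识刖" is one substitution away from "识别" and would be corrected, while a phrase with no close neighbour is left unchanged.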


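Copyright notice: this article, "[Paper Reading] Hybrid model for Chinese character recognition based on Tesseract", was contributed voluntarily by 林淑君副主任, and the views expressed are the author's own. To repost, contact the author and cite the source: http://www.xiehuijuan.com/baike/1686990120a126093.html. This site only provides information storage space; it does not own the content and assumes no related legal liability. Any content found to involve plagiarism, infringement, or violations of laws and regulations will be removed once verified.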