tesseract: Tesseract Open Source OCR Engine

See also:

OCR PDFs and images directly in your browser
https://github.com/allenai/olmocr olmOCR is an open-source tool designed for high-throughput conversion of PDFs and other documents into plain text while preserving natural reading order. It supports tables, equations, handwriting, and more.
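https://github.com/VikParuchuri/surya An open-source document OCR toolkit that recognizes text in more than 90 languages and compares favorably with cloud services in performance. It supports text detection, layout analysis, reading order, and table recognition, and works on a range of documents, including images, PDFs, Word documents, and PowerPoint presentations. surya provides an interactive Streamlit app for experimenting with its OCR features. There is also a hosted API service for processing various file types.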
