Commit 2abee4c (parent 9e63ed9): 1 changed file with 10 additions and 0 deletions.
Overview:
OCR is implemented with the Python module PyMuPDF, which parses an uploaded PDF and extracts its text with high accuracy.
OCR should run on every non-text file upload. Alongside the original file, upload a text file with the same base name (the file name without its extension) that contains the extracted text; for example, report.pdf is accompanied by report.txt.
The OCR functionality is currently incomplete.
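A minimal Python sketch of the upload step described above, assuming uploads are saved as files on disk before processing. The names `extract_pdf_text`, `sidecar_path`, `handle_upload` and `TEXT_SUFFIXES` are hypothetical, not part of the project. The only PyMuPDF calls used are `fitz.open` and `Page.get_text()`, which read the text layer embedded in a PDF; scanned pages with no text layer would yield little or no text and would need a real OCR engine instead.

```python
from pathlib import Path
from typing import Callable, Optional

# Hypothetical: upload suffixes treated as text files that need no extraction.
TEXT_SUFFIXES = {".txt", ".md", ".csv"}


def extract_pdf_text(pdf_path: Path) -> str:
    """Return the embedded text of every page in a PDF, using PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the helpers below work without it

    with fitz.open(str(pdf_path)) as doc:
        return "\n".join(page.get_text() for page in doc)


def sidecar_path(upload: Path) -> Path:
    """Text file with the upload's base name, e.g. report.pdf -> report.txt."""
    return upload.with_suffix(".txt")


def handle_upload(
    upload: Path, extract: Callable[[Path], str] = extract_pdf_text
) -> Optional[Path]:
    """Write the extracted text next to a non-text upload.

    Returns the path of the new .txt file, or None when the upload is
    already a text file and is skipped. The original file is left untouched.
    """
    if upload.suffix.lower() in TEXT_SUFFIXES:
        return None
    out = sidecar_path(upload)
    out.write_text(extract(upload), encoding="utf-8")
    return out
```

In an upload handler, `handle_upload(saved_path)` would be called right after the original file is stored. The `extract` parameter is there so tests, or a future OCR backend for image uploads, can supply a different extractor without changing the file-naming logic.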
Issues:
Had trouble installing and testing PyMuPDF on the Alpine-based Python version in use.
Solution:
Built PyMuPDF from source into custom wheels and installed those, which made the package importable.