Consider also giving the LLM the coarsely extracted text of each page, obtained from a traditional non-OCR PDF-to-markdown package such as pymupdf4llm or pdf2markdown4llm. The rough text could help the LLM ground its OCR response. The prompt would need to be updated to ask the LLM to consider both inputs. This could be exposed as an optional enum keyword argument, disabled by default.
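A rough sketch of what the enum option and the updated prompt could look like. The names `TextHint`, `BASE_PROMPT`, and `build_prompt` are made up for illustration and are not part of aipdf; the prompt wording is only a placeholder.

```python
from enum import Enum
from typing import Optional


class TextHint(Enum):
    """Hypothetical option naming which non-OCR extractor supplies the rough text."""
    NONE = "none"                # default: feature disabled
    PYMUPDF4LLM = "pymupdf4llm"


# Placeholder prompt, not aipdf's actual prompt.
BASE_PROMPT = "Transcribe this page image to markdown."


def build_prompt(rough_text: Optional[str] = None) -> str:
    """Return the OCR prompt, extended with the rough text when it is given."""
    if not rough_text:
        return BASE_PROMPT
    return (
        f"{BASE_PROMPT}\n\n"
        "A non-OCR tool extracted the rough text below from the same page. "
        "It may be incomplete or out of order. Use it to check spellings and "
        "numbers, but treat the image as the source of truth.\n\n"
        f"<rough_text>\n{rough_text}\n</rough_text>"
    )
```

With the default `TextHint.NONE`, no rough text is passed and the prompt stays exactly as it is today, so existing callers would see no change.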
The traditional package does not have to be bundled as a hard requirement of aipdf. Users who want this feature can add the package to their own requirements; aipdf would only act as an integrator.
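One way to keep the dependency optional is to import the extractor only when the option is requested. This is a sketch under assumptions: `extract_rough_text` is a hypothetical helper, not an existing aipdf function, and only pymupdf4llm is wired up here.

```python
import importlib
from typing import Optional


def extract_rough_text(
    pdf_path: str, page_index: int, package: Optional[str] = None
) -> Optional[str]:
    """Return coarse markdown for one page, or None when no extractor is requested."""
    if package is None:
        return None  # feature disabled, nothing is imported
    if package != "pymupdf4llm":
        raise ValueError(f"unsupported extractor: {package!r}")
    try:
        # Imported lazily so aipdf itself never requires the package.
        extractor = importlib.import_module(package)
    except ImportError as exc:
        raise ImportError(
            f"{package} is not installed; add it to your own requirements "
            "to use this option"
        ) from exc
    # pymupdf4llm.to_markdown accepts a file path and a list of 0-based page numbers.
    return extractor.to_markdown(pdf_path, pages=[page_index])
```

Users who never enable the option pay no install cost, and users who enable it without the package installed get an error that tells them what to add.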