Commit
_parse_pdf_to_string: catch and ignore PDFException.
This function is a best-effort attempt to extract text from a PDF file: if the file is malformed (PDFSyntaxError), it returns an empty string. However, other exceptions, such as PDFPasswordIncorrect, can occur even when the file is well-formed. Although it would be better to handle these exceptions at a higher level, this temporary fix catches and ignores PDFException so that training applications containing encrypted files can be rejected.