
AttributeError: 'NoneType' object has no attribute 'encode' with load_file #390

Open
umaplehurst opened this issue Sep 4, 2024 · 1 comment

Bug Report

Since v0.12.0 I seem to get this sort of backtrace when loading certain .pdf files:

  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\py_pdf_parser\loaders.py", line 41, in load_file
    return load(in_file, pdf_file_path=path_to_file, la_params=la_params, **kwargs)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\py_pdf_parser\loaders.py", line 75, in load
    for page in extract_pages(
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\high_level.py", line 197, in extract_pages
    for page in PDFPage.get_pages(
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\pdfpage.py", line 151, in get_pages
    doc = PDFDocument(parser, password=password, caching=caching)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\pdfdocument.py", line 744, in __init__
    self._initialize_password(password)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\pdfdocument.py", line 771, in _initialize_password
    handler = factory(docid, param, password)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\pdfdocument.py", line 358, in __init__
    self.init()
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\pdfdocument.py", line 366, in init
    self.init_key()
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\pdfdocument.py", line 379, in init_key
    self.key = self.authenticate(self.password)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\pdfdocument.py", line 428, in authenticate
    password_bytes = password.encode("latin1")
AttributeError: 'NoneType' object has no attribute 'encode'

I'm not sure why it only happens with certain files. My guess is that the failing code is only reached when the if "Encrypt" in trailer: check in pdfminer.six's pdfdocument.py is true, so only PDFs with an Encrypt entry in their trailer are affected. Versions before v0.12.0 work fine. The problem seems to be the password: str = None default that was added to load(...) in py_pdf_parser/loaders.py as part of 02f92ce. I think this should be changed to password: str = "" to match pdfminer.six's own default (see get_pages in pdfpage.py), and then everything should work again.
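The failure mode and the proposed fix can be shown in a minimal, self-contained sketch that imports neither pdfminer.six nor py_pdf_parser. The functions authenticate, load_before and load_after are hypothetical stand-ins. authenticate only mimics the password.encode("latin1") call from the last frame of the traceback, and the two load variants differ only in their password default (None as after 02f92ce, versus the suggested "").

```python
from typing import Optional


def authenticate(password: str) -> bytes:
    # Hypothetical stand-in for the failing line in pdfminer.six's
    # pdfdocument.py: the password is encoded unconditionally, so None fails.
    return password.encode("latin1")


def load_before(password: Optional[str] = None) -> bytes:
    # Hypothetical stand-in for py_pdf_parser's load() after 02f92ce:
    # the None default is passed straight through to pdfminer.six.
    return authenticate(password)


def load_after(password: str = "") -> bytes:
    # Proposed fix: default to "" like pdfminer.six's get_pages does.
    return authenticate(password)


if __name__ == "__main__":
    try:
        load_before()
    except AttributeError as exc:
        print(exc)  # 'NoneType' object has no attribute 'encode'
    print(load_after())  # b''
```

Until the default is changed, explicitly passing password="" to load_file should avoid the error, since load_file forwards extra keyword arguments to load (see the first frame of the traceback). This assumes load hands password on to pdfminer.six unchanged, which the traceback suggests.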

umaplehurst added the bug label Sep 4, 2024