textract failure #1

Open. Wants to merge 1 commit into master (base branch).

btski (Owner) commented Sep 21, 2020

This is an attempted fix for the following textract error:

Traceback (most recent call last):
  File "/mnt/c/Users/Blake Inderski/Documents/geoboost2-master/run.py", line 191, in <module>
    main()
  File "/mnt/c/Users/Blake Inderski/Documents/geoboost2-master/run.py", line 184, in main
    extract_genbank_loih(args)
  File "/mnt/c/Users/Blake Inderski/Documents/geoboost2-master/run.py", line 83, in extract_genbank_loih
    gb_req.process_genbank_ids()
  File "/mnt/c/Users/Blake Inderski/Documents/geoboost2-master/zodo/GenBank.py", line 91, in process_genbank_ids
    self.extract_pubmed_records()
  File "/mnt/c/Users/Blake Inderski/Documents/geoboost2-master/zodo/GenBank.py", line 451, in extract_pubmed_records
    pubmed_req.get_pubmed_texts()
  File "/mnt/c/Users/Blake Inderski/Documents/geoboost2-master/zodo/PubMed.py", line 76, in get_pubmed_texts
    raw_text = extract_text_from_files(SUPPLEMENTAL_DATA_DIR+pmid)
  File "/mnt/c/Users/Blake Inderski/Documents/geoboost2-master/zodo/utils.py", line 255, in extract_text_from_files
    supp_file_contents = {x:str(textract.process(join(pmcdir, x))).replace("\\n", "\n") for x in supp_files}
  File "/mnt/c/Users/Blake Inderski/Documents/geoboost2-master/zodo/utils.py", line 255, in <dictcomp>
    supp_file_contents = {x:str(textract.process(join(pmcdir, x))).replace("\\n", "\n") for x in supp_files}
  File "/home/blake_inderski/.local/lib/python3.7/site-packages/textract/parsers/__init__.py", line 77, in process
    return parser.process(filename, encoding, **kwargs)
  File "/home/blake_inderski/.local/lib/python3.7/site-packages/textract/parsers/utils.py", line 47, in process
    unicode_string = self.decode(byte_string)
  File "/home/blake_inderski/.local/lib/python3.7/site-packages/textract/parsers/utils.py", line 65, in decode
    return text.decode(result['encoding'])
  File "/usr/lib/python3.7/encodings/cp1254.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 15031: character maps to <undefined>
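
The traceback above shows the decode failure being raised inside textract, from a dict comprehension that processes every supplemental file in one expression, so one undecodable file aborts the whole run. A minimal sketch of one way to contain that follows. It is not the change made in this PR. `extract_all` and `fake_extractor` are hypothetical names, and the extractor is passed in as a callable (standing in for something like `textract.process`) so the example runs without textract installed.

```python
from typing import Callable, Dict, Iterable, Tuple


def extract_all(
    paths: Iterable[str],
    extractor: Callable[[str], bytes],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Extract text per file, recording decode failures instead of aborting.

    `extractor` is an assumed stand-in for a function that returns the
    extracted text as bytes and may raise UnicodeDecodeError internally.
    """
    texts: Dict[str, str] = {}
    failed: Dict[str, str] = {}
    for path in paths:
        try:
            raw = extractor(path)
            # Decode explicitly rather than calling str() on the bytes object.
            texts[path] = raw.decode("utf-8", errors="replace")
        except UnicodeDecodeError as exc:
            # Keep the message so the bad file can be inspected later.
            failed[path] = str(exc)
    return texts, failed


def fake_extractor(path: str) -> bytes:
    """Hypothetical extractor that reproduces the cp1254 failure for one file."""
    if path == "bad.txt":
        # Byte 0x9d has no mapping in cp1254, so this raises UnicodeDecodeError.
        b"\x9d".decode("cp1254")
    return b"hello\n"


texts, failed = extract_all(["good.txt", "bad.txt"], fake_extractor)
```

With this shape, the files that decode cleanly are still available in `texts`, and `failed` lists the files that need a closer look. Whether skipping a file is acceptable here depends on how the rest of the pipeline uses the supplemental text.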

New error after the change:

Traceback (most recent call last):
  File "/mnt/c/Users/Blake Inderski/Documents/geoboost2-master/run.py", line 191, in <module>
    main()
  File "/mnt/c/Users/Blake Inderski/Documents/geoboost2-master/run.py", line 184, in main
    extract_genbank_loih(args)
  File "/mnt/c/Users/Blake Inderski/Documents/geoboost2-master/run.py", line 83, in extract_genbank_loih
    gb_req.process_genbank_ids()
  File "/mnt/c/Users/Blake Inderski/Documents/geoboost2-master/zodo/GenBank.py", line 91, in process_genbank_ids
    self.extract_pubmed_records()
  File "/mnt/c/Users/Blake Inderski/Documents/geoboost2-master/zodo/GenBank.py", line 451, in extract_pubmed_records
    pubmed_req.get_pubmed_texts()
  File "/mnt/c/Users/Blake Inderski/Documents/geoboost2-master/zodo/PubMed.py", line 85, in get_pubmed_texts
    pubmed_record.extract_entities()
  File "/mnt/c/Users/Blake Inderski/Documents/geoboost2-master/zodo/PubMed.py", line 132, in extract_entities
    self.spans, self.raw_text = detect(doc_bioc, bioc_json=True)
  File "/mnt/c/Users/Blake Inderski/Documents/geoboost2-master/zodo/ner/ner_utils.py", line 223, in detect
    doc_sents, doc_text, spans = tokenize_bioc(text)
  File "/mnt/c/Users/Blake Inderski/Documents/geoboost2-master/zodo/ner/ner_utils.py", line 310, in tokenize_bioc
    valid_sections = PMCOA_TYPES if doc_bioc["source"] == "PMC" else PM_TYPES
KeyError: 'source'
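
The `KeyError` above comes from indexing `doc_bioc["source"]` on a document that has no `source` key. A minimal sketch of a tolerant lookup is shown below, assuming that a document without `source` should be treated like a non-PMC one. That assumption may be wrong: the missing key could instead indicate an empty or malformed document produced earlier in the pipeline. The contents of `PMCOA_TYPES` and `PM_TYPES` are placeholders, and `select_valid_sections` is a hypothetical helper, not a function from the repository.

```python
# Placeholder values; the real constants live in zodo/ner/ner_utils.py
# and their contents are not shown in the traceback.
PMCOA_TYPES = {"TITLE", "ABSTRACT", "INTRO"}
PM_TYPES = {"title", "abstract"}


def select_valid_sections(doc_bioc: dict) -> set:
    """Pick the section types for a BioC document without raising KeyError.

    dict.get returns None when "source" is absent, so the comparison is
    simply False and the non-PMC types are used.
    """
    return PMCOA_TYPES if doc_bioc.get("source") == "PMC" else PM_TYPES


pmc_sections = select_valid_sections({"source": "PMC"})
fallback_sections = select_valid_sections({})
```

If a missing `source` should be treated as an error rather than silently defaulted, an explicit check that raises a descriptive exception naming the document would be the safer design.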

Attempted solution to textract error.