
Wrong Content-Type #6

Open
acabrol opened this issue Jan 11, 2018 · 1 comment

acabrol commented Jan 11, 2018

DFM doesn't get the correct content type for some documents.

Here is an example:

CURL request:

curl -I -XGET https://arxiv.org/pdf/1801.01681v1.pdf
HTTP/1.1 200 OK
Date: Thu, 11 Jan 2018 08:38:34 GMT
Server: Apache
Strict-Transport-Security: max-age=31536000
Set-Cookie: browser=86.250.248.55.1515659914652413; path=/; max-age=946080000; domain=.arxiv.org
Last-Modified: Mon, 08 Jan 2018 01:42:36 GMT
ETag: "16b79425-213180-56239e9adddf8"
Accept-Ranges: bytes
Content-Length: 2175360
Access-Control-Allow-Origin: *
Content-Type: application/pdf

DFM Log:

DEBUG in feed [cybersecurity-dfm/dfm/feed.py:572]:
Content-Type:text/html; charset=utf-8 url:https://arxiv.org/pdf/1801.01681v1.pdf

acabrol commented Jun 17, 2018

As a workaround, the PDF MIME type is now forced when the link contains ".pdf".
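
A minimal sketch of what such a workaround could look like. The function name `effective_content_type` and its parameters are hypothetical; the actual code in dfm/feed.py is not shown in this issue and may differ.

```python
def effective_content_type(url, header_content_type):
    """Pick the content type to use for a fetched document.

    Hypothetical helper: if the link contains ".pdf", force the PDF MIME
    type; otherwise keep whatever the HTTP response header reported.
    """
    if ".pdf" in url.lower():
        return "application/pdf"
    return header_content_type


# The URL from this issue was logged by DFM as text/html, so the override applies.
print(effective_content_type(
    "https://arxiv.org/pdf/1801.01681v1.pdf",
    "text/html; charset=utf-8",
))
```

Note that this only changes the label. If the downloaded body is not actually a PDF, later processing steps can still fail.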

However, for arXiv the PDF files seem to be in a non-standard format:
ShellError: The command pdftotext /tmp/tmp87ul1x -failed with exit code 1
------------- stdout -------------
------------- stderr -------------
Syntax Warning: May not be a PDF file (continuing anyway)
Syntax Error (2): Illegal character <21> in hex string
Syntax Error (4): Illegal character <4f> in hex string
Syntax Error (6): Illegal character <54> in hex string
Syntax Error (7): Illegal character <59> in hex string
Syntax Error (8): Illegal character <50> in hex string
Syntax Error (11): Illegal character <48> in hex string
Syntax Error (12): Illegal character <54> in hex string
Syntax Error (13): Illegal character <4d> in hex string
Syntax Error (14): Illegal character <4c> in hex string
Syntax Error (16): Illegal character <50> in hex string
Syntax Error (17): Illegal character <55> in hex string
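
The hex codes in this error can be decoded to see what the temporary file probably starts with. The sketch below is only a diagnostic aid: `decode_illegal_chars` and `looks_like_pdf` are hypothetical helpers, not part of DFM, and it assumes the number in parentheses is a 1-based position in the file.

```python
import re

# Excerpt of the pdftotext stderr quoted in the comment above.
STDERR = """\
Syntax Error (2): Illegal character <21> in hex string
Syntax Error (4): Illegal character <4f> in hex string
Syntax Error (6): Illegal character <54> in hex string
Syntax Error (7): Illegal character <59> in hex string
Syntax Error (8): Illegal character <50> in hex string
Syntax Error (11): Illegal character <48> in hex string
Syntax Error (12): Illegal character <54> in hex string
Syntax Error (13): Illegal character <4d> in hex string
Syntax Error (14): Illegal character <4c> in hex string
Syntax Error (16): Illegal character <50> in hex string
Syntax Error (17): Illegal character <55> in hex string
"""


def decode_illegal_chars(stderr_text):
    """Map each reported position to the character its hex code stands for."""
    pattern = r"Syntax Error \((\d+)\): Illegal character <([0-9a-fA-F]{2})>"
    return {int(pos): chr(int(code, 16))
            for pos, code in re.findall(pattern, stderr_text)}


def looks_like_pdf(data):
    """Heuristic check: a PDF normally has the %PDF- marker near its start."""
    return b"%PDF-" in data[:1024]


decoded = decode_illegal_chars(STDERR)
print("".join(decoded[pos] for pos in sorted(decoded)))  # prints !OTYPHTMLPU
print(looks_like_pdf(b"<!DOCTYPE HTML PUBLIC"))           # prints False
```

The decoded characters sit at exactly the positions they occupy in `<!DOCTYPE HTML PU`. The letters D, C and E are valid hex digits and spaces are whitespace, so they would not be reported. This suggests the downloaded file may be an HTML page rather than a malformed PDF, which would also match the `text/html` content type in the DFM log. A check like `looks_like_pdf` on the first bytes of the body, run before calling pdftotext, would catch this case.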
