-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[QOL-9113] ensure that DOCX files are sniffed correctly #88
Conversation
- use filename first, if we recognise it - If filename didn't help, then sniff 2048 bytes instead of 512 (matching ckanext-resource-type-validation)
self.mimetype = resource['mimetype'] = mime.from_buffer(self.upload_file.read(512)) | ||
try: | ||
# Get type from file extension if we recognise it | ||
self.mimetype = mimetypes.guess_type(self.filename, strict=False)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if someone changes the suffix, should we also not be checking that the file is 'valid' internally as well?
looking at
https://github.com/ahupp/python-magic/blob/53aa709affef18cb2d1c5930f780dabfc5e4330c/README.md
recommend using at least the first 2048 bytes, as less can produce incorrect identification
https://docs.python.org/3/library/mimetypes.html#mimetypes.guess_type
One of the good security models we have is that you can't upload an exe under a pdf or csv etc. Should we be removing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's in ckanext-resource-type-validation, not here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, ok then, this is just setting the correct type. maybe added a comment in the code that this is for type setting, for file type validation, please ensure ckanext-resource-type-validation is added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple more tests please.
return helpers.call_action( | ||
'resource_create', | ||
package_id=dataset['id'], | ||
upload=FlaskFileStorage(io.open(file_path, 'rb')), | ||
url='data.csv') | ||
url=filename) | ||
|
||
def test_resource_upload(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a confusion test, where someone tries to upload a pdf that is actually a csv, and a csv as a pdf.
I feel we need to ensure we are file safe for pdf, csv, xls/xlsx as these get opened automatically in the browser (pdf) and xloader/validation plugin tries to read said data and we don't want to allow low probability ddos or other side attacks (or other events on other sites getting files form us) in allowing non valid file's into our storage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
confusion test would still be good, that file type ending is used.
and if ending is missed, it goes through the mime magic and does it best (i.e if for some reason file type is stripped or mangled)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those tests exist in ckanext-resource-type-validation
, which attempts to cover all the bases for mismatched files and types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how many bytes does the resource type validation take in now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes #87