Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[QOL-9113] ensure that DOCX files are sniffed correctly #88

Merged
merged 2 commits into from
Jul 18, 2022

Conversation

ThrawnCA
Copy link
Contributor

Fixes #87

ThrawnCA added 2 commits July 19, 2022 07:14
- use filename first, if we recognise it
- If filename didn't help, then sniff 2048 bytes instead of 512 (matching ckanext-resource-type-validation)
self.mimetype = resource['mimetype'] = mime.from_buffer(self.upload_file.read(512))
try:
# Get type from file extension if we recognise it
self.mimetype = mimetypes.guess_type(self.filename, strict=False)[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if someone changes the suffix, should we also not be checking that the file is 'valid' internally as well?

looking at
https://github.com/ahupp/python-magic/blob/53aa709affef18cb2d1c5930f780dabfc5e4330c/README.md

recommend using at least the first 2048 bytes, as less can produce incorrect identification

https://docs.python.org/3/library/mimetypes.html#mimetypes.guess_type

One of the good security models we have is that you can't upload an exe under a pdf or csv etc. Should we be removing this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's in ckanext-resource-type-validation, not here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, ok then, this is just setting the correct type. maybe added a comment in the code that this is for type setting, for file type validation, please ensure ckanext-resource-type-validation is added.

Copy link
Member

@duttonw duttonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple more tests please.

return helpers.call_action(
'resource_create',
package_id=dataset['id'],
upload=FlaskFileStorage(io.open(file_path, 'rb')),
url='data.csv')
url=filename)

def test_resource_upload(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a confusion test, where someone tries to upload a pdf that is actually a csv, and a csv as a pdf.

I feel we need to ensure we are file safe for pdf, csv, xls/xlsx as these get opened automatically in the browser (pdf) and xloader/validation plugin tries to read said data and we don't want to allow low probability ddos or other side attacks (or other events on other sites getting files form us) in allowing non valid file's into our storage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confusion test would still be good, that file type ending is used.
and if ending is missed, it goes through the mime magic and does it best (i.e if for some reason file type is stripped or mangled)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those tests exist in ckanext-resource-type-validation, which attempts to cover all the bases for mismatched files and types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how many bytes does the resource type validation take in now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ThrawnCA ThrawnCA merged commit 9bf7d19 into develop Jul 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants