Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #87 Make sanitization happen in Post.htmlize() #88

Merged
merged 5 commits into from
Jun 12, 2017
Merged

Issue #87 Make sanitization happen in Post.htmlize() #88

merged 5 commits into from
Jun 12, 2017

Conversation

janetriley
Copy link
Collaborator

  • Made utils/text. Moved sanitize functions, constants, and tests there.
  • Post.htmlize() accepts and applies a sanitization param
  • Export passes along the level to to Post.

* Made utils/text. Moved sanitize functions, constants, and tests there.
* Post.htmlize() accepts and applies a sanitization param
* Export passes along the level to to Post.
@coveralls
Copy link

Coverage Status

Coverage increased (+0.2%) to 71.165% when pulling 1507ffc on janetriley:feature-87_extract_html_sanitizer into b89caa2 on DistrictDataLabs:develop.

Copy link
Member

@bbengfort bbengfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janetriley thank you so much, you're absolutely right, this was a good thing to do and I'm glad to have this functionality. I had some questions and comments below, mind taking a gander at them and letting me know your thoughts? Then I can approve/merge.

baleen/models.py Outdated
"""
return self.content

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like what you've done here, it is extremely sensible. However, I'd ask for one minor change so that this is not a complete passthrough to the sanitize function. What would you think about changing the signature to accept None as the argument to level and changing the level argument to sanitize, e.g.:

def htmlize(self, sanitize=None):
    """
    ... 
    
    :param sanitize: str or None, default None 
        Specify a level to sanitize the HTML. If ``None``, then no HTML sanitization will 
        be conducted. Otherwise, specify ``'raw'``, ``'safe'``, or ``'text'`` to sanitize 
        the HTML at different levels. See ref::``sanitize_html`` for more. 
    """
    if sanitize:
        return sanitize_html(html=self.content, level=sanitize)
    return self.content 

I think this makes it a bit more understandable to users, what do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You betcha.

)


RAW = 'raw'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These module constants should be hoisted to the top of the file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


import unittest

from mongomock import MongoClient as MockMongoClient
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need MockMongoClient here?


from baleen.exceptions import ExportError
from baleen.export import MongoExporter
from baleen.models import connect
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These imports are not needed.

from baleen.export import MongoExporter
from baleen.models import connect
from baleen.utils.text import sanitize_html, RAW, SAFE, TEXT
from tests.test_export import CATEGORIES_IN_DB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import is not needed.

@@ -254,57 +247,3 @@ def test_export_with_category_path_failure(self):
exporter.export()


class SanitizeHtmlTests(unittest.TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It absolutely makes sense to remove these tests, should we have a test for Post.htmlize()? It will just be mocking sanitize_html and making sure it is called with the right arguments. If you think that's overkill though I understand.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's verging on overkill but I like making percentages go up. Let me look at it this weekend.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hehe, ok -- well do you want me to merge this PR and make a new one, or would you rather keep this PR open until you write the tests?

BTW - do you see a button at the bottom entitled "squash and merge", once everything is reviewed and CI approved, please feel free to merge yourself if you do see that button.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge this and take up the test in another feature, possibly for #89 .

@coveralls
Copy link

Coverage Status

Coverage increased (+0.09%) to 71.027% when pulling bfafd43 on janetriley:feature-87_extract_html_sanitizer into b89caa2 on DistrictDataLabs:develop.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.09%) to 71.027% when pulling bfafd43 on janetriley:feature-87_extract_html_sanitizer into b89caa2 on DistrictDataLabs:develop.

* Add docstrings to the top
* move constants to top
* rename _get_ methods to get_  - they're not hidden inside a class, may as well make them available directly to those that want it
@coveralls
Copy link

Coverage Status

Coverage increased (+0.09%) to 71.027% when pulling 00415a3 on janetriley:feature-87_extract_html_sanitizer into b89caa2 on DistrictDataLabs:develop.

@janetriley
Copy link
Collaborator Author

@bbengfort, a couple questions:
text_utils.sanitize_html() raises a ValueError rather than ExportError. Should the export function catch that and re-raise it as an ExportError?

How do you generate the ID numbers at the opening comments, like line 10 in utils/timez.py?

@coveralls
Copy link

Coverage Status

Coverage increased (+0.09%) to 71.027% when pulling 6eb8e58 on janetriley:feature-87_extract_html_sanitizer into b89caa2 on DistrictDataLabs:develop.

Copy link
Member

@bbengfort bbengfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great - nice feature!

@janetriley janetriley merged commit 88d5d7c into DistrictDataLabs:develop Jun 12, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants