Skip to content

ldsyou92/python-html-purifier

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Python HTML purifier

About

Cuts the tags and attributes from HTML that are not in the whitelist. Their content is leaves. Signature of whitelist:

{
    'enabled tag name' : ['list of enabled tag\'s attributes']
}

You can use the symbol * to allow all tags and/or attributes.

Note that the script and style tags are removed with content.

This module is based on HTMLParser Class - in the standard Python package. There are no other dependencies, which can sometimes be a plus.

Part info in my blog

Package on PyPi

Installation

$ pip install html-purifier

Basic Usage

>>> from purifier.purifier import HTMLPurifier
>>> purifier = HTMLPurifier({
    'div': ['*'], # разрешает все атрибуты у тега div
    'span': ['attr-2'], # разрешает только атрибут attr-2 у тега span
    # все остальные теги удаляются, но их содержимое остается
})
>>> print purifier.feed('<div class="e1" id="e1">Some <b>HTML</b> for <span attr-1="1" attr-2="2">purifying</span></div>')
<div class="e1" id="e1">Some HTML for <span attr-2="2">purifying</span></div>

Django Usage

As usual used in models and forms. Here is purifier.models.PurifyedCharField, purifier.models.PurifyedTextField for Django ORM and purifier.forms.PurifyedCharField for Django forms

About

Purify HTML string

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • HTML 96.7%
  • Python 3.3%