Strips from HTML all tags and attributes that are not in the whitelist. The content of removed tags is kept. The whitelist has the following form:
{
'enabled tag name' : ['list of enabled tag\'s attributes']
}
You can use the symbol * to allow all tags and/or attributes.
Note that script and style tags are removed together with their content.
This module is built on the HTMLParser class from the Python standard library. It has no other dependencies, which can sometimes be a plus.
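To make the filtering rules above concrete, here is a minimal sketch of whitelist filtering on top of the standard-library HTMLParser. It is not the html-purifier source: the names WhitelistSketch and purify_sketch are hypothetical, and treating '*' as a dictionary key that matches any tag is an assumption about how the wildcard is meant to work.

```python
from html import escape
from html.parser import HTMLParser


class WhitelistSketch(HTMLParser):
    """Hypothetical illustration only; not the html-purifier implementation."""

    # Tags whose content is discarded along with the tag itself.
    DROP_WITH_CONTENT = {"script", "style"}

    def __init__(self, whitelist):
        super().__init__(convert_charrefs=True)
        self.whitelist = whitelist
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def _allowed_attrs(self, tag):
        # Look up the tag itself first, then the assumed '*' wildcard key.
        # Returns None when the tag is not allowed at all.
        if tag in self.whitelist:
            return self.whitelist[tag]
        return self.whitelist.get("*")

    def handle_starttag(self, tag, attrs):
        if tag in self.DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        allowed = self._allowed_attrs(tag)
        if self.skip_depth or allowed is None:
            return  # the tag is dropped; its text content is still emitted
        kept = [(k, v) for k, v in attrs if "*" in allowed or k in allowed]
        rendered = "".join(
            ' %s="%s"' % (k, escape(v or "", quote=True)) for k, v in kept
        )
        self.out.append("<%s%s>" % (tag, rendered))

    def handle_endtag(self, tag):
        if tag in self.DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and self._allowed_attrs(tag) is not None:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            # Text was unescaped by the parser, so escape it again on output.
            self.out.append(escape(data, quote=False))


def purify_sketch(html, whitelist):
    parser = WhitelistSketch(whitelist)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

With the whitelist {'div': ['*'], 'span': ['attr-2']}, this sketch keeps every attribute on div, keeps only attr-2 on span, and drops any other tag while leaving its text in place. Self-closing and malformed markup are handled only roughly, so it should be read as an outline of the idea rather than a replacement for the library.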
$ pip install html-purifier
>>> from purifier.purifier import HTMLPurifier
>>> purifier = HTMLPurifier({
'div': ['*'], # allows all attributes on the div tag
'span': ['attr-2'], # allows only the attr-2 attribute on the span tag
# all other tags are removed, but their content is kept
})
>>> print(purifier.feed('<div class="e1" id="e1">Some <b>HTML</b> for <span attr-1="1" attr-2="2">purifying</span></div>'))
<div class="e1" id="e1">Some HTML for <span attr-2="2">purifying</span></div>
The purifier is typically used in models and forms. The package provides purifier.models.PurifyedCharField and purifier.models.PurifyedTextField for the Django ORM, and purifier.forms.PurifyedCharField for Django forms.