Best Python library to strip unsafe tags while keeping the tags I consider safe

Developer https://www.devze.com 2023-01-07 04:51 Source: Internet
For example, I want to remove the "script" tag but keep the 'a' tag. Which library do you use for this?

I also use the jQuery CLEditor as a WYSIWYG HTML editor. Can it do this for me automatically?

Thanks.


I have to do this automatically for a project of mine. The solution I found is to use the Beautiful Soup module to extract the script tags (I do the same for style and form tags).

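from bs4 import BeautifulSoup  # Beautiful Soup 4; the snippet was originally written for BeautifulSoup 3

# html_string holds the untrusted markup; bs4 converts HTML entities to Unicode on its own
soup = BeautifulSoup(html_string, "html.parser")

for s in soup.find_all("script"):   # every <script> element in the document
    s.extract()   # detach it from the parse tree, contents included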

You can then have BeautifulSoup print or save the cleaned HTML, for example with str(soup).


I suppose BeautifulSoup should do the trick here.

Actually, here is a question with answers about exactly that: Python HTML sanitizer / scrubber / filter


Another option, designed for sanitization, is html5lib.

Whatever you do, do not rely on an editor component to do it for you. It runs on the client, so it could easily be manipulated to submit invalid or malicious HTML!
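
To make the server-side point concrete, here is a minimal sketch of an allowlist filter that keeps only tags you name (such as 'a') and drops everything else, including "script" together with its contents. It uses only the standard library's html.parser rather than any of the libraries mentioned above. The names ALLOWED_TAGS, DROP_WITH_CONTENT, SAFE_SCHEMES, AllowlistSanitizer and sanitize are made up for this illustration, and the allowlist itself is an assumption you would adjust. Treat it as a way to see the idea, not as a vetted sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: tag name -> attributes that may be kept on it.
ALLOWED_TAGS = {"a": {"href", "title"}, "b": set(), "i": set(), "p": set()}
# Tags that are removed together with everything inside them.
DROP_WITH_CONTENT = {"script", "style"}
# Assumption: only absolute links with these schemes are acceptable
# (relative links are rejected too, which may be stricter than you want).
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # tag not on the allowlist: drop the tag, keep its text
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_TAGS[tag] or value is None:
                continue  # e.g. onclick and other event handlers
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # rejects javascript: and other unexpected schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Text arrives unescaped, so escape it again on the way out.
            self.out.append(escape(data, quote=False))


def sanitize(html_string):
    parser = AllowlistSanitizer()
    parser.feed(html_string)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


dirty = '<p>Hi <a href="http://example.com" onclick="x()">link</a><script>alert(1)</script></p>'
print(sanitize(dirty))
```

Run this on the server against whatever the editor submits. An allowlist is generally the safer direction than removing a few known-bad tags, because anything you did not think of is dropped by default.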
