EDIT: For future reference, I'm using non-xhtml content type definition <!html>
I'm creating a website using Django, and I'm trying to embed arbitrary json data in my pages to be used by client-side javascript code.
Let's say my json object is {"foo": "</script>"}. If I embed this directly,
<script type='text/javascript'>JSON={"foo": "</script>"};</script>
The first closes the json object. (also, it will make the site vulnerable to XSS, since this json object will be dynamically generated).
If I use django's HTML escape function, the resulting output is:
开发者_Go百科<script type='text/javascript'>JSON={"foo": "</script>"};</script>
and the browser cannot interpret the <script> tag.
The question I have here is,
- Which characters am i suppose to escape / not escape in this situation?
- Is there automated way to perform this in Python / django?
If you are using XHTML, you would be able to use entity references (<, >, &) to escape any string you want within <script>. You would not want to use a <![CDATA[...]]> section, because the sequence "]]>" can't be expressed within a CDATA section, and you would have to change the script to express ]]>.
But you're probably not using XHTML. If you're using regular HTML, the <script> tag acts somewhat like a CDATA section in XML, except that it has even more pitfalls. It ends with </script>. There are also arcane rules to allow <!-- document.write("<script>...</script>") --> (the comments and <script> opening tag must both be present for </script> to be passed through). The compromise that the HTML5 editors adopted for future browsers is described in HTML 5 tokenization and CDATA Escapes
I think the takeaway is that you must prevent </script> from occurring in your JSON, and to be safe you should also avoid <script>, <!--, and --> to prevent runaway comments or script tags. I think it's easiest just to replace < with \u003c and --> with --\>
I tried backslash escaping the forward slash and that seems to work:
<script type='text/javascript'>JSON={"foo": "<\/script>"};</script>
have you tried that?
On a side note, I am surprised that the embedded </script> tag in a string breaks the javascript. Couldn't believe it at first but tested in Chrome and Firefox.
I would do something like this:
<script type='text/javascript'>JSON={"foo": "</" + "script>"};</script>
For this case in python, I have opened a bug in the bug tracker. However the rules are indeed complicated, as <!-- and <script> play together in quite evil ways even in the adopted html5 parsing rules. BTW, ">" is not a valid JSON escape, so it would better be replaced with "\u003E", thus the absolutely safe escaping should be to escape \u003C and \u003E AND a couple other evil characters mentioned in the python bug...
加载中,请稍侯......
精彩评论