Unescaping HTML in Python

Posted by Michael Giarlo on August 01, 2008

Dear Future Me,

You've forgotten how to decode (or unescape) HTML or XML in Python again, haven't you?  My, my, that old age does catch up with you.

Well, it turns out that xml.sax.saxutils.unescape() works like a charm.  I'm certain that edge cases lurk here and there, so caveat, um, coder.
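
For the record, Future Me, here is a minimal sketch of what that looks like. The sample strings are made up for illustration. As far as I can tell, unescape() only handles &amp;amp;, &amp;lt;, and &amp;gt; by default and leaves everything else alone, which is probably where some of those edge cases live.

```python
from xml.sax.saxutils import unescape

# Hypothetical sample input, not taken from any real page.
escaped = "&lt;p&gt;Fish &amp; chips&lt;/p&gt;"

# By default only &amp;, &lt; and &gt; are converted back.
plain = unescape(escaped)
print(plain)

# Entities outside those three pass through unchanged
# unless you supply your own mapping as a second argument.
untouched = unescape("&quot;hi&quot;")
print(untouched)
```

The first call should print the markup with real angle brackets and a bare ampersand; the second should print its input unchanged.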

Comments

  1. gsf Fri, 01 Aug 2008 15:30:15 PDT

    Don't forget that, as noted at http://wiki.python.org/moin/EscapingXml, you can pass in additional entities like so:

    [pre]
    >>> unescape("&apos; &quot;", {"&apos;": "'", "&quot;": '"'})
    '\' "'
    [/pre]

  2. Websites tagged "coder" on Postsaver Thu, 07 Aug 2008 23:45:14 PDT

    [...] - Unescaping HTML in Python saved by [...]
