If you’re like me, then you’re getting tired of turning XHTML into entity-encoded markup suitable for <code> examples. Today I thought I’d whip up a quick Perl script that takes ordinary markup, processes it, and does all of the turning of < to &lt; and > to &gt; for you.

It will also recognize double spaces (to nicely format nesting) and turn those into “&#160;&#160;” (it will do the same for four spaces and six spaces as well).
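
As a rough illustration of the two steps described above, here is a minimal JavaScript sketch. It is not the actual SimpleCode Perl source: the function name `encodeMarkup` is made up for this example, and it assumes only <, > and pairs of spaces need handling.

```javascript
// Hypothetical helper for illustration; not the real SimpleCode code.
function encodeMarkup(raw) {
  return raw
    // Turn angle brackets into their entity-encoded forms.
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    // Replace every pair of spaces with two non-breaking space entities.
    .replace(/ {2}/g, "&#160;&#160;");
}

console.log(encodeMarkup('<p>\n  <a href="/">Home</a>\n</p>'));
```

Because each pair of spaces is replaced on its own, runs of four or six spaces come out as four or six &#160; entities without any special casing.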


SimpleCode intends to make my life easier. It’s essentially in beta and took only an hour to put together — in fact I’ll bet there’s something out there that already exists and I just didn’t think to look earlier. I’m sure it’s missing some features, and that’s where you come in. It’ll come in handy for the SimpleQuiz examples — and I’d venture to say it would help for those commenting as well.

Feel free to use it — and grab the source code to put on your own server. It’s a single, self-contained file that just needs to be dropped in your cgi-bin. Set permissions to 755 and you’re good to go.

Update: Jesper has created a neat and tidy JavaScript version, and also helped me simplify a bit of the Perl source as well. v02 (and v03) already!


  1. This is similar to my smart quote enhancement of David Lindquist’s Entity Replacement. Perhaps a combination of the three is in order.
    David’s script takes shorthand and replaces it with encoded entities. My enhancement takes “smart quotes” and such, usually coming from a Word document, and encodes those.

  2. Yop says:


  3. That’s a great utility, Dan. Thanks.

  4. Thank you, sir.

  5. Max says:

    I usually just edit text in MSFT Word (you could use any suitable word processor or text editor, although I prefer the powdery blue nastiness of Word 2K3) and then copy and paste into Dreamweaver. This is nice because Word automatically formats smart quotes and fractions, and Dreamweaver turns these into encoded entities. Dreamweaver is also useful for those pesky >code< instances. ;)

  6. Alex says:

    For the PHP folks out there, you can do something like this:
    if (isset($_REQUEST["mystring"])) {
      echo htmlentities(htmlentities(stripslashes($_REQUEST["mystring"])));
    }
    and it will output the string with entities encoded. Note: for multi-byte languages, use htmlspecialchars() instead of htmlentities().

  7. I think a few misunderstood what this particular tool is for: simply formatting and reprinting blocks of HTML code for demonstration purposes. It doesn’t attempt to be an entity filter or a smart quote cleaner. That would’ve been far too much work. :-)
    If you need to print HTML code examples out on an HTML page, then this will save you from encoding brackets and adding spaces for indenting.

  8. Matthew says:

    Very nice. It looks somewhat similar to Brad Choate’s MT plugin MT-Textile 2.0 which was just released yesterday.
    Of course, SimpleCode’s exact functionality can be achieved very easily using JavaScript (mostly utilizing the escape() function, with a little help from some pattern matching).

  9. Graham says:

    I think this is similar to Accessify’s Quick Escape tool, no?

  10. Graham – yes, it’s exactly like Quick Escape. Like I said, I figured there was something out there already — but it was fun to build anyhow :-)

  11. waylman says:

    That’s great.
    However, I noticed that if you want to include “& gt;” or “& lt;” in your code, they get displayed as > or <.
    I even have to add spaces above just to get them to display.
    For example, Option A. of your most recent Simple Quiz becomes:

    <p>You are here:
      <a href="/">Home</a> >
      <a href="/articles/">Articles</a>

  12. Tom Clancy says:

    Color me lazy: not that I’m formatting for the general public, but if I need to dump code out to the page, XMP tags work for me.

  13. Joe Stump says:

    XMP tags are deprecated:

  14. Andrew says:

    this is going to help me out a ton!

  15. Andrew says:

    have you thought about making your own CMS???? You should give it a shot :-)

  16. [m] says:

    Too bad that it doesn’t recognise tabs. I use tabs for indenting code, instead of (yuck!) spaces.

  17. Gambit says:

    Nice tool, saves a lot of time and hassle. I’m interested in giving MT-Textile a run; it should be quite good.

  18. blackfox says:

    I use CDATA sections for this… No problem (only Internet Explorer doesn’t handle them, but it’s not a big problem).

  19. blackfox – It’s actually a huge deal if something doesn’t work in IE. This is still the most popular browser by far.

  20. stombi says:

    I think there is a problem for me with the JavaScript version.
    Those different line endings (CR for Mac, CRLF for PC, LF for UNIX/Linux) suck.
    I use GNU/Linux, so it needs

    raw = raw.replace(/\n/g, "<br \/>");

  21. rafa says:

    why not simply use blablablablabla


  22. rafa says:

    ok, forget it, I see it doesn’t work ;-)

  23. Jesper says:

    stombi: To my understanding, textareas have ALWAYS used CRLF across all platforms and browsers. And I was using Dan’s regexes. Maybe the CGI module replaces all LFs with CRs. I don’t know, I never use CGI;. If you’d like to report some of your findings, I’d be more than happy to change the script accordingly. jesper AT lindholms DOT com.

  24. Jesper says:

    The script should now replace CR, LF or CRLF.

  25. stombi says:

    Sorry Jesper, I was tired and it seemed easier to post here than at your site.
    Thanks for the modification.

  26. Terry says:

    Nice script. I came across this a while back:
    What I’m wondering, though, is what are the benefits of doing it this way as opposed to using a styled textarea?
    readonly="readonly" cols="80%" rows="100%"
    style="border: 1px solid black; overflow: visible; padding: 10px;"
    Am I missing something? I know a textarea requires extra markup from the form tags themselves, whereas dumping HTML entities into code tags requires less markup. But besides that, what are the benefits?

  27. Jesper says:

    Terry: purely ‘semantics’. You CAN use <b><big><big><big>dfhgfd</big></big></big></b> instead of h1 too, but it’s much harder to style with CSS and you don’t instantly recognize what it does. With your method you have to actively investigate whether a <textarea> is a code sample or a form element and/or attach clumsy ids or classes if you want to style it differently than an actual form textarea.

  28. daniel says:

    Great tool, I’ve already been reading your site, and decided to comment. It works great, I’ve already used it a couple times. Thanks!

  29. Synistar says:

    There is already a CPAN module for this. Try something like:

    use HTML::Entities ();
    $encoded = HTML::Entities::encode($a);
    $encoded = HTML::Entities::encode_numeric($a);
    $decoded = HTML::Entities::decode($a);

    Or you can use the built-in escapeHTML() function in the CGI module that ships with all modern versions of Perl.

    use CGI;
    my $q = CGI->new();
    print $q->header();
    # $header is assumed to hold the raw text to display
    print $q->h1($q->escapeHTML($header));

    Re-inventing the wheel can be fun. But the Perl community has lots of wheel-makers. :)

  30. Maxwel Leite says:

    You forgot to replace the tab, or “\t” (in JS), in your JS version (

  31. Steve Smith says:

    With PHP, you could create a simple function to output formatted code:
    function prep_code($string) {
      $string = htmlspecialchars($string);
      // Tabs and double spaces both become two non-breaking spaces
      $string = str_replace("\t", '&nbsp;&nbsp;', $string);
      $string = str_replace('  ', '&nbsp;&nbsp;', $string);
      $string = nl2br($string);
      return $string;
    }

    Then just call the function:
    echo prep_code($code_string);
    This includes double-spaces, tabs, etc.

  32. Anonymous says:

    If you copy the output back to the input you’ll notice it doesn’t support the escaped characters, but I guess we shouldn’t be using nbsp’s anyway :)
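
Several comments above point at gaps: existing entities such as &gt; get swallowed, tabs are ignored, and line endings differ by platform. Here is a hedged JavaScript sketch of how those could be handled together. The name `encodeForDisplay` and the choice of two non-breaking spaces per tab are assumptions for illustration, not what the Perl script or Jesper’s JavaScript version actually does.

```javascript
// Hypothetical helper combining fixes suggested in the comments; an assumption, not the shipped code.
function encodeForDisplay(raw) {
  return raw
    // Encode ampersands first so an existing "&gt;" in the input survives as literal text.
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    // Assumption: one tab is rendered as two non-breaking spaces.
    .replace(/\t/g, "&#160;&#160;")
    .replace(/ {2}/g, "&#160;&#160;")
    // Try CRLF before a lone CR or LF so a Windows line break yields a single <br />.
    .replace(/\r\n|\r|\n/g, "<br />");
}

console.log(encodeForDisplay("<a href=\"/\">Home</a> &gt;\r\n\t<b>Articles</b>"));
```

The order matters here: the ampersand replacement has to run before anything that inserts entities, and the line-break replacement runs last so the generated <br /> tags are not themselves encoded.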