New Lines to Paragraphs


WordPress, the software behind this site, uses a function called autop(), originally written by Photo Matt, to convert new lines to proper HTML paragraphs. It didn’t really handle some common HTML structures too well, so tonight I took a stab at hacking it a bit. Code follows.

function wpautop($pee, $br=1) {
   // replace existing line breaks (<br>, <br/> or <br />) with newlines
   $pee = preg_replace('|<br\s*/?>|', "\n", $pee);
   // make other kinds of line ends into unix-style newlines
   $pee = preg_replace("/(\r\n|\r)/", "\n", $pee); 
   // remove duplicate newlines
   $pee = preg_replace("/\n\n+/", "\n\n", $pee); 
   // match block-tagged content with one shared pattern, so the extracted
   // blocks and the split pieces always line up
   $block_re = '!<(table|ul|ol|li|pre|form|blockquote|h[1-6])[^>]*>.*</\1[^>]*>!s';
   // extract block-tagged content
   $nm = preg_match_all($block_re, $pee, $blocks);
   // split out non-block-tagged content
   $split_pee = preg_split($block_re, $pee);
   $pee = '';
   foreach ($split_pee as $i => $pee_part)
   {
      // make paragraphs
      $pee_part = preg_replace('/\n?(.+?)(?:\n\s*\n|\z)/s', "\t<p>$1</p>\n", $pee_part); 
      // under certain strange conditions it could create a P of entirely whitespace - remove it 
      $pee_part = preg_replace('|<p>\s*?</p>|', '', $pee_part); 
      // optionally make line breaks
      if ($br) 
      {
         $pee_part = preg_replace('|(?<!<br />)\s*\n|', "<br />\n", $pee_part); 
      }
      // remove unwanted line breaks
      $pee_part = preg_replace('!(</?(?:dl|dd|dt|select|p)[^>]*>)\s*<br />!', "$1", $pee_part);
      $pee_part = preg_replace('!<br />(\s*</?p>)!', '$1', $pee_part);
      // add block-tagged code back in (the last piece has no block after it)
      $pee = $pee.$pee_part."\n".(isset($blocks[0][$i]) ? $blocks[0][$i] : '');
   }   
   // replace ampersand thingies
   $pee = preg_replace('/&([^#])(?![a-z]{1,8};)/', '&#038;$1', $pee);
   
   return $pee; 
}

My tactic was to only make paragraphs outside of common block tags. I see it works on much of my stuff – I’m certain it won’t be too hard to break either. Might be a fun problem to tackle in more depth if I get time.
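The tactic can also be sketched in a single pass. This is a minimal illustration, not the function above: it assumes PHP's `PREG_SPLIT_DELIM_CAPTURE` flag so that `preg_split` hands back the block tags along with the text between them, it uses a non-greedy `.*?` where the listing above is greedy, and the name `paragraphs_outside_blocks` is made up for this example.

```php
<?php
// Hypothetical helper, not part of WordPress: wraps loose text in <p> tags
// and passes block-level markup through untouched.
function paragraphs_outside_blocks($text) {
    // The outer parentheses capture the whole block; the inner group captures
    // the tag name so the closing tag can be matched with \2.
    $block_re = '!(<(table|ul|ol|pre|form|blockquote|h[1-6])[^>]*>.*?</\2[^>]*>)!s';
    // With PREG_SPLIT_DELIM_CAPTURE the result repeats in groups of three:
    // text, whole block, tag name, text, whole block, tag name, ...
    $parts = preg_split($block_re, $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 3 == 1) {
            // a complete block: copy it through unchanged
            $out .= $part . "\n";
        } elseif ($i % 3 == 0) {
            // loose text: every blank-line-separated chunk becomes a paragraph
            foreach (preg_split('/\n\s*\n/', $part) as $chunk) {
                $chunk = trim($chunk);
                if ($chunk !== '') {
                    $out .= "<p>$chunk</p>\n";
                }
            }
        }
        // $i % 3 == 2 is the captured tag name; nothing to emit
    }
    return $out;
}

echo paragraphs_outside_blocks("First\n\nSecond\n<pre>a\n\nb</pre>\nThird");
```

The non-greedy match keeps two separate lists from being swallowed as one block, at the cost of mishandling a block nested inside another of the same tag.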


9 responses to “New Lines to Paragraphs”

  1. Interesting. I’ll experiment a bit with this and run it against the test suite; if it works well, this or something like it may make 1.0.

  2. I’d be interested to know how it tests too. I tried to make use of your work to actually make the p tags so I wouldn’t have to think through that problem too – but I can’t be sure I didn’t break your logic somewhere.

    It could be done a bit faster in PHP 4.3.0, by adding a flag to preg_split and eliminating the preg_match_all.

  3. I tested this a little, and it does a great job of preserving the pre tags, which is my #1 complaint about the current autop code. The only thing I see it not doing is adding br and p tags within li tags, etc.

    If you add that, let me know and I’ll do more testing for you.

  4. I’ve since found that this code will sometimes run paragraphs together. I don’t have time to debug it now. If you add a blank line between everything you want as a paragraph, it seems to always work.

  5. Your code seems to have been posted with magic_quotes on, which added unneeded escaping of the double quotes. Other than that, time to test this out. Thanks. 🙂

  6. You’re right – if you add a get_magic_quotes_gpc() test, I’d like to see it.

    I haven’t actually been using this code in recent releases of WordPress (1.5.1 here), because the built-in function seems to do a better job now. If there is continued interest in this code, I’m curious what the motivation is.

  7. Is there a way to disable this *feature* altogether from the management GUI in 1.5.1, or will I have to comment the function out?

    I don’t need anybody correcting my tagging behind me. I use extensive tagging in my posts, and there’s no way one simple function can handle it unless it is an XHTML parser itself.

  8. I’ve read that calling remove_filter('the_content', 'wpautop') will do the trick. You could do it before “The Loop” in your template, or make a little plugin that calls it…
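
The "little plugin" from the last comment could look roughly like the sketch below. The plugin name and the `example_disable_wpautop` function are made up for illustration; `remove_filter()`, the `the_content` hook and `wpautop` are WordPress's own. It assumes plugins are loaded after WordPress has registered its default filters, and the `function_exists()` guard is only there so the file loads harmlessly outside WordPress.

```php
<?php
/*
Plugin Name: Disable wpautop (hypothetical example)
Description: Stops WordPress from running wpautop over post content.
*/

// Hypothetical helper: returns false when WordPress is not loaded,
// true once the removal has been requested.
function example_disable_wpautop() {
    if (!function_exists('remove_filter')) {
        return false;
    }
    // Detach wpautop from post content only; other hooks are left alone.
    remove_filter('the_content', 'wpautop');
    return true;
}

example_disable_wpautop();
```

This only touches post content. If wpautop is also attached to other hooks in a given install, such as excerpts or comment text, those would each need their own remove_filter() call.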
