#176 new
Steve Purcell

Unicode whitespace confuses link parsing

Reported by Steve Purcell | November 11th, 2009 @ 07:29 AM

The following Textile snippet includes a Unicode whitespace character, U+00A0 NO-BREAK SPACE ("\302\240" in UTF-8), after the period that follows the URL:

p = "Campbell has promised to donate all proceeds from \"the upcoming auction of her Hermès bags\":http://www.looktothestars.org/news/3171-naomi-campbell-donates-bag-to-charity-auction.\302\240 The yet to be named Vuitton bag will retail for about $2,789 and will be available at all of Vuitton’s 439 network stores.\n"
The period and the no-break space are incorrectly slurped into the link's href in the HTML generated by RedCloth:
>> RedCloth.new(p).to_html
=> "<p>Campbell has promised to donate all proceeds from <a href="http://www.looktothestars.org/news/3171-naomi-campbell-donates-bag-to-charity-auction. ">the upcoming auction of her Hermès bags</a> The yet to be named Vuitton bag will retail for about $2,789 and will be available at all of Vuitton’s 439 network stores.</p>"
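To illustrate how a URL can swallow the period and the no-break space, here is a deliberately simplified, hypothetical link pattern. It is not RedCloth's actual parser; it only assumes that the end of a URL is found by looking for optional trailing punctuation followed by whitespace, with "whitespace" defined either by the ASCII-only \s or by the Unicode-aware [[:space:]]. The example.com URL and the link_url helper are made up, and the code assumes Ruby 1.9 or later for the \u escapes.

```ruby
# Hypothetical, deliberately simplified link pattern; NOT RedCloth's parser.
# Assumes Ruby 1.9+ (for \u escapes) and a UTF-8 source file.

NBSP = "\u00A0" # U+00A0 NO-BREAK SPACE, "\302\240" in UTF-8

# Capture the URL after '":', ending it lazily at the first spot that is
# followed by optional trailing punctuation and then whitespace or end of text.
ASCII_LINK   = /":(\S+?)(?=[.,;:!?]*(?:\s|\z))/                     # \s is ASCII-only
UNICODE_LINK = /":([^[:space:]]+?)(?=[.,;:!?]*(?:[[:space:]]|\z))/ # Unicode-aware

def link_url(text, pattern)
  text[pattern, 1] # first capture group, or nil if there is no match
end

typed  = %q{"bags":http://example.com/auction. The bag}
pasted = %Q{"bags":http://example.com/auction.#{NBSP} The bag}

p link_url(typed, ASCII_LINK)    # => "http://example.com/auction"
p link_url(pasted, ASCII_LINK)   # period and NBSP end up in the URL
p link_url(pasted, UNICODE_LINK) # => "http://example.com/auction"
```

With \s, the first terminator the pattern can find in the pasted text is the ASCII space after the NBSP, so the URL absorbs both the period and the NBSP, much like the href in the output above.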
This kind of problem is likely whenever source text is pasted from a word processor, which is often the case on our large news site. It would be nice not to have to sanitize this whitespace ourselves before passing the text to RedCloth.
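For reference, the kind of sanitizing we'd rather not need looks roughly like this. The helper is a hypothetical sketch, not part of RedCloth; it assumes Ruby 1.9+ and UTF-8 input, maps the horizontal Unicode spaces to an ASCII space, and maps NEXT LINE and the Unicode line/paragraph separators to a newline.

```ruby
# Hypothetical pre-processing helper, not part of RedCloth.
# Assumes Ruby 1.9+ and UTF-8 input (\u escapes are unavailable in 1.8.7).

# Horizontal Unicode White_Space characters that Ruby's \s does not match.
UNICODE_SPACES = /[\u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]/

# NEXT LINE, LINE SEPARATOR and PARAGRAPH SEPARATOR, treated as newlines.
UNICODE_NEWLINES = /[\u0085\u2028\u2029]/

# Returns a copy of text with those characters mapped to ASCII equivalents.
def normalize_unicode_whitespace(text)
  text.gsub(UNICODE_SPACES, " ").gsub(UNICODE_NEWLINES, "\n")
end
```

It would be called as RedCloth.new(normalize_unicode_whitespace(p)).to_html. Note that turning U+00A0 into a plain space discards its no-break behaviour, which may or may not matter for pasted prose.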

A number of Unicode characters should probably be treated as equivalent to the space or newline character, but at least in Ruby 1.8.7 they are not matched by the \s regex character class. See http://en.wikipedia.org/wiki/Space_(punctuation)#Table_of_spaces and http://en.wikipedia.org/wiki/Whitespace_(computer_science) for more info.
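The gap is easy to check on a newer Ruby. This sketch assumes Ruby 1.9 or later, where \s still matches only ASCII whitespace while the POSIX bracket class [[:space:]] also matches Unicode whitespace in UTF-8 strings; it will not run on the 1.8.7 interpreter from this report. The space_matches helper is illustrative only.

```ruby
# Compare Ruby's ASCII-only \s with the Unicode-aware [[:space:]] class.
# Assumes Ruby 1.9+ and a UTF-8 source file (the default since Ruby 2.0).

# The byte sequence from the report, "\302\240", is the UTF-8 form of U+00A0.
NBSP = "\302\240"

SAMPLES = {
  "SPACE"                 => " ",
  "NO-BREAK SPACE"        => NBSP,
  "EN SPACE"              => "\u2002",
  "EM SPACE"              => "\u2003",
  "THIN SPACE"            => "\u2009",
  "NARROW NO-BREAK SPACE" => "\u202F",
  "IDEOGRAPHIC SPACE"     => "\u3000",
  "LINE SEPARATOR"        => "\u2028"
}

# Returns [matched by \s, matched by [[:space:]]] for a single character.
def space_matches(char)
  [!!(char =~ /\s/), !!(char =~ /[[:space:]]/)]
end

SAMPLES.each do |name, char|
  ascii, unicode = space_matches(char)
  printf("U+%04X %-22s \\s=%-5s [[:space:]]=%s\n", char.ord, name, ascii, unicode)
end
```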

RedCloth 4.2.2, Ruby 1.8.7 on both Linux & OS X 10.6.

Comments and changes to this ticket

  • Steve Purcell, November 12th, 2009 @ 03:20 AM

    BTW, I couldn't get the build and specs to work; otherwise I'd have supplied a patch with new tests. What's the procedure for building and testing? Perhaps a short section in the README could cover this?

    (I tried "ruby setup.rb config", "rake", "rake spec" etc., and all failed.)

RedCloth is a Ruby library for converting Textile into HTML