
Redcloth 4 in JRuby doesn't support multi-bytes content
Reported by iamtin (at gmail) | May 19th, 2009 @ 05:53 AM
Given this test case, run it in a JRuby environment:
def test_mutibytes_chars
assert_equal "牛", RedCloth.new(" 牛").to_html
end
Redcloth 4 will return "".
This test passes in the C Ruby environment.
We found that it may be caused by the "when" operator in
redcloth_inline.rl. After commenting out every "Semantic Condition",
the test passes. We tried Ragel 6.3 and 6.5. The Ragel reference
guide says the when operator may not support non-alphabet
characters.
Comments and changes to this ticket
-
Jason Garber May 26th, 2009 @ 10:46 PM
- Tag set to multibyte
- State changed from new to open
I'm aware of the problem from the test that includes "En français." I just don't know how to fix it. Do you have suggestions or a patch?
-
iamtin (at gmail) May 31st, 2009 @ 01:26 AM
I'm not familiar with Ragel, but after debugging I think it's a bug in Ragel's Java code generator. However, the Ragel user guide says it doesn't support multi-byte characters: the semantic condition feature works only with alphabet types that are smaller in width than the long type. Changing the inline scanner's declaration to avoid semantic conditions may fix it, but that's expensive for our project. If I have spare time, I will look into this and see if there is a cheaper way to fix it.
-
Jason Garber June 2nd, 2009 @ 02:47 PM
- Title changed from Redcloth 4 doesn't support multi-bytes content to Redcloth 4 in JRuby doesn't support multi-bytes content
I still don't have a solution. I tried making all the conditionals just return true/false, but it didn't work.
Resources:
A unicode script in ragel contrib
-
Jason Garber June 7th, 2009 @ 06:27 AM
- Milestone cleared.
- Tag changed from multibyte to difficult, multibyte
This one's tough. Not going to happen in this release.
-
valters September 4th, 2009 @ 05:52 AM
This is unfortunate - I need to HTML-ize some Unicode text articles (they use Baltic characters), and I was looking forward to using RedCloth for this, because the pure-Ruby library that I use right now is too slow.
This, unfortunately, is a showstopper: on the first Unicode char (say: ā) RedCloth stops and returns only the part of the article up to that first Unicode char.
-
valters September 4th, 2009 @ 06:05 AM
Yes, I am hosting my application on JRuby (looking forward to using Google AppEngine/J), and am trying to use Java-backed text-manipulation libraries, because they're actually pretty fast.
-
Jason Garber September 6th, 2009 @ 09:35 AM
Look for multibyte support in the rewrite of RedCloth (treetop or a divide-and-conquer parser). Ain't gonna happen in RedCloth w/ Ragel.
-
Benjamin Bock December 4th, 2009 @ 07:47 AM
I'm using this workaround code in an initializer file for my JRuby projects:
if RedCloth::EXTENSION_LANGUAGE == "Java"
  module RedCloth
    class TextileDoc
      def initialize( string, restrictions = [] )
        restrictions.each { |r| method("#{r}=").call( true ) }
        super( string.chars.map{|x| x.size > 1 ? "&##{x.unpack("U*")};" : x}.join )
      end
    end
  end
end
-
Tommy Li February 6th, 2010 @ 03:13 AM
Benjamin's solution is a good workaround. I'm calling this from the JVM though, so I did the character replacement in the calling language (Scala). That may or may not be a speedup, but it's probably faster than doing it in JRuby.
def textile_render(textile_input: String) : Node = {
  // RedCloth under JRuby does not support multi-byte characters, so replace ahead of time
  // http://jgarber.lighthouseapp.com/projects/13054/tickets/149-redcloth-4-doesnt-support-multi-bytes-content
  val amended_input = textile_input.map(c => {
    val c_code = c.toLong
    // return if ascii otherwise give html entity
    if(c_code < 128) c else "&#" + c_code.toString + ";"
  }).mkString
  Unparsed(RedClothParser.makeTextile(amended_input))
}
-
Marek Kowalski May 26th, 2010 @ 05:44 AM
I made a lot of progress debugging and solving this issue, but I'm stuck and need help. My work in progress can be seen in my GitHub fork:
http://github.com/kowalski/redcloth.
So the reason for the problem is that Ruby doesn't care about encoding. A String is just an array of bytes. If it is encoded, so be it. If not... who cares. So to make RubyString work with Java you have to make an assumption about the encoding of the input.
The second step in fixing this is to switch Ragel into char mode with:
alphtype char;
and to store input data in char[] array instead of byte[].
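To illustrate the bytes-versus-characters point, here is a minimal sketch in plain Ruby (1.9 or later, where strings carry an encoding). It is not taken from the fork; the variable names are only for illustration, and it assumes the source text is UTF-8.

```ruby
text = "Zażółć"   # assumption: UTF-8 encoded input

# What a byte-oriented scanner sees: every non-ASCII letter is 2 bytes here.
byte_count = text.bytesize            # 10

# What you get after decoding the same bytes as UTF-8.
char_count  = text.chars.length       # 6
code_points = text.unpack("U*")       # [90, 97, 380, 243, 322, 263]

# The same bytes under a different encoding assumption give a different
# "character" count, which is why the encoding has to be fixed up front.
as_latin1 = text.dup.force_encoding("ISO-8859-1")

puts byte_count, char_count, as_latin1.length
```

The last line prints 10, 6 and 10: decoded as ISO-8859-1, every byte counts as its own character.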
When I did all that I managed to run a simple test:
puts RedCloth.new("Zażółć gęślą jaźń").to_html
"Zażółć gęślą jaźńZażółć gęślą jaźńZażółć gęślą jaźńZażółć gęślą jaźńZażółć gęślą jaźń"
I can see the UTF characters but wtf!? Every line of input is repeated 4 times. This is where I'm stuck.
I did a lot of debugging and learned that the problems begin in the RedclothInline.inline method, which is generated by Ragel. Unfortunately I don't know Ragel well enough to deal with this problem. Help would be very much appreciated, let's solve this together!
-
Marek Kowalski May 26th, 2010 @ 06:39 AM
Update:
The problem is solved. It was an obvious bug in my code. However, after running rake spec I have 37 failures. I still need to track them down. Help would still be appreciated.
-
Marek Kowalski May 27th, 2010 @ 05:47 AM
Update:
32 failures to go, but they are all connected with the html_esc method.
I'm very close :)
-
glebm December 10th, 2012 @ 04:20 PM
Any update on this? Benjamin's solution did not work for me on jruby 1.7.1
-
glebm December 10th, 2012 @ 04:56 PM
My workaround for JRuby >= 1.7.1, or JRuby 1.6 in 1.9 compat mode:
require 'redcloth/textile_doc'
module RedCloth
  class TextileDoc
    def initialize(string, restrictions = [])
      restrictions.each { |r| method("#{r}=").call(true) }
      super(string.chars.map { |x| x.bytesize > 1 ? "&##{x.unpack("U*").first};" : x }.join)
    end
  end
end
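For anyone who wants to try the escaping step on its own, without patching TextileDoc, here is a minimal standalone sketch of the same idea. The helper name escape_multibyte is hypothetical and not part of RedCloth; it assumes Ruby 1.9 or later and UTF-8 input.

```ruby
# Hypothetical helper, not part of RedCloth: replace each multi-byte
# character with a decimal HTML character reference before parsing.
def escape_multibyte(text)
  text.chars.map { |ch|
    # bytesize counts encoded bytes. In Ruby 1.9+ String#size counts
    # characters, so it is always 1 for a single character.
    # unpack("U*") returns an Array, hence .first to get the code point.
    ch.bytesize > 1 ? "&##{ch.unpack("U*").first};" : ch
  }.join
end

puts escape_multibyte("En français.")  # prints: En fran&#231;ais.
```

The escaped string could then be handed to RedCloth.new(...).to_html as in the workarounds above. The comments note the two spots where the 1.9 string semantics matter, which may be why the earlier variant stopped working on newer JRuby.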
RedCloth is a Ruby library for converting Textile into HTML