Description
The following tests fail with -Dfile.encoding=windows-1252 but pass with -Dfile.encoding=UTF-8:
import java.io.StringWriter;
import java.io.Writer;
import javax.script.ScriptContext;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import org.jruby.embed.ScriptingContainer;
import org.junit.Test;
import static org.hamcrest.Matchers.*;
import static org.junit.Assert.*;

public class TestUnicodeCharacters {
    String orig = "\u6625\u304C\u6765\u305F\u3002";
    String scriptlet = "#encoding: UTF-8\n" +
                       "str = \"" + orig + "\"\n" +
                       "puts str\n" +
                       "str\n";
    Writer writer = new StringWriter();

    @Test
    public void testCharacterEncodingViaScriptEngine() throws Exception {
        ScriptEngine engine = new ScriptEngineManager().getEngineByExtension("rb");
        ScriptContext context = engine.getContext();
        context.setWriter(writer);
        String result = (String) engine.eval(scriptlet, context);
        checkValues(result);
    }

    @Test
    public void testCharacterEncodingViaScriptContainer() throws Exception {
        ScriptingContainer container = new ScriptingContainer();
        container.setWriter(writer);
        String result = (String) container.runScriptlet(scriptlet);
        checkValues(result);
    }

    private void checkValues(String returnedResult) {
        assertThat(returnedResult, is(equalTo(orig)));
        assertThat(writer.toString().trim(), is(equalTo(orig)));
    }
}
Most likely, the failure output you get will be confusing as well:
java.lang.AssertionError:
Expected: is "?????"
     but: was "?????"
The "Expected" line is "?????" because Java is encoding the output as windows-1252, which has no mapping for these characters.
The "but" line is "?????" because JRuby has encoded the strings to windows-1252 internally and then written and returned the question marks. I find this particularly odd, both because the script is passed as a string directly from Java in the first place and because the script itself clearly declares its strings to be UTF-8.
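The question marks themselves can be reproduced without JRuby. The following is a minimal sketch in plain Java, not a description of what JRuby does internally; the class name Cp1252RoundTrip and its roundTrip helper are made up for illustration. It shows that pushing the test string through windows-1252 replaces each character with a literal "?", while the same round trip through UTF-8 is lossless:

```java
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
public class Cp1252RoundTrip {

    // Encode s into bytes using the named charset, then decode those bytes back.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // for any character the charset cannot represent.
    static String roundTrip(String s, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String orig = "\u6625\u304C\u6765\u305F\u3002";

        // Print booleans rather than the strings themselves, so the result
        // does not depend on the console encoding.
        System.out.println(roundTrip(orig, "windows-1252").equals("?????")); // true
        System.out.println(roundTrip(orig, "UTF-8").equals(orig));           // true
    }
}
```

Once the characters have been replaced, the returned string really does contain question marks, which would explain why the "but" line differs from the original string even though both print identically.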
This was JRUBY-4890 on the old tracker. The script had to be updated a bit to have a #encoding: UTF-8 directive because JRuby now complains if you omit it.