Rants and Ruminations

Using TagCleaner in Java with JRuby
13 Jan 05 - http://ruminations.willemvandenende.com/rublog/rublog.cgi/SysAdmin/InstallingJRuby.rdoc
After using TagCleaner from the command line on text files, I thought it would be nice to be able to use it in Itor too. But Itor is written in Java, and requiring a working installation of Ruby to run Itor would introduce yet another dependency. Adding JRuby (a Ruby interpreter for Java; see jruby.sourceforge.net for the project) might be easier: it would be just another .jar file in Itor’s lib directory.

Even though JRuby (at the time of writing) has not had a release since October 2002, it wasn’t hard to get it to work. I first start by running jruby from the command line with:

 java -jar ~/install/jruby/jruby-0.5.3/lib/jruby.jar

After that, you can add your own file. I tried a ‘hello world’ file, hello.rb, containing just puts 'hello jruby world', and it worked straight away. Running the tag-cleaner proved a bit more difficult. First, it complained that it was missing files. The JRuby documentation doesn’t say much about paths, but apparently JRuby supports at least some of the command-line options that ruby supports. I found that the -I option, for including library directories, worked.

So now my command-line (with jruby invocation in a home-made shell script) looks like this:

 jruby -I/usr/lib/ruby/site_ruby/1.6 -I/usr/lib/ruby/1.6 tag-cleaner.rb aDocument.html

Now it complains it hasn’t opened the file. So first, I try the unit tests in tagcleanertest.rb to see if the SGML parser will run at all, and it does. The unit tests, however, do not test opening a file (they work by feeding the tag-cleaner HTML in strings). Apparently, the function IO.readlines doesn’t succeed in opening the file (the message is a bit cryptic). After rewriting the code to ‘File.new(inputFile).gets’, it appears jruby can’t find the file. If I prepend the full path to ‘aDocument.html’ it works :-) With bash, we can supply the current directory with $PWD like so:

 java -jar ~/install/jruby/jruby-0.5.3/lib/jruby.jar -I/usr/lib/ruby/1.6 \
  -I/usr/lib/ruby/site_ruby/1.6/ tag-cleaner.rb $PWD/aDocument.html

Unfortunately, jruby does not support the -C command-line option (to change to a directory) like ruby does. IO.readlines (retried after I found the path problem) doesn’t seem to work either; File.gets does. So, after modifying two lines, the tag-cleaner now also works in Java.
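The two changes described above (an absolute path, and gets instead of IO.readlines) can be sketched roughly as follows. This is a minimal sketch, not the actual code from tag-cleaner.rb: the helper name read_lines is made up for illustration, and I have not checked whether File.expand_path behaves the same on JRuby 0.5.3.

```ruby
# Hypothetical helper, not taken from tag-cleaner.rb: read a whole file
# line by line with gets instead of IO.readlines.
def read_lines(input_file)
  # Resolve a relative name against the current directory first, which is
  # what the $PWD prefix on the command line does by hand.
  absolute_path = File.expand_path(input_file)

  lines = []
  File.open(absolute_path) do |file|
    # gets returns nil at end of file, which ends the loop.
    while (line = file.gets)
      lines << line
    end
  end
  lines
end
```

Calling read_lines('aDocument.html') would then return the lines of the document as an array of strings, the same shape IO.readlines returns in ‘native’ ruby.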

Only now do I find out that the JRuby distribution includes a ready-made shell script in the bin directory… (DOH!) That script sets the path correctly, so the $PWD hack is obsolete. The -I options are still necessary.

Integrating TagCleaner with Java

In order to integrate TagCleaner in Itor, it seems best to rewrite the unit tests in tagcleanertest.rb in Java, and then integrate the implementation with JRuby. That way, we can modify the tests to suit integration into Itor if necessary, and do the integration in small, repeatable steps. JRuby integrates with Java through something called the BSF, short for the Bean Scripting Framework, a Jakarta project. The BSF is a standard way to call other languages from within Java.

Before I start with the unit tests, I try to get an example provided by JRuby to run from within Eclipse. This example is, appropriately, called BSFExample. It requires jruby.jar and bsf.jar (from JRuby’s lib directory) to be on the classpath. It works straight away, showing a small dialog, with the dialog’s components accessible from a ruby command line. I couldn’t resist trying this bit of reflection:

 puts $frame.methods

And it works, showing me all methods of the Java frame, mixed in with the methods of Ruby’s Object. Now, to get my own thing running, I dive into the BSF’s documentation and use the example provided on the JRuby homepage. After having evaluated a simple expression and checked the result, it is time to run the Ruby version of the unit tests, to see if the ruby libraries are included correctly. This turns out to be a bit difficult, since there is no method on the BSFManager to set the library path. Eventually, it works by accessing the ruby library path from the ruby side, by adding elements to the variable $:, which turns out to be empty by default, so JRuby only looks in the current working directory.

This line of ruby does the trick:

 $: << '/usr/lib/ruby/1.6' << '/usr/lib/ruby/site_ruby/1.6'

To load my ruby files from the directory of the org.cq2.spike package, I need to add that directory as well:

 $: << '/usr/lib/ruby/1.6' << '/usr/lib/ruby/site_ruby/1.6' << 'src/org/cq2/spike'

The ‘<<’ operator appends an element to an array. For integration into the Itor package it is probably better to include the necessary libraries in the Eclipse project, but at least we now know how to tell JRuby where to look for files.
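To see the mechanism in isolation, here is a small sketch in plain ruby. It assumes a reasonably modern ruby (for Dir.mktmpdir and File.write); the file name greeting_demo.rb and the directory called unused are made up for the example.

```ruby
require 'tmpdir'

Dir.mktmpdir do |dir|
  # A throwaway library file; 'greeting_demo' is a made-up name for this example.
  File.write(File.join(dir, 'greeting_demo.rb'), "GREETING = 'hello jruby world'\n")

  # << returns the array itself, so several appends can be chained.
  $: << dir << File.join(dir, 'unused')

  # require looks for greeting_demo.rb in each directory listed in $:.
  require 'greeting_demo'
  puts GREETING
end
```

Without the line that appends to $:, the require would fail with a LoadError, which is the same kind of ‘missing files’ complaint as before.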

I don’t like the ‘eval’ and ‘exec’ functions of BSF very much. Why would I want to specify a line number every time? I probably missed some convenience functions, but with Eclipse it’s easy to add my own.

Here’s the full source code for running the tagcleanertest. I tricked the BSF into loading (and executing) the tagcleaner script by asking it to ‘require’ (include) the script. I probably missed something there as well, but I couldn’t find it in the documentation.

The printouts show how to retrieve a String back from JRuby and print it. Strangely, $: is not returned as an array.

        public static void main(String[] args) throws BSFException {
                BSFManager.registerScriptingEngine(
                        "ruby",
                        "org.jruby.javasupport.bsf.JRubyEngine",
                        new String[] { "rb" });

                BSFManager manager = new BSFManager();

                // The ruby expression must be passed to BSF as a Java String.
                rubyEval(manager, "$: << '/usr/lib/ruby/1.6' << '/usr/lib/ruby/site_ruby/1.6'"
                  + " << 'src/org/cq2/spike'");
                Object obj = rubyEval(manager, "$:");
                System.out.println(obj.toString());
                System.out.println(obj.getClass());

                String expression = "require 'tagcleanertest.rb'";
                exec(manager, expression);
        }

        private static void exec(BSFManager manager, String expression) throws BSFException {
                manager.exec(
                        "ruby",
                        "(java)",
                        1,
                        1,
                        expression);
        }

        private static Object rubyEval(BSFManager manager, String rubyExpression)
          throws BSFException {
                return manager.eval(
                        "ruby",
                        "(java)",
                        1,
                        1,
                        rubyExpression);
        }

After running this, the output shows the library path, its class and four passing tests:

        /usr/lib/ruby/1.6/usr/lib/ruby/site_ruby/1.6src/org/cq2/spike
        class java.lang.String

        CleanFontTagsTest#testNoStripping .
        CleanFontTagsTest#testSpecialCharacters .
        CleanFontTagsTest#testStripTagsOnly .
        CleanFontTagsTest#testTwoAttributesPreserved .
        Time: 0.803
        OK (4/4 tests  4 asserts)
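The run-together first line of that output is what you get when the elements of an array are simply concatenated, which is how arrays were converted to strings in ruby versions of that era. That the BSF engine does exactly this conversion is my assumption, not something I verified. A small sketch in plain ruby shows the effect, and how an explicit separator keeps the entries apart:

```ruby
load_path = ['/usr/lib/ruby/1.6', '/usr/lib/ruby/site_ruby/1.6', 'src/org/cq2/spike']

# Joining with no separator runs the entries together, like the output above.
run_together = load_path.join
puts run_together

# Joining with an explicit separator keeps the entries distinguishable.
readable = load_path.join(':')
puts readable
```

Evaluating something like $:.join(':') from Java instead of plain $: might therefore give a more readable printout.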

So, I’m fairly happy with this result. Now I can continue and integrate the main tagcleaner with my Java code. The tests in JRuby do run about 64 times slower than in ‘native’ ruby, which takes only 0.012543 seconds. But at least I get to reuse my code instead of rewriting it, and to profit from within Java from well-written parser libraries etc. in Ruby.