Working with Java Arrays

One improvement that I’d like to see in Clojure is more examples in the doc strings (or maybe in a separate :example metadata item). Still, nothing to stop me building up a set of my own.

So, here are some simple examples of working with Java arrays in Clojure…

Given some sample data:

(def my-list '(1 2 3 4 5))
(def my-vector [1 2 3 4 5])
(def my-map {:a "apple" :b "banana" :c "chopped liver"})

To convert to Java arrays:

(to-array my-list)
#<Object[] [Ljava.lang.Object;@962522b>
(to-array my-vector)
#<Object[] [Ljava.lang.Object;@37e55794>
(to-array my-map)
#<Object[] [Ljava.lang.Object;@52cd19d>

Note that this always returns Object[] regardless of the contents of the collection. Note also that the map isn’t flattened (the pp function used here is in clojure.contrib.pprint):

user=> (pp)
[[:a "apple"], [:b "banana"], [:c "chopped liver"]]

If the array is 2-dimensional there is a corresponding function:

user=> (def my-vec-2d [[1 2 3] [4 5 6] [7 8 9]])
#'user/my-vec-2d
user=> (to-array-2d my-vec-2d)
#<Object[][] [[Ljava.lang.Object;@3a42f352>
user=> (pp)
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
nil

If you need to use a specific type of array (e.g. to pass a String[] into a Java method) or need to use an array with more than 2 dimensions it’s a little trickier:

user=> (into-array my-list)
#<Integer[] [Ljava.lang.Integer;@60c0c8b5>
user=> (pp)
[1, 2, 3, 4, 5]
nil
user=> (into-array my-vector)
#<Integer[] [Ljava.lang.Integer;@2151b0a5>
user=> (pp)
[1, 2, 3, 4, 5]
nil
user=> (into-array my-map)
#<MapEntry[] [Lclojure.lang.MapEntry;@7ae0c7c3>
user=> (into-array (vals my-map))
#<String[] [Ljava.lang.String;@731de9b>
user=> (pp)
["apple", "banana", "chopped liver"]
nil
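If it helps to see the same distinction from the Java side, here’s a rough sketch (my own illustration, with made-up names like TypedArrayDemo and joinAll). Collection.toArray() always hands back an Object[], much as to-array does, while toArray(new String[0]) produces a properly typed array, which is closer to what into-array gives you for a collection of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TypedArrayDemo {
    // Stand-in for a Java method that insists on a String[] argument.
    static String joinAll(String[] parts) {
        return String.join(", ", parts);
    }

    // Like to-array: the runtime type is Object[] whatever the contents.
    static Object[] untyped(List<String> items) {
        return items.toArray();
    }

    // Like into-array on strings: the array itself is a String[].
    static String[] typed(List<String> items) {
        return items.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> fruit =
                new ArrayList<>(Arrays.asList("apple", "banana", "chopped liver"));
        System.out.println(untyped(fruit).getClass().getSimpleName()); // Object[]
        System.out.println(typed(fruit).getClass().getSimpleName());   // String[]
        System.out.println(joinAll(typed(fruit)));
        // joinAll((String[]) untyped(fruit)) would compile, but the cast
        // fails at runtime with a ClassCastException.
    }
}
```

This is why the Object[] from to-array can’t simply be handed to a method that declares a String[] parameter.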

There, that should serve as a handy reference for myself for when I’m feeling forgetful…

Clojure Purists?

I’ve been following Tim Bray’s excellent Concur.next article series covering approaches to concurrency in various languages, and currently (no pun intended!) covering Clojure. The latest article talks about a super-efficient log parsing implementation done by Alex Osborne and includes the following comment:

“Lots of the performance wins come from dipping into Java-land (AtomicLongs, LinkedBlockingQueue), which is perfectly OK, but a Clojure purist would probably see those occasions as maybe highlighting gaps in that language’s coverage.”

I’d take issue with that: one of the real strengths of Clojure is that it has easy and fast access to the whole Java ecosystem. As Rich says:

“Clojure is designed to be a hosted language, sharing the JVM type system, GC, threads etc. It compiles all functions to JVM bytecode. Clojure is a great Java library consumer, offering the dot-target-member notation for calls to Java.”

That seems pretty clear to me. I wonder if the people claiming to be Clojure purists are all coming from a Lisp background rather than the Java world?

Clojure Evaluation

I’ve been looking at the early-access version of Manning’s forthcoming Clojure in Action book as well as some of the criticism of it. One of the complaints is that the current drafts describe macros as a run-time concept and that this is wrong. The confusion arises from the fact that Clojure (and Lisps in general) don’t follow the same path from source to execution as more conventional programming languages like C and Java. I’ll compare four different languages to see how they differ: C, Java, a traditional Lisp compiler, and Clojure.

C Like Languages

This is what most people are used to: the traditional compiled language. Here, source code is read in and then parsed (this is normally broken out into multiple stages, e.g. a separate lexing stage, but for our purposes we can gloss over these details). The parser emits some form of intermediate representation, usually an abstract syntax tree (AST), which is then used by the compiler to generate executable code.

[Figure: C Like Languages]

Again, this potentially glosses over some details: optimizers can work on the intermediate representation for example, or the compiler could require a separate linking stage to generate an executable. For our purposes though this is sufficient: we go from source to AST to executable.

Also, it could be argued that the C pre-processor operates on the source code before the parser sees it, but in practice the C macro system is so primitive it doesn’t really warrant being called out as a separate stage.

Java Like Languages

[Figure: Java Like Languages]

The Java like languages differ from C in that they run on top of a virtual machine rather than being executed directly by the OS; as a result their compiler emits ‘bytecode’ rather than a finished executable. This bytecode is then executed by the virtual machine. In all modern desktop and server virtual machines this means just-in-time compiling the bytecode down to native machine code.

Traditional Lisps

The Lisp view of things is a little different; it’s more complicated but also more powerful. The first thing to note is that Lisp code is already basically in the form of an AST: there is no (or not much) syntax getting in the way. Next, there are 2 types of macros which are applied to Lisp code: macros and reader macros. I’ll discuss them in the opposite order to the way they are applied…

[Figure: Lisp Like Languages]

The standard type of Lisp macros are what most people rave about when extolling the virtues of the language: these are chunks of code that are executed after the source has been loaded into an AST (remember, Lisp source code is basically in this form already, so this just involves moving from a textual representation to something that the Lisp runtime can work with). If a node in the AST is a macro then it is evaluated as the code is loaded and the result of the evaluation is used to replace the macro node in the AST. Stop and think about that for a minute - this happens before the code is evaluated by the regular Lisp runtime, but yet at this point you already have access to the full Lisp programming language. All of this means that you can do some pretty cool tricks: how about writing your own control constructs? Writing a DSL compiler? Logging constructs that have zero runtime overhead when not used (but that can be switched on and off by users of the program, unlike #define DEBUG 0 in C)?
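To make the ‘replace the macro node with its result’ step concrete, here is a toy expander sketched in Java. It’s an illustration only: forms are modelled as nested java.util.Lists with strings standing in for symbols, and MacroExpander, defMacro, and the unless macro are names invented for the example, not anything from a real Lisp implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class MacroExpander {
    // Macro table: head symbol -> function that rewrites the whole form.
    private final Map<String, Function<List<Object>, Object>> macros = new HashMap<>();

    void defMacro(String name, Function<List<Object>, Object> expander) {
        macros.put(name, expander);
    }

    // Walk the tree; whenever a list starts with a macro name, replace the
    // list with the macro's output and keep expanding the result.
    Object expand(Object form) {
        if (!(form instanceof List)) {
            return form; // atoms (symbols, numbers) pass through untouched
        }
        @SuppressWarnings("unchecked")
        List<Object> list = (List<Object>) form;
        if (!list.isEmpty() && macros.containsKey(list.get(0))) {
            return expand(macros.get(list.get(0)).apply(list));
        }
        List<Object> expanded = new ArrayList<>();
        for (Object child : list) {
            expanded.add(expand(child));
        }
        return expanded;
    }

    public static void main(String[] args) {
        MacroExpander expander = new MacroExpander();
        // (unless test body) => (if (not test) body)
        expander.defMacro("unless", form ->
                Arrays.asList("if", Arrays.asList("not", form.get(1)), form.get(2)));
        Object program = Arrays.asList("do",
                Arrays.asList("unless", "done?", Arrays.asList("println", "working")));
        // prints [do, [if, [not, done?], [println, working]]]
        System.out.println(expander.expand(program));
    }
}
```

The point to notice is that the expander is ordinary code running over the tree before anything is compiled; by the time a compiler sees the program, unless has disappeared.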

The second, and much less common, type of macro is the reader macro. Reader macros operate on the character stream as it is read in, before the AST is constructed. Basically, when the reader sees a specific character (usually #) it then looks at the next character and uses that as a key into a table of functions (the read table) that tell it how to handle future input. Using reader macros it’s possible to create DSLs that don’t use s-expression syntax (s-expressions are the paren enclosed lists that Lisp is (in)famous for); or do something as simple as allowing the use of brackets to write quoted lists without needing an explicit quote (i.e. writing [1 2 3] instead of '(1 2 3)).
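The read table idea can be sketched in a few lines of Java too. This is a simplified, hypothetical reader (TinyReader and setMacroCharacter are invented names): it dispatches on a single character rather than on # plus a second character, and it only knows about symbols and lists. The example registers [ so that [1 2 3] reads as (quote (1 2 3)).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class TinyReader {
    private final String src;
    private int pos = 0;
    // The read table: a character -> the function that takes over reading.
    private final Map<Character, Function<TinyReader, Object>> readTable = new HashMap<>();

    TinyReader(String src) {
        this.src = src;
        // Built-in entry: '(' reads forms up to the matching ')'.
        readTable.put('(', r -> r.readUntil(')'));
    }

    void setMacroCharacter(char c, Function<TinyReader, Object> fn) {
        readTable.put(c, fn);
    }

    private void skipWhitespace() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) {
            pos++;
        }
    }

    // Read forms until the closing delimiter, which is consumed.
    List<Object> readUntil(char close) {
        List<Object> items = new ArrayList<>();
        while (true) {
            skipWhitespace();
            if (pos >= src.length()) {
                throw new IllegalStateException("missing '" + close + "'");
            }
            if (src.charAt(pos) == close) {
                pos++;
                return items;
            }
            items.add(read());
        }
    }

    // Read one form: dispatch through the read table, else read a symbol.
    Object read() {
        skipWhitespace();
        if (pos >= src.length()) {
            throw new IllegalStateException("unexpected end of input");
        }
        char c = src.charAt(pos);
        Function<TinyReader, Object> macro = readTable.get(c);
        if (macro != null) {
            pos++; // consume the macro character itself
            return macro.apply(this);
        }
        int start = pos;
        while (pos < src.length() && !Character.isWhitespace(src.charAt(pos))
                && "()[]".indexOf(src.charAt(pos)) < 0) {
            pos++;
        }
        if (pos == start) {
            throw new IllegalStateException("unexpected '" + c + "'");
        }
        return src.substring(start, pos);
    }

    public static void main(String[] args) {
        TinyReader reader = new TinyReader("(foo [1 2 3])");
        // User-installed reader macro: [a b c] reads as (quote (a b c)).
        reader.setMacroCharacter('[', r -> Arrays.asList("quote", r.readUntil(']')));
        System.out.println(reader.read()); // [foo, [quote, [1, 2, 3]]]
    }
}
```

Note that the macro function runs while characters are still being consumed, which is what separates it from the regular macros described earlier.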

Only once all of this has finished is a traditional Lisp ready to let its compiler go to work turning the (now macro-free) AST into executable code.

Clojure

Clojure is very similar to the traditional Lisp model, with two main differences. The first difference is the fact that, like Java, the output is bytecode which is then loaded and executed by a standard Java virtual machine. The second difference is that while Clojure does have reader macros, the read table isn’t exposed to user programs; that is, while it operates in the same way as a traditional Lisp there is no way for user code to alter the behaviour of the reader. This is probably a good thing as Clojure includes a relatively large number of predefined reader macros including a literal syntax for lists, sets, and maps, as well as lambda-expressions (anonymous functions) and metadata.

Summary

C and Java like languages have a huge amount of syntax baked in, but don’t provide any way to modify this or to manipulate the program before it is compiled. Lisp has almost no syntax but provides a mechanism for users to add their own, and provides a mechanism to manipulate programs before they are compiled. Clojure has some syntax (more than other Lisps, but way less than C/Java/&c.) and provides the same mechanism for program manipulation as other Lisps.

The OmniGraffle file for the images in this post is available here if anybody is interested.

Update: this post is intended to compare traditional C-style languages with Lisps; it doesn’t cover, for example, so-called scripting languages such as Perl, Python, and Ruby.

Strawman Arguments and Coding Styles

So there’s this blog post over on the Best in Class blog that talks about ceremony in programming languages and compares Clojure with Java on this basis. While I’d agree with the basic premise of the article (that there is less ceremony in Clojure), I’m less keen on the way it’s presented: by way of a needlessly verbose strawman example. To be fair the article does kind of admit that this is what is being done, but it’s still annoying.

With this in mind let’s see how well we can do with the Java version of the code, relying on a better coding style and a couple of freely available libraries (one of the platform’s much-touted strengths). For the original, 28-line version of the code I’ll refer you to the original post (but warn you that it’s presented in that well-known code storage format, PNG!).

Here is the same code rewritten in a smarter manner, but still using only the core Java libraries. This gets it down to 10 lines of code and also makes the intent of the code clearer. There’s still a fair amount of ceremony about this however: the multiple imports, and all of the class and static main method boilerplate.

import java.util.Arrays;
import java.util.Set;
import java.util.HashSet;
 
class Distinct {
  public static void main(String... args) {
    Set<String> distinct = new HashSet<String>(Arrays.asList(new String[] {
        "foo", "bar", "baz", "foo"
    }));
  }
}
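As an aside on the core-library version: Arrays.asList is a varargs method, so the explicit new String[] {…} wrapper is optional. The sketch below is my own variation rather than the code from either post; DistinctDemo and distinct are made-up names, and I’ve swapped in LinkedHashSet purely so that the printed order is predictable.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class DistinctDemo {
    // Arrays.asList accepts varargs, so the values can be passed straight in.
    static Set<String> distinct(String... items) {
        return new LinkedHashSet<>(Arrays.asList(items));
    }

    public static void main(String[] args) {
        System.out.println(distinct("foo", "bar", "baz", "foo")); // [foo, bar, baz]
    }
}
```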

Let’s see if we can’t do a little better with the addition of some open source libraries. Enter Google Collections, a really neat library that improves the collections API from the JDK. We’re now down to 7 lines of code, and 2 of those are just closing braces! In any reasonably sized program the class and main statements disappear into the noise, so we’re really saying that we have 2 import statements and a single line of code. That’s not too different from the Clojure version all things considered.

import java.util.Set;
import static com.google.common.collect.Sets.newHashSet;
 
class Distinct {
  public static void main(String... args) {
    Set<String> distinct = newHashSet("foo", "bar", "baz", "foo");
  }
}

It’s interesting to note that the second Java version weighs in at 10 lines of code, versus 8 for the equivalent Clojure version; not much of a difference really.

I think that the benefits of Clojure come from its functional style, macro system, and excellent concurrency support, not from the fact that you can save a few lines of code here and there.

Further discord around JSR-294

Peter Kriens of OSGi fame has posted some comments about the current EDR from JSR-294, the proposed Java language changes in support of module systems:

In Java 1..6 the language offered a pretty pure model that was mapped
to reality in the VM. With class loader tricks we could tweak the
perspective each JAR had of this pure world, solving many real world
problems. In JSR 294, we will for the first time introduce this messy
and complex runtime world in the language. Untold millions have been
spent to make Java run on hundreds of platforms, and with one simple
JSR we bring back the need for #ifdef …

Read the relevant posts on the mailing list, especially this one.

I generally agree with the OSGi camp here; this is a giant case of ‘not invented here’ syndrome from the Sun people. It’ll be interesting to see if the acquisition by Oracle has any effect on this (or the JCP in general) but I guess we’ll only find out about that after the deal goes through (i.e. months away yet).

Hat tip to Chris Aniszczyk.

Embedding Clojure (part 2)

Following on from the last post, it actually turns out to be much easier to do most of the work in Clojure itself: no need for all of that tiresome messing around with Vars and Symbols on the Java side of things! The trick is to define an abstract class in Java to set a few things up and use as a hook, then implement this in Clojure. I’ll go through both sides of this, starting with the Java stuff.

The Abstract Class in Java

Basically, I’m using the Java side of things to set up my text pane with a standard stylesheet (I’d like it to use a proportional font, with different colours for input, output, and error text) and to install a key listener to send commands to the Clojure repl whenever the user hits enter or return.

The basic class then, is

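import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;
import javax.swing.JTextPane;
import javax.swing.text.Document;

public abstract class InteractiveConsole {
    // createStyledDocument() and the DocWriter inner class are not shown here
    private final Document _document = createStyledDocument();
    private final JTextPane _textpane;
    private final PipedWriter _inWriter = new PipedWriter();
    private final Reader _in;
    private final Writer _out = new PrintWriter(new DocWriter("output"), true);
    private final Writer _err = new PrintWriter(new DocWriter("error"), true);
    protected InteractiveConsole(JTextPane textpane) throws IOException {
        _textpane = textpane;
        // whatever is written to _inWriter can be read back from _in
        _in = new PipedReader(_inWriter);
    }
}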

The createStyledDocument method, which I won’t include here, just sets up the style context and my colour scheme. The DocWriter class that is referenced is a trivial Writer subclass that just calls insertString on the document with the named style. The other class that I’ll be wanting to use is a Runnable so that I can launch the Clojure REPL on its own thread. It’s about as trivial as it gets: it just calls back into the two abstract methods that give my Clojure code somewhere to hook onto.

private class ConsoleRunner implements Runnable {
    private final Map<String, Object> _context;
    public ConsoleRunner(Map<String, Object> context) {
        _context = context;
    }
    @Override
    public void run() {
        for (Map.Entry<String, Object> var : _context.entrySet()) {
            bindVariable(var.getKey(), var.getValue());
        }
        doStart(_in, _out, _err);
    }
}

With this I can provide a start method that installs my key listener and then launches a new thread with an instance of this runnable. The two hook methods that I’m providing are

  • abstract void bindVariable(String,Object) to allow me to set up some domain objects on the Clojure side of things; and

  • abstract void doStart(Reader,Writer,Writer) to actually start the REPL, using the provided input, output, and error streams.
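The DocWriter class isn’t shown in this post. As a rough idea of what such a class might look like, here’s a minimal sketch; the name StyledDocWriter and the choice to append at the end of the document are my assumptions, and it uses only the standard javax.swing.text API.

```java
import java.io.IOException;
import java.io.Writer;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.StyledDocument;

// Hypothetical stand-in for the DocWriter described in the post: appends
// whatever is written to the end of a document, using a named style.
class StyledDocWriter extends Writer {
    private final StyledDocument document;
    private final String styleName;

    StyledDocWriter(StyledDocument document, String styleName) {
        this.document = document;
        this.styleName = styleName;
    }

    @Override
    public void write(char[] buf, int off, int len) throws IOException {
        try {
            // Look the style up by name each time so later restyling is picked up.
            document.insertString(document.getLength(),
                    new String(buf, off, len), document.getStyle(styleName));
        } catch (BadLocationException e) {
            throw new IOException(e);
        }
    }

    @Override
    public void flush() { /* nothing is buffered */ }

    @Override
    public void close() { /* nothing to release */ }

    public static void main(String[] args) throws Exception {
        DefaultStyledDocument doc = new DefaultStyledDocument();
        doc.addStyle("output", null); // colours would be set on this style
        try (Writer out = new StyledDocWriter(doc, "output")) {
            out.write("user=> ");
        }
        System.out.println(doc.getText(0, doc.getLength()));
    }
}
```

Wrapping a writer like this in a PrintWriter with auto-flush, as the fields in the base class do, gives the REPL something it can bind to *out* and *err*.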

The Clojure Implementation

This turns out to be trivial as well: the implementation of bindVariable just interns the object passed in into the user namespace, so it’s a one-liner in Clojure.

(defn -bindVariable [this name value]
    (intern 'user (symbol name) value))

The doStart method isn’t much more involved either, it just sets up the bindings and then launches the REPL.

(defn -doStart [this #^Reader in #^Writer out #^Writer err]
    (binding [*in*  (LineNumberingPushbackReader. in)
              *out* out
              *err* err]
        (clojure.main/repl)))

Notice that here I have added type annotations so that the correct method gets implemented, without these the Clojure code compiled but then I got abstract method errors at runtime. Check out the docstring for the repl function as well, there are a few useful options (for example in my actual code I have an :init function to switch to the user namespace, and a custom prompt).

For completeness, here’s the rest of the Clojure file with the code required to inherit from the Java base class.

(ns com.example.ClojureConsole
    (:import (clojure.lang LineNumberingPushbackReader)
             (java.io Reader Writer)
             (java.util Map))
    (:require (clojure main))
    (:gen-class
     :extends bg.beer.editor.InteractiveConsole
     :init init
     ;; maps this class's constructor signature to the superclass one
     :constructors {[javax.swing.JTextPane] [javax.swing.JTextPane]}))
;; returns [superclass-constructor-args state]; no state is kept here
(defn -init [textpane]
    [[textpane] nil])

Summary

This approach has the advantage that any additional configuration can happen in the Clojure code. It would also be easy, for example, to have an additional script that was always run at start up, to allow the user to customize the console further (similar to the .emacs file in Emacs).

You could also move most of the work that I’m doing in Java into the Clojure code. I haven’t done this as I may want to support multiple languages in my console and it’s nice to have a common stylesheet and keybindings (e.g. for history) across languages. Your mileage may vary.

Embedding Clojure in an Existing Application

I’ve been taking a look at Clojure lately; as a JVM-friendly flavour of Lisp I’ve got to say it looks pretty interesting. One problem that I’ve had though is that all of the documentation out there (of which there’s very little, to be honest) seems to assume that you’ll be writing/running a pure Clojure program as your main application. There’s plenty of information about calling Java code from Clojure programs, and some information about extending Java classes and interfaces with Clojure code, but nothing about getting the two talking together at runtime.

So here’s how it’s done.

First off you need to set up the symbols and namespaces that you are going to need to start up a Clojure environment.

import clojure.lang.Namespace;
import clojure.lang.RT;
import clojure.lang.Symbol;
import clojure.lang.Var;

Symbol main = Symbol.create("main");
Symbol clojureMain = Symbol.create("clojure.main");
Symbol user = Symbol.create("user");
Symbol require = Symbol.create("require");
Namespace userNS = Namespace.findOrCreate(user);
Namespace clojureMainNS = Namespace.findOrCreate(clojureMain);

Once you have these you can require the clojure.main namespace; this is the same one that is used to run scripts or start a REPL from the command line.

Var.intern(RT.CLOJURE_NS, require).invoke(clojureMain);

Then, and this is the bit that I had trouble working out, you need to bind your application’s domain model (or at least those bits of it that you want to expose) into the user namespace in Clojure.

for (Map.Entry<String, Object> global : globals.entrySet()) {
    String key = global.getKey();
    Object value = global.getValue();
    Var.intern(userNS, Symbol.create(key), value);
}

Finally you’re ready to grab the main method and run it.

Var.intern(clojureMainNS, main).applyTo(RT.seq(new String[0]));

The empty string array here emulates an empty argument list at the command line.

Some things to note here: you need to intern vars before you can use them, even for core library features like require. This surprised me at first, but when you remember that most Lisps are built around a very small core of special forms with everything else defined in Lisp, it makes some sense.

Guice Development Stages

Guice has a concept called ‘development stages’ which affects how the library works. Guice is written with server-side applications in mind (it is from Google, after all) so the behaviour for production mode is to do a bunch of work up front so that it can catch errors as soon as possible. The development mode defers work (i.e. object creation) until the last moment.

Here’s how the Javadoc notes describe them:

  • Development: we want fast startup times at the expense of runtime performance and some up front error checking.

  • Production: we want to catch errors as early as possible and take performance hits up front.

  • Tool: we’re running in a tool (an IDE plugin for example).

Ignoring tool mode, this is exactly the opposite of the behaviour that you want to see for client-side development. For client applications you want to have as much error checking as possible during development but for production use (i.e. use by a real user) you want to go for faster start-up times.

Note that for production use it is also advantageous to defer error checking until later as well; after all, there is no point in refusing to start an app because feature ‘foo’ isn’t working, if the user doesn’t use or care about ‘foo’!
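To illustrate the trade-off without pulling in Guice, here’s a tiny stand-in container sketched in plain Java. It is not Guice’s API (Guice itself takes a Stage as the first argument to Guice.createInjector); FeatureRegistry, Policy, and the feature names are all invented, and the sketch only models when objects get built under each policy.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical mini-container, not Guice: it only models *when* objects
// are constructed under an eager or a lazy start-up policy.
class FeatureRegistry {
    enum Policy { EAGER, LAZY }

    private final Map<String, Supplier<Object>> factories = new LinkedHashMap<>();
    private final Map<String, Object> instances = new LinkedHashMap<>();

    void register(String name, Supplier<Object> factory) {
        factories.put(name, factory);
    }

    // EAGER builds everything now, so a broken feature fails start-up.
    // LAZY builds nothing, so a failure surfaces only if the feature is used.
    void start(Policy policy) {
        if (policy == Policy.EAGER) {
            for (String name : factories.keySet()) {
                get(name);
            }
        }
    }

    Object get(String name) {
        if (!factories.containsKey(name)) {
            throw new IllegalArgumentException("unknown feature: " + name);
        }
        if (!instances.containsKey(name)) {
            instances.put(name, factories.get(name).get());
        }
        return instances.get(name);
    }

    public static void main(String[] args) {
        FeatureRegistry registry = new FeatureRegistry();
        registry.register("editor", () -> "editor ready");
        registry.register("foo", () -> {
            throw new IllegalStateException("foo is misconfigured");
        });

        registry.start(Policy.LAZY);                // starts fine
        System.out.println(registry.get("editor")); // editor ready
        try {
            registry.start(Policy.EAGER);           // tries to build "foo"
        } catch (IllegalStateException e) {
            System.out.println("startup failed: " + e.getMessage());
        }
    }
}
```

With the lazy policy a user who never touches ‘foo’ never sees the failure, which is the behaviour argued for above in a client application.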

Understanding Event Propagation in Ardor3D

Ardor3D adds a mechanism for subscribing to events in the scene graph: things like when a node is added or removed, or has one of its render states or its bounding volume altered. The way notifications are handled is a little confusing at first, as the listener interface (DirtyEventListener) has a method which takes a Spatial as its first argument; however, the spatial is the node in the scene graph on which the event occurred, not the node that the event is fired from.

By way of example: if there are 2 scene nodes, a Spatial named ‘child’ and a Node named ‘parent’, and the child is attached to the parent via a call to Node#attachChild(Spatial) then the following events will be fired (assuming that the child was not already attached to another node):

  1. from the child: spatialDirty(child, Attached)

  2. from the parent: spatialDirty(child, Attached)

If the parent was attached to another node (for example, ‘grandparent’) then this pattern would continue, like so:

  3. from the grandparent: spatialDirty(child, Attached)

Also, if at any point in this one of the spatialDirty calls returns true, then the event is assumed to have been handled and propagation will stop at that point.

What this means in practice is that if you need to know which node an event is being fired from you cannot reuse listeners on multiple nodes. For detaching nodes the situation is a little different: the event propagation starts at the parent node, with the detached child passed as the method argument. Be aware that the child’s parent will already have been set to null when the event is fired though, so if you need to be able to do something with the parent you will need to add distinct listeners to each node of interest.
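The attach-time propagation rules can be modelled in a few lines of plain Java. To be clear, this is a hypothetical model written for this post: SceneNode and DirtyListener are invented stand-ins rather than Ardor3D’s classes, and the event type is just a string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the propagation rules; not Ardor3D's API.
class SceneNode {
    interface DirtyListener {
        // Return true to mark the event as handled and stop propagation.
        boolean spatialDirty(SceneNode source, String type);
    }

    final String name;
    SceneNode parent;
    DirtyListener listener;

    SceneNode(String name) {
        this.name = name;
    }

    void attachChild(SceneNode child) {
        child.parent = this;
        child.fireDirty(child, "Attached");
    }

    // Offer the event to this node's listener, then walk up the tree.
    // 'source' stays the node the event happened to, whoever is firing.
    void fireDirty(SceneNode source, String type) {
        if (listener != null && listener.spatialDirty(source, type)) {
            return;
        }
        if (parent != null) {
            parent.fireDirty(source, type);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        SceneNode grandparent = new SceneNode("grandparent");
        SceneNode parent = new SceneNode("parent");
        SceneNode child = new SceneNode("child");
        for (SceneNode n : new SceneNode[] {grandparent, parent, child}) {
            n.listener = (source, type) -> {
                log.add("from " + n.name + ": spatialDirty(" + source.name + ", " + type + ")");
                return false;
            };
        }
        grandparent.attachChild(parent);
        log.clear(); // only the child's attachment is of interest here
        parent.attachChild(child);
        log.forEach(System.out::println);
    }
}
```

Every listener in the chain receives the child as the source, so a listener shared between nodes has no way of telling which node it was called from.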

Object Pooling Performance

This is an attempt to compare the performance of various object reuse strategies for JMonkeyEngine (and, indirectly, Ardor3D). See this JME forum topic for background info. Also, it’s worth bearing in mind that the main driver for this is to reduce GC pauses, not to improve throughput (this is mentioned in the forum topic).

These are the timings from my initial object pooling implementation using 1 thread (all timings throughout this article are in milliseconds):

1793, 2109, 1485, 1418, 1414, 1592, 1675, 2206, 1768, 1685

An interesting (although unrelated to the topic at hand) thing to note here is that it’s pretty easy to spot when hotspot kicked in, although the jump in timings between runs 4 and 5 bears investigating further and the slow run (run 7) is certainly a worry. Run 7 seems to be consistently slow and turning on -verbose:gc doesn’t reveal anything here.

These are the timings from my object pooling implementation using 2 threads. Note that here each thread does a fixed chunk of work, rather than a fixed amount of work being shared between threads (i.e. 2x number of threads = 2x amount of work):

2234, 2965, 1796, 1830, 2029, 2145, 2105, 2242, 2214, 2247

So the timings are higher, although not twice as high as might be expected; this is probably due to the fact that my laptop has a 2-core CPU. Now let’s look at the timings from the same with 10 threads:

8869, 9814, 9203, 8966, 9182, 8973, 9176, 9166, 9358, 9182

Yep, a little over 4x the time of the 2 thread runs (for 5x the work). Now let’s see how the existing implementation compares to these.

OK, so now we have an idea how the thread local and pooling based implementation performs, let’s have a look at the existing implementation to give us a baseline to compare to. These are the timings from the existing implementation using 1 thread:

1501, 1469, 1379, 1374, 1471, 1553, 1553, 1589, 1652, 1547

Well, that’s quite a bit faster for a single thread, although not an order-of-magnitude difference. It didn’t use any locking as it’s running from a single thread; a more realistic implementation would need to have the locks in place in case future code tried to use multiple threads. Let’s have a look at 2 threads now:

3038, 2906, 2797, 3140, 3120, 3124, 3143, 3179, 3340, 3352

Oops! This is about what we’d expect to see, a doubling of the amount of work doubles the amount of time needed (as the quaternion class is now a shared resource). And 10 threads:

15538, 16145

OK, I got bored of waiting after 2 runs! But it’s clear to see that it’s much slower.

Finally, based on suggestions from vear and also looking at the code used in the Javolution library (a set of real-time Java classes), I decided to try a version that reduced the number of thread local look-ups needed, this comes at the cost of not providing a single reusable ObjectPool class, but as that class is pretty trivial anyway it’s no great loss leaving it out of the framework.

With 1 thread:

1189, 1149, 1096, 1229, 1261, 1506, 1352, 1415, 1295, 1424

With 2 threads:

1644, 1582, 1799, 1640, 1584, 1587, 1766, 1658, 1624, 1806

With 10 threads:

6853, 7037, 7469, 7549, 7438, 7851, 7748, 7769, 7661, 7703

Wow! It’s pretty clear that the pooled approach is much faster and that the cost of performing the thread local look-up is fairly significant. Interestingly, I also tried this using raw arrays instead of ArrayLists and it was much slower; I can only surmise that because ArrayList is so heavily used throughout Java it gets insanely optimised by hotspot.
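For anyone who wants the general shape of the idea without downloading the test code, here’s a hedged sketch. It is not the benchmarked implementation: LocalPool, PoolDemo, and sumOfSquares are names made up for illustration, and a double[4] stands in for a quaternion. The point is that the thread-local look-up happens once per unit of work, after which acquire and release are plain ArrayList operations with no locking.

```java
import java.util.ArrayList;
import java.util.function.Supplier;

// Hypothetical sketch, not the benchmarked code: one pool per thread,
// so acquire/release never contend with other threads.
class LocalPool<T> {
    private final ArrayList<T> free = new ArrayList<>();
    private final Supplier<T> factory;

    LocalPool(Supplier<T> factory) {
        this.factory = factory;
    }

    // Reuse the most recently released object, or create a new one.
    T acquire() {
        int last = free.size() - 1;
        return last >= 0 ? free.remove(last) : factory.get();
    }

    void release(T obj) {
        free.add(obj);
    }
}

class PoolDemo {
    static final ThreadLocal<LocalPool<double[]>> POOL =
            ThreadLocal.withInitial(() -> new LocalPool<double[]>(() -> new double[4]));

    // One thread-local look-up per call, however many temporaries are used.
    static double sumOfSquares(double[][] rows) {
        LocalPool<double[]> pool = POOL.get();
        double total = 0;
        for (double[] row : rows) {
            double[] tmp = pool.acquire(); // scratch space, contents overwritten
            for (int i = 0; i < tmp.length; i++) {
                tmp[i] = row[i] * row[i];
                total += tmp[i];
            }
            pool.release(tmp);
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] rows = {{1, 2, 3, 4}, {0, 0, 1, 1}};
        System.out.println(sumOfSquares(rows)); // 32.0
    }
}
```

The cost, as noted above, is that callers have to hold on to the pool reference themselves rather than going through a single reusable ObjectPool class.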

As a side note, here’s my Java environment:

~/tmp/perftests$ uname -a
Darwin Kernel Version 9.6.0: Mon Nov 24 17:37:00 PST 2008;
root:xnu-1228.9.59~1/RELEASE_I386
~/tmp/perftests$ java -version
java version "1.6.0_07"
Java(TM) SE Runtime Environment (build 1.6.0_07-b06-153)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_07-b06-57, mixed mode)

And the code used for these tests is available here.

I also tried this using 1.5 with both the server and client VMs, the 1.5 server VM is noticeably slower and the 1.5 client VM is frankly a dog, it was 5-6 times slower than the 1.6 times given here.
