Digital Magpie

Ooh, ooh, look - shiny things!

Understanding Event Propagation in Ardor3D

Ardor3D adds a mechanism for subscribing to events in the scene graph; things like when a node is added or removed, or has one of its render states or its bounding volume altered. The way notifications are handled is a little confusing at first, as the listener interface (DirtyEventListener) has a method which takes a Spatial as its first argument; however, that spatial is the node in the scene graph on which the event occurred, not the node that the event is fired from.

By way of example: if there are two scene nodes, a Spatial named ‘child’ and a Node named ‘parent’, and the child is attached to the parent via a call to Node#attachChild(Spatial), then the following events will be fired (assuming that the child was not already attached to another node):

  1. from the child: spatialDirty(child, Attached)
  2. from the parent: spatialDirty(child, Attached)

If the parent was attached to another node (for example, ‘grandparent’) then this pattern would continue, like so:

  3. from the grandparent: spatialDirty(child, Attached)

Also, if at any point one of the spatialDirty calls returns true, then the event is assumed to have been handled and propagation will stop at that point.

What this means in practice is that if you need to know which node an event is being fired from, you cannot reuse listeners on multiple nodes. For detaching nodes the situation is a little different: the event propagation starts at the parent node, with the detached child passed as the method argument.

Be aware, though, that the parent will already have been set to null when the event is fired, so if you need to do something with the parent you will need to add distinct listeners to each node of interest.
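To make the propagation order concrete, here is a small self-contained sketch that models the behaviour described above. Note that these are not the real Ardor3D classes; Spatial, Node, and DirtyEventListener here are stripped-down stand-ins, and the dirty type is just a string, but the bubbling and the stop-on-true rule match the description:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal model (NOT the real Ardor3D classes) of the propagation rules
// described above: the event starts at the spatial that changed and bubbles
// up through its ancestors until a listener returns true.
public class DirtyPropagationSketch {

    interface DirtyEventListener {
        // Mirrors the shape of Ardor3D's DirtyEventListener#spatialDirty:
        // the first argument is the spatial the event occurred on, not the
        // node the listener is attached to.
        boolean spatialDirty(Spatial spatial, String dirtyType);
    }

    static class Spatial {
        final String name;
        Spatial parent;
        DirtyEventListener listener;

        Spatial(String name) { this.name = name; }

        // Bubble the event from this node up through its ancestors,
        // stopping as soon as a listener reports the event as handled.
        void propagateDirty(Spatial source, String dirtyType) {
            for (Spatial current = this; current != null; current = current.parent) {
                if (current.listener != null
                        && current.listener.spatialDirty(source, dirtyType)) {
                    return; // handled: propagation stops here
                }
            }
        }
    }

    static class Node extends Spatial {
        Node(String name) { super(name); }

        void attachChild(Spatial child) {
            child.parent = this;
            // The event fires first from the child, then from each ancestor.
            child.propagateDirty(child, "Attached");
        }
    }

    // Returns "firedFrom:eventSpatial" entries, in firing order.
    static List<String> recordAttachEvents() {
        List<String> firedFrom = new ArrayList<>();
        Node grandparent = new Node("grandparent");
        Node parent = new Node("parent");
        Spatial child = new Spatial("child");

        // One listener per node, so we can tell which node each event
        // came from (the spatial argument alone does not tell us).
        grandparent.listener = (s, t) -> { firedFrom.add("grandparent:" + s.name); return false; };
        parent.listener = (s, t) -> { firedFrom.add("parent:" + s.name); return false; };
        child.listener = (s, t) -> { firedFrom.add("child:" + s.name); return false; };

        grandparent.attachChild(parent);
        firedFrom.clear(); // only interested in the child attach below
        parent.attachChild(child);
        return firedFrom;
    }

    public static void main(String[] args) {
        System.out.println(recordAttachEvents());
    }
}
```

Running this prints the events fired from the child, then the parent, then the grandparent, always with ‘child’ as the spatial argument; which is exactly why reusing one listener across nodes loses the “fired from” information.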

Cool Glass

And not the camera kind for a change… but [this][01] is pretty neat. [01]: http://sirtified.com/products/hopsidedown/

Object Pooling Performance

This is an attempt to compare the performance of various object reuse strategies for JMonkeyEngine (and, indirectly, Ardor3D). See this JME forum topic for background info. Also, it’s worth bearing in mind that the main driver for this is to reduce GC pauses, not to improve throughput (this is mentioned in the forum topic).
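For reference, the thread-local pooling strategy being benchmarked here can be sketched roughly as follows. This is my own illustration, not the exact benchmark code; the class name and API are assumptions, and a double[4] stands in for a quaternion:

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// A hypothetical thread-local object pool: each thread keeps its own stack
// of reusable instances, so fetch/release need no locking and generate no
// garbage once the pool has warmed up (the point being to avoid GC pauses,
// not to maximise throughput).
public class ObjectPool<T> {
    private final Supplier<T> factory;
    private final ThreadLocal<ArrayDeque<T>> pool =
            ThreadLocal.withInitial(ArrayDeque::new);

    public ObjectPool(Supplier<T> factory) {
        this.factory = factory;
    }

    // Reuse an instance from this thread's pool if one is available,
    // otherwise allocate a fresh one.
    public T fetch() {
        T instance = pool.get().pollFirst();
        return instance != null ? instance : factory.get();
    }

    // Return an instance to the current thread's pool for later reuse.
    public void release(T instance) {
        pool.get().addFirst(instance);
    }
}
```

Usage would then look something like `ObjectPool<double[]> quats = new ObjectPool<>(() -> new double[4]);` with paired fetch/release calls around each temporary-heavy calculation. Note that each fetch and each release does its own ThreadLocal look-up; that cost comes up again below.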

These are the timings from my initial object pooling implementation using 1 thread (all timings throughout this article are in milliseconds):

1793, 2109, 1485, 1418, 1414, 1592, 1675, 2206, 1768, 1685

An interesting (although unrelated to the topic at hand) thing to note here is that it’s pretty easy to spot when HotSpot kicked in. The jump in timings between runs 4 and 5 bears investigating further, though, and the slow run (run 7) is certainly a worry: run 7 seems to be consistently slow, and turning on -verbose:gc doesn’t reveal anything here. These are the timings from my object pooling implementation using 2 threads. Note that each thread is doing a fixed chunk of work, not a fixed amount of work shared between threads (i.e. 2x the number of threads = 2x the amount of work):

2234, 2965, 1796, 1830, 2029, 2145, 2105, 2242, 2214, 2247

So the timings are higher, although not twice as high as might be expected; this is probably due to the fact that my laptop has a 2-core CPU. Now let’s look at the timings from the same test with 10 threads:

8869, 9814, 9203, 8966, 9182, 8973, 9176, 9166, 9358, 9182

Yep, about 5x the time of the 2-thread runs. Now that we have an idea how the thread-local, pooling-based implementation performs, let’s look at the existing implementation to give us a baseline to compare against. These are the timings from the existing implementation using 1 thread:

1501, 1469, 1379, 1374, 1471, 1553, 1553, 1589, 1652, 1547

Well, that’s quite a bit faster for a single thread, although not an order-of-magnitude difference. Bear in mind, though, that it didn’t use any locking because it’s running from a single thread; a more realistic implementation would need the locks in place in case future code tried to use multiple threads. Let’s have a look at 2 threads now:

3038, 2906, 2797, 3140, 3120, 3124, 3143, 3179, 3340, 3352

Oops! This is about what we’d expect to see: a doubling of the amount of work doubles the amount of time needed (as the quaternion class is now a shared resource). And 10 threads:

15538, 16145

OK, I got bored of waiting after 2 runs! But it’s clear that it’s much slower. Finally, based on suggestions from vear, and also on looking at the code used in the Javolution library (a set of real-time Java classes), I decided to try a version that reduces the number of thread-local look-ups needed. This comes at the cost of not providing a single reusable ObjectPool class, but as that class is pretty trivial anyway it’s no great loss leaving it out of the framework.

 1 thread : 1189, 1149, 1096, 1229, 1261, 1506, 1352, 1415, 1295, 1424
 2 threads: 1644, 1582, 1799, 1640, 1584, 1587, 1766, 1658, 1624, 1806
10 threads: 6853, 7037, 7469, 7549, 7438, 7851, 7748, 7769, 7661, 7703

Wow! It’s pretty clear that the pooled approach is much faster and that the cost of performing the thread-local look-up is fairly significant. Interestingly, I also tried this using raw arrays instead of ArrayLists and it was much slower; I can only surmise that because ArrayList is so heavily used throughout Java it gets insanely optimised by HotSpot. As a side note, here’s my Java environment:

~/tmp/perftests$ uname -a
Darwin Kernel Version 9.6.0: Mon Nov 24 17:37:00 PST 2008;
root:xnu-1228.9.59~1/RELEASE_I386
~/tmp/perftests$ java -version
java version "1.6.0_07"
Java(TM) SE Runtime Environment (build 1.6.0_07-b06-153)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_07-b06-57, mixed mode)

And the code used for these tests is available here. I also tried this with Java 1.5 using both the server and client VMs: the 1.5 server VM is noticeably slower, and the 1.5 client VM is frankly a dog; it was 5-6 times slower than the 1.6 times given here.
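To illustrate the reduced-look-up idea described earlier, here is a sketch of what giving up the generic ObjectPool class buys you. Again this is my own illustration rather than the actual test code, and Quaternion here is a stand-in class; the point is that the caller does one ThreadLocal.get() per batch of work instead of one per fetch and one per release:

```java
import java.util.ArrayDeque;

// Reduced-look-up pooling sketch: rather than a generic ObjectPool wrapper
// doing a ThreadLocal look-up inside every fetch() and release(), the
// caller grabs its thread's pool once and then works against it directly.
public class LocalQuaternionPool {

    // Stand-in for the engine's quaternion type.
    public static final class Quaternion {
        public double x, y, z, w;
    }

    private static final ThreadLocal<ArrayDeque<Quaternion>> POOL =
            ThreadLocal.withInitial(ArrayDeque::new);

    // One ThreadLocal look-up per batch of work, not one per object.
    public static ArrayDeque<Quaternion> localPool() {
        return POOL.get();
    }

    // Reuse from the given per-thread pool, or allocate if it is empty.
    public static Quaternion fetch(ArrayDeque<Quaternion> pool) {
        Quaternion q = pool.pollFirst();
        return q != null ? q : new Quaternion();
    }

    // Hand an instance back to the same per-thread pool.
    public static void release(ArrayDeque<Quaternion> pool, Quaternion q) {
        pool.addFirst(q);
    }
}
```

The trade-off is exactly as noted above: fetch and release now need the pool passed in, so there is no single drop-in ObjectPool class, but a tight loop doing thousands of fetch/release pairs pays for only one thread-local look-up.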

InfoQ Presentations

InfoQ is one of my favourite tech information sites and the signal-to-noise ratio there is normally pretty high; they’ve got four great new presentations out today:

  • in Herding Racehorses, Racing Sheep, Dave Thomas talks about the problems of treating developers as interchangeable cogs (or fungible resources, as the project management crowd would say). Some great ideas for bringing new team members up to speed quickly (about 18 minutes in);

  • in The Top 10 Ways to Botch Enterprise Java Scalability and Reliability, Cameron Purdy of Oracle talks about scaling Java. Obviously it’s not an impartial view but his points are good ones and it’s an interesting talk;

  • in Domain-Driven Design in an Evolving Architecture, Mat Wall and Nik Silver talk about their experiences applying DDD at the Guardian’s web site; it covers why they selected DDD as a method as well as the benefits that they feel it brought. I’ve heard them talk about this before, on SE Radio, and to be honest I thought it worked better in that format; the written version here needs to be edited down to about half its current length;

  • finally, Rebecca Wirfs-Brock talks about the benefits of, and problems of conducting, architectural reviews. I’ve listened to some of this before as Rebecca did an interview with SE Radio a while back, but this is good for people who haven’t heard her talk before.