Digital Magpie

Ooh, ooh, look - shiny things!

Vote Now!

Fed up with all of the pre-election coverage in the (UK) media right now? Instead of listening to call-me-Dave drone on and on, or hearing how the one-eyed Gobblin’ King has once again saved the world, go cast your vote for Computer Engineer Barbie!

SQL: Just Say No!

Just in time for the weekend, the SF NoSQL community have put up a collection of talks and slides from their recent get-together. I’ll have more to say after I’ve had a chance to look these over…

Update: not actually a new set of talks; these have been online for about 6 months now, but it’s the first time that I’ve seen them…

Prevalent Synchronicity

Maybe it’s just an idea whose time has come, but in the past few days there’ve been 2 prevalent database systems announced for Clojure: FleetDB and Persister.

Prevalent Databases

The idea behind prevalent databases has been around for a while, being, if not ‘popularised’ exactly, at least pushed by the guys behind Prevayler. Here’s how they describe them:

Prevayler is an open source object persistence library for Java. It is an implementation of the Prevalent System design pattern, in which business objects are kept live in memory and transactions are journaled for system recovery.
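To make the pattern a bit more concrete, here’s a minimal sketch of it in Clojure. This is my own illustration, not Prevayler’s API, and every name in it (state, apply-tx, transact!, recover!) is made up. It assumes Clojure 1.5 or later for clojure.edn, a single writer, and a journal that holds one transaction per line.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.java.io :as io])

;; Hypothetical names throughout; this is not Prevayler's API.
(def state (atom {}))   ; the live, in-memory "business objects"

(defn apply-tx
  "Pure function: returns the new state after one transaction."
  [db [op k v]]
  (case op
    :put    (assoc db k v)
    :delete (dissoc db k)))

(defn transact!
  "Journal the transaction first, then apply it to the in-memory state."
  [journal-file tx]
  (spit journal-file (str (pr-str tx) "\n") :append true)
  (swap! state apply-tx tx))

(defn recover!
  "Rebuild the in-memory state by replaying every journaled transaction."
  [journal-file]
  (reset! state
          (if (.exists (io/file journal-file))
            (with-open [r (io/reader journal-file)]
              ;; reduce is eager, so the reader is still open while we read
              (reduce apply-tx {} (map edn/read-string (line-seq r))))
            {})))
```

Calling (transact! "journal.edn" [:put :alice {:balance 10}]) appends one line to the journal and updates the atom; after a restart, (recover! "journal.edn") gets you back to the same state.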

Fleet DB

While Mark McGranaghan’s Fleet DB doesn’t use the term prevalent database, right now that’s basically what it is. The core of Fleet DB is a Clojure-based, append-only, log-based database; it provides a native Clojure query language (with built-in query optimiser), schema-less records, indexes, and a server with a JSON-based network protocol.

For a new project, Fleet DB also has a good set of documentation, and it sounds like Mark has some big plans for it in the future. As an added benefit, there are also clients for the network protocol in languages other than Clojure (Ruby officially, and a set of Python bindings in development).

Persister

Sergey Didenko’s Simple Persistence for Clojure project is a much less ambitious offering, but with the really cool feature of being a single (255 line, ~11KB) file that you could just drop into your project and start using - that’s pretty lightweight! From the read me file:

Simple Persistence for Clojure is a journal-based persistence library for Clojure programs. It follows “Prevalent system” design pattern.

The intended usage is assist you in making a prevalent system. Thus you work with your in-memory data and wrap every writing call into one of (apply-transaction*) macros. A nice feature is that the log files are just Clojure code: transactions are logged as a valid Clojure code, so they are easy to read and run separately.
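The “log files are just Clojure code” idea is easy to picture with a small sketch. To be clear, this is not the Simple Persistence API (I haven’t reproduced its macros here); inventory, add-item!, logged-add-item! and replay! are hypothetical names, and the sketch assumes a single writer.

```clojure
;; Hypothetical names; this is not the Simple Persistence API.
(def inventory (atom {}))

(defn add-item!
  "The real write operation; also what gets re-run on replay."
  [k qty]
  (swap! inventory update-in [k] (fnil + 0) qty))

(defn logged-add-item!
  "Append the call to the journal as a Clojure form, then perform it."
  [journal-file k qty]
  ;; Syntax-quote namespace-qualifies add-item!, so the journal can be
  ;; loaded from any namespace.
  (spit journal-file (str (pr-str `(add-item! ~k ~qty)) "\n") :append true)
  (add-item! k qty))

(defn replay!
  "Reset the state and re-run the journal, which is plain Clojure source."
  [journal-file]
  (reset! inventory {})
  (load-file journal-file)
  @inventory)
```

Because each journal line is an ordinary function call, you can open the file in an editor, read it, or paste individual lines into a REPL.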

Squeeze!

One of the neat features of Clojure is the sequence abstraction: it makes solving a whole host of data processing tasks much easier. Simply get your data into a sequence and you’ve got a huge toolbox available to work on it. Of course, being a guy, I’m firmly of the belief that more tools are better, so with that in mind let’s add another one to our toolbox. Given a sequence, the squeeze function returns another sequence in which any adjacent items that match a supplied predicate are merged together using a supplied function. The predicate is called with each item and the item that follows it (nil for the last item, hence the and %2 guard in the examples), and the merge function receives the whole run of matched items as a single sequence. It’s probably easier to illustrate by example: suppose I have a sequence of strings and I want to merge them together when the trailing string starts with whitespace. I can squeeze them like this:

(squeeze #(and %2 (re-matches #"\A\s.*" %2))
         #(apply str (apply concat %&))
         ["hello" " world." "foo" " bar"])

Another example: given a sequence of characters (read from an InputStream, for example), I could group them into words by squeezing them thusly. The first line is just to remind you that calling seq on a string produces a sequence of characters, and str/trim assumes a string namespace such as clojure.string has been required with the alias str:

user=> (seq "Cheers, chars!")
(\C \h \e \e \r \s \, \space \c \h \a \r \s \!)

user=> (map str/trim
         (squeeze #(and %2 (not= \space %2))
                  #(apply str (apply concat %&))
                   (seq "I'm sorry Dave, I can't let you do that.")))
("I'm" "sorry" "Dave," "I" "can't" "let" "you" "do" "that.")

So how does it work? Well, here’s the interface:

(defn squeeze
  [pred merge-fn coll]
  (squeeze- pred merge-fn coll nil))

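And here’s the actual function that does the work. It’s declared private because I don’t want to expose the matched parameter to the outside world. In a source file it needs to come before squeeze (or be declared ahead of it), since Clojure resolves names at compile time.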

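(defn- squeeze-
  [pred merge-fn coll matched]
  (lazy-seq
    (when-let [s (seq coll)]
      (let [item (first s)
            following (second s)   ; nil when item is the last element
            more (rest s)]
        (if (pred item following)
          ;; item starts or extends a run; matched holds the run in reverse
          (squeeze- pred merge-fn more (cons item matched))
          ;; the run (if any) ends at item: restore order, then merge
          (let [merged (if matched
                         (merge-fn (reverse (cons item matched)))
                         item)]
            (cons merged (squeeze- pred merge-fn more nil))))))))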
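If you’d rather not have a separate private helper, the same idea can be sketched as a single definition that hides the accumulator inside a letfn. This is a hypothetical variant (squeeze* is my name for it, not part of the code linked from this post); it collects each run in a vector so the items stay in order, and like the original it passes the run to merge-fn as one sequence.

```clojure
(require '[clojure.string :as str])

(defn squeeze*
  "Hypothetical single-definition variant of squeeze: merges runs of
  adjacent items for which (pred item next-item) is truthy."
  [pred merge-fn coll]
  (letfn [(step [coll matched]
            (lazy-seq
              (when-let [s (seq coll)]
                (let [x    (first s)
                      more (rest s)]
                  (if (pred x (first more))
                    ;; x joins the current run; keep accumulating
                    (step more (conj matched x))
                    ;; run ends at x: merge it if anything was accumulated
                    (cons (if (seq matched)
                            (merge-fn (conj matched x))
                            x)
                          (step more [])))))))]
    (step coll [])))

(squeeze* #(and %2 (re-matches #"\A\s.*" %2))
          #(apply str %)
          ["hello" " world." "foo" " bar"])
;; => ("hello world." "foo bar")

(map str/trim
     (squeeze* #(and %2 (not= \space %2))
               #(apply str %)
               "I'm sorry Dave, I can't let you do that."))
;; => ("I'm" "sorry" "Dave," "I" "can't" "let" "you" "do" "that.")
```

Note that an item which isn’t part of any run is returned untouched rather than being passed through merge-fn.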

I should probably point out that all of this playing around with sequences was inspired by Sean Devlin’s excellent proposal for some new sequence functions for Clojure 1.2. The full code for this is available here (it’s just the above, but with an added doc comment on the squeeze function definition).

Working With Java Arrays

One improvement that I’d like to see in Clojure is more examples in the doc strings (or maybe in a separate :example metadata item). Still, nothing to stop me building up a set of my own. So, here are some simple examples of working with Java arrays in Clojure… Given some sample data:

(def my-list '(1 2 3 4 5))
(def my-vector [1 2 3 4 5])
(def my-map {:a "apple" :b "banana" :c "chopped liver"})

To convert to Java arrays:

(to-array my-list)
#<Object[] [Ljava.lang.Object;@…>
(to-array my-vector)
#<Object[] [Ljava.lang.Object;@…>
(to-array my-map)
#<Object[] [Ljava.lang.Object;@…>

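Note that this always returns Object[] regardless of the contents of the collection. Note also that the map isn’t flattened: each key/value pair becomes a single array element. (The pp function used here pretty-prints the last REPL result; it lives in clojure.contrib.pprint, or clojure.pprint from Clojure 1.2 onwards.)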

user=> (pp)
[[:a "apple"], [:b "banana"], [:c "chopped liver"]]
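Since the REPL printout of an array isn’t very informative, here are a few ways to poke at one. This is just a sketch using core functions; the names arr and entries are mine.

```clojure
(def arr (to-array [1 2 3 4 5]))

(.getSimpleName (class arr))   ;; => "Object[]"
(alength arr)                  ;; => 5
(aget arr 0)                   ;; => 1
(vec arr)                      ;; => [1 2 3 4 5]

;; Each map entry becomes one array element (a key/value pair).
(def entries (to-array {:a "apple"}))
(alength entries)              ;; => 1
(key (aget entries 0))         ;; => :a
(val (aget entries 0))         ;; => "apple"
```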

If the array is 2-dimensional there is a corresponding function:

user=> (def my-vec-2d [[1 2 3] [4 5 6] [7 8 9]])
#'user/my-vec-2d
user=> (to-array-2d my-vec-2d)
#<Object[][] [[Ljava.lang.Object;@…>
user=> (pp)
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
nil
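Reading values back out of the 2-dimensional array works with aget, which accepts one index per dimension. A small sketch (grid is just a name I picked):

```clojure
(def grid (to-array-2d [[1 2 3] [4 5 6] [7 8 9]]))

(aget grid 1 2)            ;; => 6   (row 1, column 2)
(alength (aget grid 0))    ;; => 3   (length of the first row)
(mapv vec grid)            ;; => [[1 2 3] [4 5 6] [7 8 9]]
```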

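If you need to use a specific type of array (e.g. to pass a String[] into a Java method) or need to use an array with more than 2 dimensions it’s a little trickier. The into-array function takes the array’s component type from the first element of the collection, unless you pass a type explicitly: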

user=> (into-array my-list)
#
user=> (pp)
[1, 2, 3, 4, 5]
nil
user=> (into-array my-vector)
#
user=> (pp)
[1, 2, 3, 4, 5]
nil
user=> (into-array my-map)
#
user=> (into-array (vals my-map))
#<String[] [Ljava.lang.String;@…>
user=> (pp)
["apple", "banana", "chopped liver"]
nil
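For the trickier cases, here’s a sketch of the options I’d reach for: into-array with an explicit type, int-array for primitives, and make-array for more dimensions. The names (names, mixed, primes, cube) are mine, and the Paths/get call assumes Java 7 or later; it’s only there as an example of a method that wants a String[].

```clojure
;; Explicit component type: a String[] regardless of what into-array infers.
(def names (into-array String ["apple" "banana"]))
(.getSimpleName (class names))     ;; => "String[]"

;; Passing it to a Java varargs method; Paths/get takes (String, String...).
(str (.getFileName (java.nio.file.Paths/get "fruit" names)))
;; => "banana"

;; Mixed element types need an explicit common type, otherwise into-array
;; throws because the elements don't match the type of the first one.
(def mixed (into-array Object [1 "a"]))
(alength mixed)                    ;; => 2

;; Primitive arrays have dedicated constructors.
(def primes (int-array [2 3 5]))
(aget primes 2)                    ;; => 5

;; Three dimensions: make-array takes one length per dimension.
(def cube (make-array String 2 3 4))
(aset cube 1 2 3 "x")
(aget cube 1 2 3)                  ;; => "x"
(alength (aget cube 0 0))          ;; => 4
```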

There, that should serve as a handy reference for myself for when I’m feeling forgetful…