I’ve recently had to set up and manage about a dozen Linux based virtual servers (not a huge number, but still) and it’s handly to have a separate user account on these machines that I can use to connect to for interactive use. With that in mind here are a couple of quick tips for setting things up for a more pleasant experience with minimal fuss.
I like to use zsh as my shell, but that’s easy enough to install via apt-get and so can be added to an automated build script. The fastest way to get a good configuration installed is via Oh My Zsh!, it uses the fairly horrible curl | sh installation style, so I don’t like to add it to an automated script, but it’s easy enough to check and run manually:
$ curl -L http://install.ohmyz.sh | sh
The other thing is to get a decent vim setup, for this I use SPF-13 which installs in a similar manner:
$ curl http://j.mp/spf13-vim3 -L -o - | sh
Then I usually set up a couple of aliases based on what the machine will be used for, and I’m done.
Following on from my previous post I’m attempting the next Programming Praxis excercise: computing the probability of a winning board at Bingo tournaments of various sizes.
Probably the most common means of solving this kind of probability problem is the Monte Carlo method, which uses randomization and statistical sampling to estimate probabilities. Luckiy for me, the problem space for Bingo is small enough that it is possible completely analyze the game and present exact numbers.
The individual functions are all pretty short so I’ll walk through the maths and code at the same time.
Laying the Foundations
Now, some of the numbers are going to get pretty big so primitive math won’t cut it. I’m going to define a few helper functions to make working with arbitrary-precision maths easier. I’m also gong to specify a limited precision context to use for division operations, as I’d rather lose a small amount of precision that get an ArithmeticException if we hit any irrational numbers.
We’re also going to need a combination function, which will return the number of distinct combinations of k values drawn from a set S. Formally this is known as the binomial coefficient and is described by:
The code that implements this is pretty much a literal translation of the equation, with a little data conversion and a shortcut for the case where we want to select no items or the entire set.
There are faster algorithms than this, but this is easy to understand and sufficient for our needs.
Describing Bingo Mathematically
A Bingo board is comprised of a set of squares S, each with a number n chosen at random from the set N. Typically S comprises a 5x5 grid and N is the set of integers from 1 to 75. As an added wrinkle the centre square in the grid is a free space which is always assumed to be hit, this means that a 5x5 grid is actually a 24 element set.
To keep things simple I’m going to assume that we are using a typical board and number set, so I’ll just use constants to hold the cardinalities (sizes) of the two sets.
The first thing we need to figure out is the cumulative probabity of winning after n numbers have been called (i.e. the probability of winning when the 4th number is called, when the 5th number is called, and so on). This can be found by looking at the probability of there being 4 hits and those 4 forming a Bingo, plus the probability of there being 5 hits and those 5 forming a Bingo, and so on up to the number of calls made so far. Note that we start at 4 as this is the minimum number of calls needed for a Bingo.
Assuming that we have a function p(n) which returns the probability of there being a Bingo when there are n hits, the cumulative probability of winning after n calls is given by:
But we still need to define p(n), this is where the small problem space comes in handy…
Assemble a Brute Squad
The probability of a Bingo given n hits on the board is the number of possible Bingo positions divided by the total number of positions, so before we can work out the probability we need to get the number of Bingo positions for each hit count.
As I mentioned earlier will use a brute force method of calculating the probability of a Bingo given n hits on the board. To do this we first need to come up with a representation of a board, as it only takes a single bit to represent the state of each square and there are only 24 squares, we can just use an int for this.
Here’s how we do that: number the squares from 0–23 left-to-right and top-to-bottom like so (FS represents the free space in the centre):
0
1
2
3
4
5
6
7
8
9
10
11
FS
12
13
14
15
16
17
18
19
20
21
22
23
Then starting from the least significant bit we can use a 1 to represent a hit square and a 0 to represent an empty square. So a Bingo across the top row, with all other squares being empty is represented by 0b111110000000000000000000; this and the remaining 11 possible Bingo positions (5 rows, 5 columns, and the 2 diagonals) can be represented like so:
To create the array of Bingo combinations (indexed by number of hits, so a 25 element array) we will just loop through every possible board layout (i.e. the integers 0 through 0b11111111111111111111111) and check each of them against the list of Bingo positions, if there is a match we can use the Integer.bitCount() method to count the number of hits on the board and then increment the counter there. The full code for this is:
The last thing we need before moving on to calculate the actual chances of winning is a function to determine the probability of there being k hits hits on the board after n numbers have been called (the bit in equation \eqref{cumulative} that is being multiplied by p(n) and then summed). It’s just a literal translation of the equation:
We’ve now got everything we need to write the code for equation \eqref{cumulative}, given the previous function and the pre-computed array of proabilities the code to do this is simple:
This is fine for a single board, but what about the odds for a real game, with many players? The odds of finding a winner when there are k boards is given by:
If you want to play with this some more, an interesting excercise would be to generalize this for any size of board and token (number) set. To do that you’ll need to get rid of the static BINGOS array and calculate id dynamically, based on the size of the square, other than that and a coupl of constants that need to be replaced by variables everything else should work as is.
The full code, including the implementations of the print methods can be found on GitHub.
Over the weekend I stumbled across the Programming Praxis web site, a blog that aims to “publishes new programming exercises weekly, at least, so that savvy programmers can maintain their skills by working the exercises and thinking outside their normal skill set”. It’s been running for about three years now and at the time of writing there are almost 400 problems there. I thought that I’d like to have a go at solving a few of them, I’m going to try solving them in Java initially, and then maybe revisit them in other languages to compare the solutions.
I’ve got solutions to the first couple of problems ready to go, and all of my solutions can be found in this project on GitHub.
Reverse Polish Notation Calculator
This is trivial to implement in Java using the built in Console class, the complete implementation (sans class boilerplace and imports) is:
Compared with other solutions the only interesting things are the use of BigDecimal instead of primitive types, this means that the calculator supports a wider range of numbers and input formats, and the use of a Deque as the stack, this is a more modern class than the old Java 1.0 vintage Stack class.
This classic algotrithm is a bit more interesting: my first thought was to lazily create the list of primes using modulo checks to filter out non-prime numbers. Technically this isn’t the Sieve of Eratosthenes, but it’s logically the same. Well, it performed terribly taking several seconds to compute the first million primes.
It turns out that one of the reasons the actual sieve is so fast is that it only uses addition rather than the more expensive modulo operations. This, plus the memory saving gained from using a BitSet instead of a list of Integers gave me a nice, zippy, implementation. The relevant method is:
I started to write a ‘quick’ response to this post about builders and it kind of got out of hand, so I’m putting it up here instead.
The post isn’t bad, but I think that Marcin is getting the builder and factory patterns a little mixed up. To recap:
The intent of the Builder Pattern is to separate out the construction of an object from it’s final representation. Doing this makes it easier to enforce preconditions and invariants, and also makes the object construction easier to read in languages without keyword arguments.
The intent of the Factory Pattern on the other hand, is to delegate responsibility for creating an object to somebody else. It is commonly used in dependency injection frameworks.
A concrete example should serve to illustrate the differences.
A Builder Example
Assume that we have an interface for a simple Pojo:
Without referring to the docs or source code how do you know what all of those strings mean? How do you know that you have them in the correct order? And if it seems reasonable clear in this example imaging if your Pojo was mainly non-string data!
which is much clearer. Ruby, Python, and other languages all have similar constructs. Adding a builder allows us to gain the same level of clarity in Java, and it provides a good place for us to perform any additional checks before creating the object. Here’s a typical implementation and an example of calling it:
publicclassBuilder{privatestaticfinalAtomicInteger_ids=newAtomicInteger();privatestaticStringcheckString(Stringvalue,Stringname){value=nullToEmpty(value).trim();checkArgument(!value.isEmpty(),"%s cannot be null or empty",name);returnvalue;}privateString_id;privateString_name;privateString_title;privateInteger_salary;privateDate_hiredAt;publicBuilderhiredAt(DatehiredAt){_hiredAt=hiredAt;returnthis;}publicBuilderid(Stringid){_id=id;returnthis;}publicBuildername(Stringname){_name=name;returnthis;}publicBuildersalary(intsalary){checkArgument(salary>=0,"salary cannot be negative");_salary=salary;returnthis;}publicBuildertitle(Stringtitle){_title=title;returnthis;}publicEmployeebuild(){returnnewImpl(_id!=null?_id:String.format("emp:%06d",_ids.incrementAndGet()),checkString(_name,"name"),checkString(_title,"title"),checkNotNull(_salary,"salary"),_hiredAt!=null?_hiredAt:newDate());}}// elsewhere ...publicEmployeefindEmployeeById(Stringid){if("1".equals(id)){returnnewBuilder().id(id).name("Fred Foobar").title("Engineer").salary(100000).hiredAt("2001-03-24").build();}returnnull;}
Again, all pretty clear now, and HotSpot will inline all of those method calls no there should be no additional overhead once the JVM is up and running.
A Factory Example
Factories are different, but it would be common for a factory to use a builder to create the objects that it vends. For example, here is a factory for employee objects (it doesn’t need to have the word Factory in it’s name):
We can then have different implementations of this, maybe one that loads data from a CSV or JSON file for testing purposes, and one that loads data via JDBC for production use.
Aside: if you’re familiar with domain-driven design you’ll be forgiven for noticing a lot of overlap between the factory pattern and DDD’s concept of repositories, they’re very similar concepts. One difference being that factories are often able to create new objects ex nihilo while repositories usually retreive objects from external sources. Compare the findById() method with the newInstance() methods employed by many of the factory classes in the JDK.
Hopefully you can see from this post that the two patterns have different—if complementary—aims.
A complete example project with all of this code, as well as test cases and a CSV based implementation the the factory are available on Github.