Lachlan's Rambling

Monday, June 01, 2009

Project Euler Problem 8

I guess I should warn that these Project Euler posts have spoilers, in case you want to try the problems yourself ☺. My Scala solution to problem 8:

val input = "73167176531330624919225119674426574742355349194934" +
"96983520312774506326239578318016984801869478851843" +
"85861560789112949495459501737958331952853208805511" +
"12540698747158523863050715693290963295227443043557" +
"66896648950445244523161731856403098711121722383113" +
"62229893423380308135336276614282806444486645238749" +
"30358907296290491560440772390713810515859307960866" +
"70172427121883998797908792274921901699720888093776" +
"65727333001053367881220235421809751254540594752243" +
"52584907711670556013604839586446706324415722155397" +
"53697817977846174064955149290862569321978468622482" +
"83972241375657056057490261407972968652414535100474" +
"82166370484403199890008895243450658541227588666881" +
"16427171479924442928230863465674813919123162824586" +
"17866458359124566529476545682848912883142607690042" +
"24219022671055626321111109370544217506941658960408" +
"07198403850962455444362981230987879927244284909188" +
"84580156166097919133875499200524063689912560717606" +
"05886116467109405077541002256983155200055935729725" +
"71636269561882670428252483600823257530420752963450"

val digits = input.map(_.asDigit).toArray

def multiply(index: Int) = digits.slice(index, index + 5)
    .foldLeft(1)(_*_)

val multiples: Stream[Int] = {
    def rec(n: Int): Stream[Int] = Stream.cons(multiply(n),
        if (n > digits.length - 5) Stream.empty else rec(n+1))
    Stream.cons(mult(0), rec(1))
}
println(Iterable.max(multiples))

For convenience, this specifies the input as a string, then uses RichChar.asDigit to create a corresponding array of integers. The multiply function uses left fold to multiply together sequences of five digits. The multiples value is a stream (lazy list) of the multiples of all groups of five consecutive digits from the input. And finally, the Iterable object provides a convenient method to find the maximum value in any Iterable containing Orderable things.

Project Euler in Scala

Project Euler is a bunch of mathematical/programming problems. I've been trying to improve my Scala skills by using it on some of the Euler problems. I'm trying to use functional programming styles as much as possible. My solutions definitely haven't been very optimized, but I am learning, which is the aim.

Euler Problem 1

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23. Find the sum of all the multiples of 3 or 5 below 1000.

My solution:

def sum(nums: Iterable[Int]): Int = (0 /: nums)(_ + _)

println(sum((3 until 1000).filter(i => i%3 == 0 || i%5 == 0)))

The 3 until 1000 creates a "range", which is a lazy sequence of numbers from 3 to 999 inclusive. Because it's lazy, you don't get 997 integers sitting around in memory, you only get those integers which pass the filter. The filter itself shows the Scala syntax for a "lambda expression" (anonymous function). The full syntax is (arg1, arg2, ..) => function body, but when you only have one argument (a single Int, in this case) you can leave out the brackets.

The sum function is the operator syntax for a left fold over nums. The _+_ is an even more terse form of a lambda expression that you can use for really simple expressions. The first underscore is replaced by the first argument to the function, and the second is replaced by the second argument.

Tuesday, October 17, 2006

Logging vs String Construction

A classic logging problem in Java is this:

logger.finest(expensiveStringConstruction());

The problem with this is that even when the logger's level is above FINEST and the message is not logged, the message string must still be constructed. In cases where constructing the string involves a lot of work, your code can be slowed down significantly even when logging is off. The standard solution is like so:

if (logger.isLoggable(Level.FINEST))
{
  logger.finest(expensiveStringConstruction());
}

By adding the if statement around the logging, the expensive string construction is only done if the message is actually going to be used. The problem with this is that if you have a lot of logging statements, your code soon ends up littered with these ifs that just obscure what is really happening. Not to mention that you have to type it all in.

This problem could have been tackled in a different way. It was a mistake for the logging methods to take String objects as parameters. This mandates that the messages must be constructed before passing them to the logger. A better approach is to delay creating the message string until the last possible moment, when we know it will be needed. One option would be to have an interface called, say, LogMessageFactory with a method String createLogMessage() and have the log methods accept the interface instead of String. But a new interface isn't necessarily needed, as good old Object already has a toString method. If loggers took Object's instead of String's, then they could just call toString when logging actually takes place.

Well, the logging API actually does let you do this. LogRecord objects can store a message string and an arbitrary set of parameter objects. The standard formatters will use MessageFormat to construct the string that is finally logged. Here's an example:

public final class TestLog
{

    public static final Logger logger =
        Logger.getLogger("test");

    public static void main(final String[] args)
    {
        final TestLog testLog = new TestLog();
        logger.finest(testLog.toString());     // 1
        logger.log(Level.FINEST, "{0}", testLog);   // 2
    }

    public String toString()
    {
        try
        {
            // expensive string construction here
            Thread.sleep(5000L);
        }
        catch (InterruptedException e)
        {
            assert false;
        }
        return "Hello, world";
    }

}

If the logger is configured to log at FINEST or below, the lines marked 1 and 2 behave identically and the message is logged. However, if the logger's level is above FINEST, then line 1 will take 5 seconds to do nothing and line 2 will just check the level and exit, without calling toString at all.

The advantage of logging this way is that you don't have to put all those ifs statements in. But there are disadvantages. You can't use the convenience logging methods like severe and info, etc. You have to use one of the log methods that accept Object parameters. Unfortunate, but not too egregious.

The bigger disadvantage is that the string generation has to be inside a toString method. While this can be done inside an anonymous inner class, the syntax for that isn't exactly convenient. If you have a bunch of log messages that are generated completely differently, then using anonymous inner classes everywhere will be far worse than just using the if statements. Where it would make sense is if you have a bunch of log statements where the expensive portion of the message creation is the same.

For example, you could replace this:

if (logger.isLoggable(Level.FINEST))
{
   logger.finest("Stuff happened " + expensiveStringCreation());
}
if (logger.isLoggable(Level.FINEST))
{
   logger.finest("Foo happened " + expensiveStringCreation());
}
if (logger.isLoggable(Level.FINEST))
{
   logger.finest("Bar happened " + expensiveStringCreation());
}

with:

final Object msgFactory = new Object()
{
    public String toString()
    {
        return expensiveStringCreation();
    }
};
logger.log(Level.FINEST, "Stuff happened {0}", msgFactory);
logger.log(Level.FINEST, "Foo happened {0}", msgFactory);
logger.log(Level.FINEST, "Bar happened {0}", msgFactory);

I'm assuming here that there's code in between these log calls that changes the state and hence the string returned by expensiveStringCreation.

There's a lot of talk at the moment about adding closures to Java. Whatever ends up being adopted, this is definitely one of the scenarios it should assist with (although what I'm describing only requires the most basic capabilities of closures). I'm a fan of the CICE proposal, although some of the objections raised seem valid. The CICE proposal requires an interface with a single method, so if we did have the LogMessageFactory interface I proposed above, we could just do:

logger.finest(LogMessageFactory()
    { return "Stuff happened " + expensiveStringCreation(); });

Monday, September 25, 2006

XML vs Streams

Ever get the feeling that XML sometimes makes things harder than they need to be? One of the problems that I and others have run into is that when parsing XML using SAX (or, in my case, using JAXB to unmarshall XML, which uses SAX under the hood) from an InputStream, the stream is always closed after the parsing is complete. This is surprising behaviour in Java, because you're not supposed to close streams you don't own. This can be a problem when using network sockets or when you want to write two or more unrelated XML documents to the same stream.

The root cause of this isn't anything in Java or SAX, it's the XML specification itself. There's simply no way to tell when an XML document ends. Every well-formed XML document will have an end tag, so you'd think the parser could simply stop when it reached the end tag. However, the XML specification allows for an arbitrary amount of comments and processing instructions to follow the main element. So when reading from a stream, the parser has no choice but to keep reading until it gets to the end of the stream. After reading to the end of the stream, SAX calls close on it. I believe the reason for this is that the stream has been completely read anyway, so there's no point keeping it open.

I've had a couple of cases where I've wanted to write two or more independent XML documents to the same stream. In one case, I had a thread producing XML documents and writing them to a PipedOutputStream. Another thread would read from the corresponding PipedInputStream. The second case was a save file for an application that happened to include two unrelated XML documents. Because of the "uncertain ending" feature of the XML spec, neither of these things are possible.

Not unless you use customised input and output streams, that is. After some searching, I found some classes from Xerces called WrappedOutputStream and WrappedInputStream that allowed me to do what I wanted. They actually solve the more general problem of reading data of unknown length without reading too much. I ended up writing my own implementations, but I used the same "protocol" as these classes. I can now read and write XML over streams without having to use a new stream for each document.

Sunday, March 26, 2006

Java's clone mechanism is teh suq

Doing cloning properly in Java is way harder than it should be. I thought I knew all the traps, but recently found yet another. In a nutshell, making inner classes cloneable is almost always a bad idea. The reason is that inner classes have an internal reference to an instance of their enclosing class, which is implicitly final and can't be changed as part of a clone. For example:

class Outer implements Cloneable
{
    int i = 0;
    Inner inner = new Inner();

    public Object clone()
    {
        try
        {
            Outer clone = (Outer)super.clone();
            clone.inner = (Inner)inner.clone(); // A.
                 Cloned inner still refers to this Outer instance
            return clone;
        }
        catch (CloneNotSupportedException e)
        {
        }
    }

    class Inner implements Cloneable
    {
        int j = 0;
        
        public Object clone()
        {
            try
            {
                return super.clone();  // B. WARNING!
                      - this does the Wrong Thing
            }
            catch (CloneNotSupportedException e)
            {
            }
        }

        void inc()
        {
            i++;
            j++;
        }
    }
}

This code is attempting to do a deep clone of an Outer instance, which requires a deep clone of the Inner it references. The problem occurs at B, because the object returned by super.clone refers to the same Outer instance as the original. So if you do this:

Outer outer1 = new Outer();
Outer outer2 = (Outer)outer1.clone();
outer2.inner.inc();  // will change i in outer1 instead of outer2
System.out.println(outer1.i);
System.out.println(outer2.i);

You get:

1
0

Because the cloned outer2.inner has an internal reference to the original outer1 instance. There may be cases where you want it to work that way, but it seems very unlikely. So what to do? Calling super.clone() from an inner class is basically broken, and there's no way to fix it. If you're able to make the inner class final, this can be avoided relatively easily by using a copy constructor to clone the inner class instead of Object.clone:

class Outer implements Cloneable
{
    int i = 0;
    Inner inner = new Inner();

    public Object clone()
    {
        try
        {
            Outer clone = (Outer)super.clone();
            clone.inner = clone.new Inner(inner);  // C.
                  Use "qualified" copy constructor
            return clone;
        }
        catch (CloneNotSupportedException e)
        {
        }
    }

    final class Inner  // no longer Cloneable
    {
        int j = 0;
        
        Inner(Inner inner)
        {
            j = inner.j;
        }

        void inc()
        {
            i++;
            j++;
        }
    }
}

This gives the desired output of:

0
1

Check out the change at line C, it's actually valid syntax. It's called a "qualified" new, where you explicitly specify the enclosing instance of the new inner class instance.

Using copy constructors for cloning is only OK for final classes, because they aren't polymorphic. If we had sublcasses of Inner involved here, then it gets harder - Object.clone is the only easy way to get the right class instantiated. Idea will actually give a warning on line C for this reason.

Another, possibly better, solution is to make Inner a standard (non-inner) class with an explicit, non-final reference to its Outer instance. Basically, do the inner class stuff by hand. Because the reference to the outer class isn't final, it can be fixed up before returning from clone(). Of course, this lacks to convenience that was the motivation for inner classes in the first place.

More info on Java's cloning woes: http://www.adtmag.com/java/articleold.asp?id=364.

Friday, December 23, 2005

Handling InvocationTargetException

A quick hint on handling java.lang.reflect.InvocationTargetException.

This sucker is thrown whenever you invoke a method via reflection (typically via Method.invoke), and that method throws any sort of exception. So it doesn't indicate a problem with the reflection mechanism, it is simply how the reflection mechanism notifies you of exceptions thrown by the method you called.

Whenever you catch this exception (and you will have to catch it at some level), the first thing you should do is call getCause(). This returns the exception thrown from the method that was invoked via reflection, and that's what you care about. So whenever you need to log or report an occurrence of InvocationTargetException, always use the cause object rather than the InvocationTargetException itself.

Thursday, December 22, 2005

How not to catch Exceptions

When it comes to bad Java practices, top of my list would have to be catching exceptions like this:

try
{
   // stuff - anything really
}
catch (Exception e)
{
   // maybe log it if you're lucky
}

This is usually done because the "stuff" throws more than one kind of checked exception, and the code to handle each case is the same. So what's wrong with it? The biggest problem is that this not only catches the checked exceptions you're interested in, but also all RuntimeExceptions. The vast majority of RuntimeExceptions are only thrown from buggy code, and catching them is rarely a good idea. The above catch handles all kinds of exceptions the same way, regardless of whether it was caused by a bug or an ordinary failure that can happen from time to time when a program is running (e.g. a network failure). This only obscures bugs.

Consider the case where a bug causes a NullPointerException to be thrown - perhaps the most common type of bug in Java. If the catch clause simply logs the exception message (not uncommon), then you have to check the log to have any idea why your app is misbehaving. And if you don't log the stack trace (again, not uncommon), you still won't have a clue what caused it. A much better option is to simply let RuntimeExceptions go, which produces a screeching halt in your app and, in most cases, an immediate stack trace dump telling you exactly what happened. The details of what happens when you don't catch a RuntimeException at all will vary depending on the environment you're running in, but it's always safe to do so. For example, servlet containers usually output the stack trace in the HTTP response when a servlet throws a RuntimeException.

A good principle of catching exceptions is to only catch the those specific exceptions that you can actually deal with. E.g.

try
{
   // stuff that throws AException
   // stuff that throws BException
}
catch (AException e)
{
    // deal
}
catch (BException e)
{
    // deal
}

This way just deals with the exceptions that you can sensibly recover from and lets any RuntimeExceptions that indicate bugs to be thrown to the caller. If the code to handle both AException and BException is the same, you could always move it to a method.

"But what if the code throws, like, 10 different kinds of checked exceptions?". I've found it's actually pretty rare to have a block of code that throws more than about 4 unrelated checked exceptions. Some of the reflection APIs are pretty bad with this, however. It can be a pain to enter basically identical catch clauses for every one. A good alternative is:

try
{
   // stuff that throws AException
   // stuff that throws BException
   // stuff that throws CException
   // stuff that throws DException
}
catch (RuntimeException e)
{
   throw e;
}
catch (Exception e)
{
   // deal with all checked exceptions in the same way
}

This is actually not a bad way to get the best of both worlds (although I usually find it clearer to just handle each checked exception individually).

I think in hindsight, the inheritance hierarchy of exceptions is backwards. Instead of the RuntimeException extends Exception extends Throwable hierarchy, it should have been CheckedException extends UncheckedException extends Throwable. That would allow catching all checked exceptions in a single catch clause, while not obscuring unchecked exceptions thrown by buggy code.

In a similar vein, you almost never want to declare a method with throws Exception. This is basically throwing away the whole checked exception system and saying "this method can throw any kind of exception". This tends to make exception handling in the caller painful, and may give the caller no good option but to declare throws Exception itself. Before long, you end up with all your methods declared as throwing Exception and you have little indication of what exceptions you might actually have to deal with at runtime. (There is an argument to made that checked exceptions are a bad idea altogether, but that subject will have to wait for another day.)

There are a few cases where you might have a legitimate reason to catch Exception, or Error, or Throwable. Here are some of them:

When implementing Runnable. Reporting RuntimeExceptions thrown from Runnable.run is a bit of a pain in Java versions 1.4 and earlier. It's often easier just to catch them yourself and log the entire stack trace somewhere it can't be missed. Java 5 has a nicer mechanism for doing this, however.
When calling some sort of plug-in module. In this case, a bug in the plug-in doesn't mean the whole application needs to abort. You can catch the exception, notify the user, unload the plug-in and move on.
When dealing with poorly designed APIs that throw Exception.