Monday, July 06, 2009

The FSF doesn't care if they're lonely

Daniel Jalkut, a leading light of the Mac developer community recently blogged about the GPL and some of consequences of it for collaborative software development. I agree with what he says, and I dislike how the GPL limits collaboration in a number of ways. But I thought it might be useful to go over the background and objectives of the GPL.

There's an important thing to remember when talking about the GPL: the impact, good or bad, on the practicalities of software development have nothing to do with its objectives. The authors of the GPL, the Free Software Foundation just aren't concerned by the kinds of issues that Daniel raises. The fundamental principle of the FSF is that anyone should have complete freedom to study, modify and share all computer software. The end-game of the FSF is that all restrictions that apply to software, such as copyrights and patents, be abolished altogether (in so far as they currently apply to software). In other words, they want any software written by anyone to have no legal protection from examination, modification and redistribution by anyone else. The FSF really wants all software to be in the public domain, without the option of copyright protection.
The GPL is their attempt to bring this about given the reality that copyright of software exists today and shows no signs of going away. The GPL is designed to use the copyright system against itself; the FSF calls this "copyleft". In typical commercial software, the copyright is used to prevent the licensee (user) of the software from doing things with it that the author doesn't like (such as modification or redistribution). The GPL instead prevents the licensee from adding any new restrictions. The end result of putting software under the GPL is basically the same as if copyright protection for that software didn't exist at all.
Now, you can make many arguments pointing out the practical downsides of removing copyright restrictions on software (whether by abolishing software copyright, or via the GPL). Daniel's post is a good example of this. But to the FSF, these considerations are secondary to the principle that any restrictions to what users can do with their software is morally Wrong. You can see an example of this in the GNU Manifesto where they acknowledge that adopting their policies could result in programmers earning less. Their view seems to be that doing the "right thing" can have effects that aren't necessarily desirable, but we just have to live with as best we can. Perhaps a good analogy is free speech, which necessarily increases the risk of children accessing pornographic or violent material.
I try to take a pragmatic "what best serves the common good?" approach to matters, which leads me to agree with the FSF with regard to software patents, but disagree with them on copyright. I think patents, when applied to software, are all harm and no benefits. For copyright, I can see greater upside than downside to the current system. It's true that without software copyrights we would all have greater freedom regarding how we can use our computers, but it's also likely that our computers would be of much less practical use to us. Do professional photographers and artists prefer GIMP or Photoshop?
Just to be clear, arguing over the implications of the GPL on software development practice is a very useful thing to do. Many people who don't care about the political position of the FSF still chose to use the GPL simply because it happens to suit their needs, and there's nothing wrong with that. But having a clear understanding of the costs and benefits is essential.

Monday, June 01, 2009

Project Euler Problem 8

I guess I should warn that these Project Euler posts have spoilers, in case you want to try the problems yourself ☺. My Scala solution to problem 8:

val input = "73167176531330624919225119674426574742355349194934" +
"96983520312774506326239578318016984801869478851843" +
"85861560789112949495459501737958331952853208805511" +
"12540698747158523863050715693290963295227443043557" +
"66896648950445244523161731856403098711121722383113" +
"62229893423380308135336276614282806444486645238749" +
"30358907296290491560440772390713810515859307960866" +
"70172427121883998797908792274921901699720888093776" +
"65727333001053367881220235421809751254540594752243" +
"52584907711670556013604839586446706324415722155397" +
"53697817977846174064955149290862569321978468622482" +
"83972241375657056057490261407972968652414535100474" +
"82166370484403199890008895243450658541227588666881" +
"16427171479924442928230863465674813919123162824586" +
"17866458359124566529476545682848912883142607690042" +
"24219022671055626321111109370544217506941658960408" +
"07198403850962455444362981230987879927244284909188" +
"84580156166097919133875499200524063689912560717606" +
"05886116467109405077541002256983155200055935729725" +
"71636269561882670428252483600823257530420752963450"

val digits = input.map(_.asDigit).toArray

def multiply(index: Int) = digits.slice(index, index + 5)
    .foldLeft(1)(_*_)

val multiples: Stream[Int] = {
    def rec(n: Int): Stream[Int] = Stream.cons(multiply(n),
        if (n > digits.length - 5) Stream.empty else rec(n+1))
    Stream.cons(mult(0), rec(1))
}
println(Iterable.max(multiples))

For convenience, this specifies the input as a string, then uses RichChar.asDigit to create a corresponding array of integers. The multiply function uses left fold to multiply together sequences of five digits. The multiples value is a stream (lazy list) of the multiples of all groups of five consecutive digits from the input. And finally, the Iterable object provides a convenient method to find the maximum value in any Iterable containing Orderable things.

Project Euler in Scala

Project Euler is a bunch of mathematical/programming problems. I've been trying to improve my Scala skills by using it on some of the Euler problems. I'm trying to use functional programming styles as much as possible. My solutions definitely haven't been very optimized, but I am learning, which is the aim.

Euler Problem 1

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23. Find the sum of all the multiples of 3 or 5 below 1000.

My solution:

def sum(nums: Iterable[Int]): Int = (0 /: nums)(_ + _)

println(sum((3 until 1000).filter(i => i%3 == 0 || i%5 == 0)))

The 3 until 1000 creates a "range", which is a lazy sequence of numbers from 3 to 999 inclusive. Because it's lazy, you don't get 997 integers sitting around in memory, you only get those integers which pass the filter. The filter itself shows the Scala syntax for a "lambda expression" (anonymous function). The full syntax is (arg1, arg2, ..) => function body, but when you only have one argument (a single Int, in this case) you can leave out the brackets.

The sum function is the operator syntax for a left fold over nums. The _+_ is an even more terse form of a lambda expression that you can use for really simple expressions. The first underscore is replaced by the first argument to the function, and the second is replaced by the second argument.

Tuesday, October 17, 2006

Logging vs String Construction

A classic logging problem in Java is this:

logger.finest(expensiveStringConstruction());

The problem with this is that even when the logger's level is above FINEST and the message is not logged, the message string must still be constructed. In cases where constructing the string involves a lot of work, your code can be slowed down significantly even when logging is off. The standard solution is like so:

if (logger.isLoggable(Level.FINEST))
{
  logger.finest(expensiveStringConstruction());
}

By adding the if statement around the logging, the expensive string construction is only done if the message is actually going to be used. The problem with this is that if you have a lot of logging statements, your code soon ends up littered with these ifs that just obscure what is really happening. Not to mention that you have to type it all in.

This problem could have been tackled in a different way. It was a mistake for the logging methods to take String objects as parameters. This mandates that the messages must be constructed before passing them to the logger. A better approach is to delay creating the message string until the last possible moment, when we know it will be needed. One option would be to have an interface called, say, LogMessageFactory with a method String createLogMessage() and have the log methods accept the interface instead of String. But a new interface isn't necessarily needed, as good old Object already has a toString method. If loggers took Object's instead of String's, then they could just call toString when logging actually takes place.

Well, the logging API actually does let you do this. LogRecord objects can store a message string and an arbitrary set of parameter objects. The standard formatters will use MessageFormat to construct the string that is finally logged. Here's an example:

public final class TestLog
{

    public static final Logger logger =
        Logger.getLogger("test");

    public static void main(final String[] args)
    {
        final TestLog testLog = new TestLog();
        logger.finest(testLog.toString());     // 1
        logger.log(Level.FINEST, "{0}", testLog);   // 2
    }

    public String toString()
    {
        try
        {
            // expensive string construction here
            Thread.sleep(5000L);
        }
        catch (InterruptedException e)
        {
            assert false;
        }
        return "Hello, world";
    }

}

If the logger is configured to log at FINEST or below, the lines marked 1 and 2 behave identically and the message is logged. However, if the logger's level is above FINEST, then line 1 will take 5 seconds to do nothing and line 2 will just check the level and exit, without calling toString at all.

The advantage of logging this way is that you don't have to put all those ifs statements in. But there are disadvantages. You can't use the convenience logging methods like severe and info, etc. You have to use one of the log methods that accept Object parameters. Unfortunate, but not too egregious.

The bigger disadvantage is that the string generation has to be inside a toString method. While this can be done inside an anonymous inner class, the syntax for that isn't exactly convenient. If you have a bunch of log messages that are generated completely differently, then using anonymous inner classes everywhere will be far worse than just using the if statements. Where it would make sense is if you have a bunch of log statements where the expensive portion of the message creation is the same.

For example, you could replace this:

if (logger.isLoggable(Level.FINEST))
{
   logger.finest("Stuff happened " + expensiveStringCreation());
}
if (logger.isLoggable(Level.FINEST))
{
   logger.finest("Foo happened " + expensiveStringCreation());
}
if (logger.isLoggable(Level.FINEST))
{
   logger.finest("Bar happened " + expensiveStringCreation());
}

with:

final Object msgFactory = new Object()
{
    public String toString()
    {
        return expensiveStringCreation();
    }
};
logger.log(Level.FINEST, "Stuff happened {0}", msgFactory);
logger.log(Level.FINEST, "Foo happened {0}", msgFactory);
logger.log(Level.FINEST, "Bar happened {0}", msgFactory);

I'm assuming here that there's code in between these log calls that changes the state and hence the string returned by expensiveStringCreation.

There's a lot of talk at the moment about adding closures to Java. Whatever ends up being adopted, this is definitely one of the scenarios it should assist with (although what I'm describing only requires the most basic capabilities of closures). I'm a fan of the CICE proposal, although some of the objections raised seem valid. The CICE proposal requires an interface with a single method, so if we did have the LogMessageFactory interface I proposed above, we could just do:

logger.finest(LogMessageFactory()
    { return "Stuff happened " + expensiveStringCreation(); });

Monday, September 25, 2006

XML vs Streams

Ever get the feeling that XML sometimes makes things harder than they need to be? One of the problems that I and others have run into is that when parsing XML using SAX (or, in my case, using JAXB to unmarshall XML, which uses SAX under the hood) from an InputStream, the stream is always closed after the parsing is complete. This is surprising behaviour in Java, because you're not supposed to close streams you don't own. This can be a problem when using network sockets or when you want to write two or more unrelated XML documents to the same stream.

The root cause of this isn't anything in Java or SAX, it's the XML specification itself. There's simply no way to tell when an XML document ends. Every well-formed XML document will have an end tag, so you'd think the parser could simply stop when it reached the end tag. However, the XML specification allows for an arbitrary amount of comments and processing instructions to follow the main element. So when reading from a stream, the parser has no choice but to keep reading until it gets to the end of the stream. After reading to the end of the stream, SAX calls close on it. I believe the reason for this is that the stream has been completely read anyway, so there's no point keeping it open.

I've had a couple of cases where I've wanted to write two or more independent XML documents to the same stream. In one case, I had a thread producing XML documents and writing them to a PipedOutputStream. Another thread would read from the corresponding PipedInputStream. The second case was a save file for an application that happened to include two unrelated XML documents. Because of the "uncertain ending" feature of the XML spec, neither of these things are possible.

Not unless you use customised input and output streams, that is. After some searching, I found some classes from Xerces called WrappedOutputStream and WrappedInputStream that allowed me to do what I wanted. They actually solve the more general problem of reading data of unknown length without reading too much. I ended up writing my own implementations, but I used the same "protocol" as these classes. I can now read and write XML over streams without having to use a new stream for each document.

Sunday, March 26, 2006

Java's clone mechanism is teh suq

Doing cloning properly in Java is way harder than it should be. I thought I knew all the traps, but recently found yet another. In a nutshell, making inner classes cloneable is almost always a bad idea. The reason is that inner classes have an internal reference to an instance of their enclosing class, which is implicitly final and can't be changed as part of a clone. For example:

class Outer implements Cloneable
{
    int i = 0;
    Inner inner = new Inner();

    public Object clone()
    {
        try
        {
            Outer clone = (Outer)super.clone();
            clone.inner = (Inner)inner.clone(); // A.
                 Cloned inner still refers to this Outer instance
            return clone;
        }
        catch (CloneNotSupportedException e)
        {
        }
    }

    class Inner implements Cloneable
    {
        int j = 0;
        
        public Object clone()
        {
            try
            {
                return super.clone();  // B. WARNING!
                      - this does the Wrong Thing
            }
            catch (CloneNotSupportedException e)
            {
            }
        }

        void inc()
        {
            i++;
            j++;
        }
    }
}

This code is attempting to do a deep clone of an Outer instance, which requires a deep clone of the Inner it references. The problem occurs at B, because the object returned by super.clone refers to the same Outer instance as the original. So if you do this:

Outer outer1 = new Outer();
Outer outer2 = (Outer)outer1.clone();
outer2.inner.inc();  // will change i in outer1 instead of outer2
System.out.println(outer1.i);
System.out.println(outer2.i);

You get:

1
0

Because the cloned outer2.inner has an internal reference to the original outer1 instance. There may be cases where you want it to work that way, but it seems very unlikely. So what to do? Calling super.clone() from an inner class is basically broken, and there's no way to fix it. If you're able to make the inner class final, this can be avoided relatively easily by using a copy constructor to clone the inner class instead of Object.clone:

class Outer implements Cloneable
{
    int i = 0;
    Inner inner = new Inner();

    public Object clone()
    {
        try
        {
            Outer clone = (Outer)super.clone();
            clone.inner = clone.new Inner(inner);  // C.
                  Use "qualified" copy constructor
            return clone;
        }
        catch (CloneNotSupportedException e)
        {
        }
    }

    final class Inner  // no longer Cloneable
    {
        int j = 0;
        
        Inner(Inner inner)
        {
            j = inner.j;
        }

        void inc()
        {
            i++;
            j++;
        }
    }
}

This gives the desired output of:

0
1

Check out the change at line C, it's actually valid syntax. It's called a "qualified" new, where you explicitly specify the enclosing instance of the new inner class instance.

Using copy constructors for cloning is only OK for final classes, because they aren't polymorphic. If we had sublcasses of Inner involved here, then it gets harder - Object.clone is the only easy way to get the right class instantiated. Idea will actually give a warning on line C for this reason.

Another, possibly better, solution is to make Inner a standard (non-inner) class with an explicit, non-final reference to its Outer instance. Basically, do the inner class stuff by hand. Because the reference to the outer class isn't final, it can be fixed up before returning from clone(). Of course, this lacks to convenience that was the motivation for inner classes in the first place.

More info on Java's cloning woes: http://www.adtmag.com/java/articleold.asp?id=364.

Friday, December 23, 2005

Handling InvocationTargetException

A quick hint on handling java.lang.reflect.InvocationTargetException.

This sucker is thrown whenever you invoke a method via reflection (typically via Method.invoke), and that method throws any sort of exception. So it doesn't indicate a problem with the reflection mechanism, it is simply how the reflection mechanism notifies you of exceptions thrown by the method you called.

Whenever you catch this exception (and you will have to catch it at some level), the first thing you should do is call getCause(). This returns the exception thrown from the method that was invoked via reflection, and that's what you care about. So whenever you need to log or report an occurrence of InvocationTargetException, always use the cause object rather than the InvocationTargetException itself.