Government data – please do it right

The Australian Government 2.0 Taskforce has an initiative to make data available for public remixing and use: after all, it’s public property anyway, right? They have even run a mashup competition.

Notably missing from the excellent collection of data that has been opened is the NSW Transport and Infrastructure dataset for public transport in NSW. There is a similar dataset for the Northern Territory in the mashup transport section.

The NT dataset is under the fantastic CC-BY licence. You could write an iPhone app with it: a journey planner you can carry with you while disconnected, a ‘find the closest bus I can walk to’ tool, or, well, let your imagination run wild.

The NSW dataset, by contrast, is under a heavily restrictive licence. It’s so restrictive that I’m not sure it’s feasible to write an open source tool using its data.

The meta-issue is that the NSW T&I department wants control over the applications built with this data. This has a tremendous chilling effect on potential uses of the data: the department has to approve every use of the data, with a long lead time, and gets to tell the ‘application developer’ what changes to make to their application.

I strongly doubt that a simple remix of the data (e.g. combining it with weather reports to prefer buses on very wet days) would be permitted, as it would let other users just read the remix and recover the original data /without entering into a licence agreement/.

I’m sure there is some unstated risk of openness, or benefit of control, that is shaping this problematic approach. Whatever the cause, it’s not open at all.

Given that the overall approach is fundamentally flawed, a blow-by-blow analysis of the custom licence isn’t particularly useful; however, I thought I would pick out some highlights to save folk the trouble ;)

  1. The dataset is behind a username/password wall [that you cannot share with others].
  2. Licensees may not be private – everyone must know you’re using the data.
  3. You must link to the website.
  4. You may not charge users for an app that has to be redeveloped if the dataset changes shape.
  5. Any application written to use the dataset must be given to the department 30 days before release to the public.
  6. The department gets to ‘suggest changes’ to any announcement related to the developer’s app, the licence agreement or the dataset.
  7. The dataset is embargoed – you cannot share it with others.
  8. The use of the dataset has to be logged and reported.
  9. There is a restraint of use in there as well, related to Inappropriate and Offensive Material. It wouldn’t affect me, but sheesh, given all the other restraints it’s hardly needed.

There are more gems in the details, but in short:

The department will control what, where, when and how: how the data is accessed, the application’s functionality and appearance, and how it is used. Hell, the 30-day requirement alone makes for slow delivery of whatever someone wants to build.

I really hope this can be improved on.


Subunit 0.0.3 should be a great little release. It’s not ready yet, but some key things have been done.

Firstly, it’s been relicensed under BSD/Apache 2.0. This makes using Subunit with other test frameworks much easier, as those frameworks tend to use less restrictive licences such as the LGPL, BSD or Apache. Thanks go out to the contributors to Subunit who made this process very painless.

Secondly, the C client code is getting a few small touch-ups, though probably not enough to reach full feature parity with the Python reporter.

Thirdly, the CPPUnit patch that Subunit has carried for ages has been turned into a small library built by Subunit, so you’ll be able to just install that into an existing CPPUnit environment without rebuilding CPPUnit.

Lastly, but most importantly, it will include what is hopefully the last major protocol change (still backwards compatible!) needed for 1.0: the ability to attach fairly arbitrary debug data to an outcome (things like ‘stdout’, ‘stderr’, ‘a log file X’ and so forth). This will be exposed through an experimental object protocol, the one I proposed on the Testing In Python list.
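To make the idea of attaching debug data to an outcome a bit more concrete, here is a minimal Python sketch of an outcome object that carries arbitrary named attachments. It is not Subunit’s wire format, nor the object protocol proposed on the list; `Attachment`, `Outcome`, `attach` and `render` are hypothetical names, and pairing each attachment with a MIME-style content type is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Attachment:
    """Hypothetical: one named piece of debug data, a content type plus raw bytes."""
    content_type: str
    data: bytes


@dataclass
class Outcome:
    """Hypothetical sketch of a test outcome carrying arbitrary named attachments."""
    test_id: str
    status: str  # e.g. "success", "failure", "error"
    details: Dict[str, Attachment] = field(default_factory=dict)

    def attach(self, name: str, content_type: str, data: bytes) -> None:
        # Names are free-form, so 'stdout', 'stderr' or 'log file X' all fit.
        self.details[name] = Attachment(content_type, data)


def render(outcome: Outcome) -> str:
    """Summarise an outcome and its attachments, sorted by attachment name."""
    lines = [f"{outcome.status}: {outcome.test_id}"]
    for name, att in sorted(outcome.details.items()):
        lines.append(f"  [{name}] ({att.content_type}, {len(att.data)} bytes)")
    return "\n".join(lines)


if __name__ == "__main__":
    result = Outcome("test_parser.TestFoo.test_bar", "failure")
    result.attach("stdout", "text/plain; charset=utf8", b"parsing...\n")
    result.attach("log file X", "text/plain; charset=utf8", b"DEBUG: token 3\n")
    print(render(result))
```

Keying attachments by a free-form name keeps the set open-ended: in this sketch a runner can add a new kind of debug data without changing the structure, and a consumer that only cares about pass/fail can ignore `details` entirely.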

I should get the protocol changes done on the flight to Montreal tomorrow, which would be a great way for me to get my mind fully focused on testing for the sprint next week.