Learning is hard

I feel like I’m taking a big personal risk writing this, even though I know the internet is large and probably no-one will read this :-).

So, dear reader, please be gentle.

As we grow – as people, as developers, as professionals – some lessons are are hard to learn (e.g. you have to keep trying and trying to learn the task), and some are hard to experience (they might still be hard to learn, but just being there is hard itself…) I want to talk about a particular lesson I started learning in late 2008/early 2009 – while I was at Canonical – sadly one of those that was hard to experience.

At the time I was one of the core developers on Bazaar, and I was feeling pretty happy about our progress, how bzr was developing, features, community etc. There was a bunch of pressure on to succeed in the marketplace, but that was ok, challenges bring out the stubborn in me :). There was one glitch though – we’d been having a bunch of contentious code reviews, and my manager (Martin Pool) was chatting to me about them.

I was – as far as I could tell – doing precisely the right thing from a peer review perspective: I was safeguarding the project, preventing changes that didn’t fit properly, or that reduced key aspects- performance, usability – from landing until they were fixed.

However, the folk on the other side of the review were feeling frustrated, that nothing they could do would fix it, and generally very unhappy. Reviews and design discussions would grind to a halt, and they felt I was the cause. [They were right].

And here was the thing – I simply couldn’t understand the issue. I was doing my job; I wasn’t angry at the people submitting code; I wasn’t hostile; I wasn’t attacking them (but I was being shall we say frank about the work being submitted). I remember saying to Martin one day ‘look, I just don’t get it – can you show me what I said wrong?’ … and he couldn’t.

Canonical has a 360′ review system – every 6 months / year (it changed over time) you review your peers, subordinate(s) and manager(s), and they review you. Imagine my surprise – I was used to getting very positive reports with some constructive suggestions – when I scored low on a bunch of the inter-personal metrics in the review. Martin explained that it was the reviews thing – folk were genuinely unhappy, even as they commended me on my technical merits. Further to that, he said that I really needed to stop worrying about technical improvement and focus on this inter-personal stuff.

Two really important things happened around this time. Firstly, Steve Alexander, who was one of my managers-once-removed at the time, reached out to me and suggested I read a book – Getting out of the box – and that we might have a chat about the issue after I had read it. I did so, and we chatted. That book gave me a language and viewpoint for thinking about the problem. It didn’t solve it, but it meant that I ‘got it’, which I hadn’t before.

So then the second thing happened – we had a company all hands and I got to chat with Claire Davis (head of HR at Canonical at the time) about what was going on. To this day the sheer embarrassment I felt when she told me that the broad perception of me amongst other teams managers was – and I paraphrase a longer, more nuance conversation here – “technically fantastic but very scary to have on the team – will disrupt and cause trouble”.

So, at this point about 6 months had passed, I knew what I wanted – I wanted folk to want to work with me, to find my presence beneficial and positive on both technical and team aspects. I already knew then that what I seek is technical challenges: I crave novelty, new challenges, new problems. Once things become easy, it call all too easily slip into tedium. So at that time my reasoning was somewhat selfish: how was I to get challenges if no-one wanted to work with me except in extremis?

I spent the next year working on myself as much as specific projects: learning more and more about how to play well with others.

In June 2010 I got a performance review I could be proud of again – I was – in no way – perfect, but I’d made massive strides. This journey had also made huge improvements to my personal life – a lot of stress between Lynne and I had gone away. Shortly after that I was invited to apply for a new role within Canonical as Technical Architect for Launchpad – and Francis Lacoste told me that it was only due to my improved ability to play well with others that I was even considered. I literally could not have done the job 18 months before. I got the job, and I think I did pretty well – in fact I was awarded an internal ‘Spotlight on Success’ award for what we (it was a whole Launchpad team team effort) achieved while I was in that role.

So, what did I change/learn? There’s just a couple of key changes I needed to make in myself, but a) they aren’t sticky: if I get overly tired, ye old terrible Robert can leak out, and b) there’s actually a /lot/ of learnable skills in this area, much of which is derived – lots of practice and critical self review is a good thing. The main thing I learnt was that I was Selfish. Yes – capital S. For instance, in a discussion about adding working tree filter to bzr, I would focus on the impact/risk on me-and-things-I-directly-care-about: would it make my life harder, would it make bzr slower, was there anything that could go wrong. And I would spend only a little time thinking about what the proposer needed: they needed support and assistance making their idea reach the standards the bzr community had agreed on. The net effect of my behaviours was that I was a class A asshole when it came to getting proposals into a code base.

The key things I had to change were:

  1. I need to think about the needs of the person I’m speaking to *and not my own*. [Thats not to say you should ignore your needs, but you shouldn’t dwell on them: if they are critical, your brain will prompt you].
  2. There’s always a reason people do things: if it doesn’t make sense, ask them!  [The crucial conversations books have some useful modelling here on how and why people do things, and on how-and-why conversations and confrontations go bad and how to fix them.]

Ok so this is all interesting and so forth, but why the blog post?

Firstly, I want to thank four folk who were particularly instrumental in helping me learn this lesson: Martin, Steve, Claire and of course my wife Lynne – I owe you all an unmeasurable debt for your support and assistance.

Secondly, I realised today that while I’ve apologised one on one to particular folk who I knew I’d made life hard for, I’d never really made a widespread apology. So here it is: I spent many years as an ass, and while I didn’t mean to be one, intent doesn’t actually count here – actions do. I’m sorry for making your life hell in the past, and I hope I’m doing better now.

Lastly, if I’m an ass to you now, I’m sorry, I’m probably regressing to old habits because I’m too tired – something I try to avoid, but it’s not always possible. Please tell me, and I will go get some sleep then come and apologise to you, and try to do better in future.


key transition time

I’ve transitioned to a new key – announcement here or below. If you’ve signed my key in the past please consider signing my new key to get it integrated into the web of trust. Thanks!

Hash: SHA1,SHA256

Sun, 2013-10-13

Time for me to migrate to a new key (shockingly late - sorry!).

My old key is set to expire early next year. Please use my new key effective
immediately. If you have signed my old key then please sign my key - this
message is signed by both keys (and the new key is signed by my old key).

old key:
pub 1024D/FBD3EB8E 2002-07-20
Key fingerprint = 9222 8732 859D 25CC 2560 B617 867B F9A9 FBD3 EB8E

new key:
pub 4096R/AAC0E286 2013-10-13
Key fingerprint = 8244 0CEA B440 83C7 9431 D2CC 298E 9A19 AAC0 E286

The new key is up on the keyservers, so you can just pull it from there.

- -Rob
Version: GnuPG v2.0.19 (GNU/Linux)


El cheapo 10Gbps networking

I’ve been hitting the limits of gigabit ethernet at home for quite a while now, and as I spend more time working with cloud technologies this started to frustrate me.

I’d heard of other folk getting good results with second hand Infiniband cards and decided to give it a go myself.

I bought two Voltaire dual-port Infiniband adapters – a 4X SDR PCI-E x4 card. And in a 2 metre 8470 cable, and we’re in business.

There are other, more comprehensive guides around to setting this up – e.g. http://davidhunt.ie/wp/?p=2291 or http://pkg-ofed.alioth.debian.org/howto/infiniband-howto-4.html

On ubuntu the hardware was autodetected; all I needed to do was:

modprobe ib_ipoib
sudo apt-get install opensm # on one machine

And configure /etc/network/interfaces – e.g.:

iface ib1 inet static
up echo connected >`find /sys -name mode | grep ib1`
up echo 65520 >`find /sys -name mtu | grep ib1`

With no further tuning I was able to get 2Gbps doing linear file copies via Samba, which I suspect is rather pushing the limits of my circa 2007 home server – I’ll investigate futher to identify where the bottlenecks are, but the networking itself I suspect is ok – netperf got me 6.7Gbps in a trivial test.

Launchpadlib without gnome-keyring

Recently I’ve been doing my personal development SSH’d into my personal laptop. I found that launchpadlib (which various projects use for release automation) was failing – the gnome keyring API threw an error because the keyring was locked, and python-keyring didn’t try to unlock it.

I needed a workaround to be able to release stuff, and with a bit of digging and help from #launchpad, came up with this:

mkdir ~/.cache/keyring
mkdir ~/.local/share/python_keyring
echo > ~/.local/share/python_keyring/keyringrc.cfg << EOF

(There is already encryption in place, so I chose an uncrypted store – read the keyring source to find other alternatives).

With this done, I can now use lp-shell etc over SSH, for when I’m not physically at my machine.

Running juju against a private openstack instance.

My laptop has somewhat less than 1/2 the grunt of my desktop at home, but I prefer to work on it as I can go sit in the sun etc, very hard to do that with a mini tower case 🙂

However, running everything through ssh to another machine makes editing and iterating more clumsy; I need to do agent forwarding etc – not terribly hard, but not free either, particularly when I travel, I need to remember to sync my source trees back to my laptop. So I prefer to live on my laptop and use my desktop for compute power.

I had a couple of Juju charms I wanted to investigate, but I needed enough compute power to make my laptop really quite warm – so I thought, its time to update my local cloud provider from Eucalyptus to Openstack. This was easy enough, until I came to run Juju. Turns out that Juju’s commands really want to talk to the public DNS name of the instance (in order to SSH tunnel a connection to Zookeeper).

But! Openstack returns DNS names like ‘Server-3’, and if you think about a home network, its fairly rare to have a local DNS server *anyway*, so putting a suffix on names like that won’t help at all: you either need to use a DNS naming provider (openstack ships with an LDAP provider, which adds even more complexity), and configure your clients to know how to find it, or you need to use the public IP addresses (which default to the FlatNetwork, which is routable within a home LAN by simply adding a route to to your wifi interface). Adding to confusion, some wifi routers fail to forward avahi messages, which is a) terrible and b) breaks the only obvious way of doing no-config local DNS :(.

So, I did some yak shaving this morning. Turns out other folk have already run into this and filed a Juju bug and a supporting txaws bug. The txaws bug was fixed, but just missed the release of Precise. Clint Byrum is going to SRU it this week though, so we’ll have it soon. I’ve put a patch up to address the Juju side, which is now pending review. Running the two together works very happily for me. \o/

dmraid (fakeraid) mirror + striped

While some folk look down on fakeraid (that is BIOS based RAID-until-OS-takes-over) solutions, I think they are pretty neat: they let a user get many of the benefits of dedicated controller cards at a fraction of the cost. The benefits include the usual ones for RAID – more spindles to handle IO, tolerance of disk failures. And unlike pure LVM solutions, you can boot from a degraded RAID 1 / 5 / 10 set because the BIOS knows how.

In some ways this is better than dedicated cards, because we have the software take over, so we can change the algorithms for IO dispatch all the way down to the individual devices 🙂

However, these RAID volumes are in a pretty awkward spot for installers and bootloaders: inside a running Linux environment they look like software RAID which cannot be depended on for booting, but at boot time they look like hard disks which cannot be looked under the hood.

I recently got a new desktop machine which has one of these motherboards, and fortuitously my old desktop I was replacing had the same size disks – so I had 4 disks and the option of using a RAID setup. Apparently I’m a sucker for punishment because I went for a RAID 10 (that is two RAID volumes made up of two-disk mirrors (the RAID 1 component), and then those two volumes are combined via striping (the RAID 0 component). This has the potential for pretty nice performance: in principle any read can come from one of 2 disks, and every 64KB (the stripe size) of linear data will switch to the other mirror set, giving a nice boost. Writes need to write to 2 disks always, but every 64KB worth of data will alternate mirror sets, also giving a boost.

Sadly we (Ubuntu) aren’t ready for this yet: there are two key bugs that make this layout almost impossible to install into. This blog post is for my exo-memory, I want to be able to figure out what I did next time around :).

Firstly parted_devices, a helper used by Ubiquity and debian-installer to determine which block devices are actually disk drives that one can partition and install onto, has a confused heuristic – when dealing with dmraid it looks for devices which are not layered on other dmraid devices. This handily excludes partitions, but has the undesirable effect of excluding that striped device – because it is layered on the two mirrored devices. Bug 560748 was filed about that, and I’ve added a workaround to it – basically disabling the filtering, so its not suitable as a long term fix, but it will let one select the RAID volume correctly.

Secondly, grub2, which needs to figure out what the name at boot time of the RAID volume will be currently gets confused. I don’t know enough to really explain – and be correct in my explanation – but I do have a fugly patch which worked for me. Bug 803658 tracks this defect. The basic approach I took was to say that dmraid devices should be an abstraction layer we don’t peek under: if it claims to be a disk, well then its a disk. As grub does actually work that way  – it talks to INT 13h – the BIOS support for booting off of the RAID volume is entirely sufficient.

Sadly neither bug is at the point where the patches can be rolled into Ubuntu itself, but the workaround should let folk get up and running.

In both cases, build the package locally in the installer, install it, then after than run ubiquity and things should install.

After the install, you will need to reapply the patch in the resulting installed environment, or things like update-grub will die on you!

(huge thanks to cjwatson and ev for giving me some tips while I investigated this)


Ok, so micro rant time: this is the effect of not taking things upstream: hardware doesn’t work Out Of The Box.

Very briefly, I purchased a Vodafone prepaid mobile broadband package today, which comes with a modem and SIM. The modem is a K3571-Z, and Ubuntu *thinks* it knows how they work (it doesn’t). So it fails to connect in NetworkManager with a rather opaque ‘NO CARRIER’ message.

Thanks to excellent assistance from Matt Trudel, we tracked this down to a theory that perhaps modemmanager is using the wrong serial port – and voila, it is. From there, the config file (/lib/udev/rules.d/77-mm-zte-port-types.rules) was an obvious next step – and indeed there is no entry in there for the 19d2:1010 – the K3571-Z. Google found one immediately though, on a Vodafone research site.

The awful shame is this: that was committed to the bcm project in March this year. If Vodafone had shipped off a patch to modemmanager, we could have had that in 10.10, and possibly even in 10.04. There are plenty of users having trouble on Whirlpool etc with this model who would have had a better experience – helping Vodafone’s users be happier.

All it would have taken is an email 😦

I’m sure Vodafone want a great experience for their users, but I think they’re failing to separate out platform improvements – share and share alike, and branding / custom facilities. The net impact is harmful, not helpful.

Anyhow, Natty will support this modem.

Using UEC instead of EC2

So, we wanted to move a Hudson CI server at Canonical from using chroots to VM’s (for better isolation and security), and there is this great product Ubuntu Enterprise Cloud (UEC – basically Eucalyptus). To do this I needed to make some changes to the Hudson EC2 plugin – and thats where the fun starts. While I focus on getting Hudson up and running with UEC in this post, folk generally interested in the differences between UEC and EC2, or getting a single-machine UEC instance up for testing should also find this useful.

Firstly, getting a test UEC instance installed was a little tricky – I only had one machine to deploy it on, and this is an unusual configuration. Nicely though, it all worked, once a few initial bugs and misconfiguration items got fixed up. I wrote up the crux of the outcome on the Ubuntu community help wiki. See ‘1 Physical system’. The particular trap to watch out for seems to be that this configuration is not well tested, so the installation scripts have a hard time getting it right. I haven’t tried to make it play nice with Network Manager in the loop, but I’m pretty sure that that can be done via interface aliasing or something similar.

Secondly I needed to find out what was different between EC2 and UEC (Note that I was running on Karmic (Ubuntu 9.10) – so things could be different in Lucid). I couldn’t find a simple description of this, so this list may be incomplete:

  1. UEC runs an old version of the EC2 API. This is because it hasn’t implemented everything in the new API versions yet.
  2. UEC defaults to port 8773, not port 80 (for both the EC2 and S3 API’s)
  3. The EC2 and S3 API’s are rooted differently: at AWS they are at /, for UEC they are at /services/Eucalyptus and /services/Walrus
  4. UEC doesn’t supply a SSL API port as far as I can tell.
  5. DescribeImages has something wonky with it.

So the next step then is to modify the Hudson EC2 plugin to support these differences. Fortunately it is in Java, and the Java community has already updated the various libraries (jets3t and typica) to support UEC – I just needed to write a UI for the differences and pass the info down the various code paths. Kohsuke has let me land this now even though it has an average UI (in rev 27366), and I’m going to make the UI better now by consolidating all the little aspects into a couple of URL’s. Folk comfortable with building their own .hpi can get this now by svn updating and rebuilding the ec2 plugin. We’ve also filed another bug asking for a single API call to establish the endpoints, so that its even easier for users to set this up.

Finally, and this isn’t a UEC difference, I needed to modify the Hudson EC2 plugin to work with the ubuntu user rather than root, as Ubuntu AMI’s ship with root disabled (as all Ubuntu installs do). I chose to have Hudson reenable root, rather than making everything work without root, because the current code paths assume they can scp things as root, so this was less disruptive.

With all that done, its now possible to configure up a Hudson instance testing via UEC nodes. Here’s how:

  1. Install UEC and make sure you can run up instances using euca-run-instances, ssh into them and that networking works for you. Make sure you have installed at least one image (EMI aka AMI) to run tests on. I used the vanilla in-store UEC Karmic images.
  2. Install Hudson and the EC2 plugin (you’ll need to build your own until a new release (1.6) is made).
  3. Go to /configure and near the bottom click on ‘Add a new cloud’ and choose Amazon EC2.
  4. Look in ~/.euca/eucarc, or in the zip file that the UEC admin web page lets you download, to get at your credentials. Fill in the Access Key and Secret Access key fields accordingly. You can put in the private key (UEC holds onto the public half) that you want to use, or (once the connection is fully setup) use the “Generate Key’ button to have a dedicated Hudson key created. I like to use one that I can ssh into to look at a live node – YMMV. (Or you could add a user and as many keys as you want in the init script – more  on that in a second).
  5. Click on Advanced, this will give you a bunch of details like ‘EC2 Endpoint hostname’. Fill these out.
  6. Sensible values for a default UEC install are: 8773 for both ports, /services/Eucalyptus and /services/Walrus for the base URLs, and SSL turned off. (Note that the online help tells you this as well).
  7. Set an instance cap, unless you truely have unlimited machines. E.g. 5, to run 5 VMs at a time.
  8. Click on ‘Test Connection’ – it should pretty much instantly say ‘Success’.
  9. Thats the Cloud itself configured, now we configure VM’s that Hudson is willing to start. Click on ‘Add’ right above the ‘List of AMIs to be launched as slaves’ text.
  10. Fill out the AMI with your EMI – e.g. emi-E027107D is the Ubuntu 9.10 image I used.
  11. for remote FS root, just put /hudson or something, unless you have a preseeded area (e.g. with a shared bzr repo or something) inside your image.
  12. For description describe the intent of the image – e.g. ‘DB test environment’
  13. For the labels put one or more tags that you will use to tell test jobs they should run on this instance. They can be the same as labels on physical machines – it will act as an overflow buffer. If no physical machines exist, a VM will be spawned when needed. For testing I put ‘euca’
  14. For the init script, its a little more complex. You need to configure up java so that hudson itself can run:
    cat >> /etc/apt/sources.list << EOF
    deb http://archive.ubuntu.com/ubuntu/ karmic multiverse
    deb http://archive.ubuntu.com/ubuntu/ karmic-updates multiverse
    deb http://archive.ubuntu.com/ubuntu/ karmic-security multiverse
    export http_proxy=
    export DEBIAN_FRONTEND=noninteractive
    apt-get update
    echo "buildd shared/accepted-sun-dlj-v1-1 boolean true" | debconf-set-selections
    apt-get install -y -f sun-java6-jre

    Note that I have included my local HTTP proxy there – just remove that line if you don’t have one.

  15. Click on Advanced, to get at the less-common options.
  16. For remote user, put ‘ubuntu’, and for root command prefix put ‘sudo’.
  17. For number of executors, you are essentially choosing the number of CPU’s that the instance will request. E.g. putting 20 will ask for an extra-large high-cpu model machine when it deploys. This will then show up as 20 workers on the same machine.
  18. Click save 🙂
  19. Now, when you add a job a new option in the job configuration will appear – ‘tie this job to a node’. Select one of the label(s) you put in for the AMI, and running the job will cause that instance to start up if its not already available.

Note that Hudson will try to use java from s3 if you don’t install it, but that won’t work right for a few reasons – I’ll be filing an issue in the Hudson tracker about it, as thats a bit of unusual structure in the existing code that I’m happier leaving well enough alone :).

Adding new languages to Ubuntu

Scott recently noted that we don’t have Klingon available in Ubuntu. Klingon is available in ISO 639, so adding it  should be straight forward.

Last time I blogged about this three packages needed changing, as well as Launchpad needing a translation team for the language. The situation is a little better now: only two packages need changing as gdm now dynamically looks for languages based on installed locales.

libx11 still needs changing – a minimal diff would be:

=== modified file 'nls/compose.dir.pre'
--- libx11-1.2.1/nls/compose.dir.pre
+++ libx11-1.2.1/nls/compose.dir.pre
@@ -406,0 +406,1 @@
+en_US.UTF-8/Compose:        tlh_GB.UTF-8
=== modified file 'nls/locale.alias.pre'
--- libx11-1.2.1/nls/locale.alias.pre
+++ libx11-1.2.1/nls/locale.alias.pre
@@ -1083,0 +1083,1 @@
+tlh_GB.utf8:                    tlh_GB.UTF-8
 === modified file 'nls/locale.dir.pre'
--- libx11-1.2.1/nls/locale.dir.pre
+++ libx11-1.2.1/nls/locale.dir.pre
@@ -429,0 +429,1 @@
+en_US.UTF-8/XLC_LOCALE:            tlh_GB.UTF-8

Secondly, langpack-locales has to change for two reasons. Firstly a locale definition has to be added (and locales define a place – a language and locale information like days of the week, phone number formatting etc. Secondly the language needs to be added to the SUPPORTED list in that package, so that language packs are generated from Launchpad translations.

Now, gdmautodetects, but it turns out that only ‘complete’ locales were being shown. And that on Ubuntu, this was not looking at language pack directories, rather at


which langpack-built packages do not install translations into. So it could be a bit random about whether a language shows up in gdm. Martin Pitt has kindly turned on the ‘with-incomplete-locales’ configure flag to gdm, and this will permit less completely translated locales to show up (when their langpack is installed – without the langpack nothing will show up).

Debianising with bzr-builddeb

Bzr build-deb is very nice, but it can be very tricky to get started. I recently did a fresh debianisation of a project that is in bzr upstream, and I thought I’d record the recipe to make it work (at least until the various bugs making it hard re fixed).

Assuming that the upstream uses bzr, it goes like this:

  1. Start with a branch that is close to the code you want to Debianise. E.g. if the release was off trunk, 3 commits back: bzr branch trunk -r -3 debian
  2. Debianise as normal: put the tarball with the right name in the parent dir,  add a debian directory and fiddle until you build a package you’re happy with. Don’t commit while doing this.
  3. Build a source package- debuild -S, or bzr builddeb -S
  4. Revert your changes – bzr revert.
  5. Import the dsc – bzr import-dsc ../*.dsc
  6. Now, you may find that some dot files, such as .bzrignore have been discarded inappropriately (there is a bug open on this). If that happened, keep going. Otherwise, you’re done: you can now use merge-upstream on future upstream releases, and debcommit etc.
  7. bzr uncommit
  8. bzr revert .bzrignore (and any other files that you want to get back)
  9. debcommit
  10. All done, see point  6 for details.