Eucalyptus – Code happens

So, we wanted to move a Hudson CI server at Canonical from using chroots to VM’s (for better isolation and security), and there is this great product Ubuntu Enterprise Cloud (UEC – basically Eucalyptus). To do this I needed to make some changes to the Hudson EC2 plugin – and thats where the fun starts. While I focus on getting Hudson up and running with UEC in this post, folk generally interested in the differences between UEC and EC2, or getting a single-machine UEC instance up for testing should also find this useful.

Firstly, getting a test UEC instance installed was a little tricky – I only had one machine to deploy it on, and this is an unusual configuration. Nicely though, it all worked, once a few initial bugs and misconfiguration items got fixed up. I wrote up the crux of the outcome on the Ubuntu community help wiki. See ‘1 Physical system’. The particular trap to watch out for seems to be that this configuration is not well tested, so the installation scripts have a hard time getting it right. I haven’t tried to make it play nice with Network Manager in the loop, but I’m pretty sure that that can be done via interface aliasing or something similar.

Secondly I needed to find out what was different between EC2 and UEC (Note that I was running on Karmic (Ubuntu 9.10) – so things could be different in Lucid). I couldn’t find a simple description of this, so this list may be incomplete:

UEC runs an old version of the EC2 API. This is because it hasn’t implemented everything in the new API versions yet.
UEC defaults to port 8773, not port 80 (for both the EC2 and S3 API’s)
The EC2 and S3 API’s are rooted differently: at AWS they are at /, for UEC they are at /services/Eucalyptus and /services/Walrus
UEC doesn’t supply a SSL API port as far as I can tell.
DescribeImages has something wonky with it.

So the next step then is to modify the Hudson EC2 plugin to support these differences. Fortunately it is in Java, and the Java community has already updated the various libraries (jets3t and typica) to support UEC – I just needed to write a UI for the differences and pass the info down the various code paths. Kohsuke has let me land this now even though it has an average UI (in rev 27366), and I’m going to make the UI better now by consolidating all the little aspects into a couple of URL’s. Folk comfortable with building their own .hpi can get this now by svn updating and rebuilding the ec2 plugin. We’ve also filed another bug asking for a single API call to establish the endpoints, so that its even easier for users to set this up.

Finally, and this isn’t a UEC difference, I needed to modify the Hudson EC2 plugin to work with the ubuntu user rather than root, as Ubuntu AMI’s ship with root disabled (as all Ubuntu installs do). I chose to have Hudson reenable root, rather than making everything work without root, because the current code paths assume they can scp things as root, so this was less disruptive.

With all that done, its now possible to configure up a Hudson instance testing via UEC nodes. Here’s how:

Install UEC and make sure you can run up instances using euca-run-instances, ssh into them and that networking works for you. Make sure you have installed at least one image (EMI aka AMI) to run tests on. I used the vanilla in-store UEC Karmic images.
Install Hudson and the EC2 plugin (you’ll need to build your own until a new release (1.6) is made).
Go to /configure and near the bottom click on ‘Add a new cloud’ and choose Amazon EC2.
Look in ~/.euca/eucarc, or in the zip file that the UEC admin web page lets you download, to get at your credentials. Fill in the Access Key and Secret Access key fields accordingly. You can put in the private key (UEC holds onto the public half) that you want to use, or (once the connection is fully setup) use the “Generate Key’ button to have a dedicated Hudson key created. I like to use one that I can ssh into to look at a live node – YMMV. (Or you could add a user and as many keys as you want in the init script – more on that in a second).
Click on Advanced, this will give you a bunch of details like ‘EC2 Endpoint hostname’. Fill these out.
Sensible values for a default UEC install are: 8773 for both ports, /services/Eucalyptus and /services/Walrus for the base URLs, and SSL turned off. (Note that the online help tells you this as well).
Set an instance cap, unless you truely have unlimited machines. E.g. 5, to run 5 VMs at a time.
Click on ‘Test Connection’ – it should pretty much instantly say ‘Success’.
Thats the Cloud itself configured, now we configure VM’s that Hudson is willing to start. Click on ‘Add’ right above the ‘List of AMIs to be launched as slaves’ text.
Fill out the AMI with your EMI – e.g. emi-E027107D is the Ubuntu 9.10 image I used.
for remote FS root, just put /hudson or something, unless you have a preseeded area (e.g. with a shared bzr repo or something) inside your image.
For description describe the intent of the image – e.g. ‘DB test environment’
For the labels put one or more tags that you will use to tell test jobs they should run on this instance. They can be the same as labels on physical machines – it will act as an overflow buffer. If no physical machines exist, a VM will be spawned when needed. For testing I put ‘euca’

For the init script, its a little more complex. You need to configure up java so that hudson itself can run:

cat >> /etc/apt/sources.list << EOF
deb http://archive.ubuntu.com/ubuntu/ karmic multiverse
deb http://archive.ubuntu.com/ubuntu/ karmic-updates multiverse
deb http://archive.ubuntu.com/ubuntu/ karmic-security multiverse
EOF
export http_proxy=http://192.168.1.1:8080/
export DEBIAN_FRONTEND=noninteractive
apt-get update
echo "buildd shared/accepted-sun-dlj-v1-1 boolean true" | debconf-set-selections
apt-get install -y -f sun-java6-jre

Note that I have included my local HTTP proxy there – just remove that line if you don’t have one.

Click on Advanced, to get at the less-common options.
For remote user, put ‘ubuntu’, and for root command prefix put ‘sudo’.
For number of executors, you are essentially choosing the number of CPU’s that the instance will request. E.g. putting 20 will ask for an extra-large high-cpu model machine when it deploys. This will then show up as 20 workers on the same machine.
Click save 🙂
Now, when you add a job a new option in the job configuration will appear – ‘tie this job to a node’. Select one of the label(s) you put in for the AMI, and running the job will cause that instance to start up if its not already available.

Note that Hudson will try to use java from s3 if you don’t install it, but that won’t work right for a few reasons – I’ll be filing an issue in the Hudson tracker about it, as thats a bit of unusual structure in the existing code that I’m happier leaving well enough alone :).