Money doesn’t matter

Well, obviously it does. But the whole ‘government cannot pay for healthcare’, or land, or education : thats nonsense.

And any politician that claims that is either ignorant, or has an agenda that involves deliberate repression of the population.

These are strong claims, so let me break it down. Also, I’m not an economist, if I’ve gotten the wrong end of the stick economics-wise, I’ll happily update this or at least add errata to it…

Money isn’t wealth. Its a thing you can exchange for other things, but it itself is not wealth. Easy example: when countries have had runaway inflation, and the price of e.g. potatoes has been going up 100% a day, it doesn’t matter how much money you have, you will eventually be unable to buy potatoes. But a potato farmer with 10’s of thousands of potatoes won’t run out and go hungry.

We use money to scale our society. Without money, we have some problems. Firstly, if I want something you have, but I don’t have anything you want, I have to find someone who wants something I have, and something you want that they don’t want, and then do that trade, then come back to you to trade the thing you wanted for what I wanted. This quickly becomes a bottleneck on actually getting stuff done. Secondly, once someone, say a potato farmer :), has what they want right now, they will be very hard to trade with : if they trade potatoes for things they don’t want, they are gambling that other folk will want them in the future. That requires everyone to become a good gambler on the future value of things.

But just like money isn’t wealth, money also isn’t work. We work to exchange our time for wealth; except money isn’t wealth, so really we’re exchanging our time for this thing we can exchange for the actual things we want. Government *literally* create money anytime they want, and they destroy it at will too. If there’s too much money floating around, then (at whatever prices folk are used to) everything will be purchasable, and its very likely folk selling stuff will run out and raise prices as a result. Then it becomes harder to buy stuff, although everyone that recieved those raised prices has more money to buy with, so this continues for a while : this is inflation.

Too little money, and things that could be sold won’t sell, because there isn’t enough money at the prices folk are used to, and the folk selling don’t want to “lose money” (which is odd, because money is a promise not a thing, so if you’re in a deflationary situation, selling *right now* may well be better than holding on and selling later :)), so they will be slow to lower prices, will recieve less either way, and just like with increased prices, the decrease gets spread amongst the participants – vendors, owners, employees.

But these things don’t happen instantly :- there’s slack in the system.

So what does matter? What actually matters is a combination of resources and productivity: those are the things that determine whether we, as a society, can produce enough things for our people to have what they want and need. For instance, building a house needs the following resources: land, building materials, labour, power, as well as ongoing supplies of power, water and sewage processing.

If, given the people currently in our country, and what they are being paid to do today, we have both enough resources, and enough labour-and-productivity, to house, feed, heat, transport and entertain everyone, then the failure to do so is not one of money but one of choice. That builder friend you know who doesn’t have work right now could be building a house for that other friend you’ve got whose family is sleeping in a garage. The builder that’s not working because the family in question can’t afford to pay for the land or the resources, and the builder has nowhere to do the building, nor any stuff to make the building out of.

The core choice is : do we as a society think its reasonable anyone should have to sleep rough, or miss out on school, or any of a thousand examples of poverty, when we’ve got the resources and production capability to fix it? Do we think that? Really? And what are we willing to do to fix it? Right now, a lot of the production capability of our society is owned by 1% of our society. So less than 1% of people are deciding what is made and how its made.

Now, there’s a bunch of curly questions like, what about the foreign account deficit? What about the fact that lots of land is already owned by someone? How do we fairly get that family the house they deserve? Won’t some people just ride on the coat-tails of others? Isn’t this going to require taking things other people have already earnt?

These are all fair questions. My answers to those are:

  • If everyone had their needs met we’d have many more people contributing to creative things we can sell to foreign countries, more than enough to address any changes in the foreign account deficit from sorting things out here.
  • Our current system has huge wealth inequality; it doesn’t matter whether that inequality is in the form of money, or ownership of things, either we leave that 1% controlling 99%, or we redistribute things in some equitable ongoing basis. Wealth taxes, CGTs, estate taxes. Lots of options.
  • I’m not sure. I think ultimately it means capping the maximum wealth ratio between our richest and poorest people. e.g. the more wealth you have the more you’re taxed until eventually – at say 500K / year (gross) wealth growth, your marginal tax rate becomes 90%, and at some higher figure, say 1M/year (gross) wealth growth your marginal tax rate exceeds 95%. That way wealthy folk get to choose what things they keep : there’s no central planning department or other bureaucracy involved.
  • Folk already ride on the coat tails of other people. But its nowhere near as simple as ‘those dole bludgers’. Folk on the pension don’t work. Folk with ‘passive income’ (read investments whose growth is high enough those folk don’t need to work). School kids. And yes, folk on the dole. For some folk on the dole, the marginal tax rate already exceeds 100% – there are some steps in our tax system that make part time work while receiving the dole very very hard. Home makers are also something we support as a society. though less directly. But lets assume fully 10% of the country simply don’t want to work. Consider this in productivity terms. We get 10% less things done. Big deal. We’ve enough resources and people to deliver those essentials: food, shelter, power, education, with waaay less than 90% of our workforce. And as automation inproves expect that 90% to drop down towards 10%. At that point we’d want 90% of folk not working, I suspect.
  • Yes, folk will have to get taxed on what they have not just on what they are gaining. This makes sense though: we want the system to slowly drive equity for everyone. (Not equality, and not sameness, just equity). Taxing what you have is actually a lot fairer than taxing what you earn. Because if you have nothing, but start earning a lot, you’re starting way behind everyone else, so not taxing you much is pretty nice. And if you have a lot, but aren’t earning anymore, not taxing you is really just giving you a free pass: supporting you in terms of every single shared resource and infrastructure.



Monads and Python

When I wrote this I was going to lead in by saying: I’ve been spending a chunk of time recently thinking about how best to represent Monads in Python. Then I forgot I had this draft for 3 years. So.. I *did* spend a chunk of time. Perhaps it will be of interest anyway… though I had not finished it (otherwise it wouldn’t still be draft would it :))

Why would I do this? Because there are some nifty things you get with them: you get some very mature patterns for dealing with error (Either, Maybe), with nondeterminism (List), with DSLs (Free).

Why wouldn’t you do this? Because you get some baggage. There are two bits in particular. firstly, Monads solve a problem Python doesn’t have. Consider:

x = read_file('fred')
y = delete_file('fred')

In Haskell, the compiler is free to run those functions in either order as there is no data dependency between them. In Python, it is not – the order is specified directly by the code. Haskell requires a data dependency to force ordering (and in fact RealWorld in order to distinguish different invocations of IO). So to define a sequence here it defines a new operator (really just an infix function) called bind (>>= in haskell). You then create a function to run after the monad does whatever it needs to do. Whenever you see code like this in Haskell:

do x <- action1
     y >=
  \x action2 >>=
     \y return x+y

A direct transliteration into Python is possible a few ways. One of the key things though is to preserve the polymorphism – bind is dependent on the monad instance in use, and the original code is valid under many instances.

def action1(m): return m.unit(1)
def action2(m): return m.unit(2)
m = MonadInstance()
    lambda m, x: action2(m).bind(
        lambda m, y: m.unit(x+y)))

In this style functions in a Monad would take a monad instance as a parameter and use that to access the type. Note in particular that the behavior of bind is involved at every step here.

I’ve recently been diving down into Effect as part of preparing my talk for Kiwi PyCon. Effect was described to me as modelling the Free monad, and I wrote my talk on that basis – only to realise, in doing so, that it doesn’t. The Free monad models a domain specific language – it lets you write interpreters for such a language, and thanks to the lazy nature of Haskell, you essentially end up iterating over a (potentially) infinitely recursive structure until the program ends – the Free bind method steps forward once. This feels very similar to Effect in some ways. Its also used (in some cases) for similar reasons: to let more code be pure and thus reliably testable.

But writing an interpreter for Effect is very different to writing one for Free. Compare these blog posts with the howto for Effect. In the Free Monad the interpreter can hand off to different interpreters at any point. In Effect, a single performer is given just a single Intent, and Intents just return plain values. Its up to the code that processes values and returns new Effect’s to perform flow control.

That said, they are very similar in feel: it feels like one is working with data, not code. Except, in Haskell, its possible to use do notation to write code in the Free monad in imperative style… but Effect provides no equivalent facility.

This confused me, so I reached out to Chris and we had a really fascinating chat about it. He pointed me at another way that Haskellers separate out IO for testing. That approach is to create a class specifically for the IO in your code and have two implementations. One production one and one test implementation. In Python:

class Impure:
    def readline(self):
        raise NotImplementedError(self.readline)
class Production:
    def readline(self):
        return sys.stdin.readline()
class Test:
    def __init__(self, inputs):
        self.inputs = inputs
    def readline(self):
        return self.inputs.pop(0)

Then you write code using that directly.

def echo(impl):

This seems to be a much more direct way to achieve the goal of being able to write pure testable code. And it got me thinking about the actual basic premise of porting monads to Python.

The goal is to be able to write Pythonic, pithy code that takes advantage of the behaviour in the bind for that monad. Lets consider Maybe.

class Something:
    def __init__(self, thing):
        self.thing = thing
def unit(klass, thing):
    return Something(thing)
def bind(self, l):
    return l(self, self.thing)
def __str__(self):
    return str(self.thing)
def action1(m): return m.unit(1)
def action2(m): return m.unit(2)
m = Something
r = action1(m).bind(
    lambda m, x: action2(m).bind(
        lambda m, y: m.unit(x+y)))
print("%s" % r)
# 3

Trivial so far, though having to wrap the output types in our functions is a bit ick. Lets add in None to our example.

class Nothing:
    def bind(self, l):
        return self
    def __str__(self):
        return "Nothing"
def action1(m): return Nothing()
def action2(m): return m.unit(2)
m = Something
r = action1(m).bind(
    lambda m, x: action2(m).bind(
        lambda m, y: m.unit(x+y)))
print("%s" % r)
# Nothing

The programmable semicolon aspect of monads comes in from the bind method – between each bit of code we write, Something chooses to call forward, and Nothing bypasses our code entirely.

But we can’t use that unless we start writing our normally straight forward code such that every statement becomes a closure – which we don’t want.. so we want to interfere with the normal process by which Python chooses to run new code.

There is a mechanism that Python gives us where we get control over that: generators. While they are often used for concurrency, they can also be used for flow control.

Representing monads as generators has been done here, here, and don’t forget other languages like Scala.

The problem is, that its still not regular Python code, and its still somewhat mental gymnastics. Natural for someone thats used to thinking in those patterns, and it works beautiful in Haskell, or Rust, or other languages.

There are two fundamental underpinnings behind this for Haskell; type control from context rather than as part of the call signature and do notation which makes code using it look like Python.  In python we are losing the notation, but gaining the bind operator on the Maybe monad which short circuits Nothing to Nothing across an arbitrary depth of of computation.

What else short circuits across an arbitrary depth of computation?


This won’t give the full generality of Monads (for instance, a Monad that short circuits up to 50 steps but no more is possible) – but its possibly

Python basically is do notation, and if we just had some way of separating out the side effects from the pure code, we’d have pure code. And we have that from above.

So there you have it, a three year old mull: perhaps we shouldn’t port Monads to Python at all, and instead just:

  • Write pure code
  • Use a strategy object to represent impure activity
  • Use exceptions to handle short circuiting of code

I think there is room if we wanted to to do a really nice, syntax integrated Monad style facility in Python (and Maybe would be a great reference case for it), but generator overloading – possibly async might let a nicer thing be done but I haven’t investigated that yet.

SkyDNS in Kubernetes 1.3 local clusters

If you want to run kubernetes locally – not in a VM – then you’ll probably also want DNS service integration to work.  Thats fine, except by default it doesn’t work :(. This may be due to DNS being a built-in add-on now, but the current docs around that are inconsistent – referencing the deleted 1.2 dns addon docs :/.

I’ve put a pull request up to fix the errors I encountered trying to use the local-up-cluster script per the current in-tree documentation in build. You also need to run it slightly differently than the basic docs suggest. The basic setup (sensibly) doesn’t listen on, avoiding exposing your insecure cluster to the world. But since you’re going to be partitioning off your machine into containers, and the kube-dns component which handles DNS integration needs to talk to the kubernetes API, so you need to override that.


Will run a local cluster for you with DNS happily working, assuming the other preconditions (like – you’re not using needed to run a local cluster are true. You can start with no environment variables set ar all to check that that works – kubernetes itself runs happily with no DNS integration. Note though, that if you have DNS enabled, it has to work, or the kubernetes API itself will fail to register endpoints, and then gets itself firewalled off.

Some quick debugging things I found useful.

Find the pod

$ cluster/ --namespace kube-system get pods
kube-dns-v18-mi26o 3/3 Running 0 18m

Check it has registered endpoints successfully

$ cluster/ --namespace kube-system get ep
kube-dns, 18m

Check its logs

$ cluster/ logs --namespace kube-system kube-dns-v18-mi26o -c kubedns

Deploy something and check it both can use DNS and is listed in DNS

I made a trivial Ubuntu image with a little more in it:

$ cat rob/Dockerfile
FROM ubuntu

RUN apt-get update
RUN apt-get install -y iputils-ping curl openssh-client iproute2 dnsutils
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

Which I then deploy via a trivial definition:

apiVersion: v1
kind: Pod
  name: ubuntu
  namespace: default
  - image: ubuntu-debug
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
    name: ubuntu
  restartPolicy: Always

And a call to kubectl:

$ cluster/ create -f rob/ubuntu.yaml

And if successfully integrated with DNS, it will be registered with DNS under A-B-C-D.default.pod.cluster.local.

$ cluster/ exec ubuntu -ti /bin/bash
root@ubuntu:/# ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
48: eth0@if49: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:11:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe11:3/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever
root@ubuntu:/# ping 172-17-0-3.default.pod.cluster.local
PING 172-17-0-3.default.pod.cluster.local ( 56(84) bytes of data.
64 bytes from ubuntu ( icmp_seq=1 ttl=64 time=0.013 ms
--- 172-17-0-3.default.pod.cluster.local ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.013/0.013/0.013/0.000 ms

diagnosing flaky tests

Victor (here, here) grabbed me on IRC yesterday for some testr help. He had a super frustrating bug in glance: about one in thirty unit test runs, a test would fail. He’d spent hours tracking it down and still couldn’t reliably reproduce it.

Part of that was due to the glance tests taking a few minutes to run, so each iteration was slow, but another part was a lack of familiarity with the test tooling we use in OpenStack which can give rich data to help analyse such things.

I helped him out – and this post is a step by step handbook of what I did so that I can point people at it 🙂


  1. start by duplicating the environment
  2. setup automation so you are only doing the interesting (or at least not time consuming) bits
  3. bisect and bisect and bisect

Firstly, I pulled down exactly the same code he was working on:

cd glance; git review -d 250083

This let me try to reproduce the thing. However, my normal reproduction facility couldn’t be used because the glance testr configuration depended on invoking testr within lockutils-wrapper. I’m still working through the implications, but for the short term I moved that to be testr’s problem.

So now I could make a python 34 venv and run testr directly:

tox -epy34 --notest; . .tox/py34/bin/activate; testr run --parallel

This is pretty important – it lets me get under the wrapper that projects use and now I have more control over what is happening. Plus I’m not dealing with tox recreating the venv or anything like that.

It turned out that only the unit tests had been ported to Python3, so I needed to filter down to just those tests. And because I didn’t want to sit here watching it, I set testr off to find a reproduction example on its own:

testr run --parallel --until-failure tests.unit

This runs the same set of tests – whatever you’ve specified in the normal way – in parallel, in a loop. It specifically reschedules and starts new backends (processes that are actually executing test code) each time around, so its very close to just scripting it in shell around testr, with only minor differences (such as not re-querying all the tests each time, because testr knows the full set already).

After an hour or so I had toggled back to look at the terminal, and there was a lovely backtrace and information on the failure. It looks something like this:

lockutils-wrapper \
${PYTHON:-python} -m discover -t ./ ./glance/tests --load-list /tmp/tmpafpyzyd5
Ran 2 tests in 0.485s (+0.011s)
PASSED (id=1614)
lockutils-wrapper \
${PYTHON:-python} -m discover -t ./ ./glance/tests --load-list /tmp/tmpafpyzyd5
Traceback (most recent call last):
glance.common.exception.NotFound: b'Image not found'
FAIL: glance.tests.unit.v1.test_api.TestGlanceAPI.test_upload_image_http_nonexistent_location_url
tags: worker-0
Traceback (most recent call last):
File "/home/robertc/work/openstack/glance/glance/tests/unit/v1/", line 1149, in test_upload_image_http_nonexistent_location_url
self.assertEqual(404, res.status_int)
File "/home/robertc/work/openstack/glance/.tox/py34/lib/python3.4/site-packages/testtools/", line 350, in assertEqual
self.assertThat(observed, matcher, message)
File "/home/robertc/work/openstack/glance/.tox/py34/lib/python3.4/site-packages/testtools/", line 435, in assertThat
raise mismatch_error
testtools.matchers._impl.MismatchError: 404 != 201
Ran 2 tests in 0.419s (-0.061s)
FAILED (id=1615, failures=1 (+1))

Except that the id was much lower, there were multiple concurrent processes being run on each iteration, and the number of tests run much higher – 506 in fact. The important bit to me was the id, because with that we could get programmatic on the problem.

While testr has support for automatically isolating faults, this depends on deterministic behaviour- there isn’t [yet] any support for things that fail 1 time in N when doing bisection, looking for cross-test interactions and so forth. So normally, one would just run:

testr run --analyze-isolation

And testr would churn away and give a useful answer, in this case I needed to do it by hand. I think there is room for getting really smart about dealing with this sort of situation, but the simplest method is to just repeat each test (a run of some X tests looking for a failure) until some confidence level is reached, rather than assuming a pass is actually a pass.

To do that we needed to do two things: we needed a set of tests to run, and a way to reduce the set and repeat. We could use –until-failure as a way to repeat a given test, and stop it after a couple of hours if it hadn’t failed.

Extracting the tests that a given backend ran is straightforward, if not something you’d just luck upon. If the run id you want to investigate is 6, and the backend that you want to report on is worker-0 (see the test tag in the error report above):

cat .testr/6 | subunit-1to2 | subunit-filter -s --xfail --with-tag=worker-0 | subunit-ls > worker-0

This takes the subunit stream from the repository, which is in a legacy format 1, upgrades it to subunit v2, then includes successful tests and expected failures, but only includes tests run on worker-0, pulls out the test ids (thats the subunit-ls bit) and writes it into a file ‘worker-0’.

To run just those tests:

testr run –load-list worker-0

More interestingly though, lets start by not running all the tests that took place after our failure. Inside the file it looks like this:


Note that the test we’re interested in is in the middle there – though the file looks sorted, thats due to the test backend, what we have is the actual order tests executed in (and we don’t need to worry about concurrency because we pulled out just one backend process, and in Python unittest thats single-threaded).

First step then is to delete all the tests after the one we care about:


And then we don’t want to run all the tests: we’re assuming there is an interaction with a single other test leaving a stale process or something, so we want to run the one that failed, and half of the earlier tests; if they end up being reliable, we switch to the half we hadn’t been running, and then repeat the process – take half, run until we’re satisfied its reliable, repeat.

The way I do this by hand is to just edit the text file, making a new copy each step, so that I can backtrack easily. So delete half the preceeding lines, save to new file, then run:

testr run --load-list newfile --until-failure

Walk away, do something else, and then come back in a couple of hours.

This worked: from 506 tests on the worker, I then had about 300 that ran before the failing test after the first trim, so 150 was my first bisection which ran for a couple hours before failing. Then 75, then 35, then 18, then 9, then 5, then 2, then 1, then – 0 – thats right, eventually we found that the failing test could fail on its own!

And from that Victor was able to dig deep into what it was doing with confidence – he found a race condition in the test server setup stuff (I haven’t looked closely – I’m parroting what he said on IRC) – and is confident he’s found the bug. Yay!

I also ran one of the smaller sets overnight using Python 2.7, and that didn’t fail at all, so I suspect the failure is in some area that was masked by Python 2.7/eventlet on 2.7 handling of *something*. We saw a bug in subunit of that nature earlier this year, where a different (but legitimate) behaviour in eventlet on 3.4 led to subunit dropping writes silently. Thats fixed now in both eventlet and subunit 🙂

signalling via exit status in Python

A common idiom in non-trivial command line tools is to have more than two return codes. For instance, diff uses 0 for ‘same inputs’, 1 for ‘different inputs’, 2 for ‘trouble’.

Doing that in Python is a little harder though, and since I’ve gotten it wrong in the past, I want to write it down for both myself and anyone else contemplating it.

The issue is that both your program and the Python VM itself can fail, and so if you attempt to use a common status code with those the Python VM uses for failures, you have to make sure that the meanings are at least broadly compatible. There’s also a bug in existing Python releases that will cause an exit status of 0 sometimes when an error is actually appropriate.

I’ve only researched this on CPython, its possible that other Python VM’s behave differently, and as far as I know this is not a language spec issue (but perhaps it should be).


  1. Always flush stdout and stderr yourself, even when signalling errors.
  2. Never use status 1 or 2 for non-error conditions.
  3. (Provisional) don’t use status 120 at all.


CPython exits with 0 when the interpreter cleanup code fails to flush stdout/stderr, even though that would be an error if it happened earlier. To address that, add an explicit flush of both streams before your program ends. We may end up making CPython exit with 120 when the stdout/err flushing fails. There’s also a possibility that a very early threading error may result in a 0 exit code, though I haven’t managed to make this actually happen yet.

CPython exits with 1 when fails to import, so using 1 for non-error conditions makes it hard for callers to discriminate between your meaning and failures.

Cpython exits with 2 when CLI arguments fail to parse, so using 2 for non-error conditions is similar there. optparse also uses 2 for this, so even if you are using a different interpreter, it is not a safe status code to reuse with different semantics.


OpenStack Mitaka debrief

Well, last week was the 6-monthly OpenStack summit in Tokyo. It was fantastic to catch up with many folk, but with 5000 attendees, there are many more that I didn’t see than those that I did. Yet I find the sheer volume of face-to-face stuff nearly overwhelming. I wish it was quite a bit longer and less intense.

Over the next cycle I’ve committed to a few things…

  1. Kicking off TC leadership of scaling for OpenStack. That is, sparking the conversation with the broader community about what scaling means for us, and ensuring each project is paying some attention to it – in the same way that each project already pays attention to e.g. backwards compatibility – they can choose how much, and implementation and so on, but the basic user expectations and framework for thinking about it are shared across OpenStack. The performance working group is certainly related to this but scaling is different to performance.
  2. Replacing the oslo incubator process with one that creates the package straight away. This will go up as a spec for approval of course. The crux of the issue will be finding a way to preserve the freedom of early refactorings without API commitments, without breaking everything. The current approach in my head is to use versioned submodules within the package during the pre-1.0.0 phase, and liberally copy-paste things when API breaks are needed.
  3. Helping the app catalog folk a little bit by doing a review of their review guidelines – looking specifically for gaps (e.g. like the currently unsecured http attack vector).
  4. Start a broad discussion over changing the way we use minimum versions of requirements. Today we raise the minimum version of most requirements quite eagerly. Yet for some like libvirt we instead use feature detection and degrade gracefully when non-latest versions are installed. It seems likely that it would increase compatibility with distributions if we took that approach more widely, but we’d need some care to think through the ramifications.
  5. Kicking off a discussion about leadership training for TC & PTL members. We vote folk into these rolls, but leading isn’t a innate skill. With our constituency of over two thousand developers, spending some money on good leadership training seems like a sound investment. If the TC agrees that its a good idea, my plan is to seek funding from the Board, and aim to make the training be a pre-summit event. This was suggested to me by Colette Alexander.
  6. Seek some more eyeballs on the olso.messaging Kafka driver spec from the HP folk that have been working with Kafka.
  7. Establish connections between Yahoo & HP’s iLO team – they’re seeing the same sort of lockups we did with IPMI on the TripleO test cloud (and the infra-cloud folk are still seeing that) – so I want to see if we can get the bug fixed for everyone.
  8. Work up a clear spec on refactoring the testrepository and subunit2sql layers so that we have all the data store backends in one common repository, an HTTP REST API for consumers like openstack-health, and still have a good experience for CLI users.
  9. Lastly, but not least, work up a formal stabilisation cycle proposal to try and give everyone (product working group, users, core developers) what they want which we seem deadlocked on not doing today. The basic thing to me seems to be fear of the consequences of saying no to feature patches – for pretty good reason; many developers have their income directly tied to achieving things upstream, and when upstream says no, the ensuing discussion is fraught (and there is often information asymmetry present). What we probably need to do is find some balance point – and then socialise the plan very broadly – including the Board, so they can encourage member companies to look after their developers properly.

If any of these things are of interest to you, please feel free to reach out to me :).

Graceful introduction of test servers

A test server acts as a little RPC server where we can ask it to run some tests without paying a full new-process startup cost each time. They are a necessary precondition to online scheduling of tests (because without them the latency of scheduling a test will be orders of magnitude more time than executing the test), as well as potentially enabling better debugger glue by providing an explicit out of band interface.

It’s vital that we don’t break existing users of or testrepository when we bring this in – folk don’t react well to having their environment broken. Breaks could occur several different ways – but lets assume that an unmodified .testr.conf will not result in the server code being activated. (It would be nice in theory to Just Work and make things better, but there are lots of ways it could fail, starting with the fact that we have no negotiation step with the things we’re running, and anything else (e.g. exported environment variables) stands a high chance of being eaten by intermediaries like ssh, tox and so on).

So, assuming a new .testr.conf:

  1. A newer running with an old testrepository might drop into server mode and then not actually run any tests.
  2. A newer testrepository with an older might not go into server mode but not error cleanly.

Testr’s run command has two key interfaces with test backends. Firstly the list interface, where it queries for tests. This is only done when testr needs to know what tests exist (e.g. for offline scheduling). Secondly, the run interface where tests are executed.

In the server based world. testr will have one invoke-a-process interface, and that will offer the two existing interfaces over the basic RPC layer.

To avoid failure 1, we need to ensure we never ask for to go into server mode except when testrepository itself can handle it. That implies that we must not insert whatever change we are making into the run_command in .testr.conf, and instead either use a variable substition, or a whole new command key to configure it. I’m in favour of a new command key, because it places less constraints on implementors of other languages.

To avoid failure 2, we need to be able to rigorously determine if a process has gone into server mode. E.g. the server has to send a handshake command of some sort.

Lets talk about failure modes that can occur once we have .testr.conf configured and new and testrepository code.

  1. We might have version skew between releases of and testrepository on future updates to the RPC server.
  2. We might have a broken testr -> server channel
  3. We might have a broken server -> testr channel
  4. The server might go off into a busy loop or something

For 3, we should version the RPC protocol carefully so that any semantic differences can be detected. Obviously there is a tonne of prior art and everyone is going to scream ‘use grpc’ (or fav RPC of choice). Thats a very sensible thing to do, and subunit can actually sit on top of pretty arbitrary transports as long as they can handle bytestrings and timestamps. That said, my focus in this iteration is to enable the server, porting subunit’s transport to something else won’t save time there (because the RPC angle is going to be a tiny fraction of the development time). I think a simple (new, old) version scheme will do fine (think autotools library soname calculations). If testr offered (5, 1) it means it can speak all versions between 1 and 5, and as long as it speaks one of them, should pick that and use it. If we find the need to drop compatibility entirely with a version at some point, we raise the old to a version up from that and move on.

We can deal with 4 by pre-emptively sending a message from testr to the server – a hello message with the supported versions. Likewise, 5 can be dealt with by not considering the server ‘ok’ until we get its initial hello message with a chosen version.

If the server goes into a busy loop – I think we can largely ignore this for now, as its no different than today. (which is, the user notices, gets annoyed, and hits ctrl-C, or their CI job times out. Being able to discriminate between ‘the server is stuck’ and ‘a test is stuck’ would be good – and remembering that in a routed world we don’t know necessarily know the end point for any server…. it might just be routing.

What else – well one long running thing has been the desire to move away from requiring a clean stdin/stdout for test processes. Being broken when some test code decide to write to stdout is *not cool*. This new feature seems like an ideal time to address that. We can’t assume working networking (because e.g. tunnelling over ssh or a container console are important use cases). We could however write a little proxy that uses stdin/stdout with no test code, and then signals (however we’re doing that) that testr is listening on a local port, and tunnel it backwards. (If we choose something simple enough, it may even be possible to do that via parameterised ssh commands and no proxy at all). That does imply that testr itself still needs to be able to talk stdout/stdin. So – because testr has to keep doing that, I’m going to defer tackling this for now: it’s clearly scope creep and as such a dangerous temptation. Layer wise, it’s up to each server to decide how to be responsive when tests are cranky, and how to keep test output from compromising things. That does put the debugger integration work back (or at least, it leaves it as no better than the status quo) but its not in any way prejuidicial to it that I can tell.

Draft RPC spec

RPC packets will be stock subunit packets. Each packet will be for a test called ‘testrepository-rpc’ and contain a ‘application/json’ file attachment (with utf8 encoded text, per the default). The JSON message will be one of the messages defined below.

There are two endpoints, client (the initiator of the connection) and the server. Messages are not idempotent, and may be sent at any time from the client to the server. If a message requires a reply, the server may do so at any time, in any order. Subunit packets may be sent at any time from the client to the server, or the server to the client.

Overall lifecycle of a server:
  1. Client sends a Hello message.
  2. Server sends a Hello response.
  3. Both ends pick the highest common version to define future messages.
  4. Client sends commands, and server actions them.
  5. Client sends a Goodbye message.
  6. Server terminates itself.
Message definitions (version 1):
  • Hello

    Advises the peer of the protocol versions supported.
    {“msg”: “Hello”, “max”: 1, “min”: 1}

  • Goodbye

    Tells the server the client is finished and does not want to run any more tests. The server should cleanup and stop accepting messages. If the server was e.g. a trapdoor into a longer running process, it is undefined whether that longer running process should also terminate or not. No reply is permitted.
    {“msg”: “Goodbye”}

  • List

    Tells the server to list some tests. A “Done” reply is required after all the tests have been listed. The output from the command should be subunit “exists” packets describing the tests that the server can run that were listed in the message. The tests property is optional – if absent, list all available tests.
    {“msg”: “List”, “tests”: [“testid”, …], “nonce”: “arbitrary string here”}

  • Run

    Tells the server to run some tests. A “Done” reply is required after the tests have completed running. The output from the command should be a normal subunit stream resulting from running the tests specified. If the tests property is missing, run all available tests. Tests may be run in whatever order is most useful to the server.
    {“msg”: “List”, “tests”: [“testid”, …], “nonce”: “arbitrary string here”}

  • Done

    Tells the client that some requested command has completed. The nonce must be the nonce for the message that this is in reply to.
    {“msg”: “Done”, “nonce”: “arbitrary string here”}

Implementation sketch

The RPC protocol needs to be accessible to anyone doing this in Python, client *and* server, so subunit seems like the sensible place to define the protocol. It will be pure code – no IO interactions – along with sufficient feature work in subunit’s API to make glueing it into e.g. testrepository and straight forward.

In testr, we’ll look for a new command in .testr.conf, expressed much like the run command, and use that to determine that a server mode has been requested. If the server fails to start up, thats an error (e.g. it is up to users to get compatible code in place). When listing and running tests we’ll reuse the server except in isolation modes – both –isolated and –analyze-isolation – where reusing the server would violate the contract they have.

In, we’ll add a command line flag to opt-in to the server. In the first implementation, the server is going to just be in-line in the call stack ; no threads or anything. So each command will just be an API call within the existing testtools/unitest2 API with a single subunit packet tacked on the end. We may need to do some ugly stuff to get out of the stock run framework – but I think it is doable.