Graceful introduction of test servers

A test server is a small RPC server that we can ask to run some tests without paying the cost of starting a new process each time. Test servers are a necessary precondition for online scheduling of tests (without them, the latency of scheduling a test would be orders of magnitude greater than the time taken to execute it). They could also enable better debugger glue by providing an explicit out-of-band interface.

It’s vital that we don’t break existing users of subunit.run or testrepository when we bring this in: folk don’t react well to having their environment broken. Breaks could occur in several different ways, but let’s assume that an unmodified .testr.conf will not result in the server code being activated. (In theory it would be nice to Just Work and make things better, but there are lots of ways that could fail. For a start, we have no negotiation step with the things we’re running, and anything else, such as exported environment variables, stands a high chance of being eaten by intermediaries like ssh, tox and so on.)

So, assuming a new .testr.conf:

  1. A newer subunit.run running with an old testrepository might drop into server mode and then not actually run any tests.
  2. A newer testrepository with an older subunit.run might not get server mode at all, and then fail without a clean error.

Testr’s run command has two key interfaces with test backends. First, the list interface, where it queries for tests; this is only used when testr needs to know what tests exist (e.g. for offline scheduling). Second, the run interface, where tests are executed.

In the server-based world, testr will have one invoke-a-process interface, and that will offer the two existing interfaces over the basic RPC layer.

To avoid failure 1, we need to ensure we never ask subunit.run to go into server mode unless testrepository itself can handle it. That implies we must not insert whatever change we are making into the run_command in .testr.conf, and must instead use either a variable substitution or a whole new command key to configure it. I’m in favour of a new command key, because it places fewer constraints on implementors in other languages.

To avoid failure 2, we need to be able to determine rigorously whether a process has gone into server mode. For example, the server has to send a handshake command of some sort.

Let’s talk about failure modes that can occur once we have .testr.conf configured and new subunit.run and testrepository code.

  1. We might have version skew between releases of subunit.run and testrepository on future updates to the RPC server.
  2. We might have a broken testr -> server channel.
  3. We might have a broken server -> testr channel.
  4. The server might go off into a busy loop or something.

For 1, we should version the RPC protocol carefully so that any semantic differences can be detected. Obviously there is a tonne of prior art, and everyone is going to scream ‘use gRPC’ (or their favourite RPC system). That’s a very sensible thing to do, and subunit can actually sit on top of pretty arbitrary transports as long as they can handle bytestrings and timestamps. That said, my focus in this iteration is to enable the server, and porting subunit’s transport to something else won’t save time there (the RPC angle is going to be a tiny fraction of the development time). I think a simple (new, old) version scheme will do fine (think autotools library soname calculations). If testr offers (5, 1), it can speak every version from 1 to 5, and subunit.run, as long as it speaks at least one of them, should pick the highest one they share and use that. If at some point we need to drop compatibility with a version entirely, we raise old to the next version up and move on.
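The (new, old) negotiation boils down to intersecting two ranges. A minimal sketch, assuming both peers express their range as a (new, old) pair of integers; `pick_version` is a hypothetical name, not an existing subunit or testr function.

```python
from typing import Optional, Tuple

# A supported-version range written as (new, old), per the scheme above.
VersionRange = Tuple[int, int]


def pick_version(ours: VersionRange, theirs: VersionRange) -> Optional[int]:
    """Return the highest version both peers speak, or None if none overlap.

    Hypothetical helper: only the (new, old) scheme comes from the text.
    """
    our_new, our_old = ours
    their_new, their_old = theirs
    if our_old > our_new or their_old > their_new:
        raise ValueError("ranges must be (new, old) with new >= old")
    highest = min(our_new, their_new)  # newest version both can speak
    lowest = max(our_old, their_old)   # oldest version both still accept
    return highest if highest >= lowest else None
```

For example, `pick_version((5, 1), (3, 2))` gives 3, while `(5, 1)` against `(7, 6)` gives `None`: that is the version-skew case, to be reported as an error rather than guessed around.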

We can deal with 2 by pre-emptively sending a message from testr to the server: a Hello message with the supported versions. Likewise, 3 can be dealt with by not considering the server ‘ok’ until we receive its initial Hello message with a chosen version.
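A sketch of the ‘not ok until Hello’ rule on the testr side. It assumes a reader thread has already decoded incoming RPC packets into dicts and put them on a `queue.Queue`; that plumbing, the `await_server_hello` and `HandshakeError` names, and the timeout policy are all hypothetical. A backend that never enters server mode then yields either a timeout or an unexpected first message, and both become a clean error instead of a hang.

```python
import queue


class HandshakeError(Exception):
    """The backend did not complete the Hello exchange."""


def await_server_hello(incoming, timeout):
    """Wait for the server's Hello, turning silence or surprises into errors.

    `incoming` is assumed to be a queue.Queue of decoded RPC messages filled
    by a reader thread (hypothetical plumbing, not part of the draft spec).
    """
    try:
        msg = incoming.get(timeout=timeout)
    except queue.Empty:
        # E.g. an older subunit.run that never entered server mode.
        raise HandshakeError(f"no Hello from server within {timeout}s") from None
    if msg.get("msg") != "Hello":
        raise HandshakeError(f"expected Hello, got {msg.get('msg')!r}")
    if not (isinstance(msg.get("max"), int) and isinstance(msg.get("min"), int)):
        raise HandshakeError("Hello lacks integer 'max' and 'min' fields")
    # Only now is the server considered 'ok'.
    return msg
```

In this sketch the caller would send its own Hello first, call this, and then pick the highest common version from the two ranges.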

If the server goes into a busy loop, I think we can largely ignore this for now, as it’s no different from today (that is, the user notices, gets annoyed and hits Ctrl-C, or their CI job times out). Being able to discriminate between ‘the server is stuck’ and ‘a test is stuck’ would be good, remembering that in a routed world we don’t necessarily know the end point for any server: it might just be routing.

What else? Well, one long-running thing has been the desire to move away from requiring a clean stdin/stdout for test processes. Being broken when some test code decides to write to stdout is *not cool*, and this new feature seems like an ideal time to address that. We can’t assume working networking (tunnelling over ssh or a container console, for example, are important use cases). We could, however, write a little proxy that uses stdin/stdout and runs no test code, signal (however we’re doing that) that testr is listening on a local port, and tunnel that connection backwards. (If we choose something simple enough, it may even be possible to do that via parameterised ssh commands and no proxy at all.) That does imply that testr itself still needs to be able to talk stdin/stdout. So, because testr has to keep doing that, I’m going to defer tackling this for now: it’s clearly scope creep, and as such a dangerous temptation. Layer-wise, it’s up to each server to decide how to stay responsive when tests are cranky, and how to keep test output from compromising things. That does set the debugger integration work back (or at least leaves it no better than the status quo), but it’s not in any way prejudicial to it that I can tell.

Draft RPC spec

RPC packets will be stock subunit packets. Each packet will be for a test called ‘testrepository-rpc’ and contain an ‘application/json’ file attachment (UTF-8 encoded text, per the default). The JSON message will be one of the messages defined below.
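One way to map a message onto those packet fields, as a sketch. The dict keys deliberately match keyword arguments of python-subunit’s `StreamResultToBytes.status()`, but nothing here imports subunit. The attachment name `rpc` is a hypothetical placeholder (the draft does not name the file), and fitting each message into a single packet, hence `eof=True`, is an assumption.

```python
import json

RPC_TEST_ID = "testrepository-rpc"
RPC_MIME_TYPE = "application/json"
RPC_FILE_NAME = "rpc"  # hypothetical: the draft spec does not name the file


def rpc_packet_fields(message):
    """Build one RPC packet's fields as status()-style keyword arguments."""
    return {
        "test_id": RPC_TEST_ID,
        "file_name": RPC_FILE_NAME,
        "mime_type": RPC_MIME_TYPE,
        # JSON text encoded as UTF-8, the default for application/json.
        "file_bytes": json.dumps(message).encode("utf-8"),
        # Assumes each message fits in one packet, so the file is complete.
        "eof": True,
    }


def rpc_message_from_fields(test_id, mime_type, file_bytes):
    """Decode a received packet; return None if it is ordinary test output."""
    base_type = (mime_type or "").split(";")[0].strip()  # ignore parameters
    if test_id != RPC_TEST_ID or base_type != RPC_MIME_TYPE:
        return None
    message = json.loads(file_bytes.decode("utf-8"))
    if not isinstance(message, dict) or "msg" not in message:
        raise ValueError("RPC payload must be a JSON object with a 'msg' key")
    return message
```

With python-subunit installed, `subunit.StreamResultToBytes(stream).status(**rpc_packet_fields(msg))` should write one such packet to a binary `stream`.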

There are two endpoints: the client (the initiator of the connection) and the server. Messages are not idempotent, and may be sent from the client to the server at any time. If a message requires a reply, the server may send it at any time, in any order. Subunit packets may be sent at any time in either direction, from the client to the server or from the server to the client.
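Because replies may arrive in any order, the client has to match each Done to its request by nonce rather than by position. A minimal sketch; the `PendingRequests` class and the `req-N` nonce format are illustrative choices, not part of the draft.

```python
import itertools


class PendingRequests:
    """Client-side bookkeeping for commands still awaiting a Done reply."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._pending = {}  # nonce -> the request message that used it

    def new_request(self, msg_type, tests=None):
        """Create a List or Run message with a fresh nonce and track it."""
        nonce = f"req-{next(self._counter)}"
        message = {"msg": msg_type, "nonce": nonce}
        if tests is not None:  # omitting "tests" means "all tests"
            message["tests"] = list(tests)
        self._pending[nonce] = message
        return message

    def on_done(self, reply):
        """Retire the request this Done refers to and return it."""
        try:
            return self._pending.pop(reply["nonce"])
        except KeyError:
            raise ValueError(f"Done for unknown nonce {reply.get('nonce')!r}") from None

    @property
    def outstanding(self):
        return len(self._pending)
```

Since messages are not idempotent, a Done for an unknown or already-retired nonce is treated here as a protocol error rather than silently ignored.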

Overall lifecycle of a server:
  1. Client sends a Hello message.
  2. Server sends a Hello response.
  3. Both ends pick the highest common version to define future messages.
  4. Client sends commands, and server actions them.
  5. Client sends a Goodbye message.
  6. Server terminates itself.
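The lifecycle can be modelled as a small state machine that performs no IO: it takes one decoded message and returns the replies to send. This is a hypothetical sketch (the `ServerSession` class and its `execute` callback are not an existing API), and it only knows version 1, the only version defined so far.

```python
class ServerSession:
    """Server side of the RPC lifecycle, as a pure state machine (no IO).

    Hypothetical sketch: feed() takes one decoded message and returns the
    list of messages to send back. Running tests is delegated to `execute`,
    a callable supplied by the embedding code.
    """

    SUPPORTED_MAX = 1  # newest protocol version this server speaks
    SUPPORTED_MIN = 1  # oldest protocol version this server speaks

    def __init__(self, execute):
        self.execute = execute
        self.version = None  # agreed version; None until the Hello exchange
        self.finished = False

    def feed(self, message):
        kind = message.get("msg")
        if self.finished:
            raise RuntimeError("message received after the session ended")
        if self.version is None:
            # Steps 1-3: the first message must be the client's Hello.
            if kind != "Hello":
                raise RuntimeError(f"expected Hello, got {kind!r}")
            highest = min(message["max"], self.SUPPORTED_MAX)
            lowest = max(message["min"], self.SUPPORTED_MIN)
            if highest >= lowest:
                self.version = highest
            else:
                # No common version: both ends can see this from the two
                # Hello messages, so reply anyway and then stop.
                self.finished = True
            return [{"msg": "Hello", "max": self.SUPPORTED_MAX, "min": self.SUPPORTED_MIN}]
        if kind == "Goodbye":
            # Steps 5-6: no reply is permitted; the caller should now exit.
            self.finished = True
            return []
        if kind in ("List", "Run"):
            # Step 4: carry out the command inline, then acknowledge it.
            self.execute(message)
            return [{"msg": "Done", "nonce": message["nonce"]}]
        raise RuntimeError(f"unknown message type {kind!r}")
```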

Message definitions (version 1):
  • Hello

    Advises the peer of the protocol versions supported.
    {"msg": "Hello", "max": 1, "min": 1}

  • Goodbye

    Tells the server the client is finished and does not want to run any more tests. The server should clean up and stop accepting messages. If the server was, e.g., a trapdoor into a longer-running process, it is undefined whether that longer-running process should also terminate. No reply is permitted.
    {"msg": "Goodbye"}

  • List

    Tells the server to list some tests. A “Done” reply is required after all the tests have been listed. The output from the command should be subunit “exists” packets describing those of the tests listed in the message that the server can run. The tests property is optional: if it is absent, list all available tests.
    {"msg": "List", "tests": ["testid", …], "nonce": "arbitrary string here"}

  • Run

    Tells the server to run some tests. A “Done” reply is required after the tests have completed running. The output from the command should be a normal subunit stream resulting from running the tests specified. If the tests property is missing, run all available tests. Tests may be run in whatever order is most useful to the server.
    {"msg": "Run", "tests": ["testid", …], "nonce": "arbitrary string here"}

  • Done

    Tells the client that some requested command has completed. The nonce must be the nonce for the message that this is in reply to.
    {"msg": "Done", "nonce": "arbitrary string here"}
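Because every message is a JSON object, a receiver can check it before acting on it. A hedged sketch of a version 1 validator derived only from the definitions above; the `validate_v1` name is hypothetical, and tolerating unknown extra fields (as this sketch does) is a design choice the draft doesn’t settle.

```python
# Required fields for each version 1 message type, from the definitions above.
# "tests" is optional for List and Run, so it is only checked when present.
REQUIRED_FIELDS = {
    "Hello": ("max", "min"),
    "Goodbye": (),
    "List": ("nonce",),
    "Run": ("nonce",),
    "Done": ("nonce",),
}


def validate_v1(message):
    """Raise ValueError unless `message` is a well-formed version 1 message."""
    if not isinstance(message, dict):
        raise ValueError("a message must be a JSON object")
    kind = message.get("msg")
    if not isinstance(kind, str) or kind not in REQUIRED_FIELDS:
        raise ValueError(f"unknown message type {kind!r}")
    missing = [field for field in REQUIRED_FIELDS[kind] if field not in message]
    if missing:
        raise ValueError(f"{kind} is missing {missing}")
    if kind == "Hello":
        low, high = message["min"], message["max"]
        # Assumes versions are positive integers, starting from 1.
        if not (isinstance(low, int) and isinstance(high, int) and 1 <= low <= high):
            raise ValueError("Hello needs integers with 1 <= min <= max")
    if kind in ("List", "Run") and "tests" in message:
        tests = message["tests"]
        if not (isinstance(tests, list) and all(isinstance(t, str) for t in tests)):
            raise ValueError("'tests' must be a list of test id strings")
```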

Implementation sketch

The RPC protocol needs to be accessible to anyone doing this in Python, client *and* server, so subunit seems like the sensible place to define it. It will be pure code (no IO interactions), along with sufficient feature work in subunit’s API to make gluing it into e.g. testrepository and subunit.run straightforward.

In testr, we’ll look for a new command in .testr.conf, expressed much like the run command, and use that to determine that server mode has been requested. If the server fails to start up, that’s an error (i.e. it is up to users to get compatible code in place). When listing and running tests we’ll reuse the server, except in the isolation modes (both --isolated and --analyze-isolation), where reusing the server would violate their contract.
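A sketch of that opt-in check, assuming .testr.conf keeps its usual INI layout with `test_command` in the `[DEFAULT]` section. The draft does not name the new key, so `test_server_command` below is a hypothetical placeholder, as are the function names.

```python
import configparser

# Hypothetical key name: the text only says "a new command key".
SERVER_KEY = "test_server_command"


def server_command(conf_text):
    """Return the configured server command, or None if not opted in.

    An unmodified .testr.conf (only test_command) yields None, so the server
    code is never activated by accident. Interpolation is disabled so that
    shell-style $VARIABLES in the commands are passed through untouched.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser.defaults().get(SERVER_KEY)


def may_reuse_server(isolated, analyze_isolation):
    """Reuse one server process unless an isolation mode forbids it."""
    return not (isolated or analyze_isolation)
```

Keeping detection to a separate key means an old testrepository simply never reads it, which is exactly the property needed to avoid the first failure mode.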

In subunit.run, we’ll add a command-line flag to opt in to the server. In the first implementation, the server is going to be in-line in the call stack; no threads or anything. So each command will just be an API call within the existing testtools/unittest2 API, with a single subunit packet tacked on the end. We may need to do some ugly stuff to get out of the stock run framework, but I think it is doable.
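To make the in-line idea concrete, here is a sketch of a single-command handler that runs on the caller’s stack. It swaps in the standard library’s `unittest` for testtools/unittest2 purely to stay self-contained. `available`, `result` and `emit_exists` are hypothetical stand-ins for the test index, a subunit-emitting result object, and whatever writes “exists” packets.

```python
import unittest


def index_tests(suite):
    """Flatten a (possibly nested) suite into {test id: test case}."""
    found = {}
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            found.update(index_tests(item))
        else:
            found[item.id()] = item
    return found


def handle_command(message, available, result, emit_exists):
    """Run one List or Run command inline (no threads) and return the Done."""
    wanted = message.get("tests")
    # No "tests" property means every available test; unknown ids are skipped.
    ids = sorted(available) if wanted is None else [t for t in wanted if t in available]
    if message["msg"] == "List":
        for test_id in ids:
            emit_exists(test_id)
    elif message["msg"] == "Run":
        unittest.TestSuite(available[t] for t in ids).run(result)
    else:
        raise ValueError(f"not a command: {message['msg']!r}")
    # The single extra packet tacked on the end: the Done reply.
    return {"msg": "Done", "nonce": message["nonce"]}
```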