Bootstrapping developer environments for OpenStack

When I first joined OpenStack, it was incredibly hard to get a development environment working. It’s better now in some ways, but not in others.

There are three key things to get right.

Firstly, if you’re going to run devstack (or equally tempest-on-devstack), make a dedicated VM to do it. Use whatever hypervisor you want, but don’t you dare run it on baremetal. Yes, it can work on baremetal, but you’ll spend way more time learning about quirks than getting stuff done. Once you’re familiar with devstack you can make an educated choice.

Secondly, if you’re not running devstack [e.g. you’re just hacking on an oslo library or gertty or something], there are a bunch of system packages that need installing so that various Python packages can compile. This was one of the hardest parts of getting going when I started, and it’s still pretty hard. I started the bindep project to address this, and it’s in the process of being rolled out within OpenStack projects now. If the project you’re hacking on has an other-requirements.txt file, then you can just do sudo apt-get install $(bindep -b) – but you’ll need bindep installed first, and it’s not in distros yet. If there is no such file, there is a master fallback list which brings in everything maintained by the OpenStack Infra team. Download it somewhere and add -f bindep-fallback.txt to your bindep call.
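
For example (these are exactly the commands described above; bindep itself comes from PyPI):

    # project with an other-requirements.txt in its tree
    sudo apt-get install $(bindep -b)
    # no other-requirements.txt: use the downloaded fallback list
    sudo apt-get install $(bindep -b -f bindep-fallback.txt)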

Which brings me to the last point: how do you get bootstrapped with bindep and tox and all the other things you’ll need? The OpenStack developer docs tell you to install system-wide packages using apt-get install, and then to install tox system-wide with pip. This usually works, but it isn’t suitable for multi-user machines, some distributions don’t include enough metadata for pip to do the right thing, and it’s confusing to have two different systems managing the same packages.

I prefer to use virtualenv directly, with no reliance on system packages other than truly global things like Python itself. virtualenv is a lovely thing, and it bundles three other important Python tools – pip, wheel and setuptools. Combined, these let you get a fully up-to-date development environment going without root access – and without confusing your package manager. I still use it even with Python versions that include the venv module, because having the bundled components be up to date is important.

With a virtualenv to host tox and bindep and any other globally-relevant tools, you can run those tools without interacting with your system package manager at all [except when you finally remove the version of Python they use from the system].

Here is how I accomplish this.

  1. Start with Python installed [any version, 2.7 or newer].
  2. Pick a place to make the virtualenv, e.g. ~/devtools.
  3. Get a copy of virtualenv to run locally from source. 13.1.0 is a good safe version (at time of writing).
  4. cd into virtualenv-13.1.0 (or whatever version you fetched).
  5. Run python ./virtualenv.py ~/devtools
  6. Activate the virtualenv
  7. Install bindep and tox and anything else you need globally using pip. The pip in the virtualenv will be nice and fresh so you’ll get full caching and so on.
  8. (optional) symlink those tools into your user specific path. E.g. from ~/devtools/bin to ~/bin, or some similar directory.
    There are two key gotchas here. Firstly, if you use a bind-mounted home directory in containers, you won’t want this in your user path, since venvs are not very portable. Instead make a per-container path directory you can symlink into.
    Secondly, there is cruft in ~/devtools/bin, so don’t even for a second think of just adding it to your PATH: you’ll end up running the python within the virtualenv by mistake at some point and deeply regretting it.
    If you don’t do this, then just activate the virtualenv any time you want to use the tools.
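
Put together, the whole sequence looks something like this (version number and paths are illustrative):

    # assuming a downloaded and unpacked virtualenv-13.1.0 tarball
    cd virtualenv-13.1.0
    python ./virtualenv.py ~/devtools
    . ~/devtools/bin/activate
    pip install bindep tox
    # optional: expose just the tools, not the whole bin directory
    ln -s ~/devtools/bin/bindep ~/devtools/bin/tox ~/bin/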

And now you can circle back, use bindep to install binary dependencies, and off you go!

Revisiting the Fixture API – handling leaky resources

Fixtures are one of the innovations I’m most happy with.

A Fixture is an enhanced context manager. The enhancements are:

  • There’s an API for gathering debugging information from the fixture (rather than depending on side effects such as the logging module or stdout). This makes it easy to attach log files from servers (for instance rabbitfixture does this).
  • There is glue to support composing other fixtures while still exposing errors from any fixture in the composed set.

OpenStack’s Neutron has been using fixtures in its test suite for some time, but is finding that writing correct fixtures is hard. In particular, they were leaking processes when a fixture failed during setUp / __enter__ – and the fixture was then not cleaned up by the testtools / fixtures useFixture function.

There are several things we can do to improve the situation.

  • We could make the convenience APIs like useFixture add a try:/finally: and call cleanUp() when setUp fails (see the sketch after this list). This involves making cleanUp() be callable in more situations than it is today.
  • We could make setUp itself do that, advising users to override a different function; this would hide the failure interactions internally, but wouldn’t benefit existing fixtures until they are rewritten to not override setUp.
  • We could provide a decorator that folk with fragile setUps (e.g. those that involve IO) could use to robustify their fixtures.
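
To make the first option concrete, here is a minimal sketch of what a cleanup-on-failure useFixture could look like – assuming cleanUp() is made safe to call after a failed setUp, which the protocol does not promise today:

    def useFixture(self, fixture):
        """Use fixture in a test, releasing resources even if setUp fails."""
        try:
            fixture.setUp()
        except Exception:
            # Relies on cleanUp() being callable after a failed setUp -
            # exactly the protocol extension discussed above.
            fixture.cleanUp()
            raise
        self.addCleanup(fixture.cleanUp)
        return fixture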

The highest leverage change is the first, but is it safe and suitable? Let’s look at PEP-343.

In PEP-343 we see the following translation of the with statement:

with EXPR as VAR:
    BLOCK

is translated into:

mgr = (EXPR)
exit = type(mgr).__exit__
value = type(mgr).__enter__(mgr)
exc = True
try:
    try:
        VAR = value
        BLOCK
    except:
        exc = False
        if not exit(mgr, *sys.exc_info()):
            raise
finally:
    if exc:
        exit(mgr, None, None, None)

Note that __enter__ is called outside the try block: if it raises, __exit__ is never invoked. This means that using a Fixture which may leak external resources when setUp fails is unsafe via with – so making useFixture clean up on failure would fix only one of the two ways fixtures get used. Therefore we can’t use the first solution.
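
A tiny illustration of the consequence (purely for demonstration):

    # If __enter__ raises, __exit__ is never invoked - whatever was
    # acquired before the failure has leaked.
    class Leaky(object):
        def __enter__(self):
            self.f = open('/tmp/scratch', 'w')   # resource acquired...
            raise RuntimeError('setUp failed')   # ...then setup fails
        def __exit__(self, *exc_info):
            self.f.close()                       # never reached

    try:
        with Leaky():
            pass
    except RuntimeError:
        pass  # self.f is still open; nothing will close it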

Decorators are nice, but somewhat noisy and opt-in. Both decorators and a different setUp in the base class will require extending the protocol to specify more precisely when cleanUp can be called.

If we make the documentation advise users to override a specific method, and have setUp itself clean up in the event of failure, I think we’ll have somewhat more uptake. So – that’s the route I’m going to head down.

There’s one more thing to consider: access to debugging information for failures in setUp. Since the object will have been cleaned up, accessing logs etc. will be hard. I think if we raise an additional exception into the MultiException with the details objects, it will be possible for fixtures to provide those details, though they will need buffering in memory (or some sophisticated lazy-delete logic such as holding a reference to an unlinked fd).

Improving dependency handling upstream (for OpenStack)

This is, in part, a follow up to my post a few weeks ago.

I want to touch on the things we need to improve to have robust plumbing supporting OpenStack’s CI and devstack needs.

Extras

We want to be able to use ‘extras’ to declare the dependencies needed for different backends. This is a setuptools requirements syntax where a project can advertise additional dependencies for different use cases, which users (or other depending projects) can then trigger using '[]'. E.g. 'pip install requests[security]' says ‘install requests plus the additional security extras’. We don’t know yet whether we will use 'nova[mysql]' or 'nova oslo.db[mysql]', but something like that. To use this we need to:

  1. teach pbr about reflecting requirements into the 'extras_require' keyword to setup() (because while setuptools supports it in setup.py, we want a constant-value setup.py, with everything about individual projects declarative). James Polley has a patch for pbr.
  2. Fix pip to handle 'pip install ./nova[mysql]'. This is issue 1236 – which has an open PR that may fix it. We should help review and test it.
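
For illustration, the raw setuptools form that pbr would need to generate might look like this (project and package names purely illustrative):

    # sketch: what a generated setup() call could contain
    from setuptools import setup

    setup(
        name='example-server',
        install_requires=['oslo.db'],
        extras_require={
            'mysql': ['PyMySQL'],
            'postgresql': ['psycopg2'],
        },
    )

With that in place, pip install ./example-server[mysql] would pull in PyMySQL as well.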

Testing different setups may well need a similar facility, but it’s not clear yet how best to express that. We may need to standardise on an extra called 'test' and just ensure our tox.ini knows to install it. That would be nice anyway, to get away from having to know about 'test-requirements.txt'.

pip dependency resolution

Currently pip has a very straightforward resolution algorithm: only user-supplied requirements can conflict at all, and the first mention of any distribution selects the version to install – all later mentions are simply ignored. This is issue 988, and it’s one of a cluster that affect OpenStack. The impact on OpenStack is that things install ok with pip, and then break in CI, because an incompatible version is installed. I have a patch up for this. Early adopters solicited!

incremental installations need dependency resolution

Say you’ve installed Neutron, which depends on oslo.db >=1.10. And you then install an older Nova which depends on oslo.db <1.10. What should happen? Ideally an error, because the requirements are disjoint; and if they do overlap, the installed version should be adjusted to be compatible. Right now, no error occurs and oslo.db will be downgraded, breaking Neutron. This is pip issue 2687. Currently no-one is working on this, and since it requires dependency resolution, fixing 988 first makes a lot of sense. It should be possible to at least make things error with a much shallower patch, though, if someone wished to work on it right now – or you could build on top of my resolver branch. This has also been a cause of numerous CI failures when we do releases, typically right around the time the servers branch. One thing that might be nice for us, since we know a full set of working packages, is to be able to tell pip upfront what versions are compatible, and then let only the needed things be brought in: that is pip issue 2731.

PEP-426 environment markers need polish

PEP-426 introduced a micro-language for describing the situations when a particular dependency applies. For instance, to use argparse on Python < 2.7, you can say "python_version<'2.7'" as a marker for the argparse entry in your requirements. But there are some rough edges.

  • Some comparison operators are missing.
  • The documentation and user guidance needs improvement.
  • Environment markers can’t be used inside individual requirements, only as a filter on extras_require. To express the argparse example above today (using a working operator), you need to pass the following to setup().
    extras_require={':python_version=="2.6"': ['argparse']}

    It would be more straightforward to permit the syntax pip supports, where each requirement can be annotated with a marker.

    install_requires=['argparse:python_version=="2.6"']

    This might be setuptools issue 353.

  • pbr doesn’t reflect environment markers from its input files (requirements.txt etc) into the setup keyword arguments. James Polley has a patch for this (the same one enabling extras support in setup.cfg).
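
Putting those fragments together, a complete (hypothetical) setup() call using today’s filter form would be:

    # markers usable only as extras_require filters today; a ':marker'
    # key means 'apply these requirements when the marker matches'
    from setuptools import setup

    setup(
        name='example',
        install_requires=['six'],
        extras_require={
            ':python_version=="2.6"': ['argparse'],
        },
    )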

pip handling setup_requires

We run into setup_requires in two places in OpenStack: firstly, we use it ourselves for pbr, but to avoid triggering easy_install we manually install pbr everywhere ourselves; secondly, projects in the transitive dependencies of OpenStack use setup_requires, and we end up triggering easy_install for them. easy_install is a concern for us because of its decreased reliability, issues with corporate egress firewalls, and security that is not as robust as pip’s – and there’s no reason it should be, with pip being such a good tool.

However pip can’t handle setup_requires today. Doing so requires changes to setuptools and to pip.

  • setuptools needs some way to report to pip what the setup_requires are without triggering easy_install. Ronny Pfannschmidt has mentioned he may be working on this, but I’m not sure if there is a patch ready or not. A possible further enhancement would be to put the setup_requires in setup.cfg in a totally declarative fashion, but this may require environment marker support first, since the current procedural approach is very flexible and can take Python version and platform into account.
  • pip needs to be able to temporarily put things that it won’t be installing onto the PYTHONPATH for packages it is building. The current internals are not suited for this (the target and source and needs of requirements being downloaded are all confounded). However, once my resolver patch lands there will be a nice cache layer that can deliver a ready-to-install directory for any requirement, which should make a simple recursive implementation quite reasonable. The resolver work will probably need further refactoring to decouple the resolver from the user-supplied requirements, but compared to the ground already covered, that should be straightforward. One thing folk tackling this should be aware of is an open question around location requirements: say someone is installing foo from a git repository, and foo is also a setup requirement of some other package bar being installed at the same time. Should that foo from git be used for the setup of bar? I’m not sure of the answer (what if the version of foo is incompatible with the version bar needs?) – but one is needed :).
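
For context, the setup_requires usage in a pbr-consuming project’s setup.py is tiny – and it is this line that triggers easy_install when pbr is not already installed:

    # the standard pbr bootstrap in an OpenStack project's setup.py
    import setuptools

    setuptools.setup(
        setup_requires=['pbr'],
        pbr=True)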

So that’s about it – if you’re interested in helping with the plumbing that supports OpenStack’s CI and devstack systems, please pick one of these issues and help out. Test patches, review code, write a patch, or just tell me why we don’t need to do something :)

Dealing with deps in OpenStack

We’ve got a problem in OpenStack: dependency management.

In this post I explore it as input to the design summit session on this in Vancouver.

Goals

We have some goals that are broadly agreed:

  1. Guarantee co-installability of a single release of OpenStack
  2. Be able to deliver known-good installs of OpenStack at any point in time – e.g. ‘this is known to work’
  3. Deliver good, clear dependency metadata to redistributors
  4. Support CD deployments of OpenStack from git. Both production and devstack for developers to hack on/with
  5. Avoid firedrills in CI – both internal situations where we run incompatible things we produced, and external situations where some dependency releases a broken version, like the pycparsing one last week
  6. Deployments using the Python dependencies should be up to date and secure
  7. Support doing upgrades in the same Python environment

Assumptions

And we have some baseline assumptions:

  1. We cooperate with the Python ecosystem – publishing our libraries to PyPI for instance
  2. Every commit of server projects is a ‘release’ from the perspective of e.g. schema management
  3. Other things release when they release, not per-commit

The current approach uses a single global list of acceptable install-requires for all our projects, and then merges that into the git trees being tested during the test. Note in particular that this doesn’t take place for things not being tested, which we install from PyPI. We create a branch of that global list for each stable release, and we also create branches of nearly everything when we do the stable release – a system that has evolved in part due to the issues in CI when new releases would break stable releases. These new branches have tightly defined constraints, e.g. “DEP >= version-at-this-release, < next-point-release”. The idea behind this is that if the transitive closure of deps is constrained, we can install such a version from PyPI and it won’t bring in a different version.

One of the reasons we needed that was pip bug 988, where pip takes the first occurrence of a dependency: servers would depend on oslo.utils, which would depend on an unversioned cliff or some such, and if cliff wasn’t already installed we’d get the next release’s cliff. Now – semver says we’re keeping those things compatible, but mistakes happen, and for stable branches there’s really little reason to upgrade.

Issues

We have some practical issues with the current system:

  1. It takes just one uncapped dependency anywhere in the wider ecosystem (including packages outside of OpenStack) on something we wanted to stay unchanged: if that dep is encountered first by the pip scanner… game over. Worse, there are components out there that introspect the installed dependencies and fail hard if one is not listed as compatible, which takes a ‘testing with unexpected version’ situation and makes it a hard error
  2. We have to run stable branches for everything, even things like OpenStackClient which are intended for end users, and are aimed at a semver rather than branched release model
  3. Due to pip bug 2687, each time we call pip we may introduce version skew that breaks the gate
  4. We don’t deliver goal 1:- because we override the requirements at test time, the actual co-installability may be different, and we don’t know
  5. We deliver goal 2 but it’s hard to use:- you have to dig through a specific CI log, and if the CI system has pruned it, you’re toast
  6. We don’t avoid external firedrills:- because most of our external dependencies are broad, external releases break us trivially and frequently
  7. Lastly, our requirements are too tight to support upgrades: if bug 2687 were fixed, installing the first upgraded server component would error because its requirements are declared as being incompatible with the last release.

We do deliver goals 3, 4 and 6 though, which is good.

So what can we do differently? In an ideal world, can we get all 7 goals?

Proposal

I think we can. Here’s one way it could work:

  1. We fix the two pip bugs above (I’m working on that now)
  2. We teach pip about constraints that apply *if* something is requested, without the constraints file itself requesting anything
  3. We change our project overrides in CI to use a single constraints file rather than merging into each project’s requirements
  4. The single constraints file would be exactly specified: “DEP == VERSION”, not semver ranges or compatible-release matches.
  5. We make changes to the single constraints file by running a proposed set of constraints
  6. We find out that we should change the constraints file by having a periodic task which compares the constraints file to the published versions on  PyPI and proposes changes to the constraints repository automatically
  7. We loosen up the constraints in all our release branches to permit upgrade co-installability

And some optional bits…

  1. We could start testing new libraries against old servers again
  2. We could potentially change our branching strategy for non-server components, but I don’t think it harms things – it may just be unnecessary
  3. We could add periodic jobs for testing with unreleased versions of dependencies

Working through each point: bug 988 causes compatible requirements to be ignored – if we have one constraint of “X > 1.4” and another of “X > 1.3, != 1.5.1”, but the “> 1.4” constraint is encountered first, we can end up with 1.5.1 installed, violating a known-bad constraint. Fixing this means that rather than needing global knowledge of deps at the point where pip is invoked, we can have local knowledge about compatible versions in each package, and as long as the union of requirements is satisfiable, we’ll be ok.

Bug 2687 causes the constraints that thing A had when it was installed by pip to be ignored when checking requirements for thing B. For instance, pip install python-openstackclient after pip install nova will meet python-openstackclient’s requirements even if that means breaking nova’s requirements.

The reason we can’t just use a requirements file today is that a requirements file specifies what needs to be installed as well as what versions are acceptable. We don’t want devstack, when configured for nova-network, to install neutron dependencies – but it would today, unless we put in place a bunch of complex processing logic. Whereas pip could do this very easily internally.

Merging each requirement into the things we’re installing from git fails when we install releases – e.g. of client libraries – in particular because of the interactions with bug 988 above. A single constraints file could include all known-good versions of everything we might use, and would apply globally in concert with local project requirements. Best of both worlds, in theory :)
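
To make that concrete: the constraints file would just be a list of exact pins, applied alongside each project’s own requirements. A sketch (file name, package versions and the pip option reflect the proposal’s shape, not released pip):

    # upper-constraints.txt
    oslo.db==1.12.0
    oslo.utils==1.6.0
    cliff==1.13.0

    # constrain everything, install only what nova actually requires
    pip install -c upper-constraints.txt -e ./nova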

The use of inexact versions is a hard limitation today – we can’t upgrade multiple project trees’ local version needs atomically, and because we’re supplying all the version constraints in one place – the project’s merged install_requirements – they have to be broad enough to co-exist during changes to the requirements, and to remain co-installed during upgrades from release to release of OpenStack. But inexact versions lead to variation in CI – every single run becomes a gamble. The primary goal of CI is to tell us whether a new commit X meets all of our quality criteria – change one thing at a time. Running with every new version of every dependency doesn’t tell us more about X, it tells us about the ecosystem. Using exact constraints will solve this: we’ll decouple ‘update dependencies’ or ‘pycparsing Y is broken’ from testing X – e.g. ‘improve nova cells’.

We need to be able to update those dependencies though, and the existing global requirements mechanisms are pretty much right, they just need to work with a constraints file instead of patching each repo at test time. We will still want to check that the local requirements are compatible with the global constraints file.

One of the big holes such approaches have is that we may miss out on important improvements – security, performance or just plain old features – if we don’t update our constraints. So we need to be on top of that. A small amount of automation can give us a lot of assistance on that. Just try the new versions and if they work – great. If they don’t, show a failing proposal where we can assess what to do.

As I mentioned earlier, today we can’t actually upgrade: kilo’s version locks exclude liberty versions of our libraries, meaning that trying to upgrade nova/kilo to nova/liberty will bring in library versions that conflict with the version deps neutron expresses. We need to open up the project-local requirements to avoid this – and we also need to make some guarantees about compatibility with our prior release in our library development (otherwise rebooting a server with only one component upgraded will be a gamble).

Making those guarantees will either require testing every commit against the prior servers, or, if we can find some way of doing it, testing proposed releases against the prior servers – which would allow more latitude during development of our libraries. The use of constraints files will give us hermetic insulation against bad releases though – we’ll be able to stay productive while we fix the issue and make a new, better release. The crucial thing is to have a tight feedback loop – so I’m in favour of either testing each commit against last-stable, or figuring out the ‘tests before releases’ logic (perhaps by removing direct tag access and instead having a review where we propose the intent to release).

All this might be enough that we choose to make fewer stable branches of libraries and go back to plain semver – but it’s not a requirement: that’s something we can discuss in detail if people care, or just wait and see what the overheads and benefits of keeping those branches are.

Lastly, this new structure will make it possible, if we want to, to test that unreleased versions of external dependencies work with a given component, by using a periodic job. Why periodic? There are two sides to each dependency, and neither side would want their gate to wedge if an accident breaks the other side. E.g. using two of our own components – oslo.messaging and nova: oslo.messaging releases must not break nova, but an individual oslo.messaging commit isn’t necessarily constrained (if we have the before-release testing described above). External dependencies are exactly the same, except even less closely aligned than intra-OpenStack components. So running tests with a git version of e.g. libvirt in a periodic job might give us (and libvirt) valuable prior warning about issues.

A few thoughts on defects vs bugs vs blueprints vs tasks

One of the constant debates while I’ve been programming has been that of how to organise work. Are bugs different to blueprints?

I’ve been mulling over a new tracker for a while (just because none of the ones out there /really/ fit me) and was thinking about this angle over the weekend.

Defects/bugs/crash reports/blueprints/specs all share two common themes: firstly, they are associated with a delta between someone’s desired behaviour of the code and the actual code. Secondly, they may represent a commitment to actually enact that change, but such a commitment is not a guaranteed feature of any of these things. One can write a specification and fail to get consensus and agreement on it in the community. The presence of a bug report is not agreement that the thing is wrong.

I think tasks then are a good place to factor out such commitments – they probably want to be linked to the reason for the task, design documents, crash data etc. A number of projects I’ve been involved with have used Kanban boards to manage their work in progress – essentially each card on the board is a task, not a bug/spec/etc/etc.

Another interesting thing about considering the resulting task as a separate thing is that it provides an understandable boundary between ‘my view’ and ‘your view’ for scheduling work on a shared codebase. E.g. consider a codebase with many drivers, such as Neutron. One organisation may care a great deal about the driver for switch vendor X. Another may care about vendor Y – they likely won’t both consider bugs in the other’s driver to be of equal importance. So priority or importance is only understandable in the context of a single organisation, whereas something like impact is relevant in the context of users. Users can probably be partitioned up too – users of driver X will naturally be much more impacted by issues within driver X – but users generally are not as heavy consumers of project trackers as developers are.

I’ve no final conclusion to draw here yet, just starting a discussion on it :)

What poles for the tent?

So Monty and Sean have recently blogged about the structures (1, 2) they think may work better for OpenStack. I like the thrust of their thinking but had some mumblings of my own to add.

Firstly, I very much like the focus on social structure and needs – what our users and deployers need from us. That seems entirely right.

And I very much like getting away from the TC picking winners and losers. That was never an enjoyable thing when I was on the TC, and I don’t think it has made OpenStack better.

However, the thing that picking winners and losers did was allow users to pick an API and depend on it, because it was the ‘X API for OpenStack’. If we don’t pick winners, then there is no way to say that something is the ‘X API for OpenStack’, and that means there is no forcing function for consistency between different deployer clouds. And this appears to be why Ring 0 is needed: we think our users want consistency in being able to deploy their application to Rackspace or HP Helion. They want vendor neutrality, and by giving up winners-and-losers we give up vendor neutrality for our users.

That’s the only explanation I can come up with for needing a Ring 0 – because it’s still winners and losers: picking an arbitrary project (keystone, say) and grandfathering it in, if you will. If we really want to get out of the role of selecting projects, I think we need to avoid this. And we need to avoid it without losing vendor neutrality (or we need to give up the idea of vendor neutrality).

One might say that we must pick winners for the very core just by its nature, but I don’t think that’s true. If the core is small, many people will still want vendor neutrality higher up the stack. If the core is large, then we’ll have a larger % of APIs covered and stable, granting vendor neutrality. So a core with fixed APIs will be under constant pressure to expand: not just from developers of projects, but from users that want API X to be fixed and guaranteed available and working a particular way at [most] OpenStack clouds.

Ring 0 also fulfils a quality function – we can check that it all works together well, in a realistic timeframe, with our existing tooling. We are essentially proposing to pick functionality that we guarantee to users, an API for it that they have everywhere, and the matching implementation we’ve tested.

To pull from Monty’s post:

“What does a basic end user need to get a compute resource that works and seems like a computer? (end user facet)

What does Nova need to count on existing so that it can provide that?”

He then goes on to list a bunch of things, but most of them are not needed for that:

We need Nova (it’s the only compute API in the project today). We don’t need keystone (Nova can run in noauth mode, and deployers could just put e.g. Apache auth on top). We don’t need Neutron (Nova can do that itself). We don’t need cinder (use local volumes). We need Glance. We don’t need Designate. We don’t need a tonne of stuff that Nova has in it (e.g. quotas) – end users kicking off a simple machine have -very- basic needs.

Consider the things that used to be in Nova: Deploying containers. Neutron. Cinder. Glance. Ironic. We’ve been slowly decomposing Nova (yay!!!) and if we keep doing so we can imagine getting to a point where there truly is a tightly focused code base that just does one thing well. I worry that we won’t get there unless we can ensure there is no pressure to be inside Nova to ‘win’.

So there’s a choice between a relatively large set of APIs that make the guaranteed-available APIs comprehensive, or a small set that will give users what they need just at the beginning but might not be broadly available – leaving us depending on some unspecified process for the deployers to agree and consolidate around which ones they make available consistently.

In short, one of the big reasons we were picking winners and losers in the TC was to consolidate effort around a single API – not implementation (keystone is already on its second implementation). All the angst about defcore and compatibility testing is going to be multiplied when there is lots of ecosystem choice around APIs above Ring 0, and the only reason that won’t be a problem for Ring 0 is that we’ll still be picking winners.

How might we do this?

One way would be to keep picking winners at the API definition level but not the implementation level, and make the competition be able to replace something entirely if they implement the existing API [and win the hearts and minds of deployers]. That would open the door to everything being flexible – and it’s happened before, with Keystone.

Another way would be to not even have a Ring 0. Instead, have a project/program that is aimed at delivering the reference API feature-set built out of a single, flat Big Tent – and allow that project/program to make localised decisions about what components to use (or not). Testing that all those things work together is not much different from the current approach, but we’d have separated out the building of a product (Ring 0 is clearly a product) as a single cohesive entity, distinct from the projects that might go into it. Projects that have unstable APIs would clearly be rejected by this team; projects with stable APIs would be considered, etc. This team wouldn’t be the TC: they too would be subject to the TC’s rulings.

We could even run multiple such teams – as hinted at by Dean Troyer in one of the email thread posts. Running with that, I’d then suggest:

  • IaaS product: selects components from the tent to make OpenStack/IaaS
  • PaaS product: selects components from the tent to make OpenStack/PaaS
  • CaaS product (containers)
  • SaaS product (storage)
  • NaaS product (networking – but things like NFV, not the basic Neutron we love today). Things where the thing you get is useful in its own right, not just as plumbing for a VM.

So OpenStack/NaaS would have an API or set of APIs, and they’d be responsible for considering maturity, feature set, and so on, but wouldn’t ‘own’ Neutron, or ‘Neutron incubator’, or any other component – they would be a *cross project* team, focused at the product layer rather than the component layer, which is where nearly all of our folk end up locked in today.

Lastly, Sean has also pointed out that we have large-N N^2 communication issues – I think I’m proposing to drive the scope of any one project down to a minimum, which gives us more N but shrinks the size within any project, so folk don’t burn out as easily, *and* so that it is easier to predict the impact of changes – clear contracts and APIs help a huge amount there.

Test processes as servers

Since its very early days subunit has had a single model – you run a process, it outputs test results. This works great, except when it doesn’t.

On the up side, you have a one way pipeline – there’s no interactivity needed, which makes it very very easy to write a subunit backend that e.g. testr can use.
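
For instance, the whole integration can be a shell pipeline (module name illustrative; subunit2pyunit ships with python-subunit):

    # one-shot model: results stream one way, no interaction needed
    python -m subunit.run mypkg.tests | subunit2pyunit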

On the downside, there’s no interactivity, which means that any time you want to do something with those tests, a new process is needed – and that’s sometimes quite expensive, particularly in test suites with tens of thousands of tests. Now, for use in the development edit-execute loop this is arguably ok, because one needs to load the new tests into memory anyway; but wouldn’t it be nice if tools like testr that run tests for you didn’t have to decide upfront exactly how they were going to run? If instead they could get things running straight away and then hand over progressively larger and larger units of work, without forcing a new process (and thus new discovery, directory walking and importing)?

Secondly, testr has an inconsistent interface: if a user is debugging through a chain of testr processes down to child workers, the intermediate layers need to use something structured (e.g. subunit) and route stdin to the actual worker, while the final testr needs to unwrap everything – this is needlessly complex.

Lastly, for some languages at least, it’s possible to dynamically pick up new code at runtime – so with a simple inotify loop we could avoid new-process (and more importantly complete-enumeration) *entirely*, leading to very fast edit-test cycles.

So, in this blog post I’m really running this idea up the flagpole, and trying to sketch out the interface – and hopefully get feedback on it.

Taking subunit.run as an example process to do this to:

  1. There should be an option to change from one-shot to server mode
  2. In server mode, it will listen for commands somewhere (let’s say stdin)
  3. On startup it might eagerly load the available tests
  4. One command would be list-tests – which would enumerate all the tests to its output channel (which is stdout today – so let’s stay with that for now)
  5. Another would be run-tests, which would take a set of test ids, and then filter-and-run just those ids from the available tests, output, as it does today, going to stdout. Passing somewhat large sets of test ids in may be desirable, because some test runners perform fixture optimisations (e.g. bringing up DB servers or web servers) and test-at-a-time is pretty much the worst case for that sort of environment.
  6. Another would be stdin – a command providing a packet of stdin, used for interacting with debuggers

So that seems pretty approachable to me – we don’t even need an async loop in there, as long as we’re willing to patch select etc (for the stdin handling in some environments like Twisted). If we don’t want to monkey-patch like that, we’ll need to make stdin a socketpair, and have an event loop running to shepherd bytes from the real stdin to the one we let the rest of Python have.
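
To make the shape of this concrete, here is a rough sketch of the server loop (hypothetical code, not the real subunit.run; a real implementation would emit subunit on stdout):

    # hypothetical server-mode loop for a subunit.run-style runner
    import sys
    import unittest


    def _flatten(suite):
        for test in suite:
            if isinstance(test, unittest.TestSuite):
                for sub in _flatten(test):
                    yield sub
            else:
                yield test


    def serve(stdin=sys.stdin, stdout=sys.stdout):
        # eagerly load the available tests once at startup
        suite = unittest.TestLoader().discover('.')
        tests = {t.id(): t for t in _flatten(suite)}
        for line in stdin:
            cmd, _, arg = line.strip().partition(' ')
            if cmd == 'list-tests':
                for test_id in sorted(tests):
                    stdout.write(test_id + '\n')
            elif cmd == 'run-tests':
                # arg: whitespace-separated test ids to filter-and-run
                selected = unittest.TestSuite(
                    tests[tid] for tid in arg.split() if tid in tests)
                selected.run(unittest.TestResult())
            elif cmd == 'quit':
                return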

What about that nirvana above? If we assume inotify support, then list-tests (and run-tests) can just consult a changed-file list and reload those modules before continuing. Reloading them just-in-time would be likely to create havoc – I think reloading only when synchronised with test completion makes a great deal of sense.

Would such a test server make sense in other languages? What about e.g. testtools.run vs subunit.run – such a server wouldn’t want to use subunit, but perhaps a regular CLI UI would be nice…