Dealing with deps in OpenStack

We’ve got a problem in OpenStack: dependency management.

In this post I explore the problem as input to the design summit session on it in Vancouver.

Goals

We have some goals that are broadly agreed:

  1. Guarantee co-installability of a single release of OpenStack
  2. Be able to deliver known-good installs of OpenStack at any point in time – e.g. ‘this is known to work’
  3. Deliver good, clear dependency metadata to redistributors
  4. Support CD deployments of OpenStack from git, both in production and in devstack for developers to hack on/with
  5. Avoid firedrills in CI – both internal situations where we run incompatible things we produced, and external situations where some dependency releases a broken version, like the pycparsing one last week
  6. Deployments using the Python dependencies should be up to date and secure
  7. Support doing upgrades in the same Python environment

Assumptions

And we have some baseline assumptions:

  1. We cooperate with the Python ecosystem – publishing our libraries to PyPI for instance
  2. Every commit of server projects is a ‘release’ from the perspective of e.g. schema management
  3. Other things release when they release, not per-commit

Current approach

The current approach uses a single global list of acceptable install-requires for all our projects, and merges that list into the git trees under test during each test run. Note in particular that this merge doesn’t happen for things that aren’t being tested, which we install from PyPI. We create a branch of that global list for each stable release, and we also create branches of nearly everything when we do the stable release, a system that has evolved in part due to the issues in CI when new releases would break stable releases.

These new branches have tightly defined constraints – e.g. “DEP >= version-at-this-release < next-point-release”. The idea behind this is that if the transitive closure of deps is constrained, we can install such a version from PyPI and it won’t bring in a different version. One of the reasons we needed that was pip bug 988, where pip takes the first occurrence of a dependency: servers would depend on oslo.utils, which would depend on an unversioned cliff or some such, and if cliff wasn’t already installed we’d get the next release’s cliff. Now – semver says we’re keeping those things compatible, but mistakes happen, and for stable branches there’s really little reason to upgrade.

Issues

We have some practical issues with the current system:

  1. It takes just one uncapped dependency, anywhere in the wider ecosystem (including packages outside of OpenStack), on something that we wanted to stay unchanged: if that dep is encountered first by the pip scanner… game over. Worse, there are components out there that introspect the installed dependencies and fail hard if one is not listed as compatible, which turns a ‘testing with unexpected version’ situation into a hard error
  2. We have to run stable branches for everything, even things like OpenStackClient which are intended for end users, and are aimed at a semver rather than branched release model
  3. Due to pip bug 2687, each call to pip may introduce the skew that breaks the gate
  4. We don’t deliver goal 1: because we override the requirements at test time, the actual co-installability may be different, and we don’t know
  5. We deliver goal 2 but it’s hard to use: you have to dig through a specific CI log, and if the CI system has pruned it, you’re toast
  6. We don’t avoid external firedrills: because most of our external dependencies are broad, external releases break us trivially and frequently
  7. Lastly, our requirements are too tight to support upgrades: if bug 2687 were fixed, installing the first upgraded server component would error because its requirements are declared as being incompatible with the last release.

We do deliver goals 3, 4 and 6 though, which is good.

So what can we do differently? In an ideal world, can we get all 7 goals?

Proposal

I think we can. Here’s one way it could work:

  1. We fix the two pip bugs above (I’m working on that now)
  2. We teach pip about constraints: versions to use *if* something is requested, without actually requesting it
  3. We change our project overrides in CI to use a single constraints file rather than merging into each project’s requirements
  4. The single constraints file would be exactly specified: “DEP == VERSION”, not semver or compatible matched.
  5. We make changes to the single constraints file by running a proposed set of constraints
  6. We find out that we should change the constraints file by having a periodic task which compares the constraints file to the published versions on PyPI and proposes changes to the constraints repository automatically
  7. We loosen up the constraints in all our release branches to permit upgrade co-installability

And some optional bits…

  1. We could start testing new-library old-servers again
  2. We could potentially change our branching strategy for non-server components, but I don’t think it harms things – it may just be unnecessary
  3. We could add periodic jobs for testing with unreleased versions of dependencies

Working through each point. Bug 988 causes compatible requirements to be ignored – if we have one constraint of “X > 1.4” and another of “X > 1.3, !=1.5.1”, but the “> 1.4” constraint is encountered first, we can end up with 1.5.1 installed, violating a known-bad constraint. Fixing this means that rather than having to have global knowledge of deps at the point where pip is being entered, we can have local knowledge about compatible versions in each package, and as long as the combined set of requirements is satisfiable, we’ll be ok.

Bug 2687 causes the constraints that thing A had when it was installed by pip to be ignored by the requirements checking for thing B. For instance, pip install python-openstackclient after pip install nova will meet python-openstackclient’s requirements even if that means breaking nova’s requirements.
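To make the bug 988 example concrete, here is a minimal sketch in Python. It is not pip’s code: the tiny specifier parser, the function names and the list of available versions are all assumptions made up for illustration. It contrasts honouring only the first specifier seen with honouring every specifier together.

```python
import operator

# Two-character operators come first so ">=" is not mistaken for ">".
OPERATORS = [
    (">=", operator.ge), ("<=", operator.le), ("==", operator.eq),
    ("!=", operator.ne), (">", operator.gt), ("<", operator.lt),
]


def parse_version(text):
    """Turn '1.5.0' into a comparable tuple, ignoring trailing zeros."""
    parts = [int(p) for p in text.strip().split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies(version, spec):
    """True if version meets every comma-separated clause in spec."""
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in OPERATORS:
            if clause.startswith(symbol):
                bound = clause[len(symbol):]
                if not compare(parse_version(version), parse_version(bound)):
                    return False
                break
        else:
            raise ValueError("unsupported clause: %r" % clause)
    return True


def newest(available, specs):
    """Newest available version allowed by all the given specs, or None."""
    matching = [v for v in available if all(satisfies(v, s) for s in specs)]
    return max(matching, key=parse_version, default=None)


def first_seen_wins(available, specs):
    """The behaviour described for bug 988: later specs are ignored."""
    return newest(available, specs[:1])


def all_specs_combined(available, specs):
    """The intended behaviour: every spec must hold at once."""
    return newest(available, specs)


# Hypothetical releases of X, with the two constraints from the text.
available = ["1.3", "1.4", "1.5.0", "1.5.1"]
specs = [">1.4", ">1.3, !=1.5.1"]

print(first_seen_wins(available, specs))     # 1.5.1, the known-bad release
print(all_specs_combined(available, specs))  # 1.5.0
```

With only the first specifier applied, the known-bad 1.5.1 is the newest match; once both are applied together, 1.5.0 is selected instead.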

The reason we can’t just use a requirements file today is that a requirements file specifies what needs to be installed as well as what versions are acceptable. We don’t want devstack, when configured for nova-network, to install neutron dependencies. But it would today unless we put in place a bunch of complex processing logic, whereas pip could do this very easily internally.
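The difference between the two kinds of file can be sketched in a few lines of Python. This only models the proposed semantics, not anything pip does today; the package names and versions are placeholders invented for the example.

```python
def install_from_requirements(requirements):
    """A requirements file selects *and* pins: everything listed is installed."""
    return dict(requirements)


def install_with_constraints(requested, constraints):
    """Constraints only pin: install just what was requested.

    A requested name with no constraint maps to None, meaning 'any version'.
    """
    return {name: constraints.get(name) for name in requested}


# Hypothetical known-good versions covering everything we might ever use.
known_good = {
    "shared-dep": "1.4.0",
    "nova-network-only-dep": "2.0.1",
    "neutron-only-dep": "0.3.2",
}

# What a nova-network devstack actually asks for (also hypothetical).
requested = ["shared-dep", "nova-network-only-dep"]

print(install_from_requirements(known_good))
print(install_with_constraints(requested, known_good))
```

Treated as a requirements file, the list drags in neutron-only-dep; treated as constraints, it merely fixes the versions of the two packages that were actually requested.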

Merging each requirement into things we’re installing from git fails when we install releases – e.g. of client libraries, in particular because of the interactions with bug 988 above. A single constraints file could include all known good versions of everything we might use, and would apply globally in concert with local project requirements. Best of both worlds, in theory 🙂

The use of inexact versions is a hard limitation today – we can’t upgrade multiple project trees’ local version needs atomically, and because we’re supplying all the version constraints in one place – the project’s merged install_requires – they have to be broad enough to co-exist during changes to the requirements, and to remain co-installed during upgrades from release to release of OpenStack. But inexact versions lead to variation in CI – every single run becomes a gamble. The primary goal of CI is to tell us whether a new commit X meets all of our quality criteria – change one thing at a time. Running with every new version of every dependency doesn’t tell us more about X, it tells us about ecosystem things. Using exact constraints will solve this: we’ll decouple ‘update dependencies’ or ‘pycparsing Y is broken’ from testing X – e.g. ‘improve nova cells’.

We need to be able to update those dependencies though, and the existing global requirements mechanisms are pretty much right; they just need to work with a constraints file instead of patching each repo at test time. We will still want to check that the local requirements are compatible with the global constraints file.
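That compatibility check could look roughly like the following Python sketch. It assumes plain dotted numeric versions and a handful of comparison operators; the helper names, package names and versions are all hypothetical, and a real check would want a proper version-specifier parser.

```python
import operator
import re

_OPS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">": operator.gt, "<": operator.lt,
}
_CLAUSE = re.compile(r"\s*(>=|<=|==|!=|>|<)\s*([0-9.]+)\s*$")


def _key(version):
    """Comparable tuple for a dotted numeric version, minus trailing zeros."""
    parts = [int(p) for p in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def allows(spec, version):
    """True if every comma-separated clause of spec accepts version."""
    for clause in spec.split(","):
        match = _CLAUSE.match(clause)
        if match is None:
            raise ValueError("unsupported clause: %r" % clause)
        symbol, bound = match.groups()
        if not _OPS[symbol](_key(version), _key(bound)):
            return False
    return True


def incompatible(local_requirements, constraints):
    """Names whose local requirement rejects the globally pinned version."""
    return sorted(
        name for name, spec in local_requirements.items()
        if name in constraints and not allows(spec, constraints[name])
    )


# Hypothetical project requirements and global constraints.
local_requirements = {"lib-a": ">=1.4.0,<1.5.0", "lib-b": ">=2.0.0"}
constraints = {"lib-a": "1.6.0", "lib-b": "2.3.1", "lib-c": "0.9.0"}

print(incompatible(local_requirements, constraints))  # ['lib-a']
```

Here the cap on lib-a rejects the pinned 1.6.0, so a proposal introducing that pin would be flagged before it merged.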

One of the big holes such approaches have is that we may miss out on important improvements – security, performance or just plain old features – if we don’t update our constraints. So we need to be on top of that. A small amount of automation can give us a lot of assistance on that. Just try the new versions and if they work – great. If they don’t, show a failing proposal where we can assess what to do.
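The core of that automation is small. In the Python sketch below, the published versions are passed in as plain data rather than fetched from PyPI, and the function and package names are invented for illustration; what to do with each proposal (run it through CI, then merge or investigate) is left out.

```python
def version_key(version):
    """Comparable tuple for a dotted numeric version such as '1.10.2'."""
    return tuple(int(part) for part in version.split("."))


def propose_updates(constraints, published):
    """Map each pinned package to its newest published version, if newer.

    constraints: {name: pinned_version}
    published:   {name: [versions currently available]}
    """
    proposals = {}
    for name, pinned in constraints.items():
        newer = [
            candidate for candidate in published.get(name, [])
            if version_key(candidate) > version_key(pinned)
        ]
        if newer:
            proposals[name] = max(newer, key=version_key)
    return proposals


# Hypothetical pins and index contents.
constraints = {"dep-a": "1.0", "dep-b": "2.1"}
published = {"dep-a": ["0.9", "1.0", "1.2"], "dep-b": ["2.0", "2.1"]}

print(propose_updates(constraints, published))  # {'dep-a': '1.2'}
```

Each entry in the result would become a proposed change to the constraints file, tested like any other change.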

As I mentioned earlier, today we can’t actually upgrade: kilo’s version locks exclude liberty versions of our libraries, meaning that trying to upgrade nova/kilo to nova/liberty will bring in library versions that conflict with the version deps neutron expresses. We need to open up the project local requirements to avoid this – and we also need to make some guarantees about compatibility with our prior release in our library development (otherwise rebooting a server with only one component upgraded will be a gamble).

Making those guarantees will either require testing every commit against the prior server, or if we can find some way of doing it, testing proposed releases against the prior servers – which would allow more latitude during development of our libraries. The use of constraints files will give us hermetic insulation against bad releases though – we’ll be able to stay productive while we fix the issue and issue a new, better release. The crucial thing is to have a tight feedback loop – so I’m in favour of us either testing each commit against last-stable, or figuring out the ‘tests before releases’ logic (perhaps by removing direct tag access and instead proposing the intent to release as a review).

All this might be enough that we choose to make fewer stable branches of libraries and go back to plain semver – but it’s not a requirement: that’s something we can discuss in detail if people care, or we can just wait and see what the overheads and benefits of keeping those branches are.

Lastly, this new structure will make it possible, if we want to, to test that unreleased versions of external dependencies work with a given component, by using a periodic job. Why periodic? There are two sides to each dependency, and neither side would want their gate to wedge if an accident breaks the other side. E.g. using two of our own components – oslo.messaging and nova. oslo.messaging releases must not break nova, but an individual oslo.messaging commit isn’t necessarily constrained (if we have the before-release testing described above). External dependencies are exactly the same, except even less closely aligned than intra-OpenStack components. So running tests with a git version of e.g. libvirt in a periodic job might give us (and libvirt) valuable prior warning about issues.