Why platform specific package systems exist and won’t go away

27 Aug 2012

A while back mdz blogged about challenges facing Ubuntu and other Linux distributions. He raised the point that runtime libraries for Python, Ruby and the like have a unique set of issues because they tend to have their own packaging systems. Merely a month later he attended Debconf 2010, where a presentation was given on the issues that Java packages have on dpkg-based systems. Since then the conversation seems to have dried up. I’ve been reminded of it recently in discussions within Canonical looking at how we deploy web services.

Matt suggested some ways forward, including:

  • Decouple applications from the core
  • Treat data as a service (rather than packages) – get data live from the web rather than going web -> distro-package -> user machines.
  • Simplify integration between packaging systems (including non-packaged things)

I think it’s time we revisit and expand on those points. Nothing much has changed in how Ubuntu or other distributions approach integration with other packaging systems… but the world has kept evolving. Internet access is growing ever more ubiquitous, more platforms are building packaging systems – Clojure, Scala and Node.js, to name but three – and a substantial and ever-growing number of products expect to operate in a hybrid fashion, with an evolving web service plus a local client which is kept up to date via package updates. Twitter, Facebook and Google Plus are three such products. Android has demonstrated a large-scale app store on top of Linux, with its own custom packaging format.

In order to expand on those points, we need some background context on the use cases that these different packaging systems need to support.

Platforms such as antivirus scanners, node.js, Python, Clojure and so forth care a great deal about getting their software out to their users. They care about making it extremely easy to get the latest and greatest versions of their libraries. I say this because the evidence is all around us: every successful development community / product has built a targeted package management system which layers on top of Windows, and Mac OS X, and *nix. The only rational explanation I can come up with for this behaviour is that the lower-level operating system package management tools don’t deliver what they need. And this isn’t as shallow as wanting a packaging system written in their own language, which would be easy to write off as parochialism rather than a thoughtful solution to their problems.

In general, packaging systems provide a language for shipping source or binary form, from one or more repositories, to users’ machines. They may support replication, and they may support multiple operating systems. They generally end up as graph traversal engines, pulling in dependencies of various sorts – you can see the DOAP specification for an attempt at generic modelling of this. One problem that turns up rapidly when dealing with Linux distribution package managers is that the versions upstream packages have, and the versions a package has in e.g. Debian, differ. They differ because at some stage someone will need to do a new package for the distribution when no upstream change has been made. This might be to apply a local patch, or it might be to correct a defect caused by a broken build server. Whatever the cause, there is a many-to-one relationship between the package versions that end users see via dpkg / rpm etc, and those that upstream ship. It is a near certainty that once this happens to a library package, comparing package versions across different distribution packages becomes hard. You cannot reliably infer whether a given package version is sufficient as a dependency or not when comparing binary packages between Red Hat and Debian. Or Debian and Ubuntu. The result of this is that even when the software (e.g. rpm) is available on multiple distributions (say Ubuntu and RHEL), or even on multiple operating systems (say Ubuntu and Windows), many packages will /have/ to be targeted specifically to build and execute properly. (Obviously, compilation has to proceed separately for different architectures; it’s more the dependency metadata that says ‘build with version X of dependency Y’ that has to be customised.)
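
To make the version skew concrete, here is a rough sketch – the package names, versions and suffixes are made up, and this is nothing like a complete dpkg version parser – of why a distro version string cannot be compared directly against an upstream dependency:

    # A rough sketch, not a real dpkg version parser: distro-style versions carry
    # an optional epoch ("1:") and a distro revision ("-2ubuntu1") that upstream
    # version requirements know nothing about.
    def upstream_part(distro_version):
        """Strip the epoch and the distro revision from a package version."""
        version = distro_version.split(":", 1)[-1]   # drop a "1:" epoch, if present
        return version.rsplit("-", 1)[0]             # drop a "-2ubuntu1" style revision

    # An upstream dependency such as "foo >= 1.2.3" has to be checked against the
    # stripped value; the raw distro strings differ between distributions even
    # though they all package the same upstream release.
    print(upstream_part("1:1.2.3-2ubuntu1"))   # -> 1.2.3 (Debian/Ubuntu style)
    print(upstream_part("1.2.3-1.el6"))        # -> 1.2.3 (roughly RPM style)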

The result of this is that there is, to the best of my knowledge, no distribution of binary packages that targets Debian/Ubuntu and RHEL and Suse and Windows and Mac OS X, although there are vibrant communities building distributions of and for each in isolation. Some of the ports systems come close, but they are still focused on delivering to a small number of platforms. There’s nothing that gives 99% coverage of users. And that means that to reach all their users, they have to write or adopt a new system. For any platform X, there is a strong pressure to have the platform be maintainable by folk that primarily work with X itself, or with the language that X is written in. Consider Python – there is strong pressure to use C, or Python, and nothing else, for any tools – that is somewhat parochial, but also just good engineering – reducing variables and making the system more likely to be well maintained. The closest system I know of – Steam – is just now porting to Ubuntu (and perhaps Linux in general), and has reached its massive popularity by focusing entirely on applications for Windows, with Mac OS X a recent addition.

Systems like PyPI, which have multi-platform eggs, do target the wide range of platforms I listed above, but they do so both narrowly and haphazardly: whether a binary or source package is available for a given platform is up to the maintainer of the package, and the packages themselves deal with a very narrow subset of the platform’s complexity: Python provides the compilation logic, they don’t create generic C libraries with stable ABIs for use by other programs, and they don’t have Turing-complete scripts for dealing with configuration file management and so forth. Antivirus updaters similarly narrow the problem they deal with, and add constraints on latency: updates of antivirus signatures are time-sensitive when a new rapidly spreading threat is detected.
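
As a rough illustration of how haphazard that coverage is – the filenames below are invented, and real egg/sdist naming has more cases than this handles – the platforms served are simply whatever the maintainer happened to upload, with everyone else falling back to a source build:

    # A small sketch: read a project's platform coverage straight off the
    # distribution filenames the maintainer uploaded. Filenames are hypothetical.
    uploads = [
        "frobnicator-1.4.tar.gz",                        # source: anywhere with a toolchain
        "frobnicator-1.4-py2.7-win32.egg",               # binary egg for 32-bit Windows
        "frobnicator-1.4-py2.7-macosx-10.6-intel.egg",   # binary egg for one Mac OS X flavour
    ]

    def platform_of(filename):
        if filename.endswith((".tar.gz", ".zip")):
            return "source (any platform, needs a compiler for C extensions)"
        parts = filename[:-len(".egg")].split("-")       # name-version-pyX.Y[-platform...]
        return "-".join(parts[3:]) or "pure Python (any platform)"

    for name in uploads:
        print("%s -> %s" % (name, platform_of(name)))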

A minor point, but one that adds to the friction of considering a single packaging tool for all needs, is that the use cases of low-level package management tools like dpkg or rpm differ from those of, say, PyPI. A primary use case for packages on PyPI is for them to be used by people that are not machine administrators. They don’t have root, and don’t want it. Contrast that with dpkg or rpm, where the primary use case (to date) is the installation of system-wide libraries and tools. Things like man page installation don’t make any sense for non-system-wide package systems, whereas they are a primary feature for e.g. dpkg.
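
A small illustration of that split, using nothing but the Python standard library (the paths in the comments are typical values, not guarantees, and differ by platform and Python version):

    # Per-user package installs (no root required) land in a different place from
    # the system-wide packages that dpkg/rpm manage.
    import site
    import sys

    print("system site-packages:   %s" % site.getsitepackages())     # e.g. ['/usr/lib/python2.7/dist-packages', ...]
    print("per-user site-packages: %s" % site.getusersitepackages()) # e.g. ~/.local/lib/python2.7/site-packages
    print("interpreter prefix:     %s" % sys.prefix)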

In short, the per-platform/language tools are (generally):

  1. Written in languages that are familiar to the consumers of the tools.
  2. Targeted at use on top of existing platforms, by non-privileged users, and where temporary breakage is fine.
  3. Intended to get the software packaged in them onto widely disparate operating systems.
  4. Very narrow – they make huge assumptions about how things can fit together, which their specific language/toolchain permits, and don’t generalise beyond that.
  5. Don’t provide for security updates in any specific form: that is left up to folk that ship individual things within the manager.

Operating system package managers (generally):

  1. Are written in languages which are very easy to bootstrap onto an architecture, and to deploy onto bare metal (as part of installation).
  2. Designed for delivering system components, and to be able to upgrade the toolchain itself safely.
  3. Originally built to install onto one operating system; ports to other operating systems are usually fragile and only adopted in niches.
  4. Are hugely broad – they install data, scripts and binaries, and need to know about late binding, system caches etc for every binary and runtime format the operating system supports.
  5. Make special provision to allow security updates to be installed in a low latency fashion, without requiring anything consuming the package that is updated to change [but usually force-uninstalling anything that is super-tightly coupled to a library version].

Antivirus package managers:

  1. Exist to update daemons that run with system-wide escalated privileges, or even filesystem-layer drivers.
  2. Update datasets in real time.
  3. Do not permit updates produced by third parties.

Given that, let’s look at the routes Matt suggested…

Decoupling applications from the core as a strategy makes an assumption – that the core and applications are partitionable. If they are not, then applications and the core will share common elements that need to be updated together. Consider, for instance, a Python application. If you run with a system-installed Python, and it is built without zlib for some reason, but the Python application requires zlib, you have a problem. A classic example of this problem is facing Ubuntu today, with all the system-provided tools moving to Python 3, but vast swathes of Python applications still not ported to Python 3 at all. Currently the Python packaging tools – virtualenv/buildout + distribute – don’t provide a way to install the Python runtime itself, but will happily install their own components for everything up the stack from the runtime. Ubuntu makes extensive use of Python for its own tools, so the system Python has a lot of packages installed which buildout etc cannot ignore – this often leads to issues with e.g. buildout, when the bootstrap environment has (say) zope.interface, but it’s then not accessible from the built-out environment that disables the standard sys.path (to achieve more robust separation). If we want to pursue decoupling, whether we build a new package manager or use e.g. virtualenv (or gem or npm or …), we’ll need to be aware of this issue – and perhaps offer, for an extended time, a dedicated no-frills, no-distro-packages install to avoid it, and to allow an extended support period for application authors without committing to a massive, distro-sponsored porting effort. While it’s tempting to say we should install pip/npm/lein/maven and other external package systems, this is actually risky: they often evolve sufficiently fast that Ubuntu will be delivering an old, incompatible version of the tool to users well before Ubuntu goes out of support, or even before the next release of Ubuntu.
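
A minimal sketch of that trade-off in practice (this assumes the virtualenv command is on $PATH; the --no-site-packages flag is the spelling of the time – newer releases isolate by default):

    # Create an environment that deliberately ignores the system site-packages,
    # then check whether a system-installed package such as zope.interface is
    # visible inside it. With full isolation the import fails - which is exactly
    # the buildout bootstrap problem described above.
    import subprocess

    subprocess.check_call(["virtualenv", "--no-site-packages", "/tmp/isolated-env"])
    visible = subprocess.call(
        ["/tmp/isolated-env/bin/python", "-c", "import zope.interface"]) == 0
    print("zope.interface visible inside the env? %s" % visible)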

Treating data as a service. All the cases I’ve seen so far of applications grabbing datasets from the web have depended on web infrastructure for validating the dataset, e.g. SSL certificates, or SSL + content checksums. Basically, small self-rolled distribution systems. I’m likely ignorant of details here, and I depend on you, dear reader, to edumacate me. There is potential value in having data repackaged when our packaging system has behind-firewall support and the ad hoc system that (for instance) a virus scanner has does not. In this case I specifically mean the problem of updating a machine which has no internet access, not even via a proxy. The challenge as I see it is again the cross-platform issue: the vendor will be supporting Ubuntu + Debian + RHEL + Suse, and from their perspective it’s probably cheaper to roll their own solution than to directly support dpkg + rpm + whatever Apple offer + Windows – the skills to roll an ad hoc distribution tool are more common than the skills to integrate closely with dpkg or rpm…
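
For what it’s worth, those self-rolled systems tend to reduce to something like the following sketch – the URL and digest are placeholders, not a real service:

    # Fetch a dataset over HTTPS and refuse to use it unless it matches a
    # published SHA-256 checksum: TLS for transport integrity, the digest for
    # content integrity. The URL and digest below are placeholders.
    import hashlib
    try:
        from urllib.request import urlopen   # Python 3
    except ImportError:
        from urllib2 import urlopen          # Python 2, current when this was written

    DATASET_URL = "https://updates.example.com/signatures/latest.db"
    EXPECTED_SHA256 = "0" * 64   # placeholder digest

    def fetch_verified(url, expected_digest):
        data = urlopen(url).read()
        if hashlib.sha256(data).hexdigest() != expected_digest:
            raise ValueError("checksum mismatch: refusing the dataset")
        return data

    # data = fetch_verified(DATASET_URL, EXPECTED_SHA256)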

What about creating a set of interfaces for talking to dpkg / rpm / the system packagers on Windows and Mac OS X? Here I think there is some promise, but it needs – as Matt said – careful thought. PackageKit isn’t sufficient, at least today.

There are, I think, two specific cases to cater to:

  1. The anti-virus / fresh data set case.
  2. The egg/gem/npm specific case.

For the egg/gem/npm case, we would need to support a pretty large set of common functionality on Windows / Mac OS X / *nix (because otherwise upstream won’t adopt what we create: losing 90% of their users (Windows) or 5% (Mac) isn’t going to be well accepted :) ). We’d need to support multiple installations (because of mutually incompatible dependencies between applications), and we’d need to support multiple language bindings in some fashion – some approachable fashion where the upstream will feel capable of fixing and tweaking what we offer. We’re going to need to support offline updates, replication, local builds, local repositories, and various signing strategies – to match the various tradeoffs made by the upstream tools.
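
To give that some shape, here is one hypothetical sketch of the sort of narrow interface such a shared layer might expose, with dpkg / rpm / whatever Windows and Mac OS X offer as interchangeable backends. Every name below is invented for illustration – it is not a proposal for an actual API:

    class PackageBackend(object):
        """Hypothetical common surface an egg/gem/npm-style frontend could target."""

        def install(self, name, version, prefix):
            # Installs must work into arbitrary prefixes, not just system-wide:
            # applications with mutually incompatible dependencies need
            # side-by-side per-app (and per-user) installations.
            raise NotImplementedError

        def add_repository(self, url, signing_key=None):
            # Local repositories, offline mirrors and differing signing policies
            # all have to be expressible, to match the upstream tools' tradeoffs.
            raise NotImplementedError

        def build_from_source(self, source_dir, prefix):
            # Local builds, for the platforms a maintainer shipped no binary for.
            raise NotImplementedError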

For the anti-virus / fresh data case, we’d need to support a similar set of operating systems, though I strongly suspect that there would be more tolerance for limited support – in that most things in that space either have very platform specific code, or they are just a large-scale form of the egg/gem/npm problem, which also wants easy updates.

What next?

We should validate this discussion with at least two or three upstreams. Find out what’s missing – I suspect a lot – and what’s wrong – I hope not much :). Then we’ll be in a position to decide if there is a tractable, widespread solution *possible*.

Separately, we should stop fighting with upstreams that have their own packaging systems. They are satisfying different use cases than our core distro packaging systems are designed to solve. We should stop mindlessly repackaging things from e.g. eggs to debs, unless we need that specific thing as part of the transitive runtime or buildtime dependencies for the distribution itself. In particular, if we folk that build system packaging tools adopt and use the upstream application packaging tools, we can learn in a deep way the (real) advantages they have, and become more able to reason about how to unify the various engineering efforts going into them – and perhaps even eventually satisfy them using dpkg/rpm on our machines.

19 Responses to “Why platform specific package systems exist and won’t go away”

  1. So the question is where does Chrome fit into this?

    On Windows and Mac, Chrome ships with its own updater which uses many tricks to keep Chrome updates as small as possible so they can be delivered quickly and cheaply. It kinda feels like it would almost be similar to the anti-virus case?

    • rbtcollins

      It’s arguably in the antivirus / data category, yes – because it wants push updates and it wants to install software system-wide.

      • naesten

        Really? System-wide? Since when is %UserProfile% system-wide?

  2. I agree almost completely; one symptom of this is how I manage PHP-based production systems … while the PHP runtime itself is installed via APT, any modules like APC or MongoDB are installed and updated via PEAR/PECL and not from the php-* packages. Mostly because it decouples the module versions and PHP versions in a way that is not handled by APT easily.

    Anyhow I need a bit more time to digest this fully and add more thoughts of my own. I will likely reply in full with my own blogpost linking back for length reasons.

  3. Felipe

    I think the divide between distro package manager and platform package manager is larger than you think. I used npm for a while and I think dpkg/rpm will never be able to replace it. The reasons for why are as follows:

    1. NPM is designed to be *very* easy to deploy a package to. Distro archives are not.
    2. Interface stability in non-core packages is not important.
    3. NPM keeps every single version uploaded to the archive forever.
    4. (2) and (3) mean most applications and libraries have fairly strict version requirements on the required libraries.
    5. This means that, in order to install multiple apps, I’m likely to need to install multiple versions of each library.

    (5) is quite the opposite of what a distro package manager aims to do. Doing things locally for the application, not system-wide or user-wide, is the only sane way. If you go over the npm page, there is a FAQ about this:

    > I installed something globally, but I can’t require() it
    >
    > Install it locally.
    > The global install location is a place for command-line utilities to put their
    > bins in the system PATH. It’s not for use with require().
    > If you require() a module in your code, then that means it’s a dependency,
    > and a part of your program. You need to install it locally in your program.

    The fundamental needs are at odds with what a distro package manager supports. In time, when the platforms mature, *maybe* there will stop being a need for multiple versions of each library. I am increasingly convinced that providing distro packages is only worthwhile for very widely used, stable libraries and applications.

    • rbtcollins

      Thanks for the details! I agree with the analysis – but in principle, if dpkg supported both a global facility and N local per-app facilities, it could meet that use case gracefully. It definitely cannot today, and it’s an open question whether it /should/ ever try to do that. If it doesn’t do that, then the next question is – could we (the open source community) build a single tool to replace or at least provide the underpinnings for npm + buildout + maven + lein + …

    • Jonathan

      > (5) is quite the opposite of what a distro package manager aims to do.

      Most (binary, implemented in C or C++) libraries are packaged in distros exactly that way to support run-time dependencies on a specific ABI. Am I misunderstanding?

      • rbtcollins

        You understand fine. There are three subtleties here. Firstly, in many non-C/C++ languages there is usually no concept of ABI; it’s all API – specifically in Python, Ruby, and JavaScript. In those languages it’s not (currently, cleanly) possible to concurrently install multiple versions of a library – or where it is possible to install them, the concurrent versions cannot be consumed from one process (each process is limited to one version) – and that causes problems with diamond dependency graphs that carry incompatible version constraints. Secondly, for languages that do have an ABI/API split (e.g. via the soname mechanism), upstreams are often poor at maintaining complete binary compatibility – this means that *nix distributors have to choose between manually updating the soname themselves, or using finer-grained dependency rules to prevent breakage between different consuming applications; with the latter you can end up needing two versions of the library with the same soname if you want two applications installed, which is impossible. Relatedly, if the distribution does choose to update the soname itself, it becomes incompatible with other distributions, due to exactly the sort of version mismatch skew I described.
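
        A made-up illustration of that diamond, for the Python case (the package names are invented):

            # app imports both consumer_a and consumer_b;
            #   consumer_a requires libfoo >= 1.0, < 2.0
            #   consumer_b requires libfoo >= 2.0
            # No single version satisfies both, yet one Python process can only
            # resolve 'import libfoo' to a single installed version.
            from pkg_resources import Requirement

            req_a = Requirement.parse("libfoo>=1.0,<2.0")
            req_b = Requirement.parse("libfoo>=2.0")
            print([v for v in ("1.4", "2.1") if v in req_a and v in req_b])   # -> []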

  4. Good stuff.

    > the Python packaging system – virtualenv/buildout + distribute – don’t
    > provide a way to install the Python runtime itself

    Right, not directly. However, the zc.recipe.cmmi package makes it pretty easy to build things, including Python when you really need to (details at http://bluedynamics.com/articles/jens/build-python-in-buildout).

  5. Snark

    I don’t get why manpages don’t make sense for non-system installations… or perhaps I misunderstood you?

    • rbtcollins

      Technically you can have ~/local paths on your MANPATH, but by default it’s not set, and manpath gives:
      $ manpath
      /usr/local/man:/usr/local/share/man:/usr/share/man

      So installing manpages per-user won’t actually let them be used by default. I think it’s because of this that most non-system packagers don’t do man pages (that, and manpages are Unix-centric, and most non-system packagers do Windows too…)

      • Colin Watson

        manpath is not quite as static as you’re assuming here; in particular it (at least the man-db implementation) will automatically use …/man directories paralleling …/bin directories on your $PATH if they exist.

  6. In my experience, the primary reason debs are unsuitable for development use is that they assume you only want one version of a given library installed at a time. When you need to break this assumption, you have to resort to horrible hacks like putting the version number in the package name itself.

    For end-user applications this is usually fine once things have been thoroughly integration-tested; all they care about is the final product, and they appreciate not having to think about whether out-of-date libraries are left around. But it’s absolutely unsuitable for application developers, who require systems that are designed from the ground-up with development in mind.

    I wrote about this in more detail on my blog: http://technomancy.us/151

    I suspect with some work that nix could bridge the gap: http://nixos.org/nix

    • rbtcollins

      So nix is indeed very interesting. And yes, the single-library constraint is a big part of this; it’s that constraint that makes packaging Python apps (for instance) with dpkg so painful compared to installing via virtualenv. Sadly, I realise I didn’t explicitly touch on that at all in my post – so thank you *very* much for bringing it up. Felipe touches on it too in an earlier comment – the global vs local split in npm.

      I’d say that this is a primary reason why local installs are attractive to end users even when they are using a mature system such as Ubuntu with thousands of packaged libraries – they get greater functionality with lower failure rates than when using dpkg packages.

    • The class of package managers we’re discussing here seem to embody a distinct separation between development concerns and user concerns. In fact I’d posit that they were invented to insulate users from the complexity of the software development process (by simplifying software installation and management).

      There’s definitely a part of me that feels that this is an appropriate design tradeoff. I wonder whether one system can actually meet both sets of needs without being so complex as to be unattractive to one or both audiences.

      I think this is a case of having a very fine hammer, and trying to apply it to unsuitable nails. The Debian system’s goal is to produce a coherent installation of OS and applications which is maintained in a simple and consistent way. It does this job well enough that the end result is consumable by casual users (e.g. Ubuntu) as well as experienced system administrators. In contrast, systems like nix and conary seem to serve a much narrower audience.

  7. Simon Hibbs

    >Decoupling applications from the core as a strategy makes an
    >assumption – that the core and applications are partitionable. If they
    >are not, then applications and the core will share common elements
    >that need to be updated together. Consider, for instance, a
    >Python application. …

    The mistake here is to assume it makes any sense whatsoever for a user, who happens to be a developer, to expect to be able to use system dev tools for userspace development work.

    It doesn’t.

    The default assumption should be that if a user wants to do development work they should install the dev tools they need as applications, not as system components. Traditionally Un*x systems have included dev tools such as perl and python interpreters and exposed them for the use of users, but this blurs a vital distinction, as the situation you describe shows. This is because developers traditionally have special status in Un*x culture as semi-system level users.

    Going forward there needs to be a sharp, clear and rigorously enforced distinction between components that are part of the system and components that are user applications. I might even go so far as to split out the system shell interpreters from user shells and package those as applications. After all, suppose my system is running bash v3 and I want to use bash v4, or zsh, or fish?

    All the successful user space platforms (Windows, MacOS X, iOS, Android, etc) make a sharp distinction between system component installation and update management and application installation and update management, and for very good reasons.

    Applications should not ever (if at all possible) depend on the presence of optional system level components. Dependency management just shouldn’t be a user application level concern.

    I honestly believe this is the major problem holding back Un*x systems from becoming mainstream platforms. Modern Un*x package managers are great, powerful systems with lots of very clever features, but the requirements for the installation and management of third party applications, especially commercial ones, are very different. I firmly believe Un*x will never have a truly strong, vibrant third party desktop app ecosystem until the rpms and debian packages of this world are firmly pushed back into the system maintenance role and a robust system for third party user application management that eliminates dependency management is agreed upon.

    • The system you described has emerged in the form of the (modern, application-oriented) web. It insulates users from the concerns of dependency management and versioning entirely. It does not, of course, lead to the development of a strong, vibrant ecosystem for desktop applications on any platform.

      I would not argue that this approach is without its shortcomings, but I think it has emerged as a solution to many of the problems described here.

  8. I definitely agree with your thesis, that the dream of “one package manager to rule them all” is effectively dead. The driving force is more than just tooling: it’s about control and coupling. These platform communities are doing their own release management, and don’t want or need the operating system doing it for them. Language runtimes, and in particular their modules, don’t release on the same schedule as operating systems, nor should they.

  9. Given that CPAN packages integrate quite well with deb and rpm systems, I’d say that the problem is with python (and ruby and other languages).

    To be even more specific, the problem is that the packaging systems for python etc are written by programmers rather than systems administrators, and programmers typically have little or no interest in the system that their application (or library or whatever) is running on, their interest and their focus is on THEIR application.

    The classic example is that sysadmins see important systems stuff running out of someone’s home directory and dependent on their idiosyncratic environment as a hanging offence, while programmers can’t see what the problem is or what the fuss is about.

    This is, of course, an exaggerated generalisation – but there’s enough truth in it to indicate the source of the problem.

    It probably explains why perl and CPAN modules integrate well with systems – perl is a sysadmin’s language, while python is a programmer’s language. The difference in focus results in a very different point of view and attitude towards systems.

    Also, as a sysadmin and as a user, I *don’t want* a dozen different incompatible versions of libfoo installed. I want one, and I want it to work, and I want all programs that use it to work with that version. If appA requires libfoo 0.1 and appB requires libfoo 0.2 then I regard that as a bug in appA that needs to be fixed – it isn’t keeping up with developments in an important library that it depends upon. I most emphatically do not see it as a sufficient reason to have both libfoo 0.1 and libfoo 0.2 installed except in special circumstances (at least C libraries allow different sonames for different lib versions – perhaps that’s something that other languages should learn from).

    If a developer happens to need a different version while they’re working on something then installing and using it from their home dir is entirely appropriate – but when their work is ready for use by others, it should integrate with the existing system (i.e. use the system version of libfoo, or create a new system package for the latest libfoo if required) rather than require everyone else to duplicate the developer’s idiosyncratic environment.

    These are all lessons that we learnt during the transition from proprietary unixes in the 80s and early 90s to package-managed linux distributions in the mid 90s and later – faffing about with hunting for and manually downloading and resolving dependencies and conflicts is a major PITA, something that the system should handle. We seem to be in the process of forgetting that important lesson.

    (BTW, there seems to be a problem with your comment field: it cuts off the RHS of the comments, no matter what font size I try. Width specified in pixels, perhaps.)

