Distributed bugtracking – quick thoughts

Just saw http://sny.no/2014/04/dbts and I feel compelled to note that distributed bug trackers are not new – the earliest I personally encountered was Aaron Bentley’s Bugs Everywhere – coming up on its 10th birthday. BE meets many of the criteria in the dbts post I read earlier today, but it hasn’t taken over the world – and I think this is in large part due to the propagation nature of bugs being very different to code – different solutions are needed.

XXXX: With distributed code versioning we often see people going to some effort to avoid conflicts – semantic conflicts are common, and representation conflicts extremely common. The idioms

Take for example https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/805661. Here we can look at the nature of the content:

  1. Concurrent cannot-conflict content – e.g. the discussion about the bug. In general everyone should have this in their local bug database as soon as possible, and anyone can write to it.
  2. Observations of fact – e.g. ‘the code change that should fix the bug has landed in Ubuntu’ or ‘Commit C should fix the bug’.
  3. Reports of symptoms – e.g. ‘Foo does not work for me in Ubuntu with package versions X, Y and Z’.
  4. Collaboratively edited metadata – tags, title, description, and arguably even the fields like package, open/closed, importance.

Note that only one of these things – the commit to fix the bug – happens in the same code tree as the code; and that the commit fixing it may be delayed by many things before the fix is available to users. Also note that conceptually conflicts can happen in any of those fields except 1).

Anyhow – my humble suggestion for tackling the conflicts angle is to treat all changes to a bug as events in a timeline – e.g. adding a tag ‘foo’ is an event to add ‘foo’, rather than an event setting the tags list to ‘bar,foo’ – then multiple editors adding ‘foo’ do not conflict (or need special handling). Collaboratively edited fields would likely be unsatisfying with this approach though – last-writer-wins isn’t a great story. OTOH the number of people that edit the collaborative fields on any given bug tends to be quite low – so one could defer that to manual fixups.
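To make the timeline idea concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the `Event` fields, the `merge` and `current_tags` helpers, and the integer timestamp are all hypothetical names, not taken from BE or any other tracker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One change to a bug. All field names here are hypothetical."""
    timestamp: int  # assumed logical clock value, used only for ordering
    author: str
    action: str     # "add_tag" or "remove_tag" in this sketch
    value: str


def merge(*timelines):
    """Union the events from several replicas into one ordered timeline.

    Identical events collapse, and the result does not depend on the
    order in which the replicas are passed in.
    """
    events = set().union(*timelines)
    return sorted(events, key=lambda e: (e.timestamp, e.author, e.action, e.value))


def current_tags(timeline):
    """Replay a timeline to compute the tag set it implies."""
    tags = set()
    for event in timeline:
        if event.action == "add_tag":
            tags.add(event.value)
        elif event.action == "remove_tag":
            tags.discard(event.value)
    return tags


# Two editors both add 'foo' while disconnected; one also adds 'bar'.
alice = [Event(1, "alice", "add_tag", "foo")]
bob = [Event(1, "bob", "add_tag", "foo"), Event(2, "bob", "add_tag", "bar")]
print(current_tags(merge(alice, bob)))
```

Because each event says "add ‘foo’" rather than "set the tags to ‘bar,foo’", the two additions of ‘foo’ simply replay into the same set. A single-valued field such as the title would still come down to whichever event replays last, which is the last-writer-wins problem mentioned above.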

Further, as a developer wanting local access to my bug database, syncing all of these things is appealing – but if I’m dealing with a million-bug bug database, I may actually need the ability to filter what I sync or do not sync with some care. Even if I want everything, query performance on such a database is crucial for usability (something git demonstrated convincingly in the VCS space).

Lastly, I don’t think distributed bug tracking is needed – it doesn’t solve a deeply burning use case – offline access would be a 90% solution for most people. What does need rethinking is the hugely manual process most bug systems use today. Making tools like whoopsie-daisy widely available is much more interesting (and that may require distributed underpinnings to work well and securely). Automatic collation of distinct reports and surfacing the most commonly experienced faults to developers offers a path to evidence based assessment of quality – something I think we badly need.
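As a rough sketch of what automatic collation could look like, the Python below groups reports by a signature and ranks the signatures by how often they occur. Everything in it is an assumption for illustration: the report fields, the `signature` function, and the choice of package plus top stack frame as the grouping key are hypothetical, and this is not a description of how whoopsie-daisy works.

```python
from collections import Counter


def signature(report):
    """Reduce a report to a key that groups reports of the same fault.

    Hypothetical choice: the package name plus the top stack frame.
    """
    return (report["package"], report["top_frame"])


def most_common_faults(reports, n=3):
    """Return the n most frequently reported signatures with their counts."""
    return Counter(signature(r) for r in reports).most_common(n)


reports = [
    {"package": "ntp", "top_frame": "ntpd_main", "reporter": "a"},
    {"package": "ntp", "top_frame": "ntpd_main", "reporter": "b"},
    {"package": "foo", "top_frame": "parse_config", "reporter": "c"},
]
print(most_common_faults(reports, n=1))
```

The interesting design question is the signature: too coarse and distinct faults get merged, too fine and the same fault shows up as many separate entries.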


14 thoughts on “Distributed bugtracking – quick thoughts”

  1. Not only does distributed bugtracking not solve any important use case, I think it actually misses the point. A bug tracking system is, in large part, a communication tool that helps in organizing the project. But for that the information needs to be kept in sync. So a different approach is needed than for code, where it’s reasonable to work on some feature for days without having to synchronize.

    1. Forking conversations isn’t particularly useful (and the UI implications are -huge-). Federating conversations might be very useful – but this is not the same as building a single distributed system. Bug collaboration between related projects has two primary cases – firstly variants of the project where one can reasonably expect the same bug, and secondly distinct projects where collaboration is required to solve the one bug. In the first, separate conversations will be required, because the bugs will be actioned separately – and bug trackers are primarily conversations. The second likewise, and in both cases we need separate metadata for open/closed/priority/etc.

      Symptom reports and fact noting could well be copied around effectively.

      So if we make some features up:
      – be able to reply offline
      – have replies synced to arbitrarily many locations
      – be able to change status metadata and have that replicated

      and so on

      I think we’ll find, very quickly, that folk don’t actually gain anything other than offline from /most/ of these features. It costs to start merging things with other people, and while with development of code that makes sense – need a stable base to evolve the code while you hack on a feature – it doesn’t bring anything to counteract the overheads when looking at a collaborative forum like a bug tracker.

      But – offline is huge. And perhaps the best way to create a really robust offline bug tracker would be to make one that is distributed.

      *separately* I think it may make a lot of sense to federate bug trackers – to be able to, on a per bug basis, link up to other trackers; and again good distributed plumbing might be the way forward, but at this stage I don’t think it is, at least as distributed things are considered in this post-DVCS world.

      I should do an essay about this, if this isn’t obvious 🙂

  2. Nice breakdown of the difference between tracking code and tracking issues.

    > I don’t think distributed bug tracking is needed
    > – it doesn’t solve a deeply burning use case –
    > offline access would be a 90% solution for most people.

    I disagree here. Are perhaps the “most people” for whom you consider offline access to be a 90% solution developers who only need to focus on the small part of the world they are responsible for? Practically every software project has a number of dependencies, and they have dependencies of their own, and communication between those groups is I think today still a major challenge. There is an awful amount of duplicated effort and lack of coordination between projects in the free software ecosystem, which affects project managers, users, developers, distribution integrators, …

    If you have the time to look at yet another tool I have been working on a distributed bug tracker of my own – bif (http://bifax.org/bif/). It doesn’t attempt to use the underlying VCS for storage, and so keeps a bit of distance between code and issues. With bif I am trying to make a serious attempt at truly distributed bug tracking.

    p.s. XXXX paragraph not quite finished?

    1. Argh indeed the XXXX paragraph was for me to finish drafting. Sadface.

      So by a 90% solution. I think my assertion is that if you take the amount of bugtracking-and-related work folk do, and you ask yourself how much of it is (or would be if they could):
      A – working with a forked copy of the bug data
      B – working with the primary data
      C – performing data mining (e.g. find oldest/most impact/most frequent etc)
      D – something something other

      You’d find for both upstream and downstream (e.g. Debian) developers 90% of what they do is in, and would always be in, the primary repository, and that those things that aren’t are enough of an exception that having them flow automatically would be a negative – e.g. imagine if every gnome bug report flowed into Debian, or vice versa? Bugs that only show up in RHEL releases turning up in Debian would drive Debian maintainers mad. We know from previous experience that forwarding unassessed bugs to Gnome from Ubuntu resulted in poor outcomes – many of the bugs were Ubuntu context specific.

      tl;dr by most people I mean most developers && most users of software who would be reporting issues. By 90% solution I mean that 90% of their needed and desired interactions with the bug database universe would be well satisfied merely by a good offline tracker, vs a distributed system with arbitrary merge flows etc etc.

      But see my reply to Jelmer – I think there is good reason to build a system that federates, possibly with large chunks of heuristic policy, but I hesitate to call that ‘distributed’. Perhaps I’m being picky/pedantic.

      1. PS you’ve clearly been exploring the space – your bif thing has “bif push ID PATH [HUB]” which looks much more useful than full repo forks to me :).

      2. [ Side note: I think one of the largest difficulties in discussing this topic in the post-DVCS world is how we now associate words like “distributed” with a particular way of doing things.

        DVCS does “distributed” in the way that makes sense for code, and we all now have a somewhat pre-conceived (or pre-constrained?) idea of what that word means, even in completely different contexts (e.g. bug tracking).

        You mention that some kind of different distribution is needed for bugs in your original post, but I think our replies/comments afterwards still suffer from the language issue.]

        Having thought about this for a few years I think my best insight is that for bug tracking we want “distributed” to refer to individual bugs (or conversations) and not to entire repositories. Perhaps this is what you mean by federation. But to my mind if an issue is reported in Debian and they forward it upstream, then there aren’t in fact two issues to be solved; there is a single issue that two projects want to track and possibly collaborate on, but with different status/resolution values.

        I agree that offline is probably the feature developers would most appreciate, however I think the software _eco-system_ would benefit more from distribution.

      3. I agree with your sidenote, and I’m sure my replies do suffer 🙂

        So reported to Debian, then excluded as being Debian local and forwarded upstream. There are several facets to the conversation:
        – there’s the upstream analysis (this is what happens, where – plain builds, builds in Debian, perhaps builds in Fedora etc)
        – there’s the actual mechanics of doing the work – code review, testing, qa etc; which Debian devs generally aren’t interested in
        – there’s the mechanics of taking the result and putting it into the Debian repository – something that upstream devs generally aren’t interested in.

        I guess what I’m getting at is that it’s not one task; and whether it is one issue or not is a very nuanced thing. Oh – a great example of that is G+ sharing – there is ‘one X’ being shared, but each person that shares it can choose whether to promote an existing discussion about it (‘+1 on a share’), share the thing itself independently, or reshare a share of it, all of which have slightly different impact. And the main thing is whether the discussion context is kept or not, or chained or not. There are very bad UX consequences when munging separate discussions together – it just frustrates folk (and in fact this is why I want to do a total re-think of what bug trackers /are/).

        On ecosystem benefits – if we have something that helps the ecosystem but whose benefits to the individual developer don’t outweigh the cost of using it, we’ll end up with something developers don’t use. Which is precisely what the current success rate amongst dbts’s is 🙂 I’m not arguing against ecosystem benefits, but we have to have something that sells itself, if we want volunteers to use it.

      4. [Tried to send this via email but wordpress complained – sorry if it turns up twice]

        I don’t think that communication between say Debian and an upstream is
        always quite as separate as you describe. Issues cannot always just be
        handed off and ignored until a new release comes out. There may be
        clarification needed from the original reporter. A Debian developer may
        propose and discuss a solution and it would be nice if they could do
        so without having to sign up to a third party bug tracker. And the
        debian developer may want to see where the upstream discussion is
        going, in case it isn’t the type of solution that works for downstream,
        and so on.

        So let me re-phrase some of what you wrote. For an issue reported in
        Debian and pushed upstream we could break the communication interest
        into three types:

        1. Debian specific (e.g. re-packaging, …)

        2. Debian + Upstream (e.g. analysis, questions to original user, etc)

        3. Upstream specific (e.g. qa/release, …)

        If items 1 and 3 are “noisy” in comparison to item 2, then yes, there
        needs to be some separation between the issue itself and the tasks for
        resolution. And that is as you say a much larger user-interface challenge.

        On the other hand, I would say that issue meta-data (status) falls
        squarely into type 2: Debian will always want to know what upstream
        decides. Upstream may want to know about downstream so that possibility
        should be allowed (and in a distributed system up/down doesn’t have
        much relevance anyway). I think displaying external status (or not) is
        an easier user-interface challenge than for the conversation.

        > or chained or not. There are very bad UX consequences when munging
        > separate discussions together – it just frustrates folk (and in fact
        > this is why I want to do a total re-think of what bug trackers
        > /are/).

        If you want to share what your re-think result would be I’d be very
        interested to see it.

      5. Oh I should add – that’s a living document evolving as the folk who have been talking about it change their mind on things ;) I know it’s a little inconsistent with my viewpoint here – that’s something I am meaning to drill down into.
