Just saw http://sny.no/2014/04/dbts and I feel compelled to note that distributed bug trackers are not new – the earliest I personally encountered was Aaron Bentley’s Bugs Everywhere – coming up on its 10th birthday. BE meets many of the criteria in the dbts post I read earlier today, but it hasn’t taken over the world – and I think this is in large part because the propagation nature of bugs is very different to code – different solutions are needed.
With distributed code versioning we often see people going to some effort to avoid conflicts – semantic conflicts are common, and representation conflicts extremely common.
Take for example https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/805661. Here we can look at the nature of the content:
- Concurrent cannot-conflict content – e.g. the discussion about the bug. In general everyone should have this in their local bug database as soon as possible, and anyone can write to it.
- Observations of fact – e.g. ‘the code change that should fix the bug has landed in Ubuntu’ or ‘Commit C should fix the bug’.
- Reports of symptoms – e.g. ‘Foo does not work for me in Ubuntu with package versions X, Y and Z’.
- Collaboratively edited metadata – tags, title, description, and arguably even the fields like package, open/closed, importance.
Note that only one of these things – the commit to fix the bug – happens in the same code tree as the code; and that the commit fixing it may be delayed by many things before the fix is available to users. Also note that conceptually conflicts can happen in any of those fields except the first (the concurrent cannot-conflict content).
Anyhow – my humble suggestion for tackling the conflicts angle is to treat all changes to a bug as events in a timeline – e.g. adding a tag ‘foo’ is an event to add ‘foo’, rather than an event setting the tags list to ‘bar,foo’ – then multiple editors adding ‘foo’ do not conflict (or need special handling). Collaboratively edited fields would likely be unsatisfying with this approach though – last-writer-wins isn’t a great story. OTOH the number of people that edit the collaborative fields on any given bug tends to be quite low – so one could defer that to manual fixups.
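To make the event-timeline idea concrete, here is a minimal sketch (not BE’s or any real tracker’s API – all names are mine) of tag changes modelled as events, so that two nodes concurrently adding the same tag merge without any conflict resolution:

```python
# Sketch: tag changes as timeline events. Merging two nodes' timelines
# is just set union of events; concurrent identical adds deduplicate.
from dataclasses import dataclass

@dataclass(frozen=True)
class TagEvent:
    bug_id: str
    action: str   # "add" or "remove"
    tag: str
    clock: int    # logical timestamp from the originating node

def apply_events(events):
    """Fold a merged (and deduplicated) event timeline into a tag set."""
    tags = set()
    for e in sorted(set(events), key=lambda e: e.clock):
        if e.action == "add":
            tags.add(e.tag)
        else:
            tags.discard(e.tag)
    return tags

# Two editors concurrently add 'foo'; the union of their timelines
# needs no special handling:
a = [TagEvent("805661", "add", "foo", 1)]
b = [TagEvent("805661", "add", "foo", 1), TagEvent("805661", "add", "bar", 2)]
print(sorted(apply_events(a + b)))  # ['bar', 'foo']
```

Last-writer-wins fields like the title would need the manual-fixup escape hatch mentioned above; sets of independent events sidestep that entirely.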
Further, as a developer wanting local access to my bug database, syncing all of these things is appealing – but if I’m dealing with a million-bug bug database, I may actually need the ability to filter what I sync or do not sync with some care. Even if I want everything, query performance on such a database is crucial for usability (something git demonstrated convincingly in the VCS space).
Lastly, I don’t think distributed bug tracking is needed – it doesn’t solve a deeply burning use case – offline access would be a 90% solution for most people. What does need rethinking is the hugely manual process most bug systems use today. Making tools like whoopsie-daisy widely available is much more interesting (and that may require distributed underpinnings to work well and securely). Automatic collation of distinct reports and surfacing the most commonly experienced faults to developers offers a path to evidence based assessment of quality – something I think we badly need.
Filed under: Uncategorized | 3 Comments
Tags: data, distributed, trackers
I feel like I’m taking a big personal risk writing this, even though I know the internet is large and probably no-one will read this :-).
So, dear reader, please be gentle.
As we grow – as people, as developers, as professionals – some lessons are hard to learn (e.g. you have to keep trying and trying to learn the task), and some are hard to experience (they might still be hard to learn, but just being there is hard itself…) I want to talk about a particular lesson I started learning in late 2008/early 2009 – while I was at Canonical – sadly one of those that was hard to experience.
At the time I was one of the core developers on Bazaar, and I was feeling pretty happy about our progress, how bzr was developing, features, community etc. There was a bunch of pressure on to succeed in the marketplace, but that was ok, challenges bring out the stubborn in me :). There was one glitch though – we’d been having a bunch of contentious code reviews, and my manager (Martin Pool) was chatting to me about them.
I was – as far as I could tell – doing precisely the right thing from a peer review perspective: I was safeguarding the project, preventing changes that didn’t fit properly, or that degraded key aspects – performance, usability – from landing until they were fixed.
However, the folk on the other side of the review were feeling frustrated, that nothing they could do would fix it, and generally very unhappy. Reviews and design discussions would grind to a halt, and they felt I was the cause. [They were right].
And here was the thing – I simply couldn’t understand the issue. I was doing my job; I wasn’t angry at the people submitting code; I wasn’t hostile; I wasn’t attacking them (but I was being shall we say frank about the work being submitted). I remember saying to Martin one day ‘look, I just don’t get it – can you show me what I said wrong?’ … and he couldn’t.
Canonical has a 360° review system – every 6 months / year (it changed over time) you review your peers, subordinate(s) and manager(s), and they review you. Imagine my surprise – I was used to getting very positive reports with some constructive suggestions – when I scored low on a bunch of the inter-personal metrics in the review. Martin explained that it was the reviews thing – folk were genuinely unhappy, even as they commended me on my technical merits. Further to that, he said that I really needed to stop worrying about technical improvement and focus on this inter-personal stuff.
Two really important things happened around this time. Firstly, Steve Alexander, who was one of my managers-once-removed at the time, reached out to me and suggested I read a book – Getting out of the box – and that we might have a chat about the issue after I had read it. I did so, and we chatted. That book gave me a language and viewpoint for thinking about the problem. It didn’t solve it, but it meant that I ‘got it’, which I hadn’t before.
So then the second thing happened – we had a company all hands and I got to chat with Claire Davis (head of HR at Canonical at the time) about what was going on. To this day I remember the sheer embarrassment I felt when she told me that the broad perception of me amongst other teams’ managers was – and I paraphrase a longer, more nuanced conversation here – “technically fantastic but very scary to have on the team – will disrupt and cause trouble”.
So, at this point about 6 months had passed, and I knew what I wanted – I wanted folk to want to work with me, to find my presence beneficial and positive on both technical and team aspects. I already knew then that what I seek is technical challenges: I crave novelty, new challenges, new problems. Once things become easy, it can all too easily slip into tedium. So at that time my reasoning was somewhat selfish: how was I to get challenges if no-one wanted to work with me except in extremis?
I spent the next year working on myself as much as specific projects: learning more and more about how to play well with others.
In June 2010 I got a performance review I could be proud of again – I was in no way perfect, but I’d made massive strides. This journey had also made huge improvements to my personal life – a lot of stress between Lynne and I had gone away. Shortly after that I was invited to apply for a new role within Canonical as Technical Architect for Launchpad – and Francis Lacoste told me that it was only due to my improved ability to play well with others that I was even considered. I literally could not have done the job 18 months before. I got the job, and I think I did pretty well – in fact I was awarded an internal ‘Spotlight on Success’ award for what we (it was a whole Launchpad team effort) achieved while I was in that role.
So, what did I change/learn? There’s just a couple of key changes I needed to make in myself, but a) they aren’t sticky: if I get overly tired, ye old terrible Robert can leak out, and b) there’s actually a /lot/ of learnable skills in this area, much of which comes with experience – lots of practice and critical self-review is a good thing. The main thing I learnt was that I was Selfish. Yes – capital S. For instance, in a discussion about adding working tree filters to bzr, I would focus on the impact/risk on me-and-things-I-directly-care-about: would it make my life harder, would it make bzr slower, was there anything that could go wrong. And I would spend only a little time thinking about what the proposer needed: they needed support and assistance making their idea reach the standards the bzr community had agreed on. The net effect of my behaviours was that I was a class A asshole when it came to getting proposals into a code base.
The key things I had to change were:
- I need to think about the needs of the person I’m speaking to *and not my own*. [That's not to say you should ignore your needs, but you shouldn't dwell on them: if they are critical, your brain will prompt you].
- There’s always a reason people do things: if it doesn’t make sense, ask them! [The crucial conversations books have some useful modelling here on how and why people do things, and on how-and-why conversations and confrontations go bad and how to fix them.]
Ok so this is all interesting and so forth, but why the blog post?
Firstly, I want to thank four folk who were particularly instrumental in helping me learn this lesson: Martin, Steve, Claire and of course my wife Lynne – I owe you all an unmeasurable debt for your support and assistance.
Secondly, I realised today that while I’ve apologised one on one to particular folk who I knew I’d made life hard for, I’d never really made a widespread apology. So here it is: I spent many years as an ass, and while I didn’t mean to be one, intent doesn’t actually count here – actions do. I’m sorry for making your life hell in the past, and I hope I’m doing better now.
Lastly, if I’m an ass to you now, I’m sorry, I’m probably regressing to old habits because I’m too tired – something I try to avoid, but it’s not always possible. Please tell me, and I will go get some sleep then come and apologise to you, and try to do better in future.
Filed under: Uncategorized | 9 Comments
Tags: Bazaar, Canonical, Launchpad, ubuntu
I’ve transitioned to a new key – announcement here or below. If you’ve signed my key in the past please consider signing my new key to get it integrated into the web of trust. Thanks!
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1,SHA256

Sun, 2013-10-13

Time for me to migrate to a new key (shockingly late - sorry!). My old key is set to expire early next year. Please use my new key effective immediately. If you have signed my old key then please sign my key - this message is signed by both keys (and the new key is signed by my old key).

old key:
pub   1024D/FBD3EB8E 2002-07-20
      Key fingerprint = 9222 8732 859D 25CC 2560 B617 867B F9A9 FBD3 EB8E

new key:
pub   4096R/AAC0E286 2013-10-13
      Key fingerprint = 8244 0CEA B440 83C7 9431 D2CC 298E 9A19 AAC0 E286

The new key is up on the keyservers, so you can just pull it from there.

- -Rob
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)

iEYEARECAAYFAlJZ8FEACgkQhnv5qfvT644WxACfWBoKdVW+YDrMR1H9IY6iJUk8
ZC8AoIMRc55CTXsyn3S7GWCfOR1QONVhiQEcBAEBCAAGBQJSWfBRAAoJEInv1Yjp
ddbfbvgIAKDsvPLQil/94l7A3Y4h4CME95qVT+m9C+/mR642u8gERJ1NhpqGzR8z
fNo8X3TChWyFOaH/rYV+bOyaytC95k13omjR9HmLJPi/l4lnDiy/vopMuJaDrqF4
4IS7DTQsb8dAkCVMb7vgSaAbh+tGmnHphLNnuJngJ2McOs6gCrg3Rb89DzVywFtC
Hu9t6Sv9b0UAgfc66ftqpK71FSo9bLQ4vGrDPsAhJpXb83kOQHLXuwUuWs9vtJ62
Mikb0kzAjlQYPwNx6UNpQaILZ1MYLa3JXjataAsTqcKtbxcyKgLQOrZy55ZYoZO5
+qdZ1+wiD3+usr/GFDUX9KiM/f6N+Xo=
=EVi2
-----END PGP SIGNATURE-----
Filed under: Uncategorized | Leave a Comment
Tags: Debian, gpg, ubuntu
Python 3 recently introduced a nice feature – subtests. When I was putting subunit version 2 together I tried to cater for this via a heuristic approach – permitting the already known requirement that some tests which are reported are not runnable be combined with substring matching to identify subtests.
However that has panned out poorly: when I went to integrate this with testr, the code started to get fugly.
So, I’m going to extend the StreamResult API to know about subtests, and issue a subunit protocol bump – to 2.1 – to add a new field for labelling subtest events. My plan is to make this build a recursive tree structure – that is, given test “test_foo” with subtest “i=3”, which the Python subtest code would identify as “test_foo (i=3)”, it should be identified in StreamResult as test_id “test_foo (i=3)” with parent_test_id “test_foo”. This can then nest arbitrarily deep if test runners decide to do that, and individual runnability becomes up to the test runner, not testrepository / subunit / StreamResult.
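As a rough illustration of the proposed labelling (this is a sketch of the plan above, not a released API – the event shape and field names are assumptions), a subtest event just carries its parent’s id alongside its own:

```python
# Sketch: a subtest event as a plain dict, carrying the proposed
# parent_test_id field from the planned 2.1 protocol bump.
def subtest_event(test_id, parent_test_id=None, **fields):
    event = {"test_id": test_id, **fields}
    if parent_test_id is not None:
        event["parent_test_id"] = parent_test_id
    return event

# Python's subtest "test_foo (i=3)" becomes a child of "test_foo":
e = subtest_event("test_foo (i=3)", parent_test_id="test_foo",
                  test_status="success")

# Nesting arbitrarily deep is just more parent links:
nested = subtest_event("test_foo (i=3) (j=1)",
                       parent_test_id="test_foo (i=3)")
```

A consumer that doesn’t care about subtests can ignore parent_test_id and see a flat stream; one that does can reassemble the tree.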
Filed under: Uncategorized | Leave a Comment
Tags: Python, Subunit, testing, testrepository, testtools, upstream
The Rackspace docs describe how to use Rackspace’s custom extensions, but not how to use plain ol’ nova. Using plain nova is important if you want cloud portability in your scripts.
So – for future reference – these are the settings:
Filed under: Uncategorized | Leave a Comment
Subunit V2 is coming along very well.
- I have a complete implementation of the StreamResult API up as a patch for testtools. That’s 2K LOC including comprehensive tests.
- Similarly, I have an implementation of a StreamResult parser and emitter for subunit. That’s 1K new LOC including comprehensive tests, and another 500 lines of churn where I migrate all the subunit filters to v2.
- pdb debugging works through subunit v2, permitting dropping into a debugger mid-run. Yay.
Remaining things to do:
- Update the other language bindings – the C library in particular.
- Teach testrepository to expect v2 input (and probably still store v1 for a while)
- Teach testrepository to use pipes for the stdin of test runner backends, and some control mechanism to switch input between different backends.
- Discuss the in-Python API with more folk.
- Get code merged :)
Filed under: Uncategorized | 8 Comments
Tags: Python, Subunit, testing, testrepository, testsupport, testtools, unittest
I’ve been hitting the limits of gigabit ethernet at home for quite a while now, and as I spend more time working with cloud technologies this started to frustrate me.
I’d heard of other folk getting good results with second hand Infiniband cards and decided to give it a go myself.
I bought two Voltaire dual-port Infiniband adapters – 4X SDR PCI-E x4 cards. Add in a 2 metre 8470 cable, and we’re in business.
There are other, more comprehensive guides around to setting this up – e.g. http://davidhunt.ie/wp/?p=2291 or http://pkg-ofed.alioth.debian.org/howto/infiniband-howto-4.html
On ubuntu the hardware was autodetected; all I needed to do was:
modprobe ib_ipoib
sudo apt-get install opensm  # on one machine
And configure /etc/network/interfaces – e.g.:
iface ib1 inet static
    address 192.168.2.3
    netmask 255.255.255.0
    network 192.168.2.0
    up echo connected >`find /sys -name mode | grep ib1`
    up echo 65520 >`find /sys -name mtu | grep ib1`
With no further tuning I was able to get 2Gbps doing linear file copies via Samba, which I suspect is rather pushing the limits of my circa 2007 home server – I’ll investigate further to identify where the bottlenecks are, but the networking itself I suspect is ok – netperf got me 6.7Gbps in a trivial test.
Filed under: Uncategorized | 4 Comments
Tags: cloud, infiniband, performance, ubuntu
StreamResult, covered in my last few blog posts, has panned out pretty well.
Until, that is, I sat down to do a serialised version of it. It became fairly clear that the wire protocol can be very simple – just one event type that has a bunch of optional fields – test ids, routing code, file data, mime-type etc. It is up to the recipient at the far end of a stream to derive semantic meaning, which means that encoding a lot of rules (such as a data packet can have either a test status or file data) into the wire protocol isn’t called for.
If the wire protocol doesn’t have those rules, Python parsers that convert a bytestream into StreamResult API calls will have to manually split packets that have both status() and file() data in them… this means it would be impossible to create many legitimate bytestreams via the normal StreamResult API.
That seems to be an unnecessary restriction, and thinking about it, having a very simple ‘here is an event about a test run’ API that carries any information we have and maps down a very simple wire protocol should be about as easy to work with as the current file or status API.
Most combinations of file+status parameters are trivially interpretable, but there is one that had no prior definition – a test_status with no test id specified. Files with no testid are easily considered as ‘global scope’ for their source, so perhaps test_status should be treated the same way? [Feedback in comments or email please]. For now I’m going to leave the meaning undefined and unconstrained.
So I’m preparing a change to my patchset for StreamResult to:
- Drop the file() method altogether.
- Add file_bytes, mime_type and eof parameters to status().
- Make the test_id and test_status parameters to status() optional.
This will make the API trivially serialisable (both to JSON or protobufs or whatever, or to the custom binary format I’m considering for subunit), and equally trivially parsable, which I think is a good thing.
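To illustrate why a single optional-field event serialises so trivially (this is a JSON analogue I made up for the post – the actual subunit wire format under consideration is binary, and the field names here just follow the prose above):

```python
# Sketch: one event type, every field optional, unset fields dropped
# on the wire. The recipient derives any semantic meaning.
import json

def status_event(test_id=None, test_status=None, file_bytes=None,
                 mime_type=None, eof=False, route_code=None,
                 timestamp=None):
    event = {
        "test_id": test_id,
        "test_status": test_status,
        "file_bytes": file_bytes.decode("utf8") if file_bytes else None,
        "mime_type": mime_type,
        "eof": eof,
        "route_code": route_code,
        "timestamp": timestamp,
    }
    # Every field is optional - omit anything unset.
    return {k: v for k, v in event.items() if v not in (None, False)}

# A packet carrying both a status and file data - legal, no splitting:
packet = status_event(test_id="test_foo", test_status="fail",
                      file_bytes=b"traceback...", mime_type="text/plain")
wire = json.dumps(packet, sort_keys=True)
```

Since there are no either/or rules, the parser never has to split one packet into multiple StreamResult calls – which was exactly the restriction being removed.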
Filed under: Uncategorized | Leave a Comment
Tags: Python, Subunit, testing, testsupport, testtools, unittest
My last two blog posts were largely about the needs of subunit, but a key result of any protocol is how easy working with it in a high level language is.
Over the weekend and evenings I’ve done an implementation of a new set of classes – StreamResult and friends – that provides:
- Adaptation to and from the existing TestResult APIs (the 2.6 and below API, 2.7 API, and the testtools extended API).
- Multiplexing multiple streams together.
- Adding timing data to a stream if it is absent.
- Summarising a stream.
- Copying a stream to multiple outputs.
- A split out API for instructing a test run to stop.
- A simple test-at-a-time stream processor that makes it easy to just deal with tests rather than the innate complexities of an event based interface.
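The last item deserves a tiny illustration. This is not the testtools implementation – just a sketch of the idea: collapsing an event stream into one record per test, so consumers can deal with tests rather than raw events.

```python
# Sketch: a test-at-a-time view over an event stream. Events arrive as
# (test_id, test_status) pairs; the final status seen for a test wins.
from collections import OrderedDict

def summarise(events):
    """Fold stream-ordered (test_id, test_status) events into per-test
    final statuses, preserving first-seen test order."""
    tests = OrderedDict()
    for test_id, test_status in events:
        tests[test_id] = test_status  # later events supersede earlier ones
    return tests

# Interleaved events from two concurrently running tests:
stream = [("test_a", "inprogress"), ("test_b", "inprogress"),
          ("test_a", "success"), ("test_b", "fail")]
results = summarise(stream)
print(list(results.items()))  # [('test_a', 'success'), ('test_b', 'fail')]
```

The real processor has to also carry attachments and timing per test, but the shape – buffering events keyed by test id – is the same.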
So far the code has been uniformly simple to write. I started with an API that included an ‘estimate’ function, which I’ve since removed – I don’t believe the complexity is justified; enumeration is not significantly more expensive than counting, and runners that want to be efficient can either not enumerate or remember the enumeration from prior runs.
The documentation in the linked pull request is a good place to start to get a handle on the API; I’d love feedback.
Next steps for me are to do a subunit protocol revision that maps to the Python API, both parser and generator, and see how it feels. One wrinkle there is that the reason for doing this is to fix intrinsic limits in the existing protocol – so doing forward and backward wire protocol compatibility would defeat the point. However… we can make the output side explicitly choose a protocol version, and if we can autodetect the protocol version in the parser, even if we cannot handle mixed streams we can get the benefits of the new protocol once data has been detected. That said, I think we can start without autodetection during prototyping, and add it later. Without autodetection, programs like TestRepository will need configuration options to control what protocol variant to expect. This could be done by requiring this new protocol and providing a stream filter that can be deployed when needed.
Filed under: Uncategorized | Leave a Comment
Tags: performance, Python, Subunit, testing, testrepository, testsupport, testtools, unittest
Of course, as happens sadly often, the scope creeps…
Additional pain points
Zope’s test runner runs things that are not tests, but which users want to know about – ‘layers’. At the moment these are reported as individual tests, but this is problematic in a couple of ways. Firstly, the same ‘test’ runs on multiple backend runners, so timing and stats get more complex. Secondly, if a layer fails to setup or teardown, tools like testrepository that have watched the stream will think a test failed, and on the next run try to explicitly run that ‘test’ – but that test doesn’t really exist, so it won’t run [unless an actual test that needs the layer is being run].
OpenStack uses the Python coverage tool to gather coverage statistics during test runs. Each worker running tests needs to gather and return such statistics. The current subunit protocol has no space to hand this around, without it pretending to be a test [see a pattern here?]. And that has the same negative side effect – test runners like testrepository will try to run that ‘test’. While testrepository doesn’t want to know about coverage itself, it would be nice to be able to pass everything around and have a local hook handle the aggregation of that data.
The way TAP is reflected into subunit today is to mangle each tap ‘test’ into a subunit ‘test’, but for full benefits subunit tests have a higher bar – they are individually addressable and runnable. So a TAP test script is much more equivalent to a subunit test. A similar concept is landing in Python’s unittest soon – ‘subtests’ – which will give very lightweight additional assertions within a larger test concept. Many C test runners that emit individual tests as simple assertions have this property as well – there may be 5 or 10 executables each with dozens of assertions, but only the executables are individually addressable – there is no way to run just one assertion from an executable as a ‘test’. It would be nice to avoid the friction that currently exists when dealing with that situation.
Minimum requirements to support these
Layers can be supported via timestamped stdout output, or fake tests. Neither is compelling, as the former requires special casing in subunit processors to data mine it, and the latter confuses test runners. A way to record something that is structured like a test (has an id – the layer, an outcome – in progress / ok / failed, and attachment data for showing failure details) but isn’t a test would allow the data to flow around without causing confusion in the system.
TAP support could change to just show the entire output as progress on one test and then fail or not at the end. This would result in a cognitive mismatch for folk from the TAP world, as TAP runners report each assertion as a ‘test’, and this would be hidden from subunit. Having a way to record something that is associated with an actual test, and has a name, status, attachment content for the TAP comments field – that would let subunit processors report both the addressable tests (each TAP script) and the individual items, but know that only the overall scripts are runnable.
Python subtests could use a unique test id for each subtest, but that has the same issue as layers. Python will ensure a top level test errors if a subtest errors, so strictly speaking we probably don’t need an associated-with concept, but we do need to be able to say that a test-like thing happened that isn’t actually addressable.
Coverage information could be about a single test, or even a subtest, or it could be about the entire work undertaken by the test process. I don’t think we need a single standardised format for Coverage data (though that might be an excellent project for someone to undertake). It is also possible to overthink things :). We have the idea of arbitrary attachments for tests. Perhaps arbitrary attachments outside of test scope would be better than specifying stdout/stderr as specific things. On the other hand stdout and stderr are well known things.
Proposal version 2
A packetised length-prefixed binary protocol, with each packet containing a small signature, length, routing code, a binary timestamp in UTC, a set of UTF8 tags (active only, no negative tags), a content tag – one of (estimate + number, stdin, stdout, stderr, file, test) – a test-id, runnable flag, test-status (one of exists/inprogress/xfail/xsuccess/success/fail/skip), an attachment name, mime type, a last-block marker and a block of bytes.
The stdin/stdout/stderr content tags are gone, replaced with file. The names stdin, stdout and stderr can be placed in the attachment name field to signal those well known files, and any other files that the test process wants to hand over can be simply embedded. Processors that don’t expect them can just pass them on.
Runnable is a boolean, indicating whether this packet is describing a test that can be executed deliberately (vs an individual TAP assertion, Python sub-test etc). This permits describing things like Zope layers, which are top-level test-like things (they start, stop and can error) though they cannot be run… and it doesn’t explicitly model the setup/teardown aspect that they have. Should we do that?
Testid is for identifying tests. With the runnable flag to indicate whether a test really is a test, subtests can just be namespaced by the generator – reporters can choose whether to be naive and report every ‘test’, or whether to use a simple prefix-plus-separator convention on the ids to infer child elements.
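A quick sketch of that reporter choice (the ‘/’ separator and event shape here are my own assumptions, not part of the proposal): with a runnable flag and an id convention, child relationships fall out of the ids alone.

```python
# Sketch: inferring parentage from a separator convention, and using
# the runnable flag to decide what can actually be re-executed.
def parent_id(test_id, sep="/"):
    """Return the enclosing test's id, or None for a top-level test."""
    head, _, _ = test_id.rpartition(sep)
    return head or None

# A C test executable with two non-addressable assertions:
events = [
    {"id": "exe1", "runnable": True},
    {"id": "exe1/assert_1", "runnable": False},
    {"id": "exe1/assert_2", "runnable": False},
]

# A naive reporter shows every 'test'; a re-running tool only
# schedules the runnable ids:
rerun = [e["id"] for e in events if e["runnable"]]
print(rerun)                        # ['exe1']
print(parent_id("exe1/assert_2"))   # exe1
```

This is exactly the TAP/layers/subtests situation above: everything is reportable, but only the top-level items are addressable for re-running.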
Impact on Python API
If we change the API to:
class TestInfo(object):
    id = unicode
    status = ('exists', 'inprogress', 'xfail', 'xsuccess', 'success',
              'fail', 'error', 'skip')
    runnable = boolean

class StreamingResult(object):
    def startTestRun(self):
        pass

    def stopTestRun(self):
        pass

    def estimate(self, count, route_code=None, timestamp=None):
        pass

    def file(self, name, bytes, eof=False, mime=None, test_info=None,
             route_code=None, timestamp=None):
        """Inform the result about the contents of an attachment."""

    def status(self, test_info, route_code=None, timestamp=None):
        """Inform the result about a test status with no attached data."""
This would permit the full semantics of a subunit stream to be represented I think, while being a narrow interface that should be easy to implement.
Please provide feedback! I’ll probably start implementing this soon.
Filed under: Uncategorized | 1 Comment
Tags: coverage, junit, Python, Subunit, TAP, testing, testrepository, unittest