Since its very early days subunit has had a single model – you run a process, it outputs test results. This works great, except when it doesn’t.
On the up side, you have a one way pipeline – there’s no interactivity needed, which makes it very very easy to write a subunit backend that e.g. testr can use.
On the downside, there’s no interactivity, which means that anytime you want to do something with those tests, a new process is needed – and thats sometimes quite expensive – particularly in test suites with 10’s of thousands of tests.Now, for use in the development edit-execute loop, this is arguably ok, because one needs to load the new tests into memory anyway; but wouldn’t it be nice if tools like testr that run tests for you didn’t have to decide upfront exactly how they were going to run. If instead they could get things running straight away and then give progressively larger and larger units of work to be run, without forcing a new process (and thus new discovery directory walking and importing) ? Secondly, testr has an inconsistent interface – if testr is letting a user debug things to testr through to child workers in a chain, it needs to use something structured (e.g. subunit) and route stdin to the actual worker, but the final testr needs to unwrap everything – this is needlessly complex. Lastly, for some languages at least, its possibly to dynamically pick up new code at runtime – so a simple inotify loop and we could avoid new-process (and more importantly complete-enumeration) *entirely*, leading to very fast edit-test cycles.
So, in this blog post I’m really running this idea up the flagpole, and trying to sketch out the interface – and hopefully get feedback on it.
Taking subunit.run as an example process to do this to:
- There should be an option to change from one-shot to server mode
- In server mode, it will listen for commands somewhere (lets say stdin)
- On startup it might eager load the available tests
- One command would be list-tests – which would enumerate all the tests to its output channel (which is stdout today – so lets stay with that for now)
- Another would be run-tests, which would take a set of test ids, and then filter-and-run just those ids from the available tests, output, as it does today, going to stdout. Passing somewhat large sets of test ids in may be desirable, because some test runners perform fixture optimisations (e.g. bringing up DB servers or web servers) and test-at-a-time is pretty much worst case for that sort of environment.
- Another would be be std-in a command providing a packet of stdin – used for interacting with debuggers
So that seems pretty approachable to me – we don’t even need an async loop in there, as long as we’re willing to patch select etc (for the stdin handling in some environments like Twisted). If we don’t want to monkey patch like that, we’ll need to make stdin a socketpair, and have an event loop running to shepard bytes from the real stdin to the one we let the rest of Python have.
What about that nirvana above? If we assume inotify support, then list_tests (and run_tests) can just consult a changed-file list and reload those modules before continuing. Reloading them just-in-time would be likely to create havoc – I think reloading only when synchronised with test completion makes a great deal of sense.
Would such a test server make sense in other languages? What about e.g. testtools.run vs subunit.run – such a server wouldn’t want to use subunit, but perhaps a regular CLI UI would be nice…
Filed under: Uncategorized | 2 Comments
Just saw http://sny.no/2014/04/dbts and I feel compelled to note that distributed bug trackers are not new – the earliest I personally encountered was Aaron Bentley’s Bugs everywhere – coming up on it’s 10th birthday. BE meets many of the criteria in the dbts post I read earlier today, but it hasn’t taken over the world – and I think this is in large part due to the propogation nature of bugs being very different to code – different solutions are needed.
XXXX: With distributed code versioning we often see people going to some effort to avoid conflicts – semantic conflicts are common, and representation conflicts extremely common.The idions
Take for example https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/805661. Here we can look at the nature of the content:
- Concurrent cannot-conflict content – e.g. the discussion about the bug. In general everyone should have this in their local bug database as soon as possible, and anyone can write to it.
- Observations of fact – e.g. ‘the code change that should fix the bug has landed in Ubuntu’ or ‘Commit C should fix the bug’.
- Reports of symptoms – e.g. ‘Foo does not work for me in Ubuntu with package versions X, Y and Z’.
- Collaboratively edited metadata – tags, title, description, and arguably even the fields like package, open/closed, importance.
Note that only one of these things – the commit to fix the bug – happens in the same code tree as the code; and that the commit fixing it may be delayed by many things before the fix is available to users. Also note that conceptually conflicts can happen in any of those fields except 1).
Anyhow – my humble suggestion for tackling the conflicts angle is to treat all changes to a bug as events in a timeline – e.g. adding a tag ‘foo’ is an event to add ‘foo’, rather than an event setting the tags list to ‘bar,foo’ – then multiple editors adding ‘foo’ do not conflict (or need special handling). Collaboratively edited fields would be likely be unsatisfying with this approach though – last-writer-wins isn’t a great story. OTOH the number of people that edit the collaborative fields on any given bug tend to be quite low – so one could defer that to manual fixups.
Further, as a developer wanting local access to my bug database, syncing all of these things is appealing – but if I’m dealing with a million-bug bug database, I may actually need the ability to filter what I sync or do not sync with some care. Even if I want everything, query performance on such a database is crucial for usability (something git demonstrated convincingly in the VCS space).
Lastly, I don’t think distributed bug tracking is needed – it doesn’t solve a deeply burning use case – offline access would be a 90% solution for most people. What does need rethinking is the hugely manual process most bug systems use today. Making tools like whoopsie-daisy widely available is much more interesting (and that may require distributed underpinnings to work well and securely). Automatic collation of distinct reports and surfacing the most commonly experienced faults to developers offers a path to evidence based assessment of quality – something I think we badly need.
Filed under: Uncategorized | 14 Comments
Tags: data, distributed, trackers
I feel like I’m taking a big personal risk writing this, even though I know the internet is large and probably no-one will read this :-).
So, dear reader, please be gentle.
As we grow – as people, as developers, as professionals – some lessons are are hard to learn (e.g. you have to keep trying and trying to learn the task), and some are hard to experience (they might still be hard to learn, but just being there is hard itself…) I want to talk about a particular lesson I started learning in late 2008/early 2009 – while I was at Canonical – sadly one of those that was hard to experience.
At the time I was one of the core developers on Bazaar, and I was feeling pretty happy about our progress, how bzr was developing, features, community etc. There was a bunch of pressure on to succeed in the marketplace, but that was ok, challenges bring out the stubborn in me :). There was one glitch though – we’d been having a bunch of contentious code reviews, and my manager (Martin Pool) was chatting to me about them.
I was – as far as I could tell – doing precisely the right thing from a peer review perspective: I was safeguarding the project, preventing changes that didn’t fit properly, or that reduced key aspects- performance, usability – from landing until they were fixed.
However, the folk on the other side of the review were feeling frustrated, that nothing they could do would fix it, and generally very unhappy. Reviews and design discussions would grind to a halt, and they felt I was the cause. [They were right].
And here was the thing – I simply couldn’t understand the issue. I was doing my job; I wasn’t angry at the people submitting code; I wasn’t hostile; I wasn’t attacking them (but I was being shall we say frank about the work being submitted). I remember saying to Martin one day ‘look, I just don’t get it – can you show me what I said wrong?’ … and he couldn’t.
Canonical has a 360′ review system – every 6 months / year (it changed over time) you review your peers, subordinate(s) and manager(s), and they review you. Imagine my surprise – I was used to getting very positive reports with some constructive suggestions – when I scored low on a bunch of the inter-personal metrics in the review. Martin explained that it was the reviews thing – folk were genuinely unhappy, even as they commended me on my technical merits. Further to that, he said that I really needed to stop worrying about technical improvement and focus on this inter-personal stuff.
Two really important things happened around this time. Firstly, Steve Alexander, who was one of my managers-once-removed at the time, reached out to me and suggested I read a book – Getting out of the box – and that we might have a chat about the issue after I had read it. I did so, and we chatted. That book gave me a language and viewpoint for thinking about the problem. It didn’t solve it, but it meant that I ‘got it’, which I hadn’t before.
So then the second thing happened – we had a company all hands and I got to chat with Claire Davis (head of HR at Canonical at the time) about what was going on. To this day the sheer embarrassment I felt when she told me that the broad perception of me amongst other teams managers was – and I paraphrase a longer, more nuance conversation here – “technically fantastic but very scary to have on the team – will disrupt and cause trouble”.
So, at this point about 6 months had passed, I knew what I wanted – I wanted folk to want to work with me, to find my presence beneficial and positive on both technical and team aspects. I already knew then that what I seek is technical challenges: I crave novelty, new challenges, new problems. Once things become easy, it call all too easily slip into tedium. So at that time my reasoning was somewhat selfish: how was I to get challenges if no-one wanted to work with me except in extremis?
I spent the next year working on myself as much as specific projects: learning more and more about how to play well with others.
In June 2010 I got a performance review I could be proud of again – I was – in no way – perfect, but I’d made massive strides. This journey had also made huge improvements to my personal life – a lot of stress between Lynne and I had gone away. Shortly after that I was invited to apply for a new role within Canonical as Technical Architect for Launchpad – and Francis Lacoste told me that it was only due to my improved ability to play well with others that I was even considered. I literally could not have done the job 18 months before. I got the job, and I think I did pretty well – in fact I was awarded an internal ‘Spotlight on Success’ award for what we (it was a whole Launchpad team team effort) achieved while I was in that role.
So, what did I change/learn? There’s just a couple of key changes I needed to make in myself, but a) they aren’t sticky: if I get overly tired, ye old terrible Robert can leak out, and b) there’s actually a /lot/ of learnable skills in this area, much of which is derived – lots of practice and critical self review is a good thing. The main thing I learnt was that I was Selfish. Yes – capital S. For instance, in a discussion about adding working tree filter to bzr, I would focus on the impact/risk on me-and-things-I-directly-care-about: would it make my life harder, would it make bzr slower, was there anything that could go wrong. And I would spend only a little time thinking about what the proposer needed: they needed support and assistance making their idea reach the standards the bzr community had agreed on. The net effect of my behaviours was that I was a class A asshole when it came to getting proposals into a code base.
The key things I had to change were:
- I need to think about the needs of the person I’m speaking to *and not my own*. [Thats not to say you should ignore your needs, but you shouldn't dwell on them: if they are critical, your brain will prompt you].
- There’s always a reason people do things: if it doesn’t make sense, ask them! [The crucial conversations books have some useful modelling here on how and why people do things, and on how-and-why conversations and confrontations go bad and how to fix them.]
Ok so this is all interesting and so forth, but why the blog post?
Firstly, I want to thank four folk who were particularly instrumental in helping me learn this lesson: Martin, Steve, Claire and of course my wife Lynne – I owe you all an unmeasurable debt for your support and assistance.
Secondly, I realised today that while I’ve apologised one on one to particular folk who I knew I’d made life hard for, I’d never really made a widespread apology. So here it is: I spent many years as an ass, and while I didn’t mean to be one, intent doesn’t actually count here – actions do. I’m sorry for making your life hell in the past, and I hope I’m doing better now.
Lastly, if I’m an ass to you now, I’m sorry, I’m probably regressing to old habits because I’m too tired – something I try to avoid, but it’s not always possible. Please tell me, and I will go get some sleep then come and apologise to you, and try to do better in future.
Filed under: Uncategorized | 10 Comments
Tags: Bazaar, Canonical, Launchpad, ubuntu
I’ve transitioned to a new key – announcement here or below. If you’ve signed my key in the past please consider signing my new key to get it integrated into the web of trust. Thanks!
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1,SHA256 Sun, 2013-10-13 Time for me to migrate to a new key (shockingly late - sorry!). My old key is set to expire early next year. Please use my new key effective immediately. If you have signed my old key then please sign my key - this message is signed by both keys (and the new key is signed by my old key). old key: pub 1024D/FBD3EB8E 2002-07-20 Key fingerprint = 9222 8732 859D 25CC 2560 B617 867B F9A9 FBD3 EB8E new key: pub 4096R/AAC0E286 2013-10-13 Key fingerprint = 8244 0CEA B440 83C7 9431 D2CC 298E 9A19 AAC0 E286 The new key is up on the keyservers, so you can just pull it from there. - -Rob -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iEYEARECAAYFAlJZ8FEACgkQhnv5qfvT644WxACfWBoKdVW+YDrMR1H9IY6iJUk8 ZC8AoIMRc55CTXsyn3S7GWCfOR1QONVhiQEcBAEBCAAGBQJSWfBRAAoJEInv1Yjp ddbfbvgIAKDsvPLQil/94l7A3Y4h4CME95qVT+m9C+/mR642u8gERJ1NhpqGzR8z fNo8X3TChWyFOaH/rYV+bOyaytC95k13omjR9HmLJPi/l4lnDiy/vopMuJaDrqF4 4IS7DTQsb8dAkCVMb7vgSaAbh+tGmnHphLNnuJngJ2McOs6gCrg3Rb89DzVywFtC Hu9t6Sv9b0UAgfc66ftqpK71FSo9bLQ4vGrDPsAhJpXb83kOQHLXuwUuWs9vtJ62 Mikb0kzAjlQYPwNx6UNpQaILZ1MYLa3JXjataAsTqcKtbxcyKgLQOrZy55ZYoZO5 +qdZ1+wiD3+usr/GFDUX9KiM/f6N+Xo= =EVi2 -----END PGP SIGNATURE-----
Filed under: Uncategorized | Leave a Comment
Tags: Debian, gpg, ubuntu
Python 3 recently introduced a nice feature – subtests. When I was putting subunit version 2 together I tried to cater for this via a heuristic approach – permitting the already known requirement that some tests which are reported are not runnable be combined with substring matching to identify subtests.
However that has panned out poorly, when I went to integrate this with testr the code started to get fugly.
So, I’m going to extend the StreamResult API to know about subtests, and issue a subunit protocol bump – to 2.1 – to add a new field for labelling subtest events. My plan is to make this build a recursive tree structure – that is given test “test_foo” with subtest “i=3″ which the Python subtest code would identify as “test_foo (i=3)”, they should be identified in StreamResult as test_id “test_foo (i=3)” and parent_test_id “test_foo”. This can then nest arbitrarily deep if test runners decide to do that, and the individual runnability becomes up to the test runner, not testrepository / subunit / StreamResult.
Filed under: Uncategorized | Leave a Comment
Tags: Python, Subunit, testing, testrepository, testtools, upstream
The Rackspace docs describe how to use rackspace’s custom extensions, but not how to use plain ol’ nova. Using plain nova is important if you want cloud portability in your scripts.
So – for future reference – these are the settings:
Filed under: Uncategorized | Leave a Comment
Subunit V2 is coming along very well.
- I have a complete implementation of the StreamResult API up as a patch for testtools. Thats 2K LOC including comeprehensive tests.
- Similarly, I have an implementation of a StreamResult parser and emitter for subunit. Thats 1K new LOC including comprehensive tests, and another 500 lines of churn where I migrate all the subunit filters to v2.
- pdb debugging works through subunit v2, permitting dropping into a debugger to work. Yay.
Remaining things to do:
- Update the other language bindings – the C library in particular.
- Teach testrepository to expect v2 input (and probably still store v1 for a while)
- Teach testrepository to use pipes for the stdin of test runner backends, and some control mechanism to switch input between different backends.
- Discuss the in-Python API with more folk.
- Get code merged :)
Filed under: Uncategorized | 8 Comments
Tags: Python, Subunit, testing, testrepository, testsupport, testtools, unittest
I’ve been hitting the limits of gigabit ethernet at home for quite a while now, and as I spend more time working with cloud technologies this started to frustrate me.
I’d heard of other folk getting good results with second hand Infiniband cards and decided to give it a go myself.
I bought two Voltaire dual-port Infiniband adapters – a 4X SDR PCI-E x4 card. And in a 2 metre 8470 cable, and we’re in business.
There are other, more comprehensive guides around to setting this up – e.g. http://davidhunt.ie/wp/?p=2291 or http://pkg-ofed.alioth.debian.org/howto/infiniband-howto-4.html
On ubuntu the hardware was autodetected; all I needed to do was:
modprobe ib_ipoib sudo apt-get install opensm # on one machine
And configure /etc/network/interfaces – e.g.:
iface ib1 inet static address 192.168.2.3 netmask 255.255.255.0 network 192.168.2.0 up echo connected >`find /sys -name mode | grep ib1` up echo 65520 >`find /sys -name mtu | grep ib1`
With no further tuning I was able to get 2Gbps doing linear file copies via Samba, which I suspect is rather pushing the limits of my circa 2007 home server – I’ll investigate futher to identify where the bottlenecks are, but the networking itself I suspect is ok – netperf got me 6.7Gbps in a trivial test.
Filed under: Uncategorized | 4 Comments
Tags: cloud, infiniband, performance, ubuntu
StreamResult, covered in my last few blog posts, has panned out pretty well.
Until that is, that I sat down to do a serialised version of it. It became fairly clear that the wire protocol can be very simple – just one event type that has a bunch of optional fields – test ids, routing code, file data, mime-type etc. It is up to the recipient at the far end of a stream to derive semantic meaning, which means that encoding a lot of rules (such as a data packet can have either a test status or file data) into the wire protocol isn’t called for.
If the wire protocol doesn’t have those rules, Python parsers that convert a bytestream into StreamResult API calls will have to manually split packets that have both status() and file() data in them… this means it would be impossible to create many legitimate bytestreams via the normal StreamResult API.
That seems to be an unnecessary restriction, and thinking about it, having a very simple ‘here is an event about a test run’ API that carries any information we have and maps down a very simple wire protocol should be about as easy to work with as the current file or status API.
Most combinations of file+status parameters is trivially interpretable, but there is one that had no prior definition – a test_status with no test id specified. Files with no testid are easily considered as ‘global scope’ for their source, so perhaps test_status should be treated the same way? [Feedback in comments or email please]. For now I’m going to leave the meaning undefined and unconstrained.
So I’m preparing a change to my patchset for StreamResult to:
- Drop the file() method altogether.
- Add file_bytes, mime_type and eof parameters to status().
- Make the test_id and test_status parameters to status() optional.
This will make the API trivially serialisable (both to JSON or protobufs or whatever, or to the custom binary format I’m considering for subunit), and equally trivially parsable, which I think is a good thing.
Filed under: Uncategorized | Leave a Comment
Tags: Python, Subunit, testing, testsupport, testtools, unittest
My last two blog posts were largely about the needs of subunit, but a key result of any protocol is how easy working with it in a high level language is.
In the weekend and evenings I’ve done an implementation of a new set of classes – StreamResult and friends – that provides:
- Adaption to and from the existing TestResult APIs (the 2.6 and below API, 2.7 API, and the testtools extended API).
- Multiplexing multiple streams together.
- Adding timing data to a stream if it is absent.
- Summarising a stream.
- Copying a stream to multiple outputs
- A split out API for instructing a test run to stop.
- A simple test-at-a-time stream processor that makes it easy to just deal with tests rather than the innate complexities of an event based interface.
So far the code has been uniformly simple to write. I started with an API that included an ‘estimate’ function, which I’ve since removed – I don’t believe the complexity is justified; enumeration is not significantly more expensive than counting, and runners that want to be efficient can either not enumerate or remember the enumeration from prior runs.
The documentation in the linked pull request is a good place to start to get a handle on the API; I’d love feedback.
Next steps for me are to do a subunit protocol revision that maps to the Python API, both parser and generator and see how it feels. One wrinkle there is that the reason for doing this is to fix intrinsic limits in the existing protocol – so doing forward and backward wire protocol compatibility would defeat the point. However… we can make the output side explicitly choose a protocol version, and if we can autodetect the protocol version in the parser, even if we cannot handle mixed streams we can get the benefits of the new protocol once data has been detected. That said, I think we can start without autodetection during prototyping, and add it later. Without autodetection, programs like TestRepository will need configuration options to control what protocol variant to expect. This could be done by requiring this new protocol and providing a stream filter that can be deployed when needed.
Filed under: Uncategorized | Leave a Comment
Tags: performance, Python, Subunit, testing, testrepository, testsupport, testtools, unittest