Back from hiatus

Well, the new blog seems to be up and running – and gathering a modest number of comments already. Woo.

I’ve a bunch of mail about test suite performance to gather and refine into a follow-up post, but that can wait a day or two.

In bzr we suffer from a long test suite, which we let grow while we had some other very pressing performance concerns. 2.0 fixes those concerns, and we’re now finding the odd moment to improve our development environment a little.

One of the things I want to do is to radically reduce the cost of testing inside bzr; code coverage is a great way to get a broad picture of what is tested. Rather than reinvent the wheel (and I’ve written one to throw away, so far) – are there tools out there that can:

  • build a per-test coverage map
  • do it quickly
  • not include setUp/tearDown/cleanUp code paths in the map
  • report on the difference between two such maps (at the suite level)

The third point is possibly contentious, so I should expand on it. While all the code executed within a test’s run() method is – from the outside – part of the test, it is not (by definition) the focus of the test. And I find failures in focused tests substantially easier to analyse, because such tests tend to check preconditions, poke at object state, and so on.

As I want this coverage map to help preserve coverage as we refactor the test suite, I don’t want to include accidental coverage in the map.
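
For concreteness, here’s a minimal sketch of the kind of thing I mean. It’s illustrative only – the PerTestCoverageCase class, the map format and the lost_coverage helper are assumptions rather than anything in bzr – and it uses today’s coverage.py and unittest APIs (coverage.Coverage with start()/stop()/get_data()). Coverage is switched on only around the test body, so setUp/tearDown/cleanups stay out of the map, and two suite-level maps can then be compared:

import coverage
import unittest


class PerTestCoverageCase(unittest.TestCase):
    # Hypothetical base class: records the (filename, line) pairs covered by
    # each test body, keyed by test id, in a shared class-level map.
    coverage_map = {}

    def run(self, result=None):
        cov = coverage.Coverage()
        real_test = getattr(self, self._testMethodName)

        def instrumented():
            cov.start()        # start only after setUp has already run
            try:
                real_test()
            finally:
                cov.stop()     # stop before tearDown/cleanups run

        # Shadow the test method on the instance so unittest's own
        # setUp -> test -> tearDown machinery calls the wrapped version.
        setattr(self, self._testMethodName, instrumented)
        try:
            super().run(result)
        finally:
            delattr(self, self._testMethodName)
        covered = set()
        data = cov.get_data()
        for filename in data.measured_files():
            for line in data.lines(filename) or ():
                covered.add((filename, line))
        self.coverage_map[self.id()] = covered


def lost_coverage(old_map, new_map):
    # Suite-level difference: lines covered by the old suite that no test
    # in the new suite touches any more.
    old_lines = set().union(*old_map.values()) if old_map else set()
    new_lines = set().union(*new_map.values()) if new_map else set()
    return old_lines - new_lines

Run the suite before and after a refactoring, persist the two maps, and lost_coverage() points at the lines the refactoring stopped exercising.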

8 thoughts on “Back from hiatus”

  1. Sadly, I don’t know of any tools.

    I think that setUp/tearDown and the test methods themselves need to be profiled separately. With Launchpad, I imagine that much of the low-hanging fruit is actually in the set-up and tear-down. It’d be nice to have some data to support that.

    1. So there are two separate metrics needed:
      – where does the time go
      – what value does the test add

      I think you’re talking about the former, and I’m talking about the latter 🙂

    1. Not last I looked 🙂 – per-test (or per-line) data rather than overall coverage is a weak area at the moment, I think. Well, lsprof does well at it, but it’s rather different.

  2. I’d be (pleasantly) surprised to discover that such a tool existed.

    figleaf (which is based on coverage.py) had the ability to enable/disable coverage tracking at will, so if you control the test framework, you can invoke your setUp, enable coverage, run the test, disable coverage, then run tearDown.

    coverage.py itself recently leapfrogged figleaf with a new 3.0 release that has fixes and performance enhancements (through a C extension module), so it may be worth checking out.

    I haven’t found the time to look at either of those (other than an evening’s toying with figleaf); so far I’ve been using Python’s trace.py for coverage information, collecting it for the whole test suite rather than for each test individually. trace.py is _slow_ and sometimes needs nannying to figure out why coverage has partially stopped working and to fix that.

    I haven’t used lsprof either; it’s on my wishlist.

    1. Actually, it just occurred to me that py.test could conceivably have per-test coverage. I don’t know whether it does, but it wouldn’t surprise me too much; py.test has all sorts of nifty, surprising features.

  3. This is a very interesting idea. Figleaf has a feature for this called sections.

    I’d be interested to hear your thoughts on how to present this test/coverage association. That to me seems to be the largest stumbling block.

    1. So the dataset for a run of bzr with full coverage was about 2GB (because of gathering it per test, and keeping the arcs).

      That’s very hard to show, so my thoughts were to make a db and provide a query language to answer questions like the following (a rough sketch is at the end of this comment):

      * what are the closest tests to a given function (find the function in all tests, sorted by the call depth needed to reach it).
      * which tests duplicate the coverage of [test] – find what [test] covers, then look for those functions in other tests.

      If the queries are fast, some scripting on top of them would let better questions be answered, like ‘which tests run the most code other than the thing they test’ (interesting because those are a good place to add test doubles).

      Some stuff is core infrastructure, while other bits, like ‘define what code this is meant to test’, are really per-project conventions, so it would be nice for that to be extensible/configurable.
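
      To make that concrete, here is a rough sketch of the sort of schema and queries I have in mind – the table layout and the call_depth column are illustrative assumptions, not an existing bzr tool:

      import sqlite3

      SCHEMA = """
      CREATE TABLE tests     (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
      CREATE TABLE functions (id INTEGER PRIMARY KEY, module TEXT, name TEXT,
                              UNIQUE (module, name));
      -- one row per (test, function) pair, recording the shallowest call
      -- depth at which the test reached the function
      CREATE TABLE coverage  (test_id     INTEGER REFERENCES tests(id),
                              function_id INTEGER REFERENCES functions(id),
                              call_depth  INTEGER,
                              PRIMARY KEY (test_id, function_id));
      """

      # "Closest tests to a given function": every test that reaches it,
      # most direct (shallowest) callers first.
      CLOSEST_TESTS = """
      SELECT t.name, c.call_depth
        FROM coverage c
        JOIN tests t     ON t.id = c.test_id
        JOIN functions f ON f.id = c.function_id
       WHERE f.module = ? AND f.name = ?
       ORDER BY c.call_depth;
      """

      # "Which tests duplicate the coverage of a given test": other tests
      # covering the same functions, ranked by how much they overlap.
      DUPLICATE_COVERAGE = """
      SELECT other_t.name, COUNT(*) AS shared_functions
        FROM coverage mine
        JOIN coverage other ON other.function_id = mine.function_id
        JOIN tests this_t   ON this_t.id = mine.test_id
        JOIN tests other_t  ON other_t.id = other.test_id
       WHERE this_t.name = ? AND other_t.id != this_t.id
       GROUP BY other_t.name
       ORDER BY shared_functions DESC;
      """

      def closest_tests(db_path, module, function):
          # Tests that reach module.function, most direct first.
          with sqlite3.connect(db_path) as conn:
              return conn.execute(CLOSEST_TESTS, (module, function)).fetchall()

      With those two queries fast, ‘which tests run the most code other than the thing they test’ is a small script over the same tables, and the per-project ‘what is this test meant to cover’ convention could become just another table.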
