I have a vision; I dream of a Python testing library that:
- Is in the Python core
- Is simple
- Is extensible
- Has tests take care of testing
- Has results take care of reporting
- Aids communication from test to test reader
Hopefully those are pretty modest and agreeable things to want.
However, we don’t have this: nose is lovely but not in the core [and has a moderately complex API]. py.test is also not in the core, and has previously tripped my too-much-magic alerts; I must admit to not having checked whether this has been fixed yet. unittest itself is in the core and has some cruft which we should clean up, but more importantly it is not extensible enough, which leads to extensions such as the zope testrunner having to muddy the waters between testing and reporting.
The point “Aids communication from test to test reader” is worth expanding on: automated testing is something that doesn’t need observation…until the unexpected happens. At that point some poor schmuck such as you or me ends up trying to guess what went wrong. The more data that we gather and communicate about the event, the greater the chance it can be corrected without needing a repeat run under a debugger, or worse, single stepping through the code.
There is a problem with ‘assertFoo’ methods in unittest, something that I’m not going to cram into this blog post. I will say, if you find the tendency of such methods to crawl to the base class frustrating, that you should look at hamcrest – it and similar things have been very successful in the Java unit testing world; we can learn from them.
Going back to my vision, we need to make unittest more powerfully extensible to allow projects like nose to do all the cool things they want to while still being unittest compatible. I don’t mean that nose can’t run unittest tests; I mean that unittest can’t run nose tests: nose has had to expand the contract, not simply add implementations that do more.
To that end I have a number of bugs which I need to file. Solving them piecemeal will create a fractured API – particularly if this is done over more than one release. So I am planning on prototyping in other projects, discussing like mad on the testing-in-python list, and when it all starts to come together writing up a PEP.
The bugs I have are:
- unittest doesn’t stream nicely: countTestCases must die or be made optional. This function is inherently incompatible with generative tests or anything beyond the simplest lightweight environments.
- no way to wrap code around a single test. This would permit profiling, debugging, tracing, and I’m sure other things more cleanly. (At the moment, one must ‘turn on’ the profiler in startTest and turn it off in stopTest, which is much more awkward than simply being in the call stack.) Some care will be needed here, particularly for generative tests.
- code that isn’t part of the implementation in the core needs to be able to work with the reporting code; allowing an optionally wider API permits extensions to be debuggable. This needs thought: do we allow direct access to TestResults? Do we come up with some added level of indirection and ‘events’? I don’t know.
- More data than just the backtrace needs to be included when an outcome is reported. I’ve started a discussion on the testing-in-python list about this. I’m proposing that we use a dict of named content objects, and use the HTTP content-type abstraction to make the content objects introspectable and reliably handleable without tying the unittest object protocol to any given wire format. Loose coupling is good!
- The way we signal outcomes between TestCase and TestResult (the addFailure etc. methods) is concerning: there are many grades of outcome that users of the framework may usefully wish to represent; in fact there are more than we probably want to put in the core. Finding a way to decouple the intent of a particular outcome from how it’s signalled would allow users more control while still being able to use the core framework. One particular issue in this area is that it’s possible with the current API to have a single test object succeed multiple times, or fail (addFailure) and then succeed (addSuccess). This causes no end of confusion, as test counts can mismatch failure counts, and so on.
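To make the ‘wrap code around a single test’ bug concrete, here is a rough sketch of what being in the call stack looks like if you are willing to subclass TestCase. ProfiledTestCase and profile_stats are names I’ve made up for illustration; they are not part of unittest, and this is not the extension point I’m asking for, since every test has to inherit from the base class rather than the runner doing the wrapping.

```python
import cProfile
import io
import pstats
import unittest


class ProfiledTestCase(unittest.TestCase):
    """Hypothetical base class: runs each test inside a profiler."""

    def run(self, result=None):
        profiler = cProfile.Profile()
        # runcall keeps the profiler in the call stack around the whole
        # test, including setUp and tearDown.
        outcome = profiler.runcall(super().run, result)
        # Keep the stats on the test object; the stream is only used if
        # the stats are printed later.
        self.profile_stats = pstats.Stats(profiler, stream=io.StringIO())
        return outcome


class ExampleTest(ProfiledTestCase):
    def test_sum(self):
        self.assertEqual(sum(range(1000)), 499500)


test = ExampleTest("test_sum")
result = unittest.TestResult()
test.run(result)
```

The same shape would work for a debugger or tracer; the awkward part is that the wrapping lives in the test class instead of in something the runner can plug in.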
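The last bug, a single test reporting several outcomes, can be seen by driving a TestResult by hand, the way a buggy or overly creative extension might. This is only a sketch: CountingResult and Sample are hypothetical names, and the successes list is something I add here because the base TestResult does not record successes at all.

```python
import sys
import unittest


class CountingResult(unittest.TestResult):
    """Hypothetical result that also remembers successes."""

    def __init__(self):
        super().__init__()
        self.successes = []

    def addSuccess(self, test):
        super().addSuccess(test)
        self.successes.append(test)


class Sample(unittest.TestCase):
    def test_nothing(self):
        pass


result = CountingResult()
test = Sample("test_nothing")

# Build a real exc_info tuple to hand to addFailure.
try:
    raise AssertionError("boom")
except AssertionError:
    err = sys.exc_info()

# Nothing in the API stops one test from reporting three outcomes.
result.startTest(test)
result.addFailure(test, err)
result.addSuccess(test)
result.addSuccess(test)
result.stopTest(test)

print(result.testsRun, len(result.failures), len(result.successes))
```

One test was started, yet the result holds one failure and two successes for it, which is exactly the kind of count mismatch described above.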
I’ve got some ideas about these bugs, but I’m approaching a kiloword already, and I hope this post has enough to provoke some serious thought about how we can fix these 5 bugs, compatibly, and end up with a significantly better unittest module. We’ll have been successful if projects like Trial, nose and the zope testrunner are able to remove all their code that duplicates standard library functionality or otherwise works around these bugs, and can instead focus on adding the specific test support needed by their environments (in the Trial and zope cases), or on UI and plug-n-play (for nose).