Multi-machine parallel testing of nova with testrepository

I recently added a formal interface to testrepository to enable cross-machine scaling of test runs. As testrepository is still a static scheduler, this isn’t perfect, but it’s quite a minimal interface, which makes it easy to implement. I will likely evolve it in reaction to feedback and experience.

In the long term I’d love to have a super generic tool that matches that interface, so the project VCS copy of .testr.conf can just call out to it. I don’t have that yet, but I do have a simple by-hand implementation that I use to run nova’s tests across my personal laptop, desktop and work laptop.

Testr models this by assuming each test running process can be mapped to a single ‘instance id’ (which could be a chroot, vm, cloud instance, …) and then running one or more commands in the instance, before disposing of it.
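To make that model concrete, here is a minimal sketch of the lifecycle in Python. The function `run_on_instances` and the three callables it takes are hypothetical names for illustration only; testr itself shells out to commands configured in .testr.conf and runs instances concurrently, whereas this sketch is sequential for clarity.

```python
def run_on_instances(provision, execute, dispose, commands, count):
    """Provision `count` instances, run each command in each, then dispose.

    provision, execute and dispose are hypothetical stand-ins for the
    three hooks described above; they are not testr APIs.
    """
    instance_ids = provision(count)
    results = []
    try:
        for instance_id in instance_ids:
            for command in commands:
                results.append(execute(instance_id, command))
    finally:
        # Instances are released even if a command raises.
        dispose(instance_ids)
    return results


def fake_provision(count):
    """Hand out ids 0..count-1 as strings."""
    return [str(i) for i in range(count)]


def fake_execute(instance_id, command):
    """Pretend to run a command and report where it ran."""
    return "instance %s ran: %s" % (instance_id, command)


disposed = []


def fake_dispose(instance_ids):
    """Record which ids were released."""
    disposed.extend(instance_ids)
```

Swapping the fakes for functions that call out to real scripts would give the same shape as the by-hand setup described here.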

This by-hand implementation consists of 4 things:

A tiny script to rsync my source directory to the relevant places before I run tests. (This takes <2 seconds on my home wifi.)
A script to allocate instance ids (I just use ints).
A script to discard them.
And a script to copy tempfiles onto the target machine and run a given command.

I do my testing in lxc containers, because I like my primary environment to be free of project-specific quirks and workarounds. lxc is not needed though, if you don’t want it.

So, to set this up for yourself:

On each host, make an lxc container (e.g. following http://wiki.openstack.org/DependsOnUbuntu).
Start them all (lxc-start -n nova -d).

Make SSH config entries for the lxc containers, so you can get at them remotely. (make sure your host * rules are at the end of the file otherwise the master overrides won’t work [and you might not notice for some time…]):

Host desktop-nova.lxc
# lxc addresses may be present on localhost too, so namespace the control
# path to avoid connecting to the wrong container.
  ControlPath ~/.ssh/master-lxc-%r@%h:%p
  hostname 10.0.3.19
  ProxyCommand ssh 192.168.1.106 nc -q0 %h %p

Host hplaptop-nova.lxc
# lxc addresses may be present on localhost too, so namespace the control
# path to avoid connecting to the wrong container.
  ControlPath ~/.ssh/master-lxc-%r@%h:%p
  hostname 10.0.3.244
  ProxyCommand ssh 192.168.1.116 nc -q0 %h %p

Make a script to copy your nova source tree to each test location. I called mine ‘sync’.

#!/bin/bash
cd "$(dirname "$0")"
echo syncing in "$(pwd)"
(rsync -a . desktop-nova.lxc:source/openstack/nova --delete-after && echo dell done) &
(rsync -a . hplaptop-nova.lxc:source/openstack/nova --delete-after && echo hp done)
# Wait for the backgrounded sync too, so the script only returns once both hosts are up to date.
wait

Make sure the base directory exists on each host:

ssh desktop-nova.lxc mkdir -p source/openstack
ssh hplaptop-nova.lxc mkdir -p source/openstack

Sync your code over.
```
./sync
```
And check tests run by running a few.
```
ssh hplaptop-nova.lxc "cd source/openstack/nova && ./run_tests.sh compute"
ssh desktop-nova.lxc "cd source/openstack/nova && ./run_tests.sh compute"
```
This checks the test environment: we’re not going to be running tests on each node via run_tests.sh or even testr (because it gets immediately meta), but if this fails, later attempts won’t work. Your test virtualenv is inside the source tree, so it is copied implicitly by the sync.
Decide what concurrency you want. I picked 12: I have a desktop i7 with 4 cores and two laptops with 2 cores each, and hyperthreading is on for all of them, so 12 sits between the core count (8) and the thread count (16). I may balance it more in future. A higher number assumes less contention between ALUs and other elements of the core pipeline, and I expect quite some contention because most of nova’s unittests are CPU bound rather than I/O bound. If the test servers are not busy, I can always raise it later.
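The arithmetic behind that figure can be written down as a small sketch. Taking the midpoint of cores and threads is an assumption made here for illustration (it happens to give 12 for these machines); the helper name `pick_concurrency` is hypothetical.

```python
def pick_concurrency(cores_per_machine, threads_per_core=2):
    """Return (cores, threads, midpoint) for a set of machines.

    Assumes every machine has hyperthreading with `threads_per_core`
    hardware threads per core.
    """
    cores = sum(cores_per_machine)
    threads = cores * threads_per_core
    # Midpoint between "one worker per core" and "one worker per thread".
    return cores, threads, (cores + threads) // 2


# Desktop i7 with 4 cores, two laptops with 2 cores each.
cores, threads, concurrency = pick_concurrency([4, 2, 2])
print(cores, threads, concurrency)  # prints: 8 16 12
```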
Create scripts to create / dispose / execute logical worker threads.

Creation. I call this ‘instance-provision’ and all it does is find the lowest ints not currently allocated and return them.

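#!/usr/bin/env python
import os
import sys

if not os.path.isdir('.instances'):
    os.mkdir('.instances')

running_ids = set(os.listdir('.instances'))
count = int(sys.argv[1])
# `count` free ids always exist below count + len(running_ids).
top = count + len(running_ids)
free = [str(i) for i in range(top) if str(i) not in running_ids]
# Take only the lowest `count` ids, in numeric order.
new = free[:count]
for id in new:
    # An empty marker file records the id as in use.
    open('.instances/%s' % id, 'w').close()
print(' '.join(new))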
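
The allocation rule (hand out the lowest ints not currently in use) can be checked in isolation as a pure function. This is only a sketch: `lowest_free_ids` is a hypothetical name, and it takes the in-use ids as an argument instead of reading the `.instances` directory.

```python
def lowest_free_ids(running_ids, count):
    """Return the `count` lowest non-negative ints (as strings) not in use."""
    running = set(running_ids)
    # At most len(running) candidates are taken, so this range always
    # contains at least `count` free ids.
    top = count + len(running)
    free = [str(i) for i in range(top) if str(i) not in running]
    return free[:count]


print(lowest_free_ids(["0", "2"], 3))  # prints: ['1', '3', '4']
```

Returning a list in numeric order keeps the output stable between runs, which makes it easier to see which ids map to which machine.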

Disposal is easy: remove the file marking the instance as in-use.
```
#!/bin/bash
echo freeing "$@"
cd .instances || exit 1
rm -- "$@"
```

Execution is a little trickier. We need to run some commands locally, and others remotely: copy the temp files that testr has set up onto the remote machine, ssh to it, cd to the right directory, source the virtualenv, and finally run the command.

#!/bin/bash
instance="$(($1 % 4))"
case $instance in
[0]) node=
     local="true"
     ;;
[1]) node=hplaptop-nova.lxc
     local=""
     ;;
[2-3]) node=desktop-nova.lxc
     local=""
     ;;
*)   echo "Unknown instance $instance" >&2
     exit 1
     ;;
esac
shift
files=
# accumulate files to copy
while [ "--" != "$1" ]; do 
files="$files $1"
shift ; done 
shift   
if [ -n "$files" -a -z "$local" ]; then
    echo copying $files to $node.
    for f in $files; do
        rsync $f $node:$(dirname $f) ;
    done
fi  
if [ -n "$local" ]; then
    eval $@
else
    echo ssh to $node
    ssh $node "cd source/openstack/nova && . .venv/bin/activate && $@"
fi
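
The two decisions the script makes (which node an instance id maps to, and which arguments are files versus the command) can be sketched in Python for easier testing. The function names are hypothetical; the mapping mirrors the `case` statement above, and the argument layout follows `$INSTANCE_ID $FILES -- $COMMAND`.

```python
def parse_execute_args(argv):
    """Split [instance_id, file..., '--', command...] into its parts."""
    instance_id = int(argv[0])
    # Everything between the id and the first '--' is a file to copy.
    sep = argv.index("--", 1)
    files = argv[1:sep]
    command = argv[sep + 1:]
    return instance_id, files, command


def node_for(instance_id):
    """Map an instance id to a node; None means run locally."""
    slot = instance_id % 4
    if slot == 0:
        return None
    if slot == 1:
        return "hplaptop-nova.lxc"
    return "desktop-nova.lxc"  # slots 2 and 3


args = ["11", "/tmp/tmphItivo", "--", "python", "-m", "subunit.run"]
print(parse_execute_args(args))
print(node_for(11))  # prints: desktop-nova.lxc
```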

Finally, tell testr how to use this. (Don’t commit this change to nova, as it would break other people). Add this to your .testr.conf.

test_run_concurrency=echo 12
instance_provision=./instance-provision $INSTANCE_COUNT
instance_execute=./instance-execute $INSTANCE_ID $FILES -- $COMMAND
instance_dispose=./instance-dispose $INSTANCE_IDS

Now, when you run testr run --parallel, it will run across your machines. Just do a ./sync before running tests to get the code out there. It is possible to wrap all of this up via automation (or to include just-in-time provisioned cloud instances), but I like the results of still rough scripts here – it strikes a good balance between effort, reliability and performance.

Edit: I spent a bit of time poking at my config – it turns out that my laptop (coming up on 3 years old now) has relatively less grunt – so I’m now running mod 8, with 0 my laptop, 1-2 my work laptop, 3-7 my desktop, and interestingly by running a proportionately overloaded set of tests I get a time reduction.

time testr run --parallel --concurrency=16 ... real 2m34.950s
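
To see how a mod 8 mapping spreads workers, here is a small sketch that counts workers per machine. It assumes instance ids 0 to 15 (what lowest-int allocation gives for 16 workers from a clean state); the host labels and function names are illustrative only.

```python
from collections import Counter


def host_for(instance_id):
    """Mod 8 mapping: 0 laptop, 1-2 work laptop, 3-7 desktop."""
    slot = instance_id % 8
    if slot == 0:
        return "laptop"
    if slot <= 2:
        return "work laptop"
    return "desktop"


def distribution(concurrency):
    """Count how many of the ids 0..concurrency-1 land on each host."""
    return Counter(host_for(i) for i in range(concurrency))


print(dict(distribution(16)))
```

With 16 workers this puts 2 on the laptop, 4 on the work laptop and 10 on the desktop.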

One thought on “Multi-machine parallel testing of nova with testrepository”

Sandip Dey says:

April 14, 2014 at 8:06 pm

Hi Robert

I was just trying out the above with a test file (test_dummy.py), to verify if the tests were being distributed across multiple hosts.

This was my .testr.conf

[DEFAULT]
test_command=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-500} \
${PYTHON:-python} -m subunit.run discover -t ./ ./discovery $LISTOPT $IDOPTION
test_id_option=--load-list $IDFILE
test_list_option=--list
test_run_concurrency=echo 12
instance_provision=./instance-provision $INSTANCE_COUNT
instance_execute=./instance-execute $INSTANCE_ID $FILES -- $COMMAND
instance_dispose=./instance-dispose $INSTANCE_IDS
group_regex=([^\.]+\.)+

I think the following from ‘instance-execute’ is getting executed, but I don’t see the log ‘ssh to nodea’ in stdout:
echo ssh to $node
ssh $node "cd source/openstack/nova && . .venv/bin/activate && $@"

This is the output I am getting

(api-venv)root@nodea11:~/contrail-test/scripts# testr run --parallel
running=./instance-provision 12
running=./instance-execute 11 -- OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-500} \
${PYTHON:-python} -m subunit.run discover -t ./ ./discovery --list
running=./instance-execute 11 /tmp/tmphItivo -- OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-500} \
${PYTHON:-python} -m subunit.run discover -t ./ ./discovery --load-list /tmp/tmphItivo
Ran 10 tests in 21.108s (+21.106s)
PASSED (id=40)
running=./instance-dispose 0 1 10 11 2 3 4 5 6 7 8 9
freeing 0 1 10 11 2 3 4 5 6 7 8 9

So where are those print statements getting logged?
