Recently I’ve been doing my personal development SSH’d into my personal laptop. I found that launchpadlib (which various projects use for release automation) was failing – the gnome keyring API threw an error because the keyring was locked, and python-keyring didn’t try to unlock it.
I needed a workaround to be able to release stuff, and with a bit of digging and help from #launchpad, came up with this:
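```bash
mkdir -p ~/.local/share/python_keyring
cat > ~/.local/share/python_keyring/keyringrc.cfg << EOF
[backend]
default-keyring=keyring.backend.UncryptedFileKeyring
EOF
```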
(There is already encryption in place, so I chose an unencrypted store; read the keyring source to find other alternatives.)
With this done, I can now use lp-shell etc. over SSH, for when I’m not physically at my machine.
Filed under: Uncategorized | Leave a Comment
Tags: Launchpad, ubuntu
I recently added a formal interface to testrepository to enable cross-machine scaling of test runs. As testrepository is still a static scheduler, this isn’t perfect, but it’s quite a minimal interface, which makes it easy to implement. I will likely evolve it in reaction to feedback and experience.
In the long term I’d love to have a super-generic tool that matches that interface, so the project’s VCS copy of .testr.conf can just call out to it. I don’t have that yet, but I do have a simple by-hand implementation that I use to run nova’s tests across my personal laptop, desktop and work laptop.
Testr models this by assuming each test-running process can be mapped to a single ‘instance id’ (which could be a chroot, VM, cloud instance, …), and then running one or more commands in the instance before disposing of it.
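To make the model concrete, here is a minimal Python sketch of the provision / execute / dispose lifecycle. It is not testrepository’s implementation: `run_partitioned` and its three callback parameters are hypothetical stand-ins for the provisioning, execution and disposal commands testr calls, the round-robin split of tests is an assumption, and the instances run one after another here rather than in parallel.

```python
from typing import Callable, Dict, List


def run_partitioned(
    tests: List[str],
    concurrency: int,
    provision: Callable[[int], List[str]],
    execute: Callable[[str, List[str]], None],
    dispose: Callable[[List[str]], None],
) -> Dict[str, List[str]]:
    """Allocate instances, run one partition of tests in each, then dispose.

    Hypothetical sketch: round-robin partitioning is assumed, not testr's scheduler.
    """
    instance_ids = provision(concurrency)
    if not instance_ids:
        raise ValueError("provision returned no instance ids")
    partitions: Dict[str, List[str]] = {i: [] for i in instance_ids}
    for index, test in enumerate(tests):
        partitions[instance_ids[index % len(instance_ids)]].append(test)
    try:
        for instance_id, partition in partitions.items():
            execute(instance_id, partition)
    finally:
        # Instances are always released, even if a run fails.
        dispose(instance_ids)
    return partitions
```

The `try`/`finally` is the important design point: whatever happens during execution, every allocated instance id is handed back.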
This by-hand implementation consists of four things:
- A tiny script to rsync my source directory to the relevant places before I run tests. (This takes < 2 seconds on my home wifi.)
- A script to allocate instance ids (I just use ints)
- A script to discard them
- And a script to copy tempfiles onto the target machine and run a given command.
I do my testing in lxc containers, because I like my primary environment to be free of project-specific quirks and workarounds. lxc is not needed though, if you don’t want it.
So, to set this up for yourself:
- On each host, make an lxc container (e.g. following http://wiki.openstack.org/DependsOnUbuntu).
- Start them all (lxc-start -n nova -d).
- Make SSH config entries for the lxc containers, so you can get at them remotely. (Make sure your ‘Host *’ rules are at the end of the file, otherwise the master overrides won’t work [and you might not notice for some time...]):
```
Host desktop-nova.lxc
  # lxc addresses may be present on localhost too, so namespace the control
  # path to avoid connecting to the wrong container.
  ControlPath ~/.ssh/master-lxc-%r@%h:%p
  hostname 10.0.3.19
  ProxyCommand ssh 192.168.1.106 nc -q0 %h %p

Host hplaptop-nova.lxc
  # lxc addresses may be present on localhost too, so namespace the control
  # path to avoid connecting to the wrong container.
  ControlPath ~/.ssh/master-lxc-%r@%h:%p
  hostname 10.0.3.244
  ProxyCommand ssh 192.168.1.116 nc -q0 %h %p
```
- make a script to copy your nova source tree to each test location. I called mine ‘sync’
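```bash
#!/bin/bash
# Push the source tree (including the .venv inside it) to each test container.
cd "$(dirname "$0")"
echo syncing in "$(pwd)"
(rsync -a . desktop-nova.lxc:source/openstack/nova --delete-after && echo dell done) &
(rsync -a . hplaptop-nova.lxc:source/openstack/nova --delete-after && echo hp done)
# Wait for the backgrounded desktop sync, so the script only exits once both copies are done.
wait
```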
- Make sure you have the base directory on each location
```bash
ssh desktop-nova.lxc mkdir -p source/openstack
ssh hplaptop-nova.lxc mkdir -p source/openstack
```
- Sync your code over.
- And check tests run by running a few.
```bash
ssh desktop-nova.lxc "cd source/openstack/nova && ./run_tests.sh compute"
ssh hplaptop-nova.lxc "cd source/openstack/nova && ./run_tests.sh compute"
```
This checks the test environment: we’re not going to run tests on each node via run_tests.sh, or even testr (because that gets meta immediately), but if this fails, later attempts won’t work either. Your test virtualenv is inside the source tree, so the sync copies it implicitly.
- Decide what concurrency you want. I picked 12: I have a desktop i7 with 4 cores and two laptops with 2 cores each, and hyperthreading is on for all of them, so 12 sits between the core count (8) and the thread count (16). I may rebalance it in future. A higher number assumes less contention between ALUs and other elements of the core pipeline, and I expect quite a lot of contention because most of nova’s unit tests are CPU-bound rather than I/O-bound. If the test servers are not busy, I can always raise it later.
- Create scripts to create / dispose / execute logical worker threads.
- Creation. I call this ‘instance-provision’ and all it does is find the lowest ints not currently allocated and return them.
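```python
#!/usr/bin/env python
"""Allocate COUNT instance ids, reusing the lowest free integers."""
import os
import sys

INSTANCE_DIR = '.instances'

if not os.path.isdir(INSTANCE_DIR):
    os.mkdir(INSTANCE_DIR)
running_ids = set(os.listdir(INSTANCE_DIR))
count = int(sys.argv[1])
new = []
candidate = 0
# Walk upward from 0, claiming ids that are not already marked as in use.
while len(new) < count:
    if str(candidate) not in running_ids:
        new.append(str(candidate))
    candidate += 1
for instance_id in new:
    # An empty marker file records that the id is allocated.
    open(os.path.join(INSTANCE_DIR, instance_id), 'w').close()
print(' '.join(new))
```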
- Disposal is easy: remove the file marking the instance as in-use.
```bash
#!/bin/bash
# Release the given instance ids by deleting their marker files.
echo freeing "$@"
cd .instances || exit 1
rm -- "$@"
```
- Execution is a little trickier. Some commands run locally; for the others we copy the temp files testr has set up over to the target machine, SSH to it, cd to the right directory, source the virtualenv, and finally run the command.
```bash
#!/bin/bash
# Usage: instance-execute INSTANCE_ID [FILES...] -- COMMAND...
# Map the instance id onto one of four slots: 0 runs locally,
# 1 on the HP laptop container, 2-3 on the desktop container.
instance="$(($1 % 4))"
case $instance in
  0) node= local="true" ;;
  1) node=hplaptop-nova.lxc local="" ;;
  [2-3]) node=desktop-nova.lxc local="" ;;
  *) echo "Unknown instance $instance" >&2
     exit 1 ;;
esac
shift
files=
# Accumulate the temp files testr wants copied, up to the "--" separator.
while [ $# -gt 0 ] && [ "--" != "$1" ]; do
  files="$files $1"
  shift
done
shift
if [ -n "$files" -a -z "$local" ]; then
  echo copying $files to node.
  for f in $files; do
    rsync $f $node:$(dirname $f)
  done
fi
if [ -n "$local" ]; then
  eval "$@"
else
  echo ssh to $node
  ssh $node "cd source/openstack/nova && . .venv/bin/activate && $@"
fi
```
- Finally, tell testr how to use this by adding the following to your .testr.conf. (Don’t commit this change to nova, as it would break things for other people.)
```ini
test_run_concurrency=echo 12
instance_provision=./instance-provision $INSTANCE_COUNT
instance_execute=./instance-execute $INSTANCE_ID $FILES -- $COMMAND
instance_dispose=./instance-dispose $INSTANCE_IDS
```
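For reference, the arithmetic behind the figure of 12 is simply the midpoint between total physical cores and total hardware threads across the three machines. A minimal Python sketch; the host names are placeholders, and “halfway” is my reading of the heuristic rather than anything testr requires.

```python
from typing import Dict, Tuple

# Hypothetical inventory of the three machines described above:
# name -> (physical cores, hardware threads with hyperthreading on).
HOSTS: Dict[str, Tuple[int, int]] = {
    "desktop": (4, 8),
    "laptop": (2, 4),
    "work-laptop": (2, 4),
}


def midpoint_concurrency(hosts: Dict[str, Tuple[int, int]]) -> Tuple[int, int, int]:
    """Return (total cores, total threads, a concurrency halfway between them)."""
    cores = sum(c for c, _ in hosts.values())
    threads = sum(t for _, t in hosts.values())
    return cores, threads, (cores + threads) // 2


print(midpoint_concurrency(HOSTS))  # (8, 16, 12)
```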
Now, when you run testr run --parallel, it will run across your machines. Just run ./sync before running tests to get the code out there. It is possible to wrap all of this up with automation (or to include just-in-time provisioned cloud instances), but I’m happy with the results from these still-rough scripts: they strike a good balance between effort, reliability and performance.
Edit: I spent a bit of time poking at my config. It turns out that my laptop (coming up on 3 years old now) has relatively less grunt, so I’m now running mod 8, with slot 0 on my laptop, 1-2 on my work laptop and 3-7 on my desktop. Interestingly, by running a proportionately overloaded set of tests I get a time reduction:
```bash
time testr run --parallel --concurrency=16
```
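To see how `--concurrency=16` spreads over the mod 8 layout, here is the slot mapping from the execute script rewritten as a small Python sketch. The function names are hypothetical, and treating hplaptop-nova.lxc as the work laptop is an assumption; the layout itself is the one given in the edit above.

```python
from typing import Dict, Optional

SLOTS = 8  # the script's modulus after the edit (it was 4)


def node_for_instance(instance_id: int) -> Optional[str]:
    """Map a testr instance id to an SSH host, or None to run locally.

    Assumed layout: slot 0 is local (my laptop), slots 1-2 are the work
    laptop (assumed to be hplaptop-nova.lxc), slots 3-7 are the desktop.
    """
    slot = instance_id % SLOTS
    if slot == 0:
        return None
    if slot <= 2:
        return "hplaptop-nova.lxc"
    return "desktop-nova.lxc"


def slot_counts(concurrency: int) -> Dict[str, int]:
    """Count how many of ids 0..concurrency-1 land on each machine."""
    counts: Dict[str, int] = {}
    for instance_id in range(concurrency):
        node = node_for_instance(instance_id) or "local"
        counts[node] = counts.get(node, 0) + 1
    return counts


print(slot_counts(16))  # {'local': 2, 'hplaptop-nova.lxc': 4, 'desktop-nova.lxc': 10}
```

With 16 workers the desktop gets 10 of them, which matches the idea of loading each machine roughly in proportion to its grunt.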
Tags: cloud, lxc, openstack, performance, Python, Subunit, testing, testrepository, testsupport, unittest
Thanks to Corey Goldberg, one of my colleagues at Canonical, the page performance report (PPR) can now be used on regular Apache log files, rather than just the zserver trace log files that Launchpad’s middle tier generates. We use this report to identify poorly performing pages and get insight into the timing patterns of bad pages. The code lives in the Launchpad dev-utils project; instructions for checking it out and configuring it are on the wiki. If you don’t have aggregate data for your web application, I highly recommend grabbing PPR and checking it out: it’s very lightweight, and the data is extremely useful.
Tags: data, performance, Python, stats, zope3
A while back mdz blogged about challenges facing Ubuntu and other Linux distributions. He raises the point that runtime libraries for Python / Ruby etc. have a unique set of issues because they tend to have their own packaging systems. Merely a month later he attended Debconf 2010, where a presentation was given on the issues that Java packages have on dpkg-based systems. Since then the conversation seems to have dried up. I’ve been reminded of it recently in discussions within Canonical looking at how we deploy web services.
Matt suggested some ways forward, including:
- Decouple applications from the core
- Treat data as a service (rather than packages) – get data live from the web rather than going web -> distro-package -> user machines.
- Simplify integration between packaging systems (including non-packaged things)
I think it’s time we revisited and expanded on those points. Nothing much has changed in how Ubuntu or other distributions approach integration with other packaging systems… but the world has kept evolving. Internet access is ever more ubiquitous, more platforms are building packaging systems (Clojure, Scala and node.js, to name but three), and a substantial and ever-growing number of products expect to operate in a hybrid fashion: an evolving web service plus a local client kept up to date via package updates. Twitter, Facebook and Google Plus are three such products. Android has demonstrated a large-scale app store on top of Linux, with its own custom packaging format.
In order to expand them, we need some background context on the use cases that these different packaging systems need to support.
Platforms such as antivirus scanners, node.js, Python, Clojure and so forth care a great deal about getting their software out to their users. They care about making it extremely easy to get the latest and greatest versions of their libraries. I say this because the evidence is all around us: every successful development community / product has built a targeted package management system which layers on top of Windows, Mac OS X and *nix. The only rational explanation I can come up with for this behaviour is that the lower-level operating system package management tools don’t deliver what they need. That is, this isn’t as shallow as wanting a packaging system written in their own language, which would be easy to write off as parochialism rather than a thoughtful solution to their problems.
In general, packaging systems provide a language for shipping software in source or binary form, from one or more repositories, to users’ machines. They may support replication, and they may support multiple operating systems. They generally end up as graph traversal engines, pulling in dependencies of various sorts; see the DOAP specification for an attempt at generic modelling of this. One problem that turns up rapidly when dealing with Linux distribution package managers is that the versions upstream packages have and the versions a package has in e.g. Debian differ. They differ because at some stage someone will need to do a new package for the distribution when no upstream change has been made. This might be to apply a local patch, or it might be to correct a defect caused by a broken build server. Whatever the cause, there is a many-to-one relationship between the package versions that end users see via dpkg / rpm etc. and those that upstream ships. Once this happens to a library package, it is a near certainty that comparing package versions across different distributions becomes hard. You cannot reliably infer whether a given package version is sufficient as a dependency or not when comparing binary packages between Red Hat and Debian. Or Debian and Ubuntu. The result is that even when the software (e.g. rpm) is available on multiple distributions (say Ubuntu and RHEL), or even on multiple operating systems (say Ubuntu and Windows), many packages will /have/ to be targeted specifically to build and execute properly. (Obviously, compilation has to proceed separately for different architectures; it’s more the dependency metadata that says ‘build with version X of dependency Y’ that has to be customised.)
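Debian-style version strings make the many-to-one relationship visible: per Debian Policy they have the form [epoch:]upstream_version[-debian_revision], and the revision is the part a distribution bumps when it rebuilds or patches without an upstream change. This Python sketch only splits the string; it does not implement dpkg’s ordering rules, RPM’s version-release scheme is different again, and the version numbers are made-up examples.

```python
from typing import NamedTuple


class DebianVersion(NamedTuple):
    epoch: int
    upstream: str
    revision: str  # empty string when there is no Debian revision


def parse_debian_version(version: str) -> DebianVersion:
    """Split [epoch:]upstream_version[-debian_revision] into its parts."""
    epoch = 0
    rest = version
    if ":" in rest:
        # The epoch is everything before the first colon.
        epoch_text, rest = rest.split(":", 1)
        epoch = int(epoch_text)
    if "-" in rest:
        # The Debian revision is everything after the last hyphen.
        upstream, revision = rest.rsplit("-", 1)
    else:
        upstream, revision = rest, ""
    return DebianVersion(epoch, upstream, revision)


# Several distribution package versions, one upstream release.
packaged = ["1.2-1", "1.2-2", "1.2-1ubuntu1"]
print({parse_debian_version(v).upstream for v in packaged})  # {'1.2'}
```

Three distinct package versions collapse onto a single upstream version, and nothing in the strings themselves tells another distribution which of them satisfies a dependency on “1.2 with patch X”.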
The result of this is that, to the best of my knowledge, there is no distribution of binary packages that targets Debian/Ubuntu and RHEL and Suse and Windows and Mac OS X, although there are vibrant communities building distributions of and for each in isolation. Some of the ports systems come close, but they are still focused on delivering to a small number of platforms. There’s nothing that gives 99% coverage of users. That means that to reach all their users, platform communities have to write or adopt a new system. For any platform X, there is strong pressure to have the platform’s tooling be maintainable by folk that primarily work with X itself, or with the language that X is written in. Consider Python: there is strong pressure to use C, or Python, and nothing else, for any tools. That is somewhat parochial, but also just good engineering, reducing variables and making the system more likely to be well maintained. The closest system I know of, Steam, is just now porting to Ubuntu (and perhaps Linux in general), and reached its massive popularity by focusing entirely on applications for Windows, with Mac OS X a recent addition.
Systems like PyPI, which have multi-platform eggs, do target the wide range of platforms I listed above, but they do so both narrowly and haphazardly: whether a binary or source package is available for a given platform is up to the maintainer of the package, and the packages themselves deal with a very narrow subset of the platform’s complexity. Python provides compilation logic; the packages don’t create generic C libraries with stable ABIs for use by other programs, they don’t have Turing-complete scripts for dealing with configuration file management, and so forth. Anti-virus updaters similarly narrow the problem they deal with, and add latency constraints: updates of anti-virus signatures are time sensitive when a new rapidly spreading threat is detected.
A minor point, but one that adds to the friction of considering a single packaging tool for all needs, is the difference between the use cases of low-level package management tools like dpkg or rpm and those of e.g. PyPI. A primary use case for packages on PyPI is for them to be used by people that are not machine administrators. They don’t have root, and don’t want it. Contrast that with dpkg or rpm, where the primary use case (to date) is the installation of system-wide libraries and tools. Things like man page installation don’t make any sense for non-system-wide package systems, whereas they are a primary feature for e.g. dpkg.
In short, the per-platform/language tools are (generally):
- Written in languages that are familiar to the consumers of the tools.
- Targeted at use on top of existing platforms, by non-privileged users, and where temporary breakage is fine.
- Intended to get the software packaged in them onto widely disparate operating systems.
- Very narrow – they make huge assumptions about how things can fit together, which their specific language/toolchain permits, and don’t generalise beyond that.
- Don’t provide for security updates in any specific form: that is left up to folk that ship individual things within the manager.
Operating system package managers:
- Are written in languages which are very easy to bootstrap onto an architecture, and to deploy onto bare metal (as part of installation).
- Are designed for delivering system components, and to be able to upgrade the toolchain itself safely.
- Were originally built to install onto one operating system; ports to other operating systems are usually fragile and only adopted in niches.
- Are hugely broad – they install data, scripts, binaries, and need to know about late binding, system caches etc for every binary and runtime format the operating system supports
- Make special provision to allow security updates to be installed in a low latency fashion, without requiring anything consuming the package that is updated to change [but usually force-uninstalling anything that is super-tightly coupled to a library version].
Anti virus package managers:
- Exist to update daemons that run with system wide escalated privileges, or even file system layer drivers.
- Update datasets in realtime.
- Do not permit updates produced by third parties.
Given that, let’s look at the routes Matt suggested…
Decoupling applications from the core as a strategy makes an assumption: that the core and applications are partitionable. If they are not, then applications and the core will share common elements that need to be updated together. Consider, for instance, a Python application. If you run with a system-installed Python that was built without zlib for some reason, but the Python application requires zlib, you have a problem. A classic example of this problem faces Ubuntu today, with all the system-provided tools moving to Python 3, but vast swathes of Python applications still not ported to Python 3 at all. Currently, the Python packaging system (virtualenv/buildout + distribute) doesn’t provide a way to install the Python runtime itself, but will happily install its own components for everything up the stack from the runtime. Ubuntu makes extensive use of Python for its own tools, so the system Python has a lot of packages installed which buildout etc. cannot ignore. This often leads to issues with e.g. buildout, when the bootstrap environment has (say) zope.interface, but it’s then not accessible from the built-out environment, which disables the standard sys.path (to achieve more robust separation). If we want to pursue decoupling, whether we build a new package manager or use e.g. virtualenv (or gem or npm or …), we’ll need to be aware of this issue, and perhaps offer, for an extended time, a dedicated no-frills, no-distro-packages install, to avoid it and to allow an extended supported period for application authors without committing to a massive, distro-sponsored porting effort. While it’s tempting to say we should install pip/npm/lein/maven and other external package systems, this is actually risky: they often evolve so fast that Ubuntu will be delivering an old, incompatible version of the tool to users well before Ubuntu goes out of support, or even before the next release of Ubuntu.
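One defensive measure for an application that has to share the system runtime is to check up front for the interpreter features it needs, rather than failing deep inside a code path. A minimal Python sketch; `missing_modules` and `require_modules` are hypothetical helper names, and this only detects the zlib-style mismatch described above, it does nothing to remove the coupling.

```python
import importlib.util
import sys
from typing import Iterable, List


def missing_modules(required: Iterable[str]) -> List[str]:
    """Return the top-level module names this interpreter cannot find."""
    return [name for name in required if importlib.util.find_spec(name) is None]


def require_modules(required: Iterable[str]) -> None:
    """Exit with a clear message if the running Python lacks a required module."""
    missing = missing_modules(required)
    if missing:
        raise SystemExit(
            "Python %s is missing required modules: %s"
            % (sys.version.split()[0], ", ".join(missing))
        )


if __name__ == "__main__":
    # e.g. an application that needs zlib, run on whatever Python the system provides.
    require_modules(["zlib"])
```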
Treating data as a service. All the cases I’ve seen so far of applications grabbing datasets from the web have depended on web infrastructure for validating the dataset, e.g. SSL certificates, or SSL + content checksums. Basically, small self-rolled distribution systems. I’m likely ignorant of details here, and I depend on you, dear reader, to edumacate me. There is potential value in having data repackaged when our packaging system has behind-firewall support and the ad hoc system that (for instance) a virus scanner has does not. In this case, I specifically mean the problem of updating a machine which has no internet access, not even via a proxy. The challenge as I see it is, again, the cross-platform issue: the vendor will be supporting Ubuntu + Debian + RHEL + Suse, and from their perspective it’s probably cheaper to roll their own solution than to directly support dpkg + rpm + whatever Apple offer + Windows. The skills to roll an ad hoc distribution tool are more common than the skills to integrate closely with dpkg or rpm…
What about creating a set of interfaces for talking to dpkg / rpm / the system packagers on Windows and Mac OS X? Here I think there is some promise, but it needs, as Matt said, careful thought. PackageKit isn’t sufficient, at least today.
There are, I think, two specific cases to cater to:
- The anti-virus / fresh data set case.
- The egg/gem/npm/ specific case.
For the egg/gem/npm case, we would need to support a pretty large set of common functionality on Windows / Mac OS X / *nix, because otherwise upstream won’t adopt what we create: losing 90% of their users (Windows) or 5% (Mac) isn’t going to be well accepted. We’d need to support multiple installations (because of mutually incompatible dependencies between applications), and we’d need to support multiple language bindings in some approachable fashion, where the upstream will feel capable of fixing and tweaking what we offer. We’re going to need to support offline updates, replication, local builds, local repositories, and various signing strategies, to match the various tradeoffs made by the upstream tools.
For the anti-virus / fresh data case, we’d need to support a similar set of operating systems, though I strongly suspect that there would be more tolerance for limited support – in that most things in that space either have very platform specific code, or they are just a large-scale form of the egg/gem/npm problem, which also wants easy updates.
We should validate this discussion with at least two or three upstreams. Find out what’s missing (I suspect a lot) and what’s wrong (I hope not much :)). Then we’ll be in a position to decide if a tractable, widespread solution is *possible*.
Separately, we should stop fighting with upstreams that have their own packaging systems. They are satisfying different use cases than our core distro packaging systems are designed to solve. We should stop mindlessly repackaging things from e.g. eggs to debs, unless we need that specific thing as part of the transitive runtime or build-time dependencies for the distribution itself. In particular, if those of us who build system packaging tools adopt and use the upstream application packaging tools, we can learn in a deep way the (real) advantages they have, and become more able to reason about how to unify the various engineering efforts going into them, and perhaps even eventually satisfy them using dpkg/rpm on our machines.
Tags: clojure, Debian, eggs, lein, node.js, npm, packaging, Python, ruby, upstream
Edits: Corrected the description of the Slony bug, and noted that there is a typo on the lazr_postgresql PyPI page.
Two years ago Launchpad did schema changes once a month. Everyone would cross their fingers and hope while the system administrators took all the application servers offline, patched the database with a month’s worth of work and brought the servers up again running the new QA’d codebase.
This had two problems:
- Due to the complexity of the system (something like 300 processes have to be stopped or inhibited to take everything offline), the downtime was often about 90 minutes long irrespective of the schema patch duration. [Some of the processes don’t like being interrupted at all.]
- We simply could not deliver any change in less than 1 week, and the average latency for something that jumped all the queues was still 2 weeks.
About a year ago we wanted to increase the rate at which schema changes could be carried out: the efforts to speed Launchpad up had consumed most of the low-hanging fruit, and more and more schema patches were required. We didn’t want to introduce additional 90-minute downtime windows though. Adopting incremental migrations (the sort of change process described in various places on the internet) seemed like a good way to apply schema changes without the slow shutdown-and-restart step, which had been required because the pre-patch codebase couldn’t speak to the new schema. We could optimise each patch to be very fast by avoiding anything that causes a full table scan or table rewrite (such as adding indices, or adding columns with a non-NULL default value). That would let us avoid the 90 minutes of downtime caused by stopping and restarting everything. However, that wasn’t sufficient: the reason Launchpad ended up doing monthly downtime is that previous attempts to do more frequent schema changes had too high a failure rate. A key reason for patch deployment time blowing out when everything wasn’t shut down was that Launchpad is a very busy system, and with Slony, schema changes require an exclusive lock on all tables. [More recent versions of Slony only lock some tables, but still require very widespread locks for most DDL operations.] We’re doing nearly 10 thousand transactions per minute, so at any point in time there are always locks open on some table in the system: it was highly improbable, effectively impossible, for slonik to get an exclusive lock on all tables in a reasonable timeframe. Background tasks that take many minutes to complete exacerbate this: we can’t just block new transactions long enough to deliver all the in-flight web pages and let the locks clear that way.
PGBouncer turns out to be an ideal tool here. If you route all your connections through PGBouncer, you have a single point you can deliberately interrupt to clear all database locks in a second or so (it takes time for backends to all notice that their clients have gone).
So we combined these things to get what we called ‘Fast Down Time’ or FDT. We set the following rules for developers:
- Any schema patch had to complete in <= 15 seconds in our schema staging environment (which has a full copy of the production DB), or we’d roll it back and redesign.
- Any patch could change either code or schema, never both. Schema patches were to land on a separate branch and would be promoted to trunk only after deployment. That branch also receives automated merges from trunk after every commit to trunk, so it’s running the latest code.
This meant that we could be confident in QA: we would QA the new schema and the application process with the current live code (we deploy trunk multiple times a day). We published some documentation about how to write fast schema patches to help socialise the approach.
Then we wrote an automated tool that would:
- Check for known fragile processes and abort if any were found.
- Check for very long transactions and abort if any were found.
- Shut down pgbouncer, disconnecting all clients instantly.
- Use slonik to apply one or more schema patches.
- Start pgbouncer back up again.
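The five steps above can be sketched as a small orchestration function. This is not the FDTv1 code (that lives in Launchpad’s history): `fast_downtime`, `PreflightError` and every hook passed in are hypothetical, standing in for the real process checks, pgbouncer control and slonik calls, so only the control flow is shown. Restarting pgbouncer in a `finally` block even when a patch fails is my assumption about the desired behaviour.

```python
from typing import Callable, Iterable, List


class PreflightError(Exception):
    """Raised when it is not safe to start a fast-downtime patch run."""


def fast_downtime(
    fragile_processes: Callable[[], List[str]],
    long_transactions: Callable[[], List[str]],
    stop_pgbouncer: Callable[[], None],
    apply_patch: Callable[[str], None],
    start_pgbouncer: Callable[[], None],
    patches: Iterable[str],
) -> None:
    """Run the FDT steps in order. Every callable is a hypothetical hook."""
    # 1. Abort before touching anything if fragile processes are running.
    fragile = fragile_processes()
    if fragile:
        raise PreflightError("fragile processes running: %s" % ", ".join(fragile))
    # 2. Abort if very long transactions are open.
    long_running = long_transactions()
    if long_running:
        raise PreflightError("long transactions open: %s" % ", ".join(long_running))
    # 3. Stopping the pooler disconnects every client, releasing their locks.
    stop_pgbouncer()
    try:
        # 4. Apply the schema patches while no client holds locks.
        for patch in patches:
            apply_patch(patch)
    finally:
        # 5. Bring the pooler back even if a patch fails, to keep the outage short.
        start_pgbouncer()
```

Both preflight checks run before pgbouncer is touched, so an aborted run costs no downtime at all.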
The code for this (call it FDTv1) is in the Launchpad source code history. It’s pretty entangled, but it’s there for grabbing if you need it. Read on to see why it’s only available in the history.
The result was wonderful: we were immediately able to deploy schema changes with <= 90 seconds of downtime, which was significantly less than the 5 minutes our stakeholders had agreed to as a benchmark. If we were under 5 minutes, we could schedule downtime once a day rather than once a month. We had to fix some API client code to retry more reliably, and likewise fix a few minor bugs in the database connection handling logic in the appservers, but all in all it was a pretty smooth project. Along the way we spun off a small Python helper to run and control pgbouncer, which let us write effective tests for the connection handling code paths.
This gave us the following workflow for making schema changes:
- Land and deploy an incremental schema change.
- Land and deploy any indices that need to be added – these are deployed live using CREATE INDEX CONCURRENTLY.
- Land and deploy code changes to populate any additional fields/tables from both application servers, and from cron – we do a bulk backfill that does many small transactions while walking over the entire dataset that needs to be updated / populated.
- Land and deploy code changes to drop references to the old schema, whatever it was.
- Land and deploy an incremental schema change to finalise the change – such as making a new column NOT NULL once the backfill is complete.
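The backfill step can be sketched with Python’s built-in sqlite3 module standing in for PostgreSQL. The table and columns (person, name, display_name) are hypothetical; the point is the shape: a fast, compatible schema change first, then a walk over the table in id order, committing each small batch in its own short transaction so no lock is held for long. The final NOT NULL step is omitted because SQLite cannot add that constraint to an existing column; in PostgreSQL it would be ALTER TABLE person ALTER COLUMN display_name SET NOT NULL.

```python
import sqlite3


def make_demo_db(rows: int) -> sqlite3.Connection:
    """Build a throwaway table, then apply the fast, compatible schema change."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    with conn:
        conn.executemany(
            "INSERT INTO person (id, name) VALUES (?, ?)",
            [(i, "user%d" % i) for i in range(1, rows + 1)],
        )
    # A nullable column with no default: existing code keeps working.
    conn.execute("ALTER TABLE person ADD COLUMN display_name TEXT")
    return conn


def backfill_in_batches(conn: sqlite3.Connection, batch_size: int = 100) -> int:
    """Populate display_name from name, one short transaction per batch.

    Returns the number of batches committed.
    """
    batches = 0
    last_id = 0
    while True:
        # Keyset pagination: fetch the next batch of ids after the last one seen.
        rows = conn.execute(
            "SELECT id FROM person WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return batches
        first_id, last_id = rows[0][0], rows[-1][0]
        with conn:  # commits this batch's UPDATE as its own transaction
            conn.execute(
                "UPDATE person SET display_name = name "
                "WHERE id BETWEEN ? AND ? AND display_name IS NULL",
                (first_id, last_id),
            )
        batches += 1
```

Because rows that are already populated are skipped, the backfill can be interrupted and rerun safely, which matters when it runs from cron against a live system.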
This looks long and unwieldy, but it’s worth noting that it’s actually just repeated applications of a smaller primitive:
- Make a schema change that is fast and compatible with existing code.
- Change code to take advantage of the changed schema
Pretty much any change that is desired can be done using this single primitive.
We wanted to go further though – the multiple stages required for complex migrations became a burden with one change a day. Fortunately PostgreSQL now includes its own replication engine, which replicates the WAL logs rather than installing triggers on all tables like Slony.
Stuart, our intrepid DBA, migrated Launchpad to PostgreSQL 9.1, updated the FDT tool to work with native replication, and migrated Launchpad off Slony. The result is again wonderful: the overhead in doing a schema patch, with all the protection I described above, is now ~5 seconds. We can do incremental changes in less time than it takes your browser to figure out that a given server is offline. We’re now negotiating with the Launchpad stakeholders to get multiple downtime windows each day, with this almost unnoticeable, super-reliable process in place.
Reliability-wise, FDT has been superb. We’ve had 2 failures. In one, we believe we encountered a bug in Slony: we dropped the id column from two tables in one patch (we replaced the autoincrement PK column with a naturally unique column). In the other, we landed a patch that worked on staging but led to lock contention in production, so the patch applied, but the system was very unhealthy until we fixed it. That’s after doing approximately 60 patches over a 1 year period.
We’re partway through extracting the patching logic from Launchpad’s code base into a reusable tool, but the basic principles will apply to any PostgreSQL environment. Note that there is a typo on the PyPI page: the actual Launchpad project is at https://launchpad.net/lazr.postgresql.
Tags: Launchpad, postgresql, Python
This is largely a memo to my future self, but it may save some time for someone else facing what I faced last weekend.
I’ve been putting together a Reprap recently, seeded by the purchase of a partially assembled one from someone local who was leaving town and didn’t want to take it with them.
One of the issues it had was that 2 of the stepstick driver boards it uses were burnt out, and there are no local suppliers in NZ, at least none that I could find. There is, however, a supplier of Easydriver driver boards, which are apparently compatible. (The Reprap electronics is a Sanguinololu, which has a fitted strip that exactly matches stepstick (or Pololu) driver boards.) The Easydrivers are not physically compatible, but they should be pin-compatible... no?
I mapped across all the pins carefully, and the only issues were: there are three GNDs on the Easydriver vs 2 on the stepstick, and the PFD pin isn’t exposed on the stepstick board, so it can’t be mapped across.
I ended up with this mapping. (I’m not sure where pin 1 is *meant* to be on the stepstick, so I’m treating VMOT as pin 1: it’s the anti-clockwise corner pin on the same side as the 2B/2A/1A/1B pins when looking down on an installed board. The list goes clockwise from there.)
Stepstick – Easydriver
VMOT – M+
GND – GND
2B – B2
2A – A2
1A – A1
1B – B1
VDD – +5V
GND – GND
Dir – Dir
Step – Step
Slp – Slp
Rst – Rst
Ms3 – Nothing
Ms2 – Ms2
Ms1 – Ms1
En – Enable
But, when I tried to use this, the motor just jammed up solid.
A bit of debugging and trial and error later and I figured it out. The right mapping for the motor pins:
2B – B2
2A – B1
1A – A1
1B – A2
That’s right: the two boards use opposite conventions for labelling the motor coil pins. On the stepstick, 1/2 refers to the coil and A/B to the two ends that need to have voltage put across them; on the Easydriver, A/B refer to the coil and 1/2 to the two ends…
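The rule behind the fix can be checked mechanically: a mapping is only safe if both ends of each stepstick coil land on the same Easydriver coil. This small Python sketch encodes the two motor-pin mappings from this post and the labelling conventions just described; the names are hypothetical, and it checks coil grouping only, not coil polarity or the logic pins.

```python
from typing import Dict, Set

# Stepstick motor pin -> Easydriver motor pin, as listed in the post.
FIRST_ATTEMPT = {"2B": "B2", "2A": "A2", "1A": "A1", "1B": "B1"}
CORRECTED = {"2B": "B2", "2A": "B1", "1A": "A1", "1B": "A2"}


def stepstick_coil(pin: str) -> str:
    # On the stepstick the digit names the coil: 1A/1B are both ends of coil 1.
    return pin[0]


def easydriver_coil(pin: str) -> str:
    # On the Easydriver the letter names the coil: A1/A2 are both ends of coil A.
    return pin[0]


def coils_preserved(mapping: Dict[str, str]) -> bool:
    """True if each stepstick coil lands wholly on its own Easydriver coil."""
    targets: Dict[str, Set[str]] = {}
    for stepstick_pin, easydriver_pin in mapping.items():
        targets.setdefault(stepstick_coil(stepstick_pin), set()).add(
            easydriver_coil(easydriver_pin)
        )
    # Each coil must map onto exactly one coil, and no two coils may share one.
    if any(len(coils) != 1 for coils in targets.values()):
        return False
    chosen = [next(iter(coils)) for coils in targets.values()]
    return len(set(chosen)) == len(chosen)


print(coils_preserved(FIRST_ATTEMPT), coils_preserved(CORRECTED))  # False True
```

The first attempt splits each stepstick coil across both Easydriver coils, which is consistent with the motor jamming up solid.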
Super confusing, especially as I haven’t been doing much electronics for oh, a decade or so.
I’m reminded very strongly of Rusty’s scale of interface usability here.
My laptop has somewhat less than half the grunt of my desktop at home, but I prefer to work on it as I can go and sit in the sun etc.; that’s very hard to do with a mini tower case.
However, running everything through SSH to another machine makes editing and iterating more clumsy. I need to do agent forwarding etc., which is not terribly hard but not free either, and particularly when I travel I need to remember to sync my source trees back to my laptop. So I prefer to live on my laptop and use my desktop for compute power.
I had a couple of Juju charms I wanted to investigate, but they needed enough compute power to make my laptop really quite warm, so I thought it was time to move my local cloud provider from Eucalyptus to OpenStack. This was easy enough until I came to run Juju: it turns out that Juju’s commands really want to talk to the public DNS name of the instance (in order to SSH-tunnel a connection to ZooKeeper).
But! OpenStack returns DNS names like ‘Server-3’, and on a home network it’s fairly rare to have a local DNS server *anyway*, so putting a suffix on names like that won’t help at all. You either need to use a DNS naming provider (OpenStack ships with an LDAP provider, which adds even more complexity) and configure your clients to find it, or you need to use the public IP addresses. Those default to the FlatNetwork, which is routable within a home LAN simply by adding a route for 10.0.0.0/8 on your wifi interface. Adding to the confusion, some wifi routers fail to forward Avahi messages, which is a) terrible and b) breaks the only obvious way of doing no-config local DNS :(.
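If you go the public-IP route, a client needs some rule for when to stop trusting the DNS name. What follows is a hypothetical sketch of one such rule. It is not Juju’s code or the author’s patch; `ssh_target`, `resolves` and the `FLAT_NETWORK` constant are assumptions for illustration. The rule is to use the DNS name if it resolves, and otherwise to use the instance’s IP if it sits inside the routed 10.0.0.0/8 FlatNetwork.

```python
import ipaddress
import socket

# Assumed FlatNetwork range, reachable from the home LAN once a route for
# 10.0.0.0/8 has been added. Adjust for your own setup.
FLAT_NETWORK = ipaddress.ip_network("10.0.0.0/8")


def resolves(name, resolver=socket.gethostbyname):
    """Return True if `name` can be resolved by `resolver`."""
    try:
        resolver(name)
    except (OSError, UnicodeError):
        # socket.gaierror (lookup failure) is a subclass of OSError.
        return False
    return True


def ssh_target(dns_name, public_ip, resolver=socket.gethostbyname):
    """Hypothetical helper: prefer the DNS name, else fall back to the IP.

    Names like 'Server-3' usually won't resolve on a home network, so use
    the instance's public IP when it is inside the routed FlatNetwork.
    """
    if dns_name and resolves(dns_name, resolver):
        return dns_name
    if ipaddress.ip_address(public_ip) in FLAT_NETWORK:
        return public_ip
    raise LookupError(f"no usable address for {dns_name!r} / {public_ip}")


def home_lan_resolver(name):
    # Stand-in for a home network with no local DNS server.
    raise socket.gaierror("Name or service not known")


print(ssh_target("Server-3", "10.0.0.3", resolver=home_lan_resolver))
```

The resolver is a parameter only so the fallback can be exercised without touching real DNS; by default it uses `socket.gethostbyname`.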
So, I did some yak shaving this morning. It turns out other folk had already run into this and filed a Juju bug and a supporting txaws bug. The txaws bug was fixed, but just missed the release of Precise. Clint Byrum is going to SRU it this week though, so we’ll have it soon. I’ve put a patch up to address the Juju side, which is now pending review. Running the two together works very happily for me. \o/
Filed under: Uncategorized | 3 Comments
Tags: cloud, juju, openstack, Python, twisted, ubuntu
I’ve made the testtools committers team own both the project and the trunk branch for pyjunitxml and testscenarios. This removes me as a single point of failure (SPOF) if anything needs doing in those projects: any testtools committer can now do it, including code review and landing. If you are a testtools committer and need PyPI release rights, ping me and I’ll add you. (I wish PyPI had group management.)
Filed under: Uncategorized | Leave a Comment
Tags: Python, testing, testtools
I’ve recently caught up on a bunch of reading, some of which is worth commending.
- Switch – documents the factors that cause changes to fail (both organisational and personal), and provides a recipe for ensuring you have addressed those factors in any change you are planning.
- The Lean Startup – applies Lean principles to learning what customers respond well to: in the same way that Lean removes waste from the process of building some X, this removes waste from the process of determining what that X should be.
- The Innovator’s Solution – pop-science report of research into why disruptive innovation at existing companies fails; covers structure, management, funding and market analysis, and has recommendations for removing these sure-fail cases.
- The Innovator’s DNA – pop-science report of research into how people innovate: it turns out there are a lot of things one can do to be a better innovator.
Read them all, or none. I enjoyed them all.
Filed under: Uncategorized | Leave a Comment
This is a tiny PSA prompted by my digging into a deadlock condition in the Launchpad application servers.
We were observing a small number of servers stopping cold when we did log rotation, with no particular rhyme or reason.
tl;dr: do not call any non-reentrant code from a Python signal handler. This includes the signal handler itself, queueing tools, multiprocessing, anything with locks (including RLock).
Tracking this down, I found we were using an RLock from within the signal handler (via a library…), so I filed a bug upstream: http://bugs.python.org/issue13697
Some quick background: when a signal is received by Python, the VM sets a status flag saying that signal X has been received, and returns. The next time thread 0 (the main thread; it is always thread 0) gets to run bytecode, the Python-level signal handler runs. For built-in handlers this is pretty safe: e.g. for SIGINT, a KeyboardInterrupt is raised. For custom signal handlers, the current frame is pushed and a new stack frame is created, which is used to execute the signal handler.
Now this means that the previous frame has been interrupted without regard for your code: it might be part way through evaluating a multi-condition if statement, or between receiving the result of a function and storing it in a variable. It’s just suspended.
If the code you call somehow ends up calling that suspended function (or other methods on the same object, or variations on this theme), there is no guarantee about the state of the object; it becomes very hard to reason about.
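Here is a small sketch of those mechanics, assuming CPython 3.8+ on a POSIX system (for SIGUSR1 and `signal.raise_signal()`). The handler is called with the interrupted frame as its second argument, and it runs in the main thread, stacked on top of a function that is still mid-loop. `busy_function` and `seen` are illustrative names.

```python
import signal
import threading

seen = {}


def handler(signum, frame):
    # `frame` is the frame that was interrupted; the handler itself runs in a
    # new frame pushed on top of it, in the main thread.
    seen["interrupted"] = frame.f_code.co_name
    seen["main_thread"] = threading.current_thread() is threading.main_thread()


def busy_function():
    total = 0
    for i in range(3):
        total += i
        if i == 1:
            # raise_signal() delivers the signal to this process and runs the
            # Python-level handler before returning, i.e. mid-loop here.
            signal.raise_signal(signal.SIGUSR1)
    return total


signal.signal(signal.SIGUSR1, handler)
print(busy_function(), seen)
```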
Consider, for instance, a writelines() call, which you might think is safe. If the internal implementation is ‘for line in lines: foo.write(line)’, then a signal handler which also calls writelines() could have its output appear between any two of the lines written by the interrupted call.
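The writelines() interleaving can be reproduced deterministically. In this sketch, `LineWriter` is a hypothetical class whose writelines() is exactly that naive loop. `signal.raise_signal()` (Python 3.8+, POSIX for SIGUSR1) stands in for a signal that happens to arrive just after the first line has been written.

```python
import signal


class LineWriter:
    """Hypothetical writer whose writelines() is a plain loop over write()."""

    def __init__(self):
        self.output = []
        self.on_write = None  # optional hook, called after each write

    def write(self, line):
        self.output.append(line)
        if self.on_write is not None:
            self.on_write(line)

    def writelines(self, lines):
        for line in lines:
            self.write(line)


writer = LineWriter()


def handler(signum, frame):
    # The handler calls the same writelines() that it interrupted.
    writer.writelines(["signal-1\n", "signal-2\n"])


def fire_once(line):
    # Simulate the signal arriving right after the first line is written.
    if line == "a\n":
        writer.on_write = None  # don't re-trigger from inside the handler
        signal.raise_signal(signal.SIGUSR1)


signal.signal(signal.SIGUSR1, handler)
writer.on_write = fire_once
writer.writelines(["a\n", "b\n", "c\n"])
print("".join(writer.output), end="")
```

The handler’s two lines land between "a" and "b" of the interrupted call, even though each writelines() call looks self-contained.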
True reentrancy is a step up from multithreading in terms of nastiness, primarily because guarding against it is very hard: a non-reentrant lock around the area needing guarding will force either a deadlock or an exception from your reentered code, while a reentrant lock around it will provide no protection. Both of these things apply because the reentering occurs within the same thread, kind of like a generator but without any control or influence over what happens.
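Both halves of that lock argument show up in a few lines. This sketch again uses `signal.raise_signal()` (Python 3.8+, POSIX) to interrupt a critical section at a chosen point. `demo` and `state` are illustrative names, and the handler calls `acquire(timeout=0.5)` only so that the plain-Lock case reports the deadlock instead of hanging forever.

```python
import signal
import threading

results = {}


def demo(lock, name):
    state = {"balance": 0}

    def handler(signum, frame):
        # A real handler would typically block in acquire(); the timeout turns
        # the would-be deadlock into an observable False.
        got = lock.acquire(timeout=0.5)
        results[name] = (got, state["balance"])
        if got:
            lock.release()

    signal.signal(signal.SIGUSR1, handler)
    with lock:
        state["balance"] = -1  # invariant temporarily broken...
        signal.raise_signal(signal.SIGUSR1)
        state["balance"] = 0   # ...and restored


demo(threading.Lock(), "Lock")
demo(threading.RLock(), "RLock")
print(results)
```

With a Lock, the handler can never acquire it: it is held by the very frame the handler interrupted, in the same thread. With an RLock, the same thread is the owner, so acquisition succeeds immediately and the handler sees the broken invariant (-1).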
Safe things to do are:
- Calling code which is thread-safe and which only other threads will be calling concurrently.
- Performing ‘atomic’ (any C function is atomic as far as signal handling in Python is concerned) operations such as list.append, or ‘foo = 1’. (Note the use of a constant: anything obtained by reading is subject to reentrancy races [unless you take care :)])
In Launchpad’s case, we will be setting a flag variable unconditionally from the signal handler; the next log write that occurs will lock out other writers, consult the flag and, if needed, do a rotation, resetting the flag. Writes after the rotation signal that don’t yet see the new flag would be OK. That is the only possible race: a write to the variable not being seen by an in-progress or other-thread log write.
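A minimal sketch of that flag-then-rotate pattern follows. It is not Launchpad’s code: `RotatingLog`, `ListStream` and the choice of SIGHUP as the rotation signal are assumptions for illustration. Clearing the flag before reopening is also my choice: a signal that arrives mid-rotation then causes one extra, harmless rotation rather than being lost.

```python
import signal
import threading


class ListStream:
    """In-memory stand-in for a log file."""

    def __init__(self):
        self.lines = []
        self.closed = False

    def write(self, message):
        self.lines.append(message)

    def close(self):
        self.closed = True


class RotatingLog:
    """Hypothetical flag-based rotation: the handler only stores a constant."""

    def __init__(self, opener):
        self._opener = opener  # callable returning a fresh stream
        self._stream = opener()
        self._lock = threading.Lock()
        self.rotate_requested = False
        self.rotations = 0

    def request_rotation(self, signum, frame):
        # Signal handler: one 'atomic' store of a constant. No locks, no I/O,
        # so it is safe even if it interrupts write() while the lock is held.
        self.rotate_requested = True

    def write(self, message):
        with self._lock:  # lock out other writers
            if self.rotate_requested:
                self.rotate_requested = False  # clear first, then rotate
                self._stream.close()
                self._stream = self._opener()
                self.rotations += 1
            self._stream.write(message)


streams = []


def open_stream():
    streams.append(ListStream())
    return streams[-1]


log = RotatingLog(open_stream)
signal.signal(signal.SIGHUP, log.request_rotation)
log.write("before\n")
signal.raise_signal(signal.SIGHUP)  # e.g. sent by a log-rotation tool
log.write("after\n")
print([s.lines for s in streams], log.rotations)
```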
That is all.
Filed under: Uncategorized | 2 Comments
Tags: Launchpad, Python