15 Aug 2007

When are two identical changes the same, and when aren’t they? There’s been a little bit of debate, started by Andrew Cowie posting about unmixing the paint. Matt Palmer followed up with a claim that a particular technique used by Andrew is dangerous, and finally Andrew Bennetts made the point that text conflicts are a small subset of merge conflicts.

That said, one critical task for a version control system is the merge command. Let’s define merge at a human level as "reproduce the changes made in branch A in my branch B". There are a lot of taste choices that can be made without breaking this definition. For instance, a merge that combines all the individual changes into one, losing the individual commit deltas, meets this description. So does a merge which requires all text conflicts to be resolved during the merge command’s execution, or one that does not give a human a chance to review the merged tree before recording it as a commit.

So if the goal of merge is to reproduce these other changes, then we are essentially trying to infer what the *change* was. For example, in an ideal world, merging a branch that changes all “log messages of floating point values to 6 digit scale” would know enough to catch all new log messages added in my branch, regardless of language, actual API used, and so on. But that is fantasy at the moment. The best we can do today depends on how we capture the change. For instance, Darcs allows some changes to be captured as symbol-changing patches, and others as regular textual diffs.

So the question of whether two branches arriving at the same result should conflict can be rephrased as ‘when is arriving at the same result correct, and when is it incorrect?’

For instance, if I write a patch and put it up as plain text on a website, and two people developing $foo download it and apply it, they have duplicate changes, but it’s clearly correct that a merge between them should not error on this.

On the other hand, the example Andrew Bennetts quotes in his post is a valid example of two people making the same change, where the line still needs a further change during the merge to remain correct.

Here’s another example, though. Suppose I commit something faulty to my branch, and you pull from me before I fix it. Then, while I fix the bug, you also fix it, in the same way. That is another example of no-conflict being correct.

If it’s possible for either answer, conflict or do not conflict, to be correct, then what should a VCS author do?

There are several choices here:

  • Always conflict
  • Never conflict
  • Conflict based on a heuristic
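
To make the three choices concrete, here is a minimal sketch of a three-way merge of a single line, where the only thing the policy affects is the case of both sides making the same change. The names `Policy`, `merge_line`, `CONFLICT` and `looks_risky` are hypothetical, invented for this illustration; this is not how any particular VCS implements merge.

```python
from enum import Enum


class Policy(Enum):
    ALWAYS_CONFLICT = "always"
    NEVER_CONFLICT = "never"
    HEURISTIC = "heuristic"


# Sentinel meaning "stop and ask a human" (hypothetical, for illustration).
CONFLICT = object()


def merge_line(base, ours, theirs, policy, looks_risky=lambda line: False):
    """Three-way merge of one line of text.

    `policy` only matters when both sides made the identical change.
    `looks_risky` is an assumed, caller-supplied heuristic.
    """
    if ours == theirs:
        if ours == base:
            return ours  # nobody touched the line
        # Both sides made the same change: the case under debate.
        if policy is Policy.ALWAYS_CONFLICT:
            return CONFLICT
        if policy is Policy.NEVER_CONFLICT:
            return ours
        return CONFLICT if looks_risky(ours) else ours
    if ours == base:
        return theirs  # only their side changed
    if theirs == base:
        return ours  # only our side changed
    return CONFLICT  # both sides changed, in different ways
```

With `Policy.NEVER_CONFLICT`, the duplicated bug fix described earlier merges silently; with `Policy.ALWAYS_CONFLICT`, the same input stops for a human. The heuristic variant just moves the question into `looks_risky`.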

I think that our job is to assess what the maximum harm from choosing the wrong default is, and the likelihood of that occurring, and then make a choice. Short of fantasy, no merge is, in general, definitely good or bad; your QA process (such as an automatic test suite) needs to run regardless of the VCS’s logic. The risk of a bad merge is relatively low, because you should be testing, and if the merge is wrong you can just not commit it, or roll it back. So our job in merge is to make it as likely as possible that your test suite will pass when you have done the merge, without further human work. This is very different to trying to always conflict whenever we cannot be 100% sure that the text is what a human would have created. It’s actually harder to take this approach than to conflict; conflicting is easy.