Mystery Formula

There aren’t many books in my bridge library that don’t contain a single hand, but I just added one. It is called “Detecting Cheating in Bridge” and the title pretty much covers what the book is about. Actually, there is one hand in the book (the famous 2005 hand on which Buratti and Lanzarotti were caught cheating), but that hand is irrelevant for the discussion.

The book is written by Nicolas Hammond, a mathematician and computer scientist by training with a background in online banking and related security issues. In recent years, he was involved in the company that developed Bridgescore+, a scoring program mainly used in North America. This software maintains files on all events where it is used: hand records, scores, player data and much more.

In this blog, I share some of my thoughts while reading the book.

The basic idea

The research behind the book was triggered when somebody allegedly cheated against the author at a tournament in 2015. While the author knew something was wrong, he also realized that there was little that could be done about it: it was a single incident, not recorded on video, and only his word against that of the opponents. On his way back, the author realized that intelligent software running on the data files from the scoring software could be used to automatically detect cheating and provide the necessary evidence.

Shortly thereafter, this research became much more relevant when it became clear that 4 pairs had been cheating at the European Championships in 2014. Actually, it is not clear if the number was indeed 4, as 1 pair was cleared on legal grounds by the CAS and in another case the statute of limitations had expired, but that is outside the scope of the book and this discussion. After all, 1 cheating pair not caught is 1 cheating pair too many.

The author expanded his data set, turned his ideas into algorithms, turned the algorithms into code and ran the code. That is all nicely described in the book (and I’m a bit jealous: access to the data set would have saved me quite some time researching the Bathurst/Lall case in 2016).

The book does a pretty good job describing all this in its first quarter, then introduces something called the “mistakes function” or “magical formula”, “MF” for short. The MF takes the data from a pair, turns it into a number, and compares that to the field; if the values are different, that is a suggestion that something unusual is going on with this pair. For this purpose, the author uses a number of special algorithms called ACDFs, or Advanced Cheating Detection Functions.

So, what is this MF?

This is the point where the book becomes a disappointment, as the author only lists some properties of the function but does not describe the algorithm in any detail. OK, I understand that there are patent and copyright issues here, but those could have been sorted out before publishing.

What is said is that the MF describes how good a pair (or a player) is at defense and declarer play. The MF can be compared to that of the pair’s peers; if it is significantly different from the peers, there must be something going on.
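The book never reveals how an MF value is computed, but the peer-comparison step it describes is essentially a standard outlier test. A minimal sketch, assuming the MF values are already given (the function name and all numbers below are my placeholders, not the author’s method):

```python
import statistics

def peer_zscore(pair_mf, peer_mfs):
    """Number of standard deviations a pair's MF lies from its peer group.

    Both arguments are placeholders: the book never defines how an MF
    value is actually calculated.
    """
    mean = statistics.mean(peer_mfs)
    stdev = statistics.stdev(peer_mfs)
    return (pair_mf - mean) / stdev

# Illustrative values on the rough 1.1-1.5 scale seen in the book's plots:
peers = [1.10, 1.20, 1.25, 1.30, 1.35, 1.40, 1.15, 1.30]
z = peer_zscore(1.80, peers)  # a pair far outside its peer group
```

A large z would then be the “suggestion that something is going on” — but note that this sketch says nothing about where the MF values themselves come from, which is exactly the book’s gap.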

The book furthermore states that top players should be equally good on defense and declarer play. It doesn’t back up this claim though. In fact, it seems to say the opposite a few pages later, when it discusses differences in opening lead styles between pairs and players. In the pair of Meckstroth and Rodwell, with over 40 years of data available, the book shows that there are subtle differences in the opening lead styles of the two and that one player is a little more successful with his opening leads than his partner. If that is the case, why can’t a player be better at declarer play than defense, or the other way around? And if a player can be better at one, why can’t an entire pair be better at one?

Leaving that aside, the fact that the book does not describe how the MF is calculated makes it impossible to use it in practice. Why? Suppose a pair is found to have a different MF from their peers and is brought before a committee. The pair is, obviously, entitled to defend themselves, but how can they do this? They don’t know how the evidence was collected or how the MF is calculated, nor can they redo the analysis to check for any flaws. Any judge will throw out the MF on the spot, and the case is gone.

If you don’t agree, look at the example of DNA evidence, which is used in many criminal cases. Before it was used in court, it was first shown in various independent scientific publications that a DNA profile is a unique property of each human being that does not change over time. Then the methods to determine this profile were published, allowing anybody to set up a laboratory and redo the tests. If there is DNA evidence, a defendant can always ask for a second laboratory to redo the test. If the results differ, the evidence can be challenged.

Statistics

That brings up my next problem with this research: statistics.

The book contains lots of tables and lists of data, all with 4 or 5 significant digits, but none of the tables quote any statistical (or systematic) errors on the numbers. For example, the MF value of the entire data set is 1.2648. The entire data set consists of 290 high-level tournaments. Taken at face value, that suggests that the (combined statistical and systematic) error on the number is half the last significant digit, or 0.00005.

But the next page shows a plot of the MF values for various high-level tournaments, and one immediately sees a spread in the MF values, ranging from 1.1 to 1.5 for all pairs over an entire tournament. For individual pairs, the ranges are even bigger, from about 0.4 to about 1.8. With those ranges, it does not make sense to quote values to 4 significant digits.

One can estimate the standard deviation of the MF values from the plots. When I do this, I get something in the range of 0.2. In other words, the MF value for the entire data set should have been quoted as something like 1.26 ± 0.20. Quoting just the central value, without the error, is both scientifically wrong and misleading.
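For reference, the usual convention is to round the uncertainty to one or two significant digits and the central value to the same decimal place. A small sketch of that rule (my own helper, nothing from the book):

```python
import math

def quote(value, error):
    """Format a measurement as 'value ± error', with the error given to
    two significant digits and the value rounded to match."""
    if error <= 0:
        raise ValueError("error must be positive")
    # Decimal position of the error's leading significant digit,
    # extended by one extra digit; clamped at 0 for large errors.
    digits = max(0, 1 - int(math.floor(math.log10(error))))
    return f"{value:.{digits}f} ± {error:.{digits}f}"

print(quote(1.2648, 0.2))  # -> 1.26 ± 0.20
```

Applied to the book’s headline number with the spread read off its plots, 1.2648 collapses to 1.26 ± 0.20 — the last two digits carry no information.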

The fact that there is no discussion of statistical errors on the data makes most of the tables in the second half of the book unconvincing to me.

More unclear effects

What is unclear from the book is how data sets are combined. The author quotes (p97) a WBF report on a cheating pair which said that “XY had been actively monitored at some tournaments”. He then speculates that this pair might have noticed this and stopped cheating for a bit.

Which immediately leads to another question: is it possible to combine data from one tournament with data from another? At the moment, all data from a pair is grouped together, but is this correct or even possible? What is the effect on the MF?

Finally, the book fails to discuss the effect of other possible errors in the source data. One example: if you watch BBO frequently, you will no doubt have noticed that there are occasional small differences between the results on BBO and the official scores. With scores entered through the Bridgemates, it is even worse. How does this affect the results?

And what if they are not cheating?

My final criticism of the book: pairs are listed as cheaters. Some of the pairs were indeed caught using other methods, but there are other pairs merely listed as “suspect”. For legal reasons, their names are not mentioned.

However, if a pair scores significantly better than its peers, there is always a second possibility: they are better players. In a proper statistical forensic analysis of the evidence, this alternative hypothesis should be investigated as well. That has not been done for any of the described pairs.
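This matters because the outlier test on its own cannot tell the two hypotheses apart. A toy simulation makes the point (entirely my own assumptions: per-tournament MF values drawn from normal distributions, with numbers loosely based on the spread in the book’s plots — nothing here is the author’s model):

```python
import random
import statistics

random.seed(42)

# Toy model: peers average an MF of 1.26 with spread 0.2; this pair is
# simply a better pair (mean 1.45) -- no cheating involved at all.
peers = [random.gauss(1.26, 0.2) for _ in range(200)]
strong_pair = [random.gauss(1.45, 0.2) for _ in range(50)]

# Significance of the pair's average MF against the peer field:
se = statistics.stdev(peers) / len(strong_pair) ** 0.5
z = (statistics.mean(strong_pair) - statistics.mean(peers)) / se
# z comes out far above any reasonable threshold: the honest-but-stronger
# pair is "detected" exactly as a cheating pair would be.
```

A deviation from the peers establishes only that the pair is unusual; distinguishing skill from cheating needs additional evidence, and the book never does that step.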

My conclusion: this is an interesting idea, but the book fails miserably at describing the algorithm and its consequences. The statistical treatment of the data is plainly wrong. A lot more work is needed before this can actually be used, if it can be used at all.

If you want to read the book yourself

“Detecting Cheating in Bridge” is available from the author through his website, though currently he is not shipping it outside the US. I got my copy when somebody picked one up for me in the US and brought it over; another way would be to use a service like Reship. I haven’t seen the book at any of the online (bridge) bookshops (yet), but perhaps they can help you when you give them the ISBN 978-0-9822355-5-3. Online price is $39.95.

Henk Uijterwaal 2019