This article, if true, is enraging (hat tip, Adam S.). To fully appreciate the issue, it helps to have an understanding of the probability involved. I’ll explain by presenting two different problems that are similar to the DNA probability problems in question.

1) Let’s say I have a fair 6-sided die, and you have a piece of paper with a number from 1 to 6 on it. What is the probability that my roll of the die will match your number? Since there are 6 possible rolls, only one of which will match the number, and since each roll is equally likely, the answer is 1 in 6, or 16.7%. Since your number is fixed, even if unknown, the die has a 1 in 6 chance of landing on the fixed number you are holding. Simple enough. A more complicated problem is this:

2) We have 6 people with 6 pieces of paper. We first roll the die 6 times (or use some other random number generating method) to assign a number to each of them (two people can in theory have the same number). Then we roll the die a 7th time. What is the probability that it will match at least one of the numbers written on the pieces of paper? (Savvy readers will note that this is equivalent to the more straightforward problem of rolling the die six times and asking what the probability is that 1 or more of the 6 rolls match a fixed number, but the problem above is slightly more parallel to the DNA case.) The answer here is a little more complicated. The trick is (hopefully I’m allowed to use that word without being accused of falsifying results) to calculate the probability, call it $p$, that none of the pieces of paper match the die roll, and then subtract $p$ from 1 (since if it’s not the case that none of the pieces of paper match, it must be the case that one or more pieces of paper match). This is simply $p = (5/6)^6$, or 5/6 to the 6th power. Why? For each piece of paper, the probability that it does not match the die roll is 5/6. In general, the probability of two independent events both occurring, one with probability $a$ and one with probability $b$, is $a \cdot b$. Since there are 6 independent events here, each with probability 5/6, we raise 5/6 to the 6th power. So the probability of at least one match is $1 - (5/6)^6 \approx 1 - 0.335 = 0.665$, or 66.5%. In general, the probability, call it $P_{1+}$, of an event of probability $q$ happening at least once given $n$ independent observations is $P_{1+} = 1 - (1 - q)^n$.
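The formula and the paper-and-die setup can be checked against each other. Below is a minimal Python sketch, not anything from the article: `p_at_least_one` computes $P_{1+}$ exactly using fractions, and `simulate_papers` (a hypothetical helper named here for illustration) plays out problem 2 directly. The trial count and random seed are arbitrary choices.

```python
import random
from fractions import Fraction

def p_at_least_one(q: Fraction, n: int) -> Fraction:
    """P1+ = 1 - (1 - q)^n for n independent tries, each succeeding with probability q."""
    return 1 - (1 - q) ** n

def simulate_papers(people: int = 6, sides: int = 6,
                    trials: int = 100_000, seed: int = 2) -> float:
    """Play out problem 2 directly: random papers, one more roll, any match?"""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        papers = [rng.randint(1, sides) for _ in range(people)]  # repeats allowed
        seventh_roll = rng.randint(1, sides)
        if seventh_roll in papers:
            hits += 1
    return hits / trials

exact = p_at_least_one(Fraction(1, 6), 6)
print(exact, f"{float(exact):.3f}")         # 31031/46656 0.665
print(f"simulated: {simulate_papers():.3f}")
```

The simulated fraction should land close to the exact 66.5%; the exact version uses `Fraction` so that $(5/6)^6$ is computed without any rounding.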

These are interesting problems in their own right, as we intuitively think that if something has a 1 in $n$ chance of happening, after $n$ tries we expect to see that thing happen. Here our intuition holds up reasonably well: 66.5% of the time, in the example above, we will see the event in question occur at least once (the event there being a match between the die roll and a piece of paper). As $n$ grows, $1 - (1 - 1/n)^n$ drifts down toward $1 - 1/e \approx 63.2\%$, so for a 1 in $n$ event tried $n$ times, roughly 63% is the figure to remember.
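To see what happens as the odds get longer, here is a short Python sketch that evaluates $1 - (1 - 1/n)^n$ for a few values of $n$ (chosen arbitrarily) and compares it with the limit $1 - 1/e$. The `log1p`/`expm1` pair is just a way to keep the arithmetic accurate when $q$ is tiny; it computes the same quantity as the formula above.

```python
import math

def p_at_least_one(q: float, n: int) -> float:
    """1 - (1 - q)**n, computed via log1p/expm1 so it stays accurate for tiny q."""
    return -math.expm1(n * math.log1p(-q))

for n in (6, 100, 10_000, 1_100_000):
    # a 1-in-n event given n independent tries
    print(f"n = {n:>9,}: {p_at_least_one(1 / n, n):.4f}")

print(f"limit 1 - 1/e: {1 - 1 / math.e:.4f}")
```

The printed values shrink steadily from the 66.5% of the die example toward about 63.2%.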

If the article is true, this basic truth of probability theory is lost on our legal system. The article focuses on a case in which a man, John Puckett, was accused of a rape-homicide and convicted almost purely on the basis of DNA evidence:

Puckett was arrested, tried, and eventually convicted based mostly on the DNA match, which was portrayed as proof positive of his guilt—the jury was told that the chance that a random person’s DNA would match that found at the crime scene was one in 1.1 million.

The first problem above corresponds to testing a suspect’s DNA to see if it matches the DNA found at the crime scene. There, as we saw, the relevant statistic is 1/6 in the case of the die and 1 in 1.1 million (or whatever number the lab comes up with based on the quality of the DNA match) in the case of the DNA. If John Puckett had already been a suspect for other reasons, and they had then tested his DNA, the above statistic would have been correct. But Puckett’s match was found by combing through a large database. If one finds a person by that method, the probabilities change dramatically, as in the second example. For a 1 in 1.1 million chance, combing through a database with 1.1 million people would yield 1 or more hits a whopping 63% of the time just by chance. Even if the database only contains 300,000 people, the chances are 24% that a match will turn up randomly.

But the article claims that even in cases in which a DNA match has been found through the latter method, the court bars the defense attorneys from telling the jury how the match was made. Not only that, but the prosecutor can and does use the “1 in 1.1 million” number, and the defense is not allowed to counter that the true probability of a coincidental match in this case is much higher. If the article is accurately describing the situation, then this state of affairs is simply insane, and innocent people are almost surely going to jail because of it.
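The database figures can be reproduced with the same $P_{1+}$ formula, taking $q = 1/1{,}100{,}000$ and $n$ equal to the number of profiles searched. This is a sketch under a simplifying assumption: every profile is treated as an independent trial with the same match probability, which ignores complications such as relatives in the database or lab error. The function name is my own, chosen for illustration.

```python
import math

# The "1 in 1.1 million" random-match probability quoted to the jury.
MATCH_PROB = 1 / 1_100_000

def chance_of_coincidental_hit(profiles_searched: int, q: float = MATCH_PROB) -> float:
    """Probability that at least one profile matches purely by chance.

    Simplifying assumption: each profile is an independent trial with match
    probability q (no relatives, no lab error), so this is P1+ = 1 - (1 - q)**n.
    """
    return -math.expm1(profiles_searched * math.log1p(-q))

for n in (1, 300_000, 1_100_000):
    label = "single suspect tested" if n == 1 else f"database of {n:,} searched"
    print(f"{label}: {chance_of_coincidental_hit(n):.1%}")
```

Under those assumptions it reports about 23.9% for 300,000 profiles and 63.2% for 1.1 million, in line with the 24% and 63% above, while testing a single pre-existing suspect gives a chance that rounds to 0.0%.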

on March 4, 2010 at 12:34 AM | Adam: What, no hat tip? 😉

on March 4, 2010 at 9:06 AM | R.E.L.: Good call, edited.

on March 4, 2010 at 9:27 AM | Adam: I also recommend reading the works of John Allen Paulos, who is (last I checked) a math prof at Temple and the author of a few books, including ‘A Mathematician Reads the Newspaper.’ In it (or one of his others) he discusses how similar probability issues plague disease diagnosis with respect to false positives in test results.

on March 4, 2010 at 9:34 AM | R.E.L.: I’ve never heard of him, but the classic fallacy in disease diagnosis occurs when a test’s error rates are very low, but the false-positive rate is still higher than the rate of the disease itself. If you just test everyone, this will generally produce more false positives than real positives. You are right that they are related issues.
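A small worked example of that fallacy, with made-up numbers chosen only for illustration: a disease affecting 1 in 10,000 people, and a test whose false-positive and false-negative rates are each 1 in 1,000. The sketch computes the expected counts when a million people are screened; `screening_outcomes` is a hypothetical helper, not from any source.

```python
def screening_outcomes(population: int, prevalence: float,
                       false_pos_rate: float, false_neg_rate: float) -> tuple[float, float]:
    """Expected (true positives, false positives) when everyone is tested."""
    sick = population * prevalence
    healthy = population - sick
    true_pos = sick * (1 - false_neg_rate)   # sick people the test catches
    false_pos = healthy * false_pos_rate     # healthy people wrongly flagged
    return true_pos, false_pos

# Hypothetical numbers, for illustration only.
tp, fp = screening_outcomes(1_000_000, prevalence=1 / 10_000,
                            false_pos_rate=1 / 1_000, false_neg_rate=1 / 1_000)
print(f"expected true positives:  {tp:.1f}")
print(f"expected false positives: {fp:.1f}")
print(f"chance a positive result is real: {tp / (tp + fp):.1%}")
```

With these hypothetical rates, about 100 true positives are swamped by roughly 1,000 false positives, so a positive result means only about a 9% chance of actually having the disease.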

on March 4, 2010 at 12:56 PM | Adam: I have a bunch of his books; feel free to borrow them. They’re very quick reads. He also has one on math and humor (alas, not many people get it).
