# November 02, 2006

**Confused George: Probability Problem**

Let's say you see a taxi sideswipe a car late at night. It's a hit and run. The taxi was blue. At least you're pretty sure it was blue...

Your town has 2 taxi companies. Blue Taxi has 15% of the taxis in town; Green Taxi has 85%. Independent and impeccably scientific tests of your abilities as a witness (bear with me here) say that you are able to identify the colour of a taxi at night correctly 80% of the time. So what is more likely to be the colour of the taxi you saw?

The book I'm currently reading on probability -- "Chances Are: Adventures in Probability", by Michael and Ellen Kaplan -- says that to figure this out we have to consider the initial 85% likelihood that the taxi was green. And when you take that into account, along with your 80% success rate as a witness, there's a 59% chance the taxi was green.

I don't get it. I would think that the taxi was probably blue. There's nothing to indicate that you are better at identifying blue taxis than green taxis. So, the proportion of blue to green taxis on the road is irrelevant to your abilities as a witness. Whatever the likelihood that one colour or another is presented to you, you have an 80% chance of identifying the correct colour. So who's right?
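
One way to check the book's 59% figure is to count the cases directly. The following is only a minimal sketch, not anything taken from the book: the function name `posterior_green_given_saw_blue` and its parameters are made up for illustration, and it assumes the witness is equally accurate for both colours, as the problem statement implies.

```python
from fractions import Fraction


def posterior_green_given_saw_blue(p_blue, accuracy):
    """Probability the taxi was actually green, given the witness said "blue".

    Assumption (hypothetical helper): the witness has the same accuracy
    for blue and green taxis.
    """
    p_green = 1 - p_blue
    # Witness says "blue" and the taxi really is blue.
    said_blue_and_blue = p_blue * accuracy
    # Witness says "blue" but the taxi is really green (a mistake).
    said_blue_and_green = p_green * (1 - accuracy)
    return said_blue_and_green / (said_blue_and_blue + said_blue_and_green)


# Numbers from the post: 15% blue taxis, 80% witness accuracy.
p = posterior_green_given_saw_blue(Fraction(15, 100), Fraction(80, 100))
print(p, float(p))
```

With the numbers from the post this works out to 17/29, roughly 59%. Exact fractions are used only to avoid floating-point noise in the comparison.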

*actual* color was the taxi?" And the answer to that will depend on both the ratio of green to blue taxis on the street and on how well you judge color at night. Because green taxis are much more common, it turns out to be more likely that you misidentified a green taxi as blue than that you saw a rare blue taxi and identified it correctly.

*purple* taxis, had dropped off a passenger in the town in question.

> if there were zero blue taxis in the town, but you identified the sideswiper as a blue taxi, the odds of the taxi actually being blue would still be zero regardless of your percent success rate at identification.

Agreed. There's no problem regarding probability because all taxis are one colour. The success rate at distinguishing between various colours is irrelevant if there is only one colour.

> On the other hand, if you were 100% successful at identifying the taxi's color, the ratio of the taxi colors wouldn't matter.

In this example, the skill of the witness operates independently of the ratio of taxis. 100% is 100%, regardless of how many taxis in town are blue or green. Similarly, I suppose, if the witness was always wrong -- 0% -- that would operate independently of the taxi colour ratio. So if at the extremes, witness skill is not dependent on the overall ratio, why does it necessarily have to be linked in cases of somewhat skilled witnesses, like those with 80% success rates?

> That is true, but it's not the question that you are answering! The question is not whether you identified it correctly, it's what was the actual color of the taxi. Those are not the same question. Whether or not you identified it correctly is only half of the matter.

Whoa. My head hurts. Is there a way for the witness to identify the colour correctly other than stating the actual colour of the taxi?

"Pr(A) is the prior probability or marginal probability of A. 
It is "prior" in the sense that it does not take into account any information about B."

In the monkey poop example, the colour of the poop and the hit rate are unrelated, so one can be "prior" of the other. But in the taxi example, can you say that one fact is truly prior of the other?

there is no taxi.

*Prior* to us even thinking about B’s influence. In fact, we find Pr(A) by adding up all the possible ways (within the restrictions of the problem) that A can occur, regardless of whether or not B happened.

> But in the taxi example, the two stats are about the same variable: colour of the taxi.

Actually there are two variables, actual color and perceived color. Independently, the actual color can be green or blue, and the perceived color can be green or blue. For the probability we want, the two events are “The taxi’s actual color is green” (Ag) and “I see a blue taxi” (Sb). We want the probability of the taxi being green, which, with no other information, is 85% (this is the prior probability, Pr(Ag)). So already it’s pretty likely it was green. But we have the extra info that I saw a blue taxi, so that makes it less likely that the taxi is green (if you trust my vision at all), but by how much?

To find out, we have to look at my performance in another hypothetical experiment. Out of 100 taxis, 15 blue and 85 green, I will see 12 blue taxis as blue and 17 green taxis as blue, or 29 total perceived blue taxis. I mistook 17 green taxis for blue, so among my blue sightings, I mistook a green taxi for blue 17/29, or 59%, of the time. This is the same as Pr(Ag | Sb), the probability of the taxi actually being green restricted to those situations where I saw a blue taxi.

That’s just a rewording of Rhomboid’s original enumerative method, which for me is the cleanest way of looking at it. Also looking at extreme examples as Mr. K and Rhomboid have done helps the intuition a lot. 
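
For reference, the counting argument above can be written out with Bayes' Theorem. This is only a sketch in the notation already used in the thread (Ag, Sb); the symbol A_b for "the taxi's actual color is blue" is introduced here for illustration and does not appear in the original comments.

```latex
\Pr(S_b) = \Pr(S_b \mid A_b)\,\Pr(A_b) + \Pr(S_b \mid A_g)\,\Pr(A_g)
         = 0.80 \times 0.15 + 0.20 \times 0.85 = 0.12 + 0.17 = 0.29

\Pr(A_g \mid S_b) = \frac{\Pr(S_b \mid A_g)\,\Pr(A_g)}{\Pr(S_b)}
                  = \frac{0.20 \times 0.85}{0.29} = \frac{17}{29} \approx 0.59
```
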
I tried and failed to find an intuitive way to think about the ratio Pr(Sb | Ag) / Pr(Sb) = 20/29, which in Bayes’ Theorem is multiplied by 85% to get what we want. Oh well, intuition sometimes just isn't there.

*is* the winning ticket. In other words, #1, the winning number, and #2, the number on the ticket, are the same. So, what are the odds that he is a reliable witness? How likely is it that he can properly observe that he has the winning ticket? You'd be barking up the wrong tree if you started off your analysis by saying -- "well, the odds of him holding the correct ticket are one in a zillion." You don't have to go there because the question now defines the likelihood of his holding the correct ticket as 1 in 1. 100%. The only variable left is the possible variation between the actual number in his hand and what he will perceive that number to be. We know he's correct 19 times out of 20. So the odds of his figuring out he holds the correct ticket are 19/20 to the power of the number of digits on the ticket.

Ok, back to the taxis. The three "balls in the air" in the taxi problem are:

1. The true colour of the taxi in the accident.
2. The actual colour of the taxi seen by the witness.
3. The colour perceived by the witness.

What are the odds that #1 (the colour of the taxi in the accident) and #2 (the actual colour of the taxi seen by the witness) are the same? We know from the question that she saw the accident. She didn't see a random taxi that may or may not have been the one in the accident. She saw the specific, particular taxi that committed the hit and run. So there is no variation between #1 and #2 in the taxi problem. The odds of #1 and #2 being the same are 1 in 1, or 100%. If you assume a variation between #1 and #2 -- using the 15% blue, 85% green stat -- I believe you make a logical error. 
To put it another way, because we know she is observing the specific taxi in the accident, she is, metaphorically speaking, holding the winning lottery ticket in her hand. The only variable in the question is the variation between the actual colour of the taxi and the colour she perceives. She's right 80% of the time. Since we're only dealing with one taxi -- not the multiple digits on a lottery ticket -- the odds are a straight up 80% chance she's right. Or have I made a mistake somewhere?

One of these identifications is wrong, according to the tests of the witness's ability.

Wouldn't there be a 33% chance that the witness has it right?...

- 1st event -- 80% chance correct
- 2nd event -- .8 x .8 = 64% chance correct
- 3rd event -- .8 x .8 x .8 = 51%
- 4th event -- .8 x .8 x .8 x .8 = 41%
- 5th event -- .8 x .8 x .8 x .8 x .8 = 33%

> We know from the question that she saw the accident. She didn't see a random taxi that may or may not have been the one in the accident. She saw the specific, particular taxi that committed the hit and run. So there is no variation between #1 and #2 in the taxi problem. The odds of #1 and #2 being the same are 1 in 1, or 100%.

She saw the taxi BUT we already know she is only 80% accurate in her "seeing". So the odds of #1 and #2 are not the same. That is the crux of it.

> This is one of those mathematical ponderings that have little or no relevance to a real-world accident.

You are high, bees. This kind of statistical analysis is extremely relevant to the real world. I gave two examples above. It's not just a math trick, it's the way the real world operates. The fact that this is so counterintuitive to so many people has harmed society in so many different ways.

Imagine setting up an experiment where our guy watches a computer animation of the hit and run accident many, many times - 15% blue taxi animations and 85% green taxi animations. Which is going to happen more often...that he gets the color wrong, or that the color is blue? 
Since he kind of sucks at seeing blue stuff, it's going to be the former, so the probability you're looking for has to be more than 50%.

Over many, many accident simulations, there will be a total universe of *n* accidents of which 15% will be blue and 85% will be green. And the 80-20 right-wrong ratio will sort itself out as Rhomboid described. But in each individual case, the odds are always 80-20. This is perfectly consistent with Rhomboid's numbers:

> A. It was actually blue, and you identified it correctly as blue; 0.15 * 0.80 = 0.12
>
> B. It was actually blue, but you misidentified it as green; 0.15 * 0.20 = 0.03
>
> C. It was actually green, and you identified it correctly as green; 0.85 * 0.80 = 0.68
>
> D. It was actually green, but you misidentified it as blue; 0.85 * 0.20 = 0.17
>
> 0.12 + 0.03 + 0.68 + 0.17 = 1.00

Over many, many sightings, when you look in the 80% pile, you'll see a ratio of 68 green and 12 blue (17 to 3). But for any specific sighting -- whether it's blue or green -- the ratio of right to wrong answers will always be the same: 4 out of 5 times (80%). If the witness says a specific taxi is blue, Rhomboid's numbers say that in 12 out of 15 times (80%), that will be correct. If the witness says it's green, his numbers say that will be correct 68 out of 85 times (80%). Whatever the proportion of taxis on the road, you can count on the witness to correctly identify them 4 out of 5 times.

*given* that it actually is a blue taxi. But in the problem we're trying to solve, we don't know the actual color of the taxi. All we're given is what the witness saw. We're trying to find the probability that the car is actually blue *given* that the witness saw blue. This is not the same question and not the same probability as before, though it may seem like it. I think it may be because, in the absence of any other information, we tend to fall back on the reliability of the witness. 
But even if we don't know the distribution of blue and green taxis in the town and assume it's equally likely to encounter either, the two probabilities are still not the same. It may be possible that if we make no assumptions about what color cars are in the town, and thus have an infinitude of possibilities, then these two probabilities might converge. I'm not sure though, I'll have to think about it. But I do know that in order for the two probabilities to be equal in the original problem, some weird shit has to happen. I think an understanding and an agreement can be made about our intuitive understandings of this problem, as contrived as it is. It's not like this is a religious argument (not yet, anyway).

> We're trying to find the probability that the car is actually blue given that the witness saw blue.

But if it's given that the witness says "blue" I don't get how it can be among the possibilities that "C. it was green and you identified it as green". If it's a given that the witness says "blue", how can you assign a probability to the chance that the witness says "green"? It's a 100% certainty the witness said blue. And as for you, Rhomboid, I forgive you for your uncouth words.

> Probability the person who tested positive is a user is .095 / .14 = .6786 What am I doing wrong???

Nothing as far as I can see. You did it right. I didn't catch it before, but I don't know where the 21.1% comes from.

for StoryBored's reformulation > the Law of Large Numbers The Wolfram website (a fantastic resource) gives proofs for the weak and strong forms.

*are* only 2 possibilities:

1. Correct, the taxi is blue, or
2. Wrong, the taxi is really green.

So, how big is the correct pile? How many yes's can there be? Out of every hundred taxis, we know 15 are blue, and with an 80% accuracy rate, that means the correct pile will be 12 taxis. How big is the wrong pile? How many no's can there be? If the witness always identifies 80% correctly, she'll identify 68 of the 85 green taxis as green. 
That leaves 17 green taxis, 20%, improperly identified as blue. Do the math and you get the 59% likelihood that the witness will be wrong.

On the one hand, I find this counterintuitive to the point of loveliness. It *is* true that the witness will always correctly identify any specific taxi 4 times out of 5. It *is* true that this 4/5 success rate always stays the same no matter what the ratio of green to blue taxis. But at the same time it's equally true that altering the ratio of blue to green can make it probable that the witness will be wrong when she says "blue". That's very cool. But on the other hand, it now seems pretty obvious that 80% of 15 is smaller than 20% of 85. Go figure. Thanks everybody for a most enjoyable discussion.

> On the one hand, I find this counterintuitive to the point of loveliness.

That's a brilliant summary of the thread, Torluath!

> I find this counterintuitive to the point of loveliness. That's a brilliant summary of the thread, Torluath!

I agree, that's one of the more beautiful statements I've seen in a while.

The car that hit her was green.

I wasn't there but I can say with confidence that it was 80% green...carrying a big load of samwiches....QED.
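
Both halves of that conclusion can be seen side by side in a quick simulation. This is just an illustrative sketch, not something posted in the thread: the `simulate` function, its parameter names, the sample size, and the fixed seed are all assumptions made for the example, and the witness is assumed to be equally accurate for both colours.

```python
import random


def simulate(n_sightings=200_000, p_blue=0.15, accuracy=0.80, seed=0):
    """Simulate many night-time sightings (hypothetical helper).

    Returns a pair:
      - the fraction of all sightings identified correctly
      - the fraction of "blue" reports where the taxi was really green
    """
    rng = random.Random(seed)
    correct = 0
    said_blue = 0
    said_blue_but_green = 0
    for _ in range(n_sightings):
        actual = "blue" if rng.random() < p_blue else "green"
        right = rng.random() < accuracy
        if right:
            perceived = actual
        else:
            # A mistake means reporting the other colour.
            perceived = "green" if actual == "blue" else "blue"
        correct += right
        if perceived == "blue":
            said_blue += 1
            said_blue_but_green += actual == "green"
    return correct / n_sightings, said_blue_but_green / said_blue


overall, wrong_when_blue = simulate()
print(f"identified correctly overall:  {overall:.3f}")
print(f"really green when 'blue' said: {wrong_when_blue:.3f}")
```

The first number should land near 0.80 (the witness is right 4 times out of 5 on any sighting), while the second should land near 17/29, about 0.59 (most "blue" reports are mistaken green taxis). The exact digits depend on the seed and sample size.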