Interpreting risks

A BBC article caught my eye the other day; it discussed the nature of medical test results and the interpretation of risk. Take a look, as it claims that doctors are not as good as they should be at interpreting (on behalf of patients) the significance of test results.

In this article they gave the example of a patient (a 50-year-old woman, about whom no other medical information is known) who has just had a test for breast cancer.

Note - the reliability of a particular test is assessed over time by medical researchers, who compare test results on patients and then check (using other follow-up tests and procedures) how many of the patients actually had the disease that the test said they did or did not have. This way, they get a fairly decent estimate of how good the test is.

In this BBC article example, the breast cancer test had a sensitivity rate of 90%. What sensitivity means is, if we had perfect knowledge that a group of people DID have the disease, what percentage of times would the test come out as positive (positive in medical parlance means the test indicates you HAVE the disease or condition)? In our example, on average 90% of outcomes, or 9 times out of 10, the test would give a positive (correct) result. Of course, the perfect test would be 100% sensitive. 

Some terminology here. When the test is run on people who DO have the disease, 90% of the results would thus be a TRUE POSITIVE. But 10% of the results would be a FALSE NEGATIVE (i.e. you are told you don't have the disease but you actually do).

But suppose you just walked in off the street and had the test (with no-one knowing whether you had the disease or not) and got a positive result. Knowing that the test is 90% sensitive, a common error of thinking is to conclude that there is a 90% chance that you have the disease. Not at all - read on.

We need to understand something else here - what is known as the specificity of the test. Some more terminology. In the example given in the article, the test had a specificity rate of 91%. What this means is, if we had perfect knowledge that a group of people did NOT have the disease, how often would the test produce a (correct) negative result? In the example given, 91% of the time. Meaning, if the test was performed 100 times on such people, 91 of those times would produce a negative result i.e. what we might call a TRUE NEGATIVE. But 9 of those times would produce a positive result when the person doesn't actually have the disease - in other words, 9 FALSE POSITIVES. Again, if a test had perfect specificity (100%) all would be great.
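To make the two definitions concrete, here is a minimal Python sketch. The counts are hypothetical (100 people known to have the disease and 100 known not to), chosen only so that they reproduce the 90% and 91% figures quoted above; they are not data from the article.

```python
# Hypothetical validation study: counts chosen to match the 90% / 91% figures.
true_positives = 90    # have the disease, test says positive
false_negatives = 10   # have the disease, test says negative
true_negatives = 91    # do not have the disease, test says negative
false_positives = 9    # do not have the disease, test says positive

# Sensitivity looks only at the people who DO have the disease.
sensitivity = true_positives / (true_positives + false_negatives)

# Specificity looks only at the people who do NOT have the disease.
specificity = true_negatives / (true_negatives + false_positives)

print(f"sensitivity = {sensitivity:.0%}")  # prints "sensitivity = 90%"
print(f"specificity = {specificity:.0%}")  # prints "specificity = 91%"
```

Notice that each figure is worked out from a different group of people, which is why one can be high while the other is poor.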

This is a key concept - a test needs to be assessed for how reliable it is when it is performed on groups that DO have the disease, but also for how reliable it is when it is performed on groups that don't. It may not be intuitive, but these two things are completely separate. Even if a test is performed on the same person repeatedly, these types of errors will produce some false negatives or some false positives (depending on whether the person does or does not have the condition). Where tests have similar levels of sensitivity and specificity, that's just a coincidence - they can be quite different because it all depends on the underlying logic, science and fallibility of the measuring process.

The ideal combination would be a test that had 100% sensitivity, and 100% specificity. To my knowledge, and you will not be surprised to hear this, there are few if any tests that are so accurate. If a test was 100% accurate in both these ways, the test would have perfect predictive power, and you could completely rely on it. But this never happens in the real world. Again, a test might be quite sensitive, but have worse specificity, or vice versa.

Well, so far we have more of an idea of the issues involved. But about half of the gynecologists in the BBC article apparently concluded from the above data that the chance of the woman having cancer, on the basis of the positive test alone, was in fact 90%. Oops - this is completely wrong, as we shall see.

The problem is that, in the real world of imperfect tests that do not have 100% sensitivity and 100% specificity, we can't assess the significance of test results without knowing what proportion of the overall population actually have the disease (this proportion is known as the prevalence rate).

To take a silly example, if we knew that 100% of the population always had the disease we can see that we don't need the tests. We know the answer already! And, if we knew that the prevalence rate was zero, we also know the answer already and don't need the tests. But anything in between we need to have a handle on. Why?

Well, because the test does not have perfect sensitivity or specificity (the test has a tendency to throw up false positives and false negatives) we have to weight or balance the size of these outcomes by the prevalence of the disease. Specifically:

  • Imagine you got a positive result - you would be interested to know how likely - adjusted for prevalence - is a true positive compared to a false positive.
  • And if you got a negative result? You would be interested, instead, in how likely - adjusted for prevalence - is a true negative compared to a false negative.
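The two questions above can be sketched as a pair of small Python functions. This is just Bayes' theorem written out; the function names are my own, and the 20% prevalence in the usage lines is an arbitrary placeholder rather than a figure from the article.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Chance of having the disease, given a positive result."""
    true_pos = sensitivity * prevalence                # diseased AND positive
    false_pos = (1 - specificity) * (1 - prevalence)   # healthy AND positive
    return true_pos / (true_pos + false_pos)


def negative_predictive_value(sensitivity, specificity, prevalence):
    """Chance of NOT having the disease, given a negative result."""
    true_neg = specificity * (1 - prevalence)          # healthy AND negative
    false_neg = (1 - sensitivity) * prevalence         # diseased AND negative
    return true_neg / (true_neg + false_neg)


# Usage with the article's 90% / 91% test and a placeholder prevalence of 20%.
print(positive_predictive_value(0.90, 0.91, prevalence=0.20))
print(negative_predictive_value(0.90, 0.91, prevalence=0.20))
```

In both functions the prevalence multiplies the "has the disease" terms and one minus the prevalence multiplies the "does not have the disease" terms, which is the weighting described above.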

An example. If the same number of people in the population have the disease as those who don't (a prevalence rate of 50%) then the 'weights' applied to the false positive effect are the same as the weights applied to the false negative effect. And in this case, with both sensitivity and specificity in the BBC example being 90% (actually the latter was 91% but that is near enough) then it is true to say that the lady with the positive result would have a probability of 90% of having the disease. But only in this example of a prevalence of 50%. And even then, the specificity has to be the same as the sensitivity. What if the proportion of the disease in the population was actually only 1%?

Well, we can see 'intuitively' that this causes a problem. If someone walks in off the street and has the test, there is only 1 chance in 100 that they have the disease. If the test is performed on them, the likelihood of a true positive (where they do have the condition and the test produces a correct result) is very much less than the likelihood of a false positive (where they don't have the condition and the test produces an incorrect result).

Why? The test - if you have the disease - returns true positives 90% of the time. But the test, if you don't have the disease, still returns false positives 9% of the time (100 minus the specificity of 91). But - we still have to adjust for the prevalence of the disease. We multiply the true positive rate of 90% by 1% (the prevalence rate) to produce 0.9%, and the false positive rate of 9% by 99% (100 minus the prevalence rate) to get 8.91%, or very nearly 9.0%. So we find that any positive result is approximately 10 times more likely to be a false alarm than a true positive.

Hence the true likelihood of having the disease, following a positive test result, is (in this example) only about 9% - roughly 1 chance in 11.
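If percentages of percentages feel slippery, the same sum can be done by counting people. Here is a minimal Python sketch; the group of 10,000 is a hypothetical size I have picked to keep the numbers round, while the 1%, 90% and 91% figures are the ones used above.

```python
population = 10_000   # hypothetical group size, chosen for round numbers
prevalence = 0.01     # 1% of them have the disease
sensitivity = 0.90
specificity = 0.91

with_disease = population * prevalence        # 100 people
without_disease = population - with_disease   # 9,900 people

true_positives = with_disease * sensitivity             # 90 correct alarms
false_positives = without_disease * (1 - specificity)   # about 891 false alarms

# Of everyone who tests positive, what fraction really has the disease?
chance = true_positives / (true_positives + false_positives)

print(f"{true_positives:.0f} true positives, {false_positives:.0f} false positives")
print(f"chance of disease given a positive result: {chance:.1%}")  # prints 9.2%
```

So out of 981 positive results only 90 belong to people with the disease, which comes out a little under 10%.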

And we can see, all other things being equal, that if:

  • we increase the prevalence of the disease, any positive result from a test means we are more likely to have the disease
  • we decrease the specificity, OR decrease the sensitivity, any positive result from a test means we are less likely to have the disease
  • we decrease the specificity, any negative result from a test is slightly less likely to be correct
  • and so on...
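You can check these statements for yourself by changing one number at a time. The sketch below does that in Python, starting from the 1% prevalence example. The alternative values (10% prevalence, 80% specificity, 80% sensitivity) are arbitrary ones I have picked for illustration, and `ppv` and `npv` are my own shorthand names for "chance a positive result is right" and "chance a negative result is right".

```python
def ppv(sens, spec, prev):
    """Chance of disease given a positive result."""
    tp = sens * prev
    fp = (1 - spec) * (1 - prev)
    return tp / (tp + fp)


def npv(sens, spec, prev):
    """Chance of no disease given a negative result."""
    tn = spec * (1 - prev)
    fn = (1 - sens) * prev
    return tn / (tn + fn)


baseline = {"sens": 0.90, "spec": 0.91, "prev": 0.01}

# Each scenario changes exactly one number from the baseline.
scenarios = {
    "baseline": baseline,
    "higher prevalence (10%)": {**baseline, "prev": 0.10},
    "lower specificity (80%)": {**baseline, "spec": 0.80},
    "lower sensitivity (80%)": {**baseline, "sens": 0.80},
}

for name, s in scenarios.items():
    print(f"{name:26s} positive is right: {ppv(**s):6.1%}   negative is right: {npv(**s):7.2%}")
```

Running it shows the positive-result figure rising with prevalence and falling when either sensitivity or specificity drops, while the negative-result figure barely moves at such a low prevalence.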

To give you some idea, if we drop the sensitivity and specificity of the test we used in the above example to 75%, then we have to raise the prevalence of the disease from 1% to roughly 3% to produce the same likelihood of having the disease if we get a positive result from the test.
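For anyone who wants to check that figure, here is one way to work backwards from a target likelihood to the prevalence needed. It is a sketch using the 'odds' form of Bayes' theorem (odds after the test = odds before the test × likelihood ratio), and the function names are my own.

```python
def ppv(sens, spec, prev):
    """Chance of disease given a positive result."""
    tp = sens * prev
    fp = (1 - spec) * (1 - prev)
    return tp / (tp + fp)


def prevalence_for_ppv(target_ppv, sens, spec):
    """Prevalence at which a positive result carries the given chance of disease."""
    likelihood_ratio = sens / (1 - spec)            # how much a positive result shifts the odds
    posterior_odds = target_ppv / (1 - target_ppv)  # odds we want after the test
    prior_odds = posterior_odds / likelihood_ratio  # odds needed before the test
    return prior_odds / (1 + prior_odds)            # convert odds back to a probability


# Likelihood after a positive result with the original 90% / 91% test at 1% prevalence.
target = ppv(0.90, 0.91, 0.01)

# Prevalence needed for a 75% / 75% test to give that same likelihood.
needed = prevalence_for_ppv(target, 0.75, 0.75)
print(f"prevalence needed: {needed:.1%}")  # prints 3.3%
```

The answer comes out a little over 3%, so a noticeably weaker test needs the disease to be about three times as common before a positive result means the same thing.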

So, if you have to explain this to patients, or have your own chat with your doctor, I hope this all helps you be a little better prepared!

(Footnote: this BBC podcast of July 2014 gives an excellent overview of the issues - highly recommended - you won't be able to listen after July 2015!)