Sunday, February 7, 2016

Anecdotes Are Not Evidence And Other "Evidence-Based" Fairy Tales

A lot depends on the kind of anecdote you are talking about.  In medicine, an anecdote generally means a case or a series of cases.  In the era of "evidence-based medicine" this is considered weak or non-existent evidence.  Even 30 years ago, when I was rounding with a group of medicine or surgery residents, "That's anecdotal..." was a popular piece of roundsmanship used to put down anyone basing an opinion on a case or series of cases.  At the time, case series were still acceptable for publication in many mainstream medical journals.  Since then there has been an inexorable march toward evidence-based medicine, and that evidence is invariably clinical trials or meta-analyses of clinical trials.  In some cases the meta-analyses can be interpreted in a number of ways, including interpretations opposed to that of the author.  That is easy to do if a biologically heterogeneous condition is being studied.

There has been some literature supporting the anecdotal, but it is fairly thin.  Aronson and Hauben considered the issue of drug safety.  They made the point that isolated adverse drug reactions are rarely studied at a larger epidemiological level following the initial observation.  They argued that criteria could be established for a definitive adverse event based on a single observation.  In the case of an adverse drug event, they said the following criteria could be considered "definitive":  extracellular or intracellular deposition of the drug, a specific anatomical location or pattern of injury, physiological dysfunction or direct tissue damage, or infection as a result of contamination with an infective agent (1).  They provide specific examples using drugs and conclude: "anecdotes can, under the right circumstances, be of high quality and can serve as powerful evidence.  In some cases other evidence may be more useful than a randomized clinical trial.  And combining randomized trials with observational studies, or with observational studies and case series, can sometimes yield information that is not in clinical trials alone."  This is essentially the basis for post-marketing surveillance by the Food and Drug Administration (FDA).  In that case, monitoring adverse events on a population-wide basis increases the likelihood of finding rare but serious adverse events compared with the original trial.
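To make the surveillance idea concrete, one common signal-detection statistic used in pharmacovigilance is the proportional reporting ratio (PRR), which compares how often an event is reported for one drug versus all other drugs.  This is a minimal sketch; the counts below are entirely hypothetical and the PRR is only one of several statistics regulators actually use.

```python
# Minimal sketch of the proportional reporting ratio (PRR), a screening
# statistic used in post-marketing adverse event surveillance.
# All counts here are hypothetical, for illustration only.

def prr(a, b, c, d):
    """PRR from a 2x2 table of spontaneous reports.

    a: reports of the event of interest for the drug of interest
    b: reports of all other events for the drug of interest
    c: reports of the event of interest for all other drugs
    d: reports of all other events for all other drugs
    """
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts: 30 reports of the event out of 1,000 total reports
# for the drug, versus 200 out of 100,000 reports for all other drugs.
print(round(prr(30, 970, 200, 99800), 1))  # -> 15.0
```

A PRR well above 2, with enough reports behind it, is conventionally treated as a signal worth a closer epidemiological look rather than proof of causation.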

Enkin and Jadad had a paper that also considered anecdotal evidence as opposed to formal research (2).  They briefly review the value of informal observation and the associated heuristics, but also point out that these same observations and heuristics can lead some people to adhere to irrational beliefs.  The value of formal research is as a check on this process when patterns emerge at a larger scale.  Several experts have applied Bayesian analysis to single-case results, to illustrate how pre-existing data can be applied to single-case analyses with a high degree of success.  Pauker and Kopelman (3) looked at a case of hypertension and when to consider a search for secondary causes - in this case pheochromocytoma.  They made the interesting observation that:

"Because the probability of pheochromocytoma in a patient with poorly controlled hypertension who
does not have weight loss, paroxysms of sweating, or palpitations is not precisely known, the clinician often makes an intuitive estimate.  But the heuristics (rules of thumb) we use to judge the likelihood of diseases such as pheochromocytoma may well produce a substantial overestimate, because of the salient features of the tumor, its therapeutic importance, and the intellectual attraction of making the diagnosis."   

They take the reader through a complex presentation of hypertension and likelihood ratios used to analyze it and conclude:

"Hoofbeats usually signal the presence of horses, but the judicious application of Bayes' rule can help prevent clinicians from being trampled by a stampeding herd that occasionally includes a zebra."

In other words, by using Bayes' rule you won't subject patients with common conditions to excessive (and risky) testing in order to avoid missing an uncommon condition, and you won't miss the uncommon condition either.  Looking at the data that supports or refutes that condition will make it clear, if you have an idea about the larger probabilities.
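The arithmetic behind that point is simple: convert the prior probability to odds, multiply by the likelihood ratio of the finding, and convert back.  The sketch below uses hypothetical numbers, not figures from Pauker and Kopelman's paper.

```python
# Bayes' rule via likelihood ratios, as used in clinical reasoning.
# The prevalence and likelihood ratio below are hypothetical.

def posterior_probability(prior_prob, likelihood_ratio):
    """Convert a prior probability to a posterior probability
    by passing through odds form."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Suppose pheochromocytoma has a prior of about 0.2% among hypertensive
# patients, and a positive screening finding carries a likelihood ratio of 20.
post = posterior_probability(0.002, 20)
print(round(post, 3))  # -> 0.039
```

Even a test with an impressive likelihood ratio leaves the posterior under 4% when the prior is that low - which is exactly why the heuristic overestimates the authors describe lead to excessive testing for zebras.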

How does all of this apply to psychiatry?  Consider a few vignettes:

Case 1:  A psychiatric consultant is asked to assess a patient by a medicine team.  The patient is a 42-year-old man who has just undergone cardiac angiography.  The angiogram was negative for coronary artery disease and no intervention was necessary.  Shortly afterwards the patient becomes acutely agitated, and the intern notices that bipolar disorder is listed in the electronic health record.  He calls the consultant and suggests an urgent consult and possible transfer to psychiatry for treatment of an acute manic episode.  The consultant arrives to find a man sitting up in bed, appearing very angry and tearful and shaking his head from side to side.

Case 2:  A 62-year-old woman is seen in an outpatient clinic for treatment-resistant depression.  She has been depressed for 20 years and had had a significant number of medication changes before being referred to a psychiatric clinic.  All of the medications were prescribed by her primary care physician.  She gives the history that her annual physical exams have all been negative.  Except for fatigue and some lightheadedness, her review of systems is negative.  She is taking lisinopril for hypertension, but has no history of cardiac disease.  She has had electrocardiograms (ECGs) in the past but no other cardiac testing.  The psychiatrist discusses the possibility of a tricyclic antidepressant.

Case 3:  An inpatient psychiatrist has just admitted a 46-year-old woman who is concerned about the FBI.  She had been working and functioning at a high level until about a month ago, when she started to notice red automobiles coming past her cul de sac at a regular frequency.  She remembered getting into an argument at work about American military interventions in the Middle East.  She made it very clear that she did not support this policy.  She started to make a connection between the red automobiles and the argument at work.  She concluded that an employee must have "turned her in" to the federal authorities.  These vehicles were probably the FBI or Homeland Security.  She noticed clicking noises on her iPhone and knew it had been cloned.  She stopped going in to work and sat in the dark with the lights out.  She reasoned it would be easier to spot somebody breaking into her home by the trail of light they would leave.  There is absolutely no evidence of a mood disorder, substance use disorder, or family history of psychiatric problems.  She agrees to testing and all brain imaging and laboratory studies are normal.  She agrees to a trial of low-dose antipsychotic medication, but does not tolerate 3 different medications at low doses.

These are just a few examples of the types of clinical problems that psychiatrists encounter on a daily basis that require some level of analysis and intervention.  Over the course of a career, a psychiatrist encounters 20 or 30 of these scenarios a day and ends up making tens of thousands of these decisions.  What is the "evidence basis" of these decisions?  There really is nothing beyond anecdotes with various strengths of confirmation.  What kinds of evidence would the evidence-based crowd like to see?  In Case 1, a study that looked at behavioral disturbances after cardiac catheterization would be necessary, although Bayes would suggest the likelihood of that occurring would be very low.  In Case 2, a large trial of treatment approaches to 62-year-old women with depression and fatigue would be useful.  I suppose another trial of what kinds of laboratory testing might be necessary would help, although much of the literature suggests that fatigue is very nonspecific.  In most cases where patients are being seen in primary care - fatigue and depression are indistinguishable.  Extensive testing, even for newer inflammatory markers, yields very little.  Further on the negative side, evidence-based authorities suggest that a routine physical examination and screening tests really add nothing to disease prevention or long-term well-being.  Case 3 is more interesting.  Here we have a case of paranoid psychosis that cannot be treated because the patient experiences intolerable side effects from the usual medications.  Every practicing psychiatrist knows that a significant number of people can't take entire classes of medications.  Here we clearly need a clinical trial of 46-year-old women with paranoid psychoses who could not tolerate low-dose antipsychotic medication.

By now my point should be obvious.  None of the suggested trials exist and they never will.  It is very difficult to match specific patients and their problems to clinical trials.  Some of the clinical occurrences are so rare (agitation after angiography, for example) that it is very doubtful that enough subjects could be recruited and enrolled in a timely manner.  And there is the expense.  There are very few sources that would fund such a study and, these days, very few practice environments or clinical researchers that would be interested in the work.  Practice environments these days are practically all managed care environments where physician employees spend much of their time on administrative work, the company views clinical data as proprietary, and research is frequently focused on advertising and marketing rather than answering useful clinical questions.

That brings us to the larger question of what the "evidence" really is.  The anecdotes that everyone seems to complain about are really the foundation of clinical methods.  Medical training is required to experience these anecdotes as patterns to be recognized and classified for future work.  They are much more than defective heuristics.  How does that work?  Consider Case 1.  The psychiatric consultant in this case sees an agitated and tearful man who appears to be in distress.  The medicine team sees a diagnosis of bipolar disorder and concludes the patient is having an acute episode of mood disturbance.  The consultant quickly determines that the changes are acute and rejects the medical team's hypothesis that this is acute mania.  After about 5 questions he realizes that the patient is unable to answer, pulls a pen out of his pocket, and asks the patient to name the pen.  When the patient is not able to do this, the consultant performs a neurological exam and determines that the patient has right arm weakness and hyperreflexia.  An MRI scan confirms an area of embolic stroke and the patient is transferred to neurology rather than psychiatry.  The entire diagnostic process is based on the past anecdotal experience of diagnosing and treating neurological patients as a medical student, as an intern, and throughout his career.  Not to labor the point (too much) - it is not based on a clinical trial or a Cochrane review.

The idea that the practice of medicine comes down to a collection of clinical trials that are broken down according to a proprietary boilerplate - generally concluding that the quality of most studies is low and therefore there is not much to draw conclusions from - is absurd.  Trusting meta-analyses for answers is equally problematic.  You might hope for some prior probability estimates for Bayesian analysis, but you will find very little in the endless debate (4) about whether antidepressants work or are dangerous.  You will find nothing when these studies are in the hands of journalists who are not schooled in how to practice medicine and know nothing about treating biologically heterogeneous populations and unique individuals.  No matter how many times the evidence-based crowd tells you that you are treating the herd - a physician knows that treating the herd happens one person at a time.  They also know that screening the herd can lead to more problems than solutions.

Treating the herd would not allow you to make a diagnosis of complete heart block with immediate referral to Cardiology for pacemaker placement (Case 2), or to offer psychotherapy with no medications at all and eventual recovery (Case 3).  If you accept the results of many clinical trials or their interpretation by journalists - you might conclude that your chances of recovery from a complex disorder are no better than chance.  Nothing could be further from the truth.

That is why most of us practice medicine.

George Dawson, MD, DLFAPA


1:   Aronson JK, Hauben M. Anecdotes that provide definitive evidence. BMJ. 2006 Dec 16;333(7581):1267-9. Review. PubMed PMID: 17170419; PubMed Central PMCID: PMC1702478.

2:  Enkin MW, Jadad AR. Using anecdotal information in evidence-based health care:  heresy or necessity? Ann Oncol. 1998 Sep;9(9):963-6. Review. PubMed PMID: 9818068.

3:  Pauker SG, Kopelman RI. Interpreting hoofbeats: can Bayes help clear the haze? N Engl J Med. 1992 Oct 1;327(14):1009-13. PubMed PMID: 1298225.

4:  Scott Alexander. SSRIs: Much More Than You Wanted To Know. Slate Star Codex, July 7, 2014.


Photo at the top is Copyright Gudkov Andrey and downloaded from Shutterstock per their Standard License Agreement on 2/3/2016.


  1. Excellent piece.

    Besides the points you make, here are a few more important considerations: 1. In psychiatry there are almost no objective tests at all, just self-report and the experimenter's observations - both subject to significant bias. In psychiatry, it is usually true that "data" is just the plural of anecdote. 2. When it comes to relationship stress, the major cause of a lot of chronic anxiety, dysthymia, and self-destructive behavior, the context and subtext of interpersonal interactions are key to figuring out what is going on with the patient. Neither of these things is quantifiable, and usually they cannot be observed directly, since we can't read minds and are not there for the vast majority of the patient's interpersonal interactions. The "data" are the unique stories the participants tell themselves.

    Another key point is that not all anecdotes are alike. Some we have reason to believe might be generalizable to specific populations, and some not. How many similar anecdotes are we talking about, and in how many different clinical contexts? There is also the question of whether the anecdote was reported accurately. And then there is the interpretation of the anecdote, which is often confused with the anecdote itself and treated like data (this is also true of any clinical trial).

    Since all of human history is anecdotal, based on people who recorded events at the time, I guess the "evidence-based" crowd must think we can't learn from it.

    1. Thanks David,

      I like your additional considerations. They are part of the larger point that what physicians do and how successful they are is almost totally independent of clinical trials and certainly Cochrane. Gawande made that point nicely in one of his articles where he pointed out that the outcomes for cystic fibrosis were markedly superior for one physician and clinic based on his treatment methods - not on a superior "evidence-based" treatment. To your point on the usual outpatient mix of anxiety, dysthymia, depression and its interpersonal context, it is amazing how this area is generally neglected in terms of what might be useful for clinicians. We are supposed to believe that we are treating unitary diagnoses that are stable over time. Critics who know nothing about psychiatry will point out how "unreliable" psychiatric diagnoses are. Some psychiatric scholars will bemoan the fact that we don't have written criteria to make this distinction. You and I and all clinical psychiatrists know that we see people who "meet criteria" for GAD at one point in time and dysthymia at another. Further we know that if we see a patient for a repeat evaluation several years after an initial treatment episode - they are very unlikely to give us that same past history. We are really sampling a unique conscious state and making interpretations and interventions on the fly.

      And people get better - much better than suggested by clinical trials. That should be expected rather than a surprise.