Wednesday, December 26, 2018

What Does Carl Sagan's Observation About The Effects of Medicine Really Mean?

I started reading Carl Sagan's book The Demon-Haunted World based on what I heard about it on Twitter.  Of course I was very familiar with his television persona, but I had read none of his writings. Part of the way in, I ran across the above quote as a footnote on page 13. It hit me immediately, based both on my personal medical care and on the thousands of people I have assessed over the years.  There was also a clear contrast with what is in the popular and professional press.  The media has been obsessed with metrics of medical systems for various reasons. First, it is sensational.  Get hold of a controversial statistic - like the estimated number of people killed by medical interventions every year - convert it to the equivalent number of 757 crashes per year, and you have a hot headline.  Second, generalizations about the quality of health care lack granularity and precision.  There are no epidemiological studies that I am aware of that can even begin to answer the question he asked at the dinner party.

From a research standpoint, studying this endpoint would take an unprecedented level of detail. Standard clinical trials, epidemiological studies, and public health statistics typically look at mortality as an outcome of a few variables in limited age groups. Nobody publishes clear data on mortality avoided - often many times over the course of a lifetime.  The same is true of avoided morbidity. Researchers seem focused on binary outcomes - life or death, cured or ill, recovered or permanently disabled.

The closest literature seems to be on the way that chronic illnesses accumulate over time, but that endpoint is not as striking as a mortality endpoint.  Every study that I have seen looks at a specific cause of mortality or a collection of similar causes, not all possible causes.  The best longitudinal data I have found is in a graph in this article in the Lancet (see figure 1).  Please inspect this graph at the link and notice how the number of chronic illnesses increases with age.  The population studied here was in Scotland.  This is very impressive work because, as far as my research goes, I can find no other graph of chronic illnesses with this level of detail.  Data in the US have very crude age groups and the number of chronic illnesses counted is often limited.  In CMS data for Medicare beneficiaries, 34.5% have 0-1 chronic medical conditions, 29.5% have 2-3 conditions, 20.7% have 4-5 conditions, and 15.3% have 6+ conditions.  Selecting certain age groups changes the percentages in the expected directions.  For example, among beneficiaries less than 65 years of age, 45% have 0-1 conditions and 11.2% have 6+ conditions.

Chronic conditions are the most significant part of medical effort and expenditure but, as indicated in Sagan's quote, they are only part of the medical experience of people across their lifetimes. Now that some previously fatal conditions are being treated as chronic conditions, the demarcation between morbidity and mortality is blurred even further. The experience of being treated and cured of a potentially fatal illness is left out of the picture.  Being treated and cured of multiple fatal conditions over the course of a lifetime is also not captured.  Two common examples are acute appendicitis and acute cholecystitis.   These diagnoses alone account for 280,000 appendectomies and 600,000 cholecystectomies each year. The vast majority of those people go on to live normal lives after surgery.  Those operations are just a portion of the surgery performed each year that is curative, results in no further disability, and in many situations prevents significant mortality and morbidity.  A more complete metric of life saving, disability prevention, and disease course modification would be most useful in determining what works in the long run and where the priorities should be.

One way to get at those dimensions would be to look at what I would call the Sagan Index.  Since I am sure there are interests out there licensing and otherwise protecting Carl Sagan's name, I am going to use the term Astronomer Index instead - but make no mistake about it, the concept is his.

I have included an initial draft of what this index might look like in the supplementary notes below.  A higher number corresponds to the number of times a life has been saved or disability prevented.  Consider the following example - an average guy in his 60s.  When I take his history he recalls being hospitalized for anaphylaxis at age 16 and a gangrenous appendix at 19.  With the appendectomy he had a Penrose drain in his side and a complicated hospital stay.  He traveled to Africa in his 20s where he got peptic ulcer disease, malaria, and 2 additional episodes of anaphylaxis from vaccines that he was allergic to. When he was 42 he had an acute esophageal obstruction and needed an emergency esophagogastroduodenoscopy. At age 45 he insisted on treatment for hypertension.  At 50 he had polysomnography, was diagnosed with obstructive sleep apnea (OSA), and was treated with continuous positive airway pressure (CPAP). At age 55 he had multiple episodes of atrial fibrillation and needed to be cardioverted twice.  He is on long term medication to prevent atrial fibrillation.  At age 60 he had an acute retinal detachment and needed emergency retinal surgery.  At age 66 he needed prostate surgery because of prostatic hypertrophy and urinary tract obstruction.  Assigning a point for either saving his life or preventing disability yields an Astronomer Index of 11.  In 5 of these situations he was likely dead without medical intervention (3 episodes of anaphylaxis, the gangrenous appendix, and the acute esophageal obstruction), and in the other six he would be partially blind, acutely ill with a urinary tract obstruction and possible renal failure, experiencing the cardiac side effects (or sudden death) of OSA, or possibly disabled by a stroke from the untreated atrial fibrillation. The index would take all of these situations into account, as well as life threatening episodes of psychiatric illness or substance use.
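As a rough sketch of the tally, the example patient's history might be coded like this. The event list and the life-saved versus disability-prevented assignments are my illustrative reading of the case above, not a validated instrument:

```python
# Hypothetical sketch of an "Astronomer Index" tally: one point per
# intervention that either saved a life or prevented disability.
# The event list paraphrases the example patient described above;
# category assignments are illustrative, not a validated instrument.

LIFE_SAVED = "life_saved"
DISABILITY_PREVENTED = "disability_prevented"

events = [
    ("anaphylaxis, age 16", LIFE_SAVED),
    ("gangrenous appendix, age 19", LIFE_SAVED),
    ("anaphylaxis from vaccine, 20s", LIFE_SAVED),
    ("anaphylaxis from vaccine, 20s", LIFE_SAVED),
    ("acute esophageal obstruction, age 42", LIFE_SAVED),
    ("peptic ulcer disease treated, 20s", DISABILITY_PREVENTED),
    ("hypertension treated, age 45", DISABILITY_PREVENTED),
    ("OSA treated with CPAP, age 50", DISABILITY_PREVENTED),
    ("atrial fibrillation, cardioverted x2, age 55", DISABILITY_PREVENTED),
    ("retinal detachment surgery, age 60", DISABILITY_PREVENTED),
    ("prostate surgery for obstruction, age 66", DISABILITY_PREVENTED),
]

def astronomer_index(events):
    """One point per qualifying event; categories kept for later analysis."""
    return sum(1 for _, category in events
               if category in (LIFE_SAVED, DISABILITY_PREVENTED))

print(astronomer_index(events))  # 11 for the example patient
```

A fuller version would obviously need weighting, operational definitions, and inter-rater reliability, but even a simple count like this captures something no current mortality statistic does.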

Compare the Astronomer Index (AI) to all of the media stories about the number of medical errors that kill people.  Preventing medical errors is an essential goal, but it really does not give the average person a measure of how many times things go right.  When I see stories about how many planes full of people die each year because of medical errors, I think of a couple of things.  The first is a NEJM article that came out in response to the IOM estimate of people dying from medical errors.  It described the progress that had been made and the problems that might be associated with the IOM report.  The second is the tremendous number of saves that I have seen in the patients I treat.  I have to take a comprehensive medical history on every new patient I assess, and I have talked with many people who would score very highly on the AI.

Another aspect captured by this index would be the stark reality of medical decisions. The vast majority of people realize this and do not take any type of medical treatment lightly. There is a broad array of responses to these decisions, ranging from rational decision making to denying the severity of the problem. Everyone undergoing medical treatment at some point faces a decision with varying degrees of risk. That should be evident from the current direct-to-consumer pharmaceutical television ads that rapidly list the serious side effects, including death, every time the commercial runs.  The decisions that people make also have to answer the serious question of what their life would be like without the treatment and its associated risk. There are no risk free treatments as far as I know. In the case of the hypothetical patient, he has taken at least 11 significant risks in consenting to medical and surgical treatment, and they have paid off in his living into his seventh decade.

I have not seen any metric like the AI applied across the population.  Certainly there are many people who make it into their 60s with fewer problems than in our example, but there are also many who have more, along with actual disability.  The available epidemiology of chronic, cured, and partially cured conditions is extremely limited, and I see nothing that comes close to capturing an individual's lifetime experience the way the AI might.  The rate of change in the index over the lifespan of the individual and across different populations might provide detailed information important for both prevention and service provision. In terms of psychiatric treatment, a good research question would be to compare the treatment response for psychiatric or substance use disorders of people with high scores on the index to that of people with low scores. Is there a potential correlation with cognitive decline?

The bottom line for me is that life is hard and most of us sustain considerable damage to our organism over the course of a lifetime.  Only a small portion of that is covered in most medical and epidemiological studies.  This index might provide the needed detail.  It might also provide some perspective on how many times each of us needs serious medical treatment over the course of a lifetime.

George Dawson, MD, DFAPA


1: Barnett K, Mercer SW, Norbury M, Watt G, Wyke S, Guthrie B. Epidemiology of multimorbidity and implications for health care, research, and medical education: a cross-sectional study. Lancet. 2012;380(9836):37-43.


A copy of the Astronomer Index (AI) is shown below:

Sunday, December 16, 2018

Morning Report

I don't know if they still call it that or not - but back in the day when I was an intern, Morning Report was a meeting of all of the admitting residents with the attendings or the Chief of Internal Medicine.  The goal was to review the admissions from the previous night, the initial management, and the scientific and clinical basis for that management. Depending on where you trained, the relationship between house staff and attendings could be affiliative or antagonistic. In affiliative settings, the attendings would guide the residents in terms of management and the most current research that applied to the condition. In antagonistic settings, the attendings would ask an endless series of questions until the resident presenting the case either fell silent or excelled.  It was extremely difficult to excel because the questions were often of the "guess what I am thinking" variety. The residents I worked with were all hell-bent on excelling.  After admitting 10 or 20 patients they would head to the library and try to pull the latest relevant research.  They may have slept only 30 minutes the night before, but they were ready to match wits with the attendings in the morning.

Part of that process was discussing the relevant literature and references.  In those days there were often copies of the relevant research at hand and, beyond that, seminars and research projects that focused on patient care. I still remember having to give seminars on gram negative bacterial meningitis and anaphylaxis.  One of my first patients had adenocarcinoma of unknown origin in his humerus, and the attending wanted to know what I had read about it two days later.  I had a list of 20 references. All of that reading and research required going to a library and pulling the articles - there was no online access.  But even once online access arrived, the process among attendings, residents, and medical students did not substantially change.

I was more than a little shocked to hear that process referred to as "intuition based medicine" in a recent opinion piece in the New England Journal of Medicine (1).  The authors seem to suggest that there was no evidence based medicine at all - that we were all just randomly moving about, hoping to accumulate enough relevant clinical experience over the years to make intuitive decisions about patient care.  I have been critical of these weekly opinion pieces in the NEJM for some time, but this one strikes an all-time low. Not only were the decisions 35 years ago based on the available research, but there were often clinical trials being conducted on active hospital services - something that rarely happens today now that most medicine is under corporate control.

Part of the authors' premise here is that evidence-based medicine (EBM) was some kind of advance over intuition-based medicine and that it is now clear it is not all it is cracked up to be. That premise is clearly wrong because there was never any intuition-based medicine before what they demarcate as the EBM period. Second, anyone trained in medicine in the last 40 years knew the problems with EBM from the outset - there would never be enough clinical trials of adequate size to include the real patients that people were seeing.  I didn't have to wait to read all of the negative Cochrane Collaboration reviews saying this in their conclusions.  I knew it because of my training, especially training in how to research problems relevant to my patients. EBM was always a buzzword that seemed to indicate some hallowed process the average physician was ignorant of.  That is only true if you completely devalue the training of physicians before the glory days of EBM.

The authors suggest that interpersonal medicine is what is now needed - in other words, that the relationship between the physician and patient (and caregivers) and their social context is relevant, and specifically the influence the physician has on these people.  Interpersonal medicine "requires recognition and codification of the skills that enable clinicians to effect change in their patients, and tools for realizing those skills systematically." They see it as the next phase in "expanding the knowledge base in patient care," extending EBM rather than rejecting it.  The focus will be on the social and behavioral aspects of care rather than just the biophysical. The obvious connection to biopsychosocial models will not be lost on psychiatrists.  That is straight out of both interpersonal psychotherapy (Sullivan, Klerman, Weissman, Rounsaville, Chevron) and Engel's model itself.  Are the authors really suggesting that this was also not a focus in the past?

Every history and physical form or dictation that I ever had to complete contained a family history section and a social history section.  That was true whether the patient was a medical-surgical patient or a psychiatric patient.  Suggesting that the interpersonal, social, and behavioral aspects of patient care have been omitted is revisionism as serious as the idea that intuition based medicine existed before EBM.

I don't understand why the authors can't just face the facts and acknowledge the serious problems with EBM and the reasons it has not lived up to the hype.  There needs to be a physician there to figure out what the evidence means and to act as an active intermediary protecting the patient against the shortfalls of both the treatment and the data. As far as interpersonal medicine goes, that has been around as long as I have been practicing as well.  Patients do better with a primary care physician - one who knows them and cares for them over time - and they are more likely to take that physician's advice. Contrary to managed care propaganda (from about the same era as EBM), current health care systems fragment care, make it unaffordable, and waste a huge amount of physician time, taking them away from relationships with patients.

Their solution is that physicians can be taught to communicate with patients and then be measured on patient outcomes.  This is basically a managed care process applied to outcomes less tangible than whether a particular medication is started. In other words, it is soft data that is easier to blame physicians for.  In this section they mention that one of the authors works for Press Ganey - a company that markets communication modules to health care providers. I was actually the recipient of such a module, intended to teach me how to introduce myself to patients.  The last time I took such a course was in an introductory course on patient interviewing in 1978.  I would not have passed the oral boards in psychiatry in 1988 if I did not know how to introduce myself to a patient.  And yet here I was in the 21st century, taking a mandatory course on how to introduce myself after having done it tens of thousands of times.  I guess I have passed the first step toward the new world of interpersonal medicine.  I have boldly stepped beyond evidence based medicine.

I hope there is a lot of eye rolling and gasping going on as physicians read this opinion piece.  But I am also concerned that there is not. Do younger generations of physicians just accept this fiction as fact?  Do they really think that senior physicians are that clueless?  Are they all accepting a corporate model where what you learn in medical school is meaningless compared to a watered down corporate approach that contains a tiny fraction of what you know about the subject?

It is probably easier to accept all of this revisionist history if you never had to sit across from a dead serious attending at 7AM, present ten cases and the associated literature and then get quizzed on all of that during the next three hours of rounding on patients.

George Dawson, MD, DFAPA


1: Chang S, Lee TH. Beyond Evidence-Based Medicine. N Engl J Med. 2018 Nov 22;379(21):
1983-1985. doi: 10.1056/NEJMp1806984. PubMed PMID: 30462934.

Graphic Credit:

That is the ghost of Milwaukee County General Hospital, one of the teaching affiliates of the Medical College of Wisconsin.  It was apparently renamed Doyne Hospital long after I attended medical school there.  It was demolished in 2001.  I shot this on 35mm Ektachrome while walking to medical school one day.  The medical school was on the other side of this massive hospital.

Sunday, December 9, 2018

What Isn't Available In Multimillion Dollar EHRs? Decision Support from 1994

Physician Decision Support Software from the 20th Century

I used to teach a class in medical informatics. My emphasis was on not mistaking a physical illness for a psychiatric one and not missing any medical comorbidity in identified psychiatric patients.  The class was all about decision-making, heuristics, and recognition of the biases that cause errors in medical decisions. Bayesian analysis and inductive reasoning were a big part of the course. About that time, software packages became available to assist in diagnostic decisions. Some of them had detailed weighting estimates to show the relative importance of diagnostic features.  It was possible to enter a set of diagnostic features and get a list of probable diagnoses for further exploration. I printed out some examples for class discussions, and we also reviewed research papers and looked at the issue of pattern recognition by different medical specialists.

The available software packages of the day were reviewed in the New England Journal of Medicine (1).  In that review, 10 experts submitted written case summaries that were cross-checked for validity and pared down to 105 cases.  The four software programs (QMR, Iliad, Dxplain, and Meditel) were compared on their ability to suggest the correct diagnosis. Two of the programs used Bayesian algorithms and two used non-Bayesian algorithms. The authors point out that probability estimates varied depending on the literature and clinical data used to establish them. In the test, the developers of each program entered the diagnostic language, and the compared outcomes were the lists of diagnoses produced by each program, rank ordered according to likelihood.

The metrics used to compare the programs were correct diagnosis, comprehensiveness (in terms of the differential diagnosis list generated), rank, and relevance.  The proportion of the cited diagnoses present in each program's knowledge base was only 0.73-0.91. The programs made the correct diagnosis in 0.52-0.71 of all 105 cases and in 0.71-0.89 of a 63 case subset - the subset whose diagnoses were listed in all 4 knowledge bases.  The authors concluded that the lists generated had low sensitivity and specificity, but that the programs suggested unique diagnoses the experts agreed might be important. They concluded that studying the performance of these programs in clinical settings, used by physicians, was a necessary next step. They speculated that physicians might use these programs for more than generating diagnoses - for example, looking at specific findings and how they might affect the differential diagnosis.

A study (2) came out five years later that was a direct head-to-head comparison of two physicians using QMR software to assess 154 internal medicine admissions with no known diagnosis.  In this study physician A obtained the correct diagnosis in 62 (40%) of the cases and physician B in 56 (36%). That difference was not statistically significant. Only 137 cases had the diagnosis listed in the QMR knowledge base. Correcting for that, the correct diagnosis rates increased to 45% for physician A and 41% for physician B. The authors concluded that a correct diagnosis listed in the top five diagnoses 36-40% of the time was not accurate enough for a clinical setting, but they suggested that expanding the knowledge base would probably improve that rate.
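The knowledge base correction in that study is simple arithmetic - restrict the denominator to the cases the software could possibly get right. Using the numbers as summarized above:

```python
# Diagnostic accuracy before and after correcting for knowledge base
# coverage, using the QMR study numbers summarized above.
def accuracy_pct(correct, denominator):
    """Percentage of cases with the correct diagnosis, rounded."""
    return round(100 * correct / denominator)

total_cases = 154     # all admissions assessed
covered_cases = 137   # diagnoses actually present in the QMR knowledge base

print(accuracy_pct(62, total_cases))    # physician A, uncorrected: 40
print(accuracy_pct(62, covered_cases))  # physician A, corrected: 45
print(accuracy_pct(56, total_cases))    # physician B, uncorrected: 36
print(accuracy_pct(56, covered_cases))  # physician B, corrected: 41
```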

Since then the preferred description of this software has become differential diagnosis (DDX) generators (3,4). A paper from 2012 looked at a total of 23 of these programs but eventually included only 4 in its analysis. The programs were tested on ten consecutive diagnosis-focused cases chosen from the 2010 editions of the Case Records of the New England Journal of Medicine (NEJM) and the Medical Knowledge Self Assessment Program (MKSAP), version 14, of the American College of Physicians. A 0-5 scoring system was developed, ranging from 5 = diagnosis suggested on the first screen or within the first 20 suggestions down to 0 = no suggestions close to the target diagnosis, for a total score range of 0-50 across the ten cases. Two of the programs exactly matched the diagnosis 9 and 10 times respectively. These same two programs, DxPlain and Isabel, had identical mean scores of 3.45 and were described as performing well. There was a question of integration with EHRs, but the authors thought these programs would be useful for education and decision support. They mention a program in development that automatically incorporates available EHR data and generates a list of diagnoses even without clinician input.
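A per-case rubric like that might be sketched as follows. This is my paraphrase of the paper's scoring: the cutoff at 20 suggestions and the maximum/minimum anchors come from the summary above, but the intermediate score value is an illustrative placeholder, not the paper's exact rubric:

```python
# Hypothetical sketch of a 0-5 per-case DDX scoring rubric: 5 if the
# target diagnosis appears within the first 20 suggestions, 0 if it is
# never suggested. The intermediate value is an illustrative placeholder.
def score_case(suggestions, target):
    """Score one test case from a ranked list of suggested diagnoses."""
    if target in suggestions[:20]:
        return 5          # match near the top of the list
    if target in suggestions:
        return 1          # present, but buried deep in the list
    return 0              # target diagnosis never suggested

def total_score(cases):
    """Ten cases at 0-5 points each gives the paper's 0-50 range."""
    return sum(score_case(suggestions, target)
               for suggestions, target in cases)

cases = [
    (["pulmonary embolism", "pneumonia"], "pulmonary embolism"),  # 5
    (["other"] * 30 + ["sarcoidosis"], "sarcoidosis"),            # 1
    (["influenza"], "endocarditis"),                              # 0
]
print(total_score(cases))  # 6
```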

The most recent paper (4) was a systematic review and meta-analysis of differential diagnosis (DDX) generators. In the introductory section the authors quote a 15% figure for the rate of diagnostic errors in most areas of medicine. A larger problem is that 30-50% of patients seeking primary care or specialty consultation do not get an explanation for their presenting symptoms. The review looked at the programs' ability to generate correct lists of diagnoses, whether the programs were as good as clinicians, whether they could improve clinicians' lists of differential diagnoses, and the practical aspects of using DDX generators in clinical practice. The inclusion criteria resulted in 36 articles comparing 11 DDX programs (see Table 2).  The original paper contains a forest plot of the results of the DDX generators showing variable (but in some cases high) accuracy, along with a high degree of heterogeneity across studies.  The authors conclude that there is insufficient evidence to recommend DDX generators based on the variable quality and results noted in the study.  But I wonder if that is really true.  Some of the DDX generators did much better than others, and one of them (Isabel) refers to this study in its own advertising literature.

My main point in this post is that these DDX generators have been around for nearly 30 years, yet the majority of very expensive electronic health record (EHR) installations have none.  The ones that do are often in systems where the generators are actively being studied by physicians in that group, or one has been added and its integration with the system is questionable.  The key question is whether all of the clinical features import into the DDX generator automatically, so that the responsible clinician can look at the list without having to decide to generate it.  At least one paper in this literature suggests that this eliminates the bias of deciding whether or not to use diagnostic assistance.  In terms of physician workflow, that would seem to be the ideal situation - unless the software stops the workflow like practically all drug interaction checking software.

The drug interaction software may be a good place to start. Some of these programs are much more intelligent than others. In the one I am currently using, trazodone x any serotonergic medication is a hard stop, and I have to produce a flurry of mouse clicks to move on.  More intelligent programs do not stop the workflow for this interaction or for SSRI x bupropion interactions.  There is also the question of where artificial intelligence (AI) fits in.  There is a steady stream of headlines about how AI can make medical diagnoses better than physicians, and yet there is no AI implementation in EHRs designed to assist physicians.  What would AI have to say about the above drug interactions? Would it still stop my work process and make me check a number of exception boxes? Would it be able to produce an aggregate score of all such prescriptions in an EHR and provide a probability statement for a specific clinical population?  The quality of clinical decisions could only improve with that information.
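To make the tiering concrete, here is a minimal sketch of how a smarter checker might distinguish a hard stop from an advisory alert. The rule table and severity assignments are illustrative assumptions only, not clinical guidance:

```python
# Minimal sketch of tiered drug interaction checking. The rule table and
# severity assignments are illustrative assumptions, not clinical guidance.
HARD_STOP = "hard_stop"   # blocks the order until explicitly overridden
ADVISORY = "advisory"     # shows a warning but lets the workflow continue

# Hypothetical rules keyed by an unordered drug pair.
RULES = {
    frozenset(["trazodone", "fluoxetine"]): ADVISORY,
    frozenset(["warfarin", "fluconazole"]): HARD_STOP,
}

def check_interaction(drug_a, drug_b):
    """Return the alert tier for a drug pair, or None if no rule matches."""
    return RULES.get(frozenset([drug_a.lower(), drug_b.lower()]))

print(check_interaction("Trazodone", "Fluoxetine"))  # advisory
print(check_interaction("Warfarin", "Fluconazole"))  # hard_stop
print(check_interaction("lisinopril", "metformin"))  # None
```

The design point is simply that severity lives in data rather than being hard-coded as a universal stop, so a system could dial a given pair down from a workflow-blocking alert to an advisory one.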

And there is the issue of what psychiatrists would use a DDX generator for.  The current crop has a definite internal medicine bias.  Psychiatrists and neurologists need an entirely different diagnostic landscape mapped out.  The intersection of psychiatric syndromes, toxidromes, and primary neurological disorders needs to be added and expanded upon. I am currently experimenting with the Isabel package and need to figure out the best way to use it.  My test paradigm is a patient recently started on lithium who develops an elevated creatinine but was also started on cephalexin a few days after the lithium.  Entering all of those features seems to produce a random list of diagnoses without addressing the question of whether the increasing creatinine is due to lithium or cephalexin.  It appears that the way the diagnostic features are entered may affect the outcome.

Decision support is supposed to be a feature of the modern electronic health record (EHR). The reality is that the only decision support is a drug interaction feature that varies greatly in quality from system to system. Both drug interaction software and DDX generators are very inexpensive options for clinicians.  EHRs don't seem to get 1990s software right.  And that leads to the question: "Why are EHRs so expensive, and why do they lack appropriate technical support for physicians?"

Probably because they were really not built for physicians.

George Dawson, MD, DFAPA


1: Berner ES, Webster GD, Shugerman AA, Jackson JR, Algina J, Baker AL, Ball EV, Cobbs CG, Dennis VW, Frenkel EP, et al. Performance of four computer-based diagnostic systems. N Engl J Med. 1994 Jun 23;330(25):1792-6. PubMed PMID:8190157.

2: Lemaire JB, Schaefer JP, Martin LA, Faris P, Ainslie MD, Hull RD. Effectiveness of the Quick Medical Reference as a diagnostic tool. CMAJ. 1999 Sep 21;161(6):725-8. PubMed PMID: 10513280.

3: Bond WF, Schwartz LM, Weaver KR, Levick D, Giuliano M, Graber ML. Differential diagnosis generators: an evaluation of currently available computer programs. J Gen Intern Med. 2012 Feb;27(2):213-9. doi: 10.1007/s11606-011-1804-8. Review. PubMed PMID: 21789717.

4: Riches N, Panagioti M, Alam R, Cheraghi-Sohi S, Campbell S, Esmail A, Bower P.The Effectiveness of Electronic Differential Diagnoses (DDX) Generators: A Systematic Review and Meta-Analysis. PLoS One. 2016 Mar 8;11(3):e0148991. doi: 10.1371/journal.pone.0148991. eCollection 2016. Review. PubMed PMID: 26954234.