Tuesday, February 3, 2026

Combinatorics Summary....

 



 

I realized that I have a combinatorics thread running through my blog across several subjects.  I have been interested in combinatorics since I sent an email to Robert Spitzer on the various combinations of diagnostic criteria.  His only comment was “Interesting”.  Since then, I have commented on a post that purported to discredit psychiatric diagnoses based on the number of combinations of diagnostic criteria (too many), a study of the combinations actually observed in major depression diagnoses, and character and word combinations for encryption and password protection.  I went as far as getting dice and using them to construct passphrases of varying length using the Electronic Frontier Foundation (EFF) word list for that purpose.

If you have no experience with combinations or it has been a long time since your college statistics course, dice are a good place to start.  Each die has 6 sides with corresponding numbers. The total number of possible outcomes is 6^n, where n = the number of dice rolled at once.  The EFF word list is 7,776 words long and that happens to be 6^5.  So, to generate passphrases, 5 dice are rolled and the corresponding number is looked up on the word list and the word recorded.  The process is repeated until a phrase of the desired length is generated.  The only downside to this method is that some sites still insist on additional numbers and special characters.  They can still be inserted in the passphrase, but other systems like hexadecimal may be more convenient.  The advantage of passphrases is that they are theoretically easier to memorize and type without error.  That breaks down with very long phrases.
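The dice arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the base-6 index assumes a list sorted by dice key, and no actual words from the EFF list are reproduced.

```python
import math
import secrets

DICE_PER_WORD = 5
WORDLIST_SIZE = 6 ** DICE_PER_WORD  # 7776 possible five-dice outcomes


def roll_key(n_dice=DICE_PER_WORD):
    """Simulate rolling n_dice six-sided dice; returns a key such as '35216'."""
    return "".join(str(secrets.randbelow(6) + 1) for _ in range(n_dice))


def key_to_index(key):
    """Map a dice key to a 0-based position, treating the digits as base 6.

    Assumption: the word list is sorted by dice key, so '11111' is the
    first entry and '66666' is the last.
    """
    index = 0
    for digit in key:
        index = index * 6 + (int(digit) - 1)
    return index


def passphrase_bits(n_words):
    """Entropy in bits of a passphrase of n_words independently rolled words."""
    return n_words * math.log2(WORDLIST_SIZE)


if __name__ == "__main__":
    key = roll_key()
    print(key, key_to_index(key))
    print(round(passphrase_bits(6), 1))  # about 77.5 bits for six words
```

Each word contributes a little under 13 bits, so the strength of the phrase grows linearly with the number of words rolled.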

In biology and medicine, combinatorics can be applied at several levels. Some have more meaning than others.  On this blog, I responded to a paper suggesting that the possible combinations of diagnostic criteria meant that psychiatric diagnoses were meaningless and unscientific.  The lesson from this post is to have an idea of what you are counting and what it means. The total number of combinations of verbal criteria depends a lot on the phrasing, and the total number of criteria, whether large or small, is not necessarily disqualifying, as illustrated in this post.  The combinatorial upper limit can be unrealistically large depending on how it is defined, and just running the numbers does not mean that all possible combinations will be found.  There also seems to be some magical thinking involved: counting something does not by itself say anything about what it means.  It is quite literally an exercise in the map is not the territory.

I looked at a second paper where the authors looked at a lower number of combinations based on the DSM diagnostic criteria for major depression.  In that case the total number of diagnoses was much lower at 227 combinations.  The authors of that second paper did standardized interviews on 3,800 people and of the 1,566 with major depression – just 10 of those combinations accounted for 50% of the cases.  About ¼ of the possible combinations (57/227) did not occur in any group.  This paper is a stark reminder that just counting things in biology or medicine doesn’t necessarily mean anything.
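The figure of 227 can be reproduced by brute-force enumeration. The sketch below assumes the count follows the DSM rule for major depression of at least five of nine symptoms, with at least one of them being depressed mood or loss of interest. That derivation is my assumption; the paper's own counting method is not quoted here.

```python
from itertools import combinations

SYMPTOMS = range(9)   # nine symptom criteria, indexed 0-8
CORE = {0, 1}         # assumed: depressed mood and loss of interest
MIN_SYMPTOMS = 5      # assumed threshold for the diagnosis


def qualifying_profiles():
    """List every symptom set that meets the assumed diagnostic rule."""
    profiles = []
    for k in range(MIN_SYMPTOMS, len(SYMPTOMS) + 1):
        for combo in combinations(SYMPTOMS, k):
            if CORE & set(combo):  # at least one core symptom present
                profiles.append(combo)
    return profiles


profiles = qualifying_profiles()
print(len(profiles))                  # 227
print(round(57 / len(profiles), 2))   # 0.25, the share never observed
```

The enumeration only says what is possible under the rule. The study's finding is that the observed distribution is far narrower than the possible one.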

That brings me to the concept of how we make sense out of the most valid combinatorial explosions in medicine. For me validity is baked into the biology and not a verbal description of things.  The backing for that comes from biological taxonomy and the fact that molecular biology and genomics are solving problems that could not be solved by the verbal description of direct observations in the Linnaean tradition.  To that end I am reproducing a table below that is all about the polygenic risk for bipolar disorder.



Note that in this table the authors are estimating the total possible combinations of 803 polygenes. The theoretical number of possible combinations can be calculated using the formula n! / (r!(n-r)!), where n represents the number of genetic variants analyzed in a study, and r represents the number of genetic variants per combination. In the case of SNP genotypes, each SNP can take one of 3 genotypes, so the formula is n! / (r!(n-r)!) × 3^r.  The authors point out that the lowest value for r is 2 but the upper limit is unknown.  They also show how the number of combinations can be limited experimentally.  The 57,911,211 combinations found only in patients and not in controls could all be random, but a significant number of SNPs were associated with different groupings in bipolar disorder.

Using the equations from above in a more readable graphic form:
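Written out in LaTeX, with n and r as defined above, the two formulas are as follows. The labels C and C_SNP are my own shorthand and do not come from the paper.

```latex
\[
C(n, r) = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}
\]

\[
C_{\mathrm{SNP}}(n, r) = \frac{n!}{r!\,(n-r)!} \times 3^{r}
\]
```

The top equation counts unordered combinations of r variants out of n. The bottom equation multiplies that count by 3^r because each of the r SNPs in a combination can carry one of 3 genotypes.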

 

 

Substitution yields the following:

- from the top equation, for 100 variants the theoretical 10-variant combinations would be 1.73 × 10^13

- from the bottom equation, for 500,000 SNPs analyzed there would be 2.3 × 10^12 two-variant combinations and 3.4 × 10^18 three-variant combinations.
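The substitutions can be checked with Python's exact integer arithmetic. The function names below are hypothetical. Evaluated as written, the SNP formula gives about 1.1 × 10^12 and 5.6 × 10^17 for 500,000 SNPs. The larger quoted figures are close to what results when the r! divisor is dropped, that is, when ordered selections are counted. Which convention the cited paper used is not checked here.

```python
from math import comb, perm


def variant_combinations(n, r):
    """Unordered combinations of r variants out of n: n! / (r!(n-r)!)."""
    return comb(n, r)


def snp_genotype_combinations(n, r, genotypes_per_snp=3):
    """Unordered SNP combinations times the genotype choices, 3^r by default."""
    return comb(n, r) * genotypes_per_snp ** r


def ordered_snp_genotype_selections(n, r, genotypes_per_snp=3):
    """Same count without the r! divisor (ordered selections), for comparison."""
    return perm(n, r) * genotypes_per_snp ** r


print(f"{variant_combinations(100, 10):.2e}")                   # 1.73e+13
print(f"{snp_genotype_combinations(500_000, 2):.2e}")           # 1.12e+12
print(f"{snp_genotype_combinations(500_000, 3):.2e}")           # 5.62e+17
print(f"{ordered_snp_genotype_selections(500_000, 2):.2e}")     # 2.25e+12
print(f"{ordered_snp_genotype_selections(500_000, 3):.2e}")     # 3.37e+18
```

Either way, the point stands: the count grows by five to six orders of magnitude for each added SNP, which is why exhaustive scanning becomes impractical quickly.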

Practical measures include scanning SNPs for combinations of varying length in the population of interest relative to controls. Combinations found at shorter lengths can be taken out before scanning for longer combinations. A further simplification is to scan only for combinations found in patient populations.  An example of that study is included in the tables below for 803 SNPs in 607 bipolar disorder patients and 1,354 controls.
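The idea of keeping only combinations seen in patients and never in controls can be sketched on toy data. Everything here is illustrative: the genotype coding (0, 1, 2), the tiny made-up cohorts, and the function names are my assumptions, not the method or data of the cited studies.

```python
from itertools import combinations


def genotype_combinations(person, r):
    """All r-sized sets of (snp_index, genotype) pairs carried by one person."""
    return set(combinations(enumerate(person), r))


def patient_only_combinations(patients, controls, r):
    """Map each combination found in patients but in no control to its carriers."""
    seen_in_controls = set()
    for person in controls:
        seen_in_controls |= genotype_combinations(person, r)

    exclusive = {}
    for pid, person in enumerate(patients):
        for combo in genotype_combinations(person, r) - seen_in_controls:
            exclusive.setdefault(combo, set()).add(pid)
    return exclusive


# Made-up genotypes: one tuple per person, one entry per SNP.
patients = [(0, 1, 2, 0), (0, 1, 2, 1), (2, 2, 0, 0)]
controls = [(0, 1, 0, 0), (0, 0, 2, 1), (2, 2, 0, 1)]

exclusive = patient_only_combinations(patients, controls, r=2)
for combo, carriers in sorted(exclusive.items()):
    print(combo, sorted(carriers))
```

In this toy example, 5 two-SNP genotype combinations occur only in patients, and one of them is shared by two patients. Whether such combinations are more than chance findings is a separate statistical question that this sketch does not address.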

Cluster and subgroup analysis is required in very heterogeneous conditions to analyze clusters containing a specific SNP, the distribution of SNP genotypes relative to controls, and cluster selection that contains an SNP for a specific biological function.  Using this kind of analysis 73/609 bipolar disorder patients had these clusters compared to none in the control population. 

While the SNP and variant analysis in 2017 is a good example of combinatoric applications, it did not address the problem of missing heritability.  Missing heritability is the difference between what is observed in familial heritability studies and what is predicted with genetic analysis.  SNP-based analysis predicted only a low percentage of familial inheritance.  That improved with more sensitive analytical techniques that considered additional genetic mechanisms.  The additional mechanisms included SNVs (single nucleotide variations), insertions or deletions (indels), SVs (structural variations), CNVs (copy number variations), and STRs (short tandem repeats) (3-5).  Applications that identify all these variations are much more likely to predict the heritability of the pedigree than earlier techniques.  I hope to revisit some of these genetic innovations in an upcoming post about the DSM-6 proposals.

 

George Dawson, MD, DFAPA 

 

References:

1:  Mellerup E, Møller GL. Combinations of Genetic Variants Occurring Exclusively in Patients. Comput Struct Biotechnol J. 2017 Mar 10;15:286-289. doi: 10.1016/j.csbj.2017.03.001. PMID: 28377798; PMCID: PMC5367802.

2:  Koefoed P, Andreassen OA, Bennike B, Dam H, Djurovic S, Hansen T, Jorgensen MB, Kessing LV, Melle I, Møller GL, Mors O, Werge T, Mellerup E. Combinations of SNPs related to signal transduction in bipolar disorder. PLoS One. 2011;6(8):e23812. doi: 10.1371/journal.pone.0023812. Epub 2011 Aug 29. PMID: 21897858; PMCID: PMC3163586.

3:  Behera S, Catreux S, Rossi M, Truong S, Huang Z, Ruehle M, Visvanath A, Parnaby G, Roddey C, Onuchic V, Finocchio A, Cameron DL, English A, Mehtalia S, Han J, Mehio R, Sedlazeck FJ. Comprehensive genome analysis and variant detection at scale using DRAGEN. Nat Biotechnol. 2025 Jul;43(7):1177-1191. doi: 10.1038/s41587-024-02382-1. Epub 2024 Oct 25. PMID: 39455800; PMCID: PMC12022141.

4:  Wainschtein P, Zhang Y, Schwartzentruber J, Kassam I, Sidorenko J, Fiziev PP, Wang H, McRae J, Border R, Zaitlen N, Sankararaman S, Goddard ME, Zeng J, Visscher PM, Farh KK, Yengo L. Estimation and mapping of the missing heritability of human phenotypes. Nature. 2026 Jan;649(8099):1219-1227. doi: 10.1038/s41586-025-09720-6. Epub 2025 Nov 12. PMID: 41225014; PMCID: PMC12851931.

5:  Grotzinger AD, Werme J, Peyrot WJ, Frei O, de Leeuw C, Bicks LK, Guo Q, Margolis MP, Coombes BJ, Batzler A, Pazdernik V, Biernacka JM, Andreassen OA, Anttila V, Børglum AD, Breen G, Cai N, Demontis D, Edenberg HJ, Faraone SV, Franke B, Gandal MJ, Gelernter J, Hatoum AS, Hettema JM, Johnson EC, Jonas KG, Knowles JA, Koenen KC, Maihofer AX, Mallard TT, Mattheisen M, Mitchell KS, Neale BM, Nievergelt CM, Nurnberger JI, O'Connell KS, Peterson RE, Robinson EB, Sanchez-Roige SS, Santangelo SL, Scharf JM, Stefansson H, Stefansson K, Stein MB, Strom NI, Thornton LM, Tucker-Drob EM, Verhulst B, Waldman ID, Walters GB, Wray NR, Yu D; Anxiety Disorders Working Group of the Psychiatric Genomics Consortium; Attention-Deficit/Hyperactivity Disorder (ADHD) Working Group of the Psychiatric Genomics Consortium; Autism Spectrum Disorders Working Group of the Psychiatric Genomics Consortium; Bipolar Disorder Working Group of the Psychiatric Genomics Consortium; Eating Disorders Working Group of the Psychiatric Genomics Consortium; Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium; Nicotine Dependence GenOmics (iNDiGO) Consortium; Obsessive-Compulsive Disorder and Tourette Syndrome Working Group of the Psychiatric Genomics Consortium; Post-Traumatic Stress Disorder Working Group of the Psychiatric Genomics Consortium; Schizophrenia Working Group of the Psychiatric Genomics Consortium; Substance Use Disorders Working Group of the Psychiatric Genomics Consortium; Lee PH, Kendler KS, Smoller JW. Mapping the genetic landscape across 14 psychiatric disorders. Nature. 2026 Jan;649(8096):406-415. doi: 10.1038/s41586-025-09820-3. Epub 2025 Dec 10. PMID: 41372416; PMCID: PMC12779569.

 

Graphics Credit:

Table 1 is reused from open access reference 1 above per  CC BY license (http://creativecommons.org/licenses/by/4.0/).