There is no one-size-fits-all brain model.
Machine learning has helped scientists understand how the brain generates complex human characteristics, revealing patterns of brain activity associated with cognitive functions such as working memory, traits such as impulsivity, and conditions such as depression. Scientists can use these methods to build models of these relationships, which can then be used to make predictions about people’s behavior and health.
However, this approach only works if the models represent everyone, and past research has shown they do not. For every model, there are certain individuals who just do not fit.
In a study recently published in the journal Nature, researchers from Yale University examined whom these models tend to fail, why that happens, and what can be done to fix it.
According to Abigail Greene, an M.D.-Ph.D. student at Yale School of Medicine and lead author of the study, models must apply to any given individual in order to be most helpful.
“If we want to move this kind of work into a clinical application, for example, we need to make sure the model applies to the patient sitting in front of us,” she said.
Greene and her colleagues are considering two approaches that they believe might help models deliver more accurate psychiatric categorization. The first is to classify patient populations more precisely. For instance, a diagnosis of schizophrenia covers a wide range of symptoms and can look very different from person to person. With a better understanding of the neural underpinnings of schizophrenia, including its symptoms and subtypes, researchers may be able to group individuals in more precise ways.
Second, some traits, such as impulsivity, are shared across a variety of conditions. Understanding the neural basis of impulsivity may help physicians tackle that symptom more effectively, regardless of the diagnosis.
“And both advances would have implications for treatment responses,” said Greene. “The better we can understand these subgroups of individuals who may or may not carry the same diagnoses, the better we can tailor treatments to them.”
But first, models need to be generalizable to everybody, she said.
To understand model failure, Greene and her colleagues first trained models that could use patterns of brain activity to predict how well a person would score on a variety of cognitive tests. When tested, the models correctly predicted how well most individuals would score. But for some people, they were incorrect, wrongly predicting people would score poorly when they actually scored well, and vice versa.
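The idea of a model "wrongly predicting" a low score for a high scorer can be sketched in a few lines. This is only an illustration, not the study's pipeline: the participant IDs, the scores, and the median split used to define "high" and "low" are all assumptions made for the example.

```python
from statistics import median

def flag_misclassified(observed, predicted):
    """Return IDs whose predicted high/low class disagrees with the observed one.

    observed, predicted: dicts mapping a participant ID to a score.
    A median split defines "high" vs. "low"; that threshold is an
    assumption for this sketch, not a detail taken from the study.
    """
    obs_cut = median(observed.values())
    pred_cut = median(predicted.values())
    return sorted(
        pid for pid in observed
        if (observed[pid] > obs_cut) != (predicted[pid] > pred_cut)
    )

# Hypothetical scores for four participants.
observed = {"p1": 12, "p2": 30, "p3": 25, "p4": 8}
predicted = {"p1": 14, "p2": 27, "p3": 10, "p4": 22}

print(flag_misclassified(observed, predicted))  # ['p3', 'p4']
```

Here "p3" scored well but was predicted to score poorly, and "p4" is the reverse case.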
The research team then looked at who the models failed to categorize correctly.
“We found that there was consistency — the same individuals were getting misclassified across tasks and across analyses,” said Greene. “And the people misclassified in one dataset had something in common with those misclassified in another dataset. So there really was something meaningful about being misclassified.”
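One simple way to picture this kind of consistency is to count how often each person is misclassified across tasks. The sketch below is an assumption-laden toy: the task names, participant IDs, and the frequency measure itself are hypothetical and are not taken from the study.

```python
from collections import Counter

def misclassification_frequency(misclassified_by_task):
    """Fraction of tasks in which each participant was misclassified.

    misclassified_by_task: dict mapping a task name to the list of
    participant IDs the model got wrong on that task.
    """
    n_tasks = len(misclassified_by_task)
    counts = Counter(
        pid for ids in misclassified_by_task.values() for pid in set(ids)
    )
    return {pid: count / n_tasks for pid, count in counts.items()}

# Hypothetical results for three tasks.
by_task = {
    "memory": ["p3", "p4"],
    "vocabulary": ["p3"],
    "reasoning": ["p3", "p7"],
}

freq = misclassification_frequency(by_task)
# Participants wrong on every task are the "consistently misclassified" ones.
consistent = sorted(pid for pid, f in freq.items() if f == 1.0)
print(consistent)  # ['p3']
```

If errors were random, few people would be misclassified on every task; a participant like "p3" who is wrong everywhere suggests something systematic.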
Next, they looked to see if these similar misclassifications could be explained by differences in those individuals’ brains. But there were no consistent differences. Instead, they found misclassifications were related to sociodemographic factors like age and education and clinical factors like symptom severity.
Ultimately, they concluded that the models weren’t reflecting cognitive ability alone. They were instead reflecting more complex “profiles” — sort of mashups of the cognitive abilities and various sociodemographic and clinical factors, explained Greene.
“And the models failed anyone who didn’t fit that stereotypical profile,” she said.
As one example, models used in the study associated more education with higher scores on cognitive tests. Any individuals with less education who scored well didn’t fit the model’s profile and were therefore often erroneously predicted to be low scorers.
Adding to the complexity of the problem, the model did not have access to sociodemographic information.
“The sociodemographic variables are embedded in the cognitive test score,” explained Greene. Essentially, biases in how cognitive tests are designed, administered, scored, and interpreted can seep into the results that are obtained. And bias is an issue in other fields as well; research has uncovered how input data bias affects models used in criminal justice and health care, for instance.
“So the test scores themselves are composites of the cognitive ability and these other factors, and the model is predicting the composite,” said Greene. That means researchers need to think more carefully about what is really being measured by a given test and, therefore, what a model is predicting.
The study authors provide several recommendations for how to mitigate the problem. During the study design phase, they suggest, scientists should employ strategies that minimize bias and maximize the validity of the measurements they’re using. And after collecting data, researchers should, whenever possible, use statistical approaches that correct for the stereotypical profiles that remain.
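The article does not say which statistical approaches the authors have in mind. One common option, shown here purely as an assumed illustration, is to regress a covariate such as years of education out of the test scores and model the residuals instead. The function name and the numbers below are hypothetical.

```python
from statistics import mean

def residualize(scores, covariate):
    """Remove the linear effect of one covariate from scores.

    Fits score = intercept + slope * covariate by ordinary least squares
    and returns what is left over (the residuals).
    """
    x_bar, y_bar = mean(covariate), mean(scores)
    sxx = sum((x - x_bar) ** 2 for x in covariate)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(covariate, scores))
    slope = sxy / sxx
    return [y - (y_bar + slope * (x - x_bar)) for x, y in zip(covariate, scores)]

# Hypothetical data: years of education and cognitive test scores.
education = [10, 12, 14, 16]
scores = [20, 24, 31, 33]

adjusted = residualize(scores, education)
print([round(r, 2) for r in adjusted])
```

The adjusted scores no longer rise with education, so a model trained on them has less opportunity to lean on that association. In a real predictive pipeline the regression would be fit on training data only and then applied to held-out participants, to avoid leaking information across the split.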
Taking these measures will lead to models that better reflect the cognitive construct under study, the researchers say. But they note that fully eliminating bias is unlikely, so it should be acknowledged when interpreting the model output. Additionally, for some measures, it may turn out that more than one model is necessary.
“There’s going to be a point where you just need different models for different groups of people,” said Todd Constable, professor of radiology and biomedical imaging at Yale School of Medicine and senior author of the study. “One model is not going to fit everybody.”
Reference: “Brain–phenotype models fail for individuals who defy sample stereotypes” by Abigail S. Greene, Xilin Shen, Stephanie Noble, Corey Horien, C. Alice Hahn, Jagriti Arora, Fuyuze Tokoglu, Marisa N. Spann, Carmen I. Carrión, Daniel S. Barron, Gerard Sanacora, Vinod H. Srihari, Scott W. Woods, Dustin Scheinost, and R. Todd Constable, 24 August 2022, Nature.