A data-driven approach to match organisms and research problems

What if we could select research organisms that are far more relevant to human biology or more likely to unearth biological solutions not found in humans? With more sequence data, structural prediction, and phylogenetic comparative methods, a richer framework is possible. 
Published on Dec 14, 2024

Purpose

Selecting the right research organism matters: a good model lets us study a human disease or biological process faster, more cheaply, and more easily than is possible in humans. Scientists often select organisms based on historical precedent, ease of use in the lab, and similarity of genes or phenotypes. While this approach has resulted in many important advancements and certainly has its merits, relying on intuition, convention, and prior studies to select model organisms isn’t always optimal for understanding the complexities of human biology, particularly in the context of therapeutic development. Findings from discovery research and preclinical testing in animal models often fail to translate to the clinic [1], and these studies often don’t take the evolutionary history of mice and humans into account [2].

GIF of search process for organisms to study where traditional model organisms are illuminated "under the lamppost" but many other, potentially more useful organisms are just out of view, illuminated only through more deliberate search.

We tend to rely on a single set of model organisms, looking “under the lamppost” at the biology we know. What if we shone a light across the whole tree of life? Could we find better models?

In this pub, we describe a new framework for thinking about organismal model selection that leverages the vastness of biology, including and beyond traditional model systems. This approach has the potential to accelerate the pace of biological discovery by highlighting valuable organisms that have been historically overlooked and understudied but have outsized biological relevance to humans.

This pub is meant for a scientific audience and we’d love feedback. Would our organismal selection framework change how you’d select which organism you’d use to solve your research problem of interest? Would you use these tools to identify new research directions based on where your organismal expertise is best leveraged? 

Share your thoughts!

Watch a video tutorial on making a PubPub account and commenting. Feel free to add line-by-line comments anywhere within this text, provide overall feedback by commenting in the box at the bottom of the page, or post about this work on social media. Please make all feedback public so other readers can benefit from the discussion. 

Traditional organism selection

Traditional model organism selection can be suboptimal when the goal is to find biological conservation, the context most relevant to humans. Traditionally, researchers assess conservation by comparing gene or protein sequences between the organism and humans and by considering whether the two share relevant phenotypes. Historically, identifying the right system with conserved biology has required deep knowledge of individual organisms and the contributions of an entire field to unearth the dimensions of shared biological context.

Illustration of cilia side-by-side with Chlamydomonas cells with the text, “Modeling human cilia with Chlamydomonas flagella”

Leveraging intuition about commonalities to unearth shared principles

Imagine we wanted to study the movement of mucus in our airways in respiratory diseases, the movement of cerebrospinal fluid in the brain in developmental disorders, or the movement of eggs to the uterus in female infertility. Cilia, the finger-like protrusions on cells lining the trachea, brain ventricles, and fallopian tubes that are responsible for this movement, are nearly identical to the flagellar structures that single-celled green algae use to swim. Both the individual proteins and the coordinated processes needed to generate force from these protrusions are conserved in algae, so algae provide a low-cost, simple, and less invasive way to study these mechanisms and help address a range of complex diseases.

Rather than relying on intuition or luck, we wondered if it was possible to more systematically identify properties of organisms across the tree of life that might be redeployed or re-engineered to develop human therapeutics and other useful innovations. Not only might we be able to accelerate the work many organismal biologists have contributed to mechanistic understanding, but we may also be able to improve the accuracy of organismal selection for downstream application. 

For example, around 90% of drugs that progress from preclinical testing in organismal models (95% of which is done in rodents) to clinical trials in humans fail. This failure rate suggests that many researchers are relying on convention or historical precedent and not fully leveraging available data to choose the best organism for their research questions. We asked whether we could use a more rigorous, data-driven framework for discovery research to increase the accuracy of insights with respect to human relevance.

Rationally sourcing biological conservation

Beyond protein sequences and prior mechanistic studies, we’ve never been in a better position to leverage even more data. We can use protein structural properties inferred from amino acid sequence and take evolutionary history into account to compare species [3]. Sometimes we find that our intuition about model systems was spot-on, but with these data we can be much more confident in our choices and reach conclusions more quickly.
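To make the idea of comparing species by sequence-derived protein properties concrete, here is a minimal Python sketch. It is not the pipeline described in [3]: it uses only three simple composition-based properties (mean Kyte-Doolittle hydropathy, net charge fraction, aromatic fraction), it ignores evolutionary history entirely, and the sequences and species names are hypothetical placeholders rather than real orthologs.

```python
from math import sqrt

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_vector(seq: str) -> tuple[float, float, float]:
    """Summarize a protein as (mean hydropathy, net charge fraction, aromatic fraction).

    Only the 20 standard residues are supported; anything else raises KeyError.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in seq) / n
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    aromatic = sum(seq.count(aa) for aa in "FWY") / n
    return (hydropathy, (positive - negative) / n, aromatic)


def distance(u, v) -> float:
    """Euclidean distance between two property vectors (unscaled, for simplicity)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_by_similarity(reference: str, candidates: dict[str, str]):
    """Return (species, distance) pairs sorted from most to least similar."""
    ref = property_vector(reference)
    scored = [(name, distance(ref, property_vector(seq)))
              for name, seq in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical toy peptides, NOT real protein sequences.
    human = "MKTAYIAKQR"
    candidates = {
        "species_a": "MKTAYIAKQR",
        "species_b": "MKTAYLAKQR",
        "species_c": "GGSPGGSPGG",
    }
    for name, dist in rank_by_similarity(human, candidates):
        print(f"{name}\t{dist:.3f}")
```

Because these properties depend only on composition, two very different proteins can score as identical; a realistic analysis would standardize each property, use many more of them, and correct for shared ancestry with phylogenetic comparative methods.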

Illustration of sperm side-by-side with a Chlamydomonas cell and mouse showing that the algae is a closer match to humans than the mouse with the text, “Modeling human sperm motility with Chlamydomonas”

Leveraging data to speed up model selection

Spermatogenic failure is a severe form of male infertility with certain subtypes attributable to mutations in the SPEF2 and DNALI1 genes.

Using the data from our organism selection pipeline, we landed on the green alga Chlamydomonas reinhardtii as an appropriate model for spermatogenic failure. We identified motility defects in algal cells with mutations in the corresponding genes and were able to rescue motility [4] with compounds previously found to increase sperm motility [5].

Scientists have long used Chlamydomonas to understand sperm motility because of structural similarities between Chlamydomonas and sperm flagella [6][7][8] revealed by high-resolution electron microscopy. Even so, we were able to use our framework to identify an appropriate model and validate its relevance to human biology quickly, cheaply, and with high confidence using little additional context. In this case, the pipeline led us to an existing model, but we got there through an unbiased selection process.

The power of this data-driven approach is more readily appreciated when our analyses lead to unintuitive results, identifying organisms with non-obvious similarities to human biology.

Illustration of a neuron side-by-side with single-celled organisms with RNA in the background on both sides with the text, “Modeling human neuronal mRNA processing in unicellular organisms”

Leveraging data to find unexpected models with human relevance

Imagine we want to develop a treatment for spinal muscular atrophy (SMA), a neuromuscular disease caused by mutations in SMN1, a gene encoding a protein involved in RNA processing that’s critical for motor neuron function and survival. Let’s say we’re trying to decide which research organism to use upstream of pharmacology and toxicity assessments to unearth relevant biological assays and mechanisms of action for therapeutic assets.

If we use standard model organism selection, we’d likely start by considering a mouse model or another well-established organism. In other words, when studying a neuromuscular disease, we might assume that the right organism to study it in has neurons and muscles. However, our analysis, based on multiple physical and chemical protein properties beyond primary sequence, suggests that the unicellular organisms Sphaeroforma arctica and Chlorella vulgaris have a more conserved biological context than other species and are well-suited to tackle SMA.

Well-established models like mice aren’t just expensive to maintain; they also don’t necessarily recapitulate the specifics of the human disease. The conservation of relevant properties in a much simpler system may signal that the etiology of the disease lies in a more ancient and conserved biological process that makes muscles and nerves particularly vulnerable, and that more complex tissue-level phenotypes may be a consequence rather than a cause of the disease.

Our strategy lets us rationally and agnostically consider less-studied organisms that may be more biologically relevant to the disease or trait in question.

A call for change

We’ve developed an approach that allows scientists to rationally identify research organisms for modeling human traits by incorporating genomic data, protein structure, and other biological contexts [3]. We also know that not all researchers can spin up new infrastructure for every new research organism they land on. The other major utility of our framework is therefore the reverse search: for a fly, fish, or worm lab, we can help agnostically identify the focus areas where these species are most relevant and can make the most headway. We hope this data-driven approach will increase our ability to leverage the full diversity of the natural world for scientific discovery.
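The organism-first use case, starting from a lab’s species and asking where it is most relevant, can be sketched as a simple lookup. The Python snippet below is a minimal illustration under stated assumptions, not the framework’s actual interface: it assumes a precomputed table of distance-to-human scores per biological process and species (lower means more similar), and the function name `best_fit_processes`, the process labels, and all numbers are made-up placeholders.

```python
def best_fit_processes(scores, species, top_n=3):
    """List the processes where `species` ranks closest to human.

    `scores` maps process -> {species: distance_to_human}; lower is closer.
    Returns up to `top_n` tuples of (process, rank_among_species, distance),
    ordered by rank and then by distance.
    """
    results = []
    for process, by_species in scores.items():
        if species not in by_species:
            continue  # no data for this organism in this process
        ranked = sorted(by_species, key=by_species.get)
        rank = ranked.index(species) + 1  # 1 = closest to human
        results.append((process, rank, by_species[species]))
    results.sort(key=lambda r: (r[1], r[2]))
    return results[:top_n]


if __name__ == "__main__":
    # Hypothetical placeholder scores, for illustration only.
    scores = {
        "process_A": {"fly": 0.2, "fish": 0.5, "worm": 0.9},
        "process_B": {"fly": 0.8, "fish": 0.1, "worm": 0.4},
        "process_C": {"fish": 0.3, "worm": 0.2},
    }
    for process, rank, dist in best_fit_processes(scores, "fly"):
        print(process, rank, dist)
```

Sorting by rank first favors processes where the organism beats the alternatives, even if its absolute distance is modest; sorting by distance alone would be an equally reasonable choice.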

Weigh in!

Would you use our workflow to identify an appropriate research organism, to find the biological area that the model you have expertise in can best tackle, or to support your choices when seeking funds, in publications, or for drug development? This platform relies on access to high-quality, annotated genomes across a wide range of organisms. Which species that you already have expertise or tools for would you like to see integrated into our platform?



Contributors
(A–Z)
Conceptualization, Supervision, Writing
Visualization
Conceptualization
Comments
3
Beverly Setzer:

This paper presents a really exciting framework for organismal model selection that could address some major gaps in how we study human biology. As someone with a background in neuroscience and experience using computational tools to connect data to real-world questions, I especially appreciated the integration of evolutionary history and protein structure to identify new models. That said, I’d love to see more on how this approach could tackle the complexity of brain disorders, where conserved mechanisms aren’t always obvious. It would also be helpful to provide tools or resources that make it easier for researchers to adopt these non-traditional models—maybe something like open-access pipelines tailored to specific fields. Overall, this is a really promising approach that could expand how we think about connecting basic science to human health!

Mark Wogulis:

Very interesting idea, and independently identifying Chlamydomonas reinhardtii as a model for sperm motility is very encouraging.

Of course, researchers can use multiple models. I once worked on in vitro models of neurodegeneration. We would use cell lines for high throughput screening and primary cultures (different species) for validation, but this ranking was purely theoretical. It would have been great to have a model like this to help in decision making.

Many drugs that fail in trials fail before efficacy testing even begins. One failure point is drug metabolism. Perhaps this model could identify organisms that would better predict human metabolism of new drugs.

It would be great if some pharma company would go back to their failed and successful trials to see if this approach would have helped better select candidate drugs.

Prachee Avasthi:

That would be awesome! Also I hope it expands the range of what’s possible in early stage drug discovery for novel target selection, narrowing mechanism of action, assay development etc. so that the range of effective therapeutics can increase from improved powerful model selection. There is some reluctance from deviating too much from the models that are traditionally accepted so it would be fantastic if many in the scientific community are able to utilize this framework for their own benefit so the tolerance for a broader range of organisms increases and there can be an increased success rate for trials/a larger impact for patients.

Prachee Avasthi:

That’s exciting and a great example of counterintuitive molecular conservation given the complexity of the retina — what protein?

Anahita Daruwalla:

I identify with this thought process to scan the entire tree of life for model organisms to understand human biology. For example, we were looking at Archaea to understand questions related to a protein involved in human vision. We were pleasantly surprised to see that the archaeal protein was phylogenetically related to the human protein, and the two had conserved structural features. In fact, given that Archaea do share some common ancestry to eukaryotes, it may be a good research organism to look at.