A data-driven approach to match organisms and research problems

Prachee Avasthi; Ryan York; Audrey Bell; Megan L. Hochstrasser

doi:10.57844/arcadia-48b0-607a

Idea Feedback requested Genetics: Decoding evolutionary drivers across biology

Published on Dec 14, 2024 by Arcadia Science

What if we could select research organisms that are far more relevant to human biology or more likely to unearth biological solutions not found in humans? With more sequence data, structural prediction, and phylogenetic comparative methods, a richer framework is possible.

Purpose

Selecting the right organismal model is critical for studying a human disease or biological process faster, more cheaply, and more easily than is possible in humans. Scientists often select organisms based on historical precedent, ease of use in the lab, and similarity of genes or phenotypes. While this approach has resulted in many important advancements and certainly has its merits, relying on intuition, convention, and prior studies to select model organisms isn’t always optimal for understanding the complexities of human biology, particularly in the context of therapeutic development. Discovery research and preclinical testing in animal models often fail to translate to the clinic [1] and don't take the evolutionary history of mice and humans into account [2].

[GIF: A search for organisms to study. Traditional model organisms are illuminated "under the lamppost," while many other, potentially more useful organisms sit just out of view and are revealed only through a more deliberate search.]
We tend to rely on a single set of model organisms, looking “under the lamppost” at the biology we know. What if we shone a light across the whole tree of life? Could we find better models?

In this pub, we describe a new framework for thinking about organismal model selection that leverages the vastness of biology, including and beyond traditional model systems. This approach has the potential to accelerate the pace of biological discovery by highlighting valuable organisms that have been historically overlooked and understudied but have outsized biological relevance to humans.

This pub is meant for a scientific audience and we’d love feedback. Would our organismal selection framework change how you’d select which organism you’d use to solve your research problem of interest? Would you use these tools to identify new research directions based on where your organismal expertise is best leveraged?

This pub is part of the platform effort, “Genetics: Decoding evolutionary drivers across biology.” Visit the platform narrative for more background and context.
Read our companion pub, “Leveraging evolution to identify novel organismal models of human biology” [3], for more details on the science underlying our organismal selection pipeline.
For an example of this approach in action, check out “Rescuing Chlamydomonas motility in mutants modeling spermatogenic failure” [4].

Share your thoughts!

Feel free to provide feedback by commenting in the box at the bottom of this page or by posting about this work on social media. Please make all feedback public so other readers can benefit from the discussion.

Traditional organism selection

Traditional model organism selection is suboptimal, for many reasons, when pursuing biological conservation, the context most relevant to humans. Selection is typically done by comparing gene or protein sequences between the organism and humans and considering whether the two share relevant phenotypes. Historically, identifying the right system with conserved biology has required deep knowledge of individual organisms and the contribution of an entire field to unearth the dimensions of shared biological context.

Leveraging intuition about commonalities to unearth shared principles
Imagine we wanted to study the movement of mucus in our airway in respiratory diseases, movement of cerebrospinal fluid in the brain in developmental disorders, or movement of eggs to the uterus in female infertility. Cilia, the finger-like protrusions on cells lining the trachea, brain ventricles, and fallopian tubes that are responsible for this movement, are nearly identical to the flagellar structures that single-celled green algae use to swim. Both the individual proteins and the coordinated processes needed to generate force from these protrusions are conserved in algae, which provide a low-cost, simple, and less invasive way to study these mechanisms and improve our understanding of a range of complex diseases.

Rather than relying on intuition or luck, we wondered if it was possible to more systematically identify properties of organisms across the tree of life that might be redeployed or re-engineered to develop human therapeutics and other useful innovations. Not only might we be able to accelerate the work many organismal biologists have contributed to mechanistic understanding, but we may also be able to improve the accuracy of organismal selection for downstream application.

For example, around 90% of drugs that progress from preclinical testing in organismal models (95% of which is done in rodents) to clinical trials in humans fail. This failure rate suggests that many researchers are relying on convention or historical precedent and not fully leveraging available data to optimize the organism they select for their research questions. We asked whether we could use a more rigorous data-driven framework for discovery research to increase the accuracy of insights with respect to human relevance.

Rationally sourcing biological conservation

Beyond protein sequences and prior mechanistic studies, we’ve never been in a better position to leverage even more data. We can use protein structural properties inferred from amino acid sequence and take evolutionary history into account to compare species [3]. Sometimes we find that our intuition about model systems was spot-on, but we can be much more confident in our choices and reach conclusions more quickly.
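
To make the idea of comparing proteins by properties inferred from sequence more concrete, here is a minimal Python sketch. It is not the pipeline described in [3]. The three features (hydrophobic fraction, aromatic fraction, net charge per residue), the residue groupings, the function names, and the toy sequences are all illustrative assumptions, and the sketch leaves out the evolutionary-history correction that a real cross-species comparison would need.

```python
import math

# Assumed residue groupings, chosen only for illustration.
HYDROPHOBIC = set("AVILMFWC")
AROMATIC = set("FWY")


def features(seq):
    """Return (hydrophobic fraction, aromatic fraction, net charge per residue)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    hydrophobic = sum(r in HYDROPHOBIC for r in seq) / n
    aromatic = sum(r in AROMATIC for r in seq) / n
    # Crude charge estimate: count K/R as +1 and D/E as -1.
    net_charge = (sum(r in "KR" for r in seq) - sum(r in "DE" for r in seq)) / n
    return (hydrophobic, aromatic, net_charge)


def distance(seq_a, seq_b):
    """Euclidean distance between two sequences in the toy feature space."""
    return math.dist(features(seq_a), features(seq_b))


def rank_candidates(human_seq, candidates):
    """Sort candidate names from most to least similar to the human sequence."""
    return sorted(candidates, key=lambda name: distance(human_seq, candidates[name]))


if __name__ == "__main__":
    # Made-up placeholder sequences, not real proteins.
    human = "MKKLAVDEFW"
    candidates = {"species_a": "MKKLAVDEFW", "species_b": "GGGGSSGGSS"}
    for name in rank_candidates(human, candidates):
        print(name, round(distance(human, candidates[name]), 3))
```

A smaller distance here only means two sequences look alike under these three crude features. In practice one would want many more properties, real orthologous sequences, and a way to account for how closely related the species are before drawing any conclusion.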

Leveraging data to speed up model selection
Spermatogenic failure is a severe form of male infertility with certain subtypes attributable to mutations in the SPEF2 and DNALI1 genes.
Using the data from our organism selection pipeline, we landed on the green alga Chlamydomonas reinhardtii as an appropriate model for spermatogenic failure. We identified motility defects in algal cells with mutations in the appropriate genes and rescued motility [4] with compounds previously found to increase sperm motility [5].

While scientists have long used Chlamydomonas to understand sperm motility because of structural similarities between Chlamydomonas and sperm flagella [6][7][8] revealed by high-resolution electron microscopy, we were able to use our framework to identify an appropriate model and validate its relevance to human biology quickly, cheaply, and with high confidence using little additional context. In this case, the pipeline led us to an existing model, but we got there through an unbiased selection process.

The power of this data-driven approach is more readily appreciated when our analyses lead to unintuitive results, identifying organisms with non-obvious similarities to human biology.

Leveraging data to find unexpected models with human relevance
Imagine we want to develop a treatment for spinal muscular atrophy (SMA), a neuromuscular disease caused by mutations in SMN1, a gene encoding a protein involved in RNA processing that’s critical for motor neuron function and survival. Let’s say we’re trying to decide which research organism to use upstream of pharmacology and toxicity assessments to unearth relevant biological assays and mechanisms of action for therapeutic assets.
If we use standard model organism selection, we’d likely start by considering a mouse model or another well-established organism. In other words, when studying a neuromuscular disease, you might assume that the right organism to study it in has neurons and muscles. However, our analysis based on multiple physical and chemical protein properties beyond primary sequence suggests that the unicellular organisms Sphaeroforma arctica and Chlorella vulgaris have a more conserved biological context relative to other species and are well-suited to tackle SMA.

Well-established models like mice aren’t just expensive to maintain; they also don’t necessarily recapitulate the specifics of the human disease. The conservation of relevant properties in a much simpler system may signal that the etiology of the disease lies in a more ancient and conserved biological process that makes muscles and nerves particularly vulnerable, and that more complex tissue-level phenotypes may be a consequence rather than a cause of the disease.

Our strategy lets us rationally and agnostically consider less-studied organisms that may be more biologically relevant to the disease or trait in question.

A call for change

We’ve developed an approach that allows scientists to rationally identify research organisms for modeling human traits by incorporating genomic data, protein structure, and other biological contexts [3]. We know that not all researchers can spin up new infrastructure for every new research organism they land on. The other major utility of our framework is that, for a fly, fish, or worm lab, it can help agnostically identify the focus areas where these species are most relevant and can make the most headway. We hope this data-driven approach will increase our ability to leverage the full diversity of the natural world for scientific discovery.

Weigh in!

Would you use our workflow to identify an appropriate research organism, to find a biological area that the model you have expertise in can best tackle, or to support your choices when seeking funds, in publications, or for drug development? This platform relies on access to high-quality, annotated genomes across a wide range of organisms. Which species that you already have expertise or tools for would you like to see integrated into our platform?

This work is licensed under CC BY 4.0

Prachee Avasthi

Conceptualization, Supervision, Writing

Audrey Bell

Visualization

Megan L. Hochstrasser

Editing

Ryan York

Conceptualization

Sierra on Aug 30, 2025

Zoogle is such a great name! This framework has strong potential to correct biases in organism selection, but its success seems to depend heavily on the quality and coverage of comparative genomic data. Right now, genomic resources are still biased toward popular species and agricultural/medical models. Without deliberate investment in sequencing diverse lineages, could this approach risk reinforcing existing gaps rather than filling them? And, along those lines, would integration with phenotypic databases and functional assays be critical to ensure predictions translate into actionable biological models?

Prachee Avasthi on Sep 02, 2025

Thanks for your comment. The organisms included in the dataset are pre-selected for the ability to work with them in the lab and for genetic manipulation so that any predictions can be validated and utilized readily. So the goal isn't specifically to fill in the gaps as broadly as possible such that we can build a full stack from genomes to phenotyping approaches for as many species as possible. Rather, we hope to understand where a much broader range of species punch way above their weight (and do better than rodents) with respect to human-relevant biological context. Of course, as you note, there may be simpler uncharacterized/unsequenced species that may have advantages that none in our 65-species database can model particularly well, and that can inform a broader range of species we'd include on the basis of what would cover more of disease space, but we'll likely do that once we have a stronger evidence base on the utility of the approach. Another way to put it is that "gaps" must be pretty explicitly defined not just as gaps on the tree but gaps with respect to disease modeling ability, and our current dataset is quite diverse, covering about 2B years of evolution. Hopefully as we/others test out our organismal selection pipeline and as more causal disease genes get identified, we can continue to iterate on what species make sense to include!

Santosh K on May 25, 2025

I'm approaching these publications with a reverse aspiration: identifying research problems that are biologically tractable and best addressed by my familiar organismal models. It's truly impressive that the authors developed Zoogle, a tool designed to pinpoint relevant biological questions for which specific organisms serve as optimal models.
"For example, around 90% of drugs that progress from preclinical testing in organismal models (95% is done in rodents) to clinical trials in humans fail." While preclinical model fidelity is crucial, attributing the vast majority of drug candidate failures in clinical trials solely to poor in vitro or in vivo translation oversimplifies a complex process. Clinical trials frequently fail due to myriad factors beyond initial translational shortcomings, like deeply flawed study designs (e.g., inadequate sample size, poor patient selection, suboptimal dosing). Operational challenges like patient recruitment, data management, and funding also contribute significantly to high failure rate.

Prachee Avasthi on May 25, 2025

Yep certainly there are multiple other reasons as well contributing to failure to translate. We felt we were in a position to tackle this one most readily given we can use our comparative analysis tools to analyze multiple dimensions of molecular conservation. Indeed, we also assumed many have infrastructure and expertise in a specific model and are looking for how it can be best leveraged. Would love to know if you end up pursuing genes from Zoogle for your model of interest and how it works out!

Gaia Andreoletti on Mar 03, 2025

Selection

standing the complexities of human biology, particularly in the context of therapeutic development. Discovery research and preclinical testing in animal models often fail to translate to the clinic and don't take the evolutionary history of mice and humans into account. In this pub, we describe a new framework for thinking about organismal model selection that lever

Absolutely, well said.

Sonam Popli on Feb 19, 2025

Excellent direction! The overall goal should be to reach maximal clinical mimicry, which may be achieved by an in vitro model, the appropriate animal model, or a combination.

Prachee Avasthi on Feb 19, 2025

Yes absolutely! We think a portfolio or repertoire of relevant biological context will be important and that improved whole organism or tissue level insights may dramatically improve our understanding of the biology that can help us catch what may be missed in in vitro assays alone.

Ann Wells on Feb 21, 2025

I love this idea and completely agree with the premise! I am curious though, if the ultimate goal is adoptability by scientists globally then how do we shift the way science is currently funded? By which I mean, how do we de-emphasize the need for researchers to not only be an expert in their field but also their organism of choice? Or conversely how do we de-emphasize the need for topic expertise to maximize a researcher’s expertise in an organism? Additionally, how can the creators of this algorithm be sure it is unbiased and agnostic? After all, we are always building on knowledge that exists and that will always somewhat bias us towards the lamppost. I personally always strive to perform my work in an unbiased and agnostic way but ultimately I have to make decisions, whether it’s at the experimental design, statistical, or interpretation level, and I always wonder what biases I have introduced, whether implicit or explicit.

Prachee Avasthi on Feb 21, 2025

One of the reasons we made the web tool based on this (Zoogle.arcadiascience.com) searchable by both gene and organism is so that those who are unable to expand their research organism infrastructure can still get help making data-driven decisions for where they can make more human-relevant headway. We of course still only have a drop in the ocean of data to tap compared to the vastness of life but we have genome data far beyond other information available so hope that this expansion across >1.5B years of evolution can improve our predictive capacity in humans. There is always more work to be done but we hope this is a good start and many can try it out/benefit!

Morgan Connolly on Mar 02, 2025

This is a great step toward diversifying the models available to researchers and hopefully providing pre-clinical data that better reflects clinical outcomes!

It’s also a great example of the classic paradox in protein science that proteins with divergent sequences can assume similar folding patterns and functions, so viewing protein function only through the scope of sequence similarity can leave promising but sequence divergent candidates out of sample sets. This seems to be improving now that computational protein structure prediction has become more accessible and enables comparisons of structure and not just sequence.

I wonder if in the model organism selection process any particular interest is paid to the spatial arrangement of amino acids known to be key to function or drug binding in the human system (if known)? This might help in selecting model proteins that correlate well with results on the human homolog. Although that still creates a reliance on prior knowledge that might be counter to some of the motives here. I’d love to know how you prioritize developing pipelines that are robust across many different protein classes and thus don’t require deep pre-existing knowledge of the enzyme vs. leveraging information that is already available for many drug targets?

Prachee Avasthi on Mar 02, 2025

You’ve totally nailed it. Definitely it makes sense for functional domains etc. to be used but, as you point out, that can be limited by existing knowledge or prevent us from leveraging or predicting unknown functions if too heavily dependent on priors. One of the things we can do and are prioritizing is to use our other prediction pipelines like inferring gene networks (that rely on co-occurrence and other agnostic info) instead of just what is known.

Gaia Andreoletti on Mar 03, 2025

Selection

g a human disease or biological process faster, cheaper, and easier than can be explored in humans. Scientists often select organisms based on historical precedent, ease of use in the lab, and similarity of genes or phenotypes. While this approach has resulted in many important advancements and certainly has its merits, relyi

In most cases, scientists use organisms that have been specifically selected for their relevance to the disease being studied. However, there are instances where researchers may discover an organism with a phenotype similar to the disease of interest by random chance. For example, the dog model used for studying dystrophy disease, which is caused by the XLMTM gene, is a rare case of this phenomenon. The dog with the XLMTM genetic mutation was found in a family of a child with that disease. What are the odds!!

Prachee Avasthi on Mar 03, 2025

Absolutely! There are a lot of interesting and relevant models that have been found, and in some instances our data recapitulate those selections. Hopefully this approach yields more confidence in those as well as helps find more relevant models for others where a good model-match has not yet been found.

Gaia Andreoletti on Mar 03, 2025

Selection

is based on multiple physical and chemical protein properties beyond primary sequence suggests that unicellular Sphaeroforma arctica and Chlorella vulgaris have a more conserved biological context relative to other species and are well-suited to tackle SMA. Well-established models like mice aren’t just expensive to maintain — they also don’t necessarily

This is very interesting. Has this finding been validated in the lab?

Prachee Avasthi on Mar 03, 2025

We haven’t done so for these instances, but welcome the communities interested in this disease to give it a try!

Gaia Andreoletti on Mar 03, 2025

I read this short article with great interest. Coming from the industry, I believe the proposed workflow could be highly valuable for identifying suitable research organisms, aligning biological areas with model expertise, and supporting funding applications, publications, or drug development. However, the choice of species to integrate into the platform should consider both the limitations of current animal models and the potential of alternative organisms.

As previously mentioned, animal models, such as mice, often fail to accurately recapitulate human disease phenotypes. For instance, in the case of Duchenne Muscular Dystrophy (DMD), mouse models exhibit milder disease severity compared to rat models, which present a phenotype closer to human pathology. Many animal models do not capture the complexity of human diseases due to differences in genetics, physiology, or disease etiology. For example, neurodegenerative diseases like Alzheimer’s disease are poorly modeled in mice because of distinct amyloid and tau pathologies. Additionally, animal testing often does not accurately predict human toxicity, leading to delays in drug development and the loss of potentially beneficial treatments. I believe that including organisms like C. elegans and Drosophila melanogaster can offer simpler systems that reveal conserved molecular mechanisms. These models are particularly useful for studying fundamental processes such as gene regulation, signaling pathways, and cellular responses (e.g., Drosophila shares 60% of its genes with humans and is widely used in neuropharmacology).

I believe the industry would benefit from incorporating underutilized organisms like C. elegans, zebrafish, and Drosophila, which can provide mechanistic insights into conserved processes. However, due to the structure and requirements of the FDA, such analyses might be perceived as superfluous. Integrating diverse model organisms into the platform will enhance its utility by addressing the gaps in traditional animal models while leveraging the strengths of simpler systems for mechanistic studies and complex species for translational applications. The next step, however, would be determining how to incorporate these findings into the documentation required for an Investigational New Drug (IND) approval.

Prachee Avasthi on Mar 03, 2025

Thanks for your comments! We also hope that more can take advantage of leveraging a repertoire of relevant models for investigating mechanism of action or target specificity for lead optimization etc.

Anahita Daruwalla on Dec 19, 2024

Selection

This platform relies on access to high-quality, annotated genomes across a wide range of organisms. What species for which you already have expertise or tools would you like to be integrated into our platform?

I identify with this thought process to scan the entire tree of life for model organisms to understand human biology. For example, we were looking at Archaea to understand questions related to a protein involved in human vision. We were pleasantly surprised to see that the archaeal protein was phylogenetically related to the human protein, and the two had conserved structural features. In fact, given that Archaea do share some common ancestry to eukaryotes, it may be a good research organism to look at.

Prachee Avasthi on Dec 19, 2024

That’s exciting and a great example of counterintuitive molecular conservation given the complexity of the retina — what protein?

Mark Wogulis on Jan 08, 2025

Very interesting idea, and independently identifying Chlamydomonas reinhardtii as a model for sperm motility is very encouraging.

Of course, researchers can use multiple models. I once worked on in vitro models of neurodegeneration. We would use cell lines for high throughput screening and primary cultures (different species) for validation, but this ranking was purely theoretical. It would have been great to have a model like this to help in decision making.

Many drugs that fail in trials fail before efficacy testing even begins. One failure point is drug metabolism. Perhaps this model could identify organisms that would better predict human metabolism of new drugs.

It would be great if some pharma company would go back to their failed and successful trials to see if this approach would have helped better select candidate drugs.

Prachee Avasthi on Jan 08, 2025

That would be awesome! Also I hope it expands the range of what’s possible in early stage drug discovery for novel target selection, narrowing mechanism of action, assay development etc. so that the range of effective therapeutics can increase from improved powerful model selection. There is some reluctance from deviating too much from the models that are traditionally accepted so it would be fantastic if many in the scientific community are able to utilize this framework for their own benefit so the tolerance for a broader range of organisms increases and there can be an increased success rate for trials/a larger impact for patients.

Beverly Setzer on Jan 15, 2025

This paper presents a really exciting framework for organismal model selection that could address some major gaps in how we study human biology. As someone with a background in neuroscience and experience using computational tools to connect data to real-world questions, I especially appreciated the integration of evolutionary history and protein structure to identify new models. That said, I’d love to see more on how this approach could tackle the complexity of brain disorders, where conserved mechanisms aren’t always obvious. It would also be helpful to provide tools or resources that make it easier for researchers to adopt these non-traditional models—maybe something like open-access pipelines tailored to specific fields. Overall, this is a really promising approach that could expand how we think about connecting basic science to human health!

Prachee Avasthi on Jan 19, 2025

Absolutely, for these more complex disorders, we have some parallel efforts for genotype-phenotype mapping that takes into account non-linear interactions and are building gene network analyses into our workflows as well. We also have plans to help others more easily leverage our tools so stay tuned for more and thanks for the interest!

Beverly Setzer on Jan 21, 2025

That sounds great! I’m interested to see where the phenotype mapping takes you. The openness and utility of your science is super inspiring!

Brandon Aho on Jan 20, 2025

I find this to be a very exciting model for organism selection. I’m curious on how this idea may be pushed even further. For a lot of fluorescent microscopy work, a cell type may be chosen based on the effectiveness a certain transfection reagent has on it. It would be wonderful to see this model used to select not only the most relevant organisms but also take into account ease of use. Then we can see this approach appeal to an even broader audience by finding models which are both relevant physiologically and have a proven track record if needed.

Prachee Avasthi on Jan 20, 2025

Yes absolutely! The original list of organisms included does take into account availability of genetic tools but we also would like to expand this both by including additional organisms based on tractability along with broadening our species-agnostic toolbox to interrogate the biology of a larger range of organisms.

Luis Goicouria on Jan 23, 2025

I believe that the consistent use of more established disease models has ended up being a double-edged sword: the research infrastructure is much sturdier and more well established but the ‘translatability’ of the findings to human conditions is, at times, tenuous. Using noveltree (very useful graphic in the README by the way) to leverage information from phylogeny and protein structure conservation to find better model organisms for disorders with etiologies attributable to known protein dysfunctions is a great approach. I have a lot of questions and thoughts, but these three sum up the themes of most of them.

There are a lot of different approaches that scientists are using to find new model organisms or systems to more accurately recapitulate human disease etiology: village-seq to generate more genetically heterogenous cell cultures, organoids to create more diverse tissue systems in-vitro, and FaunaBio’s use of Convergence AI to find better targets that they can use to partner with external labs using rare animal models. What these all have in common is that they assume the need for multicellular model systems to recapitulate what are largely considered diseases of tissue or inter-cellular functioning. Given the sentence about tissue-based disorders potentially having etiologies related to protein dysfunctions ‘model-able’ in single-cell organisms, and the application of this approach to sperm dysfunction, is it fair to assume that your approach works best in finding single-cell model systems? If so, how do we assess the success of a rescue experiment in a single-cell organism that’s modeling, say, epilepsy? I specialized in modeling epilepsy in mice and, while fully acknowledging that mice are a fraught model system, modeling epilepsy in a single cell organism seems especially difficult despite the fact that several epilepsies are monogenetic disorders.
Is it fair to assume that it is easier to translate the techniques for mono-cellular organisms across systems than it would be to do the same for, say, mammals? Fauna Bio uses the thirteen-lined ground squirrel using a similar philosophy to yours: establish a genetic database (Zoonomia) and analyze it to find an organism with known genetic implementation for (in this case) hibernation and leverage an established lab’s infrastructure to perform the research. As a given however, a lot of the technologies used to assess phenotypes likely don’t purely apply to a unique mammalian model (not to mention the legal and logistical complications). Do the same limitations/growing pains apply to using a brand new single-cell model organism, or is that headache mitigated by the organisms’ relative simplicity? (Given how fickle different tissue types can be to slightly different growing conditions for established in-vitro systems, I can imagine it may be difficult to even keep a new organism alive.)
Not the most accurate dichotomy, I know, but on one hand there are conditions attributable to known dysfunctions in single proteins and on the other hand (likely) polygenetic disorders with incompletely understood etiologies. I understand that gene therapy is in a weird place now broadly, but is there utility in generating a much better model system for a monogenetic disorder that can (emphasis on) theoretically be cured using gene therapy? For polygenetic disorders, how can we leverage a system like yours to discover a better model system for a disease that we don’t know the etiology of yet?

I am grateful that Arcadia is exploring the space of establishing and applying new model organisms to discovering therapeutic targets that are more likely to translate to something useful in treating conditions in people. I strongly believe that we have been using the same drugs to treat psychiatric disorders for decades because we’ve been using the same model systems for decades and expecting different results. I look forward to reading your cutting-edge research using unique and valuable model systems.

Prachee Avasthi on Jan 23, 2025

Thank you for your thoughtful comments! These are great questions. I will address them in turn.

I’d say this approach is neither better nor worse on the whole in single-celled organisms, given that it is currently based largely on protein properties, though there may be other considerations for modeling given variable gene copy number across species as well. I highlight here the organisms with lesser complexity to make the point that cell- or molecular-level dysfunction may be the underlying etiology of more complex tissue- or organ-system-level phenotypes. However, one can think of this approach as useful not just for a single organism but for a repertoire or portfolio of organisms best suited to investigate human-relevant phenotypes. So one could choose both a more molecularly relevant single-celled organism and a multicellular or mammalian model better suited for modeling complex phenotypes.
The species included in our starting set were selected for some evidence of genetic tractability, and we have put some thought into developing an “engineering score” that takes into account some criteria for growth and experimental interrogation. It could certainly be that growth conditions and experimental techniques for interrogating unicellular organisms are more species-agnostic or broadly applicable than those for more complex organisms, or that in some cases some species are more finicky in their maintenance. As you suggest, there will always be some lift/growing pains in optimizing conditions for any new species brought into the lab.
We are also working on parallel efforts to help us better identify the causal basis of complex traits in polygenic disorders by taking into account non-linear interactions in genotype-phenotype mapping. We’ve found that high-dimensional phenotypes can facilitate these efforts and increase predictive capacity in our models (https://research.arcadiascience.com/pub/result-nonlinear-phenotypes), but much more on this soon!

Luis Goicouria on Jan 23, 2025

Thank you for taking the time to provide thoughtful responses to each one of my questions. I can imagine that generating an engineering score is a difficult but necessary step: how does one account for the pragmatics of using a new model system without compromising a truly agnostic approach? I personally don’t believe it to be all that problematic, but taken too far it threatens to lean into the motivators that gave us our current overused model systems, motivators which seem largely based on the pragmatics of using an established model. I’m very interested in your work in non-linear genotype-phenotype matching, as stagnation in treatments for polygenic disorders suggests deficiencies not only in model systems, but in the paradigm used to understand the etiology more broadly. I look forward to keeping up with Arcadia’s developments!

Prachee Avasthi on Jan 23, 2025

Ah, great point about the tradeoff! It’s also why we’re trying to take an organism-agnostic approach to experimental techniques as well, for example with Raman spectroscopy:

https://research.arcadiascience.com/pub/result-easy-raman-spectroscopy/release/2

https://research.arcadiascience.com/pub/result-raman-taxonomy/release/2

Spurthy Skandan on Feb 07, 2025

This is a really exciting approach to bridging the gap between biological research and model organism selection. A data-driven framework for matching organisms to research problems has the potential to streamline discovery and optimize experimental design in ways that traditional methods might overlook. Looking forward to seeing how this influences research across different fields and leads to more effective and innovative scientific insights.

Prachee Avasthi on Feb 10, 2025

Thanks for the note! We’ll be continuing to develop this further and expand how others are able to leverage it!

Abrahim El Gamal on Feb 13, 2025

Interesting work! The 90% failure rate of drugs from rodent studies to human trials underscores the need to rethink default approaches to preclinical discovery. While the model organism may play a role, the in vitro model used to generate the lead is at least as important. Would be fascinating to retrospectively analyze successful vs failed drug trials through this framework to see what trends emerge in relation to in vitro model, in vivo efficacy, and indication/target.

Prachee Avasthi on Feb 19, 2025

Yes, definitely! This retrospective data has immense value for a different kind of validation of our approach. In vitro data from human cells is intended to be more biologically relevant and high-throughput, but hopefully, with low-cost whole-organism and tissue-level insights, we can make more headway with mechanistic discovery that would be missed in individual human cell lineages (and immortalized/abnormal ones at that). While additional dimensions of data would need to be incorporated for improved in vitro cell type selection, we are applying our species-agnostic high-dimensional phenotyping efforts to in vitro assay development as well.
