Genetics | Arcadia Science Publications

Arcadia aims to identify biology’s greatest innovations and decode the principles that generate them. To do this, we need to embrace exploration across the tree of life. However, for this exploration to happen at all, we need to identify phenomena that are broadly shared by living things. Luckily, biology is built upon a single unifying feature, one that we are now able to compare and dissect at unprecedented scale: genomes.

An illustration of various organisms connected by a DNA strand.

Across biology, genomes perform a set of universal tasks: transmitting information between generations, generating phenotypes, and orchestrating functions over a lifetime. This is as true for viruses — which employ admirably minimal genetic toolkits — as it is for the rare flower Paris japonica, whose 149 billion base pairs outstrip the human genome 50 times over.

Our goal is to use this universality to accelerate exploration at the scale of the tree of life: from comparisons of kingdoms that diverged billions of years ago to rapidly evolving populations that we generated last week. Through these efforts, we are developing a variety of tools to help us identify hotspots of biological novelty, make strong hypotheses about their causal basis, and ultimately predict traits in the previously inaccessible reaches of the tree of life.

Big-picture goals

Goal #1

Identify transformative biological innovations, agnostically.

Why: We’ll never find the next transformative biological insight by looking under the lamppost of what we already know. Much of the tree of life awaits exploration, and it is through the holistic lens of phylogenetics that we can begin to do so efficiently, scalably, and agnostically.

How: We are leveraging publicly available genomic data to generate what we are calling “phylogenetic discovery platforms.” These platforms combine phylogenetic trees with a variety of other data (e.g., protein structure, environment, organismal phenotypes) to map key evolutionary innovations. We are generating platforms for a variety of taxa in order to generate novel hypotheses, identify undiscovered phenomena, and select organisms for study at Arcadia.

Goal #2

Rapidly dissect phenotypes across the tree of life.

Why: After identifying a strong hypothesis and the right organism(s) to study it, we want to quickly get into the lab and start identifying key elements that we can usefully employ. However, most of the organisms we work on at Arcadia are uncharacterized.

How: We’re developing a toolkit of sequencing, phenotyping, and analysis approaches that will allow us to connect genetic variation (natural or induced) with measurable traits. With this toolkit, we aim to quickly generate genotype-phenotype maps and infer molecular hypotheses in uncharacterized organisms.

Goal #3

Move from descriptions to predictions.

Why: Arcadia is searching for deep principles underlying biology. We don’t want to just describe phenomena; we want to predict the when, where, and how of biology. It is this shift, from description to prediction, that will let us truly engineer serendipity.

How: We are using a “phenotype-forward” approach to generate models that learn, and then genetically map, the structure of biological systems. We want to get to the point where we can figure out how a majority of organismal processes are encoded in the genome. To do so, we first need to reliably (and scalably) map the structure of phenotypic space. To this end, we are currently exploring a number of methods for high-dimensional phenotyping and their utility for mapping phenotypic space in a way that is quantitative, high-throughput, and broadly applicable to diverse organisms.

Community philosophy

We are committed to developing tools that as many scientists as possible can employ rapidly and cost-effectively. Given the sheer breadth of life’s diversity, coupled with the size of the data sets produced by modern imaging and sequencing, no single organization can tackle the full depth of a given biological question alone. To this end, we are constantly re-analyzing and integrating our results to identify the minimum amount of data needed to conduct these experiments and to optimize for cost-effective solutions that can be applied in resource-limited contexts. Through these efforts, we hope to reduce wasted effort and empower community use, independent of funding level or institutional affiliation.

Progress

To decode the tree of life, we need to be able to understand the products of genomes — phenotypes — in their full richness across diverse organisms. We therefore started by focusing on creating computational frameworks and identifying experimental technologies that will allow us to measure, dissect, and compare complex phenotypes across the tree of life. These efforts will ultimately allow us to use these high-dimensional phenotypes to decode patterns of genetic variation.

High-throughput phylogenomic inference

One of our central goals is to scan the tree of life for biological innovation. Critical to this effort is the inference of evolutionary relationships, whether among species or among their genes, via phylogenetic methods. To work at this scale, our methods must be efficient, scalable, and broadly applicable. That is, they must let us infer the evolutionary histories of all gene families, not just those of a handful of genes of special interest.

With this in mind, we developed our first “phylogenetic discovery platform” — NovelTree, a Nextflow workflow that carries out all essential steps of standard phylogenomic analysis. This method infers not only the relationships among species, but also the relationships among all of their gene copies, for any number of gene families. NovelTree goes a step further still, inferring the history of gene duplication, transfer, and loss for each gene family and species.
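To make the idea of inferring gene duplication from a gene tree concrete, here is a minimal Python sketch of the species-overlap heuristic: an internal node is called a duplication when its two child subtrees share at least one species, and a speciation otherwise. This is not NovelTree's method, and it does not model gene transfer or loss. The nested-tuple tree, the `SPECIES|gene_id` leaf format, and the function name `label_events` are all assumptions made for illustration.

```python
from typing import List, Set, Tuple, Union

# A gene tree as nested pairs; leaves are "SPECIES|gene_id" strings.
# Both the structure and the leaf naming are hypothetical, for illustration only.
GeneTree = Union[str, Tuple["GeneTree", "GeneTree"]]


def label_events(tree: GeneTree, events: List[str]) -> Set[str]:
    """Post-order walk that returns the set of species found below `tree`.

    Appends one label per internal node to `events`: "duplication" if the two
    child subtrees share any species, otherwise "speciation".
    """
    if isinstance(tree, str):
        # Leaf: the species is the part of the label before the first "|".
        return {tree.split("|", 1)[0]}
    left, right = tree
    left_species = label_events(left, events)
    right_species = label_events(right, events)
    shared = left_species & right_species
    events.append("duplication" if shared else "speciation")
    return left_species | right_species


# Toy gene family: species A and B each carry two copies (g1, g2),
# while species C carries a single copy.
toy_tree: GeneTree = ((("A|g1", "B|g1"), ("A|g2", "B|g2")), "C|g1")
toy_events: List[str] = []
toy_species = label_events(toy_tree, toy_events)
print(toy_events)
```

In the toy tree, the node joining the two (A, B) subtrees is the only one whose children share species, so it is the only node labeled as a duplication. Reconciliation-based methods that also account for transfer and loss need a species tree and an explicit evolutionary model, which this sketch leaves out.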

To demonstrate NovelTree’s utility, we applied the method to a dataset composed of 36 species belonging to the four eukaryotic supergroups that comprise the TSAR clade: Telonemia, Stramenopila, Alveolata, and Rhizaria. We highlight key outputs, point to potential future research directions, and provide resources to facilitate summarization, visualization, and downstream analysis of the evolutionary datasets produced by NovelTree.

These efforts have resulted in our first “phylogenomic library” — a vast evolutionary data set that we can mine to pinpoint when and where evolutionary innovations have arisen in this group of organisms. This resource is only the first of many. We anticipate that its utility will only increase as we generate additional libraries across branches of the tree of life, allowing us to integrate these explorations and expand our mapping of the biological universe.

Learn more about this tool and how to use it here:

Resource

September 28, 2023

NovelTree: Highly parallelized phylogenomic inference

We want to find and use evolutionary innovations to solve present-day problems. We developed NovelTree, an efficient phylogenomic workflow that will empower us to decode the evolutionary traces of these innovations across the tree of life.

Community ideas to enhance NovelTree

Inspired by our ongoing work to improve and extend our phylogenomic capabilities with NovelTree, we released a short pub to spur engagement with the broader community about future directions in phylogenomic inference and method development.

Read it and weigh in!

Open question

March 05, 2024

How can we improve upon and expand the scope of our phylogenomic inferences?

We’re seeking feedback on NovelTree, our modular phylogenomic workflow. We’d appreciate your insights into how we can improve gene family inference, incorporate protein structure predictions, and expand to whole-genome data as input.

Tackling diverse types of motility among unicellular organisms

Unicellular organisms make up a substantial portion of the tree of life. To survive, many of these organisms are obligated to move around in the world via swimming, crawling, gliding, and even walking. These diverse, and sometimes ingenious, motility types are supported by complex and multifaceted biological processes. Given this, we want to see if we can leverage the diversity of unicellular motility to gain insight into its molecular and cellular underpinnings.

A key first step here is deciding how to quantitatively represent different modes of cellular movement. While swimming, gliding, and walking all share features with other types of organismal movement (e.g., the swimming of protists and models such as zebrafish may be treated similarly), crawling can be difficult to model since it involves active changes to the cell’s shape.

To address this, we developed a computational framework for processing, representing, and comparing images of cells crawling. We found that we could capture the movement dynamics of diverse cell types in a single ‘movement space.’ Using this space, we were able to discover that crawling varies broadly across multiple dimensions. Furthermore, we developed a simple statistic that can measure the relationship between variation in cell shape and the types of movement that are generated. This work lays the foundation for identifying the mechanistic bases of cellular movement across large evolutionary distances.
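As a rough illustration of relating shape variation to movement, the Python sketch below summarizes each tracked cell by two numbers (mean step length of its centroid and the coefficient of variation of its area over time) and then correlates them across cells. This is a hypothetical stand-in, not the statistic or the movement space described in the pub. The feature choices, the function names, and the toy tracks are all assumptions.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def mean_speed(centroids: Sequence[Point]) -> float:
    """Average distance moved by the cell centroid between consecutive frames."""
    steps = [math.dist(a, b) for a, b in zip(centroids, centroids[1:])]
    return mean(steps)


def shape_variability(areas: Sequence[float]) -> float:
    """Coefficient of variation of cell area over time (a crude shape-change proxy)."""
    return pstdev(areas) / mean(areas)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; assumes both inputs have non-zero variance."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy tracks (made-up numbers): faster cells also fluctuate more in area.
tracks: List[Tuple[List[Point], List[float]]] = [
    ([(0, 0), (1, 0), (2, 0)], [10, 10, 10]),
    ([(0, 0), (2, 0), (4, 0)], [9, 11, 10]),
    ([(0, 0), (3, 0), (6, 0)], [8, 12, 10]),
]
speeds = [mean_speed(centroids) for centroids, _ in tracks]
variabilities = [shape_variability(areas) for _, areas in tracks]
shape_motion_r = pearson(speeds, variabilities)
print(speeds, variabilities, shape_motion_r)
```

Real crawling data would call for richer shape descriptors than area alone, since two cells can change outline dramatically while keeping the same area.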

Learn more about this tool and how to use it here:

Result

October 14, 2022

Distinct spatiotemporal movement properties reveal sub-modalities in crawling cell types

Quantifying movement is a powerful window into cellular functions. However, cells can generate movement through a variety of complex mechanisms. Here, we generate a flexible framework for comparing an especially variable type of motility: cellular crawling.

Using spectral imaging to accelerate comparative biology

The vast majority of organisms we are interested in studying lack genetic and molecular toolkits. Most do not have genome sequences. For some, we aren’t even sure if they are unique species or not. We are interested in identifying tools that allow us to rapidly, and comparably, measure and monitor biological processes across taxonomic groups without needing to develop species-specific tools. Label-free imaging methods using vibrational spectroscopy, such as Raman imaging, offer promising alternatives for addressing a number of basic problems in biology. These methods can be non-destructive and do not require dyes, labels, or a priori knowledge. By detecting the presence of a diversity of chemical bonds, they can provide rich molecular fingerprints that can reflect metabolic or physiological state, cell type, or even species of origin.

Given this last point, we were interested in exploring which aspects of these signals, if any, correlate with phylogenetic relationships between species. We hypothesized that, if we were able to find such associations, then it might be possible to leverage Raman spectra for a variety of uses in our comparative work. To test this idea, we analyzed a publicly available dataset of Raman spectra measured from a variety of clinically isolated bacteria and fungi. As hypothesized, we found that specific portions of the spectra correlated with species relationships. Furthermore, these regions overlapped with variation in the abundance of nucleic acids across the species’ genomes, suggesting that this technology provides a potentially interesting way to measure the relationship between genetic and molecular components of a biological sample.
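One simple way to ask which portions of a spectrum track species relationships is to compare, band by band, pairwise differences in intensity with pairwise phylogenetic distances. The Python sketch below does this with a plain Pearson correlation over species pairs. It is an illustrative approach under stated assumptions, not the analysis used in the pub: the function name `band_phylo_correlations`, the toy species, the intensities, and the distances are invented, and a real analysis would need a permutation-based significance test because pairwise distances are not independent.

```python
import math
from itertools import combinations
from statistics import mean
from typing import Dict, FrozenSet, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; assumes both inputs have non-zero variance."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def band_phylo_correlations(
    spectra: Dict[str, List[float]],
    phylo_dist: Dict[FrozenSet[str], float],
) -> List[float]:
    """For each spectral band, correlate |intensity difference| between species
    pairs with the phylogenetic distance between the same pairs."""
    species = sorted(spectra)
    pairs = list(combinations(species, 2))
    tree_distances = [phylo_dist[frozenset(pair)] for pair in pairs]
    n_bands = len(spectra[species[0]])
    correlations = []
    for band in range(n_bands):
        band_differences = [abs(spectra[a][band] - spectra[b][band]) for a, b in pairs]
        correlations.append(pearson(band_differences, tree_distances))
    return correlations


# Toy data (invented): A and B are close relatives, as are C and D.
toy_spectra = {
    "A": [0.0, 0.0],
    "B": [0.1, 1.0],
    "C": [1.0, 0.0],
    "D": [1.1, 1.0],
}
toy_phylo = {
    frozenset(("A", "B")): 1.0, frozenset(("C", "D")): 1.0,
    frozenset(("A", "C")): 4.0, frozenset(("A", "D")): 4.0,
    frozenset(("B", "C")): 4.0, frozenset(("B", "D")): 4.0,
}
band_r = band_phylo_correlations(toy_spectra, toy_phylo)
print(band_r)
```

In the toy data, the first band separates the two clades and so correlates strongly with phylogenetic distance, while the second band varies within clades and does not.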

Learn more about this effort:

Result

June 15, 2023

Raman spectra reflect complex phylogenetic relationships

Even with many tools available, categorizing species is tough. We used data from Raman spectroscopy, a form of label-free imaging, to infer phylogenetic patterns among several dozen diverse microbial taxa, offering a non-destructive and rapid way to dissect species relationships.

Dissecting phenotypic relationships to accelerate genetic mapping

Genetic variation is the raw material of evolution. Analyses of variation within naturally interbreeding populations therefore make it possible to identify the genetic sources of phenotypic diversity. However, populations of organisms can vary in many ways at the same time (e.g., in their shapes, diets, reproductive strategies, and so on). To understand the diverse contributions of variants across the genome, we first need to capture the myriad phenotypes they affect.

With this in mind, we have begun deeply characterizing the phenotypes of two species of unicellular algae: Chlamydomonas reinhardtii and Chlamydomonas smithii. These species can interbreed, and we plan to mate them to generate clonal libraries that we can use to correlate genetic variation with high-dimensional phenotypes. C. reinhardtii is a well-known cell biological model, while the biology of C. smithii is relatively uncharacterized. To empower precision genetic mapping in crosses of these two species, we are beginning to chart their differences across a variety of phenotypes. Our initial efforts have identified differences related to morphology, growth, and physiology. We will be adding to this pub as we measure and compare more phenotypes.
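To illustrate what correlating genetic variation with a phenotype can look like in progeny of such a cross, here is a minimal single-marker sketch in Python. It assumes haploid progeny, each carrying either the C. reinhardtii ("R") or C. smithii ("S") allele at every marker, and estimates each marker's effect as the difference in mean phenotype between the two allele groups. The marker names, allele labels, phenotype values, and the function `marker_effects` are hypothetical, and no statistical testing is included.

```python
from statistics import mean
from typing import Dict, List


def marker_effects(
    genotypes: Dict[str, List[str]],
    phenotypes: List[float],
) -> Dict[str, float]:
    """Difference in mean phenotype (R allele minus S allele) for each marker.

    `genotypes` maps a marker name to one allele label per progeny, in the same
    order as `phenotypes`. Markers where only one allele is observed are skipped.
    """
    effects = {}
    for marker, alleles in genotypes.items():
        r_values = [p for a, p in zip(alleles, phenotypes) if a == "R"]
        s_values = [p for a, p in zip(alleles, phenotypes) if a == "S"]
        if r_values and s_values:
            effects[marker] = mean(r_values) - mean(s_values)
    return effects


# Toy cross (invented numbers): four progeny, one measured trait, two markers.
toy_phenotypes = [10.0, 11.0, 20.0, 21.0]
toy_genotypes = {
    "marker_1": ["R", "R", "S", "S"],
    "marker_2": ["R", "S", "R", "S"],
}
effects = marker_effects(toy_genotypes, toy_phenotypes)
top_marker = max(effects, key=lambda m: abs(effects[m]))
print(effects, top_marker)
```

With high-dimensional phenotypes, the same comparison would be repeated per trait (or per axis of a reduced phenotype space), which is where correcting for multiple tests becomes essential.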