Arcadia aims to identify biology’s greatest innovations and decode the principles that generate them. To do this, we need to embrace exploration across the tree of life. However, for this exploration to happen at all, we need to identify phenomena that are broadly shared by living things. Luckily, biology is built upon a single unifying feature, one that we are now able to compare and dissect at unprecedented scale: genomes.
Across biology, genomes perform a set of universal tasks: transmitting information between generations, generating phenotypes, and orchestrating functions over a lifetime. This is as true for viruses, which employ admirably minimal genetic toolkits, as it is for the rare flower Paris japonica, whose roughly 149 billion base pairs outstrip the human genome about 50 times over.
Our goal is to use this universality to accelerate exploration at the scale of the tree of life: from comparisons of kingdoms that diverged billions of years ago to rapidly evolving populations that we generated last week. Through these efforts, we are developing a variety of tools to help us identify hotspots of biological novelty, form strong hypotheses about their causal basis, and ultimately predict traits in the previously inaccessible reaches of the tree of life.
Identify transformative biological innovations, agnostically.
Why: We’ll never find the next transformative biological insight by looking under the lamppost of what we already know. Much of the tree of life awaits exploration, and the holistic lens of phylogenetics lets us begin that exploration efficiently, scalably, and agnostically.
How: We are leveraging publicly available genomic data to build what we call “phylogenetic discovery platforms.” These platforms combine phylogenetic trees with a variety of other data (e.g. protein structure, environment, organismal phenotypes) to map key evolutionary innovations. We are building platforms for a variety of taxa to generate novel hypotheses, identify undiscovered phenomena, and select organisms for study at Arcadia.
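To make “mapping an innovation” onto a tree more concrete, here is a toy sketch using Fitch parsimony, a standard method for inferring the fewest changes in a discrete trait that explain the states observed at a tree’s tips. The nested-tuple tree format, the string-coded trait states, and the function names are hypothetical choices for illustration; this is not a description of how our platforms are implemented.

```python
"""Toy Fitch-parsimony sketch for locating trait changes on a tree.

Hypothetical assumptions: a tree is a nested tuple of leaf names (binary
branching only), and each leaf has one discrete trait state coded as a string.
"""
from typing import Dict, List, Optional, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_down(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Bottom-up pass: candidate state set for this node and minimum changes below it."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_down(tree[0], states)
    right_set, right_cost = fitch_down(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # Children disagree: one change is needed on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


def fitch_assign(
    tree: Tree,
    states: Dict[str, str],
    parent_state: Optional[str] = None,
    label: str = "root",
    changes: Optional[List[Tuple[str, str, str]]] = None,
) -> List[Tuple[str, str, str]]:
    """Top-down pass: pick one state per node and record (node, from, to) changes."""
    if changes is None:
        changes = []
    node_set, _ = fitch_down(tree, states)  # recomputed per node; fine for a sketch
    # Keep the parent's state when possible; otherwise break ties alphabetically.
    state = parent_state if parent_state in node_set else sorted(node_set)[0]
    if parent_state is not None and state != parent_state:
        changes.append((tree if isinstance(tree, str) else label, parent_state, state))
    if not isinstance(tree, str):
        fitch_assign(tree[0], states, state, label + ".0", changes)
        fitch_assign(tree[1], states, state, label + ".1", changes)
    return changes


# Usage: a trait ("1") shared by leaves A and B but absent elsewhere.
tree = ((("A", "B"), "C"), ("D", "E"))
traits = {"A": "1", "B": "1", "C": "0", "D": "0", "E": "0"}
print(fitch_down(tree, traits)[1])  # → 1 (minimum number of changes)
print(fitch_assign(tree, traits))   # → [('root.0.0', '0', '1')]
```

Each recorded change marks a branch where the trait was likely gained or lost, the kind of candidate innovation that could then be cross-referenced with structural, environmental, or phenotypic data. Ties are broken alphabetically here, which is one arbitrary choice among several equally parsimonious reconstructions.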
Rapidly dissect phenotypes across the tree of life.
Why: After identifying a strong hypothesis and the right organism(s) to study it, we want to get into the lab quickly and start identifying the key elements we can put to use. However, most of the organisms we work on at Arcadia are uncharacterized, so we can’t rely on established, species-specific tools.
How: We’re developing a toolkit of sequencing, phenotyping, and analysis approaches that will allow us to connect genetic variation (natural or induced) with measurable traits. With this toolkit, we aim to quickly generate genotype-phenotype maps and infer molecular hypotheses in uncharacterized organisms.
Move from descriptions to predictions.
Why: Arcadia is searching for deep principles underlying biology. We don’t want to just describe phenomena; we want to predict the when, where, and how of biology. It is this shift, from description to prediction, that will let us truly engineer serendipity.
How: We are using a “phenotype-forward” approach to generate models that learn, and then genetically map, the structure of biological systems. We want to get to the point where we can figure out how a majority of organismal processes are encoded in the genome. To do so, we first need to reliably (and scalably) map the structure of phenotypic space. To this end, we are currently exploring a number of methods for high-dimensional phenotyping and their utility for mapping phenotypic space in a way that is quantitative, high-throughput, and broadly applicable to diverse organisms.
We are committed to developing tools that as many scientists as possible can use rapidly and cost-effectively. Given the sheer breadth of life’s diversity, coupled with the size of the data sets produced by modern imaging and sequencing, no single organization will be able to tackle the depths of a given biological question alone. To this end, we are also constantly re-analyzing and integrating our results to identify the minimal amount of data needed to conduct these experiments and to favor cost-effective solutions that can be applied in resource-limited contexts. Through these efforts, we hope to reduce wasted effort and empower community use, independent of funding level or institutional affiliation.
To decode the tree of life, we need to be able to understand the products of genomes — phenotypes — in their full richness across diverse organisms. We therefore started by focusing on creating computational frameworks and identifying experimental technologies that will allow us to measure, dissect, and compare complex phenotypes across the tree of life. These efforts will ultimately allow us to use these high-dimensional phenotypes to decode patterns of genetic variation.
Unicellular organisms make up a substantial portion of the tree of life. To survive, many of these organisms must move through their environments by swimming, crawling, gliding, and even walking. These diverse, and sometimes ingenious, motility types are supported by complex and multifaceted biological processes. Given this, we want to see whether we can leverage the diversity of unicellular motility to gain insight into its molecular and cellular underpinnings.
A key first step here is deciding how to quantitatively represent different modes of cellular movement. While swimming, gliding, and walking all share features with other types of organismal movement (e.g. the swimming of protists and models such as zebrafish may be treated similarly), crawling can be difficult to model since it involves active changes to the cell’s shape.
To address this, we developed a computational framework for processing, representing, and comparing images of crawling cells. We were able to capture the movement dynamics of diverse cell types in a single ‘movement space.’ Using this space, we discovered that crawling varies broadly along multiple dimensions. We also developed a simple statistic that measures the relationship between variation in cell shape and the types of movement it generates. This work lays the foundation for identifying the mechanistic bases of cellular movement across large evolutionary distances.
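The statistic is not spelled out here, so the following is only a hypothetical stand-in that illustrates the general idea of relating shape change to movement. It assumes each movie frame has already been reduced to a cell centroid and a small vector of shape descriptors (for example, area and aspect ratio), then computes the Pearson correlation between how much the shape changes from frame to frame and how far the centroid moves. The frame format and function names are assumptions for illustration.

```python
"""Toy sketch: how strongly does cell-shape change track cell displacement?

Hypothetical assumptions: every frame is (centroid, shape_features), where the
centroid is an (x, y) position and shape_features is a fixed-length sequence of
shape descriptors. This is an illustrative stand-in, not the published statistic.
"""
import math
from typing import List, Sequence, Tuple

Frame = Tuple[Tuple[float, float], Sequence[float]]


def steps(frames: List[Frame]) -> Tuple[List[float], List[float]]:
    """Per-step centroid displacement and shape change (Euclidean distances)."""
    speeds, shape_changes = [], []
    for (c0, s0), (c1, s1) in zip(frames, frames[1:]):
        speeds.append(math.dist(c0, c1))
        shape_changes.append(math.dist(s0, s1))
    return speeds, shape_changes


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def shape_motion_coupling(frames: List[Frame]) -> float:
    """Correlation between per-step shape change and per-step displacement."""
    speeds, shape_changes = steps(frames)
    return pearson(shape_changes, speeds)


# Usage: a cell whose larger moves coincide with larger shape changes.
frames = [
    ((0.0, 0.0), (1.0, 1.0)),
    ((1.0, 0.0), (2.0, 1.0)),
    ((3.0, 0.0), (4.0, 1.0)),
    ((6.0, 0.0), (7.0, 1.0)),
]
print(shape_motion_coupling(frames))
```

A value near 1 means displacement rises and falls with shape change, while a value near 0 means the two vary independently. Computing a summary like this for many cell types is one simple way to start comparing them on a common scale.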
The vast majority of organisms we are interested in studying lack genetic and molecular toolkits. Most do not have genome sequences. For some, we aren’t even sure whether they are distinct species. We are interested in identifying tools that allow us to rapidly, and comparably, measure and monitor biological processes across taxonomic groups without needing to develop species-specific tools. Label-free imaging methods using vibrational spectroscopy, such as Raman imaging, offer promising alternatives for addressing a number of basic problems in biology. These methods can be non-destructive and do not require dyes, labels, or a priori knowledge. By detecting a wide range of chemical bonds, they can provide rich molecular fingerprints that reflect metabolic or physiological state, cell type, or even species of origin.
Given this last point, we were interested in exploring which aspects of these signals, if any, correlate with phylogenetic relationships between species. We hypothesized that, if we were able to find such associations, then it might be possible to leverage Raman spectra for a variety of uses in our comparative work. To test this idea, we analyzed a publicly available dataset of Raman spectra measured from a variety of clinically isolated bacteria and fungi. As hypothesized, we found that specific portions of the spectra correlated with species relationships. Furthermore, these regions overlapped with variation in the abundance of nucleic acids across the species’ genomes, suggesting that this technology provides a potentially interesting way to measure the relationship between genetic and molecular components of a biological sample.
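One simple way to ask which parts of a spectrum carry phylogenetic signal is to split the spectrum into windows and, within each window, correlate pairwise spectral distances between species with their pairwise phylogenetic distances. The sketch below does this in the spirit of a Mantel test, but without the permutation step a real analysis would need to judge significance. The input formats (one mean spectrum per species on a shared wavenumber grid, phylogenetic distances keyed by species pair) and all names are hypothetical, and this is not necessarily the analysis used in the work described here.

```python
"""Toy sketch: which spectral windows track phylogenetic distance?

Hypothetical assumptions: `spectra` maps species -> mean spectrum (same length,
shared wavenumber grid); `phylo` maps frozenset({sp1, sp2}) -> phylogenetic
distance. No permutation test is done, so the output is descriptive only.
"""
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def window_signal(
    spectra: Dict[str, Sequence[float]],
    phylo: Dict[FrozenSet[str], float],
    width: int,
) -> List[Tuple[int, float]]:
    """Return (window start index, correlation) for non-overlapping windows."""
    species = sorted(spectra)
    n_points = len(spectra[species[0]])
    pairs = list(combinations(species, 2))
    phylo_d = [phylo[frozenset(pair)] for pair in pairs]
    results = []
    for start in range(0, n_points - width + 1, width):
        stop = start + width
        # Euclidean distance between two species' spectra within this window only.
        spec_d = [math.dist(spectra[a][start:stop], spectra[b][start:stop]) for a, b in pairs]
        results.append((start, pearson(spec_d, phylo_d)))
    return results


# Usage: A and B are close relatives; C is more distant.
spectra = {
    "A": [0.0, 0.0, 0.0, 0.0],
    "B": [1.0, 0.0, 3.0, 0.0],
    "C": [3.0, 0.0, 1.0, 0.0],
}
phylo = {
    frozenset({"A", "B"}): 1.0,
    frozenset({"A", "C"}): 3.0,
    frozenset({"B", "C"}): 3.0,
}
for start, r in window_signal(spectra, phylo, width=2):
    print(start, round(r, 3))  # → "0 0.866" then "2 -0.866"
```

Windows with strongly positive correlations are candidate phylogenetically informative regions; with real spectra, one would also need to account for testing many windows at once.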
Genetic variation is the raw material of evolution. Analyses of variation within naturally interbreeding populations therefore make it possible to identify the genetic sources of phenotypic diversity. However, individuals in a population can differ in many ways at once (e.g. in their shapes, diets, reproductive strategies, and so on). To understand the diverse contributions of variants across the genome, we first need to capture the many phenotypes they affect.
With this in mind, we have begun deeply characterizing the phenotypes of two species of unicellular algae: Chlamydomonas reinhardtii and C. smithii. These species can interbreed, and we plan to mate them to generate clonal libraries that we can use to correlate genetic variation with high-dimensional phenotypes. C. reinhardtii is a well-known cell biological model, while the biology of C. smithii is relatively uncharacterized. To empower precision genetic mapping in crosses of these two species, we are beginning to chart their differences across a variety of phenotypes. Our initial efforts have identified differences related to morphology, growth, and physiology. We will be adding to this pub as we measure and compare more phenotypes.
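Once such a library exists, a first-pass genotype-phenotype map can be as simple as a single-marker scan. Because Chlamydomonas progeny are haploid, each strain carries one parental allele at each marker, so a marker’s effect can be summarized as the difference in mean phenotype between strains carrying the C. smithii allele and those carrying the C. reinhardtii allele. The 0/1 allele coding, data layout, and function names below are hypothetical, and a real mapping study would add significance testing and handle many phenotypes at once.

```python
"""Toy single-marker scan over haploid progeny of a two-parent cross.

Hypothetical assumptions: genotypes[marker][i] is 0 (C. reinhardtii allele) or
1 (C. smithii allele) for strain i, and phenotype[i] is one measured trait.
"""
from statistics import mean
from typing import Dict, Sequence


def marker_effects(
    genotypes: Dict[str, Sequence[int]], phenotype: Sequence[float]
) -> Dict[str, float]:
    """Mean phenotype of allele-1 strains minus mean of allele-0 strains, per marker."""
    effects = {}
    for marker, calls in genotypes.items():
        ones = [p for g, p in zip(calls, phenotype) if g == 1]
        zeros = [p for g, p in zip(calls, phenotype) if g == 0]
        if not ones or not zeros:
            continue  # only one allele observed: no contrast possible
        effects[marker] = mean(ones) - mean(zeros)
    return effects


def strongest_marker(effects: Dict[str, float]) -> str:
    """Marker with the largest absolute effect (ties go to the first seen)."""
    return max(effects, key=lambda m: abs(effects[m]))


# Usage: four strains; the trait follows marker m1.
genotypes = {"m1": [0, 0, 1, 1], "m2": [0, 1, 0, 1], "m3": [0, 0, 0, 0]}
phenotype = [1.0, 1.0, 3.0, 3.0]
effects = marker_effects(genotypes, phenotype)
print(effects)                    # → {'m1': 2.0, 'm2': 0.0}
print(strongest_marker(effects))  # → m1
```

Scanning each high-dimensional phenotype this way is crude, but it shows why charting parental differences first matters: a marker can only show an effect for traits on which the parents, and therefore their progeny, actually differ.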
Now that we’ve started making progress on dissecting complex phenotypes, we’re excited to integrate these efforts with genomic data across many different biological scales.
On one hand, we’re searching for biological innovation across billions of years of evolutionary time by generating novel phylogenomic libraries. On the other, we are developing next-generation tools for predicting phenotypes from genotypes (and vice versa).
Longer-term, we are excited to apply these tools to a variety of use cases by identifying which organisms we can leverage, generating robust evolution-informed hypotheses, and, in the process, refining our abilities to ask and answer transformative biological questions.