We believe that ticks have much to teach us about our own biology. Ticks are ancient parasites that co-evolved with humans and have developed sophisticated molecular toolkits for extracting blood through our skin. Ticks feed on the same host for days at a time while evading detection. In contrast, other blood-feeding parasites, including mosquitoes and fleas, only need to feed for seconds at a time and can be detected as soon as they breach the skin.
This suggests that ticks have evolved extremely sophisticated ways of manipulating all kinds of processes in our skin barrier (sensory pathways, immune response, blood flow, wound healing, etc.). We are interested in mining tick saliva for the molecules that carry out these functions so that we can leverage them for novel therapeutic interventions. To do this, we need some basic tools in place to assay for the activities we are interested in and to identify the specific molecules responsible for these activities.
Develop mass spectrometry pipeline for protein identification (proteomics)
Develop mass spectrometry pipeline for small molecule identification (metabolomics)
Develop assays to screen for activities of interest
A few guiding concepts in how we’re thinking about this project and why we’re sharing each part of our work:
It’s very hard to do mechanistic work in new organisms without some basic omics in place, but usually the people exploring new organisms don’t have the resources or expertise to perform sophisticated omics analyses or know what tradeoffs exist at each step.
This work isn’t always documented.
Different groups use different references, which makes it hard to compare results. Many scientists don’t even realize the degree to which reference quality for proteomics can totally change what the results look like — are we even comparing apples to apples?
One of the first things we had to do was figure out how to do molecule discovery in tick species for which there was very little existing omics information. Most of the omics work in ticks has focused on one species, due to its critical role as a disease vector: the blacklegged tick (or deer tick), Ixodes scapularis, which transmits the bacterium that causes Lyme disease. For our purposes, we want to survey a broader range of tick species to access a bigger pool of bioactive molecules that regulate our skin, focusing on species known to attach and feed on humans. For many of these, such as Amblyomma americanum, there was not a fully assembled and annotated genome. This inherently limits the kind of proteomic analyses we can do, given that mass spectrometry-based identification of proteins relies on the mapping of protein fragments (peptides) against protein sequences that are typically predicted from genomic information. Thus, we had to solve this problem quickly. We also had to figure out the most generalizable way to do it so we could survey broadly across many tick species.
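To make the dependence on a reference concrete, here is a minimal Python sketch of the idea. It is a toy, not the search engine used in this work: real tools score observed spectra against theoretical fragment masses, whereas this sketch matches peptide strings exactly. The function names, the simplified trypsin rule (cleave after K or R unless followed by P), and the sequences are all illustrative assumptions.

```python
def tryptic_digest(protein, min_len=6):
    """Cut a protein after K or R, except when the next residue is P.

    This is a simplified trypsin rule with no missed cleavages (assumption).
    """
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return [p for p in peptides if len(p) >= min_len]


def build_peptide_index(proteins):
    """Map every reference peptide to the set of proteins that contain it."""
    index = {}
    for name, seq in proteins.items():
        for pep in tryptic_digest(seq):
            index.setdefault(pep, set()).add(name)
    return index


def map_peptides(observed, index):
    """Return the reference proteins that could explain each observed peptide."""
    return {pep: sorted(index.get(pep, ())) for pep in observed}


if __name__ == "__main__":
    # Hypothetical reference and observations, for illustration only.
    reference = {"protA": "MKTAYIAKQRPLSDEEKGGHHHHHHK"}
    index = build_peptide_index(reference)
    print(map_peptides(["TAYIAK", "SALIVAPEPTIDEK"], index))
```

The second observed peptide comes back with an empty list: if a protein is missing from the reference, its peptides cannot be assigned, no matter how good the spectra are.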
First, we considered sequencing the tick genomes. Unfortunately, tick genomes tend to be several gigabases in size (similar to the human genome), so we decided to avoid genome sequencing and assembly in favor of a simpler approach. After conversations with Joan Wong (CZ Biohub) and Elizabeth Tseng (Pacific Biosciences), we saw a path forward through long-read transcriptome profiling using Pacific Biosciences’ Iso-Seq method. This approach to RNA sequencing provided us with the full structures of tick transcripts without assembly. We could then use these transcripts to identify protein-coding sequences, which formed the basis of our proteomics database.
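The step from transcripts to protein-coding sequences can be sketched in a few lines of Python. This is not the pipeline used for this work; it is a minimal illustration that assumes transcripts are already in the sense orientation, uses the standard genetic code, and keeps only the longest complete open reading frame (ATG through a stop codon) per transcript. The function names and the `min_aa` cutoff are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def longest_orf(transcript, min_aa=30):
    """Return the longest M-to-stop protein in the three forward frames.

    Returns None if no complete ORF reaches min_aa residues.
    """
    seq = transcript.upper().replace("U", "T")
    best = ""
    for frame in range(3):
        protein = "".join(
            CODON_TABLE.get(seq[i:i + 3], "X")  # X marks ambiguous codons
            for i in range(frame, len(seq) - 2, 3)
        )
        # Drop the final segment: it is not terminated by a stop codon.
        for segment in protein.split("*")[:-1]:
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best):
                best = segment[start:]
    return best if len(best) >= min_aa else None


def build_protein_db(transcripts, min_aa=30):
    """Turn {transcript_id: nucleotide sequence} into {transcript_id: protein}."""
    db = {}
    for tid, seq in transcripts.items():
        protein = longest_orf(seq, min_aa)
        if protein is not None:
            db[tid] = protein
    return db
```

Requiring a stop codon is a conservative choice that discards truncated transcripts. A real workflow would likely also handle partial ORFs and collapse redundant isoforms before building the database.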
The method we ultimately developed should be broadly useful for any species that lacks reference omics data. Read more and access a detailed protocol here:
Figuratively speaking, this was our first time at the rodeo. We were excited when the slew of data arrived, but we knew little about assessing genome or transcriptome completeness. Juliana Gil (UC Davis) came to the rescue and taught us how the BUSCO software package could help our efforts. BUSCO indicated that our transcriptome and the associated proteome were mostly complete. We used our new proteome dataset to map peptides to mass spectra in order to identify proteins in our own complex Amblyomma americanum lysate. Gratifyingly, we were able to make a number of peptide-mass spectrum assignments that were not previously possible.
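For readers new to completeness assessment, the arithmetic behind a BUSCO-style score is simple: a set of expected single-copy orthologs is searched for, each is classed as complete (single-copy or duplicated), fragmented, or missing, and the counts are reported as percentages. The Python sketch below shows only that arithmetic. It does not run BUSCO, the function name is hypothetical, and the counts in the example are invented for illustration; they are not our results.

```python
def busco_style_summary(single, duplicated, fragmented, missing):
    """Summarize ortholog counts as percentages in a compact one-line string."""
    total = single + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("no orthologs were searched")

    def pct(n):
        return round(100 * n / total, 1)

    complete = single + duplicated
    return (
        f"C:{pct(complete)}%[S:{pct(single)}%,D:{pct(duplicated)}%],"
        f"F:{pct(fragmented)}%,M:{pct(missing)}%,n:{total}"
    )


if __name__ == "__main__":
    # Invented counts for illustration only.
    print(busco_style_summary(single=900, duplicated=50, fragmented=20, missing=30))
```

A high duplicated fraction is expected for a transcriptome, since multiple isoforms of the same gene can each match one ortholog.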
Read more about our lone star tick dataset and access the data itself here:
In addition to soliciting direct comments on our pubs, we decided to speak directly with other scientists in the tick community.
On June 8, 2022, we hosted a Zoom session with tick researchers to discuss our recent pubs, pain points in tick omics, and possible future solutions. We also discussed the results of our Twitter poll asking which tick species we should apply omics tools to next (the winner: Ixodes scapularis). We're super grateful to Januka Athukoralage, Matthew Butnaru, Nsa Dada, Joao Pedra, Agustin Rolandelli, and Isobel Ronai for their time and for all of their great input!
Below is a distillation of the items we touched on in problem-solution format.
Problem: Several hundred tick species are known, but Ixodes scapularis is the best studied. Based on our poll results, the community still wants a high-quality reference genome and accompanying transcriptomes because the current reference datasets are inadequate. These types of references are generally scarce or non-existent for other tick species.
Solution: The Pal lab is currently developing new references that could significantly improve the Ixodes scapularis landscape. These references could provide exactly what our poll respondents are requesting, which means we may be able to move on to the second most voted-on tick species: Ixodes pacificus.
Problem: Genome and transcriptome annotations are important for advancing tick biology at a molecular level. High-quality annotations are not broadly available, which is a major bottleneck for tick biology (this is true even for Ixodes scapularis).
Solution: Long-read transcriptomics tools might provide better information about gene structures. Additionally, there may be a way to work backwards using proteomics data to improve annotations at the transcript and genome levels. This is something we can begin exploring in the near future.
Problem: Protein functional annotations are of mixed quality, as tick genomes encode many proteins of unknown function.
Solution: Better gene annotations would enable better high-throughput functional assays for characterizing genes of unknown function.
Problem: There seems to be a reluctance to explore the biology of other tick species and, more broadly, non-model organisms. This may be due to biases carried over from prior training experiences and funding pressures. Without a critical mass of researchers building and sharing tools and data, it can be daunting to establish the groundwork necessary to study a new organism in satisfying depth.
Solution: A fellowship program encouraging postdoctoral-level trainees to build tools at Arcadia could help lower barriers to the study of non-model organisms. Researchers could carry these tools forward, giving them a head start in career development.
Ultimately, we’ve realized that there may still be many proteins in our lysates that remain unidentified if we rely solely on our transcriptome as a reference. This could be a consequence of inadequate transcriptome sampling depth or because those proteins’ corresponding genes were not actively transcribed during our tissue harvest. We decided to use PacBio HiFi long-read sequencing to generate a reference genome for Amblyomma americanum, which will hopefully enable a more complete decoding of the proteomics mass spectra we’ve obtained so far. We present our first draft assembly in this pub: