Designing genome-wide MERFISH probes for understudied species
Researchers studying any organism with genomic data can follow this simple walkthrough to create sets of barcoded probes for the multiplexed FISH technique called MERFISH. We’re sharing interactive code notebooks that can be adapted to design barcoded FISH probes for any species.
Purpose
We’re interested in understanding how RNA trafficking and local translation affect ciliary growth in Chlamydomonas reinhardtii. MERFISH is a technology for visually tracking hundreds to thousands of RNAs at once, so it is an appealing option. However, existing tools for designing the barcoded DNA probes used in MERFISH experiments are not very versatile, and many understudied organisms do not have publicly shared probe sets.
We developed a short bioinformatic workflow to design MERFISH probes for C. reinhardtii, and we’re sharing it here so the community can use it for experiments in other understudied organisms.
We’ve shared all of our code in executable Colab notebooks so users can modify the workflow and run it with their own data on Google’s cloud computing network.
Share your thoughts!
Watch a video tutorial on making a PubPub account and commenting. Please feel free to add line-by-line comments anywhere within this text, provide overall feedback by commenting in the box at the bottom of the page, or use the URL for this page in a tweet about this work. Please make all feedback public so other readers can benefit from the discussion.
We’ve put this effort on ice! 🧊
#StrategicMisalignment
While we were getting MERFISH set up, we shifted our research portfolio. We no longer have a strong need to use MERFISH, so we’ve stopped pursuing this work.
Learn more about the Icebox and the different reasons we ice projects.
Summary
This publication is part of a larger effort at Arcadia to apply spatial genomics techniques, which will allow us to profile RNAs and other biomolecules from tissue sections, cell lines, and unicellular organisms. One of the first steps in a new spatial transcriptomics experiment is to design barcoded probes complementary to RNA molecules in the cell. Here, we are releasing an executable probe design pipeline as a set of Google Colab notebooks. These notebooks serve as a functional example of probe design, stepping through the process to generate MERFISH probes against the C. reinhardtii genome. They allow users to reproduce our work on Google’s free, browser-based virtual computing platform, Colab, providing interactive access for those who want to design probes against new genomes or otherwise modify our workflow. Users can upload their own data, download their results, and make custom changes to the codebase.
The strategy
MERFISH (multiplexed error-robust fluorescence in situ hybridization) is a method that allows researchers to simultaneously image hundreds to thousands of RNA transcripts or other molecules within single cells. The protocol relies on sets of barcoded “encoding probes” that hybridize to target biomolecules and are subsequently identified through the sequential binding of a set of fluorescent “readout” probes.
The problem
There are great tools to generate MERFISH probes, but they aren’t always set up to work with new genomes. Our first target organism, Chlamydomonas reinhardtii, does not yet have a set of designed MERFISH probes.
Our solution
Here, we apply the PaintSHOP pipeline [1] for FISH probe design to generate a genome-wide panel of MERFISH probes for Chlamydomonas reinhardtii. We provide our workflows in the form of interactive, browser-based code notebooks that can be used by researchers to generate MERFISH probes for any genome of interest. These executable notebooks can be readily modified or improved to support any number of applications in the rapidly developing field of spatial genomics and multiplexed in situ hybridization.
The resource
The process to create gene-specific MERFISH probe sets for Chlamydomonas is summarized below. By supplying appropriate genome and genome annotation files (FASTA and GTF, respectively), as well as gene names or target genomic coordinates, users can easily modify these self-contained notebooks to design MERFISH probes against any genome of interest. The core of the workflow is the PaintSHOP pipeline for FISH probe design, but the notebooks also contain all code necessary to download packages and data, install software, execute the probe design workflow, and download and verify the resulting designs. The first of two notebooks runs the PaintSHOP pipeline on a new genome of interest, generating a genome-wide set of DNA and RNA FISH probes (Figure 1). Researchers can use the second notebook for individual MERFISH experiments. It designs probes for a given experiment by filtering for a set of target genes, appending MERFISH barcodes and PCR handles, and finally running BLAST against the target genome to check for off-target binding events.
These notebooks make it easy for us to share our workflows in a more permanent and flexible way than traditional bioinformatics software. They are one example of containerized, cloud-based scientific computing. This model of scientific computing is focused on cloud datasets rather than physical computing hardware. Users attach scalable virtual machines to their datasets and build lightweight environments in which to carry out actions on the data. In this way, data is more readily accessed from different physical endpoints or different users. Clean, reproducible computing environments greatly reduce the effort required to adopt code written by others.
Process overview
Designing target probes
Generate genome-wide probes
First, we generate probe sequences against an entire target genome. Our first Colab notebook uses the PaintSHOP pipeline to generate genome-wide probes using C. reinhardtii genome FASTA and GTF files.
Select targets
Next, we must choose the set of target genes or genome regions that we'd like to detect in our eventual MERFISH experiment. Our second notebook filters the genome-wide encoding probe set based on user-specified target genes. This example uses a small set of genes associated with ciliary assembly in Chlamydomonas. In practice, the 16-bit MHD4 barcode set used in these notebooks can barcode up to 140 separate genes or genomic targets.
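The 140-target ceiling follows from a counting argument. Each barcode in a 16-bit code with four “on” bits covers 4 of the 560 possible triples of bit positions, and two barcodes at Hamming distance 4 or more can share at most two “on” bits, so no triple is covered twice: 560 / 4 = 140. The sketch below builds one codebook with these parameters and assigns barcodes to targets. It is an illustrative construction under our own assumptions and is not necessarily the codebook shipped in the notebooks; the function names and gene names are hypothetical.

```python
from itertools import combinations
from math import comb

N_BITS = 16  # bits per barcode
WEIGHT = 4   # number of "on" bits per barcode


def build_codebook():
    """Return 16-bit, weight-4 barcodes with pairwise Hamming distance >= 4.

    Illustrative construction (an assumption, not the notebooks' method):
    every set of four distinct bit positions {a, b, c, d} in 0..15 with
    a ^ b ^ c ^ d == 0 becomes one barcode. Any three positions determine
    the fourth, so two different barcodes share at most two "on" bits.
    """
    blocks = set()
    for a, b, c in combinations(range(N_BITS), 3):
        d = a ^ b ^ c  # the unique fourth position that completes the block
        blocks.add(frozenset((a, b, c, d)))
    return sorted(
        "".join("1" if i in block else "0" for i in range(N_BITS))
        for block in blocks
    )


def hamming(x, y):
    """Count the positions at which two equal-length barcodes differ."""
    return sum(p != q for p, q in zip(x, y))


def assign_barcodes(genes, codebook):
    """Give each target its own barcode; fail if the codebook is too small."""
    if len(genes) > len(codebook):
        raise ValueError("more targets than available barcodes")
    return dict(zip(genes, codebook))


codebook = build_codebook()
upper_bound = comb(N_BITS, 3) // comb(WEIGHT, 3)
min_distance = min(hamming(x, y) for x, y in combinations(codebook, 2))

# Hypothetical target names, for illustration only.
assignments = assign_barcodes(["geneA", "geneB", "geneC"], codebook)

print(len(codebook), upper_bound, min_distance)
```

Because every pair of barcodes differs in at least four positions, a single misread bit can be corrected and a double error can still be detected.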
Append MERFISH sequences
Next, we append MERFISH sequence regions to the oligos we’ve designed so far. This includes one to four MERFISH readout regions (where fluorescent probes bind) as well as PCR primer binding sites for amplification and manipulation of the pooled MERFISH oligo library.
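As a rough sketch of this assembly step, the code below concatenates a target-binding region with readout regions and PCR handles. The layout (readout regions split on either side of the target region) and all sequences are placeholder assumptions chosen for illustration, and `assemble_encoding_probe` is a hypothetical helper rather than a function from the notebooks.

```python
VALID_BASES = set("ACGT")


def assemble_encoding_probe(target_region, readout_regions, fwd_handle, rev_handle):
    """Concatenate one encoding oligo, written 5' to 3'.

    Assumed layout: forward PCR handle, first half of the readout regions,
    target-binding region, remaining readout regions, reverse PCR handle.
    """
    if not 1 <= len(readout_regions) <= 4:
        raise ValueError("expected one to four readout regions")
    split = (len(readout_regions) + 1) // 2
    parts = [
        fwd_handle,
        *readout_regions[:split],
        target_region,
        *readout_regions[split:],
        rev_handle,
    ]
    for part in parts:
        if not part or set(part.upper()) - VALID_BASES:
            raise ValueError(f"not a plain DNA sequence: {part!r}")
    return "".join(part.upper() for part in parts)


# Placeholder sequences, invented for this example only.
FWD_HANDLE = "GGCTACGTCAGTTAGCCTAG"
REV_HANDLE = "CCATGACTTGCAGGTACCAT"
READOUTS = ["TTGACCGTAAGCTTCAGGAC", "GATCCTAGGCATTCGAACTG"]
TARGET = "ATGGCTCAGAAGCTGACCGATCTGGTTGCA"

probe = assemble_encoding_probe(TARGET, READOUTS, FWD_HANDLE, REV_HANDLE)
print(len(probe), probe)
```

Which readout regions go on a given probe would normally follow from the “on” bits of the barcode assigned to its target gene.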
Do quality control
Our second notebook installs the BLAST CLI and builds a local BLAST database from the C. reinhardtii genome. Using BLAST, any off-target binding sites can be identified, and a BED file can be created to visually inspect the designed probes using a genome browser.
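The sketch below shows one way such a check could work, assuming BLAST was run with its standard 12-column tabular output (`-outfmt 6`). The helper names, the 80% length threshold, and the example hits are our own illustrative assumptions, and the options used in the notebook may differ.

```python
from collections import defaultdict

# Default column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_COLUMNS = ["length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"]
FLOAT_COLUMNS = ["pident", "evalue", "bitscore"]


def parse_blast_tabular(text):
    """Parse tab-separated BLAST hits into a list of dictionaries."""
    hits = []
    for line in text.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        hit = dict(zip(BLAST_COLUMNS, line.split("\t")))
        for key in INT_COLUMNS:
            hit[key] = int(hit[key])
        for key in FLOAT_COLUMNS:
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits


def to_bed_line(hit):
    """Convert a hit to BED (0-based start, end-exclusive, with strand)."""
    start, end = sorted((hit["sstart"], hit["send"]))
    strand = "+" if hit["sstart"] <= hit["send"] else "-"
    return "\t".join([hit["sseqid"], str(start - 1), str(end), hit["qseqid"], "0", strand])


def flag_off_target(hits, probe_lengths, min_fraction=0.8):
    """Return probes with more than one hit covering most of their length."""
    counts = defaultdict(int)
    for hit in hits:
        if hit["length"] >= min_fraction * probe_lengths[hit["qseqid"]]:
            counts[hit["qseqid"]] += 1
    return sorted(probe for probe, n in counts.items() if n > 1)


# Made-up hits for two hypothetical 30-nt probes.
rows = [
    ("probe_1", "chromosome_1", "100.000", "30", "0", "0", "1", "30", "1001", "1030", "1e-10", "60.0"),
    ("probe_2", "chromosome_2", "100.000", "30", "0", "0", "1", "30", "5030", "5001", "1e-10", "60.0"),
    ("probe_2", "chromosome_7", "96.667", "30", "1", "0", "1", "30", "200", "229", "1e-08", "52.0"),
]
example_output = "\n".join("\t".join(row) for row in rows)

hits = parse_blast_tabular(example_output)
bed_lines = [to_bed_line(hit) for hit in hits]
flagged = flag_off_target(hits, {"probe_1": 30, "probe_2": 30})
print(flagged)
print("\n".join(bed_lines))
```

BLAST reports 1-based, inclusive subject coordinates and lists minus-strand hits with the start greater than the end, so the conversion sorts the coordinates and subtracts one from the start to match the BED convention.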
Designing readout probes
Readout probes are complementary to the synthetic readout regions appended to the target sequences in the “Append MERFISH sequences” step above. In most cases, we use disulfide-bridged oligos to bind directly to the encoding probe. These can be easily cleaved with a DTT wash between rounds of staining and imaging.
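In sequence terms, a readout probe is the reverse complement of the readout region it binds. The minimal sketch below derives one from a placeholder readout region; the sequence and function names are hypothetical. The fluorophore and disulfide linker are chemical modifications specified when ordering the oligo, so they are not represented in the sequence.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_readout_probe(readout_region):
    """Return the oligo (5' to 3') that base-pairs with a readout region."""
    return reverse_complement(readout_region)


# Placeholder readout region, invented for this example only.
readout_region = "TTGACCGTAAGCTTCAGGAC"
readout_probe = design_readout_probe(readout_region)
print(readout_probe)
```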
Colab Notebook #1: Designing probes with PaintSHOP on Colab
This interactive notebook generates new FISH probes against a genome of interest. As a demonstration of our own workflow, the notebook downloads data from Arcadia’s public Amazon S3 bucket arcadia-merfish and generates probes against the Chlamydomonas reinhardtii genome. To run the notebook, use the “Open in Colab” button below and then select Runtime > Run all, or use Shift+Enter to execute individual code blocks.
You can also view the workflow below:
Colab Notebook #2: Encoding probes filtering and quality control
This interactive notebook processes FISH probes for individual MERFISH experiments. It performs the target selection, appending, and quality control steps described under “Process overview” above. Our demonstration downloads genomic data from our arcadia-merfish S3 bucket, and we show how users can download their own designs.
You can also view the workflow below:
At the end of this process, users will have a set of probes that they can order and use in MERFISH experiments.
Key takeaways
Bioinformatics has traditionally relied on large workstations or clusters to store and process individual datasets, leading to massive redundancy and reduced interoperability of scientific software. A new model for scientific computing is emerging, based on cloud storage of datasets and flexible virtual computers, obviating the need for extensive physical computing resources. Here, we present an example of reproducible scientific computing relying on containerized workflows and virtual machines. We have used this resource to generate FISH probes against the Chlamydomonas reinhardtii genome at a range of model temperatures. We provide our inputs and outputs, allowing anyone to reproduce our work and modify the workflow.
Next steps
We will continue to develop these workflows to incorporate alternative probe design engines or alternative encoding probe architectures and barcode sets. We’re already using these probes in MERFISH experiments at Arcadia as part of our spatial genomics project and cell biology efforts, and will report our results when we have them. The PaintSHOP pipeline does not currently support overlapping oligo designs, although it has been demonstrated that overlaps of up to 20 nucleotides can be used, increasing oligo density severalfold. We may explore pipelines with support for overlapping probes in the future.
Do you have any tools that you find useful for spatial genomics probe design? Would you like to contribute to these notebooks? Are you able to use this workflow successfully in your own work? What could make it easier to use? Comment with your feedback!
I really liked this publication. Short, with simple language, links to resources like videos (God, I wish academia adopted this!!), and a useful, freely accessible tool for those who want to adopt MERFISH for their model organism of choice. The only things I think the article would benefit from are (1) a schematic of the features of general MERFISH probes - a picture speaks a thousand words, after all, and (2) a short description (even 1-2 sentences should suffice) of how the PaintSHOP pipeline works - something like “it is a pipeline that discovers probe sites using thermodynamic calculations and then uses some machine learning algorithms to prune them”. Or words to that effect. I think it will give a little more context to how the pipeline works behind the scenes and motivate why you adopted this pipeline.