Defining actin: Combining sequence, structure, and functional analysis to propose useful boundaries

Feridun Mert Celebi; Taylor Reiter; Prachee Avasthi; Megan L. Hochstrasser; Brae M. Bigge

doi:10.57844/arcadia-ynth-kh70

Idea Feedback requested Revised after community feedback Annotation: Mapping the functional landscape of protein families across biology

Published on Dec 01, 2022 by Arcadia Science

The process of deciding whether a candidate actin homolog represents a “true” actin is tricky. We propose clear and data-driven criteria to define actin that highlight the functional importance of this protein while accounting for phylogenetic diversity.

Purpose

To learn about a protein’s function and regulation across a broad range of species, you must define which of the many potentially related proteins you’re going to count as homologs and where the line between true homologs and other proteins exists. Then, understanding the proteins that exist at this boundary can help identify novel functions and regulation, as well as insights into how the protein family evolved. Determining whether a protein fits within a particular protein family requires characterizing the sequence, structure, and importantly, the function of that protein.

We’ve outlined a series of well-defined and testable criteria for determining whether a candidate actin is a “true” actin as opposed to an actin-related protein or an actin-like protein. Using these criteria, we created a pipeline to computationally analyze candidate actins. We ran almost 50,000 candidate actins through this pipeline and, among other things, found that global sequence conservation and functional analysis showed a distinct cluster of true actins.

These criteria and the pipeline we developed to analyze them might be useful for anyone studying “fringe” actins. We would love feedback on whether you think these criteria are sufficient, if there are other criteria we should include, and what might make this pipeline more useful for your own work.

This pub is part of the project, “Mapping the evolution of interactomes.” Visit the project narrative for more background and context.
All associated code is available in this GitHub repository.
The data outputs from our actin identification pipeline are available here.

Share your thoughts!

Feel free to provide feedback by commenting in the box at the bottom of this page or by posting about this work on social media. Please make all feedback public so other readers can benefit from the discussion.

Motivation

Think about your favorite protein. Maybe you study how it functions in the cell, but proteins do not exist in a vacuum. Proteins are regulated in many ways, including by other proteins. For example, actin can spontaneously polymerize, but it does so much faster and into defined networks for specific functions with the help of other proteins called nucleators. Studying protein–protein interactions within a single species or cell type only gives you a fraction of the full picture, because proteins and functions evolve over time, gaining and losing characteristics and behaviors.

To deeply understand how a protein functions, it is important to uncover how that protein and its regulation changed throughout the tree of life. For instance, an actin nucleator called the Arp2/3 complex requires activation by nucleation-promoting factors, many of which have been found outside of mammalian cells. If we’d only ever studied actin in mammalian cells, we would have missed these important insights into actin regulation. That said, you can’t compare interactomes across diverse species unless you are confident that the thousands of sequence variations you find for your central protein of interest are really homologs of that protein and not look-alikes with different functions and regulation that interact with a different set of proteins.

So, you take your candidate protein and its amino acid sequence, and you want to determine if it really is a member of your favorite protein family. The gold standard would be to understand how closely the sequence, structure, and function of the candidate protein match your main protein of interest using computational and experimental tools. You run it through a sequence search, like BLAST, which gives you a long list of related proteins, all based on the sequence [1][2]. With the recent advances in AlphaFold and structural comparison, you might also run the predicted structure of your favorite protein through a comparison search, and that can tell you about the structural similarity of your protein to others [3][4]. But these both just give you a list of proteins and scores. How do you know when those scores mean that a protein is similar enough to be relevant or considered a homolog, yet still different enough to be interesting and likely to have novel interactions or functions?

Another important characteristic in wading through possible homologs is protein behavior, or function. Typically this is done in an experimental setting, where first you identify an intriguing potential homolog based on sequence or structure and then investigate it in your experimental system. For example, if you think that a particular protein might be involved in cell division, you might mutate that protein in your cells and see if it affects division, or tag that protein and see where it localizes when cells divide. This can tell you a lot of information about your protein, but it is generally pretty low-throughput and requires a lot of time and effort.

We faced this issue for one particular protein — actin. We’re sharing our solution here so that others can provide feedback and in case our general approach to including and excluding candidate members of a protein family is useful for other proteins.

Our use case: Actin

As part of our project to map protein interactomes, we wanted to learn more about our favorite protein, actin. Actin is a cytoskeletal protein that is required for a long list of cellular functions that are essential for life, and it is sorted into these diverse functions through its interaction with actin-binding proteins (this list is not exhaustive and is periodically updated and refined). Because it’s important for so many functions, actin is generally well-conserved and present throughout the tree of life. However, most of what we know about actin comes from cells that represent a relatively small sliver of the tree of life, mostly Opisthokonts (amoebae, fungi, and animals) with very highly conserved actins. Because of this, our rules about what makes an actin an actin might be incomplete. This means that we are missing out on important data about actin, its functions in the cell, and what determines which of these many functions actin will perform in a given species or at a given time. We are also potentially missing out on how we might be able to re-engineer cellular functions based on the regulation of actin and what is possible in a wide range of organisms. Therefore, we want to look at actins that lie right on the boundary between “true” actins and actin-like or actin-related proteins.

Unsurprisingly, the first thing we did was a BLAST search against the NCBI non-redundant database using human ß-actin. This gave us a list of about 50,000 related proteins, but we realized there are no clear rules about how similar an actin has to be to our model actins to be considered a true actin. We did structure searches and found a similar problem. None of this really told us if the proteins we were looking at were similar enough to be considered actins but different enough to potentially provide new insights.

This is complicated by the presence of actin-related proteins and actin-like proteins. Actin-related proteins, or ARPs, are a class of proteins found across cell types that are highly similar to conventional actin, but that have different cellular functions, different abilities to polymerize, and are generally only found in cells that express a separate, primary actin. ARP1, for example, is part of the dynactin complex that forms with dynein. ARP1 is able to form short filaments within the complex, but is unable to form longer independent filaments. ARP2 and ARP3 are part of the Arp2/3 complex, which nucleates new branched actin filaments. They serve as the first two subunits of the newly forming actin filament. Other ARPs are important for chromatin remodeling and mitochondrial dynamics [5].

Actin-like proteins are present in cells that already express a primary actin, or in non-eukaryotic cells. An example of the former is the novel actin-like protein 1 (NAP1) in Chlamydomonas reinhardtii, Volvox carteri, and other closely related algae that encode a primary actin. Chlamydomonas NAP1 is roughly 60% identical to mammalian actin, while the primary actin is closer to 90% [6][7]. Non-eukaryotic examples are actin-like proteins in archaea, including Crenactin and Lokiactin, and bacteria, including MreB and ParM.

There are no clear rules for when a candidate homolog should be considered an actin, an actin-related protein, or an actin-like protein. To ensure that our rules defining a protein as a true actin are not incomplete or inaccurate, leading to holes in our understanding of actin biology, we aim to simplify this problem. Here, we work towards defining a “true” actin by creating a set of clear, easily testable, and quantifiable criteria. Beyond actin, we hope that this general workflow and the idea of using quantitative measures of similarity across sequence, structure, and function to define protein families will be broadly useful.

The proposed criteria and our actin identification pipeline

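The quantifiable criteria we propose to define actin are sequence conservation, structural conservation, and conservation of function (polymerization and ATPase activity). Our analysis of each criterion appears in the Findings section.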

In narrowing down our initial list, we considered a few important things. First, the importance of each criterion to the overall function of the protein helped us determine which criteria really mattered. We were more likely to consider criteria that are very important for actin function, like polymerizability, than those that do not necessarily influence the function of the protein, like phylogeny. Additionally, we selected criteria that we could easily determine for our candidate actins. We can determine most of the criteria using computational tools or simple experiments.

Using the three criteria above and each step described below, we created a streamlined and efficient pipeline that tells us the likelihood that a protein of interest is a true actin. While other tools allowed us to look at global sequence identity or structural identity independently, this pipeline considers sequence and structural identity together as well as important functional properties and their conservation (Figure 1).

**The actin prediction pipeline**.
To investigate whether our proteins of interest are true actins, we analyzed a group of actins based on their sequence, structure, and function.

Computational method for actin identification

Briefly, we used the pipeline to perform a global sequence analysis by comparing query proteins to a multiple sequence alignment containing frequently studied actins that we know polymerize using MAFFT [8][9]. We also used the actin PFAM profile to determine if the proteins of interest were members of the actin family using the hmmer3 package [10][11]. Next, we determined structural conservation by comparing structural models of query proteins that were determined using AlphaFold to a known actin structure using the Foldseek program [3][4][12]. Finally, we looked at specific actin functions by aligning query proteins to human ß-actin labeled with specific residues that are known to be important in either polymerization of the protein or its ATPase function again using MAFFT [8][9]. More information on this pipeline to investigate the “actin-ness” of a particular protein of interest can be found in subsequent sections and on GitHub.
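The tool steps summarized above can be sketched as three command lines. This is a minimal illustration, not the commands used in the published pipeline: the file names, the `build_commands` helper, and the choice of `hmmsearch` (rather than another HMMER program) are assumptions made for the example. The function only assembles argument lists, so it runs without any of the tools installed.

```python
def build_commands(query_fasta, reference_alignment, pfam_hmm,
                   query_structure, reference_structure, out_dir="results"):
    """Assemble illustrative command lines for the three tool steps.

    All paths are hypothetical placeholders. Nothing is executed here;
    pass a list to subprocess.run() if the tool is installed.
    """
    return {
        # Add the query to an existing alignment of known actins,
        # keeping the alignment length fixed. MAFFT writes to stdout.
        "mafft": ["mafft", "--add", query_fasta,
                  "--keeplength", reference_alignment],
        # Search the query sequences with the actin profile HMM and
        # write a per-target table of hits.
        "hmmsearch": ["hmmsearch", "--tblout", f"{out_dir}/pfam_hits.tbl",
                      pfam_hmm, query_fasta],
        # Compare a predicted query structure to a reference structure;
        # the result table includes an E-value column.
        "foldseek": ["foldseek", "easy-search", query_structure,
                     reference_structure, f"{out_dir}/foldseek.m8",
                     f"{out_dir}/tmp"],
    }


if __name__ == "__main__":
    for name, argv in build_commands(
            "query.fasta", "known_actins.aln", "Actin.hmm",
            "query_model.pdb", "1j6z.pdb").items():
        print(name, "->", " ".join(argv))
```

Keeping each step as a plain argument list makes it easy to swap in different reference files when adapting the idea to another protein family.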

All code generated and used for the pub is available in this GitHub repository (DOI: 10.5281/zenodo.7384386), including all of the processes summarized in Figure 1.

Applying the pipeline

While we use this specific pipeline to look at actins, the idea behind this pipeline is broadly applicable to other proteins. Coupling sequence and structure analysis together in a fast and efficient pipeline and adding in a functional component can help better define various families of proteins and can help researchers determine whether or how their proteins of interest fit within those families.

Using this pipeline, we performed analyses of all of the candidate actins that came up when we did a BLAST search of human ß-actin (limiting the output to the first 50,000 sequences). Of these 50,000 initial BLAST matches, 2,363 failed to download from NCBI with eutils (error invalid uid), returning empty FASTA files. So, we analyzed 47,634 candidate actins. We outline our key findings in the next section.

Findings

Sequence conservation shows clustering of “true” actins and other proteins

Generally, a protein’s global sequence identity to a known protein is used to determine its divergence or similarity to other proteins. The amino acid sequence, or the primary structure, helps determine how the protein will assemble into its secondary structure. It is also important for interacting with other proteins, with other monomers in the case of actin, and with other molecules in the cell, like ions and small molecules. Thus, looking at the global sequence similarity can be a useful metric. However, this metric alone ignores other ways in which proteins can be similar, so it will likely miss some proteins that are related. It could also give rise to spurious relationships between proteins that have actin-like sequences but do not function like actin.

Most actins consist of about 375 amino acid residues. Humans have six actins, which, compared to each other, are at least 93% identical (Figure 2, A) [13]. Most of the differences in these sequences appear at the extreme N-terminus, where these differences cause differential regulation due to their post-translational modifications. On the other end of the spectrum, the most divergent eukaryotic actin currently characterized belongs to the single-celled parasite, Giardia, coming in at roughly 58% sequence identity compared to human actin [14]. Between Giardia and humans lies a wide spectrum of actins that could be potential goldmines for better understanding actin biology, and this doesn’t even consider the vast array of actin family proteins that exist outside Eukarya. This again underscores the need to clearly define actin-related proteins, actin-like proteins, and true actins.

We approached this issue in two ways. First, we used MAFFT to create a multiple sequence alignment that consists of extensively studied, known actins that function as we would expect, including human actins, yeast actins, and several other conventional actins (Figure 2, A) [8][9]. We then aligned each of our actins of interest (reminder — these are the top 47,634 results from running human ß-actin through BLAST) to this multiple sequence alignment and calculated an average pairwise identity. This tells us how conserved our actins of interest are to a set of known, well-studied conventional actins. Next, we looked at the conservation of our actins of interest in relation to the actin family of proteins by comparing our sequences to the actin PFAM profile using hmmer3 [10][11]. This primarily tells us whether a given candidate actin fits into the broader family of actin proteins.
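The average pairwise identity described above can be sketched as follows. This is a minimal illustration that assumes the query has already been aligned to the reference alignment, so all rows have the same length. The exact identity definition is an assumption: here, columns where both sequences have a gap are skipped, and columns where only one has a gap count as mismatches. The pub does not state which convention the pipeline uses.

```python
from statistics import mean

GAP = "-"


def pairwise_identity(a: str, b: str) -> float:
    """Percent identity between two rows of the same alignment.

    Assumed convention: gap-gap columns are ignored; a gap opposite a
    residue counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == GAP and y == GAP:
            continue  # column carries no information for this pair
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def average_identity(query: str, references: list) -> float:
    """Mean percent identity of one aligned query against every reference row."""
    if not references:
        raise ValueError("need at least one reference sequence")
    return mean(pairwise_identity(query, ref) for ref in references)


if __name__ == "__main__":
    # Toy aligned rows, not real actin sequences.
    refs = ["MDDDIAALVV", "MDEDIAALVV"]
    print(average_identity("MDDDIAALVI", refs))
```

Averaging over several well-studied references, rather than comparing to a single one, keeps the score from depending on which conventional actin happens to be chosen.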

Because we identified proteins based on sequence similarity, we found that all of the query proteins we analyzed do align well with the actin PFAM profile and therefore do fit into the actin family. Next, we determined the average global sequence identity of each query protein compared to the multiple sequence alignment in panel A (Figure 2, B–C). Average global sequence identity of the query proteins ranged from about 25% to nearly 100% (Figure 2, B–C). The data appear to be multimodal, with about 4–6 peaks and a noticeable transition between about 60% and 70% (Figure 2, C).

**Global sequence conservation shows clustering of actins into bona fide actins and other proteins**.
A) The multiple sequence alignment used for this step of the analysis containing known actins that are well-studied and that have been shown to polymerize normally.
B) The average global sequence identity of all of the actins shown compared to the multiple sequence alignment shown in A. AU refers to arbitrary units; the x-axis in this case is each of the individual actins.
C) Frequency distribution of the data graphed in B showing clear clustering of query proteins into true actins and other proteins.

Structural conservation does not align well with sequence conservation, highlighting the need for multiple analyses

Structural conservation adds another dimension to assessing protein relatedness, and the advancements made with AlphaFold, the AlphaFold Protein Structure Database, and programs like Foldseek make this relatively simple to implement into our pipeline [3][4][12].

The actin fold is composed of four subdomains, termed subdomains 1–4, which have varying degrees of importance in actin’s polymerization, ATPase function, and binding ability (Figure 3) [15]. Between subdomains 2 and 4 lies the nucleotide-binding cleft, where actin binds and hydrolyzes ATP. Between subdomains 1 and 3 lies the target-binding cleft (TBC), where a large number of actin-binding proteins bind, including profilin, gelsolin, WH2 domain proteins like WASP, and FH2 domain proteins like formins. Additionally, this region is highly important for the association of actin monomers during polymerization.

This “actin fold” is critical because in order to perform its cellular functions, actin must be able to polymerize, perform its ATPase functions, and interact with a large range of actin-binding proteins. Due to its functional importance across organisms, the overall structure of actin is well-conserved. A similar fold can be found in actins from humans to the distant eukaryote Giardia, but also in archaea and bacteria (Figure 3, A–D). This characteristic actin fold is also shared by non-actin proteins, including some actin-related proteins, sugar kinases, hexokinase, and Hsp70 proteins [16]. This highlights the importance of considering multiple criteria in determining whether a protein is indeed an actin.

To include structural analysis in our pipeline, we obtained structural models for a set of our candidate actins using the AlphaFold Database [3][4]. We compared these proteins to the experimentally determined structure of rabbit muscle actin (PDB: 1J6Z) using Foldseek [12]. This produced a list of aligned structures and scores (E-values) associated with each. E-values are determined by creating an extreme value distribution, then using a neural network to determine the mean and scale parameters for each query [12]. They tell us the structural similarity between our query protein and our reference protein, with lower E-values indicating closer matches. Of the 50,000 candidate actins identified via BLAST, we analyzed structures of those that were on UniProt and had structures predicted by AlphaFold (17,036). After compiling these scores, we found a distribution of structural conservation (Figure 3, E). We compared the structural conservation to the sequence conservation, and interestingly, did not see a strong correlation between structural conservation and sequence conservation (Figure 3, F). Specifically, there is a dense region at high sequence conservation that spans a huge range of structural divergence, which was not expected. These data demonstrate the importance of considering both criteria. This kind of large-scale structural analysis could also be useful for understanding which aspects of the actin fold are conserved elsewhere, like in sugar kinases and heat shock proteins, and making predictions to annotate functions in non-actin proteins.
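The comparison of structural and sequence conservation can be sketched as a transform followed by a correlation. This is a minimal illustration under stated assumptions: the logarithm is taken as base 10 (the pub says only "-1*log"), E-values of zero are floored at a hypothetical `1e-300` so the transform stays finite, and Pearson correlation stands in for whatever measure the authors used to judge the relationship.

```python
import math


def neg_log_evalue(evalue: float, floor: float = 1e-300) -> float:
    """Return -log10(E-value), so larger numbers mean closer structural matches.

    The base-10 log and the floor for E-values of zero are assumptions.
    """
    return -math.log10(max(evalue, floor))


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up values for illustration only, not data from the pub.
    evalues = [1e-40, 1e-35, 1e-12, 1e-38]
    identities = [95.0, 62.0, 30.0, 91.0]
    scores = [neg_log_evalue(e) for e in evalues]
    print(round(pearson(scores, identities), 3))
```

A single coefficient would not reveal the pattern described above, where highly conserved sequences span a wide range of structural scores, so a scatter plot is still needed alongside it.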

**Actin structure is broadly conserved, but differences between structural conservation and sequence conservation highlight the importance of doing multiple analyses**.
A) Muscle actin from rabbit (PDB: 1J6Z) with subdomains represented in different colors. Of note are the nucleotide-binding cleft where ATP is typically bound and the target-binding cleft which is important for the binding of several known actin-binding proteins.
B) AlphaFolded actin model of the most divergent eukaryotic actin currently known, the actin from *Giardia* (AlphaFold Database: AF-P51775-F1). Subdomains of the protein colored as in A. The overall structure is quite similar.
C) AlphaFolded actin model of the Archaeon *Candidatus Lokiarchaeota* (AlphaFold Database: AF-A0A532TFF0-F1). Subdomains of the protein colored as in A. The overall structure is quite similar.
D) AlphaFolded actin model of the bacterial actin-like protein, MreB (AlphaFold Database: AF-P0A9X4-F1). Subdomains of the protein are colored as in A. Despite the evolutionary distance and lack of sequence identity, the overall structure is still quite similar.
E) Structural similarity represented by -1*log-transformed E-values obtained from Foldseek for each candidate actin.
F) Bivariate analysis of -1*log-transformed E-values, showing structural similarity and the global percent identity.

Actin’s polymerizing function is less conserved than its ATP-binding ability

After the determination of sequence and structural conservation, we are often forced to turn to the bench to determine functional conservation. Usually this would be done by looking at how the protein functions in the cell or in vitro. However, we hoped to address this issue computationally and in a relatively high-throughput manner by identifying the residues important for specific protein functions and looking at their conservation. This is possible because proteins are multifunctional, and different domains carry out different functions, each of which can serve as its own functional criterion. While the multiple sequence alignments, Hidden Markov Models, and Foldseek used previously will work for basically all proteins as long as there is a good reference protein, the residue-specific functional annotations are typically the rate-limiting step. Because we already know a lot about actin, we know that polymerization and ATPase activity are key functions to probe through this approach.

Polymerization

Polymerization is actin’s ability to transition between a monomeric state and a filamentous state. During polymerization, monomers interact with each other to form a polymeric filament (Figure 4, A). This important characteristic is found in all actins and many actin-like proteins, but actin-related proteins are unable to form long, stable polymers. This suggests that a candidate protein’s ability to polymerize is a highly important characteristic in deeming it a “true” actin.

Typically, researchers use pyrene assembly assays and total internal reflection fluorescence (TIRF) microscopy to experimentally test actin polymerization. But is there a way to computationally predict whether a putative actin will likely polymerize? Actin filaments form a right-handed helix composed of two chains of actin monomers [15]. So, polymerization of actin involves two types of interactions between monomers: longitudinal (long-pitch) contacts between monomers within a chain and lateral (short-pitch) contacts between monomers in adjacent chains (Figure 4, A). Based on cryo-EM structures of actin filaments, researchers have found the residues involved in each of those types of contacts (Figure 4, B) [17]. Using this information, we created a computational program that looks for conservation of those particular residues as a way to determine if polymerization of a potential actin is likely.

We ran our candidate actins through this program, and got results for 20,095 candidate actins that had non-gapped alignments to our reference human ß-actin. In a subsequent version, we plan to adjust the pipeline to include alignments with gaps as well. Here, we found that there is a broad range of conservation of these residues for both lateral contacts and longitudinal contacts (Figure 4, C–D). Putting these contacts together, we saw a clear transition between a group of query actins that are more conserved and a group that are less conserved (Figure 4, E–F). We also looked at polymerizability compared to both sequence identity (Figure 4, G) and structural conservation (Figure 4, H). We found that in both cases, there is a correlation between the percentage of conserved residues involved in polymerization and each of the other two criteria.
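The residue-level check described above can be sketched as a lookup at annotated positions. This is a minimal illustration that assumes a non-gapped alignment, so position i in the query corresponds to position i in the reference, and that "conserved" means an identical residue. The sequences and the `LATERAL` and `LONGITUDINAL` position sets below are placeholders for the example, not the contact residues reported in [17].

```python
from typing import Iterable


def percent_conserved(reference: str, query: str,
                      positions: Iterable[int]) -> float:
    """Percent of annotated reference positions with an identical query residue.

    Positions are 1-based, as residue numbers usually are. Assumes the query
    aligns to the reference without gaps, so the two strings are equal length.
    """
    if len(reference) != len(query):
        raise ValueError("non-gapped alignment requires equal lengths")
    unique = sorted(set(positions))
    if not unique:
        raise ValueError("no positions supplied")
    for pos in unique:
        if not 1 <= pos <= len(reference):
            raise ValueError(f"position {pos} is outside the sequence")
    conserved = sum(reference[p - 1] == query[p - 1] for p in unique)
    return 100.0 * conserved / len(unique)


if __name__ == "__main__":
    # Toy sequences and hypothetical contact positions.
    REFERENCE = "MDDDIAALVV"
    QUERY = "MDEDIAALIV"
    LATERAL = [1, 3]
    LONGITUDINAL = [9, 10]
    print(percent_conserved(REFERENCE, QUERY, LATERAL))
    print(percent_conserved(REFERENCE, QUERY, LONGITUDINAL))
    print(percent_conserved(REFERENCE, QUERY, LATERAL + LONGITUDINAL))
```

The same function applies to any annotated residue set, so it could also score a nucleotide-binding site given the appropriate positions. A stricter or looser notion of conservation, such as allowing chemically similar substitutions, would change the scores.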

**Analysis of the conservation of the residues involved in specific actin functions reveals clustering of proteins into actins and other proteins**.
A) Actin monomers (PDB: 1J6Z) polymerize into polar filaments composed of two helical actin chains (PDB: 3G37).
B) Annotation of all actin residues involved in lateral and longitudinal contacts between monomers within filaments.
C) Percentage of lateral contacts conserved throughout the query actins.
D) Percentage of longitudinal contacts conserved throughout the query actins.
E) Total polymerization contacts (lateral and longitudinal) conserved throughout the query actins.
F) Frequency distribution of the total conserved polymerization contacts showing a cluster of well-conserved actins and a cluster of less-conserved actins.
G) Bivariate analysis comparing the polymerizability to the global sequence identity showing correlation between the two, with some outliers.
H) Bivariate analysis comparing the polymerizability to the structural similarity, represented by -1*log-transformed E-values, showing correlation between the two, with some outliers.

ATPase activity

Important for polymerization and depolymerization, actin functions as an ATPase, an enzyme that hydrolyzes ATP. Typically, monomeric actin bound to ATP joins the end of a growing actin filament. As the filament ages, the ATP is hydrolyzed to ADP and inorganic phosphate (Pi) (Figure 5, A) [18]. Then, once the inorganic phosphate is released, ADP-bound actin can be released from the filament as monomeric actin. The ADP in monomeric actin is swapped out for ATP so the monomers can rejoin actin filaments once again.

ATPase function can be determined with biochemical assays. However, similar to polymerization, the region of actin that binds nucleotides is known based on crystal structures and cryo-EM structures of actin with several different bound nucleotides (ATP, ADP, ADP + Pi) (Figure 5, B) [17]. Using these residues, we created a program that looks for conservation of the nucleotide-binding site to use as a readout of possible ATPase function.

We aligned each of the query sequences to that of human ß-actin and annotated the regions that have been found to be involved in ATP binding and therefore ATPase function. We analyzed 32,680 of our original candidate actins that had non-gapped alignments to our reference human ß-actin and found that overall, the ability to bind ATP seems to be more conserved than the ability to polymerize (Figure 5, C–F). Even some proteins with relatively low percent global identity or structural similarity still seem likely to bind ATP based on the conservation of the residues involved (Figure 5, E–F). However, there are cases where these residues are not well-conserved. These might be interesting targets for better understanding how actin functions as an ATPase and how those functions evolved.

Key takeaways and conclusions

Key takeaways

Analysis of global sequence identity shows clustering of potentially true actins and other proteins (Figure 2).
Structure analysis and sequence analysis are not well-correlated, highlighting the need for evaluating multiple criteria instead of relying on one (Figure 3).
The conservation of residues involved in polymerization is well-correlated with global sequence identity and shows similar clustering of true actins and other proteins (Figure 4).
Generally the residues that bind ATP are well-conserved (Figure 5).

Conclusions

Based on the key takeaways summarized above, we returned to our original problem of defining the line between true actins and other proteins so that we can identify divergent proteins that exist at this border. We wondered whether we could use the patterns we observed to determine which proteins are actins as opposed to actin-like proteins or actin-related proteins. So we investigated how existing annotations fit with the data presented here. We extracted gene names for the proteins that were listed on UniProt and parsed these into one of three categories: actin (this includes any type of actin and any isoforms labelled specifically as “actin”), actin-like proteins (this includes both proteins termed actin-like proteins and actin family proteins), and actin-related proteins (this is any and all actin-related proteins or ARPs). We then visualized our results with these categories mapped on the graphs.
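The three-way sorting of annotations described above can be sketched with a few string rules. This is a minimal illustration: the matching rules, their order, and the `categorize` helper are assumptions made for the example, not the parsing logic used for the pub. Names that mention actin only as a binding partner (for example, "actin-binding protein") are deliberately left uncategorized here.

```python
import re
from typing import Optional


def categorize(name: str) -> Optional[str]:
    """Sort a protein or gene name into one of three annotation categories.

    Rules are checked from most to least specific, so an "actin-related"
    name is never counted as plain "actin". Returns None if nothing matches.
    """
    text = name.lower()
    # Actin-related proteins, including short names such as ARP2 or ARP3.
    if ("actin-related" in text or "actin related" in text
            or re.search(r"\barp\d*[a-z]?\b", text)):
        return "actin-related"
    # Actin-like proteins and generic "actin family" entries.
    if ("actin-like" in text or "actin like" in text
            or "actin family" in text):
        return "actin-like"
    # Plain "actin" as a whole word, not followed by a hyphen, so that
    # "actinin" and "actin-binding" are excluded.
    if re.search(r"\bactin\b(?!-)", text):
        return "actin"
    return None


if __name__ == "__main__":
    for example in ["Actin, cytoplasmic 1", "Actin-related protein 2",
                    "Actin-like protein 6A", "Alpha-actinin-1"]:
        print(example, "->", categorize(example))
```

Because existing annotations are inconsistent across databases, any rule set like this will misfile some entries, which is part of why the annotation-based categories are treated as a rough overlay rather than ground truth.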

Looking at the percent identity alone, we found that our large peak between about 80–100% is primarily composed of “actins” (Figure 6, A). The majority of proteins between 57% and 80% also seem to be “actins” (black), while proteins with lower percent identities are primarily annotated as “actin-like” (purple) or “actin-related” proteins (yellow) (Figure 6, A). However, we also found that the designation of “actin-like” does not necessarily mean that a protein has a lower sequence identity, underscoring the need for a multi-dimensional tool like this to determine where candidates fit within the broader family of actins and actin-like proteins.

We also mapped these existing annotations onto our analysis of structural similarity compared to global sequence identity (Figure 6, B). Here, while the percent identity of “actin-related” proteins (yellow) is much lower, the structural similarity of these same proteins is around average (Figure 6, B). Meanwhile, “actins” and “actin-like proteins” have a broad range of structural conservation, which was unexpected (Figure 6, B). This could be due to the high structural conservation throughout the entire actin family. Because most actin family proteins are highly structurally conserved, perhaps we are just not able to see clear patterns among them.

Finally, we mapped these existing annotations onto our analyses of functional similarity compared against global sequence identity (Figure 6, C–D). For the polymerization, it is clear that “actin-related” proteins seem to be clustered in a region of low global sequence identity and also low conservation of the polymerization contact sites, while most “actins” seem to have high conservation of both (Figure 6, C). “Actin-like” proteins, however, are spaced out across the distribution, suggesting that this particular annotation is not as meaningful as we might think (Figure 6, C). All the proteins seem to have relatively well-conserved ATP-binding sites. The clustering of “actins,” “actin-like” proteins, and “actin-related” proteins is less clear, likely because the ATPase function of actin extends into related proteins, even “actin-like” proteins and “actin-related” proteins (Figure 6, D).

Together, these data demonstrate the importance of considering multiple criteria when deciding whether a protein fits within a protein family. Each of our criteria (sequence, structure, and protein function) yields slightly different results and distributions, but considering the full picture provides more insight into when a candidate could be considered a true actin.

The other goal of this work was to create a useful way to identify potentially intriguing actin homologs that might lead us to novel functions or interactions. We are especially interested in actins that lie at the border between true actins and other actin-related proteins, as these have conserved actin properties but will more likely interact with different proteins or perform unknown functions. Indeed, we identified a set of actins that might be interesting. These actins appear at the transition between about 60–70% global sequence identity (Figure 6, A), where there is also a transition in the conservation of the polymerization contact sites (Figure 6, C). Further investigation into the nature of these actin candidates and what organisms they come from might help us narrow them down into a core list of actins that can be used to identify novel interacting proteins or new cellular actin functions.
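The selection of border candidates described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: the `Candidate` fields, the accession names, and the example values are hypothetical, and the 60–70% band is taken from the approximate transition noted in the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    accession: str                      # hypothetical identifier
    global_identity: float              # percent identity to the reference (0-100)
    polymerization_conservation: float  # fraction of contact residues conserved (0-1)


def boundary_candidates(candidates, low=60.0, high=70.0):
    """Return candidates whose global identity lies in [low, high], sorted by identity."""
    in_band = [c for c in candidates if low <= c.global_identity <= high]
    return sorted(in_band, key=lambda c: c.global_identity)


# Made-up example values, for illustration only.
example = [
    Candidate("cand_A", 92.5, 0.98),
    Candidate("cand_B", 66.0, 0.71),
    Candidate("cand_C", 41.2, 0.35),
    Candidate("cand_D", 61.3, 0.55),
]

print([c.accession for c in boundary_candidates(example)])  # ['cand_D', 'cand_B']
```

The band limits are left as parameters because the transition is only approximate; widening or narrowing them is a judgment call.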

**Figure 6. Labeling of candidate actins based on existing annotations shows interesting patterns.**
A) Frequency distribution of the candidate actins that are present on UniProt and have relevant annotations. Colors correspond to existing annotations.
B) Bivariate analysis of structural similarity as determined by the log-transformed E-values and the global sequence identity for all candidates with relevant annotations available. Colors correspond to pre-existing annotations.
C) Bivariate analysis of the conservation of the residues important for polymerization and the global sequence identity for all candidate actins with annotations available. Colors correspond to existing annotations.
D) Bivariate analysis of the conservation of the residues involved in ATP binding and the global sequence identity for all candidate actins with annotations available. Colors correspond to existing annotations.

What do you think?

The purpose of this work is to create a useful and clear metric to decide when a protein is an actin as opposed to an actin-related protein, an actin-like protein, or some other protein entirely. In order to do this, we built a pipeline that we think could be broadly applicable to other actin researchers, but that could also be adapted by protein biologists anywhere who are interested in a specific protein family.

This list of specific and testable criteria is designed as a starting point for a larger discussion about how we determine the definitions of cytoskeletal proteins. How might criteria such as these be applied to cytoskeletal proteins, or even non-cytoskeletal proteins, that are expressed throughout the tree of life?

How could we expand the current scope of our pipeline to make it useful to your own research? Are there ways that we could take this beyond studying actin?

We hope that this list of criteria will be useful for researchers studying actin and cytoskeletal proteins. Do you feel that these criteria could be useful for deciding whether a protein is an actin or not? Are there places we should be more or less specific? Is there anything we didn’t include that you feel we should include, or anything we included that you think isn’t relevant?

We are particularly interested in hearing from researchers who study actins at the edge of these criteria: maybe actins that are really divergent, actins that exist outside of the traditional eukaryotic actin definition, or actins that perform really weird functions. Do “actins” from your favorite organisms fit these criteria? Are there ways we might update these criteria to be more inclusive of the diversity of the actinome without sacrificing the specificity of the definition? If this pipeline were available as a web tool, is it something you would use to learn about your own potential actin family proteins?

We would love any feedback or thoughts you’d like to contribute.

References

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. (1990). Basic local alignment search tool. https://doi.org/10.1016/s0022-2836(05)80360-2

States DJ, Gish W. (1994). QGB: Combined Use of Sequence Similarity and Codon Bias for Coding Region Identification. https://doi.org/10.1089/cmb.1994.1.39

Varadi M, Anyango S, Deshpande M, Nair S, Natassia C, Yordanova G, Yuan D, Stroe O, Wood G, Laydon A, Žídek A, Green T, Tunyasuvunakool K, Petersen S, Jumper J, Clancy E, Green R, Vora A, Lutfi M, Figurnov M, Cowie A, Hobbs N, Kohli P, Kleywegt G, Birney E, Hassabis D, Velankar S. (2021). AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. https://doi.org/10.1093/nar/gkab1061

Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D. (2021). Highly accurate protein structure prediction with AlphaFold. https://doi.org/10.1038/s41586-021-03819-2

Goodson HV, Hawse WF. (2002). Molecular evolution of the actin family. https://doi.org/10.1242/jcs.115.13.2619

Onishi M, Pringle JR, Cross FR. (2015). Evidence That an Unconventional Actin Can Provide Essential F-Actin Function and That a Surveillance System Monitors F-Actin Integrity in Chlamydomonas. https://doi.org/10.1534/genetics.115.184663

Kato-Minoura T, Uryu S, Hirono M, Kamiya R. (1998). Highly Divergent Actin Expressed in a Chlamydomonas Mutant Lacking the Conventional Actin Gene. https://doi.org/10.1006/bbrc.1998.9373

Katoh K. (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. https://doi.org/10.1093/nar/gkf436

Katoh K, Standley DM. (2013). MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. https://doi.org/10.1093/molbev/mst010

Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer ELL, Tosatto SCE, Paladin L, Raj S, Richardson LJ, Finn RD, Bateman A. (2020). Pfam: The protein families database in 2021. https://doi.org/10.1093/nar/gkaa913

http://hmmer.org/documentation.html

van Kempen M, Kim SS, Tumescheit C, Mirdita M, Lee J, Gilchrist CLM, Söding J, Steinegger M. (2022). Fast and accurate protein structure search with Foldseek. https://doi.org/10.1101/2022.02.07.479398

Perrin BJ, Ervasti JM. (2010). The actin gene family: Function follows isoform. https://doi.org/10.1002/cm.20475

Paredez AR, Assaf ZJ, Sept D, Timofejeva L, Dawson SC, Wang C-JR, Cande WZ. (2011). An actin cytoskeleton with evolutionarily conserved functions in the absence of canonical actin-binding proteins. https://doi.org/10.1073/pnas.1018593108

Dominguez R, Holmes KC. (2011). Actin Structure and Function. https://doi.org/10.1146/annurev-biophys-042910-155359

Bork P, Sander C, Valencia A. (1992). An ATPase domain common to prokaryotic cell cycle proteins, sugar kinases, actin, and hsp70 heat shock proteins. https://doi.org/10.1073/pnas.89.16.7290

Chou SZ, Pollard TD. (2019). Mechanism of actin polymerization revealed by cryo-EM structures of actin filaments with three different bound nucleotides. https://doi.org/10.1073/pnas.1807028115

Kanematsu Y, Narita A, Oda T, Koike R, Ota M, Takano Y, Moritsugu K, Fujiwara I, Tanaka K, Komatsu H, Nagae T, Watanabe N, Iwasa M, Maéda Y, Takeda S. (2022). Structures and mechanisms of actin ATP hydrolysis. https://doi.org/10.1073/pnas.2122641119


This work is licensed under CC BY 4.0

Contributors (A-Z)

Prachee Avasthi

Conceptualization, Supervision

Brae M. Bigge

Conceptualization, Data Curation, Formal Analysis, Visualization, Writing

Feridun Mert Celebi

Validation

Megan L. Hochstrasser

Editing, Visualization

Taylor Reiter

Data Curation, Software, Validation

Karthik Narayan on Feb 21, 2025

Could this method be used to find actin binding partners in different species? I would imagine that using a similar residue specific binding site analogy would be an interesting way to find novel homologs.

Brae M. Bigge on Feb 21, 2025

Thanks for the question! I do think that a method like this could be useful for finding binding partners. You could look at the conservation of actin residues that interact with the binding partners, or you could use this to guide your search. For example, organisms with more closely related actins may have more closely related binding partners, so if you knew the binding partners in one organism, you could use it to guide your search in others.

Pankaj Dubey on Mar 17, 2025

I really liked the innovative approach to studying proteins by combining evolutionary diversity with structural and functional similarities. I'm especially curious about how many of the identified actins actually exhibit ATP hydrolysis and polymerization in experiments, as this would help evaluate the method's success. This approach could also be valuable in discovering new CRISPR-Cas enzymes with better target recognition and cutting accuracy, potentially improving gene-editing technology.

Brae M. Bigge on Mar 17, 2025

Hi Pankaj, thanks for the comment! We agree that adding experimental data would be useful for this kind of analysis, but we’ve iced our efforts related to this for now. It could be interesting even to just mine the literature for experimental data and see where it falls within our distributions.

Brae M. Bigge on Dec 12, 2022

Selection

rrelated, highlighting the need for evaluating multiple criteria instead of relying on one (). The conservation of residues involved in polymerization is well-correlated with global sequence identity and shows similar clustering of true actins and other proteins (). Generally the residues that bind ATP are well-conserved (). ConclusionsBased on the

*Feedback from the Cell Bio 2022 Conference*
Some biochemical or biological validation would be really interesting, especially for the polymerizability. Could we see if the true physical cutoff for polymerizability lines up with our computational one?

Brae M. Bigge on Dec 12, 2022

This is definitely something that I’ve been considering. I would love to choose a few candidate proteins from throughout the distribution and characterize their polymerization to see where the real breaks are in ability to polymerize. This might even be something we can start to do with data that already exists. For example, we know that certain divergent actins polymerize less efficiently, we could see where those map on our distribution.

Seemay Chou on Dec 02, 2022

Selection

proteins are actins as opposed to actin-like proteins or actin-related proteins. So we investigated how existing annotations fit with the data presented here. We extracted gene names for the proteins that were listed on UniProt and parsed these into one of

I would be interested to see the inverse too - how your own independent predictions based on your three criteria map onto existing annotations. Would it be possible to throw out some sort of quantitative prediction for each candidate based on your different metrics and how heavily you weigh them? For example, if you think polymerization is most important, in theory that should contribute to your predictive “score” more. So now X gene was scored 92/100 or something based on your priorities. Where is Y gene, although was annotated as actin-like, actually only scored 41/100 or something. I know it’s a bit scary to try and guess through some score, but we could treat it more like a training game. Play around with an algorithm to make some guesses and see how they match up over time as more experiment analyses emerge?

Brae M. Bigge on Dec 12, 2022

I think something along these lines is in the works for version 2 of this pub! It does seem that some of the criteria (global sequence identity and conservation of the residues involved in polymerization) are more important for deciding whether a protein is or is not an actin. It would be super cool if we could use this information to create scores that can help us actually define some boundaries that make a protein an actin vs an actin related protein.
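The weighted score discussed in this thread could be sketched roughly as follows. This is only an illustration of the idea, not something the pub implements: the metric names, the weights, and the assumption that every metric has already been rescaled to a 0–1 range are all hypothetical.

```python
def composite_score(metrics, weights):
    """Combine per-criterion values (each assumed to be scaled 0-1) into a 0-100 score.

    Both arguments are dicts keyed by criterion name. The weights are
    hypothetical priorities and are normalized by their sum.
    """
    missing = set(weights) - set(metrics)
    if missing:
        raise KeyError(f"missing metrics: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(weights[name] * metrics[name] for name in weights)
    return round(100.0 * weighted / total_weight, 1)


# Hypothetical priorities: identity and polymerization count twice as much as ATP binding.
weights = {"global_identity": 2.0, "polymerization_sites": 2.0, "atp_binding_sites": 1.0}

# Made-up candidate, for illustration only.
candidate = {"global_identity": 0.5, "polymerization_sites": 0.5, "atp_binding_sites": 1.0}
print(composite_score(candidate, weights))  # 60.0
```

Structural similarity is left out of the example because it is reported as a log-transformed E-value, and mapping that onto a 0–1 scale would need a further assumption.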

Brae M. Bigge on Dec 12, 2022

Selection

ns that lie right on the boundary between “true” actins and actin-like or actin-related proteins. Unsurprisingly, the first thing we did was a BLAST search against the NCBI non-redundant database using human ß-actin. This gave us a list of about 50,000 related proteins, but we realized there are no clear rules abou

*Feedback from the Cell Bio 2022 Conference*
Could we also include proteins that we select based on structural similarity instead of just sequence similarity?

Brae M. Bigge on Dec 12, 2022

Selection

human ß-actin that had non-gapped alignments to our reference human ß-actin and found that overall, the ability to bind ATP seems to be more conserved than the ability to polymerize (, C–F). Even some proteins with relatively low percent global identity or structural similarity s

*Feedback from the Cell Bio 2022 Conference*
Are there conserved proteins that we know don’t bind ATP?

Brae M. Bigge on Dec 12, 2022

Selection

on (, E). We compared the structural conservation to the sequence conservation, and interestingly, did not see a strong correlation in structural conservation and sequence conservation (, F). Specifically, there is a density in the high sequence conservation space that has a huge ra

*Feedback from the Cell Bio 2022 Conference*
We should include an outgroup for structure analysis. The reason we’re seeing such a weird distribution in the bivariate analysis might be because all of the structures are too similar so we’re getting a “zoomed in” view of a larger distribution.

Brae M. Bigge on Dec 12, 2022

Selection

ve found the residues involved in each of those types of contacts (, B) . Using this information, we created a computational program that looks for conservation of those particular residues as a way to determine if polymerization of a potential actin is likely.We ran our candidate actins through this program, and got results for 20,095 candidate actins that

*Feedback from the Cell Bio 2022 Conference*
Could we weight the polymerization residues based on how important they are for polymerization? Presumably not all residues are equally important, could we determine which are more important and then weight our score based on that?

Brae M. Bigge on Dec 12, 2022

Selection

make it useful to your own research? Are there ways that we could take this beyond studying actin?We hope that this list of criteria will be useful for researchers studying actin and cytoskeletal proteins. Do you feel that these criteria could be useful to of whether a protein is an actin or not? Are ther

*Feedback from the Cell Bio 2022 Conference*
It would be interesting to see if all the organisms that contain “true” actins also have profilin (which is essential for regulating the amount of monomeric actin in cells).

Brae M. Bigge on Dec 12, 2022

I really love this idea! Another thing we could consider is looking at the conservation of the target binding cleft where lots of actin binding proteins bind.

Brae M. Bigge on Dec 12, 2022

Selection

n help researchers determine whether or how their proteins of interest fit within those families. Using this pipeline, we performed analyses of all of the candidate actins that came up when we did a BLAST search of human ß-actin (limiting the output to the first 50,000 sequences). Of these 50,000 initial BLAST matches, 2,363 failed to download from NCBI with eutils (error invali

*Feedback from the Cell Bio 2022 Conference*
Is this analysis biased toward any particular organisms?

Brae M. Bigge on Dec 12, 2022

We searched the non-redundant NCBI database with no taxonomic limits, so our search is not biased. However, there are organisms and regions of the tree that have been sequenced more often than others, so there could be some unavoidable bias in our resulting sequences due to that.

Lakshmeesha Kempaiah on Dec 19, 2022

Can a ‘reverse/retro’ BLAST analysis be included in the pipeline?

Eg. If I use human beta-actin and do a BLAST search, I may get multiple hits for a given organism that is rank-ordered based on sequence homology. If I take the first hit and BLAST it against the human genome will I still get beta-actin as the first hit or will I get an actin-related protein?

I have seen this in my work on enzymes of the haloacid dehalogenase superfamily (that are predominantly phosphatases). A research group used the thiamine monophosphate phosphatase sequence and did a BLAST search against the Plasmodium genome and found a putative homolog and assigned it as such. But when we subjected the sequence of the homolog to BLAST against the Human genome, the first hit was phosphoglycolate phosphatase. Experimental characterization showed it was indeed phosphoglycolate phosphatase.

Reverse BLAST can be helpful in ruling out false positive hits.

Brae M. Bigge on Jan 17, 2023

Thanks for the comment! I think this is a great idea and could be really useful as a validation tool to show that what we find in the pathway is consistent with what is already known.
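The reverse BLAST check proposed in this thread amounts to a reciprocal-best-hit test. A minimal sketch of the logic, assuming the ranked hit lists have already been parsed from BLAST output into plain dictionaries (the identifiers below are hypothetical, and no BLAST call is made here):

```python
def is_reciprocal_best_hit(query_id, forward_hits, reverse_hits):
    """Check whether the top forward hit points back to the query as its own top hit.

    forward_hits maps a query ID to its ranked hits in the target organism.
    reverse_hits maps a target protein ID to its ranked hits in the query's organism.
    """
    forward = forward_hits.get(query_id, [])
    if not forward:
        return False
    top_target = forward[0]
    back = reverse_hits.get(top_target, [])
    return bool(back) and back[0] == query_id


# Hypothetical parsed results, for illustration only.
forward_hits = {"human_beta_actin": ["orgX_protein_1", "orgX_protein_2"]}
reverse_hits = {
    "orgX_protein_1": ["human_beta_actin", "human_ARP"],
    "orgX_protein_2": ["human_ARP", "human_beta_actin"],
}

print(is_reciprocal_best_hit("human_beta_actin", forward_hits, reverse_hits))  # True
```

A candidate whose top reverse hit is something other than the original query (as in the phosphatase example above) would return `False` and could be flagged for closer inspection.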

Brae M. Bigge on Dec 12, 2022

Selection

f functional similarity compared against global sequence identity (, C–D). For the polymerization, it is clear that “actin-related” proteins seem to be clustered in a region of low global sequence identity and also low conservation of the polymerization contact sites, while most “actins” seem to have high conservation of both (, C). “Actin-like” proteins, however, are spaced out across the distribution, suggesting that thi

*Feedback from the Cell Bio 2022 Conference*
Could we mutate all of the residues required for polymerization in an actin-related protein to match those residues found in “true” actins and make the actin-related protein polymerize like actin?

Brae M. Bigge on Dec 12, 2022

Selection

letal proteins, or even non-cytoskeletal proteins, that are expressed throughout the tree of life?How could we expand the current scope of our pipeline to make it useful to your own research? Are there ways that we could take this beyond studying actin?We hope that this list of criteria w

*Feedback from the Cell Bio 2022 Conference*
Could we include some sort of filament function like ability to form bundles?

Brae M. Bigge on Dec 12, 2022

This is a cool idea, I wonder if we could find some conserved bundler binding sites that we could use for this analysis.

Jonathan A. Eisen on Aug 01, 2023

Selection

itrarily limiting the output to the first 50,000 sequences). Of these 50,000 initial BLAST matches, 2,363 failed to download from NCBI with eutils (error invalid uid), returning empty FASTA files. So, we analyzed 47,634 candidate actins. We outline our key findings in

It would be interesting to know more about these that did not download. For example, can you try again? Was this a temporary thing (eg maybe the entries have been fixed)? Or is there something weird about these?

Taylor Reiter on Aug 03, 2023

Thanks for the comment, we’re happy to provide more details. We used the entrez-direct (version 16.2) software to download sequences from the GenBank Protein database. Here’s an example of one of the accessions that failed to download:

esearch -db protein -query "MCQ7618849.1" | efetch -format fasta > MCQ7618849.1.faa

This produces an empty file, but it fails without an error. This behavior has been persistent for about a year now, but without an error, it’s hard to track down why it is occurring. However, the accession is viewable through its URL: https://www.ncbi.nlm.nih.gov/protein/MCQ7618849.1 We created this GitHub issue to document the problem and discuss alternative download strategies to get around this issue: https://github.com/Arcadia-Science/2022-actin-prediction/issues/20
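Because the download fails silently, one way to catch affected accessions is to check the output files afterward. The following is a minimal sketch, assuming one FASTA file per accession named `<accession>.faa` as in the command above; the function name and directory layout are hypothetical and not part of the published pipeline.

```python
from pathlib import Path


def find_empty_fasta(directory, suffix=".faa"):
    """Return accessions whose FASTA file is empty or does not start with a '>' header."""
    failed = []
    for path in sorted(Path(directory).glob(f"*{suffix}")):
        text = path.read_text().strip()
        if not text.startswith(">"):
            # Strip the suffix to recover the accession, e.g. "MCQ7618849.1.faa" -> "MCQ7618849.1"
            failed.append(path.name[: -len(suffix)])
    return failed


# Example usage (the directory name is hypothetical):
# retry_list = find_empty_fasta("downloads/")
```

The resulting list could then be retried or fetched through an alternative route.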

Brae M. Bigge on Aug 03, 2023

To dig into this a little more, I used the Entrez web search page (http://www.ncbi.nlm.nih.gov/Entrez/) to search for a few of the proteins because it tells you in which database your protein is found. For the ones that returned normal complete files during our initial search, there were results in the protein database. For many (but not all) proteins that returned empty files, the results appear to be in the identical protein group database and not in the protein database. Note that this is not necessarily consistent with what's actually in the NCBI protein database because many of these did show up when we searched the protein database itself. This is true for MCQ7618849.1 referenced above for example. This should also now be reflected in the GitHub issue!

Camille McAvoy on Oct 21, 2024

This is a very interesting publication that highlights the insufficiency of sequence similarity alone. How can this method be used for proteins of unknown function? Or how can it be used to predict function based on sequence and structure?

Brae M. Bigge on Oct 21, 2024

Hi Camille, great questions! This specific example is really focused on the actin family, and using this method, one could probably say whether a protein of unknown function is likely an actin or not an actin, but it’s hard to generalize beyond that with this specific tool. However, a method similar to what we do here, looking at the conservation of important residues, could be combined with our other tool, ProteinCartography, which compares protein structures to create an interactive map of protein families based on their structural similarity. This could allow you to see what proteins share structural features with your protein of unknown function and confirm that their function is more or less likely conserved due to active site or important residue conservation.

Anum Khan on Nov 11, 2024

Very cool idea! I was wondering whether you do any analysis on how many actins are being co-expressed when you classify an actin as a “true actin.” I am thinking of the example of Arp1, which is actin-like, can form short filaments as well, and is only expressed in the presence of a primary actin. Would it be worth comparing some of the proteins you are classifying as true actins to known ARPs or actin-like proteins in parallel?

Brae M. Bigge on Nov 11, 2024

Thanks for the question! I think it could be really interesting to look at co-expression. While we haven’t done the direct comparison that you’re talking about, we have done a more in-depth structural comparison of all of these proteins. That info can be found here. We compared every protein structure to every other protein structure and then used those comparisons to cluster them. We found (in Figure 6) that proteins do tend to separate into clusters consistent with their function or annotation. For example, there’s a cluster of proteins primarily annotated as Arp1.

Reid Gordon on Nov 15, 2024

Selection

p identify novel functions and regulation, as well as insights into how the protein family evolved. Determining whether a protein fits within a particular protein family requires characterizing the sequence, structure, and importantly, the function of that protein.We’ve outlined a series of well-defined and testable criteria for determining whether a candidate

Do you have plans to incorporate predictions for protein interactions and the phosphoproteome? For example, if a group of proteins are all targeted by the same kinase, or a group all has the same binding partner, could you define functional homology in part by shared interactions?

Prachee Avasthi on Mar 13, 2025

Hey Shane, I think that’s a good idea but also that some of the nucleators and other binding partners start to look quite different or can appear to be absent in a search when those proteins may just be divergent or another protein is serving overlapping functions. This becomes easier to detect again through structural prediction/search than sequence search but it might prove difficult to interpret if we compound assumptions in this way. That said, as long as one were to take the absence of these regulators with a grain of salt, the accounting of this could still be useful.

Brae M. Bigge on Nov 18, 2024

Thanks Reid! I think that incorporating protein interactions and PTMs could be extremely useful for better understanding the function and evolution of these proteins. We don’t have plans to add that info to this work in particular, but we’re in general very interested in protein interactions and how they are related to homology, and I like the idea of using actual functional features to learn more about protein relationships.

Shane McInally on Jan 27, 2025

I like the idea of incorporating information about interactions with different actin regulators to help identify the 'true' actins. Would it be possible to include the presence/absence of classes of different regulators? For example, do these organisms contain a homolog of an actin nucleator or disassembly factor? What I'm thinking would be to create a coarse-grained version of Fig 3 in https://doi.org/10.1242/jcs.261660 and use that to weight your predictions.

Stephen Goldstein on Jan 29, 2025

Selection

tated the regions that have been found to be involved in ATP binding and therefore ATPase function. We analyzed 32,680 of our original candidate actins that had non-gapped alignments to our reference human ß-actin that had non-gapped alignments to our reference human ß-actin and found that overall, the ability to bind ATP seems to be more conserved than the ability to poly

duplicated some words in here I think

Brae M. Bigge on Feb 10, 2025

Thanks for pointing this out!

Gliday Yuka on Jan 09, 2025

This computational pipeline for actin identification is robust but would benefit from experimental validation, particularly for borderline cases. By incorporating phosphoproteomics and machine learning, we could better predict both protein classification and post-translational modifications, creating a more comprehensive understanding of protein regulation and function across different protein families.

Brae M. Bigge on Jan 09, 2025

Thanks! I agree that experimental validation of this work could be useful. While we didn’t get to that, we did follow up this work in a pub where we clustered these proteins based on their structural similarity. That clustering could be an interesting place to apply some data relating to protein regulation and modifications.

Josie Bircher on Jan 28, 2025

This is really neat work! I have two questions/comments:

I wonder if cross-referencing cell viability data (though surely it is not as abundant as genome sequencing) might be useful, or a pretty straightforward way to screen some of the hits that might be borderline actins. I know that actin is essential for cell viability (in many organisms), but perhaps some actin related proteins, while important for aiding polymerization or forming higher order structures, may not be strictly essential. It might create an additional interesting metric related to function.
I would be interested to see how many hits in each class of actin there are per organism. I am not sure what the ratios are in species which have validated actins, but I would assume there may be fewer variants of true, polymerization-competent actins (or they may be more tightly related) than variants of actin related proteins. Along those lines, I think adding in another set of data related to expression levels or proteomics-level abundance data could be informative. It seems like ‘true actin’ may exist in much higher quantities in the cell compared to other actin regulators/actin related proteins and that could help tease apart how potential actins may fall into distinct categories.

Brae M. Bigge on Jan 28, 2025

Thanks Josie! For your first question, incorporating some kind of simple phenotypic information, like cell viability, could be useful in discerning true actins from other proteins. However, some actin-related proteins are also important for cell viability. Additionally, as you mentioned in your second point, many organisms have multiple copies of actins, so there may be complications caused by compensation. The number of true actins in an organism varies widely across species, and so does the number of actin related proteins. Combining that with expression levels or abundance levels could help provide a fuller picture of this protein family and how it’s evolved.

Stephen Goldstein on Jan 29, 2025

Selection

uery proteins are to our reference protein.Of the 50,000 candidate actins we identified via BLAST, we analyzed structures of those that were on UniProt and had structures determined by AlphaFold (17,036). After compiling these scores, we found a distribution of structural conservation (, E). We compar

Did you consider whether among the other 33,000 there were ones worth folding and adding to the structural conservation analysis?

Brae M. Bigge on Feb 10, 2025

Thanks for the question! We didn’t fold any additional actins for our structural analysis. I haven’t done an exhaustive search, but most of the proteins that weren’t included dropped out because they failed to map to UniProt entries. However, I have found quite a few of them in UniParc (UniProt Archive). Further analyzing why some of these proteins aren’t included in UniProt could help us expand our list of proteins in the structural analysis or help us refine the list that we use in the rest of the analyses for actin and future proteins we study.

Kyle Lopez on Feb 06, 2025

Are there any high throughput ways to validate the functional differences experimentally?

Brae M. Bigge on Feb 10, 2025

Thanks for the question! We did plan to test the results of this analysis in the lab and had some assays in mind, including a pyrene actin assembly assay (https://www.cytoskeleton.com/cs-ap07). However, we pivoted to a new protein family as the purifications proved to be quite challenging and low throughput.
