Skip to main content
SearchLoginLogin or Signup

Harnessing genotype-phenotype nonlinearity to accelerate biological prediction

It is commonly assumed that phenotypes arise from the cumulative effects of many independent genes. However, we show that by accounting for dependent and nonlinear biological relationships, we can generate models that predict phenotypes with great accuracy.
Published onSep 22, 2023
Harnessing genotype-phenotype nonlinearity to accelerate biological prediction

You're viewing an older Release (#1) of this Pub.

  • This Release (#1) was created on Sep 22, 2023 ()
  • The latest Release (#4) was created on Jun 08, 2024 ().


A core focus of genetics is understanding the relationship between genetic variation (genotypes) and biological traits (phenotypes). Efforts as diverse as tracing the evolution of complex phenotypes, identifying disease-causing genes, and understanding how organisms are built are all contingent on deciphering the mapping between genotype and phenotype.

Our results show that assumptions underlying many current genotype-phenotype models (namely that genotypes are additive and linear) do not reflect the nonlinearities present in biology. Non-additive relationships between genes are well known — one gene can influence the effects of another (epistasis), and some genes have multiple phenotypic effects (pleiotropy). By accounting for such nonlinear interactions between genes and phenotypes, we show that we are able to accurately predict suites of simulated phenotypes.

These findings should be of interest to anyone whose work relies on accurately modeling genotype-phenotype relationships, especially those in the fields of quantitative, population, and human genetics. Additionally, we are excited to get feedback on how this work might help contribute to these fields and possible refinements or extensions of its utility.

Share your thoughts!

Watch a video tutorial on making a PubPub account and commenting. Please feel free to add line-by-line comments anywhere within this text, provide overall feedback by commenting in the box at the bottom of the page, or use the URL for this page in a tweet about this work. Please make all feedback public so other readers can benefit from the discussion.

Background and goals

For several centuries now, scientists have attempted to decode how biology emerges from genetic information. Some of the models that have come out of this assume that traits are associated with infinitesimally complex genetic bases [1]; others hold that distinct clusters of few, but highly impactful genes drive each aspect of an organism’s growth and function [2]. Many (if not most) of these models share a common feature: no matter their complexity, phenotypes can be understood by simply adding up the effects of their genetic contributors.

A single human trait best demonstrates this tendency: height. Dozens, if not hundreds, of genetic studies have been conducted with the goal of mining ever greater amounts of genetic “gold dust” [2] predicting variation in height [3][4]. Through these efforts, two things have become clear: 1) height is highly heritable (> 80% by a recent study) [4] but 2) so much of the genome is involved that identifying a discrete molecular basis seems extremely unlikely.

In response to findings such as these (along with those gleaned from other human traits), some researchers have begun favoring the use of “polygenic (risk) scores” [5]. By aggregating the effects of many genomic loci, these scores can explain increasing amounts of trait variation (albeit at the cost of biological interpretability). Similarly, the “omnigenic model” [6] proposes that small sets of core, trait-determining genes work in parallel with many other “peripheral” genes. Through their sheer number, these peripheral loci thus also substantially contribute to trait variation. Importantly, in this model, contributions of core and peripheral genes are entangled and can’t be distinguished, ultimately implicating vast swaths of the genome. Both polygenic risk scores and the omnigenic model assume the same inevitable conclusion: identifying the molecular drivers of complex traits is exceedingly difficult, if not impossible [7]. But what if the problem isn’t how we are conceiving of the genetic bases of complex traits; what if the problem actually has to do with how we are thinking of traits themselves?

Height really is a complex trait, one that involves a myriad of interacting biological processes (development, metabolism, physiology, and so on). Each process is, in turn, regulated by its own sets of genes and it is likely that at least some of these gene sets relate to each other in complicated ways. Indeed, many complex phenotypes result from a variety of interconnected, nonlinear biological networks and genotypes that can relate to each other in a variety of directions (e.g., linear, nonlinear) and manners (e.g., additive, subtractive, dominant) [8][9]

We believe that these points, if considered seriously, may have substantial implications for genetics. How often are phenotypes and genotypes nonlinearly correlated relative to being purely additive? Can information about any one phenotype help to predict another? Does accounting for phenotypic relationships increase predictive power? With these questions in mind, we decided to see if an approach explicitly capturing the complex relationships among phenotypes might provide some useful insights for genetic analysis writ large. Specifically, we focus on simultaneous analysis of groups of multiple phenotypes (here referred to as “polyphenotypes”). We examine how levels of pleiotropy (the impact of single genes on multiple phenotypes) and gene-gene interaction (the non-linear impact of combinations of genes on phenotypes) structure polyphenotypes.


Any grouping of multiple organismal phenotypes. Here, we argue that polyphenotypes have multiple uses. On one hand, they can be used to understand the relationships between multiple phenotypes. At the same time, they are also useful for developing accurate predictions of any single phenotype (by allowing one to control for potential false positives/negatives arising from phenotypic relationships).

The approach

For reasons both causal and correlative, phenotypes co-vary. For example, as referenced above, height is likely correlated with other phenotypes such as mass or metabolic rate. In this pub, we quantify the nature and prevalence of phenotype-phenotype relationships within large groups of phenotypes (what we refer to as polyphenotypes) to gain insight into the processes that cause these phenotypes. We find non-linear relationships among phenotypes in natural populations to be widespread. Furthermore, we find that, in simulations, the degree of non-linear phenotypes is modulated by the degree of gene-gene interaction and pleiotropy. We then demonstrate that, where present, phenotype-phenotype relationships can be leveraged to increase prediction accuracy for individual phenotypes.

Data collection/generation

All the data we used to study empirical variation across sets of phenotypes are publicly available. Sources and details for these data are available in Table 1. We chose data sets on the basis of phenotype number, sample size, and population type. We sought data sets in which a minimum of 15 phenotypes were measured for at least 100 individuals of the same species or interspecies cross. We also generated a set of random, unrelated phenotypes to compare with the observed phenotypic relationships contained within these datasets. To do so, for a single “phenotype”, we randomly generated integer values (values could be any integer between 1 and 1,000) 600 times. This process thus resulted in 600 simulated “individuals”, each with a randomly chosen phenotypic value. This was repeated to ultimately generate 20 simulated phenotypes, each composed of 600 individual observations. After filtering on data completeness (see below for details on data set-specific filtering) we imputed missing values using the mean value for each phenotype and then performed rank normalization using the R function RankNorm from the package RNOmni.

Below are descriptions of data set-specific filtering. We tailored filtering parameters to each study given variation in sample size and the rate of missing data.

Arabidopsis: We excluded individuals if they had NAs for more than 20 phenotypic measurements. Similarly, we excluded phenotypes with more than 20 NAs. In addition, we removed non-continuous phenotypes (at least five unique values required per phenotype). 

Yeast: We removed non-continuous phenotypes (at least five unique values required per phenotype).

C. elegans: We removed non-continuous phenotypes (at least five unique values required per phenotype).

Mouse (AIL): We removed non-continuous phenotypes (at least five unique values required per phenotype).

Mouse (JAX): We excluded samples that were missing more than 100 measurements. Similarly, we excluded phenotypes missing more than 100 measurements. In addition, we removed non-continuous phenotypes (at least five unique values required per phenotype).

Fruit fly: We excluded samples that were missing more than 50 measurements. Similarly, we excluded phenotypes missing more than 50 measurements. We reduced the dimensionality of gene expression values from Huang et al. 2015 [10] using PCA (we extracted the first 30 PCs). In addition, we removed non-continuous phenotypes (at least five unique values required per phenotype).


Main reference


N samples

N phenos



Natural strains (accessions)





F1 segregant



C. elegans


Recombinant inbred lines (RIL)



Mouse (AIL)


Advanced intercross line (AIL)



Mouse (JAX)


Laboratory strains



Fruit fly


Inbred lines




This pub



Table 1. Data sets we used for empirical analyses of nonlinearity.
All of these data are available on Zenodo. 

We simulated 100 phenotypes for 121 populations (N individuals per population = 500) These populations were created by first simulating genetic data and deriving the phenotypes from these genotypes. For each individual, we randomly assigned one of three allelic states at each of 300 loci (e.g., homozygous reference, heterozygous, homozygous alternate). Then, we generated a genetic architecture for each phenotype, by randomly assigning 100 loci to that phenotype and giving each allelic state at each locus a weight of influence between zero and 10.

We modeled effects of pleiotropy and gene-gene interaction on phenotypes, varying the impact of each systematically across populations such that each population had a unique pairing of the probability pleiotropy and the probability of gene-gene interactions. These probabilities were per gene-phenotype pair or gene-gene pair and ranged from 0–1 in increments of 0.1, thus forming an 11 by 11 grid with one simulated population for each pairing. For example: population 1 has probabilities P(pleiotropy) = 0, P(epistasis) = 0; population 2 has probabilities P(pleiotropy) = 0.1, P(epistasis) = 0; and so on.

To model pleiotropy, for each individual population for each phenotype, we assigned each locus already determined to influence a phenotype (100 loci per phenotype, 9900 locus-phenotype pairs) to be involved in pleiotropy with a population specific-probability as defined above. If we determined the gene-locus pair to be involved in pleiotropy, the weights assigned to that locus were included in the calculation of that phenotype. Similarly, to create gene-gene interactions (e.g., epistasis) that varied across populations, we assigned each gene-gene pair (N = 4,950) to be involved in an interaction with a population-specific probability as defined above. If we determined a locus was involved in an interaction, we randomly assigned that interaction to one of the six possible pairs of alleles (i.e., interaction among loci here occurs only between single pairs of alleles). We then multiplied the weights of those alleles. Finally, we calculated the phenotypes for each individual by summing the weights at loci influencing that phenotype.

Creating the autoencoder

We implemented a neural network called a denoising autoencoder to test the utility of examining multiple phenotypes for phenotypic prediction [19]. Autoencoders consist of two networks — first, an encoder that forces the data through an information bottleneck, and then a decoder that takes information compressed through that bottleneck and tries to reconstruct the data. Accurate reconstruction of the data following the compression through the information bottleneck suggests that the network has learned a representation of that data. During training, noise is added to the input data, preventing a common failure mode in which the learned representation does not extrapolate to data that was not in the training set; the learned model fails to generalize.

All code associated with this pub — including analysis notebooks, the synthetic phenotype generator, and the autoencoder model — is available in this GitHub repository (DOI: 10.5281/zenodo.8371249).

Briefly, our autoencoder consisted of an encoder with two fully connected rectified linear layers that we subjected to batch normalization and a similarly structured decoder. The latent space separating the two networks contained 32 nodes. We conducted the training with phenotypic data from 80% of the simulated individuals over 100 epochs with a batch size of 16. To all training data, we added 0.1 standard deviation of noise. Following training, we predicted phenotypes on the remaining 20% of the data. To evaluate the utility of increasing the number of phenotypes under different values of pleiotropy and interaction, we trained individual models using 5, 10, 20, and 30 phenotypes, and evaluated the model accuracy on five phenotypes. We calculated prediction error as the mean absolute percentage error. We implemented the autoencoder in PyTorch [20].

Analysis of empirical phenotypes

After filtering, imputation, and rank normalization (see “Data collection/generation” section) we computed the frequency of nonlinear phenotypic relationships for each data set. To do so, we fit a linear and a nonlinear model for all possible phenotypic pairs within the data set. We generated the linear model with a linear regression (lm function in R). W generated the nonlinear model using a generative additive model (gam function in the R package mgcv) with a single smoothing spline term (via the mgcv function s). We compared model fits using the Akaike information criterion (AIC) and considered three possible outcomes: a tie (equal AIC), the nonlinear model is a better fit (nonlinear = lower AIC), or the linear model is a better fit (linear = lower AIC). We then calculated the frequency of nonlinearity from the ratio of the number of cases in which the nonlinear model had lower AIC compared to the full number of phenotypic comparisons.

Given the possible diversity of phenotypic relationships within any given data set, and to facilitate the measurement of variance in nonlinearity rates, we used a permutation-based approach to calculate nonlinearity across subsets of each data set. To do so, we calculated the nonlinearity rate for 1,000 random sets of phenotypes for each data set (data proportion per random set = 0.25). We visualized this distribution using violin plots (as in Figure 1, A). We then measured the variation of these permutation distributions using the R function var (as in Figure 1, B) and calculated the correlation between all phenotype pairs using Pearson’s correlation (as in Figure 1, C).

Analysis of synthetic phenotypes

To further dissect patterns of phenotypic nonlinearity, we generated 10,201 phenotypic matrices spanning possible combinations between gene-gene interaction and pleiotropy probabilities (each ranging from zero to one in increments of 0.01). We measured the nonlinearity rate of each phenotypic matrix using the same approach outlined above. We visualized the distribution of nonlinearity as a function of gene-gene interaction and pleiotropy by creating a generalized additive model (GAM). Here, nonlinearity was treated as a response variable predicted by gene-gene interaction and pleiotropy and was implemented using the gam function in the R package mgcv (as in Figure 2, B). The predicted nonlinearity values are visualized in two dimensions, representing all possible combinations gene-gene interaction and pleiotropy probabilities.

We next wanted to characterize the entropy of full phenotypic data sets. Taking influence from the phenotypic integration literature [21], we first calculated the “generalized variance” (the determinant of the variance-covariance matrix) for each phenotypic matrix. Generalized variance is a useful measure in that it allows us to directly compare phenotypic data sets with different dimensionalities [21]. To extract a single-vector descriptor, we then calculated the eigenvector of the generalized variance matrix using spectral decomposition (eigen R function). We then calculated the entropy of the leading eigenvector using the R function entropy.empirical from the R package entropy [22]. We could thus use the resulting entropy estimate to infer the overall information contained among an arbitrarily large set of phenotypic measurements.

We next developed a method to infer the correlational structure of a phenotypic set by calculating entropy across increasingly large, random subsets of phenotypes. Broadly, this method sweeps through pre-set portions of a data set, randomly selects a set of phenotypes for each portion, and calculates entropy using the method described above. We applied this test to phenotypic matrices with varying probabilities of pleiotropy (probability zero to one, 0.01 increments) by calculating entropy for increasingly large proportions of samples (10% to 90%, 10% increments). For each portion, we analyzed 10 permuted sets of phenotypes and calculated their mean entropy. The results of this analysis appear in Figure 3, A. We extracted slopes of the resulting entropy distributions from a linear regression (lm function in R) comparing pleiotropy probability and entropy (Figure 3, B).

SHOW ME THE DATA: You can find all the data used in this pub, including empirical and simulated phenotypes, on Zenodo (DOI: 10.5281/zenodo.8298808).

Autoencoder analyses

We calculated autoencoder prediction error as the mean absolute percent error between prediction and ground truth. We conducted autoencoder training using 80% of the individuals in the data set and evaluated accuracy on the remaining 20%.

We calculated entropy as above using the same parameters (portions: 10% to 90% of samples in 10% increments; 10 permutations per portion).

All code associated with this pub — including analysis notebooks, the synthetic phenotype generator, and the autoencoder model — is available in this GitHub repository (DOI: 10.5281/zenodo.8371249).

The result(s)

Nonlinearity is prevalent among biological traits

To our knowledge, it remains unclear just how common additivity and linearity are in genetic systems. To address this, we compiled a data set of “polyphenotypes” (see definition above) from a diverse set of interbreeding species populations (see Approach for details). We reasoned that inferring the rate of nonlinear phenotypic relationships would allow us to glean how well linear/additive models would fit these populations.

Using a simple test (see Approach) we determined the (non)linearity of pairwise phenotypic relationships within each species population. We found that all species display rates of nonlinearity that are significantly greater than expected by chance (p < 0.001, Kruskal-Wallis test) (Figure 1, A), ranging from 43.5% (fruit flies) to 84.4% (Arabidopsis) (Figure 1, A). These observations support the idea that nonlinearity is a prevalent feature of biological phenotypes and contributes to a substantial portion of species’ phenotypic relationships.

The range of nonlinearity also differs greatly across populations. For example, nematodes display almost 10× more variation in phenotypic relationships than fruit flies (C. elegans = 0.19, fruit flies = 0.029; mean normalized standard deviation) (Figure 1, B), while randomly generated data display the greatest degree of relative variation (0.39; mean normalized standard deviation). Interestingly, these randomly generated data should be largely independent of each other and, thus, may be considered representative of a set of non-pleiotropic, additive traits. Supporting this idea, we found that the mean pairwise correlation of the random phenotypes is significantly less than that of the species data (Figure 1, C). Overall, these observations suggest that complex aspects of phenotypic relationships may be inferred using a set of relatively simple descriptive statistics.

However, given their heterogeneity, determining how epistasis and pleiotropy might affect the frequency of phenotypic nonlinearity is hard using these datasets. Some data come from advanced genetic crosses (e.g. the DGRP and JAX data) while other data sets sampled variants from a diverse natural population (Arabidopsis). In addition, the polyphenotypes reflect the interests of the original studies and, therefore, occupy somewhat random and undetermined regions of phenotypic space. Therefore, while it is apparent that nonlinearity exists in a variety of quantitative genetic data sets, it is difficult to use these data to develop strong intuitions about its sources.

Figure 1

Characterizing phenotypic nonlinearity across species.

(A) Half-violin plots comparing the percentage of nonlinear relationships among phenotypes for six species and a random data set. For each species, we present full sets of data points to the left and show the distribution in the half-violin plot on the right.

(B) The degree of variation in nonlinear relationships for each species.

(C) Half-violin plots comparing the distribution of phenotypic correlations for each species.

Synthetic phenotypes let us interrogate nonlinear effects

With this in mind, we sought to control many of these covariates to allow more direct interrogation of pleiotropy, epistasis, and phenotypic structure. Using a novel approach, we generated a series of polyphenotypes from simulated genotypic data. Briefly, we generated n random genotypes from a probability distribution for a given number of individuals (see Approach for a more in-depth description). Each genotype could influence an output phenotype given a set probability distribution and could interact with others via a predefined probability. We also allowed the resulting polyphenotypes to correlate given a certain probability, letting us vary the amount of epistasis (probability of gene-gene interactions) and pleiotropy (probability of phenotype-phenotype interactions) in the data. Using this approach, we generated a data set in which all combinations of epistatic and pleiotropic probabilities were considered (from P = 0 to P = 1, 0.01 increments). This produced a final set of 10,201 polyphenotypes, each containing 20 synthetic phenotypes measured across 600 simulated individuals.

A main goal in generating this synthetic data set was for it to capture a broad range of nonlinear relationships. To assess how well the data set accomplishes this, we used the same test as above (see Figure 1; Approach) to calculate the rate of nonlinearity for each of the 10,201 polyphenotypes. Notably, these percentages span the values observed among the empirical phenotypes, with a mean nonlinearity rate of 47.53% (min = 16%, max = 100%; Figure 2, A; Figure 1, A). In addition, nonlinearity varies smoothly across the distribution of pleiotropy/gene interaction probabilities (Figure 2, B). Together, these observations suggest that our data generation approach successfully produced a naturalistic range of nonlinearity from which to sample.

Figure 2

The distribution of nonlinearity within a simulated phenotypic data set.

(A) Histogram of nonlinearity for all 10,201 simulated phenotypic data matrices.

(B) The 2D distribution of nonlinearity (represented by color) as a function of pleiotropy and gene-gene interaction probabilities. We calculated the distribution using a generative additive model.

Entropy and nonlinearity capture diverse phenotypic interactions

How can we best identify drivers of biological nonlinearity? Can we statistically decouple the effects of different genetic and phenotypic interactions? Motivated by our efforts to apply information theory to genetics (more on this in a companion pub coming soon), we hypothesized that we may start to untangle some of the drivers of nonlinearity by measuring the information content (or “entropy” as defined in information theory) of polyphenotypes. We reasoned that entropy may be informative in multiple ways. First, overall entropy reflects the interrelatedness of polyphenotypes. Lower values may reflect a set of phenotypes that are driven by the same underlying biology (e.g., multiple, correlated measurements of a trait such as finger length). On the other hand, higher values may indicate that polyphenotype data contain measurements from multiple, orthogonal features of biology (e.g., finger length and education level). The second way entropy may be informative is through its distribution across different portions of a polyphenotype. Consider the case of completely orthogonal phenotypes. If we select random combinations of orthogonal phenotypes and measure their entropy, it should be the case that entropy proportionally increases as we analyze larger and larger sets of phenotypes (i.e., more new information is being added with each increase in the number of randomly chosen phenotypes). In contrast, for a set of strongly correlated phenotypes (e.g., in the case of pleiotropy), one should expect entropy to stay constant as we analyze larger sets of the phenotypes (i.e., no new information is added).

Applying this framework to all 10,201 polyphenotypes, we calculated entropy across increasing proportions of randomly selected phenotypes (see Approach). We found that entropy distributions strongly vary with the probability of pleiotropy. Increasing pleiotropy equates with a flattening of the distribution (Figure 3, A). This point is further demonstrated by an extremely strong relationship between entropy distribution slopes and pleiotropy (Pearson’s r = −0.95; Figure 3, B). These observations support the notion that entropy is a reliable measure of the interrelatedness of a set of phenotypes. What’s more, this suggests that, by analyzing the within-polyphenotype distribution of entropy, we may infer the amount of phenotypic pleiotropy with minimal knowledge of the underlying genetics.

Are similar measures available for determining the frequency of gene-gene interactions? Taking a hint from the previously identified relationship between nonlinearity and gene-gene interactions/pleiotropy in Figure 2, we found that nonlinearity varies strongly in the absence of gene-gene interactions but decreases in dynamic range as interaction probability increases (Figure 3, C). Furthermore, comparing the relationship between the entropy slope and percentage of nonlinearity reveals an interesting trade-off between gene-gene interactions and pleiotropy (Figure 3, D). There is an overall negative relationship (Pearson’s r = −0.82) suggesting that, as pleiotropy increases, so too do nonlinear phenotypic interactions. Furthermore, as gene-gene interactions increase (as indicated by point color in Figure 3, D), the variance of entropy/nonlinearity relationships decreases.

Taken together, these results suggest that pleiotropy leads to increasingly nonlinear phenotypic relationships, especially in the absence of genetic interactions. Furthermore, we can study this trade-off via entropy and nonlinearity, which are both non-genetic measures. Finally, these patterns indicate that phenotypic nonlinearity — like that observed both here and among real phenotypes (Figure 1) — also reflects genetic nonlinearities, hinting at potential insufficiencies of additive/linear models for capturing the genetic components of biological traits.

Figure 3

Characteristics of gene interaction and pleiotropy variation.

(A) Shannon entropy calculated across data set proportions for all pleiotropy probabilities. Entropy is measured in bits.

(B) Scatter plot of the slopes for each distribution plotted in Figure 3, A. r: Pearson’s correlation.

(C) The percentage of nonlinear interactions between phenotypes as a function of pleiotropy probability (x-axis).

(D) The joint distribution of entropy slope and % nonlinear interactions for all 10,201 synthetic phenotype matrices. r: Pearson’s correlation.

We indicate pleiotropy probability (A and B) and the probability of genetic interactions (C and D) using a color scale. 

Accounting for phenotypic nonlinearity significantly increases predictive power

If genetic nonlinearities are truly prevalent across many types of biological traits, which types of models might be better suited for capturing their effects? Neural network-based strategies are enticing options for several reasons. Neural networks inherently learn nonlinearities across their layers, letting them model complex interactions between inputs and outputs (e.g., between phenotypes and genotypes). In addition, they can model multiple inputs and outputs, facilitating nonlinear mapping of multiple phenotypes at once. We therefore hypothesized that neural network strategies might help us determine the benefit of accounting for complex, nonlinear interactions between phenotypes.

To do this, we constructed an autoencoder for modeling and predicting phenotypic relationships (see Approach; Figure 4, A). Taking simulated polyphenotypes as input, the model encoded phenotypic relationships into a lower-dimensional latent space and generated predictions via a decoder (Figure 4, A). We used this strategy to predict aspects of all 10,201 polyphenotypes. Specifically, we generated four sets of predictions for each, varying the number of input phenotypes (n = 5, 10, 20, 30) used to predict an output set (n = 5; Figure 4, A). We then assessed the accuracy of each model by calculating the percent error between observed and predicted phenotypes. 

We found that the autoencoder approach predicted phenotypes with extremely high accuracy (mean error = 1.76%; Figure 4, B) and that models become more accurate when the number of input phenotypes increases (Figure 4, B). In fact, all models using more than five input phenotypes display significantly decreased error distributions (Kruskal-Wallis test followed by Dunn’s test; Figure 4, B). It’s also apparent that the error distributions themselves display a degree of heterogeneity, with some clear outliers displaying error percentages above 4% (Figure 4, B). Plotting percent error as a function of pleiotropy shows that these outliers are associated with cases in which the probability of pleiotropic interactions was very low (Figure 4, C), indicating that pleiotropic interactions can help increase the accuracy of multi-phenotype models. More broadly, it’s apparent that by accounting for nonlinearities such as pleiotropy, autoencoder strategies are able to predict individual phenotypes with great accuracy.

Figure 4

An autoencoder for phenotypic prediction.

(A) Cartoon overview of the autoencoder architecture and analytical design. We used phenotypic matrices of different sizes (5, 10, 20, 30 phenotypes) as input to predict an output set of five phenotypes.

(B) Violin plots comparing percent error for all predictions from the four different model classes (5, 10, 20, 30 phenotypes). Asterisks indicate significant differences according to a Kruskal-Wallis test followed by Dunn’s test.

(C) Model error as a function of pleiotropy probability for the four model classes. 

Finally, we explored whether these models could generate realistic polyphenotypes. To do so, we measured the entropy of each set of predicted phenotypes. Overall entropy decreases as the probability of pleiotropy increases (Figure 5, A), reflecting the patterns observed among the input phenotypes (Figure 4, A). There is also a similarly tight relationship between percent error and entropy across all models (Pearson’s r = 0.78; Figure 5, B). When comparing mean percent error and entropy slope, we found a strong separation between the five-phenotype model and all others (Figure 5, C). Models with more input phenotypes display lower entropy slopes and greater degrees of model accuracy. Furthermore, the 10-phenotype model displayed the lowest error rate and entropy slope (Figure 5, C), a pattern that is also apparent in the comparisons of percent error across the models (Figure 4, B). It is interesting to consider that this may reflect something important about the structure of the synthetic phenotypes. Specifically, the 10-phenotype model may represent a better trade-off between input phenotype information content and the overall model complexity (i.e., the 20- and 30-phenotype models may just be adding redundant information).

In total, these findings suggest that the autoencoder did indeed create realistic polyphenotypes with expected entropy distributions. Given this, we conclude that models 1) accommodating polyphenotypic designs and 2) accounting for biological nonlinearity provide opportunities to greatly increase the predictive capacity of genetic analysis.

Figure 5

Characterizing autoencoder output.

(A) Entropy compared to pleiotropy probability for the four model classes.

(B) The relationship between entropy slope and model error across all runs. r: Pearson’s correlation.

(C) Entropy slope compared to mean error for each model class.

Key takeaways

  • Nonlinearity is a prevalent feature of biological phenotypes (Figure 1)

  • Phenotypic nonlinearity varies as a function of genetic and phenotypic interactions (Figure 2)

  • Measures from information theory, such as entropy, can quantify the structure of phenotypic interactions (Figure 3)

  • Models that account for nonlinearity and phenotypic interactions have increased predictive potential and improve as the number of phenotypes increases (Figure 4)

  • Model accuracy varies as a function of the information content of phenotype sets (Figure 5)


Biology is in an age of increasingly large, high-dimensional, and complex data sets. Endeavors such as AlphaFold are attempting to map the full universe of protein structures [23][24]. Similarly, a number of multi-team efforts are characterizing human cell type diversity via a host of omics and cell biological data types [25]. These data sets — and others like them — contain (or will contain) a diversity of phenotypic measurements possessing unknown and complex inter-relationships. A goal for many of these efforts will be to identify these relationships and, ultimately, use them to decode the function of complex biological systems (e.g., identifying how RNA expression, chromatin accessibility, and cell morphology interact to generate a cell type). This undertaking butts up against a statistical sampling problem: is there enough data available to power such analysis for the system you’re interested in? Put another way, have you sampled enough of “phenotypic space” to account for the biology in question?

These are hard questions to answer a priori. However, asking them is useful. If it’s possible to efficiently sample phenotypic space, minimizing measurement redundancy, then scalability and cost-effectiveness would correspondingly increase. It’s interesting to consider how the aggregated results of this pub may help. Our framework predicts that samplings of different parts of phenotypic space should be associated with correspondingly variable parameter combinations (Figure 6). We’d predict that sampling a single phenotype would be associated with uniformly low values of entropy, nonlinearity, and predictiveness (Figure 6, A). On the other hand, if multiple correlated phenotypes (perhaps due to pleiotropy) were sampled, the rate of nonlinearity and predictiveness would increase, but entropy would not (Figure 6, B; Figure 4, C). If multiple orthogonal phenotypes were measured, we’d find increased entropy and predictiveness and a modest amount of nonlinearity (Figure 6, C; Figure 4, C). Due to their numerical generality, it should be possible to measure the entropy and nonlinearity of most (if not all) polyphenotypes.

Given this, we propose that these measures may be implemented as a general-purpose toolkit for inferring the phenotypic structure, predictiveness, and even genetic patterns (i.e., gene-gene interactions and pleiotropy) associated with a given polyphenotypic data set. There are likely many useful extensions of this. For one, we can use phenotypic entropy to measure the complexity of a phenotypic data set. We could therefore determine if ongoing collection is adding new or informative dimensions to a data set. Similarly, we may use entropy and nonlinearity to estimate the number of generative biological processes associated with a polyphenotype (if one, then entropy should be low and nonlinearity high; if multiple, entropy will increase). Indeed, the entropy of a polyphenotype is the number of bits of information necessary to capture the phenotypic structure and, as a result, the generative processes that drive that structure. With these measures in hand, it’s possible to hypothesize, a priori, the structure of genetic mapping results and, by factoring in these patterns, design studies around minimally necessary and maximally informative polyphenotypes.

Figure 6

Mapping phenotypic space.

(A) (Top) A cartoon representation of phenotypic space. Here, we assume that we can use some form of high-dimensional measurement to separate and cluster phenotypes. Individual phenotypes are represented by colored shapes with increasing densities (darker color) toward the center of the shape. The amount of phenotypic space sampled is represented by the filled dotted line. In Scenario 1, measurements for only a single phenotype are available. (Bottom) Predicted measurements for total entropy, the slope of entropy, nonlinearity, and overall predictiveness (i.e., ability to predict a sampled phenotype given the amount of space that has been sampled). Values are represented as arbitrary units (AU).

(B) In Scenario 2, we’ve measured multiple correlated phenotypes (as reflected by their same color). In this case, nonlinearity increases due to the inter-phenotype correlation, as does predictiveness.

(C) In Scenario 3, we’ve sampled multiple orthogonal (differently colored) phenotypes, Here, entropy increases and nonlinearity is somewhat lower than in Scenario 2 (since orthogonal phenotypes tend to display higher rates of linear relationships). We can still predict phenotypes with accuracy, but proportionally less than in the situation of multiple correlated phenotypes.

More generally, a “phenotype-forward” framework that allows for complex nonlinear relationships between traits (as we suggest here) has the prospect of reflecting organismal structure that is likely missed when we examine phenotypes individually. For example, modeling height and weight simultaneously likely provides more biological insight and predictive ability than modeling them independently, as some of their causal mechanisms are shared. The neural network method we use here explicitly captures these relationships and has the possibility of “encoding” the generative processes for sets of phenotypes with at least partially overlapping causes.

Treating the organism as a system in this way has the potential to answer more complex questions than modeling individual phenotypes alone. Such approaches may prove critical to leveraging the increasing amount of phenotypic data to achieve better biological understanding and outcomes across a host of problems.

Share your thoughts!

Watch a video tutorial on making a PubPub account and commenting. Please feel free to add line-by-line comments anywhere within this text, provide overall feedback by commenting in the box at the bottom of the page, or use the URL for this page in a tweet about this work. Please make all feedback public so other readers can benefit from the discussion.

  • Contributors

    • Prachee Avasthi

      • Supervision

    • Feridun Mert Celebi

      • Validation

    • Megan L. Hochstrasser

      • Editing

    • David G. Mets

      • Conceptualization, Formal Analysis, Investigation, Methodology, Software, Writing

    • Ryan York

      • Conceptualization, Formal Analysis, Investigation, Methodology, Software, Supervision, Visualization, Writing

Conceptualization, Formal Analysis, Investigation, Methodology, Software, Writing
Conceptualization, Formal Analysis, Investigation, Methodology, Software, Supervision, Visualization, Writing
1 of 2
Ralph Haygood:

Where is that under "Analysis of synthetic phenotypes"? I see the adjustments of pleiotropy and epistasis, which indirectly affect correlations among phenotypes, but I don't see any direct manipulation of those correlations.

David G. Mets:

You are right that we did not directly manipulate the phenotype-phenotype correlations, we just allowed for epistasis and pleiotropy. The wording here was meant to capture the allowance for pleiotropy while the previous sentence was meant to capture the allowance for epistasis. We have changed the wording to help with clarity.

Ralph Haygood:

Non-pleiotopic, yes, but additive? If I understand correctly, for each trait, you have 600 pseudo-observations, each of which is a uniform integer deviate between 1 and 1000. That's a peculiar phenotype distribution. If the trait were additive in the conventional quantitative genetics sense (e.g., Fisher's infinitesimal model), the phenotype distribution would be at least roughly normal (possibly after a log- or other transformation), not uniform.

David G. Mets:

The simulated genetic influences entirely determine the phenotypes. Which is to say the phenotypes are not drawn from any particular distribution. The additive genetic influences on phenotype were drawn from a uniform probability distribution, then gene-gene interaction and pleiotropy were added. As you suggest, this should (per the infinitesimal model and the central limit theorem) result in phenotypes that are approximately normally distributed across the population. This is indeed what we see. Plots of the data from our repository show the simulated phenotypes to be approximately normally distributed.

Ralph Haygood:

Should this be locus-phenotype pair?

David G. Mets:

Yes, thanks for catching this! We have chanted the wording.

Ralph Haygood:

Independently? Uniformly? Either would seem an odd choice. For example, if the three values per locus are independent deviates, overdominance and underdominance are presumably common.

David G. Mets:

To clarify, the weight of influence for each allele at a given locus was drawn uniformly, and independently from other alleles and loci. Each allele was given a weight ranging from zero to 10. For example, the weights for each of three alleles at a single locus might be allele 1 = 3, allele 2 = 1, and allele 3 = 6. The additive influence of that locus for an individual who is heterozygous for allele 1 and allele 2 would be four.  We have changed the wording from “allelic state” to “allele” to make it clear we are not randomly assigning an influence to a diploid allelic state but to each of the individual alleles in the population. This approach does not result in under or overdominance where, as you suggest, randomly assigning a weight to a diploid allelic state would. Thank you for highlighting this!

Ralph Haygood:

Should this be 30, as in Table 1?

Ryan York:

Yes, thank you!

Ralph Haygood:

I like this. It brings a fresh perspective and novel approaches to some long-standing concerns of quantitative genetics.

What intrigues me most is the general idea that examining relationships among many traits simultaneously can yield insight into the prevalence of pleiotropy and epistasis and perhaps more broadly the generative processes underlying the traits. I hadn't thought about it before, but now that you mention it, I agree that in this "age of increasingly large, high-dimensional, and complex data sets", the time seems ripe for this idea. I think it could and should be pursued in several ways. For example, computational modeling could be used to explore how structural characteristics of generative processes, such as pathway topologies, affect relationships among traits.

With respect to genetics, speaking clearly about additivity vs. nonadditivity and linearity vs. nonlinearity can be challenging. For example, you write, "To our knowledge, it remains unclear just how common additivity and linearity are in genetic systems ... We reasoned that inferring the rate of nonlinear phenotypic relationships would allow us to glean how well linear/additive models would fit these populations." In the first sentence, you seem to be speaking about additivity and linearity at the level of an individual, that is, biochemistry, whereas in the second, you seem to be speaking about them at the level of a population, that is, statistics. As I expect you realize, those things are rather different. As Sella and Barton remark, "The processes that generate an individual's phenotype clearly involve highly complex and nonlinear interactions among many genes and external and internal environments. Nonetheless, the predominance of additive variance implies that (on average) the deviation of an individual's trait value from the population mean ... can be well approximated by a simple additive model ..." That model is, they note, "a descriptive statistical framework rather than a generative model". (They go on to discuss why additive effects predominate in the statistical framework, including that "most segregating genetic variation arises from alleles with small effects on the trait, which are likely to translate into additive effects.") It might be interesting to compare your results for synthetic genotypes and phenotypes to that traditional description.

Speaking of the synthetic genotypes and phenotypes, I found some aspects of the presentation difficult to follow. In particular, the relationships among the three paragraphs following Table 1, the three paragraphs under "Analysis of synthetic phenotypes", and the two paragraphs under "Synthetic phenotypes let us interrogate nonlinear effects" are unclear to me. The discussion under "Synthetic phenotypes let us interrogate nonlinear effects" seems to refer only to the data described under "Analysis of synthetic phenotypes": 10,201 populations, each containing 600 individuals, each having 20 phenotypes. That description in turn seems to refer to the procedure for controlling pleiotropy and epistasis described under Table 1. However, there doesn't seem to be any further mention of the "100 phenotypes for 121 populations (N individuals per population = 500)" mentioned there. Are they actually in the results somewhere? If so, I'm afraid I missed it.

David G. Mets:

Regarding paragraph 1: Great! We think so too. The approach of capturing phenotype-phenotype structure underlies our efforts to understand genetic and phenotypic relationships across scales.

Regarding paragraph 2: As you suggest, capturing the systematic drivers of individual variation and describing the population-level phenotypic outcomes of that system using statistics are quite different things and have been treated very differently historically.  We are hoping that focusing on the generative process by which genes (and environments) result in phenotypes will provide a richer description of both drivers of individual variation and population-level variation. We view an examination of these phenotype-phenotype relationships as a first step toward a better understanding of this generative process.

Regarding paragraph 3: The second set of synthetic (“100 phenotypes for 121 populations (N individuals per population = 500”) was used for the autoencoder analysis where we vary the number of phenotypes used for prediction.

Tara Essock-Burns:

Interesting! Do you have a sense of which factor (entropy or nonlinearity) accounts more for the difference in the predictiveness between scenario 2 and scenario 3 in Fig. 6? I would imagine that the predictiveness in scenario 2 depends heavily on the group of correlated phenotypes but am not sure how the increased nonlinearity factors in there.

Ryan York:

You’re intuition fits with how we are thinking about this! From analyses in Fig 3C, we found that more correlated phenotypes tend to display increased rates of nonlinearity (but, importantly, this relationship is also mitigated by gene-gene interactions).

You can also get a sense for why scenarios 2 and 3 differ in predictiveness from Figs 4/5. Here we found that phenotypes were more accurately predicted in the presence of pleiotropy (i.e. more correlation, more nonlinearity) and lower entropy. Despite this difference, the crucial point is that both scenarios demonstrate the usefulness of leverage the relationships of multiple phenotypes in generating predictive models.