Simulations to compare the results of Exchangeable Effects and Exchangeable Z-scores model
The correct model has higher likelihood in simulations.
Simulations to compare R vs Rcpp:
In the simulation, Rcpp is faster then R in posterior calculation. Rcpp and R have similar computational time for likelihood calculation. However, from my experience, Rcpp tends to use shorter time in likelihood computation.
Simulations to find more latent factors than the number of covariates (p)
When the var_type='constant'
, the flash
results in 6 factors, whereas the data has 5 conditions.
Some factors contain +ve in some conditions and -ve in others. These are not separated into 2 factors.
Here is the MASH
analyses:
Generate the covariance matrices:
Firstly, we do the factor analysis on the data using flash
. Flash
Then, we generate the potential covariance structures:
I realized a question here. We run the cov_ed
using mash data and Ulist. The mash data has Bhat
, Shat
, Shat_alpha
, V
, alpha
. In cov_ed
, it only use the Bhat
and Shat
. Therefore, it assumes the errors are independent (V is identity). The cov_ed
algorithm does not account for the V matrix.
We use extreme deconvolution to find the covariance candidates. Maybe, the ignorance of the dependency in ed does not matter. Will including the dependency improve the initial covariance matrices?
mash
model
Changing the parameter alpha
could obtain EE or EZ fitted model.
vhat
could obtain different V matrices.
vhat = 0
, V is identity matrixvhat = 1
, V is the estimated correlation matrixV is the correlation matrix here: \[ \hat{\beta}_{j}|\beta_{j}, S_{j} \sim N(\hat{\beta}_{j}; \beta_{j}, S_{j}VS_{j}) \]
The estimated weights for covariance matrices are different for EE and EZ model when V is the estimated correlation. When V is Identity, the estimated weights are similar. But the measurements are correlated due to sample overlap (all based on same 134 individuals), I think it is more reasonable to use the estimated correlation from the data (V1).
Among these models, EZ V1 has the highest log likelihood value.
Using Mash with V be the estimated correlation, we found more eQTLs, the number of eQTLs are similar using EE and EZ model. Based on the EZ V1 model, the majority weights for the covariance matrix is on the null matrix (0, no effects, 84.6%). The remaining weights are on the covariance matrix capturing the pattern that the standardized effects are strong positively correlated among all treatments (14.5%). The covariance component \(11'\) (5.4%) shows that the standardized effects are similar. The other covariance component has the format \(D11'D\) (9.1%), D is a diagnoal matrix, which means the standardized effects are different in size, but they are strongly correlated (\(\beta_{lpa6h} = d * \beta_{ctrl}\) etc.).
The mash fit favors component where standardized effects are very similar, which does not mean the raw effects are similar. The similarity of standardized effects could imply the similarity of raw effects if and only if the standard errors of the estimated effects are all similar. The derivation is in Mash Posterior under non-equal \(s_{jr}\). In this dataset, the \(s_{jr}\) are not similar across treatments. The equal effect
covariance structure (correlation) is for the standardized effects. The raw effects have the same correlation structure, but with different covariance, different mean.
We found a lot reQTLs (raw effect), treatment specific reQTLs and time point specific reQTLs using the linear transformation. We found more reQTLs, treatment specific reQTLs and time point specific reQTLs than the paper’s result.
However, the number of reQTLs from EE model are much less than EZ model. I don’t know the reason for this.
If we define the reQTL as different standardized effects, the result is EZ V1 model reQTL.
The number of reQTLs are fewer than the original reQTLs.
Qualitative Interaction:
Among the eQTLs that significant among all treatments (4792), there are 4 of them have effects in different directions in different conditions. Interaction
In the null data, there is no qualitative interaction cases. Therefore, the estimated weights for the mash
model can not capture the qualitative interaction covariance structure. Posterior Check
Using flash
results to construct mashr covariance matrices
Method 1: \[z_{i} = \sum_{k=1}^{K}l_{ik} f_{k}\] \[U_{i} = z_{i}z_{i}'\] Flash individual
The number of matrices is too large. The current algorithm cannot handle it. The matrices \(U_{i}\) for two genes will be similar if they have similar \(z_{i}\) vectors. So we try to cluster \(z_{i}\) vectors. Flash individual reduce
There is only one cluster identified here.
Method 2: Suppose the rows of L come from a mixture of multivariate normals with covariances \(U_{1}, \cdots, U_{n}\). \[l_{i} \sim \sum_{m=1}^{n} \pi_{m} N_{K}(0, U_{m})\] The detail of this method is in Flash L MMVN.
The mash
model including these covariance matrices:
The plot of logFC:
The y axis is logFC = \(log_{2} \frac{mean \ sti}{mean \ base}\). The x axis is the group of genes: ‘eQTL in both baseline and the stimulate’, ‘eQTL in baseline’, ‘eQTL in stimulate’.
The logFC in different groups are similar.
Here, the x axis is: ‘eQTL only in baseline’, ‘eQTL only in stimulate’.
EZ V1 eQTL logFC ONLY thresh=0.05
EZ V1 eQTL logFC ONLY thresh=0.01
EZ V1 eQTL logFC ONLY thresh=0.001
The logFC in ‘eQTL only in stimulate’ group is slightly higher.
The differences become clearer as we concentrate on more convincingly baseline/stimulate-specific eQTLs.
Method: Within the baseline/stimulate-specific group, select the more convincingly eQTLSs by decreasing the statistical significance cutoff.
contrast
method simple simulation examples:
Including more covariance structures basd on flash
output.
This R Markdown site was created with workflowr