puma: an R/Bioconductor package for Propagating Uncertainty in Microarray Analysis
Most analyses of Affymetrix GeneChip data are based on point estimates of expression levels and ignore the uncertainty of such estimates. By propagating uncertainty to downstream analyses we can improve results from microarray analyses. For the first time, the puma package makes a suite of uncertainty propagation methods available to a general audience. puma also improves on previously available uncertainty propagation methods in both scope and speed of execution. Included are summarisation, differential expression detection, clustering and PCA methods, together with useful plotting and data manipulation functions. It is part of the PUMA project.
puma incorporates the methods mmgmos, pplr and pumaclust. The following sections show how these methods differ from other methods.
Affymetrix microarrays use multiple probes to measure the abundance
of each transcript, so various statistical and probabilistic methods
can be applied to obtain reliable gene expression estimates. The most
popular probe-level analysis methods are statistical models that
estimate gene expression levels accurately. However, these methods do
not report the credibility (uncertainty) of the expression values,
which can be very useful in further statistical analyses. mmgmos
is specifically designed to address this limitation.
There are two versions of gMOS implemented in this package: modified
gMOS (mgMOS) and multi-chip modified gMOS (multi-mgMOS). The original
gMOS uses two gamma distributions to model Perfect Match (PM) and
Mismatch (MM) intensities, with shared scale parameters on each chip.
mgMOS turns the scale parameters into latent variables to reflect the
different binding affinities of probes within a probe-set. This
modified distribution accurately captures the correlated changes in
the binding affinity of probe-pairs within the probe-set. Both gMOS and
mgMOS are single-chip models. multi-mgMOS extends gMOS and mgMOS: it
shares the scale parameters of the gamma distributions across all
chips to reflect the intrinsic characteristics of the probe sequences
on chips of the same type. It also allows for a fraction of the signal
to bind to the Mismatch probes. The likelihood functions of all
versions of gMOS can be written in closed form, so computation is
very fast compared with other probabilistic models.
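Why the likelihood stays in closed form can be shown with a minimal Python sketch. It rests on two assumptions. First, gamma densities are parameterised by a shape and a rate b (the inverse of the scale). Second, the MM intensity follows Gamma(a, b) and the PM intensity follows Gamma(a + alpha, b), so PM carries an extra specific signal alpha. The sketch first writes a gMOS-style likelihood for one probe pair with a fixed shared rate. It then integrates a latent Gamma(c, d) rate out analytically, in the spirit of mgMOS. All function names and this exact parameterisation are illustrative assumptions, not puma's internal code. The multi-chip sharing and the MM signal fraction of multi-mgMOS are omitted.

```python
import math


def gamma_logpdf(x, shape, rate):
    """Log density of a Gamma(shape, rate) distribution at x > 0."""
    return (shape * math.log(rate) + (shape - 1.0) * math.log(x)
            - rate * x - math.lgamma(shape))


def gmos_pair_loglik(mm, pm, a, alpha, b):
    """gMOS-style log-likelihood of one probe pair with a fixed, shared rate b.

    Assumed model: MM ~ Gamma(a, b), PM ~ Gamma(a + alpha, b).
    """
    return gamma_logpdf(mm, a, b) + gamma_logpdf(pm, a + alpha, b)


def mgmos_pair_loglik(mm, pm, a, alpha, c, d):
    """mgMOS-style log-likelihood with the latent rate b integrated out.

    Assumes b ~ Gamma(c, d) for each probe pair; gamma-gamma conjugacy makes
    the integral over b analytic, so only lgamma and logs are needed.
    """
    total_shape = 2.0 * a + alpha + c
    return ((a - 1.0) * math.log(mm) + (a + alpha - 1.0) * math.log(pm)
            + c * math.log(d) + math.lgamma(total_shape)
            - math.lgamma(a) - math.lgamma(a + alpha) - math.lgamma(c)
            - total_shape * math.log(mm + pm + d))


def probe_set_loglik(mm_values, pm_values, a, alpha, c, d):
    """Sum the closed-form log-likelihood over the probe pairs of a probe set."""
    return sum(mgmos_pair_loglik(y, m, a, alpha, c, d)
               for y, m in zip(mm_values, pm_values))


if __name__ == "__main__":
    # Hypothetical MM/PM intensities for a probe set with three probe pairs.
    mm = [120.0, 95.0, 140.0]
    pm = [410.0, 300.0, 520.0]
    for alpha in (0.0, 1.0, 2.0, 4.0):
        print(alpha, probe_set_loglik(mm, pm, a=2.0, alpha=alpha, c=2.0, d=50.0))
```

Because `mgmos_pair_loglik` needs no numerical integration, an optimiser can evaluate it many times cheaply while fitting `a`, `alpha`, `c` and `d`. That is the practical benefit of the closed form.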
The functions mgmos and mmgmos implement mgMOS and multi-mgMOS respectively. The fast C program donlp2 is used to optimise the parameters. For each gene, both functions output the mean, median and standard deviation of the expression level, together with its 5th, 25th, 75th and 95th percentiles, from which credibility intervals can be formed.
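To show what these summaries mean, the Python sketch below computes the same statistics from samples of one gene's expression level. Working from samples is an assumption made for clarity. It is not necessarily how mgmos and mmgmos derive their output, and the function name is hypothetical. The range between the 5th and 95th percentiles is a 90% credibility interval.

```python
import math
import statistics


def summarise_expression(samples):
    """Summarise samples of one gene's expression level.

    Returns the mean, median, standard deviation and the 5th/25th/75th/95th
    percentiles (linear interpolation, 'inclusive' method).
    """
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples),
        "median": statistics.median(samples),
        "sd": statistics.stdev(samples),
        "p5": cuts[4],    # cuts[i - 1] is the i-th percentile
        "p25": cuts[24],
        "p75": cuts[74],
        "p95": cuts[94],
    }


if __name__ == "__main__":
    import random
    random.seed(0)
    # Hypothetical posterior: log of draws from Gamma(shape=50, scale=1).
    samples = [math.log(random.gammavariate(50.0, 1.0)) for _ in range(10000)]
    s = summarise_expression(samples)
    print(f"mean={s['mean']:.3f}  sd={s['sd']:.3f}")
    print(f"90% credibility interval: [{s['p5']:.3f}, {s['p95']:.3f}]")
```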
Two main factors make the detection of differential gene expression difficult. First, the noisy nature of microarray data requires a reasonable probabilistic model to characterise the variability in probe data (within-chip variance). Second, the small number of replicates makes it difficult to obtain an accurate estimate of each gene's variance across replicates (between-replicate variance). Many approaches have been devised to address the second difficulty and obtain accurate between-replicate variance estimates. However, most of these methods are based on single point estimates of gene expression values, and few include within-chip variance when identifying differentially expressed genes. pplr incorporates probe-level measurement error into the variance estimate of gene expression levels. It then uses this improved variance to detect down- and up-regulated genes by calculating the PPLR (probability of positive log-ratio). The probe-level measurement error is calculated by the function mmgmos.
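The PPLR idea can be sketched in a few lines of Python. The sketch relies on three simplifying assumptions that are not taken from the package:

- Each condition's mean log-expression has a Gaussian posterior, obtained by inverse-variance weighting of the replicates.
- Each replicate's variance is its probe-level variance (for example, the squared standard error from mmgmos) plus a between-replicate variance.
- That between-replicate variance is supplied as a known input rather than estimated.

The function names `condition_posterior` and `pplr` here are hypothetical and do not describe the package's API.

```python
import math


def condition_posterior(means, ses, between_var):
    """Posterior mean and variance of one condition's mean log-expression.

    Replicate i has estimate means[i] and probe-level standard error ses[i].
    With a flat prior and Gaussian noise of variance ses[i]**2 + between_var,
    the posterior is Gaussian with inverse-variance-weighted mean.
    """
    weights = [1.0 / (se ** 2 + between_var) for se in ses]
    precision = sum(weights)
    mean = sum(w * x for w, x in zip(weights, means)) / precision
    return mean, 1.0 / precision


def pplr(mean_ctrl, var_ctrl, mean_exp, var_exp):
    """Probability that the experiment-minus-control log-ratio is positive,
    assuming independent Gaussian posteriors for the two condition means."""
    z = (mean_exp - mean_ctrl) / math.sqrt(var_ctrl + var_exp)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


if __name__ == "__main__":
    # Hypothetical log-expression estimates and probe-level standard errors.
    ctrl = condition_posterior([7.1, 7.3, 6.9], [0.2, 0.3, 0.25], between_var=0.05)
    expt = condition_posterior([8.0, 8.4, 7.9], [0.4, 0.2, 0.3], between_var=0.05)
    print(f"PPLR = {pplr(*ctrl, *expt):.4f}")
```

A PPLR close to 1 suggests up-regulation in the experimental condition, and a PPLR close to 0 suggests down-regulation. A replicate with a large probe-level standard error receives less weight in `condition_posterior`. That down-weighting is the effect of propagating measurement error.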
Clustering is an important analysis of microarray gene expression data: it groups genes with similar expression patterns and so enables the exploration of unknown gene functions. Because microarray experiments involve complicated multi-step procedures, the resulting gene expression data are very noisy. Many heuristic and model-based clustering approaches have been developed to cluster such noisy data. However, few of them take into account probe-level measurement error, which provides rich information about technical variability. We augment a standard model-based clustering method to incorporate probe-level measurement error. Using probe-level measurements from a recently developed Affymetrix probe-level model, multi-mgMOS, we include the probe-level measurement error directly in the standard Gaussian mixture model. Including probe-level measurement error improves the performance of model-based clustering of gene expression data and yields more biologically reasonable clusters. The probe-level measurement error is calculated by the function mmgmos.
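The following Python sketch illustrates how measurement error can enter a Gaussian mixture model. It assumes one-dimensional data and given component parameters. Under component k, a gene's observed value is treated as Gaussian with the component's variance plus the gene's own probe-level variance. Only the E-step and a mean update are shown. The function names are hypothetical, and this is not the pumaclust implementation.

```python
import math


def normal_logpdf(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def responsibilities(x, se, weights, means, variances):
    """E-step for one gene: P(component k | x) including measurement error.

    Under component k, x is modelled as N(means[k], variances[k] + se**2):
    the cluster's own spread plus the gene's probe-level measurement variance.
    """
    logs = [math.log(w) + normal_logpdf(x, m, v + se ** 2)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # subtract the maximum before exponentiating, for stability
    unnorm = [math.exp(lp - top) for lp in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def update_mean(xs, ses, resp_k, var_k):
    """M-step update of one component mean with its variance held fixed.

    Each gene is weighted by its responsibility divided by its total variance,
    so genes with large measurement error pull the cluster centre less.
    """
    w = [r / (var_k + se ** 2) for r, se in zip(resp_k, ses)]
    return sum(wi * x for wi, x in zip(w, xs)) / sum(w)


if __name__ == "__main__":
    for se in (0.05, 5.0):
        r = responsibilities(0.1, se, [0.5, 0.5], [0.0, 3.0], [0.1, 0.1])
        print(f"se={se}: responsibilities={[round(p, 3) for p in r]}")
```

With a small standard error, the gene at 0.1 is assigned almost entirely to the component centred at 0. With a large one, its responsibilities move towards the mixing weights, and `update_mean` gives it less influence on the cluster centres.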
New DEResults class returned by pumaDE. Changed the default normalisation in mmgmos and mgmos to "median" (from "none"). New functions calcAUC, numFP and removeUninformativeFactors. createContrastMatrix now creates "1-vs-others" contrasts. Various other minor changes and bug fixes (see svn for full details).
Original Bioconductor release of puma.