Abstract
Cost-effective oligonucleotide genotyping arrays like the Affymetrix SNP 6.0 are still the predominant technique to measure DNA copy number variations (CNVs). However, CNV detection methods for microarrays overestimate both the number and the size of CNV regions and, consequently, suffer from a high false discovery rate (FDR). A high FDR means that many CNVs are wrongly detected and therefore not associated with a disease in a clinical study, though correction for multiple testing takes them into account and thereby decreases the study's discovery power. For controlling the FDR, we propose a probabilistic latent variable model, ‘cn.FARMS’, which is optimized by a Bayesian maximum a posteriori approach. cn.FARMS controls the FDR through the information gain of the posterior over the prior. The prior represents the null hypothesis of copy number 2 for all samples, from which the posterior can only deviate by strong and consistent signals in the data. On HapMap data, cn.FARMS clearly outperformed the two most prevalent methods with respect to sensitivity and FDR. The software cn.FARMS is publicly available as an R package at http://www.bioinf.jku.at/software/cnfarms/cnfarms.html.
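
As a rough illustration of the information-gain idea mentioned in the abstract, the sketch below computes the Kullback-Leibler divergence of a Gaussian posterior from a standard normal prior. The Gaussian forms, the function name `information_gain`, and the example values are assumptions made for illustration; they are not taken from the cn.FARMS package and may differ from the model actually used there.

```python
import math


def information_gain(post_mean: float, post_var: float) -> float:
    """KL divergence (in nats) of a posterior N(post_mean, post_var)
    from a standard normal prior N(0, 1).

    Hypothetical helper: assumes both distributions are univariate Gaussians.
    """
    if post_var <= 0.0:
        raise ValueError("posterior variance must be positive")
    return 0.5 * (post_var + post_mean ** 2 - 1.0 - math.log(post_var))


# A posterior identical to the prior carries no information gain.
no_signal = information_gain(0.0, 1.0)

# A posterior that is shifted and narrowed by the data (illustrative values)
# has a strictly positive information gain.
strong_signal = information_gain(2.0, 0.25)
```

Under these assumptions, a region whose posterior stays close to the prior yields a gain near zero and would not be called, while only a posterior pulled away from the prior by the data yields a large gain.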
| Original language | English |
|---|---|
| Pages (from-to) | e79 |
| Number of pages | 13 |
| Journal | Nucleic Acids Research |
| Volume | 39 |
| Issue number | 12 |
| Publication status | Published - Jul 2011 |
Fields of science
- 106013 Genetics
- 106041 Structural biology
- 102 Computer Sciences
- 101029 Mathematical statistics
- 102001 Artificial intelligence
- 101004 Biomathematics
- 102015 Information systems
- 102018 Artificial neural networks
- 106002 Biochemistry
- 106023 Molecular biology
- 305 Other Human Medicine, Health Sciences
- 106005 Bioinformatics
JKU Focus areas
- Computation in Informatics and Mathematics
- Nano-, Bio- and Polymer-Systems: From Structure to Function