An analysis pipeline for detecting copy number variations with a low false discovery rate in microarray data

Research output: Chapter in Book/Report/Conference proceeding › Conference proceedings › peer-review

Abstract

A low false discovery rate (FDR) in the detection of copy number aberrations (CNAs) in microarray data ensures sufficient detection power and prevents failures in CNA-disease association studies. A high FDR means that many aberrations are falsely discovered; these are not associated with the disease, yet correction for multiple testing must still take them into account. Thus, a high FDR decreases not only the discovery power of a study but also the significance level of the remaining discoveries after correction for multiple testing.

Methods: We obtain a low FDR in the detection of CNAs in microarray data with a probabilistic latent variable model called "cn.FARMS". The model is optimized by a Bayesian maximum a posteriori approach in which a Laplace prior favors models that represent the null hypothesis of observing a constant copy number of 2 for all samples. The posterior can deviate from this prior only if the data contain strong signals (intensities that deviate from those of copy number 2) that are also consistent; such signals hint at a CNA, the alternative hypothesis. The information gain of the posterior over the prior gives the informative/non-informative (I/NI) call, which serves as a filter for CNA candidate regions. I/NI call filtering reduces the FDR because a region with a large I/NI call is unlikely to be a falsely detected CNA, since a false detection would have neither strong nor consistent measurements. It can be shown that the I/NI call filter, applied to the null hypotheses of the association study, is independent of the test statistic, which in turn guarantees that control of the type I error rate by correction for multiple testing is still possible after filtering. I/NI calls perform well for the typically rare CNAs that are seen in only a few samples, where variance-based filtering approaches fail.

Results: cn.FARMS clearly outperformed prevalent methods for CNA detection with respect to sensitivity and especially with respect to FDR on different HapMap benchmark data sets.
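The interplay between filtering and correction for multiple testing described in the abstract can be illustrated with a minimal sketch. It is not the cn.FARMS implementation: the per-region `filter_scores` are a hypothetical stand-in for I/NI calls, the threshold and p-values are made up, and the Benjamini-Hochberg procedure is used only as one common correction method. The sketch also assumes, as the abstract states for the I/NI call, that the filter is independent of the test statistic under the null hypothesis.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    return sorted(order[:k_max])


def filter_then_correct(p_values, filter_scores, threshold, alpha=0.05):
    """Keep regions whose filter score reaches the threshold, then apply BH to those only."""
    keep = [i for i, score in enumerate(filter_scores) if score >= threshold]
    rejected = benjamini_hochberg([p_values[i] for i in keep], alpha)
    # Map positions in the filtered list back to the original region indices.
    return [keep[j] for j in rejected]


# Made-up association p-values for eight candidate regions.
p_values = [0.001, 0.012, 0.03, 0.2, 0.5, 0.7, 0.9, 0.95]
# Hypothetical filter scores standing in for I/NI calls (larger = more informative).
filter_scores = [0.9, 0.8, 0.7, 0.6, 0.1, 0.05, 0.02, 0.01]

unfiltered = benjamini_hochberg(p_values)
filtered = filter_then_correct(p_values, filter_scores, threshold=0.5)
print("rejected without filtering:", unfiltered)
print("rejected with filtering:   ", filtered)
```

With these made-up numbers, correcting over all eight regions rejects two hypotheses, while correcting over the four regions that pass the filter rejects three: removing uninformative candidates lowers the multiple-testing burden for the remaining ones.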
Original language: English
Title of host publication: 12th International Congress of Human Genetics and the American Society of Human Genetics
Number of pages: 1
Publication status: Published - Oct 2011

Fields of science

  • 106013 Genetics
  • 106041 Structural biology
  • 102 Computer Sciences
  • 101029 Mathematical statistics
  • 102001 Artificial intelligence
  • 101004 Biomathematics
  • 102015 Information systems
  • 102018 Artificial neural networks
  • 106002 Biochemistry
  • 106023 Molecular biology
  • 305 Other Human Medicine, Health Sciences
  • 106005 Bioinformatics
