HapRFN: a deep learning method to identify short IBD segments

Gundula Povysil, Djork-Arné Clevert, Sepp Hochreiter

Research output: Chapter in Book/Report/Conference proceeding › Conference proceedings › peer-review

Abstract

For whole-genome sequencing data, HapFABIA was shown to be superior in detecting short IBD (identical by descent) segments that are tagged by rare variants. Nevertheless, HapFABIA still has several problems: (1) Deciding whether an individual possesses an IBD segment is often difficult because HapFABIA supplies only soft bicluster memberships. (2) HapFABIA can extract only 10-30 IBD segments at once and therefore needs to perform multiple iterations. However, the IBD segments identified in different iterations may not be decorrelated, so they may be redundant, overlapping, or even split into smaller segments. (3) Processing very large data sets is time-intensive. We recently introduced Rectified Factor Networks (RFNs) as an unsupervised deep learning approach. Each code unit of an RFN represents a bicluster and therefore an IBD segment: the samples for which the code unit is active share the bicluster (IBD segment), and the features (SNVs) that have activating weights to the code unit tag the IBD segment. HapRFN overcomes the problems of HapFABIA: (1) RFNs provide sparser codes via their rectified linear units, which immediately supply bicluster memberships as factors that differ from zero. (2) RFNs can learn thousands of factors and therefore many IBD segments simultaneously. As a result, all IBD segments are mutually decorrelated, so they are not redundant and do not overlap. (3) RFNs allow much faster processing of very large data sets using techniques from deep learning, such as efficient matrix multiplications and implementations of networks on graphics processing units (GPUs).
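The mapping from code units to IBD segments described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `extract_biclusters`, the `weight_threshold` parameter, the requirement of at least two member samples, and the toy matrices are all assumptions made for illustration. It assumes an already trained model that yields a non-negative code matrix (samples by code units) and a weight matrix (code units by SNVs).

```python
import numpy as np

def extract_biclusters(codes, weights, weight_threshold=0.0):
    """Read candidate IBD segments off a rectified code matrix (hypothetical helper).

    codes:   (n_samples, n_units) rectified activations, all >= 0
    weights: (n_units, n_snvs) weights linking code units to SNVs
    """
    biclusters = []
    for unit in range(codes.shape[1]):
        # Hard membership: samples whose code for this unit differs from zero.
        samples = np.flatnonzero(codes[:, unit] > 0)
        # Tagging SNVs: features with an activating (positive) weight.
        snvs = np.flatnonzero(weights[unit] > weight_threshold)
        # Assumption: a segment must be shared by at least two samples.
        if samples.size >= 2 and snvs.size > 0:
            biclusters.append((unit, samples.tolist(), snvs.tolist()))
    return biclusters

# Toy data: 4 samples, 2 code units, 5 SNVs (values are made up).
codes = np.array([[0.9, 0.0],
                  [0.7, 0.0],
                  [0.0, 0.0],
                  [0.0, 1.2]])
weights = np.array([[0.0, 0.8, 0.6, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 0.5, 0.9]])
result = extract_biclusters(codes, weights)
```

In this toy example only the first code unit yields a candidate segment, because the second unit is active in a single sample. Because the rectified codes are exactly zero for non-members, membership needs no extra thresholding, which is the advantage over soft memberships that the abstract points to.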
Original language: English
Title of host publication: ASHG 2016 Proceedings
Number of pages: 1
Publication status: Published - 2016

Fields of science

  • 303 Health Sciences
  • 304 Medical Biotechnology
  • 304003 Genetic engineering
  • 305 Other Human Medicine, Health Sciences
  • 101004 Biomathematics
  • 101018 Statistics
  • 102 Computer Sciences
  • 102001 Artificial intelligence
  • 102004 Bioinformatics
  • 102010 Database systems
  • 102015 Information systems
  • 102019 Machine learning
  • 106023 Molecular biology
  • 106002 Biochemistry
  • 106005 Bioinformatics
  • 106007 Biostatistics
  • 106041 Structural biology
  • 301 Medical-Theoretical Sciences, Pharmacy
  • 302 Clinical Medicine

JKU Focus areas

  • Computation in Informatics and Mathematics
  • Nano-, Bio- and Polymer-Systems: From Structure to Function
  • Medical Sciences (in general)
  • Health System Research
  • Clinical Research on Aging
