Identification of Short and Rare Haplotype Clusters in Korean Genomes

Sepp Hochreiter, Günter Klambauer, Gundula Povysil, Djork-Arné Clevert

Research output: Chapter in Book/Report/Conference proceeding; Conference proceedings; peer-review

Abstract

We developed HapFABIA to identify rare haplotypes in large sequencing data sets by biclustering, which combines LD information across individuals with IBD information along the chromosome. To bicluster large data sets, we developed a sparse matrix algebra, which is implemented in HapFABIA. HapFABIA significantly outperformed IBD methods at detecting rare haplotypes in simulated genotype data with implanted rare haplotypes. We used HapFABIA to extract rare haplotypes from sequencing data of the Korean Personal Genome Project (KPGP). The genotyping data from KPGP were combined with those from the 1000 Genomes Project, giving 1,131 individuals and 3.1 million single nucleotide variants (SNVs) on chromosome 1. HapFABIA identified 113,963 different rare haplotypes marked by tagSNVs that have a minor allele frequency of 5% or less. The rare haplotypes comprise 680,904 SNVs, that is, 36.1% of the rare variants and 21.5% of all variants. The vast majority, 107,473 haplotypes, are found in Africans, while only 9,554 and 6,933 are found in Europeans and Asians, respectively. We characterized haplotypes by matching them with archaic genomes. Haplotypes that match the Denisova or the Neandertal genome are observed significantly more often in Asians and Europeans. Interestingly, haplotypes matching the Denisova or the Neandertal genome are also found in Africans, in some cases exclusively. Our findings indicate that the majority of rare haplotypes on chromosome 1 are ancient and date from before humans migrated out of Africa. The enrichment of Neandertal haplotypes in Koreans (odds ratio 10.6, Fisher’s exact test) is not as high as in Han Chinese from Beijing, Southern Han Chinese, and Japanese (odds ratios 23.9, 19.1, and 22.7, Fisher’s exact test). In contrast, the enrichment of Denisova haplotypes in Koreans (odds ratio 36.7, Fisher’s exact test) is higher than in Han Chinese from Beijing, Southern Han Chinese, and Japanese.
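
The enrichment figures above are odds ratios from Fisher's exact test on a 2x2 contingency table. The following is a minimal sketch of that kind of computation, not the authors' code. The function names and the counts are hypothetical and chosen only for illustration. The abstract does not say which odds-ratio estimator was used, so the sketch assumes the plain sample odds ratio.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c) for the 2x2 table [[a, b], [c, d]]."""
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher p-value: P(X >= a) with all table margins fixed.

    X, the top-left cell, follows a hypergeometric distribution
    under the null hypothesis of no association.
    """
    row1 = a + b          # haplotypes matching the archaic genome
    col1 = a + c          # haplotypes observed in the population
    n = a + b + c + d     # all haplotypes
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Hypothetical counts (NOT taken from the study):
# rows    = haplotype matches the archaic genome (yes / no)
# columns = haplotype is observed in the population (yes / no)
a, b, c, d = 30, 70, 20, 880

print("odds ratio:", odds_ratio(a, b, c, d))
print("one-sided p-value:", fisher_exact_greater(a, b, c, d))
```

An odds ratio above 1 means that haplotypes matching the archaic genome are over-represented in the population relative to non-matching haplotypes. Statistical packages may report a conditional maximum-likelihood estimate of the odds ratio instead, which generally differs slightly from the sample value computed here.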
Original language: English
Title of host publication: HGV 2012 Proceedings
Number of pages: 1
Publication status: Published - Sept 2012

Fields of science

  • 106013 Genetics
  • 106041 Structural biology
  • 102 Computer Sciences
  • 101029 Mathematical statistics
  • 102001 Artificial intelligence
  • 101004 Biomathematics
  • 102015 Information systems
  • 102018 Artificial neural networks
  • 106002 Biochemistry
  • 106023 Molecular biology
  • 305 Other Human Medicine, Health Sciences
  • 106005 Bioinformatics

JKU Focus areas

  • Computation in Informatics and Mathematics
  • Nano-, Bio- and Polymer-Systems: From Structure to Function
