A Cloud-based GWAS Analysis Pipeline for Clinical Researchers

Paul Heinzlreiter, James Richard Perkins, Oscar Torreno, Johan Karlsson, Juan Antonio Ranea, Andreas Mitterecker, Miguel Blanca, Oswaldo Trelles

Research output: Chapter in Book/Report/Conference proceeding > Conference proceedings > peer-review

Abstract

The cost of obtaining genome-scale biomedical data continues to drop rapidly, with many hospitals and universities being able to produce large amounts of data. Managing and analysing such ever-growing datasets is becoming a crucial issue. Cloud computing presents a good solution to this problem due to its flexibility in obtaining computational resources. However, it is essential to allow end-users with no experience to take advantage of the cloud computing model of elastic resource provisioning. This paper presents a workflow that allows the end-user to perform the core steps of a genome-wide association analysis where raw gene-expression data is quality assessed. A number of steps in this process are computationally intensive, and their cost varies greatly depending on the size of the study, from a few samples to a few thousand. Therefore, cloud computing provides an ideal solution to this problem by enabling scalability due to elastic resource provisioning. The key contributions of this paper are a real-world application of cloud computing addressing a critical problem in biomedicine through parallelization of the appropriate parts of the workflow, as well as enabling the end-user to concentrate on data analysis and biological interpretation of results by taking care of the computational aspects.
Original language: English
Title of host publication: Proc. of the 4th International Conference on Cloud Computing and Services Science (CLOSER 2014)
Number of pages: 1
DOIs
Publication status: Published - 2014

Fields of science

  • 303 Health Sciences
  • 304 Medical Biotechnology
  • 304003 Genetic engineering
  • 305 Other Human Medicine, Health Sciences
  • 101004 Biomathematics
  • 101018 Statistics
  • 102 Computer Sciences
  • 102001 Artificial intelligence
  • 102004 Bioinformatics
  • 102010 Database systems
  • 102015 Information systems
  • 102019 Machine learning
  • 106023 Molecular biology
  • 106002 Biochemistry
  • 106005 Bioinformatics
  • 106007 Biostatistics
  • 106041 Structural biology
  • 301 Medical-Theoretical Sciences, Pharmacy
  • 302 Clinical Medicine

JKU Focus areas

  • Computation in Informatics and Mathematics
  • Nano-, Bio- and Polymer-Systems: From Structure to Function
