Abstract
Both the strength and the weakness of microarray technology can be attributed to the enormous amount of information it generates. To realize the full benefit of microarray technology for testing differentially expressed genes and for classification, the number of irrelevant genes present in microarray data needs to be minimized. A major interest is to use probe-level data to call genes informative or noninformative based on the trade-off between the array-to-array variability and the measurement error. Existing work in this direction includes filtering likely uninformative sets of hybridization (FLUSH; Calza et al., 2007) and I/NI calls for the exclusion of noninformative genes using FARMS (I/NI calls; Talloen et al., 2007; Hochreiter et al., 2006). In this paper, we propose a linear mixed model as a more flexible method that performs as well as I/NI calls and outperforms FLUSH. We also introduce other criteria for gene filtering, such as R² and intra-cluster correlation. Additionally, we include some objective criteria based on likelihood ratio testing, the Akaike information criterion (AIC; Akaike, 1973) and the Bayesian information criterion (BIC; Schwarz, 1978).
...
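The trade-off between array-to-array variability and measurement error that the abstract describes can be illustrated with a small sketch of the intra-cluster correlation for a single probe set. This is not the paper's implementation: it assumes a random array effect plus a fixed probe effect, and it swaps the likelihood-based mixed-model fit for a simple moment (ANOVA) estimator. The function names, the simulated data, and the variance settings are all hypothetical.

```python
import random
from statistics import mean


def intra_cluster_correlation(y):
    """Moment-based ICC for one probe set (an assumption, not the paper's fit).

    y is a list of rows: one row per array, one column per probe
    (log-scale intensities). Probes are treated as fixed effects and
    arrays as random effects in a two-way layout without replication.
    """
    n, J = len(y), len(y[0])
    grand = mean(v for row in y for v in row)
    array_means = [mean(row) for row in y]
    probe_means = [mean(row[j] for row in y) for j in range(J)]

    # Between-array mean square: expectation sigma_e^2 + J * sigma_b^2.
    ms_array = J * sum((m - grand) ** 2 for m in array_means) / (n - 1)
    # Residual mean square: expectation sigma_e^2 (measurement error).
    ss_error = sum(
        (y[i][j] - array_means[i] - probe_means[j] + grand) ** 2
        for i in range(n)
        for j in range(J)
    )
    ms_error = ss_error / ((n - 1) * (J - 1))

    # Array-to-array variance estimate, truncated at zero.
    var_array = max(0.0, (ms_array - ms_error) / J)
    total = var_array + ms_error
    return var_array / total if total > 0 else 0.0


def simulate_probe_set(rng, n_arrays, n_probes, sd_array, sd_error):
    """Hypothetical generator: probe effect + array effect + noise."""
    probe_effects = [rng.gauss(8.0, 1.0) for _ in range(n_probes)]
    rows = []
    for _ in range(n_arrays):
        b = rng.gauss(0.0, sd_array)  # shared by all probes on this array
        rows.append([p + b + rng.gauss(0.0, sd_error) for p in probe_effects])
    return rows


rng = random.Random(0)
informative = simulate_probe_set(rng, 20, 11, sd_array=1.0, sd_error=0.3)
noninformative = simulate_probe_set(rng, 20, 11, sd_array=0.0, sd_error=0.3)

icc_informative = intra_cluster_correlation(informative)
icc_noninformative = intra_cluster_correlation(noninformative)
```

In this sketch, a probe set whose probes move together across arrays yields a correlation near one, while a probe set dominated by measurement error yields a value near zero. A filtering rule would compare the value with a cutoff, which is left unspecified here.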
Original language | English |
---|---|
Article number | 4 |
Number of pages | 31 |
Journal | Statistical Applications in Genetics and Molecular Biology |
Volume | 9 |
Issue number | 1 |
Publication status | Published - Jan 2010 |
Fields of science
- 101004 Biomathematics
- 101027 Dynamical systems
- 101028 Mathematical modelling
- 101029 Mathematical statistics
- 101014 Numerical mathematics
- 101015 Operations research
- 101016 Optimisation
- 101017 Game theory
- 101018 Statistics
- 101019 Stochastics
- 101024 Probability theory
- 101026 Time series analysis
- 102 Computer Sciences
- 102001 Artificial intelligence
- 102003 Image processing
- 102004 Bioinformatics
- 102013 Human-computer interaction
- 102018 Artificial neural networks
- 102019 Machine learning
- 103029 Statistical physics
- 106005 Bioinformatics
- 106007 Biostatistics
- 202017 Embedded systems
- 202035 Robotics
- 202036 Sensor systems
- 202037 Signal processing
- 305901 Computer-aided diagnosis and therapy
- 305905 Medical informatics
- 305907 Medical statistics
- 102032 Computational intelligence
- 102033 Data mining
- 101031 Approximation theory