Control of selection bias in microarray data analysis

Jurman G
2003-01-01

Abstract

We present an experimental setup for analysis and prediction on microarray data, designed specifically to identify and correct for selection bias in high-throughput problems. A number of recently published, overoptimistic studies use feature selection and gene profiling procedures that are affected by overfitting. We outline the selection bias problem and demonstrate its effect on synthetic and microarray data. We then describe a procedure that addresses the problem through extensive resampling and label randomization, using support vector machines as the base classifier and an improved version of the recursive feature elimination algorithm for gene ranking.
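
The selection bias described in the abstract can be illustrated with a small sketch. It is not the authors' procedure: as a simplifying assumption it swaps in a nearest-centroid classifier and a mean-difference filter in place of support vector machines and recursive feature elimination, and it uses only synthetic noise with labels unrelated to the features (the situation that label randomization creates). All function names and parameter values here are hypothetical. Ranking features on the full data set before cross-validation tends to yield an optimistic accuracy, while repeating the ranking inside each fold stays near chance.

```python
import random

def make_null_data(n_samples=30, n_features=500, seed=0):
    """Pure-noise features with balanced labels that carry no signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y

def rank_features(X, y, k):
    """Indices of the k features with the largest absolute class-mean difference."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]

def centroid_predict(X_train, y_train, x, features):
    """Assign x to the class whose centroid is closest on the chosen features."""
    dist = {}
    for lab in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == lab]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in features]
        dist[lab] = sum((x[j] - c) ** 2 for j, c in zip(features, centroid))
    return 0 if dist[0] <= dist[1] else 1

def loo_accuracy(X, y, k, select_inside_fold):
    """Leave-one-out accuracy; feature ranking is done inside or outside the loop."""
    # Ranking on all samples (outside the loop) lets the held-out sample leak in.
    fixed = None if select_inside_fold else rank_features(X, y, k)
    correct = 0
    for i in range(len(X)):
        X_tr = X[:i] + X[i + 1:]
        y_tr = y[:i] + y[i + 1:]
        feats = rank_features(X_tr, y_tr, k) if select_inside_fold else fixed
        correct += centroid_predict(X_tr, y_tr, X[i], feats) == y[i]
    return correct / len(X)

X, y = make_null_data()
biased = loo_accuracy(X, y, k=10, select_inside_fold=False)
unbiased = loo_accuracy(X, y, k=10, select_inside_fold=True)
print(f"selection outside CV: {biased:.2f}")
print(f"selection inside CV:  {unbiased:.2f}")
```

Because the labels are independent of the features, any accuracy well above 0.5 in the first line of output reflects the selection step having seen the held-out samples, not real predictive signal.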

Use this identifier to cite or link to this document: https://hdl.handle.net/11699/97625