Control of selection bias in microarray data analysis
Jurman G
2003-01-01
Abstract
We present an experimental set-up for the analysis and prediction of microarray data, specifically designed to identify and correct the impact of selection bias in high-throughput problems. A number of recently published, overoptimistic studies present feature selection and gene profiling processes that are affected by overfitting. We outline the selection bias problem and demonstrate its effect on synthetic and microarray data. We then introduce and describe a procedure that successfully deals with the problem through extensive resampling and label randomization techniques, using support vector machines as the base classifier and an improved version of the recursive feature elimination algorithm for gene ranking.
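The selection bias effect described above can be illustrated with a minimal pure-Python sketch. This is not the authors' procedure. As simplifying assumptions, a nearest-centroid classifier stands in for the SVM, a difference-of-class-means score stands in for the recursive feature elimination ranking, and leave-one-out cross-validation stands in for the resampling scheme. The data are pure noise with labels unrelated to the features, so an honest error estimate should sit near 50%. All helper names and sizes (40 samples, 1000 features, top 20 kept) are hypothetical.

```python
import random


def make_null_data(n_samples, n_features, seed):
    """Pure-noise data: Gaussian features, balanced labels unrelated to them."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    rng.shuffle(y)
    return X, y


def top_k_features(X, y, k):
    """Rank features by |mean(class 1) - mean(class 0)| and keep the top k.
    Hypothetical stand-in for a real gene-ranking step such as SVM-RFE."""
    rows0 = [row for row, label in zip(X, y) if label == 0]
    rows1 = [row for row, label in zip(X, y) if label == 1]
    scores = []
    for j in range(len(X[0])):
        m0 = sum(row[j] for row in rows0) / len(rows0)
        m1 = sum(row[j] for row in rows1) / len(rows1)
        scores.append(abs(m1 - m0))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def nearest_centroid_predict(X_train, y_train, x, features):
    """Predict the class whose centroid (on the chosen features) is closest
    to x. Hypothetical stand-in for the SVM base classifier."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [row for row, lab in zip(X_train, y_train) if lab == label]
        dist = 0.0
        for j in features:
            centroid = sum(row[j] for row in rows) / len(rows)
            dist += (x[j] - centroid) ** 2
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_error(X, y, k, select_inside):
    """Leave-one-out error rate.

    select_inside=False: features are ranked once on ALL samples, so the
    held-out sample helps choose the features (biased protocol).
    select_inside=True: features are re-ranked on each training fold only.
    """
    fixed = None if select_inside else top_k_features(X, y, k)
    errors = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        features = top_k_features(X_train, y_train, k) if select_inside else fixed
        if nearest_centroid_predict(X_train, y_train, X[i], features) != y[i]:
            errors += 1
    return errors / len(X)


if __name__ == "__main__":
    X, y = make_null_data(n_samples=40, n_features=1000, seed=0)
    print("selection outside CV:", loo_error(X, y, k=20, select_inside=False))
    print("selection inside CV: ", loo_error(X, y, k=20, select_inside=True))
```

Ranking the features once on all samples lets each held-out sample influence which features are kept, so the leave-one-out error typically comes out far below chance even though the data contain no signal. Repeating the ranking inside every fold removes that leak; the same reasoning should carry over when the ranking step is a more elaborate method such as SVM-RFE.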