Seunggeun Lee, Wei Sun, Fred A Wright, Fei Zou
Unobserved environmental, demographic, and technical factors can negatively affect the estimation and testing of the effects of primary variables. Surrogate variable analysis, proposed to tackle this problem, has been widely used in genomic studies. To estimate hidden factors that are correlated with the primary variables, surrogate variable analysis performs principal component analysis either on a subset of features or on all features, but weighting each differently. However, existing approaches may fail to identify hidden factors that are strongly correlated with the primary variables, and the extra step of feature selection and weight calculation makes the theoretical investigation of surrogate variable analysis challenging...
June 1, 2017: Biometrika