SEMINAR
Principal Component Analysis Combined with Truncated-Newton Minimization for Dimensionality Reduction of Chemical Database
Dexuan Xie
Department of Mathematics
University of Southern Mississippi
ABSTRACT
We recently proposed an efficient projection protocol for large chemical databases based on the singular value decomposition (SVD) and the truncated-Newton minimization procedure implemented in the TNPACK program package. Since principal component analysis (PCA) is another classic tool for data reduction, it is natural to study its use in our projection protocol. Replacing SVD with PCA in the protocol yields a new projection method called PCA/TNPACK. In this talk I will describe the PCA procedure for data reduction and demonstrate the close relationship between PCA and SVD.
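As a rough illustration of the PCA/SVD relationship mentioned above, the following sketch computes a PCA projection in two ways: from the eigenvectors of the sample covariance matrix, and from the SVD of the column-centered data matrix. The function names and the random test data are assumptions made for this example and are not taken from the talk; the projection protocol itself may apply the SVD differently.

```python
import numpy as np

def pca_via_covariance(X, k):
    """Project X onto its top-k principal components via the covariance matrix."""
    Xc = X - X.mean(axis=0)                   # center each descriptor (column)
    cov = Xc.T @ Xc / (X.shape[0] - 1)        # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest
    return Xc @ eigvecs[:, order], eigvals[order]

def pca_via_svd(X, k):
    """Same projection, obtained from the SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Rows of Vt are the principal directions; U * s gives the coordinates.
    return U[:, :k] * s[:k], s[:k] ** 2 / (X.shape[0] - 1)

# Hypothetical data: 50 "compounds" described by 8 descriptors each.
rng = np.random.default_rng(0)
X = rng.random((50, 8))

Y_cov, var_cov = pca_via_covariance(X, 2)
Y_svd, var_svd = pca_via_svd(X, 2)
```

The two projections agree up to the sign of each coordinate axis, and the covariance eigenvalues equal the squared singular values divided by n - 1.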
I will then show, using data sets selected from the MDDR (MDL Drug Data Report) database, that PCA/TNPACK can sharply improve how well the PCA projection mapping retains the original distance relationships of a chemical database. Each chemical compound is represented by a list of about 300 topological atom-pair descriptors. Numerical results also confirm that a two-dimensional PCA/TNPACK projection mapping retains the original clusters of chemical compounds well. The two-dimensional PCA/TNPACK mapping of a chemical database is therefore valuable in similarity and diversity sampling of chemical compounds, which are preliminary steps in generating drug candidates or optimizing bioactive compounds.
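The idea of refining a PCA projection so that it better retains the original distances can be sketched as follows. This is a minimal illustration under stated assumptions, not the protocol from the talk: SciPy's Newton-CG solver (a truncated-Newton-type method) stands in for TNPACK, the objective is one generic choice of distance-error function, and the random data and all names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def squared_distances(P):
    """Matrix of squared Euclidean distances between the rows of P."""
    G = P @ P.T
    sq_norms = np.diag(G)
    return sq_norms[:, None] + sq_norms[None, :] - 2.0 * G

def distance_error(y_flat, target_sq, n, k):
    """Assumed objective: 1/4 * sum_ij (d_ij(Y)^2 - delta_ij^2)^2, with gradient."""
    Y = y_flat.reshape(n, k)
    R = squared_distances(Y) - target_sq      # residuals of squared distances
    value = 0.25 * np.sum(R ** 2)
    grad = 2.0 * (R.sum(axis=1)[:, None] * Y - R @ Y)
    return value, grad.ravel()

# Hypothetical data: 40 "compounds" described by 10 descriptors each.
rng = np.random.default_rng(1)
X = rng.random((40, 10))
n, k = X.shape[0], 2
target_sq = squared_distances(X)

# Starting point: two-dimensional PCA projection via SVD of the centered data.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Y0 = U[:, :k] * s[:k]

err_before, _ = distance_error(Y0.ravel(), target_sq, n, k)
result = minimize(distance_error, Y0.ravel(), args=(target_sq, n, k),
                  jac=True, method="Newton-CG", options={"maxiter": 200})
Y_refined = result.x.reshape(n, k)
err_after = result.fun
```

Comparing `err_before` with `err_after` shows how much the minimization reduces the distance error relative to the plain PCA projection used as the starting point.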
WHERE: TEC 205
WHEN(day): Friday, (part I) September 15th, 2000
WHEN(day): Friday, (part II) December 1st, 2000
WHEN(time): 2:00pm
EVERYBODY IS INVITED