
A COMPARATIVE STUDY ON FEATURE SELECTION METHODS IN QSAR

Kuziev Botir Nomozovich


A quantitative structure-activity relationship (QSAR) relates quantitative attributes of chemical structure (molecular descriptors) to a biological activity. QSAR studies have become attractive in drug discovery and development because they can save substantial time and human resources. Several factors determine the predictive ability of a QSAR model. On the one hand, different statistical methods may be applied to check whether a data set behaves linearly or nonlinearly. On the other hand, feature selection techniques are applied to reduce model complexity, to lower the risk of overfitting (overtraining), and to select the most important descriptors from the often very large number calculated. The selected descriptors are then linked to the biological activity of the corresponding compound by means of a mathematical model. Different modeling techniques can be applied, some of which explicitly require feature selection. A QSAR model can be useful in designing new compounds with improved potency in the class under study, so that only molecules with a promising predicted activity are synthesized. In the feature selection problem, a learning algorithm must select a relevant subset of features on which to focus attention while ignoring the rest. This paper presents a comparative analysis of the Chi-square, Mutual Information, ANOVA F-value, Fisher Score, and SHAP feature selection methods used in QSAR modeling. The Python code written to obtain the experimental results in this article has been uploaded to GitHub.
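To make the idea of filter-style feature selection concrete, the following is a minimal sketch of two of the methods named above, the Fisher Score and the ANOVA F-value, written with the Python standard library only. It is not the article's GitHub code. The descriptor names (`logP`, `mol_weight`, `n_rings`), the toy data, and all function names are hypothetical and chosen purely for illustration. The Fisher Score is assumed here to be the ratio of between-class to within-class sum of squares, which is one common definition.

```python
from statistics import fmean


def between_within(column, labels):
    """Return (between-class, within-class) sums of squares for one descriptor."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(label, []).append(value)
    overall_mean = fmean(column)
    ssb = 0.0  # spread of class means around the overall mean
    ssw = 0.0  # spread of values around their own class mean
    for values in groups.values():
        class_mean = fmean(values)
        ssb += len(values) * (class_mean - overall_mean) ** 2
        ssw += sum((v - class_mean) ** 2 for v in values)
    return ssb, ssw


def fisher_score(column, labels):
    """Fisher Score, assumed here as between-class / within-class scatter."""
    ssb, ssw = between_within(column, labels)
    return ssb / ssw if ssw > 0 else float("inf")


def anova_f(column, labels):
    """One-way ANOVA F-value: mean square between / mean square within."""
    ssb, ssw = between_within(column, labels)
    n, k = len(column), len(set(labels))
    if ssw == 0:
        return float("inf")
    return (ssb / (k - 1)) / (ssw / (n - k))


def rank_features(X, y, names, score):
    """Rank descriptor names from most to least discriminative under `score`."""
    columns = [list(col) for col in zip(*X)]
    scores = {name: score(col, y) for name, col in zip(names, columns)}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: rows are compounds, columns are descriptors.
names = ["logP", "mol_weight", "n_rings"]
X = [
    [1.0, 300.0, 2.0],
    [1.2, 310.0, 3.0],
    [0.9, 305.0, 1.0],
    [3.0, 302.0, 2.0],
    [3.2, 308.0, 3.0],
    [2.9, 311.0, 1.0],
]
y = [0, 0, 0, 1, 1, 1]  # 0 = inactive, 1 = active (made-up labels)

fisher_ranking = rank_features(X, y, names, fisher_score)
anova_ranking = rank_features(X, y, names, anova_f)
print(fisher_ranking)
print(anova_ranking)
```

Under this definition the ANOVA F-value equals the Fisher Score multiplied by the constant (N - K) / (K - 1), where N is the number of compounds and K the number of classes, so the two scores rank descriptors identically on any given data set. The Chi-square, Mutual Information, and SHAP methods use different criteria and are not covered by this sketch.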



https://doi.org/10.59251/2181-1296.2023.v3.139.2.2184
