Top Qs
Timeline
Chat
Perspective

Applicability domain

From Wikipedia, the free encyclopedia

Remove ads


In chemistry and machine learning, the applicability domain (AD) of a quantitative structure-activity relationship (QSAR) model defines the boundaries within which the model's predictions are considered reliable. It represents the chemical, structural, or biological space covered by the training data used to build the model. Essentially, the AD aims to determine if a new compound falls within the model's scope of applicability, ensuring that the underlying assumptions of the model are met. Predictions for compounds within the AD are generally considered more reliable than those outside, as the model is primarily valid for interpolation within the training data space, rather than extrapolation. For example, the Organisation for Economic Co-operation and Development (OECD) Guidance Document on the Validation of (Q)SAR Models states that to have a valid (Q)SAR model for regulatory purposes, the applicability domain must be clearly defined.[1]

Remove ads

Algorithms

Summarize
Perspective

While no single, universally accepted algorithm for defining the applicability domain exists, several methods are commonly employed to characterise the interpolation space.[2][3] Range-based and geometric methods such as bounding box and convex hull, distance-based methods such as the Euclidean or Mahalanobis distance, and probability-density distribution-based strategies are commonly used in cheminformatics tasks.[4] Another systematic approach focuses on defining interpolation regions by removing outliers and using a kernel-weighted sampling method to estimate the probability density distribution. For regression-based QSAR models, a widely used technique for assessing the structural AD relies on leverage values, calculated from the diagonal elements of the hat matrix of the molecular descriptors.[5][6][7] More recently, a rigorous benchmarking study suggested that the standard deviation of model predictions offers the most reliable approach for AD determination.[8] To investigate the AD of a training set of chemicals one can directly analyse properties of the multivariate descriptor space of the training compounds or more indirectly via distance (or Tanimoto similarity) metrics. When using distance metrics care should be taken to use an orthogonal and significant vector space. This can be achieved by different means of feature selection and successive principal components analysis.

Remove ads

Broader Applications

The concept of applicability domain has expanded beyond its traditional use in QSAR to become a general principle for assessing model reliability across domains such as nanotechnology, material science, and predictive toxicology. In nanoinformatics, the definition of applicability domain is used in nanomaterial property and toxicity prediction, since data scarcity and heterogeneity require defining model boundaries. For instance, applicability domain assessment in nano-QSARs helps determine whether a new engineered nanomaterial is sufficiently similar to those in the training set to warrant a prediction.[9]

Remove ads

Notes

Loading related searches...

Wikiwand - on

Seamless Wikipedia browsing. On steroids.

Remove ads