Savitzky–Golay filter
Algorithm to smooth data points / From Wikipedia, the free encyclopedia
Dear Wikiwand AI, let's keep it short by simply answering these key questions:
Can you list the top facts and stats about Numerical smoothing and differentiation?
Summarize this article for a 10 year old
A Savitzky–Golay filter is a digital filter that can be applied to a set of digital data points for the purpose of smoothing the data, that is, to increase the precision of the data without distorting the signal tendency. This is achieved, in a process known as convolution, by fitting successive sub-sets of adjacent data points with a low-degree polynomial by the method of linear least squares. When the data points are equally spaced, an analytical solution to the least-squares equations can be found, in the form of a single set of "convolution coefficients" that can be applied to all data sub-sets, to give estimates of the smoothed signal, (or derivatives of the smoothed signal) at the central point of each sub-set. The method, based on established mathematical procedures,[1][2] was popularized by Abraham Savitzky and Marcel J. E. Golay, who published tables of convolution coefficients for various polynomials and sub-set sizes in 1964.[3][4] Some errors in the tables have been corrected.[5] The method has been extended for the treatment of 2- and 3-dimensional data.
Savitzky and Golay's paper is one of the most widely cited papers in the journal Analytical Chemistry[6] and is classed by that journal as one of its "10 seminal papers" saying "it can be argued that the dawn of the computer-controlled analytical instrument can be traced to this article".[7]
The data consists of a set of points , , where is an independent variable and is an observed value. They are treated with a set of convolution coefficients, , according to the expression
Selected convolution coefficients are shown in the tables, below. For example, for smoothing by a 5-point quadratic polynomial, and the smoothed data point, , is given by
- ,
where, , etc. There are numerous applications of smoothing, such as avoiding the propagation of noise through an algorithm chain, or sometimes simply to make the data appear to be less noisy than it really is.
The following are applications of numerical differentiation of data.[8] Note When calculating the nth derivative, an additional scaling factor of may be applied to all calculated data points to obtain absolute values (see expressions for , below, for details).
- Location of maxima and minima in experimental data curves. This was the application that first motivated Savitzky.[4] The first derivative of a function is zero at a maximum or minimum. The diagram shows data points belonging to a synthetic Lorentzian curve, with added noise (blue diamonds). Data are plotted on a scale of half width, relative to the peak maximum at zero. The smoothed curve (red line) and 1st derivative (green) were calculated with 7-point cubic Savitzky–Golay filters. Linear interpolation of the first derivative values at positions either side of the zero-crossing gives the position of the peak maximum. 3rd derivatives can also be used for this purpose.
- Location of an end-point in a titration curve. An end-point is an inflection point where the second derivative of the function is zero.[9] The titration curve for malonic acid illustrates the power of the method. The first end-point at 4 ml is barely visible, but the second derivative allows its value to be easily determined by linear interpolation to find the zero crossing.
- Baseline flattening. In analytical chemistry it is sometimes necessary to measure the height of an absorption band against a curved baseline.[10] Because the curvature of the baseline is much less than the curvature of the absorption band, the second derivative effectively flattens the baseline. Three measures of the derivative height, which is proportional to the absorption band height, are the "peak-to-valley" distances h1 and h2, and the height from baseline, h3.[11]
- Resolution enhancement in spectroscopy. Bands in the second derivative of a spectroscopic curve are narrower than the bands in the spectrum: they have reduced half-width. This allows partially overlapping bands to be "resolved" into separate (negative) peaks.[12] The diagram illustrates how this may be used also for chemical analysis, using measurement of "peak-to-valley" distances. In this case the valleys are a property of the 2nd derivative of a Lorentzian. (x-axis position is relative to the position of the peak maximum on a scale of half width at half height).
- Resolution enhancement with 4th derivative (positive peaks). The minima are a property of the 4th derivative of a Lorentzian.
Moving average
The "moving average filter" is a trivial example of a Savitzky-Golay filter that is commonly used with time series data to smooth out short-term fluctuations and highlight longer-term trends or cycles. Each subset of the data set is fit with a straight horizontal line as opposed to a higher order polynomial. An unweighted moving average filter is the simplest convolution filter.
The moving average is often used for a quick technical analysis of financial data, like stock prices, returns or trading volumes. It is also used in economics to examine gross domestic product, employment or other macroeconomic time series.
It was not included in some tables of Savitzky-Golay convolution coefficients as all the coefficient values are identical, with the value .