An algorithm for measuring similarity between two temporal sequences, which may vary in speed From Wikipedia, the free encyclopedia
In time series analysis, dynamic time warping (DTW) is an algorithm for measuring similarity between two temporal sequences, which may vary in speed. For instance, similarities in walking could be detected using DTW, even if one person was walking faster than the other, or if there were accelerations and decelerations during the course of an observation. DTW has been applied to temporal sequences of video, audio, and graphics data — indeed, any data that can be turned into a one-dimensional sequence can be analyzed with DTW. A well-known application has been automatic speech recognition, to cope with different speaking speeds. Other applications include speaker recognition and online signature recognition. It can also be used in partial shape matching applications.
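In general, DTW is a method that calculates an optimal match between two given sequences (e.g. time series) under certain restrictions and rules: every index from each sequence must be matched with one or more indices from the other sequence; the first index of one sequence must be matched with the first index of the other, and the last with the last (though these need not be their only matches); and the mapping of indices from one sequence to the other must be monotonically increasing.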
We can plot each match between the sequences $1:M$ and $1:N$ as a path in an $M \times N$ matrix from $(1, 1)$ to $(M, N)$, such that each step is one of $(0, 1), (1, 0), (1, 1)$. In this formulation, we see that the number of possible matches is the Delannoy number.
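To give a feel for how quickly the number of candidate matches grows, the following sketch counts lattice paths with a small dynamic program. It assumes sequences of lengths M and N, so a path from (1, 1) to (M, N) needs M − 1 and N − 1 unit moves along the two axes and the count is the Delannoy number D(M − 1, N − 1). The function names are illustrative, not taken from any library.

```python
def delannoy(m: int, n: int) -> int:
    """Count lattice paths from (0, 0) to (m, n) using steps (1, 0), (0, 1), (1, 1)."""
    # The border cells have exactly one path (a straight line).
    table = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # A path reaches (i, j) from above, from the left, or diagonally.
            table[i][j] = table[i - 1][j] + table[i][j - 1] + table[i - 1][j - 1]
    return table[m][n]


def count_warping_paths(len_s: int, len_t: int) -> int:
    """Number of admissible matches between sequences of the given (positive) lengths."""
    return delannoy(len_s - 1, len_t - 1)


if __name__ == "__main__":
    for length in (2, 3, 4, 10):
        print(length, count_warping_paths(length, length))
```

Even for two sequences of length 4 there are already 63 admissible paths, which is why the optimal match is found by dynamic programming rather than by enumeration.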
The optimal match is the match that satisfies all the restrictions and the rules and that has the minimal cost, where the cost is computed as the sum of absolute differences, for each matched pair of indices, between their values.
The sequences are "warped" non-linearly in the time dimension to determine a measure of their similarity independent of certain non-linear variations in the time dimension. This sequence alignment method is often used in time series classification. Although DTW measures a distance-like quantity between two given sequences, it does not guarantee that the triangle inequality holds.
In addition to a similarity measure between the two sequences, a so-called "warping path" is produced. By warping according to this path, the two signals may be aligned in time. The signal with an original set of points X(original), Y(original) is transformed to X(warped), Y(warped). This finds applications in genetic sequence and audio synchronisation. In a related technique, sequences of varying speed may be averaged; see the average sequence section.
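One common way to obtain a warping path is to backtrack through the accumulated-cost matrix from its last cell to its first. The sketch below assumes numeric sequences and an absolute-difference cost; the name `dtw_with_path` is illustrative, and ties are broken in favour of the diagonal step, which is only one of several reasonable choices.

```python
import math


def dtw_with_path(s, t):
    """Return (distance, path), where path lists matched 0-based index pairs."""
    n, m = len(s), len(t)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])

    # Backtrack from the last cell, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),  # diagonal step
            (acc[i - 1][j], i - 1, j),          # advance in s only
            (acc[i][j - 1], i, j - 1),          # advance in t only
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return acc[n][m], path


if __name__ == "__main__":
    distance, path = dtw_with_path([1, 2, 3], [1, 2, 2, 3])
    print(distance, path)
```

Reading the two sequences along the returned index pairs gives the warped, time-aligned versions of the signals.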
This is conceptually very similar to the Needleman–Wunsch algorithm.
This example illustrates the implementation of the dynamic time warping algorithm when the two sequences s and t are strings of discrete symbols. For two symbols x and y, d(x, y) is a distance between the symbols, e.g. $d(x, y) = |x - y|$.
int DTWDistance(s: array [1..n], t: array [1..m]) {
    DTW := array [0..n, 0..m]
    for i := 0 to n
        for j := 0 to m
            DTW[i, j] := infinity
    DTW[0, 0] := 0
    for i := 1 to n
        for j := 1 to m
            cost := d(s[i], t[j])
            DTW[i, j] := cost + minimum(DTW[i-1, j  ],    // insertion
                                        DTW[i  , j-1],    // deletion
                                        DTW[i-1, j-1])    // match
    return DTW[n, m]
}
where DTW[i, j] is the distance between s[1:i] and t[1:j] with the best alignment.
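For readers who want something executable, the pseudocode can be transcribed into Python roughly as follows. This is a minimal sketch: the name `dtw_distance` and the default absolute-difference cost are assumptions, and Python's 0-based indexing means s[i - 1] plays the role of s[i] in the pseudocode.

```python
import math


def dtw_distance(s, t, d=lambda x, y: abs(x - y)):
    """DTW distance between sequences s and t under the pointwise cost d."""
    n, m = len(s), len(t)
    # dtw[i][j] holds the best alignment cost of s[:i] and t[:j].
    dtw = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = d(s[i - 1], t[j - 1])
            dtw[i][j] = cost + min(
                dtw[i - 1][j],      # insertion
                dtw[i][j - 1],      # deletion
                dtw[i - 1][j - 1],  # match
            )
    return dtw[n][m]


if __name__ == "__main__":
    # The second sequence repeats one value; DTW absorbs the repetition.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
    print(dtw_distance([1, 2, 3], [2, 3, 4]))
```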
We sometimes want to add a locality constraint. That is, we require that if s[i] is matched with t[j], then $|i - j|$ is not greater than w, a window parameter.
We can easily modify the above algorithm to add a locality constraint. However, this modification works only if $|n - m|$ is not greater than w, i.e. the end point is within the window length from the diagonal. In order to make the algorithm work, the window parameter w must be adapted so that $|n - m| \le w$ (see the line marked with (*) in the code).
int DTWDistance(s: array [1..n], t: array [1..m], w: int) {
    DTW := array [0..n, 0..m]
    w := max(w, abs(n-m)) // adapt window size (*)
    for i := 0 to n
        for j := 0 to m
            DTW[i, j] := infinity
    DTW[0, 0] := 0
    for i := 1 to n
        for j := max(1, i-w) to min(m, i+w)
            DTW[i, j] := 0
    for i := 1 to n
        for j := max(1, i-w) to min(m, i+w)
            cost := d(s[i], t[j])
            DTW[i, j] := cost + minimum(DTW[i-1, j  ],    // insertion
                                        DTW[i  , j-1],    // deletion
                                        DTW[i-1, j-1])    // match
    return DTW[n, m]
}
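A Python rendering of the windowed variant might look like the sketch below. It assumes numeric sequences and an absolute-difference cost, and the name `dtw_distance_windowed` is illustrative. Cells outside the band simply stay at infinity, so the separate pass that zeroes the band in the pseudocode is not needed here.

```python
import math


def dtw_distance_windowed(s, t, w):
    """DTW distance where s[i] may only match t[j] if |i - j| <= w."""
    n, m = len(s), len(t)
    w = max(w, abs(n - m))  # adapt window size so the end point is reachable
    dtw = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0
    for i in range(1, n + 1):
        # Only visit cells inside the band around the diagonal.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(s[i - 1] - t[j - 1])
            dtw[i][j] = cost + min(dtw[i - 1][j], dtw[i][j - 1], dtw[i - 1][j - 1])
    return dtw[n][m]


if __name__ == "__main__":
    a, b = [1, 2, 3], [2, 3, 4]
    print(dtw_distance_windowed(a, b, 0))  # no warping allowed
    print(dtw_distance_windowed(a, b, 1))  # one step of warping allowed
```

With w = 0 and equal lengths the result reduces to the sum of pointwise differences; widening the window can only lower the distance or leave it unchanged.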
The DTW algorithm produces a discrete matching between existing elements of one series to another. In other words, it does not allow time-scaling of segments within the sequence. Other methods allow continuous warping. For example, Correlation Optimized Warping (COW) divides the sequence into uniform segments that are scaled in time using linear interpolation, to produce the best matched warping. The segment scaling causes potential creation of new elements, by time-scaling segments either down or up, and thus produces a more sensitive warping than DTW's discrete matching of raw elements.
The time complexity of the DTW algorithm is $O(NM)$, where $N$ and $M$ are the lengths of the two input sequences. The 50-year-old quadratic time bound was broken in 2016: an algorithm due to Gold and Sharir enables computing DTW in $O(N^2 / \log \log N)$ time and space for two input sequences of length $N$.[2] This algorithm can also be adapted to sequences of different lengths. Despite this improvement, it was shown that a strongly subquadratic running time of the form $O(N^{2-\varepsilon})$ for some $\varepsilon > 0$ cannot exist unless the strong exponential time hypothesis fails.[3][4]
While the dynamic programming algorithm for DTW requires $O(NM)$ space in a naive implementation, the space consumption can be reduced to $O(\min(N, M))$ using Hirschberg's algorithm.
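When only the distance is needed, the space saving can be illustrated by keeping just two rows of the table. Note that this sketch is not Hirschberg's algorithm itself: the divide-and-conquer step that also recovers the warping path in linear space is omitted. It assumes numeric sequences and a symmetric absolute-difference cost, so the two sequences can be swapped freely; the function name is illustrative.

```python
import math


def dtw_distance_two_rows(s, t):
    """DTW distance using O(min(len(s), len(t))) working memory."""
    # Put the shorter sequence along the row so each row is as small as possible.
    if len(t) > len(s):
        s, t = t, s
    m = len(t)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # cost of aligning two empty prefixes
    for x in s:
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(x - t[j - 1])
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr  # the older row is no longer needed
    return prev[m]


if __name__ == "__main__":
    print(dtw_distance_two_rows([1, 2, 3], [2, 3, 4]))
    print(dtw_distance_two_rows([1, 2, 2, 3], [1, 2, 3]))
```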
Fast techniques for computing DTW include Early Abandoned and Pruned DTW,[5] PrunedDTW,[6] SparseDTW,[7] FastDTW,[8] and the MultiscaleDTW.[9][10]
A common task, retrieval of similar time series, can be accelerated by using lower bounds such as LB_Keogh,[11] LB_Improved,[12] LB_Enhanced,[13] LB_Webb[14] or LB_Petitjean.[14] However, the Early Abandon and Pruned DTW algorithm reduces the degree of acceleration that lower bounding provides and sometimes renders it ineffective.[5]
In a survey, Wang et al. reported slightly better results with the LB_Improved lower bound than the LB_Keogh bound, and found that other techniques were inefficient.[15] Subsequent to this survey, the LB_Enhanced bound was developed that is always tighter than LB_Keogh while also being more efficient to compute.[13] LB_Petitjean is the tightest known lower bound that can be computed in linear time.[14]
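To illustrate how such a lower bound works, here is a sketch of an LB_Keogh-style bound. It assumes equal-length numeric sequences, an absolute-difference cost, and a DTW constrained to a window w; published formulations often use squared differences instead. Both function names are illustrative. The bound builds an envelope around the query and charges the candidate only for the parts that fall outside it.

```python
import math


def lb_keogh(query, candidate, w):
    """Lower bound on windowed DTW (window w) for equal-length sequences."""
    n = len(query)
    total = 0.0
    for i, c in enumerate(candidate):
        # Envelope of the query over the indices that i is allowed to match.
        window = query[max(0, i - w):min(n, i + w + 1)]
        upper, lower = max(window), min(window)
        if c > upper:
            total += c - upper
        elif c < lower:
            total += lower - c
    return total


def dtw_distance_windowed(s, t, w):
    """Windowed DTW with absolute-difference cost, used here for comparison."""
    n, m = len(s), len(t)
    w = max(w, abs(n - m))
    dtw = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(s[i - 1] - t[j - 1])
            dtw[i][j] = cost + min(dtw[i - 1][j], dtw[i][j - 1], dtw[i - 1][j - 1])
    return dtw[n][m]


if __name__ == "__main__":
    q = [0, 1, 2, 1, 0, 1, 2, 1]
    c = [5, 5, 4, 3, 0, -2, -1, 0]
    print(lb_keogh(q, c, 1), dtw_distance_windowed(q, c, 1))
```

In a retrieval loop, a candidate whose bound already exceeds the best distance found so far can be skipped without running the quadratic DTW computation.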
Averaging for dynamic time warping is the problem of finding an average sequence for a set of sequences. NLAAF[16] is an exact method to average two sequences using DTW. For more than two sequences, the problem is related to the one of the multiple alignment and requires heuristics. DBA[17] is currently a reference method to average a set of sequences consistently with DTW. COMASA[18] efficiently randomizes the search for the average sequence, using DBA as a local optimization process.
A nearest-neighbour classifier can achieve state-of-the-art performance when using dynamic time warping as a distance measure.[19]
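A one-nearest-neighbour classifier of this kind can be sketched in a few lines. The training data, labels, and function names below are hypothetical and serve only to show the mechanics; the cost is assumed to be the absolute difference.

```python
import math


def dtw_distance(s, t):
    """Plain DTW distance with absolute-difference cost."""
    n, m = len(s), len(t)
    dtw = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            dtw[i][j] = cost + min(dtw[i - 1][j], dtw[i][j - 1], dtw[i - 1][j - 1])
    return dtw[n][m]


def nearest_neighbor_label(train, query):
    """Return the label of the training series closest to query under DTW."""
    best_label, best_dist = None, math.inf
    for series, label in train:
        dist = dtw_distance(series, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


if __name__ == "__main__":
    # Hypothetical toy training set: one series per class.
    train = [
        ([0, 0, 1, 2, 1, 0, 0], "bump"),
        ([0, 1, 0, -1, 0, 1, 0], "wave"),
    ]
    # A shorter, time-shifted bump is still recognised.
    print(nearest_neighbor_label(train, [0, 1, 2, 2, 1, 0]))
```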
Amerced Dynamic Time Warping (ADTW) is a variant of DTW designed to better control DTW's permissiveness in the alignments that it allows.[20] The windows that classical DTW uses to constrain alignments introduce a step function. Any warping of the path is allowed within the window and none beyond it. In contrast, ADTW employs an additive penalty that is incurred each time that the path is warped. Any amount of warping is allowed, but each warping action incurs a direct penalty. ADTW significantly outperforms DTW with windowing when applied as a nearest neighbor classifier on a set of benchmark time series classification tasks.[20]
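The additive penalty can be sketched as a small change to the DTW recurrence: diagonal steps are free, while each off-diagonal (warping) step adds a fixed amount. The code below is a minimal illustration under that reading, with an absolute-difference cost; the name `adtw_distance` and the parameter name `penalty` are assumptions, not taken from a reference implementation.

```python
import math


def adtw_distance(s, t, penalty):
    """DTW variant where every non-diagonal step incurs an additive penalty."""
    n, m = len(s), len(t)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            acc[i][j] = cost + min(
                acc[i - 1][j - 1],        # diagonal step: no penalty
                acc[i - 1][j] + penalty,  # warping step: penalised
                acc[i][j - 1] + penalty,  # warping step: penalised
            )
    return acc[n][m]


if __name__ == "__main__":
    a, b = [1, 2, 3], [2, 3, 4]
    for penalty in (0, 0.25, 100):
        print(penalty, adtw_distance(a, b, penalty))
```

A penalty of zero recovers ordinary DTW, and a very large penalty forces the diagonal alignment for equal-length sequences; values in between trade off the two smoothly rather than through a hard window.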
Graphical Time Warping (GTW) is a generalized version of DTW that can align multiple pairs of time series or sequences jointly.[21] Compared with aligning multiple pairs independently through DTW, GTW considers both the alignment accuracy of each sequence pair (as DTW does) and the similarity among pairs (according to the data structure or as assigned by the user). This can result in better alignment performance when similarity among pairs exists.
DTW is sensitive to the distance function used to score matches between pairs of values across the two sequences. The original definition of DTW[22] used $d(x, y) = |x - y|$. In time series classification, $d(x, y) = (x - y)^2$ has become popular.[23]
Recent work[24] has shown that tuning of this distance measure can be useful for tuning DTW performance. Specifically, tuning γ in a family of distance functions of the form $d(x, y) = |x - y|^{\gamma}$ makes DTW focus more on low amplitude effects when γ is small and large amplitude effects when γ is large.
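The effect of the exponent can be explored with a DTW routine whose pointwise cost is |x − y| raised to the power γ. The sketch below assumes numeric sequences; the function and parameter names are illustrative.

```python
import math


def dtw_distance_gamma(s, t, gamma):
    """DTW distance with pointwise cost |x - y| ** gamma."""
    n, m = len(s), len(t)
    dtw = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1]) ** gamma
            dtw[i][j] = cost + min(dtw[i - 1][j], dtw[i][j - 1], dtw[i - 1][j - 1])
    return dtw[n][m]


if __name__ == "__main__":
    a, b = [0, 0], [3, 3]
    for gamma in (0.5, 1, 2):
        print(gamma, dtw_distance_gamma(a, b, gamma))
```

Setting gamma to 1 gives the absolute-difference cost and gamma to 2 gives the squared-difference cost; larger exponents make large pointwise deviations dominate the total.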
DTW cannot handle missing values in time series. Simple preprocessing methods such as dropping or interpolating missing values do not provide a good estimate of the DTW distance.[25]
DTW-AROW (DTW with Additional Restrictions on Warping) is a generalization of DTW to handle missing values.[25] DTW-AROW obtains both a distance and a warping path; hence, it can simply replace DTW to handle missing values in many applications.[25] DTW-AROW has the same time and memory complexity as DTW.[25] An open-source implementation of DTW-AROW is available in Python.
In functional data analysis, time series are regarded as discretizations of smooth (differentiable) functions of time. By viewing the observed samples as smooth functions, one can utilize continuous mathematics for analyzing data.[26] Smoothness and monotonicity of time warp functions may be obtained for instance by integrating a time-varying radial basis function, thus being a one-dimensional diffeomorphism.[27] Optimal nonlinear time warping functions are computed by minimizing a measure of distance of the set of functions to their warped average. Roughness penalty terms for the warping functions may be added, e.g., by constraining the size of their curvature. The resultant warping functions are smooth, which facilitates further processing. This approach has been successfully applied to analyze patterns and variability of speech movements.[28][29]
Another related approach is the hidden Markov model (HMM); it has been shown that the Viterbi algorithm used to search for the most likely path through the HMM is equivalent to stochastic DTW.[30][31][32]
DTW and related warping methods are typically used as pre- or post-processing steps in data analyses. If both observed sequences contain random variation in their values, shape of observed sequences and random temporal misalignment, the warping may overfit to noise leading to biased results. A simultaneous model formulation with random variation in both values (vertical) and time-parametrization (horizontal) is an example of a nonlinear mixed-effects model.[33] In human movement analysis, simultaneous nonlinear mixed-effects modeling has been shown to produce superior results compared to DTW.[34]
Due to different speaking rates, a non-linear fluctuation occurs in the speech pattern along the time axis, which needs to be eliminated.[22] DP matching is a pattern-matching algorithm based on dynamic programming (DP), which uses a time-normalization effect, where the fluctuations in the time axis are modeled using a non-linear time-warping function. Considering any two speech patterns, we can get rid of their timing differences by warping the time axis of one so that the maximal coincidence is attained with the other. However, if the warping function is allowed to take any possible value, very little distinction can be made between words belonging to different categories. So, to enhance the distinction between words belonging to different categories, restrictions were imposed on the slope of the warping function.
Unstable clocks are used to defeat naive power analysis. Several techniques are used to counter this defense, one of which is dynamic time warping.
Dynamic time warping is used in finance and econometrics to assess the quality of the prediction versus real-world data.[36][37][38]