Version vector
From Wikipedia, the free encyclopedia
A version vector is a mechanism for tracking changes to data in a distributed system, where multiple agents might update the data at different times. The version vector allows the participants to determine if one update preceded another (happened-before), followed it, or if the two updates happened concurrently (and therefore might conflict with each other). In this way, version vectors enable causality tracking among data replicas and are a basic mechanism for optimistic replication. In mathematical terms, the version vector generates a preorder that tracks the events that precede, and may therefore influence, later updates.
Version vectors maintain state identical to that in a vector clock, but the update rules differ slightly; in this example, replicas can either experience local updates (e.g., the user editing a file on the local node), or can synchronize with another replica:
- Initially all vector counters are zero.
- Each time a replica experiences a local update event, it increments its own counter in the vector by one.
- Each time two replicas a and b synchronize, they both set the elements in their copy of the vector to the maximum of the element across both counters: . After synchronization, the two replicas have identical version vectors.
Pairs of replicas, a, b, can be compared by inspecting their version vectors and determined to be either: identical (), concurrent (), or ordered ( or ). The ordered relation is defined as: Vector if and only if every element of is less than or equal to its corresponding element in , and at least one of the elements is strictly less than. If neither or , but the vectors are not identical, then the two vectors must be concurrent.
Version vectors[1] or variants are used to track updates in many distributed file systems, such as Coda (file system) and Ficus, and are the main data structure behind optimistic replication.[2]
Other mechanisms
- Hash Histories [3] avoid the use of counters by keeping a set of hashes of each updated version and comparing those sets by set inclusion. However this mechanism can only give probabilistic guarantees.
- Concise Version Vectors [4] allow significant space savings when handling multiple replicated items, such as in directory structures in filesystems.
- Version Stamps [5] allow tracking of a variable number of replicas and do not resort to counters. This mechanism can depict scalability problems in some settings, but can be replaced by Interval Tree Clocks.
- Interval Tree Clocks[6] generalize version vectors and vector clocks and allows dynamic numbers of replicas/processes.
- Bounded Version Vectors [7] allow a bounded implementation, with bounded size counters, as long as replica pairs can be atomically synchronized.
- Dotted Version Vectors [8] address scalability with a small set of servers mediating replica access by a large number of concurrent clients.
References
External links
Wikiwand - on
Seamless Wikipedia browsing. On steroids.