Data mesh

Distributed architecture framework for data management From Wikipedia, the free encyclopedia

Data mesh is a sociotechnical approach to building a decentralized data architecture by leveraging a domain-oriented, self-serve design (in a software development perspective), and borrows Eric Evans’ theory of domain-driven design[1] and Manuel Pais’ and Matthew Skelton’s theory of team topologies.[2] Data mesh mainly concerns itself with the data itself, taking the data lake and the pipelines as a secondary concern. [3] The main proposition is scaling analytical data by domain-oriented decentralization.[4] With data mesh, the responsibility for analytical data is shifted from the central data team to the domain teams, supported by a data platform team that provides a domain-agnostic data platform.[5] This enables a decrease in data disorder or the existence of isolated data silos, due to the presence of a centralized system that ensures the consistent sharing of fundamental principles across various nodes within the data mesh and allows for the sharing of data across different areas.[6]

History

The term data mesh was first defined by Zhamak Dehghani in 2019[7] while she was working as a principal consultant at the technology company Thoughtworks.[8][9] Dehghani introduced the term in 2019 and then provided greater detail on its principles and logical architecture throughout 2020. The process was predicted to be a “big contender” for companies in 2022.[10][11] Data meshes have been implemented by companies such as Zalando,[12] Netflix,[13] Intuit,[14] VistaPrint, PayPal[15] and others.

In 2022, Dehghani left Thoughtworks to found Nextdata Technologies to focus on decentralized data.[16]

Principles

Data mesh is based on four core principles:[17]

In addition to these principles, Dehghani writes that the data products created by each domain team should be discoverable, addressable, trustworthy, possess self-describing semantics and syntax, be interoperable, secure, and governed by global standards and access controls.[18] In other words, the data should be treated as a product that is ready to use and reliable.[19][20]

In practice

Summarize
Perspective

After its introduction in 2019[7] multiple companies started to implement a data mesh[12][14][15] and share their experiences. Challenges (C) and best practices (BP) for practitioners, include:

C1. Federated data governance
Companies report difficulties to adopt a federated governance structure for activities and processes that were previously centrally owned and enforced. This is especially true for security, privacy, and regulatory topics.[21][22][23]
C2. Responsibility shift
In data mesh individuals within domains are end-to-end responsible for data products. This new responsibility can be challenging, because it is rarely compensated and usually benefits other domains.[21][22]
C3. Comprehension
Research has shown a severe lack of comprehension for the data mesh paradigm among employees of companies implementing a data mesh.[21]
BP1. Cross-domain unit
Addressing C1, organizations should introduce a cross-domain steering unit responsible for strategic planning, use case prioritization, and the enforcement of specific governance rules—especially concerning security, regulatory, and privacy-related topics. Nevertheless, a cross-domain steering unit can only complement and support the federated governance structure and may grow obsolete with the increasing maturity of the data mesh.[21][24]
BP2. Track and observe
Addressing C2., organizations should observe and score data product quality as tracking and ranking key data products can encourage high-quality offerings, motivate domain owners, and support budget negotiations.[21]
BP3. Conscious adoption
Organizations should thoroughly assess and evaluate their existing data systems, consider organizational factors, and weigh the potential benefits before implementing a data mesh. When introducing data mesh, it is advised to carefully and consciously introduce data mesh terminology to ensure a clear understanding of the concept (C3).[21]

Community

Scott Hirleman has started a data mesh community that contains over 7,500 people in their Slack channel.[25]

See also

References

Loading related searches...

Wikiwand - on

Seamless Wikipedia browsing. On steroids.