Data-driven retail transformation has gained momentum in the last few years.
Most retailers have invested heavily to modernize their data estate to support business functions, giving credence to British mathematician Clive Humby’s observation made in 2006 that ‘data is the new oil’.
The value of data, however, largely depends on the contextual knowledge of data teams. Hence, there has recently been a continuous push to decentralize the data estate and shift its control to business domain teams. To treat data as a strategic asset or product, retailers are focusing on making it discoverable, trustworthy, self-describing, addressable, and interoperable for better value realization.
Though data mesh fits this purpose well as a concept, setting it up is quite challenging, if not impossible, in the short or mid-term for large retailers with a mature data estate and a vast user base. This is why retailers with a centralized data estate and frameworks are exploring options to extend their existing data ‘lakehouse’ to cater to the needs of domain teams quickly and cost-effectively.
A data ‘meshhouse’ will blend the best of both data mesh and data ‘lakehouse’ without requiring significant organizational and process changes. It can help retailers, particularly those with a mature data estate, resolve the challenges posed by data mesh and ‘lakehouse’ and reap greater benefits from data.
Data ‘meshhouse’, in fact, holds the potential to become the mainstay of future data platforms for retailers.
Data mesh is an architectural and organizational paradigm, and not a technology or a solution that can be bought.
Data mesh, a term introduced by technologist and author Zhamak Dehghani in 2019, is based on four fundamental principles that bundle well-known concepts: domain ownership, data as a product, self-service data platform, and federated governance.
Data mesh is a decentralized approach that enables domain teams to perform cross-domain data analysis on their own. However, data practitioners have some serious concerns about it:
There is no standard tool, technology, or framework to enable data mesh
It requires major organizational and process changes
Federated governance is extremely complex
Defining bounded context for domain teams is a time-consuming process
Cross-domain and common-domain use cases have performance and governance implications
Not all domains have the maturity and skillset to govern and manage their own data estate
All these make retailers apprehensive about implementing data mesh in its purest form. Those with a mature data ‘lakehouse’ also face various challenges.
A data ‘lakehouse’ is a modern platform that brings the structure and schema of a data warehouse to the low-cost, often unstructured data stored in a data lake.
Over the last few years, retailers’ data estates have matured as they have transformed their modern data platform into a data ‘lakehouse’ with robust processing and management frameworks.
As many data-mature retailers have already onboarded a huge volume and variety of data into their centralized cloud data estate, the focus is now shifting toward consumption use cases. That is where domain users have concerns about the quality and completeness of a data ‘lakehouse’ for analytical use cases. Some of the key concerns are:
Data ‘lakehouse’ focuses on technical capabilities and IT use cases
Trustworthiness, transparency, and discoverability of data assets are questionable due to missing context
Over-reliance on central data teams
Neither data producers nor data consumers are accountable for governance of the central data estate
Functional data quality checks are more reactive than proactive
All of these are causing dissatisfaction among domain users as they perceive a central data ‘lakehouse’ and a central data team to be a roadblock to enabling data-driven business processes for their domains.
Retailers are looking for options to enable data mesh principles on top of the central data ‘lakehouse’ without causing much disruption.
While both data mesh and ‘lakehouse’ have their benefits and challenges, the future of modern data platforms is expected to be a careful blend of both.
Retailers that are data mature and have invested in a centralized ‘lakehouse’ with all standards and processes will look for options to extend it further to satisfy the demands of domain teams. This is where a data ‘meshhouse’ will be of use.
A data ‘meshhouse’ provides all technical frameworks and standards of a data ‘lakehouse’ to all domain users. Data stored in a central data lake is exposed to domain users through dedicated data products. Thus, it combines the best of both a data ‘lakehouse’ and data mesh to avoid data silos and governance challenges.
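The idea of exposing centrally stored data through dedicated data products can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption rather than part of any specific platform: the `DataProduct` and `Catalog` classes, their fields, and the example table names are hypothetical. The point it shows is that the data stays in the central lake, while each domain sees a named, self-describing product registered in one shared catalog.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor of a domain-facing view over central lake data."""
    name: str             # addressable identifier, e.g. "sales.daily_store_sales"
    domain: str           # business domain the product serves
    source_tables: tuple  # central lakehouse tables backing the product
    description: str      # business context that makes the product self-describing
    owner: str            # accountable contact for the product


@dataclass
class Catalog:
    """Hypothetical central registry: data is not copied, only described here."""
    products: dict = field(default_factory=dict)

    def register(self, product: DataProduct) -> None:
        # Reject duplicates so every product name stays uniquely addressable.
        if product.name in self.products:
            raise ValueError(f"duplicate data product: {product.name}")
        self.products[product.name] = product

    def discover(self, domain: str) -> list:
        """Return the sorted names of all products published for one domain."""
        return sorted(p.name for p in self.products.values() if p.domain == domain)
```

A domain team would then call `catalog.discover("sales")` to list the products available to it, without needing to know which central tables sit behind them.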
Following are the key enhancements needed to establish a data ‘meshhouse’:
Thus, data ‘meshhouse’ addresses the analytical needs of domain teams without much disruption to technology and processes. Value realization becomes much faster and more predictable through well-governed subscriptions in the data marketplace.
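A governed subscription through a data marketplace could work along the following lines. This is only a sketch under stated assumptions: the `Marketplace` class, the `Status` values, and the steward role are hypothetical names chosen for illustration, not features of a particular product. It shows a domain team requesting access to a data product and gaining read access only after the product's designated steward approves the request.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"


@dataclass
class Marketplace:
    """Hypothetical marketplace that tracks subscription requests centrally."""
    stewards: dict                                     # product name -> approving steward
    subscriptions: dict = field(default_factory=dict)  # (team, product) -> Status

    def request(self, team: str, product: str) -> Status:
        if product not in self.stewards:
            raise KeyError(f"unknown data product: {product}")
        # setdefault keeps an existing status if the team asks again.
        return self.subscriptions.setdefault((team, product), Status.PENDING)

    def approve(self, team: str, product: str, approver: str) -> None:
        # Only the steward registered for the product may grant access.
        if self.stewards.get(product) != approver:
            raise PermissionError(f"{approver} is not the steward of {product}")
        if (team, product) not in self.subscriptions:
            raise KeyError("no subscription request to approve")
        self.subscriptions[(team, product)] = Status.APPROVED

    def can_read(self, team: str, product: str) -> bool:
        return self.subscriptions.get((team, product)) is Status.APPROVED
```

Keeping approval with a central steward, rather than with each domain, reflects the article's point that governance stays centralized while consumption becomes domain-oriented.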
Data ‘meshhouse’ helps retailers to avoid challenges associated with federated governance and organizational process changes.
Let us understand the approach taken in retail to convert a central data ‘lakehouse’ into a data ‘meshhouse’ for domain teams.
A large multinational retailer built a centralized data lakehouse in Azure with multiple frameworks and templates for standardization and productivity gains. Still, domain users were uncertain whether the central data platform team could support their business use cases in a timely fashion.
To address this concern, the following improvement actions were considered to form a data ‘meshhouse’:
All these set the foundation for a data ‘meshhouse’ for the retailer to enable the core principles of a data mesh within their central data lakehouse.
Retailers are looking for faster value realization from their data estate.
Retail business priorities and analytical use cases are changing rapidly. To stay ahead of competition, retailers need quick analytical enablement for domain teams without too many organizational and process changes.
Extracting value from a huge data estate often becomes a big hurdle for domain-owned data teams when proper metadata, governance, and relevant skills are missing. Data ‘meshhouse’ can be a good solution to this problem, as it brings in domain orientation without shifting control of data asset development to individual domain teams.