It is widely accepted in business today that data analytics is the new competitive edge: the business that can make better, quicker decisions will outperform the market and stay ahead of its competitors.
What is less well understood is how to make data available to decision-makers throughout the organisation. Many organisations are still stuck in traditional data warehouse environments, where IT carries the burden of publishing data and reports to business users, at considerable effort. This leads to inevitable delays and complications when business needs to make quick decisions based on facts and readily available data. The typically slow response from IT often means that business is flying blind, and data insights arrive too late to be useful.
The challenge is to shift the mindset that data analytics is solely IT’s problem to solve. Changing this will enable better business decisions through more tightly coupled data and analytics.
Consider a mining analogy. Turning data into useful insights for business decisions is the equivalent of turning a raw material (e.g. iron ore) into a valuable commodity (e.g. refined steel products). A chain of work and refinement turns a raw mineral (data) into a valuable resource (business insights and decisions). We term this process the Data Value Chain: taking raw data, then transforming and connecting it to provide value in the final data product made available to business users.
Some of the processes that are involved in this data value chain include storage and governance of data, extracting insight from the raw and refined data, and granting access to data for technical and non-technical people alike.
This page highlights three main areas that should be addressed in order to deliver the promise of business empowerment through data analytics:
- The modern data warehouse environment
- Enabling business intelligence through intuitive, natural language search
- Putting real-time analytics in the hands of business
Moving to a Modern Data Warehouse
Traditional Data Warehouse Environment
The diagram below depicts a traditional data warehouse environment.
In this traditional approach, data is fed from various sources. The data is Extracted, Transformed, and Loaded (ETL) into a staging environment or Operational Data Store (ODS), and after cleansing and certification, published into the formal Data Warehouse environment. From here it is consumed by a variety of reporting and analytics applications.
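The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source records, the table names (staging_sales, dw_sales) and the cleansing rule are all hypothetical, and an in-memory SQLite database stands in for both the staging area and the warehouse.

```python
import sqlite3

# Hypothetical source records; in practice these come from operational systems.
SOURCE_ROWS = [
    {"customer": " Alice ", "amount": "120.50"},
    {"customer": "Bob", "amount": "n/a"},  # will fail cleansing
    {"customer": "carol", "amount": "80"},
]


def extract():
    """Extract: read the raw records from the (hypothetical) source."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: tidy customer names and parse amounts (None if unparseable)."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            amount = None
        cleaned.append((row["customer"].strip().title(), amount))
    return cleaned


def load_and_publish(conn, rows):
    """Load into staging, then publish only certified rows to the warehouse."""
    conn.execute("CREATE TABLE staging_sales (customer TEXT, amount REAL)")
    conn.execute("CREATE TABLE dw_sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO staging_sales VALUES (?, ?)", rows)
    # Assumed certification rule: a row must have a valid amount.
    conn.execute(
        "INSERT INTO dw_sales "
        "SELECT customer, amount FROM staging_sales WHERE amount IS NOT NULL"
    )
    conn.commit()


def run_etl():
    conn = sqlite3.connect(":memory:")
    load_and_publish(conn, transform(extract()))
    return conn
```

Calling run_etl() leaves all three records in staging_sales but publishes only the two that pass cleansing to dw_sales. Every new source or rule means changing this pipeline, which hints at why the approach is slow to adapt.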
The main challenge with this formal, structured approach is that the environment is typically complex and, by design, resistant to change. In practical terms, this means that business does not get answers quickly and that new requirements take a long time to accommodate.
The reporting and analytics tools used in this traditional data warehouse environment also often need a technical expert to compile new reports and dashboards.
This unresponsiveness from IT typically leads business units to create parallel environments from which they can gain quick access to data. But these “sandbox” environments or rogue data marts present a governance challenge. They also lead to other problems: the business sees several versions of the truth, has no single complete view of the customer, and often works with inaccurate data.
Introducing the Data Lake
The route to a more agile environment is the implementation of a data lake, where uncertified and ungoverned data can be landed. The data lake replaces the traditional data staging area and offers formalised support for the sandbox environment.
The value in the data lake is that it provides an area in which data may be collected before its value has been demonstrated. This avoids the cost of applying full governance to the data (as in the case of publishing to the formal data warehouse) yet allows business quick access. In this sense, it provides all the capabilities of the staging area that it replaces, but it also has several other important benefits:
- A data lake can hold raw data indefinitely, rather than just storing it temporarily.
- A data lake has compute capabilities that allow transformations, and it therefore can become a single platform for staging and ETL.
- A data lake has capabilities that allow it to be used to analyse raw data for trends and anomalies.
- A data lake can easily store semi-structured and even unstructured data.
- A data lake can store big data volumes.
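A minimal sketch of the ideas in this list, assuming a plain directory of JSON Lines files as a stand-in for the lake. The function names (land_raw, read_raw, find_anomalies), the "sensors" source and the threshold are hypothetical; a real data lake would sit on a distributed store with its own compute engine.

```python
import json
import tempfile
from pathlib import Path


def land_raw(lake_dir, source, records):
    """Append records to the lake untouched: no schema, no cleansing."""
    path = Path(lake_dir) / f"{source}.jsonl"
    with path.open("a", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
    return path


def read_raw(lake_dir, source):
    """Read back everything ever landed for a source."""
    path = Path(lake_dir) / f"{source}.jsonl"
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def find_anomalies(records, field, threshold):
    """Schema-on-read: ignore records without a numeric field, flag high values."""
    return [
        r for r in records
        if isinstance(r.get(field), (int, float)) and r[field] > threshold
    ]


def demo():
    with tempfile.TemporaryDirectory() as lake:
        # Semi-structured input: fields vary from record to record.
        land_raw(lake, "sensors", [
            {"id": 1, "temp": 21.5},
            {"id": 2, "temp": 98.0, "note": "free text is fine"},
            {"id": 3},
        ])
        return find_anomalies(read_raw(lake, "sensors"), "temp", 50)
```

Here demo() returns only the record with id 2. Structure is applied when the data is read, not when it is landed, so data can be collected before anyone has decided how it will be used.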
The remaining challenge in this environment is that data access becomes fragmented. Data users have to work across multiple environments, and it may not always be obvious which data to use for the best results.
This leads us to the introduction of the Hybrid Logical Data Warehouse.
Moving to a Hybrid Logical Warehouse
Adding data federation to the picture resolves the challenges described above. A layer through which users can access data, regardless of where it is physically located, simplifies and therefore encourages data access, while giving IT a level of control that would otherwise be impossible.
Some of the benefits that this environment offers include:
- Data can be accessed through a virtual layer, which handles the complexity of the underlying databases and data sources, yet presents a simplified view to the users and data analysts.
- Users can immediately access data from their sandbox environments, and join it to dimensions and facts deployed in the enterprise warehouse.
- Rogue data marts can be integrated back into the fold.
- Redundant copies of data, kept only to allow joins within a single database instance, can be eliminated and the joins federated instead, saving storage on what are typically large data sets.
- Databases or technologies can be retired without affecting the programs that access them. Old data can be moved to the most appropriate platform, and federation will take care of access for tools and applications.
- Data can be placed according to economic and performance considerations, rather than access considerations.
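The federation idea can be illustrated with a small sketch. It assumes SQLite's ATTACH DATABASE as a stand-in for a real federation product (it is not the technology referred to in this article), and the file names, table names and catalog mapping are all hypothetical. Two physically separate database files are exposed through one connection and joined in a single query.

```python
import sqlite3
import tempfile
from pathlib import Path


def build_source(path, ddl, insert_sql, rows):
    """Create one physically separate database file (stand-in for a platform)."""
    conn = sqlite3.connect(str(path))
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    conn.close()


def open_federated(catalog):
    """Return one connection exposing every source under its logical alias."""
    conn = sqlite3.connect(":memory:")
    for alias, path in catalog.items():
        # Aliases come from trusted configuration; only the path is bound.
        conn.execute(f"ATTACH DATABASE ? AS {alias}", (str(path),))
    return conn


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Logical name -> physical location; users only see the logical names.
        catalog = {
            "warehouse": Path(tmp) / "enterprise_dw.db",
            "sandbox": Path(tmp) / "marketing_sandbox.db",
        }
        build_source(
            catalog["warehouse"],
            "CREATE TABLE dim_customer (id INTEGER, name TEXT)",
            "INSERT INTO dim_customer VALUES (?, ?)",
            [(1, "Alice"), (2, "Bob")],
        )
        build_source(
            catalog["sandbox"],
            "CREATE TABLE campaign_response (customer_id INTEGER, responded INTEGER)",
            "INSERT INTO campaign_response VALUES (?, ?)",
            [(1, 1), (2, 0)],
        )
        conn = open_federated(catalog)
        rows = conn.execute(
            "SELECT c.name, r.responded "
            "FROM warehouse.dim_customer AS c "
            "JOIN sandbox.campaign_response AS r ON r.customer_id = c.id "
            "ORDER BY c.name"
        ).fetchall()
        conn.close()
        return rows
```

The query joins a sandbox table to a warehouse dimension without copying either. If a source moves to another platform, only its entry in the catalog changes, while the queries written against the logical names stay the same.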
The BITanium Offering
We have implemented the Hybrid Logical Warehouse for one of our key customers in South Africa, using a mixture of mainly IBM technologies, with remarkable success. We have also implemented selected components of this architecture for many of our local customers.
Our offering aligns to a large degree with the direction that IBM is taking with both its data warehousing and analytics offerings, and increasingly allows us to leverage its cognitive technologies.
Further detail on the specific recommended technologies will be provided after an initial discovery of the architectures and technologies you currently employ.
Please contact us for more information.