There is general acceptance in business today that analytics is the new competitive edge. The business that makes better decisions more quickly is likely to be the one that outperforms the market.
What is less well understood is how to make data available to decision makers throughout the organisation. Many organisations are still stuck in traditional environments, where IT bears the brunt of publishing data and reports to business users. This leads to inevitable delays and complications in enabling fact-based decision-making. The typically slow response from IT often means that business is flying blind, and that data insights arrive too late to be useful.
The challenge today is to change this paradigm, and to truly enable business to use data and analytics in reaching its decisions.
The processes involved in this data value chain include storing and governing the data, extracting insight from both raw and refined data, and granting access to data for technical and non-technical people alike.
This document covers three main areas that must be addressed in order to deliver on this promise of empowerment through data analytics:
- The modern data warehouse environment
- Enabling business intelligence through Search
- Putting analytics in the hands of business
THE MODERN DATA WAREHOUSE
Traditional Data Warehouse Environment
The diagram below depicts a traditional data warehouse environment.
In this traditional approach, data is fed from various sources. The data is Extracted, Transformed and Loaded (ETL) into a staging environment or Operational Data Store (ODS), and after cleansing and certification, published into the formal Data Warehouse environment. From here it is consumed by a variety of reporting and analytics applications.
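The flow described above can be sketched in a few lines of Python. This is a minimal illustration only, using an in-memory SQLite database in place of a real staging area and warehouse. The table names (staging_customer, dw_customer), the sample records and the certification rule are all assumptions made for the example, not part of any specific product.

```python
import sqlite3

# Hypothetical source records; names and values are illustrative only.
SOURCE_ROWS = [
    {"customer_id": "1", "name": " alice ", "spend": "120.50"},
    {"customer_id": "2", "name": "Bob", "spend": "not available"},
    {"customer_id": "3", "name": "carol", "spend": "75"},
]


def to_float(text):
    """Return the value as a float, or None if it cannot be parsed."""
    try:
        return float(text)
    except ValueError:
        return None


def run_etl(rows):
    """Land rows in staging, then publish only cleansed rows to the warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE staging_customer (customer_id TEXT, name TEXT, spend TEXT)"
    )
    conn.execute(
        "CREATE TABLE dw_customer "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, spend REAL)"
    )

    # Extract and load: source data lands in staging unchanged.
    conn.executemany(
        "INSERT INTO staging_customer VALUES (:customer_id, :name, :spend)", rows
    )

    # Cleanse and certify: tidy names, reject rows with an unusable spend.
    staged = conn.execute(
        "SELECT customer_id, name, spend FROM staging_customer"
    ).fetchall()
    for customer_id, name, spend in staged:
        amount = to_float(spend)
        if amount is None:
            continue  # fails certification; the row stays in staging only
        conn.execute(
            "INSERT INTO dw_customer VALUES (?, ?, ?)",
            (int(customer_id), name.strip().title(), amount),
        )
    conn.commit()
    return conn
```

Even in this toy form, every new field or rule means changing table definitions and transformation code before business sees any data.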
The main challenge with this formal, very structured approach is that the environment is typically quite complex, and by design resistant to change. In practical terms, this means that business does not get answers quickly, and that new requirements take a long time to accommodate.
This unresponsiveness from IT typically leads to business creating parallel environments from which it can gain quicker access to data. But these “sandbox” environments or rogue data marts present a governance challenge. They also mean that business sees various versions of the truth, with no single complete view of the customer.
INTRODUCING THE DATA LAKE
The route to a more agile environment is the implementation of a data lake, where uncertified and ungoverned data can be landed. The data lake replaces the traditional data staging area, and offers formalised support for the sandbox environment.
The value in the data lake is that it provides an area in which data may be collected, before its value has been demonstrated. This avoids the cost of applying full governance to the data (as in the case of publishing to the formal data warehouse) yet allows business quick access. In this sense, it provides all the capabilities of the staging area that it replaces, but it also has several other important benefits:
- A data lake can hold raw data forever, rather than just storing it temporarily.
- A data lake has compute capabilities that allow transformations, and it can therefore become a single platform for staging and ETL.
- A data lake can be used to analyse raw data for trends and anomalies.
- A data lake can easily store semi-structured and even unstructured data.
- A data lake can store big data.
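As a rough illustration of the points above, the sketch below lands raw, semi-structured records unchanged and applies structure only when the data is read. It is a simplified stand-in using local files and the Python standard library. The function names, the "amount" field and the anomaly rule are assumptions made for the example, and a real data lake would run on a distributed storage and compute platform.

```python
import json
import statistics
import tempfile
from pathlib import Path


def land_raw(lake_dir, source, records):
    """Append each record unchanged as one JSON line; nothing is validated."""
    path = Path(lake_dir) / f"{source}.jsonl"
    with path.open("a", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    return path


def read_amounts(lake_dir):
    """Schema-on-read: pick out numeric 'amount' fields, ignore other shapes."""
    amounts = []
    for path in sorted(Path(lake_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                value = json.loads(line).get("amount")
                if isinstance(value, (int, float)) and not isinstance(value, bool):
                    amounts.append(value)
    return amounts


def find_anomalies(amounts, factor=3.0):
    """Flag values above `factor` times the median; a deliberately crude rule."""
    if not amounts:
        return []
    median = statistics.median(amounts)
    return [value for value in amounts if value > factor * median]


if __name__ == "__main__":
    lake = tempfile.mkdtemp()
    land_raw(lake, "web", [{"amount": 10, "page": "/home"}, {"note": "free text"}])
    land_raw(lake, "pos", [{"amount": 12}, {"amount": 11}, {"amount": 90}])
    print(find_anomalies(read_amounts(lake)))
```

The record without an amount is still kept in the lake. It is simply ignored by this particular analysis, and remains available if its value is demonstrated later.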
The remaining challenge in this environment is that data access becomes disparate. Users of the data have to access multiple environments, and it may not always be obvious which data to access for the best results.
This leads us to the introduction of the Hybrid Logical Data Warehouse.
MOVING TO THE HYBRID LOGICAL DATA WAREHOUSE
Adding data federation to the picture resolves the challenges described above. Providing a layer through which users can access data regardless of where the data is physically located simplifies and therefore encourages data access, and at the same time offers IT a level of control that is impossible otherwise.
Some of the benefits that this environment offers include:
- Data can be accessed through a virtual layer, which handles all the complexity of the underlying databases, yet presents a simplified view to the users and analysts.
- Users can immediately access data from their sandbox environments, and join it to dimensions and facts deployed in the enterprise warehouse.
- Rogue data marts can be integrated back into the fold.
- Data that was duplicated only to allow joins within a single database instance can be eliminated. The joins can be federated instead, which saves storage on data sets that are typically large.
- Databases or technologies can be retired without affecting the programs that access them. Old data can be moved to the most appropriate platform, and the federation layer ensures that tools and applications can still reach it.
- Data can be located based on economic and performance considerations, rather than access considerations.
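The idea of a virtual access layer can be sketched as follows. This is a minimal illustration, not the architecture itself: two separate in-memory SQLite databases stand in for the enterprise warehouse and a business sandbox, and a view stands in for the federation layer. The table and view names are hypothetical, and a real federation layer would span different database engines and platforms.

```python
import sqlite3


def build_federated_connection():
    """Expose a warehouse table and a sandbox table through one view."""
    # The main database stands in for the governed enterprise warehouse.
    conn = sqlite3.connect(":memory:")
    # A second, separate database stands in for a business sandbox.
    conn.execute("ATTACH DATABASE ':memory:' AS sandbox")

    conn.execute(
        "CREATE TABLE main.dim_customer "
        "(customer_id INTEGER PRIMARY KEY, segment TEXT)"
    )
    conn.executemany(
        "INSERT INTO main.dim_customer VALUES (?, ?)",
        [(1, "Retail"), (2, "Corporate")],
    )

    conn.execute(
        "CREATE TABLE sandbox.campaign_response "
        "(customer_id INTEGER, responded INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sandbox.campaign_response VALUES (?, ?)",
        [(1, 1), (2, 0), (2, 1)],
    )

    # The view is the "virtual layer": users query it without knowing
    # which physical database holds each table. SQLite requires a TEMP
    # view for queries that span attached databases.
    conn.execute(
        """
        CREATE TEMP VIEW v_response_by_segment AS
        SELECT d.segment AS segment, SUM(r.responded) AS responses
        FROM main.dim_customer AS d
        JOIN sandbox.campaign_response AS r
          ON r.customer_id = d.customer_id
        GROUP BY d.segment
        """
    )
    return conn


def responses_by_segment(conn):
    """Query the virtual layer only; no physical locations appear here."""
    return conn.execute(
        "SELECT segment, responses FROM v_response_by_segment ORDER BY segment"
    ).fetchall()
```

Because consumers reference only the view, the sandbox table could later be moved into the warehouse, or to another platform, by redefining the view and leaving the consuming query unchanged.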
We have implemented the Hybrid Logical Data Warehouse at one of our key customers in South Africa, leveraging a mixture of mainly IBM technologies with remarkable success. Further, we have implemented selected components of this architecture at many of our local customers.
Our offering aligns to a large degree with the direction that IBM is taking with both its data warehousing and analytics offerings, and increasingly allows us to leverage IBM's cognitive technologies.
Further detail on the specific recommended technologies will be provided after an initial discovery of the architectures and technologies that you currently employ.