How are compression, associativity and in-memory technology a revolution for Business Intelligence projects?

How to divide Business Intelligence costs by 10?

The evolution of technology, combined with database compression, in-memory tables and data associativity, is a revolution for the BI software market. But before examining why such a huge cut in BI costs is possible, let’s review the traditional approach and the concepts it uses.

Compression associativity BI

You will learn why the traditional approach to BI projects becomes costly compared to projects using new-generation BI solutions.

Introduction

The constraints of traditional BI are incompatible with the responsiveness, creativity and autonomy that companies now expect from their managers and BI analysts. A new generation of BI tools is attractive for its ease of use and almost immediate value. These new tools create a rupture by accompanying the user’s reasoning and allowing users to follow the path of their choice to find answers to their questions.

What are the pains of the traditional Business Intelligence approach?

STEP 1: You identify the sources of information. In Business Intelligence tools, this is handled through a data extractor; for example, you may need a SQL extractor or an ERP extractor. In an ERP, it is handled by setting up a queue of data for each source, for example sales order header, sales order item, invoicing header, payment item. Once each queue is set up, you may need additional fields, filters or scheduling, and these modifications require effort and generate further administration later. Costs are multiplied by the number of extractors, which is usually equal to the number of source tables in the ERP system (sales order header + sales order item + delivery header + … + payment item).
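
As a rough illustration only (the table, column and function names below are hypothetical, not taken from any specific ERP), a single extractor for one queue might look like the following sketch; a similar script, with its own filters and scheduling, is needed for every source table:

```python
import sqlite3  # stand-in connection library; a real project would use the vendor's driver


def extract_sales_order_headers(conn: sqlite3.Connection, last_run_date: str):
    """Delta extraction: pull only headers changed since the last scheduled run."""
    cursor = conn.execute(
        "SELECT order_id, customer_id, order_date, currency "
        "FROM sales_order_header "
        "WHERE changed_on >= ?",  # the filter that must be configured and administered
        (last_run_date,),
    )
    return cursor.fetchall()

# Similar extractors are required for sales_order_item, delivery_header,
# invoicing_header, payment_item, ... which is why extraction costs grow
# with the number of source tables.
```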

STEP 2: Once data are extracted, they are stored in the Business Intelligence tool in a Persistent Staging Area (PSA). It is usually a basic flat table whose only purpose is to copy data as fast as possible. You usually start from templates when working with an ERP, but each PSA still needs to be adapted, set up and administered. Costs are multiplied by the number of PSAs, which is usually the number of source tables in the ERP system (sales order header + sales order item + delivery header + … + payment item) multiplied by the number of sources.
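
Continuing the same hypothetical sketch, the PSA step is little more than a 1:1 copy into a flat table, yet each of these tables still has to be created and administered:

```python
def load_into_psa(conn, rows):
    """Copy the extracted rows as-is into a flat Persistent Staging Area table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS psa_sales_order_header "
        "(order_id TEXT, customer_id TEXT, order_date TEXT, currency TEXT, load_ts TEXT)"
    )
    conn.executemany(
        "INSERT INTO psa_sales_order_header "
        "VALUES (?, ?, ?, ?, datetime('now'))",  # no transformation, just a fast copy
        rows,
    )
    conn.commit()

# One PSA table per extractor and per source system: staging costs multiply accordingly.
```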

Cubes, Aggregations, ODS, PSA, compression, associativity

Heavy BI structure

STEP 3: You start to prepare data and do some data cleansing and calculations in an Operational Data Store (ODS). In one ODS you can, for example, normalize data from multiple sources collected in different PSAs. The ODS usually keeps the full detail of the information, including identification data such as the order number or invoice number. This layer is useful for finding errors in KPI calculations because you can map it back to the source system. Costs are multiplied by the number of ODS objects, which is frequently equal to the number of source item tables in the ERP system (sales order item + delivery item + … + payment item); the data are, however, consolidated: header values are copied down to item level and the different source systems are merged.
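
A hedged sketch of this step, reusing the hypothetical PSA tables from above: header attributes are copied down to item level so every KPI stays traceable to its order number.

```python
def build_ods_sales_item(conn):
    """Normalize PSA data into an item-level ODS table."""
    conn.execute("DROP TABLE IF EXISTS ods_sales_item")
    conn.execute(
        "CREATE TABLE ods_sales_item AS "
        "SELECT i.order_id, i.item_no, i.material, i.quantity, "
        "       h.customer_id, h.order_date, h.currency "  # header values duplicated on each item
        "FROM psa_sales_order_item i "
        "JOIN psa_sales_order_header h ON h.order_id = i.order_id"
    )
    conn.commit()
```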

STEP 4: Several cubes are created to provide data for analysis. They group several KPIs in a fact table, which can be analysed with a collection of characteristics called dimensions. KPIs can be the sales volume, delivered quantity or invoiced amounts; dimensions could be the customer, country, group… The complexity here usually lies in calculating KPIs with all the necessary characteristics. For example, if you have a bill-to or payer dimension, that data may not be available in the source table of deliveries, which only carries sold-to and ship-to. You then create an inter-relation between the KPIs and the missing dimension using custom code. Costs depend on the number of data cubes and on the complexity of inter-relating KPIs and dimensions in a cube.
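
As a simplified, hypothetical example of such custom code, the payer dimension missing from the delivery data is looked up from a sales partner table before the fact rows are built:

```python
def build_delivery_fact(conn):
    """Build delivery fact rows; the 'payer' dimension is derived through custom code."""
    payer_by_order = dict(
        conn.execute("SELECT order_id, payer_id FROM ods_sales_partner")  # hypothetical table
    )
    facts = []
    for order_id, material, delivered_qty in conn.execute(
        "SELECT order_id, material, quantity FROM ods_delivery_item"
    ):
        payer = payer_by_order.get(order_id, "UNKNOWN")  # the hand-written inter-relation
        facts.append((order_id, material, payer, delivered_qty))
    return facts  # KPI (delivered_qty) plus dimensions (material, payer)
```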

STEP 5: Several reports are created. Each report is linked to an object, either a cube or an ODS. If you move from a summary to the details, technically you jump between two objects, from a cube to an ODS for example. The possible jumps need to be predefined and their mapping configured, which limits the flexibility of analysis. There are often performance issues to solve: for a cube, you will need either to copy and simplify it or to create pre-calculated aggregations. The cost of maintaining performance is very difficult to predict and is usually not covered by any guarantee; performance declines by a factor proportional to the volume of information.
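
One common workaround, sketched here with the same hypothetical tables, is to materialize a pre-calculated aggregate that summary reports can hit instead of the full cube; it has to be rebuilt after every load:

```python
def precalculate_monthly_aggregate(conn):
    """Materialize a customer-by-month aggregate so summary reports avoid full scans."""
    conn.execute("DROP TABLE IF EXISTS agg_sales_by_customer_month")
    conn.execute(
        "CREATE TABLE agg_sales_by_customer_month AS "
        "SELECT customer_id, strftime('%Y-%m', order_date) AS month, "
        "       SUM(quantity) AS sales_volume "
        "FROM ods_sales_item "
        "GROUP BY customer_id, strftime('%Y-%m', order_date)"
    )
    conn.commit()
```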

Last but not least, there are hidden costs to maintain the data-loading chain between Extractor/PSA/ODS/Cube. As in the preceding example, missing data need to be rebuilt with custom routines in the chain. The data-loading chain can collapse because of characters sent by the source system or because a source system is unavailable. If you connect to multiple source systems, character encodings (Kanji, Cyrillic) may differ, and data-cleansing and mapping routines will be needed. If you make a structural change, such as adding a dimension to some reports, it may require modifying the Extractor, PSA, ODS, routines and all the cubes and reports. The multiplication of BI objects to be adapted results in high change costs: adding a dimension can require between 8 and 15 days of IT or external resources.
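
A minimal example of such a cleansing routine, assuming nothing about any particular ERP's encoding: decode each value with its declared source encoding and normalize it to one Unicode form before mapping.

```python
import unicodedata


def normalize_text(raw: bytes, source_encoding: str) -> str:
    """Decode a value from its source-system encoding and normalize it to a single
    Unicode form so values from different ERPs can be mapped and compared safely."""
    return unicodedata.normalize("NFC", raw.decode(source_encoding, errors="replace"))


# The same routine applied to two differently encoded sources:
print(normalize_text("Müller".encode("latin-1"), "latin-1"))   # Western European source
print(normalize_text("Мюллер".encode("utf-8"), "utf-8"))       # Cyrillic source
```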

What are the benefits of new technologies in the Business Intelligence approach?

The new approach relies on the concepts of data associativity and compression applied to business intelligence. In fact, the two technologies have been available since the 90s. In this chapter you will learn why and how those two concepts are a revolution in the BI world.

With the generalisation of parallel processing and the availability of cheaper memory (RAM), those concepts have started to emerge in the database area and in business intelligence since the mid-2000s. Each vendor implements its own compression algorithm and in-memory engine. The leader, QlikView, has been active in the BI area since 1993. But what does this leading technology bring to BI?

Native compression on disk and in-memory, vectorized tables

Easy and flexible BI


QlikTech, the software publisher behind QlikView, innovated by integrating the two concepts of data associativity and compression. QlikView simplifies BI by using just one object: the table. There is no more need for PSAs, ODSs, cubes or pre-aggregations.

The tables used by QlikView are natively and automatically extended with the two concepts of data associativity and compression.

Data associativity de-duplicates information, and compression works on top of associativity to compress information in a “query-able” way. The algorithm is very effective and fits exactly the capabilities of today’s servers, which combine parallel processing with memory that is otherwise mostly under-used.

To be precise, each table is vectorized on the fly during the data-loading phase. This means that information is present only once, in memory and on disk at the same time, whereas some other vendors apply compression either on disk or in memory only. The vectorization uses internal binary tables, which handle the link between the value of an attribute and the attribute in a table. This is all internal “cuisine” and users do not have to care about it.
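
The following is only a simplified illustration of that principle, not QlikView’s actual implementation: each distinct value of a column is stored once in a symbol table, and rows keep only small integer pointers to it, which de-duplicates and compresses the data while keeping it directly query-able.

```python
def vectorize_column(values):
    """Store each distinct value once and keep per-row integer pointers to it."""
    symbols = []        # one entry per distinct value (de-duplication)
    index_of = {}       # value -> position in the symbol table
    row_pointers = []   # compact per-row indexes (a simplified "internal binary table")
    for value in values:
        if value not in index_of:
            index_of[value] = len(symbols)
            symbols.append(value)
        row_pointers.append(index_of[value])
    return symbols, row_pointers


countries = ["France", "Germany", "France", "France", "Germany", "Japan"]
symbols, pointers = vectorize_column(countries)
print(symbols)    # ['France', 'Germany', 'Japan']  -> each value stored only once
print(pointers)   # [0, 1, 0, 0, 1, 2]              -> small indexes, query-able directly
```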

So all information is de-duplicated and compressed in memory and on disk. What else?

You can find compression at the database or file level and, more recently, in some new in-memory tools, but those tools, which sit at the lower levels of the infrastructure, are not designed for BI queries. The power of QlikView comes from the fact that both concepts have been shaped to permit fast queries, making the best of multi-core processors and cheap memory. Customers like BMW manage to query millions of records in one mouse click.

Conclusions

As it is unrealistic to change all reporting stacks in one shot, CIOs should play the card of complementarity rather than competition with the existing BI environment.

CIOs should strengthen synergies between the two generations of tools with the following actions:

  • Identify what is not used in traditional BI, and why.
  • Initiate governance to keep information reliable.
  • Invest progressively in new BI tools.