Gartner sets out to debunk big data myths


30 September 2014
Data myths

The data warehouse will continue to play an important role in the world of big data analytics, and enterprises will still need to spend considerable effort on data integration despite the "schema on read" approach promised by big data technology. These were some of the key points raised by Gartner in an advisory aimed at debunking data myths and helping IT leaders to evolve their information infrastructure strategies.

Amongst the data myths being discredited by Gartner is the belief that building a data warehouse for advanced analytics is pointless. Many information management (IM) leaders consider data warehouses a time-consuming and pointless exercise when advanced analytics uses new types of data beyond the data warehouse.

However, the reality is that many advanced analytics projects use a data warehouse during the analysis. IM leaders use it to refine the new data types that are part of big data and make them suitable for analysis, by deciding which data is relevant, how to aggregate it and what level of data quality is necessary.

Another widely held belief about the data warehouse is that the technology will be replaced by data lakes as the enterprise-wide data management platform for analysing disparate sources of data in their native formats.

According to Nick Heudecker, a research director at Gartner, data warehouses already have the capabilities to support a broad variety of users throughout an organisation. This means that IM leaders do not have to wait for data lakes to catch up.

Therefore, it is misleading for vendors to position data lakes as replacements for data warehouses or as critical elements of customers’ analytical infrastructure, because the data lake's foundational technologies lack the maturity and breadth of features found in established data warehouse technologies.

It is also incorrect to assume that big data technology will eliminate the need for data integration because it can process information via a “schema on read” approach, which enables organisations to read the same sources using multiple data models. Many people believe this flexibility will enable end users to determine how to interpret any data asset on demand and will provide data access tailored to individual users. According to Gartner, however, most information users rely significantly on “schema on write” scenarios, in which data is described, content is prescribed, and there is agreement about the integrity of the data and how it relates to the scenarios.
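To make the contrast concrete, the following is a minimal Python sketch under stated assumptions: the raw records and the functions `read_as_finance`, `read_as_geo` and `write_validated` are hypothetical illustrations, not Gartner's model or any product's API. Under schema on read, each consumer interprets the same stored records its own way, so each one must handle missing fields and bad values itself; under schema on write, records are checked against one agreed schema before they are stored.

```python
import json

# Hypothetical raw records, stored as-is with no schema enforced at write time.
RAW_RECORDS = [
    '{"id": "1", "amount": "19.99", "country": "UK"}',
    '{"id": "2", "amount": "5", "region": "EMEA"}',
    '{"id": "3", "amount": "n/a", "country": "SG"}',
]

def read_as_finance(raw):
    """Schema on read: this consumer parses 'amount' as a float at query time."""
    rows = []
    for line in raw:
        rec = json.loads(line)
        try:
            amount = float(rec["amount"])
        except ValueError:
            amount = None  # this reader quietly tolerates unparseable amounts
        rows.append({"id": int(rec["id"]), "amount": amount})
    return rows

def read_as_geo(raw):
    """Schema on read: a second consumer applies its own fallback for location."""
    return [
        {"id": int(r["id"]), "place": r.get("country", r.get("region", "unknown"))}
        for r in map(json.loads, raw)
    ]

def write_validated(raw):
    """Schema on write: only records matching one agreed schema are stored."""
    accepted, rejected = [], []
    for line in raw:
        rec = json.loads(line)
        try:
            row = {
                "id": int(rec["id"]),
                "amount": float(rec["amount"]),
                "country": rec["country"],
            }
        except (KeyError, ValueError):
            rejected.append(rec["id"])  # missing field or unparseable value
            continue
        accepted.append(row)
    return accepted, rejected
```

In this sketch the two readers silently reach different, undocumented decisions about the same flawed records, which is the integration work that schema on read defers rather than removes; the validated write path surfaces those problems once, up front.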

Another data myth relates to data flaws. Gartner noted that, as data volumes grow, many IT leaders believe the sheer volume will make individual data quality flaws insignificant due to the “law of large numbers”. They assume that individual data quality flaws do not influence the overall outcome when the data is analysed, because each flaw is only a tiny part of the mass of data in their organisation.

This is not the case, according to Ted Friedman, vice president and distinguished analyst at Gartner. Friedman said that even though each individual flaw has a much smaller impact on the whole dataset than it did when there was less data, there will also be more flaws than before because there is more data. Thus, the overall impact of poor-quality data on the whole dataset remains the same. Moreover, much of the data that organisations use in a big data context comes from outside or is of unknown structure and origin. This means the likelihood of data quality issues is actually higher than before, making data quality more important in the world of big data.
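Friedman's argument can be checked with simple arithmetic. The Python sketch below assumes, purely for illustration, that every record is flawed with the same fixed probability (a hypothetical 2 per cent); `expected_flaw_impact` is an illustrative helper, not a Gartner method. As the dataset grows, the weight of a single flaw shrinks in proportion, but the expected number of flaws grows by the same factor, so the flawed share of the whole dataset does not change.

```python
from fractions import Fraction

def expected_flaw_impact(n_records: int, error_rate: Fraction):
    """Assumes each record is flawed independently with probability error_rate."""
    weight_per_flaw = Fraction(1, n_records)          # one flaw's share of the dataset
    expected_flaws = n_records * error_rate           # more data means more flaws
    overall_share = expected_flaws * weight_per_flaw  # always equals error_rate
    return weight_per_flaw, expected_flaws, overall_share

# Hypothetical 2 per cent error rate at three dataset sizes.
for n in (10_000, 1_000_000, 100_000_000):
    w, f, s = expected_flaw_impact(n, Fraction(2, 100))
    print(f"{n:>11,} records: weight per flaw={float(w):.2e}, "
          f"expected flaws={int(f):,}, flawed share={float(s):.0%}")
```

Exact fractions are used so the cancellation is visible without floating-point noise. The model deliberately holds the error rate constant; the article's further point, that external data of unknown origin tends to raise that rate, would make the overall share worse rather than better.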

Gartner also pointed out that, despite widespread interest in big data, actual adoption by most organisations is still at an early stage. Amongst the challenges they face, the biggest are determining how to obtain value from data and deciding where to start. Many organisations get stuck at the pilot stage because they do not tie the technology to business processes or concrete use cases. According to a 2014 Gartner survey, 73 per cent of organisations have invested or are planning to invest in the technology. Of these, only 13 per cent have deployed big data solutions, while the rest are still in the very early stages of adoption.