By Robert Eve
"In the struggle for survival, the fittest win out at the expense of their rivals because they succeed in adapting themselves best to their environment." - Charles Darwin, The Origin of Species
Data integration (DI) technology, specifically extract, transform, and load (ETL) middleware, when combined with an intermediate data store such as a warehouse or mart, has played a key role in advancing business intelligence (BI) and performance management since the mid-1990s. Virtualized DI evolved from these technologies in the mid-2000s. Alternatively known as virtual data federation or enterprise information integration (EII), virtual DI eliminates the intermediate data store by leveraging high-performance query techniques that let the consuming application pull data directly from the source in real time.
The next evolutionary DI step is currently in a nascent stage. Data discovery revolutionizes how business professionals can leverage enterprises’ ever-expanding data assets, thus changing the competitive dynamic with its speed and simplicity.
Drivers of the DI Evolution: Data Volume and Source Complexity
IDC recently estimated that enterprise data will grow at a compound rate of nearly 60 percent annually. At that rate, enterprises will likely have roughly 10 times today’s data by 2013, and 100 times by 2018.
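The arithmetic behind these multiples can be checked with a short sketch. It assumes a constant 60 percent annual growth rate and a 2008 baseline year. The baseline is an assumption, not a figure from IDC; it is chosen because it places 2013 and 2018 five and ten years out.

```python
# Compound growth check: how many times larger is the data after n years?
ANNUAL_GROWTH = 0.60   # "nearly 60 percent annually" (from the text)
BASE_YEAR = 2008       # assumed baseline year; not stated in the article


def growth_multiple(years: int, rate: float = ANNUAL_GROWTH) -> float:
    """Return the factor by which volume grows after `years` of compounding."""
    return (1.0 + rate) ** years


for target_year in (2013, 2018):
    years = target_year - BASE_YEAR
    multiple = growth_multiple(years)
    print(f"{target_year}: about {multiple:.1f}x the baseline volume")
```

Under these assumptions, five years of compounding gives a little over 10 times the baseline and ten years gives a little over 100 times, which is consistent with the rounded figures above.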
Concurrent with this growth has been the rapid expansion of data complexity. Data can be structured in rows and columns within transactional systems. Data can be unstructured in documents stored on desktops. Recent advancements with new XML standards have opened the door to semi-structured data, which is often available through web services in a service-oriented architecture (SOA).
Today’s enterprises typically have hundreds, if not thousands, of unique, structured data sources built, bought and/or acquired via merger. Each has its own syntax, access methods, metadata and more, presenting myriad challenges to the access and use of these information assets. New applications, such as management portals, e-commerce solutions and performance analytics that require data from diverse sources, add more complexity. These applications need data in a specific format that is typically not compatible with how the data is stored in its original sources.
Beyond volume and complexity, time to solution is another significant factor. Business change equals IT change. The proverbial endless IT backlog greatly limits an enterprise’s ability to adapt successfully to market changes. Accelerating new projects through better tools and/or fewer steps has become more important than ever.
Helping Business Professionals Discover Their Data
Regardless of the DI approach taken, IT professionals are currently the primary go-to source for data in today’s enterprises. Business users such as business analysts, engineers, scientists, production planners and customer service managers are the primary data consumers. This business dependency often creates frustration between the groups. Facing numerous requests, IT deals with backlogs and delays daily. To operate most efficiently and meet this high volume of demand, IT requires that data consumers request the exact information they need. Yet, this isn’t as easy as it sounds.
Daily, business professionals face new and potentially unforeseen problems. They also need to resolve unanticipated issues or answer new questions as they arise. This variability makes it difficult to anticipate what information will be required prior to making informed decisions, answering questions or resolving issues.
Data Discovery - Regaining Competitive Advantage
Data discovery applications are end-to-end solutions that let business professionals “do it themselves” with minimal IT assistance. Complementary to existing reporting and analytic solutions, data discovery currently opens the door to structured and semi-structured data across the enterprise.
Specifically, data discovery allows business users to find the data they need using a keyword search paradigm, relate that discovered data to other data in the enterprise to get a complete picture, and then share the results with colleagues using applications such as Microsoft Excel. In short, the business user can go from raw data to having his or her question answered in a few minutes, with minimal to no IT involvement.
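The find, relate and share steps described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the sample records, the field names and the three helper functions stand in for what a data discovery product would do against real enterprise sources, and they do not represent any vendor's API.

```python
import csv
import io

# Hypothetical sample data standing in for two enterprise sources.
CUSTOMERS = [
    {"customer_id": 1, "name": "Acme Corp", "region": "West"},
    {"customer_id": 2, "name": "Globex", "region": "East"},
]
ORDERS = [
    {"order_id": 10, "customer_id": 1, "product": "Widget"},
    {"order_id": 11, "customer_id": 2, "product": "Gadget"},
    {"order_id": 12, "customer_id": 1, "product": "Gizmo"},
]


def find(rows, keyword):
    """Step 1: case-insensitive keyword search across every field of each row."""
    kw = keyword.lower()
    return [r for r in rows if any(kw in str(v).lower() for v in r.values())]


def relate(left, right, key):
    """Step 2: relate the found rows to another source through a shared key."""
    lookup = {}
    for r in right:
        lookup.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in lookup.get(l[key], [])]


def share(rows):
    """Step 3: export as CSV text, which spreadsheet tools such as Excel can open."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


matches = find(CUSTOMERS, "acme")                 # find the data
related = relate(matches, ORDERS, "customer_id")  # relate it to other data
report = share(related)                           # share the result
print(report)
```

In this sketch the shared key is supplied by hand. In a real deployment, the relationships between sources would more plausibly come from the relationship discovery step that IT runs at set-up.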
IT’s Role in a Data Discovery Environment
If business professionals require less IT intervention in the data discovery process, what role does IT play? IT provides critical, behind-the-scenes expertise in typical data discovery deployments. At set-up, IT installs the tool and gives users account credentials and privileges. Next, IT grants access to source data domains. Finally, IT runs the data indexer as well as the relationship discovery tool. These set-up activities typically represent a few days’ work.
During runtime, IT periodically updates data indexing and relationships to ensure searchable data remains fresh. Further, IT administers users and adds new data sources to correspond with on-going organizational and system changes. In addition, IT can make data discovery easier and more productive by adding annotations, aliases, domains, synonyms and views.
Early adopter scenarios demonstrate that the incremental support effort required of IT is relatively minor. An added benefit: data discovery products keep data security risks low by leveraging and conforming to existing security paradigms and controls, down to the row and column levels. Because data discovery tools are non-invasive, they add little burden to existing architectures and operations. Finally, IT’s overall workload is likely to be reduced, because more self-sufficient business professionals will submit far fewer requests for new reports and other data.
When to Implement Data Discovery