Nowadays most organizations acknowledge the importance of using their data to optimize the way they run their business. With visions of optimized operations, streamlined strategies, and sharper decision-making, budgets are allocated to start data analytics projects.

On day 1, the following questions are typically raised: What data do you have? What insights do you need? And how do you want these insights visualized? The truth is that most organizations are quite clueless at that point in time, which makes perfect sense. There is value in the data, but how can travelers describe a place that they haven’t explored yet?

Phase 1: exploration

Initially, we explore the entire business data landscape. This might include data from operational systems, log files, sensors, marketing data, business processes, spreadsheets, documents, websites, etc. Preferably, the exploration phase is not limited to the organization’s own data, as there might also be useful complementary external datasets.

Once data sources are identified, a data profiling exercise typically takes place to understand the contents, relationships, quality and usability of the data. The information gathered in profiling is documented, typically in a so-called data dictionary, and modeled in conceptual or logical data models to get a proper functional understanding of the data, its characteristics and also its shortcomings.
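To make the profiling step a bit more concrete, here is a minimal sketch in Python. It assumes the source data has already been loaded as a list of records. The function name `profile` and the sample `customers` records are hypothetical and only serve as an illustration. A real profiling exercise would usually rely on dedicated tooling and cover far more checks.

```python
from collections import Counter

def profile(rows):
    """Return simple per-column profiling stats for a list of dict records."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and empty strings as missing values.
        present = [v for v in values if v not in (None, "")]
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "types": dict(Counter(type(v).__name__ for v in present)),
        }
    return report

# Hypothetical sample records, for illustration only.
customers = [
    {"id": 1, "country": "BE", "revenue": 1200.0},
    {"id": 2, "country": "", "revenue": "n/a"},
    {"id": 3, "country": "NL", "revenue": None},
]
customer_profile = profile(customers)
```

Even this small report surfaces typical findings for a data dictionary: the `revenue` column mixes numbers and text, and `country` is not always filled in.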

Besides being the first stage in an organization’s “data journey”, data exploration should be a continuous process, so that the organization always stays aware of the available data and its potential value.

Phase 2: curation

Now that raw data sources have been identified, we curate the data. This comprises cleaning (handling missing values, outliers and inconsistencies), organizing, and transforming (aggregating, reshaping) raw data into a structured format that is suitable for analysis. Data from the various raw data sources might be combined (merged, joined, appended) during this process.

Curated data will of course be documented and modeled to build up and share the knowledge of the company data.
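As a rough illustration of the cleaning, transforming and combining steps in this phase, consider the following Python sketch. The `orders` and `customers` inputs and the `curate` function are hypothetical, and filling missing amounts with the median is just one possible cleaning rule, chosen here for simplicity.

```python
from statistics import median

# Hypothetical raw inputs, for illustration only.
orders = [
    {"customer_id": 1, "amount": 100.0},
    {"customer_id": 1, "amount": None},  # missing value
    {"customer_id": 2, "amount": 80.0},
    {"customer_id": 2, "amount": 140.0},
]
customers = {1: " be", 2: "NL"}  # customer_id -> inconsistently formatted country code

def curate(orders, customers):
    # Clean: fill missing amounts with the median of the known amounts.
    known = [o["amount"] for o in orders if o["amount"] is not None]
    fill = median(known)
    # Transform: aggregate order amounts per customer.
    totals = {}
    for o in orders:
        amount = o["amount"] if o["amount"] is not None else fill
        totals[o["customer_id"]] = totals.get(o["customer_id"], 0.0) + amount
    # Combine: join the totals with customer attributes and normalize the country code.
    return [
        {"customer_id": cid, "country": customers[cid].strip().upper(), "total": total}
        for cid, total in sorted(totals.items())
    ]

curated = curate(orders, customers)
```

The result is one structured record per customer, with a consistent country code and a total amount, ready to be used in the analysis phase.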

Phase 3: analysis

Once the raw data has been curated, it is ready for accurate, reliable and representative analysis. This involves applying statistical techniques, artificial intelligence (AI) techniques including machine learning models, and other analytical methods to understand the data’s historical context and past performance (descriptive analytics), predict future trends and identify complex patterns (predictive analytics), and optimize decision-making (prescriptive analytics).
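The difference between describing the past and predicting the future can be shown with a very small Python sketch. The `monthly_sales` figures are made up, and a straight-line trend fitted with ordinary least squares is only a stand-in for the statistical and machine learning models mentioned above. Prescriptive analytics is not covered here.

```python
from statistics import mean

# Hypothetical curated data: sales for four consecutive months.
monthly_sales = [10.0, 12.0, 14.0, 16.0]

# Descriptive analytics: summarize past performance.
average_sales = mean(monthly_sales)

def fit_trend(values):
    """Fit y = intercept + slope * x by ordinary least squares, with x = 0, 1, 2, ..."""
    xs = range(len(values))
    x_mean = mean(xs)
    y_mean = mean(values)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Predictive analytics: extrapolate the fitted trend to the next month.
slope, intercept = fit_trend(monthly_sales)
next_month_forecast = intercept + slope * len(monthly_sales)
```

With these numbers the average is 13 and the trend grows by 2 per month, which gives a forecast of 18 for the fifth month.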

Phase 4: storytelling

Data storytelling is about taking complex datasets and translating them into narratives that executives, managers, and stakeholders can comprehend. Storytelling brings together diverse datasets that once existed in silos: customer data mingles with transaction records, and market trends combine with operational metrics. This brings insights to life and enables a panoramic view of the organization’s landscape.

At the core lies visualization. Filterable charts and tables, combined in dashboards, reveal trends, correlations and outliers at a glance. A sales manager can measure profitability, a marketing strategist gets insight into campaign impact, and an operations director can measure process efficiency, all through visual representation.

Phase 5: action

With the storyline in hand, it’s time to transform the insights and findings into new business strategies, refined marketing campaigns and fine-tuned budgets. Whatever choice the organization makes, it is backed by data.

So what you need…

Taking all of the above into account, an organization willing to embark on such a data journey needs a proper architecture for its data platform, depending on the use case(s), volumes of data, compute power needed, timing of the availability of data, etc. In the next blog, we will give you a brief overview of the typical components that make up such an architecture for a data analytics platform.