Generative AI can improve -- not replace -- predictive analytics

Generative AI improves predictive analytics through synthetic data generation. Managing data bias and ethical AI risks can enable GenAI to widen the scope of simulated outcomes.

Donald Farmer, TreeHive Strategy

Published: 26 Feb 2024

New GenAI capabilities can improve the insights data teams glean from predictive analytics. Creating synthetic data helps predictive analytics simulate a wider range of future outcomes.

Before generative AI caught organizations' attention, the most advanced analytics available were data mining and predictive analytics. Data mining looks for patterns within historical data and predictive analytics forecasts future events based on those patterns.

Predictive AI can accelerate this process by using machine learning to sift through large data sets autonomously and identify patterns. But models that rely solely on backward-looking data might struggle to anticipate trends and events, because rapid economic, technological and social changes make historical data outdated. In an increasingly dynamic world, forecasting requires looking beyond retrospective data to model a wider range of possibilities.

Generative AI models, on the other hand, have the potential to upgrade predictive analytics techniques by creating synthetic data, which enables predictive analytics tools to simulate a wider range of potential events and outcomes. Generative AI can increase model accuracy and broaden the use of predictive analytics applications across industries to support intelligent and automated decision-making.

[Figure: AI and predictive analytics applications improve each other's capabilities and decision-making.]

However, the new technology also presents challenges, including data bias and ethical concerns. Responsible development can mitigate the risks and maximize benefits.

Generative AI augments predictive analytics

Independent of generative AI, predictive analytics is valuable for forecasting and planning because it uses machine learning algorithms and statistical analysis to discern meaningful patterns and insights from a business's real data. But these models struggle if the data becomes outdated.

Generative AI models, such as the GPT or Falcon series, offer the ability to create new content, shifting from merely categorizing existing data to simulating future possibilities. This capability aligns well with the needs of predictive analytics. For example, in finance, predictive analytics has traditionally forecast future events such as purchases, loan defaults or insurance risks only by finding patterns in the past and applying those patterns to new situations. Generative AI, by contrast, can create synthetic data to capture a fuller range of potential outcomes. Synthetic data can also mitigate the cold start problem, which arises in predictive analytics when the system has insufficient data to make accurate predictions. The cold start problem commonly occurs when designing a new campaign or offer for completely new products with little or no historical data to draw on.
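The cold start idea can be illustrated with a minimal sketch. It assumes a simple Gaussian sampler as a stand-in for a real generative model, and the product figures, function names and parameters are hypothetical, chosen only for illustration. The sketch fits a distribution to launch sales of comparable products and draws synthetic observations for a new product that has no history of its own.

```python
import random
import statistics


def fit_gaussian(values):
    """Estimate the mean and standard deviation of a small reference sample."""
    return statistics.mean(values), statistics.stdev(values)


def synthesize_weekly_sales(reference_sales, n_samples, seed=0):
    """Draw synthetic weekly sales for a new product from the fitted distribution.

    A Gaussian sampler is an assumption made for brevity; it is not a
    generative AI model, only a placeholder for one.
    """
    rng = random.Random(seed)
    mean, stdev = fit_gaussian(reference_sales)
    # Clip at zero because negative unit sales are impossible.
    return [max(0.0, rng.gauss(mean, stdev)) for _ in range(n_samples)]


# Hypothetical weekly unit sales of comparable products at launch.
similar_product_sales = [120, 135, 98, 110, 142, 127]
synthetic = synthesize_weekly_sales(similar_product_sales, n_samples=1000)
```

A downstream predictive model could then train on the synthetic sample until real sales data for the new product accumulates.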

Generative AI uses available data from similar scenarios, along with the business domain knowledge it was trained on, to create a statistically sound foundation for generating synthetic data. The techniques are sophisticated. Consider the capabilities generative AI tools bring to visual models: tools such as DALL-E or Midjourney can create convincing paintings or drawings of new scenes, often in the style of existing art genres, because they were trained on many previous examples.

Similarly, synthetic data creates new information that models can use to make more accurate predictions by simulating thousands of potential future scenarios to better anticipate rare events or unusual cases. For example, in forecasting future sales over time, combining AI-generated data with traditional models improves performance, even with noisy and irregular historical data.
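Simulating many scenarios to surface rare events can be sketched as follows. This is a plain Monte Carlo simulation under assumed parameters, not a generative model; the baseline demand, shock probability and threshold are hypothetical values picked for illustration.

```python
import random


def simulate_scenario(rng, baseline=100.0, weeks=12, shock_prob=0.02):
    """Simulate one demand path of several weeks with occasional demand shocks."""
    path = []
    for _ in range(weeks):
        demand = rng.gauss(baseline, 10.0)
        if rng.random() < shock_prob:
            demand *= 2.0  # rare spike, for example a promotion going viral
        path.append(max(0.0, demand))
    return path


def estimate_rare_event_rate(n_scenarios=5000, threshold=180.0, seed=42):
    """Estimate how often peak weekly demand exceeds the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_scenarios):
        if max(simulate_scenario(rng)) > threshold:
            hits += 1
    return hits / n_scenarios


rate = estimate_rare_event_rate()
print(f"Share of scenarios with a demand spike above threshold: {rate:.3f}")
```

A history of only a few dozen weeks might contain no such spike at all, whereas thousands of simulated paths give the forecasting model enough examples of the rare case to plan for it.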

Understand the risks of GenAI

Generative AI systems are not perfect. Anyone using generative AI chatbots or image generators might soon encounter problems. For example, when prompted for a zebra eating a pineapple, Midjourney quickly generated a polished image, but the zebra was wearing a suit and eating an orange rather than a pineapple. Commonly known as hallucinations, such errors are particularly noticeable in natural language and image generation. If similar issues occur with synthetic data, the predictions could be seriously wrong.

[Figure: AI-generated image from Midjourney using the prompt 'a zebra eating a pineapple in the style of Leonardo da Vinci.']

Users can apply several methods to mitigate hallucinations. The most common method for synthetic data is to apply real-world constraints to the generated records, much as a data quality application validates real-world data.
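A constraint check of this kind might look like the sketch below. The record fields and the specific limits are assumptions made for illustration; a real project would take them from its own data dictionary.

```python
def is_valid(record):
    """Return True when a synthetic customer record satisfies basic real-world constraints."""
    return (
        18 <= record["age"] <= 100
        and record["monthly_spend"] >= 0
        # A customer cannot have been a customer longer than their adult life.
        and record["tenure_months"] <= (record["age"] - 18) * 12
    )


# Hypothetical synthetic records, three of which violate a constraint.
synthetic_records = [
    {"age": 34, "monthly_spend": 82.5, "tenure_months": 40},
    {"age": 17, "monthly_spend": 20.0, "tenure_months": 3},   # below minimum age
    {"age": 25, "monthly_spend": -5.0, "tenure_months": 12},  # negative spend
    {"age": 20, "monthly_spend": 15.0, "tenure_months": 60},  # tenure exceeds adult life
]

clean_records = [r for r in synthetic_records if is_valid(r)]
rejected_count = len(synthetic_records) - len(clean_records)
```

Tracking the rejection count is a useful design choice: a high rejection rate suggests the generator itself needs attention, not just the filter.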

Another option is to apply business rules, such as retail inventory management policies, to simulate realistic scenarios. Other business rules may encapsulate regulatory requirements or ethical standards. When integrated into synthetic data generation, they help maintain compliance with legal and ethical guidelines to ensure the synthetic data generated is suitable and permissible for use.
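As a hedged illustration of a business rule shaping a simulation, the sketch below applies a simple reorder-point inventory policy to a stream of synthetic daily demand. The policy parameters, the overnight replenishment assumption and the demand values are all hypothetical.

```python
def simulate_inventory(daily_demand, start_stock=50, reorder_point=20, order_up_to=60):
    """Apply a simple reorder-point policy to a stream of synthetic daily demand."""
    stock = start_stock
    history = []
    for demand in daily_demand:
        sold = min(demand, stock)  # cannot sell more than is on the shelf
        stock -= sold
        reordered = stock <= reorder_point
        if reordered:
            stock = order_up_to  # assumption: replenishment arrives overnight
        history.append({
            "demand": demand,
            "sold": sold,
            "lost": demand - sold,
            "end_stock": stock,
            "reordered": reordered,
        })
    return history


# Hypothetical synthetic demand, including one unusually large day.
synthetic_demand = [12, 15, 30, 8, 70, 10]
history = simulate_inventory(synthetic_demand)
total_lost_sales = sum(day["lost"] for day in history)
```

Because the policy caps sales at available stock, the simulated outcomes stay realistic even when the synthetic demand contains extreme values.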

Pitfalls of data bias

Legal and ethical guidelines are critical when dealing with bias in a real-world dataset. Relying on historical datasets collected under outdated social and technical standards is not enough. Smart augmentation of real samples with synthetic ones can boost model fairness in applications ranging from healthcare to hiring.

But data alone is not enough, either. Algorithm designs should also foster inclusion from the start.

AI professionals have a toolkit of techniques they can use to mitigate data bias concerns. One such method is setting a fairness threshold, which lays down a measurable limit on how much bias is acceptable in the outcomes produced by an AI model. For example, a fairness threshold could mandate that an AI model exhibit no more than a 5% variation in accuracy among different groups, especially when these groups are distinguished by protected attributes such as race or gender.
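A fairness threshold check can be sketched in a few lines. This example assumes the 5% variation is read as an absolute gap of 0.05 between the highest and lowest per-group accuracy, which is one possible interpretation; the labels, predictions and group names are hypothetical.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Compute prediction accuracy separately for each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


def meets_fairness_threshold(y_true, y_pred, groups, max_gap=0.05):
    """Check whether the accuracy gap between groups stays within max_gap."""
    acc = accuracy_by_group(y_true, y_pred, groups)
    gap = max(acc.values()) - min(acc.values())
    return gap <= max_gap, gap, acc


# Hypothetical evaluation data: group A has 9 of 10 correct, group B has 7 of 10.
y_true = [1] * 20
y_pred = [1] * 9 + [0] + [1] * 7 + [0] * 3
groups = ["A"] * 10 + ["B"] * 10

passed, gap, acc = meets_fairness_threshold(y_true, y_pred, groups)
```

Here the model fails the check because the gap between the two groups is well above the assumed limit; lowering `max_gap` tightens the threshold and raising it relaxes it.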

Establishing well-thought-out fairness thresholds encourages algorithms to maintain a consistent level of performance across diverse groups. Tightening the thresholds promotes more equality, whereas relaxing thresholds permits greater variation. Adjusting the thresholds is a key measure AI developers can take to curb bias and ensure the AI technology is beneficial to all groups impartially.

Pairing fairness thresholds with other inclusivity-oriented practices is vital to maximize the positive effects of AI while minimizing any negative repercussions, especially when using predictive analytics or automated decision-making.

The promise of GenAI and prediction

Generative AI is a new technique for enhancing predictive analytics. Predictive models can now tap into a much wider set of possibilities compared to training on historical data alone. As generative algorithms continue to advance, their integration into forecasting and predictive tasks has the potential to enable more accurate and inclusive models across industries.

Responsible oversight is essential, but generative augmentation offers organizations the opportunity to benefit their own operations and customers' experiences.

Donald Farmer is the principal of TreeHive Strategy, where he advises software vendors, enterprises and investors on data and advanced analytics strategy. He has worked on some of the leading data technologies in the market and in award-winning startups. He previously led design and innovation teams at Microsoft and Qlik.
