AI Business Value Is Not an Oxymoron: How Predictive AI Delivers Real ROI for Enterprises
Eric Siegel reveals why predictive AI often fails to deliver—and how valuation, not just accuracy, is the key to turning models into measurable business outcomes.
“Shouldn’t a great model be a sure bet to deploy?”
Eric Siegel opened his AI Realized keynote by answering his own question. A technically strong model is not a sure bet, he argued, because value does not come from accuracy alone. It comes from deployment decisions that are justified in the language of business, not in the language of data science. Until a model’s impact is estimated in terms of profit, savings, or other concrete KPIs, it remains a science project, not an operational asset.
That tension sat at the heart of his session, “AI Business Value Is Not an Oxymoron.” At an event full of executives who are under pressure to turn AI from experiments into real outcomes, Siegel put predictive AI back on center stage. Generative AI grabs attention now, he acknowledged, yet predictive AI has quietly been delivering value for decades in marketing, credit scoring, fraud detection, logistics, and more. It is also failing to launch at a discouraging rate. Only about one in five data scientists say their predictive initiatives usually make it into production.
Watch the full AI Realized Summit presentation:
Why valuation must come before deployment
The diagnosis was blunt. Predictive AI projects often skip a critical step: valuating the model, in business terms, before deployment. Teams evaluate models with precision, recall, AUC, and lift, but they do not estimate the dollar value or KPI impact of acting on those predictions. As Siegel put it, that missing step is a major reason predictive AI “usually fails to launch.”
[PULL QUOTE: “A model that never goes into production has zero business value, no matter how elegant its metrics look.”]
How predictive AI really works
Siegel grounded the discussion in a simple mental model. Business needs prediction, prediction requires machine learning, and machine learning depends on data. Reverse the chain and you get the work of predictive AI. You take historical data, train models, and then use those models to assign risk or response scores to individuals, accounts, or events. The model does not decide what to do. It ranks cases, so that operations can treat different segments differently.
[PULL QUOTE: “Business needs prediction, prediction requires machine learning, and machine learning depends on data.”]
He likes to depict models as “golden eggs” that emerge from a machine learning pipeline. Each egg is a model designed for a specific use case: which customer will cancel, which transaction is fraudulent, which prospect will respond, which invoice is likely to default. Deployed models sit inside large-scale operations and nudge them in smarter directions. They boost sales, cut costs, prevent fraud, and reduce risk across a long tail of applications.
The catch is that many eggs never hatch. Models are trained, evaluated on held-out test data, and celebrated in technical terms. They prove that they predict better than guessing or better than a baseline method. Then they stall. Operations teams hesitate. Business owners do not see a clear, quantified upside. Other priorities crowd in, and the model ends up “collecting dust,” as Siegel put it, instead of steering decisions.
For executives who have watched promising pilots quietly disappear, this diagnosis felt familiar. The issue is not that predictive AI lacks power. Harvard Business Review has called it the most important general-purpose technology of the century, precisely because it improves existing operations rather than inventing entirely new ones. Siegel’s point was that the industry has professionalized model evaluation, yet has neglected model valuation.
Prediction as operational triage
From Siegel’s perspective, the core of predictive AI is triage. He illustrated the concept with a story from his teaching days.
He would ask students to write down the diagonal size of the largest TV in their home, including zero if they had no TV. Then he would ask a simple yes or no question: “Do you have a Netflix subscription?” Arranged from smallest TV to largest, the students naturally formed a ranking of likely subscribers. Large TVs clustered at one end, and that end was heavily populated with “yes” answers.
No model can perfectly predict who will click, buy, or churn. What it can do is sort the population so that the top slice contains a much higher concentration of positives than the average. That is often enough to drive significant financial lift, as long as the business is clear about who to act on and how.
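The triage idea can be sketched in a few lines: generate a hypothetical population, rank it by an imperfect model score, and compare the response rate in the top slice to the overall average. Every number below is invented for illustration; the point is only that an imperfect ranking still concentrates positives at the top.

```python
import random

random.seed(0)

# Hypothetical population: each person has a true propensity to respond,
# and a model score that is only a noisy estimate of that propensity.
population = []
for _ in range(10_000):
    propensity = random.random() ** 3           # most people are unlikely to respond
    score = propensity + random.gauss(0, 0.1)   # imperfect model score
    responded = random.random() < propensity
    population.append((score, responded))

# Sort by model score, best first -- the "triage" step.
population.sort(key=lambda p: p[0], reverse=True)

overall_rate = sum(r for _, r in population) / len(population)
top_decile = population[: len(population) // 10]
top_rate = sum(r for _, r in top_decile) / len(top_decile)

print(f"overall response rate:    {overall_rate:.1%}")
print(f"top-decile response rate: {top_rate:.1%}")
print(f"lift in the top decile:   {top_rate / overall_rate:.1f}x")
```

The model never identifies responders with certainty; it simply makes the top of the list several times richer in them than the population average, which is what makes selective action profitable.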
In marketing, the sorted list might determine which customers receive an expensive direct mail offer, and which are left out. In credit, it might determine who receives an offer at all, or which applicants require manual review. In fraud or cyber risk, the ranking influences which transactions to flag, block, or send to investigation. In each case, the model itself is only half the story. The other half is where you “draw the line” on that ranked list.
Decision boundaries as strategy
Siegel’s central claim is that this line, the decision boundary, is a business decision. It is not determined by the model. It represents a tradeoff between competing objectives, such as profit versus coverage, or fraud loss versus investigation cost. Where you place it can have more impact on project value than incremental gains in model accuracy.
He illustrated this with a direct marketing example that many in the audience could map to their own organizations. Imagine a list of one million customers who could receive a promotional offer. The model assigns each a probability of response. On a chart where the x-axis shows what fraction of the list you target, and the y-axis shows profit, the curve rises as you add the most likely responders, then flattens, and finally declines as you start contacting marginal customers who do not buy but still cost money to reach.
At some point on that curve there is a peak. If your sole objective is to maximize expected profit, that peak tells you how many customers to contact. Perhaps it suggests contacting only twenty-four percent of the list, not the entire million.
Yet many businesses will decide to operate at a different point. Suppose the campaign launches a new product, and the marketing leader wants broader reach. They might accept a lower profit in exchange for awareness and growth, for example by contacting seventy-three percent of the list and essentially breaking even. That is still far better than contacting one hundred percent and losing money. The model organizes the opportunity. The decision boundary expresses strategy.
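The profit curve Siegel described can be sketched numerically. The campaign economics below (cost per contact, revenue per responder, the shape of the response decay down the ranked list) are invented assumptions, not figures from the talk; they exist only to show how a curve that rises, flattens, and falls yields a profit-maximizing cutoff well short of the full list.

```python
# Hypothetical campaign economics: every number here is an illustrative assumption.
LIST_SIZE = 1_000_000
COST_PER_CONTACT = 2.50       # dollars to reach one customer
REVENUE_PER_RESPONSE = 40.00  # profit contribution per responder

def response_rate(rank_fraction: float) -> float:
    """Assumed response probability at a given depth in the ranked list:
    the best-scored customers respond ~21%, the worst ~1%."""
    return 0.20 * (1 - rank_fraction) ** 4 + 0.01

def profit(fraction: float, steps: int = 1000) -> float:
    """Cumulative profit from contacting the top `fraction` of the ranked list."""
    slice_size = LIST_SIZE / steps
    total = 0.0
    for i in range(int(steps * fraction)):
        f = i / steps
        total += slice_size * (response_rate(f) * REVENUE_PER_RESPONSE - COST_PER_CONTACT)
    return total

# Sweep candidate decision boundaries and find the profit-maximizing one.
fractions = [i / 100 for i in range(1, 101)]
best = max(fractions, key=profit)
print(f"profit-maximizing cutoff: contact top {best:.0%} of the list")
print(f"profit at that cutoff:    ${profit(best):,.0f}")
print(f"profit contacting all:    ${profit(1.0):,.0f}")
```

Under these assumptions the curve peaks somewhere in the first third of the list, while contacting everyone loses money; choosing a point past the peak for reach, as in Siegel's example, is a strategy call the sweep makes visible rather than a modeling error.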
The same pattern applies in fraud detection. Rank transactions by risk, then decide how many to audit or block. Audit too few and you miss fraud. Audit too many and you annoy customers, slow commerce, and flood operations. There is a Goldilocks zone where risk reduction and operational cost find an acceptable balance.
The key lesson for executives is that there is no single “goodness number” for a model. AUC, F-score, and accuracy describe how well the model separates classes, but they say little about which decision boundary is right or how much value you can expect. “Model goodness is never a single number,” one of his slides insisted.
[PULL QUOTE: “The key lesson for executives is that there is no single ‘goodness number’ for a model.”]
The communication gap around metrics
Siegel then turned to the communications gap between data science teams and business stakeholders. On one side, data scientists are trained to speak in technical metrics. They produce confusion matrices, ROC curves, and lift charts. On the other side, executives want to know how much money the organization might make or save, which risks it will reduce, and how the project will move their KPIs.
He showed a slide that captured the gap in a few quotes. Harvard Data Science Review has described standard technical metrics as “fundamentally useless to and disconnected from business stakeholders.” Another practitioner warned, “Don’t show a confusion matrix to executives.” A research paper on effective model monitoring pointed out that technical performance measures “do not account for the business KPIs that business leaders rely on to evaluate model effectiveness.” These comments came from his slides, but they echoed what many leaders attending the summit have experienced.
To illustrate the disconnect, he contrasted two survey responses from data scientists. When asked what metrics matter most to them, they ranked business metrics such as ROI and revenue at the top. When asked what metrics they actually use most often, they named lift and AUC. The intent is there, but the practice has not caught up.
The consequence is predictable. Projects advance through data collection and model training, yet stall at the decision point. Stakeholders lack a clear, quantified view of upside and downside. Without a business case tied to their KPIs, they scrub the launch or quietly redirect resources. From Siegel’s vantage point, this is not simply a communication problem. It is a missing piece of the workflow.
Building a business console for predictive AI
That missing piece is what he and his colleagues at Gooder AI set out to build: a “business console for predictive AI.” Instead of treating valuation as a one-off spreadsheet exercise, they turned it into a first-class, interactive step in model development.
The console allows teams to configure business metrics, such as profit, savings, or any custom KPI that matters. It then evaluates models over a range of decision boundaries and business assumptions. On one screen, a profit curve shows how earnings change as you contact more or fewer customers, audit more or fewer transactions, or approve more or fewer loans. Other views show tradeoffs between competing KPIs.
Crucially, the assumptions that drive these curves are explicit and adjustable. Cost per contact, average revenue per responder, fraud loss per undetected case, investigation cost per alert, and other factors are all expressed as parameters. They can be moved with sliders, and the curves update in real time. That interactivity is not decoration. It acknowledges that business value “hinges on assumptions,” and that different stakeholders may reasonably disagree about those inputs.
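The "assumptions as parameters" idea reduces to a simple pattern: make every business input an explicit argument, so that a stakeholder's what-if question becomes nothing more than a different function call. The sketch below is a generic illustration of that pattern, not Gooder AI's actual API, and all of the numbers are invented.

```python
def campaign_value(
    list_size: int,
    contact_fraction: float,
    response_rate: float,         # expected rate among those contacted
    revenue_per_response: float,
    cost_per_contact: float,
) -> float:
    """Expected profit of contacting the top slice of a ranked list,
    with every business assumption exposed as an explicit parameter."""
    contacted = list_size * contact_fraction
    return contacted * (response_rate * revenue_per_response - cost_per_contact)

# Baseline scenario (all numbers are illustrative assumptions).
base = campaign_value(1_000_000, 0.25, 0.12, 40.0, 2.0)

# "What if our response rate is lower than expected?" -- same function, new input.
pessimistic = campaign_value(1_000_000, 0.25, 0.08, 40.0, 2.0)

print(f"baseline profit:    ${base:,.0f}")
print(f"pessimistic profit: ${pessimistic:,.0f}")
```

A slider in a console is just this kind of parameter bound to a control: moving it reruns the valuation, so disagreements about inputs become visible differences in outcomes rather than arguments about a single opaque number.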
When executives see the curves move in response to their own assumptions, they are no longer being asked to accept a single magic number. They are participating in the valuation. They can ask, “What if investigation costs increase by thirty percent?” or “What happens if our response rate is lower than expected?” The console lets them explore answers before committing to deployment.
Siegel described this as the second thesis behind Gooder AI. The first thesis is that predictive AI projects must estimate business value before deployment. The second is that such valuation requires an interactive experience that reveals how assumptions affect value, rather than hiding them behind a static score.
Using generative AI to explain value
To make the console approachable, his team added a generative AI chatbot on top. Built with Claude, the chatbot acts as a thought partner. It explains profit curves, decision thresholds, and tradeoffs in plain language. It can even reframe concepts as stories for different audiences.
Siegel shared a short example from a video demo. After the console plotted a profit curve for a marketing campaign, the chatbot explained why the curve climbs and then falls. Asked to translate that explanation for a ten-year-old, it told a story about a lemonade stand with a “special machine” that predicts which kids are most likely to buy. At first, the kid running the stand calls out only to those who love lemonade and makes money. Over time, they call out to everyone, including kids who are not thirsty. Eventually they spend more on lemons than they earn.
That small story made a bigger point. Predictive AI is not inaccessible magic. It is a quantitative way to express familiar business dynamics: diminishing returns, marginal cost, and tradeoffs between reach and efficiency. The chatbot does not decide what to do. It helps stakeholders understand what the curves already show.
[PULL QUOTE: “Predictive AI is not inaccessible magic. It is a quantitative way to express familiar business dynamics: diminishing returns, marginal cost, and tradeoffs between reach and efficiency.”]
For executives who must steer non-technical teams through AI initiatives, this blend of interactive valuation and conversational explanation offers a pragmatic path. It does not ask business leaders to learn the math behind AUC. It invites them into a dialogue about risk and reward.
Hybrid AI and leadership
Throughout the keynote, Siegel returned to the idea that predictive AI and generative AI are complementary. Predictive models determine who is likely to do what, while generative models can help design what to say or do in response. In his conference work, he refers to this combined landscape as “hybrid AI.” The value in that hybrid, he suggested, still depends on the same fundamentals: aligning models with business objectives, estimating value before deployment, and treating AI adoption as an operational change, not a lab experiment.
This point resonated with leaders from very different organizations. Executives from regulated industries heard echoes of their own governance concerns. They cannot justify deploying models without a clear business case and an understanding of risk. Leaders from high-growth digital companies recognized the pressure to show uplift and ROI, not just impressive demos. Even those focused on generative assistants and agents saw the parallel. Without metrics that matter to the business, generative initiatives risk the same fate as under-valued predictive projects.
Siegel framed the opportunity as both a technical and cultural shift. Technically, organizations need tools and processes that treat valuation as a first-class step. Culturally, data scientists need to see themselves not just as model builders, but as sellers of operational change. Their job is not finished when the model scores well on a test set. It is finished when operations leaders can say, “I understand where this improves my KPIs, and I am comfortable changing the workflow.”
[PULL QUOTE: “Data scientists need to see themselves not just as model builders, but as sellers of operational change.”]
From silent failure to explicit decisions
In one of the most candid moments, he described the quiet way many failed projects disappear. There is seldom a dramatic cancellation. Instead, attention drifts. Stakeholders do not feel equipped to make a go or no-go decision, so they make no decision at all. Resources shift to other priorities. The model moves from production candidate to archive.
Bringing valuation into the process changes that dynamic. When everyone can see the range of possible outcomes, including best case and worst case under different assumptions, the decision may still be “not yet,” but it will be explicit. More often, he argued, it will be “yes, if,” followed by clear guardrails and monitoring plans. Predictive AI becomes something the business can reason about, rather than something it nervously defers.
As the session wound down, Siegel returned to the title: “AI Business Value Is Not an Oxymoron.” The phrase is a reaction against the idea that AI is inherently hard to quantify, or that business value will somehow emerge after enough experimentation. He insisted that value can be estimated upfront, with all caveats and uncertainties spelled out. That estimation is not a luxury. It is a requirement for deployment at scale.
For the AI Realized community, which spans industries from consumer platforms to healthcare, manufacturing, and financial services, the message was consistent with the event’s broader themes. Leadership in AI is not just about picking technologies. It is about designing processes and conversations that let organizations decide where AI deserves a seat in operations, where it does not, and how to keep those decisions grounded in measurable impact.
[PULL QUOTE: “Leadership in AI is not just about picking technologies. It is about designing processes and conversations that let organizations decide where AI deserves a seat in operations.”]
Key Takeaways & Executive Guidance
Treat predictive AI as a triage engine for operations, and pair every model with an explicit decision boundary that reflects business strategy, not just technical performance.
Require model valuation before deployment, using interactive tools or scenarios that express outcomes in profit, savings, and other KPIs executives already use to run the business.
Recognize that model goodness is never a single number, and that placement of the decision threshold can have more impact on value than incremental gains in accuracy.
Close the communication gap between data science and business teams by translating technical metrics into business metrics and making assumptions visible, adjustable, and discussable.
Use conversational interfaces, such as AI-powered chatbots, to explain profit curves, tradeoffs, and scenarios in plain language for different audiences, from frontline managers to board members.
View predictive and generative AI as parts of a hybrid system, where predictive models decide who to act on and generative systems help decide how, with both governed by the same discipline of pre-deployment valuation.
About Eric Siegel
Eric Siegel, Ph.D., is a former Columbia University professor who helps companies deploy machine learning. He is cofounder and CEO of Gooder AI, the founder of the Machine Learning Week conference series, the instructor of the online course “Machine Learning Leadership and Practice – End-to-End Mastery,” and a frequent keynote speaker. He wrote the bestselling Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, which has been used in courses at hundreds of universities, as well as The AI Playbook: Mastering the Rare Art of Machine Learning Deployment.
Powerful breakdown of the valuation gap in predictive AI. The point about decision boundaries carrying more weight than marginal accuracy gains is crucial for operations teams, yet gets buried under technical metrics. Making KPI tradeoffs interactive rather than static fixes the "silent failure" problem where projects stall because stakeholders can't quantify upside. The profit curve approach should be standard in every deployment workflow.