Dec 15, 2025
Do data moats still matter in the era of AI and foundation models? Learn what makes a data moat real, fragile, or obsolete, and how to build one that lasts.

Imogen Jones
Content Writer
"We have a data moat."
The phrase rolls off tongues in pitch decks and strategy sessions. But what does it actually mean? More importantly: is it real?
A data moat is a strategic asset made of information, not inventory. The premise is elegant. Gather exclusive, high-quality data. Transform it into products, insights, or decisions no one else can match. Watch your competitive barrier compound over time.
In practice, many data moats are weak, porous, or illusory. Companies confuse data possession with data leverage. They collect troves of information but lack the infrastructure or urgency to use it well.
Meanwhile, AI is rewriting the rules entirely. Models that once demanded massive proprietary datasets can now be bootstrapped with synthetic data or built on top of foundation models. Your moat might not be as deep as you think.
This article is for founders and business leaders who want to build enduring value in the age of AI and algorithmic competition.
We’ll cover:
What a data moat actually is (and isn’t)
How it confers real competitive edge
Who can credibly build one
Why some moats are mirages
How to reinforce yours before the next wave of AI commoditizes it
What Is a Data Moat, Really?
Warren Buffett popularized the idea of an economic moat: a durable competitive advantage that protects a company from rivals the same way a moat protects a castle. In traditional industries, that moat might be a brand (Coca-Cola), regulatory licenses (utilities), or supply chain scale (Walmart). As Buffett put it:
“We're trying to find a business with a wide and long-lasting moat around it, surrounding and protecting a terrific economic castle, with an honest lord in charge of the castle … It can be because it's the low-cost producer in some area. It can be because it has a natural franchise or because of its service capabilities, its position in the consumer's mind, or a technological advantage. For any kind of reason at all, it has this moat around it.”
A data moat extends this logic into the digital age. It refers to exclusive data a firm gathers and deploys to improve products, decisions, or models in ways competitors cannot easily replicate. This might consist of proprietary user behavior data, sensor readings, transaction histories, customer insights, or operational data that only your company possesses (or possesses at greater scale and quality than others).
Before you start hoarding user logs or wiring up every API, remember that possessing raw data means little if it’s:
Not proprietary
Not relevant to business outcomes
Not structured for use
Not tied to systems that learn and improve
Without these conditions, you’re not building a moat. You’re just building a storage unit.

Four Reasons Data Moats Matter (When They’re Real)
In theory, data is the new oil. In practice, most companies are sitting on puddles, not pipelines.
Having data is easy. Turning it into a true competitive advantage is hard. What separates the leaders from the rest isn’t just how much data they collect, but how effectively they use it, protect it, and learn from it over time.
The right data can make your offering feel almost magical. Netflix’s recommendation engine, built on years of viewing data from 200 million+ subscribers, predicts what you’ll love with uncanny precision. Spotify uses listening behavior to craft personalized playlists that keep users coming back.
Personalization is both delightful and defensible, provided your signals are truly unique and you can keep the loop tight.
Data moats can improve both what customers see and how companies operate. Rich internal data helps firms make sharper strategic calls. Financial and insurance companies, for instance, rely on decades of transaction and claims data to refine risk models that newcomers can’t easily match.
Exclusive data creates faster learning cycles. A SaaS platform that tracks detailed user behavior can see which features underperform and fix them almost instantly. Each release improves faster than a competitor’s because it’s fueled by its own proprietary feedback loop.
Some organizations even turn their data moat into a business itself, by selling analytics, licensing benchmarks, or opening API access to curated datasets. Think of how major banks monetize decades of market data or how social platforms sell anonymized audience insights. Done well, data becomes not just a competitive advantage, but a whole new revenue stream.
Companies with wide data moats often enjoy outsized market share and profitability, much like firms with classic economic moats do.
Who Can Build a Data Moat?
When we talk about data moats, it’s easy to imagine they belong only to giants like Google, Amazon, and Meta: companies that operate at such scale they seem to generate defensibility by sheer gravity.
But data moats have never been solely the domain of Big Tech. Many organizations have built meaningful advantages from the data they’ve collected through ordinary business activity. Sectors like law and finance are prime examples.
Large insurers, for example, have long guarded vast troves of customer and claims data. As McKinsey notes:
“Take auto insurers, which have massive amounts of data on their customers—including their cars’ model and year, how often and far they drive, where they live, whether they’re married or single, and how many claims they’ve filed. The huge data moat that auto insurers have amassed out of this demographic and behavioral information has long represented a competitive advantage to sustain their business models and thwart potential disrupters.”
— McKinsey
And for a long time, that was precisely the point: an incumbent’s actuarial models, refined over millions of records and decades of underwriting, were not something a new entrant could replicate quickly.
Yet the picture is more complicated than it first appears. Twenty-five years ago, insurers controlled 95% of the customer data they relied on. In contrast, the past five years have seen an explosion of companies generating and selling relevant datasets. Data brokers now aggregate these sources, allowing insurers to buy access to information that once took decades to accumulate.
This doesn’t erase the value of proprietary data, but it does change the calculus. Internal datasets increasingly sit alongside widely available external data that can narrow the gap between incumbents and challengers.
Which brings us to the broader point: a data moat is no longer guaranteed simply by being large or established.
Any organization that gathers unique, hard-to-replicate data can build one: a small SaaS company with a specialized operational dataset, a regional cooperative with hyper-local agronomic insights, a logistics startup tracking supply chain disruptions. These niches can be defensible, even without FAANG-level scale.
But at the same time, moats are more porous than they used to be.

In the next section, we’ll further examine the critiques of the data moat narrative.
The Skeptic’s View – “The Empty Promise of Data Moats”
In the a16z essay “The Empty Promise of Data Moats,” Martin Casado and Peter Lauten warn that treating data as a “magical moat” can be dangerously misleading. Many supposed data moats, they argue, are either illusory or far weaker than founders assume.
The danger is that if you assume your dataset alone protects you, you may neglect the areas that actually sustain long-term advantage. As the authors put it:
“If you’re a startup that assumes the data you’re collecting equals a durable moat, then you might underinvest in the other areas that actually do increase the defensibility of your business long term (verticalization, go-to-market dominance, post-sales account control, the winning brand, etc).”
A central distinction in their analysis is the difference between network effects and what they call data effects. A true network effect means each additional user directly increases the value of the product for everyone else, with Facebook being the classic example.
Data network effects, by contrast, are often described as: more users → more data → better product → more users. But as Casado and Lauten note, “there generally isn’t an inherent network effect that comes from merely having more data.”

In many cases, what looks like a data network effect is simply a scale effect. Yes, more data can improve a model, but the resulting defensibility may be overstated.
This is because data is “rarely a strong enough moat” on its own. Unlike classic economies of scale, where being larger reduces marginal cost, data scale often brings diminishing returns. Early data points add significant signal; later ones add progressively less, and sometimes little more than noise. Meanwhile, the cost of acquiring truly novel, non-commoditized data increases over time.
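A toy calculation makes the diminishing-returns point concrete. The sketch below assumes, purely for illustration, that model error falls as a power law in dataset size, error(n) = A·n^(-B), a shape often reported in learning-curve studies; the constants A and B are made up rather than fitted to any real data.

```python
# Toy illustration of diminishing returns from more data.
# ASSUMPTION: error follows a power law error(n) = A * n**(-B).
# A and B are hypothetical constants, not fitted to any real dataset.
A = 1.0
B = 0.3


def error(n: int) -> float:
    """Hypothetical model error after training on n examples."""
    return A * n ** (-B)


def gains_per_doubling(start: int, doublings: int) -> list[float]:
    """Absolute error reduction each time the dataset size doubles."""
    gains = []
    n = start
    for _ in range(doublings):
        gains.append(error(n) - error(2 * n))
        n *= 2
    return gains


if __name__ == "__main__":
    for i, gain in enumerate(gains_per_doubling(1_000, 6), start=1):
        print(f"doubling {i}: error drops by {gain:.4f}")
```

Under this assumption every doubling costs as much new data as everything collected so far, yet each one shrinks the error by about 19% less than the doubling before it (the ratio between successive gains is 2^-B).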
This leads to a broader critique: having data is not the same as having defensibility. What matters is how the data is used, whether it creates switching costs, and whether it ties customers more tightly to the product.
It’s worth noting that Casado and Lauten don’t claim data is useless. Rather, they encourage startups to be thoughtful. As they write:
None of this is to suggest data is pointless! But it does need more thoughtful consideration than leaping from “we have lots of data” to “therefore we have long-term defensibility”. Because data moats clearly don’t last (or automatically happen) through data collection alone, carefully thinking about the strategies that map onto the data journey can help you compete with — and more intentionally and proactively keep up with — a data advantage.
This is a nuanced view: data can be part of a moat, but rarely the entire moat. And as we’ll see next, the rapid progress of AI is heightening the importance of data while also democratizing some capabilities that data moats used to protect.
The a16z essay was published in 2019. Since then, the rapid adoption of AI across industries has changed the way businesses think about data and data moats. We explore this further below.
AI and Data Moats: A Double-Edged Sword
Artificial intelligence has a special relationship with data moats. On one hand, AI technologies, especially machine learning models, thrive on data. The more high-quality data you have, the better your AI models can become. This suggests that in the age of AI, data moats are even more valuable, because they feed into AI capabilities that drive competitive advantage.
On the other hand, recent advances in AI are also democratizing capabilities that once required huge proprietary datasets, thereby potentially undermining certain data moats.
Let’s unpack this paradox.
AI as a Force Multiplier for Data Moats
If your company has rich data, AI is the tool that can make that moat truly formidable. Modern AI algorithms have an uncanny ability to extract patterns and make predictions from vast datasets. Thus, a company with proprietary data can train AI models that outperform off-the-shelf solutions on its own problems, simply because no one else has that exact data to train on.
The advantage compounds with the rise of agentic AI. Unlike traditional automation, which relies on brittle if-then logic, AI agents can reason over context, adapt to new situations, and apply judgment across workflows. This shifts AI from task execution to decision support.
This is how a data moat becomes a process moat. By embedding AI directly into workflows and analysis, proprietary data becomes an operating advantage.
For example, with V7 Go, teams can query their entire virtual data room using Index Knowledge, a proprietary technology that transforms large, information-dense documents into granular, searchable indexes. Instead of skimming files or relying on keyword search, AI agents can reason across the full body of internal knowledge.
As Alberto Rizzoli, Co-founder and CEO of V7, puts it:
"We set ourselves two goals for Knowledge Hubs. First, if the answer exists anywhere in your internal documents, our AI should find it. It shouldn't just give you its best guess, even if it takes a little longer. Second, our AI Agents must be able to query this knowledge to solve complex tasks with the same up-to-date information as your expert colleagues. We are moving from simple search to true, actionable understanding."
You can learn more about AI agents here: How to Create an AI Agent Without Code: A Practical Guide
AI, Destroyer of Moats
Now for the flip side: AI is also eroding some traditional data moats, largely through the rise of foundation models and techniques like synthetic data. Only a few years ago, one might have thought, “We have a unique dataset, so we can train an AI that no one else can.” But then came models like GPT-5 trained on internet-scale data.
Suddenly, an AI system existed that had ingested trillions of words of text, a significant fraction of the publicly available text on the web. This kind of model can perform tasks that previously required specialized datasets.
As one 2025 analysis bluntly put it:
“When GPT-4 can reason about complex problems using training data from across the internet, your proprietary customer dataset stops being a castle wall and starts looking like a speed bump.”
In other words, general models trained on almost everything can sometimes approximate specialized tasks well enough that the advantage of a proprietary dataset diminishes.
The consequence of AI democratization is that a data moat strategy centered on “more, more, more data” can backfire. Modern AI techniques can reach high accuracy with far less task-specific data, which shifts the emphasis from sheer quantity to data quality and relevance.
So, is AI killing the concept of the data moat? Not entirely, but it’s changing the rules. A proprietary dataset is still very valuable when it powers something unique that broad models can’t do as well.
For instance, an AI model fine-tuned on your specific domain data can outperform a generic model on that domain, and that’s a moat. However, if the domain isn’t unique or if the generic model is close enough, your moat might not hold. The new sustainable advantages lie in how fast you learn and adapt (AI execution velocity), how well you combine AI with human expertise, and how deeply AI is woven into your unique context.
To illustrate, recall our discussion of AI agents and process integration: A competitor might copy your AI tool, but if you've spent the year moving faster, shipping more, and acting on more powerful analytics, the competitor faces an uphill battle to equal that.
Strategies to Build (and Fortify) a Data Moat
Given both the promise and limitations of data moats, how can companies build one that actually holds? The following strategies offer a practical path forward.
The foundation of a data moat is often your first-party data, which means information you directly gather from customers’ interactions with your products or services. Begin by instrumenting your operations to capture valuable data at every touchpoint (while respecting privacy and consent).
For a startup, this means building data collection into your product from day one. Customer usage logs, transaction records, user-generated content, support queries, IoT sensor readings; whatever is relevant to your business, ensure you’re capturing it in a structured way. Modern data architecture can centralize this flood of raw data.
Early on, volumes may be small, but accumulating historical data yields compounding returns later.
A moat made of water that anyone can boat across isn’t much of a moat. Likewise, if the data you base your advantage on can be readily acquired or recreated by competitors, your moat may be shallow.
For example, consider publicly accessible data. If your analytics rely mainly on scraping public websites, there’s little stopping a competitor from doing the same. Or if the key data is actually owned by users (and not exclusive to you), competitors might entice users to share it with them too.
The moat can dry up if everyone has access to lakes of data via brokers or open platforms.
Practical Tip: Continually audit how proprietary your data really is. If it’s becoming commoditized, focus on combining datasets or extracting insights in ways others cannot. Or shift to differentiate by capability (models, speed, service) rather than sheer data.
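One rough way to run that audit is to measure how much of your data a competitor could obtain elsewhere. The sketch below is a hypothetical illustration: it assumes your records can be represented by comparable keys (for example, hashed entity identifiers) and that you hold samples of the relevant public or broker datasets. The function name and the metric itself are illustrative, not an industry standard.

```python
def exclusivity_ratio(own_keys: set[str], *external_sources: set[str]) -> float:
    """Share of your records (by key) that do NOT appear in any external source.

    Hypothetical audit metric: 1.0 means nothing overlaps with the public or
    purchasable data you checked; 0.0 means all of it can be obtained elsewhere.
    """
    if not own_keys:
        return 0.0
    # Union of everything a competitor could scrape or buy.
    available_elsewhere: set[str] = set()
    for source in external_sources:
        available_elsewhere |= source
    overlap = own_keys & available_elsewhere
    return 1 - len(overlap) / len(own_keys)


if __name__ == "__main__":
    ours = {"acct-1", "acct-2", "acct-3", "acct-4"}
    public_scrape = {"acct-1"}             # made-up example data
    broker_feed = {"acct-2", "acct-99"}    # made-up example data
    print(exclusivity_ratio(ours, public_scrape, broker_feed))  # 0.5
```

Recomputing a number like this each quarter shows whether the moat is drying up as brokers and open platforms catch up with what you hold.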
Raw data doesn’t create a moat. Insight does.
Invest in analytics, machine learning, and automation to extract value. Over time, the algorithms trained on your first-party data become part of your moat. Even if competitors access similar models, they won’t replicate performance without your unique signals.
Your goal is to ensure that each cycle of product use feeds back into better models, better experiences, and better performance, making it progressively harder for competitors to catch up.
Practical Tip: Treat each production model as a product: track version history, performance over time, and which datasets drove improvements. This makes it clear where your data is genuinely improving outcomes.
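As a minimal sketch of that bookkeeping, assume each release records the identifiers of the datasets it was trained on and a single validation metric. The class and field names here are hypothetical, not any particular model registry’s API; diffing consecutive versions then shows which data additions coincided with a change in the metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    """Hypothetical registry entry for one production model release."""
    version: str
    datasets: frozenset[str]  # identifiers of the datasets used in training
    metric: float             # validation metric, higher is better


def improvement_log(history: list[ModelVersion]) -> list[tuple[str, float, set[str]]]:
    """For each release after the first: (version, metric change, newly added datasets)."""
    log = []
    for prev, curr in zip(history, history[1:]):
        delta = round(curr.metric - prev.metric, 4)
        added = set(curr.datasets - prev.datasets)
        log.append((curr.version, delta, added))
    return log


if __name__ == "__main__":
    # Made-up release history for illustration.
    history = [
        ModelVersion("v1", frozenset({"usage_logs"}), 0.81),
        ModelVersion("v2", frozenset({"usage_logs", "support_tickets"}), 0.86),
        ModelVersion("v3", frozenset({"usage_logs", "support_tickets", "scraped_reviews"}), 0.86),
    ]
    for version, delta, added in improvement_log(history):
        print(version, f"{delta:+.3f}", sorted(added))
```

In this invented history, adding support tickets lifted the metric while the scraped reviews added nothing. A real attribution would need controlled comparisons, since code and hyperparameters usually change between releases too.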
To truly cement a data moat, find ways to integrate data into your core value proposition, and even create new offerings.
That might mean:
Giving customers benchmarking reports based on aggregated usage data
Offering dashboards that visualize performance, risk, or efficiency
Packaging premium insights or alerts as a paid add-on
In these cases, your data is no longer just an internal asset; it’s a visible feature that deepens customer reliance on your platform.
Many companies go further: anonymized, aggregated data (or the insights derived from it) can become a new revenue stream. Fintech and market intelligence firms, for example, often sell API access to their datasets or publish industry reports distilled from their unique data. This not only generates revenue but positions them as a data hub in their space.
However, there’s a trade-off: if you monetize data too directly, you might inadvertently give away parts of your moat. A competitor can buy your reports, learn your metrics, and close some of the gap.
Practical Tip: Bias toward monetizing insight, scoring, and recommendations rather than raw data exports. Think: “risk score” or “benchmark percentile” instead of “here’s the full dataset.” This lets you monetize value without fully exposing the crown jewels.
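To illustrate the difference, here is a hypothetical benchmark-percentile calculation: the customer receives a single rank against aggregated peers, while the peer values themselves never leave your systems. The mid-rank treatment of ties and the minimum peer count (a crude guard against exposing individual peers when the group is small) are illustrative choices, not a standard.

```python
from bisect import bisect_left, bisect_right


def benchmark_percentile(
    customer_value: float, peer_values: list[float], min_peers: int = 5
) -> float:
    """Percentile rank (0-100) of customer_value among peer_values.

    Ties count half above and half below (mid-rank). Only the returned number
    would be shared with the customer; peer_values stay internal.
    """
    if len(peer_values) < min_peers:
        # Hypothetical privacy guard: tiny peer groups could expose individuals.
        raise ValueError(f"need at least {min_peers} peers, got {len(peer_values)}")
    ordered = sorted(peer_values)
    below = bisect_left(ordered, customer_value)
    ties = bisect_right(ordered, customer_value) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


if __name__ == "__main__":
    peers = [12.0, 18.0, 25.0, 31.0, 40.0, 52.0]  # made-up aggregated peer metric
    print(benchmark_percentile(31.0, peers))
```

The customer learns where they stand, but a competitor buying the same report learns one number per customer rather than your underlying distribution.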
A moat is only good until it’s breached. In the context of data, defending your moat means guarding against data leaks, imitation, and loss of trust.
Security and compliance are paramount. A major data breach can not only erode customer trust but potentially hand your hard-won data assets to the public or hackers.
Platforms like V7 Go are built with stringent security and privacy certifications to ensure that the data flowing through AI workflows remains protected.

When everyone from marketing to product to ops can make data-informed decisions, your moat deepens (because you’re squeezing insight and advantage from data that others might leave idle).
Executive support is key: leaders should champion data-driven experimentation and fund the necessary data infrastructure. The payoff is an organization that not only has unique data, but also unique know-how in exploiting data. That knowledge and process can become a “process moat” in itself. Competitors might acquire similar raw data, but if they lack the internal capability or culture to use it effectively, they won’t catch up.
Practical Tip: Run a recurring, cross-functional “data wins” review, where teams share a decision or improvement driven by data. This normalizes data-driven behavior and uncovers hidden opportunities for compounding advantage.
Not all data is created equal. A trap is to focus on amassing lots of data while neglecting the signal-to-noise ratio and freshness of that data. A huge dataset that is noisy, outdated, or irrelevant can even be a liability – it can confuse your models and analysts. As discussed earlier, more data has diminishing returns if much of it is repetitive or low-quality.
Ironically, a company can become so protective of its data moat that it fails to adapt to new realities, essentially trapped behind its own castle walls. We see this when incumbents cling to their traditional data and resist using external data or partnerships out of fear of diluting their moat.
For instance, some banks now share anonymized fraud data via consortiums. Yes, it “erodes” their individual moat, but collectively they stay ahead of fraudsters, which benefits each bank.
Practical Tip: It’s better to have the right data (even if less in quantity) than to have Big Data you can’t fully utilize. Focus on filling data gaps that improve predictive power, and prune data that no longer adds value. You might find you hit an accuracy plateau where further data doesn’t help; that’s a signal to shift effort to other areas (features, algorithms, etc.). Keep scanning the horizon for new data sources, and be willing to collaborate or buy data when it makes sense.
Companies are increasingly discovering that the moat isn’t the data itself; it’s the ability to put that data to work.
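A simple way to spot that plateau is to retrain on increasing slices of your data and check when the next slice stops paying for itself. The sketch below assumes you already have (dataset size, validation accuracy) pairs from such runs; the 0.005 threshold is an arbitrary placeholder you would tune to your own metric’s noise level.

```python
from typing import Optional


def find_plateau(
    curve: list[tuple[int, float]], min_gain: float = 0.005
) -> Optional[int]:
    """Return the dataset size after which the next increment improved accuracy
    by less than min_gain, or None if every increment still cleared the bar.

    curve: (number_of_training_examples, validation_accuracy) pairs.
    """
    points = sorted(curve)
    for (n_prev, acc_prev), (_, acc_curr) in zip(points, points[1:]):
        if acc_curr - acc_prev < min_gain:
            return n_prev
    return None


if __name__ == "__main__":
    # Made-up learning curve from retraining on growing slices of data.
    runs = [(1_000, 0.70), (2_000, 0.78), (4_000, 0.82), (8_000, 0.823), (16_000, 0.824)]
    print(find_plateau(runs))  # 4000
```

Because a single evaluation can be noisy, in practice you would average several runs per dataset size before trusting the plateau.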
Platforms like V7 Go exemplify this shift. Rather than simply storing and analyzing data, these systems are built around a different premise: that competitive advantage comes from the velocity at which you can turn raw data into production automation systems, and the depth of insight you can extract from smaller, higher-quality datasets.
When one company can:
extract deeper insights from smaller, more relevant datasets,
automate time-consuming tasks using its own processes and domain context, and
continuously refine those systems through feedback,
… the advantage compounds in a way raw data ownership never could.
The competitive advantage emerges because this data-driven improvement happens automatically and continuously. While competitors might eventually access similar tools, they can't replicate the months or years of interaction data that has trained your agents to understand your specific customers, products, and business context. Your agents become increasingly effective at handling your unique challenges in ways that generic systems simply cannot match.
The strategic question for leaders is no longer “Do we have enough data?” but “Can we operationalize our data advantage before someone else operationalizes theirs?”
Put Your Data to Work
If there’s one theme running through the debate on data moats, it’s this: data has value only when you can turn it into action.
Not when it sits in a warehouse, not when it accumulates by accident, but when it fuels smarter decisions, faster operations, and automated processes that compound over time.
AI has made this both easier and more urgent. Foundation models have leveled the playing field in many areas, eroding moats based on raw data volume alone. At the same time, companies that operationalize their data, feeding it into AI systems that learn continuously, automate workflows, and adapt to their unique business logic, can build an advantage that is far harder to copy.
Rather than leaving your data idle or relying on brittle, one-off models, V7 Go turns your documents, records, and operational knowledge into the foundation for accelerated, agentic workflows.
If you’re ready to put your data to work rather than just store it, book a demo of V7 Go and see how quickly you can turn your data into living, compounding automation.