How Finance Firms Run LLMs at Scale

Watch Video

A new framework for building AI agents.

Ideal for

Private markets firms

Your AI remembers chats. It still forgets the business.

Most enterprise AI failures are not model failures. They are architecture failures.

15 min read

—

Jul 8, 2026

Most enterprise AI fails not because of the model, but because there is no persistent layer for the system to remember with. We describe what an enterprise memory layer needs to hold, where thin architecture gets exposed fastest, and why firms that build it early compound their advantage rather than reset it.

Mind map showing connections between SharePoint, Partners, Funds, Drive, Assets, Files

Author

Man with dark hair and beard, wearing a grey shirt.

Casimir Rajnerowicz

Content Creator

Summarize

01 / The problem

Most AI projects in enterprise eventually hit the same wall.

The model is capable. The retrieval works. The demo is impressive. But when the question stops being "what does this document say?" and starts being "what does this fact mean, given everything else we know?", the system stalls.

The problem is not the model.

A model reasons over what you put in front of it. If what you give it is a handful of retrieved chunks from a document store, it will reason over those chunks. It cannot reconstruct the operating memory of a firm from fragments it has never seen connected. It cannot infer naming conventions, source hierarchies, or the decision someone made three quarters ago, unless those things live somewhere it can actually reach.

Most AI products still behave like a brilliant analyst who has joined the meeting cold. You can brief them, hand them files, ask for something useful. But the burden of context never leaves your side of the table. That is why so much of the current AI cycle feels both impressive and disappointing: the demo works, the real process still needs someone to shepherd the system through the same context every single time.

Remembering a conversation is not the same as remembering a business. A business memory is not a chat log, a project folder, or a vector index. It is a structured model of the entities, relationships, metrics, decisions, and source evidence a company runs on. Without it, the model may remember what you said. It still does not know how your firm works.

That gap between reading and remembering is going to matter more than which model you picked.

02 / The short answer

What is missing is a persistent, structured layer.

Your AI keeps forgetting the business because there is nothing for it to remember with.

The model is capable. The retrieval works. What is missing is a persistent, structured layer that holds your firm's entities, relationships, metrics, and source evidence: the foundation that turns retrieved fragments into usable knowledge.

V7 Go builds that layer. It reads the documents firms actually run on, resolves entities across inconsistent naming, maps the relationships between them, and traces every figure back to its source. The result is a knowledge graph your agents can query rather than a document store they have to reassemble on every question.

If that is the answer you needed, book a demo and bring whichever document gives your team the most friction today.

If you want to understand why the architecture matters, the rest of this article works through it: why a better model will not close this gap, what retrieval alone cannot do, and what a real memory layer needs to hold.

03 / The model

A better model will not save you.

The last two years made one thing plain: a smarter model does not hand you a smarter business system.

A model reasons over what you put in front of it. It cannot reconstruct the operating memory of a firm from a handful of retrieved chunks. It cannot infer naming conventions, reporting habits, source hierarchy, or the decision someone made three quarters ago, unless those things live somewhere it can actually use.

Walk into most enterprise AI projects and you will find the same sequence. Pick a model, bolt on a chat window, wire up an agent framework, connect the integrations. Then the answers come back inconsistent, and the patching starts: longer prompts, better retrieval, a bigger context window, more careful instructions. None of this is wrong. It just treats the symptom. The system keeps forgetting because nobody built it anything durable to remember with.

Consider a concrete scenario. A firm has three documents referencing the same investment: a quarterly fund report, a board pack, and an email thread. The fund report uses the full legal name. The board pack uses a shortened version. The email just calls it "the Northwind position." A model reading all three in isolation does not know they refer to the same entity. It treats them as three separate data points. The discrepancies between them look like conflicting information rather than three views of the same fact.

That is not a model quality problem. It is an architecture problem. And a better model does not fix an architecture problem.

04 / Retrieval

Search finds fragments. The firm runs on connections.

Retrieval-augmented generation was a necessary step. It gave models a way to reach external information and made document-heavy workflows genuinely useful. But finding relevant text and maintaining a working model of the business are different jobs. Only the first one is solved.

A vector database is good at answering "what does this document say?" It struggles with "how does this fact relate to everything else we know?", which is where the actual work lives.

Private markets shows this clearly. A firm sits on thousands of quarterly reports, LPAs, capital call notices, data room files, IC memos, board packs, CRM notes, and email threads. The data is not hidden. Anyone can open a document and read it. The problem is that the documents do not know about each other.

A valuation lands in a fund report. The methodology change that explains it sits in a board pack from six months earlier. A later memo revises the investment view. A capital call notice references the legal structure. An email from three months back explains the reason for all of it. Search can surface those fragments one at a time. What the firm needs is the layer that knows they belong together and can flag when they conflict.

That layer is not retrieval. Retrieval finds facts. The memory layer holds the relationships between them.

For years, the hard problem in document AI was extraction. Could software reliably pull structured data out of messy PDFs, scanned reports, footnotes, and reporting packs that no two fund managers format the same way? That problem still matters. But it is no longer the frontier.

The frontier is what happens after the value comes out.

A pile of extracted numbers sitting in isolated rows is not memory. It is a faster way to recreate the spreadsheet problem the firm already had.

The system has to resolve which entities are the same entity. It has to map how they relate. It has to normalise the metrics while preserving the raw source labels, because the underlying number and the label a manager chose to report are both meaningful, and silently overwriting one with the other loses information. It has to anchor every figure to a time period, because a metric without a period is not a fact, it is a liability. And it has to keep a trail back to the document it came from, because high-stakes work cannot run on "the model said so," especially when real business data is full of exceptions that need override logic rather than a silent overwrite.

Without that work, the firm ends up with accurate fragments instead of usable memory. Most tools make the document easier to process and leave the business exactly as hard to understand as before.

05 / Architecture

What a real memory layer needs.

A useful memory layer is not a folder, a transcript, or a pile of embeddings. It needs structure, and that structure has a specific shape.

It holds the entities a firm actually deals in: funds, assets, portfolio companies, vehicles, lenders, GPs, LPs, advisors, and the people who appear across all of them, often under different names, in different documents, at different moments in time. Entity resolution is one of the hardest parts of building a memory layer correctly. The same fund shows up as "Northwind Capital Group" in one report, "Northwind Capital" in the next, and just "Northwind" in a footnote. Without resolution, these are three separate data points. With it, they are three observations of the same entity. The system can query across all three, detect discrepancies, and flag conflicts for a human reviewer.

It holds relationships: which GP manages which fund, which fund holds which asset, which LP committed to which vehicle, which person appears in four places under three spellings. Relationships are what turn a pile of entities into a model of the business.

It holds metrics in normalised form — NAV, TVPI, DPI, IRR, cap rate, NOI, LTV, DSCR, EBITDA, yield on cost — without losing the messy labels the source documents used. Both matter. The normalised version lets you query across the corpus. The original label tells you what the manager chose to report, which is itself a signal about how they want performance understood.

It holds time. A TVPI reported for Q4 2024 is a different fact from the same TVPI reported for Q1 2025, even if the number is identical. Mixing periods without anchoring is how firms end up with incorrect performance histories and auditors who cannot reconcile the records.

And it holds provenance. Every figure, every entity, every relationship should trace back to the document it came from, ideally to the page and paragraph. Because when an analyst queries the system and gets an answer, the next question they always ask is the same: how do you know?

In a compliance-sensitive environment, that question is not rhetorical. A limited partner reviewing a valuation, an auditor tracing a metric, or a compliance officer verifying a disclosure needs to follow the chain from the answer back to the source document. SEC-registered investment advisers are required to maintain records that support any performance figure communicated to clients or prospects. An AI system that cannot trace its outputs back to source documents does not meet that standard, regardless of how accurate the outputs happen to be. The right answer names the document, the page, and highlights the exact paragraph so the reviewer can read it themselves. Without that trail, the system produces outputs that look like knowledge but cannot be audited. That is a worse position than a spreadsheet that makes its cell references visible.

06 / Finance

Finance breaks weak AI first.

Finance is where thin architecture gets exposed fast. The work is document-heavy, the terminology is inconsistent across managers and vintage years, the stakes are high, and the answer almost never sits in one place.

Take fund screening in the secondaries market. A buyer wants to know whether a GP's stated mark is credible, not merely whether the number appears in the report, but whether it holds up against comparable assets, comparable vintages, comparable strategies, and the firm's own prior marks across the whole corpus. That analysis depends on joining observations that live in dozens of different quarterly reports, none of which were formatted to talk to each other. Data platforms like Preqin can supply market-level benchmarks. They cannot tell you whether a specific GP's mark is consistent with its own prior methodology.

That analysis is expensive precisely because the information is scattered. Each report carries one small piece of the answer. The value is in the assembly.

The same logic applies in LP reporting. The Institutional Limited Partners Association has published standardised reporting templates for over a decade. Adoption across GPs remains inconsistent, which is precisely why the problem persists. A hundred quarterly updates from a hundred GPs use different definitions of the same metric. IRR calculated net of fees, gross of fees, since-inception, or since last quarter looks identical in a report unless you know which convention each manager applies. A memory layer that holds those conventions, anchors them to the source, and surfaces them alongside the numbers does not just make reporting faster. It changes which questions the firm can answer: not just what a GP reported, but how they reported it compared with every other GP in the portfolio.

A document tool extracts the mark. A memory layer shows how that mark sits against every other relevant observation the firm has ever made, across every fund, every period, every manager in the portfolio. The first saves an afternoon. The second changes which questions are worth asking at all, because the cost of asking drops from three hours of manual cross-referencing to a query.

The same dynamic appears in data room review, deal-level tracking, and compliance monitoring. The common thread: the answer is never in one document. The value is in knowing which documents belong together and what they mean when read alongside each other. That is a memory problem, not a model problem. Better retrieval helps at the margins. Better memory changes the kind of analysis the firm can do at all.

07 / Compounding

The systems that get more useful every time the business runs.

The strongest AI systems will not be the ones that answer a single question well. They will be the ones that get more useful every time the business runs.

Every document processed should enrich the next workflow. Every entity resolved should shrink tomorrow's ambiguity. Every relationship mapped should sharpen the next piece of reasoning. The hundredth quarterly report should not be just another extraction job. It should make the first ninety-nine more valuable — adding observations, thickening the graph, exposing the discrepancies nobody had time to chase when the reports came in one at a time.

Most tools stop short of this.

They process a document, answer the question, and reset to zero. The next document starts fresh. There is no accumulation. The analyst reviewing the hundred-and-first report has exactly the same context as the analyst who reviewed the first one. Nothing compounds.

Infrastructure does not reset. It compounds. This is the difference between a tool and a system: a tool performs a task; a system builds state. The memory layer is what turns the first into the second — and why the firms that invest in it early will have a structural advantage that widens over time rather than resetting with every model update or vendor switch.

A lot of AI strategy still treats the chat window as the product.

It is not.

The interface is a thin layer over the real system. The valuable part is everything underneath: what the agent can reach, what it can trust, what it can update, and what it carries into the next task. A better interface on top of weak infrastructure is a better experience of the same wrong answer.

The models will keep improving. Claude, GPT, Gemini, and whatever follows them will get better at reasoning — and that makes the memory layer more important, not less. A model that is only summarising a document can tolerate weak memory. A model generating an investment memo, monitoring portfolio risk, reviewing a data room, or reconciling fund reports cannot.

The model does the reasoning. The memory tells it which world it is reasoning inside.

08 / Platform

What V7 builds.

V7 is not trying to be the foundation model, and not trying to win the generic chat interface. The bet is on the layer between a firm's unstructured data and the agents that act on it.

For private markets, that means ingesting the messy material firms actually run on: PDF bundles, fund reports, raw email threads, data rooms, SharePoint directories, capital call notices, board packs, memos, and internal records. V7 Go reads all of it, across formats, periods, and inconsistent naming conventions, and turns it into something durable.

The agents extract information, resolve fragmented entities, map relationships, normalise metrics, flag discrepancies, and trace every answer back to its source in the original document. Not "the model said so." The exact page, the exact paragraph, with interactive highlighting that a compliance team or LP can follow back to source.

What comes out is not a cleaner table. It is a living memory for the firm: a knowledge graph that analysts can query, agents can use, and workflows can build on. Every new document that enters the system enriches the existing graph. The hundredth quarterly report does not just get processed. It makes everything that came before it more legible. The system compounds rather than resets.

This is exactly the kind of infrastructure V7 Go was built for: firms where the answer is never in one place, the sources never agree on naming conventions, and the cost of getting it wrong is measured in mispriced positions and LP relationships that erode slowly before anyone notices.

Most AI roadmaps open with the wrong question: which model should we use? It matters. But it is not the strategic one. The strategic question is where the memory lives. Prompts are too small to hold it. Raw documents cannot reason over themselves. Search returns fragments and leaves the relationships between them on the floor. None of those is a memory layer.

Firms that get this right do not just answer today's questions faster. They start asking better ones: what changed, what conflicts, what repeats across the portfolio, what the firm knows now that it did not last quarter. Asking stops being expensive. That is the point where AI turns into infrastructure rather than interface.

V7 Go processes a messy fund report and does something specific: resolves the entities, maps the relationships, grounds every number back to its source, and flags where two sources disagree. The second run is faster than the first. The tenth run surfaces connections the first nine missed.

If you want to see what that looks like on your own documents, book a 30-minute demo. Bring a quarterly report, a board pack, or a capital call notice, whichever gives your team the most friction today. The demo runs on your material, not a curated example.

The infrastructure bet

Memory is the product.

The next phase of enterprise AI will not be won by whoever has the longest context window. It will be won by whoever builds the best memory.

Request a demo

30 minutes