
Why the Most Defensible Voice AI Will Be Vertical

In early 2026, ElevenLabs raised a $500M Series C at an $11B valuation, reporting $330M ARR and billions of voice interactions across 70 languages. Conversational latency of under 300ms (the threshold at which conversation feels natural) is now achievable at the API layer. Mistral, meanwhile, launched its first text-to-speech model, signalling that even foundation model labs see voice as a critical frontier. And according to Sifted, European voice AI startups alone have raised over €500M in 2026, a pace that will outstrip last year’s total within months.

It would be easy to think that voice AI is now a solved problem, or at least, a race that can no longer be won by new entrants. But walk into a hospital ward, an insurance operations centre, or a financial compliance floor, and the reality looks very different. Phones still ring. Claims adjusters still type into decades-old systems. Field workers still repeat themselves into headsets that capture only half of what they say.

It is in these highly regulated, complex industries that voice automation will have the highest impact. They are also the ones where general models fall short, because “mostly correct” in these contexts is still wrong. A single incorrect digit on a claim form or a misheard name during KYC can derail an entire workflow. So as the battle shifts from building models to winning customers, the question is no longer who has the best model, but who can observe and understand complex vertical workflows well enough to own them. And that race is still wide open.

The demand for depth over breadth

Earlier this year, my colleague Dylan wrote about why application companies still win when code is free. His argument was that enterprise buyers don’t just purchase lines of code, they purchase eight constants: reliability, repeatability, security, brand accountability, service, business case, roadmap, and informed opinions on how a workflow should actually run.

The last one is the most significant, because the long tail of enterprise employees doesn’t want to figure out what can be automated. They want to buy something that does a job, built by someone who has already pressure-tested the solution across hundreds of deployments.

The same is true for voice AI in regulated industries, with one additional layer of complexity: deep domain expertise and intimate knowledge of tedious processes are prerequisites, not just nice-to-haves. 

Insurance interactions, for example, often take the form of a narrative: an accident, a workplace injury, a disputed claim. Insurers must guide their customers to give the right information at each stage of the process before the claim can advance: a claimant cannot reach triage without first providing the incident details, a KYC check cannot proceed without identity verification. The voice AI must know not just what to ask, but what constitutes a complete answer at each stage, and what to do when it is not getting one. Once correctly captured, the narrative must be converted into structured data and trigger a chain of downstream actions: triage, documentation, policy lookup, third-party coordination, etc.

Buyers notice that level of knowledge, and the informed opinion it produces, within the first few seconds of a demo. It’s the difference between a product that simply transcribes an intake call and one that knows how that call should run: what questions matter, in what order, and what a compliant outcome looks like. It is the difference between a product that feels like a tool for any job and one that feels purpose-built for this job. And buyers are willing to pay a premium for it.

Vertical solutions achieve this by processing thousands of domain-specific calls, mapping edge cases, and encoding the correct workflow logic—every deployment strengthens the skill. And the importance of depth over breadth is felt most at the edge cases, where accents, technical terminology, and emotionally charged conversations stress-test models hardest.

The integration layer is a moat, not a tax

Ask any team that has deployed voice AI in a regulated enterprise what took the longest, and they will not say the model, they will say the integration. The legacy infrastructure used in these industries was built to last, not to integrate, so connecting a voice AI system to a legacy claims platform, an EHR, a telephony stack built in 2009, and a compliance logging layer is more akin to an archaeology project than a software problem.

The average enterprise deployment in insurance or healthcare consumes six to eighteen months of engineering time before the first live call. That time is spent not on model performance, but on APIs never designed for real-time calls, data formats that vary by carrier, and security reviews that treat every external system as a threat.

The obvious question is: why integrate at all? In an era of capable AI coding tools, why not rebuild the legacy infrastructure from scratch? The answer is that it’s hard. Legacy systems in regulated industries encode decades of edge cases, compliance logic, and institutional knowledge, but that detail is hard to surface because it is often more evident in user behaviour than in documentation. 

Before you can replace a system, you need to observe it to truly understand it. Voice AI is the best tool ever built for that observation: every call captures how work actually happens, rather than how the process diagram says it should. Over time, the data corpus becomes the specification for the replacement system.

This is the compounding advantage vertical platforms are building. The integration layer becomes a data-collection mechanism which can then, in the long term, be used to replace the system of record entirely. A vertical platform that has done the integration work has built the connectors, mapped the schemas, and encoded the compliance requirements. More importantly, it arrives with proprietary, domain-specific training data that a horizontal provider cannot easily replicate. As call volume increases, the model improves in terms of the exact workflows, terminology, and exceptions specific to that industry, making the eventual system replacement both possible and defensible.

Compliance becomes a competitive advantage

The knock-on effect of vertical voice AI developing such depth of knowledge around systems, workflows, and terminology is a noticeable shift in how regulated enterprise buyers view governance. A few years ago, compliance teams reflexively blocked AI deployments; now they see well-designed voice AI as improving their risk posture. It gives them something they have never had before: 100% interaction coverage.

A human call centre can QA maybe 2–3% of calls. A voice AI platform logs, transcribes, scores, and flags every single interaction automatically. Disclosure gaps, risk phrases, regulatory triggers: all caught at scale, not sampled. The audit trail is complete by default.
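As an illustration of what 100% coverage looks like mechanically, here is a minimal rule-based sketch in Python. The disclosure phrase and risk patterns are invented for the example; a real platform would encode regulator-specific requirements and likely use a model alongside, or instead of, simple pattern rules.

```python
import re

# Hypothetical compliance rules: each maps a check over the transcript to a
# flag name. These patterns are illustrative only, not real regulatory logic.
RULES = {
    "missing_recording_disclosure": lambda t: "this call is recorded" not in t.lower(),
    "guarantee_language": lambda t: re.search(r"\bguarantee(d)?\b", t, re.I) is not None,
}

def score_call(transcript: str) -> list[str]:
    """Run every rule over a transcript. Applied to every call, this gives
    100% coverage rather than a 2-3% QA sample."""
    return [name for name, check in RULES.items() if check(transcript)]

transcript = "Hi, I can guarantee your claim will be approved today."
print(score_call(transcript))  # ['missing_recording_disclosure', 'guarantee_language']
```

Because every flagged call carries its transcript and the rule that fired, the audit trail assembles itself as a by-product of normal operation.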

This reframing of AI from compliance risk to compliance infrastructure is still early, but poses a huge market opportunity. The prerequisite is that platforms are designed for the regulated context from day one, not retrofitted to pass a security review.

Why horizontal solutions won’t (and can’t) compete

The natural question is whether foundation model providers can simply acquire their way to parity, the same way they built general language capability by ingesting vast amounts of text. The answer is: not easily. Regulated industry call data is protected by HIPAA, GDPR, and contractual NDAs. It is not on the open web. Synthetic data can approximate common cases but cannot replicate the long tail of edge cases that only emerges from thousands of real deployments. And even with the data, a competitor would still lack the workflow logic, compliance encoding, and domain-specific opinions; those are the product of years of iteration by people who understand how these industries actually operate.

Over time, vertical platforms do more with less inference, and their accuracy advantage compounds because the proprietary data that trains their models is exactly the data their workflows produce. A competitor cannot simply download a foundation model and catch up. The moat is regulatory, data-specific, and operational, and it expands with every call processed.

Where the conversation will go next

Our view at Frontline is simple: the next wave of defensible companies in voice AI will not be general assistants. They will be deep vertical systems embedded inside (regulated) workflows and difficult to displace, regardless of how capable general models become, because the data they ingest eventually becomes the specification for the replacement system.

Much of today’s voice AI investment narrative focuses on a few familiar sectors: insurance claims, post-surgical care, and sales automation. We’ve already made early bets across these verticals: 

  • Avallon: vertical AI for insurance claims processing, targeting third-party administrators and carriers in workers’ compensation and property & casualty. Owns the full claims intake and coordination workflow.
  • Tucuvi: clinical voice AI for post-surgical and chronic condition follow-up, trained specifically on clinical terminology and patient speech patterns in healthcare settings.
  • Donna: field sales voice intelligence, capturing the unstructured conversations that never make it into CRM and connecting them directly to pipeline data for regulated companies.

 

But there are several adjacent markets that remain underserved:

  • Legal services: Depositions, client intake, court reporting, and law firm dictation represent hundreds of millions of hours of structured voice work annually, yet the tooling remains outdated. Not due to technical barriers, but because legal voice data carries chain-of-custody requirements, transcripts can be entered as evidence, and bar association ethics rules create ambiguity around AI-assisted legal work. A platform that solves the governance layer first will be the one to win market share.
  • Clinical trials: Patient recruitment, protocol adherence check-ins, and adverse event reporting all rely heavily on voice interactions that must be captured with near-perfect accuracy under FDA 21 CFR Part 11. The regulatory requirements are demanding, but so is the value: a single missed adverse event report can derail a trial worth hundreds of millions of dollars.
  • Public sector case work: Benefits administration, housing services, and immigration processing all face enormous call volumes and highly manual workflows, with procurement timelines that reward specialists who have built the compliance architecture in advance.

 

If you’re building vertical voice AI in regulated industries, I’d love to chat. You can reach me on LinkedIn or by email.
