The 20-Person Enterprise: The Modern CEO's Guide to AI

The 20‑Person Enterprise: Why AI‑Native Teams Powered by Million‑Token Context Are Positioned to Outrun Software Giants

I’ve spent two decades inside enterprise software—building, advising, and investing—and I’ll be honest: I feel weirdly advantaged right now running a much smaller company with a small, sharp team of engineers. Not because we’re “smarter,” but because a fully AI‑enabled team can now ship breakthroughs and real applications at a pace I would have struggled to achieve with a team five times this size.

And if you asked me what I’d even do with that many engineers today, I’d pause—because the old answer (“more headcount = more throughput”) is quietly dying. The most unpleasant consequence is also the most operationally real: downsizing becomes a strategic necessity, not a theoretical talking point—and that’s one of the hardest parts of running a software business.

I’m not alone in seeing this. Sam Altman put it plainly in a conversation that’s since become something of a reference point in Silicon Valley:

“We’re going to see ten-person billion dollar companies pretty soon. In my little group chat with my tech CEO friends, there’s this betting pool for the first year that there’s a one-person billion dollar company — which would have been unimaginable without AI — and now, it will happen.”

Sam Altman

That’s the CEO of OpenAI describing the death of headcount as a moat. Pay attention.

For years, enterprise software rewarded scale. Bigger suites. Bigger backlogs. Bigger teams. Bigger moats.

That playbook made sense when writing and maintaining code was the bottleneck.

That bottleneck is breaking.

And coding is where the break is most visible—because code is structured, testable, and measurable. Which means models can get feedback fast, and companies can wrap models in guardrails that make outputs increasingly production-grade.

To understand why this is happening—and why the current frontier of AI coding models represents a signpost, not an outlier—we need to go one layer deeper than the usual “AI is changing everything” commentary.

Let’s talk about what’s actually happening inside these systems, why context window size is now a board-level variable, and why incumbents are suddenly vulnerable to being out-iterated by a 20-person team.


What’s changing isn’t “AI writing code.” It’s AI absorbing the whole system.

Here’s the non-controversial reality: modern coding models are getting better because they’re improving at the things senior engineers actually do:

  • planning before touching code
  • keeping many constraints in mind at once
  • navigating large codebases without getting lost
  • reviewing, debugging, and catching their own mistakes

The frontier models—Anthropic’s Opus class, OpenAI’s o-series, Google’s Gemini Ultra—are all being explicitly positioned around exactly these capabilities: better planning, longer sustained agentic work, more reliable operation in large codebases, and stronger code review and debugging.

The magnitude of that shift is already visible inside the largest tech companies. Satya Nadella disclosed at Meta’s LlamaCon conference in 2025 that AI was writing between 20% and 30% of Microsoft’s code across projects. Sundar Pichai reported the same month that Google was at over 30% of new code written by AI.

“A lot of the code we have in our apps will be built by AI engineers instead of people engineers.”

Mark Zuckerberg

These aren’t pilot programs. These are production pipelines at trillion-dollar companies.

Now add the accelerant: long context.

The leading Opus-class model from Anthropic introduces beta support for million-token context windows—a first for that model family. That’s not a “nice-to-have.” It’s a category change—because it’s the difference between:

"Here's a snippet—help me patch it."

and

"Here's the repo, the spec, the migrations, the tests, and the incident history—make the right change and don't break production."


That shift is why this feels disruptive. It’s no longer about generating code. It’s about operating inside the full complexity of the business.


Under the hood, in plain English: how these models actually work

1. Transformers: the engine behind modern coding intelligence

At the core of most frontier coding models is the decoder-only transformer—a system trained to predict the next token over and over again, with enough skill that it looks like reasoning.

The “magic” inside transformers is self-attention.

A human reads left-to-right and relies on working memory. Self-attention is more like having thousands of invisible highlighters that can jump around the entire text and ask: “What matters most for what I’m trying to produce next?”

One nuance executives should know (because it prevents sloppy thinking): transformers are not “orderless.” Sequence still matters. But attention makes them less fragile when the important detail is far away, buried in a spec, or referenced across files.

Why that matters for code is obvious if you’ve ever debugged a production system: the bug is never where the symptom is.
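For readers who want one level more detail, the “highlighters” intuition corresponds to a few lines of linear algebra. Here is a minimal NumPy sketch of scaled dot-product self-attention (toy dimensions, no learned projections; illustrative only, not how any production model is configured):

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a sequence of token vectors.

    X: (seq_len, d) matrix, one row per token. Each output row is a
    relevance-weighted mix of ALL rows: every token can "highlight"
    every other token, no matter how far away it sits in the sequence.
    """
    d = X.shape[1]
    Q, K, V = X, X, X  # real models use learned Q/K/V projections; identity here for clarity
    scores = Q @ K.T / np.sqrt(d)  # (seq_len, seq_len) pairwise relevance scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V  # blend the sequence according to relevance

tokens = np.random.rand(5, 8)  # 5 tokens, 8-dim embeddings
out = self_attention(tokens)
print(out.shape)  # (5, 8): same shape, but each row now "sees" the whole sequence
```

The key property for code: the attention weights don’t decay with distance, which is why a detail buried hundreds of files away can still influence the next token.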

2. Tokens and context windows: the model’s “working desk”

A token is a chunk of text (not exactly a word—sometimes shorter, sometimes longer). The context window is how many tokens the model can “see” at once.

I think of it like a desk: a small desk forces you to constantly shuffle papers. A big desk lets you lay out the architecture, the requirements, and the edge cases at the same time.

When you go from tens of thousands to hundreds of thousands to one million tokens, you stop doing prompt gymnastics and you start doing system-level work. The leading models are also making progress on what researchers call “context rot”—performance degrading as conversations get long—and the current generation claims to hold and track information over hundreds of thousands of tokens with less drift.

That’s not a UX detail. It’s a foundation for agentic software development.
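To make the “desk size” concrete, here is a back-of-envelope token budget. The 4-characters-per-token ratio is a common rule of thumb for English text and code, not a spec, and real tokenizers vary; treat this as budgeting, not billing:

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb; real tokenizers vary by content

def approx_tokens(text: str) -> int:
    """Ballpark token count for planning context usage."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(artifacts: dict, window: int = 1_000_000) -> tuple:
    """Sum estimated tokens for a set of artifacts against a context window."""
    total = sum(approx_tokens(t) for t in artifacts.values())
    return total, total <= window

docs = {
    "repo": "x" * 2_400_000,   # ~600K tokens of source
    "spec": "x" * 200_000,     # ~50K tokens of PRD + acceptance criteria
    "tests": "x" * 800_000,    # ~200K tokens of test suites
}
total, ok = fits_in_context(docs)
print(total, ok)  # prints: 850000 True
```

At a 200K-token window that same bundle overflows; at 1M it fits with room for runbooks and incident history. That’s the difference between “paste a snippet” and “load the system.”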

3. Where multimodal encoders fit—and why they matter for the future of building software

Most text and code LLMs are transformer-based throughout. But “software” is becoming multimodal—and that changes what AI needs to ingest.

Product intent lives in docs, tickets, diagrams, and call transcripts. UI intent lives in Figma. Enterprise truth lives in contracts, spreadsheets, and recorded customer calls.

As more of that enters the build pipeline, vision encoders and multimodal transformer architectures—the systems that turn images, PDFs, and rich documents into representations the reasoning model can act on—become a core part of the enterprise AI stack. Every major frontier model being built today is moving in this direction: transformer-based reasoning engines paired with multimodal perception layers that can ingest the full surface area of how businesses actually communicate and store knowledge.

This isn’t a research curiosity. It’s the direction the entire field is moving, and it’s what will eventually let AI systems bridge the gap between business intent and code execution end-to-end.


This is my view, but it’s grounded in how engineering actually works:

Code has a referee. Compilers, tests, linters, type systems—software has built-in feedback loops. That makes it easier to train on and easier to operationalize. The model can be wrong, get corrected, and improve in a way that’s hard to do in domains without measurable ground truth.

The workflow is modular. Engineering work breaks into tasks that an AI can plan, attempt, verify, and iterate on. That structure is enormously friendly to the way agentic systems work.
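The “plan, attempt, verify, iterate” structure fits in a dozen lines. Everything below is a toy: `propose` stands in for a model call and `verify` for a CI run; the point is that the referee, not the generator, has the final word:

```python
def agent_loop(task, propose, verify, max_iters=3):
    """Attempt a task, let a verifier referee each attempt, iterate on failure."""
    feedback = ""
    for _ in range(max_iters):
        candidate = propose(task, feedback)  # the "model" proposes a change
        ok, report = verify(candidate)       # tests/linters/compilers judge it
        if ok:
            return candidate                 # only verified work leaves the loop
        feedback = report                    # failure detail feeds the next attempt
    return None                              # out of budget: escalate to a human

# Toy stand-ins: the "model" bumps its answer when told it's too small,
# and the "verifier" accepts anything >= 10.
def propose(task, feedback):
    return int(feedback) + 5 if feedback else 1

def verify(candidate):
    return candidate >= 10, str(candidate)

print(agent_loop("reach ten", propose, verify, max_iters=5))  # prints: 11
```

Swap in a real model and a real test suite and this is, structurally, what agentic coding systems do: the measurable ground truth is what makes the loop converge.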

The business logic is finally becoming “loadable.” Historically, the hardest part of enterprise software wasn’t writing code—it was understanding the domain rules scattered across five layers and twelve teams. Long context windows plus better retrieval change that equation. For the first time, a model can hold enough of the system in memory to make changes that don’t break things three levels up.

Dario Amodei has said publicly that on coding specifically, he puts himself at 90% confidence that AI will reach what he calls “country of geniuses” capability within a decade — and for end-to-end coding tasks in particular, he believes:

“we’ll be there in one or two years.”

Dario Amodei

That’s the CEO of one of the three frontier AI labs, staking his credibility on a timeline.


Why a million tokens changes the economics of enterprise software

Let me translate “1M tokens” into executive reality.

A large context window enables a model to load:

  • multiple repos, or a large portion of a monorepo
  • the PRD and acceptance criteria
  • the API contracts
  • database schemas and migrations
  • test suites
  • runbooks and postmortems
  • security constraints and compliance language

—and then act more like a senior engineer who actually understands the terrain.

Long context at this scale is still in beta, with practical pricing constraints that make it better suited to high-value, lower-frequency tasks than routine operations. But strategically, the direction is unmistakable: the model is moving from “chat” to “work.”

And when the model can hold the work, you get fewer handoffs, fewer “paste this file, now that file” rituals, fewer architectural mistakes due to missing context, and a real chance at end-to-end feature delivery: spec → implementation → tests → integration.

The frontier labs are also building server-side mechanisms to manage long-running workflows as you approach context limits—keeping agentic pipelines alive across extended tasks without losing coherence.

That capability, maturing over the next 12 to 18 months, is what turns “AI coding” from a feature into an operating model. As Andrew Lau, CEO of engineering intelligence company Jellyfish, told McKinsey: “I fundamentally believe the software development life cycle will be completely redefined within three years. That is exciting and scary for the industry.”


The frontier signal: the “software factory” is being automated

I’m not writing this to hype a single vendor. I’m using the current frontier—and where it’s clearly heading—as a concrete example of what’s now possible.

What the leading models are explicitly shipping and positioning:

  • Million-token context windows (in beta) for their flagship coding-grade models
  • Extended output lengths, which matter because bigger changes can be produced in one coherent pass
  • Better long-horizon coding behaviors: planning, reliability in large codebases, review and debug
  • Availability across major enterprise rails—Bedrock, Vertex AI, Microsoft Azure—making enterprise deployment tractable for IT and procurement
  • Transparent, tiered pricing with caching and batching discounts that make the economics workable for teams willing to design around the constraints

This combination—long context, longer outputs, sustained agentic performance—is what changes the game. Not any single product release, but the clear convergence of capabilities across the frontier.


The threat: incumbents can be underbuilt

Here’s the blunt version I’d tell any board.

If your advantage was: “we have more engineers,” “we ship more features,” “we can afford the complexity”—AI is attacking the core of that advantage.

Because if a small, disciplined, AI-native team can deliver the majority of an incumbent’s value in focused, well-scoped workflows—and do it in months, not years—then the suite becomes a liability: expensive surface area, slow release cycles, legacy entropy.

This is how a 20-person team becomes a credible threat. Not because they’re magical. Because the unit economics of engineering just changed.

And here’s the part executives underestimate: AI amplifies organizational clarity and punishes organizational entropy.

If your architecture is incoherent, your tests are weak, your documentation is fiction, and your roadmap is a negotiation among silos—AI won’t save you.

But if your system is clean and your intent is crisp, AI becomes leverage.

The clearest signal that this is already operational reality came from an internal memo at Shopify, where CEO Tobias Lütke made AI use “a fundamental expectation” across the company. The policy was blunt: teams had to prove that AI could not do a task before asking for more headcount. The default assumption had flipped. Work no longer needed engineers to justify using AI—engineers needed to justify not using it.

That’s not a future scenario. That’s a policy already in effect at one of the world’s most sophisticated software companies.


The uncomfortable part: downsizing isn’t “optional” in the old model

I don’t say this lightly.

When software throughput rises dramatically, the same revenue base can be served by fewer builders—unless you’re reinvesting that leverage into new products, new markets, and faster experimentation.

Companies that don’t self-disrupt will do the worst version of downsizing: reactive, late, and demoralizing.

Companies that do self-disrupt can do the only responsible version: deliberate, strategic, and paired with a plan to redeploy talent into higher-value work—architecture, domain modeling, customer workflow design, trust and governance.

None of this is pleasant. But pretending it isn’t happening is worse.


Now to the most important part of this blog:

How incumbents survive: change the fabric of how you build

If you run an enterprise software company, “using AI tools” is not the strategy.

The strategy is becoming AI-native in delivery, with verification as your backbone.

Here’s what I believe works:

1. Move from “code-first” to “spec-and-eval-first.” The new leverage point is not code. It’s clear specs, executable acceptance tests, and evaluation harnesses that ask: did the change actually satisfy the business rule? AI loves clarity. Ambiguity becomes your new bottleneck.

2. Turn your SDLC into an agentic pipeline—with guardrails in charge. Your model should open PRs, run tests, fix failures, explain changes, pass security scans, and produce rollback plans. The AI writes. Your pipeline decides what ships.

3. Create internal 20-person “strike teams” that are allowed to cannibalize you. Don’t “pilot” AI inside your existing bureaucracy. Spin up small pods with permission to rebuild one workflow end-to-end, ship weekly, cut scope ruthlessly, and replace internal sacred cows.

4. Re-architect for “contextability.” Long context is only valuable if your system is legible: consistent patterns, real tests, clear boundaries, documentation that matches behavior. This is not housekeeping. It’s how you convert AI from a demo into a factory.

5. Treat governance and safety as product features. Powerful models can be misused, and the vendors building them are saying so publicly. Enterprise buyers will demand audit trails, provenance, policy controls, and secure deployment options. Trust becomes a moat.

6. Update your pricing assumptions before your customers force you to. When feature creation gets cheaper, feature-based pricing erodes. Shift toward workflow value, usage and outcome alignment, and platform economics—integration, governance, distribution.
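A concrete picture of point 1, “spec-and-eval-first”: the business rule lives as an executable acceptance test, and any change, human- or AI-authored, ships only if the test passes. The discount rule and function names below are invented for illustration:

```python
# Hypothetical business rule, written as an executable spec:
# "Orders of $1,000 or more get a 10% discount; discounts never apply to tax."

def price_order(subtotal: float, tax_rate: float) -> float:
    """Implementation under test. An AI agent is free to rewrite this body;
    the acceptance tests below decide whether the change ships."""
    discount = 0.10 if subtotal >= 1_000 else 0.0
    return subtotal * (1 - discount) + subtotal * tax_rate

def test_discount_threshold():
    assert price_order(999, 0.0) == 999       # below threshold: no discount
    assert price_order(1_000, 0.0) == 900.0   # at threshold: 10% off

def test_tax_is_never_discounted():
    # tax is computed on the full subtotal, not the discounted amount
    assert price_order(1_000, 0.08) == 900.0 + 80.0

if __name__ == "__main__":
    test_discount_threshold()
    test_tax_is_never_discounted()
    print("spec satisfied")
```

Notice what moved: the leverage is in the tests, which encode the business rule unambiguously. That is the artifact your domain experts now own; the implementation becomes replaceable.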


The bottom line

Transformers gave models the ability to reason across complex sequences. Multimodal encoders—vision transformers and allied architectures—are making the full surface area of enterprise reality computable. And million-token context windows are turning “assistants” into something closer to “teammates”—systems that can hold enough of the business in working memory to produce coherent, end-to-end work.

Amodei’s framing is useful here. He describes the trajectory as two exponentials running in parallel: one in raw model capability, one in the diffusion of that capability into the economy. We’re somewhere in the middle of both. The capability curve is visible. The economic curve is just beginning to steepen.

The question for incumbents isn’t whether this is real.

It’s whether you’ll disrupt your own operating model fast enough—before a small, AI-native team does it for you.

