This Week in the AI Supercycle — The Routing Paradigm and the War for the Stack

Gennaro Cuofano and Joel Salinas

Jul 04, 2026

Welcome to the first episode of This Week in the AI Supercycle.

We start with a packed week of major AI news, but one theme stands above the rest: routing.

As we’ll see throughout this edition, routing is no longer just a technical concept. It’s becoming the economic layer that determines how intelligence is priced, distributed, and ultimately where value is captured across the AI stack.

Many of this week’s headlines may seem unrelated at first glance. In reality, they’re different expressions of the same structural shift. Let’s break them down.

For full reference, read The Map of AI Redrawn.

Where We Are on the Three Axes

Before the news, the position fix. Three axes, three questions — when, where, who — and this week reads differently on each.

The time axis (the Supercycle): we are deep in Phase 1, but Phase 2 is already firing at the edge. Most of what the week surfaced is Phase-1 linear integration — generative coding, agentic assistants, cheaper software production. That’s the 5-10 year short cycle doing exactly what it’s supposed to. But two items are Phase-2 tells arriving early: agents running 24/7 on operators’ machines (a distribution-surface deployment signal, not a model signal) and Apple’s supply-side pricing defense against an agent-driven demand explosion. The clock runs differently in different regions of the Topology, and this week the edge clock is ahead of the model clock.

The binding research question on the time axis remains the unification of the four scaling laws — pre-training, post-training, reasoning, agentic tool use — into a single self-improvement loop. Keep that variable in your peripheral vision, because one of this week’s stories (the OpenAI inference-cost breakthrough) is a signal pointing straight at it.

The space axis (the Map of AI): the two new layers I added in March did the heavy lifting this week. The harness layer (routing economics) generated most of the operator-facing news, and the governance layer (the release control plane) generated the single most important event — Fable’s suspension and return. The Map had to be redrawn precisely so these two layers had somewhere to live. This week is the vindication.

The power axis (the Three-Tier Architecture): the governance perimeter reasserted itself, hard. Fable is the Cold-War governance regime made concrete — capability disclosed at controlled intervals, through a public-private release framework, because top-tier capability is now a geopolitical input. Nothing about this week loosened Tier 1. If anything, it hardened.

And the spine that holds all three axes together — the dual analogy — was visible in a single week. The microprocessor regime governed the bottom of the stack (the memory cartel, TSMC allocation, Apple’s supply crunch). The Cold War regime governed the top (Fable’s governance gate, the substrate-diversification and co-design moves). The two regimes met, as they always do, at the boundary between Compute Capacity (Layer 5) and Foundation Models (Layer 6) — which is exactly where Anthropic’s dual-substrate story and the “who controls the stack” fight are being resolved.

So the one-sentence abstraction: this week was the Topology deforming in real time, with four cascades firing at once, the governance perimeter reasserting control, and the incumbent capex map spending at containment scale — all of it consistent with a Phase-1-dominant cycle whose edge is already sliding into Phase 2.

Market Map: The Week as Coordinates and Cascades

I’ll organize this not by the nine layers in sequence, but by the four active cascades plus the governance perimeter and the incumbent capex map — because that’s where the commercial action is actually concentrating, and it’s the more complete reading.

The governance perimeter — Fable and the end of the “full release”

Coordinate: Layer 9 (Governance), all phases, meta-tier. Regime: Cold War.

Fable’s suspension and return is the governance layer behaving exactly as the Topology says it must: governance is not a roof on top of the stack, it’s a control plane perpendicular to every layer, and it changes state in days, not years. A model capable enough to surface twenty-year-dormant vulnerabilities is, by definition, a model that trips the release gate. The public instance ships restricted, guardrailed, through a public-private framework.

The durable structural fact — the one to carry forward — is that we have left the era of the “full release,” and we probably never truly had it. In Cold War terms, this is a capability gradient disclosed at controlled intervals. In operator terms, it means you can no longer assume the model you’re handed is the model the lab runs internally. Two camps formed on X (the “it’s nerfed” camp and the “it’s impressive if used correctly” camp); both are right, and the reconciliation is that governance is now a permanent, priced variable, not a regulatory footnote. Plan your security posture accordingly — because, as Joel put it, bad actors have access to the exact same tools you do.

Cascade 1 — the downward cascade from the harness (the biggest region of the week)

This is the one that ties the live to the framework. It started at Layer 7 when the agentic harness stopped being a workflow wrapper and became a routing engine, and it propagates down through Layers 6, 5, and 3. Four of the week’s stories are all the same cascade seen from different layers.

The routing paradigm. Coordinate: Layer 7, Phase 1, Tier 3. “The next Anthropic is a routing company.” Routing is where the market finally discovers what a unit of intelligence is worth — model-agnostic orchestration that abstracts the model zoo away from the enterprise user, puts a powerful model on strategy and cheaper models on execution, and reduces single-model dependency. In framework language: routing is price discovery, and it is the head of the downward cascade. Whoever owns the routing engine at enterprise scale sits at the coordinate that reprices every layer beneath it.
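
As a minimal sketch of the routing idea, assuming two hypothetical model tiers with made-up prices (the names, prices, and step kinds below are illustrative assumptions, not any vendor's catalog), a router can send strategy-type steps to the strong model and everything else to the cheap one:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_tokens: float  # hypothetical price, for illustration only


# Hypothetical tiers: names and prices are assumptions, not real products.
ORCHESTRATOR = Model("strong-model", 15.0)
EXECUTOR = Model("cheap-model", 1.0)

# Step kinds treated as "strategy" work (an assumption for this sketch).
STRATEGY_KINDS = {"plan", "review"}


def route(step_kind: str) -> Model:
    """Send strategy steps to the strong model, everything else to the cheap one."""
    return ORCHESTRATOR if step_kind in STRATEGY_KINDS else EXECUTOR


def job_cost(steps: list[tuple[str, int]], model: Optional[Model] = None) -> float:
    """Total USD for (step_kind, tokens) pairs; pass `model` to force a single model."""
    return sum(
        (model or route(kind)).usd_per_million_tokens * tokens / 1_000_000
        for kind, tokens in steps
    )


# One illustrative job: a small plan, a large execution step, a small review.
steps = [("plan", 2_000), ("execute", 16_000), ("review", 2_000)]
routed = job_cost(steps)
single = job_cost(steps, ORCHESTRATOR)
print(f"routed: ${routed:.3f}  single-model: ${single:.3f}")
```

Under these assumed prices the routed job costs about $0.076 against $0.30 for running every step on the strong model. The gap comes entirely from where the bulk of the tokens land, which is the sense in which routing acts as price discovery.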

The Sonnet-5 cost paradox. Coordinate: Layer 6→7, Phase 1, Tier 1/3. Cheaper per token ≠ cheaper per task. Agentic models trade token efficiency for autonomy — the same job that took 5,000 tokens now takes 20,000. This is not a bug; it’s the mechanism. Agentic queries consume on the order of 500× the tokens of chat queries, and that ratio is precisely what loads the harness routing region and makes merchant custom silicon (Cerebras, Groq) economically rational one and two layers down. The cost paradox is the cascade’s energy source.
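
The arithmetic behind "cheaper per token ≠ cheaper per task" can be sketched as follows. The token counts (5,000 and 20,000) come from the paragraph above; both prices are hypothetical, chosen only to show a case where the per-token price halves and the per-task cost still doubles:

```python
def cost_per_task(tokens_per_task: int, usd_per_million_tokens: float) -> float:
    """USD cost of one task at a given token count and token price."""
    return tokens_per_task * usd_per_million_tokens / 1_000_000


def break_even_price(old_price: float, old_tokens: int, new_tokens: int) -> float:
    """Token price at which the hungrier workflow costs the same per task as before."""
    return old_price * old_tokens / new_tokens


# Token counts are from the text; both prices are assumptions.
OLD_TOKENS, NEW_TOKENS = 5_000, 20_000
OLD_PRICE, NEW_PRICE = 10.0, 5.0  # USD per million tokens (hypothetical)

old_cost = cost_per_task(OLD_TOKENS, OLD_PRICE)
new_cost = cost_per_task(NEW_TOKENS, NEW_PRICE)
parity = break_even_price(OLD_PRICE, OLD_TOKENS, NEW_TOKENS)
print(f"old: ${old_cost:.2f}  new: ${new_cost:.2f}  parity price: ${parity:.2f}/M")
```

With a 4× token multiple, the per-token price has to fall to a quarter of its old level just to hold the per-task cost flat. Any smaller price cut leaves the task more expensive than before.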

Token-maxing as a metric pathology. Coordinate: incumbent behavior at Layer 8. Meta reportedly burned billions on internal Claude usage and is now capping it. Structurally this is a governance-of-incentives failure: measure effort (tokens) instead of outcome, and consumption is what you get. As execution price falls, outcomes do not rise automatically — they have to be engineered. This is the bad-metric failure mode that bankrupts the organization that adopts it, and it’s a preview of the discipline the routing layer exists to impose.
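
To make the metric pathology concrete, here is a small sketch with two hypothetical teams (all names and figures are invented for illustration). Ranking by tokens consumed and ranking by tokens per completed outcome pick different winners:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamUsage:
    name: str
    tokens: int    # tokens consumed in the period (hypothetical)
    outcomes: int  # accepted deliverables in the period (hypothetical)


def tokens_per_outcome(usage: TeamUsage) -> float:
    """Tokens spent per accepted deliverable; infinite if nothing shipped."""
    return usage.tokens / usage.outcomes if usage.outcomes else float("inf")


teams = [
    TeamUsage("team-a", 50_000_000, 100),
    TeamUsage("team-b", 5_000_000, 50),
]

# An effort metric rewards the biggest consumer.
top_by_tokens = max(teams, key=lambda u: u.tokens)
# An outcome metric rewards the team that ships the most per token.
top_by_efficiency = min(teams, key=tokens_per_outcome)

print(top_by_tokens.name, top_by_efficiency.name)
```

The first ranking rewards team-a for spending ten times the tokens. The second rewards team-b, which ships each deliverable on a fifth of the tokens.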

The OpenAI inference-cost breakthrough. Coordinate: Layer 5-7 margin, Phase 1, Tier 1. Reported against a year-end gross-margin goal around 52%. Read at surface level, it’s a margin story critical to IPO viability. Read at depth, it’s a signal on the unification question: if inference at production scale gets cheap enough to run continuously, inference becomes a training-signal generator — a capex category rather than only an opex line — which is exactly the mechanism by which the four scaling laws could close into a self-improvement loop. This is the week’s highest-leverage story hiding inside a margins headline.

Cascade 2 — the dual-substrate cascade (compute-layer diversification as a valuation feature)

Coordinate: Layer 5, Phase 1, Tier 1. Regime: boundary — where microprocessor meets Cold War.

Anthropic reducing its Nvidia dependency, by borrowing compute from Nvidia, Google, Amazon, and now SpaceX while developing custom silicon, is the dual-substrate cascade named in the framework almost line for line. Having two independent compute substrates (Google TPU plus SpaceX capacity) is a balance-sheet feature priced directly into secondary markets; single-substrate compute with execution slippage is a liability priced into the IPO conversation. This is strategic redundancy doctrine, a Cold War institutional feature, applied to commercial substrate. Anthropic reportedly pays SpaceX north of a billion dollars a month for compute, which is the flywheel geometry (below) monetizing capacity the incumbents haven’t yet identified as contested territory.

And the sober caveat I gave on the live is the framework’s caveat too: reducing Nvidia dependency is a decade-long story, not a 2030 story. Custom silicon is the beginning of an owned-margin narrative. Into the 2030s, no one replaces Nvidia — the revenue-vs-margin split (chip and memory layer captures the profit; labs grow revenue margin-negative) is the proof.

Cascade 3 — the Physical AI cascade (Phase 2 firing at the edge)

Coordinate: Layer 8, Phase 2, Tier 2/3.

Two of the week’s quieter items are the same cascade pulling Phase 2 forward. Agents running 24/7 re-rate the PC as a Physical AI surface — an agent-run machine is used far more intensely than a human-run one, which is a deployment-surface signal, not a model signal. And Apple’s price hikes are a supply-shock hedge against exactly this: an agent-driven demand explosion for devices colliding with a supply ceiling. The distribution surface is where Phase 2 is already firing, and the framework predicts this cascade shortens Phase 2’s timeline as adjacent endpoint categories get pulled along the same curve.

Which folds into the embedded frontier region (Layer 8, Phase 1, Tier 1): Apple currently ships Gemini-embedded AI on-device. That’s the Apple-Google embedding — a frontier model entering a billion-device distribution surface — and it’s the canonical example of incumbent capex as containment: paying to foreclose the surface before a native AI player reaches it.

The microprocessor floor — the memory cartel and TSMC allocation

Coordinate: Layer 2 (Foundries/Packaging), Phase 1, Tier 1/2. Regime: microprocessor.

The alleged price-fixing among Samsung, SK Hynix, and Micron is the HBM chokepoint doing what chokepoints do. Read structurally, not moralistically: memory has swung between boom and bust for thirty years, so a supplier sitting on the chokepoint without demand certainty won’t wildly expand supply. A cartel is demand certainty by other means. The framework’s rule (binding constraints migrate downward when a lower layer becomes scarce) is why this is the constraint of the moment: the HBM oligopoly’s output is forward-purchased through the late 2020s, and everything above it is gated by it.

And Apple is no longer TSMC’s top customer — Nvidia is. Allocation power moved up the stack, above chip design itself. A frontier chip without foundry allocation is a slide deck; a device maker without foundry priority faces a supply bottleneck driven by scarcity, not weak demand. That single reordering — Nvidia ahead of Apple at TSMC — explains Apple’s pricing behavior more completely than any consumer-demand story.

The incumbent capex map — containment, co-design, and the geometry race

This is the deepest read, and it’s where the “two camps” conversation from the live resolves into the framework.

The co-design war Joel and I described — the Nvidia-Palantir camp wanting to commoditize the model layer versus the OpenAI-Anthropic camp wanting to embed into the enterprise stack — is the diagonal of dominance being contested through geometries. Commoditizing the model layer means pushing value below the Layer 5/6 boundary into the microprocessor region the incumbents control (that’s horizontal geometry — Nvidia selling shovels to the containment effort, collecting rent on every capex dollar that crosses Layers 3-5). Embedding into the enterprise means the labs climbing up into Layers 7-8 fast enough to become indispensable before they’re commoditized.

And the enterprise’s own logic — the one I laid out on the live — is the defect strategy in disguise: keep your data in a controllable ontology/memory layer, not fine-tuned into a frontier model’s weights, so you retain control, security, and no lock-in, and can swap the model with a dropdown. That’s an enterprise deliberately positioning off the diagonal to stay resilient to single-layer disruption.
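
A minimal sketch of that "swap the model with a dropdown" posture, assuming a hypothetical `ChatModel` interface and two stub models (none of this is a real vendor SDK): the enterprise context lives in a memory layer the company owns and is passed in at call time, so changing the model leaves the data untouched.

```python
from typing import Protocol


class ChatModel(Protocol):
    """Hypothetical interface that any model provider adapter would satisfy."""

    def complete(self, prompt: str) -> str: ...


class StubModelA:
    def complete(self, prompt: str) -> str:
        return f"[model-a] {prompt}"


class StubModelB:
    def complete(self, prompt: str) -> str:
        return f"[model-b] {prompt}"


class Assistant:
    """Holds company context outside the model and injects it per request."""

    def __init__(self, model: ChatModel, memory: dict[str, str]) -> None:
        self.model = model
        self.memory = memory  # stand-in for the ontology/memory layer

    def ask(self, question: str) -> str:
        context = "; ".join(f"{k}={v}" for k, v in self.memory.items())
        return self.model.complete(f"context: {context} | question: {question}")


memory = {"fiscal_year_end": "June", "crm": "in-house"}
assistant = Assistant(StubModelA(), memory)
first = assistant.ask("When do we close the books?")

assistant.model = StubModelB()  # the "dropdown": only the model changes
second = assistant.ask("When do we close the books?")
print(first)
print(second)
```

Both calls send the same context and question; only the model behind the interface differs. Fine-tuning the same facts into one provider's weights would remove that option, which is the lock-in the paragraph above warns against.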

The capex behind all of it is containment, not expansion. Microsoft Frontier’s reported ~$2.5B bet on the last mile, its army of forward-deployed engineers, Meta’s spend and its token caps, Apple’s embedding — none of it is forward-projected Phase-1 optimism. It’s backward-induced from Phase-2 survival anxiety. The incumbents have internalized the second half of the incumbent paradox — that native AI players take the dominant positions in Phase 2 — and are spending at containment scale to foreclose Phase-2 territory before it can be occupied. Microsoft Frontier’s FDE model is the last-mile bottleneck capitalized: enterprise AI doesn’t scale on model quality, it scales on hybrid technical-commercial experts embedded inside the organization.

Meta’s Allbirds-vs-SpaceX fork is the geometry question stated cleanly. SpaceX runs a flywheel geometry — engines at Layers 1, 5, and 8 compounding through loops, monetizing a compute overbuild (the $1B/month to Anthropic) from outside the contested terrain. Meta has the compute overbuild but not the loops. So the fork is real: convert capacity into a genuine flywheel (SpaceX) or strand it as a pivot narrative (Allbirds). The tell will be whether Meta builds compounding loops or just resells idle capacity.

Read the Week by Coordinates

For the individual operator. Match model power to task, because you’re really choosing a position in the harness routing region. Powerful, agentic models as orchestrators; cheaper models for execution. Don’t dump everything on Sonnet 5; don’t write your posts through Fable. You don’t send the CEO for coffee.

For teams. Update the context and memory on a cadence — a model advising on two-year-old context is an advisor who hasn’t spoken to you in two years. The human-in-the-loop is the alignment mechanism that keeps the architecture tracking a moving business.

For harness / loop builders. Add loops into an architecture you’ve already tested with yourself inside it; run them on a cadence; stay present to re-tune as efficacy decays. You are building a position at the head of the downward cascade — treat it as infrastructure, not a script.

For anyone designing incentives. Never measure by tokens consumed. As execution price falls, outcomes must be engineered. Token-maxing is the metric pathology that ended a company or two this quarter.

For enterprises choosing a foundation. Optimize for control, security, and no lock-in. Keep data in a controllable ontology/memory layer, not in frontier weights. Favor substrates that let you swap the model — that optionality is a deliberate off-diagonal position, and it’s worth more than any single model’s edge today.

The reading discipline, applied. For every company in the week, specify the coordinate, name the geometry, read the adjacent cascades, and ask whether the geometry compounds faster than the coordinates it occupies. Anthropic at Layers 5-7, Tier 1, gated by whether dual-substrate diversification stays priced as an asset. Nvidia across Layers 3-5, gated by hyperscaler custom silicon and merchant flankers. SpaceX at Layers 1/5/8, gated by Starship. Meta at Layer 8, gated by whether three billion users keep engaging with its surfaces in current form. Refuse the singular. Specify the coordinate.

What’s Next

The next Anthropic is a routing company. In framework terms: whoever owns the head of the downward cascade — the enterprise routing engine — reprices every layer beneath it and becomes the next trillion-dollar company inside three years. I’ll keep saying it so no one can claim we didn’t.
The unification question is the variable to watch. The OpenAI inference breakthrough matters most as a signal toward closing inference-as-training-signal into a self-improvement loop. Whichever frontier lab closes that pipeline first compounds at a rate no competitor can match, and the model race resumes in a more brutal form.
Incumbent capex will not normalize. Microsoft, Meta, and Apple are spending at containment scale because the logic is foreclosure, not integration. Anyone modeling capex normalization in 2027-2028 is reading Phase-1 logic into a Phase-2 contest.
Memory is the binding constraint, and it’s below the model layer. Watch the HBM chokepoint and TSMC allocation, not benchmark leaderboards, if you want to know what’s actually possible next year. The bottleneck migrates downward; the memory cartel is where the cycle is physically gated.
Governance is permanent perimeter. No more full releases. Every top-tier capability ships through a public-private framework that can change state in days.
Four events would force a redraw: a frontier lab credibly closing the four-scaling-law unification loop; a sovereign compute contract crossing $50B and formalizing the third buyer pole; a China-side Tier-1 breakthrough in silicon or foundry that collapses Strategic Denial; and a Starship outcome that hardens or breaks the flywheel geometry. Three of the four sit below or beside the model layer — which is the whole point.

Key Takeaways & Mental Models

Read by coordinates, not headlines. Every story is a triple of (layer, phase, tier). Ten headlines this week were largely four cascades plus a governance-perimeter event. Mechanism: the coordinate determines compounding rate, risk profile, and access regime; the label hides all three.
Routing is the downward cascade. “The next Anthropic is a routing company” and “the harness layer became a routing engine” are the same claim from two ends. Mechanism: a Layer-7 deformation propagates down through 6, 5, and 3, repricing each — and routing is where the market discovers what intelligence is worth.
The cost paradox is the cascade’s fuel. Cheaper per token ≠ cheaper per task; agentic queries run ~500× the tokens of chat. Mechanism: the token ratio loads the routing region and makes merchant silicon rational two layers down.
The dual analogy located itself this week. Microprocessor regime at the memory/foundry floor; Cold War regime at the governance perimeter; the two collide at the Layer 5/6 boundary where Anthropic’s dual substrate sits. Mechanism: the collision is why the Three-Tier Architecture exists at all.
Capex is containment, not expansion. Microsoft Frontier, Meta’s spend, Apple’s embedding are backward-induced from Phase-2 survival anxiety. Mechanism: incumbents foreclose Phase-2 territory at every layer before native players can occupy it.
The co-design war is the geometry race. Commoditize-the-model = horizontal geometry pushing value below the 5/6 line; embed-in-enterprise = climbing into 7-8; keep-data-in-ontology = an off-diagonal defect play for resilience. Mechanism: each is a different bet about which axis compounds fastest.
Governance is perimeter, not roof. Fable is a capability gradient disclosed at a controlled interval. Mechanism: Layer 9 surrounds the stack as a control plane and changes state in days, not years.
Bottlenecks migrate downward. The memory cartel and TSMC reallocation (Nvidia ahead of Apple) are the binding constraints of the moment. Mechanism: when a lower layer goes scarce, the constraint — and the pricing power — moves down to it.
Inference-as-training-signal is the unification tell. The margin breakthrough is really a signal on the highest-leverage variable in the cycle. Mechanism: continuous cheap inference turns production deployment into a training-signal generator, the seam where the four scaling laws could close into one loop.
Allbirds vs. SpaceX is the flywheel test. A compute overbuild is either a compounding flywheel (loops at multiple layers) or a stranded pivot. Mechanism: geometry, not capacity, decides which.

The Bottom Line — Where We Are

We are deep in Phase 1, with the edge already sliding into Phase 2. The model race did not resume this week; it stayed paused at the unification question, while the geometry race — structured entirely around incumbent containment — ran in full view. The most important event was not a model at all; it was a governance-perimeter action (Fable) and a margin signal pointing at self-improvement (OpenAI inference).

Underneath, four cascades were firing at once — the downward cascade from the harness, the dual-substrate cascade at compute, the Physical AI cascade at the edge, and the sovereign-compute thirst feeding SpaceX — which is itself the signal the framework predicts: cascades accelerate as the Topology matures. A single week now carries what used to take a quarter.

So where are we? At the coordinate where the microprocessor floor (memory, foundries) is the binding physical constraint; where the Cold War perimeter (governance) is reasserting control of release; where the incumbents are spending at containment scale to foreclose Phase 2; and where the single highest-leverage commercial position — the enterprise routing engine at the head of the downward cascade — is still unclaimed.

The Topology will look different again in a few months. The discipline is the same as it was on the live: read by coordinates, not by labels — and keep asking whether the geometry compounds faster than the ground it stands on.

Stay close.