AI Infrastructure Stack

Layer 1: Chipmaking Equipment

Chips are made by projecting circuit patterns onto silicon using light. The smaller the features, the more transistors fit on a chip, and the faster and more efficient it becomes. Shrinking features requires progressively shorter wavelengths of light, and the shortest used in production today, extreme ultraviolet (EUV) light at 13.5 nanometers, is produced by firing a laser at tin droplets tens of thousands of times per second to generate a light-emitting plasma. Stabilizing that process at production scale took decades of engineering. ASML is the only company that has achieved it, which is why every advanced chip manufacturer buys its machines and why export controls on ASML have become a primary policy lever.
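As a rough illustration of why wavelength matters, minimum printable feature size follows the Rayleigh criterion: resolution = k1 × wavelength / NA. The sketch below plugs in commonly cited ballpark values for deep ultraviolet (DUV) immersion and EUV tools; the k1 and numerical aperture figures are illustrative assumptions, not tool specifications.

# Rayleigh criterion: minimum printable feature ~ k1 * wavelength / NA.
# The k1 and NA values are commonly cited ballpark figures, used here
# for illustration only.

def min_feature_nm(wavelength_nm, numerical_aperture, k1):
    return k1 * wavelength_nm / numerical_aperture

duv = min_feature_nm(193, 1.35, k1=0.30)   # DUV immersion: 193 nm light
euv = min_feature_nm(13.5, 0.33, k1=0.40)  # EUV: 13.5 nm light

print(f"DUV immersion: ~{duv:.0f} nm features")  # ~43 nm
print(f"EUV:           ~{euv:.0f} nm features")  # ~16 nm

Moving from 193 nm to 13.5 nm light is what makes single-exposure patterning of the smallest features practical; with DUV alone, comparable geometries require costly multi-patterning.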

ASML is not the only critical equipment supplier. Building a chip involves hundreds of steps, including depositing material layers, etching patterns, and cleaning surfaces. Companies such as Applied Materials, Lam Research, and Tokyo Electron provide the tools for these processes.

Layer 2: Chip Fabrication

Designing a chip and manufacturing it are separate disciplines. NVIDIA, Google, and AMD design chips but do not build them. TSMC does. Nearly all frontier AI chips are fabricated in TSMC’s facilities.

The central performance metric is yield, the share of chips on each wafer that function correctly. Because wafers cost roughly the same to process regardless of outcome, higher yield directly lowers cost per working chip. TSMC leads on yield, which is its core advantage over Samsung and Intel. Both competitors are investing heavily to close the gap.
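A minimal sketch of the yield arithmetic, using the classic Poisson defect model; the wafer cost, die size, and defect densities below are illustrative assumptions, not foundry data.

import math

# Poisson yield model: yield = exp(-defect_density * die_area).
# All inputs are illustrative assumptions, not actual foundry figures.

wafer_cost = 20_000.0   # dollars per processed wafer (assumed)
dies_per_wafer = 70     # large AI dies per 300 mm wafer (assumed)
die_area_cm2 = 8.0      # ~800 mm^2 reticle-limited die (assumed)

def cost_per_good_die(defects_per_cm2):
    yield_fraction = math.exp(-defects_per_cm2 * die_area_cm2)
    good_dies = dies_per_wafer * yield_fraction
    return wafer_cost / good_dies  # fixed wafer cost spread over good dies

for d in (0.05, 0.10, 0.20):
    print(f"defect density {d:.2f}/cm^2 -> ${cost_per_good_die(d):,.0f} per good die")

At reticle-sized AI dies, even modest differences in defect density compound into large differences in cost per working chip, which is why yield leadership translates directly into pricing power.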

The concentration of production in Taiwan creates systemic risk. A disruption at TSMC would halt the entire industry. TSMC has begun diversifying geographically, with its Arizona facility entering high-volume production in early 2025 at yields comparable to Taiwan. A second Arizona fab is being equipped, with production targeted for 2027. Despite this, most advanced chip production remains in Taiwan.

Layer 3: Advanced Packaging

After fabrication, chips must be connected to memory and, in large systems, to other chips. For AI workloads, these connections must move data at extremely high speeds. Traditional methods that route signals through circuit boards are too slow.

Modern systems place chips and memory adjacent to each other and connect them with dense, short links. TSMC’s CoWoS (chip-on-wafer-on-substrate) technology is the leading approach and underpins NVIDIA’s most advanced systems.

Packaging is now a binding constraint. CoWoS capacity is effectively booked through 2027, and demand is expected to increase further with NVIDIA’s next platform in the second half of 2026. In many cases, fabricated chips sit waiting for packaging capacity before they can be delivered.

Layer 4: Memory

Processors can perform large volumes of computation but require continuous data input. If data delivery is too slow, compute resources sit idle. For AI workloads, memory bandwidth is as important as compute performance.
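A back-of-the-envelope way to see this is arithmetic intensity: the ratio of computation performed to bytes moved from memory. If a workload's intensity falls below the hardware's FLOPs-to-bandwidth ratio, the processor stalls waiting on memory. The hardware numbers below are ballpark figures for a modern accelerator, assumed for illustration.

# Roofline-style check: is a workload compute-bound or memory-bound?
# Hardware figures are ballpark assumptions for a modern accelerator.

peak_flops = 2.0e15        # ~2 PFLOP/s of low-precision compute
mem_bandwidth = 8.0e12     # ~8 TB/s of stacked-memory bandwidth

# Ridge point: FLOPs the chip can complete per byte fetched.
ridge = peak_flops / mem_bandwidth   # 250 FLOPs/byte

# Generating one token in a large language model reads every weight
# roughly once: ~2 FLOPs per parameter, ~1 byte per parameter with
# 8-bit weights, so intensity is ~2 FLOPs/byte, far below the ridge.
intensity = 2.0
utilization = min(1.0, intensity / ridge)
print(f"ridge point: {ridge:.0f} FLOPs/byte")
print(f"compute utilization when memory-bound: {utilization:.1%}")  # ~0.8%

Under these assumptions, single-stream inference uses well under one percent of peak compute: token speed is set by memory bandwidth, not FLOPs.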

AI systems use vertically stacked memory, known as high-bandwidth memory (HBM), to increase data throughput. This architecture allows significantly faster data movement than conventional memory. As of early 2026, the industry is transitioning to a new generation, HBM4, required for NVIDIA’s next platform. SK Hynix and Samsung began production in February 2026, while Micron is expanding capacity aggressively.
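A rough sense of the gap, using ballpark per-part figures that are assumptions rather than datasheet values:

# Rough aggregate-bandwidth comparison: stacked HBM vs. server DRAM.
# Per-part figures are ballpark assumptions, not datasheet values.

hbm_stacks = 8
hbm_gb_s_per_stack = 1000     # ~1 TB/s per stack
ddr_channels = 12
ddr_gb_s_per_channel = 50     # ~50 GB/s per DDR5 channel

print(f"HBM:  {hbm_stacks * hbm_gb_s_per_stack / 1000:.0f} TB/s aggregate")      # ~8 TB/s
print(f"DDR5: {ddr_channels * ddr_gb_s_per_channel / 1000:.1f} TB/s aggregate")  # ~0.6 TB/s

Roughly an order of magnitude separates the two, which is why accelerators pay the packaging premium to place stacked memory next to the die.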

All major producers have sold out their 2026 capacity. New supply, primarily from South Korea, will not materially increase availability until 2027. As a result, apparent GPU shortages are often driven by memory constraints rather than chip supply.

Layer 5: Chip Design

Chip design determines what a processor does and how it is optimized. NVIDIA’s GPUs are designed for parallel computation, making them well suited for AI training and inference.

The current leading platform is NVIDIA’s Blackwell generation, which entered volume production through 2025, followed by a mid-cycle update (Blackwell Ultra) with higher memory capacity. The next platform, Vera Rubin, is expected in the second half of 2026, with significantly higher memory bandwidth and compute performance.

AMD competes with its Instinct line and has secured a major partnership with Meta. Google and Amazon design custom chips for internal use. OpenAI has diversified its hardware sourcing across NVIDIA and AMD.

NVIDIA’s position is reinforced by CUDA, its software ecosystem, which has become the default standard for AI development. This creates switching costs: moving to alternative hardware requires adapting tools, retraining teams, and rebuilding accumulated knowledge. NVIDIA also provides networking hardware, allowing it to control both compute and communication within AI systems.

Layer 6: Networking and Interconnect

At large scale, system performance depends on how efficiently chips communicate. Training runs involve thousands of processors exchanging data continuously. Weak interconnects reduce effective throughput regardless of raw compute capacity.
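A rough model makes this concrete. In data-parallel training, each step ends with an all-reduce of the gradients, and a ring all-reduce moves about 2(N-1)/N times the gradient size through each link. The sketch below estimates per-step communication time under assumed link speeds; the model size, GPU count, and bandwidths are all illustrative.

# Estimate per-step gradient all-reduce time in data-parallel training.
# Ring all-reduce traffic per GPU: ~2 * (N-1)/N * data_size bytes.
# Model size, GPU count, and bandwidths are illustrative assumptions.

def allreduce_seconds(data_bytes, n_gpus, link_bytes_per_s):
    traffic = 2 * (n_gpus - 1) / n_gpus * data_bytes
    return traffic / link_bytes_per_s

grad_bytes = 70e9 * 2   # 70B parameters, 16-bit gradients: 140 GB
n = 1024                # participating GPUs (assumed)

for name, bw in [("fast interconnect, ~900 GB/s per link", 900e9),
                 ("commodity network, ~50 GB/s per link", 50e9)]:
    print(f"{name}: {allreduce_seconds(grad_bytes, n, bw):.2f} s per step")

In practice communication overlaps with computation, but the order-of-magnitude gap shows why link speed, not raw FLOPs, can bound step time at scale.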

NVIDIA dominates this layer with NVLink, which links chips within a rack, and InfiniBand, which connects racks across the data center. Together, these systems form the communication backbone of AI clusters and shape feasible training architectures.

Lower-cost Ethernet-based alternatives are emerging, but NVIDIA remains the default for high-performance workloads. This strengthens its competitive position beyond chip-level performance.

Layer 7: Data Centers

AI hardware operates in large facilities requiring land, construction, cooling systems, and sustained electricity supply. A single large cluster can consume power comparable to a small city.
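The comparison follows from simple arithmetic. The per-accelerator power, overhead factor (PUE), and household draw below are ballpark assumptions.

# Back-of-the-envelope cluster power estimate. All inputs are
# illustrative assumptions, not figures for any specific facility.

gpus = 100_000
watts_per_gpu = 1200     # accelerator plus its share of server power
pue = 1.3                # overhead: cooling, conversion, distribution

it_load_mw = gpus * watts_per_gpu / 1e6
facility_mw = it_load_mw * pue

avg_home_kw = 1.2        # average continuous draw of a U.S. household
homes = facility_mw * 1000 / avg_home_kw

print(f"IT load:       {it_load_mw:.0f} MW")         # 120 MW
print(f"facility load: {facility_mw:.0f} MW")        # ~156 MW
print(f"equivalent to ~{homes:,.0f} average homes")  # ~130,000

A hundred-thousand-GPU cluster at these assumptions draws on the order of 150 MW continuously, the load of well over a hundred thousand homes.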

Investment levels in 2026 are unprecedented. Amazon, Google, Microsoft, and Meta are collectively spending hundreds of billions of dollars, primarily on AI infrastructure. Additional projects include Stargate, a joint venture building facilities across multiple U.S. states, and Meta’s large-scale campus in Louisiana.

Power availability is the primary constraint. Securing grid connections can take years. Companies are pursuing direct agreements with nuclear providers, investing in small modular reactors, and deploying on-site generation. Cooling is also a constraint, with newer systems requiring liquid cooling and large volumes of water. Location decisions are driven primarily by access to power.

Layer 8: Cloud Providers and Neoclouds

Most companies access compute by renting it. AWS, Microsoft Azure, and Google Cloud operate global data center networks and sell compute capacity. Each is developing custom chips (Google’s TPUs, Amazon’s Trainium, Microsoft’s Maia) to reduce dependence on NVIDIA and improve margins.

New entrants such as CoreWeave, Lambda Labs, and Crusoe have scaled by providing GPU capacity more quickly than traditional cloud providers. CoreWeave, in particular, supplies capacity to multiple major technology firms simultaneously.

This layer determines access, speed of provisioning, and pricing for compute, making it the immediate constraint for most AI companies.

Layer 9: Model Developers

At the top are companies building AI models. Some, such as Meta, own infrastructure and purchase hardware directly. Others, including OpenAI and Anthropic, rely on cloud providers and neoclouds, although OpenAI has diversified across multiple suppliers.

There are two distinct cost structures: training and inference. Training is a large, periodic expense. Inference is an ongoing cost that scales with usage. As AI systems move into widespread deployment, inference increasingly dominates total cost.
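A stylized comparison of the two cost structures; every figure below is an assumption chosen for illustration, not data for any real model or provider.

# Stylized training-vs-inference cost model. Every figure is an
# illustrative assumption, not data for any real model or provider.

gpu_hour_price = 3.0             # dollars per GPU-hour (rented)

# Training: one large, periodic expense.
training_gpu_hours = 20_000_000  # e.g. ~10k GPUs running ~83 days
training_cost = training_gpu_hours * gpu_hour_price

# Inference: ongoing cost that scales with usage.
tokens_per_day = 1e12            # served tokens per day at scale
tokens_per_gpu_hour = 5e6        # serving throughput per GPU
daily_inference = tokens_per_day / tokens_per_gpu_hour * gpu_hour_price

print(f"training run: ${training_cost/1e6:.0f}M one-time")
print(f"inference:    ${daily_inference/1e3:.0f}k per day")
print(f"inference spend matches the training run after "
      f"{training_cost/daily_inference:.0f} days")

Under these assumptions, a $60M training run is overtaken by inference spending after roughly three months of high-volume service, which is why deployment, not training, increasingly drives total cost.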

For companies without owned infrastructure, both costs are determined by suppliers and reflect persistent constraints across the underlying layers.
