Why AI servers are so expensive
*(Figure: from NVIDIA Doc Hub)*
| Component | Description | Approx. Cost (USD) |
|---|---|---|
| GPU/Accelerator | NVIDIA H100 SXM / AMD MI300X | $25,000–$35,000 each |
| Motherboard/CPU | Dual-socket Xeon / EPYC + PCIe 5.0 | $4,000–$8,000 |
| System RAM | 1 TB DDR5 ECC RDIMM | ~$5,000–$7,000 |
| NVMe SSDs | 8 TB enterprise-grade | ~$1,000–$2,000 |
| Power supply + Cooling | High-efficiency PSU + liquid/air cooling | $2,000–$5,000 |
| Chassis + Networking | Rackmount server case, NICs, cabling | $2,000–$4,000 |
A further breakdown of the H100 GPU's HBM (Hopper architecture):
- HBM Type: HBM3 (SXM variant; the PCIe card uses HBM2e)
- Number of Stacks: 5 stacks
- Capacity per Stack: 16 GB
- Total HBM Memory: 80 GB
- Bandwidth: Up to 3.35 TB/s
*(Figure: from SpringerLink)*
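The capacity figure is just stacks × per-stack capacity; a tiny sanity-check sketch using only the numbers listed above:

```python
# Sanity check of the H100 SXM HBM figures listed above.
stacks = 5                 # active HBM3 stacks on the package
gb_per_stack = 16          # capacity per stack, GB
total_bw_tbs = 3.35        # total memory bandwidth, TB/s

total_hbm_gb = stacks * gb_per_stack
bw_per_stack_tbs = total_bw_tbs / stacks

print(f"Total HBM: {total_hbm_gb} GB")                       # 80 GB
print(f"Per-stack bandwidth: ~{bw_per_stack_tbs:.2f} TB/s")  # ~0.67 TB/s
```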
➡️ Total per node (8-GPU AI box):
💰 $250,000–$400,000+ (depending on quantity, configuration, and vendor)
⚠️ For enterprise-grade setups (e.g., DGX H100 or the AMD Instinct platform), expect $500K+ per node including support, the software stack (CUDA, ROCm), and warranty.
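Summing the component table above shows roughly where that base figure comes from. A minimal sketch using the table's own ranges (approximate street prices; integration, support, and software push real quotes higher):

```python
# Rough 8-GPU node cost estimate built from the component ranges in the table above.
components_usd = {
    "8x GPU (H100 SXM)":    (8 * 25_000, 8 * 35_000),
    "Motherboard + CPUs":   (4_000, 8_000),
    "1 TB DDR5 ECC":        (5_000, 7_000),
    "8 TB NVMe":            (1_000, 2_000),
    "PSU + cooling":        (2_000, 5_000),
    "Chassis + networking": (2_000, 4_000),
}

low = sum(lo for lo, _ in components_usd.values())
high = sum(hi for _, hi in components_usd.values())
print(f"Estimated 8-GPU node cost: ${low:,} - ${high:,}")
# -> roughly $214,000 - $306,000 in parts, before integration, support, and software
```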
🔁 GB200 NVL72 vs DGX H100 — Side-by-Side Comparison
| Spec / Feature | GB200 NVL72 Rack | DGX H100 (1 Node) |
|---|---|---|
| GPUs | 72× B200 GPUs (via 36 SuperChips) | 8× H100 SXM GPUs |
| HBM Memory | ~13.5 TB HBM3e (~192 GB per GPU) | 640 GB HBM3 (80 GB per GPU) |
| CPU | 36× Grace CPUs (one per superchip) | 2× Intel Xeon (typical setup) |
| CPU Memory | ~17 TB LPDDR5X | 1–2 TB DDR5 ECC |
| Interconnect | 5th-gen NVLink (up to 130 TB/s aggregate via NVSwitch) | NVLink 4 (~900 GB/s per GPU) |
| Total Performance | ~1,440 PFLOPS FP4 / ~720 PFLOPS FP8 (peak) | ~32 PFLOPS FP8 per node |
| AI Model Size Fit | Up to ~1 trillion parameter models | Fits ~175B model in 16-bit precision |
| Power Usage | ~120 kW per rack | ~10–12 kW per node |
| Cost (USD) | ~$3.0M–3.5M per rack | ~$300K–400K per node |
🧮 Rough Equivalence (Compute Power)
- 1× GB200 NVL72 rack is roughly equivalent to 20–25× DGX H100 nodes in raw AI compute
- 72 B200 GPUs ≈ 9 full DGX H100 nodes (8 GPUs each)
- But B200 GPUs are 2–3× faster per GPU than H100 at FP8/FP4
- Faster NVLink and unified CPU/GPU memory also help the rack scale better
🎯 Rule of Thumb (a quick arithmetic check follows below):
1× GB200 rack ≈ 160–200 H100 GPUs in real-world AI training performance
(depending on workload, precision, and memory scaling)
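The arithmetic behind that rule of thumb, as a rough sketch (the 2–3× per-GPU speedup is the assumption carried over from the bullets above; real-world scaling lands inside this band):

```python
# Back-of-the-envelope H100 equivalence of one GB200 NVL72 rack.
b200_gpus_per_rack = 72
speedup_vs_h100 = (2.0, 3.0)   # assumed per-GPU advantage of B200 over H100 at FP8/FP4

h100_equiv = [b200_gpus_per_rack * s for s in speedup_vs_h100]
dgx_nodes_equiv = [g / 8 for g in h100_equiv]   # 8 H100 GPUs per DGX node

print(f"~{h100_equiv[0]:.0f}-{h100_equiv[1]:.0f} H100 GPUs")                 # ~144-216
print(f"~{dgx_nodes_equiv[0]:.0f}-{dgx_nodes_equiv[1]:.0f} DGX H100 nodes")  # ~18-27
```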
🧠 Memory Capacity & Model Size
| Capability | DGX H100 Node | GB200 NVL72 Rack |
|---|---|---|
| Can train GPT-3 (175B) | ✅ (multi-node setup) | ✅ (single rack) |
| Can train GPT-4 | ❌ (not directly) | ✅ (with tuning) |
| Can train 1T+ model | ❌ | ✅ (fits in memory) |
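One way to see why those rows land where they do: at 16-bit precision the weights alone take 2 bytes per parameter, and training typically needs several times that for gradients and optimizer state. A rough sketch, assuming ~16 bytes per parameter (a common rule of thumb; activations and parallelism overheads are ignored) and counting HBM plus CPU memory as the fast-memory pool:

```python
# Rough check: does a model's training-time footprint fit in a system's fast memory?
def fits(params_billions: float, fast_memory_tb: float, bytes_per_param: int = 16) -> bool:
    needed_tb = params_billions * 1e9 * bytes_per_param / 1e12
    return needed_tb <= fast_memory_tb

DGX_H100_TB = 0.64 + 2.0    # 640 GB HBM + ~2 TB DDR5 (single node)
GB200_TB = 13.5 + 17.0      # ~13.5 TB HBM3e + ~17 TB LPDDR5X (one rack)

for name, params_b in [("GPT-3 (175B)", 175), ("1T-parameter model", 1000)]:
    print(f"{name}: DGX H100 node fits={fits(params_b, DGX_H100_TB)}, "
          f"GB200 NVL72 rack fits={fits(params_b, GB200_TB)}")
# GPT-3 needs ~2.8 TB -> multi-node territory on DGX H100, one rack on GB200
# A 1T model needs ~16 TB -> only the GB200 rack holds it in fast memory
```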
💵 Cost Equivalence
| System | Rough Performance Equivalent | Cost (USD) |
|---|---|---|
| GB200 NVL72 | ~20× DGX H100 nodes | ~$3.0–3.5M |
| 20× DGX H100 | ~same as one GB200 rack | ~$6.0–8.0M |
So:
- GB200 NVL72 is ~2× more cost-effective per unit of compute/memory than scaling out with DGX H100s (rough calculation below)
- It offers a better memory hierarchy, higher interconnect bandwidth, and lower latency
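Plugging in the midpoints of the ranges above shows where the ~2× figure comes from (rough, list-price-style numbers, not vendor quotes):

```python
# Rough cost-effectiveness ratio using the midpoints of the ranges above.
gb200_rack_usd = 3.25e6      # ~$3.0-3.5M per GB200 NVL72 rack
dgx_h100_usd = 0.35e6        # ~$300-400K per DGX H100 node
equivalent_nodes = 20        # one rack ~ 20 DGX H100 nodes in compute

dgx_cluster_usd = equivalent_nodes * dgx_h100_usd
print(f"20x DGX H100: ${dgx_cluster_usd / 1e6:.1f}M vs one GB200 rack: ${gb200_rack_usd / 1e6:.2f}M")
print(f"Cost advantage: ~{dgx_cluster_usd / gb200_rack_usd:.1f}x")   # ~2.2x
```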
✅ Summary:
| Category | LLM Training | Inference at Scale | Edge Inference |
|---|---|---|---|
| Compute Demand | 🔥 Extremely High | 🔷 High | 🧊 Low |
| Memory Demand | 🔥 Terabytes | 🔷 Hundreds of GBs | 🧊 MBs to GBs |
| Latency Target | ❌ Not real-time | ✅ Low latency | ✅ Real-time / Instant |
| Deployment | Cloud superclusters | Scalable cloud APIs | Phones, devices, IoT |
| Cost Profile | 💸 CapEx + OpEx | 💰 Mostly OpEx | 💵 Low, embedded |
| Typical Models | GPT-4, LLaMA-3 | Claude, ChatGPT | LLaMA.cpp, Gemini Nano |
| Optimization | Data & model parallelism | Quantization + batching | Distillation + pruning |
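To make the edge column's "MBs to GBs" concrete, a quick weights-only estimate of quantized model footprints (rough numbers; KV cache and runtime overhead not included, and the model sizes are just illustrative):

```python
# Approximate on-device size of quantized model weights (weights only).
def weights_gb(params_billions: float, bits_per_param: int) -> float:
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

for params_b in (3, 7):
    for bits in (16, 8, 4):
        print(f"{params_b}B model @ {bits}-bit: ~{weights_gb(params_b, bits):.1f} GB")
# A 7B model drops from ~14 GB at 16-bit to ~3.5 GB at 4-bit,
# which is what makes phone/edge deployment with LLaMA.cpp-style runtimes feasible.
```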

