Why AI servers are so expensive

 

from NVIDIA Doc Hub

Cost breakdown of a DGX H100-class node: approximate hardware cost (as of mid-2025)

| Component | Description | Approx. Cost (USD) |
| --- | --- | --- |
| GPU/Accelerator | NVIDIA H100 SXM / AMD MI300X | $25,000–$35,000 each |
| Motherboard/CPU | Dual-socket Xeon / EPYC + PCIe 5.0 | $4,000–$8,000 |
| System RAM | 1 TB DDR5 ECC RDIMM | ~$5,000–$7,000 |
| NVMe SSDs | 8 TB enterprise-grade | ~$1,000–$2,000 |
| Power supply + cooling | High-efficiency PSU + liquid/air cooling | $2,000–$5,000 |
| Chassis + networking | Rackmount server case, NICs, cabling | $2,000–$4,000 |

A further breakdown of the H100 GPU (Hopper architecture, SXM variant)
  • HBM Type: HBM3 (the PCIe card uses HBM2e)
  • Number of Stacks: 5 stacks
  • Capacity per Stack: 16 GB
  • Total HBM Memory: 80 GB
  • Bandwidth: Up to 3.35 TB/s

⚙️ H100 uses TSMC’s CoWoS packaging to integrate the Hopper GPU die with 5 HBM3 stacks on a silicon interposer.

from SpringerLink


➡️ Total per node (8-GPU AI box):
💰 $250,000 – $400,000+ (depending on quantity, config, and vendor)

⚠️ For enterprise-grade setups (e.g., DGX H100 or AMD Instinct Platform), expect $500K+ per node including support, software stack (like CUDA, ROCm), and warranty.
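
As a sanity check, here is a minimal back-of-envelope sketch that simply sums the component ranges from the table above for an 8-GPU node. The figures are the rough ranges quoted in this post, not vendor pricing; vendor margin, support, and the software stack are what push the final price toward $500K+.

```python
# Back-of-envelope node cost from the component table above.
# Figures are the rough ranges quoted in the post, not vendor pricing.

components_low_high = {
    "8x H100 SXM GPUs":     (8 * 25_000, 8 * 35_000),
    "Motherboard + CPUs":   (4_000, 8_000),
    "1 TB DDR5 ECC RAM":    (5_000, 7_000),
    "8 TB NVMe SSDs":       (1_000, 2_000),
    "PSU + cooling":        (2_000, 5_000),
    "Chassis + networking": (2_000, 4_000),
}

low = sum(lo for lo, _ in components_low_high.values())
high = sum(hi for _, hi in components_low_high.values())
print(f"Hardware-only node cost: ${low:,} - ${high:,}")
# -> roughly $214,000 - $306,000 before support, software, and margin
```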

If you move up to the next level with GB200, here’s a breakdown comparing NVIDIA’s GB200 NVL72 rack to a DGX H100 node—in terms of compute power, memory, cost, and scale:

🔁 GB200 NVL72 vs DGX H100 — Side-by-Side Comparison

| Spec / Feature | GB200 NVL72 Rack | DGX H100 (1 Node) |
| --- | --- | --- |
| GPUs | 72× B200 GPUs (via 36 Grace Blackwell superchips) | 8× H100 SXM GPUs |
| HBM Memory | ~13.5 TB HBM3e (~192 GB per GPU) | 640 GB HBM3 (80 GB per GPU) |
| CPU | 36× Grace CPUs (one per superchip) | 2× Intel Xeon (typical setup) |
| CPU Memory | ~17 TB LPDDR5X | 1–2 TB DDR5 ECC |
| Interconnect | 5th-gen NVLink (~130 TB/s aggregate NVSwitch bandwidth) | NVLink 4 (~900 GB/s per GPU) |
| Total Performance | ~1,440 PFLOPS FP4 / ~720 PFLOPS FP8 (theoretical peak) | ~32 PFLOPS FP8 per node |
| AI Model Size Fit | Up to ~1-trillion-parameter models | Fits a ~175B-parameter model at 16-bit precision |
| Power Usage | ~120 kW per rack | ~10–12 kW per node |
| Cost (USD) | ~$3.0M–3.5M per rack | ~$300K–400K per node |

🧮 Rough Equivalence (Compute Power)

  • 1x GB200 NVL72 rack is roughly equivalent to 20–25× DGX H100 nodes in raw AI compute

    • 72 B200 GPUs ≈ 9 full DGX nodes (8× GPUs each)

    • But B200 GPUs are 2–3× faster per GPU than H100 at FP8/FP4

    • Plus the faster NVLink domain and unified memory help it scale better

🎯 Rule of Thumb:
1x GB200 rack ≈ 160–200 H100 GPUs in real-world AI training performance
(depending on workload, precision, and memory scaling)
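
A minimal sketch of the arithmetic behind that rule of thumb, assuming the 2–3× per-GPU speedup quoted above (an assumption, not a benchmark):

```python
# Back-of-envelope H100-equivalence of one GB200 NVL72 rack,
# using the 2-3x per-GPU speedup assumed in this post.

b200_gpus_per_rack = 72
speedup_vs_h100 = (2.0, 3.0)   # assumed FP8/FP4 per-GPU speedup range
gpus_per_dgx = 8

h100_equiv = [b200_gpus_per_rack * s for s in speedup_vs_h100]
dgx_equiv = [g / gpus_per_dgx for g in h100_equiv]

print(f"~{h100_equiv[0]:.0f}-{h100_equiv[1]:.0f} H100-equivalents")  # ~144-216
print(f"~{dgx_equiv[0]:.0f}-{dgx_equiv[1]:.0f} DGX H100 nodes")      # ~18-27
```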


🧠 Memory Capacity & Model Size

| Capability | DGX H100 Node | GB200 NVL72 Rack |
| --- | --- | --- |
| Can train GPT-3 (175B) | ✅ (multi-node setup) | ✅ (single rack) |
| Can train GPT-4 | ❌ (not directly) | ✅ (with tuning) |
| Can train 1T+ model | ❌ | ✅ (fits in memory) |
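
To see why these fit or don't fit, here is a rough weights-only memory check. The 2-bytes-per-parameter assumption covers FP16/BF16 weights; actual training needs several times more per parameter for gradients and optimizer states, so these are optimistic lower bounds.

```python
# Rough "does the model fit in HBM?" check. Weights only; training with
# Adam-style optimizers needs several times more memory per parameter
# (gradients + optimizer states), so treat these as lower bounds.

def weights_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Memory for weights alone, in GB (FP16/BF16 = 2 bytes per parameter)."""
    return params_billion * 1e9 * bytes_per_param / 1e9

dgx_h100_hbm_gb = 640          # 8 x 80 GB
gb200_nvl72_hbm_gb = 13_500    # ~13.5 TB

for params in (175, 1000):     # GPT-3-scale and ~1T-parameter models
    need = weights_gb(params)
    print(f"{params}B params: ~{need:,.0f} GB weights | "
          f"fits DGX H100: {need < dgx_h100_hbm_gb} | "
          f"fits NVL72: {need < gb200_nvl72_hbm_gb}")
```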

💵 Cost Equivalence

| System | Rough Performance Equivalent | Cost (USD) |
| --- | --- | --- |
| GB200 NVL72 | ~20× DGX H100 nodes | ~$3.0–3.5M |
| DGX H100 ×20 | ~same as one GB200 rack | ~$6.0–8.0M |

So:

  • GB200 NVL72 is ~2× more cost-effective per unit of compute/memory than scaling up with DGX H100s

  • Offers better memory hierarchy, higher interconnect bandwidth, and lower latency
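
A quick sketch of that cost-per-compute comparison, using mid-range figures from the tables above; the H100-equivalence numbers are the rule-of-thumb assumptions from earlier, not measured benchmarks.

```python
# Cost per unit of (roughly normalized) H100-equivalent compute,
# using the mid-range figures quoted in this post.

gb200_rack_cost = 3.25e6        # ~$3.0-3.5M per rack
gb200_h100_equiv = 180          # ~160-200 H100-equivalents (rule of thumb above)

dgx_node_cost = 0.35e6          # ~$300-400K per node
dgx_nodes = 20
dgx_h100_equiv = dgx_nodes * 8  # 160 physical H100 GPUs

gb200_cost_per_equiv = gb200_rack_cost / gb200_h100_equiv
dgx_cost_per_equiv = dgx_node_cost * dgx_nodes / dgx_h100_equiv

print(f"GB200 NVL72:  ~${gb200_cost_per_equiv:,.0f} per H100-equivalent")  # ~$18K
print(f"DGX H100 x20: ~${dgx_cost_per_equiv:,.0f} per H100-equivalent")    # ~$44K
# -> roughly 2x+ better cost per unit of compute for the GB200 rack
```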


✅ Summary: 

If you know CoreWeave, the cloud computing company that came out of Bitcoin mining, that gives you some idea of what an AI server is. A state-of-the-art AI server packs tons of GPU and CPU power into a node/chassis/rack, equipped with huge memory to run those LLMs, not to mention the interconnect among the GPUs and nodes.

The big corporations drive the battlefield toward a large-capital game that small and mid-size companies have no way to keep up with, other than buying into the services they offer in the cloud.

The DeepSeek story broke that myth. There are different use cases, and different use cases have different demands. For example:

| Category | LLM Training | Inference at Scale | Edge Inference |
| --- | --- | --- | --- |
| Compute Demand | 🔥 Extremely High | 🔷 High | 🧊 Low |
| Memory Demand | 🔥 Terabytes | 🔷 Hundreds of GBs | 🧊 MBs to GBs |
| Latency Target | ❌ Not real-time | ✅ Low latency | ✅ Real-time / Instant |
| Deployment | Cloud superclusters | Scalable cloud APIs | Phones, devices, IoT |
| Cost Profile | 💸 CapEx + OpEx | 💰 Mostly OpEx | 💵 Low, embedded |
| Typical Models | GPT-4, LLaMA-3 | Claude, ChatGPT | LLaMA.cpp, Gemini Nano |
| Optimization | Data + parallelism | Quantization + batching | Distillation + pruning |
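
To make the edge-inference column concrete, here is a tiny sketch of why quantization matters: weight memory for an assumed 7B-parameter model at different precisions (weights only, ignoring the KV cache and activations).

```python
# Why quantization matters on the edge: weight memory for a 7B model
# at different precisions (weights only, ignoring KV cache and activations).

params = 7e9
for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    gb = params * bits / 8 / 1e9
    print(f"{name}: ~{gb:.1f} GB")   # ~14 GB -> ~7 GB -> ~3.5 GB
```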

For AI to be popular, it would have to grow outside of the cloud. That means the cost structure would have to come down a lot. As a matter of fact, NVIDIA sells the Jetson Orin Nano developer kit for $249. However, when you can make millions on a single deal, who would want to do the small jobs?

Don't assume these multinational CSPs are charitable souls building these expensive data centers for you to use for free. According to Phison's Pua, the quote they received was: only a few cents per prompt. He estimates that with thousands of employees each sending dozens of prompts a day, it adds up to serious money, and there is no upper cap; it doesn't become free past a certain amount!
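
For a rough sense of scale, here is an illustrative sketch of that bill. The employee count, prompts per day, and per-prompt price are assumptions for illustration, not figures from the quote mentioned above.

```python
# Illustrative monthly bill for API inference at "a few cents per prompt".
# Employee count, prompts/day, and per-prompt price are assumed for
# illustration only, not figures from the quote mentioned above.

cost_per_prompt = 0.03      # $0.03, i.e. "a few cents"
employees = 1_000
prompts_per_day = 30
working_days = 22

monthly = cost_per_prompt * employees * prompts_per_day * working_days
print(f"~${monthly:,.0f} per month")   # ~$19,800, with no upper cap
```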

