AI Supercluster InfiniBand Fabric Solution

Time: 2025-12-25 11:16:10
Written By: Admin


High-Performance Computing InfiniBand Networking Solution

8 × GB200 NVL72 Cluster with 1.6T OSFP InfiniBand Architecture

Complete Bill of Materials and Technical Specifications for AI/HPC Workloads


Solution Overview

This solution provides a fully optimized InfiniBand networking architecture for 8 GB200 NVL72 supercomputing nodes, designed for AI training, scientific simulations, and high-performance computing workloads. The implementation features a two-tier leaf-spine topology with 1:1 non-blocking design, utilizing 1.6T OSFP optical modules for switch-to-switch connectivity and 400G OSFP modules for server-to-leaf connections.


Network Architecture & Topology

Network Topology

Architecture: Two-tier leaf-spine InfiniBand fabric

Oversubscription Ratio: 1:1 (non-blocking)

Total Compute Nodes: 8 × GB200 NVL72 systems

Total GPU Count: 576 GPUs (72 per system)

Fabric Bandwidth: 1.6T per link, aggregate 12.8T bisection bandwidth

Layer Configuration

Leaf Layer: 3 × NVIDIA Q3400-RA switches (2 minimal, 3 recommended)

Spine Layer: 1 × NVIDIA Q3400-RA switch

Server Connectivity: 400G ConnectX-7 NICs to leaf switches

Switch Connectivity: 1.6T OSFP links between leaf and spine

Total Switch Ports: 576 × 400G server ports + 216 × 1.6T fabric ports
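
The headline counts in this list can be re-derived with a few lines of arithmetic. The sketch below is purely illustrative (the variable names are ours, not part of any NVIDIA tooling) and simply multiplies out the figures quoted above; the server-facing bandwidth it prints is a derived total, not a vendor specification.

```python
# Illustrative port accounting for the 8 x GB200 NVL72 fabric.
# All inputs are the figures quoted above; nothing here queries real hardware.
SYSTEMS = 8                  # GB200 NVL72 systems
GPUS_PER_SYSTEM = 72         # one ConnectX-7 400G NIC per GPU (per the BOM below)
NIC_SPEED_GBPS = 400         # NDR InfiniBand per NIC
FABRIC_LINKS = 216           # 1.6T OSFP switch-to-switch links

total_gpus = SYSTEMS * GPUS_PER_SYSTEM          # 576
server_ports_400g = total_gpus                  # 576 x 400G server ports
server_facing_tbps = server_ports_400g * NIC_SPEED_GBPS / 1000

print(f"GPUs / 400G server ports : {total_gpus}")
print(f"1.6T fabric ports        : {FABRIC_LINKS}")
print(f"Server-facing bandwidth  : {server_facing_tbps:.1f} Tbps")
```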


Complete Bill of Materials (BOM)

Optical Transceiver Requirements

| Component Type | Specification | Quantity | Unit Location | Total Ports |
|---|---|---|---|---|
| 1.6T OSFP InfiniBand Transceiver | 1.6T OSFP, XDR InfiniBand, 0-100m over MMF | 216 | Switch-to-switch links | 216 ports |
| Leaf Layer (Uplink) | 1.6T OSFP to Spine switches | 72 | 3 Leaf switches × 24 uplinks | 72 ports |
| Leaf Layer (Downlink) | 1.6T OSFP to Spine switches | 72 | 3 Leaf switches × 24 downlinks | 72 ports |
| Spine Layer | 1.6T OSFP to Leaf switches | 72 | 1 Spine switch × 72 ports | 72 ports |
| 400G OSFP InfiniBand Transceiver | 400G OSFP, NDR InfiniBand, 0-100m over MMF | 576 | Server-to-leaf connections | 576 ports |
| ConnectX-7 NIC Transceivers | 400G OSFP, per NIC requirement | 576 | 72 NICs × 8 servers | 576 ports |

Switch & NIC Hardware Requirements

| Component Type | Model/Specification | Quantity | Configuration Details |
|---|---|---|---|
| InfiniBand Leaf Switches | NVIDIA Q3400-RA (36-port 1.6T OSFP) | 3 | 24 downlinks to servers, 12 uplinks to spine (per switch) |
| InfiniBand Spine Switch | NVIDIA Q3400-RA (36-port 1.6T OSFP) | 1 | 72 ports total (3 × 24 from leaf switches) |
| 400G InfiniBand NICs | NVIDIA ConnectX-7 VPI (400G OSFP) | 576 | 72 per GB200 system, 8 systems total |
| InfiniBand Cables | MTP/MPO-24 to 6×LC duplex, 5-30m | 216 | For 1.6T OSFP connections (24 fibers per cable) |
| 400G DAC/AOC Cables | 400G OSFP to OSFP, 3-5m | 576 | Alternative to optical for short reaches (<5m) |
| Optical Fiber Panels | 96-port LC duplex, 1RU | 6 | For structured fiber management |

Detailed Configuration Breakdown

Leaf Layer Configuration (3 Switches)

Switch Model: NVIDIA Q3400-RA with 36 × 1.6T OSFP ports

Server Connections: 24 × 400G downlinks per switch (connects to 24 ConnectX-7 NICs)

Spine Connections: 12 × 1.6T uplinks per switch (connects to spine switch)

Total Port Utilization: 24 + 12 = 36 ports (fully utilized)

Connectivity per Leaf: Each leaf switch connects to all 8 GB200 nodes (3 leaves × 8 = 24 node-to-leaf attachments across the fabric)
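
The per-leaf port budget quoted above (24 server downlinks plus 12 spine uplinks on a 36-port chassis) can be sanity-checked with a trivial sketch; the snippet below is illustrative only and uses just the figures from this list.

```python
# Per-leaf port budget check, using only the figures listed above.
LEAF_PORTS_TOTAL = 36    # OSFP ports per Q3400-RA as specified above
SERVER_DOWNLINKS = 24    # per leaf switch
SPINE_UPLINKS = 12       # per leaf switch

used = SERVER_DOWNLINKS + SPINE_UPLINKS
assert used <= LEAF_PORTS_TOTAL, "leaf switch port budget exceeded"
print(f"Leaf ports used: {used} of {LEAF_PORTS_TOTAL}")  # 36 of 36 -> fully utilized
```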


Spine Layer Configuration (1 Switch)

Switch Model: NVIDIA Q3400-RA with 36 × 1.6T OSFP ports

Leaf Connections: 72 × 1.6T downlinks (24 from each of 3 leaf switches)

Port Utilization: 72 ports used on spine switch

Redundancy: Optional second spine for N+1 redundancy (adds 72 more 1.6T transceivers)


Server Configuration (8 × GB200 NVL72)

NICs per Server: 72 × ConnectX-7 400G VPI adapters

Transceivers per Server: 72 × 400G OSFP InfiniBand optical modules

Cabling per Server: 72 × fiber connections to leaf switches

Port Mapping: Each server connects to all 3 leaf switches (24 ports per leaf)

Bandwidth per Server: 400G × 72 = 28.8Tbps theoretical per server
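
The per-server figure above is straightforward arithmetic on the NIC count; a minimal conversion sketch (the helper names are ours):

```python
# Per-server injection bandwidth from the NIC count quoted above.
NICS_PER_SERVER = 72
NIC_SPEED_GBPS = 400     # NDR InfiniBand per ConnectX-7 port

per_server_gbps = NICS_PER_SERVER * NIC_SPEED_GBPS
print(f"{per_server_gbps / 1000:.1f} Tbps per server")       # 28.8 Tbps
print(f"{per_server_gbps / 8 / 1000:.1f} TB/s per server")   # 3.6 TB/s
```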


Performance Specifications

Network Performance Metrics

Bisection Bandwidth: 12.8 Tbps full non-blocking capacity

Latency: < 600ns switch-to-switch, < 1μs end-to-end

Message Rate: 200 million messages per second per port

Fabric Bandwidth: 1.6T per link, aggregate 345.6T across fabric

GPU-to-GPU Bandwidth: 400G per GPU, full bisection bandwidth
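
The aggregate figure above is simply the fabric link count multiplied by the per-link rate; the short roll-up below reproduces it from the quantities in this document (variable names are ours):

```python
# Fabric-level bandwidth roll-up from the link counts quoted in this solution.
FABRIC_LINKS = 216       # 1.6T OSFP switch-to-switch links
LINK_GBPS = 1600         # per 1.6T OSFP link

print(f"Per link : {LINK_GBPS / 1000:.1f} Tbps ({LINK_GBPS // 8} GB/s)")  # 1.6 Tbps (200 GB/s)
print(f"Aggregate: {FABRIC_LINKS * LINK_GBPS / 1000:.1f} Tbps")           # 345.6 Tbps
```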


Detailed Performance Table

| Performance Metric | Specification | Value | Industry Comparison |
|---|---|---|---|
| Switch ASIC Bandwidth | Q3400 switch capacity | 25.6 Tbps | Industry-leading for XDR InfiniBand |
| Port Speed | 1.6T OSFP interface | 1.6 Tbps (200 GB/s) | Next generation beyond 800G |
| Fabric Latency | End-to-end latency | < 1 μs | Optimal for AI training synchronization |
| Power per Port | 1.6T OSFP transceiver | 18-22W | Efficient for high-density deployments |
| Cooling Requirement | Per switch chassis | 3-5 kW | Liquid cooling recommended |

Optical Transceiver Specifications

1.6T OSFP InfiniBand Transceiver Details

Form Factor: OSFP (Octal Small Form Factor Pluggable)

Data Rate: 1.6 Tbps (8 × 200G lanes)

Protocol: XDR InfiniBand (200G per lane)

Reach: 0-100m over OM4 MMF, 0-2km over SMF

Wavelength: 850nm VCSEL for MMF, 1310nm for SMF

Power Consumption: 18-22W typical

Operating Temperature: 0°C to 70°C commercial

Compatibility: NVIDIA Q3400-RA switches (Quantum-X800 platform)


400G OSFP InfiniBand Transceiver Details

Form Factor: OSFP (Octal Small Form Factor Pluggable)

Data Rate: 400 Gbps (4 × 100G lanes)

Protocol: NDR InfiniBand (400G per port)

Reach: 0-100m over OM4 MMF, 0-2km over SMF

Wavelength: 850nm VCSEL for MMF, 1310nm for SMF

Power Consumption: 10-12W typical

Operating Temperature: 0°C to 70°C commercial

Compatibility: NVIDIA ConnectX-7 NICs, Quantum-2 ASIC
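
Taken together, the cable and reach options quoted in this solution (DAC for runs up to about 5m, 850nm optics over OM4 MMF up to 100m, 1310nm optics over SMF up to 2km) suggest a simple selection rule. The helper below is a hypothetical sketch of that rule, not an official compatibility tool; any specific part should still be validated against NVIDIA's compatibility lists.

```python
# Hypothetical media-selection helper based on the reach figures quoted above.
def select_media(distance_m: float) -> str:
    """Suggest a cabling option for a single OSFP link of the given length."""
    if distance_m <= 5:
        return "OSFP DAC (passive copper, lowest cost and power)"
    if distance_m <= 100:
        return "850nm optical module over OM4 MMF"
    if distance_m <= 2000:
        return "1310nm optical module over SMF"
    raise ValueError("beyond the 2km reach quoted for these modules")

for d in (2, 30, 500):
    print(f"{d:>4} m -> {select_media(d)}")
```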


Configuration Options & Alternatives

Alternative Deployment Scenarios

Minimal Configuration: 2 Leaf + 1 Spine

Redundant Configuration: 3 Leaf + 2 Spine (N+1)

All-To-All Configuration: Direct ConnectX-7 to Spine

Multi-rail Configuration: Dual NICs per GPU

Extended Reach: SMF transceivers for >100m

Cost Optimized: 400G DAC for <3m connections

| Configuration | Transceiver Count | Cost Impact | Performance Impact | Recommended Use |
|---|---|---|---|---|
| Minimal (2 Leaf) | 144 × 1.6T + 576 × 400G | -25% switch cost | 2:1 oversubscription | Budget-constrained AI training |
| Recommended (3 Leaf) | 216 × 1.6T + 576 × 400G | Baseline | 1:1 non-blocking | Production AI/HPC clusters |
| Redundant (3+2) | 288 × 1.6T + 576 × 400G | +33% switch cost | N+1 fault tolerance | Mission-critical workloads |
| Dual-rail (2×NIC) | 216 × 1.6T + 1,152 × 400G | +100% NIC cost | 2× bandwidth per GPU | Extreme performance requirements |
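
The oversubscription column in the table follows from the ratio of server-facing (downlink) bandwidth to fabric (uplink) bandwidth at the leaf layer. The sketch below shows that calculation in general form; the example port counts are illustrative assumptions, not an exact model of the configurations above.

```python
# Generic leaf-layer oversubscription calculation.
# Oversubscription = total downlink (server-facing) bandwidth / total uplink (fabric) bandwidth.
def oversubscription(downlinks: int, downlink_gbps: int,
                     uplinks: int, uplink_gbps: int) -> float:
    return (downlinks * downlink_gbps) / (uplinks * uplink_gbps)

# Illustrative example (hypothetical per-leaf counts, not the exact BOM above):
# 64 x 400G server ports against 8 x 1.6T uplinks gives a 2:1 ratio,
# while doubling the uplinks to 16 restores 1:1 non-blocking.
print(oversubscription(64, 400, 8, 1600))    # 2.0 -> 2:1 oversubscribed
print(oversubscription(64, 400, 16, 1600))   # 1.0 -> 1:1 non-blocking
```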

Key Technical Considerations

Critical Implementation Factors

Thermal Management: 1.6T OSFP transceivers generate significant heat (18-22W each); ensure adequate cooling

Power Requirements: Each Q3400-RA switch consumes 3-5kW; plan power distribution accordingly

Cable Management: 576 fiber connections require structured cabling and proper bend radius protection

Compatibility Testing: All optical transceivers must be validated with NVIDIA switches and NICs

Firmware Management: Consistent firmware levels across all switches and NICs for optimal performance

Monitoring & Management: Implement NVIDIA UFM or similar for fabric management and monitoring
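
The power and thermal figures above (18-22W per 1.6T module, 10-12W per 400G module, 3-5kW per switch chassis) can be rolled into a worst-case facility budget. The sketch below does that arithmetic with the quantities from the BOM in this solution; it is a planning aid only, not a substitute for vendor power calculators.

```python
# Worst-case network power budget from the per-component figures quoted in this solution.
# Conservatively treats module power as additive to the chassis figure.
TRANSCEIVERS_1_6T = 216
TRANSCEIVERS_400G = 576
SWITCHES = 4                 # 3 leaf + 1 spine (recommended configuration)

WATTS_1_6T = 22              # upper end of the 18-22W range above
WATTS_400G = 12              # upper end of the 10-12W range above
WATTS_SWITCH = 5000          # upper end of the 3-5kW range above

total_w = (TRANSCEIVERS_1_6T * WATTS_1_6T
           + TRANSCEIVERS_400G * WATTS_400G
           + SWITCHES * WATTS_SWITCH)
print(f"Worst-case network power: {total_w / 1000:.1f} kW")   # ~31.7 kW
```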
