
GB200 NVL72 Cluster for Oceanographic Research Applications
Network Architecture for Ocean Current and Thermocline Simulation Research
This project deploys 8 GB200 NVL72 rack-scale systems for oceanographic research, including ocean current simulation, thermocline analysis, and other computationally intensive tasks. The cluster employs a highly heterogeneous network architecture, integrating three network protocols (NVLink, InfiniBand, and Ethernet) to achieve clean separation of the compute, storage, and management networks.
Scale Out Network: 400G InfiniBand, two-tier network architecture, 1:1 oversubscription ratio
Storage Network: 400G Ethernet, dedicated storage connectivity
Management Network: 10G Ethernet, dedicated cluster management
Scale Up Network: NVLink, connected via NVLink Switch Tray for direct GPU-to-GPU communication
Network Complexity: Three network protocols and four switch types (including the NVLink Switch Tray), a highly heterogeneous design
InfiniBand Network
Purpose: Scale Out parallel computing communication
Speed: 400G
Architecture: Two-tier Leaf-Spine architecture
Oversubscription Ratio: 1:1 (non-blocking design)
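The 1:1 oversubscription figure above can be stated as a simple ratio of server-facing to fabric-facing bandwidth. A minimal sketch, with the function name and the 72/72 port split chosen for illustration (the split mirrors the "72 uplink + 72 downlink" leaf-tier figures quoted later):

```python
# Sketch: computing a leaf switch's oversubscription ratio from its
# port allocation. The 72/72 split is illustrative, taken from the
# document's leaf-tier transceiver figures; it is not a wiring spec.

def oversubscription_ratio(downlink_ports: int, uplink_ports: int,
                           port_speed_gbps: int = 400) -> float:
    """Ratio of server-facing bandwidth to fabric-facing bandwidth.

    A ratio of 1.0 means the tier is non-blocking: every downlink
    can transmit at line rate simultaneously without contention.
    """
    return (downlink_ports * port_speed_gbps) / (uplink_ports * port_speed_gbps)

ratio = oversubscription_ratio(downlink_ports=72, uplink_ports=72)
print(f"{ratio:.0f}:1")  # 1:1 -> non-blocking
```

A 2:1 ratio (e.g. 96 downlinks over 48 uplinks) would halve the guaranteed cross-fabric bandwidth, which is why HPC fabrics like this one are built at 1:1.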
Ethernet Network
Purpose: Storage access and data transfer
Speed: 400G (storage), 10G (management)
Type: Separate storage and management networks
NVLink Network
Purpose: High-speed GPU-to-GPU Scale Up communication
Connection Method: Via NVLink Switch Tray
Features: Ultra-low latency, direct GPU memory access
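For a sense of scale, the aggregate NVLink bandwidth of one NVL72 rack can be estimated from NVIDIA's published fifth-generation NVLink figure of 1.8 TB/s per GPU. These numbers come from public GB200 NVL72 specifications, not from this document:

```python
# Sketch: rough aggregate NVLink bandwidth for one NVL72 rack.
# The 1.8 TB/s-per-GPU figure is NVIDIA's published fifth-generation
# NVLink specification (an external assumption, not stated above).

gpus_per_rack = 72
nvlink_bw_per_gpu_tbps = 1.8          # TB/s, bidirectional, per GPU
aggregate = gpus_per_rack * nvlink_bw_per_gpu_tbps
print(f"{aggregate:.1f} TB/s")        # ~130 TB/s of NVLink fabric per rack
```

This roughly three-orders-of-magnitude gap over a single 400G (0.05 TB/s) InfiniBand link is what motivates keeping Scale Up traffic inside the NVLink domain.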
Server-Side Network Configuration (Per Server)
| NIC Type | Quantity | Speed | Purpose |
|---|---|---|---|
| ConnectX-7 400G NIC | 72 units | 400G | InfiniBand network connectivity |
| BlueField-3 400G NIC | 18 units | 400G | Storage network connectivity |

| Transceiver Type | Quantity | Speed | Paired with NIC |
|---|---|---|---|
| 400G Optical Transceiver | 72 units | 400G | ConnectX-7 400G NIC |
| 200G Optical Transceiver | 36 units | 200G | BlueField-3 400G NIC |
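The transceiver quantities in the second table follow from the NIC quantities in the first. The 1:1 and 1:2 pairings below are inferred from the quantities given (72 → 72 and 18 → 36); they are assumptions about the cabling, not documented wiring rules:

```python
# Sketch: deriving the transceiver counts from the NIC counts above.
# The pairing ratios are inferred from the two tables (assumptions).

nics = {"ConnectX-7 400G": 72, "BlueField-3 400G": 18}
transceivers_per_nic = {
    "ConnectX-7 400G": 1,   # one 400G transceiver per ConnectX-7
    "BlueField-3 400G": 2,  # two 200G transceivers per BlueField-3
}

transceivers = {nic: qty * transceivers_per_nic[nic] for nic, qty in nics.items()}
print(transceivers)  # {'ConnectX-7 400G': 72, 'BlueField-3 400G': 36}
```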
InfiniBand Network Switches
Switch Model: NVIDIA Q3400-RA
Switch Quantity: 4 units (3 Leaf tier, 1 Spine tier)
| Location | Transceiver Type | Quantity | Speed | Purpose |
|---|---|---|---|---|
| Overall Configuration | 1.6T OSFP Optical Transceiver | 216 units | 1.6T | Total transceivers for IB network |
| Leaf Tier Switches | 1.6T OSFP Optical Transceiver | 144 units | 1.6T | 72 uplink + 72 downlink |
| Spine Tier Switch | 1.6T OSFP Optical Transceiver | 72 units | 1.6T | Connect to all Leaf switches |
Leaf Tier: 3 Q3400-RA switches, each connecting a portion of compute nodes
Spine Tier: 1 Q3400-RA switch, connecting all Leaf tier switches
Architecture Advantage: Two-tier non-blocking design, so any pair of nodes can communicate at full line rate
Oversubscription Ratio: 1:1, optimized for high-performance computing
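The per-tier 1.6T OSFP counts in the table above can be cross-checked against the stated 216-unit total, using only figures quoted in the document:

```python
# Sketch: reconciling the per-tier 1.6T OSFP transceiver counts with
# the 216-unit InfiniBand total. All counts are from the table above.

leaf_uplinks, leaf_downlinks = 72, 72
leaf_total = leaf_uplinks + leaf_downlinks   # 144 across the 3 leaf switches
spine_total = 72                             # spine ports facing the leaves
assert leaf_total + spine_total == 216

# Every leaf uplink terminates on a spine port, so leaf_uplinks equals
# spine_total -- matching uplink and spine capacity is exactly what
# makes the two-tier fabric non-blocking at 1:1.
print(leaf_total + spine_total)              # 216
```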
Storage Network Switches
| Switch Model | Quantity | Speed | Transceiver Configuration |
|---|---|---|---|
| NVIDIA SN5600 | 2 units | 800G | 800G OSFP Optical Transceiver × 72 units total |
Management/Internet Access Switches
| Switch Model | Type | Quantity | Speed | Purpose |
|---|---|---|---|---|
| NVIDIA SN2201 | Management Switches | 8 units | 10G | Cluster management network |
| NVIDIA SN2201 | Internet Access Switches | 8 units | 10G | External network access |
Total Optical Transceivers: 396 units across the entire cluster
High-Speed Transceivers: 1.6T OSFP (216 units) and 800G OSFP (72 units) for compute and storage networks
Protocol Support: The fabric spans both InfiniBand and Ethernet, so each transceiver must be qualified for the protocol of the network it serves
Compatibility Requirements: Must be compatible with NVIDIA ConnectX-7, BlueField-3, Q3400-RA, and SN5600 platforms
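The 396-unit cluster-wide total reconciles with the per-network transceiver figures quoted earlier in the document. A quick tally, using only quantities stated above:

```python
# Sketch: reconciling the 396-unit cluster-wide transceiver total with
# the per-network figures quoted elsewhere in the document.

transceivers = {
    "1.6T OSFP (InfiniBand switch tiers)": 216,
    "800G OSFP (SN5600 storage)": 72,
    "400G (ConnectX-7 server side)": 72,
    "200G (BlueField-3 server side)": 36,
}
total = sum(transceivers.values())
print(total)  # 396
```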
This case study demonstrates the following complexity characteristics:
Integration of three distinct network protocols: NVLink, InfiniBand, and Ethernet
Utilization of four switch types: Q3400-RA (InfiniBand), SN5600 (storage Ethernet), SN2201 (management Ethernet), and the NVLink Switch Tray
Physical separation of network functions: compute network, storage network, management network
Mixed speed requirements: from 10G on the management network to 1.6T OSFP links on the InfiniBand fabric
Ocean Current Simulation: Large-scale parallel computing to simulate global ocean circulation patterns
Thermocline Analysis: Research on ocean temperature stratification and its climate impacts
Ocean Dynamics Research: Simulation of tidal, wave, and current interactions
Climate Modeling Support: Providing high-resolution ocean data for global climate models
Research Value: Network architecture ensures efficient computation execution, shortening research cycles
Topology Optimization: Highly optimized topology tailored to oceanographic simulation workloads
Performance Critical: Low latency and high reliability essential for scientific computing
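The latency sensitivity claimed above comes from how grid-based ocean models run: the domain is decomposed across ranks, and every timestep each rank exchanges boundary ("halo") rows with its neighbours before it can advance. A toy 1-D decomposition with NumPy standing in for the MPI neighbour exchange (the function and layout are illustrative, not part of any real ocean model):

```python
# Sketch: the halo-exchange pattern behind ocean circulation models.
# Each subdomain keeps ghost rows (row 0 and row -1) mirroring its
# neighbours' edge rows; every timestep these are refreshed over the
# network, which is why per-message latency dominates at scale.

import numpy as np

def halo_exchange(subdomains: list) -> None:
    """Copy each subdomain's edge row into its neighbour's ghost row."""
    for i in range(len(subdomains) - 1):
        # lower ghost of i  <- neighbour's first interior row
        subdomains[i][-1] = subdomains[i + 1][1]
        # upper ghost of i+1 <- this subdomain's last interior row
        subdomains[i + 1][0] = subdomains[i][-2]

# Two 4x3 subdomains stacked vertically; rows 0 and -1 are ghost rows.
a = np.arange(12, dtype=float).reshape(4, 3)
b = np.arange(12, 24, dtype=float).reshape(4, 3)
halo_exchange([a, b])
```

Because every rank exchanges with all of its neighbours simultaneously each step, any oversubscription or added hop latency stalls the whole timestep, which is the workload-level justification for the 1:1 non-blocking fabric.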
| Challenge | Solution Implemented | Outcome |
|---|---|---|
| Protocol Heterogeneity | Separate physical networks for each protocol with optimized topologies | Maximum performance for each workload type |
| High-Bandwidth Requirements | 1.6T OSFP transceivers with 1:1 non-blocking architecture | No communication bottlenecks during parallel computations |
| Thermal Management | High-efficiency cooling for high-power optical transceivers | Stable operation in dense computing environment |
| Cable Management | Structured cabling with proper bend radius protection | Reliable signal transmission and easy maintenance |