Scientific HPC Network Solution

Time: 2025-12-25 11:18:09
Written By: Admin

High-Performance Computing Cluster Network Case Study

GB200 NVL72 Cluster for Oceanographic Research Applications

Network Architecture for Ocean Current and Thermocline Simulation Research

Case Overview

This project deploys 8 GB200 NVL72 supercomputing nodes for oceanographic research, covering ocean current simulation, thermocline analysis, and other complex computational tasks. The cluster employs a highly heterogeneous network architecture that integrates three network protocols (NVLink, InfiniBand, and Ethernet) to achieve a clean separation of the compute, storage, and management networks.


Network Architecture Overview

Scale Out Network: 400G InfiniBand, two-tier network architecture, 1:1 oversubscription ratio

Storage Network: 400G Ethernet, dedicated storage connectivity

Management Network: 10G Ethernet, dedicated cluster management

Scale Up Network: NVLink, connected via NVLink Switch Tray for direct GPU-to-GPU communication

Network Complexity: Three network protocols, four switch models, highly heterogeneous design
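
For readers who prefer a structured view, the short Python sketch below restates the overview above as data and prints a one-line summary per fabric; the field names and layout are ours and purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class Fabric:
    role: str        # what the network is used for
    protocol: str    # NVLink / InfiniBand / Ethernet
    speed: str       # nominal link speed
    notes: str

# Illustrative restatement of the four networks described above.
fabrics = [
    Fabric("Scale Up",   "NVLink",     "n/a (switch tray)", "direct GPU-to-GPU communication"),
    Fabric("Scale Out",  "InfiniBand", "400G",              "two-tier leaf-spine, 1:1 oversubscription"),
    Fabric("Storage",    "Ethernet",   "400G",              "dedicated storage connectivity"),
    Fabric("Management", "Ethernet",   "10G",               "dedicated cluster management"),
]

for f in fabrics:
    print(f"{f.role:<10} | {f.protocol:<10} | {f.speed:<18} | {f.notes}")
```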


Network Types and Protocols

InfiniBand Network

Purpose: Scale Out parallel computing communication

Speed: 400G

Architecture: Two-tier Leaf-Spine architecture

Oversubscription Ratio: 1:1 (non-blocking design)
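
As a quick illustration of the 1:1 figure, the hypothetical helper below defines the oversubscription ratio as server-facing bandwidth divided by spine-facing bandwidth; a result of 1.0 is the non-blocking case described here. The port counts in the example are invented for illustration, not taken from this deployment.

```python
def oversubscription_ratio(downlink_gbps: float, uplink_gbps: float) -> float:
    """Aggregate server-facing (downlink) bandwidth divided by spine-facing (uplink) bandwidth."""
    return downlink_gbps / uplink_gbps

# Example leaf switch: 24 x 400G server ports and 24 x 400G spine uplinks (illustrative numbers).
print(oversubscription_ratio(24 * 400, 24 * 400))  # 1.0 -> 1:1, i.e. non-blocking
```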


Ethernet Network

Purpose: Storage access and data transfer

Speed: 400G (storage), 10G (management)

Type: Separate storage and management networks
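
To put the 400G storage links in context, the sketch below adds up the aggregate bandwidth behind the BlueField-3 NICs listed in the server-side table later in this article (18 x 400G); treat it as a back-of-the-envelope figure.

```python
# Aggregate bandwidth behind the BlueField-3 storage NICs (counts from the NIC table below).
bluefield3_nics = 18      # BlueField-3 400G NICs
nic_speed_gbps = 400

storage_bw_gbps = bluefield3_nics * nic_speed_gbps
print(f"{storage_bw_gbps} Gb/s (~{storage_bw_gbps / 1000:.1f} Tb/s) aggregate storage bandwidth")
```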


NVLink Network

Purpose: High-speed GPU-to-GPU Scale Up communication

Connection Method: Via NVLink Switch Tray

Features: Ultra-low latency, direct GPU memory access
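
The sketch below tallies GPU counts for this deployment, assuming the standard GB200 NVL72 configuration of 72 GPUs per rack-scale system, with each system forming a single NVLink (Scale Up) domain reached through its NVLink Switch Trays.

```python
# GPU and NVLink-domain accounting (assumes 72 GPUs per GB200 NVL72 system).
systems = 8              # GB200 NVL72 nodes in this cluster
gpus_per_system = 72     # the "72" in NVL72

print(f"Total GPUs in the cluster: {systems * gpus_per_system}")           # 576
print(f"Scale Up (NVLink) domain size: {gpus_per_system} GPUs per system")
```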

Server-Side Network Configuration (Per Server)


Network Interface Card Configuration

NIC Type | Quantity | Speed | Purpose
ConnectX-7 400G NIC | 72 units | 400G | InfiniBand network connectivity
BlueField-3 400G NIC | 18 units | 400G | Storage network connectivity


Optical Transceiver Configuration

Transceiver Type | Quantity | Speed | Paired with NIC
400G Optical Transceiver | 72 units | 400G | ConnectX-7 400G NIC
200G Optical Transceiver | 36 units | 200G | BlueField-3 400G NIC
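
The two tables above can be cross-checked with a few lines of Python. Note that pairing 36 x 200G transceivers with 18 BlueField-3 NICs implies two 200G modules (or a 2 x 200G split) per NIC; that reading is our inference rather than something the tables state explicitly.

```python
# NIC / transceiver cross-check (counts from the two tables above).
connectx7_nics = 72
bluefield3_nics = 18
optics_400g = 72   # paired one-to-one with the ConnectX-7 NICs
optics_200g = 36   # paired with the BlueField-3 NICs

assert optics_400g == connectx7_nics          # one 400G module per ConnectX-7
assert optics_200g == 2 * bluefield3_nics     # two 200G modules per BlueField-3 (inferred)
print(f"Optics: {optics_400g} x 400G + {optics_200g} x 200G")
```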


InfiniBand Switch Configuration

Switch Model: NVIDIA Q3400-RA

Switch Quantity: 4 units (3 Leaf tier, 1 Spine tier)


Optical Transceiver Configuration

Location | Transceiver Type | Quantity | Speed | Purpose
Overall Configuration | 1.6T OSFP Optical Transceiver | 216 units | 1.6T | Total transceivers for the IB network
Leaf Tier Switches | 1.6T OSFP Optical Transceiver | 144 units | 1.6T | 72 uplink + 72 downlink
Spine Tier Switch | 1.6T OSFP Optical Transceiver | 72 units | 1.6T | Connect to all Leaf switches


Network Architecture Description

Leaf Tier: 3 Q3400-RA switches, each connecting a subset of the compute nodes

Spine Tier: 1 Q3400-RA switch, connecting all Leaf tier switches

Architecture Advantage: Two-tier non-blocking design, providing full-bandwidth communication between any pair of nodes

Oversubscription Ratio: 1:1, optimized for high-performance computing
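
The transceiver counts above are internally consistent with the two-tier non-blocking design: the leaf tier dedicates as many ports to spine uplinks as to node downlinks, and the spine terminates every leaf uplink. A minimal sanity check:

```python
# InfiniBand fabric transceiver accounting (numbers from the tables and description above).
leaf_downlinks = 72      # node-facing leaf ports across the 3 Q3400-RA leaf switches
leaf_uplinks = 72        # spine-facing leaf ports
spine_ports = 72         # spine ports terminating the leaf uplinks

leaf_optics = leaf_downlinks + leaf_uplinks   # 144 modules on the leaf tier
spine_optics = spine_ports                    # 72 modules on the spine tier

assert leaf_uplinks == leaf_downlinks         # 1:1 oversubscription (non-blocking)
assert spine_ports == leaf_uplinks            # every leaf uplink lands on the spine
print(f"IB optics: {leaf_optics} leaf + {spine_optics} spine = {leaf_optics + spine_optics} total")  # 216
```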


Ethernet Switch Configuration

Storage Network Switches

Switch Model | Quantity | Speed | Transceiver Configuration
NVIDIA SN5600 | 2 units | 800G | 800G OSFP Optical Transceiver × 72 units

Management/Internet Access Switches

Switch Model | Type | Quantity | Speed | Purpose
NVIDIA SN2201 | Management Switches | 8 units | 10G | Cluster management network
NVIDIA SN2201 | Internet Access Switches | 8 units | 10G | External network access
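
As a rough port-budget check on the storage tier, the sketch below spreads the 72 x 800G OSFP modules across the two SN5600 switches. The 64-ports-per-switch figure is our assumption about the SN5600 platform, not a number given in this case study.

```python
# Storage-network port budget (64 x 800G ports per SN5600 is an assumption, not from the article).
sn5600_count = 2
ports_per_sn5600 = 64        # assumed 800G OSFP ports per switch
osfp_800g_modules = 72       # from the storage switch table above

ports_available = sn5600_count * ports_per_sn5600
print(f"800G ports used: {osfp_800g_modules}/{ports_available} ({osfp_800g_modules / ports_available:.0%})")
```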


Optical Transceiver Requirements Summary

Total Optical Transceivers: 396 units across the entire cluster

High-Speed Transceivers: 1.6T OSFP (216 units) and 800G OSFP (72 units) for compute and storage networks

Protocol Support: Transceivers are required for both the InfiniBand and Ethernet networks

Compatibility Requirements: Must be compatible with NVIDIA ConnectX-7, BlueField-3, Q3400-RA, and SN5600 platforms
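
The cluster-wide total can be reproduced directly from the per-network counts in the preceding tables:

```python
# Cluster-wide transceiver total (counts taken from the tables earlier in the article).
transceivers = {
    "1.6T OSFP (InfiniBand)": 216,
    "800G OSFP (storage Ethernet)": 72,
    "400G (ConnectX-7)": 72,
    "200G (BlueField-3)": 36,
}
print(sum(transceivers.values()))  # 396, matching the summary above
```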


Network Complexity Analysis

This case study demonstrates the following complexity characteristics:

Integration of three distinct network protocols: NVLink, InfiniBand, and Ethernet

Utilization of four different switch types: Q3400-RA (InfiniBand), SN5600 and SN2201 (Ethernet), and the NVLink Switch Tray

Physical separation of network functions: compute network, storage network, management network

Mixed speed requirements: from the 10G management network up to 1.6T OSFP optics on the InfiniBand network


Application Scenarios and Value Proposition

Ocean Current Simulation: Large-scale parallel computing to simulate global ocean circulation patterns

Thermocline Analysis: Research on ocean temperature stratification and its climate impacts

Ocean Dynamics Research: Simulation of tidal, wave, and current interactions

Climate Modeling Support: Providing high-resolution ocean data for global climate models

Research Value: The network architecture keeps large simulations running efficiently, shortening research cycles

Topology Optimization: The topology is tuned specifically for oceanographic simulation workloads

Performance Criticality: Low latency and high reliability are essential for this class of scientific computing


Key Technical Challenges and Solutions

Challenge | Solution Implemented | Outcome
Protocol Heterogeneity | Separate physical networks for each protocol with optimized topologies | Maximum performance for each workload type
High-Bandwidth Requirements | 1.6T OSFP transceivers with 1:1 non-blocking architecture | No communication bottlenecks during parallel computations
Thermal Management | High-efficiency cooling for high-power optical transceivers | Stable operation in dense computing environment
Cable Management | Structured cabling with proper bend radius protection | Reliable signal transmission and easy maintenance

