Hardware


| Cluster | Specifications | Nodes | Cores | Accelerators | GPUs | In Service | Out of Warranty |
|---------|----------------|-------|-------|--------------|------|------------|-----------------|
| 21g | Dual AMD "Rome" 7502 2.5GHz 32-core/64-thread CPU, 1TB 3200 memory, MLNX fabric (100 Gbps), PCIe 4, 2TB SSD | 8 | 256 | 8x AMD MI100 GPU, inter-GPU Infinity Fabric | 64 | 10/1/2021 | 2026 |
| 19g | Intel Xeon "Skylake" Gold 5118, 196 GB memory, Omni-Path fabric (100 Gbps), 1TB disk | 32 | 768 | 8x NVIDIA GeForce RTX 2080 GPU (no ECC) * | 256 | 4/3/2019 | 4/3/2022 |
| 18p | 16 GB HBM, 92 GB main memory, Omni-Path fabric (100 Gbps), 200TB SSD | 180 | 12,240 | Knights Landing | 0 | 2018 | 2020 |
| 16p | 16 GB HBM, 192 GB main memory, Omni-Path fabric (100 Gbps), 1TB disk | 264 | 16,896 | Xeon Phi 7230 | 0 | 2016 | 2019 |
| Total | | | 30,160 | | 320 | | |
 
All clusters have multiple high-speed, low-latency uplinks into the main disk server fabric and access all of the filesystems over Omni-Path or InfiniBand.
 
* Since the gaming cards do not have ECC memory, they should only be used for matrix inversions where a quick test of correctness at the end of the inversion is possible.
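
One common form of such an end-of-inversion check is to recompute the residual of the returned solution and compare it with a tolerance. The sketch below is illustrative only and is not taken from the cluster's software: the function names, the dense list-of-rows matrix layout, and the default tolerance are all assumptions. A production solver would apply its own operator instead of a dense matrix.

```python
import math


def relative_residual(a, x, b):
    """Return ||A x - b|| / ||b|| for a dense matrix given as a list of rows.

    Works for real or complex entries because abs() is used for magnitudes.
    """
    ax = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]
    num = math.sqrt(sum(abs(ax_i - b_i) ** 2 for ax_i, b_i in zip(ax, b)))
    den = math.sqrt(sum(abs(b_i) ** 2 for b_i in b))
    if den == 0.0:
        raise ValueError("right-hand side is the zero vector")
    return num / den


def solution_is_trustworthy(a, x, b, tol=1e-6):
    """Accept the solution only if the recomputed residual is below tol.

    tol is a hypothetical default; pick it to match the solver's target.
    """
    return relative_residual(a, x, b) <= tol
```

A memory error that corrupts the solution during the inversion would typically show up as a residual far above the tolerance, in which case the inversion can simply be rerun.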

Hadron contraction achieves 8 Tflops; assuming 25% GPU I/O per box, this gives 64 Tflops total. Dslash runs at about 250 Gflops for double-precision complex and 1 Tflops for single precision on one MI100. The totals for the 21g cluster are therefore 16 Tflops (hadron contraction) and 64 Tflops (Dslash).
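
The arithmetic behind these totals can be sketched as follows. The node and GPU counts come from the table, and the per-GPU Dslash rates come from the text. The hadron contraction lines are only one reading that reproduces the stated 64 and 16 Tflops figures: the factor of 8 and the treatment of 25% as a multiplier are assumptions, since the text does not say whether 8 Tflops is a per-GPU or a per-box figure.

```python
NODES = 8          # 21g nodes, from the table
GPUS_PER_NODE = 8  # MI100 GPUs per node, from the table


def cluster_total(per_gpu_tflops, nodes=NODES, gpus_per_node=GPUS_PER_NODE):
    """Scale a per-GPU rate to the whole cluster, assuming perfect scaling."""
    return per_gpu_tflops * nodes * gpus_per_node


# Dslash rates on one MI100, from the text.
dslash_single = cluster_total(1.0)   # single precision: 64 Tflops
dslash_double = cluster_total(0.25)  # double-precision complex: 16 Tflops

# Assumed reading of the hadron contraction figures (not confirmed by the text):
# 8 Tflops scaled by a factor of 8, then reduced to 25%.
hadron_raw = 8.0 * 8                  # 64 Tflops
hadron_effective = hadron_raw * 0.25  # 16 Tflops
```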

Following are the hardware topologies of the various cluster compute nodes (listed newest to oldest):
  1. 21g - AMD GPU Cluster
  2. 19g - NVIDIA GPU cluster
  3. 18p - KNL cluster
  4. 16p - KNL cluster
21g

21g Hardware Details

$> lscpu
Architecture:            x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:              Little Endian
CPU(s):                    128
On-line CPU(s) list:   0-127
Thread(s) per core:   2
Core(s) per socket:   32
Socket(s):                2
NUMA node(s):         8
Vendor ID:               AuthenticAMD
CPU family:              23
Model:                     49
Model name:            AMD EPYC 7502 32-Core Processor
Stepping:                 0
CPU MHz:                 2354.424
CPU max MHz:          2500.0000
CPU min MHz:           1500.0000
BogoMIPS:                4999.66
Virtualization:            AMD-V
L1d cache:                32K
L1i cache:                 32K
L2 cache:                  512K
L3 cache:                  16384K
NUMA node0 CPU(s):  0-7,64-71
NUMA node1 CPU(s):  8-15,72-79
NUMA node2 CPU(s):  16-23,80-87
NUMA node3 CPU(s):  24-31,88-95
NUMA node4 CPU(s):  32-39,96-103
NUMA node5 CPU(s):  40-47,104-111
NUMA node6 CPU(s):  48-55,112-119
NUMA node7 CPU(s):  56-63,120-127

$> rocm-smi --showtopo

======================= ROCm System Management Interface =======================
=========================== Weight between two GPUs ============================
       GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7
GPU0   0     52    15    15    72    72    72    15
GPU1   52    0     40    52    15    15    15    72
GPU2   15    40    0     15    72    72    72    15
GPU3   15    52    15    0     72    72    72    15
GPU4   72    15    72    72    0     15    15    52
GPU5   72    15    72    72    15    0     15    52
GPU6   72    15    72    72    15    15    0     40
GPU7   15    72    15    15    52    52    40    0

============================ Hops between two GPUs =============================
       GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7
GPU0   0     3     1     1     3     3     3     1
GPU1   3     0     2     3     1     1     1     3
GPU2   1     2     0     1     3     3     3     1
GPU3   1     3     1     0     3     3     3     1
GPU4   3     1     3     3     0     1     1     3
GPU5   3     1     3     3     1     0     1     3
GPU6   3     1     3     3     1     1     0     2
GPU7   1     3     1     1     3     3     2     0

========================== Link Type between two GPUs ==========================
       GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  NUMA NODE
GPU0   0     PCIE  XGMI  XGMI  PCIE  PCIE  PCIE  XGMI  3
GPU1   PCIE  0     PCIE  PCIE  XGMI  XGMI  XGMI  PCIE  2
GPU2   XGMI  PCIE  0     XGMI  PCIE  PCIE  PCIE  XGMI  2
GPU3   XGMI  PCIE  XGMI  0     PCIE  PCIE  PCIE  XGMI  1
GPU4   PCIE  XGMI  PCIE  PCIE  0     XGMI  XGMI  PCIE  7
GPU5   PCIE  XGMI  PCIE  PCIE  XGMI  0     XGMI  PCIE  6
GPU6   PCIE  XGMI  PCIE  PCIE  XGMI  XGMI  0     PCIE  5
GPU7   XGMI  PCIE  XGMI  XGMI  PCIE  PCIE  PCIE  0     5

============================= End of ROCm SMI Log ==============================
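
The two listings above can be combined to find which CPU cores sit on the same NUMA node as a given GPU, which is useful when pinning a host process next to the GPU it drives. The sketch below simply transcribes the NUMA CPU lists from the `lscpu` output and the NUMA NODE column from the link-type table. The helper names `expand_cpu_list` and `cpus_for_gpu` are hypothetical, and the mapping should be re-read from a live node before being relied on.

```python
# NUMA node -> CPU list, transcribed from the lscpu output for 21g.
NUMA_CPUS = {
    0: "0-7,64-71",
    1: "8-15,72-79",
    2: "16-23,80-87",
    3: "24-31,88-95",
    4: "32-39,96-103",
    5: "40-47,104-111",
    6: "48-55,112-119",
    7: "56-63,120-127",
}

# GPU index -> NUMA node, transcribed from the rocm-smi link-type table.
GPU_NUMA = {0: 3, 1: 2, 2: 2, 3: 1, 4: 7, 5: 6, 6: 5, 7: 5}


def expand_cpu_list(spec):
    """Expand a list such as '0-7,64-71' into individual CPU numbers."""
    cpus = []
    for part in spec.split(","):
        low, _, high = part.partition("-")
        cpus.extend(range(int(low), int(high or low) + 1))
    return cpus


def cpus_for_gpu(gpu):
    """Return the CPUs on the NUMA node that the given GPU is attached to."""
    return expand_cpu_list(NUMA_CPUS[GPU_NUMA[gpu]])
```

On Linux, the returned list could be passed to `os.sched_setaffinity(0, cpus)` to restrict the current process to those cores. Whether that improves performance for a particular job is something to measure rather than assume.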

19g

19g Hardware Details

$> lscpu
Architecture:              x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:                Little Endian
CPU(s):                      48
On-line CPU(s) list:     0-47
Thread(s) per core:     2
Core(s) per socket:     12
Socket(s):                  2
NUMA node(s):           2
Vendor ID:                 GenuineIntel
CPU family:                6
Model:                       85
Model name:              Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
Stepping:                  4
CPU MHz:                  1700.000
CPU max MHz:           2301.0000
CPU min MHz:            1000.0000
BogoMIPS:                 4600.00
Virtualization:             VT-x
L1d cache:                 32K
L1i cache:                  32K
L2 cache:                   1024K
L3 cache:                   16896K
NUMA node0 CPU(s):  0-11,24-35
NUMA node1 CPU(s):  12-23,36-47

$> numactl --hardware
available:       2 nodes (0-1)
node 0 cpus:  0 ... 35
node 0 size:   96920 MB
node 0 free:   16831 MB
node 1 cpus:  12 ... 47
node 1 size:   98304 MB
node 1 free:   5503 MB
node distances:
node   0   1
  0:   10  21
  1:   21  10

$> nvidia-smi topo -m

       GPU0    GPU1   GPU2   GPU3   GPU4    GPU5    GPU6    GPU7    CPU Affinity
GPU0     X     PIX    PIX    PIX    NODE    NODE    NODE    NODE    0-11,24-35
GPU1    PIX     X     PIX    PIX    NODE    NODE    NODE    NODE    0-11,24-35
GPU2    PIX    PIX     X     PIX    NODE    NODE    NODE    NODE    0-11,24-35
GPU3    PIX    PIX    PIX     X     NODE    NODE    NODE    NODE    0-11,24-35
GPU4    NODE   NODE   NODE   NODE    X      PIX     PIX     PIX     0-11,24-35
GPU5    NODE   NODE   NODE   NODE   PIX      X      PIX     PIX     0-11,24-35
GPU6    NODE   NODE   NODE   NODE   PIX     PIX      X      PIX     0-11,24-35
GPU7    NODE   NODE   NODE   NODE   PIX     PIX     PIX      X      0-11,24-35

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing a single PCIe switch
  NV#  = Connection traversing a bonded set of # NVLinks
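
Reading the matrix with this legend, the eight GPUs fall into two groups of four that each share a single PCIe switch (PIX), with NODE links between the groups. A minimal sketch of extracting those groups from the matrix is shown below. The matrix text is transcribed from the output above with the CPU Affinity column dropped, the function name `pix_groups` is hypothetical, and the grouping assumes that PIX connectivity is transitive.

```python
TOPO = """\
GPU0 X    PIX  PIX  PIX  NODE NODE NODE NODE
GPU1 PIX  X    PIX  PIX  NODE NODE NODE NODE
GPU2 PIX  PIX  X    PIX  NODE NODE NODE NODE
GPU3 PIX  PIX  PIX  X    NODE NODE NODE NODE
GPU4 NODE NODE NODE NODE X    PIX  PIX  PIX
GPU5 NODE NODE NODE NODE PIX  X    PIX  PIX
GPU6 NODE NODE NODE NODE PIX  PIX  X    PIX
GPU7 NODE NODE NODE NODE PIX  PIX  PIX  X
"""


def pix_groups(topo_text):
    """Group GPUs that reach each other through a single PCIe switch."""
    rows = {}
    for line in topo_text.strip().splitlines():
        name, *links = line.split()
        rows[name] = links
    names = list(rows)

    groups = []
    seen = set()
    for name in names:
        if name in seen:
            continue
        # The GPU itself plus every peer whose link type is PIX.
        group = [name] + [
            names[j] for j, link in enumerate(rows[name]) if link == "PIX"
        ]
        seen.update(group)
        groups.append(group)
    return groups
```

Jobs that use up to four GPUs on a 19g node may benefit from staying within one such group, since traffic between groups crosses the PCIe host bridges.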

18p

18p Hardware Details

$> lscpu
Architecture:             x86_64
CPU op-mode(s):       32-bit, 64-bit
Byte Order:               Little Endian
CPU(s):                     272
On-line CPU(s) list:    0-271
Thread(s) per core:    4
Core(s) per socket:    68
Socket(s):                 1
NUMA node(s):          1
Vendor ID:                GenuineIntel
CPU family:               6
Model:                      87
Model name:             Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz
Stepping:                 1
CPU MHz:                 1046.117
CPU max MHz:          1600.0000
CPU min MHz:           1000.0000
BogoMIPS:                2793.60
L1d cache:                32K
L1i cache:                 32K
L2 cache:                  1024K
NUMA node0 CPU(s): 0-271

$> numactl --hardware
available:       1 nodes (0)
node 0 cpus:  0 ... 271
node 0 size:   98207 MB
node distances:
node   0
  0:    10

16p

16p Hardware Details

$> lscpu
Architecture:              x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:                Little Endian
CPU(s):                      256
On-line CPU(s) list:     0-255
Thread(s) per core:     4
Core(s) per socket:     64
Socket(s):                  1
NUMA node(s):           1
Vendor ID:                 GenuineIntel
CPU family:                6
Model:                       87
Model name:              Intel(R) Xeon Phi(TM) CPU 7230 @ 1.30GHz
Stepping:                  1
CPU MHz:                  1401.105
CPU max MHz:           1500.0000
CPU min MHz:            1000.0000
BogoMIPS:                 2593.81
L1d cache:                 32K
L1i cache:                  32K
L2 cache:                   1024K
NUMA node0 CPU(s):   0-255

$> numactl --hardware
available:      1 nodes (0)
node 0 cpus:  0 ... 255
node 0 size:   196511 MB
node distances:
node   0
  0:    10

 

