Xeon Phi Specifications

Details

  • 16p (2016 Phi) -- 264 nodes, Xeon Phi 7230, 16 GB high bandwidth memory, 192 GB main memory, Omni-Path fabric (100 Gb/s), 1 TB disk.

Each Knights Landing (KNL) node has 64 cores, each 4-way hyper-threaded (256 virtual cores), running at 1.3 GHz.  The on-package high bandwidth memory has a bandwidth above 450 GB/s, and the main memory has a bandwidth of about 90 GB/s; the two are available concurrently.
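
The following is a minimal sketch (not part of the original specification) of one common way to place an allocation explicitly in the on-package high bandwidth memory (MCDRAM) of a KNL node. It assumes the node is booted in flat memory mode and that the memkind library is installed; with memkind's default policy the allocation falls back to ordinary memory if no MCDRAM is available.

/*
 * Sketch: allocate a buffer in KNL high bandwidth memory (MCDRAM) using the
 * memkind library's hbwmalloc interface.  Assumes flat memory mode.
 */
#include <stdio.h>
#include <hbwmalloc.h>   /* hbw_malloc(), hbw_free(), hbw_check_available() */

int main(void)
{
    size_t n = (size_t)1 << 26;   /* 64 Mi doubles, about 512 MB */

    /* hbw_check_available() returns 0 when MCDRAM is visible to memkind */
    if (hbw_check_available() != 0)
        fprintf(stderr, "warning: no high bandwidth memory detected\n");

    double *buf = hbw_malloc(n * sizeof(double));   /* preferred placement: MCDRAM */
    if (buf == NULL) {
        fprintf(stderr, "hbw_malloc failed\n");
        return 1;
    }

    for (size_t i = 0; i < n; i++)   /* touch pages so they are actually placed */
        buf[i] = (double)i;

    printf("buf[0]=%g buf[n-1]=%g\n", buf[0], buf[n - 1]);
    hbw_free(buf);
    return 0;
}

Compile with "gcc hbm_demo.c -lmemkind" (the file name is just for this example). In flat mode the MCDRAM typically also appears as a separate NUMA node, so the same placement can often be achieved without code changes via numactl.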

InfiniBand Fabric

The different clusters have different configurations of the InfiniBand network fabric.  Both have what is referred to as a "fat tree" topology: individual nodes are connected to "leaf" switches, and the leaf switches are connected to "core" switches.  The clusters differ in the amount of bandwidth between leaf switches.
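
One way to make that difference concrete is the oversubscription ratio of a leaf switch: the aggregate bandwidth of its node-facing ports divided by the aggregate bandwidth of its uplinks to the core switches. The short sketch below computes this ratio for two hypothetical configurations; the port counts and link speed are illustrative, not the actual cluster values.

/*
 * Illustrative only: leaf-switch oversubscription ratio in a fat tree,
 * defined here as node-facing bandwidth divided by core-facing bandwidth.
 * The port counts and link speed below are hypothetical.
 */
#include <stdio.h>

static double oversubscription(int node_ports, int uplink_ports, double link_gbps)
{
    double down = node_ports * link_gbps;     /* bandwidth from nodes into the leaf   */
    double up   = uplink_ports * link_gbps;   /* bandwidth from the leaf up to the core */
    return down / up;
}

int main(void)
{
    /* Non-blocking leaf: as many uplinks as node ports */
    printf("18 node ports, 18 uplinks: %.1f:1\n", oversubscription(18, 18, 56.0));

    /* Oversubscribed leaf: half as many uplinks as node ports */
    printf("24 node ports, 12 uplinks: %.1f:1\n", oversubscription(24, 12, 56.0));
    return 0;
}

A ratio of 1.0:1 means traffic between any pair of nodes can pass through the core at full link speed, while a larger ratio means traffic crossing between leaf switches may be throttled relative to traffic that stays within a single leaf.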

GPU Specifications

Host Details

  • 19g (2019 GeForce RTX 2080) -- 32 nodes, 8 RTX 2080 GPUs, 24 Intel(R) Xeon(R) Gold 5118 cores, 196 GB memory, Omni-Path fabric (100 Gb/s), 1 TB disk.
  • 12k (2012 Kepler) -- 46 nodes, 16 Intel 2.0 GHz cores, 128 GB memory, 4 Kepler K20m GPUs, FDR (56 Gb/s) InfiniBand, 1 TB disk.

GPU cards

Tesla cards (w/ ECC memory)

Scientific Computing at Jefferson Lab

Scientific Computing consists of two main systems, one for Experimental Physics and the other for Lattice QCD (theory) computing. Many resources (file servers, offline storage, wide area networking) are shared between the two, with appropriate allocations. 

Wide Area Networking

Jefferson Lab has a 10 Gbps wide area network connection to a MAN (metropolitan area network), with a 10 Gbps connection up to ESnet in Washington, D.C. and a redundant 10 Gbps connection to ESnet in Atlanta.  JLab can reasonably use 5 Gbps of this, and Scientific Computing can reasonably use 4 Gbps.  Thus each of CLAS, GlueX, A+C+misc, and LQCD can on average use 1 Gbps, although each may on occasion find it can sustain 5-6 Gbps.

Tape Library (offline storage)

IBM TS3500 Tape Library

The JLab Mass Storage System (MSS) is an IBM TS3500 tape library with LTO drives, installed in 2008 to replace JLab's original StorageTek silo with Redwood technology. The TS3500 is a modular system, with an expandable number of frames for tape slots and an expandable number of tape drives. The lab's JASMine software provides the user interface to the MSS.

Our current configuration consists of

Experimental Physics File System Layout

Experimental Physics users see a file system layout with many parts:

/home: a file system accessible from all CUE nodes; it is the user's normal home directory, held on central file servers.

/group: a file system accessible from all CUE nodes; it provides shared space for a group, such as an experiment, and is held on central file servers.

HPC / LQCD File System Layout

LQCD / HPC users see a file system layout with 5 parts:

/home: Every user receives a home directory when their account is created. Note that for performance and fault tolerance, this is a different home directory from the general lab computing home directory. With a default user quota of 2 GB, /home is not a large file system; it is backed up daily and is designed to store non-data files such as scripts and executables. The home directory is mounted on the interactive nodes and the compute nodes, and the space is managed by individual users.

Disk Servers (online storage)

Scientific Computing currently has 2 (physical) types of file servers:

Experimental Physics Computing

The batch farm contains ~300 CentOS 7.7 nodes with 8, 16, 24, 32, or 36 cores each. Each core runs two hardware threads, giving two job slots per core, for a total of ~24000 job slots.
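
As a quick illustration (not part of the original page), a job can check how many hardware threads, and hence job-slot equivalents, the node it landed on exposes; on these nodes the value is twice the physical core count.

/*
 * Illustrative sketch: report the number of online logical CPUs (hardware
 * threads) on the current node.  With two hardware threads per core this is
 * twice the physical core count, matching the two job slots per core.
 */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long threads = sysconf(_SC_NPROCESSORS_ONLN);   /* online logical CPUs */
    printf("This node exposes %ld hardware threads\n", threads);
    return 0;
}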
