File system layout

The Farm interactive and batch systems include parts of the Common User Environment (CUE), which means that you have access to the same /home, /group, /apps, and /site file systems as you do from your desktop. Documentation on these desktop-accessible file systems is at CUE Directories. Note: the /home area's backup system supports recovering files you accidentally delete.

Scientific Computing Filesystems

LQCD / HPC Portal

Information about the computing systems is available at the LQCD / HPC Portal: lqcd.jlab.org. On the entry page you can find information about the status of the various clusters, as well as important news items about new capabilities, current problems, or planned outages. From the menu on the left you can get more detailed information about status and utilization, including reports by user or project for any arbitrary time interval. This portal is also used by system administrators to adjust quotas and project allocations.

Compile-Link-Test

JLab has 3 distinct system types for which you can develop applications:

  • standard x86 clusters
  • NVIDIA GPU systems (2 generations: Fermi, Kepler)
  • Intel Xeon Phi systems, a.k.a. MIC (Knights Corner generation)

Each of these has different compiler options and libraries, and for each there are examples in the relevant subsections later in this book.
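As a rough illustration of what a compile step looks like on each system type (the compiler choices, flags, and file names below are assumptions for the sketch, not site-prescribed settings; the later subsections give the real modules and libraries):

    # Standard x86 cluster node (GNU toolchain assumed)
    gcc -O2 -o hello hello.c
    ./hello

    # NVIDIA GPU node (CUDA toolkit assumed; sm_20 = Fermi, sm_35 = Kepler)
    nvcc -arch=sm_35 -O2 -o hello_gpu hello.cu
    ./hello_gpu

    # Intel Xeon Phi / MIC (Knights Corner), native build with the Intel compiler
    icc -mmic -O2 -o hello_mic hello.c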

Transferring files

There are multiple ways to move files to/from JLab. Keep in mind that the LQCD interactive nodes do NOT have offsite network access, so you will either need to set up ssh tunnels or use another node to do the transfers.
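As one sketch of the tunnel approach (the gateway host login.jlab.org, the interactive node name qcdi, and all paths are placeholders; substitute the hosts you actually use):

    # Copy a file from your desktop to an LQCD interactive node,
    # hopping through a JLab gateway/login host
    scp -o ProxyJump=login.jlab.org mydata.dat qcdi:/path/on/lqcd/

    # Or set up an ssh tunnel and transfer through it
    ssh -L 2222:qcdi:22 login.jlab.org                   # in one terminal
    scp -P 2222 mydata.dat localhost:/path/on/lqcd/      # in another terminal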

Transferring Files via Globus Online

The preferred way to transfer large files into or out of the lab is to use Globus (formerly Globus Online). Please visit the Globus site and set up a Globus account. Files can then be transferred to JLab machines using the Globus endpoint jlab#gw12.
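Most users drive Globus through its web interface, but the same transfer can be scripted with the globus command-line client. A hedged sketch (only the endpoint name jlab#gw12 comes from this page; the UUIDs and paths are placeholders you must replace):

    # Find the JLab endpoint and note its UUID
    globus endpoint search "jlab#gw12"

    # Submit a transfer from a source endpoint to the JLab endpoint
    globus transfer SRC_ENDPOINT_UUID:/path/to/file JLAB_ENDPOINT_UUID:/dest/path \
        --label "copy to JLab"

    # Check on the transfer
    globus task list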

Batch system basics

Jefferson Lab uses OpenPBS (Torque) with the Maui open source scheduler.  You will need a batch system account associated with your project in order to submit a batch job, and you will need to be an authorized user on that account.  Please see your project leader for relevant account information, and if necessary ask him/her to email Chip Watson to get your username added to the project.  For new (small) projects, you may simply email Chip Watson.

Steps in using the batch system: write a job script, submit it with qsub, and monitor it with qstat (a minimal sketch follows below).
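A minimal PBS/Torque job script might look like the following; the project/account name, resource request, and executable are placeholders, and the batch system chapter gives the real options for each cluster:

    #!/bin/bash
    # Minimal PBS/Torque job script (account, resources, and executable are placeholders)
    #PBS -N test_job
    #PBS -A myproject
    #PBS -l nodes=1:ppn=16
    #PBS -l walltime=01:00:00

    # Run from the directory where qsub was invoked
    cd $PBS_O_WORKDIR
    ./my_executable > output.log

Submit the script with "qsub script.sh" and check its status with "qstat -u $USER".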

File system layout

/home: Every user has a home directory. Note that this is a different home directory from the general lab computing home directory. /home is not a large file system; the default quota is 2 GB, and it is backed up daily. It is intended to store non-data files, such as scripts and executables. The home directory is mounted on both interactive nodes and compute nodes. This disk space is managed by individual users.
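To keep an eye on how much of the 2 GB you are using, standard Linux tools are enough (whether per-user quota reporting is exported to the client is an assumption here):

    # Total size of your home directory
    du -sh $HOME

    # Per-filesystem quota report, if quotas are visible on the node
    quota -s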

Obtaining a user account

Who can use the LQCD / HPC clusters?
Anyone who is a part of the USQCD collaboration or the Lattice QCD SciDAC project may have an account. JLab users not associated with LQCD may also have small allocations. Large allocations of time are awarded annually by the LQCD Scientific Advisory Committee (see usqcd.org for more details about this collaboration). Small allocations may be requested from the Scientific Computing manager, Chip Watson. (Technical details: anyone who has a valid Jefferson Lab user account and is in the appropriate unix group can use the clusters.)

Getting Started

Steps to get started using the LQCD systems at Jefferson Lab:

  1. Get a user account
  2. Learn about file systems and data management
  3. Learn to create and submit batch scripts
  4. Learn about transferring files to/from JLab

The sub-sections below will walk you through an overview of these steps; additional, more detailed information can be found in the subsequent chapters.

ZFS Appliance

ZFS (see the ZFS wiki entry) is a file system with several distinct features and advantages. Storage is managed as pools, and pools may be hierarchically split into smaller pools. This gives the effect of hierarchical quotas, which is useful for our application and serves as a model for our own in-house storage management software running above Lustre. ZFS (Z File System) implements RAID-Z, which has a tree of embedded checksums. These checksums are always checked on read, so data integrity is very high. OpenZFS is used for these file servers.
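As an illustration of how nested datasets give the effect of hierarchical quotas (the pool and dataset names below are purely hypothetical):

    # Parent dataset for a project, with an overall cap
    zfs create tank/projects/myproject
    zfs set quota=10T tank/projects/myproject

    # Child dataset for one user, capped within the project's quota
    zfs create tank/projects/myproject/alice
    zfs set quota=1T tank/projects/myproject/alice

    # Report usage and quotas for the whole tree
    zfs list -o name,used,quota -r tank/projects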

Lustre File System

The Lustre file system (a distributed file system) spans multiple file servers, called Object Storage Servers (OSS), while presenting a single flat namespace as if it were a single server. File system directory information is held in a Meta Data Server (MDS), which at JLab is implemented as a dual-headed system with the two servers sharing a SAS disk array. One of the two heads is active and the other inactive, and they implement a hot failover in case the primary server fails.
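On the client side, the standard Lustre tools expose this layout (the mount point /lustre and the file names below are assumptions for the sketch):

    # Space and usage per OST/MDT for the mounted Lustre file system
    lfs df -h /lustre

    # Show how a particular file is striped across the object storage targets
    lfs getstripe /lustre/myproject/bigfile.dat

    # Stripe a new large file across 4 OSTs (illustrative stripe count)
    lfs setstripe -c 4 /lustre/myproject/newfile.dat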
