If you had a computer account at JLab in the past, please verify this by searching for your information in the JLab phone book.
You can change your unexpired password by logging into any of the central Linux systems and typing /apps/bin/jpasswd. Or, you may use the web interface by logging in to the JLab Computer Center web site (https://cc.jlab.org) and clicking on the "Password Change" link in the "Web Utilities" section on the right side of the page. Before you change your password, we recommend that you review the password rules at https://cc.jlab.org/passwordrules. These rules are based on federal requirements for JLab computer systems and must be followed by all JLab Computer Account Holders.
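For example, to change your password from the command line, log in to one of the central Linux systems and run jpasswd (the host name in the prompt below is only illustrative):

[@jlabl5 ~]$ /apps/bin/jpasswd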
If you get an error when attempting to change your password, please contact the IT Division Helpdesk (email: helpdesk@jlab.org or phone: 757-269-7155).
There are several partitions, listed below. For an up-to-date partition list, please see the following web page.
The default partition is phi. To use a different partition, add '-p partition-name' to your SLURM job submission command.
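For example, a minimal job-script sketch that selects a partition other than the default (the partition name, node count, and binary are placeholders):

#!/bin/bash
#SBATCH -p partition-name
#SBATCH --nodes=1
srun ./mybinary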
Currently there is no default MPI configured in SLURM. The following command lists the MPI types that srun supports:
$ srun --mpi=list
srun: MPI types are...
...
Use the '--constraint=flat,quad' or '-Cflat,quad' option to request nodes in flat-quad mode. If there are insufficient nodes available, SLURM will reboot nodes into the requested mode. Similarly, if you need cache-quad mode, use '--constraint=cache,quad' or '-Ccache,quad'.
Use the '--constraint=cache,quad,18p' or '-Ccache,quad,18p' option to request 18p nodes in cache-quad mode, and the '--constraint=flat,quad,18p' or '-Cflat,quad,18p' option to request 18p nodes in flat-quad mode. To request 16p nodes, replace 18p with 16p in the options above.
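For example, a job-script sketch that requests 18p nodes in cache-quad mode (the node count and binary are placeholders):

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --constraint=cache,quad,18p
srun ./mybinary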
The latest USQCD jeopardy policy is on this web page.
To compile your code with CUDA, first check the available CUDA versions on the cluster login nodes (qcdi1401 or qcdi1402) as follows:
[@qcdi1402 ~]$ module use /dist/modulefiles/
[@qcdi1402 ~]$ module avail
----------------------------------------------------------- /dist/modulefiles/ -----------------------------------------------------------
anaconda2/4.4.0 anaconda3/5.2.0 cmake/3.21.1 curl/7.59 gcc/7.1.0 gcc/8.4.0 go/1.15.4
anaconda2/5.2.0 cmake/3.17.5 cuda/10.0 gcc/10.2.0 gcc/7.2.0 gcc/9.3.0 singularity/2.3.1
anaconda3/4.4.0 cmake/3.18.4 cuda/9.0 gcc/5.3.0 gcc/7.5.0 go/1.13.5 singularity/3.6.4
------------------------------------------------------------ /etc/modulefiles ------------------------------------------------------------
anaconda ansys18 gcc_4.6.3 gcc-4.9.2 gcc-6.2.0 gsl-1.15 mvapich2-1.8
anaconda2 ansys2020r1 gcc-4.6.3 gcc_5.2.0 gcc-6.3.0 hdf5-1.8.12 mvapich2-2.1
......
Load the desired CUDA version as follows:
[@qcdi1402 ~]$ module load cuda/10.0
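Once the module is loaded, the CUDA toolchain (including nvcc) is on your PATH. For example (the source and output file names below are placeholders):

[@qcdi1402 ~]$ nvcc --version
[@qcdi1402 ~]$ nvcc -O2 -o myprog myprog.cu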
There is no way to reserve just a single GPU on 21g. You have to run 8 separate programs (without srun), with each run configured to "see" a different GPU. That can be accomplished by setting ROCR_VISIBLE_DEVICES appropriately for each run, as shown in the example below:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH -p 21g
export OMP_NUM_THREADS=16
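# Launch one copy of the binary per GPU in the background; ROCR_VISIBLE_DEVICES restricts each process to a single device.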
ROCR_VISIBLE_DEVICES=0 ./mybinary &
ROCR_VISIBLE_DEVICES=1 ./mybinary &
ROCR_VISIBLE_DEVICES=2 ./mybinary &
ROCR_VISIBLE_DEVICES=3 ./mybinary &
ROCR_VISIBLE_DEVICES=4 ./mybinary &
ROCR_VISIBLE_DEVICES=5 ./mybinary &
ROCR_VISIBLE_DEVICES=6 ./mybinary &
ROCR_VISIBLE_DEVICES=7 ./mybinary &
wait
The blocking of certain remote sites from the cluster login nodes is a mitigation strategy implemented by the JLab Computer Security team. While we cannot circumvent this blockade, the following workaround should work in most cases.
Example of a failing command run on a cluster login node:
[@qcdi2001 ~]$ wget https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.10/hdf5-1.10.8/src...
--2022-02-28 13:10:01-- https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.10/hdf5-1.10.8/src...
Resolving support.hdfgroup.org (support.hdfgroup.org)... 50.28.50.143
Connecting to support.hdfgroup.org (support.hdfgroup.org)|50.28.50.143|:443... connected.
HTTP request sent, awaiting response... 503 Service Unavailable
2022-02-28 13:10:01 ERROR 503: Service Unavailable.
Recommended way to run the above command successfully:
[@qcdi2001 ~]$ ssh jlabl5 curl https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.10/hdf5-1.10.8/src... > hdf5-1.10.8.tar.bz2