Why isn't my job running? It has been waiting a long time.
There are many possible reasons:
- your project has a small fair share, and so other projects have higher priority
- another person has been heavily using the same project, so that your project is already above its fair share target for the current scheduling interval
- there are not enough nodes available of the type you have requested
- there is a reservation on the system (or on the nodes you need), possibly for scheduled maintenance. You can check this on the portal page lqcd.jlab.org by reading the news, or by clicking Cluster Status => Nodes; a reservation is indicated by a half-yellow status bar for each reserved node
- nodes may appear idle because the system is draining enough nodes to run a large job that currently has the highest priority
If you believe none of these apply and that the system is misbehaving, you may submit a trouble ticket.
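Before filing a ticket, it can help to confirm what state the scheduler reports for the job. The following is a minimal sketch, assuming a Torque/PBS-style `qstat -f` whose output contains a line of the form `job_state = Q`; the function name `job_state_from_qstat` and the job id in the comment are hypothetical.

```shell
# Hypothetical helper: read `qstat -f` output on stdin and print the
# value of the job_state attribute (for example Q = queued, R = running,
# H = held). Assumes the usual "    attribute = value" layout.
job_state_from_qstat() {
  awk -F' = ' '/^[[:space:]]*job_state = /{ print $2; exit }'
}

# Example use on the cluster (123456 is a placeholder job id):
#   qstat -f 123456 | job_state_from_qstat
```

A state of Q means the job is still waiting for one of the reasons listed above; a state of H means the job is held and will not start until the hold is released.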
How do I run an interactive job?
An interactive job provides a terminal to a node that fits the parameters passed to qsub. This allows you to run commands as though you had accessed the node via ssh.
- ssh to qcdi
- run the following: qsub -A <account to be used> -q <phi, ib, or gpu> -l walltime=<time for interactive job to run>,<other parameters> -I
- the terminal session will display the job id and then wait until the job launches, at which point it presents a prompt that can be used like a regular ssh session
Note that the interactive job starts as soon as PBS allocates a node to it, and then runs for the duration given by walltime. This means that the job could start immediately, or it could sit in the queue for several hours waiting for a free node. Note (and warning): if you close the terminal or press Ctrl+C, the job will be deleted (useful when you have finished your work).
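The steps above can be sketched as a small helper that assembles the qsub command line so it can be reviewed before running it on qcdi. This is only an illustration: the function name `build_interactive_qsub`, the account `myproject`, and the `HH:MM:SS` walltime value are assumptions, and any other `-l` parameters your job needs are left out.

```shell
# Hypothetical helper: print an interactive qsub command for review.
# Arguments: account, queue (phi, ib, or gpu), walltime (assumed HH:MM:SS).
build_interactive_qsub() {
  account="$1"
  queue="$2"
  walltime="$3"
  # Accept only the queue names mentioned in this FAQ.
  case "$queue" in
    phi|ib|gpu) ;;
    *) echo "unknown queue: $queue" >&2; return 1 ;;
  esac
  printf 'qsub -A %s -q %s -l walltime=%s -I\n' "$account" "$queue" "$walltime"
}

# Example: print the command for a one-hour session on the gpu queue.
build_interactive_qsub myproject gpu 01:00:00
```

Printing the command rather than executing it keeps the sketch safe to try anywhere; on qcdi you would copy the printed line and run it yourself.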