/work from within a batch job

It is recommended that you use the cache_cp script to copy files between /work and the compute nodes used by your jobs. As of this writing, rcp is symlinked to cache_cp. Use cache_cp exactly as you would rcp, except that you do not need to supply the name of the server holding the source or destination under /work; the script fills in the appropriate server for you. For example, given the mappings shown below, "cache_cp myfile /work/HASTE/myfile" behaves like "rcp myfile hpcdata8:/work/HASTE/myfile" (the file name here is only illustrative).

cache_cp overview

cache_cp, the client program, reads a configuration file, /etc/cache_cp.conf, which contains the port of the cache_cpd daemon and a series of mappings like:

#               hpcdata7
/cache/LHPC                             hpcdata7:/cache7/LHPC
/cache/casa                             hpcdata7:/cache7/casa

#               hpcdata8
/work/HASTE                             hpcdata8:/work/HASTE
/work/JLabLQCD                          hpcdata8:/work/JLabLQCD

The cache_cp program translates any path matching a prefix on the left-hand side into the corresponding server:path form on the right-hand side. It also canonicalizes shorthand prefixes, e.g. rewriting /w/work to just /work. If the files are being put onto a data server, the client sums their sizes and sends the total along to the server; if the files are being retrieved from the data server, the server sums the file sizes instead.
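The translation step above can be sketched in Python. This is an illustration only; cache_cp's internals are not shown in this document, so the function names and parsing details here are assumptions. The sample mappings are taken from the configuration listing above.

```python
# Hypothetical sketch of cache_cp's path translation, based on the
# configuration format shown above. Not the actual implementation.

def parse_config(text):
    """Parse lines like '/work/HASTE  hpcdata8:/work/HASTE' into a dict,
    skipping comments and blank lines."""
    mappings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        local, remote = line.split()
        mappings[local] = remote
    return mappings

def translate(path, mappings):
    """Canonicalize shorthand prefixes (e.g. /w/work -> /work), then
    rewrite the longest matching prefix to its server:path form."""
    if path.startswith("/w/work"):
        path = path.replace("/w/work", "/work", 1)
    for local in sorted(mappings, key=len, reverse=True):
        if path.startswith(local):
            return mappings[local] + path[len(local):]
    return path

conf = """
#               hpcdata8
/work/HASTE                             hpcdata8:/work/HASTE
/work/JLabLQCD                          hpcdata8:/work/JLabLQCD
"""
m = parse_config(conf)
print(translate("/work/HASTE/run1/out.dat", m))
# hpcdata8:/work/HASTE/run1/out.dat
```

Matching the longest prefix first keeps a more specific mapping from being shadowed by a shorter one.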

These sums are used to determine whether the transfer is "large" or "small". For simplicity, all recursive copies (those invoked with -r on the command line) are considered large. Once a transfer has been classified as large or small, it is placed into the corresponding queue on the server; each queue has its own limit on the number of concurrent copies.
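The classification rule can be sketched as follows. The size cutoff is a made-up placeholder; the document does not state the actual threshold, only that summed sizes and the -r flag decide the outcome.

```python
# Illustrative sketch of the large/small classification described above.
# LARGE_THRESHOLD is a hypothetical value, not documented here.

LARGE_THRESHOLD = 1 << 30  # assumed cutoff in bytes

def classify(total_bytes, recursive):
    """Recursive copies (-r) are always 'large'; otherwise the summed
    file sizes decide which server-side queue the transfer joins."""
    if recursive:
        return "large"
    return "large" if total_bytes >= LARGE_THRESHOLD else "small"

print(classify(10 * 1024, recursive=False))   # small
print(classify(10 * 1024, recursive=True))    # large
```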

A PUT or a GET request is sent to the server, which replies with a "TICKET": the MD5 checksum of the request. The client then loops, asking the server whether its ticket is next in line. If it is, the server replies "OK"; if not, the server tells the client to sleep for a period of time before asking again.

Once the client receives the "OK", the rcp starts. During the transfer, the client periodically notifies the server that the transfer is in progress; when the transfer finishes, the client sends a "DONE" to the server.
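The client side of this exchange can be sketched as below. The message names follow the text (ticket, OK, progress, DONE), but the wire format and the send/recv/do_rcp helpers are assumptions for illustration; only the MD5 ticket is stated in the document.

```python
import hashlib
import time

# Hypothetical sketch of the client-side ticket protocol described above.

def make_ticket(request: bytes) -> str:
    """The server's ticket is the MD5 checksum of the request."""
    return hashlib.md5(request).hexdigest()

def run_transfer(send, recv, request: bytes, do_rcp):
    ticket = make_ticket(request)
    # Poll until the server says our ticket is next in line.
    while True:
        send(("ASK", ticket))
        reply = recv()
        if reply == "OK":
            break
        time.sleep(reply)  # server told us how long to wait
    # Transfer, reporting progress so the server knows we are alive.
    for _ in do_rcp():
        send(("PROGRESS", ticket))
    send(("DONE", ticket))

# Minimal stub demo: the "server" answers OK immediately, and the
# "rcp" yields two chunks.
log = []
run_transfer(log.append, lambda: "OK", b"PUT /work/HASTE/x 1024", lambda: [1, 2])
print(log[-1][0])  # DONE
```

The periodic PROGRESS messages matter because of the supervisor thread described below: a transfer that goes silent looks idle and can be purged.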

The server runs a supervisor thread that ensures the connection states are correct, and it purges any request that has been idle for more than 2 minutes.
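The purge rule can be sketched as a simple sweep over per-ticket timestamps. Only the 2-minute limit comes from the document; the data structures here are assumptions.

```python
import time

# Sketch of the supervisor's idle-purge rule described above: any request
# silent for more than 2 minutes is dropped.

IDLE_LIMIT = 120  # seconds

def purge_idle(requests, now=None):
    """Remove tickets whose last activity is older than IDLE_LIMIT.
    `requests` maps ticket -> timestamp of the last client message."""
    now = time.time() if now is None else now
    stale = [t for t, last in requests.items() if now - last > IDLE_LIMIT]
    for t in stale:
        del requests[t]
    return stale

reqs = {"abc": 1000.0, "def": 1110.0}
print(purge_idle(reqs, now=1130.0))  # ['abc']  (idle for 130 s > 120 s)
```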