Best Practices

  1. Submit multi-threaded jobs when possible. A multi-threaded job can reduce the memory footprint of the job and may improve its performance.
  2. Let Auger manage the input and output files. Do not call jget/jput from the farm job. If jget/jput is called inside a job, the time the job spends waiting for tape I/O to finish counts toward the job's wall time. Note: jget and jput inside Auger will be disabled in the near future.
  3. Split a large job that needs more than 100 INPUTs into smaller jobs, each of which processes a small set of files. This allows Auger and Jasmine to schedule the jobs with small sets of files sooner.
  4. Do reads/writes in large chunks (4 MB or greater per read/write) in the farm job. If you read from or write to the Lustre file system directly, make sure your reads/writes are in large chunks (at least 10 MB). If your jobs degrade the performance of the Lustre file system by doing a lot of small file I/O, the number of jobs you can run will be severely limited.
  5. Request the right amount of resources, such as the number of cores, memory, disk space, and job time.
  6. Specify the correct project and track name to improve farm accounting statistics.
  7. When a job fails, check the log and error messages in the ~/.farm_out directory before submitting a CCPR.
  8. Clean up the ~/.farm_out directory periodically.
  9. Open database connections only when needed and close them as soon as you are done. Don't leave a connection open for a long period of time.
  10. Use the newest farm nodes when they are available.
  11. Test the script (submit a few test jobs) before starting to run a large number of jobs.
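
Items 3 and 4 can be illustrated with a minimal Python sketch. It is not part of Auger or Jasmine and does not call any farm tool. The function names split_inputs and copy_in_chunks, the example file names, and the 16 MB chunk size are assumptions chosen for illustration; only the 100-input limit and the 4 MB / 10 MB thresholds come from the list above.

```python
# Hypothetical helpers, not part of any farm tool.

MAX_INPUTS_PER_JOB = 100        # limit taken from item 3
CHUNK_SIZE = 16 * 1024 * 1024   # assumed 16 MB, above the 4 MB and 10 MB thresholds in item 4


def split_inputs(paths, batch_size=MAX_INPUTS_PER_JOB):
    """Return a list of batches, each holding at most batch_size paths."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]


def copy_in_chunks(src, dst, chunk_size=CHUNK_SIZE):
    """Copy src to dst, requesting chunk_size bytes per read; return bytes copied."""
    total = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:          # empty bytes object means end of file
                break
            fout.write(chunk)
            total += len(chunk)
    return total


if __name__ == "__main__":
    # Hypothetical input list: 250 files become three jobs of 100, 100 and 50 files.
    inputs = ["run_%03d.dat" % i for i in range(250)]
    for n, batch in enumerate(split_inputs(inputs)):
        print("job", n, "processes", len(batch), "files")
```

Each batch returned by split_inputs would correspond to one submitted job, so no single job requests more than 100 inputs. The same read loop applies when a job processes data rather than copying it: the point is to request a few large blocks instead of many small ones.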