Best Practices
Fri, 02/20/2015 - 13:56 — ychen
- Submit multi-threaded jobs when possible. A multi-threaded job can reduce the job's memory footprint and may improve its performance.
- Let Auger manage the input and output files. Do not call jget/jput from within a farm job: the time the job spends waiting for tape I/O to finish counts toward the job's wall time. Note: jget and jput inside Auger will be disabled in the near future.
- Split a large job that needs more than 100 INPUTs into smaller jobs, each of which processes a small set of files. This lets Auger and Jasmine schedule the jobs with small sets of files sooner.
- Do reads and writes in large chunks (4 MB or greater per read/write) in the farm job. If you read from or write to the Lustre file system directly, make sure each read/write is at least 10 MB. If your jobs degrade the performance of the Lustre file system by doing a lot of small file I/O, the number of jobs you can run will be severely limited.
- Request the right amount of resources, such as the number of cores, memory, disk space, and job time.
- Specify the correct project and track name to improve farm accounting statistics.
- When a job fails, check the log and error messages in the ~/.farm_out directory before submitting a CCPR.
- Clean up the ~/.farm_out directory periodically.
- Open database connections only when needed and close them as soon as you are done. Don't leave a connection open for a long period of time.
- Use the newest farm nodes when they are available.
- Test your script by submitting a few test jobs before you start running a large number of jobs.
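
The advice above about splitting jobs with more than 100 INPUTs can be sketched as a small helper that groups a file list into batches, one batch per job. This is only an illustration: the function name `split_inputs` and the placeholder file names are hypothetical, and turning each batch into an actual Auger submission is left out.

```python
from typing import Iterator, List, Sequence

# Limit taken from the guideline above; adjust if the farm's rule changes.
MAX_INPUTS_PER_JOB = 100


def split_inputs(paths: Sequence[str],
                 batch_size: int = MAX_INPUTS_PER_JOB) -> Iterator[List[str]]:
    """Yield consecutive batches of at most batch_size input paths."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])


if __name__ == "__main__":
    # Placeholder names, not real tape or disk paths.
    inputs = [f"run_{i:04d}.dat" for i in range(250)]
    for job_index, batch in enumerate(split_inputs(inputs)):
        print(f"job {job_index}: {len(batch)} input files")
```

Each batch would then become the INPUT list of its own job, so that no single job waits on more than 100 files.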
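
The large-chunk I/O guideline above can be illustrated with a minimal copy loop that requests a fixed, large number of bytes per read. The function name `copy_in_large_chunks` and the 16 MiB default are assumptions chosen for this sketch (16 MiB is simply a value above the 10 MB minimum stated for Lustre), not settings prescribed by the farm.

```python
# 16 MiB per read/write: an assumed value above the 10 MB Lustre minimum.
CHUNK_SIZE = 16 * 1024 * 1024


def copy_in_large_chunks(src: str, dst: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy src to dst, requesting chunk_size bytes per read; return bytes copied."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    total = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)  # empty bytes object signals end of file
            if not chunk:
                break
            fout.write(chunk)
            total += len(chunk)
    return total
```

The same pattern applies when processing data rather than copying it: read a large block, work on it in memory, and write results back in large blocks instead of issuing many small reads or writes.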