Date: Wed, 16 Aug 2017 13:28:06 -0400 (EDT)
From: Ying Chen <email@example.com>
To: marki <firstname.lastname@example.org>, Alexander Austregesilo <email@example.com>,
Sean Dobbs <firstname.lastname@example.org>, pmatt <email@example.com>
I have implemented the changes to the cache management policy.
Here is a summary of the changes:
1) All files used in farm jobs (i.e., those whose input paths start with /mss/…) and larger than 3 MB
will be deleted first.
2) Pins from farm jobs will not be held up by the pin quota. This means that even if there is
not enough pin quota, the request will still be processed.
3) When pins exceed the pin quota, the pins closest to expiration will be removed
(not necessarily the oldest pins).
4) When there is no pin quota available, a user's pin request will fail and the user's jcache
request will be held (same as before).
I hope these changes will make most pinning unnecessary, so you do not have to pin
a large number of files. In the next few days, I will gradually reduce halld's pin quota
to test the new software, eventually dropping it to 150 TB.
Thanks for your patience and support.
Note added on point (3): Files more than two days from pin expiration will not be unpinned. This means that the group can remain over its pin quota.
Note on point (4): Farm pins will always be granted, even if the group is over its pin quota. Farm pins are typically short-lived; they are removed when the associated job finishes.
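The eviction order described in points (1)-(3) and the note on point (3) can be sketched as a toy model. This is an illustration only, not JLab's actual cache-manager code; the class and function names are invented, and the 3 MB and two-day thresholds are taken from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional

MB = 1024 * 1024
TWO_DAYS = 2 * 24 * 3600  # point (3) note: pins further out than this are left alone


@dataclass
class CachedFile:
    path: str
    size: int                      # bytes
    farm_input: bool               # used as a farm-job input (path under /mss/)
    pin_expires: Optional[float]   # epoch seconds, or None if unpinned


def eviction_order(files: List[CachedFile], now: float) -> List[CachedFile]:
    """Return files in the order the policy would delete/unpin them.

    1) Unpinned farm-job input files larger than 3 MB are deleted first.
    2) If still over quota, pins nearest expiration are removed next,
       skipping any pin more than two days from expiration.
    """
    first = [f for f in files
             if f.farm_input and f.size > 3 * MB and f.pin_expires is None]
    unpinnable = [f for f in files
                  if f.pin_expires is not None
                  and f.pin_expires - now <= TWO_DAYS]
    # nearest-to-expiration first, not oldest-pin first
    unpinnable.sort(key=lambda f: f.pin_expires)
    return first + unpinnable
```

For example, with an unpinned 10 MB farm input, a pin expiring in one hour, a pin expiring in one day, and a pin three days out, the order is: the farm input, the one-hour pin, the one-day pin; the three-day pin is untouched.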
From a conversation with Ying, a long time ago:
Lustre group quota
used -- no information available
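The note above says no usage information was available for the Lustre group quota. On systems with the Lustre client tools installed, `lfs quota -g <group> <mountpoint>` reports group usage; a small parser for that output might look like the sketch below. The sample output layout, the group name `halld`, and the numbers are assumptions for illustration; the exact column layout varies across Lustre versions.

```python
def parse_lfs_quota(output: str):
    """Parse the data line of `lfs quota -g <group> <fs>` output.

    Returns (used_kbytes, limit_kbytes). Assumes the common layout:
    Filesystem  kbytes  quota  limit  grace  files  quota  limit  grace
    """
    for line in output.splitlines():
        parts = line.split()
        # the data line starts with the mount point, e.g. /lustre
        if parts and parts[0].startswith("/"):
            used = int(parts[1].rstrip("*"))   # '*' marks an over-quota group
            limit = int(parts[3])
            return used, limit
    raise ValueError("no quota data line found")


# Illustrative sample output (invented numbers):
sample = """Disk quotas for grp halld (gid 267):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
        /lustre 1234567       0 157286400       -   12345       0       0       -"""

used, limit = parse_lfs_quota(sample)
```

In practice the output string would come from something like `subprocess.run(["lfs", "quota", "-g", "halld", "/lustre"], capture_output=True, text=True).stdout`.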
Present: Chip, Graham, MMI, Richard (UConn)
- quality of service feature/concern
- I/O bandwidth an issue
- need 128 GB
- 20 TB of disk
- 1 Gbit coming in should be sufficient
- should arrive mid-Jan.
- personnel to stand the system up should be available
- OSG stuff, Richard
- Ticket source: UConn
- Users need to log into appliance
- Shadow jobs on appliance for each running job
- Action items:
- get numbers for data footprint from dc2
- add off-site computing to official computing plan for GlueX
Ole, Brad, Sandy, Harut, MMI
- new nodes: CentOS 7; Lustre client upgrade, version now above that of the server
- xfs -> ext4, performance problems, severe
- one local disk instead of two on these nodes
- pbs, torque, maui
- see adaptivecomputing.com
- open source: torque and maui
- Hall B data challenge in three weeks, lasting a few days; will use the moved LQCD nodes