Data Transfer Technologies

  1. OSG
    1. BeStMan
    2. Hadoop
  2. XROOTD
  3. Globus Online

 


SRM Reading List

From 2013-09-04

http://en.wikipedia.org/wiki/Storage_Resource_Manager
https://www.opensciencegrid.org/twiki/pub/Education/CourseMaterials/SRM.ppt
https://www.opensciencegrid.org/bin/view/Documentation/Release3/NavUserApplications
https://www.opensciencegrid.org/bin/view/Documentation/SrmBasics
http://www.youtube.com/watch?v=4hsL6vTc1Yg

Notes on UConn transfer of DC2 files

Speed measurements

3+3 1100 files per hour
4+4 1529 files per hour
5+5 1785 files per hour
6+6 2611 in 70 min., 2300 in 62 min.
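
The two 6+6 samples are timed over different intervals, so scaling them to an hourly rate makes them comparable with the per-hour figures above. A minimal sketch of that arithmetic; the function name is made up for illustration, and the counts and durations are the ones recorded above.

```python
def files_per_hour(n_files, minutes):
    """Scale a file count measured over `minutes` to an hourly rate."""
    return n_files * 60.0 / minutes

# The two 6+6 samples from the notes above.
for n_files, minutes in [(2611, 70), (2300, 62)]:
    rate = files_per_hour(n_files, minutes)
    print(f"{n_files} files in {minutes} min -> {rate:.0f} files per hour")
```

Both samples work out to a little over 2200 files per hour.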

lorentz:marki:scripts> perl -e '$x = -f "/volatile/halld/data_challenge/uconn/rest/dana_rest_09001_4394942.hddm"; print "$x\n";'

was very fast

lorentz:marki:scripts> find /volatile/halld/data_challenge/uconn/rest -name dana_rest_09001_4394941.hddm

not so much.
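
The difference is presumably that the Perl -f test checks one known path, while find has to read through the directory listing to match a name. A rough Python sketch of the two approaches, run against a throwaway temporary directory rather than /volatile; the function names and the stand-in file names are invented here.

```python
import os
import tempfile

def exists_direct(path):
    """One lookup of a known path, like Perl's -f test."""
    return os.path.isfile(path)

def exists_by_search(root, name):
    """Walk the tree under root looking for name, like find -name."""
    for _dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            return True
    return False

with tempfile.TemporaryDirectory() as root:
    # Stand-in files; the names only imitate the dana_rest_* pattern.
    for i in range(1000):
        open(os.path.join(root, f"dana_rest_{i:07d}.hddm"), "w").close()
    target = "dana_rest_0000999.hddm"
    print(exists_direct(os.path.join(root, target)))
    print(exists_by_search(root, target))
```

Both calls give the same answer; the direct test just does not need to touch the other entries in the directory.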

No new files found at UConn (ca. 5/31). In fact some of them are not there anymore.

Speeding Up the SRM

Transferring 100 files from run 9003 from UConn with 6 other streams going on ifarm1101:

ifarm1101:marki:srm_test> date ; srmcp -copyjobfile=copyjob.txt ; date
Wed May 28 10:41:31 EDT 2014
Wed May 28 11:05:27 EDT 2014
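
For reference, the elapsed time and per-file rate can be worked out from the two date stamps. A small sketch; the helper name is hypothetical, and the time-zone token is dropped before parsing because both stamps carry the same one.

```python
from datetime import datetime

def parse_date_output(line):
    """Parse output of the date command, ignoring the time-zone token."""
    parts = line.split()
    del parts[4]  # e.g. "EDT"; identical on both stamps
    return datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")

start = parse_date_output("Wed May 28 10:41:31 EDT 2014")
end = parse_date_output("Wed May 28 11:05:27 EDT 2014")
elapsed = (end - start).total_seconds()
n_files = 100

print(f"elapsed: {elapsed:.0f} s")
print(f"per file: {elapsed / n_files:.2f} s")
print(f"rate: {n_files * 3600 / elapsed:.0f} files per hour")
```

That is just under 24 minutes for the 100 files, or about 14 seconds per file.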

The srmcp help message says that multiple streams require the server to be in active mode. Try:

srmcp -debug -server_mode=active -copyjobfile=copyjob_short.txt

where copyjob_short.txt lists three DC2 files. This fails with the following error text:

“””””
copying CopyJob, source = gsiftp://stat26.phys.uconn.edu:2811/Gluex/test/dana_rest_09003_4052210.hddm destination = file:////scratch/marki/srm_test/data/dana_rest_09003_4052210.hddm
2014-05-28 15:36:24,976 [Thread-73] ERROR org.dcache.srm.util.GridftpClient - org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: Server reported transfer failure (error code 1) [Nested exception message:  Custom message: Unexpected reply: 426 Transfer failed due to unexpected exception: java.net.ConnectException: Connection timed out] [Nested exception is org.globus.ftp.exception.UnexpectedReplyCodeException:  Custom message: Unexpected reply: 426 Transfer failed due to unexpected exception: java.net.ConnectException: Connection timed out]
2014-05-28 15:36:24,979 [Thread-1] ERROR org.dcache.srm.util.GridftpClient -  transfer exception
org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: Server reported transfer failure (error code 1) [Nested exception message:  Custom message: Unexpected reply: 426 Transfer failed due to unexpected exception: java.net.ConnectException: Connection timed out]
at org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerException.java:101) ~[gridftp-2.0.5-dcache-rc4.jar:na]
at org.globus.ftp.vanilla.TransferMonitor.run(TransferMonitor.java:182) ~[gridftp-2.0.5-dcache-rc4.jar:na]
at org.globus.ftp.vanilla.TransferMonitor.start(TransferMonitor.java:109) ~[gridftp-2.0.5-dcache-rc4.jar:na]
at org.globus.ftp.FTPClient.transferRunSingleThread(FTPClient.java:1488) ~[gridftp-2.0.5-dcache-rc4.jar:na]
at org.globus.ftp.FTPClient.get2(FTPClient.java:1751) ~[gridftp-2.0.5-dcache-rc4.jar:na]
at org.dcache.srm.util.GridftpClient$TransferThread.run(GridftpClient.java:886) ~[srm-2.2.11-SNAPSHOT.jar:2.2.11-SNAPSHOT]
at java.lang.Thread.run(Thread.java:662) [na:1.6.0_31]
copy failed with the error
org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: Server reported transfer failure (error code 1) [Nested exception message:  Custom message: Unexpected reply: 426 Transfer failed due to unexpected exception: java.net.ConnectException: Connection timed out] [Nested exception is org.globus.ftp.exception.UnexpectedReplyCodeException:  Custom message: Unexpected reply: 426 Transfer failed due to unexpected exception: java.net.ConnectException: Connection timed out]
try again
sleeping for 180000 before retrying
“””””

I get a similar error when trying streams_num=4, with or without server_mode=active.

We might be suffering from the lots-of-small-files problem. The server time-out in the error seems consistent with that theory. See http://toolkit.globus.org/alliance/publications/papers/Pipelining.pdf .
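
To make the lots-of-small-files idea concrete, here is an idealized toy model of per-file overhead with and without pipelining. Everything in it is an assumption for illustration: the function names, the overhead and transfer times, and the simplification that pipelining hides all but the first setup. None of the numbers are measurements from these transfers.

```python
def sequential_seconds(n_files, file_seconds, setup_seconds):
    """Every file pays the per-file setup cost before its data moves."""
    return n_files * (setup_seconds + file_seconds)

def pipelined_seconds(n_files, file_seconds, setup_seconds):
    """Setup for the next file overlaps the current transfer,
    so only the first setup is exposed (idealized)."""
    return setup_seconds + n_files * file_seconds

# Hypothetical values, not measurements from these transfers.
n_files = 100
file_seconds = 2.0    # assumed time to move one file's data
setup_seconds = 5.0   # assumed per-file command and connection overhead

print(sequential_seconds(n_files, file_seconds, setup_seconds))
print(pipelined_seconds(n_files, file_seconds, setup_seconds))
```

When the files are small, the setup term dominates the sequential total, which is the situation the pipelining paper addresses.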

There is a quasi-similar error report at http://www.lofar.org/wiki/doku.php?id=public:grid:troubleshooting which recommends setting streams_num=1 to fix the problem, but that error is “no route to host”, not “connection timed out”.

OpenShop Certificates at JLab for the SRM

On May 14 Richard added new host certificates to the appropriate directory under /apps/osg to account for some of the SRM server hosts at UConn that are licensed under “OpenShop”, his private label. Those will be changed in the future to refer to DigiCert, as is more standard.

The missing certificates were causing intermittent failures that depended on the number of files requested. If more than about 20 files were in an SRM request, it was likely that one of the untrusted servers would be involved, and the transfer would bomb on the client side with no trace on the server except for a log entry indicating that the client had “canceled” the request.
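
The dependence on request size is what one would expect if each file lands on a server more or less at random. A back-of-the-envelope sketch under that assumption; the function name is invented, and the 5% untrusted fraction is a hypothetical value, not something measured at UConn.

```python
def chance_of_untrusted_host(n_files, untrusted_fraction):
    """Probability that at least one of n_files lands on an untrusted
    server, assuming independent, uniformly random placement."""
    return 1.0 - (1.0 - untrusted_fraction) ** n_files

# Hypothetical fraction of servers without a trusted certificate.
untrusted_fraction = 0.05

for n_files in (1, 5, 20, 100):
    p = chance_of_untrusted_host(n_files, untrusted_fraction)
    print(f"{n_files:4d} files: {p:.2f}")
```

Small requests usually slip through, while larger ones almost always hit an untrusted server, which matches the intermittent pattern described above.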