WIP data-mover #2474
base: master
Changes from all commits
ac3f09f
fe73277
3d0fe01
88c863a
7047565
c01f224
c36d49f
4db9e9c
058cffb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
# Data-mover | ||
|
||
Data-mover is a tool to move data between Puhti and Mahti local filesystems and | ||
Allas and LUMI-O object storage servers, when | ||
[simple transfers](../faq/how-to-move-data-between-puhti-and-allas.md#move-data-with-rclone) | ||
are not practical, either because there are many small files, or the size of the | ||
dataset is large. | ||
|
||
The `data-mover` tool aims to be simple to use and to handle the hard corner | ||
cases. It is essentially a wrapper around the [Restic backup tool](https://restic.readthedocs.io) | ||
and stores the data in the Restic repository format. | ||
Restic (as used by data-mover) in turn uses the [Rclone](https://rclone.org) backend for the actual data transfers to | ||
the object storage servers and back. In addition, the data-mover tool does the | ||
data transfers in the background, using batch jobs, allowing larger transfers | ||
than would be practical in regular interactive login sessions. | ||
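
To illustrate the layering described above, here is a minimal sketch of how a Restic repository location for the rclone backend is written (`rclone:<remote>:<path>` is Restic's documented form). The function name, remote name, and bucket path are hypothetical, and the internals of `data-mover` may differ from this.

```shell
# Sketch only: data-mover's real internals are not shown in this document.
# Builds a Restic repository location that uses the rclone backend.
restic_rclone_repo() {
    remote="$1"   # rclone remote name (hypothetical)
    path="$2"     # bucket and prefix inside the remote (hypothetical)
    printf 'rclone:%s:%s\n' "$remote" "$path"
}

repo="$(restic_rclone_repo "example-remote" "example-bucket/exampledir")"
echo "$repo"

# With restic and rclone installed and configured, a manual equivalent of an
# export and an import would look roughly like this (not run here):
#   restic -r "$repo" init
#   restic -r "$repo" backup /scratch/project_<projid>/exampledir
#   restic -r "$repo" restore latest --target /
```

Running these steps by hand in a login session is what becomes impractical for large datasets, which is why the tool runs the transfers as batch jobs instead.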
|
||
## Simple example case, moving data from Puhti to Allas and back | ||
|
||
Below is a guide for a simple scenario: moving data from a Puhti project scratch | ||
directory to the corresponding project in Allas, and then back. Similar works with | ||
Review comment: Similar what works? Unclear.

Review comment: I guess you mean the exact same instructions work on Mahti, which is true. LUMI-O is slightly different, need to specify more.

Review comment: These object storages are really bad from the perspective of traditional HPC use. The mapping from filesystem to object storage is far from 1-to-1, object storage is a completely separate machine with its own authentication and authorisation, there are many different transfer tools/clients, APIs, and object storage server configurations, all different and often incompatible, instead of the OS just handling it... I started writing it all out, noticed that it would be a long article, wrote the TLDR text (what it is now), and deleted the start of the more complete guide. This tool is supposed to be easy to use. If the documentation is long, it means the tool is not easy to use :)

Review comment: I'll see if I can redirect the reader quicker to more comprehensive docs for using other services than Puhti and Allas.

Review comment: Ok. I reread the doc. Similar is exactly how it is. Very unclear, but truthfully so :)

Review comment: You could just say that the easy instructions work the same way in Mahti. Cross-service usage, using LUMI-O instead of Allas, is possible; please go read the advanced section if you are interested.
||
Mahti and LUMI-O. Please have a look at `data-mover help` and `data-mover <sub-command> --help` | ||
for additional documentation. | ||
|
||
### Setting up the connection from Puhti to Allas | ||
|
||
1. Your CSC project needs to have the Allas service enabled. If it is not already enabled, the project PI can add | ||
the Allas service for the project in [my.csc.fi](https://my.csc.fi), and | ||
the project members need to [accept the service terms](../../accounts/how-to-add-service-access-for-project.md). | ||
|
||
2. Create a configuration for rclone and store the authentication token in the | ||
file `$HOME/.config/rclone/rclone.conf` on Puhti. This is easiest to do from the | ||
[Puhti web interface](https://puhti.csc.fi): open "Cloud storage configuration" from the | ||
"Tools" drop-down menu, and | ||
[create an Allas S3 rclone configuration for the project](../../computing/webinterface/file-browser.md#accessing-allas-and-lumi-o). | ||
|
||
Review comment: Ok. So this uses the Open OnDemand style S3 configuration. This will be a bit confusing for the old users of S3 Allas.
||
3. Open a terminal on Puhti and take the `data-mover` tool into use with | ||
``` | ||
module load .data-mover | ||
``` | ||
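
After step 2, it can be useful to confirm that the rclone configuration file exists and to see which remotes it defines. The following is a minimal sketch that relies only on the INI-style layout of `rclone.conf`, where each remote is a `[name]` section. The function name is hypothetical and is not part of `data-mover`.

```shell
# Sketch: list the remote names defined in an rclone configuration file.
# Defaults to the path used in step 2 above.
list_rclone_remotes() {
    conf="${1:-$HOME/.config/rclone/rclone.conf}"
    if [ ! -r "$conf" ]; then
        echo "rclone configuration not found: $conf" >&2
        return 1
    fi
    # Print section headers without the surrounding brackets.
    sed -n 's/^\[\(.*\)]$/\1/p' "$conf"
}

# Report the remotes, or print a hint if the configuration is missing.
list_rclone_remotes || echo "Create the configuration from the Puhti web interface first."
```

Where rclone itself is available, `rclone listremotes` reports the configured remotes directly.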
|
||
### Moving a single directory in Puhti to Allas | ||
|
||
1. Delete all the files that are not needed from the scratch directory, | ||
`/scratch/project_<projid>/exampledir`, for example. There is no need | ||
to compress the files. | ||
|
||
2. Move the data to Allas with | ||
``` | ||
data-mover export /scratch/project_<projid>/exampledir | ||
``` | ||
|
||
3. Check the status of the data transfer with | ||
``` | ||
data-mover status | ||
``` | ||
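
Before the clean-up in step 1, it may help to see how many files and how much data the directory holds, since many small files or a large total size are the cases this tool targets. This is a minimal sketch using standard `find` and `du`. The function name is hypothetical and is not a `data-mover` sub-command.

```shell
# Sketch: report the file count and total size of a directory.
summarise_dir() {
    dir="$1"
    # Count regular files recursively.
    files="$(find "$dir" -type f | wc -l | tr -d ' ')"
    # Total disk usage in KiB (first tab-separated field of du output).
    kib="$(du -sk "$dir" | cut -f1)"
    printf '%s files, %s KiB\n' "$files" "$kib"
}

# Summarise the directory given as the first argument, or the current one.
# On Puhti this could be /scratch/project_<projid>/exampledir.
summarise_dir "${1:-.}"
```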
|
||
### Listing the data in Allas | ||
|
||
``` | ||
data-mover list | ||
``` | ||
|
||
### Moving data from Allas to Puhti | ||
|
||
Import data back to the original directory with | ||
``` | ||
data-mover import /scratch/project_<projid>/exampledir | ||
``` | ||
|
||
Review comment: What happens for the overlapping files in exampledir?

Review comment: How do you remove old exports?

Review comment: What do you mean by overlapping files? Deleting an export from Allas means you moved something that you should have simply deleted in the first place :D Ok, there is
||
## Links to related material | ||
|
||
- [Lue tool for data inventory](lue.md) | ||
- [Data cleaning](clean-up-data.md) | ||
- [Allas introduction](../../data/Allas/introduction.md) |