
Commit f90fb73

rsync: --filter on .git/info/exclude (skypilot-org#652)
* rsync: --filter on .git/info/exclude
* Update docs.
* Use --exclude-from, and check if git exclude exists
* Update docs
1 parent 6fad18a commit f90fb73

3 files changed (+28 -10 lines)


docs/source/examples/syncing-code-artifacts.rst

+4 -3
@@ -47,7 +47,7 @@ scripts, access checkpoints, etc.).
 For large, multi-gigabyte workdirs, uploading may be slow because the they
 are synced to the remote VM(s) with :code:`rsync`. To exclude large files in
 your workdir from being uploaded, add them to the :code:`.gitignore` file
-under the workdir.
+(or a ``.git/info/exclude`` file) under the workdir.
 
 .. note::
 
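The updated text lets users list large files in ``.git/info/exclude`` as an alternative to ``.gitignore``. Git documents ``$GIT_DIR/info/exclude`` as the place for repository-specific patterns that are not meant to be shared, so it suits machine-local artifacts such as checkpoints. The Python sketch below shows one way to append a pattern to that file. ``add_local_exclude`` is a hypothetical helper, not part of Sky, and it assumes the workdir is the root of a git repository (it refuses to run if ``.git`` is missing).

```python
import pathlib
import tempfile


def add_local_exclude(workdir: str, pattern: str) -> pathlib.Path:
    """Append `pattern` to <workdir>/.git/info/exclude if not already listed.

    Hypothetical helper: assumes `workdir` is the root of a git repository.
    """
    git_dir = pathlib.Path(workdir).expanduser() / '.git'
    if not git_dir.is_dir():
        raise FileNotFoundError(f'{git_dir} is not a directory')
    exclude_file = git_dir / 'info' / 'exclude'
    # `git init` normally creates info/exclude; create it if it is missing.
    exclude_file.parent.mkdir(exist_ok=True)
    text = exclude_file.read_text() if exclude_file.exists() else ''
    if pattern not in text.splitlines():
        # Git reads one pattern per line, so keep the file newline-terminated.
        if text and not text.endswith('\n'):
            text += '\n'
        exclude_file.write_text(text + pattern + '\n')
    return exclude_file


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as workdir:
        (pathlib.Path(workdir) / '.git').mkdir()
        path = add_local_exclude(workdir, 'checkpoints/')
        add_local_exclude(workdir, 'checkpoints/')  # already listed; no change
        print(path.read_text(), end='')
```

Because the file is never committed, patterns added this way affect only the local checkout, while teammates' ``.gitignore`` stays untouched.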

@@ -94,8 +94,9 @@ For more details, see `this example <https://github.com/sky-proj/sky/blob/master
 
 .. note::
 
-Items listed in a :code:`.gitignore` file under any local file_mount source
-are also ignored (the same behavior as handling ``workdir``).
+Items listed in a :code:`.gitignore` file (or a ``.git/info/exclude`` file)
+under a local file_mount source are also ignored (the same behavior as
+handling ``workdir``).
 
 Uploading or reusing large files
 --------------------------------------
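For both ``workdir`` and local file_mount sources, the backend change in this commit passes rsync a ``dir-merge`` rule for ``.gitignore``, which rsync applies in every directory it traverses, plus ``.git/info/exclude`` at the source root when that file exists. The Python sketch below lists the ignore files that would feed those rules for a given source. ``ignore_files_for_source`` is a hypothetical helper; as a simplification it walks every directory, whereas rsync does not descend into directories that are already excluded and so would not read a ``.gitignore`` inside them.

```python
import os
import pathlib
import tempfile
from typing import List


def ignore_files_for_source(source: str) -> List[pathlib.Path]:
    """List ignore files that would feed rsync's exclude rules for `source`.

    Hypothetical helper; does not model rsync skipping excluded directories.
    """
    root = pathlib.Path(source).expanduser()
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic, top-down order
        # dir-merge: a .gitignore in any directory applies from there down.
        if '.gitignore' in filenames:
            found.append(pathlib.Path(dirpath) / '.gitignore')
    # --exclude-from is only added when the repository-level file exists.
    git_exclude = root / '.git' / 'info' / 'exclude'
    if git_exclude.exists():
        found.append(git_exclude)
    return found


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as src:
        root = pathlib.Path(src)
        (root / 'data').mkdir()
        (root / '.gitignore').write_text('*.log\n')
        (root / 'data' / '.gitignore').write_text('*.bin\n')
        for path in ignore_files_for_source(src):
            print(path.relative_to(root))
```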

docs/source/reference/yaml-spec.rst

+5 -4
@@ -13,12 +13,13 @@ describe all fields available.
 
 # Working directory (optional), synced to ~/sky_workdir on the remote cluster
 # each time launch or exec is run with the yaml file.
+#
 # NOTE: Sky does not currently support large, multi-gigabyte workdirs as the
 # files are synced to the remote VM with `rsync`. Please consider using Sky
-# Storage to transfer large datasets and files. If a .gitignore exists anywhere
-# within the working directory tree, the behavior will match git's behavior
-# for finding and using .gitignore files. Files and directories included in
-# a .gitignore file will be ignored by Sky.
+# Storage to transfer large datasets and files.
+#
+# If a .gitignore file (or a .git/info/exclude file) exists in the working
+# directory, files and directories listed in those files will be ignored.
 workdir: ~/my-task-code
 
 # Number of nodes (optional) to launch including the head node. If not
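The new spec text says files listed in ``.gitignore`` or ``.git/info/exclude`` are ignored. The backend comment in this commit adds a caveat: the ``-`` modifier makes every line an exclude pattern, so git's negation syntax (e.g. ``! do_not_exclude``) does not re-include anything. The Python sketch below flags such lines before a launch. ``negated_patterns`` is a hypothetical helper; it follows git's rule that only a leading ``!`` negates and ``\!`` escapes a literal ``!``.

```python
import pathlib
import tempfile
from typing import List


def negated_patterns(ignore_file: str) -> List[str]:
    """Return git negation patterns ("!pattern") found in an ignore file.

    Hypothetical helper for spotting rules that rsync's "-" modifier will not
    treat as re-includes.
    """
    negated = []
    for line in pathlib.Path(ignore_file).read_text().splitlines():
        # In git, "!" negates only as the first character; "\!" is literal.
        if line.startswith('!'):
            negated.append(line.rstrip())
    return negated


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as d:
        ignore = pathlib.Path(d) / '.gitignore'
        ignore.write_text('*.log\n!keep.log\n\\!literal\n')
        for pattern in negated_patterns(str(ignore)):
            print(f'not honored by the rsync filter: {pattern}')
```

In the demo, ``keep.log`` would still be excluded by ``*.log`` during the sync even though git itself would track it.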

sky/backends/cloud_vm_ray_backend.py

+19 -3
@@ -131,8 +131,12 @@ def _path_size_megabytes(path: str, exclude_gitignore: bool = False) -> int:
                 'falling back to du -shk')
             pass
     return int(
-        subprocess.check_output(['du', '-sh', '-k', path
-                                ]).split()[0].decode('utf-8')) // (2**10)
+        subprocess.check_output([
+            'du',
+            '-sh',
+            '-k',
+            path,
+        ]).split()[0].decode('utf-8')) // (2**10)
 
 
 class RayCodeGen:
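This hunk only reformats the ``du`` call; the arithmetic is unchanged. The code parses the first field of the ``du`` output as an integer count of 1024-byte blocks and floor-divides by ``2**10`` to get whole MiB, which relies on ``-k`` (given after ``-h``) overriding human-readable output. The Python sketch below isolates that conversion; ``kib_output_to_mib`` and ``path_size_mib`` are hypothetical names, and the sketch uses ``du -sk`` in place of the committed ``-sh -k`` to avoid the conflicting flags.

```python
import subprocess


def kib_output_to_mib(du_output: bytes) -> int:
    """Convert the output of `du -sk PATH` to whole MiB."""
    # The first whitespace-separated field is the size in 1024-byte blocks.
    kib = int(du_output.split()[0].decode('utf-8'))
    # 1 MiB = 2**10 KiB; floor division drops any partial MiB.
    return kib // (2**10)


def path_size_mib(path: str) -> int:
    # -s: one summary line for `path`; -k: report 1024-byte blocks.
    return kib_output_to_mib(subprocess.check_output(['du', '-sk', path]))


if __name__ == '__main__':
    print(kib_output_to_mib(b'20480\t/tmp/workdir\n'))
    print(path_size_mib('.'))
```

Floor division means anything under 1024 KiB reports as 0 MiB.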
@@ -2147,7 +2151,19 @@ def _rsync_up(
         # to get a total progress bar, but it requires rsync>=3.1.0 and Mac
         # OS has a default rsync==2.6.9 (16 years old).
         rsync_command = ['rsync', '-Pavz']
-        rsync_command.append('--filter=\':- .gitignore\'')
+        # Legend
+        # dir-merge: ignore file can appear in any subdir, applies to that
+        # subdir downwards
+        # Note that "-" is mandatory for rsync and means all patterns in the
+        # ignore files are treated as *exclude* patterns. Non-exclude
+        # patterns, e.g., "! do_not_exclude" doesn't work, even though git
+        # allows it.
+        rsync_command.append('--filter=\'dir-merge,- .gitignore\'')
+        git_exclude = '.git/info/exclude'
+        if (pathlib.Path(source) / git_exclude).exists():
+            # Ensure file exists; otherwise, rsync will error out.
+            rsync_command.append('--exclude-from=.git/info/exclude')
+
         ssh_options = ' '.join(
             backend_utils.ssh_options_list(ssh_key,
                                            self._ssh_control_name(handle)))
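The committed code checks for ``pathlib.Path(source) / git_exclude`` but passes the relative ``--exclude-from=.git/info/exclude``. rsync resolves a relative ``--exclude-from`` path against its own working directory, so the check and the flag name the same file only when rsync runs inside ``source``. The Python sketch below assembles the same rules with the path taken under ``source``; it is an assumption about the intended behavior, not the committed code. ``build_rsync_command`` and the ``remote:~/sky_workdir`` target are hypothetical, and the arguments are returned as a list (no shell), so the filter rule needs none of the extra single quotes used in the original string.

```python
import pathlib
import shlex
import tempfile
from typing import List


def build_rsync_command(source: str, target: str) -> List[str]:
    """Assemble rsync arguments with the ignore-file rules from this commit.

    Hypothetical helper: returns an argument list meant to run without a shell.
    """
    command = ['rsync', '-Pavz']
    # dir-merge: read .gitignore in every traversed directory; its rules apply
    # to that directory and below. "-": every line is an exclude pattern.
    command.append('--filter=dir-merge,- .gitignore')
    git_exclude = pathlib.Path(source).expanduser() / '.git' / 'info' / 'exclude'
    # rsync fails if an --exclude-from file is missing, so check first.
    if git_exclude.exists():
        # Use the path under `source` so the check and the flag match.
        command.append(f'--exclude-from={git_exclude}')
    command += [source, target]
    return command


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as src:
        exclude = pathlib.Path(src) / '.git' / 'info' / 'exclude'
        exclude.parent.mkdir(parents=True)
        exclude.write_text('*.ckpt\n')
        print(shlex.join(build_rsync_command(src, 'remote:~/sky_workdir')))
```

``shlex.join`` (Python 3.8+) renders the list as a shell-safe string if the command must be joined with SSH options, as ``_rsync_up`` does.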
