Always use vocab.spm from artifacts directory in training steps (#545)
This ensures that it will be uploaded alongside models if we are spot terminated, which makes automatically resuming in a rerun much easier.
bhearsum authored May 2, 2024
1 parent 40abd8a commit 6857447
Showing 1 changed file with 8 additions and 7 deletions.
15 changes: 8 additions & 7 deletions taskcluster/scripts/pipeline/train-taskcluster.sh
@@ -38,9 +38,13 @@ case "$pretrained_model_mode" in
     ;;
   "continue"|"init"|"None")
     if [ "$pretrained_model_mode" == "None" ]; then
-      vocab="$MOZ_FETCHES_DIR/vocab.spm"
-    else
-      vocab="$TASK_WORKDIR/artifacts/vocab.spm"
+      # In any non-pretrained mode this file is pulled from an upstream
+      # task. We copy it over to the artifacts directory earlier to
+      # ensure that it is published even if the task is interrupted
+      # (eg: by a spot termination in GCP). This makes resuming training
+      # easier.
+      mkdir -p "$TASK_WORKDIR/artifacts"
+      cp "$MOZ_FETCHES_DIR/vocab.spm" "$TASK_WORKDIR/artifacts/vocab.spm"
     fi

     if [ "$pretrained_model_mode" == "init" ]; then
@@ -54,13 +58,10 @@ case "$pretrained_model_mode" in
       "$train_set_prefix" \
       "$valid_set_prefix" \
       "$model_dir" \
-      "$vocab" \
+      "$TASK_WORKDIR/artifacts/vocab.spm" \
       "$best_model_metric" \
       "$alignments" \
       "$seed" \
       "${extra_params[@]}"
-    if [ "$pretrained_model_mode" == "None" ]; then
-      cp "$vocab" "$model_dir"
-    fi
     ;;
 esac
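
As a rough illustration of the pattern this commit adopts, the sketch below copies the fetched vocab into the artifacts directory before training starts and then always reads it from there. It is a standalone approximation, not the contents of train-taskcluster.sh. The `mktemp` fallbacks, the placeholder vocab file, and the final `echo` are assumptions added so the snippet can run outside a Taskcluster task.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical stand-ins: in a real task these variables come from the
# environment; the defaults exist only so the sketch is runnable locally.
MOZ_FETCHES_DIR="${MOZ_FETCHES_DIR:-$(mktemp -d)}"
TASK_WORKDIR="${TASK_WORKDIR:-$(mktemp -d)}"
pretrained_model_mode="${pretrained_model_mode:-None}"

# Placeholder for the vocab that an upstream task would have produced.
[ -f "$MOZ_FETCHES_DIR/vocab.spm" ] || echo "dummy vocab" > "$MOZ_FETCHES_DIR/vocab.spm"

if [ "$pretrained_model_mode" == "None" ]; then
  # Copy before training so the file is already in the published
  # directory if the task is interrupted partway through.
  mkdir -p "$TASK_WORKDIR/artifacts"
  cp "$MOZ_FETCHES_DIR/vocab.spm" "$TASK_WORKDIR/artifacts/vocab.spm"
fi

# Every mode now reads the vocab from the same location.
vocab="$TASK_WORKDIR/artifacts/vocab.spm"
echo "training would read vocab from: $vocab"
```

Because the copy happens before the training command rather than after it, the vocab no longer depends on training finishing, which is the property the commit message relies on for resuming after a spot termination.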
