# **TuneConfig**


## **Hydro args**

+ **`scaling_num`**: The model width scaling factor. Hydro supports switching between the tuning modes ``Hydro (Scaling+Fusion) | Hydro (Fusion Only) | Ray (Classic HPO)`` by setting this value. Specifically,

    0 = Use Ray Tune (disable both scaling and fusion),

    1 = Use Hydro (fusion only, scaling disabled),

    Any integer > 1 (preferably a power of 2) enables both scaling and fusion.

    Default value is 8.

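The mode selection above can be sketched as a small helper. The function below is purely illustrative (it is not part of Hydro's API); it just encodes the mapping described in the bullet points:

```python
def tuning_mode(scaling_num: int) -> str:
    """Illustrative only: map a `scaling_num` value to the tuning mode it selects."""
    if scaling_num < 0:
        raise ValueError("scaling_num must be a non-negative integer")
    if scaling_num == 0:
        return "Ray (Classic HPO)"      # both scaling and fusion disabled
    if scaling_num == 1:
        return "Hydro (Fusion Only)"    # scaling disabled
    return "Hydro (Scaling+Fusion)"     # > 1, preferably a power of 2


print(tuning_mode(8))  # Hydro (Scaling+Fusion)
```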
+ **`fusion_limit`**: User-defined maximum model fusion number. Only takes effect when `scaling_num` > 0.

    0 = Disable fusion (scaling only).

    1 = Similar to disabling fusion, but the original model is still replaced with Hydro modules.

    If set to None, Hydro will automatically profile and determine the actual fusion number according to GPU memory capacity.

    If set to a positive integer, Hydro will use this value as the fusion limit.

    If set to a dict, Hydro will use it as the fusion limit for each batch size.

    Default is None.

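For the dict form, a plausible configuration might map batch sizes to fusion limits. Note that the exact key/value convention shown here (batch size as key, fusion count as value) is an assumption inferred from the description above, so check Hydro's own examples before relying on it:

```python
# Hypothetical per-batch-size fusion limits: larger batches consume more GPU
# memory per model, leaving room to fuse fewer trials together.
fusion_limit = {
    32: 8,    # at batch_size 32, fuse up to 8 trials into one model
    64: 4,
    128: 2,
}

# Would then be passed as TuneConfig(fusion_limit=fusion_limit, ...)
```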
!!! danger "If you want to tune the `batch_size` hyperparameter"

    Please name it exactly `batch_size`; other names (like `bs`, `bsz`, etc.) will not be recognized by Hydro.

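Concretely, a search space that Hydro can work with must spell the name out as `batch_size`. In the sketch below, plain Python lists stand in for the real search-space API (e.g. Ray Tune's `tune.choice`) to keep the example self-contained:

```python
# Recognized: Hydro sees the `batch_size` hyperparameter.
param_space_ok = {"lr": [1e-3, 1e-2], "batch_size": [32, 64, 128]}

# Not recognized: Hydro will not treat `bsz` as the batch size.
param_space_bad = {"lr": [1e-3, 1e-2], "bsz": [32, 64, 128]}
```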
+ **`eager_transfer`**: The ratio of maximum trials (`num_samples`) at which a target model trial is started. Must be in (0, 1].

    1 = Disable eager transfer.

    Default value is 0.5.

+ **`trial_compile`**: Whether to enable `torch.compile()` to further accelerate model training throughput. If enabled, Hydro does not support model checkpointing or multi-fidelity tuning algorithms. Default is False.

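Putting the Hydro-specific knobs together, the sketch below collects them into a plain dict. The values are illustrative, not recommendations; in real use they would be passed as keyword arguments to `TuneConfig`:

```python
# Illustrative settings for the fields documented above.
tune_config_kwargs = {
    "metric": "val_loss",
    "mode": "min",
    "num_samples": 100,      # total trials sampled from the search space
    "scaling_num": 8,        # > 1: enable both scaling and fusion
    "fusion_limit": None,    # let Hydro profile GPU memory and pick the limit
    "eager_transfer": 0.5,   # start the target-model trial halfway through
    "trial_compile": False,  # keep checkpointing / multi-fidelity support
}

# With eager_transfer = 0.5 and num_samples = 100, the target model trial
# is started after 0.5 * 100 = 50 trials.
target_start = int(
    tune_config_kwargs["eager_transfer"] * tune_config_kwargs["num_samples"]
)
print(target_start)  # 50
```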
## **Source Code**

```python
@dataclass
class TuneConfig:
    """Tune specific configs.

    Args:
        ======================================================================
        Hydro args:
        scaling_num: The model width scaling factor. Hydro supports switching
            between the tuning modes ``Hydro (Scaling+Fusion) | Hydro (Fusion
            Only) | Ray (Classic HPO)`` by setting this value. Specifically,
            0 = Use Ray Tune (disable both scaling and fusion),
            1 = Use Hydro (fusion only, scaling disabled),
            Any integer > 1 (preferably a power of 2) enables both scaling
            and fusion.
            Default value is 8.
        fusion_limit: User-defined maximum model fusion number. Only takes
            effect when `scaling_num` > 0. Default is None.
            0 = Disable fusion (scaling only).
            1 = Similar to disabling fusion, but the original model is still
            replaced with Hydro modules.
            If set to None, Hydro will automatically profile and determine the
            actual fusion number according to GPU memory capacity.
            If set to a positive integer, Hydro will use this value as the
            fusion limit.
            If set to a dict, Hydro will use it as the fusion limit for each
            batch size.
        eager_transfer: The ratio of maximum trials (`num_samples`) at which
            a target model trial is started. Must be in (0, 1].
            1 = Disable eager transfer. Default value is 0.5.
        trial_compile: Whether to enable torch.compile() to further accelerate
            model training throughput. If enabled, Hydro does not support model
            checkpointing and multi-fidelity tuning algorithms. Default is False.

        ======================================================================
        Ray args:
        metric: Metric to optimize. This metric should be reported
            with `tune.report()`. If set, will be passed to the search
            algorithm and scheduler.
        mode: Must be one of [min, max]. Determines whether objective is
            minimizing or maximizing the metric attribute. If set, will be
            passed to the search algorithm and scheduler.
        search_alg: Search algorithm for optimization. Defaults to
            random search.
        scheduler: Scheduler for executing the experiment.
            Choose among FIFO (default), MedianStopping,
            AsyncHyperBand, HyperBand and PopulationBasedTraining. Refer to
            ray.tune.schedulers for more options.
        num_samples: Number of times to sample from the
            hyperparameter space. Defaults to 1. If `grid_search` is
            provided as an argument, the grid will be repeated
            `num_samples` times. If this is -1, (virtually) infinite
            samples are generated until a stopping condition is met.
        max_concurrent_trials: Maximum number of trials to run
            concurrently. Must be non-negative. If None or 0, no limit will
            be applied. This is achieved by wrapping the ``search_alg`` in
            a :class:`ConcurrencyLimiter`, and thus setting this argument
            will raise an exception if the ``search_alg`` is already a
            :class:`ConcurrencyLimiter`. Defaults to None.
        time_budget_s: Global time budget in
            seconds after which all trials are stopped. Can also be a
            ``datetime.timedelta`` object.
        reuse_actors: Whether to reuse actors between different trials
            when possible. This can drastically speed up experiments that start
            and stop actors often (e.g., PBT in time-multiplexing mode). This
            requires trials to have the same resource requirements.
            Defaults to ``True`` for function trainables (including most
            Ray AIR trainers) and ``False`` for class and registered trainables
            (e.g. RLlib).
        trial_name_creator: Optional function that takes in a Trial and returns
            its name (i.e. its string representation). Be sure to include some
            unique identifier (such as `Trial.trial_id`) in each trial's name.
            NOTE: This API is in alpha and subject to change.
        trial_dirname_creator: Optional function that takes in a trial and
            generates its trial directory name as a string. Be sure that some
            unique identifier (such as `Trial.trial_id`) is used in each trial's
            directory name. Otherwise, trials could overwrite artifacts and
            checkpoints of other trials. The return value cannot be a path.
            NOTE: This API is in alpha and subject to change.
        chdir_to_trial_dir: Whether to change the working directory of each worker
            to its corresponding trial directory. Defaults to `True` to prevent
            contention between workers saving trial-level outputs.
            If set to `False`, files are accessible with paths relative to the
            original working directory. However, all workers on the same node now
            share the same working directory, so be sure to use
            `session.get_trial_dir()` as the path to save any outputs.
    """

    # Currently this is not at feature parity with `tune.run`, nor should it be.
    # The goal is to reach a fine balance between API flexibility and conciseness.
    # We should carefully introduce arguments here instead of just dumping everything.
    mode: Optional[str] = None
    metric: Optional[str] = None
    search_alg: Optional[Union[Searcher, SearchAlgorithm]] = None
    scheduler: Optional[TrialScheduler] = None
    num_samples: int = 1
    max_concurrent_trials: Optional[int] = None
    time_budget_s: Optional[Union[int, float, datetime.timedelta]] = None
    reuse_actors: Optional[bool] = None
    trial_name_creator: Optional[Callable[[Trial], str]] = None
    trial_dirname_creator: Optional[Callable[[Trial], str]] = None
    chdir_to_trial_dir: bool = True
    scaling_num: int = 8
    fusion_limit: Optional[Union[int, Dict]] = None
    eager_transfer: float = 0.5
    trial_compile: bool = False
```