r.Log.Info("Found associated RayCluster for RayJob", "rayjob", rayJobInstance.Name, "raycluster", rayClusterNamespacedName)
-
- // Case1: The job is submitted to an existing Ray cluster; simply return the rayClusterInstance.
- // We do not use rayJobInstance.Spec.RayClusterSpec == nil to check whether cluster selector mode is active,
- // because a user might set both RayClusterSpec and ClusterSelector. With a RayClusterSpec == nil check,
- // the RayJob controller would still use ClusterSelector in that case, but it would then also be able to update the replicas.
- // This could cause a conflict, as both the RayJob controller and the autoscaler in the existing RayCluster might try to update replicas simultaneously.
- if len(rayJobInstance.Spec.ClusterSelector) != 0 {
-     r.Log.Info("ClusterSelector is being used to select an existing RayCluster. RayClusterSpec will be disregarded", "raycluster", rayClusterNamespacedName)
-     return rayClusterInstance, nil
- }
-
- // Note, unlike the RayService, which creates new Ray clusters if any spec is changed,
- // RayJob only supports changing the replicas. Changes to other specs may lead to
- // unexpected behavior. Therefore, the following code focuses solely on updating replicas.
-
- // Case2: In-tree autoscaling is enabled. Only the autoscaler should update replicas, to prevent race conditions
- // between user updates and autoscaler decisions. The RayJob controller should not modify the replicas. Consider this scenario:
- // 1. The autoscaler updates replicas to 10 based on the current workload.
- // 2. The user updates replicas to 15 in the RayJob YAML file.
- // 3. Both the RayJob controller and the autoscaler attempt to update replicas, causing worker pods to be repeatedly created and terminated.
-     // Note: there is currently no way to verify whether the user has updated the RayJob since the last reconcile.
-     // In the future, we could use an annotation that stores the hash of the RayJob as of the last reconcile for comparison.
-     // For now, we just log a warning message to remind the user, regardless of whether the user has updated the RayJob.
-     r.Log.Info("Since in-tree autoscaling is enabled, any adjustments made to the RayJob will be disregarded and will not be propagated to the RayCluster.")
-     return rayClusterInstance, nil
- }
-
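The removed comment above mentions that a future version could store a hash of the RayJob in an annotation and compare it on the next reconcile. A minimal sketch of that idea follows. The `jobSpec` type and the `hashAnnotation` key are hypothetical stand-ins invented for illustration, not KubeRay types or annotations, and hashing the JSON encoding is only one possible choice.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// jobSpec is a hypothetical, heavily simplified stand-in for a RayJob spec.
type jobSpec struct {
	Entrypoint string `json:"entrypoint"`
	Replicas   int32  `json:"replicas"`
}

// hashAnnotation is a hypothetical annotation key, not one defined by KubeRay.
const hashAnnotation = "example.com/last-reconciled-spec-hash"

// specHash returns the hex-encoded SHA-256 digest of the JSON-encoded spec.
func specHash(spec jobSpec) (string, error) {
	raw, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

// specChanged reports whether the spec differs from the hash recorded in the
// annotations. A missing annotation is treated as "changed".
func specChanged(annotations map[string]string, spec jobSpec) (bool, error) {
	current, err := specHash(spec)
	if err != nil {
		return false, err
	}
	return annotations[hashAnnotation] != current, nil
}

func main() {
	spec := jobSpec{Entrypoint: "python job.py", Replicas: 10}

	// Record the hash, as a controller might do at the end of a reconcile.
	h, err := specHash(spec)
	if err != nil {
		panic(err)
	}
	annotations := map[string]string{hashAnnotation: h}

	changed, _ := specChanged(annotations, spec)
	fmt.Println("changed right after recording:", changed)

	// Simulate the user editing replicas in the RayJob YAML.
	spec.Replicas = 15
	changed, _ = specChanged(annotations, spec)
	fmt.Println("changed after user edit:", changed)
}
```

With a check like this, the warning could in principle be logged only when the user actually edited the RayJob, rather than on every reconcile.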
- // Case3: In-tree autoscaling is disabled; respect the user's replicas setting.
- // Loop over all worker groups and update replicas.
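The three cases described in the removed comments can be summarized as a small decision function. The sketch below is an illustration only: the `workerGroup`, `jobSpec`, and `cluster` types are hypothetical simplifications rather than the KubeRay API types, and matching worker groups by index is an assumption, since the loop body is not shown in this excerpt.

```go
package main

import "fmt"

// All types below are hypothetical simplifications, not the KubeRay API types.
type workerGroup struct {
	Name     string
	Replicas int32
}

type jobSpec struct {
	ClusterSelector map[string]string
	WorkerGroups    []workerGroup
}

type cluster struct {
	InTreeAutoscaling bool
	WorkerGroups      []workerGroup
}

// syncReplicas follows the three cases from the comments and reports whether
// it modified the cluster.
func syncReplicas(job jobSpec, c *cluster) bool {
	// Case 1: the job targets an existing cluster via a selector; leave it alone.
	if len(job.ClusterSelector) != 0 {
		return false
	}
	// Case 2: the autoscaler owns the replica count; do not compete with it.
	if c.InTreeAutoscaling {
		return false
	}
	// Case 3: copy the user's replica counts, matching groups by index (assumption).
	updated := false
	for i := range c.WorkerGroups {
		if i >= len(job.WorkerGroups) {
			break
		}
		if c.WorkerGroups[i].Replicas != job.WorkerGroups[i].Replicas {
			c.WorkerGroups[i].Replicas = job.WorkerGroups[i].Replicas
			updated = true
		}
	}
	return updated
}

func main() {
	job := jobSpec{WorkerGroups: []workerGroup{{Name: "small-group", Replicas: 15}}}

	autoscaled := &cluster{
		InTreeAutoscaling: true,
		WorkerGroups:      []workerGroup{{Name: "small-group", Replicas: 10}},
	}
	fmt.Println("autoscaled cluster updated:", syncReplicas(job, autoscaled))

	static := &cluster{
		WorkerGroups: []workerGroup{{Name: "small-group", Replicas: 10}},
	}
	fmt.Println("static cluster updated:", syncReplicas(job, static))
	fmt.Println("static replicas now:", static.WorkerGroups[0].Replicas)
}
```

Returning early in the first two cases is what keeps the controller from fighting the autoscaler over the replica count, which is the scenario the comments describe.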
time.Second*3, time.Millisecond*500).Should(BeNil(), "My myRayJob = %v", myRayJob.Name)
- })
-
- // If in-tree autoscaling is enabled, the autoscaler should adjust the number of replicas based on the workload.
- // This test emulates the behavior of the autoscaler by directly updating the RayCluster and verifying that the number of worker pods increases accordingly.
- It("should create new worker since autoscaler increases the replica", func() {
-     Eventually(
-         getRayClusterNameForRayJob(ctx, myRayJob),
-         time.Second*15, time.Millisecond*500).Should(Not(BeEmpty()), "My RayCluster name = %v", myRayJob.Status.RayClusterName)