Adding a worker group to an additional network #813

Open
kumar-aamit opened this issue Mar 6, 2025 · 0 comments

Name of Feature or Improvement

RDMA Networks

Description of Problem the Feature Should Solve

RDMA Networks for Efficient LLM Training

Describe the Solution You Would Like to See

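Add a "k8s.v1.cni.cncf.io/networks" annotation to the worker group pod template so that worker pods are attached to an additional network: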
"workerGroupSpecs": [
    {
        "replicas": cluster.config.num_workers,
        "minReplicas": cluster.config.num_workers,
        "maxReplicas": cluster.config.num_workers,
        "groupName": f"small-group-{cluster.config.name}",
        "rayStartParams": {
            "block": "true",
            "num-gpus": str(worker_gpu_count),
            "resources": worker_resources,
        },
        "template": V1PodTemplateSpec(
            metadata=V1ObjectMeta(
                annotations={
                    "k8s.v1.cni.cncf.io/networks": [,,...]
                }
            ),
            spec=get_pod_spec(
                cluster,
                [get_worker_container_spec(cluster)],
                cluster.config.worker_tolerations,
            ),
        ),
    }
],
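
The annotation value in the snippet above is left elided. As a rough illustration only, the sketch below shows one way the annotation dictionary could be built before being passed to `V1ObjectMeta(annotations=...)`. It assumes the additional networks are Multus NetworkAttachmentDefinitions, and that the annotation value is a string: either a comma-separated list of names or a JSON list of objects. The helper names `multus_annotation` and `worker_template_metadata`, and the network names `rdma-net-a` and `rdma-net-b`, are hypothetical and not part of this project.

```python
import json

# Annotation key that Multus reads to attach secondary networks to a pod.
MULTUS_NETWORKS_KEY = "k8s.v1.cni.cncf.io/networks"


def multus_annotation(networks):
    """Build the pod annotation for a list of secondary networks.

    `networks` holds NetworkAttachmentDefinition names (str) or dicts such as
    {"name": ..., "namespace": ..., "interface": ...}. This helper is
    hypothetical; it is not an existing function in this project.
    """
    if not networks:
        return {}
    if all(isinstance(n, str) for n in networks):
        # Short form: comma-separated names.
        value = ",".join(networks)
    else:
        # Long form: JSON list, needed when a namespace or interface is set.
        entries = [{"name": n} if isinstance(n, str) else dict(n) for n in networks]
        value = json.dumps(entries)
    return {MULTUS_NETWORKS_KEY: value}


def worker_template_metadata(networks, extra_annotations=None):
    """Merge the network annotation with any annotations already present."""
    annotations = dict(extra_annotations or {})
    annotations.update(multus_annotation(networks))
    # Plain dict stand-in for V1ObjectMeta(annotations=annotations).
    return {"annotations": annotations}


if __name__ == "__main__":
    # Hypothetical network names, for illustration only.
    print(worker_template_metadata(["rdma-net-a", "rdma-net-b"]))
```

Keeping the value a string matters because Kubernetes annotations are string-to-string maps, so a Python list would not be accepted as an annotation value.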

Describe Alternatives You Have Considered

Description of any alternative solutions or features you have considered.

Additional Context

Add any other context, screenshots, console logs, etc. about the request here.
