Clarification on deployment configuration

It would be helpful to have further documentation on deployment recommendations in a production setting.

For example:

- Should the parameter server / lighthouse server be colocated for performance? Is a high-speed interconnect between the lighthouse server and the worker nodes necessary?
- Can a single lighthouse server be shared amongst multiple training jobs? If so, how are the instances/jobs distinguished from each other?
- What minimum specs are recommended for the lighthouse / parameter servers? How do these requirements relate to model size?

Clarification on deployment configuration #235
