Update 2023-05-03-cais-cluster-documentation.md
WilliamHodgkins authored Aug 20, 2024
1 parent a59e8db commit 9019ae6
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion _posts/2023-05-03-cais-cluster-documentation.md
@@ -44,7 +44,7 @@ title: Welcome to the Center for AI Safety Cluster

# Cluster Overview

-The cluster is hosted on OCI and is based on 32 bare metal BM.GPU.A100-v2.8 nodes and a number of service nodes. Each GPU node is configured with 8 NVIDIA A100 GPU cards with 8X80 GB memory, 27.2 TB local NVMe SSD Storage and Two 64 core AMD EPYC Milan, for a total of 256 GPUs, 4,096 CPU cores and 870 TB of file system storage.
+The cluster is hosted on OCI and is based on 32 bare metal BM.GPU.A100-v2.8 nodes and a number of service nodes. Each GPU node is configured with 8 NVIDIA A100 GPU cards with 8X80 GB memory, 27.2 TB local NVMe SSD Storage and two 64 core AMD EPYC Milan, for a total of 256 GPUs, 4,096 CPU cores and 870 TB of file system storage.

The nodes are connected by a remote direct memory access (RDMA) network for data communication. Each node has eight 2 x 100 Gbps network interface cards (NICs), providing a total of 1,600 Gbit/sec inter-node network bandwidth with latency as low as single-digit microseconds.
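
The totals quoted in the overview can be checked from the per-node figures. The following is a minimal Python sketch of that arithmetic. The constant and function names are illustrative only and are not part of the cluster documentation, and it assumes that the 870 TB storage figure is the sum of the local NVMe capacity across the 32 GPU nodes, which the text does not state explicitly.

```python
# Per-node figures as quoted in the Cluster Overview text.
GPU_NODES = 32
GPUS_PER_NODE = 8
GPU_MEMORY_GB = 80
CPU_SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 64
NVME_TB_PER_NODE = 27.2
NICS_PER_NODE = 8
PORTS_PER_NIC = 2
GBPS_PER_PORT = 100


def cluster_totals():
    """Derive the aggregate figures from the per-node figures above."""
    return {
        "gpus": GPU_NODES * GPUS_PER_NODE,
        "cpu_cores": GPU_NODES * CPU_SOCKETS_PER_NODE * CORES_PER_SOCKET,
        "gpu_memory_gb_per_node": GPUS_PER_NODE * GPU_MEMORY_GB,
        # Assumption: the quoted 870 TB is the summed local NVMe capacity.
        "local_nvme_tb": round(GPU_NODES * NVME_TB_PER_NODE, 1),
        "node_bandwidth_gbps": NICS_PER_NODE * PORTS_PER_NIC * GBPS_PER_PORT,
    }


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value}")
```

Under these assumptions the derived values agree with the quoted 256 GPUs, 4,096 CPU cores, and 1,600 Gbit/sec per node, and the summed NVMe capacity rounds to the quoted 870 TB.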

