| 1 | +# Nscale Topology Provider |
| 2 | + |
| 3 | +The `nscale` topology provider reads topology data from the Nscale Radar API and converts it into Topograph's canonical three-tier topology graph. |
| 4 | + |
| 5 | +The provider uses two Nscale APIs: |
| 6 | + |
| 7 | +- **Radar API**: returns each instance's network path via `GET /v1/topology` |
| 8 | +- **Instance API**: returns instance metadata via `GET /v2/instances?organizationID=<org>&regionID=<region>` |
| 9 | + |
| 10 | +The Radar response supplies the provider instance ID, switch path, and optional block ID. The Instance API response maps provider instance IDs to hostnames using `metadata.id` and `metadata.name`; the Slurm engine uses this mapping when Topograph discovers Slurm nodes automatically. |
| 11 | + |
| 12 | +## When to Use This Provider |
| 13 | + |
| 14 | +Use this provider for Nscale environments where Radar is the topology source. It is most commonly used with the Slurm engine to generate `topology.conf` from the current Slurm node list. |
| 15 | + |
| 16 | +If the request payload supplies explicit `nodes`, Topograph uses those instance-ID-to-node-name mappings directly. If `nodes` is omitted and the Slurm engine is used, Topograph runs `scontrol show nodes -o`, fetches the instance catalog for the configured region from the Nscale Instance API, and keeps entries whose `metadata.name` matches a Slurm node name. |
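The auto-discovery matching described above can be sketched in Python. This is an illustration only, not the provider's actual code: the function names `parse_node_names` and `match_instances` are hypothetical, and the sketch assumes the Instance API returns a JSON array of objects carrying `metadata.id` and `metadata.name`.

```python
def parse_node_names(scontrol_output: str) -> list[str]:
    """Extract node names from `scontrol show nodes -o` output (one node per line)."""
    names = []
    for line in scontrol_output.splitlines():
        for field in line.split():
            if field.startswith("NodeName="):
                names.append(field[len("NodeName="):])
                break  # only one NodeName field is expected per line
    return names


def match_instances(catalog: list[dict], slurm_nodes: list[str]) -> dict[str, str]:
    """Map instance ID -> hostname, keeping only entries known to Slurm."""
    wanted = set(slurm_nodes)
    return {
        item["metadata"]["id"]: item["metadata"]["name"]
        for item in catalog
        if item.get("metadata", {}).get("name") in wanted
    }
```

Catalog entries whose name is not a Slurm node name are simply dropped, which mirrors the "keeps entries whose `metadata.name` matches" behavior described in the text.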
| 17 | + |
| 18 | +## Prerequisites |
| 19 | + |
| 20 | +- A Radar API endpoint reachable from the Topograph host |
| 21 | +- An Instance API endpoint reachable from the Topograph host |
| 22 | +- An Nscale organization ID |
| 23 | +- An API token with permission to read topology and instance metadata |
| 24 | +- The Nscale region ID for the cluster |
| 25 | +- For Slurm auto-discovery, `scontrol` must be available to the Topograph process |
| 26 | + |
| 27 | +## Credentials |
| 28 | + |
| 29 | +| Field | Required | Description | |
| 30 | +|---|---|---| |
| 31 | +| `org` | Yes | Nscale organization ID | |
| 32 | +| `token` | Yes | Bearer token used for Radar and Instance API requests | |
| 33 | +| `region` | Required for Slurm auto-discovery | Nscale region ID used for Instance API lookup and Slurm region assignment | |
| 34 | + |
| 35 | +Store credentials in a YAML file: |
| 36 | + |
| 37 | +```yaml |
| 38 | +org: <ORGANIZATION_ID> |
| 39 | +token: <API_TOKEN> |
| 40 | +region: <REGION_ID> |
| 41 | +``` |
| 42 | + |
| 43 | +Reference that file from the Topograph config: |
| 44 | + |
| 45 | +```yaml |
| 46 | +credentialsPath: /etc/topograph/nscale-credentials.yaml |
| 47 | +``` |
| 48 | + |
| 49 | +Credentials can also be supplied directly in the topology request payload under `provider.creds`. |
| 50 | + |
| 51 | +## Parameters |
| 52 | + |
| 53 | +| Field | Required | Description | |
| 54 | +|---|---|---| |
| 55 | +| `radarApiUrl` | Yes | Base URL for the Radar API, for example `https://radar.example.com` | |
| 56 | +| `instanceApiUrl` | Yes | Base URL for the Instance API, for example `https://api.example.com` | |
| 57 | +| `trimTiers` | No | Number of highest topology tiers to trim from output. Defaults to `0` | |
| 58 | + |
| 59 | +The top-level Topograph `pageSize` setting controls pagination for the Radar topology request. |
| 60 | + |
| 61 | +## Configuration |
| 62 | + |
| 63 | +Example Topograph config for Slurm: |
| 64 | + |
| 65 | +```yaml |
| 66 | +http: |
| 67 | + port: 49021 |
| 68 | + ssl: false |
| 69 | + |
| 70 | +provider: nscale |
| 71 | +engine: slurm |
| 72 | + |
| 73 | +requestAggregationDelay: 15s |
| 74 | +credentialsPath: /etc/topograph/nscale-credentials.yaml |
| 75 | + |
| 76 | +providerParams: |
| 77 | + radarApiUrl: https://radar.example.com |
| 78 | + instanceApiUrl: https://api.example.com |
| 79 | + |
| 80 | +engineParams: |
| 81 | + plugin: topology/tree |
| 82 | + topologyConfigPath: /etc/slurm/topology.conf |
| 83 | +``` |
| 84 | + |
| 85 | +Example request payload: |
| 86 | + |
| 87 | +```json |
| 88 | +{ |
| 89 | + "provider": { |
| 90 | + "name": "nscale", |
| 91 | + "creds": { |
| 92 | + "org": "<ORGANIZATION_ID>", |
| 93 | + "token": "<API_TOKEN>", |
| 94 | + "region": "<REGION_ID>" |
| 95 | + }, |
| 96 | + "params": { |
| 97 | + "radarApiUrl": "https://radar.example.com", |
| 98 | + "instanceApiUrl": "https://api.example.com" |
| 99 | + } |
| 100 | + }, |
| 101 | + "engine": { |
| 102 | + "name": "slurm", |
| 103 | + "params": { |
| 104 | + "plugin": "topology/tree" |
| 105 | + } |
| 106 | + } |
| 107 | +} |
| 108 | +``` |
| 109 | + |
| 110 | +If you already have the instance-ID-to-hostname mapping, you can include it explicitly: |
| 111 | + |
| 112 | +```json |
| 113 | +{ |
| 114 | + "provider": { |
| 115 | + "name": "nscale", |
| 116 | + "creds": { |
| 117 | + "org": "<ORGANIZATION_ID>", |
| 118 | + "token": "<API_TOKEN>", |
| 119 | + "region": "<REGION_ID>" |
| 120 | + }, |
| 121 | + "params": { |
| 122 | + "radarApiUrl": "https://radar.example.com", |
| 123 | + "instanceApiUrl": "https://api.example.com" |
| 124 | + } |
| 125 | + }, |
| 126 | + "engine": { |
| 127 | + "name": "slurm" |
| 128 | + }, |
| 129 | + "nodes": [ |
| 130 | + { |
| 131 | + "region": "<REGION_ID>", |
| 132 | + "instances": { |
| 133 | + "<INSTANCE_ID_1>": "node001", |
| 134 | + "<INSTANCE_ID_2>": "node002" |
| 135 | + } |
| 136 | + } |
| 137 | + ] |
| 138 | +} |
| 139 | +``` |
| 140 | + |
| 141 | +## How It Works |
| 142 | + |
| 143 | +For each region in the compute instance list, the provider fetches topology pages from Radar: |
| 144 | + |
| 145 | +```text |
| 146 | +GET <radarApiUrl>/v1/topology?limit=<pageSize>&offset=<offset> |
| 147 | +Authorization: Bearer <token> |
| 148 | +X-Organization: <org> |
| 149 | +X-Region: <region> |
| 150 | +``` |
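The paged Radar request above could be driven by a loop like the following Python sketch. The response envelope is not described here, so the sketch assumes each page is a plain list of instance records and that a page shorter than `limit` marks the end. `fetch_page` is a hypothetical callable standing in for the HTTP call, and `radar_headers` and `iter_topology` are illustrative names.

```python
from typing import Callable, Iterator


def radar_headers(token: str, org: str, region: str) -> dict[str, str]:
    """Build the request headers shown in the Radar request above."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Organization": org,
        "X-Region": region,
    }


def iter_topology(
    fetch_page: Callable[[int, int], list], page_size: int
) -> Iterator[dict]:
    """Yield records page by page using limit/offset pagination.

    fetch_page(limit, offset) is assumed to return a list of records.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        if len(page) < page_size:  # assumed end-of-data signal
            return
        offset += page_size
```

Injecting `fetch_page` keeps the loop independent of any particular HTTP client and easy to exercise with canned data.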
| 151 | + |
| 152 | +Each returned instance is translated as follows: |
| 153 | + |
| 154 | +| Radar field | Topograph field | |
| 155 | +|---|---| |
| 156 | +| `instance_id` | Instance ID | |
| 157 | +| `network_node_path[0]` | Core tier | |
| 158 | +| `network_node_path[1]` | Spine tier | |
| 159 | +| `network_node_path[2]` | Leaf tier | |
| 160 | +| `block_id` | Accelerator / NVLink domain | |
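As a rough illustration of the field mapping in the table above, the Python sketch below converts one Radar record into a flat placement. The `Placement` type and `translate` function are hypothetical names. Padding a path shorter than three entries with `None` is an assumption made for the sketch, not documented provider behavior.

```python
from typing import Optional, TypedDict


class Placement(TypedDict):
    instance_id: str
    core: Optional[str]
    spine: Optional[str]
    leaf: Optional[str]
    block: Optional[str]


def translate(record: dict) -> Placement:
    """Map a Radar record to core/spine/leaf tiers plus an optional block."""
    path = list(record.get("network_node_path") or [])
    # Assumption: pad short paths so indexing below never fails.
    path += [None] * (3 - len(path))
    return Placement(
        instance_id=record["instance_id"],
        core=path[0],
        spine=path[1],
        leaf=path[2],
        block=record.get("block_id"),  # block_id is optional
    )
```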
| 161 | + |
| 162 | +For Slurm auto-discovery, the provider also fetches instance metadata: |
| 163 | + |
| 164 | +```text |
| 165 | +GET <instanceApiUrl>/v2/instances?organizationID=<org>&regionID=<region> |
| 166 | +Authorization: Bearer <token> |
| 167 | +``` |
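When building the Instance API URL in code, encoding the query string with a library avoids hand-assembled separators and escaping mistakes. A minimal Python sketch follows; `instances_url` is a hypothetical helper, and only the path and the two query parameter names come from the request shown above.

```python
from urllib.parse import urlencode


def instances_url(instance_api_url: str, org: str, region: str) -> str:
    """Return the Instance API list URL with an encoded query string."""
    query = urlencode({"organizationID": org, "regionID": region})
    # Strip a trailing slash so the base URL joins cleanly with the path.
    return f"{instance_api_url.rstrip('/')}/v2/instances?{query}"
```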
| 168 | + |
| 169 | +It builds the same map produced by: |
| 170 | + |
| 171 | +```bash |
| 172 | +curl -s -H "Authorization: Bearer $TOKEN" \ |
| 173 | + "$INSTANCE_API_URL/v2/instances?organizationID=$ORG&regionID=$REGION" \ |
| 174 | + | jq -r '.[] | "\(.metadata.id)\t\(.metadata.name)"' |
| 175 | +``` |
| 176 | + |
| 177 | +## Verifying the Output |
| 178 | + |
| 179 | +First verify that the Instance API returns the hostnames Slurm knows: |
| 180 | + |
| 181 | +```bash |
| 182 | +curl -s -H "Authorization: Bearer $TOKEN" \ |
| 183 | + "$INSTANCE_API_URL/v2/instances?organizationID=$ORG&regionID=$REGION" \ |
| 184 | + | jq -r '.[] | "\(.metadata.id)\t\(.metadata.name)"' |
| 185 | +``` |
| 186 | + |
| 187 | +Then trigger topology generation: |
| 188 | + |
| 189 | +```bash |
| 190 | +id=$(curl -s -X POST -H "Content-Type: application/json" -d @payload.json http://localhost:49021/v1/generate) |
| 191 | +curl -s "http://localhost:49021/v1/topology?uid=$id" |
| 192 | +``` |
| 193 | + |
| 194 | +For the Slurm engine, verify that the generated `topology.conf` contains the expected switch hierarchy or block topology for the Nscale instances. |