@@ -222,6 +222,101 @@ place and then rerun the dispersion report::
Sample represents 1.00% of the object partition space

+ --------------------------------
+ Cluster Telemetry and Monitoring
+ --------------------------------
+
+ Various metrics and telemetry can be obtained from the object servers using
+ the recon server middleware and the swift-recon CLI. To do so, update your
+ object-server.conf to enable the recon middleware by adding a pipeline entry
+ and setting its one option::
+
+     [pipeline:main]
+     pipeline = recon object-server
+
+     [filter:recon]
+     use = egg:swift#recon
+     recon_cache_path = /var/cache/swift
+
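As a quick sanity check on the configuration above, the following is a minimal sketch (not part of the patch) that parses an object-server.conf style text with Python's standard `configparser` and reports whether `recon` appears in the pipeline and has a `recon_cache_path`. The function name `recon_enabled` and the `SAMPLE` text are illustrative assumptions; Swift itself loads this file through paste.deploy, not through this helper.

```python
import configparser


def recon_enabled(conf_text):
    """Return True if recon is in the pipeline and has a cache path set."""
    # interpolation=None keeps values such as "egg:swift#recon" untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    pipeline = parser.get("pipeline:main", "pipeline", fallback="").split()
    has_filter = parser.has_section("filter:recon")
    has_cache = has_filter and parser.has_option("filter:recon",
                                                 "recon_cache_path")
    return "recon" in pipeline and has_cache


# Hypothetical sample mirroring the settings shown in the guide.
SAMPLE = """\
[pipeline:main]
pipeline = recon object-server

[filter:recon]
use = egg:swift#recon
recon_cache_path = /var/cache/swift
"""

if __name__ == "__main__":
    print(recon_enabled(SAMPLE))
```

A check like this only confirms that the settings are present; it does not confirm that the middleware loads or that the cache directory exists.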
+ The recon_cache_path simply sets the directory where stats for a few items will
+ be stored. Depending on the method of deployment, you may need to create this
+ directory manually and ensure that swift has read/write access to it.
+
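Creating the cache directory by hand can be sketched as below. This is an assumption-laden illustration: `ensure_cache_dir` is a hypothetical helper, and `os.access` tests the permissions of the process running the script, so it only reflects what swift can do if the script is run as the swift user.

```python
import os
import tempfile


def ensure_cache_dir(path):
    """Create path if missing; report whether this process can use it."""
    os.makedirs(path, exist_ok=True)
    # Read, write, and search (execute) access are all needed on a directory.
    return os.access(path, os.R_OK | os.W_OK | os.X_OK)


if __name__ == "__main__":
    # A temporary directory stands in for /var/cache/swift in this demo.
    with tempfile.TemporaryDirectory() as tmp:
        print(ensure_cache_dir(os.path.join(tmp, "swift")))
```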
+ If you wish to enable reporting of replication times, you can enable recon
+ support in the object-replicator section of object-server.conf::
+
+     [object-replicator]
+     ...
+     recon_enable = yes
+     recon_cache_path = /var/cache/swift
+
+ Finally, if you also wish to track asynchronous pendings, you will need to set
+ up a cron job to run the swift-recon-cron script periodically::
+
+     */5 * * * * swift /usr/bin/swift-recon-cron /etc/swift/object-server.conf
+
+ Once enabled, a GET request for "/recon/<metric>" to the object server will
+ return a JSON-formatted response::
+
+     fhines@ubuntu:~$ curl -i http://localhost:6030/recon/async
+     HTTP/1.1 200 OK
+     Content-Type: application/json
+     Content-Length: 20
+     Date: Tue, 18 Oct 2011 21:03:01 GMT
+
+     {"async_pending": 0}
+
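The same request can be made from a script. The sketch below (not part of the patch) uses only the Python standard library; the helper names `recon_url`, `parse_recon_body`, and `fetch_recon` are hypothetical, and the host and port are taken from the curl example rather than from any Swift default.

```python
import json
from urllib.request import urlopen


def recon_url(host, port, metric):
    """Build the /recon/<metric> URL for one object server."""
    return "http://%s:%d/recon/%s" % (host, port, metric)


def parse_recon_body(body):
    """Decode the JSON body returned by the recon middleware."""
    return json.loads(body.decode("utf-8"))


def fetch_recon(host, port, metric, timeout=5):
    """GET a recon metric; requires a reachable object server."""
    with urlopen(recon_url(host, port, metric), timeout=timeout) as resp:
        return parse_recon_body(resp.read())


if __name__ == "__main__":
    # Parse the body from the curl example without touching the network.
    print(parse_recon_body(b'{"async_pending": 0}'))
```

Calling `fetch_recon("localhost", 6030, "async")` would issue the request shown in the curl session, assuming an object server with recon enabled is listening there.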
+ The following metrics and telemetry are currently exposed:
+
+ ================== ====================================================
+ Request URI        Description
+ ------------------ ----------------------------------------------------
+ /recon/load        returns 1, 5, and 15 minute load averages
+ /recon/async       returns count of async pendings
+ /recon/mem         returns /proc/meminfo
+ /recon/replication returns last logged object replication time
+ /recon/mounted     returns *ALL* currently mounted filesystems
+ /recon/unmounted   returns all unmounted drives if mount_check = True
+ /recon/diskusage   returns disk utilization for storage devices
+ /recon/ringmd5     returns object/container/account ring md5sums
+ /recon/quarantined returns # of quarantined objects/accounts/containers
+ ================== ====================================================
+
+ This information can also be queried via the swift-recon command line utility::
+
+     fhines@ubuntu:~$ swift-recon -h
+     ===============================================================================
+     Usage:
+     usage: swift-recon [-v] [--suppress] [-a] [-r] [-u] [-d] [-l] [--objmd5]
+
+
+     Options:
+       -h, --help            show this help message and exit
+       -v, --verbose         Print verbose info
+       --suppress            Suppress most connection related errors
+       -a, --async           Get async stats
+       -r, --replication     Get replication stats
+       -u, --unmounted       Check cluster for unmounted devices
+       -d, --diskusage       Get disk usage stats
+       -l, --loadstats       Get cluster load average stats
+       -q, --quarantined     Get cluster quarantine stats
+       --objmd5              Get md5sums of object.ring.gz and compare to local
+                             copy
+       --all                 Perform all checks. Equivalent to -arudlq --objmd5
+       -z ZONE, --zone=ZONE  Only query servers in specified zone
+       --swiftdir=SWIFTDIR   Default = /etc/swift
+
+ For example, to obtain quarantine stats from all hosts in zone "3"::
+
+     fhines@ubuntu:~$ swift-recon -q --zone 3
+     ===============================================================================
+     [2011-10-18 19:36:00] Checking quarantine dirs on 1 hosts...
+     [Quarantined objects] low: 4, high: 4, avg: 4, total: 4
+     [Quarantined accounts] low: 0, high: 0, avg: 0, total: 0
+     [Quarantined containers] low: 0, high: 0, avg: 0, total: 0
+     ===============================================================================
+
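The low/high/avg/total figures in the report are simple aggregates over one count per host. The sketch below (not part of the patch) shows that arithmetic; `summarize` is a hypothetical helper written for illustration and is not swift-recon's own code, whose rounding and formatting may differ.

```python
def summarize(counts):
    """Aggregate per-host counts into low, high, avg, and total."""
    if not counts:
        raise ValueError("at least one host count is required")
    total = sum(counts)
    return {
        "low": min(counts),
        "high": max(counts),
        "avg": total / len(counts),
        "total": total,
    }


if __name__ == "__main__":
    # One host reporting 4 quarantined objects, as in the example output.
    print(summarize([4]))
```

With a single host, low, high, avg, and total all equal that host's count, which matches the one-host report above.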
+
------------------------
Debugging Tips and Tools
------------------------