Python client for MogileFS.
This implementation has some unique production-ready features and has been used with large-scale, data-intensive applications (hundreds of nodes, several petabytes, billions of files).
These include:
- tracker load balancing
- test on borrow
- fault tolerance
- connection keep alive
- the zone option a.k.a. alternative IP
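For readers unfamiliar with the terms above, here is a minimal, generic sketch of how test on borrow, keep-alive, and failover across trackers can fit together. It is not pymogilefs's actual implementation: the `BorrowPool` class and the `connect` and `is_alive` callables are hypothetical names introduced only for illustration.

```python
import random


class BorrowPool:
    """Illustrative pool: validates a kept-alive connection before handing it out."""

    def __init__(self, endpoints, connect, is_alive):
        self._endpoints = list(endpoints)
        self._connect = connect    # hypothetical callable: endpoint -> connection
        self._is_alive = is_alive  # hypothetical callable: connection -> bool
        self._idle = {}            # endpoint -> idle (kept-alive) connection

    def borrow(self):
        # Visit endpoints in random order to spread load across trackers.
        for endpoint in random.sample(self._endpoints, len(self._endpoints)):
            conn = self._idle.pop(endpoint, None)
            try:
                # Test on borrow: reuse the idle connection only if it still works.
                if conn is None or not self._is_alive(conn):
                    conn = self._connect(endpoint)
            except OSError:
                continue  # fault tolerance: fall through to the next endpoint
            return endpoint, conn
        raise ConnectionError("no endpoint reachable")

    def give_back(self, endpoint, conn):
        # Keep the connection open for the next borrower.
        self._idle[endpoint] = conn
```

A caller would `borrow()` a connection, issue its request, and `give_back()` the connection afterwards so it can be reused.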
To install pymogilefs, simply:
$ git clone [email protected]:hrchu/pymogilefs.git
$ cd pymogilefs
$ pip install .
Client usage:
>>> from pymogilefs.client import Client
>>> client = Client(trackers=[''], domain='testdomain')
>>> response = client.list_keys(prefix='test', limit=5)
>>> print(
{'key_count': 5,
'keys': {1: 'testkey',
2: 'test_file2_0.115351657953_1480606271.65',
3: 'test_file2_0.380149553659_1480606080.71',
4: 'test_file_0.0129341319339_1480606080.74',
5: 'test_file_0.0397767495074_1480606080.8'},
'next_after': 'testkey'}
>>> buf = client.get_file('testkey')
>>> len(
Admin usage:
>>> from pymogilefs.backend import Backend
>>> backend = Backend(trackers=[''])
>>> devices = backend.get_devices()
>>> print(['devices']['9'])
{'devid': '16',
'hostid': '5',
'mb_asof': '',
'mb_free': '45181',
'mb_total': '59640',
'mb_used': '14459',
'observed_state': '',
'reject_bad_md5': '',
'status': 'dead',
'utilization': '',
'weight': '100'}
See example/ for more examples.
Note: in a multithreaded or multiprocess application, create a separate instance (for example, a Client) for each thread or process rather than sharing a single instance among them.
- The timeout option only affects storage node connections. The tracker timeout is hard-coded.
There are a few Python client projects for MogileFS around; however, these projects seem to be outdated and abandoned. This work is based on one of them. Many thanks!
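One way to follow the per-thread recommendation is to create instances lazily through `threading.local`. The sketch below is only an illustration: `PerThread` is a hypothetical helper, not part of pymogilefs, and the factory passed to it is whatever builds your client.

```python
import threading


class PerThread:
    """Lazily creates one object per thread from a caller-supplied factory."""

    def __init__(self, factory):
        self._factory = factory          # zero-argument callable that builds an instance
        self._local = threading.local()  # attribute storage private to each thread

    def get(self):
        # The first call in a thread builds that thread's instance; later calls reuse it.
        if not hasattr(self._local, "instance"):
            self._local.instance = self._factory()
        return self._local.instance


# Hypothetical usage with pymogilefs (the tracker address is a placeholder):
#   from pymogilefs.client import Client
#   clients = PerThread(lambda: Client(trackers=['tracker.example:7001'],
#                                      domain='testdomain'))
#   clients.get().list_keys(prefix='test', limit=5)
```

For multiprocess applications, the same idea applies if the instance is created inside each worker process after it starts, rather than in the parent before forking.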
Forks and pull requests are highly appreciated.