Hoagie is a plug-n-play workload and database generator to evaluate novel system architectures, design decisions, protocols, and algorithms. It uses published specifications to create a database of data items and a workload that references these data items. Hoagie's modular design enables an experimentalist to use it either offline or online. In offline mode, Hoagie outputs a trace file that can be used to issue requests to a target system. In online mode, Hoagie is plugged into an existing benchmark that invokes it to generate requests one at a time to its target system.
trace_specification.properties specifies input parameters. The program writes generated traces to /tmp/output.
java -jar hoagie-trace-client.jar trace_specification.properties /tmp/output
| Parameter | Value | Description |
|-----------|-------|-------------|
| read | A value between 0.0 and 1.0 | The fraction of read requests in the generated trace |
| replace | A value between 0.0 and 1.0 | The fraction of replace requests in the generated trace |
| update | A value between 0.0 and 1.0 | The fraction of update requests in the generated trace |
| hours | A positive integer | The total number of hours of requests to generate |
| seed | An integer | The seed for the random number generator. Hoagie generates the same sequence of requests given the same seed. |
| zipf | A positive double value | The Zipfian constant that controls the popularity skew |
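As a sketch, a trace_specification.properties file built from the parameters above might look like the following. The key names are taken from the table, but the exact syntax and any additional required parameters should be checked against the sample file shipped with Hoagie; the values here are illustrative only.

```properties
# Request mix (fractions of the trace; assumed to sum to 1.0).
read=0.95
replace=0.04
update=0.01
# Generate one hour of requests.
hours=1
# Fixed seed for a reproducible trace.
seed=12345
# Zipfian constant controlling popularity skew.
zipf=0.99
```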
Hoagie generates traces based on statistics published by Facebook [1]. The workload draws on the following distributions.

[1] Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. 2012. Workload Analysis of a Large-Scale Key-Value Store. In Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS '12). ACM, New York, NY, USA, 53-64. DOI: http://dx.doi.org/10.1145/2254756.2254766
| Name | Distribution | Value Range |
|------|--------------|-------------|
| Key size (bytes) | Generalized Extreme Value Distribution (location=30.7984, scale=8.20449, shape=0.078688) [1] | [1, 250] |
| Value size (bytes) | Generalized Pareto Distribution (location=0, scale=214.476, shape=0.348238) [1] | [1, 1,000,000] |
| Inter-arrival time (microseconds) | Generalized Pareto Distribution (location=0, scale=16.0292, shape=0.154971) [1] | [1, 1000] |
Because each distribution is truncated at a maximum value, its probabilities sum to less than 100%. We assign the missing probability mass to the distribution's mean so that the total becomes 100%.

Value size distribution: we use the Generalized Pareto distribution for the entire value range, because including the per-byte probabilities published for the first 14 bytes [1] would push the sum above 100%.
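The mean-reallocation step above can be sketched as follows (a Python illustration, not Hoagie's actual code; function names are ours). Samples are drawn through the inverse CDF of the Generalized Pareto distribution, and any sample above the cap is remapped to the distribution's mean.

```python
import random

def gpd_quantile(u, scale, shape):
    # Inverse CDF of the Generalized Pareto Distribution
    # (location = 0, shape > 0): Q(u) = scale * ((1-u)^(-shape) - 1) / shape.
    return scale * ((1.0 - u) ** (-shape) - 1.0) / shape

def sample_value_size(rng, scale=214.476, shape=0.348238, lo=1, hi=1_000_000):
    # Draw a value size; samples exceeding the cap are remapped to the
    # distribution's mean, i.e. the truncated tail mass is added to the mean.
    mean = scale / (1.0 - shape)  # GPD mean, defined for shape < 1
    x = gpd_quantile(rng.random(), scale, shape)
    if x > hi:
        x = mean
    return max(lo, int(round(x)))
```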
Key popularity in the trace follows a Zipfian distribution (alpha=100). The following figures show the generated distributions, which resemble the CDF graphs of Figure 8 in [1].
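A Zipfian key-popularity sampler in this spirit can be sketched as below (Python; a plain inverse-CDF table lookup, not Hoagie's actual implementation, and with an illustrative alpha rather than the value above).

```python
import bisect
import itertools
import random

def zipf_cdf(num_keys, alpha):
    # Cumulative distribution over key ranks 1..num_keys,
    # with P(rank) proportional to 1 / rank**alpha.
    weights = [1.0 / rank ** alpha for rank in range(1, num_keys + 1)]
    total = sum(weights)
    return list(itertools.accumulate(w / total for w in weights))

def sample_key(cdf, rng):
    # Inverse-CDF lookup; returns a 1-based key rank.
    idx = bisect.bisect_left(cdf, rng.random())
    return min(idx, len(cdf) - 1) + 1
```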
The generated trace is a comma-separated file with the following fields.

| Operation | Key | Key size (bytes) | Value size (bytes) | Timestamp (microseconds) |
|-----------|-----|------------------|--------------------|--------------------------|

For example:

READ,10499,58,9,12
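A consumer of the trace can parse each line into its five fields as sketched below (Python; the record type and field names are ours, not part of Hoagie).

```python
from collections import namedtuple

# One trace record per line:
# Operation,Key,KeySize(bytes),ValueSize(bytes),Timestamp(microseconds)
TraceRecord = namedtuple(
    "TraceRecord", ["op", "key", "key_size", "value_size", "timestamp_us"]
)

def parse_trace_line(line):
    op, key, key_size, value_size, ts = line.strip().split(",")
    return TraceRecord(op, int(key), int(key_size), int(value_size), int(ts))
```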
trace_specification.properties specifies input parameters.
java -jar hoagie-lru-cache-client.jar trace_specification.properties
It outputs cache statistics after processing each second's worth of requests.
| Term | Definition |
|------|------------|
| seconds | The number of seconds of requests that have been processed |
| misses | The number of misses within this second |
| reads | The number of reads within this second |
| num-entries-in-cache | The number of entries in the LRU cache within this second |
| evictions | The number of evictions within this second |
| miss-ratio | The number of misses divided by the number of reads within this second |
| evict-miss-ratio | The number of evictions divided by the number of misses within this second |
seconds,misses,reads,num-entries-in-cache,evictions,miss-ratio,evict-miss-ratio
1,32328,47996,2810,28688,0.6736,0.8874
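The two ratio columns can be recomputed from the counters in the same line, as in this Python sketch (the function name is ours; it rounds to four decimals to match the output above).

```python
def cache_ratios(misses, reads, evictions):
    # miss-ratio = misses / reads; evict-miss-ratio = evictions / misses.
    miss_ratio = misses / reads if reads else 0.0
    evict_miss_ratio = evictions / misses if misses else 0.0
    return round(miss_ratio, 4), round(evict_miss_ratio, 4)
```

For the sample line, cache_ratios(32328, 47996, 28688) reproduces the reported 0.6736 and 0.8874.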