[Features] Integrating ART #29

Closed
wants to merge 12 commits into from
48 changes: 48 additions & 0 deletions .github/workflows/cmake-single-platform.yml
@@ -0,0 +1,48 @@
# This starter workflow is for a CMake project running on a single platform. There is a different starter workflow if you need cross-platform coverage.
# See: https://github.com/actions/starter-workflows/blob/main/ci/cmake-multi-platform.yml
name: Basic Tests for BLISS Benchmark

on:
  push:
    branches:
      - "main"
  pull_request:
    branches: [ "main" ]

env:
  # Customize the CMake build type here (Release, Debug, RelWithDebInfo, etc.)
  BUILD_TYPE: Release

jobs:
  build:
    # The CMake configure and build commands are platform agnostic and should work equally well on Windows or Mac.
    # You can convert this to a matrix build if you need cross-platform coverage.
    # See: https://docs.github.com/en/free-pro-team@latest/actions/learn-github-actions/managing-complex-workflows#using-a-build-matrix
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v4
      with:
        submodules: 'true'
        token: ${{ secrets.ACTIONS_ACCESS_TOKEN }}

    - name: Configure CMake
      # Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
      # See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
      run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}}

    - name: Build
      # Build your program with the given configuration
      run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}}

    - name: Verify Tests Exist
      working-directory: ${{github.workspace}}/tests
      run: sh unit_test_exists.sh

    - name: Test
      working-directory: ${{github.workspace}}/build
      # Execute tests defined by the CMake configuration.
      # See https://cmake.org/cmake/help/latest/manual/ctest.1.html for more detail
      run: ctest -C ${{env.BUILD_TYPE}}

2 changes: 2 additions & 0 deletions .gitignore
@@ -201,3 +201,5 @@ cmake-build-debug/
src/bliss/.idea/

db_working_home

.DS_Store
8 changes: 8 additions & 0 deletions CMakeLists.txt
@@ -10,6 +10,8 @@ set(CMAKE_CXX_STANDARD 17)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
set(CMAKE_MODULE_PATH "${PROJECT_SOURCE_DIR}/cmake" ${CMAKE_MODULE_PATH})

enable_testing()

if(CMAKE_BUILD_TYPE STREQUAL Debug)
ADD_DEFINITIONS(-DDEBUG)
endif()
@@ -34,17 +36,22 @@ endif()
# =============================================================================
add_subdirectory(external)

add_subdirectory(tests)

# =============================================================================
# HEADER bliss
# Bliss lib files
# =============================================================================
add_library(bliss OBJECT
${CMAKE_SOURCE_DIR}/src/bliss/util/timer.h
${CMAKE_SOURCE_DIR}/src/bliss/util/reader.h
${CMAKE_SOURCE_DIR}/src/bliss/util/args.h
${CMAKE_SOURCE_DIR}/src/bliss/util/config.h
${CMAKE_SOURCE_DIR}/src/bliss/bliss_index.h
${CMAKE_SOURCE_DIR}/src/bliss/bench_lipp.h
${CMAKE_SOURCE_DIR}/src/bliss/bench_alex.h
${CMAKE_SOURCE_DIR}/src/bliss/bench_btree.h
${CMAKE_SOURCE_DIR}/src/bliss/bench_art.h
)

target_compile_features(bliss PUBLIC
@@ -57,6 +64,7 @@ target_link_libraries(bliss PUBLIC
alex
lipp
tlx
art
)

target_include_directories(bliss PUBLIC
48 changes: 47 additions & 1 deletion README.md
@@ -1,4 +1,4 @@
# bliss_benchmark
# BLISS Benchmark
The purpose of this program is to benchmark how various indexes perform on data with different degrees of sortedness.\
This research project is part of the [Data-intensive Systems and Computing (DiSC) lab](https://disc.bu.edu/) at Boston University.

@@ -36,3 +36,49 @@ The program currently accepts the following parameters:
-v, --verbosity [=arg(=1)] Verbosity [0: Info| 1: Debug | 2: Trace] (default: 0)
-i, --index arg Index type (alex|lipp) (default: btree)
```

## Contributing to this Project
If you are interested in contributing to this benchmarking effort,
please reach out to [Aneesh Raman]([email protected]) / [Andy Huynh]([email protected]) / [Manos Athanassoulis]([email protected]).

### Integrating a New Index
We primarily import indexes as CMake libraries before integrating them into the benchmarking framework.

#### Importing using CMake
Import the library in `external/CMakeLists.txt`. Then, link it to the `bliss` library target (via `target_link_libraries`) in the `CMakeLists.txt` file in the root project directory.

#### Building the Adapter
Every index in the framework uses an adapter to interact with the benchmark. These adapters are found under `src/bliss`.

- The abstract class for the adapter is found at `src/bliss/bliss_index.h`.
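- Add the adapter code for the new index `abc` in its own file named `bench_abc.h` under `src/bliss`. The adapter subclasses `BlissIndex<KEY_TYPE, VALUE_TYPE>` and overrides `bulkload()`, `get()`, `put()`, and `end_routine()`.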
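
A minimal sketch of what `bench_abc.h` could look like, mirroring the four methods that `BlissARTIndex` overrides in this PR. The `BlissIndex` class below is a stand-in written for this example (the real abstract class in `src/bliss/bliss_index.h` is not shown and may declare more), and `BlissAbcIndex` with its `std::map` backing store is a hypothetical placeholder for the imported library.

```cpp
#include <map>
#include <utility>
#include <vector>

// Stand-in for BlissIndex from src/bliss/bliss_index.h. Assumption: only the
// virtual methods that BlissARTIndex overrides are reproduced here.
template <typename KEY_TYPE, typename VALUE_TYPE>
class BlissIndex {
   public:
    virtual ~BlissIndex() = default;
    virtual void bulkload(
        std::vector<std::pair<KEY_TYPE, VALUE_TYPE>> values) = 0;
    virtual bool get(KEY_TYPE key) = 0;
    virtual void put(KEY_TYPE key, VALUE_TYPE value) = 0;
    virtual void end_routine() = 0;
};

// Hypothetical adapter for an index "abc"; std::map stands in for the
// library that would be linked through CMake.
template <typename KEY_TYPE, typename VALUE_TYPE>
class BlissAbcIndex : public BlissIndex<KEY_TYPE, VALUE_TYPE> {
   public:
    void bulkload(
        std::vector<std::pair<KEY_TYPE, VALUE_TYPE>> values) override {
        // std::map has no bulk-load API, so insert one pair at a time
        for (const auto &pair : values) {
            put(pair.first, pair.second);
        }
    }

    // Reports whether the key is present
    bool get(KEY_TYPE key) override { return _index.find(key) != _index.end(); }

    // Inserts the key, overwriting any previous value
    void put(KEY_TYPE key, VALUE_TYPE value) override { _index[key] = value; }

    // Nothing to flush or clean up for an in-memory map
    void end_routine() override {}

   private:
    std::map<KEY_TYPE, VALUE_TYPE> _index;
};
```

A real adapter would forward each call to the imported library's own insert and lookup functions instead of `std::map`.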

#### Adding to the Benchmark
The benchmark code is found at `bliss_bench.cpp`. To add the index to the benchmark:

- Include the relevant header file, e.g., `#include "bliss/bench_abc.h"`.
- In the `main()` function, add a branch for the new index name where `config.index` is checked, so that the new adapter is constructed when it is selected.
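
The dispatch in `main()` is not part of this diff; the sketch below assumes it amounts to choosing an adapter from the `config.index` string. `Index`, `BtreeIndex`, `AbcIndex`, and `make_index` are hypothetical names, and the real adapters are templates over the key and value types rather than the plain classes used here.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical, non-templated stand-in for the adapter hierarchy.
struct Index {
    virtual ~Index() = default;
    virtual std::string name() const = 0;
};

struct BtreeIndex : Index {
    std::string name() const override { return "btree"; }
};

// Placeholder for the newly integrated index "abc".
struct AbcIndex : Index {
    std::string name() const override { return "abc"; }
};

// Picks an adapter from the --index string, the way main() is assumed to
// branch on config.index; the "abc" branch is the new condition to add.
inline std::unique_ptr<Index> make_index(const std::string &index) {
    if (index == "btree") {
        return std::make_unique<BtreeIndex>();
    }
    if (index == "abc") {
        return std::make_unique<AbcIndex>();
    }
    throw std::invalid_argument("unknown index type: " + index);
}
```

Throwing on an unknown name surfaces a typo in `--index` immediately instead of silently benchmarking a different index.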

#### Adding Unit Tests
Currently, we support basic unit tests with `put()` and `get()` operations in the benchmark.

For the newly integrated index (e.g., `abc`), add relevant unit tests under the `tests/` folder.
- Create a new directory for the index `abc` under `tests/`, prefixing its name with `test_` (e.g., `mkdir tests/test_abc`).
- Each index folder gets its own `CMakeLists.txt` file, which is pulled into the build by the outer `tests/CMakeLists.txt` file.
- Copy the `CMakeLists.txt` file from one of the existing index test folders into `tests/test_abc` (e.g., `cp tests/test_btree/CMakeLists.txt tests/test_abc/`).
- Modify `tests/CMakeLists.txt` to include the new subdirectory (i.e., add the line `add_subdirectory(test_abc)`).

You can create one or more `.cpp` files under `tests/test_abc/` for your unit tests.

- Prefix every unit test file name with the index name (e.g., `abc_tests.cpp`).
- Include the header file `bliss_index_tests.h` in your test file to import common util code.

**You may refer to `tests/test_btree/btree_tests.cpp` for samples.**
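
The contents of `bliss_index_tests.h` are not shown here, so the following is a framework-free sketch of the kind of `put()`/`get()` check described above. `MapIndex` and `put_get_roundtrip` are hypothetical; in a real test, `MapIndex` would be replaced by an instance of the new `abc` adapter.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for a new adapter; the check only uses put() and get().
struct MapIndex {
    std::unordered_map<std::uint64_t, std::uint64_t> data;
    void put(std::uint64_t key, std::uint64_t value) { data[key] = value; }
    bool get(std::uint64_t key) const { return data.count(key) != 0; }
};

// Inserts the even keys 0, 2, ..., 2(n-1), then checks that every inserted
// key is found and every odd key in the same range is reported missing.
template <typename Index>
bool put_get_roundtrip(Index &index, std::uint64_t n) {
    for (std::uint64_t i = 0; i < n; ++i) {
        index.put(2 * i, 2 * i);
    }
    for (std::uint64_t i = 0; i < n; ++i) {
        if (!index.get(2 * i)) {
            return false;  // an inserted key was not found
        }
        if (index.get(2 * i + 1)) {
            return false;  // a key that was never inserted was found
        }
    }
    return true;
}
```

Checking absent keys as well as present ones catches adapters whose `get()` always returns `true`.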

## Issues & Additional Information
You may report bugs/issues directly on GitHub [here](https://github.com/BU-DiSC/bliss_benchmark/issues).

For additional information, contact:
- [Aneesh Raman]([email protected])
- [Andy Huynh]([email protected])
- [Manos Athanassoulis]([email protected])
2 changes: 1 addition & 1 deletion data/example.data
@@ -997,4 +997,4 @@
2991
2994
2997
3000
3000
23 changes: 20 additions & 3 deletions external/CMakeLists.txt
@@ -31,6 +31,21 @@ FetchContent_MakeAvailable(cxxopts)



FetchContent_Declare(
art
GIT_REPOSITORY https://github.com/BU-DiSC/ART
GIT_TAG main
)
FetchContent_GetProperties(art)
if (NOT art_POPULATED)
FetchContent_Populate(art)
endif()

add_library(art INTERFACE)
target_include_directories(art INTERFACE ${art_SOURCE_DIR})



FetchContent_Declare(
alex
GIT_REPOSITORY https://github.com/microsoft/ALEX.git
@@ -59,10 +74,12 @@ endif()
add_library(lipp INTERFACE)
target_include_directories(lipp INTERFACE ${lipp_SOURCE_DIR}/src/core)



FetchContent_Declare(
tlx
GIT_REPOSITORY https://github.com/tlx/tlx.git
GIT_TAG master
tlx
GIT_REPOSITORY https://github.com/tlx/tlx.git
GIT_TAG master
)
FetchContent_GetProperties(tlx)
if (NOT tlx_POPULATED)
3 changes: 1 addition & 2 deletions script/bench.py
@@ -7,7 +7,7 @@
from infra.pybliss import BlissArgs, PyBliss
from infra.util import get_file_params

INDEXES = ["btree"]
INDEXES = ["btree", "art"]
PRELOAD_FACTOR = 0.4
WRITE_FACTOR = 0.4
READ_FACTOR = 0.2
@@ -77,7 +77,6 @@ def main(args):
)

args = parser.parse_args()

log_level = logging.WARNING
if args.verbose == 1:
log_level = logging.INFO
51 changes: 51 additions & 0 deletions src/bliss/bench_ART.h
@@ -0,0 +1,51 @@
#ifndef BLISS_BENCH_ART
#define BLISS_BENCH_ART

#include "ART.h"

#include <cstddef>
#include <cstdint>
#include <vector>

#include "bliss/bliss_index.h"
#include "spdlog/spdlog.h"

namespace bliss {

template <typename KEY_TYPE, typename VALUE_TYPE>
class BlissARTIndex : public BlissIndex<KEY_TYPE, VALUE_TYPE> {
   public:
    static constexpr size_t KEY_SIZE = sizeof(KEY_TYPE);
    static constexpr size_t VALUE_SIZE = sizeof(VALUE_TYPE);
    // Root of the tree; stays null until the first insert
    ART::Node* _index = nullptr;

    BlissARTIndex() = default;

    void bulkload(
        std::vector<std::pair<KEY_TYPE, VALUE_TYPE>> values) override {
        // ART has no native bulk load, so each pair is inserted individually
        for (const auto& pair : values) {
            put(pair.first, pair.second);
        }
    }

    bool get(KEY_TYPE key) override {
        // Convert the key into ART's byte representation
        uint8_t ARTkey[KEY_SIZE];
        ART::loadKey(key, ARTkey);
        uint8_t depth = 0;
        ART::Node* leaf = ART::lookup(_index, ARTkey, KEY_SIZE, depth, KEY_SIZE);
        // Leaves store the key itself, so a hit must decode back to `key`
        return ART::isLeaf(leaf) && ART::getLeafValue(leaf) == key;
    }

    void put(KEY_TYPE key, VALUE_TYPE value) override {
        uint8_t ARTkey[KEY_SIZE];
        ART::loadKey(key, ARTkey);
        // The leaf stores `key` (not `value`) so that get() can verify hits;
        // the key length is KEY_SIZE, matching the lookup in get()
        ART::insert(_index, &_index, ARTkey, 0, key, KEY_SIZE);
    }

    void end_routine() override {}
};

}  // namespace bliss

#endif  // !BLISS_BENCH_ART
65 changes: 65 additions & 0 deletions src/bliss/util/args.h
@@ -0,0 +1,65 @@
#ifndef BLISS_ARGS_H
#define BLISS_ARGS_H
#include <cstdlib>
#include <cxxopts.hpp>
#include <iostream>
#include <string>

#include "bliss/util/config.h"

using namespace bliss::utils::config;

namespace bliss {
namespace utils {
namespace args {
// `inline` lets this header be included from several translation units
// without violating the one-definition rule.
inline BlissConfig parse_args(int argc, char *argv[]) {
    BlissConfig config;
    cxxopts::Options options(
        "bliss", "BLISS: Benchmarking Learned Index Structures for Sortedness");

    try {
        options.add_options()
            ("d,data_file", "Path to the data file",
             cxxopts::value<std::string>())
            ("p,preload_factor", "Preload factor",
             cxxopts::value<double>()->default_value("0.5"))
            ("w,write_factor", "Write factor",
             cxxopts::value<double>()->default_value("0.25"))
            ("r,read_factor", "Read factor",
             cxxopts::value<double>()->default_value("0.1"))
            ("m,mixed_read_write_ratio", "Read write ratio",
             cxxopts::value<double>()->default_value("0.5"))
            ("s,seed", "Random Seed value",
             cxxopts::value<int>()->default_value("0"))
            ("v,verbosity", "Verbosity [0: Info| 1: Debug | 2: Trace]",
             cxxopts::value<int>()->default_value("0")->implicit_value("1"))
            ("i,index", "Index type [alex | lipp | btree | art | bepstree | lsm]",
             cxxopts::value<std::string>()->default_value("btree"))
            ("file_type", "Input file type [binary | txt]",
             cxxopts::value<std::string>()->default_value("txt"))
            ("use_preload", "Use index defined preload",
             cxxopts::value<bool>()->default_value("false"));

        auto result = options.parse(argc, argv);
        // Assign field by field: designated initializers are a C++20
        // feature, while the project sets CMAKE_CXX_STANDARD 17.
        config.data_file = result["data_file"].as<std::string>();
        config.preload_factor = result["preload_factor"].as<double>();
        config.write_factor = result["write_factor"].as<double>();
        config.read_factor = result["read_factor"].as<double>();
        config.mixed_read_write_ratio =
            result["mixed_read_write_ratio"].as<double>();
        config.seed = result["seed"].as<int>();
        config.verbosity = result["verbosity"].as<int>();
        config.index = result["index"].as<std::string>();
        config.file_type = result["file_type"].as<std::string>();
        config.use_preload = result["use_preload"].as<bool>();
    } catch (const std::exception &e) {
        std::cerr << "Error: " << e.what() << std::endl;
        std::cerr << options.help() << std::endl;
        std::exit(1);
    }
    return config;
}
}  // namespace args
}  // namespace utils
}  // namespace bliss
#endif
38 changes: 38 additions & 0 deletions src/bliss/util/config.h
@@ -0,0 +1,38 @@
#ifndef BLISS_CONFIG_H
#define BLISS_CONFIG_H

// spdlog/spdlog.h (not just spdlog/common.h) declares spdlog::trace
#include <spdlog/spdlog.h>

#include <string>

namespace bliss {
namespace utils {
namespace config {
struct BlissConfig {
    std::string data_file;
    double preload_factor;
    double write_factor;
    double read_factor;
    double mixed_read_write_ratio;
    int seed;
    int verbosity;
    std::string index;
    std::string file_type;
    bool use_preload;
};

// Logs every configuration field at trace level; `inline` avoids
// multiple-definition errors when the header is included more than once.
inline void display_config(const BlissConfig &config) {
    spdlog::trace("Data File: {}", config.data_file);
    spdlog::trace("Preload Factor: {}", config.preload_factor);
    spdlog::trace("Write Factor: {}", config.write_factor);
    spdlog::trace("Read Factor: {}", config.read_factor);
    spdlog::trace("Read Write Ratio: {}", config.mixed_read_write_ratio);
    spdlog::trace("Seed: {}", config.seed);
    spdlog::trace("Verbosity: {}", config.verbosity);
    spdlog::trace("Index: {}", config.index);
    spdlog::trace("File type: {}", config.file_type);
    spdlog::trace("Use Preload: {}", config.use_preload);
}
}  // namespace config
}  // namespace utils
}  // namespace bliss

#endif