## Usage ##
Spatial Bloom Filters were first proposed for use in location-privacy applications, but have since found application in a number of domains, including network security and the Internet of Things.
The libSBF-testdatasets repository contains a set of sample datasets useful in testing the [C++](https://github.com/spatialbloomfilter/libSBF-cpp "libSBF-cpp") and [Python](https://github.com/spatialbloomfilter/libSBF-python "libSBF-python") implementations of the SBF data structure. For more details on the implementation and on how to use the libraries, please refer to the project [homepage](http://sbf.csr.unibo.it/ "SBF project homepage").
Two kinds of dataset are provided:
- large, *randomly generated datasets* (useful for testing the implementations and for benchmarking);
- *geospatial datasets*, based on real-world areas and coordinates (that can be used to test the SBF in different application domains).
**Randomly generated datasets**
The datasets are provided in 3 different sizes:
- [8-bit](8bit), with a maximum of 255 areas (the filter is an array of 1-byte values) and more than 65,000 elements in total;
- [8-bit-large](8-bit-large), with the same maximum number of areas as the 8-bit datasets, but more than 16 million elements;
- [16-bit](16bit), with a maximum of 65,535 areas (the filter is an array of 2-byte values) and more than 16 million elements.
The 8/16-bit distinction reflects how the SBF libraries manage memory: each filter cell occupies 1 or 2 bytes respectively. The 8-bit datasets are also more compact (around 500 KB each), allowing for easier and quicker testing, while the 16-bit datasets (around 200 MB each) are suited to larger and more comprehensive case studies. All datasets are provided in CSV format.
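The limits of 255 and 65,535 areas are consistent with one cell value (0) being reserved to mark an empty cell, as in the SBF construction. The following Python sketch makes that arithmetic explicit and estimates the memory taken by the cell array alone; the helper names (`max_areas`, `cell_bytes_for`, `filter_size_bytes`) are hypothetical and not part of the libSBF libraries.

```python
def max_areas(cell_bytes: int) -> int:
    # A cell of b bytes can store 2**(8*b) distinct values; 0 marks an empty
    # cell, leaving labels 1 .. 2**(8*b) - 1 for areas.
    return 2 ** (8 * cell_bytes) - 1


def cell_bytes_for(num_areas: int) -> int:
    # Smallest cell width (in bytes) able to label num_areas distinct areas.
    cell_bytes = 1
    while max_areas(cell_bytes) < num_areas:
        cell_bytes += 1
    return cell_bytes


def filter_size_bytes(num_cells: int, num_areas: int) -> int:
    # Memory for the cell array only (no metadata or hash parameters).
    return num_cells * cell_bytes_for(num_areas)


print(max_areas(1), max_areas(2))              # 255 65535
print(cell_bytes_for(4), cell_bytes_for(300))  # 1 2
```

In other words, a set with up to 255 areas fits the 8-bit layout, while anything larger needs the 16-bit one and doubles the size of the filter for the same number of cells.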
For each size, a number of datasets (CSV files) are provided:
- `elements.csv`: a list of elements, one per line, where elements are alphanumeric (0-9, a-z, A-Z) strings;
…
Datasets with elements only have one element per line; the datasets with both ar…
The datasets can be used to test any code that uses the libSBF libraries. Both the [C++](https://github.com/spatialbloomfilter/libSBF-cpp "libSBF-cpp") and [Python](https://github.com/spatialbloomfilter/libSBF-python "libSBF-python") libraries also provide a sample application that can be used with the test datasets. The sample applications create an SBF, insert the elements from the selected `area-element-X.csv` dataset, and then test membership using the `non-elements.csv` dataset.
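The workflow of the sample applications can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the libSBF API: the names `SpatialBloomFilter`, `build_sbf`, `load_elements`, `load_area_elements` and `false_positive_rate` are hypothetical, the k hash functions are replaced by salted SHA-256 digests for simplicity, and the `area,element` column order assumed for `area-element-X.csv` should be checked against the actual files. The filter logic follows the general SBF idea: cells start at 0, areas are inserted in ascending label order so that higher labels overwrite lower ones, and a query returns 0 if any of its cells is empty, otherwise the smallest label found.

```python
import csv
import hashlib


class SpatialBloomFilter:
    """Minimal SBF sketch (hypothetical; not the libSBF API)."""

    def __init__(self, num_cells: int, num_hashes: int, max_area: int = 255):
        # Cell value 0 means "empty"; areas use labels 1..max_area.
        self.cells = [0] * num_cells
        self.num_hashes = num_hashes
        self.max_area = max_area

    def _positions(self, element: str):
        # Simplification: k salted SHA-256 digests stand in for k hash functions.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{element}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.cells)

    def insert(self, element: str, area: int) -> None:
        if not 1 <= area <= self.max_area:
            raise ValueError(f"area label must be in 1..{self.max_area}")
        # Callers insert areas in ascending label order, so higher labels
        # overwrite lower ones when cells collide.
        for pos in self._positions(element):
            self.cells[pos] = area

    def check(self, element: str) -> int:
        """Return 0 if any mapped cell is empty, else the smallest label found."""
        values = [self.cells[pos] for pos in self._positions(element)]
        return 0 if 0 in values else min(values)


def load_elements(path: str) -> list[str]:
    # elements.csv / non-elements.csv: one element per line.
    with open(path, newline="") as f:
        return [row[0] for row in csv.reader(f) if row]


def load_area_elements(path: str) -> list[tuple[int, str]]:
    # ASSUMPTION: each row of area-element-X.csv is "area,element".
    with open(path, newline="") as f:
        return [(int(row[0]), row[1]) for row in csv.reader(f) if row]


def build_sbf(pairs, num_cells: int, num_hashes: int, max_area: int = 255):
    sbf = SpatialBloomFilter(num_cells, num_hashes, max_area)
    for area, element in sorted(pairs):  # ascending area label
        sbf.insert(element, area)
    return sbf


def false_positive_rate(sbf: SpatialBloomFilter, non_elements) -> float:
    # Fraction of known non-members that the filter assigns to some area.
    non_elements = list(non_elements)
    hits = sum(1 for e in non_elements if sbf.check(e) != 0)
    return hits / len(non_elements) if non_elements else 0.0
```

A typical run builds the filter with `build_sbf(load_area_elements(path), num_cells, num_hashes)` and then passes the contents of `non-elements.csv`, read with `load_elements`, to `false_positive_rate`. The number of cells and hashes is left to the caller in this sketch.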
**Geospatial datasets**
These datasets contain elements extracted from a conventional geospatial grid covering a specified geographical area. Each element represents a single cell of the grid. The grid is derived from the World Geodetic System (WGS84), with a precision chosen specifically for each dataset.
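As an illustration of how a point can be mapped to a grid cell, the following Python sketch snaps WGS84 latitude and longitude (in decimal degrees) to a square grid with a given step and encodes the cell as a string. The `cell_id` function and its `"lat_index:lon_index"` encoding are hypothetical: the element format actually used in the dataset files is not described here and may differ. `decimal.Decimal` is used so that exact decimal inputs are not distorted by binary floating-point rounding.

```python
from decimal import Decimal, ROUND_FLOOR


def cell_id(lat: str, lon: str, step: str) -> str:
    """Return a hypothetical identifier for the grid cell containing (lat, lon).

    lat, lon: WGS84 coordinates in decimal degrees, passed as strings.
    step: grid precision in degrees (dataset-specific in practice).
    """
    step_d = Decimal(step)
    # Cell index along each axis: floor(coordinate / step).
    lat_idx = (Decimal(lat) / step_d).to_integral_value(rounding=ROUND_FLOOR)
    lon_idx = (Decimal(lon) / step_d).to_integral_value(rounding=ROUND_FLOOR)
    return f"{int(lat_idx)}:{int(lon_idx)}"


# Two nearby points fall into the same 0.0001-degree cell.
print(cell_id("41.9022", "12.4533", "0.0001"))    # 419022:124533
print(cell_id("41.90225", "12.45335", "0.0001"))  # 419022:124533
```

Every point inside a cell yields the same identifier, so a cell can be inserted into an SBF as an ordinary element and labelled with the area of interest it belongs to.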
The following datasets are available in the [geo](geo) folder:
- `vatican-city-0001.csv`: a grid covering Vatican City, with 4 areas of interest.
For each of the files above, two PDF files depicting the mapped geographical area are also included. The first shows the map overlaid with the World Geodetic System (WGS84) grid; the second additionally shows the Areas of Interest (AoI) mapped into the dataset.
## Bibliography ##
SBFs were proposed in the following research papers:
- Luca Calderoni, Paolo Palmieri, Dario Maio: *Location privacy without mutual trust: The spatial Bloom filter*. Computer Communications, vol. 68, pp. 4–16, September 2015. ISSN 0140-3664.
- Paolo Palmieri, Luca Calderoni, Dario Maio: *Spatial Bloom Filters: Enabling Privacy in Location-aware Applications*. In: Inscrypt 2014. Lecture Notes in Computer Science, vol. 8957, pp. 16–36, Springer, 2015.
A comprehensive list of papers that use the SBF data structure is available on the [project website](http://sbf.csr.unibo.it/publications.html "SBF Publications").
## Authors ##
Luca Calderoni, Dario Maio - University of Bologna (Italy)
Paolo Palmieri - University College Cork (Ireland)
## License ##
The datasets are released under the [MIT License](LICENSE).