Commit 77c70c5

Update README.md with the geo dataset
1 parent 83b7c60 commit 77c70c5

1 file changed

+24
-7
lines changed

README.md

+24-7
@@ -7,12 +7,20 @@ The Spatial Bloom Filters (SBF) are a compact, set-based data structure that ext
77
## Usage ##
88
Spatial Bloom Filters were first proposed for use in location-privacy applications, but have since been applied in a number of domains, including network security and the Internet of Things.
99

10-
The libSBF-testdatasets repository contains a set of sample datasets useful in testing the [C++](https://github.com/spatialbloomfilter/libSBF-cpp "libSBF-cpp") and [Python](https://github.com/spatialbloomfilter/libSBF-python "libSBF-python") implementations of the SBF data structure.
10+
The libSBF-testdatasets repository contains a set of sample datasets useful in testing the [C++](https://github.com/spatialbloomfilter/libSBF-cpp "libSBF-cpp") and [Python](https://github.com/spatialbloomfilter/libSBF-python "libSBF-python") implementations of the SBF data structure. For more details on the implementation and on how to use the library, please refer to the [homepage](http://sbf.csr.unibo.it/ "SBF project homepage") of the project.
1111

12-
The datasets are provided in 2 dimensions:
13-
- [8-bit](8bit), with a maximum number of areas of 255 (where the filter is an array of 1-byte values);
14-
- [16-bit](16bit), with a maximum number of areas of 65,535 (where the filter is an array of 2-byte values).
15-
This reflects the memory management of the implementation of the SBF libraries. The 8-bit datasets are also more compact (around 500K each), allowing for easier and quicker testing, while the 16-bit datasets are useful for larger and more comprehensive case studies (each dataset is around 200M). Datasets are provided in CSV format.
12+
Two kinds of datasets are provided:
13+
- large, *randomly generated datasets* (useful in testing the implementations and for benchmarking);
14+
- *geospatial datasets*, based on real-world areas and coordinates (that can be used to test the SBF in different application domains).
15+
16+
**Randomly generated datasets**
17+
18+
The datasets are provided in 3 different sizes:
19+
- [8-bit](8bit), with a maximum number of areas of 255 (where the filter is an array of 1-byte values) and more than 65,000 elements in total;
20+
- [8-bit-large](8-bit-large), with the same number of areas as the 8-bit, but more than 16 million elements;
21+
- [16-bit](16bit), with a maximum number of areas of 65,535 (where the filter is an array of 2-byte values) and more than 16 million elements.
22+
23+
The 8/16-bit differentiation reflects the memory management of the implementation of the SBF libraries (1/2 bytes). The 8-bit datasets are also more compact (around 500K each), allowing for easier and quicker testing, while the 16-bit datasets are useful for larger and more comprehensive case studies (each dataset is around 200M). Datasets are provided in CSV format.
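
The area limits quoted above follow from the cell width. As a minimal sketch, assuming that the value 0 is reserved to mark an empty cell (an assumption, not stated in this README), a cell of `n` bytes can hold at most `2^(8n) - 1` distinct area labels. The helper name `max_areas` is hypothetical and is not part of libSBF.

```python
def max_areas(cell_bytes: int) -> int:
    """Largest number of areas addressable by a cell of `cell_bytes` bytes.

    Assumption: the value 0 is reserved for "empty cell", so one of the
    2**(8 * cell_bytes) possible values is not available as an area label.
    """
    if cell_bytes < 1:
        raise ValueError("cell_bytes must be a positive integer")
    return (1 << (8 * cell_bytes)) - 1


# 1-byte cells correspond to the 8-bit datasets, 2-byte cells to the 16-bit ones.
for width in (1, 2):
    print(f"{width}-byte cells: up to {max_areas(width)} areas")
```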
1624

1725
For each size, several datasets (CSV files) are provided:
1826
- `elements.csv`: a list of elements, one per line, where elements are alphanumeric (0-9,a-z,A-Z) strings;
@@ -25,17 +33,26 @@ Datasets with elements only have one element per line; the datasets with both ar
2533

2634
The datasets can be used to test any code that uses the libSBF libraries. Both the [C++](https://github.com/spatialbloomfilter/libSBF-cpp "libSBF-cpp") and [Python](https://github.com/spatialbloomfilter/libSBF-python "libSBF-python") libraries also provide a sample application that can be used with the test datasets. The sample applications create an SBF and insert elements from the selected `area-element-X.csv`, and test membership using the `non-elements.csv` dataset.
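
Before running the sample applications, it can help to sanity-check the one-element-per-line files. The sketch below is not the libSBF API: it uses a plain Python `set` as a stand-in, and the helper names `read_elements` and `overlap` are hypothetical. It assumes only what the README states, namely that each line holds one alphanumeric element; the inline sample values are made up for illustration.

```python
import re
from typing import Iterable, List, Set

# Elements are described as alphanumeric strings (0-9, a-z, A-Z).
_ALNUM = re.compile(r"[0-9a-zA-Z]+")


def read_elements(lines: Iterable[str]) -> List[str]:
    """Collect one element per line, rejecting non-alphanumeric entries."""
    elements = []
    for number, line in enumerate(lines, start=1):
        value = line.strip()
        if not value:
            continue  # ignore blank lines, e.g. a trailing newline
        if not _ALNUM.fullmatch(value):
            raise ValueError(f"line {number}: not alphanumeric: {value!r}")
        elements.append(value)
    return elements


def overlap(elements: Iterable[str], non_elements: Iterable[str]) -> Set[str]:
    """Return the values that appear in both collections."""
    return set(elements) & set(non_elements)


# Made-up sample data; with real files, pass an open file object instead,
# e.g. read_elements(open("elements.csv", encoding="ascii")).
members = read_elements(["a1B2\n", "Zx9\n", "\n"])
outsiders = read_elements(["Q7\n", "zz0\n"])
print(len(members), len(outsiders), overlap(members, outsiders))
```

An empty overlap means that any positive membership result on the non-elements file can be attributed to false positives of the filter rather than to the data.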
2735

28-
For more details on the implementation, and how to use the library please refer to the [homepage](http://sbf.csr.unibo.it/ "SBF project homepage") of the project.
36+
**Geospatial datasets**
37+
38+
These datasets contain elements extracted from a conventional geospatial grid on a specified geographical area. Each element represents a single cell of the grid. The grid is derived from the World Geodetic System (WGS84), with an arbitrary precision specific to each dataset.
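
To illustrate how a coordinate can be mapped to a grid cell, here is a minimal sketch that snaps a WGS84 latitude/longitude pair to the south-west corner of its cell at a chosen number of decimal places. The function name `grid_cell`, the `decimals` parameter, and the corner convention are all assumptions made for illustration; the actual encoding of elements in the geospatial datasets may differ.

```python
from decimal import Decimal, ROUND_FLOOR
from typing import Tuple


def grid_cell(lat: str, lon: str, decimals: int) -> Tuple[Decimal, Decimal]:
    """Snap a coordinate to the south-west corner of its grid cell.

    Coordinates are passed as strings and handled as Decimal values so
    that flooring is exact and free of binary floating-point error.
    """
    if decimals < 0:
        raise ValueError("decimals must be non-negative")
    step = Decimal(1).scaleb(-decimals)  # decimals=3 gives a 0.001 degree grid
    return (
        Decimal(lat).quantize(step, rounding=ROUND_FLOOR),
        Decimal(lon).quantize(step, rounding=ROUND_FLOOR),
    )


# Two nearby points fall in the same cell of a 0.001 degree grid.
print(grid_cell("41.9022", "12.4539", 3))
print(grid_cell("41.9029", "12.4531", 3))
```

Flooring rather than rounding keeps every point of a cell mapped to the same corner, including for negative coordinates.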
39+
40+
The following datasets are available in the [geo](geo) folder:
41+
- `vatican-city-0001.csv`: an area covering the Vatican City with 4 areas of interest.
42+
43+
For each of the files above, two graphical PDF files representing the mapped geographical area are also included. The first file shows the map with an overlay consisting of the World Geodetic System projections (WGS84). The second file also shows the Areas of Interest (AoI) mapped into the dataset.
2944

3045
## Bibliography ##
3146
The SBFs have been proposed in the following research papers:
3247
- Luca Calderoni, Paolo Palmieri, Dario Maio: *Location privacy without mutual trust: The spatial Bloom filter.* Computer Communications, vol. 68, pp. 4-16, September 2015. ISSN 0140-3664
3348
- Paolo Palmieri, Luca Calderoni, Dario Maio: *Spatial Bloom Filters: Enabling Privacy in Location-aware Applications*. In: Inscrypt 2014. Lecture Notes in Computer Science, vol. 8957, pp. 16–36, Springer, 2015.
3449

50+
A comprehensive list of papers that use the SBF data structure can be found [here](http://sbf.csr.unibo.it/publications.html "SBF Publications").
51+
3552
## Authors ##
3653
Luca Calderoni, Dario Maio - University of Bologna (Italy)
3754

38-
Paolo Palmieri - Cranfield University (UK)
55+
Paolo Palmieri - University College Cork (Ireland)
3956

4057
## License ##
4158
The datasets are released under the [MIT License](LICENSE).
