Qlever Integration in the RDF Processing Toolkit #1818
Replies: 3 comments
-
@Aklakan Thanks, Claus, that looks very useful and we will look into it, especially since we have been developing a similar tool, see https://github.com/ad-freiburg/qlever-control/pulls. Maybe there are some synergy effects. One question: Docker can have a significant performance overhead, especially when high IO and multi-threading are involved. In particular, that is the case for the index building (data loading). Have you thought about this?
-
I used the Python qlever tool to load Wikidata truthy with ~8B triples in 4-5 hours on my notebook from 2022 - which is pretty impressive - and if I did not overlook anything, then this tool is also just a wrapper for Docker (I extracted the In short: it's good enough for me :)
If something emerges, I am glad to contribute.
-
@Aklakan Thanks for the reply + some clarifications and questions:
-
I am the developer of the RDF Processing Toolkit (RPT), which in a nutshell is a Java-based CLI wrapper for ad-hoc loading of RDF data and running SPARQL queries against it. It is useful for both scripting and rapid prototyping of data integration tasks.
With basic usage you can just provide data, update and query statements as arguments; they will be run in the given order and the results will be printed to the console (e.g. multiple construct queries can be supplied to produce an RDF document).
By default, RPT uses the engine based on Apache Jena because we wrote many SPARQL extension functions for it.
RPT supports different engines via the `-e` argument, such as `tdb2` and now also 🎉 `qlever` 🎉.
This will start a qlever docker container and run the data loading and querying against it.
Data loading is optimized and will use qlever's index builder. Also, compressed data such as `bzip2` is automatically decompressed on the host (if lbzip2 or bzip2 is available) and supplied to the container via named pipes - so JVM overhead is avoided.
Use `--loc` to specify a folder from where to load/store the database and `--db-keep` to retain a created database. Without these options, the data will be stored in a temporary directory and deleted when `rpt` exits - recall, the main use case is scripting and rapid prototyping - for production you'd rather write e.g. a docker compose setup.
There are probably many things that could be further improved, but:
If you are looking for a quick way to try out qlever and/or compare/mix it with other engines, you may want to give RPT a try 😃
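For readers curious about the named-pipe idea mentioned above, here is a minimal Python sketch of the general technique. It is not RPT's actual implementation (RPT is Java-based). The function names `pick_decompressor`, `feed_fifo` and `stream_decompressed` are hypothetical. The in-process `bz2` fallback is an assumption added so the sketch runs without external tools, and mounting the pipe into a container is not shown. It is POSIX only, since it relies on `os.mkfifo`.

```python
import bz2
import os
import shutil
import subprocess
import tempfile
import threading


def pick_decompressor():
    """Return the path of the first available external bzip2 tool, or None."""
    for tool in ("lbzip2", "bzip2"):
        path = shutil.which(tool)
        if path:
            return path
    return None


def feed_fifo(src_path, fifo_path):
    """Decompress src_path into the named pipe at fifo_path."""
    tool = pick_decompressor()
    # Opening a FIFO for writing blocks until a reader opens the other end.
    with open(fifo_path, "wb") as sink:
        if tool:
            # -d: decompress, -c: write to stdout (accepted by both tools).
            subprocess.run([tool, "-dc", src_path], stdout=sink, check=True)
        else:
            # Assumption for this sketch only: fall back to in-process decompression.
            with bz2.open(src_path, "rb") as src:
                shutil.copyfileobj(src, sink)


def stream_decompressed(src_path):
    """Yield decompressed chunks read from a named pipe, as a consumer would."""
    workdir = tempfile.mkdtemp(prefix="fifo-demo-")
    fifo_path = os.path.join(workdir, "data.nt")
    os.mkfifo(fifo_path)
    writer = threading.Thread(target=feed_fifo, args=(src_path, fifo_path))
    writer.start()
    try:
        with open(fifo_path, "rb") as source:
            while chunk := source.read(1 << 16):
                yield chunk
    finally:
        writer.join()
        shutil.rmtree(workdir)
```

The consumer only ever sees an ordinary file path that delivers uncompressed data, so the decompressed file never has to be written to disk and the decompression work stays in a native tool when one is installed.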
Cheers,
Claus