Index CSV data into Vespa using Logstash. You only need Docker/Podman/...
Here's how:
# clone the repo
git clone https://github.com/radu-gheorghe/logstash-vespa-output-demo.git
cd logstash-vespa-output-demo
# run the docker compose file with Vespa and Logstash
docker compose up # or podman compose up
# profit
curl -XPOST -H "Content-Type: application/json" -d \
'{ "yql": "select * from sources * where true"}' \
'http://localhost:8080/search/' | jq .
If you want to profit more, check out the Vespa Query Language docs and its sister pages.
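As a small next step, here is a sketch of a helper that wraps the query above and limits the number of results. The function names `build_query` and `run_query` are made up for this example; `hits` is a standard Vespa query parameter. It assumes the YQL string contains no double quotes, since it is pasted into the JSON body as-is.

```shell
# Build the JSON body for a Vespa query.
# $1: YQL string (must not contain double quotes), $2: max number of hits.
build_query() {
  printf '{"yql": "%s", "hits": %d}' "$1" "$2"
}

# Send the query to the Vespa container started by the compose file.
run_query() {
  curl -s -XPOST -H "Content-Type: application/json" \
    -d "$(build_query "$1" "$2")" \
    'http://localhost:8080/search/'
}

# Usage (needs the containers running):
# run_query 'select * from sources * where true' 3 | jq .
```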
Vespa is an awesome search engine. More details here. The first container in the compose file is a Vespa container.
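If you want to check that Vespa is ready before querying, something like the sketch below might help. It assumes the health endpoint is reachable at `http://localhost:8080/state/v1/health` (the same port as the search endpoint), which only answers once an application package has been deployed. The function names are made up for this example.

```shell
# Succeeds if the health JSON read from stdin reports status code "up".
is_up() {
  grep -q '"code" *: *"up"'
}

# Poll the (assumed) health endpoint until Vespa is up, for about 60 seconds.
wait_for_vespa() {
  url=${1:-http://localhost:8080/state/v1/health}
  for i in $(seq 1 30); do
    if curl -s "$url" | is_up; then
      echo "Vespa is up"
      return 0
    fi
    sleep 2
  done
  echo "Vespa did not come up in time" >&2
  return 1
}

# Usage (needs the containers running):
# wait_for_vespa
```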
Logstash is a very flexible ETL tool. More details here. The second container in the compose file is a Logstash container, which is configured to index the CSV data into Vespa. Here's how:
- First, we need a Vespa application package. Here, blog_posts_app contains a simple one, which configures everything from the number of nodes to the schema.
- During Logstash startup, we deploy the application package to Vespa, to make sure that e.g. the schema is there.
- Also during Logstash startup, we install the Vespa output plugin for Logstash.
- Finally, we run Logstash with logstash.conf, which reads the CSV file, parses it, and writes the documents to Vespa.
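The startup steps above can be sketched roughly as the script below. This is not the exact script from the repo, only an illustration under a few assumptions: the Vespa config server is reachable as `vespa:19071` (the host name is a guess at the compose service name), `zip` is available, and the commands run from the Logstash home directory. The `run` and `startup` helpers are made up for this example. Set `DRY_RUN=1` to print the commands instead of running them.

```shell
# Print each command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

startup() {
  # 1. Zip the application package and deploy it via the config server REST API
  #    (assumed host name; override with VESPA_HOST).
  run sh -c 'cd blog_posts_app && zip -r /tmp/app.zip .'
  run curl -s --header 'Content-Type: application/zip' \
    --data-binary @/tmp/app.zip \
    "http://${VESPA_HOST:-vespa}:19071/application/v2/tenant/default/prepareandactivate"

  # 2. Install the Vespa output plugin for Logstash.
  run bin/logstash-plugin install logstash-output-vespa_feed

  # 3. Run Logstash with the pipeline that reads the CSV and writes to Vespa.
  run bin/logstash -f logstash.conf
}

# Usage: preview the steps without touching anything.
# DRY_RUN=1 startup
```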
The data is a simple CSV file with blog posts. You can replace it with your own data; just make sure to update:
- logstash.conf to parse the right fields.
- The Vespa schema to match those fields. We have IDE support to help with that.
- If you change the document type from `post` to something else, make sure to update:
  - the schema file name: it needs to match the document type name within it
  - `document_type` in logstash.conf
  - the document type in services.xml
- If you change the CSV file name, make sure to update:
  - logstash.conf to point to the right filename
  - the volume path in docker-compose.yml.
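If you do rename the document type, a quick sanity check like the one below can catch a place you forgot. It is only a sketch: `check_doc_type` is a made-up helper, it assumes the standard application package layout (`schemas/<type>.sd` next to `services.xml`), and its `grep` patterns are deliberately loose (for example, they expect double quotes around the type name).

```shell
# check_doc_type TYPE APP_DIR LOGSTASH_CONF
# Prints "ok" if the three places that mention the document type agree.
check_doc_type() {
  doc_type=$1
  app_dir=$2
  conf=$3
  schema="$app_dir/schemas/$doc_type.sd"

  # The schema file name must match the document type.
  [ -f "$schema" ] || { echo "missing schema file: $schema"; return 1; }
  # The schema must declare the document type.
  grep -q "document $doc_type" "$schema" \
    || { echo "$schema does not declare document $doc_type"; return 1; }
  # logstash.conf must set document_type to the same name.
  grep -q "document_type.*\"$doc_type\"" "$conf" \
    || { echo "$conf does not set document_type to $doc_type"; return 1; }
  # services.xml must reference the document type.
  grep -q "type=\"$doc_type\"" "$app_dir/services.xml" \
    || { echo "services.xml does not reference document type $doc_type"; return 1; }
  echo "ok"
}

# Usage, from the repo root:
# check_doc_type post blog_posts_app logstash.conf
```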