NOTE: from version 10.1.0 the CLI command is tdbpy instead of terminusdb.
In the last lesson we imported Employees.csv using the tdbpy importcsv command. It autogenerated the schema and piped in the data from the CSV. If we check schema.py, we can see the schema that was generated from the CSV:
class EmployeesFromCSV(DocumentTemplate):
    employee_id: str
    manager: Optional["EmployeesFromCSV"]
    name: Optional[str]
    team: Optional[str]
    title: Optional[str]
You may have noticed that this schema is not the same as the one we talked about in Lesson 1:
class Employee(DocumentTemplate):
    """Employee of the Company"""
    address: "Address"
    contact_number: str
    manager: Optional["Employee"]
    name: str
    team: "Team"
    title: str
This is because the Lesson 1 schema also covers the data in Contact.csv, namely the addresses and contact numbers. Fetching data from different CSVs and matching them to our schema requires a little more customization. This can be done by creating a Python script using the TerminusDB Python Client.
Let's start a new .py file, insert_data.py. You can copy and paste the one in this repo or build one yourself. We'll explain the example script so you understand what it does.
In the first half of the script, we have to manage and import the data from the CSVs. Python's standard library includes the csv module, which helps with reading CSV files. Go ahead and import it:
import csv
We also need to import Client, which is the client that communicates with TerminusDB/TerminusCMS, along with everything from schema.py:
from terminusdb_client import Client
from schema import *
At the top of the script, we prepare a few empty dictionaries to hold the data. We use dictionaries because the keys can be the employee ids, which makes mapping easy:
employees = {}
contact_numbers = {}
addresses = {}
managers = {}
The goal is to populate the employees dictionary with Employee objects. To help, we also need contact_numbers to hold the contact numbers while reading Contact.csv. The rest of the information in Contact.csv will be used to construct Address objects, which are stored in addresses. managers is used to store the employee id found in the Manager column of Employees.csv. We store the id at first and make the link later because the manager of that employee may not have been "created" yet.
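As a minimal sketch of why keying by employee id helps, the snippet below joins two small in-memory CSVs. The column layout and sample rows are made up for illustration and are not the real contents of Employees.csv or Contact.csv; `read_keyed` is a hypothetical helper, not part of the example script.

```python
import csv
import io

# Hypothetical sample data; the real CSV files have more columns.
EMPLOYEES_CSV = "Employee id,Name\n001,Alice\n002,Bob\n"
CONTACT_CSV = "Employee id,Contact number\n002,555-0102\n001,555-0101\n"


def read_keyed(text):
    """Return {first column: second column}, skipping the header row."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header
    return {row[0]: row[1] for row in reader}


names = read_keyed(EMPLOYEES_CSV)
numbers = read_keyed(CONTACT_CSV)

# Row order differs between the two files, but the shared key lines them up.
joined = {emp_id: (names[emp_id], numbers[emp_id]) for emp_id in names}
print(joined)
```

Because both dictionaries share the same key, it does not matter in which order the rows appear in each file.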
Then we go ahead and read the CSVs and process the data accordingly:
with open("Contact.csv") as file:
    csv_file = csv.reader(file)
    next(csv_file)  # skipping header
    for row in csv_file:
        contact_numbers[row[0]] = row[1]
        # row[2] holds "<number> <street name>,<town>"
        street = row[2].split(",")[0]
        street_num = int(street.split(" ")[0])
        street_name = " ".join(street.split(" ")[1:])
        town = row[2].split(",")[1]
        addresses[row[0]] = Address(
            street_num=street_num, street=street_name, town=town, postcode=row[3]
        )
with open("Employees.csv") as file:
    csv_file = csv.reader(file)
    next(csv_file)  # skipping header
    for row in csv_file:
        # look up the Team member by name rather than eval-ing the CSV text
        team = getattr(Team, row[3].lower())
        employees[row[0]] = Employee(
            _id="Employee/" + row[0],
            name=row[1],
            title=row[2],
            address=addresses[row[0]],
            contact_number=contact_numbers[row[0]],
            team=team
        )
        managers[row[0]] = row[4]
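The two parsing steps in the loops above (splitting the address text and turning the team text into a Team member) can also be written as small helper functions. This is only a sketch: `parse_address` and `team_from_csv` are hypothetical names, the Team members shown are placeholders for the ones in schema.py, and a plain `enum.Enum` stands in for the real Team class on the assumption that it supports lookup by name in the same way.

```python
from enum import Enum


class Team(Enum):
    # Hypothetical members standing in for the Team enum in schema.py.
    marketing = "Marketing"
    it = "Information Technology"


def parse_address(raw):
    """Split '<number> <street name>, <town>' into (number, street, town)."""
    street_part, town = raw.split(",", 1)
    number, street_name = street_part.strip().split(" ", 1)
    return int(number), street_name.strip(), town.strip()


def team_from_csv(value):
    """Look up a Team member by name, ignoring case and surrounding spaces."""
    try:
        return Team[value.strip().lower()]
    except KeyError:
        raise ValueError(f"Unknown team in CSV: {value!r}") from None


print(parse_address("123 Abc Street, Manchester"))
print(team_from_csv("Marketing"))
```

Unlike the script, this version strips surrounding whitespace and splits only on the first comma, so a row with an unexpected shape fails with an error instead of silently producing an odd value.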
Finally, we have to make the manager links:
for emp_id, man_id in managers.items():
    if man_id:
        employees[emp_id].manager = employees[man_id]
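To see why the linking is done in a second pass, here is a minimal sketch with a plain dataclass standing in for the Employee document class. `Person` and the sample rows are hypothetical; the point is only that the first row refers to a manager who has not been created yet when that row is read.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    # Hypothetical stand-in for the Employee document class.
    name: str
    manager: Optional["Person"] = None


# Rows in file order: (employee id, name, manager id).
# Alice's manager, Bob, appears later in the file.
rows = [("001", "Alice", "002"), ("002", "Bob", "")]

people = {}
manager_ids = {}
for emp_id, name, man_id in rows:  # pass 1: create every object
    people[emp_id] = Person(name)
    manager_ids[emp_id] = man_id

for emp_id, man_id in manager_ids.items():  # pass 2: link to existing objects
    if man_id:
        people[emp_id].manager = people[man_id]

print(people["001"].manager.name)
```

Linking inside the first loop would raise a KeyError for Alice, because the entry for "002" does not exist yet at that point.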
Now the employees dictionary should be populated with the Employee objects, ready to be inserted into the database.
The next step is to insert all the Employee objects into the database. But before that, we need to create a client with our endpoint:
client = Client("http://127.0.0.1:6363/")
Then we will connect the client to our database. If you are connecting locally with the default settings, just provide the name of the database you are connecting to:
client.connect(db="getting_started")
If you are using TerminusCMS, you can find your endpoint, team, and API token in the TerminusCMS dashboard under your profile.
Now we are all set. The last thing to do is to insert the documents:
client.insert_document(list(employees.values()), commit_msg="Adding 4 Employees")
Go back to the terminal and run the script. Make sure you are in a Python environment that has terminusdb-client installed.
$ python insert_data.py
To check that the data has been inserted correctly, use the tdbpy alldocs command:
$ tdbpy alldocs --type Employee
If you used TerminusCMS, check that the data is there in the TerminusCMS dashboard.
Lesson 4 - Update and import new data that links to old data