Commit 0fbb419

Merge pull request #92 from LuCEresearchlab/minor-patch
Added safety for dataset uploading
2 parents 5587663 + 36e6da8 commit 0fbb419

3 files changed: 14 additions, 3 deletions

README.md

Lines changed: 9 additions & 2 deletions
@@ -35,8 +35,15 @@ In case there are issues due to dependencies try to rebuild the containers (will
 docker-compose up --build # rebuild
 ```
 
-Note: The clustering might take a while, the clusters tend to finish close to each other don't panic if you see no
-progress for several minutes.
+Notes:
+
+- The clustering might take a while, the clusters tend to finish close to each other don't panic if you see no progress
+for several minutes.
+- If the tagging-service is interrupted before completing the clustering it'll be necessary to manually log into the
+database and delete the object under `dataset_db/dataset` with `dataset_id` equal to the one that was interrupted.
+
+If the entry is not deleted it'll be impossible to complete the clustering for it and the dataset will always result
+as loading.
 
 Optimizations: It's possible to change the resources allocated to the python-service from `.env `, this will impact
 clustering performance, to change the resources modify the environment variables:
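
The manual cleanup described in the new README note amounts to deleting one record by its `dataset_id`. The commit does not show which database client the project uses, so the sketch below is only an illustration: `InMemoryCollection` and `delete_interrupted_dataset` are hypothetical names, and the in-memory list stands in for the `dataset_db/dataset` store.

```python
# Hypothetical stand-in for the `dataset_db/dataset` store; the real
# database client is not shown in this commit.
class InMemoryCollection:
    def __init__(self, documents):
        self.documents = list(documents)

    def delete_one(self, query):
        """Remove the first document matching every field in `query`.

        Returns the number of documents removed (0 or 1).
        """
        for index, doc in enumerate(self.documents):
            if all(doc.get(key) == value for key, value in query.items()):
                del self.documents[index]
                return 1
        return 0


def delete_interrupted_dataset(collection, dataset_id):
    """Delete the entry of an interrupted dataset; return True if one was removed."""
    return collection.delete_one({"dataset_id": dataset_id}) == 1
```

Once the stale entry is gone, the same dataset can be uploaded again, since a lookup for its `dataset_id` no longer finds a partially clustered record.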

frontend/src/pages/tagging/DatasetSelection.tsx

Lines changed: 1 addition & 1 deletion
@@ -59,7 +59,7 @@ function DatasetSelection() {
 <TableBody>
     {datasets.map((dataset: DatasetDesc) => {
         const loading_cluster = dataset.clusters_computed != dataset.nr_questions
-        const needed_time_s = 1000 * 60 * 2 * dataset.nr_questions
+        const needed_time_s = 1000 * 90 * dataset.nr_questions
         const started = new Date(dataset.creation_data)
         const now = new Date()
 
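
This change lowers the per-question estimate from `1000 * 60 * 2` to `1000 * 90`, which, given the factor of 1000, reads as 120 s down to 90 s expressed in milliseconds (despite the `_s` suffix). How the frontend uses the value after these lines is not shown in the diff; the sketch below only illustrates the arithmetic, with `estimated_finish` and `remaining_seconds` as hypothetical helper names.

```python
from datetime import datetime, timedelta

# New per-question estimate from the diff: 90 seconds (previously 120).
SECONDS_PER_QUESTION = 90


def estimated_finish(started: datetime, nr_questions: int) -> datetime:
    """Estimated time at which clustering of all questions is done."""
    return started + timedelta(seconds=SECONDS_PER_QUESTION * nr_questions)


def remaining_seconds(started: datetime, nr_questions: int, now: datetime) -> float:
    """Seconds left until the estimated finish, never below zero."""
    remaining = (estimated_finish(started, nr_questions) - now).total_seconds()
    return max(0.0, remaining)
```

For a dataset with 4 questions, the estimate drops from 8 minutes to 6 minutes under the new constant.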

tagging-service/flaskr/endpoints/upload_api.py

Lines changed: 4 additions & 0 deletions
@@ -66,6 +66,10 @@ def thread_function(dataset):
 
     json_dataset = json.loads(uploaded_file.read())
 
+    dataset_from_db = get_dataset(dataset_id=json_dataset['dataset_id'])
+    if dataset_from_db is not None and dataset_from_db['clusters_computed'] != len(dataset_from_db['questions']):
+        return f'rejected file: {uploaded_file.name}, dataset still uploading'
+
     Thread(target=thread_function, args=(json_dataset,)).start()
 
     return f'uploaded file: {uploaded_file.name} successfully'
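
The guard added here rejects an upload when a dataset with the same `dataset_id` already exists and its clustering is incomplete. A minimal, self-contained sketch of that logic follows; `is_still_clustering` and `handle_upload` are hypothetical names, and the lookup and the clustering starter are passed in as parameters so the example runs without Flask, threads, or a database.

```python
def is_still_clustering(dataset_from_db):
    """True when a stored dataset exists but not all questions are clustered."""
    return (
        dataset_from_db is not None
        and dataset_from_db['clusters_computed'] != len(dataset_from_db['questions'])
    )


def handle_upload(json_dataset, file_name, get_dataset, start_clustering):
    """Reject the upload if the same dataset is still being clustered.

    `get_dataset` and `start_clustering` are injected stand-ins for the
    service's database lookup and background thread.
    """
    dataset_from_db = get_dataset(dataset_id=json_dataset['dataset_id'])
    if is_still_clustering(dataset_from_db):
        return f'rejected file: {file_name}, dataset still uploading'

    start_clustering(json_dataset)
    return f'uploaded file: {file_name} successfully'
```

A dataset that was interrupted mid-clustering stays in the "still clustering" state indefinitely, which is why the README note asks for its entry to be deleted by hand before uploading again.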
