Videos randomly disappearing #1453
Another example: https://signbank.cls.ru.nl//dictionary/gloss/47336 When the gloss video was deleted, the NME video moved from NME to gloss. |
At gloss AFBROKKELEN (https://signbank.cls.ru.nl//dictionary/gloss/46883), I am seeing this:
|
Can you increase the amount of time between updating/uploading and retrieving? Maybe increase it to 10 minutes? There was previously a problem where operations followed each other too quickly. |
I had this problem locally on my own computer using iCloud for storage. |
Yes, I could do that, but I would rather have the Signbank side do something about the transactions. For example, for incoming transactions I always enqueue them in a list instead of executing them immediately. When the server has finished processing one transaction, it can handle the next one. Would that be something you can work on? |
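A minimal sketch of the queuing idea described above, in plain Python. This is only an illustration, not Signbank code: `TransactionQueue`, `submit`, and `process_all` are hypothetical names, and a real Django deployment would more likely rely on a task queue or database-level locking.

```python
from collections import deque
from typing import Callable, Deque, List


class TransactionQueue:
    """Hypothetical FIFO queue: incoming operations wait their turn."""

    def __init__(self) -> None:
        self._pending: Deque[Callable[[], str]] = deque()

    def submit(self, operation: Callable[[], str]) -> None:
        # Enqueue the operation instead of executing it immediately.
        self._pending.append(operation)

    def process_all(self) -> List[str]:
        # Run operations strictly one at a time, in arrival order.
        results: List[str] = []
        while self._pending:
            operation = self._pending.popleft()
            results.append(operation())
        return results


# Made-up operations standing in for API requests.
queue = TransactionQueue()
queue.submit(lambda: "delete NME video")
queue.submit(lambda: "upload NME video")
results = queue.process_all()
```

Because each operation only starts after the previous one has returned, a delete and the upload that follows it cannot overlap.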
@rem0g that sounds like an interesting approach. I will discuss with @vanlummelhuizen how to implement that; he is the Django expert. A queuing mechanism. |
@rem0g How are you deleting the video? (There are some "signals" when objects are deleted. These may or may not move or delete video files. It would help in debugging to know which commands were run.) Theoretically, if you are uploading video files at rapid speed, the temporary files (that Unix is making) could end up being linked to the wrong object in Django. (I have suspected this for a long time, but cannot fix it myself. I will ask the others.) I implemented a lot of code in November/December for managing the video files. There are pull requests for these, but nobody has reviewed them yet. The intention is that the dataset manager can inspect what is in the file system. That also makes it possible to retrieve the videos of deleted glosses. (The gloss IDs are not reused, so new videos should not interfere with deleted glosses: the filename always contains the ID.) |
@rem0g for this gloss: https://signbank.cls.ru.nl/dictionary/gloss/47361 the NME video is not in the correct format! (On Firefox, it shows that it is not supported.) Recall that you asked us to not test for MP4 anymore. Thus it can be that incorrect formats are causing problems. |
@rem0g here are the gloss video objects for AFBROKKELEN, yes, you can see that the same filename appears multiple times, for different GlossVideo objects with various perspectives and NME set. |
Is it possible something was wrong with the permissions on your source file? Or that it was a symbolic link? |
@vanlummelhuizen can you help on this issue? |
@rem0g there are hundreds of videos with the wrong filename as you point out. Did you change anything in your script? I copied the newest database to signbank-test in order to inspect the filenames (in the objects). https://signbank-test.cls.ru.nl/datasets/checks/5 (There are no files, but you can see the filenames in the objects) The last time I checked filenames (end of November) everything was as expected. The problem is that all the gloss video objects of a gloss are indeed sharing a single video file. They are all pointing to the same file. I can only think that this is being caused by an alias or something. That the file system is pointing to a single file during upload. Babbling, but I know Django does not allow to upload multiple files in the Django Form Template. (We used to do this for the eaf files in the Dataset Manager, but when we updated to Django 4.2, the code had to be modified to only upload one file.) Perhaps Django is somehow doing something here since multiple video files are in the same API request. (The Django feature was removed for security reasons from Django.) |
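One way to list the affected objects is to group them by file path and keep only the paths that occur more than once. The sketch below is an assumption-laden illustration: `find_shared_files` is a hypothetical helper, and the sample pairs are made up rather than taken from the Signbank database.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_shared_files(videos: Iterable[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Map each file path used by more than one object to those object ids."""
    ids_by_path: Dict[str, List[int]] = defaultdict(list)
    for object_id, path in videos:
        ids_by_path[path].append(object_id)
    # Keep only the paths that several objects point to.
    return {path: ids for path, ids in ids_by_path.items() if len(ids) > 1}


# Made-up sample data: (object id, file path) pairs.
sample = [
    (1, "videos/EXAMPLE-100.mp4"),
    (2, "videos/EXAMPLE-100.mp4"),
    (3, "videos/EXAMPLE-100_left.mp4"),
]
shared = find_shared_files(sample)
```

In a Django shell the pairs could come from the video objects themselves, but the exact field names are not shown in this thread, so they are left out here.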
From Django manual
|
I need help with this issue. One of the glosses that is messed up is STRUISVOGEL. Here, you see all the objects refer to the same file (the perspective videos have the same file name as the normal video): Here you see the stats for the file:
But in the file system, the timestamp on the file shows that it was not changed on 20 December.
|
I don't know how to solve this. Hopefully, @vanlummelhuizen can have a go. |
Hello, thank you for looking into the issue. The cause is not clear to me yet, but I now know it is not caused by my script, as everything relies on the Signbank API. For uploading videos I use this API: /dictionary/api_update_gloss/{glossid}/video That's it. For NME video upload I use: /dictionary/api_create_gloss_nmevideo/{datasetid}/{glossid}/ And for deleting an NME video: /dictionary/api_delete_gloss_nmevideo/{datasetid}/{glossid}/{videoid}/ Every time I want to upload an NME video, I first obtain the unique ID of the existing NME video, delete it with api_delete_gloss_nmevideo, and then upload the new NME video. |
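The delete-then-upload order described above can be sketched by building the URLs in sequence. The paths are the ones quoted in the comment and the host comes from the links in this thread; the helper names and the video id `123` are hypothetical. No HTTP request is made, because authentication and payload format are not described here.

```python
from typing import List

BASE_URL = "https://signbank.cls.ru.nl"  # host taken from the links in this thread


def gloss_video_url(gloss_id: int) -> str:
    # Endpoint for uploading the main gloss video.
    return f"{BASE_URL}/dictionary/api_update_gloss/{gloss_id}/video"


def create_nme_url(dataset_id: int, gloss_id: int) -> str:
    # Endpoint for uploading a new NME video.
    return f"{BASE_URL}/dictionary/api_create_gloss_nmevideo/{dataset_id}/{gloss_id}/"


def delete_nme_url(dataset_id: int, gloss_id: int, video_id: int) -> str:
    # Endpoint for deleting one existing NME video by its id.
    return (
        f"{BASE_URL}/dictionary/api_delete_gloss_nmevideo/"
        f"{dataset_id}/{gloss_id}/{video_id}/"
    )


def replace_nme_plan(dataset_id: int, gloss_id: int, old_video_id: int) -> List[str]:
    """Ordered URLs for replacing one NME video: delete first, then create."""
    return [
        delete_nme_url(dataset_id, gloss_id, old_video_id),
        create_nme_url(dataset_id, gloss_id),
    ]


# Dataset 5 and gloss 46883 appear in this thread; video id 123 is made up.
plan = replace_nme_plan(5, 46883, 123)
```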
Hi all, I added 'blocking' to indicate extra extra priority. If there's something on our end we can do, please let us know. |
I'm not able to solve this myself. I am aware of this problem for a long time, but it was on a local server running on iCloud. So I chalked that up to Apple quirks. There are quite a few messed up video objects/ files now. I'm playing with this locally, so I can inspect the file system and the admin without messing up anything. Since multiple video objects refer to exactly the same file, it is not possible to "fix" this, other than to delete objects that point to the wrong file. But to do this, we need to turn off the "normal" process of deleting, otherwise the "correct" file may be deleted. I made a command (pull request) for renaming backup video files, since that has been messed up for a long time. |
@rem0g can you stop deleting the videos on your side? Because objects are referring to the same file, this is causing a shared file to be deleted. I still think this is due to the API commands happening too fast for the file system. But once a file ends up shared by different objects, that escalates the problem, domino effect. |
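The domino effect can be avoided, at least in principle, by removing a file only when no surviving object still points to it. The following is a rough sketch under that assumption; `files_safe_to_delete` and the file names are hypothetical and do not reflect how Signbank's delete signals work.

```python
from typing import Dict, Iterable, Set


def files_safe_to_delete(doomed_ids: Iterable[int], path_by_id: Dict[int, str]) -> Set[str]:
    """Return the paths used only by the objects that are about to be deleted."""
    doomed = set(doomed_ids)
    doomed_paths = {path_by_id[i] for i in doomed if i in path_by_id}
    # Paths still referenced by objects that will remain afterwards.
    kept_paths = {path for i, path in path_by_id.items() if i not in doomed}
    return doomed_paths - kept_paths


# Made-up data: objects 1 and 2 wrongly share one file, object 3 has its own.
paths = {
    1: "videos/EXAMPLE-100.mp4",
    2: "videos/EXAMPLE-100.mp4",
    3: "videos/EXAMPLE-100_nme_1.mp4",
}
only_nme = files_safe_to_delete([3], paths)
shared_case = files_safe_to_delete([2], paths)
```

Deleting object 2 yields no removable file, since object 1 still needs it.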
Another try: can you make the index of the NME video start at 1? Using 0 can be dangerous, since it is also sometimes the same as False or None in the database. Because things are strings in the API, it could be a problem with type coercion. Also, Django automatically converts objects to their "id" when they are passed around. They only remain an actual object in the template if it's via context variables. It could be that the 0 somehow becomes a "Null" value, so Django thinks the field is not set in the object. |
Using 0 as a replacement for the gloss sounds weird, as I understand the index number is for NME videos only. Nothing has been sent to Signbank via the API since the last reupload of all videos before my vacation (which happened last Friday or so). NME autoupload and the api_delete_gloss_nmevideo API call have been disabled since then too. Per @uklomp's comment everything was fine this morning, so has something happened on the Signbank server outside of our scope? I hope @Woseseltops can solve this issue this week, as we are nearing the end of our Signbank project. |
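The concern about 0 can be illustrated with general Python behaviour. This is not confirmed to happen anywhere in Signbank; the function names below are hypothetical and only show how a truthiness check differs from an explicit `None` check.

```python
from typing import Optional


def offset_is_set_truthy(offset: Optional[int]) -> bool:
    # Fragile check: treats 0 the same as None.
    return bool(offset)


def offset_is_set_explicit(offset: Optional[int]) -> bool:
    # Robust check: only None counts as "not set".
    return offset is not None


def parse_offset(raw: str) -> Optional[int]:
    # API values arrive as strings; an empty string means "not provided".
    return int(raw) if raw.strip() else None


zero_truthy = offset_is_set_truthy(parse_offset("0"))
zero_explicit = offset_is_set_explicit(parse_offset("0"))
missing_explicit = offset_is_set_explicit(parse_offset(""))
```

With the truthiness check an offset of 0 looks unset, while the explicit check still tells 0 apart from a missing value.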
The only difference between a normal GlossVideo object and the (subclass) GlossVideoNME object is that the NME object has an "offset" field added to it. Otherwise they have all the same fields in the objects. In the model, the default offset is 1. I was merely wondering out loud if it's possible that an offset with value 0 is somehow "empty" in the database. Thereby making Django think that an NME object is a normal video object. (I have no idea if that could happen. I was wondering out loud. That's why I suggested to just use 1. Then it's clear it would not be empty.)
We haven't changed anything on our side. Only the CNCZ could have changed something (the university owns the websites. I have no idea whether there are any limits to how much data we can store.) |
This issue has been eating up nearly all of my Signbank time (and more), but I can't figure it out; the code looks okay as far as I can see. My main approach therefore is to try to reproduce the problem, using these API endpoints in various orders:
Any other endpoints you are using, @rem0g ? |
After a longer session with @vanlummelhuizen , we came to the conclusion that the following is likely happening:
The question now is, where in the code does process X live and what does it do? The video path consists of the sign language, the dataset, the lemma and the primary key, so it must be related to changing any of these. Does this ring a bell for you @rem0g ? If this line of thinking is correct, reuploading the videos should not lead to any new problems, as long as process X is not triggered. |
@Woseseltops @vanlummelhuizen Both of the signal methods for updating the dataset or lemma of a gloss move the video files! The dataset signal processor will move all the videos of the dataset. The signal methods are at the bottom of video/models.py. Can updating the timestamp of the gloss trigger anything? Can a transaction rollback be happening? Could process X be comparing the package information to what it expects for video paths, and then, because the Signbank file system had not finished storing the videos, SignCollect uploads some videos again? |
Hi all, I don't mean to put any extra pressure on you because I believe you're on top of it, but is it possible to have an indication when this is solved? We can't proceed with the last phase in the project right now, so it would be good to know if you're thinking this will take another week or another month -or that you have no idea. Thanks! |
We have not been able to duplicate the error when we do it ourselves. If you go to the Dataset Manager, Manage Video Storage, you can see the paths of all the video objects for the dataset (NGT is dataset 5): https://signbank.cls.ru.nl/datasets/checks/5 The page takes a while to load. Further down the page, you can see the file paths for, e.g., the perspective, NME, and normal videos. (Then you can see to what extent the paths are messed up.) |
@uklomp We are working on it, but as @susanodd mentions, we have not been able to replicate the problem. Nor can we pinpoint it in the logs or code base. This makes it extremely difficult to give a prognosis. I will discuss this (and possible workarounds) with @Woseseltops tomorrow. We will keep you posted. |
Perhaps I found the code that causes this. Question for @uklomp and @rem0g : do you remember that one of you changed the default language of the NGT dataset on Tuesday January 14 around 13:15? Around that time I see a POST request to I will explain what I found below. A POST to Global-signbank/signbank/dictionary/update.py Line 2943 in 759d126
dataset.save() .
This triggers a post_save signal that in turn calls the following function: Global-signbank/signbank/video/models.py Lines 1049 to 1090 in 759d126
In the last if-block in the code above (line 1083 and further), the Global-signbank/signbank/video/models.py Lines 832 to 871 in 759d126
In the code above, there is a call to Code of Global-signbank/signbank/video/models.py Lines 139 to 181 in 759d126
|
@Woseseltops asked whether we are using only those endpoints. For the NME video autoupload script I am using these endpoints:
For the gloss video autoupload I am using only: @vanlummelhuizen mentioned /datasets/change_details/5. I haven't implemented that endpoint in the scripts so far. To be sure I haven't missed anything, I ran 'cat * | grep change_details' on all scripts, but found nothing. It's also not described in the API documentation. I don't see a use for the change_details endpoint, as we only use endpoints for gloss video and gloss phonology update/retrieval/deletion, so perhaps it would be best if the change_details endpoint were removed? Either way, once the fix is implemented I will reupload all base gloss videos again. |
@rem0g Perhaps the change was done through the front-end. |
I think you found it! Just doing a "save" on a dataset object unleashes that signal method that moves the videos. Even if the dataset name hasn't been changed, the dataset has been updated so the method is called. Given that there are thousands and thousands of video objects, it would take a long time for it to complete this, especially when users are using signbank at the same time. I always thought the method would only be called if the dataset acronym had been changed, since that is used as a folder name. |
Update: I have been able to replicate the problem. Apparently there are two bugs:
|
There are subclass methods, but they seem to not be applied. Same for the Elsewhere (not in the video code)
Some code elsewhere has been revised to explicitly retrieve the objects from the subclasses and exclude them from the query.
For Morphemes, this has been a problem in the past. That was repaired by introducing a second variable and explicitly casting the (subclass) object to be a Morpheme in the new variable. Because of implicit type checking, it seems Python retains the type first given to a variable (for example, inside of a loop). (That could be an issue with the API code that addresses videos of multiple types?) |
Can we please add the "acronym" to the updated_fields for the dataset signals? That would have prevented all of this. |
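A minimal sketch of such a guard, assuming the handler receives the keyword arguments Django's `post_save` signal passes (`created`, `update_fields`). It is plain Python with a stand-in class, not the code in video/models.py: `FakeDataset`, `on_dataset_saved`, and the `moved` list are hypothetical.

```python
from typing import Iterable, List, Optional

moved: List[str] = []  # records which datasets had their videos "moved"


class FakeDataset:
    """Stand-in for the Dataset model; only the acronym matters here."""

    def __init__(self, acronym: str) -> None:
        self.acronym = acronym


def on_dataset_saved(sender, instance, created: bool = False,
                     update_fields: Optional[Iterable[str]] = None, **kwargs) -> None:
    """Handler shaped like a post_save receiver, with an acronym guard."""
    if created:
        return  # a brand-new dataset has no videos to move
    if update_fields is None or "acronym" not in update_fields:
        return  # the acronym was not declared as changed: leave files alone
    moved.append(instance.acronym)  # placeholder for the real file move


dataset = FakeDataset("NGT")
on_dataset_saved(FakeDataset, dataset)                             # plain save(): skipped
on_dataset_saved(FakeDataset, dataset, update_fields=["name"])     # unrelated field: skipped
on_dataset_saved(FakeDataset, dataset, update_fields=["acronym"])  # acronym changed: moves
```

Note the trade-off: with this guard, code that changes the acronym must call `save(update_fields=["acronym"])`, otherwise the move is skipped. Comparing the old and new acronym before saving would be an alternative.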
There's also an overloaded method name. The method You're right, there is no subclass method for get_video_file_path. But the arguments were supposed to catch this. |
Gomer will run the script for uploading all the videos again tonight! |
The script has finished running, but I noticed a lot of glosses don't have base videos; quite a large number are missing videos even after they were uploaded. I'm going to run the script again. |
Can you build in a sufficient time interval to make sure the file system has finished? Because there is an extreme number of "backup" videos at the moment (some glosses have 80 backup files), all of the backup video objects are also updated. (There is a pull request for Admin commands to get rid of them, but it has been ignored by the colleagues.) |
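Spacing out requests on the client side could look roughly like this. It is a sketch only: `run_spaced` is a hypothetical helper, the 5-second interval is arbitrary, and a fixed delay does not guarantee that the server's file system has finished. The clock and sleep functions are injectable so the example runs without actually waiting.

```python
import time
from typing import Callable, Iterable, List, Optional


def run_spaced(operations: Iterable[Callable[[], str]],
               min_interval_seconds: float,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> List[str]:
    """Run operations in order, keeping their starts at least the interval apart."""
    results: List[str] = []
    last_start: Optional[float] = None
    for operation in operations:
        if last_start is not None:
            remaining = min_interval_seconds - (clock() - last_start)
            if remaining > 0:
                sleep(remaining)  # wait out the rest of the interval
        last_start = clock()
        results.append(operation())
    return results


# Fake clock and sleep, so the demo finishes instantly.
fake_now = [0.0]
waits: List[float] = []


def fake_sleep(seconds: float) -> None:
    waits.append(seconds)
    fake_now[0] += seconds


results = run_spaced(
    [lambda: "a", lambda: "b", lambda: "c"],
    5.0,
    sleep=fake_sleep,
    clock=lambda: fake_now[0],
)
```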
This gloss: https://signbank.cls.ru.nl/dictionary/gloss/47683/ The NME video has the same name as the normal video. |
Some glosses have lost their videos, for example:
https://signbank.cls.ru.nl/dictionary/gloss/47361
https://signbank.cls.ru.nl/dictionary/gloss/47475
Some glosses have the wrong still image from the video.
Also, some glosses have the wrong video perspective.
Some glosses even have the NMM video as the gloss video.
This is happening everywhere. I have checked my scripts and everything looks fine on my end.