Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Videos randomly disappearing #1453

Open
rem0g opened this issue Dec 23, 2024 · 60 comments
Open

Videos randomly disappearing #1453

rem0g opened this issue Dec 23, 2024 · 60 comments

Comments

@rem0g
Copy link
Collaborator

rem0g commented Dec 23, 2024

Some glosses has disappeared videos for example:

https://signbank.cls.ru.nl/dictionary/gloss/47361

https://signbank.cls.ru.nl/dictionary/gloss/47475

Some glosses has wrong still image from the video.

Also some glosses has wrong video perception.

Some glosses even has NMM video as gloss video.

This is happening everywhere, I have checked my scripts and everything looks fine at my end.

@rem0g
Copy link
Collaborator Author

rem0g commented Dec 23, 2024

Another example:

https://signbank.cls.ru.nl//dictionary/gloss/47336

When glos video has been deleted, then NME video has moved from NME to glos.

@rem0g
Copy link
Collaborator Author

rem0g commented Dec 23, 2024

At Glos AFBROKKELEN (https://signbank.cls.ru.nl//dictionary/gloss/46883), I am seeing this:

  {
                "ID": "211108",
                "Index": "0",
                "Description: Dutch": "afbrokkelen",
                "Description: English": "crumble",
                "Link": "https://signbank.cls.ru.nl//dictionary/protected_media/glossvideo/NGT/AF/AFBROKKELEN-46883.mp4"
            },
            {
                "ID": "208048",
                "Index": "1",
                "Description: Dutch": "afbrokkelen",
                "Description: English": "crumble",
                "Link": "https://signbank.cls.ru.nl//dictionary/protected_media/glossvideo/NGT/AF/AFBROKKELEN-46883.mp4"
            }
            
            
            Both has same link and that should not be the case. 

@susanodd

This comment was marked as off-topic.

@susanodd
Copy link
Collaborator

Can you increase the amount if time between updating/uploading and retrieving?

Like maybe increase it to 10 minutes? There was previously a problem that the time between operations was too frequent.
With videos in could be that a previous operation was not completed before a new one is performed. (It makes a difference if the file system is continuously writing new files. The transactions to create objects are faster than the file system.

@susanodd
Copy link
Collaborator

I had this problem locally on my own computer using iCloud for storage.
But I could not repeat it on the real server.
It was that the files had not been completely copied to iCloud. So they "disappeared" like that. They were actually on disc, but had names with "." before the filename, so not visible. This was on Apple's Unix.

@rem0g
Copy link
Collaborator Author

rem0g commented Dec 23, 2024

Yes I could do that but i would rather for Signbank side do something about the transactions, for example for incoming transactions I always enqueue them in a list instead of executing them immediately. When the server has finished processing a certain transaction then it could handle the next one.

Would that be something you can work on?

@susanodd
Copy link
Collaborator

@rem0g that sounds like an interesting approach. I will discuss that with @vanlummelhuizen how to implement that. He is the Django expert. A queuing mechanism.

@susanodd
Copy link
Collaborator

susanodd commented Jan 6, 2025

Another example:

https://signbank.cls.ru.nl//dictionary/gloss/47336

When glos video has been deleted, then NME video has moved from NME to glos.

@rem0g How are you deleting the video?

(There are some "signals" when objects are deleted. These may or may not move or delete video files. It would help in debugging to know what commands have been done.)
(When a gloss is deleted, the video files are not deleted.)

Theoretically, if you are uploading video files at rapid speed, the temporary files (that Unix is making) could end up being linked to the wrong object in Django. I suspect this for a long time, but cannot fix this myself. I will ask the others.)

I implemented a lot of code in November/December for managing the video files. There are pull requests for these. But nobody has reviewed them yet. The intention is that the dataset manager can inspect what is in the file system. That also allows to retrieve the videos from deleted glosses. The gloss IDs are not reused, so new videos should not have any interference with deleted glosses, since they always have the ID in the filename and these are not reused.)

@susanodd
Copy link
Collaborator

susanodd commented Jan 6, 2025

@rem0g for this gloss:

https://signbank.cls.ru.nl/dictionary/gloss/47361

the NME video is not in the correct format!

(On Firefox, it shows that it is not supported.)

Recall that you asked us to not test for MP4 anymore. Thus it can be that incorrect formats are causing problems.
(The images cannot be generated for videos in the wrong format. Hence, an old image will remain shown.)

@susanodd
Copy link
Collaborator

susanodd commented Jan 6, 2025

@rem0g here are the gloss video objects for AFBROKKELEN, yes, you can see that the same filename appears multiple times, for different GlossVideo objects with various perspectives and NME set.

afbrokkelen-46883-glossvideo-objects

@susanodd
Copy link
Collaborator

susanodd commented Jan 6, 2025

Is it possible something was wrong with the permissions on your source file? Or that it was a symbolic link?
It looks like something was wrong with the source file that it kept being bound to different objects.
Also, all of the files attached to the objects have very similar time stamps.

@susanodd
Copy link
Collaborator

susanodd commented Jan 6, 2025

@vanlummelhuizen can you help on this issue?

@susanodd
Copy link
Collaborator

susanodd commented Jan 6, 2025

@rem0g there are hundreds of videos with the wrong filename as you point out.

Did you change anything in your script?

I copied the newest database to signbank-test in order to inspect the filenames (in the objects).

https://signbank-test.cls.ru.nl/datasets/checks/5

(There are no files, but you can see the filenames in the objects)

The last time I checked filenames (end of November) everything was as expected.
It may have started when you can upload three videos at the same time in the API?

The problem is that all the gloss video objects of a gloss are indeed sharing a single video file. They are all pointing to the same file.

I can only think that this is being caused by an alias or something. That the file system is pointing to a single file during upload.

Babbling, but I know Django does not allow to upload multiple files in the Django Form Template. (We used to do this for the eaf files in the Dataset Manager, but when we updated to Django 4.2, the code had to be modified to only upload one file.)

Perhaps Django is somehow doing something here since multiple video files are in the same API request. (The Django feature was removed for security reasons from Django.)

@susanodd
Copy link
Collaborator

susanodd commented Jan 6, 2025

From Django manual

Together MemoryFileUploadHandler and TemporaryFileUploadHandler provide Django’s default file upload behavior of reading small files into memory and large ones onto disk.

You can write custom handlers that customize how Django handles files. You could, for example, use custom handlers to enforce user-level quotas, compress data on the fly, render progress bars, and even send data to another storage location directly without storing it locally. See Writing custom upload handlers for details on how you can customize or completely replace upload behavior.
Where uploaded data is stored

Before you save uploaded files, the data needs to be stored somewhere.

By default, if an uploaded file is smaller than 2.5 megabytes, Django will hold the entire contents of the upload in memory. This means that saving the file involves only a read from memory and a write to disk and thus is very fast.

However, if an uploaded file is too large, Django will write the uploaded file to a temporary file stored in your system’s temporary directory. On a Unix-like platform this means you can expect Django to generate a file called something like /tmp/tmpzfp6I6.upload. If an upload is large enough, you can watch this file grow in size as Django streams the data onto disk.

These specifics – 2.5 megabytes; /tmp; etc. – are “reasonable defaults” which can be customized as described in the next section.

@susanodd
Copy link
Collaborator

susanodd commented Jan 6, 2025

I need help with this issue.

One of the glosses that is messed up is STRUISVOGEL

Here, you see all the objects refer to the same file (the perspective videos have the same file name as the normal video):

STRUISVOGEL-perspective-videos-admin

Here you see a stats for the file:

stat ../writable/glossvideo/NGT/ST/STRUISVOGEL-45874.mp4
  File: ../writable/glossvideo/NGT/ST/STRUISVOGEL-45874.mp4
  Size: 1105797   	Blocks: 2158       IO Block: 131072 regular file
Device: 20007dh/2097277d	Inode: 864571      Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/  ubuntu)   Gid: ( 1001/wwwsignbank)
Access: 2025-01-06 08:10:59.506318499 +0000
Modify: 2024-12-01 05:01:46.404138548 +0000
Change: 2024-12-20 13:58:20.693990022 +0000
 Birth: 2024-12-01 05:01:46.386138096 +0000

But in the file system, the timestamp on the file shows that it was not changed on 20 december.

-rw-r--r-- 1 ubuntu wwwsignbank 1105797 Dec 1 05:01 STRUISVOGEL-45874.mp4

@susanodd
Copy link
Collaborator

susanodd commented Jan 6, 2025

I don't know how to solve this.

Hopefully, @vanlummelhuizen can have a go.

@rem0g
Copy link
Collaborator Author

rem0g commented Jan 6, 2025

Hello, thank you for looking into the issue. For me the issue is not clear yet but now I know it's not caused by my script as everything relies on Signbank API. For uploading videos i do use this API:

/dictionary/api_update_gloss/{glossid}/video

That's it.

As for NME video upload i do use:

/dictionary/api_create_gloss_nmevideo/{datasetid}/{glossid}/}

And for deleting NME video:

/dictionary/api_delete_gloss_nmevideo/{datasetid}/{glossid}/{videoid}/

For every time if i want to upload a certain NME video i do execute api_delete_glos_nmevideo first, but for that i obtain unique ID from the nme video and then delete that and then upload the new NME video.

@uklomp
Copy link
Collaborator

uklomp commented Jan 7, 2025

Hi all, I added 'blocking' to indicate extra extra priority. If there's something on our end we can do, please let us know.

@susanodd
Copy link
Collaborator

susanodd commented Jan 7, 2025

I'm not able to solve this myself. I am aware of this problem for a long time, but it was on a local server running on iCloud. So I chalked that up to Apple quirks.

There are quite a few messed up video objects/ files now.

I'm playing with this locally, so I can inspect the file system and the admin without messing up anything.

Since multiple video objects refer to exactly the same file, it is not possible to "fix" this, other than to delete objects that point to the wrong file. But to do this, we need to turn off the "normal" process of deleting, otherwise the "correct" file may be deleted.

I made a command (pull request) for renaming backup video files, since that has been messed up for a long time.
But this is something different, since objects refer to the same file.

@susanodd
Copy link
Collaborator

susanodd commented Jan 7, 2025

@rem0g can you stop deleting the videos on your side? Because objects are referring to the same file, this is causing a shared file to be deleted.

I still think this is due to the API commands happening too fast for the file system. But once a file ends up shared by different objects, that escalates the problem, domino effect.

@susanodd
Copy link
Collaborator

susanodd commented Jan 14, 2025

Another try, can you make the index start at 1 of the NME video?

Using 0 can be dangerous, since it is also sometimes the same as False or None in the database. Because things are strings in the API it could be a problem with type coercion. Also, Django automatically converts objects to their "id" when they are passed around. They only remain an actual object in the template if it's via context variables.

It could possibly be that the 0 is somehow become a "Null" value, so Django thinks the field is not set in the object.

@rem0g
Copy link
Collaborator Author

rem0g commented Jan 14, 2025

Using 0 as replacement for gloss sounds weird as I understand the index number is for NME videos only.

Nothing has been sent to Signbank via API after last reupload of all videos before my vacation (happened past friday or so). NME autoupload and api_delete_gloss_nmevideo API call has been disabled since then too.

Per @uklomp comment everything was fine this morning, so something has happened on Signbank server outside of our scope? I hope @Woseseltops can solve this issue this week as we are nearing the end of our Signbank project.

@susanodd
Copy link
Collaborator

Using 0 as replacement for gloss sounds weird as I understand the index number is for NME videos only.

The only difference between a normal GlossVideo object and the (subclass) GlossVideoNME object is that the NME object has an "offset" field added to it. Otherwise they have all the same fields in the objects. In the model, the default offset is 1.

I was merely wondering out loud if it's possible that an offset with value 0 is somehow "empty" in the database. Thereby making Django think that an NME object is a normal video object. (I have no idea if that could happen. I was wondering out loud. That's why I suggested to just use 1. Then it's clear it would not be empty.)

Nothing has been sent to Signbank via API after last reupload of all videos before my vacation (happened past friday or so). NME autoupload and api_delete_gloss_nmevideo API call has been disabled since then too.

Per @uklomp comment everything was fine this morning, so something has happened on Signbank server outside of our scope? I hope @Woseseltops can solve this issue this week as we are nearing the end of our Signbank project.

We haven't changed anything on our side. Only the CNCZ could have changed something (the university owns the websites. I have no idea whether there are any limits to how much data we can store.)

@susanodd susanodd removed their assignment Jan 16, 2025
@Woseseltops
Copy link
Collaborator

This issue has been eating up nearly all of my Signbank time (and more), but I can't figure it out; the code looks okay as far as I can see. My main approach therefore is to try to reproduce the problem, using these API endpoints in various orders:

  • /dictionary/api_update_gloss/{glossid}/video (center video, but also perspective videos)
  • /dictionary/api_create_gloss_nmevideo/{datasetid}/{glossid}/
  • /dictionary/api_delete_gloss_nmevideo/{datasetid}/{glossid}/{videoid}/

Any other endpoints you are using, @rem0g ?

@Woseseltops
Copy link
Collaborator

Woseseltops commented Jan 16, 2025

After a longer session with @vanlummelhuizen , we came to the conclusion that the following is likely happening:

  • Some mysterious process, let's call it 'process X', wants to rename or move the main video
  • To do this, it incorrectly requests ALL videos with GlossVideo.objects.filter(gloss=glossid)
  • It renames each video with the new, same path, and stores this in the database. Every time this happens, the new file overwrites the previous one, so only the last one remains.

The question now is, where in the code does process X live and what does it do? The video path consists of the sign language, dataset, the lemma and the private key, so it must be related to changing any of these.

Does this ring any bell for you @rem0g ? If this line of thinking is correct, reuploading the videos should not lead to any new problems, as long as process X is not triggered.

@susanodd
Copy link
Collaborator

susanodd commented Jan 16, 2025

The question now is, where in the code does process X live and what does it do? The video path consists of the sign language, dataset, the lemma and the private key, so it must be related to changing any of these.

@Woseseltops @vanlummelhuizen Both of the signal methods for updating the dataset or lemma of a gloss move the video files!

The dataset signal processor will move all the videos of the dataset.

The signal methods are at the bottom of video/models.py

Can updating the timestamp of the gloss trigger anything?

Can a transaction rollback be happening?

Could process X be comparing the package information to what it expects for video paths, then because the videos were not finished being stored by the signbank file system, then SignCollect uploads some videos again?

@uklomp
Copy link
Collaborator

uklomp commented Jan 21, 2025

Hi all, I don't mean to put any extra pressure on you because I believe you're on top of it, but is it possible to have an indication when this is solved? We can't proceed with the last phase in the project right now, so it would be good to know if you're thinking this will take another week or another month -or that you have no idea. Thanks!

@susanodd
Copy link
Collaborator

We have not been able to duplicate the error when we do it ourselves.

If you go to the Dataset Manager, Manage Video Storage, you can see all the paths of all the video objects for the dataset (NGT is dataset 5)

https://signbank.cls.ru.nl/datasets/checks/5

The page takes a while to load.
The first panel shows "video file paths" that do not have a gloss pointing to them. (Those are videos of deleted glosses, the files remain in the file system.) (I was hoping to find SINTERKLAAS here.)

Further down the page, you can see the file paths for i.e., the perspective, NME, and normal videos. (Then you can see to what extent the paths are messed up.)

@vanlummelhuizen
Copy link
Collaborator

Hi all, I don't mean to put any extra pressure on you because I believe you're on top of it, but is it possible to have an indication when this is solved? We can't proceed with the last phase in the project right now, so it would be good to know if you're thinking this will take another week or another month -or that you have no idea. Thanks!

@uklomp We are working on it, but as @susanodd mentions, we have not been able to replicate the problem. Nor can we pinpoint it in the logs or code base. This makes it extremely difficult to give a prognosis.

I will discuss this (and possible workarounds) with @Woseseltops tomorrow. We will keep you posted.

@vanlummelhuizen
Copy link
Collaborator

Perhaps I found the code that causes this. Question for @uklomp and @rem0g : do you remember that one of you changed the default language of the NGT dataset on Tuesday January 14 around 13:15? Around that time I see a POST request to /datasets/change_details/5, where 5 is the ID of the NGT dataset.

I will explain what I found below.

A POST to /datasets/change_details/5 calls

def update_dataset(request, datasetid):
where the dataset is saved with dataset.save().

This triggers a post_save signal that in turn calls the following function:

@receiver(models.signals.post_save, sender=Dataset)
def process_dataset_changes(sender, instance, **kwargs):
"""
Makes changes to GlossVideos if a Dataset has been changed.
:param sender:
:param instance:
:param kwargs:
:return:
"""
# If the acronym has been changed, change all GlossVideos
# and rename directories.
dataset = instance
if dataset._initial['acronym'] and dataset.acronym != dataset._initial['acronym']:
# Move all media
glossvideos = GlossVideo.objects.filter(gloss__lemma__dataset=dataset)
for glossvideo in glossvideos:
glossvideo.move_video(move_files_on_disk=False)
# Rename dirs
glossvideo_path_original = os.path.join(WRITABLE_FOLDER, GLOSS_VIDEO_DIRECTORY, dataset._initial['acronym'])
glossvideo_path_new = os.path.join(WRITABLE_FOLDER, GLOSS_VIDEO_DIRECTORY, dataset.acronym)
if os.path.exists(glossvideo_path_original):
os.rename(glossvideo_path_original, glossvideo_path_new)
glossimage_path_original = os.path.join(WRITABLE_FOLDER, GLOSS_IMAGE_DIRECTORY, dataset._initial['acronym'])
glossimage_path_new = os.path.join(WRITABLE_FOLDER, GLOSS_IMAGE_DIRECTORY, dataset.acronym)
if os.path.exists(glossimage_path_original):
os.rename(glossimage_path_original, glossimage_path_new)
# Make sure that _initial reflect the database for the dataset object
dataset._initial['acronym'] = dataset.acronym
# If the default language has been changed, change all GlossVideos
# and move all video/poster files accordingly.
if dataset._initial['default_language'] and dataset.default_language != dataset._initial['default_language']:
# Move all media
glossvideos = GlossVideo.objects.filter(gloss__lemma__dataset=dataset)
for glossvideo in glossvideos:
glossvideo.move_video(move_files_on_disk=True)
# Make sure that _initial reflect the database for the dataset object
dataset._initial['default_language'] = dataset.default_language

In the last if-block in the code above (line 1083 and further), the move_video method of the GlossVideo class (!) is called on every GlossVideo object, including GlossVideoNME and GlossVideoPerspective objects. Note that this is the move_video method in the class GlossVideo and not the methods with the same name in either GlossVideoNME or GlossVideoPerspective:

def move_video(self, move_files_on_disk=True):
"""
Calculates the new path, moves the video file to the new path and updates the videofile field
:return:
"""
old_path = str(self.videofile)
new_path = get_video_file_path(self, old_path, version=self.version)
if old_path != new_path:
if move_files_on_disk:
source = os.path.join(settings.WRITABLE_FOLDER, old_path)
destination = os.path.join(settings.WRITABLE_FOLDER, new_path)
if os.path.exists(source):
destination_dir = os.path.dirname(destination)
if not os.path.exists(destination_dir):
os.makedirs(destination_dir)
if os.path.isdir(destination_dir):
shutil.move(source, destination)
# Small video
(source_no_extension, ext) = os.path.splitext(source)
source_small = add_small_appendix(source)
(destination_no_extension, ext) = os.path.splitext(destination)
destination_small = add_small_appendix(destination)
if os.path.exists(source_small):
shutil.move(source_small, destination_small)
# Image
source_image = source_no_extension.replace(settings.GLOSS_VIDEO_DIRECTORY, settings.GLOSS_IMAGE_DIRECTORY)\
+ '.png'
destination_image = destination_no_extension.replace(settings.GLOSS_VIDEO_DIRECTORY, settings.GLOSS_IMAGE_DIRECTORY)\
+ '.png'
if os.path.exists(source_image):
destination_image_dir = os.path.dirname(destination_image)
if not os.path.exists(destination_image_dir):
os.makedirs(destination_image_dir)
if os.path.isdir(destination_image_dir):
shutil.move(source_image, destination_image)
self.videofile.name = new_path
self.save()

In the code above, there is a call to get_video_file_path (line 838), but since it is the method in GlossVideo, there is no nmevideo or perspective argument in the call. This results in new_path without any NME of Perspective indicator, making it the same as the neutral video name and different from the original. This triggers a file rename, and when the file on disk is renamed, the neutral video file is overwritten by another file. Finally, if this is done is a particular order, e.g. 'left', 'nme', 'right' (alphabetical), you end up with only the right-perspective video that has the file name of the neutral video.

Code of get_video_file_path:

def get_video_file_path(instance, filename, nmevideo=False, perspective='', offset=1, version=0):
"""
Return the full path for storing an uploaded video
:param instance: A GlossVideo instance
:param filename: the original file name
:param nmevideo: boolean whether this is an nme video
:param perspective: optional string for either 'left' or 'right'
:param offset: order in sequence of NME video
:param version: the version to determine the number of .bak extensions
:return:
"""
(base, ext) = os.path.splitext(filename)
idgloss = instance.gloss.idgloss
from signbank.tools import get_two_letter_dir
two_letter_dir = get_two_letter_dir(idgloss)
video_dir = settings.GLOSS_VIDEO_DIRECTORY
try:
dataset_dir = instance.gloss.lemma.dataset.acronym
except KeyError:
dataset_dir = ""
if settings.DEBUG_VIDEOS:
print('get_video_file_path: dataset_dir is empty for gloss ', str(instance.gloss.pk))
if nmevideo:
nme_video_offset = '_nme_' + str(offset)
filename = idgloss + '-' + str(instance.gloss.id) + nme_video_offset + ext
elif perspective:
video_perpsective = '_' + perspective
filename = idgloss + '-' + str(instance.gloss.id) + video_perpsective + ext
elif version > 0:
filename = idgloss + '-' + str(instance.gloss.id) + ext + '.bak' + str(instance.id)
else:
filename = idgloss + '-' + str(instance.gloss.id) + ext
path = os.path.join(video_dir, dataset_dir, two_letter_dir, filename)
if hasattr(settings, 'ESCAPE_UPLOADED_VIDEO_FILE_PATH') and settings.ESCAPE_UPLOADED_VIDEO_FILE_PATH:
from django.utils.encoding import escape_uri_path
path = escape_uri_path(path)
if settings.DEBUG_VIDEOS:
print('get_video_file_path: ', path)
return path

@rem0g
Copy link
Collaborator Author

rem0g commented Jan 22, 2025

@Woseseltops asked if we are using only those endpoints, for the script regarding NME video autoupload i am using those endpoints:

url = f"https://signbank.cls.ru.nl/dictionary/get_gloss_data/5/{gloss_id}"
url = f"https://signbank.cls.ru.nl/dictionary/api_delete_gloss_nmevideo/5/{gloss_id}/{videoid}/"
url = f"https://signbank.cls.ru.nl/dictionary/api_create_gloss_nmevideo/5/{gloss_id}/"

For the gloss video autoupload i am using only:
url = f"https://signbank.cls.ru.nl/dictionary/api_update_gloss/{gloss}/video"

@vanlummelhuizen mentioned about /datasets/change_details/5, I havent implemented the endpoint so far in the scripts. To be sure I havent missed anything i have done command 'cat * | grep change_details' on all scripts, but nothing found. It's also not described in API documentation.

I don't really see anything useful with the endpoint change_details as we use endpoints for gloss video and gloss phonology update/retrieval/deletion only, so probably it would be the best if the change_details endpoint would be ..removed? Either way, if the fix is implemented then i will reupload all base gloss videos again.

@vanlummelhuizen
Copy link
Collaborator

@rem0g Perhaps the change was done through the front-end.

@susanodd
Copy link
Collaborator

susanodd commented Jan 22, 2025

@rem0g Perhaps the change was done through the front-end.

I think you found it!

Just doing a "save" on a dataset object unleashes that signal method that moves the videos. Even if the dataset name hasn't been changed, the dataset has been updated so the method is called.

Given that there are thousands and thousands of video objects, it would take a long time for it to complete this, especially when users are using signbank at the same time.

I always thought the method would only be called if the dataset acronym had been changed, since that is used as a folder name.

@susanodd
Copy link
Collaborator

susanodd commented Jan 22, 2025

@uklomp was doing something with the dataset description. She found html code in it. So that could be what triggered it. I think she updated the description to remove the code. (#1452)

Except I think the videos were causing problems before the holidays as well. It maybe started on December 20th?

@vanlummelhuizen
Copy link
Collaborator

Update: I have been able to replicate the problem. Apparently there are two bugs:

  1. In the post_save receiver the current default language and the one from the Change Dataset form are compared incorrectly, i. e. ID vs name:
    if dataset._initial['default_language'] and dataset.default_language != dataset._initial['default_language']:
  2. As I wrote yesterday, the GlossVideoPerspective and GlossVideoNME objects are handled as if they are GlossVideo objects:
    glossvideos = GlossVideo.objects.filter(gloss__lemma__dataset=dataset)
    for glossvideo in glossvideos:
    glossvideo.move_video(move_files_on_disk=True)

@susanodd
Copy link
Collaborator

susanodd commented Jan 23, 2025

There are subclass methods, but they seem to not be applied. Same for the __str__ method which is used extensively to obtain the filename. It always calls the superclass __str__ method, in spite of there being subclass methods. I only discovered that last week after putting a bunch more print statements for DEBUG_VIDEOS for the new admin functionality (on signbank-susan).

Elsewhere (not in the video code)

Some code elsewhere has been revised to explicitly retrieve the objects from the subclasses and exclude them from the query.

For Morphemes, this has been a problem in the past. That was repaired by introducing a second variable and explicitly casting the (subclass) object to be a Morpheme in the new variable. Because of implicit type checking, it seems Python retains the type first given to a variable (for example, inside of a loop). (That could be an issue with the API code that addresses videos of multiple types?)

@susanodd
Copy link
Collaborator

Can we please add the "acronym" to the updated_fields for the dataset signals? That would have prevented all of this.

@susanodd
Copy link
Collaborator

There's also an overloaded method name. The method get_video_path exists for many models. Gloss, AnnotatedSentence, GlossVideoNME, GlossVideoPerspective.

You're right, there is no subclass method for get_video_file_path. But the arguments were supposed to catch this.
This method is inside the "upload_to". (Maybe bound inside the object?)

@uklomp
Copy link
Collaborator

uklomp commented Jan 28, 2025

Gomer will run the script for uploading all the videos again tonight!

@rem0g
Copy link
Collaborator Author

rem0g commented Jan 29, 2025

Script has done running but I noticed a lot of glosses dont have base videos, there is a quite large number that is missing videos even after they were uploaded. I'm going to run the script again.

@susanodd
Copy link
Collaborator

susanodd commented Jan 29, 2025

Script has done running but I noticed a lot of glosses dont have base videos, there is a quite large number that is missing videos even after they were uploaded. I'm going to run the script again.

Can you build in a sufficient time interval to make sure the file system has completed.

Because there is an extreme number of "backup" videos at the moment (some glosses have 80 backup files), all of the backup video objects are also updated.

(There is a pull request for Admin commands to get rid of them, but it has been ignored by the colleagues.)

@susanodd
Copy link
Collaborator

susanodd commented Feb 3, 2025

This gloss:

https://signbank.cls.ru.nl/dictionary/gloss/47683/

The NME video has the same name as the normal video.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants
@Woseseltops @rem0g @vanlummelhuizen @susanodd @uklomp and others