-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improving database efficiency #44
Comments
Hi @meteorologist15 |
Thank you for pushing this PR! I was wondering though, were there any changes made that could break my old esgpull.db? I just installed a brand new instance of esgpull, copied over my esgpull.db, and when I ran 'esgpull show', I got the following message: DatabaseError: (sqlite3.DatabaseError) database disk image is malformed (Background on this error at: https://sqlalche.me/e/20/4xp6)I then tried to transfer over a copy of an older esgpull.db than I had made when I had just translated my synda.db over to an esgpull.db. That also failed with a similar message: DatabaseError: (sqlite3.DatabaseError) database disk image is malformed (Background on this error at: https://sqlalche.me/e/20/4xp6)Any thoughts? Thanks! |
Also, I see additional "esgpull.db-shm" and "esgpull.db-wal" files in my 'db' directory, in addition to the clean esgpull.db upon installation. Could you perhaps describe what they are for? Thanks! |
Perhaps the 'LEGACY' parameter is tripping things up? (though it had worked before?) From the log: [2024-07-19 18:27:57] DEBUG root |
My recent usage of esgpull has prompted me to officially raise an issue with regards to how efficiently (or inefficiently in this case) the database is utilized. Our carryover esgpull.db from the Synda days stands at around 2.3 GB in size. Starting this past Monday (2024-06-03), I have been attempting to download a fairly large grouping of data (~2.0 TiB; 60,000+ files), and after adding and tracking the query, the "update" process (esgpull update <QUERY_ID>) is STILL trudging along, with the database seemingly processing/validating/querying one file approximately every 60-90 seconds. If the pace continues, I'll be able to get through this "update" step in a little over a month from now, a very noticeable scaling issue. I'm not sure exactly how esgpull's "update" command utilizes/accesses the database, but I would presume that whatever algorithm is being used now should eventually be tuned to scale better with bigger requests and an already-packed esgpull.db.
The text was updated successfully, but these errors were encountered: