feat: integrate seqvars inhouse rocksdb in worker call (#2069) #2070
Conversation
Walkthrough
The changes in this pull request include modifications to the Celery configuration for task routing, the addition of a new environment variable for worker database paths, the introduction of a management command for rebuilding the seqvars in-house database, and the creation of a new Django migration for a background job model. Additionally, new classes and methods related to background job execution were added, along with a new Celery task and periodic task registration. The Dockerfile was updated to reflect a new version of a dependency.
deps-report 🔍
Commit scanned: e194e50
Vulnerable dependencies: 4 dependencies have vulnerabilities 😱
Outdated dependencies: 43 outdated dependencies found (including 17 outdated major versions) 😢
Codecov Report
Attention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2070 +/- ##
=====================================
- Coverage 91% 91% -1%
=====================================
Files 677 678 +1
Lines 38445 38532 +87
=====================================
+ Hits 35098 35135 +37
- Misses 3347 3397 +50
Actionable comments posted: 9
🧹 Outside diff range and nitpick comments (3)
backend/seqvars/tasks.py (1)
Line range hint 8-12: Fix incorrect module reference in docstring.
The docstring references cases_import.models but the model appears to be in seqvars.models.
- """Task to execute a ``cases_import.models.SeqvarsQueryExecutionBackgroundJob``."""
+ """Task to execute a ``seqvars.models.SeqvarsQueryExecutionBackgroundJob``."""
backend/seqvars/management/commands/buildseqvarsinhousedb.py (1)
11-13: Consider enhancing the help message with more details.
While the help message is present, it could be more informative by including:
- What the seqvars inhouse database is
- When/why someone would need to rebuild it
- Any prerequisites or side effects of rebuilding
- help = "(Re-) build the seqvars inhouse database."
+ help = """(Re-) build the seqvars inhouse database.
+
+ This command rebuilds the sequence variants in-house database which is used for [purpose].
+ You might need to run this when [scenarios].
+
+ Note: This operation may take significant time and resources.
+ """
backend/seqvars/migrations/0015_seqvarsinhousedbbuildbackgroundjob.py (1)
38-42: Consider index implications of the ordering configuration.
The model is configured to order by -pk, which is appropriate for background jobs as it shows the most recent jobs first. Since this is the primary key, the ordering will use the table's clustered index.
However, if you plan to frequently query this table by sodar_uuid, consider adding a database index to optimize such queries.
You can add an index in the model's Meta class like this:
class Meta:
ordering = ["-pk"]
indexes = [
models.Index(fields=["sodar_uuid"]),
]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (9)
- backend/config/celery.py (1 hunks)
- backend/config/settings/base.py (1 hunks)
- backend/seqvars/management/commands/buildseqvarsinhousedb.py (1 hunks)
- backend/seqvars/migrations/0015_seqvarsinhousedbbuildbackgroundjob.py (1 hunks)
- backend/seqvars/models/base.py (4 hunks)
- backend/seqvars/models/executors.py (5 hunks)
- backend/seqvars/plugins.py (2 hunks)
- backend/seqvars/tasks.py (2 hunks)
- utils/docker/Dockerfile (1 hunks)
🔇 Additional comments (14)
backend/seqvars/plugins.py (2)
3-6: LGTM! Clean import organization.
The multi-line import statement is well-organized and follows Python's style guidelines.
17-17: LGTM! Proper registration of the new background job.
The new background job is correctly registered in the plugin's job specifications, following the established pattern.
Let's verify the background job class implementation:
✅ Verification successful
Background job implementation verified and properly integrated
The verification confirms that the SeqvarsInhouseDbBuildBackgroundJob is correctly implemented and integrated:
- Proper model definition with required fields and custom manager
- Correct specification of spec_name matching the plugin registration
- Complete implementation of executor with proper job handling
- Migration file present for database schema updates
- Task implementation for background job execution
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the SeqvarsInhouseDbBuildBackgroundJob class implementation
# and its integration with the background job system
# Check the background job class definition
ast-grep --pattern 'class SeqvarsInhouseDbBuildBackgroundJob {
$$$
}'
# Check for any related test files
fd -g '*test*.py' | xargs rg 'SeqvarsInhouseDbBuildBackgroundJob'
Length of output: 139
Script:
#!/bin/bash
# Let's try a broader search pattern to locate the class definition and its usage
# Search for the class name in all files
rg -l "SeqvarsInhouseDbBuildBackgroundJob"
# Search for any class that extends BackgroundJob
ast-grep --pattern 'class $_ extends BackgroundJob {
$$$
}'
# Alternative search without extends keyword
ast-grep --pattern 'class SeqvarsInhouseDbBuildBackgroundJob'
# Look for background job related files
fd -g '*background*.py'
Length of output: 771
Script:
#!/bin/bash
# Let's examine the class implementation and its usage
# Check the class implementation in base.py
rg -A 10 "class SeqvarsInhouseDbBuildBackgroundJob" backend/seqvars/models/base.py
# Check how it's used in tasks.py
rg -A 5 "SeqvarsInhouseDbBuildBackgroundJob" backend/seqvars/tasks.py
# Check the migration file to understand the model structure
rg -A 10 "SeqvarsInhouseDbBuildBackgroundJob" backend/seqvars/migrations/0015_seqvarsinhousedbbuildbackgroundjob.py
# Check the executor implementation
rg -A 5 "SeqvarsInhouseDbBuildBackgroundJob" backend/seqvars/models/executors.py
Length of output: 2783
backend/seqvars/tasks.py (1)
15-20: Implementation looks good!
The task follows good practices:
- Uses bind=True for task instance access
- Includes type hints for better code maintainability
- Properly delegates to the executors module
backend/seqvars/management/commands/buildseqvarsinhousedb.py (2)
1-9: LGTM! Well-organized imports and clear module documentation.
The imports are properly organized with standard library imports separated from project imports, and the module's purpose is clearly documented.
7-8: Verify the implementation of referenced components.
Let's verify the implementation of the model and task to ensure they're properly defined.
✅ Verification successful
Components are properly implemented and registered as Celery tasks
The verification shows that both components are correctly implemented and registered:
- The model SeqvarsInhouseDbBuildBackgroundJob is properly defined in backend/seqvars/models/base.py with its manager and required fields.
- The executor function run_seqvarsbuildinhousedbbackgroundjob is implemented in backend/seqvars/models/executors.py with its corresponding InhouseDbBuildBackgroundJobExecutor class.
- The function is properly registered as a Celery task in backend/seqvars/tasks.py using the @app.task decorator and is also set up as a periodic task to run weekly (on Mondays).
The implementation follows the project's patterns for background jobs and task registration.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the implementation of model and task
# Check model implementation
echo "Checking model implementation..."
ast-grep --pattern 'class SeqvarsInhouseDbBuildBackgroundJob'
# Check task registration
echo "Checking task registration..."
rg -A 5 "run_seqvarsbuildinhousedbbackgroundjob" backend/seqvars/tasks.py
Length of output: 281
Script:
#!/bin/bash
# Let's try a broader search to locate these components
# Search for the model in all Python files
echo "Searching for SeqvarsInhouseDbBuildBackgroundJob model..."
rg -l "class SeqvarsInhouseDbBuildBackgroundJob" --type py
# Show the model implementation if found
echo "\nModel implementation:"
rg -A 10 "class SeqvarsInhouseDbBuildBackgroundJob" --type py
# Search for the executor function
echo "\nSearching for run_seqvarsbuildinhousedbbackgroundjob..."
rg -l "run_seqvarsbuildinhousedbbackgroundjob" --type py
# Show the executor implementation if found
echo "\nExecutor implementation:"
rg -A 10 "def run_seqvarsbuildinhousedbbackgroundjob" --type py
# Check if the function is registered as a task
echo "\nChecking task registration..."
rg "@task.*run_seqvarsbuildinhousedbbackgroundjob" --type py
Length of output: 2876
Script:
#!/bin/bash
# Let's verify the task registration and executor implementation
# Check if the function is imported and registered in tasks.py
echo "Checking tasks.py for registration..."
rg -A 5 "from seqvars.models.executors import run_seqvarsbuildinhousedbbackgroundjob" backend/seqvars/tasks.py
# Check the full executor implementation
echo "\nFull executor class implementation..."
rg -B 5 -A 20 "class InhouseDbBuildBackgroundJobExecutor" backend/seqvars/models/executors.py
# Check if there are any celery task decorators
echo "\nChecking for celery task decorators..."
rg "@shared_task|@app.task" backend/seqvars/models/executors.py
Length of output: 1679
Script:
#!/bin/bash
# Let's check tasks.py directly and look for celery task registration in the project
# Check contents of tasks.py
echo "Checking contents of tasks.py..."
rg -l "tasks.py" --type py
# Show contents of any tasks.py files found
echo "\nContents of tasks.py files:"
fd "tasks.py" --type f --exec cat {}
# Look for celery task registration patterns in the project
echo "\nLooking for celery task patterns..."
rg -l "from celery import shared_task|from celery import task|@shared_task|@task" --type py
Length of output: 12451
backend/seqvars/migrations/0015_seqvarsinhousedbbuildbackgroundjob.py (2)
21-36: LGTM! Model fields are well-defined.
The model fields follow Django best practices:
- Primary key is auto-incrementing
- UUID field ensures global uniqueness
- ForeignKey relationship is properly configured with cascade deletion and descriptive help text
17-43: Verify background job cleanup strategy.
Since this model is tied to background jobs, ensure there's a cleanup strategy in place to prevent unbounded growth of the database table.
Let's check if there are any cleanup tasks or retention policies defined:
✅ Verification successful
Background job cleanup is properly implemented
The codebase has comprehensive cleanup mechanisms for background jobs:
- Automatic cleanup through periodic Celery tasks:
  - clear_inactive_variant_sets runs every hour at minute 11
  - clear_expired_exported_files runs daily at 1:11
  - run_seqvarsinhousedbbuildbackgroundjob runs weekly on Mondays at minute 11
- Manual cleanup commands:
  - clear_inactive_variant_sets management command
  - svs_bg_sv_set_cleanup management command with configurable timeout
- Cleanup implementation details:
  - Removes inactive variant sets older than 12 hours by default
  - Configurable cleanup timeout via the VARFISH_SV_CLEANUP_BUILDING_SV_SETS setting
  - Handles cleanup of related database tables and annotations
The SeqvarsInhouseDbBuildBackgroundJob model is integrated into this existing cleanup infrastructure, ensuring proper database maintenance.
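The retention idea behind these cleanup tasks can be sketched in plain Python. This is a hypothetical helper, not project code; in VarFish the equivalent would be a Django queryset filter inside a periodic task, and the 12-hour default mirrors the variant-set cleanup mentioned above:

```python
from datetime import datetime, timedelta, timezone

def select_expired(jobs, max_age_hours=12, now=None):
    """Return jobs that finished before the retention cutoff.

    Each job is a dict with a ``finished_at`` datetime (or None while running).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=max_age_hours)
    # Running jobs (finished_at is None) are never eligible for cleanup.
    return [job for job in jobs
            if job["finished_at"] is not None and job["finished_at"] < cutoff]
```

A periodic task would then delete the selected rows, keeping the background job table bounded.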
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Look for background job cleanup configurations or tasks
# Check for cleanup tasks in Celery configuration
echo "Checking for cleanup tasks in Celery configuration..."
rg -l "cleanup|retention|purge" --type py backend/
# Check for existing cleanup implementations in bgjobs
echo "Checking bgjobs app for cleanup implementations..."
rg -l "cleanup|retention|purge" backend/bgjobs/
Length of output: 1675
Script:
#!/bin/bash
# Let's check the actual content of cleanup-related files and background job management
# Check maintenance command for variant sets cleanup
echo "=== Checking variant sets cleanup command ==="
cat backend/maintenance/management/commands/clear_inactive_variant_sets.py
# Check SV cleanup management command
echo -e "\n=== Checking SV cleanup command ==="
cat backend/svs/management/commands/svs_bg_sv_set_cleanup.py
# Look for background job lifecycle management
echo -e "\n=== Checking background job lifecycle management ==="
rg -A 10 "class BackgroundJob" backend/
Length of output: 8092
Script:
#!/bin/bash
# Let's check for settings related to cleanup and the tasks implementation
# Check settings for cleanup configurations
echo "=== Checking settings for cleanup configurations ==="
rg "CLEANUP|RETENTION|PURGE" backend/config/settings/
# Check the cleanup task implementations
echo -e "\n=== Checking cleanup task implementations ==="
rg -A 10 "cleanup_variant_sets|cleanup_bg_sv_set_task" backend/
# Check for any celery periodic task configurations
echo -e "\n=== Checking celery periodic task configurations ==="
rg -A 5 "celery\.schedules|PeriodicTask|crontab" backend/
Length of output: 13775
backend/config/celery.py (1)
35-35: LGTM! The routing rule follows established patterns.
The new routing rule appropriately directs in-house database build tasks to the maintenance queue, consistent with other maintenance operations.
Let's verify that the task naming follows the routing pattern:
✅ Verification successful
Task name matches the routing pattern correctly
The task run_seqvarsinhousedbbuildbackgroundjob follows the routing pattern *.run_*inhousedbbuild* and will be correctly routed to the maintenance queue. The implementation shows both the task definition and its periodic schedule setup, confirming proper integration.
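As a sanity check outside of Celery, the glob-style matching can be reproduced with fnmatch; Celery applies the same shell-glob semantics to string patterns in task_routes. The resolve_queue helper below is a sketch for illustration, not a Celery API:

```python
from fnmatch import fnmatch

# The routing rule from backend/config/celery.py, as described in the review.
task_routes = {
    "*.run_*inhousedbbuild*": {"queue": "maintenance"},
}

def resolve_queue(task_name: str, default: str = "celery") -> str:
    """Return the queue a task name would be routed to under glob routing."""
    for pattern, options in task_routes.items():
        if fnmatch(task_name, pattern):
            return options["queue"]
    return default

# The new task's fully qualified name matches the pattern.
print(resolve_queue("seqvars.tasks.run_seqvarsinhousedbbuildbackgroundjob"))  # maintenance
```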
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify task naming matches the routing pattern
# Test: Search for task definitions matching the routing pattern
# Expect: Task names should match "*.run_*inhousedbbuild*" pattern
rg -l "run_.*inhousedbbuild.*" --type py
Length of output: 92
Script:
#!/bin/bash
# Let's examine the task definition to verify it matches the routing pattern
rg "run_.*inhousedbbuild" -A 5 -B 2 backend/seqvars/tasks.py
Length of output: 604
utils/docker/Dockerfile (2)
148-148: Verify RocksDB configuration for the worker.
The Dockerfile includes librocksdb8.9 as a runtime dependency, which is required for the seqvars inhouse rocksdb integration. However, let's verify if any additional configurations are needed.
Consider documenting the RocksDB configuration requirements, such as:
- Required environment variables
- Volume mount paths for persistence
- Performance tuning parameters
✅ Verification successful
RocksDB configuration appears to be correctly set up
The RocksDB configuration for the worker appears to be properly configured:
- The Dockerfile correctly includes librocksdb8.9 as a runtime dependency
- The worker is configured to use RocksDB for storage with proper paths (--path-out-rocksdb)
- No additional environment variables or special configurations are required as the worker handles the RocksDB setup internally
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify RocksDB-related configurations
# Test 1: Check for any RocksDB-related environment variables or configurations
rg -i "rocksdb|seqvars.*db" -g "!*.{pyc,min.js,min.css}"
# Test 2: Check for any RocksDB-related volume mounts or paths
fd -g "docker-compose*.y*ml" -X rg -i "rocksdb|seqvars.*db"
Length of output: 24561
Script:
#!/bin/bash
# Let's check for RocksDB configuration in docker-compose files and worker configuration
fd -g 'docker-compose*.y*ml' -X cat {} \;
# Also check for any RocksDB configuration in worker-related files
rg -i "rocksdb" -g "!*.{pyc,min.js,min.css}" -A 3 -B 3
Length of output: 1055
148-148: Verify compatibility with the updated worker version.
The update of varfish-server-worker from 0.16.0 to 0.16.1 aligns with the PR's objective of integrating seqvars inhouse rocksdb. However, let's verify the compatibility and requirements.
Consider documenting any version-specific requirements or configurations needed for the seqvars inhouse rocksdb integration in the repository's documentation.
✅ Verification successful
Based on the gathered information, I can now provide a final response:
Worker version update is compatible with RocksDB integration
The update from 0.16.0 to 0.16.1 is compatible with the seqvars inhouse RocksDB integration. The codebase shows:
- RocksDB runtime dependency is already included in the Dockerfile
- Environment variable VARFISH_WORKER_DB_PATH is properly configured for worker database paths
- Worker implementation includes RocksDB path handling in the executors
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the worker version compatibility and requirements
# Test 1: Check if the worker version exists in the registry
gh api \
-H "Accept: application/vnd.oci.image.index.v1+json" \
/orgs/varfish-org/packages/container/varfish-server-worker/versions \
| jq -r '.[] | select(.metadata.container.tags | contains(["0.16.1"]))'
# Test 2: Check for any version-specific documentation or migration steps
fd -g "CHANGELOG*" -g "README*" -g "UPGRADE*" -X rg -i "0\.16\.(0|1)"
Length of output: 24561
Script:
#!/bin/bash
# Let's verify the worker version and RocksDB requirements
# Check docker-compose files for worker configuration and environment variables
fd -g 'docker-compose*.y*ml' -X cat {} \;
# Check for any RocksDB or seqvars database related environment variables or paths
rg -i '(SEQVARS.*DB.*PATH|ROCKSDB.*PATH|worker.*db.*path)' -g '!*.{pyc,min.js,min.css}'
# Check for any worker configuration files
fd -g '*worker*.{yml,yaml,conf,cfg,ini}' -X cat {} \;
Length of output: 9049
backend/config/settings/base.py (1)
505-507: LGTM! Consider adding documentation for the paths.
The separation of read-only and writable paths is a good practice. Consider adding comments to document the specific purpose of each path for operators.
Add comments above each path setting:
# Path to database for the worker (base database with sub entries for mehari etc.).
WORKER_DB_PATH = env.str("VARFISH_WORKER_DB_PATH", "/data/varfish-static/data")
+# Path for writable worker database operations (e.g., storing in-house data).
WORKER_RW_PATH = env.str("VARFISH_WORKER_RW_PATH", "/data/varfish-dynamic/data")
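For operators, the fallback behavior of these two settings can be illustrated with a minimal stand-in for django-environ's env.str; env_str below is a sketch, not the real library wrapper:

```python
import os

def env_str(name: str, default: str) -> str:
    """Mimic django-environ's env.str(): environment value if set, else default."""
    return os.environ.get(name, default)

# Read-only base database (mehari etc.) vs. writable path for in-house data.
WORKER_DB_PATH = env_str("VARFISH_WORKER_DB_PATH", "/data/varfish-static/data")
WORKER_RW_PATH = env_str("VARFISH_WORKER_RW_PATH", "/data/varfish-dynamic/data")
```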
backend/seqvars/models/base.py (1)
Line range hint 2435-2500: LGTM: Well-structured background job implementation.
The implementation of SeqvarsQueryExecutionBackgroundJob and its manager follows Django best practices with proper transaction handling, clear field definitions, and appropriate relationships.
backend/seqvars/models/executors.py (2)
283-291: 🛠️ Refactor suggestion
Ensure symlink updates handle errors and edge cases
While updating the "active" symlink, exceptions might occur (e.g., if the symlink already exists or due to permission issues). Adding error handling will make the code more robust and prevent unexpected crashes. Additionally, since this approach is Unix-specific, consider documenting this limitation or providing an alternative for non-Unix systems if cross-platform support is required.
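One way to address this, sketched here under the assumption of a POSIX filesystem (this is not the project's actual code), is to create the new link under a temporary name and atomically rename it over the old one, which sidesteps the FileExistsError raised by os.symlink when the link already exists:

```python
import os

def update_active_symlink(base_dir: str, target: str, name: str = "active") -> None:
    """Atomically point ``base_dir/name`` at ``target``.

    os.replace() maps to rename(2), which atomically replaces the destination
    on POSIX systems, so readers always see either the old or the new link.
    """
    link_path = os.path.join(base_dir, name)
    tmp_path = link_path + ".tmp"
    # Remove a stale temp link left over from a previously failed attempt.
    if os.path.lexists(tmp_path):
        os.unlink(tmp_path)
    os.symlink(target, tmp_path)
    os.replace(tmp_path, link_path)
```

As the review notes, this relies on Unix rename semantics; a non-Unix deployment would need a different strategy.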
48-55: Verify the security of subprocess calls
In the run_worker function, ensure that the cmd list and env dictionary do not contain untrusted input that could pose a security risk when passed to subprocess.check_call. Confirm that settings.WORKER_EXE_PATH and args are sourced from trusted inputs and are properly validated.
Run the following script to identify where args and settings.WORKER_EXE_PATH are defined to ensure they are securely set:
✅ Verification successful
Let me analyze the findings:
- WORKER_EXE_PATH is defined in backend/config/settings/base.py and sourced from an environment variable with a default value:
  WORKER_EXE_PATH = env.str("VARFISH_WORKER_EXE_PATH", "varfish-server-worker")
- The args parameter in run_worker is used in several places in the codebase, all within controlled contexts:
  - In backend/seqvars/models/executors.py for sequence variant operations
  - In backend/cases_import/models/executors.py for case import operations
The args are constructed with hardcoded string literals like "seqvars", "query", "ingest", etc. and passed through command-line flags.
Let me verify one more aspect about input validation:
Let me try a different pattern to check the implementation:
Let me check how the args are constructed before being passed to run_worker:
Based on the comprehensive analysis, I can now provide the final response:
The subprocess call is secure in its current implementation
The run_worker function is used in a controlled manner where:
- WORKER_EXE_PATH is sourced from environment variables with a safe default value
- The args parameter is only constructed with:
  - Hardcoded string literals (e.g., "seqvars", "query", "aggregate")
  - Command-line flags (e.g., "--case-uuid", "--genomebuild")
  - Validated data from trusted sources:
    - UUIDs via str(self.case.sodar_uuid)
    - Timestamps via timezone.now().strftime()
    - Settings paths via settings.WORKER_DB_PATH
    - File paths constructed within the application
The environment variables are also properly controlled, being sourced from application settings and system environment.
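The key property that makes list-form calls safe can be demonstrated with a small self-contained sketch (not the project's run_worker): with a list argv and no shell, metacharacters in argument values are passed through literally instead of being interpreted:

```python
import subprocess
import sys

def echo_args(args):
    """Invoke a child process with a list argv; no shell ever parses the values."""
    # The child simply prints its received argv, joined with '|' for visibility.
    cmd = [sys.executable, "-c",
           "import sys; print('|'.join(sys.argv[1:]))", *args]
    return subprocess.check_output(cmd, text=True).strip()

# A hostile-looking value stays a single literal argument.
print(echo_args(["--case-uuid", "123; rm -rf /"]))  # --case-uuid|123; rm -rf /
```

The injection risk would only reappear with shell=True or string interpolation into a shell command.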
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Locate definitions and usages of 'args' and 'settings.WORKER_EXE_PATH' in the codebase.
# Find where 'WORKER_EXE_PATH' is defined or assigned.
rg 'WORKER_EXE_PATH\s*='
# Find all usages and assignments of 'args'.
rg -A 3 -B 3 '\bargs\s*='
Length of output: 37922
Script:
# Check if there's any input validation or sanitization of args before passing to subprocess
ast-grep --pattern 'def run_worker($$$) {
$$$
subprocess.check_call($$$)
$$$
}'
Length of output: 87
Script:
# Find the run_worker implementation and surrounding context
rg -A 10 -B 10 "def run_worker"
Length of output: 3236
Script:
# Find where args are constructed before being passed to run_worker
rg -A 10 "args = \[" backend/seqvars/models/executors.py backend/cases_import/models/executors.py
Length of output: 3941
@app.task(bind=True)
def run_seqvarsinhousedbbuildbackgroundjob(_self, *, seqvarsinhousedbbuildbackgroundjob_pk: int):
    """Task to execute a ``cases_import.models.SeqvarsInhouseDbBuildBackgroundJob``."""
Fix incorrect module reference in docstring.
The docstring references cases_import.models but the model appears to be in seqvars.models.
- """Task to execute a ``cases_import.models.SeqvarsInhouseDbBuildBackgroundJob``."""
+ """Task to execute a ``seqvars.models.SeqvarsInhouseDbBuildBackgroundJob``."""
@app.on_after_finalize.connect
def setup_periodic_tasks(sender, **_kwargs):
    """Register periodic tasks"""
    # Regularly rebuild the in-house database.
    sender.add_periodic_task(
        schedule=crontab(day_of_week=1, minute=11), sig=run_seqvarsinhousedbbuildbackgroundjob.s()
    )
💡 Codebase verification
Fix task signature and improve documentation.
The review comment is correct. Based on the code analysis:
- The task signature is incorrect as run_seqvarsinhousedbbuildbackgroundjob requires a seqvarsinhousedbbuildbackgroundjob_pk parameter, which is confirmed by its implementation.
- The model has a custom manager with create_full() method that properly creates both the background job and the model instance.
- The command buildseqvarsinhousedb.py shows the correct usage pattern: create the job first using create_full() and then pass its PK to the task.
- Create a job instance using create_full() before scheduling
- Pass the job PK to the task
- Document the schedule details in the docstring
🔗 Analysis chain
Fix task signature and improve documentation.
There are several issues with the periodic task setup:
- The task signature is incorrect as run_seqvarsinhousedbbuildbackgroundjob requires a seqvarsinhousedbbuildbackgroundjob_pk parameter.
- The schedule timing (11 minutes past the hour) seems arbitrary and should be documented.
Let's verify if this task is being used elsewhere:
Consider these improvements:
- Create a job instance before scheduling the task
- Add schedule details to the docstring
- Consider moving the schedule configuration to settings
Example implementation:
@app.on_after_finalize.connect
def setup_periodic_tasks(sender, **_kwargs):
- """Register periodic tasks"""
+ """Register periodic tasks.
+
+ Schedules:
+ - Rebuild in-house database: Every Monday at 11 minutes past the hour
+ """
# Regularly rebuild the in-house database.
+ job = executors.SeqvarsInhouseDbBuildBackgroundJob.objects.create()
sender.add_periodic_task(
- schedule=crontab(day_of_week=1, minute=11), sig=run_seqvarsinhousedbbuildbackgroundjob.s()
+ schedule=crontab(day_of_week=1, minute=11),
+ sig=run_seqvarsinhousedbbuildbackgroundjob.s(
+ seqvarsinhousedbbuildbackgroundjob_pk=job.pk
+ )
)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any other periodic task configurations or similar patterns
rg -l "add_periodic_task.*seqvars"
Length of output: 36
Script:
#!/bin/bash
# Let's verify the task signature and implementation
ast-grep --pattern 'def run_seqvarsinhousedbbuildbackgroundjob'
# Also check for any related task configurations
rg "run_seqvarsinhousedbbuildbackgroundjob" -A 3 -B 3
# Check the executor implementation
ast-grep --pattern 'class SeqvarsInhouseDbBuildBackgroundJob'
Length of output: 1078
Script:
#!/bin/bash
# Let's check the executor implementation in a different way
rg "class SeqvarsInhouseDbBuildBackgroundJob" -A 5 -B 1
# Also check for any imports or usage of this class
rg "SeqvarsInhouseDbBuildBackgroundJob" -A 2 -B 2
# Check if there are any other similar background job implementations
rg "BackgroundJob.*objects\.create" -A 2 -B 2
Length of output: 22341
def handle(self, *_args, **options):
    """Entrypoint from command line"""

    try:
        self._handle()
    except Exception as e:
        self.stderr.write(
            self.style.ERROR(
                "A problem occured (see below).\n\n--- BEGIN TRACEBACK ---\n"
                f"{traceback.format_exc()}--- END TRACEBACK ---\n"
            )
        )
        raise CommandError("Could not re-build the seqvars inhouse db.") from e
|
🛠️ Refactor suggestion
Fix typo and consider using logging instead of stderr.
- There's a typo in the error message: "occured" should be "occurred"
- Consider using Django's logging framework instead of direct stderr writes for better consistency and configurability.
- "A problem occured (see below).\n\n--- BEGIN TRACEBACK ---\n"
+ "A problem occurred (see below).\n\n--- BEGIN TRACEBACK ---\n"
Consider refactoring to use logging:
import logging
logger = logging.getLogger(__name__)
# In the exception handler:
logger.error("Could not re-build the seqvars inhouse db:\n%s", traceback.format_exc())
def _handle(self):
    # Create a new job to execute.
    job = SeqvarsInhouseDbBuildBackgroundJob.objects.create_full()
    self.stderr.write(self.style.SUCCESS("Executing seqvars inhouse db build job..."))
    run_seqvarsbuildinhousedbbackgroundjob(pk=job.pk)
    self.stderr.write(self.style.SUCCESS("... done executing job"))
🛠️ Refactor suggestion
Consider enhancing job execution robustness and user feedback.
The current implementation has several areas for improvement:
- The job execution is synchronous, which could block the command for long-running operations
- There's no progress feedback during execution
- Missing validation before starting the job
- No cleanup handling in case of failure
Consider implementing these improvements:
def _handle(self):
+ # Check if any job is already running
+ if SeqvarsInhouseDbBuildBackgroundJob.objects.filter(status="running").exists():
+ raise CommandError("Another build job is already running")
+
# Create a new job to execute.
job = SeqvarsInhouseDbBuildBackgroundJob.objects.create_full()
- self.stderr.write(self.style.SUCCESS("Executing seqvars inhouse db build job..."))
- run_seqvarsbuildinhousedbbackgroundjob(pk=job.pk)
- self.stderr.write(self.style.SUCCESS("... done executing job"))
+ try:
+ self.stderr.write(self.style.SUCCESS(f"Started job with ID: {job.pk}"))
+ self.stderr.write("You can monitor the progress in the admin interface or logs")
+
+ # Option 1: Run asynchronously
+ run_seqvarsbuildinhousedbbackgroundjob.delay(pk=job.pk)
+
+ # Option 2: Run synchronously with progress
+ # run_seqvarsbuildinhousedbbackgroundjob(pk=job.pk)
+ # while not job.refresh_from_db():
+ # self.stderr.write(f"Progress: {job.progress}%")
+ # time.sleep(5)
+ except Exception as e:
+ job.status = "failed"
+ job.error_message = str(e)
+ job.save()
+ raise
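The synchronous-polling idea from "Option 2" above can be sketched independently of Django. Note that `fetch_status`, the terminal state names, and the timeout are illustrative assumptions, not taken from the actual `BackgroundJob` model:

```python
import time

def wait_for_job(fetch_status, poll_interval=0.0, timeout=5.0):
    """Poll ``fetch_status`` until it reports a terminal state.

    ``fetch_status`` is any zero-argument callable returning the current
    status string; "done" and "failed" are treated as terminal here.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in ("done", "failed"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError("job did not finish in time")

# Simulate a job that becomes "done" on the third poll.
states = iter(["running", "running", "done"])
print(wait_for_job(lambda: next(states)))  # → done
```

Keeping the loop bounded by a deadline avoids the management command hanging forever if the job never reaches a terminal state.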
class SeqvarsInhouseDbBuildBackgroundJobManager(models.Manager):
    """Custom manager class that allows to create a ``SeqvarsInhouseDbBuildBackgroundJob``
    together with the backing ``BackgroundJob``.
    """

    @transaction.atomic
    def create_full(self):
        bg_job = BackgroundJob.objects.create(
            name="Building seqvars inhouse DB",
            project=None,
            job_type=SeqvarsInhouseDbBuildBackgroundJob.spec_name,
            user=User.objects.get(username=settings.PROJECTROLES_ADMIN_OWNER),
        )
        instance = super().create(
            bg_job=bg_job,
        )
        return instance


class SeqvarsInhouseDbBuildBackgroundJob(JobModelMessageMixin, models.Model):
    """Background job for re-building the seqvars inhouse DB."""

    # We use a custom manager that provides creation together with the ``BackgroundJob``.
    objects: SeqvarsInhouseDbBuildBackgroundJobManager = SeqvarsInhouseDbBuildBackgroundJobManager()

    #: Task description for logging.
    task_desc = "Seqvars Query Execution"

    #: String identifying model in BackgroundJob.
    spec_name = "seqvars.runinhousedbbuild"

    #: The SODAR UUID.
    sodar_uuid = models.UUIDField(
        default=uuid_object.uuid4,
        unique=True,
    )

    #: The background job for state management etc.
    bg_job = models.ForeignKey(
        BackgroundJob,
        null=False,
        related_name="seqvarsinhousedbbuildbackgroundjob",
        help_text="Background job for state etc.",
        on_delete=models.CASCADE,
    )

    def get_human_readable_type(self):
        return self.task_desc

    class Meta:
        ordering = ["-pk"]
Multiple issues in SeqvarsInhouseDbBuildBackgroundJob implementation.
- The task description is incorrect (copied from query execution job)
- The create_full method implementation could be improved
- Missing documentation
Apply these fixes:
class SeqvarsInhouseDbBuildBackgroundJobManager(models.Manager):
"""Custom manager class that allows to create a ``SeqvarsInhouseDbBuildBackgroundJob``
together with the backing ``BackgroundJob``.
"""
@transaction.atomic
- def create_full(self):
+ def create_full(self, *, user: typing.Optional[User] = None) -> "SeqvarsInhouseDbBuildBackgroundJob":
+ """Create a new background job for building the in-house database.
+
+ :param user: The user to associate with the job. If None, uses the admin user.
+ :return: The created background job instance.
+ """
+ if user is None:
+ user = User.objects.get(username=settings.PROJECTROLES_ADMIN_OWNER)
bg_job = BackgroundJob.objects.create(
name="Building seqvars inhouse DB",
project=None,
job_type=SeqvarsInhouseDbBuildBackgroundJob.spec_name,
- user=User.objects.get(username=settings.PROJECTROLES_ADMIN_OWNER),
+ user=user,
)
instance = super().create(
bg_job=bg_job,
)
return instance
class SeqvarsInhouseDbBuildBackgroundJob(JobModelMessageMixin, models.Model):
"""Background job for re-building the seqvars inhouse DB."""
objects: SeqvarsInhouseDbBuildBackgroundJobManager = SeqvarsInhouseDbBuildBackgroundJobManager()
#: Task description for logging.
- task_desc = "Seqvars Query Execution"
+ task_desc = "Build Seqvars In-house Database"
The changes:
- Fix the task description to accurately reflect the job's purpose
- Make the create_full method more flexible by accepting an optional user parameter
- Add proper documentation to the create_full method
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
class SeqvarsInhouseDbBuildBackgroundJobManager(models.Manager):
    """Custom manager class that allows to create a ``SeqvarsInhouseDbBuildBackgroundJob``
    together with the backing ``BackgroundJob``.
    """

    @transaction.atomic
    def create_full(self, *, user: typing.Optional[User] = None) -> "SeqvarsInhouseDbBuildBackgroundJob":
        """Create a new background job for building the in-house database.

        :param user: The user to associate with the job. If None, uses the admin user.
        :return: The created background job instance.
        """
        if user is None:
            user = User.objects.get(username=settings.PROJECTROLES_ADMIN_OWNER)
        bg_job = BackgroundJob.objects.create(
            name="Building seqvars inhouse DB",
            project=None,
            job_type=SeqvarsInhouseDbBuildBackgroundJob.spec_name,
            user=user,
        )
        instance = super().create(
            bg_job=bg_job,
        )
        return instance


class SeqvarsInhouseDbBuildBackgroundJob(JobModelMessageMixin, models.Model):
    """Background job for re-building the seqvars inhouse DB."""

    # We use a custom manager that provides creation together with the ``BackgroundJob``.
    objects: SeqvarsInhouseDbBuildBackgroundJobManager = SeqvarsInhouseDbBuildBackgroundJobManager()

    #: Task description for logging.
    task_desc = "Build Seqvars In-house Database"

    #: String identifying model in BackgroundJob.
    spec_name = "seqvars.runinhousedbbuild"

    #: The SODAR UUID.
    sodar_uuid = models.UUIDField(
        default=uuid_object.uuid4,
        unique=True,
    )

    #: The background job for state management etc.
    bg_job = models.ForeignKey(
        BackgroundJob,
        null=False,
        related_name="seqvarsinhousedbbuildbackgroundjob",
        help_text="Background job for state etc.",
        on_delete=models.CASCADE,
    )

    def get_human_readable_type(self):
        return self.task_desc

    class Meta:
        ordering = ["-pk"]
print(" ".join(args))
run_worker(
Avoid printing sensitive command-line arguments
Printing the command-line arguments with `print(" ".join(args))` may inadvertently expose sensitive information in logs or console output. If the arguments contain confidential data, consider removing this print statement or sanitizing the output to prevent potential information leakage.
Apply this diff to remove the print statement:
- print(" ".join(args))
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
run_worker(
run_worker(
    args=args,
    env=env,
)
except Exception:
    pass
print("Error while executing worker / importing results", file=sys.stderr)
Exceptions in `execute_query` are not properly handled

In `execute_query`, if an exception occurs during `run_worker`, the exception is caught, and an error message is printed, but the exception is not re-raised or handled further. This may lead to the program continuing in an invalid state. Consider re-raising the exception to ensure that the calling method is aware of the failure and can handle it appropriately.
Apply this diff to re-raise the exception after logging:
except Exception as e:
print("Error while executing worker / importing results", file=sys.stderr)
+ raise e
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
run_worker(
    args=args,
    env=env,
)
except Exception as e:
    print("Error while executing worker / importing results", file=sys.stderr)
    raise e
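As a side note on the suggested re-raise: a bare `raise` (rather than `raise e`) is the more idiomatic form and keeps the original traceback intact. A minimal sketch, with a hypothetical `run_worker_stub` standing in for the real worker call:

```python
import sys

def run_worker_stub():
    # Hypothetical stand-in for the real run_worker call.
    raise RuntimeError("worker exited with non-zero status")

def execute():
    try:
        run_worker_stub()
    except Exception:
        print("Error while executing worker / importing results", file=sys.stderr)
        raise  # bare raise re-raises the active exception with its traceback

try:
    execute()
except RuntimeError as exc:
    print(f"caller saw: {exc}")  # → caller saw: worker exited with non-zero status
```

Either way, the important part is that the failure propagates to the caller instead of being swallowed.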
**aws_config_env_internal(),
}
Avoid merging all environment variables into the subprocess
When constructing the `env` dictionary, merging the entire `os.environ` can introduce unintended environment variables into the subprocess, which might affect its behavior or security. It is safer to specify only the necessary environment variables explicitly.
Apply this diff to specify only the required environment variables:
- env = {
- **dict(os.environ.items()),
- "LC_ALL": "C",
- **aws_config_env_internal(),
- }
+ env = {
+ "LC_ALL": "C",
+ **aws_config_env_internal(),
+ }
Committable suggestion was skipped due to low confidence.
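Going one step further than the diff above, the subprocess environment could be built from an explicit allowlist. A minimal sketch; the variable names in `allowlist` and the `AWS_ENDPOINT_URL` entry are assumptions for illustration, not values from the PR:

```python
import os

def build_worker_env(allowlist=("PATH", "HOME", "TMPDIR"), extra=None):
    """Build a minimal subprocess environment from an allowlist.

    Only the named variables are copied over from ``os.environ``;
    everything else is dropped, then ``extra`` entries are layered on top.
    """
    env = {key: os.environ[key] for key in allowlist if key in os.environ}
    env["LC_ALL"] = "C"
    env.update(extra or {})
    return env

env = build_worker_env(extra={"AWS_ENDPOINT_URL": "http://minio:9000"})
print(env["LC_ALL"])  # → C
```

This keeps variables like proxy or preload settings from leaking into the worker unless deliberately listed.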
seqvars_file = PedigreeInternalFile.objects.filter(
    case=case,
    designation="variant_calls/seqvars/ingested-vcf",
)[0]
if seqvars_file.genomebuild == genome_release:
    paths.append(f"{bucket}/{seqvars_file.path}")
Handle potential absence of `PedigreeInternalFile` objects

When accessing `PedigreeInternalFile.objects.filter(...)[0]`, there's a risk of an `IndexError` if the queryset is empty. This can happen if no files match the filter criteria. To prevent the application from crashing, use `.first()` and check if the result is `None` before proceeding.
- seqvars_file = PedigreeInternalFile.objects.filter(
- case=case,
- designation="variant_calls/seqvars/ingested-vcf",
- )[0]
+ seqvars_file = PedigreeInternalFile.objects.filter(
+ case=case,
+ designation="variant_calls/seqvars/ingested-vcf",
+ ).first()
+ if not seqvars_file:
+ continue
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
seqvars_file = PedigreeInternalFile.objects.filter(
    case=case,
    designation="variant_calls/seqvars/ingested-vcf",
).first()
if not seqvars_file:
    continue
if seqvars_file.genomebuild == genome_release:
    paths.append(f"{bucket}/{seqvars_file.path}")
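The difference between `[0]` and `.first()` can be demonstrated without a database. `FakeQuerySet` below is a hypothetical stand-in for illustration, not Django's class:

```python
class FakeQuerySet:
    """Tiny stand-in mimicking the two QuerySet access patterns."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]  # raises IndexError on an empty set

    def first(self):
        return self._items[0] if self._items else None

empty = FakeQuerySet([])
print(empty.first())  # → None
try:
    empty[0]
except IndexError:
    print("qs[0] raised IndexError")
```

With `.first()`, the empty case becomes an ordinary `None` check and the loop can simply `continue` to the next case.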
Summary by CodeRabbit
Release Notes
New Features
Configuration Updates
Bug Fixes
Chores
Updated the `varfish-server-worker` image version in the Dockerfile.