
feat: integrate seqvars inhouse rocksdb in worker call (#2069) #2070

Merged

Conversation

holtgrewe
Collaborator

@holtgrewe holtgrewe commented Oct 30, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a management command for manually rebuilding the seqvars in-house database.
    • Added a new background job for in-house database builds, scheduled to run periodically.
    • Enhanced task routing for better task distribution based on naming conventions.
  • Configuration Updates

    • New environment variable for worker database paths added for improved configuration flexibility.
  • Bug Fixes

    • Improved error handling and logging for background job execution.
  • Chores

    • Updated Dockerfile to use the latest version of the varfish-server-worker image.

@holtgrewe holtgrewe linked an issue Oct 30, 2024 that may be closed by this pull request
Contributor

coderabbitai bot commented Oct 30, 2024

Warning

Rate limit exceeded

@holtgrewe has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 1 minute and 29 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


📥 Commits

Files that changed from the base of the PR and between 24fadb0 and e194e50.

Walkthrough

The changes in this pull request include modifications to the Celery configuration for task routing, the addition of a new environment variable for worker database paths, the introduction of a management command for rebuilding the seqvars in-house database, and the creation of a new Django migration for a background job model. Additionally, new classes and methods related to background job execution were added, along with a new Celery task and periodic task registration. The Dockerfile was updated to reflect a new version of a dependency.

Changes

File Path Change Summary
backend/config/celery.py Added routing rule for tasks matching "*.run_*inhousedbbuild*" to the "maintenance" queue.
backend/config/settings/base.py Introduced new environment variable WORKER_RW_PATH for writable worker database paths.
backend/seqvars/management/commands/buildseqvarsinhousedb.py Added new management command class Command for rebuilding the seqvars in-house database with structured error handling.
backend/seqvars/migrations/0015_seqvarsinhousedbbuildbackgroundjob.py Created new migration for model SeqvarsInhouseDbBuildBackgroundJob with fields id, sodar_uuid, and bg_job.
backend/seqvars/models/base.py Added manager class SeqvarsInhouseDbBuildBackgroundJobManager and model class SeqvarsInhouseDbBuildBackgroundJob.
backend/seqvars/models/executors.py Introduced new executor classes and methods for managing background job execution, including aws_config_env_internal().
backend/seqvars/plugins.py Added SeqvarsInhouseDbBuildBackgroundJob to BackgroundJobsPlugin and updated job_specs dictionary.
backend/seqvars/tasks.py Added new Celery task run_seqvarsinhousedbbuildbackgroundjob and periodic task registration for execution.
utils/docker/Dockerfile Updated varfish-server-worker image from version 0.16.0 to 0.16.1.
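The routing key added in `backend/config/celery.py` is a glob pattern; Celery resolves such glob route keys with `fnmatch`-style matching. The fully qualified task name below is an assumption derived from the module path (`backend/seqvars/tasks.py`), but it lets us sanity-check that the new task would land on the "maintenance" queue:

```python
from fnmatch import fnmatch

# Route key from backend/config/celery.py (see the change summary above).
ROUTE_PATTERN = "*.run_*inhousedbbuild*"

# Assumed fully qualified task name; Celery derives it from the module path
# by default, so seqvars/tasks.py yields the "seqvars.tasks." prefix.
TASK_NAME = "seqvars.tasks.run_seqvarsinhousedbbuildbackgroundjob"

# Glob route keys use fnmatch-style semantics, so a True result means the
# task is directed to the "maintenance" queue by this rule.
print(fnmatch(TASK_NAME, ROUTE_PATTERN))
```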

Possibly related issues

🐇 In the garden, I hop and play,
New tasks and paths brighten the day!
With jobs that build and routes anew,
Our worker's strength will surely grow too!
So let’s cheer for changes, big and small,
For in this world, we can do it all! 🌼✨




github-actions bot commented Oct 30, 2024

deps-report 🔍

Commit scanned: e194e50

Vulnerable dependencies

4 dependencies have vulnerabilities 😱
Dependency Advisory Versions impacted
djangorestframework (transitive) Affected versions of the package djangorestframework are vulnerable to Cross-site Scripting (XSS) via the break_long_headers template filter due to improper input sanitization before splitting and joining with <br> tags. <3.15.2
jinja2 (transitive) In Jinja2, the from_string function is prone to Server Side Template Injection (SSTI) where it takes the source parameter as a template object, renders it, and then returns it. The attacker can exploit it with INJECTION COMMANDS in a URI. NOTE: The maintainer and multiple third parties believe that this vulnerability isn't valid because users shouldn't use untrusted templates without sandboxing. >=0
setuptools (transitive) Affected versions of Setuptools allow for remote code execution via its download functions. These functions, which are used to download packages from URLs provided by users or retrieved from package index servers, are susceptible to code injection. If these functions are exposed to user-controlled inputs, such as package URLs, they can execute arbitrary commands on the system. <70.0.0
sqlalchemy Sqlalchemy 2.0.0b1 avoids leaking cleartext passwords to the open for careless uses of str(engine.URL()) in logs and prints. sqlalchemy/sqlalchemy#8563 <2.0.0b1

Outdated dependencies

43 outdated dependencies found (including 17 outdated major versions)😢
Dependency Installed version Latest version
alabaster (transitive) 0.7.16 1.0.0
argon2-cffi (transitive) 21.3.0 23.1.0
billiard (transitive) 3.6.4.0 4.2.1
crispy-bootstrap4 (transitive) 2022.1 2024.10
django 3.2.25 5.1.2
django-model-utils (transitive) 4.3.1 5.0.0
django-rest-knox (transitive) 4.2.0 5.0.2
django-sodar-core 0.13.4 1.0.2
mistune (transitive) 2.0.5 3.0.2
packaging (transitive) 23.2 24.1
protobuf 3.20.3 5.28.3
setuptools (transitive) 67.6.1 75.3.0
sphinx (transitive) 6.2.1 8.1.3
sphinx-rtd-theme (transitive) 1.2.2 3.0.1
sqlalchemy 1.4.54 2.0.36
unidecode (transitive) 0.4.21 1.3.8
xmlschema (transitive) 2.5.1 3.4.2
Dependency Installed version Latest version
botocore (transitive) 1.35.36 1.35.51
celery (transitive) 5.2.7 5.4.0
django-autocomplete-light (transitive) 3.9.4 3.11.0
django-crispy-forms (transitive) 2.0 2.3
django-db-file-storage (transitive) 0.5.5 0.5.6.1
django-debug-toolbar 4.3.0 4.4.6
django-environ (transitive) 0.10.0 0.11.2
django-iconify (transitive) 0.1.1 0.4
django-plugins-bihealth 0.4.0 0.5.2
django-postgres-copy 2.3.7 2.7.6
djangorestframework (transitive) 3.14.0 3.15.2
docutils (transitive) 0.18.1 0.21.2
drf-keyed-list-bihealth 0.1.1 0.2.1
drf-writable-nested 0.7.0 0.7.1
markdown (transitive) 3.4.1 3.7
mypy-protobuf (dev) 3.3.0 3.6.0
prettytable 3.11.0 3.12.0
pydantic-core (transitive) 2.23.4 2.25.1
requests-http-signature 0.2.0 0.7.1
rules (transitive) 3.3 3.5
selenium (dev) 4.25.0 4.26.0
tqdm 4.66.5 4.66.6
types-protobuf (dev) 5.28.0.20240924 5.28.3.20241030
versioneer (transitive) 0.28 0.29
wheel (transitive) 0.40.0 0.44.0
yarl (transitive) 1.16.0 1.17.0



codecov bot commented Oct 30, 2024

Codecov Report

Attention: Patch coverage is 44.21053% with 53 lines in your changes missing coverage. Please review.

Project coverage is 91%. Comparing base (65cbca2) to head (e194e50).
Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
backend/seqvars/models/executors.py 24% 47 Missing ⚠️
backend/seqvars/models/base.py 76% 4 Missing ⚠️
backend/seqvars/tasks.py 71% 2 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##            main   #2070   +/-   ##
=====================================
- Coverage     91%     91%   -1%     
=====================================
  Files        677     678    +1     
  Lines      38445   38532   +87     
=====================================
+ Hits       35098   35135   +37     
- Misses      3347    3397   +50     
Files with missing lines Coverage Δ
backend/config/celery.py 77% <ø> (ø)
backend/config/settings/base.py 75% <100%> (+<1%) ⬆️
...rations/0015_seqvarsinhousedbbuildbackgroundjob.py 100% <100%> (ø)
backend/seqvars/plugins.py 80% <100%> (ø)
backend/seqvars/tasks.py 83% <71%> (-17%) ⬇️
backend/seqvars/models/base.py 95% <76%> (-1%) ⬇️
backend/seqvars/models/executors.py 41% <24%> (-14%) ⬇️

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

🧹 Outside diff range and nitpick comments (3)
backend/seqvars/tasks.py (1)

Line range hint 8-12: Fix incorrect module reference in docstring.

The docstring references cases_import.models but the model appears to be in seqvars.models.

-    """Task to execute a ``cases_import.models.SeqvarsQueryExecutionBackgroundJob``."""
+    """Task to execute a ``seqvars.models.SeqvarsQueryExecutionBackgroundJob``."""
backend/seqvars/management/commands/buildseqvarsinhousedb.py (1)

11-13: Consider enhancing the help message with more details.

While the help message is present, it could be more informative by including:

  • What the seqvars inhouse database is
  • When/why someone would need to rebuild it
  • Any prerequisites or side effects of rebuilding
-    help = "(Re-) build the seqvars inhouse database."
+    help = """(Re-) build the seqvars inhouse database.
+
+    This command rebuilds the sequence variants in-house database which is used for [purpose].
+    You might need to run this when [scenarios].
+    
+    Note: This operation may take significant time and resources.
+    """
backend/seqvars/migrations/0015_seqvarsinhousedbbuildbackgroundjob.py (1)

38-42: Consider index implications of the ordering configuration.

The model is configured to order by -pk which is appropriate for background jobs as it shows the most recent jobs first. Since this is the primary key, the ordering will use the table's clustered index.

However, if you plan to frequently query this table by sodar_uuid, consider adding a database index to optimize such queries.

You can add an index in the model's Meta class like this:

class Meta:
    ordering = ["-pk"]
    indexes = [
        models.Index(fields=["sodar_uuid"]),
    ]
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 65cbca2 and 24fadb0.

📒 Files selected for processing (9)
  • backend/config/celery.py (1 hunks)
  • backend/config/settings/base.py (1 hunks)
  • backend/seqvars/management/commands/buildseqvarsinhousedb.py (1 hunks)
  • backend/seqvars/migrations/0015_seqvarsinhousedbbuildbackgroundjob.py (1 hunks)
  • backend/seqvars/models/base.py (4 hunks)
  • backend/seqvars/models/executors.py (5 hunks)
  • backend/seqvars/plugins.py (2 hunks)
  • backend/seqvars/tasks.py (2 hunks)
  • utils/docker/Dockerfile (1 hunks)
🔇 Additional comments (14)
backend/seqvars/plugins.py (2)

3-6: LGTM! Clean import organization.

The multi-line import statement is well-organized and follows Python's style guidelines.


17-17: LGTM! Proper registration of the new background job.

The new background job is correctly registered in the plugin's job specifications, following the established pattern.

Let's verify the background job class implementation:

✅ Verification successful

Background job implementation verified and properly integrated

The verification confirms that the SeqvarsInhouseDbBuildBackgroundJob is correctly implemented and integrated:

  • Proper model definition with required fields and custom manager
  • Correct specification of spec_name matching the plugin registration
  • Complete implementation of executor with proper job handling
  • Migration file present for database schema updates
  • Task implementation for background job execution
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the SeqvarsInhouseDbBuildBackgroundJob class implementation
# and its integration with the background job system

# Check the background job class definition
ast-grep --pattern 'class SeqvarsInhouseDbBuildBackgroundJob {
  $$$
}'

# Check for any related test files
fd -g '*test*.py' | xargs rg 'SeqvarsInhouseDbBuildBackgroundJob'

Length of output: 139


Script:

#!/bin/bash
# Let's try a broader search pattern to locate the class definition and its usage

# Search for the class name in all files
rg -l "SeqvarsInhouseDbBuildBackgroundJob"

# Search for any class that extends BackgroundJob
ast-grep --pattern 'class $_ extends BackgroundJob {
  $$$
}'

# Alternative search without extends keyword
ast-grep --pattern 'class SeqvarsInhouseDbBuildBackgroundJob'

# Look for background job related files
fd -g '*background*.py'

Length of output: 771


Script:

#!/bin/bash
# Let's examine the class implementation and its usage

# Check the class implementation in base.py
rg -A 10 "class SeqvarsInhouseDbBuildBackgroundJob" backend/seqvars/models/base.py

# Check how it's used in tasks.py
rg -A 5 "SeqvarsInhouseDbBuildBackgroundJob" backend/seqvars/tasks.py

# Check the migration file to understand the model structure
rg -A 10 "SeqvarsInhouseDbBuildBackgroundJob" backend/seqvars/migrations/0015_seqvarsinhousedbbuildbackgroundjob.py

# Check the executor implementation
rg -A 5 "SeqvarsInhouseDbBuildBackgroundJob" backend/seqvars/models/executors.py

Length of output: 2783

backend/seqvars/tasks.py (1)

15-20: Implementation looks good!

The task follows good practices:

  • Uses bind=True for task instance access
  • Includes type hints for better code maintainability
  • Properly delegates to the executors module
backend/seqvars/management/commands/buildseqvarsinhousedb.py (2)

1-9: LGTM! Well-organized imports and clear module documentation.

The imports are properly organized with standard library imports separated from project imports, and the module's purpose is clearly documented.


7-8: Verify the implementation of referenced components.

Let's verify the implementation of the model and task to ensure they're properly defined.

✅ Verification successful

Components are properly implemented and registered as Celery tasks

The verification shows that both components are correctly implemented and registered:

  1. The model SeqvarsInhouseDbBuildBackgroundJob is properly defined in backend/seqvars/models/base.py with its manager and required fields.
  2. The executor function run_seqvarsbuildinhousedbbackgroundjob is implemented in backend/seqvars/models/executors.py with its corresponding InhouseDbBuildBackgroundJobExecutor class.
  3. The function is properly registered as a Celery task in backend/seqvars/tasks.py using the @app.task decorator and is also set up as a periodic task to run weekly (on Mondays).

The implementation follows the project's patterns for background jobs and task registration.
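A note on the schedule semantics mentioned above: a Celery `crontab(day_of_week="monday", minute=11)` with the default `hour="*"` wildcard fires at minute 11 of every hour on Mondays. A pure-Python stand-in for that predicate (the actual hour constraint in `backend/seqvars/tasks.py` is not shown in this review, so it is left as the wildcard here):

```python
from datetime import datetime

def is_due(now: datetime) -> bool:
    # Monday (weekday() == 0), minute 11 of any hour; mirrors
    # crontab(day_of_week="monday", minute=11) with the default hour="*".
    return now.weekday() == 0 and now.minute == 11

# 2024-10-28 was a Monday.
print(is_due(datetime(2024, 10, 28, 3, 11)))  # True
print(is_due(datetime(2024, 10, 29, 3, 11)))  # False: Tuesday
```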

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the implementation of model and task

# Check model implementation
echo "Checking model implementation..."
ast-grep --pattern 'class SeqvarsInhouseDbBuildBackgroundJob'

# Check task registration
echo "Checking task registration..."
rg -A 5 "run_seqvarsbuildinhousedbbackgroundjob" backend/seqvars/tasks.py

Length of output: 281


Script:

#!/bin/bash
# Let's try a broader search to locate these components

# Search for the model in all Python files
echo "Searching for SeqvarsInhouseDbBuildBackgroundJob model..."
rg -l "class SeqvarsInhouseDbBuildBackgroundJob" --type py

# Show the model implementation if found
echo "\nModel implementation:"
rg -A 10 "class SeqvarsInhouseDbBuildBackgroundJob" --type py

# Search for the executor function
echo "\nSearching for run_seqvarsbuildinhousedbbackgroundjob..."
rg -l "run_seqvarsbuildinhousedbbackgroundjob" --type py

# Show the executor implementation if found
echo "\nExecutor implementation:"
rg -A 10 "def run_seqvarsbuildinhousedbbackgroundjob" --type py

# Check if the function is registered as a task
echo "\nChecking task registration..."
rg "@task.*run_seqvarsbuildinhousedbbackgroundjob" --type py

Length of output: 2876


Script:

#!/bin/bash
# Let's verify the task registration and executor implementation

# Check if the function is imported and registered in tasks.py
echo "Checking tasks.py for registration..."
rg -A 5 "from seqvars.models.executors import run_seqvarsbuildinhousedbbackgroundjob" backend/seqvars/tasks.py

# Check the full executor implementation
echo "\nFull executor class implementation..."
rg -B 5 -A 20 "class InhouseDbBuildBackgroundJobExecutor" backend/seqvars/models/executors.py

# Check if there are any celery task decorators
echo "\nChecking for celery task decorators..."
rg "@shared_task|@app.task" backend/seqvars/models/executors.py

Length of output: 1679


Script:

#!/bin/bash
# Let's check tasks.py directly and look for celery task registration in the project

# Check contents of tasks.py
echo "Checking contents of tasks.py..."
rg -l "tasks.py" --type py

# Show contents of any tasks.py files found
echo "\nContents of tasks.py files:"
fd "tasks.py" --type f --exec cat {}

# Look for celery task registration patterns in the project
echo "\nLooking for celery task patterns..."
rg -l "from celery import shared_task|from celery import task|@shared_task|@task" --type py

Length of output: 12451

backend/seqvars/migrations/0015_seqvarsinhousedbbuildbackgroundjob.py (2)

21-36: LGTM! Model fields are well-defined.

The model fields follow Django best practices:

  • Primary key is auto-incrementing
  • UUID field ensures global uniqueness
  • ForeignKey relationship is properly configured with cascade deletion and descriptive help text

17-43: Verify background job cleanup strategy.

Since this model is tied to background jobs, ensure there's a cleanup strategy in place to prevent unbounded growth of the database table.

Let's check if there are any cleanup tasks or retention policies defined:

✅ Verification successful

Background job cleanup is properly implemented

The codebase has comprehensive cleanup mechanisms for background jobs:

  1. Automatic cleanup through periodic Celery tasks:

    • clear_inactive_variant_sets runs every hour at minute 11
    • clear_expired_exported_files runs daily at 1:11
    • run_seqvarsinhousedbbuildbackgroundjob runs weekly on Mondays at minute 11
  2. Manual cleanup commands:

    • clear_inactive_variant_sets management command
    • svs_bg_sv_set_cleanup management command with configurable timeout
  3. Cleanup implementation details:

    • Removes inactive variant sets older than 12 hours by default
    • Configurable cleanup timeout via VARFISH_SV_CLEANUP_BUILDING_SV_SETS setting
    • Handles cleanup of related database tables and annotations

The SeqvarsInhouseDbBuildBackgroundJob model is integrated into this existing cleanup infrastructure, ensuring proper database maintenance.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Look for background job cleanup configurations or tasks

# Check for cleanup tasks in Celery configuration
echo "Checking for cleanup tasks in Celery configuration..."
rg -l "cleanup|retention|purge" --type py backend/

# Check for existing cleanup implementations in bgjobs
echo "Checking bgjobs app for cleanup implementations..."
rg -l "cleanup|retention|purge" backend/bgjobs/

Length of output: 1675


Script:

#!/bin/bash
# Let's check the actual content of cleanup-related files and background job management

# Check maintenance command for variant sets cleanup
echo "=== Checking variant sets cleanup command ==="
cat backend/maintenance/management/commands/clear_inactive_variant_sets.py

# Check SV cleanup management command
echo -e "\n=== Checking SV cleanup command ==="
cat backend/svs/management/commands/svs_bg_sv_set_cleanup.py

# Look for background job lifecycle management
echo -e "\n=== Checking background job lifecycle management ==="
rg -A 10 "class BackgroundJob" backend/

Length of output: 8092


Script:

#!/bin/bash
# Let's check for settings related to cleanup and the tasks implementation

# Check settings for cleanup configurations
echo "=== Checking settings for cleanup configurations ==="
rg "CLEANUP|RETENTION|PURGE" backend/config/settings/

# Check the cleanup task implementations
echo -e "\n=== Checking cleanup task implementations ==="
rg -A 10 "cleanup_variant_sets|cleanup_bg_sv_set_task" backend/

# Check for any celery periodic task configurations
echo -e "\n=== Checking celery periodic task configurations ==="
rg -A 5 "celery\.schedules|PeriodicTask|crontab" backend/

Length of output: 13775

backend/config/celery.py (1)

35-35: LGTM! The routing rule follows established patterns.

The new routing rule appropriately directs in-house database build tasks to the maintenance queue, consistent with other maintenance operations.

Let's verify that the task naming follows the routing pattern:

✅ Verification successful

Task name matches the routing pattern correctly

The task run_seqvarsinhousedbbuildbackgroundjob follows the routing pattern *.run_*inhousedbbuild* and will be correctly routed to the maintenance queue. The implementation shows both the task definition and its periodic schedule setup, confirming proper integration.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify task naming matches the routing pattern

# Test: Search for task definitions matching the routing pattern
# Expect: Task names should match "*.run_*inhousedbbuild*" pattern
rg -l "run_.*inhousedbbuild.*" --type py

Length of output: 92


Script:

#!/bin/bash
# Let's examine the task definition to verify it matches the routing pattern
rg "run_.*inhousedbbuild" -A 5 -B 2 backend/seqvars/tasks.py

Length of output: 604

utils/docker/Dockerfile (2)

148-148: Verify RocksDB configuration for the worker.

The Dockerfile includes librocksdb8.9 as a runtime dependency, which is required for the seqvars inhouse rocksdb integration. However, let's verify if any additional configurations are needed.

Consider documenting the RocksDB configuration requirements, such as:

  • Required environment variables
  • Volume mount paths for persistence
  • Performance tuning parameters
✅ Verification successful

RocksDB configuration appears to be correctly set up

The RocksDB configuration for the worker appears to be properly configured:

  1. The Dockerfile correctly includes librocksdb8.9 as a runtime dependency
  2. The worker is configured to use RocksDB for storage with proper paths (--path-out-rocksdb)
  3. No additional environment variables or special configurations are required as the worker handles the RocksDB setup internally
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify RocksDB-related configurations

# Test 1: Check for any RocksDB-related environment variables or configurations
rg -i "rocksdb|seqvars.*db" -g "!*.{pyc,min.js,min.css}"

# Test 2: Check for any RocksDB-related volume mounts or paths
fd -g "docker-compose*.y*ml" -X rg -i "rocksdb|seqvars.*db"

Length of output: 24561


Script:

#!/bin/bash
# Let's check for RocksDB configuration in docker-compose files and worker configuration
fd -g 'docker-compose*.y*ml' -X cat {} \;

# Also check for any RocksDB configuration in worker-related files
rg -i "rocksdb" -g "!*.{pyc,min.js,min.css}" -A 3 -B 3

Length of output: 1055


148-148: Verify compatibility with the updated worker version.

The update of varfish-server-worker from 0.16.0 to 0.16.1 aligns with the PR's objective of integrating seqvars inhouse rocksdb. However, let's verify the compatibility and requirements.

Consider documenting any version-specific requirements or configurations needed for the seqvars inhouse rocksdb integration in the repository's documentation.

✅ Verification successful

Based on the gathered information, I can now provide a final response:

Worker version update is compatible with RocksDB integration

The update from 0.16.0 to 0.16.1 is compatible with the seqvars inhouse RocksDB integration. The codebase shows:

  • RocksDB runtime dependency is already included in the Dockerfile
  • Environment variable VARFISH_WORKER_DB_PATH is properly configured for worker database paths
  • Worker implementation includes RocksDB path handling in the executors
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the worker version compatibility and requirements

# Test 1: Check if the worker version exists in the registry
gh api \
  -H "Accept: application/vnd.oci.image.index.v1+json" \
  /orgs/varfish-org/packages/container/varfish-server-worker/versions \
  | jq -r '.[] | select(.metadata.container.tags | contains(["0.16.1"]))'

# Test 2: Check for any version-specific documentation or migration steps
fd -g "CHANGELOG*" -g "README*" -g "UPGRADE*" -X rg -i "0\.16\.(0|1)"

Length of output: 24561


Script:

#!/bin/bash
# Let's verify the worker version and RocksDB requirements

# Check docker-compose files for worker configuration and environment variables
fd -g 'docker-compose*.y*ml' -X cat {} \;

# Check for any RocksDB or seqvars database related environment variables or paths
rg -i '(SEQVARS.*DB.*PATH|ROCKSDB.*PATH|worker.*db.*path)' -g '!*.{pyc,min.js,min.css}'

# Check for any worker configuration files
fd -g '*worker*.{yml,yaml,conf,cfg,ini}' -X cat {} \;

Length of output: 9049

backend/config/settings/base.py (1)

505-507: LGTM! Consider adding documentation for the paths.

The separation of read-only and writable paths is a good practice. Consider adding comments to document the specific purpose of each path for operators.

Add comments above each path setting:

 # Path to database for the worker (base database with sub entries for mehari etc.).
 WORKER_DB_PATH = env.str("VARFISH_WORKER_DB_PATH", "/data/varfish-static/data")

+# Path for writable worker database operations (e.g., storing in-house data).
 WORKER_RW_PATH = env.str("VARFISH_WORKER_RW_PATH", "/data/varfish-dynamic/data")
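django-environ's `env.str` resolves to the environment value when set and to the given default otherwise. A minimal stdlib equivalent of the two lookups above (the helper name is illustrative, not part of the codebase):

```python
import os

def worker_paths(environ=None):
    """Resolve the worker DB paths the way django-environ's env.str does:
    the environment value if present, otherwise the documented default."""
    environ = os.environ if environ is None else environ
    return (
        environ.get("VARFISH_WORKER_DB_PATH", "/data/varfish-static/data"),
        environ.get("VARFISH_WORKER_RW_PATH", "/data/varfish-dynamic/data"),
    )

# Defaults apply when neither variable is set.
print(worker_paths({}))
```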
backend/seqvars/models/base.py (1)

Line range hint 2435-2500: LGTM: Well-structured background job implementation.

The implementation of SeqvarsQueryExecutionBackgroundJob and its manager follows Django best practices with proper transaction handling, clear field definitions, and appropriate relationships.

backend/seqvars/models/executors.py (2)

283-291: 🛠️ Refactor suggestion

Ensure symlink updates handle errors and edge cases

While updating the "active" symlink, exceptions might occur (e.g., if the symlink already exists or due to permission issues). Adding error handling will make the code more robust and prevent unexpected crashes. Additionally, since this approach is Unix-specific, consider documenting this limitation or providing an alternative for non-Unix systems if cross-platform support is required.
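One robust pattern for this is to create the new symlink under a temporary name and rename it over the old one; `os.replace` maps to rename(2), which atomically replaces an existing link on POSIX. This is a sketch under the assumption that the "active" symlink is swapped wholesale, and it remains Unix-specific as noted:

```python
import os
import tempfile

def update_active_symlink(target: str, link_path: str) -> None:
    """Atomically (re)point ``link_path`` at ``target``.

    Builds the new symlink under a temporary name, then renames it over
    the old one so readers never observe a missing link.
    """
    tmp_path = link_path + ".new"
    try:
        os.symlink(target, tmp_path)
    except FileExistsError:
        # Stale leftover from an earlier failed attempt; replace it.
        os.unlink(tmp_path)
        os.symlink(target, tmp_path)
    try:
        os.replace(tmp_path, link_path)  # atomic on POSIX
    except OSError:
        os.unlink(tmp_path)  # clean up the temporary link on failure
        raise

# Demo: flip an "active" link between two build directories.
with tempfile.TemporaryDirectory() as tmp:
    old = os.path.join(tmp, "build-old")
    new = os.path.join(tmp, "build-new")
    os.mkdir(old)
    os.mkdir(new)
    active = os.path.join(tmp, "active")
    update_active_symlink(old, active)
    update_active_symlink(new, active)  # replaces the existing link
    print(os.readlink(active) == new)
```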


48-55: Verify the security of subprocess calls

In the run_worker function, ensure that the cmd list and env dictionary do not contain untrusted input that could pose a security risk when passed to subprocess.check_call. Confirm that settings.WORKER_EXE_PATH and args are sourced from trusted inputs and are properly validated.

Run the following script to identify where args and settings.WORKER_EXE_PATH are defined to ensure they are securely set:

✅ Verification successful

Let me analyze the findings:

  1. WORKER_EXE_PATH is defined in backend/config/settings/base.py and sourced from an environment variable with a default value:
WORKER_EXE_PATH = env.str("VARFISH_WORKER_EXE_PATH", "varfish-server-worker")
  1. The args parameter in run_worker is used in several places in the codebase, all within controlled contexts:
  • In backend/seqvars/models/executors.py for sequence variant operations
  • In backend/cases_import/models/executors.py for case import operations

The args are constructed with hardcoded string literals like "seqvars", "query", "ingest", etc. and passed through command-line flags.

Let me verify one more aspect about input validation:


Let me try a different pattern to check the implementation:


Let me check how the args are constructed before being passed to run_worker:


Based on the comprehensive analysis, I can now provide the final response:

The subprocess call is secure in its current implementation

The run_worker function is used in a controlled manner where:

  1. WORKER_EXE_PATH is sourced from environment variables with a safe default value
  2. The args parameter is only constructed with:
    • Hardcoded string literals (e.g., "seqvars", "query", "aggregate")
    • Command-line flags (e.g., "--case-uuid", "--genomebuild")
    • Validated data from trusted sources:
      • UUIDs via str(self.case.sodar_uuid)
      • Timestamps via timezone.now().strftime()
      • Settings paths via settings.WORKER_DB_PATH
      • File paths constructed within the application

The environment variables are also properly controlled, being sourced from application settings and system environment.
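The safe pattern described above can be sketched as follows; `build_worker_args` and its flags are hypothetical stand-ins modeled on the analysis, not the worker's real CLI:

```python
import subprocess
import uuid


def build_worker_args(case_uuid: uuid.UUID, genomebuild: str, output_path: str) -> list:
    """Assemble an argument vector from literals and validated values only."""
    if genomebuild not in ("grch37", "grch38"):
        raise ValueError(f"unsupported genome build: {genomebuild!r}")
    return [
        "seqvars", "query",             # hard-coded subcommands
        "--case-uuid", str(case_uuid),  # the UUID type guarantees the format
        "--genomebuild", genomebuild,   # checked against an allow-list above
        "--path-output", output_path,
    ]


def run_worker(args, env):
    """Invoke the worker; the list form never passes through a shell."""
    subprocess.check_call(["varfish-server-worker", *args], env=env)
```

Because `subprocess.check_call` receives a list and no `shell=True`, none of the arguments are subject to shell interpolation even if a value contains metacharacters.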

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Locate definitions and usages of 'args' and 'settings.WORKER_EXE_PATH' in the codebase.

# Find where 'WORKER_EXE_PATH' is defined or assigned.
rg 'WORKER_EXE_PATH\s*='

# Find all usages and assignments of 'args'.
rg -A 3 -B 3 '\bargs\s*='

Length of output: 37922


Script:

# Check if there's any input validation or sanitization of args before passing to subprocess
ast-grep --pattern 'def run_worker($$$) {
  $$$
  subprocess.check_call($$$)
  $$$
}'

Length of output: 87


Script:

# Find the run_worker implementation and surrounding context
rg -A 10 -B 10 "def run_worker"

Length of output: 3236


Script:

# Find where args are constructed before being passed to run_worker
rg -A 10 "args = \[" backend/seqvars/models/executors.py backend/cases_import/models/executors.py

Length of output: 3941


@app.task(bind=True)
def run_seqvarsinhousedbbuildbackgroundjob(_self, *, seqvarsinhousedbbuildbackgroundjob_pk: int):
"""Task to execute a ``cases_import.models.SeqvarsInhouseDbBuildBackgroundJob``."""

⚠️ Potential issue

Fix incorrect module reference in docstring.

The docstring references cases_import.models but the model appears to be in seqvars.models.

-    """Task to execute a ``cases_import.models.SeqvarsInhouseDbBuildBackgroundJob``."""
+    """Task to execute a ``seqvars.models.SeqvarsInhouseDbBuildBackgroundJob``."""
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"""Task to execute a ``cases_import.models.SeqvarsInhouseDbBuildBackgroundJob``."""
"""Task to execute a ``seqvars.models.SeqvarsInhouseDbBuildBackgroundJob``."""

Comment on lines +23 to +29
@app.on_after_finalize.connect
def setup_periodic_tasks(sender, **_kwargs):
"""Register periodic tasks"""
# Regularly rebuild the in-house database.
sender.add_periodic_task(
schedule=crontab(day_of_week=1, minute=11), sig=run_seqvarsinhousedbbuildbackgroundjob.s()
)

💡 Codebase verification

⚠️ Potential issue

Fix task signature and improve documentation.

The review comment is correct. Based on the code analysis:

  1. The task signature is incorrect as run_seqvarsinhousedbbuildbackgroundjob requires a seqvarsinhousedbbuildbackgroundjob_pk parameter, which is confirmed by its implementation.
  2. The model has a custom manager with a create_full() method that properly creates both the background job and the model instance.
  3. The command buildseqvarsinhousedb.py shows the correct usage pattern: create the job first using create_full() and then pass its PK to the task.

The suggested improvements in the review comment are accurate and should be implemented:

  1. Create a job instance using create_full() before scheduling
  2. Pass the job PK to the task
  3. Document the schedule details in the docstring
🔗 Analysis chain

Fix task signature and improve documentation.

There are several issues with the periodic task setup:

  1. The task signature is incorrect as run_seqvarsinhousedbbuildbackgroundjob requires a seqvarsinhousedbbuildbackgroundjob_pk parameter.
  2. The schedule timing (11 minutes past the hour) seems arbitrary and should be documented.

Let's verify if this task is being used elsewhere:

Consider these improvements:

  1. Create a job instance before scheduling the task
  2. Add schedule details to the docstring
  3. Consider moving the schedule configuration to settings

Example implementation:

 @app.on_after_finalize.connect
 def setup_periodic_tasks(sender, **_kwargs):
-    """Register periodic tasks"""
+    """Register periodic tasks.
+    
+    Schedules:
+    - Rebuild in-house database: Every Monday at 11 minutes past the hour
+    """
     # Regularly rebuild the in-house database.
+    job = executors.SeqvarsInhouseDbBuildBackgroundJob.objects.create()
     sender.add_periodic_task(
-        schedule=crontab(day_of_week=1, minute=11), sig=run_seqvarsinhousedbbuildbackgroundjob.s()
+        schedule=crontab(day_of_week=1, minute=11),
+        sig=run_seqvarsinhousedbbuildbackgroundjob.s(
+            seqvarsinhousedbbuildbackgroundjob_pk=job.pk
+        )
     )
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any other periodic task configurations or similar patterns
rg -l "add_periodic_task.*seqvars"

Length of output: 36


Script:

#!/bin/bash
# Let's verify the task signature and implementation
ast-grep --pattern 'def run_seqvarsinhousedbbuildbackgroundjob'

# Also check for any related task configurations
rg "run_seqvarsinhousedbbuildbackgroundjob" -A 3 -B 3

# Check the executor implementation
ast-grep --pattern 'class SeqvarsInhouseDbBuildBackgroundJob'

Length of output: 1078


Script:

#!/bin/bash
# Let's check the executor implementation in a different way
rg "class SeqvarsInhouseDbBuildBackgroundJob" -A 5 -B 1

# Also check for any imports or usage of this class
rg "SeqvarsInhouseDbBuildBackgroundJob" -A 2 -B 2

# Check if there are any other similar background job implementations
rg "BackgroundJob.*objects\.create" -A 2 -B 2

Length of output: 22341

Comment on lines +15 to +28
def handle(self, *_args, **options):
"""Entrypoint from command line"""

try:
self._handle()
except Exception as e:
self.stderr.write(
self.style.ERROR(
"A problem occured (see below).\n\n--- BEGIN TRACEBACK ---\n"
f"{traceback.format_exc()}--- END TRACEBACK ---\n"
)
)
raise CommandError("Could not re-build the seqvars inhouse db.") from e


🛠️ Refactor suggestion

Fix typo and consider using logging instead of stderr.

  1. There's a typo in the error message: "occured" should be "occurred"
  2. Consider using Django's logging framework instead of direct stderr writes for better consistency and configurability.
-                    "A problem occured (see below).\n\n--- BEGIN TRACEBACK ---\n"
+                    "A problem occurred (see below).\n\n--- BEGIN TRACEBACK ---\n"

Consider refactoring to use logging:

import logging

logger = logging.getLogger(__name__)

# In the exception handler:
logger.error("Could not re-build the seqvars inhouse db:\n%s", traceback.format_exc())

Comment on lines +29 to +34
def _handle(self):
# Create a new job to execute.
job = SeqvarsInhouseDbBuildBackgroundJob.objects.create_full()
self.stderr.write(self.style.SUCCESS("Executing seqvars inhouse db build job..."))
run_seqvarsbuildinhousedbbackgroundjob(pk=job.pk)
self.stderr.write(self.style.SUCCESS("... done executing job"))

🛠️ Refactor suggestion

Consider enhancing job execution robustness and user feedback.

The current implementation has several areas for improvement:

  1. The job execution is synchronous, which could block the command for long-running operations
  2. There's no progress feedback during execution
  3. Missing validation before starting the job
  4. No cleanup handling in case of failure

Consider implementing these improvements:

 def _handle(self):
+    # Check if any job is already running
+    if SeqvarsInhouseDbBuildBackgroundJob.objects.filter(status="running").exists():
+        raise CommandError("Another build job is already running")
+
     # Create a new job to execute.
     job = SeqvarsInhouseDbBuildBackgroundJob.objects.create_full()
-    self.stderr.write(self.style.SUCCESS("Executing seqvars inhouse db build job..."))
-    run_seqvarsbuildinhousedbbackgroundjob(pk=job.pk)
-    self.stderr.write(self.style.SUCCESS("... done executing job"))
+    try:
+        self.stderr.write(self.style.SUCCESS(f"Started job with ID: {job.pk}"))
+        self.stderr.write("You can monitor the progress in the admin interface or logs")
+        
+        # Option 1: Run asynchronously
+        run_seqvarsbuildinhousedbbackgroundjob.delay(pk=job.pk)
+        
+        # Option 2: Run synchronously with progress
+        # run_seqvarsbuildinhousedbbackgroundjob(pk=job.pk)
+        # while not job.refresh_from_db():
+        #     self.stderr.write(f"Progress: {job.progress}%")
+        #     time.sleep(5)
+    except Exception as e:
+        job.status = "failed"
+        job.error_message = str(e)
+        job.save()
+        raise

Comment on lines +2502 to +2552
class SeqvarsInhouseDbBuildBackgroundJobManager(models.Manager):
"""Custom manager class that allows to create a ``SeqvarsInhouseDbBuildBackgroundJob``
together with the backing ``BackgroundJob``.
"""

@transaction.atomic
def create_full(self):
bg_job = BackgroundJob.objects.create(
name="Building seqvars inhouse DB",
project=None,
job_type=SeqvarsInhouseDbBuildBackgroundJob.spec_name,
user=User.objects.get(username=settings.PROJECTROLES_ADMIN_OWNER),
)
instance = super().create(
bg_job=bg_job,
)
return instance


class SeqvarsInhouseDbBuildBackgroundJob(JobModelMessageMixin, models.Model):
"""Background job for re-building the seqvars inhouse DB."""

# We use a custom manager that provides creation together with the ``BackgroundJob``.
objects: SeqvarsInhouseDbBuildBackgroundJobManager = SeqvarsInhouseDbBuildBackgroundJobManager()

#: Task description for logging.
task_desc = "Seqvars Query Execution"

#: String identifying model in BackgroundJob.
spec_name = "seqvars.runinhousedbbuild"

#: The SODAR UUID.
sodar_uuid = models.UUIDField(
default=uuid_object.uuid4,
unique=True,
)

#: The background job for state management etc.
bg_job = models.ForeignKey(
BackgroundJob,
null=False,
related_name="seqvarsinhousedbbuildbackgroundjob",
help_text="Background job for state etc.",
on_delete=models.CASCADE,
)

def get_human_readable_type(self):
return self.task_desc

class Meta:
ordering = ["-pk"]

⚠️ Potential issue

Multiple issues in SeqvarsInhouseDbBuildBackgroundJob implementation.

  1. The task description is incorrect (copied from query execution job)
  2. The create_full method implementation could be improved
  3. Missing documentation

Apply these fixes:

class SeqvarsInhouseDbBuildBackgroundJobManager(models.Manager):
    """Custom manager class that allows to create a ``SeqvarsInhouseDbBuildBackgroundJob``
    together with the backing ``BackgroundJob``.
    """

    @transaction.atomic
-   def create_full(self):
+   def create_full(self, *, user: typing.Optional[User] = None) -> "SeqvarsInhouseDbBuildBackgroundJob":
+       """Create a new background job for building the in-house database.
+
+       :param user: The user to associate with the job. If None, uses the admin user.
+       :return: The created background job instance.
+       """
+       if user is None:
+           user = User.objects.get(username=settings.PROJECTROLES_ADMIN_OWNER)
        bg_job = BackgroundJob.objects.create(
            name="Building seqvars inhouse DB",
            project=None,
            job_type=SeqvarsInhouseDbBuildBackgroundJob.spec_name,
-           user=User.objects.get(username=settings.PROJECTROLES_ADMIN_OWNER),
+           user=user,
        )
        instance = super().create(
            bg_job=bg_job,
        )
        return instance

class SeqvarsInhouseDbBuildBackgroundJob(JobModelMessageMixin, models.Model):
    """Background job for re-building the seqvars inhouse DB."""

    objects: SeqvarsInhouseDbBuildBackgroundJobManager = SeqvarsInhouseDbBuildBackgroundJobManager()

    #: Task description for logging.
-   task_desc = "Seqvars Query Execution"
+   task_desc = "Build Seqvars In-house Database"

The changes:

  1. Fix the task description to accurately reflect the job's purpose
  2. Make the create_full method more flexible by accepting an optional user parameter
  3. Add proper documentation to the create_full method
📝 Committable suggestion


Suggested change
class SeqvarsInhouseDbBuildBackgroundJobManager(models.Manager):
"""Custom manager class that allows to create a ``SeqvarsInhouseDbBuildBackgroundJob``
together with the backing ``BackgroundJob``.
"""
@transaction.atomic
def create_full(self):
bg_job = BackgroundJob.objects.create(
name="Building seqvars inhouse DB",
project=None,
job_type=SeqvarsInhouseDbBuildBackgroundJob.spec_name,
user=User.objects.get(username=settings.PROJECTROLES_ADMIN_OWNER),
)
instance = super().create(
bg_job=bg_job,
)
return instance
class SeqvarsInhouseDbBuildBackgroundJob(JobModelMessageMixin, models.Model):
"""Background job for re-building the seqvars inhouse DB."""
# We use a custom manager that provides creation together with the ``BackgroundJob``.
objects: SeqvarsInhouseDbBuildBackgroundJobManager = SeqvarsInhouseDbBuildBackgroundJobManager()
#: Task description for logging.
task_desc = "Seqvars Query Execution"
#: String identifying model in BackgroundJob.
spec_name = "seqvars.runinhousedbbuild"
#: The SODAR UUID.
sodar_uuid = models.UUIDField(
default=uuid_object.uuid4,
unique=True,
)
#: The background job for state management etc.
bg_job = models.ForeignKey(
BackgroundJob,
null=False,
related_name="seqvarsinhousedbbuildbackgroundjob",
help_text="Background job for state etc.",
on_delete=models.CASCADE,
)
def get_human_readable_type(self):
return self.task_desc
class Meta:
ordering = ["-pk"]
class SeqvarsInhouseDbBuildBackgroundJobManager(models.Manager):
"""Custom manager class that allows to create a ``SeqvarsInhouseDbBuildBackgroundJob``
together with the backing ``BackgroundJob``.
"""
@transaction.atomic
def create_full(self, *, user: typing.Optional[User] = None) -> "SeqvarsInhouseDbBuildBackgroundJob":
"""Create a new background job for building the in-house database.
:param user: The user to associate with the job. If None, uses the admin user.
:return: The created background job instance.
"""
if user is None:
user = User.objects.get(username=settings.PROJECTROLES_ADMIN_OWNER)
bg_job = BackgroundJob.objects.create(
name="Building seqvars inhouse DB",
project=None,
job_type=SeqvarsInhouseDbBuildBackgroundJob.spec_name,
user=user,
)
instance = super().create(
bg_job=bg_job,
)
return instance
class SeqvarsInhouseDbBuildBackgroundJob(JobModelMessageMixin, models.Model):
"""Background job for re-building the seqvars inhouse DB."""
# We use a custom manager that provides creation together with the ``BackgroundJob``.
objects: SeqvarsInhouseDbBuildBackgroundJobManager = SeqvarsInhouseDbBuildBackgroundJobManager()
#: Task description for logging.
task_desc = "Build Seqvars In-house Database"
#: String identifying model in BackgroundJob.
spec_name = "seqvars.runinhousedbbuild"
#: The SODAR UUID.
sodar_uuid = models.UUIDField(
default=uuid_object.uuid4,
unique=True,
)
#: The background job for state management etc.
bg_job = models.ForeignKey(
BackgroundJob,
null=False,
related_name="seqvarsinhousedbbuildbackgroundjob",
help_text="Background job for state etc.",
on_delete=models.CASCADE,
)
def get_human_readable_type(self):
return self.task_desc
class Meta:
ordering = ["-pk"]

Comment on lines +274 to +275
print(" ".join(args))
run_worker(

⚠️ Potential issue

Avoid printing sensitive command-line arguments

Printing the command-line arguments with print(" ".join(args)) may inadvertently expose sensitive information in logs or console output. If the arguments contain confidential data, consider removing this print statement or sanitizing the output to prevent potential information leakage.

Apply this diff to remove the print statement:

-            print(" ".join(args))
📝 Committable suggestion


Suggested change
print(" ".join(args))
run_worker(
run_worker(
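If the command line is worth logging at all, a redacting formatter is an alternative to deleting the statement outright. A sketch under the assumption that sensitive values always follow known flags; the flag names in `SENSITIVE_FLAGS` are illustrative, not the worker's real options:

```python
SENSITIVE_FLAGS = {"--aws-secret-access-key", "--aws-access-key-id"}  # assumed flag names


def format_cmd_for_log(args):
    """Join an argument vector for logging, masking values of sensitive flags."""
    out = []
    redact_next = False
    for arg in args:
        if redact_next:
            out.append("***")
            redact_next = False
        else:
            out.append(arg)
            redact_next = arg in SENSITIVE_FLAGS
    return " ".join(out)
```

The formatted string can then be passed to a logger at DEBUG level instead of `print`, keeping the diagnostic value without leaking credentials.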

Comment on lines +166 to +171
run_worker(
args=args,
env=env,
)
except Exception:
pass
print("Error while executing worker / importing results", file=sys.stderr)

⚠️ Potential issue

Exceptions in execute_query are not properly handled

In execute_query, if an exception occurs during run_worker, the exception is caught, and an error message is printed, but the exception is not re-raised or handled further. This may lead to the program continuing in an invalid state. Consider re-raising the exception to ensure that the calling method is aware of the failure and can handle it appropriately.

Apply this diff to re-raise the exception after logging:

            except Exception as e:
                print("Error while executing worker / importing results", file=sys.stderr)
+               raise e
📝 Committable suggestion


Suggested change
run_worker(
args=args,
env=env,
)
except Exception:
pass
print("Error while executing worker / importing results", file=sys.stderr)
run_worker(
args=args,
env=env,
)
except Exception as e:
print("Error while executing worker / importing results", file=sys.stderr)
raise e

Comment on lines +162 to 163
**aws_config_env_internal(),
}

⚠️ Potential issue

Avoid merging all environment variables into the subprocess

When constructing the env dictionary, merging the entire os.environ can introduce unintended environment variables into the subprocess, which might affect its behavior or security. It is safer to specify only the necessary environment variables explicitly.

Apply this diff to specify only the required environment variables:

-        env = {
-            **dict(os.environ.items()),
-            "LC_ALL": "C",
-            **aws_config_env_internal(),
-        }
+        env = {
+            "LC_ALL": "C",
+            **aws_config_env_internal(),
+        }

Committable suggestion was skipped due to low confidence.

Comment on lines +303 to +308
seqvars_file = PedigreeInternalFile.objects.filter(
case=case,
designation="variant_calls/seqvars/ingested-vcf",
)[0]
if seqvars_file.genomebuild == genome_release:
paths.append(f"{bucket}/{seqvars_file.path}")

⚠️ Potential issue

Handle potential absence of PedigreeInternalFile objects

When accessing PedigreeInternalFile.objects.filter(...)[0], there's a risk of an IndexError if the queryset is empty. This can happen if no files match the filter criteria. To prevent the application from crashing, use .first() and check if the result is None before proceeding.

Apply this diff to safely handle cases with no matching files:

-                seqvars_file = PedigreeInternalFile.objects.filter(
-                    case=case,
-                    designation="variant_calls/seqvars/ingested-vcf",
-                )[0]
+                seqvars_file = PedigreeInternalFile.objects.filter(
+                    case=case,
+                    designation="variant_calls/seqvars/ingested-vcf",
+                ).first()
+                if not seqvars_file:
+                    continue
📝 Committable suggestion


Suggested change
seqvars_file = PedigreeInternalFile.objects.filter(
case=case,
designation="variant_calls/seqvars/ingested-vcf",
)[0]
if seqvars_file.genomebuild == genome_release:
paths.append(f"{bucket}/{seqvars_file.path}")
seqvars_file = PedigreeInternalFile.objects.filter(
case=case,
designation="variant_calls/seqvars/ingested-vcf",
).first()
if not seqvars_file:
continue
if seqvars_file.genomebuild == genome_release:
paths.append(f"{bucket}/{seqvars_file.path}")

@holtgrewe holtgrewe merged commit 0d5e380 into main Nov 1, 2024
16 of 17 checks passed
@holtgrewe holtgrewe deleted the 2069-integrate-seqvars-inhouse-rocksdb-in-varfish-worker-call branch November 1, 2024 08:37
Development

Successfully merging this pull request may close these issues.

Integrate seqvars inhouse RocksDB in varfish-worker call.