Skip to content

Cache journals#508

Closed
Dagonite wants to merge 18 commits into
mainfrom
cache-journals
Closed

Cache journals#508
Dagonite wants to merge 18 commits into
mainfrom
cache-journals

Conversation

@Dagonite

@Dagonite Dagonite commented May 5, 2026

Copy link
Copy Markdown
Collaborator

Closes #509.

Description

Adds a shared Valkey cache helper to run-detection. The helper handles lazy client creation, VALKEY_URL configuration, JSON serialization, TTL-based writes, and graceful fallback when Valkey is unavailable.

The EnginX journal lookup now uses this cache to store the derived run-number-to-cycle map, keyed as run_detection:enginx:run_number_cycle_map. On startup or lookup, run-detection first tries to read the cached map; if it is missing or Valkey is unavailable, it parses the journal XML files and writes the derived map back to Valkey with a TTL.

Local testing

Start Valkey:

docker run --rm -d --name run-detection-valkey -p 6379:6379 valkey/valkey:8-alpine

Run from repo root:

$env:VALKEY_URL = "redis://localhost:6379/0"
$env:ENGINX_JOURNAL_DIR = "test/test_data/e2e_data/NDXENGINX/Instrument/logs/journal"
$env:ENGINX_RUN_NUMBER_CYCLE_MAP_CACHE_TTL_SECONDS = "3600"

@'
from rundetection.cache import cache_get_json
from rundetection.rules.enginx_rules import (
    ENGINX_RUN_NUMBER_CYCLE_MAP_CACHE_KEY,
    build_enginx_run_number_cycle_map,
)

mapping = build_enginx_run_number_cycle_map()
cached = cache_get_json(ENGINX_RUN_NUMBER_CYCLE_MAP_CACHE_KEY)

print(mapping.get(241391))
print(cached is not None)
print(cached.get("241391"))
'@ | python

Expected output:

20_1
True
20_1

Introduce rundetection/cache.py to manage a shared Valkey client and provide JSON get/set helpers with disable-on-error behavior. Refactor EnginX rules to read journal files from a configurable directory, coerce cached JSON back to int keys, and use the Valkey cache (with configurable TTL) to avoid rebuilding the run-number->cycle map on every call. Add unit tests for the cache helpers and EnginX caching behavior and include sample journal XML fixtures. Also expose ARCHIVE_ROOT from env in run_detection.
@codecov

codecov Bot commented May 5, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 91.00000% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.89%. Comparing base (c047d0f) to head (e28bf17).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
rundetection/cache.py 88.76% 10 Missing ⚠️
rundetection/rules/enginx_rules.py 92.79% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #508      +/-   ##
==========================================
+ Coverage   93.83%   94.89%   +1.06%     
==========================================
  Files          18       19       +1     
  Lines        1152     1333     +181     
==========================================
+ Hits         1081     1265     +184     
+ Misses         71       68       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a shared Valkey-backed caching utility for run-detection and applies it to EnginX journal parsing so the derived run-number → cycle mapping can be reused across runs/processes with a TTL, falling back to journal XML parsing when Valkey is unavailable.

Changes:

  • Add rundetection.cache Valkey helper with lazy client creation, JSON get/set, TTL writes, and “disable-on-error” fallback behavior.
  • Update EnginX journal mapping build to read/write the mapping via Valkey (configurable TTL + journal directory), with tests validating cached vs. uncached paths.
  • Extend EnginX test journal fixtures and add new unit tests for the cache module.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
rundetection/cache.py New Valkey cache helper for JSON get/set and lazy client creation with fallback behavior.
rundetection/rules/enginx_rules.py Adds Valkey-based caching for EnginX run-number → cycle mapping; makes journal dir/TTL configurable.
rundetection/run_detection.py Adds ARCHIVE_ROOT env-derived constant (currently unused).
test/test_cache.py New unit tests for Valkey cache helpers (client creation, disable-on-error, JSON get/set).
test/rules/test_enginx_rules.py Adds tests covering cached mapping, cache miss parsing + cache population, and TTL-disabled behavior.
test/test_data/e2e_data/NDXENGINX/Instrument/logs/journal/journal_21_1.xml Additional journal fixture for mapping parsing tests.
test/test_data/e2e_data/NDXENGINX/Instrument/logs/journal/journal_22_1.xml Additional journal fixture for mapping parsing tests.
pyproject.toml Adds valkey dependency (and reorders/updates dependency list).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread rundetection/run_detection.py
Comment thread rundetection/cache.py Outdated
Comment thread rundetection/cache.py Outdated
Comment thread rundetection/cache.py Outdated
Comment thread rundetection/rules/enginx_rules.py
Comment thread rundetection/rules/enginx_rules.py
Dagonite and others added 6 commits May 12, 2026 11:16
Close and suppress errors on Valkey client shutdown, include cache keys in error logs, and avoid repeated connection attempts by marking the client disabled. Add in-process memoization for EnginX run->cycle mappings (_ENGINX_RUN_NUMBER_CYCLE_MAP_CACHE) with helper functions to read, cache and clear the mapping; avoid caching empty mappings and emit warnings when journal paths are missing, not directories, or contain no XML. Update build_enginx_run_number_cycle_map to prefer Valkey but fall back to the in-process cache and only persist non-empty mappings. Add and adjust unit tests to cover cache disable/close behavior, logging, and EnginX journal memoization; also include minor docstring and formatting cleanups across modules.
Split and clean up EnginX journal parsing: introduce _ENGINX_RUN_NAME_PREFIX constant and helper functions (_add_enginx_run_to_cycle_map, _walk_enginx_journal_node, _enginx_journal_files) to centralize prefix handling, recursive walking, and file discovery. _enginx_journal_files now returns None on missing/invalid directories or no XML files; _read_enginx_run_number_cycle_map uses these helpers and parses journal XMLs into the run->cycle mapping. Tests updated to use a REPEATED_BUILD_CALL_COUNT constant for repeated-call assertions and to match the refactored behavior.
Introduce a _Closable Protocol and import typing.cast to ensure the Valkey client is treated as closable before calling close(). Adjust imports to include Protocol and cast, and tidy a docstring formatting. This change satisfies the type checker while preserving existing lazy client behavior and error handling.
@Dagonite Dagonite requested a review from keiranjprice101 May 12, 2026 11:23

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (4)

rundetection/run_detection.py:70

  • The producer docstring has the :return: field inlined into the summary text, which makes the docstring hard to scan and inconsistent with other docstrings in the repo. Please format the docstring so the summary and :return: (and any params) are on separate lines.
    """
    Return a context managed pika producer channel :return:
    BlockingChannel.
    """

rundetection/run_detection.py:190

  • The process_notifications docstring now inlines :param/:return: into the summary (wrapped across lines). Please revert to the standard docstring layout with summary first, then a blank line, then :param ...: and :return: entries on their own lines for clarity and consistency.
    """
    Produce messages until the notification queue is empty :param
    notification_queue: The notification queue :return: None.
    """

rundetection/run_detection.py:203

  • The notify_failures docstring now inlines :param/:return: into the summary text. Please restore the previous multi-line formatting (summary line(s) first, then parameter/return fields on separate lines) to keep docstrings readable and consistent.
    """
    Produce Failure messages until the failure queue is empty :param
    failure_queue: The failure Queue :return: None.
    """

rundetection/run_detection.py:259

  • The main docstring has been collapsed into a single line that mixes the summary and :return: field. Please format it as a standard multi-line docstring (summary on its own line, then :return: on a separate line) to match the project’s usual style.
def main() -> None:
    """Entry point for run detection :return: None."""

Comment thread rundetection/rules/enginx_rules.py Outdated
Comment thread rundetection/run_detection.py
@keiranjprice101

Copy link
Copy Markdown
Collaborator

@Dagonite
What is going on with some of the docstrings in this PR?

For example:
image

Why have they been squashed onto less lines?

@Pasarus Pasarus left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to request a few changes here, there is a lot of overcomplication of naming. A lot of concepts and implementations that are overly complicated and with poor readability, are just not necessary or actively the opposite of what we would want from this software.

I am going to re-review the entire thing when the suggestions and comments have been responded to and changes made.

Comment thread rundetection/rules/enginx_rules.py Outdated

ENGINX_RUN_NUMBER_CYCLE_MAP_CACHE_KEY = "run_detection:enginx:run_number_cycle_map"
ENGINX_RUN_NUMBER_CYCLE_MAP_CACHE_TTL_SECONDS = 60 * 60
_ENGINX_RUN_NAME_PREFIX = "ENGINX"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_ENGINX_RUN_NAME_PREFIX = "ENGINX"
ENGINX_PREFIX = "ENGINX"

Comment thread rundetection/rules/enginx_rules.py Outdated
expires_at: float


_ENGINX_RUN_NUMBER_CYCLE_MAP_CACHE: dict[Path, _EnginxRunNumberCycleMapCacheEntry] = {}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_ENGINX_RUN_NUMBER_CYCLE_MAP_CACHE: dict[Path, _EnginxRunNumberCycleMapCacheEntry] = {}
ENGINX_CACHE: dict[Path, EnginxCacheEntry] = {}

Comment thread rundetection/rules/enginx_rules.py Outdated


@dataclass(slots=True)
class _EnginxRunNumberCycleMapCacheEntry:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class _EnginxRunNumberCycleMapCacheEntry:
class EnginxCacheEntry:

Comment thread rundetection/rules/enginx_rules.py Outdated

logger = logging.getLogger(__name__)

ENGINX_RUN_NUMBER_CYCLE_MAP_CACHE_KEY = "run_detection:enginx:run_number_cycle_map"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ENGINX_RUN_NUMBER_CYCLE_MAP_CACHE_KEY = "run_detection:enginx:run_number_cycle_map"
ENGINX_CACHE_KEY = "run_detection:enginx:run_number_cycle_map"

Comment thread rundetection/rules/enginx_rules.py Outdated
logger = logging.getLogger(__name__)

ENGINX_RUN_NUMBER_CYCLE_MAP_CACHE_KEY = "run_detection:enginx:run_number_cycle_map"
ENGINX_RUN_NUMBER_CYCLE_MAP_CACHE_TTL_SECONDS = 60 * 60

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ENGINX_RUN_NUMBER_CYCLE_MAP_CACHE_TTL_SECONDS = 60 * 60
ENGINX_CACHE_TTL_SECONDS = 60 * 60

Comment thread rundetection/cache.py Outdated
return state.client


def _disable_cache(exc: Exception) -> None:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what the point of this function is? Are we expecting a world in which this operates without a valkey cache?If so would it not make more sense to have a very simple environment variable with on or off then everytime we go to cache check this global variable and do it if we have it else don't? or check if the valkey env vars are set at all and if not fail.

This "silent" failing is actually the opposite of the behavior we would want, as it bypasses all our observability and alerting structure.

Comment thread rundetection/cache.py Outdated
disabled: bool = False


class _Closable(Protocol):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why we are using Protocol here? It seems excessive for our needs, and adds a concept we haven't used elsewhere within the project.

Dagonite added 3 commits May 19, 2026 23:37
Introduce VALKEY_CACHE_ENABLED env var to allow opting out of the Valkey cache and treat several literal values as disabled. Refactor Valkey client lifecycle: add _valkey_cache_enabled check, rename _disable_cache to _disable_valkey_cache, simplify client teardown, and switch some exception logging from exception() to warning() while preserving graceful degradation on Valkey errors. Improve JSON serialization/logging paths in cache_get_json/cache_set_json. Refactor EnginX journal caching: add in-process EnginxCacheEntry, break out _get_cached/_cache functions, handle malformed ENGINX entry names robustly, prefer local in-process cache before Valkey, and adjust related tests to assert new behaviors and logging.
@Dagonite

Copy link
Copy Markdown
Collaborator Author

@Dagonite What is going on with some of the docstrings in this PR?

For example: image

Why have they been squashed onto less lines?

It's because my editor's global formatter is modifying the docstrings. Is there a repo level formatter config?

@Dagonite

Copy link
Copy Markdown
Collaborator Author

Closing and moving to #516.

@Dagonite Dagonite closed this May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cache journal files

4 participants