diff --git a/connectus/Readme.md b/connectus/Readme.md index 8cc5548036f..ad68712c2f5 100644 --- a/connectus/Readme.md +++ b/connectus/Readme.md @@ -1,4 +1,7 @@ Note, this folder should not be merged to master. + +> **Architecture note.** [`connectus/workflow_state.py`](workflow_state.py:1) is now a thin backward-compatible shim that re-exports the real package at [`connectus/workflow_state/`](workflow_state/__init__.py:1). The CLI entrypoint, validators, state machine, CSV I/O, display helpers, and config loader live there. Behavior is identical; the file split is purely for maintainability. The canonical Python import is `from workflow_state import …`. + ## Authentication Type Catalog Each integration's authentication is classified into an **Auth Class** string @@ -85,7 +88,7 @@ python3 connectus/check_command_params.py \ `--integration-id` is **optional but strongly recommended inside the migration workflow**. When set, the analyzer additionally pulls the auth-derived ignore set from -[`workflow_state.py auth-params `](workflow_state.py:1) and unions it +[`workflow_state.py auth-params `](workflow_state/cli.py:1) and unions it into its own ignore set, guaranteeing that any param already declared in `Auth Details` (auth secrets + `other_connection`) cannot leak into the per-command output. Standalone runs outside the migration workflow can @@ -127,7 +130,10 @@ analyzer and processes its output. The [`workflow_state.py`](workflow_state.py) script manages the **16 workflow columns** (columns 5–20) of [`connectus/connectus-migration-pipeline.csv`](connectus-migration-pipeline.csv). It models the workflow as a **single linear 16-step sequence**, strictly gated. The current step is always the first step that is not yet done. -State is **purely derived from row contents** — there is no separate "current step" pointer. Re-issuing any `set-*`, `markpass`, or `skip` for a step at-or-behind the current step writes the new value AND clears every step that follows it ("cascade reset"). The ONLY exception is `set-assignee`, which is administrative and never resets later steps. +State is **purely derived from row contents** — there is no separate "current step" pointer. Re-issuing any `set-*`, `markpass`, or `skip` for a step at-or-behind the current step writes the new value AND clears every step that follows it ("cascade reset"). Two carve-outs apply: + +- **`set-assignee`** never cascades (governed by the YAML flag `cascade_on_set: false`). +- **`reset-to` and `fail`** preserve any step tagged `preserve_on_reset: true` in [`workflow_state_config.yml`](workflow_state_config.yml). Today the three Params\* data columns (#3 `Params to Commands`, #4 `Params for test with default in code`, #5 `Params same in other handlers`) carry that flag — see Rule 8 below. ### The 16-Step Sequence @@ -155,12 +161,14 @@ State is **purely derived from row contents** — there is no separate "current 1. **Single linear sequence.** The current step is the first step not yet done. 2. **Strict ordering.** Any `set-*`/`markpass`/`skip` targeting a step **ahead** of the current step is rejected with a message naming the missing prerequisite. 3. **Cascade reset.** Re-issuing any `set-*`/`markpass`/`skip` at-or-behind current writes the new value AND clears every step after it. -4. **`set-assignee` carve-out.** `set-assignee` is the ONLY exception — it updates step #1 in place without cascading. Re-assigning an integration mid-flight does NOT wipe progress. +4. 
**`set-assignee` carve-out.** `set-assignee` (step #1) updates in place without cascading. Re-assigning an integration mid-flight does NOT wipe progress. Configured via `cascade_on_set: false` in [`workflow_state_config.yml`](workflow_state_config.yml). 5. **Optional step #5.** `Params same in other handlers` may be `skip`-ped; that writes the sentinel `"N/A"` and unblocks step #6. Setting it to a real JSON value later cascade-resets steps #6+. 6. **Flag step #12 → step #13 auto-N/A.** Setting `requires auth parity test` to `NO` or `N/A` automatically writes `"N/A"` into `auth parity test passes`. Setting it to `YES` leaves #13 empty so the user must `markpass` it. 7. **Normalization on read AND write.** Any value past the first incomplete step is auto-cleared (with a one-line stderr warning per affected row). Contradictions are not allowed to persist. -8. **`fail` and `reset-to`.** Both verbs clear the named step AND every step after it (the named step becomes the new current step). They have identical behavior; `reset-to` is the explicit name, `fail` reads as "this step failed, redo it". -9. **`reset` (no step).** Clears all 16 workflow columns for the integration. Identity columns (`Integration ID`, `Integration File Path`, `Connector ID`) are preserved. +8. **`fail` and `reset-to` honour `preserve_on_reset`.** Both verbs clear the named step AND every step after it (the named step becomes the new current step). They have identical behaviour. **EXCEPTION:** any step tagged `preserve_on_reset: true` in [`workflow_state_config.yml`](workflow_state_config.yml) keeps its value across these operations — its name is reported in the CLI output (`Preserved (preserve_on_reset=true): [...]`) and in the api response (`result["preserved"]`). Today the three Params\* data columns (#3, #4, #5) are preserved so a failed checkpoint does not wipe per-command param research. + - **Explicit-target carve-out:** if the user names a preserved step **directly** as the `reset-to`/`fail` target, that one step IS cleared (the user's intent wins), but later preserved steps in the same operation are still preserved. + - **`set-auth` is NOT covered by `preserve_on_reset`.** Auth changes invalidate every downstream artifact — `set-auth` continues to wipe steps #3-#16 (Params\* included) by design. See `apply_step_action` in [`connectus/workflow_state/state_machine.py`](workflow_state/state_machine.py). +9. **`reset` (no step).** Clears all 16 workflow columns for the integration. Identity columns (`Integration ID`, `Integration File Path`, `Connector ID`) are preserved. **`preserve_on_reset` is intentionally ignored** — `reset` is the "wipe the row" verb with no carve-outs. ### CLI Commands @@ -287,7 +295,7 @@ python3 connectus/workflow_state.py auth-params "Cisco Spark" --format=json The script exposes functions that can be imported and called directly: ```python -from connectus.workflow_state import ( +from workflow_state import ( get_integration_status, next_step_for, markpass_integration_step, diff --git a/connectus/auth_config_parser/DESIGN.md b/connectus/auth_config_parser/DESIGN.md new file mode 100644 index 00000000000..f1fbebfd2c5 --- /dev/null +++ b/connectus/auth_config_parser/DESIGN.md @@ -0,0 +1,748 @@ +# `auth_config_parser` — Package Design + +Standalone Python package that extracts, formalizes, and improves the Auth +Details Config parser currently embedded in +[`workflow_state.py`](../workflow_state.py). + +--- + +## 1. 
Motivation + +The Auth Details parsing/validation logic in +[`workflow_state.py`](../workflow_state.py:430) has grown to ~600 lines +spanning six functions and three regex constants. It is consumed by: + +1. **`workflow_state.py`** itself — the `set-auth` CLI setter and the + `auth-params` helper. +2. **`check_command_params.py`** — via + [`auth_param_ids()`](../workflow_state.py:940) for the overlap-rejection + ignore set. +3. **`check_auth_parity.py`** (planned) — will need structured access to + parsed `AuthDetails` objects, not raw dicts. + +Extracting this into a self-contained package provides: + +- **Typed data model** — dataclasses with proper type hints replace ad-hoc + dicts, enabling IDE autocompletion and `mypy` checking. +- **Separation of concerns** — parsing (raise on bad input) vs. validation + (return error lists) vs. utilities (param extraction) are cleanly split. +- **Testability** — pure functions with no CSV/filesystem dependencies. +- **Reusability** — `check_auth_parity.py` and future tools import from + one canonical package instead of reaching into `workflow_state`. + +--- + +## 2. Package Layout + +``` +connectus/auth_config_parser/ +├── __init__.py # Public API re-exports +├── types.py # Dataclasses, enums, custom exceptions +├── parser.py # Pure parsing functions +├── validator.py # Validation functions (return error lists) +├── utils.py # Utility functions (param extraction) +├── DESIGN.md # This file +└── tests/ + ├── __init__.py + ├── test_parser.py # Parser unit tests + ├── test_validator.py# Validation tests + └── test_utils.py # Utility tests +``` + +--- + +## 3. Module Specifications + +### 3.1 `types.py` — Data Model + +All public types live here. Pure Python, no external dependencies. + +```python +from __future__ import annotations + +import enum +from dataclasses import dataclass, field + + +class AuthType(enum.Enum): + """The 7 valid auth-type enum values for Auth Details entries.""" + OAuth2AuthCode = "OAuth2AuthCode" + OAuth2ClientCreds = "OAuth2ClientCreds" + OAuth2JWT = "OAuth2JWT" + APIKey = "APIKey" + Plain = "Plain" + Other = "Other" + NoneRequired = "NoneRequired" + + +class ClauseOperator(enum.Enum): + """Operators in the config expression mini-grammar.""" + REQUIRED = "REQUIRED" + OPTIONAL = "OPTIONAL" + CHOICE = "CHOICE" + + +@dataclass(frozen=True) +class AuthEntry: + """One entry in auth_types[]: a single UCP connection type. + + Attributes: + type: The auth-type enum value. + name: Free-form logical id (unique within the row). + xsoar_params: XSOAR field paths supplying secrets for this + connection type. Bare ids or dotted forms. + interpolated: When True, the value is templated at runtime + rather than supplied by the user. Defaults to False. + """ + type: AuthType + name: str + xsoar_params: list[str] + interpolated: bool = False + + +@dataclass(frozen=True) +class ConfigClause: + """One clause in a config expression. + + Attributes: + operator: REQUIRED, OPTIONAL, or CHOICE. + names: The connection-type names referenced by this clause. + """ + operator: ClauseOperator + names: list[str] + + +@dataclass(frozen=True) +class ConfigExpression: + """Parsed config expression. + + Attributes: + none_required: True when the expression is the literal + 'NoneRequired'. When True, clauses is empty. + clauses: Ordered list of parsed clauses. Empty when + none_required is True. 
+ """ + none_required: bool = False + clauses: list[ConfigClause] = field(default_factory=list) + + @property + def referenced_names(self) -> list[str]: + """All connection-type names referenced across all clauses, + in order, possibly with duplicates.""" + names: list[str] = [] + for clause in self.clauses: + names.extend(clause.names) + return names + + +@dataclass(frozen=True) +class AuthDetails: + """Fully parsed Auth Details JSON object. + + Attributes: + auth_types: List of auth entries, sorted by (type, name). + config: Parsed config expression. + other_connection: Sorted list of YML param ids for + connection-adjacent non-auth params. None when the key + is absent (legacy rows). + """ + auth_types: list[AuthEntry] + config: ConfigExpression + other_connection: list[str] | None = None + + @property + def auth_type_names(self) -> set[str]: + """Set of all auth_types[].name values.""" + return {e.name for e in self.auth_types} + + +class AuthConfigParseError(Exception): + """Raised by parser functions when input is structurally invalid. + + Attributes: + message: Human-readable description of the parse failure. + errors: List of individual error strings (may contain >1 for + multi-error reporting). + """ + def __init__(self, message: str, errors: list[str] | None = None): + super().__init__(message) + self.message = message + self.errors = errors or [message] +``` + +#### Design decisions + +| Decision | Rationale | +|----------|-----------| +| `AuthType` is a `str` enum | Enables `AuthType("APIKey")` construction from JSON values and `entry.type.value` for serialization. | +| `ConfigExpression.none_required` flag | Avoids a sentinel clause; `NoneRequired` is semantically distinct from an empty clause list. | +| `AuthDetails.other_connection` is `Optional` | Legacy CSV rows lack this key. `None` signals "not present" vs `[]` which means "present but empty". | +| All dataclasses are `frozen=True` | Parsed results are immutable value objects. | +| `AuthConfigParseError.errors` list | Mirrors the validator's multi-error pattern so callers can display all issues at once. | + +--- + +### 3.2 `parser.py` — Pure Parsing Functions + +Converts raw input (strings, dicts, JSON) into typed data model objects. +Raises `AuthConfigParseError` on invalid input. + +#### Public API + +```python +def parse_config(expr: str) -> ConfigExpression: + """Parse a config expression string into a ConfigExpression. + + Args: + expr: The config expression string, e.g. + 'REQUIRED(api_key) + OPTIONAL(oauth_creds)' + or 'NoneRequired'. + + Returns: + A ConfigExpression with parsed clauses. + + Raises: + AuthConfigParseError: If the expression is malformed. + + Examples: + >>> parse_config("NoneRequired") + ConfigExpression(none_required=True, clauses=[]) + + >>> parse_config("REQUIRED(api_key)") + ConfigExpression(none_required=False, clauses=[ + ConfigClause(operator=ClauseOperator.REQUIRED, names=["api_key"]) + ]) + + >>> parse_config("REQUIRED(creds) + OPTIONAL(oauth)") + ConfigExpression(none_required=False, clauses=[ + ConfigClause(operator=ClauseOperator.REQUIRED, names=["creds"]), + ConfigClause(operator=ClauseOperator.OPTIONAL, names=["oauth"]), + ]) + """ +``` + +```python +def parse_auth_details(data: str | dict) -> AuthDetails: + """Parse Auth Details from a JSON string or pre-parsed dict. + + Performs structural parsing only — converts raw JSON into typed + objects. Does NOT perform cross-referencing validation (e.g. + checking that config names match auth_types names). 
Use + validate_auth_details() for full validation. + + Args: + data: Either a JSON string or an already-parsed dict. + + Returns: + An AuthDetails object. + + Raises: + AuthConfigParseError: If the input is not valid JSON, not a + dict, or is missing required keys / has wrong types. + + Examples: + >>> details = parse_auth_details({ + ... "auth_types": [{"type": "APIKey", "name": "api_key", + ... "xsoar_params": ["api_key"]}], + ... "config": "REQUIRED(api_key)", + ... "other_connection": ["url", "proxy"] + ... }) + >>> details.auth_types[0].type + AuthType.APIKey + >>> details.config.clauses[0].operator + ClauseOperator.REQUIRED + """ +``` + +#### Internal helpers (private) + +| Function | Purpose | +|----------|---------| +| `_parse_config_impl(expr)` | Core config parsing; returns `(ConfigExpression, list[str])` — the parsed result and any errors. Extracted from current [`_parse_auth_config()`](../workflow_state.py:445). | +| `_parse_auth_entry(index, raw_dict)` | Parse one `auth_types[]` entry dict into an `AuthEntry`. | + +#### Regex constants (module-level, private) + +Moved from [`workflow_state.py`](../workflow_state.py:435): + +```python +_CLAUSE_RE = re.compile( + r"^\s*(REQUIRED|OPTIONAL|CHOICE)\s*\(\s*([^)]*?)\s*\)\s*$" +) +_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$") +_SPLIT_RE = re.compile(r"\s*\+\s*") +``` + +#### Relationship to current code + +| Current function | New location | Change | +|-----------------|--------------|--------| +| [`_parse_auth_config()`](../workflow_state.py:445) | `parser._parse_config_impl()` | Returns `ConfigExpression` + error list instead of `(names, errors)` tuple. The public `parse_config()` raises on errors. | +| JSON parsing in [`validate_auth_detail()`](../workflow_state.py:522) lines 572-578 | `parser.parse_auth_details()` | Structural parsing extracted; validation stays in `validator.py`. | + +--- + +### 3.3 `validator.py` — Validation Functions + +Returns error lists (empty = valid). Never raises. Matches the current +[`validate_auth_detail()`](../workflow_state.py:522) contract. + +#### Public API + +```python +def validate_config(expr: str) -> list[str]: + """Validate a config expression string. + + Returns a list of human-readable error strings. Empty list means + the expression is syntactically valid. + + This validates syntax only — it does NOT check that operand names + match any auth_types[].name. Use validate_auth_details() for + cross-referencing validation. + + Args: + expr: The config expression string. + + Returns: + List of error strings (empty = valid). + + Examples: + >>> validate_config("REQUIRED(api_key)") + [] + >>> validate_config("REQUIRED()") + ["clause 'REQUIRED(...)' has no operands"] + >>> validate_config("FOO(bar)") + ["malformed clause 'FOO(bar)' ..."] + """ +``` + +```python +def validate_auth_details(data: str | dict) -> list[str]: + """Validate Auth Details JSON shape. Returns list of errors. 
+ + Performs ALL validation currently done by workflow_state.py's + validate_auth_detail(), including: + + - JSON parsing + - Required keys: auth_types, config, other_connection + - auth_types[] entry shape (type enum, name uniqueness, + xsoar_params non-empty list of non-empty strings, interpolated + bool) + - auth_types[] sort order by (type, name) + - config expression syntax (via validate_config) + - config operand names cross-referenced against auth_types[].name + - NoneRequired ↔ empty auth_types coherence + - other_connection: list of non-empty unique sorted strings + + Args: + data: JSON string or pre-parsed dict. + + Returns: + List of error strings (empty = valid). + + Examples: + >>> validate_auth_details('{"auth_types":[],' + ... '"config":"NoneRequired","other_connection":[]}') + [] + """ +``` + +#### Relationship to current code + +| Current function | New location | Change | +|-----------------|--------------|--------| +| [`validate_auth_detail()`](../workflow_state.py:522) | `validator.validate_auth_details()` | Name pluralized for consistency. Accepts `str | dict`. Same error messages for backward compat. | + +--- + +### 3.4 `utils.py` — Utility Functions + +Pure functions that extract derived information from parsed `AuthDetails` +objects. No CSV/filesystem dependencies. + +#### Public API + +```python +def project_xsoar_param_to_yml_id(xsoar_param: str) -> str: + """Collapse a dotted XSOAR param path to its base YML param id. + + Bare ids pass through unchanged. Dotted forms like + 'credentials.identifier' collapse to the segment before the + first '.' ('credentials'). + + Args: + xsoar_param: An XSOAR field path string. + + Returns: + The base YML param id. + + Examples: + >>> project_xsoar_param_to_yml_id("api_key") + "api_key" + >>> project_xsoar_param_to_yml_id("credentials.identifier") + "credentials" + >>> project_xsoar_param_to_yml_id("credentials.password") + "credentials" + """ +``` + +```python +def auth_param_ids(details: AuthDetails) -> set[str]: + """Extract the set of YML param ids from an AuthDetails object. + + Returns the deduplicated set of bare YML configuration[].name + values composed from: + + - Every auth_types[].xsoar_params entry, projected via + project_xsoar_param_to_yml_id(). + - Every entry in other_connection (already bare YML ids). + + Args: + details: A parsed AuthDetails object. + + Returns: + Set of YML param id strings. + + Examples: + >>> details = parse_auth_details({ + ... "auth_types": [{"type": "Plain", "name": "creds", + ... "xsoar_params": ["credentials.identifier", + ... "credentials.password"]}], + ... "config": "REQUIRED(creds)", + ... "other_connection": ["url", "proxy"] + ... }) + >>> auth_param_ids(details) + {"credentials", "url", "proxy"} + """ +``` + +```python +def auth_param_ids_with_sources( + details: AuthDetails, +) -> dict[str, list[str]]: + """Extract YML param ids with source attribution. + + Returns a dict mapping each YML param id to a list of + human-readable source descriptions indicating where the param + was declared. + + Args: + details: A parsed AuthDetails object. + + Returns: + Dict of {yml_param_id: [source_description, ...]}. 
+ + Examples: + >>> sources = auth_param_ids_with_sources(details) + >>> sources["credentials"] + ["auth_types[].name='creds' (xsoar_params=['credentials.identifier', 'credentials.password'])"] + >>> sources["url"] + ["other_connection"] + """ +``` + +#### Relationship to current code + +| Current function | New location | Change | +|-----------------|--------------|--------| +| [`_project_xsoar_param_to_yml_id()`](../workflow_state.py:871) | `utils.project_xsoar_param_to_yml_id()` | Made public. Same logic. | +| [`_auth_param_sources()`](../workflow_state.py:885) | `utils.auth_param_ids_with_sources()` | Accepts `AuthDetails` instead of raw dict. Same source-descriptor format. | +| [`auth_param_ids()`](../workflow_state.py:940) (CSV-coupled) | `utils.auth_param_ids()` | Decoupled from CSV. Accepts `AuthDetails` object. Returns `set[str]` instead of `list[str]` (callers that need sorted output call `sorted()`). | + +**Key design change:** The current [`auth_param_ids()`](../workflow_state.py:940) +in `workflow_state.py` loads the CSV, finds the row, parses JSON, and +handles legacy rows. The new `utils.auth_param_ids()` is a pure function +that operates on an already-parsed `AuthDetails` object. The CSV-loading +wrapper remains in `workflow_state.py` and delegates to this package. + +--- + +### 3.5 `__init__.py` — Public API + +Re-exports all public symbols for convenient importing: + +```python +from auth_config_parser.types import ( + AuthConfigParseError, + AuthDetails, + AuthEntry, + AuthType, + ClauseOperator, + ConfigClause, + ConfigExpression, +) +from auth_config_parser.parser import ( + parse_auth_details, + parse_config, +) +from auth_config_parser.validator import ( + validate_auth_details, + validate_config, +) +from auth_config_parser.utils import ( + auth_param_ids, + auth_param_ids_with_sources, + project_xsoar_param_to_yml_id, +) + +__all__ = [ + # Types + "AuthConfigParseError", + "AuthDetails", + "AuthEntry", + "AuthType", + "ClauseOperator", + "ConfigClause", + "ConfigExpression", + # Parsing + "parse_auth_details", + "parse_config", + # Validation + "validate_auth_details", + "validate_config", + # Utilities + "auth_param_ids", + "auth_param_ids_with_sources", + "project_xsoar_param_to_yml_id", +] +``` + +--- + +## 4. Data Flow + +```mermaid +flowchart TD + A[JSON string or dict] --> B{parse_auth_details} + B -->|valid structure| C[AuthDetails object] + B -->|invalid| D[AuthConfigParseError] + + A --> E{validate_auth_details} + E --> F[list of error strings] + + C --> G{auth_param_ids} + G --> H[set of YML param ids] + + C --> I{auth_param_ids_with_sources} + I --> J[dict: param_id to source list] + + K[config expression string] --> L{parse_config} + L -->|valid| M[ConfigExpression] + L -->|invalid| D + + K --> N{validate_config} + N --> O[list of error strings] +``` + +--- + +## 5. Backward Compatibility Plan + +### 5.1 `workflow_state.py` migration + +After the package is implemented, `workflow_state.py` will be updated to +import from it. 
The migration is mechanical: + +```python +# Before (inline): +from workflow_state import VALID_AUTH_TYPES, validate_auth_detail + +# After (delegating): +from auth_config_parser import ( + AuthType, + validate_auth_details, + auth_param_ids as _auth_param_ids_pure, + auth_param_ids_with_sources as _auth_param_sources_pure, + parse_auth_details, + project_xsoar_param_to_yml_id, +) +``` + +The `VALID_AUTH_TYPES` set constant in `workflow_state.py` becomes: + +```python +VALID_AUTH_TYPES = {t.value for t in AuthType} +``` + +The CSV-coupled [`auth_param_ids(integration_id)`](../workflow_state.py:940) +wrapper stays in `workflow_state.py` but delegates to the pure function: + +```python +def auth_param_ids(integration_id: str) -> list[str]: + # ... CSV loading, row lookup, JSON parsing, legacy handling ... + details = parse_auth_details(parsed) + return sorted(_auth_param_ids_pure(details)) +``` + +Similarly, [`validate_auth_detail(value)`](../workflow_state.py:522) +becomes a thin wrapper: + +```python +def validate_auth_detail(value: str) -> list[str]: + return validate_auth_details(value) +``` + +### 5.2 Error message compatibility + +The validator MUST produce identical error message strings to the current +implementation. Tests in +[`workflow_state_test.py`](../workflow_state_test.py:1128) assert on +specific substrings like: + +- `"Invalid JSON"` +- `"Missing required keys"` +- `"invalid type 'INVALID'"` +- `"must be sorted by (type, name)"` +- `"must contain at least one entry"` +- `"unknown connection-type name"` +- `"no operands"` +- `"ends with '+'"` +- `"malformed clause"` +- `"'config' is 'NoneRequired' but 'auth_types' contains entries"` +- `"must be a list"` +- `"must be sorted ascending"` +- `"duplicate"` + +All of these must be preserved verbatim. + +### 5.3 `validate_auth_detail` vs `validate_auth_details` naming + +The current function is singular (`validate_auth_detail`). The new package +uses plural (`validate_auth_details`) for grammatical consistency. The +wrapper in `workflow_state.py` keeps the old name for backward compat. + +--- + +## 6. Test Plan + +### 6.1 `tests/test_parser.py` + +Tests for `parse_config()` and `parse_auth_details()`. 
+ +| Test | Description | +|------|-------------| +| `test_parse_config_none_required` | `"NoneRequired"` → `ConfigExpression(none_required=True)` | +| `test_parse_config_single_required` | `"REQUIRED(api_key)"` → one clause | +| `test_parse_config_single_optional` | `"OPTIONAL(oauth)"` → one clause | +| `test_parse_config_single_choice` | `"CHOICE(a, b)"` → one clause with two names | +| `test_parse_config_multi_clause` | `"REQUIRED(a) + OPTIONAL(b)"` → two clauses | +| `test_parse_config_whitespace_tolerance` | Extra spaces around `+`, `,`, parens | +| `test_parse_config_empty_raises` | Empty string → `AuthConfigParseError` | +| `test_parse_config_leading_plus_raises` | `"+ REQUIRED(a)"` → error | +| `test_parse_config_trailing_plus_raises` | `"REQUIRED(a) +"` → error | +| `test_parse_config_empty_operands_raises` | `"REQUIRED()"` → error | +| `test_parse_config_bad_keyword_raises` | `"FOO(a)"` → error | +| `test_parse_config_bad_operand_name_raises` | `"REQUIRED(123bad)"` → error | +| `test_parse_config_stray_comma_raises` | `"REQUIRED(a,,b)"` → error | +| `test_parse_config_referenced_names` | Verify `ConfigExpression.referenced_names` property | +| `test_parse_auth_details_valid_simple` | Full valid JSON → `AuthDetails` with correct types | +| `test_parse_auth_details_valid_none_required` | NoneRequired variant | +| `test_parse_auth_details_from_dict` | Accepts pre-parsed dict | +| `test_parse_auth_details_from_string` | Accepts JSON string | +| `test_parse_auth_details_invalid_json_raises` | Bad JSON → `AuthConfigParseError` | +| `test_parse_auth_details_not_dict_raises` | JSON array → error | +| `test_parse_auth_details_missing_keys_raises` | Missing `config` → error | +| `test_parse_auth_details_invalid_auth_type_raises` | Unknown type enum → error | +| `test_parse_auth_details_interpolated_default_false` | Missing `interpolated` key defaults to `False` | +| `test_parse_auth_details_interpolated_true` | `"interpolated": true` → `AuthEntry.interpolated == True` | +| `test_parse_auth_details_legacy_no_other_connection` | Missing `other_connection` → `AuthDetails.other_connection is None` | + +### 6.2 `tests/test_validator.py` + +Tests for `validate_auth_details()` and `validate_config()`. Mirrors +existing [`TestValidateAuthDetail`](../workflow_state_test.py:1128) with +identical assertions. 
+ +| Test | Description | +|------|-------------| +| `test_valid_simple` | Valid JSON → `[]` | +| `test_valid_none_required` | NoneRequired → `[]` | +| `test_invalid_json` | `"not json"` → `["Invalid JSON: ..."]` | +| `test_missing_keys` | Missing `config` + `other_connection` → error | +| `test_invalid_auth_type` | `"INVALID"` type → error | +| `test_all_valid_auth_types` | Each of the 7 types passes | +| `test_valid_two_clause_config` | `REQUIRED + OPTIONAL` → `[]` | +| `test_valid_choice` | `CHOICE(a, b)` → `[]` | +| `test_config_unknown_name` | Operand not in auth_types → error | +| `test_config_empty_required` | `REQUIRED()` → error | +| `test_config_trailing_plus` | Trailing `+` → error | +| `test_config_unknown_keyword` | `FOO(x)` → error | +| `test_config_missing_parens` | `REQUIRED api_key` → error | +| `test_none_required_with_entries` | NoneRequired + non-empty auth_types → error | +| `test_non_none_required_empty_types` | Config refs but empty auth_types → error | +| `test_sort_order_violation` | Out-of-order entries → error with pair names | +| `test_sort_order_same_type_by_name` | Same type, names out of order → error | +| `test_empty_xsoar_params` | `[]` xsoar_params → error | +| `test_duplicate_name` | Two entries with same name → error | +| `test_other_connection_valid` | Sorted list → `[]` | +| `test_other_connection_empty_list` | `[]` → `[]` | +| `test_other_connection_missing_key` | Missing key → error | +| `test_other_connection_not_list` | String instead of list → error | +| `test_other_connection_non_string` | `[42]` → error | +| `test_other_connection_empty_string` | `[""]` → error | +| `test_other_connection_duplicates` | `["url", "url"]` → error | +| `test_other_connection_unsorted` | `["url", "proxy"]` → error with suggestion | +| `test_validate_config_standalone` | `validate_config()` works independently | + +### 6.3 `tests/test_utils.py` + +Tests for utility functions. + +| Test | Description | +|------|-------------| +| `test_project_bare_id` | `"api_key"` → `"api_key"` | +| `test_project_dotted_identifier` | `"credentials.identifier"` → `"credentials"` | +| `test_project_dotted_password` | `"credentials.password"` → `"credentials"` | +| `test_project_empty_string` | `""` → `""` | +| `test_project_non_string` | Non-string input → `""` | +| `test_auth_param_ids_mixed` | APIKey + Plain + other_connection → correct union | +| `test_auth_param_ids_deduped` | Dotted forms collapsing to same id → single entry | +| `test_auth_param_ids_none_required` | NoneRequired + other_connection → only other_connection | +| `test_auth_param_ids_no_other_connection` | Legacy `None` → only auth_types ids | +| `test_auth_param_ids_empty` | NoneRequired + no other_connection → empty set | +| `test_auth_param_ids_with_sources_mixed` | Correct source descriptors for each param | +| `test_auth_param_ids_with_sources_dotted_dedup` | Two dotted forms → one descriptor per entry | +| `test_auth_param_ids_with_sources_other_connection` | other_connection items → `"other_connection"` source | + +--- + +## 7. Implementation Sequence + +The implementation should proceed in this order to maintain a green test +suite at each step: + +1. **Create `types.py`** — all dataclasses, enums, and the custom + exception. No dependencies on other modules. + +2. **Create `parser.py`** — implement `parse_config()` and + `parse_auth_details()`. Port the regex constants and + `_parse_auth_config()` logic. + +3. **Create `validator.py`** — implement `validate_auth_details()` and + `validate_config()`. 
Port the validation logic from + `validate_auth_detail()`, delegating config parsing to `parser.py`. + +4. **Create `utils.py`** — implement the three utility functions. Port + from `_project_xsoar_param_to_yml_id()` and `_auth_param_sources()`. + +5. **Create `__init__.py`** — wire up all re-exports. + +6. **Create `tests/`** — port and expand tests from + `workflow_state_test.py`. + +7. **Update `workflow_state.py`** — replace inline implementations with + imports from `auth_config_parser`. Keep the CSV-coupled wrappers. + +--- + +## 8. Design Constraints Checklist + +- [x] Pure Python, no external dependencies beyond stdlib +- [x] Python 3.9+ compatible (`from __future__ import annotations`) +- [x] All public types are frozen dataclasses with type hints +- [x] Parser raises `AuthConfigParseError` on invalid input +- [x] Validator returns error lists (never raises) +- [x] Backward-compatible error messages +- [x] `workflow_state.py` can import from this package +- [x] Docstrings with examples on all public functions +- [x] No CSV/filesystem dependencies in the package diff --git a/connectus/auth_config_parser/__init__.py b/connectus/auth_config_parser/__init__.py new file mode 100644 index 00000000000..4b60d5c0a64 --- /dev/null +++ b/connectus/auth_config_parser/__init__.py @@ -0,0 +1,60 @@ +"""auth_config_parser — Standalone Auth Details Config parser package. + +Extracts and formalizes the Auth Details Config parser previously +embedded in ``workflow_state.py``. Provides typed data models, +pure parsing functions, validation, and utility helpers. + +Usage:: + + from auth_config_parser import parse_config, parse_auth_details, AuthDetails + from auth_config_parser import validate_auth_details, auth_param_ids +""" +from __future__ import annotations + +from auth_config_parser.exceptions import ( + AuthConfigParseError, + AuthConfigValidationError, +) +from auth_config_parser.parser import ( + parse_auth_details, + parse_config, +) +from auth_config_parser.types import ( + AuthDetails, + AuthEntry, + AuthType, + ClauseOperator, + ConfigClause, + ConfigExpression, +) +from auth_config_parser.utils import ( + auth_param_ids, + auth_param_ids_with_sources, + project_xsoar_param_to_yml_id, +) +from auth_config_parser.validator import ( + validate_auth_details, + validate_config, +) + +__all__ = [ + # Types + "AuthConfigParseError", + "AuthConfigValidationError", + "AuthDetails", + "AuthEntry", + "AuthType", + "ClauseOperator", + "ConfigClause", + "ConfigExpression", + # Parsing + "parse_auth_details", + "parse_config", + # Validation + "validate_auth_details", + "validate_config", + # Utilities + "auth_param_ids", + "auth_param_ids_with_sources", + "project_xsoar_param_to_yml_id", +] diff --git a/connectus/auth_config_parser/exceptions.py b/connectus/auth_config_parser/exceptions.py new file mode 100644 index 00000000000..b0ed7072079 --- /dev/null +++ b/connectus/auth_config_parser/exceptions.py @@ -0,0 +1,49 @@ +"""Custom exceptions for the auth_config_parser package.""" +from __future__ import annotations + + +class AuthConfigParseError(Exception): + """Raised by parser functions when input is structurally invalid. + + Attributes: + message: Human-readable description of the parse failure. + errors: List of individual error strings (may contain >1 for + multi-error reporting). + + Examples: + >>> raise AuthConfigParseError("config expression is empty") + Traceback (most recent call last): + ... 
+ auth_config_parser.exceptions.AuthConfigParseError: config expression is empty + + >>> raise AuthConfigParseError( + ... "multiple errors", + ... errors=["error 1", "error 2"], + ... ) + Traceback (most recent call last): + ... + auth_config_parser.exceptions.AuthConfigParseError: multiple errors + """ + + def __init__(self, message: str, errors: list[str] | None = None) -> None: + super().__init__(message) + self.message = message + self.errors = errors or [message] + + +class AuthConfigValidationError(Exception): + """Raised when validation of auth config data fails. + + Unlike :class:`AuthConfigParseError`, this is used for semantic + validation failures (e.g. cross-referencing errors) rather than + structural parse failures. + + Attributes: + message: Human-readable description of the validation failure. + errors: List of individual error strings. + """ + + def __init__(self, message: str, errors: list[str] | None = None) -> None: + super().__init__(message) + self.message = message + self.errors = errors or [message] diff --git a/connectus/auth_config_parser/parser.py b/connectus/auth_config_parser/parser.py new file mode 100644 index 00000000000..9ef69c986e4 --- /dev/null +++ b/connectus/auth_config_parser/parser.py @@ -0,0 +1,359 @@ +"""Pure parsing functions for the auth_config_parser package. + +Converts raw input (strings, dicts, JSON) into typed data model objects. +Raises :class:`~auth_config_parser.exceptions.AuthConfigParseError` on +invalid input. +""" +from __future__ import annotations + +import json +import re + +from auth_config_parser.exceptions import AuthConfigParseError +from auth_config_parser.types import ( + AuthDetails, + AuthEntry, + AuthType, + ClauseOperator, + ConfigClause, + ConfigExpression, +) + +# --------------------------------------------------------------------------- +# Regex constants (ported from workflow_state.py) +# --------------------------------------------------------------------------- + +_CLAUSE_RE = re.compile( + r"^\s*(REQUIRED|OPTIONAL|CHOICE)\s*\(\s*([^)]*?)\s*\)\s*$" +) +_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$") +_SPLIT_RE = re.compile(r"\s*\+\s*") + +# Valid auth type string values (for fast membership check during parsing). +_VALID_AUTH_TYPE_VALUES = {t.value for t in AuthType} + + +# --------------------------------------------------------------------------- +# Internal helpers +# --------------------------------------------------------------------------- + +def _parse_config_impl(config: str) -> tuple[ConfigExpression, list[str]]: + """Core config parsing implementation. + + Returns ``(ConfigExpression, parse_errors)`` where ``parse_errors`` + is a list of human-readable issues with the expression. + + This is the internal workhorse extracted from + ``workflow_state._parse_auth_config()``. The public + :func:`parse_config` raises on errors; the validator calls this + directly to collect errors without raising. + """ + parse_errors: list[str] = [] + clauses: list[ConfigClause] = [] + + stripped = config.strip() + if stripped == "": + parse_errors.append("config expression is empty") + return ConfigExpression(), parse_errors + if stripped == "NoneRequired": + return ConfigExpression(none_required=True), parse_errors + + # Detect leading/trailing `+` before splitting. 
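+    # The positional checks run on the raw string so the error can name the
+    # leading/trailing case explicitly; the segment loop below recognises
+    # those same empty segments and does not report them a second time.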
+ if stripped.startswith("+"): + parse_errors.append( + "config expression starts with '+' (no leading clause)" + ) + if stripped.endswith("+"): + parse_errors.append( + "config expression ends with '+' (no trailing clause)" + ) + + segments = _SPLIT_RE.split(stripped) + for seg_idx, segment in enumerate(segments): + if segment.strip() == "": + # Already covered by the leading/trailing checks above OR a + # genuine "+ +" in the middle. + if not (seg_idx == 0 and stripped.startswith("+")) and not ( + seg_idx == len(segments) - 1 and stripped.endswith("+") + ): + parse_errors.append("empty clause between '+' separators") + continue + m = _CLAUSE_RE.match(segment) + if not m: + parse_errors.append( + f"malformed clause '{segment}' (expected " + "REQUIRED(...), OPTIONAL(...), or CHOICE(...))" + ) + continue + keyword, inner = m.group(1), m.group(2) + if inner.strip() == "": + parse_errors.append(f"clause '{keyword}(...)' has no operands") + continue + operands = [op.strip() for op in inner.split(",")] + clause_names: list[str] = [] + for op in operands: + if op == "": + parse_errors.append( + f"clause '{keyword}(...)' has an empty operand " + "(stray comma?)" + ) + continue + if not _NAME_RE.fullmatch(op): + parse_errors.append( + f"clause '{keyword}(...)' operand '{op}' is not a " + "valid identifier (must match [A-Za-z_][A-Za-z0-9_]*)" + ) + continue + clause_names.append(op) + if clause_names: + clauses.append( + ConfigClause( + operator=ClauseOperator(keyword), + names=clause_names, + ) + ) + + return ConfigExpression(none_required=False, clauses=clauses), parse_errors + + +def _parse_auth_entry(index: int, raw_dict: dict) -> tuple[AuthEntry | None, list[str]]: + """Parse one ``auth_types[]`` entry dict into an :class:`AuthEntry`. + + Returns ``(entry_or_none, errors)``. If the entry is structurally + invalid, ``entry_or_none`` is ``None`` and ``errors`` describes the + problems. 
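+
+    ``parse_auth_details()`` collects these per-entry error lists and raises
+    a single ``AuthConfigParseError`` covering every invalid entry.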
+ """ + errors: list[str] = [] + + if not isinstance(raw_dict, dict): + errors.append( + f"auth_types[{index}]: expected object, got " + f"{type(raw_dict).__name__}" + ) + return None, errors + + # --- type --- + entry_type: AuthType | None = None + if "type" not in raw_dict: + errors.append(f"auth_types[{index}]: missing 'type'") + elif raw_dict["type"] not in _VALID_AUTH_TYPE_VALUES: + errors.append(f"auth_types[{index}]: invalid type '{raw_dict['type']}'") + else: + entry_type = AuthType(raw_dict["type"]) + + # --- name --- + entry_name: str | None = None + if "name" not in raw_dict: + errors.append(f"auth_types[{index}]: missing 'name'") + elif not isinstance(raw_dict["name"], str): + errors.append(f"auth_types[{index}]: 'name' must be a string") + elif not raw_dict["name"]: + errors.append(f"auth_types[{index}]: 'name' must be a non-empty string") + else: + entry_name = raw_dict["name"] + + # --- xsoar_params --- + xsoar_params: list[str] | None = None + if "xsoar_params" not in raw_dict: + errors.append(f"auth_types[{index}]: missing 'xsoar_params'") + elif not isinstance(raw_dict["xsoar_params"], list): + errors.append( + f"auth_types[{index}]: 'xsoar_params' must be a list, " + f"got {type(raw_dict['xsoar_params']).__name__}" + ) + elif len(raw_dict["xsoar_params"]) == 0: + errors.append( + f"auth_types[{index}]: 'xsoar_params' must contain at least " + "one entry" + ) + else: + xsoar_params = [] + for j, p in enumerate(raw_dict["xsoar_params"]): + if not isinstance(p, str) or not p: + errors.append( + f"auth_types[{index}]: xsoar_params[{j}] must be a " + "non-empty string" + ) + else: + xsoar_params.append(p) + + # --- interpolated (optional, defaults to False) --- + interpolated = False + if "interpolated" in raw_dict: + if not isinstance(raw_dict["interpolated"], bool): + errors.append( + f"auth_types[{index}]: 'interpolated' must be a bool, " + f"got {type(raw_dict['interpolated']).__name__}" + ) + else: + interpolated = raw_dict["interpolated"] + + if errors: + return None, errors + + # All fields validated — safe to construct. + assert entry_type is not None + assert entry_name is not None + assert xsoar_params is not None + return AuthEntry( + type=entry_type, + name=entry_name, + xsoar_params=xsoar_params, + interpolated=interpolated, + ), errors + + +# --------------------------------------------------------------------------- +# Public API +# --------------------------------------------------------------------------- + +def parse_config(expr: str) -> ConfigExpression: + """Parse a config expression string into a ConfigExpression. + + Args: + expr: The config expression string, e.g. + ``'REQUIRED(api_key) + OPTIONAL(oauth_creds)'`` + or ``'NoneRequired'``. + + Returns: + A :class:`~auth_config_parser.types.ConfigExpression` with + parsed clauses. + + Raises: + AuthConfigParseError: If the expression is malformed. 
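+            The exception's ``errors`` attribute carries each individual
+            issue found in the expression.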
+ + Examples: + >>> parse_config("NoneRequired") + ConfigExpression(none_required=True, clauses=[]) + + >>> parse_config("REQUIRED(api_key)") + ConfigExpression(none_required=False, clauses=[ConfigClause(operator=, names=['api_key'])]) + + >>> parse_config("REQUIRED(creds) + OPTIONAL(oauth)") + ConfigExpression(none_required=False, clauses=[ConfigClause(operator=, names=['creds']), ConfigClause(operator=, names=['oauth'])]) + """ + result, errors = _parse_config_impl(expr) + if errors: + raise AuthConfigParseError( + f"config parse errors: {'; '.join(errors)}", + errors=errors, + ) + return result + + +def parse_auth_details(data: str | dict) -> AuthDetails: + """Parse Auth Details from a JSON string or pre-parsed dict. + + Performs structural parsing only — converts raw JSON into typed + objects. Does NOT perform cross-referencing validation (e.g. + checking that config names match auth_types names). Use + :func:`~auth_config_parser.validator.validate_auth_details` for + full validation. + + Args: + data: Either a JSON string or an already-parsed dict. + + Returns: + An :class:`~auth_config_parser.types.AuthDetails` object. + + Raises: + AuthConfigParseError: If the input is not valid JSON, not a + dict, or is missing required keys / has wrong types. + + Examples: + >>> details = parse_auth_details({ + ... "auth_types": [{"type": "APIKey", "name": "api_key", + ... "xsoar_params": ["api_key"]}], + ... "config": "REQUIRED(api_key)", + ... "other_connection": ["url", "proxy"], + ... }) + >>> details.auth_types[0].type + + >>> details.config.clauses[0].operator + + """ + errors: list[str] = [] + + # --- Parse JSON if string --- + if isinstance(data, str): + try: + data = json.loads(data) + except json.JSONDecodeError as e: + raise AuthConfigParseError(f"Invalid JSON: {e}") from e + + if not isinstance(data, dict): + raise AuthConfigParseError( + f"Expected a JSON object, got {type(data).__name__}" + ) + + # --- Check required keys (auth_types and config are always required; + # other_connection is optional for legacy compat) --- + required_keys = {"auth_types", "config"} + missing = required_keys - set(data.keys()) + if missing: + raise AuthConfigParseError( + f"Missing required keys: {', '.join(sorted(missing))}" + ) + + # --- Parse auth_types --- + auth_entries: list[AuthEntry] = [] + if not isinstance(data["auth_types"], list): + raise AuthConfigParseError( + f"'auth_types' must be a list, got " + f"{type(data['auth_types']).__name__}" + ) + + for i, raw_entry in enumerate(data["auth_types"]): + entry, entry_errors = _parse_auth_entry(i, raw_entry) + if entry_errors: + errors.extend(entry_errors) + if entry is not None: + auth_entries.append(entry) + + # --- Parse config --- + if not isinstance(data["config"], str): + errors.append( + f"'config' must be a string, got " + f"{type(data['config']).__name__}" + ) + config_expr = ConfigExpression() + else: + config_expr, config_errors = _parse_config_impl(data["config"]) + for ce in config_errors: + errors.append(f"'config': {ce}") + + # --- Parse other_connection (optional) --- + other_connection: list[str] | None = None + if "other_connection" in data: + oc = data["other_connection"] + if not isinstance(oc, list): + errors.append( + f"'other_connection' must be a list, got " + f"{type(oc).__name__}" + ) + else: + other_connection = [] + for j, item in enumerate(oc): + if not isinstance(item, str): + errors.append( + f"'other_connection'[{j}]: must be a string, got " + f"{type(item).__name__}" + ) + elif not item: + errors.append( + 
f"'other_connection'[{j}]: must be a non-empty string" + ) + else: + other_connection.append(item) + + if errors: + raise AuthConfigParseError( + f"auth details parse errors: {'; '.join(errors)}", + errors=errors, + ) + + return AuthDetails( + auth_types=auth_entries, + config=config_expr, + other_connection=other_connection, + ) diff --git a/connectus/auth_config_parser/tests/__init__.py b/connectus/auth_config_parser/tests/__init__.py new file mode 100644 index 00000000000..0d001e5a212 --- /dev/null +++ b/connectus/auth_config_parser/tests/__init__.py @@ -0,0 +1 @@ +"""Tests for the auth_config_parser package.""" diff --git a/connectus/auth_config_parser/tests/test_parser.py b/connectus/auth_config_parser/tests/test_parser.py new file mode 100644 index 00000000000..efd59f9593f --- /dev/null +++ b/connectus/auth_config_parser/tests/test_parser.py @@ -0,0 +1,365 @@ +"""Tests for auth_config_parser.parser — parse_config() and parse_auth_details().""" +from __future__ import annotations + +import json + +import pytest + +from auth_config_parser import ( + AuthConfigParseError, + AuthDetails, + AuthEntry, + AuthType, + ClauseOperator, + ConfigClause, + ConfigExpression, + parse_auth_details, + parse_config, +) + + +# --------------------------------------------------------------------------- +# parse_config() tests +# --------------------------------------------------------------------------- + + +class TestParseConfig: + def test_parse_config_none_required(self) -> None: + result = parse_config("NoneRequired") + assert result == ConfigExpression(none_required=True, clauses=[]) + assert result.none_required is True + assert result.clauses == [] + assert result.referenced_names == [] + + def test_parse_config_single_required(self) -> None: + result = parse_config("REQUIRED(api_key)") + assert result.none_required is False + assert len(result.clauses) == 1 + assert result.clauses[0].operator == ClauseOperator.REQUIRED + assert result.clauses[0].names == ["api_key"] + + def test_parse_config_single_optional(self) -> None: + result = parse_config("OPTIONAL(oauth)") + assert result.none_required is False + assert len(result.clauses) == 1 + assert result.clauses[0].operator == ClauseOperator.OPTIONAL + assert result.clauses[0].names == ["oauth"] + + def test_parse_config_single_choice(self) -> None: + result = parse_config("CHOICE(a, b)") + assert result.none_required is False + assert len(result.clauses) == 1 + assert result.clauses[0].operator == ClauseOperator.CHOICE + assert result.clauses[0].names == ["a", "b"] + + def test_parse_config_multi_clause(self) -> None: + result = parse_config("REQUIRED(a) + OPTIONAL(b)") + assert result.none_required is False + assert len(result.clauses) == 2 + assert result.clauses[0] == ConfigClause( + operator=ClauseOperator.REQUIRED, names=["a"] + ) + assert result.clauses[1] == ConfigClause( + operator=ClauseOperator.OPTIONAL, names=["b"] + ) + + def test_parse_config_whitespace_tolerance(self) -> None: + # Extra spaces around +, commas, and parens. 
+ result = parse_config(" REQUIRED( api_key ) + OPTIONAL( oauth ) ") + assert len(result.clauses) == 2 + assert result.clauses[0].names == ["api_key"] + assert result.clauses[1].names == ["oauth"] + + def test_parse_config_empty_raises(self) -> None: + with pytest.raises(AuthConfigParseError) as exc_info: + parse_config("") + assert "config expression is empty" in exc_info.value.errors[0] + + def test_parse_config_whitespace_only_raises(self) -> None: + with pytest.raises(AuthConfigParseError) as exc_info: + parse_config(" ") + assert "config expression is empty" in exc_info.value.errors[0] + + def test_parse_config_leading_plus_raises(self) -> None: + with pytest.raises(AuthConfigParseError) as exc_info: + parse_config("+ REQUIRED(a)") + assert any( + "starts with '+'" in e for e in exc_info.value.errors + ) + + def test_parse_config_trailing_plus_raises(self) -> None: + with pytest.raises(AuthConfigParseError) as exc_info: + parse_config("REQUIRED(a) +") + assert any( + "ends with '+'" in e for e in exc_info.value.errors + ) + + def test_parse_config_empty_operands_raises(self) -> None: + with pytest.raises(AuthConfigParseError) as exc_info: + parse_config("REQUIRED()") + assert any( + "no operands" in e for e in exc_info.value.errors + ) + + def test_parse_config_bad_keyword_raises(self) -> None: + with pytest.raises(AuthConfigParseError) as exc_info: + parse_config("FOO(a)") + assert any( + "malformed clause" in e for e in exc_info.value.errors + ) + + def test_parse_config_bad_operand_name_raises(self) -> None: + with pytest.raises(AuthConfigParseError) as exc_info: + parse_config("REQUIRED(123bad)") + assert any( + "not a valid identifier" in e for e in exc_info.value.errors + ) + + def test_parse_config_stray_comma_raises(self) -> None: + with pytest.raises(AuthConfigParseError) as exc_info: + parse_config("REQUIRED(a,,b)") + assert any( + "empty operand" in e for e in exc_info.value.errors + ) + + def test_parse_config_referenced_names(self) -> None: + result = parse_config("REQUIRED(a) + CHOICE(b, c) + OPTIONAL(d)") + assert result.referenced_names == ["a", "b", "c", "d"] + + def test_parse_config_referenced_names_with_duplicates(self) -> None: + result = parse_config("REQUIRED(a) + OPTIONAL(a)") + assert result.referenced_names == ["a", "a"] + + def test_parse_config_missing_parens_raises(self) -> None: + with pytest.raises(AuthConfigParseError) as exc_info: + parse_config("REQUIRED api_key") + assert any( + "malformed clause" in e for e in exc_info.value.errors + ) + + def test_parse_config_case_sensitive_keywords(self) -> None: + # Lowercase keywords should fail. 
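+        # _CLAUSE_RE only matches the uppercase keywords, so 'required(a)'
+        # surfaces as a generic "malformed clause" error.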
+ with pytest.raises(AuthConfigParseError) as exc_info: + parse_config("required(a)") + assert any( + "malformed clause" in e for e in exc_info.value.errors + ) + + +# --------------------------------------------------------------------------- +# parse_auth_details() tests +# --------------------------------------------------------------------------- + + +class TestParseAuthDetails: + def test_parse_auth_details_valid_simple(self) -> None: + details = parse_auth_details({ + "auth_types": [ + { + "type": "APIKey", + "name": "api_key", + "xsoar_params": ["api_key"], + } + ], + "config": "REQUIRED(api_key)", + "other_connection": ["proxy", "url"], + }) + assert isinstance(details, AuthDetails) + assert len(details.auth_types) == 1 + assert details.auth_types[0].type == AuthType.APIKey + assert details.auth_types[0].name == "api_key" + assert details.auth_types[0].xsoar_params == ["api_key"] + assert details.auth_types[0].interpolated is False + assert details.config.none_required is False + assert len(details.config.clauses) == 1 + assert details.config.clauses[0].operator == ClauseOperator.REQUIRED + assert details.other_connection == ["proxy", "url"] + + def test_parse_auth_details_valid_none_required(self) -> None: + details = parse_auth_details({ + "auth_types": [], + "config": "NoneRequired", + "other_connection": [], + }) + assert details.auth_types == [] + assert details.config.none_required is True + assert details.other_connection == [] + + def test_parse_auth_details_from_dict(self) -> None: + data = { + "auth_types": [ + {"type": "APIKey", "name": "x", "xsoar_params": ["p"]} + ], + "config": "REQUIRED(x)", + "other_connection": [], + } + details = parse_auth_details(data) + assert details.auth_types[0].name == "x" + + def test_parse_auth_details_from_string(self) -> None: + data = json.dumps({ + "auth_types": [ + {"type": "APIKey", "name": "x", "xsoar_params": ["p"]} + ], + "config": "REQUIRED(x)", + "other_connection": [], + }) + details = parse_auth_details(data) + assert details.auth_types[0].name == "x" + + def test_parse_auth_details_invalid_json_raises(self) -> None: + with pytest.raises(AuthConfigParseError) as exc_info: + parse_auth_details("not json") + assert "Invalid JSON" in exc_info.value.message + + def test_parse_auth_details_not_dict_raises(self) -> None: + with pytest.raises(AuthConfigParseError) as exc_info: + parse_auth_details("[]") + assert "Expected a JSON object" in exc_info.value.message + + def test_parse_auth_details_missing_keys_raises(self) -> None: + with pytest.raises(AuthConfigParseError) as exc_info: + parse_auth_details({"auth_types": []}) + assert "Missing required keys" in exc_info.value.message + assert "config" in exc_info.value.message + + def test_parse_auth_details_invalid_auth_type_raises(self) -> None: + with pytest.raises(AuthConfigParseError) as exc_info: + parse_auth_details({ + "auth_types": [ + {"type": "INVALID", "name": "x", "xsoar_params": ["p"]} + ], + "config": "REQUIRED(x)", + "other_connection": [], + }) + assert any( + "invalid type 'INVALID'" in e for e in exc_info.value.errors + ) + + def test_parse_auth_details_interpolated_default_false(self) -> None: + details = parse_auth_details({ + "auth_types": [ + {"type": "APIKey", "name": "x", "xsoar_params": ["p"]} + ], + "config": "REQUIRED(x)", + "other_connection": [], + }) + assert details.auth_types[0].interpolated is False + + def test_parse_auth_details_interpolated_true(self) -> None: + details = parse_auth_details({ + "auth_types": [ + { + "type": "APIKey", + "name": "x", 
+ "xsoar_params": ["p"], + "interpolated": True, + } + ], + "config": "REQUIRED(x)", + "other_connection": [], + }) + assert details.auth_types[0].interpolated is True + + def test_parse_auth_details_legacy_no_other_connection(self) -> None: + details = parse_auth_details({ + "auth_types": [ + {"type": "APIKey", "name": "x", "xsoar_params": ["p"]} + ], + "config": "REQUIRED(x)", + }) + assert details.other_connection is None + + def test_parse_auth_details_auth_types_not_list_raises(self) -> None: + with pytest.raises(AuthConfigParseError) as exc_info: + parse_auth_details({ + "auth_types": "not a list", + "config": "NoneRequired", + "other_connection": [], + }) + assert "'auth_types' must be a list" in exc_info.value.message + + def test_parse_auth_details_missing_name_raises(self) -> None: + with pytest.raises(AuthConfigParseError) as exc_info: + parse_auth_details({ + "auth_types": [ + {"type": "APIKey", "xsoar_params": ["p"]} + ], + "config": "REQUIRED(x)", + "other_connection": [], + }) + assert any( + "missing 'name'" in e for e in exc_info.value.errors + ) + + def test_parse_auth_details_missing_xsoar_params_raises(self) -> None: + with pytest.raises(AuthConfigParseError) as exc_info: + parse_auth_details({ + "auth_types": [ + {"type": "APIKey", "name": "x"} + ], + "config": "REQUIRED(x)", + "other_connection": [], + }) + assert any( + "missing 'xsoar_params'" in e for e in exc_info.value.errors + ) + + def test_parse_auth_details_empty_xsoar_params_raises(self) -> None: + with pytest.raises(AuthConfigParseError) as exc_info: + parse_auth_details({ + "auth_types": [ + {"type": "APIKey", "name": "x", "xsoar_params": []} + ], + "config": "REQUIRED(x)", + "other_connection": [], + }) + assert any( + "must contain at least one entry" in e + for e in exc_info.value.errors + ) + + def test_parse_auth_details_all_auth_types_valid(self) -> None: + for at in AuthType: + if at == AuthType.NoneRequired: + # NoneRequired requires empty auth_types. 
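+                # The coherent pairing is covered by
+                # test_parse_auth_details_valid_none_required above.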
+ continue + details = parse_auth_details({ + "auth_types": [ + {"type": at.value, "name": "x", "xsoar_params": ["p"]} + ], + "config": "REQUIRED(x)", + "other_connection": [], + }) + assert details.auth_types[0].type == at + + def test_parse_auth_details_interpolated_non_bool_raises(self) -> None: + with pytest.raises(AuthConfigParseError) as exc_info: + parse_auth_details({ + "auth_types": [ + { + "type": "APIKey", + "name": "x", + "xsoar_params": ["p"], + "interpolated": "yes", + } + ], + "config": "REQUIRED(x)", + "other_connection": [], + }) + assert any( + "'interpolated' must be a bool" in e + for e in exc_info.value.errors + ) + + def test_parse_auth_details_other_connection_not_list_raises(self) -> None: + with pytest.raises(AuthConfigParseError) as exc_info: + parse_auth_details({ + "auth_types": [], + "config": "NoneRequired", + "other_connection": "url", + }) + assert any( + "'other_connection' must be a list" in e + for e in exc_info.value.errors + ) diff --git a/connectus/auth_config_parser/tests/test_utils.py b/connectus/auth_config_parser/tests/test_utils.py new file mode 100644 index 00000000000..94a6514e2e2 --- /dev/null +++ b/connectus/auth_config_parser/tests/test_utils.py @@ -0,0 +1,257 @@ +"""Tests for auth_config_parser.utils — utility functions.""" +from __future__ import annotations + +import pytest + +from auth_config_parser import ( + AuthDetails, + AuthEntry, + AuthType, + ClauseOperator, + ConfigClause, + ConfigExpression, + auth_param_ids, + auth_param_ids_with_sources, + parse_auth_details, + project_xsoar_param_to_yml_id, +) + + +# --------------------------------------------------------------------------- +# project_xsoar_param_to_yml_id() tests +# --------------------------------------------------------------------------- + + +class TestProjectXsoarParamToYmlId: + def test_project_bare_id(self) -> None: + assert project_xsoar_param_to_yml_id("api_key") == "api_key" + + def test_project_dotted_identifier(self) -> None: + assert project_xsoar_param_to_yml_id("credentials.identifier") == "credentials" + + def test_project_dotted_password(self) -> None: + assert project_xsoar_param_to_yml_id("credentials.password") == "credentials" + + def test_project_empty_string(self) -> None: + assert project_xsoar_param_to_yml_id("") == "" + + def test_project_non_string(self) -> None: + # Non-string input should return empty string. + assert project_xsoar_param_to_yml_id(42) == "" # type: ignore[arg-type] + assert project_xsoar_param_to_yml_id(None) == "" # type: ignore[arg-type] + + def test_project_multi_dot(self) -> None: + # Only the first dot matters. 
+ assert project_xsoar_param_to_yml_id("a.b.c") == "a" + + +# --------------------------------------------------------------------------- +# auth_param_ids() tests +# --------------------------------------------------------------------------- + + +class TestAuthParamIds: + def test_auth_param_ids_mixed(self) -> None: + """APIKey + Plain + other_connection → correct union.""" + details = parse_auth_details({ + "auth_types": [ + {"type": "APIKey", "name": "api_key", "xsoar_params": ["api_key"]}, + { + "type": "Plain", + "name": "credentials", + "xsoar_params": [ + "credentials.identifier", + "credentials.password", + ], + }, + ], + "config": "REQUIRED(api_key) + REQUIRED(credentials)", + "other_connection": ["insecure", "proxy", "url"], + }) + result = auth_param_ids(details) + assert result == {"api_key", "credentials", "insecure", "proxy", "url"} + + def test_auth_param_ids_deduped(self) -> None: + """Dotted forms collapsing to same id → single entry.""" + details = parse_auth_details({ + "auth_types": [ + { + "type": "Plain", + "name": "creds", + "xsoar_params": [ + "credentials.identifier", + "credentials.password", + ], + } + ], + "config": "REQUIRED(creds)", + "other_connection": [], + }) + result = auth_param_ids(details) + assert result == {"credentials"} + assert "credentials.identifier" not in result + assert "credentials.password" not in result + + def test_auth_param_ids_none_required(self) -> None: + """NoneRequired + other_connection → only other_connection.""" + details = parse_auth_details({ + "auth_types": [], + "config": "NoneRequired", + "other_connection": ["host", "port"], + }) + result = auth_param_ids(details) + assert result == {"host", "port"} + + def test_auth_param_ids_no_other_connection(self) -> None: + """Legacy None → only auth_types ids.""" + details = parse_auth_details({ + "auth_types": [ + {"type": "APIKey", "name": "api_key", "xsoar_params": ["api_key"]} + ], + "config": "REQUIRED(api_key)", + }) + assert details.other_connection is None + result = auth_param_ids(details) + assert result == {"api_key"} + + def test_auth_param_ids_empty(self) -> None: + """NoneRequired + no other_connection → empty set.""" + details = parse_auth_details({ + "auth_types": [], + "config": "NoneRequired", + "other_connection": [], + }) + result = auth_param_ids(details) + assert result == set() + + def test_auth_param_ids_sorted_output(self) -> None: + """Callers that need sorted output can call sorted().""" + details = parse_auth_details({ + "auth_types": [ + {"type": "APIKey", "name": "api_key", "xsoar_params": ["api_key"]}, + { + "type": "Plain", + "name": "credentials", + "xsoar_params": [ + "credentials.identifier", + "credentials.password", + ], + }, + ], + "config": "REQUIRED(api_key) + REQUIRED(credentials)", + "other_connection": ["insecure", "proxy", "url"], + }) + result = sorted(auth_param_ids(details)) + assert result == ["api_key", "credentials", "insecure", "proxy", "url"] + + +# --------------------------------------------------------------------------- +# auth_param_ids_with_sources() tests +# --------------------------------------------------------------------------- + + +class TestAuthParamIdsWithSources: + def test_auth_param_ids_with_sources_mixed(self) -> None: + """Correct source descriptors for each param.""" + details = parse_auth_details({ + "auth_types": [ + {"type": "APIKey", "name": "api_key", "xsoar_params": ["api_key"]}, + { + "type": "Plain", + "name": "credentials", + "xsoar_params": [ + "credentials.identifier", + "credentials.password", + ], 
+ }, + ], + "config": "REQUIRED(api_key) + REQUIRED(credentials)", + "other_connection": ["url"], + }) + sources = auth_param_ids_with_sources(details) + + # api_key comes from auth_types entry. + assert "api_key" in sources + assert len(sources["api_key"]) == 1 + assert "auth_types[].name='api_key'" in sources["api_key"][0] + assert "xsoar_params=['api_key']" in sources["api_key"][0] + + # credentials comes from dotted forms. + assert "credentials" in sources + assert len(sources["credentials"]) == 1 + assert "auth_types[].name='credentials'" in sources["credentials"][0] + assert "credentials.identifier" in sources["credentials"][0] + assert "credentials.password" in sources["credentials"][0] + + # url comes from other_connection. + assert "url" in sources + assert sources["url"] == ["other_connection"] + + def test_auth_param_ids_with_sources_dotted_dedup(self) -> None: + """Two dotted forms → one descriptor per entry.""" + details = parse_auth_details({ + "auth_types": [ + { + "type": "Plain", + "name": "creds", + "xsoar_params": [ + "credentials.identifier", + "credentials.password", + ], + } + ], + "config": "REQUIRED(creds)", + "other_connection": [], + }) + sources = auth_param_ids_with_sources(details) + # Both dotted forms collapse to "credentials" — only one descriptor. + assert "credentials" in sources + assert len(sources["credentials"]) == 1 + + def test_auth_param_ids_with_sources_other_connection(self) -> None: + """other_connection items → 'other_connection' source.""" + details = parse_auth_details({ + "auth_types": [], + "config": "NoneRequired", + "other_connection": ["proxy", "url"], + }) + sources = auth_param_ids_with_sources(details) + assert sources["proxy"] == ["other_connection"] + assert sources["url"] == ["other_connection"] + + def test_auth_param_ids_with_sources_no_other_connection(self) -> None: + """Legacy None other_connection → only auth_types sources.""" + details = parse_auth_details({ + "auth_types": [ + {"type": "APIKey", "name": "api_key", "xsoar_params": ["api_key"]} + ], + "config": "REQUIRED(api_key)", + }) + sources = auth_param_ids_with_sources(details) + assert "api_key" in sources + assert len(sources) == 1 + + def test_auth_param_ids_with_sources_multiple_entries_same_yml_id(self) -> None: + """Multiple auth_types entries projecting to the same YML id.""" + details = parse_auth_details({ + "auth_types": [ + { + "type": "Plain", + "name": "creds_a", + "xsoar_params": ["credentials.identifier"], + }, + { + "type": "Plain", + "name": "creds_b", + "xsoar_params": ["credentials.password"], + }, + ], + "config": "CHOICE(creds_a, creds_b)", + "other_connection": [], + }) + sources = auth_param_ids_with_sources(details) + # "credentials" should have two descriptors — one per entry. + assert "credentials" in sources + assert len(sources["credentials"]) == 2 + assert any("creds_a" in d for d in sources["credentials"]) + assert any("creds_b" in d for d in sources["credentials"]) diff --git a/connectus/auth_config_parser/tests/test_validator.py b/connectus/auth_config_parser/tests/test_validator.py new file mode 100644 index 00000000000..607a049a81b --- /dev/null +++ b/connectus/auth_config_parser/tests/test_validator.py @@ -0,0 +1,454 @@ +"""Tests for auth_config_parser.validator — validate_auth_details() and validate_config(). + +These tests mirror the existing ``TestValidateAuthDetail`` in +``workflow_state_test.py`` with identical assertions to ensure +backward-compatible error messages. 
+""" +from __future__ import annotations + +import json + +import pytest + +from auth_config_parser import ( + AuthType, + validate_auth_details, + validate_config, +) + +# --------------------------------------------------------------------------- +# Reusable test data +# --------------------------------------------------------------------------- + +VALID_AUTH_JSON = ( + '{"auth_types":[{"type":"APIKey","name":"api_key",' + '"xsoar_params":["api_key"]}],' + '"config":"REQUIRED(api_key)",' + '"other_connection":["insecure","proxy","url"]}' +) + +VALID_AUTH_JSON_NONE = ( + '{"auth_types":[],"config":"NoneRequired","other_connection":[]}' +) + +VALID_AUTH_TYPES = {t.value for t in AuthType} + + +# --------------------------------------------------------------------------- +# validate_config() standalone tests +# --------------------------------------------------------------------------- + + +class TestValidateConfig: + def test_valid_required(self) -> None: + assert validate_config("REQUIRED(api_key)") == [] + + def test_valid_none_required(self) -> None: + assert validate_config("NoneRequired") == [] + + def test_valid_multi_clause(self) -> None: + assert validate_config("REQUIRED(a) + OPTIONAL(b)") == [] + + def test_valid_choice(self) -> None: + assert validate_config("CHOICE(a, b)") == [] + + def test_empty_expression(self) -> None: + errors = validate_config("") + assert any("config expression is empty" in e for e in errors) + + def test_empty_operands(self) -> None: + errors = validate_config("REQUIRED()") + assert any("no operands" in e for e in errors) + + def test_bad_keyword(self) -> None: + errors = validate_config("FOO(bar)") + assert any("malformed clause" in e for e in errors) + + def test_trailing_plus(self) -> None: + errors = validate_config("REQUIRED(a) +") + assert any("ends with '+'" in e for e in errors) + + def test_leading_plus(self) -> None: + errors = validate_config("+ REQUIRED(a)") + assert any("starts with '+'" in e for e in errors) + + def test_bad_operand_name(self) -> None: + errors = validate_config("REQUIRED(123bad)") + assert any("not a valid identifier" in e for e in errors) + + def test_stray_comma(self) -> None: + errors = validate_config("REQUIRED(a,,b)") + assert any("empty operand" in e for e in errors) + + +# --------------------------------------------------------------------------- +# validate_auth_details() tests — mirrors TestValidateAuthDetail +# --------------------------------------------------------------------------- + + +class TestValidateAuthDetails: + def test_valid_simple(self) -> None: + assert validate_auth_details(VALID_AUTH_JSON) == [] + + def test_valid_none_required(self) -> None: + assert validate_auth_details(VALID_AUTH_JSON_NONE) == [] + + def test_invalid_json(self) -> None: + errors = validate_auth_details("not json") + assert "Invalid JSON" in errors[0] + + def test_missing_keys(self) -> None: + errors = validate_auth_details('{"auth_types":[]}') + assert "Missing required keys" in errors[0] + + def test_invalid_auth_type(self) -> None: + bad = ( + '{"auth_types":[{"type":"INVALID","name":"x",' + '"xsoar_params":["p"]}],"config":"REQUIRED(x)",' + '"other_connection":[]}' + ) + errors = validate_auth_details(bad) + assert any("invalid type 'INVALID'" in e for e in errors) + + def test_all_valid_auth_types(self) -> None: + for at in VALID_AUTH_TYPES: + detail = ( + f'{{"auth_types":[{{"type":"{at}","name":"x",' + '"xsoar_params":["p"]}],' + '"config":"REQUIRED(x)","other_connection":[]}' + ) + assert 
validate_auth_details(detail) == [], ( + f"Type '{at}' should be valid" + ) + + def test_valid_two_clause_config(self) -> None: + detail = ( + '{"auth_types":[' + '{"type":"OAuth2ClientCreds","name":"credentials_consumer",' + '"xsoar_params":["credentials_consumer.identifier",' + '"credentials_consumer.password"]},' + '{"type":"Plain","name":"credentials",' + '"xsoar_params":["credentials.identifier",' + '"credentials.password"]}' + '],' + '"config":"REQUIRED(credentials) + OPTIONAL(credentials_consumer)",' + '"other_connection":[]}' + ) + assert validate_auth_details(detail) == [] + + def test_valid_choice(self) -> None: + detail = ( + '{"auth_types":[' + '{"type":"Plain","name":"credentials",' + '"xsoar_params":["credentials.identifier",' + '"credentials.password"]},' + '{"type":"Plain","name":"hunting_credentials",' + '"xsoar_params":["hunting_credentials.identifier",' + '"hunting_credentials.password"]}' + '],' + '"config":"CHOICE(credentials, hunting_credentials)",' + '"other_connection":[]}' + ) + assert validate_auth_details(detail) == [] + + def test_config_unknown_name(self) -> None: + detail = ( + '{"auth_types":[{"type":"APIKey","name":"api_key",' + '"xsoar_params":["api_key"]}],' + '"config":"REQUIRED(missing_name)","other_connection":[]}' + ) + errors = validate_auth_details(detail) + assert any( + "unknown connection-type name 'missing_name'" in e + for e in errors + ), errors + + def test_config_empty_required(self) -> None: + detail = ( + '{"auth_types":[{"type":"APIKey","name":"api_key",' + '"xsoar_params":["api_key"]}],' + '"config":"REQUIRED()","other_connection":[]}' + ) + errors = validate_auth_details(detail) + assert any( + "'config'" in e and "no operands" in e for e in errors + ), errors + + def test_config_trailing_plus(self) -> None: + detail = ( + '{"auth_types":[{"type":"APIKey","name":"api_key",' + '"xsoar_params":["api_key"]}],' + '"config":"REQUIRED(api_key) +","other_connection":[]}' + ) + errors = validate_auth_details(detail) + assert any( + "'config'" in e and "ends with '+'" in e for e in errors + ), errors + + def test_config_unknown_keyword(self) -> None: + detail = ( + '{"auth_types":[{"type":"APIKey","name":"api_key",' + '"xsoar_params":["api_key"]}],' + '"config":"FOO(api_key)","other_connection":[]}' + ) + errors = validate_auth_details(detail) + assert any( + "'config'" in e and "malformed clause" in e for e in errors + ), errors + + def test_config_missing_parens(self) -> None: + detail = ( + '{"auth_types":[{"type":"APIKey","name":"api_key",' + '"xsoar_params":["api_key"]}],' + '"config":"REQUIRED api_key","other_connection":[]}' + ) + errors = validate_auth_details(detail) + assert any( + "'config'" in e and "malformed clause" in e for e in errors + ), errors + + def test_none_required_with_entries(self) -> None: + detail = ( + '{"auth_types":[{"type":"APIKey","name":"api_key",' + '"xsoar_params":["api_key"]}],' + '"config":"NoneRequired","other_connection":[]}' + ) + errors = validate_auth_details(detail) + assert any( + "'config' is 'NoneRequired' but 'auth_types' contains entries" + in e + for e in errors + ), errors + + def test_non_none_required_empty_types(self) -> None: + detail = ( + '{"auth_types":[],"config":"REQUIRED(api_key)",' + '"other_connection":[]}' + ) + errors = validate_auth_details(detail) + assert any( + "'config' is not 'NoneRequired' but 'auth_types' is empty" in e + for e in errors + ), errors + # And the unknown-name check should also fire. 
+ assert any( + "unknown connection-type name 'api_key'" in e for e in errors + ), errors + + def test_sort_order_violation(self) -> None: + # APIKey < Plain by type; placing Plain first is out of order. + detail = ( + '{"auth_types":[' + '{"type":"Plain","name":"credentials",' + '"xsoar_params":["credentials.identifier",' + '"credentials.password"]},' + '{"type":"APIKey","name":"api_key",' + '"xsoar_params":["api_key"]}' + '],' + '"config":"REQUIRED(api_key) + REQUIRED(credentials)",' + '"other_connection":[]}' + ) + errors = validate_auth_details(detail) + assert any( + "must be sorted by (type, name)" in e for e in errors + ), errors + # The error should name the offending pair. + assert any( + "'Plain'/'credentials'" in e and "'APIKey'/'api_key'" in e + for e in errors + ), errors + + def test_sort_order_same_type_by_name(self) -> None: + # Same type, names out of order: 'b' before 'a'. + detail = ( + '{"auth_types":[' + '{"type":"APIKey","name":"b","xsoar_params":["p"]},' + '{"type":"APIKey","name":"a","xsoar_params":["p"]}' + '],' + '"config":"REQUIRED(a) + REQUIRED(b)","other_connection":[]}' + ) + errors = validate_auth_details(detail) + assert any( + "must be sorted by (type, name)" in e for e in errors + ), errors + + def test_empty_xsoar_params(self) -> None: + detail = ( + '{"auth_types":[{"type":"APIKey","name":"api_key",' + '"xsoar_params":[]}],' + '"config":"REQUIRED(api_key)","other_connection":[]}' + ) + errors = validate_auth_details(detail) + assert any( + "auth_types[0]" in e and "must contain at least one entry" in e + for e in errors + ), errors + + def test_duplicate_name(self) -> None: + detail = ( + '{"auth_types":[' + '{"type":"APIKey","name":"x","xsoar_params":["p"]},' + '{"type":"APIKey","name":"x","xsoar_params":["q"]}' + '],' + '"config":"REQUIRED(x)","other_connection":[]}' + ) + errors = validate_auth_details(detail) + assert any( + "duplicate 'name' 'x'" in e for e in errors + ), errors + + def test_other_connection_valid(self) -> None: + detail = ( + '{"auth_types":[{"type":"APIKey","name":"api_key",' + '"xsoar_params":["api_key"]}],' + '"config":"REQUIRED(api_key)",' + '"other_connection":["insecure","proxy","url"]}' + ) + assert validate_auth_details(detail) == [] + + def test_other_connection_empty_list(self) -> None: + detail = ( + '{"auth_types":[{"type":"APIKey","name":"api_key",' + '"xsoar_params":["api_key"]}],' + '"config":"REQUIRED(api_key)","other_connection":[]}' + ) + assert validate_auth_details(detail) == [] + + def test_other_connection_missing_key(self) -> None: + detail = ( + '{"auth_types":[{"type":"APIKey","name":"api_key",' + '"xsoar_params":["api_key"]}],' + '"config":"REQUIRED(api_key)"}' + ) + errors = validate_auth_details(detail) + assert any( + "Missing required keys" in e and "other_connection" in e + for e in errors + ), errors + + def test_other_connection_not_list(self) -> None: + detail = ( + '{"auth_types":[{"type":"APIKey","name":"api_key",' + '"xsoar_params":["api_key"]}],' + '"config":"REQUIRED(api_key)","other_connection":"url"}' + ) + errors = validate_auth_details(detail) + assert any( + "'other_connection' must be a list" in e for e in errors + ), errors + + def test_other_connection_non_string(self) -> None: + detail = ( + '{"auth_types":[{"type":"APIKey","name":"api_key",' + '"xsoar_params":["api_key"]}],' + '"config":"REQUIRED(api_key)","other_connection":["url",42]}' + ) + errors = validate_auth_details(detail) + assert any( + "'other_connection'[1]" in e and "must be a string" in e + for e in errors + ), errors 
+ + def test_other_connection_empty_string(self) -> None: + detail = ( + '{"auth_types":[{"type":"APIKey","name":"api_key",' + '"xsoar_params":["api_key"]}],' + '"config":"REQUIRED(api_key)",' + '"other_connection":["url",""]}' + ) + errors = validate_auth_details(detail) + assert any( + "'other_connection'[1]" in e and "non-empty string" in e + for e in errors + ), errors + + def test_other_connection_duplicates(self) -> None: + detail = ( + '{"auth_types":[{"type":"APIKey","name":"api_key",' + '"xsoar_params":["api_key"]}],' + '"config":"REQUIRED(api_key)",' + '"other_connection":["proxy","url","url"]}' + ) + errors = validate_auth_details(detail) + assert any( + "duplicate" in e and "url" in e for e in errors + ), errors + + def test_other_connection_unsorted(self) -> None: + detail = ( + '{"auth_types":[{"type":"APIKey","name":"api_key",' + '"xsoar_params":["api_key"]}],' + '"config":"REQUIRED(api_key)",' + '"other_connection":["url","proxy"]}' + ) + errors = validate_auth_details(detail) + assert any( + "must be sorted ascending" in e + and "['proxy', 'url']" in e + for e in errors + ), errors + + def test_validate_config_standalone(self) -> None: + """validate_config() works independently of validate_auth_details().""" + assert validate_config("REQUIRED(api_key)") == [] + assert validate_config("NoneRequired") == [] + errors = validate_config("FOO(bar)") + assert len(errors) > 0 + assert any("malformed clause" in e for e in errors) + + def test_accepts_dict_input(self) -> None: + """validate_auth_details() accepts pre-parsed dict.""" + data = { + "auth_types": [ + {"type": "APIKey", "name": "x", "xsoar_params": ["p"]} + ], + "config": "REQUIRED(x)", + "other_connection": [], + } + assert validate_auth_details(data) == [] + + def test_not_dict_input(self) -> None: + """validate_auth_details() rejects non-dict JSON.""" + errors = validate_auth_details("[]") + assert any("Expected a JSON object" in e for e in errors) + + def test_auth_types_not_list(self) -> None: + detail = ( + '{"auth_types":"not a list","config":"NoneRequired",' + '"other_connection":[]}' + ) + errors = validate_auth_details(detail) + assert any( + "'auth_types' must be a list" in e for e in errors + ), errors + + def test_auth_types_entry_not_dict(self) -> None: + detail = ( + '{"auth_types":["not a dict"],"config":"REQUIRED(x)",' + '"other_connection":[]}' + ) + errors = validate_auth_details(detail) + assert any( + "auth_types[0]" in e and "expected object" in e + for e in errors + ), errors + + def test_config_not_string(self) -> None: + detail = ( + '{"auth_types":[],"config":42,"other_connection":[]}' + ) + errors = validate_auth_details(detail) + assert any( + "'config' must be a string" in e for e in errors + ), errors + + def test_interpolated_non_bool(self) -> None: + detail = ( + '{"auth_types":[{"type":"APIKey","name":"x",' + '"xsoar_params":["p"],"interpolated":"yes"}],' + '"config":"REQUIRED(x)","other_connection":[]}' + ) + errors = validate_auth_details(detail) + assert any( + "'interpolated' must be a bool" in e for e in errors + ), errors diff --git a/connectus/auth_config_parser/types.py b/connectus/auth_config_parser/types.py new file mode 100644 index 00000000000..6894758f846 --- /dev/null +++ b/connectus/auth_config_parser/types.py @@ -0,0 +1,172 @@ +"""Data model types for the auth_config_parser package. + +All public types live here. Pure Python, no external dependencies. 
+""" +from __future__ import annotations + +import enum +from dataclasses import dataclass, field + + +class AuthType(str, enum.Enum): + """The 7 valid auth-type enum values for Auth Details entries. + + Inherits from ``str`` so that ``AuthType("APIKey")`` construction + from JSON values works naturally and ``entry.type.value`` returns + the string for serialization. + + Examples: + >>> AuthType("APIKey") + + >>> AuthType.APIKey.value + 'APIKey' + >>> AuthType("APIKey") == "APIKey" + True + """ + + OAuth2AuthCode = "OAuth2AuthCode" + OAuth2ClientCreds = "OAuth2ClientCreds" + OAuth2JWT = "OAuth2JWT" + APIKey = "APIKey" + Plain = "Plain" + Other = "Other" + NoneRequired = "NoneRequired" + + +class ClauseOperator(str, enum.Enum): + """Operators in the config expression mini-grammar. + + Examples: + >>> ClauseOperator("REQUIRED") + + >>> ClauseOperator.CHOICE.value + 'CHOICE' + """ + + REQUIRED = "REQUIRED" + OPTIONAL = "OPTIONAL" + CHOICE = "CHOICE" + + +@dataclass(frozen=True) +class AuthEntry: + """One entry in auth_types[]: a single UCP connection type. + + Attributes: + type: The auth-type enum value. + name: Free-form logical id (unique within the row). + xsoar_params: XSOAR field paths supplying secrets for this + connection type. Bare ids or dotted forms. + interpolated: When True, the value is templated at runtime + rather than supplied by the user. Defaults to False. + + Examples: + >>> entry = AuthEntry( + ... type=AuthType.APIKey, + ... name="api_key", + ... xsoar_params=["api_key"], + ... ) + >>> entry.type + + >>> entry.interpolated + False + """ + + type: AuthType + name: str + xsoar_params: list[str] + interpolated: bool = False + + +@dataclass(frozen=True) +class ConfigClause: + """One clause in a config expression. + + Attributes: + operator: REQUIRED, OPTIONAL, or CHOICE. + names: The connection-type names referenced by this clause. + + Examples: + >>> clause = ConfigClause( + ... operator=ClauseOperator.REQUIRED, + ... names=["api_key"], + ... ) + >>> clause.operator + + """ + + operator: ClauseOperator + names: list[str] + + +@dataclass(frozen=True) +class ConfigExpression: + """Parsed config expression. + + Attributes: + none_required: True when the expression is the literal + 'NoneRequired'. When True, clauses is empty. + clauses: Ordered list of parsed clauses. Empty when + none_required is True. + + Examples: + >>> expr = ConfigExpression(none_required=True) + >>> expr.referenced_names + [] + + >>> expr = ConfigExpression(clauses=[ + ... ConfigClause(operator=ClauseOperator.REQUIRED, names=["a"]), + ... ConfigClause(operator=ClauseOperator.OPTIONAL, names=["b"]), + ... ]) + >>> expr.referenced_names + ['a', 'b'] + """ + + none_required: bool = False + clauses: list[ConfigClause] = field(default_factory=list) + + @property + def referenced_names(self) -> list[str]: + """All connection-type names referenced across all clauses, + in order, possibly with duplicates.""" + names: list[str] = [] + for clause in self.clauses: + names.extend(clause.names) + return names + + +@dataclass(frozen=True) +class AuthDetails: + """Fully parsed Auth Details JSON object. + + Attributes: + auth_types: List of auth entries, sorted by (type, name). + config: Parsed config expression. + other_connection: Sorted list of YML param ids for + connection-adjacent non-auth params. None when the key + is absent (legacy rows). + + Examples: + >>> details = AuthDetails( + ... auth_types=[ + ... AuthEntry(type=AuthType.APIKey, name="api_key", + ... xsoar_params=["api_key"]), + ... ], + ... 
config=ConfigExpression(clauses=[ + ... ConfigClause(operator=ClauseOperator.REQUIRED, + ... names=["api_key"]), + ... ]), + ... other_connection=["proxy", "url"], + ... ) + >>> details.auth_type_names + {'api_key'} + """ + + auth_types: list[AuthEntry] + config: ConfigExpression + other_connection: list[str] | None = None + + @property + def auth_type_names(self) -> set[str]: + """Set of all auth_types[].name values.""" + return {e.name for e in self.auth_types} diff --git a/connectus/auth_config_parser/utils.py b/connectus/auth_config_parser/utils.py new file mode 100644 index 00000000000..39a00ba4462 --- /dev/null +++ b/connectus/auth_config_parser/utils.py @@ -0,0 +1,148 @@ +"""Utility functions for the auth_config_parser package. + +Pure functions that extract derived information from parsed +:class:`~auth_config_parser.types.AuthDetails` objects. No +CSV/filesystem dependencies. +""" +from __future__ import annotations + +from auth_config_parser.types import AuthDetails + + +def project_xsoar_param_to_yml_id(xsoar_param: str) -> str: + """Collapse a dotted XSOAR param path to its base YML param id. + + Bare ids pass through unchanged. Dotted forms like + ``'credentials.identifier'`` collapse to the segment before the + first ``'.'`` (``'credentials'``). + + Args: + xsoar_param: An XSOAR field path string. + + Returns: + The base YML param id. + + Examples: + >>> project_xsoar_param_to_yml_id("api_key") + 'api_key' + >>> project_xsoar_param_to_yml_id("credentials.identifier") + 'credentials' + >>> project_xsoar_param_to_yml_id("credentials.password") + 'credentials' + >>> project_xsoar_param_to_yml_id("") + '' + """ + if not isinstance(xsoar_param, str): + return "" + return xsoar_param.split(".", 1)[0] + + +def auth_param_ids(details: AuthDetails) -> set[str]: + """Extract the set of YML param ids from an AuthDetails object. + + Returns the deduplicated set of bare YML ``configuration[].name`` + values composed from: + + - Every ``auth_types[].xsoar_params`` entry, projected via + :func:`project_xsoar_param_to_yml_id`. + - Every entry in ``other_connection`` (already bare YML ids). + + Args: + details: A parsed :class:`~auth_config_parser.types.AuthDetails` + object. + + Returns: + Set of YML param id strings. + + Examples: + >>> from auth_config_parser.parser import parse_auth_details + >>> details = parse_auth_details({ + ... "auth_types": [{"type": "Plain", "name": "creds", + ... "xsoar_params": ["credentials.identifier", + ... "credentials.password"]}], + ... "config": "REQUIRED(creds)", + ... "other_connection": ["url", "proxy"], + ... }) + >>> sorted(auth_param_ids(details)) + ['credentials', 'proxy', 'url'] + """ + result: set[str] = set() + + for entry in details.auth_types: + for xp in entry.xsoar_params: + yml_id = project_xsoar_param_to_yml_id(xp) + if yml_id: + result.add(yml_id) + + if details.other_connection is not None: + for item in details.other_connection: + if isinstance(item, str) and item: + result.add(item) + + return result + + +def auth_param_ids_with_sources( + details: AuthDetails, +) -> dict[str, list[str]]: + """Extract YML param ids with source attribution. + + Returns a dict mapping each YML param id to a list of + human-readable source descriptions indicating where the param + was declared. + + Args: + details: A parsed :class:`~auth_config_parser.types.AuthDetails` + object. + + Returns: + Dict of ``{yml_param_id: [source_description, ...]}``. 
+ + Examples: + >>> from auth_config_parser.parser import parse_auth_details + >>> details = parse_auth_details({ + ... "auth_types": [{"type": "Plain", "name": "creds", + ... "xsoar_params": ["credentials.identifier", + ... "credentials.password"]}], + ... "config": "REQUIRED(creds)", + ... "other_connection": ["url"], + ... }) + >>> sources = auth_param_ids_with_sources(details) + >>> sources["credentials"] + ["auth_types[].name='creds' (xsoar_params=['credentials.identifier', 'credentials.password'])"] + >>> sources["url"] + ['other_connection'] + """ + sources: dict[str, list[str]] = {} + + for entry in details.auth_types: + # Collect projected ids for this entry. + projected_for_entry: list[str] = [] + for xp in entry.xsoar_params: + yml_id = project_xsoar_param_to_yml_id(xp) + if yml_id: + projected_for_entry.append(yml_id) + + # Group source description by entry — every projected id + # cites the same entry-level (name, xsoar_params) pair so + # the overlap message can quote the dotted forms verbatim. + # Dedupe per-yml_id so dotted forms collapsing to the same + # bare id (credentials.identifier + credentials.password → + # credentials) don't repeat the same descriptor twice. + descriptor = ( + f"auth_types[].name={entry.name!r} " + f"(xsoar_params={list(entry.xsoar_params)!r})" + ) + seen_for_entry: set[str] = set() + for yml_id in projected_for_entry: + if yml_id in seen_for_entry: + continue + seen_for_entry.add(yml_id) + sources.setdefault(yml_id, []).append(descriptor) + + if details.other_connection is not None: + for item in details.other_connection: + if isinstance(item, str) and item: + sources.setdefault(item, []).append("other_connection") + + return sources diff --git a/connectus/auth_config_parser/validator.py b/connectus/auth_config_parser/validator.py new file mode 100644 index 00000000000..14f70fe6b8f --- /dev/null +++ b/connectus/auth_config_parser/validator.py @@ -0,0 +1,258 @@ +"""Validation functions for the auth_config_parser package. + +Returns error lists (empty = valid). Never raises. Matches the current +``validate_auth_detail()`` contract in ``workflow_state.py``. +""" +from __future__ import annotations + +import json + +from auth_config_parser.parser import _VALID_AUTH_TYPE_VALUES, _parse_config_impl + + +# --------------------------------------------------------------------------- +# Public API +# --------------------------------------------------------------------------- + +def validate_config(expr: str) -> list[str]: + """Validate a config expression string. + + Returns a list of human-readable error strings. Empty list means + the expression is syntactically valid. + + This validates syntax only — it does NOT check that operand names + match any ``auth_types[].name``. Use :func:`validate_auth_details` + for cross-referencing validation. + + Args: + expr: The config expression string. + + Returns: + List of error strings (empty = valid). + + Examples: + >>> validate_config("REQUIRED(api_key)") + [] + >>> validate_config("NoneRequired") + [] + >>> validate_config("REQUIRED()") + ["clause 'REQUIRED(...)' has no operands"] + >>> validate_config("FOO(bar)") # doctest: +ELLIPSIS + ["malformed clause 'FOO(bar)' ..."] + """ + _, errors = _parse_config_impl(expr) + return errors + + +def validate_auth_details(data: str | dict) -> list[str]: + """Validate Auth Details JSON shape. Returns list of errors ([] = valid). 
+ + Performs ALL validation currently done by ``workflow_state.py``'s + ``validate_auth_detail()``, including: + + - JSON parsing + - Required keys: ``auth_types``, ``config``, ``other_connection`` + - ``auth_types[]`` entry shape (type enum, name uniqueness, + xsoar_params non-empty list of non-empty strings, interpolated + bool) + - ``auth_types[]`` sort order by ``(type, name)`` + - ``config`` expression syntax (via :func:`validate_config`) + - ``config`` operand names cross-referenced against + ``auth_types[].name`` + - ``NoneRequired`` ↔ empty ``auth_types`` coherence + - ``other_connection``: list of non-empty unique sorted strings + + Args: + data: JSON string or pre-parsed dict. + + Returns: + List of error strings (empty = valid). + + Examples: + >>> validate_auth_details('{"auth_types":[],' + ... '"config":"NoneRequired","other_connection":[]}') + [] + """ + errors: list[str] = [] + + # --- Parse JSON if string --- + if isinstance(data, str): + try: + detail = json.loads(data) + except json.JSONDecodeError as e: + return [f"Invalid JSON: {e}"] + else: + detail = data + + if not isinstance(detail, dict): + return [f"Expected a JSON object, got {type(detail).__name__}"] + + required_keys = {"auth_types", "config", "other_connection"} + missing = required_keys - set(detail.keys()) + if missing: + errors.append(f"Missing required keys: {', '.join(sorted(missing))}") + return errors + + seen_names: set[str] = set() + # Track per-entry validity for the sort check (only consider entries + # whose `type` and `name` are both well-formed). + sortable: list[tuple[int, str, str]] = [] + valid_auth_types_list = isinstance(detail["auth_types"], list) + if not valid_auth_types_list: + errors.append( + f"'auth_types' must be a list, got " + f"{type(detail['auth_types']).__name__}" + ) + else: + for i, entry in enumerate(detail["auth_types"]): + if not isinstance(entry, dict): + errors.append( + f"auth_types[{i}]: expected object, got " + f"{type(entry).__name__}" + ) + continue + entry_type_ok = False + entry_name_ok = False + if "type" not in entry: + errors.append(f"auth_types[{i}]: missing 'type'") + elif entry["type"] not in _VALID_AUTH_TYPE_VALUES: + errors.append( + f"auth_types[{i}]: invalid type '{entry['type']}'" + ) + else: + entry_type_ok = True + if "name" not in entry: + errors.append(f"auth_types[{i}]: missing 'name'") + elif not isinstance(entry["name"], str): + errors.append(f"auth_types[{i}]: 'name' must be a string") + elif not entry["name"]: + errors.append( + f"auth_types[{i}]: 'name' must be a non-empty string" + ) + elif entry["name"] in seen_names: + errors.append( + f"auth_types[{i}]: duplicate 'name' '{entry['name']}' " + "(each entry must have a unique logical name)" + ) + else: + seen_names.add(entry["name"]) + entry_name_ok = True + if "xsoar_params" not in entry: + errors.append(f"auth_types[{i}]: missing 'xsoar_params'") + elif not isinstance(entry["xsoar_params"], list): + errors.append( + f"auth_types[{i}]: 'xsoar_params' must be a list, " + f"got {type(entry['xsoar_params']).__name__}" + ) + elif len(entry["xsoar_params"]) == 0: + errors.append( + f"auth_types[{i}]: 'xsoar_params' must contain at " + "least one entry" + ) + else: + for j, p in enumerate(entry["xsoar_params"]): + if not isinstance(p, str) or not p: + errors.append( + f"auth_types[{i}]: xsoar_params[{j}] must be " + "a non-empty string" + ) + if "interpolated" in entry and not isinstance( + entry["interpolated"], bool + ): + errors.append( + f"auth_types[{i}]: 'interpolated' must be a bool, " + 
f"got {type(entry['interpolated']).__name__}" + ) + + if entry_type_ok and entry_name_ok: + sortable.append((i, entry["type"], entry["name"])) + + # Sort-order check: report the first out-of-order adjacent pair + # among the entries that have valid `type` and `name`. + for k in range(len(sortable) - 1): + i_a, type_a, name_a = sortable[k] + i_b, type_b, name_b = sortable[k + 1] + if (type_a, name_a) > (type_b, name_b): + errors.append( + f"auth_types must be sorted by (type, name); entry " + f"[{i_a}] '{type_a}'/'{name_a}' should come after " + f"entry [{i_b}] '{type_b}'/'{name_b}'" + ) + break + + if not isinstance(detail["config"], str): + errors.append( + f"'config' must be a string, got " + f"{type(detail['config']).__name__}" + ) + else: + config_str = detail["config"] + config_expr, parse_errors = _parse_config_impl(config_str) + for pe in parse_errors: + errors.append(f"'config': {pe}") + for n in config_expr.referenced_names: + if n not in seen_names: + errors.append( + f"'config' references unknown connection-type name " + f"'{n}' (must match an auth_types[].name)" + ) + # Coherence between `config` and `auth_types`. + if valid_auth_types_list: + auth_types_empty = len(detail["auth_types"]) == 0 + if config_str.strip() == "NoneRequired": + if not auth_types_empty: + errors.append( + "'config' is 'NoneRequired' but 'auth_types' " + "contains entries; remove the entries or change " + "'config'" + ) + else: + # Only flag the empty-auth_types mismatch if the config + # itself parsed cleanly (otherwise the parse error is + # the more informative signal). + if not parse_errors and auth_types_empty: + errors.append( + "'config' is not 'NoneRequired' but 'auth_types' " + "is empty" + ) + + other_connection = detail["other_connection"] + if not isinstance(other_connection, list): + errors.append( + f"'other_connection' must be a list, got " + f"{type(other_connection).__name__}" + ) + else: + all_strings = True + for j, item in enumerate(other_connection): + if not isinstance(item, str): + errors.append( + f"'other_connection'[{j}]: must be a string, got " + f"{type(item).__name__}" + ) + all_strings = False + elif not item: + errors.append( + f"'other_connection'[{j}]: must be a non-empty string" + ) + all_strings = False + if all_strings: + if len(set(other_connection)) != len(other_connection): + seen: set[str] = set() + dups: list[str] = [] + for item in other_connection: + if item in seen and item not in dups: + dups.append(item) + seen.add(item) + errors.append( + "'other_connection' contains duplicate entries: " + f"{dups}" + ) + sorted_oc = sorted(other_connection) + if other_connection != sorted_oc: + errors.append( + "'other_connection' must be sorted ascending; got " + f"{other_connection}, expected {sorted_oc}" + ) + + return errors diff --git a/connectus/auth_parity_test_design.md b/connectus/auth_parity_test_design.md new file mode 100644 index 00000000000..3122bd35fb9 --- /dev/null +++ b/connectus/auth_parity_test_design.md @@ -0,0 +1,1075 @@ +# Design: Auth Parity Test + +## Purpose + +Verify that for each non-interpolated connection declared in an +integration's `Auth Details`, the secret values end up in the **same +place** on every outgoing HTTP request regardless of whether they were +supplied the "old way" (via [`demisto.params()`](Packs/Base/Scripts/CommonServerPython/CommonServerPython.py:9736) → +integration code → [`BaseClient`](Packs/Base/Scripts/CommonServerPython/CommonServerPython.py:9703)) or the +"new way" (params **omitted** from `demisto.params()`, secrets 
+**injected by BaseClient** via the UCP credential-injection +infrastructure — [`_inject_ucp_credentials()`](Packs/Base/Scripts/CommonServerPython/CommonServerPython.py:9919), +[`_apply_ucp_credentials()`](Packs/Base/Scripts/CommonServerPython/CommonServerPython.py:9799), +[`get_ucp_credentials()`](Packs/Base/Scripts/CommonServerPython/CommonServerPython.py:13849)). + +"Same place" means: the same header name, query-param name, body +field, basic-auth slot, bearer-token slot, cookie, or URL-userinfo +position — byte-for-byte on the sentinel value, modulo the +canonicalization rules in [§4](#4-the-parity-comparison). + +### Non-goals + +| # | Non-goal | Why | +|---|----------|-----| +| 1 | Checking parameter correctness / coverage | That is [`check_command_params.py`](connectus/check_command_params.py:1)'s job. | +| 2 | Checking that the API actually accepts the request | The test only inspects the **request** side; responses are canned. | +| 3 | Validating `interpolated: true` connections | Interpolated connections have their values templated at runtime by the manifest generator — there is no user-supplied secret to compare. The parity test emits `ERROR_ALL_INTERPOLATED` when every connection is interpolated, or `"skipped_interpolated"` per-connection when only some are (see [§5.5](#55-error-codes--hard-errors)). | +| 4 | Validating `other_connection` values (URL, proxy, insecure, …) | Only auth secrets (values declared in `auth_types[].xsoar_params`) are in scope. Connection metadata is orthogonal. | +| 5 | Non-Python integrations or integrations without `BaseClient` | **Hard error, not a skip.** The tool emits `ERROR_NON_PYTHON` or `ERROR_NO_BASECLIENT` and exits immediately. The migration skill must mark the affected connections as `"interpolated": true` and re-run `set-auth`. See [§5.5](#55-error-codes--hard-errors). | + +--- + +## 1. Inputs + +| Input | Source | Purpose | +|-------|--------|---------| +| Integration directory | CLI arg (same as [`check_command_params.py`](connectus/check_command_params.py:1)) | Locate YML + Python source. | +| `Auth Details` cell | Read via [`workflow_state.py show-step`](connectus/workflow_state.py:2240) or [`parse_auth_details()`](connectus/auth_config_parser/parser.py:1) / [`auth_param_ids()`](connectus/auth_config_parser/utils.py:1) programmatically | Provides `auth_types[]` with `xsoar_params`, `interpolated`, `name`, `type`, and the `config` expression — parsed into typed [`AuthDetails`](connectus/auth_config_parser/types.py:102) / [`AuthEntry`](connectus/auth_config_parser/types.py:52) dataclasses. | +| `Params for test with default in code` cell | Read via [`workflow_state.py show-step`](connectus/workflow_state.py:1) | Supplies throwaway defaults for non-auth required params so the integration can start. | +| `Params to Commands` cell | Read via [`workflow_state.py show-step`](connectus/workflow_state.py:1) | Provides the per-command param lists; used to pick a minimal covering command set. | +| Integration ID | CLI arg `--integration-id` | Key into the pipeline CSV for all of the above. | + +### Command selection strategy + +The test must exercise at least: + +1. **`test-module`** — always present, always exercises the primary + auth path. +2. **One representative command per distinct auth-bearing code path.** + In practice, most integrations use a single `Client` constructed + once in `main()`, so `test-module` alone covers the auth surface. + However, integrations with multiple `Client` instances or + per-command auth overrides (e.g. 
a `fetch-events` command that uses + a different token than `test-module`) need additional commands. + +**Heuristic:** start with `test-module`. If the `Params to Commands` +cell shows commands whose param lists include auth-adjacent params not +present in `test-module`'s list, add those commands. If no such +commands exist, `test-module` alone is sufficient. The analyzer can +also accept `--commands cmd1 cmd2 ...` for manual override. + +--- + +## 2. The two runs — old vs new + +For each non-interpolated `auth_types[]` entry (call it **connection +C**): + +### 2.1 Old run (legacy path) + +Build a `demisto.params()` dict that includes C's `xsoar_params` +populated with **distinguishable sentinel values**. Non-auth required +params are filled from `Params for test with default in code` plus a +generic placeholder pass. Run the selected command(s) under +[`capture_proxy.py`](connectus/capture_proxy.py:1). Record every +captured outgoing request. + +### 2.2 New run (UCP injection path) + +Build a `demisto.params()` dict that **omits** C's `xsoar_params` +entirely. Instead, patch the UCP injection seam so that +[`get_ucp_credentials()`](Packs/Base/Scripts/CommonServerPython/CommonServerPython.py:13849) +returns a credential dict containing the **same sentinel values**, +routed through the appropriate type envelope. Run the same command(s) +under [`capture_proxy.py`](connectus/capture_proxy.py:1). Record +every captured outgoing request. + +### 2.3 Sentinel value generation + +Each leaf field in `xsoar_params` gets a unique sentinel: + +``` +__AUTHPARITY______ +``` + +Example for a `Plain` connection named `credentials`: + +``` +__AUTHPARITY__credentials__credentials.identifier__a1b2c3d4 +__AUTHPARITY__credentials__credentials.password__e5f6g7h8 +``` + +Properties: +- **Unique per leaf field** — so we can disambiguate which secret + landed where even when multiple secrets share a header. +- **Long enough** (≥40 chars) to be unambiguous in grep. +- **ASCII-safe** — no characters that would be mangled by URL-encoding + or base64 in ways that hide the sentinel. +- The `uuid8` suffix is 8 hex chars from `uuid.uuid4().hex[:8]`, + regenerated per test run. + +### 2.4 Non-auth param filling + +1. Read `Params for test with default in code` — use those values + verbatim. +2. For any remaining required YML param not in the ignore set and not + an auth param: seed with a type-aware placeholder (reuse the + coercion logic from + [`check_command_params.py`](connectus/check_command_params.py:1) — + booleans → `True`, ints → `1`, strings → `"PLACEHOLDER_"`, + credentials → `{"identifier": "placeholder", "password": + "placeholder"}`, etc.). +3. Param correctness is out of scope — if the integration crashes + because a placeholder is wrong, the run is `inconclusive`, not a + parity failure. + +### 2.5 UCP injection wiring + +The test harness must intercept the UCP credential-resolution chain. +The seam is [`get_ucp_credentials(method_unique_id)`](Packs/Base/Scripts/CommonServerPython/CommonServerPython.py:13849), +which normally calls `demisto.getUCPCredentials(...)`. The harness +patches this function (or the underlying `demisto.getUCPCredentials` +mock) to return a synthetic credential dict. + +**Contract the parity test requires from the injection hook:** + +```python +def mock_get_ucp_credentials(method_unique_id: str) -> dict: + """Return a credential dict whose secret fields contain the + same sentinel values that the old run seeded into demisto.params(). 
+ + The dict shape depends on the auth type: + + APIKey: + {"type": "api_key", "api_key": {"key": ""}} + + Plain: + {"type": "plain", "plain": {"username": "", "password": ""}} + + OAuth2 (any sub-type): + {"type": "oauth2", "oauth2": {"access_token": "", "token_type": "Bearer"}} + """ +``` + +The mapping from `auth_types[].type` to credential-dict shape is: + +| `auth_types[].type` | UCP `type` field | Sentinel placement | +|---------------------|------------------|--------------------| +| `APIKey` | `"api_key"` | `api_key.key` ← sentinel for the single `xsoar_params` entry | +| `Plain` | `"plain"` | `plain.username` ← sentinel for `.identifier`; `plain.password` ← sentinel for `.password` | +| `OAuth2ClientCreds` | `"oauth2"` | `oauth2.access_token` ← sentinel for the password/secret param | +| `OAuth2AuthCode` | `"oauth2"` | same as above | +| `OAuth2JWT` | `"oauth2"` | same as above | +| `Other` | varies | **skip** — see [§6 edge cases](#6-edge-cases--open-questions) | +| `NoneRequired` | n/a | no run needed | + +If the exact injection API changes before this test ships, the +**contract** above is what the test needs: a function that accepts a +connection-name → sentinel-values map and returns the correctly-shaped +credential dict. The test does not depend on the internal UCP cache, +TTL, or capability-resolution logic — it short-circuits all of that. + +Additionally, the harness must ensure [`is_ucp_enabled()`](Packs/Base/Scripts/CommonServerPython/CommonServerPython.py:13671) +returns `False` for the old run and `True` for the new run, and that +[`should_use_ucp_auth()`](Packs/Base/Scripts/CommonServerPython/CommonServerPython.py:13671) +follows suit. This controls whether integrations like +[`Salesforce_IAM`](Packs/Salesforce/Integrations/Salesforce_IAM/Salesforce_IAM.py:42) +take the legacy `get_access_token_()` path or the UCP path. + +### 2.6 Network mocking + +Requests must be allowed to leave the integration code but MUST be +intercepted before hitting the real API. +[`capture_proxy.py`](connectus/capture_proxy.py:1) already does this: +it accepts any HTTP method on any path, returns `200 {}`, and records +the full request (method, path, query, headers, body, timestamp). + +The integration's `url` / `base_url` param is pointed at +`http://localhost:` so all traffic routes through the +proxy. Responses are canned/empty; the test only inspects the +**request** side. + +### 2.7 Execution model + +This section describes how the harness loads integration code, wires +the proxy, injects sentinels, and handles crashes. The pattern mirrors +[`check_command_params.py`](connectus/check_command_params.py:1)'s +dynamic phase. + +#### 2.7.1 How the integration code is loaded + +The harness uses the same content-preparation pipeline as +[`check_command_params.py`](connectus/check_command_params.py:1): + +1. **Prepend [`demistomock.py`](Packs/Base/Scripts/CommonServerPython/CommonServerPython.py:1)** + — provides the `demisto` object with `.params()`, `.command()`, + `.args()`, etc. The harness patches these to return controlled + values (see [§2.7.2](#272-how-the-proxy-is-wired-in)). +2. **Prepend [`CommonServerPython.py`](Packs/Base/Scripts/CommonServerPython/CommonServerPython.py:1)** + — provides [`BaseClient`](Packs/Base/Scripts/CommonServerPython/CommonServerPython.py:9703), + UCP injection functions, and the rest of the runtime. +3. **Run `demisto-sdk prepare-content -i `** — inlines API + modules (e.g. `MicrosoftApiModule`, `AWSApiModule`). +4. 
**Result:** a single unified `.py` file that can be imported and + executed standalone. + +The unified file is loaded via +[`importlib.util.spec_from_file_location()`](connectus/check_command_params.py:2650) +as module `"integration_under_test"`, then executed with +[`spec.loader.exec_module(module)`](connectus/check_command_params.py:2656). +After import, `return_error` is patched to exit with a distinct code +(`RC_RETURN_ERROR_PATCHED = 7`) so errors are observable. Finally, +[`module.main()`](connectus/check_command_params.py:2677) is called. + +Params are seeded **before import** via env vars +(`CHECK_PARAMS_JSON`, `CHECK_COMMAND`) read by the on-disk +`demistomock.py` mock — this is critical for integrations whose +`Client(...)` is constructed at import time and reads params during +construction (the pre-import param seeding pattern from +[`check_command_params.py`](connectus/check_command_params.py:2640)). + +#### 2.7.2 How the proxy is wired in + +Since the auth parity test **requires** [`BaseClient`](Packs/Base/Scripts/CommonServerPython/CommonServerPython.py:9703) +usage, proxy wiring is simpler than in +[`check_command_params.py`](connectus/check_command_params.py:2889): + +1. **URL rewriting:** The `url` param in `demisto.params()` is set to + `http://127.0.0.1:`. Since we require `BaseClient`, + this covers all HTTP traffic — `BaseClient.__init__` stores + [`self._base_url = base_url`](Packs/Base/Scripts/CommonServerPython/CommonServerPython.py:9746) + from the `url` param, and all subsequent + [`_http_request()`](Packs/Base/Scripts/CommonServerPython/CommonServerPython.py:10186) + calls use `urljoin(self._base_url, url_suffix)`. + +2. **Insecure flag:** Set `demisto.params()["insecure"] = True` so + `BaseClient.__init__` calls + [`skip_cert_verification()`](Packs/Base/Scripts/CommonServerPython/CommonServerPython.py:9764) + and does not reject the plain HTTP connection. + +3. **No `HTTP_PROXY` env var needed.** Unlike + [`check_command_params.py`](connectus/check_command_params.py:2889) + which sets `HTTP_PROXY` / `HTTPS_PROXY` env vars to catch traffic + from non-BaseClient code paths, the auth parity test does NOT need + this — we require `BaseClient`, so URL rewriting is sufficient. + This avoids the `boto3` proxy-bypass problem entirely. + +#### 2.7.3 Sentinel injection — old vs new run + +For each `(connection, command)` pair, the harness executes two runs: + +- **Old run:** Sentinels are placed directly into `demisto.params()` + at the `xsoar_params` paths (see [§2.1](#21-old-run-legacy-path)). + UCP is disabled: `is_ucp_enabled() → False`. +- **New run:** `xsoar_params` are **omitted** from `demisto.params()`. + Sentinels are injected via the UCP mock (see [§2.5](#25-ucp-injection-wiring)). + UCP is enabled: `is_ucp_enabled() → True`, + `should_use_ucp_auth() → True`. + +Both runs use the same sentinel values so the location comparison +is meaningful. 
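+
+To make the two seeding modes concrete, here is a minimal sketch of how a
+harness might derive per-leaf sentinels and feed the *same* values through
+both paths. The helper names (`build_sentinels`, `old_run_params`,
+`new_run_ucp_credentials`) are hypothetical, not the shipped implementation;
+the sentinel template follows the §2.3 example and the credential envelopes
+follow the §2.5 mapping table.
+
+```python
+import uuid
+
+
+def build_sentinels(connection: str, xsoar_params: list[str]) -> dict[str, str]:
+    """One unique sentinel per xsoar_params leaf, regenerated per run."""
+    suffix = uuid.uuid4().hex[:8]
+    return {p: f"__AUTHPARITY__{connection}__{p}__{suffix}" for p in xsoar_params}
+
+
+def old_run_params(base: dict, sentinels: dict[str, str], proxy_port: int) -> dict:
+    """Old run: sentinels are placed directly into demisto.params()."""
+    params = dict(base)
+    params["url"] = f"http://127.0.0.1:{proxy_port}"  # route through capture proxy
+    params["insecure"] = True                         # accept plain HTTP
+    for xsoar_param, sentinel in sentinels.items():
+        yml_id, _, leaf = xsoar_param.partition(".")
+        if leaf:  # dotted form, e.g. credentials.identifier
+            params.setdefault(yml_id, {})[leaf] = sentinel
+        else:     # bare id, e.g. api_key
+            params[yml_id] = sentinel
+    return params
+
+
+def new_run_ucp_credentials(auth_type: str, sentinels: dict[str, str]) -> dict:
+    """New run: the same sentinels, wrapped in the UCP credential shape."""
+    if auth_type == "APIKey":
+        (sentinel,) = sentinels.values()  # APIKey declares a single xsoar_param
+        return {"type": "api_key", "api_key": {"key": sentinel}}
+    if auth_type == "Plain":
+        user = next(v for k, v in sentinels.items() if k.endswith(".identifier"))
+        pw = next(v for k, v in sentinels.items() if k.endswith(".password"))
+        return {"type": "plain", "plain": {"username": user, "password": pw}}
+    # All OAuth2* sub-types map onto the oauth2 envelope.
+    secret = next(iter(sentinels.values()))
+    return {"type": "oauth2", "oauth2": {"access_token": secret, "token_type": "Bearer"}}
+
+
+sentinels = build_sentinels(
+    "credentials", ["credentials.identifier", "credentials.password"]
+)
+old_params = old_run_params({"max_fetch": "10"}, sentinels, proxy_port=8123)
+ucp_mock_return = new_run_ucp_credentials("Plain", sentinels)
+```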
+ +#### 2.7.4 Sequence diagram — one connection, one command + +``` +Harness Proxy Integration + | | | + |-- start proxy ---------->| | + |<-- port=P ---------------| | + | | | + |== OLD RUN =======================================| + | | | + |-- new_session() -------->| | + |<-- sid_old --------------| | + | | | + |-- seed params: | + | url=http://127.0.0.1:P | + | insecure=True | + | xsoar_params=sentinels | + | is_ucp_enabled=False | + | | | + |-- load unified .py ----->| | + |-- call main() ---------> ---->| + | |<-- HTTP req 1 --------| + | |--- 200 {} ----------->| + | |<-- HTTP req 2 --------| + | |--- 200 {} ----------->| + |<-- main() returns / crashes | + | | | + |-- get_requests(sid_old)->| | + |<-- old_requests ---------| | + | | | + |== NEW RUN =======================================| + | | | + |-- new_session() -------->| | + |<-- sid_new --------------| | + | | | + |-- seed params: | + | url=http://127.0.0.1:P | + | insecure=True | + | xsoar_params=OMITTED | + | is_ucp_enabled=True | + | get_ucp_credentials=mock with sentinels | + | | | + |-- load unified .py ----->| | + |-- call main() ---------> ---->| + | |<-- HTTP req 1 --------| + | |--- 200 {} ----------->| + | |<-- HTTP req 2 --------| + | |--- 200 {} ----------->| + |<-- main() returns / crashes | + | | | + |-- get_requests(sid_new)->| | + |<-- new_requests ---------| | + | | | + |== COMPARE =======================================| + | | | + |-- extract_locations(old_requests, sentinels) | + |-- extract_locations(new_requests, sentinels) | + |-- locations_old == locations_new? | + | YES -> PASS | + | NO -> FAIL + classify diffs | +``` + +#### 2.7.5 Crash handling + +When the integration crashes during a run (old or new): + +1. **Capture the exception** — record the traceback in + `diagnostics....stderr_excerpt`. +2. **Emit `"inconclusive"`** for that command — do NOT treat it as a + parity failure. +3. **Do NOT abort the entire run.** Other commands for the same + connection, and other connections, continue independently. This + mirrors [`check_command_params.py`](connectus/check_command_params.py:1)'s + per-command exception isolation (Fix #1 in the implementation + status). + +--- + +## 3. Invariants + +For each non-interpolated connection C and each exercised command: + +> **Parity invariant:** For every sentinel value S generated for C's +> `xsoar_params`, the set of locations where S appears in the old +> run's captured requests MUST equal the set of locations where S +> appears in the new run's captured requests. + +A "location" is a structured path — see [§4](#4-the-parity-comparison). + +--- + +## 4. The parity comparison + +### 4.1 Location taxonomy + +For each captured request, extract the **locations** where each +sentinel value appears. A location is one of: + +| Location type | Format | Example | +|---------------|--------|---------| +| HTTP header (raw) | `header:` | `header:X-Api-Key` | +| HTTP header (Bearer) | `header:Authorization:Bearer` | Bearer token body | +| HTTP header (Basic — user slot) | `header:Authorization:Basic:user` | Decoded user from `Basic ` | +| HTTP header (Basic — pass slot) | `header:Authorization:Basic:pass` | Decoded pass from `Basic ` | +| HTTP header (Token/custom prefix) | `header:Authorization:` | e.g. 
`Token`, `SSWS` | +| Query parameter | `query:` | `query:api_key` | +| JSON body field | `body.json:` | `body.json:auth.client_secret` | +| Form body field | `body.form:` | `body.form:client_id` | +| URL userinfo | `url.userinfo:user` or `url.userinfo:pass` | `https://user:pass@host/` | +| Cookie | `cookie:` | `cookie:session_token` | + +### 4.2 Canonicalization rules + +To avoid false failures from cosmetic differences: + +1. **Header name case:** case-insensitive comparison (normalize to + lowercase). +2. **Basic auth:** decode the `Basic` header's base64 payload and + compare the user and password slots independently — never compare + the raw base64 blob. +3. **Bearer / Token / custom prefixes:** strip the scheme prefix + (e.g. `Bearer `, `Token `, `SSWS `), compare the token body only. + The prefix itself is recorded in the location type for diagnostic + purposes but is not part of the parity check on the sentinel value. +4. **Multiple requests in a run:** compare as **multisets** keyed by + `(method, url_path_template, location)`. The same auth header + appearing on every request in both runs is fine; appearing in old + but not new (or vice versa) is a fail. +5. **Order of headers / query params:** irrelevant. Location sets are + unordered. +6. **URL-encoding:** sentinels are ASCII-safe by design, but if a + sentinel appears URL-encoded in a query string, decode before + comparison. + +### 4.3 Building location sets + +For each run (old, new) and each sentinel S: + +``` +locations(S) = { + (method, url_path, location_type) + for request in captured_requests + for location_type in extract_locations(request, S) +} +``` + +Where `url_path` is the request path with the proxy host stripped +(e.g. `/api/v1/health`). Query-string parameters are NOT part of +`url_path` — they are captured as `query:` locations. + +### 4.4 Parity verdict + +The integration **passes parity** for connection C iff for every +sentinel S generated for C: + +``` +locations_old(S) == locations_new(S) +``` + +### 4.5 Failure taxonomy + +| Code | Meaning | Severity | +|------|---------|----------| +| `MISSING_IN_NEW` | Sentinel appeared in old request at location L, not in new at L. | **Fail** — the new path lost a secret placement. | +| `EXTRA_IN_NEW` | Sentinel appeared in new at location L, not in old. | **Fail** — the new path added an unexpected secret placement. | +| `WRONG_LOCATION` | Sentinel present in both runs but at different locations. Special case combining the above two; surfaced explicitly because it is the most diagnostic. | **Fail** | +| `MISSING_IN_BOTH` | Sentinel never appeared in any captured request for this command. | **Diagnostic only** — the command may not exercise that connection (fine), or the integration may be dead (note it). Not a parity failure. | +| `RUN_FAILED_OLD` | The integration crashed before issuing any request in the old run. | **Inconclusive** | +| `RUN_FAILED_NEW` | The integration crashed before issuing any request in the new run. | **Inconclusive** | +| `NO_REQUESTS_CAPTURED` | Ran cleanly but issued zero HTTP calls. | **Inconclusive** | + +--- + +## 5. CLI / output shape + +### 5.1 Invocation + +```bash +python3 connectus/check_auth_parity.py \ + --integration-id "" \ + [--commands cmd1 cmd2 ...] \ + [--connection ] \ + [--timeout SECONDS] \ + [--docker {auto,always,never}] \ + [--docker-image ] \ + [--use-integration-docker] +``` + +Mirrors [`check_command_params.py`](connectus/check_command_params.py:4206)'s +CLI surface. 
The `--integration-id` flag is **required** (not optional +as in the sibling tool) because the test needs `Auth Details` to know +what to test. The optional `--connection` flag restricts the test to a +single named connection (useful when re-running after removing an +interpolated connection from the invocation). + +### 5.2 Stdout JSON shape + +```json +{ + "integration": "", + "auth_parity": { + "": { + "status": "pass | fail | skipped_interpolated | skipped_other_type | skipped_signed | skipped_mtls | inconclusive", + "commands": { + "": "pass | fail | inconclusive" + } + } + }, + "diagnostics": { + "": { + "sentinels": { + "": "" + }, + "commands": { + "": { + "old_run": { + "status": "ok | crashed | no_requests", + "captured_request_count": 3, + "locations": { + "": ["header:authorization:bearer", "..."] + }, + "stderr_excerpt": "..." + }, + "new_run": { + "status": "ok | crashed | no_requests", + "captured_request_count": 3, + "locations": { + "": ["header:authorization:bearer", "..."] + }, + "stderr_excerpt": "..." + }, + "diffs": [ + { + "sentinel": "", + "failure_code": "MISSING_IN_NEW | EXTRA_IN_NEW | WRONG_LOCATION | MISSING_IN_BOTH", + "old_locations": ["header:authorization:bearer"], + "new_locations": [] + } + ], + "request_set_diff": { + "only_in_old": [{"method": "POST", "path": "/oauth/token"}], + "only_in_new": [] + } + } + } + } + } +} +``` + +### 5.3 Diagnostics stripping rule + +Same convention as [`check_command_params.py`](connectus/check_command_params.py:1): + +> ⚠️ **The `diagnostics` field MUST be stripped before persisting.** +> It is internal signal for the migration skill's decision-making. +> The persisted artifact contains only `integration` and +> `auth_parity`. + +### 5.4 Persisted result — proposed column and setter + +**Recommendation:** use the existing workflow step **#13 `auth parity +test passes`** (a checkpoint, not a data column). The parity test's +`auth_parity` JSON is consumed by the AI to decide whether to +`markpass` or `fail` step #13. There is no need for a new CSV column — +the JSON output is ephemeral (like `check_command_params.py`'s output +is ephemeral before being distilled into `Params to Commands`). + +The workflow interaction is: + +1. Run `check_auth_parity.py` → JSON to stdout. +2. AI reads `auth_parity` → all connections `pass` or + `skipped_interpolated`? + - **Yes** → + `python3 connectus/workflow_state.py markpass "" "auth parity test passes"` + - **No** → investigate failures, fix code, re-run. + +Step #12 (`requires auth parity test`) must already be `YES` for step +#13 to be meaningful. If it is `NO` or `N/A`, step #13 is auto-`N/A` +and the parity test is not run. + +### 5.5 Error codes — hard errors + +The tool's scope is **strictly** Python integrations that use +[`BaseClient`](Packs/Base/Scripts/CommonServerPython/CommonServerPython.py:9703). +Everything outside this scope produces a **hard error** (not a skip, +not inconclusive) with a specific error code. The error is reported +as a top-level `"error"` key in the JSON output (consistent with +[`check_command_params.py`](connectus/check_command_params.py:1)'s +pattern of emitting structured JSON on stdout even for failures) and +a non-zero process exit code. + +| Error code | Exit code | Detection | Error message | +|------------|-----------|-----------|---------------| +| `ERROR_NON_PYTHON` | `10` | YML `script.type` is not `python` / `python3`, or the integration directory contains no `.py` file (only `.js` or `.ps1`). 
| `"Auth parity test only supports Python integrations. This integration is . Mark its auth as interpolated if it cannot use BaseClient injection."` | +| `ERROR_NO_BASECLIENT` | `11` | The integration is Python but its `.py` file does not import or instantiate `BaseClient` (or a subclass). Detection: statically check whether the source contains `class (BaseClient)`, `BaseClient(`, or `from CommonServerPython import ... BaseClient`. | `"Auth parity test requires BaseClient usage. This integration does not use BaseClient. Mark its auth as interpolated if it cannot use BaseClient injection."` | +| `ERROR_ALL_INTERPOLATED` | `12` | Every `auth_types[]` entry in `Auth Details` has `"interpolated": true`. | `"All auth types are interpolated. Auth parity test is not applicable — interpolated connections are handled by infrastructure, not integration code."` | +| `ERROR_CONNECTION_INTERPOLATED` | `13` | A specific connection name was requested via `--connection ` but that connection has `"interpolated": true`. | `"Connection '' is interpolated. Auth parity test only applies to non-interpolated connections. Remove the interpolated flag or skip this connection."` | +| `ERROR_INTEGRATION_REJECTS_HTTP` | `14` | The integration code checks that the URL starts with `https://` and rejects `http://`, causing the proxy-redirected URL to fail. Detected when the old run crashes with an error message containing `http://` or `https` and `scheme`/`protocol`. | `"Integration rejects HTTP URLs. Auth parity test requires BaseClient URL rewriting to http://. Mark its auth as interpolated if it cannot use BaseClient injection."` | + +**JSON shape on hard error:** + +```json +{ + "integration": "", + "error": { + "code": "ERROR_NON_PYTHON", + "message": "Auth parity test only supports Python integrations. This integration is javascript. Mark its auth as interpolated if it cannot use BaseClient injection.", + "exit_code": 10 + } +} +``` + +**Partial interpolation** (some `auth_types[]` entries are +interpolated, some are not) is **NOT** an error. The tool runs parity +checks only on the non-interpolated entries and reports +`"skipped_interpolated"` for the interpolated ones in the +`auth_parity` output. This is the one case where a skip status is +acceptable — because the tool IS running, just not for those specific +connections. + +### 5.6 Skill error handling + +When the migration skill (in +[`connectus/connectus-migration-SKILL.md`](connectus/connectus-migration-SKILL.md)) +encounters these error codes, it must react as follows: + +#### `ERROR_NON_PYTHON` or `ERROR_NO_BASECLIENT` or `ERROR_INTEGRATION_REJECTS_HTTP` + +1. **Reset the workflow** back to the `Auth Details` step. +2. **Re-run `set-auth`** with all `auth_types[]` entries for the + affected connections changed to `"interpolated": true`. +3. **Re-run manifest generation** (since auth details changed). + +The error messages are designed to be parseable by the skill — they +contain the literal string `"Mark its auth as interpolated"` as a +signal. The skill should pattern-match on this substring to trigger +the interpolation-and-retry flow. + +#### `ERROR_ALL_INTERPOLATED` + +The skill should recognize this integration doesn't need auth parity +testing and mark the checkpoint as passed (or N/A): + +```bash +python3 connectus/workflow_state.py markpass "" "auth parity test passes" +``` + +This is not a failure — it means the integration's auth is fully +handled by infrastructure and there is no user-supplied secret to +compare. 
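Pulling the reactions above together (including the `ERROR_CONNECTION_INTERPOLATED` case covered next), here is a hedged sketch of the dispatch a skill could apply to the tool's stdout. Only the JSON shape, the error codes, and the literal `Mark its auth as interpolated` signal come from §5.5–§5.6; the function and its return labels are illustrative, not existing skill code.

```python
# Illustrative dispatch only — the error codes and the signal substring are
# documented in §5.5/§5.6; this function and its return labels are not.
import json


def react_to_hard_error(tool_stdout: str) -> str:
    payload = json.loads(tool_stdout)
    error = payload.get("error")
    if not error:
        return "no_hard_error"  # normal auth_parity output, proceed as usual
    if "Mark its auth as interpolated" in error["message"]:
        # ERROR_NON_PYTHON / ERROR_NO_BASECLIENT / ERROR_INTEGRATION_REJECTS_HTTP:
        # reset to Auth Details, re-run set-auth with "interpolated": true,
        # then regenerate the manifest.
        return "interpolate_and_retry"
    if error["code"] == "ERROR_ALL_INTERPOLATED":
        # markpass "auth parity test passes" — no user-supplied secret to compare.
        return "markpass_checkpoint"
    if error["code"] == "ERROR_CONNECTION_INTERPOLATED":
        # Covered in the next subsection: drop the connection and retry.
        return "drop_connection_and_retry"
    return "investigate_manually"
```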
+ +#### `ERROR_CONNECTION_INTERPOLATED` + +The skill should remove that connection from the test invocation and +retry with only non-interpolated connections. If no non-interpolated +connections remain after removal, treat as `ERROR_ALL_INTERPOLATED`. + +--- + +## 6. Edge cases & open questions + +### 6.1 `CHOICE(a, b)` configs + +Run parity for **each branch independently**. A `CHOICE` means the +user picks one of several connection types at configuration time. The +parity test must verify that whichever branch the user picks, the +secrets land in the same place under old vs new. Each branch gets its +own `auth_parity[""]` entry. + +### 6.2 `REQUIRED(a) + OPTIONAL(b)` + +Test the optional connection **when it is non-interpolated**. Run it +as a separate parity check with its own sentinel set. The optional +auth is activated by seeding its `xsoar_params` (old run) or +injecting its UCP credentials (new run) — independently of the +required connection's run. This avoids conflating the two connections' +sentinel locations. + +If the optional connection's `xsoar_params` overlap with the required +connection's (same XSOAR field backing both), the sentinels will +differ between the two runs (each run generates fresh UUIDs), so +there is no cross-contamination. + +### 6.3 Signed requests (HMAC, AWS SigV4) + +The sentinel will **not** appear verbatim in the `Authorization` +header — it is consumed as an input to a signing function whose output +is a derived signature. + +**Recommendation: skip with `status: "skipped_signed"`.** + +Rationale: the parity test's core mechanism is sentinel-grep. For +signed auth, the sentinel is an input to a one-way function; the +output is not greppable. Comparing derived signatures would require +the test to replicate the signing algorithm, which defeats the purpose +of a black-box parity check. These integrations are better verified +by a targeted unit test that asserts the signing inputs are identical. + +Detection: integrations classified as `APIKey` whose code imports +`hmac`, `hashlib.sha256` for signing, `botocore`, `AWSApiModule`, or +Akamai EdgeGrid patterns. The analyzer can flag these statically. + +### 6.4 mTLS / cert-key auth (YML type 14) + +The "secret" is a PEM certificate/key. It typically lands in the TLS +handshake, not in the HTTP request body or headers. +[`capture_proxy.py`](connectus/capture_proxy.py:1) operates at the +HTTP layer and does **not** surface TLS client-certificate +negotiation. + +**Recommendation: skip with `status: "skipped_mtls"`.** + +The limitation is structural — the capture proxy would need to be +replaced with a TLS-terminating MITM proxy to observe client certs, +which is a significant complexity increase for a rare auth type. +Document the skip and recommend a manual TLS-layer probe for these +integrations. + +### 6.4.1 Multiple base URLs + +Some integrations construct a second +[`BaseClient`](Packs/Base/Scripts/CommonServerPython/CommonServerPython.py:9703) +with a different URL (e.g. an auth endpoint vs an API endpoint, or a +graph endpoint vs a management endpoint). Since the harness sets the +`url` param to `http://127.0.0.1:`, the proxy captures **all** +traffic to that address regardless of path. + +The `Host` header in captured requests distinguishes the original +target — the integration typically sets `Host: api.example.com` via +[`BaseClient._headers`](Packs/Base/Scripts/CommonServerPython/CommonServerPython.py:9749) +or the request itself. 
The parity comparison uses `(method, +url_path)` tuples (see [§4.3](#43-building-location-sets)), so +requests to different paths are naturally separated even though they +all hit the same proxy port. + +If the integration constructs a second `BaseClient` with a +**hardcoded** URL (not from `demisto.params()`), that traffic will +NOT reach the proxy. This is acceptable — the parity test only +covers auth paths that flow through `demisto.params()["url"]`. + +### 6.4.2 HTTPS enforcement in code + +If the integration code checks that the URL starts with `https://` +and rejects `http://`, the test will fail during the old run before +any HTTP request is made. This is caught as +`ERROR_INTEGRATION_REJECTS_HTTP` (see [§5.5](#55-error-codes--hard-errors)) +with a specific diagnostic: `"integration_rejects_http"`. + +The skill should treat this the same as `ERROR_NO_BASECLIENT` — mark +the affected connections as `"interpolated": true` and re-run +`set-auth`. + +### 6.4.3 OAuth token exchange + +[`BaseClient`](Packs/Base/Scripts/CommonServerPython/CommonServerPython.py:9703)'s +built-in OAuth methods may make a token request to an auth server +before the API call. Both old and new runs will make this request +through the proxy. The proxy returns `200 {}` which will likely cause +the OAuth flow to fail — the response lacks the expected +`access_token` field. + +**Mitigation:** For OAuth2 auth types (`OAuth2ClientCreds`, +`OAuth2AuthCode`, `OAuth2JWT`), the harness should detect token- +exchange requests and return a canned OAuth response instead of the +default `200 {}`: + +```json +{ + "access_token": "__AUTHPARITY__oauth_token____", + "token_type": "bearer", + "expires_in": 3600 +} +``` + +The canned `access_token` is itself a sentinel — it will appear in +subsequent API requests as a `Bearer` token, and the parity +comparison will verify it lands in the same `header:authorization:bearer` +location in both runs. + +Detection: the proxy identifies a token-exchange request by matching +common OAuth token endpoint patterns (`POST` to a path containing +`/token`, `/oauth`, or `/oauth2`, with a `Content-Type: +application/x-www-form-urlencoded` body containing `grant_type`). +When matched, the proxy returns the canned response instead of +`200 {}`. + +### 6.5 Cookie-based session auth (login round-trip) + +The sentinel password is sent to a login endpoint, which returns a +session cookie. Subsequent requests carry the cookie, not the +password. + +**Parity judgment:** compare the **login request** (where the sentinel +password appears), not the downstream requests. The downstream +requests carry a session cookie whose value is server-generated (in +our case, the proxy returns `200 {}` so there is no real cookie — but +the login request itself is where parity matters). + +If the old run sends `POST /login` with `password=` in the +body, and the new run sends the same `POST /login` with the same +sentinel in the same body field, parity holds — even though +subsequent requests differ (no real session cookie from the proxy). + +### 6.6 Integration mutates the secret before sending + +Example: base64-encodes the API key, wraps it in a JWT, or hashes it +with a nonce. + +**Parity here is byte-for-byte at the wire.** If the old run's +integration code does `base64(sentinel)` and the new run's +`_apply_ucp_api_key` does the same `base64(sentinel)`, the wire +values match and parity holds. 
If the mutation moves from integration +code to BaseClient injection, parity still holds as long as the wire +output is identical. + +The sentinel will appear in the captured request in its **mutated** +form. The grep must therefore search for both the raw sentinel AND +common transformations (base64, URL-encoding). The location extractor +should: + +1. Search for the raw sentinel. +2. Search for `base64(sentinel)` (both standard and URL-safe). +3. If neither is found, record `MISSING_IN_BOTH` — the mutation is + opaque and the test cannot verify parity for this sentinel. + +### 6.7 Different number of requests between old and new + +Example: the new injection path skips a discovery call that the old +path made, or the new path adds a token-refresh call. + +**Proposal:** align on the **union** of `(method, url_path)` tuples +from both runs. Parity-check the **intersection** only — requests +present in both runs. Report the symmetric difference as +`diagnostics...request_set_diff`, not as a parity +failure. + +Rationale: the parity test answers "do secrets land in the same +place?" — not "do both paths make the same number of calls?" A +discovery call that only the old path makes is not an auth-parity +issue; it is a behavioral difference that may be intentional. + +### 6.8 `Other` auth type + +Connections classified as `Other` (DeviceCode, ROPC, +ManagedIdentity, custom signing) have no standardized UCP credential +shape. The test cannot construct a meaningful +`mock_get_ucp_credentials` return value without per-integration +knowledge. + +**Recommendation: skip with `status: "skipped_other_type"`.** + +These connections require manual parity verification or a +per-integration test override. + +### 6.9 Open questions for reviewer + +1. **UCP injection seam stability.** The design assumes + [`get_ucp_credentials()`](Packs/Base/Scripts/CommonServerPython/CommonServerPython.py:13849) + is the correct seam to patch for the new run. If the injection + architecture changes (e.g. credentials are injected at a lower + level, or `_http_request` itself is modified to call a different + function), the harness must be updated. **Is this seam considered + stable for testing purposes?** + +2. **OAuth2 token exchange.** In the old run, OAuth2 integrations + typically exchange `client_id` + `client_secret` for an + `access_token` via a token endpoint. The sentinel appears in the + token-exchange request, not in subsequent API requests (which carry + the exchanged token). In the new run, UCP provides the + `access_token` directly — there is no token-exchange request. + **Should parity be judged on the token-exchange request (old-only) + or on the API requests (where the old run carries a real token and + the new run carries the sentinel)?** The current design proposes + comparing the intersection of requests (§6.7), which would skip + the token-exchange request. Is this acceptable? + +3. **Multi-connection integrations with shared `xsoar_params`.** When + the same XSOAR field (e.g. `credentials.password`) appears in + multiple `auth_types[]` entries, the parity test generates + different sentinels for each entry. But the old run can only seed + one value into `demisto.params()["credentials"]["password"]`. **How + should the test handle this?** Proposed: run each connection's + parity check in isolation (separate old+new run pairs), each with + its own sentinel. The old run seeds only the params for the + connection under test. + +4. 
**`_apply_ucp_*` overrides.** Integrations that override + [`_apply_ucp_api_key()`](Packs/Base/Scripts/CommonServerPython/CommonServerPython.py:9855) + or similar methods may place the credential in a non-default + location (e.g. `X-API-Key` header instead of `Authorization: + Bearer`). The parity test will catch this correctly (the new run + will show the sentinel in the overridden location), but **should + the test also verify that the override exists and is correct?** The + current design says no — that is a code-review concern, not a + parity concern. + +5. **Batch execution.** Should the parity test support a batch mode + (run across all integrations with `requires auth parity test = + YES`)? If so, should it produce a summary report similar to + [`bulk_static_results.json`](connectus/bulk_static_results.json:1)? + The current design covers single-integration invocation only. + +--- + +## 7. Suggested file layout + +### New files + +| File | Purpose | +|------|---------| +| `connectus/check_auth_parity.py` | Main analyzer script | +| `connectus/check_auth_parity_test.py` | Unit + integration tests | + +### Reused files + +| File | What is reused | +|------|----------------| +| [`connectus/capture_proxy.py`](connectus/capture_proxy.py:1) | HTTP capture server — session-based request recording, identical usage pattern. | +| [`connectus/check_command_params.py`](connectus/check_command_params.py:1) | Content preparation pipeline (unified `.py` assembly), Docker child execution, YML parsing, command discovery, sentinel coercion logic. | +| [`connectus/workflow_state.py`](connectus/workflow_state.py:1) | `show-step` to read `Params for test with default in code` and `Params to Commands`. | +| [`connectus/auth_config_parser/`](connectus/auth_config_parser/__init__.py:1) | [`parse_auth_details()`](connectus/auth_config_parser/parser.py:1) to parse `Auth Details` JSON into typed [`AuthDetails`](connectus/auth_config_parser/types.py:102) / [`AuthEntry`](connectus/auth_config_parser/types.py:52) dataclasses; [`auth_param_ids()`](connectus/auth_config_parser/utils.py:1) to extract XSOAR param IDs; [`validate_auth_details()`](connectus/auth_config_parser/validator.py:1) to validate the structure before use. | + +### Dependencies + +| Library | Purpose | +|---------|---------| +| [`auth_config_parser`](connectus/auth_config_parser/__init__.py:1) | **Canonical shared library** for parsing and validating Auth Details JSON and config expressions. Used by both [`workflow_state.py`](connectus/workflow_state.py:1) and the auth parity test tooling. Provides typed dataclasses ([`AuthDetails`](connectus/auth_config_parser/types.py:102), [`AuthEntry`](connectus/auth_config_parser/types.py:52), [`ConfigExpression`](connectus/auth_config_parser/types.py:100), [`AuthType`](connectus/auth_config_parser/types.py:11)) and pure functions ([`parse_auth_details()`](connectus/auth_config_parser/parser.py:1), [`validate_auth_details()`](connectus/auth_config_parser/validator.py:1), [`auth_param_ids()`](connectus/auth_config_parser/utils.py:1), [`auth_param_ids_with_sources()`](connectus/auth_config_parser/utils.py:1)). | + +### Module structure sketch + +``` +connectus/check_auth_parity.py +├── SentinelMap # dataclass: connection_name → {xsoar_param_path → sentinel_value} +├── generate_sentinels(details: AuthDetails) # Build SentinelMap from parsed AuthDetails +│ └── skip entries where entry.interpolated is True +├── build_old_params(sentinel_map, ...) 
# Build demisto.params() dict for old run +├── build_ucp_mock(sentinel_map, ...) # Build mock_get_ucp_credentials for new run +├── map_auth_type_to_ucp_shape(entry: AuthEntry) # entry.type (AuthType enum) → UCP credential dict template +├── run_old(integration_path, command, params, proxy) → list[CapturedRequest] +├── run_new(integration_path, command, params, ucp_mock, proxy) → list[CapturedRequest] +│ ├── patch is_ucp_enabled → True +│ ├── patch should_use_ucp_auth → True +│ └── patch get_ucp_credentials → ucp_mock +├── extract_sentinel_locations(requests, sentinel) → set[Location] +│ ├── scan headers (with Basic/Bearer decomposition) +│ ├── scan query params +│ ├── scan body (JSON + form) +│ ├── scan URL userinfo +│ ├── scan cookies +│ └── try base64 variants of sentinel +├── compare_locations(old_locs, new_locs) → list[Diff] +│ └── classify: MISSING_IN_NEW, EXTRA_IN_NEW, WRONG_LOCATION, MISSING_IN_BOTH +├── compare_request_sets(old_reqs, new_reqs) → RequestSetDiff +├── check_connection_parity(connection, commands, ...) → ConnectionResult +├── check_auth_parity(integration_path, integration_id, ...) → FullResult +│ ├── parse Auth Details via auth_config_parser.parse_auth_details() +│ ├── read Params for test, Params to Commands via workflow_state show-step +│ ├── for each non-interpolated connection: check_connection_parity +│ └── assemble auth_parity + diagnostics +├── _parse_args(argv) → argparse.Namespace +└── main(argv) → int +``` + +### Test structure sketch + +``` +connectus/check_auth_parity_test.py +├── Unit tests +│ ├── test_generate_sentinels — correct sentinel shape, interpolated skipped +│ ├── test_map_auth_type_to_ucp_shape — each auth type maps correctly +│ ├── test_extract_sentinel_locations_header_bearer +│ ├── test_extract_sentinel_locations_header_basic +│ ├── test_extract_sentinel_locations_query_param +│ ├── test_extract_sentinel_locations_json_body +│ ├── test_extract_sentinel_locations_form_body +│ ├── test_extract_sentinel_locations_base64_variant +│ ├── test_compare_locations_pass — identical sets +│ ├── test_compare_locations_missing_in_new +│ ├── test_compare_locations_extra_in_new +│ ├── test_compare_locations_wrong_location +│ ├── test_compare_locations_missing_in_both +│ ├── test_compare_request_sets — symmetric difference +│ └── test_canonicalization — header case, basic decode, bearer strip +├── Integration tests (curated pack examples) +│ ├── test_apikey_integration — e.g. AbnormalSecurity (header Bearer) +│ ├── test_plain_integration — e.g. Salesforce IAM (basic auth / ROPC) +│ └── test_oauth2_integration — e.g. CrowdStrike Falcon (client creds) +``` + +--- + +## 8. Auth Details parsing + +The auth parity test uses the [`auth_config_parser`](connectus/auth_config_parser/__init__.py:1) +package as the single source of truth for parsing and validating Auth +Details JSON. This replaces the earlier pattern of calling internal +helpers from [`workflow_state.py`](connectus/workflow_state.py:1) +directly. 
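As a bridge from the §7 module sketch to the API documented below, here is a minimal sketch of `generate_sentinels()`. It assumes only the `AuthEntry` attributes shown in §8.1; the sentinel format and the plain-dict `SentinelMap` alias (§7 models it as a dataclass) are illustrative.

```python
# Minimal sketch — the sentinel format and the dict-based SentinelMap alias
# are illustrative; entry.interpolated / entry.name / entry.xsoar_params are
# the AuthEntry attributes documented in §8.1.
import uuid

from auth_config_parser import AuthDetails

SentinelMap = dict[str, dict[str, str]]  # connection name -> {xsoar_param_path -> sentinel}


def generate_sentinels(details: AuthDetails) -> SentinelMap:
    sentinels: SentinelMap = {}
    for entry in details.auth_types:
        if entry.interpolated:
            continue  # infrastructure-handled; nothing to grep for
        sentinels[entry.name] = {
            path: f"__AUTHPARITY__{entry.name}__{uuid.uuid4().hex}"
            for path in entry.xsoar_params
        }
    return sentinels
```

Each connection's run pair gets its own fresh UUIDs, which is what keeps shared `xsoar_params` from cross-contaminating parity checks (§6.2).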
+ +### 8.1 Parsing Auth Details + +```python +from auth_config_parser import parse_auth_details, AuthDetails, AuthEntry, AuthType + +# raw_json comes from workflow_state.py show-step "Auth Details" +details: AuthDetails = parse_auth_details(raw_json) + +for entry in details.auth_types: # entry: AuthEntry (frozen dataclass) + if entry.interpolated: + continue # skip interpolated connections + print(entry.name, entry.type, entry.xsoar_params) + # entry.type is an AuthType enum: AuthType.APIKey, AuthType.Plain, … +``` + +### 8.2 Validating before use + +```python +from auth_config_parser import validate_auth_details + +errors: list[str] = validate_auth_details(raw_json) +if errors: + raise ValueError(f"Invalid Auth Details: {errors}") +``` + +### 8.3 Extracting param IDs + +```python +from auth_config_parser import auth_param_ids, auth_param_ids_with_sources, AuthDetails + +ids: set[str] = auth_param_ids(details) # {"api_key", "credentials"} +ids_sourced: dict[str, str] = auth_param_ids_with_sources(details) +# {"api_key": "api_key_conn", "credentials": "creds_conn"} +``` + +### 8.4 Parsing config expressions + +```python +from auth_config_parser import parse_config, ConfigExpression, ConfigClause + +expr: ConfigExpression = parse_config("REQUIRED(api_key) + OPTIONAL(oauth)") +for clause in expr.clauses: # clause: ConfigClause + print(clause.operator, clause.names) # ClauseOperator.REQUIRED, ["api_key"] +``` + +### 8.5 Mapping `AuthEntry.type` to UCP shape + +The [`map_auth_type_to_ucp_shape()`](connectus/check_auth_parity.py:1) +function in the parity test uses [`AuthEntry.type`](connectus/auth_config_parser/types.py:75) +(an [`AuthType`](connectus/auth_config_parser/types.py:11) enum) to +select the correct UCP credential dict template. This replaces raw +string comparisons against `auth_types[].type` dict values: + +```python +from auth_config_parser import AuthType + +match entry.type: + case AuthType.APIKey: + return {"type": "api_key", "api_key": {"key": sentinel}} + case AuthType.Plain: + return {"type": "plain", "plain": {"username": sentinel_id, "password": sentinel_pw}} + case AuthType.OAuth2ClientCreds | AuthType.OAuth2AuthCode | AuthType.OAuth2JWT: + return {"type": "oauth2", "oauth2": {"access_token": sentinel, "token_type": "Bearer"}} + case AuthType.Other: + return None # skip — see §6.8 + case AuthType.NoneRequired: + return None # no auth +``` + +--- + +## Appendix: Execution flow diagram + +```mermaid +flowchart TD + Z[Start] --> Z1{Is integration Python?} + Z1 -->|No| Z2[ERROR_NON_PYTHON - exit 10] + Z1 -->|Yes| Z3{Uses BaseClient?} + Z3 -->|No| Z4[ERROR_NO_BASECLIENT - exit 11] + Z3 -->|Yes| A[Parse Auth Details via auth_config_parser] + A --> A1{All auth_types interpolated?} + A1 -->|Yes| A2[ERROR_ALL_INTERPOLATED - exit 12] + A1 -->|No| B{For each auth_types entry} + B -->|interpolated: true| C[Skip - status: skipped_interpolated] + B -->|type: Other| D[Skip - status: skipped_other_type] + B -->|type: NoneRequired| E[Skip - no auth] + B -->|non-interpolated, standard type| F[Generate sentinels for xsoar_params] + F --> G[Build old-run params with sentinels in demisto.params] + F --> H[Build new-run UCP mock with same sentinels] + G --> I[Old run: execute command under capture_proxy] + H --> J[New run: execute command under capture_proxy with UCP patched] + I --> K[Extract sentinel locations from old captured requests] + J --> L[Extract sentinel locations from new captured requests] + K --> M{locations_old == locations_new?} + L --> M + M -->|Yes| N[Connection PASS] + M 
-->|No| O[Connection FAIL - classify diffs]
+    I -->|Crashed| P[Inconclusive - RUN_FAILED_OLD]
+    J -->|Crashed| Q[Inconclusive - RUN_FAILED_NEW]
+    I -->|Zero requests| R[Inconclusive - NO_REQUESTS_CAPTURED]
+    J -->|Zero requests| R
+    I -->|Rejects HTTP| S[ERROR_INTEGRATION_REJECTS_HTTP - exit 14]
+```
diff --git a/connectus/column-schemas.md b/connectus/column-schemas.md
index 2fb581b986f..58f0acd9bc2 100644
--- a/connectus/column-schemas.md
+++ b/connectus/column-schemas.md
@@ -122,13 +122,21 @@ Worked example with `other_connection`:
 ```
 Schema validation is enforced by
-[`workflow_state.py validate_auth_detail()`](workflow_state.py:520) and runs
-automatically on every `set-auth` invocation.
+[`auth_config_parser.validate_auth_details()`](auth_config_parser/validator.py:47)
+(the workflow CLI calls a one-line wrapper at
+[`workflow_state.validators.validate_auth_detail()`](workflow_state/validators.py:25))
+and runs automatically on every `set-auth` invocation.
 Setter:
-[`workflow_state.py set-auth "" ''`](workflow_state.py:833).
+[`workflow_state.py set-auth "" ''`](workflow_state/cli.py:225)
+([`cmd_set_auth`](workflow_state/cli.py:225)).
 Setting this value resets the workflow back to the first checkpoint
-(`generated manifest`).
+(`generated manifest`). The reset wipes every later workflow column
+including the three Params\* data columns — `set-auth` deliberately
+ignores the `preserve_on_reset` carve-out that `reset-to`/`fail`
+honour, because auth-classification changes invalidate every
+downstream artifact (in particular, the per-command param contract
+validated by `params_to_commands_no_auth_overlap`).
 
 ---
 
@@ -165,24 +173,41 @@ Example (post-ignore-list — only behavioral params remain):
 Notes:
 - `commands` is a flat object: command name → array of parameter IDs.
-- Per-command lists are **sorted alphabetically (case-sensitive)** when
-  produced by the analyzer; downstream consumers should treat them as
-  sorted sets.
+- Per-command lists are produced sorted (case-sensitive ascending) **by
+  convention** — the analyzer emits them sorted, but
+  [`validate_params_to_commands`](workflow_state/validators.py:49) does
+  not enforce sort order on per-command lists. Downstream consumers
+  should treat them as sets and re-sort if they care.
 - An empty list (`[]`) is the valid value for a command with no
   behavioral params.
 - Parameter IDs match those in the integration's YML `configuration`
   section.
 - Free-form: no enforced ordering or required keys beyond `integration`
   and `commands`.
+- **Extra top-level keys are HARD-REJECTED.** The validator at
+  [`validate_params_to_commands`](workflow_state/validators.py:49)
+  rejects any payload containing top-level keys other than `integration`
+  and `commands`. The error for `diagnostics` specifically includes a
+  strip-it one-liner
+  (`import sys, json; o = json.load(sys.stdin); o.pop('diagnostics', None); print(json.dumps(o))`)
+  because that is the most common offender — the analyzer emits
+  `diagnostics` as internal AI metadata that must NEVER be persisted.
- **Disjointness with `Auth Details`:** `set-params-to-commands` HARD REJECTS any payload whose per-command lists include a YML param id that is already declared in the integration's `Auth Details` cell — either as a projected `auth_types[].xsoar_params` entry (dotted forms collapse to the segment before the first `.`) or as an `other_connection` entry. Inspect the live exclusion set with - [`workflow_state.py auth-params `](workflow_state.py:1) + [`workflow_state.py auth-params `](workflow_state/cli.py:1) and see [`connectus/Readme.md`](Readme.md:1) for the full CLI reference. The analyzer can also pull this set automatically when invoked with `--integration-id ` (see below). +- **Reset semantics.** `Params to Commands` is **preserved** on `fail` + and `reset-to` because the column carries `preserve_on_reset: true` + in [`workflow_state_config.yml`](workflow_state_config.yml:74). It is + **wiped** by `set-auth` and by plain `reset` — those two operations + deliberately ignore the carve-out because auth changes invalidate + every downstream artifact and `reset` is the "wipe the row" verb with + no carve-outs. ### Production source @@ -200,7 +230,7 @@ python3 connectus/check_command_params.py \ Pass `--integration-id ` to make the analyzer additionally pull the integration's auth-derived ignore set from -[`workflow_state.py auth-params `](workflow_state.py:1) and union it +[`workflow_state.py auth-params `](workflow_state/cli.py:1) and union it into its own ignore set. This guarantees the per-command output is disjoint from the integration's `Auth Details` cell from the start. @@ -230,7 +260,8 @@ The analyzer's stdout is: > §5 for the full rule. Setter: -[`workflow_state.py set-params-to-commands "" ''`](workflow_state.py:682). +[`workflow_state.py set-params-to-commands "" ''`](workflow_state/cli.py:229) +([`cmd_set_params_to_commands`](workflow_state/cli.py:229)). Must be valid JSON. Required before `generated manifest` can be marked passed. --- @@ -262,9 +293,19 @@ Notes: - Must be valid JSON. Setter: -[`workflow_state.py set-params-for-test "" ''`](workflow_state.py:687). +[`workflow_state.py set-params-for-test "" ''`](workflow_state/cli.py:254) +([`cmd_set_params_for_test`](workflow_state/cli.py:254)). Required before `generated manifest` can be marked passed. +> **Reset semantics.** `Params for test with default in code` is +> **preserved** on `fail` and `reset-to` because the column carries +> `preserve_on_reset: true` in +> [`workflow_state_config.yml`](workflow_state_config.yml:82). It is +> **wiped** by `set-auth` and by plain `reset` — those two operations +> deliberately ignore the carve-out because auth changes invalidate +> every downstream artifact and `reset` is the "wipe the row" verb with +> no carve-outs. + --- ## `Params same in other handlers` (optional) @@ -275,4 +316,13 @@ Required before `generated manifest` can be marked passed. } Must be valid JSON when set. Not a prerequisite for any checkpoint. -``` \ No newline at end of file +``` + +> **Reset semantics.** `Params same in other handlers` is **preserved** +> on `fail` and `reset-to` because the column carries +> `preserve_on_reset: true` in +> [`workflow_state_config.yml`](workflow_state_config.yml:90). It is +> **wiped** by `set-auth` and by plain `reset` — those two operations +> deliberately ignore the carve-out because auth changes invalidate +> every downstream artifact and `reset` is the "wipe the row" verb with +> no carve-outs. 
\ No newline at end of file diff --git a/connectus/connectus-migration-SKILL.md b/connectus/connectus-migration-SKILL.md index b39895aae87..bddd0cb42da 100644 --- a/connectus/connectus-migration-SKILL.md +++ b/connectus/connectus-migration-SKILL.md @@ -42,7 +42,7 @@ Use when the user says something like "migrate everything assigned to me" / "con python3 connectus/workflow_state.py next --mine ``` - Or from Python: `from connectus.workflow_state import integrations_for_assignee` and call `integrations_for_assignee("")`. Each result dict carries `integration_id`, `connector_id`, `assignee`, `current_step`, `current_step_index`, `completed_steps`, `all_complete`, `has_progress`. + Or from Python: `from workflow_state import integrations_for_assignee` and call `integrations_for_assignee("")`. Each result dict carries `integration_id`, `connector_id`, `assignee`, `current_step`, `current_step_index`, `completed_steps`, `all_complete`, `has_progress`. 3. **Empty result?** Tell the user there is nothing assigned + in-progress for them, and offer two follow-ups: - bulk-assign a connector via `set-assignee-by-connector ""` (suggest running `list-connectors` first to pick one), or - browse via `python3 connectus/workflow_state.py dashboard`. @@ -68,7 +68,7 @@ Use when the user says something like "migrate connector ``" / "do python3 connectus/workflow_state.py list-by-connector "" ``` - Or programmatically: `from connectus.workflow_state import list_integrations_by_connector` → `list_integrations_by_connector("")`. If the result is empty, suggest `python3 connectus/workflow_state.py list-connectors` to discover valid ids and stop. + Or programmatically: `from workflow_state import list_integrations_by_connector` → `list_integrations_by_connector("")`. If the result is empty, suggest `python3 connectus/workflow_state.py list-connectors` to discover valid ids and stop. 2. **Inspect ownership** on the matched rows (look at the `assignee` field on each dict). One of three cases applies: - **All rows assigned to the current git user** → proceed straight to step 4. - **All rows unassigned** → offer to bulk-assign to the current user. Confirm before running: @@ -110,12 +110,16 @@ When in doubt, surface the candidates and the rule that's pulling each direction ## Critical Rules +> **Architecture.** The source of truth for the workflow's shape (steps, columns, markers, interactions) is [`connectus/workflow_state_config.yml`](workflow_state_config.yml). The CLI dispatch, validators, state machine, CSV I/O, and display helpers live in the [`connectus/workflow_state/`](workflow_state/__init__.py) package. The file [`connectus/workflow_state.py`](workflow_state.py) is now a backward-compatibility shim — `python3 connectus/workflow_state.py …` still works because the script delegates to [`workflow_state.cli.main()`](workflow_state/cli.py:1). Canonical Python import is `from workflow_state import …`. +> +> **Q2 2026-05 BREAKING CHANGE — strict checkpoint values.** [`is_checked()`](workflow_state/state_machine.py:24) now accepts ONLY `"✅"` and `"N/A"` as "done". Historical aliases (`"YES"`, `"true"`, `"True"`, `"done"`, `"Done"`, `"DONE"`) are no longer recognized. The canonical list lives in `markers.checkpoint_done_values` in [`workflow_state_config.yml:22-24`](workflow_state_config.yml:22). + 1. **NEVER edit [`connectus/connectus-migration-pipeline.csv`](connectus-migration-pipeline.csv) directly.** All CSV modifications MUST go through [`connectus/workflow_state.py`](workflow_state.py) CLI commands. 2. 
**Follow the workflow checkpoints sequentially.** You cannot skip ahead — the state machine enforces ordering. 3. **Always check status first** before doing any work on an integration. 4. **Use `execute_command`** to run all `workflow_state.py` commands from the workspace root. 5. **Use `set-auth` to update Auth Details.** When correcting auth classifications, use `python3 connectus/workflow_state.py set-auth "" ''`. This validates the JSON schema and automatically resets the workflow back to the first checkpoint (`generated manifest`). -6. If a checkpoint does not pass, it might be because a previous step was not done well — go back to it via `fail` or `reset-to`. +6. If a checkpoint does not pass, it might be because a previous step was not done well — go back to it via `fail` or `reset-to`. Both verbs **preserve** the three Params\* data columns (`Params to Commands`, `Params for test with default in code`, `Params same in other handlers`) — they are tagged `preserve_on_reset: true` in [`connectus/workflow_state_config.yml`](workflow_state_config.yml) so per-command param research survives a failed checkpoint. The CLI prints `Preserved (preserve_on_reset=true): [...]` listing what was kept; the api response includes the same names in `result["preserved"]`. **`set-auth` is NOT covered by this carve-out** — auth changes invalidate downstream artifacts, so `set-auth` continues to wipe Params\* by design (see Step 1 below). Plain `reset` (the "wipe the whole row" verb) also wipes Params\*; preservation is for `reset-to`/`fail` only. 7. Try to be efficient in what needs input from the user. If you have an option to read files instead of grep, or batch commands to the cli, it is better. ## Linked Files @@ -221,7 +225,7 @@ Three output formats are available — pick the one that matches how you'll cons For in-process Python use, import the helper directly: ```python -from connectus.workflow_state import get_integration_files +from workflow_state import get_integration_files files = get_integration_files("") # files["yml"], files["code"], files["description"], files["readme"], files["test"], plus any extras @@ -237,7 +241,7 @@ For background only: integration files conventionally live at `Packs// #### 1.2 Researching `Auth Details` — the four sources of truth -Before you can write the JSON for `set-auth`, you must derive it from the integration pack itself — never guess from the param list alone. The shape you are building is documented in [`connectus/column-schemas.md`](column-schemas.md:16) and is enforced by [`validate_auth_detail()`](workflow_state.py:521); the validator now checks the `config` expression grammar AND that every name referenced in `config` exists as some `auth_types[].name`. Wrong input is rejected at the CLI — better to catch it at research time. +Before you can write the JSON for `set-auth`, you must derive it from the integration pack itself — never guess from the param list alone. The shape you are building is documented in [`connectus/column-schemas.md`](column-schemas.md:16) and is enforced by [`validate_auth_details()`](auth_config_parser/validator.py:47) (called via the [`workflow_state.validators.validate_auth_detail()`](workflow_state/validators.py:25) wrapper); the validator now checks the `config` expression grammar AND that every name referenced in `config` exists as some `auth_types[].name`. Wrong input is rejected at the CLI — better to catch it at research time. 
Read these four files **in this order**, treating each one as a cross-check on the previous: @@ -273,11 +277,12 @@ Read these four files **in this order**, treating each one as a cross-check on t If steps 1 and 2 disagree (e.g. the YML defines a `credentials` param but the code only ever reads `params.get('api_key')`), step 2 wins. Steps 3 and 4 are tiebreakers when the code is ambiguous. +Before you actually use the `set_auth` command, present the evidence to the user for why you decided on the auth types and config structure in a concise and clear way. --- #### 1.2.1 Classification decision table -Map "what you saw in the source" → "auth-type enum value" (the values are listed in [`VALID_AUTH_TYPES`](workflow_state.py:118)): +Map "what you saw in the source" → "auth-type enum value" (the values are derived from the [`AuthType`](auth_config_parser/types.py:11) enum and re-exported from `workflow_state` as `VALID_AUTH_TYPES`): | You see... | Use type | |---|---| @@ -356,7 +361,7 @@ connection-adjacent YML param that is **not** an auth secret and **not** a per-command behavioral param — i.e. everything you reasonably need to define the integration's connection besides the secrets themselves. -The validator (see [`validate_auth_detail()`](workflow_state.py:521)) +The validator (see [`validate_auth_details()`](auth_config_parser/validator.py:47)) requires the key on every `set-auth` write; the field is required even when empty (use `[]`). @@ -652,10 +657,10 @@ Microsoft/Azure integrations are the most complex (23 corrections in the manual #### 1.8 Auth Details JSON Validation -After determining the correct auth types, validate the Auth Details JSON against the rules in [`connectus/column-schemas.md`](column-schemas.md:16). The same rules are enforced at runtime by [`validate_auth_detail()`](workflow_state.py:521): +After determining the correct auth types, validate the Auth Details JSON against the rules in [`connectus/column-schemas.md`](column-schemas.md:16). The same rules are enforced at runtime by [`validate_auth_details()`](auth_config_parser/validator.py:47): 1. Must be valid JSON with top-level keys `auth_types` (array), `config` (string), AND `other_connection` (array of strings). All three are REQUIRED on every `set-auth` write — the validator rejects payloads missing any of them. -2. Each `auth_types[]` entry has a `type` (one of [`VALID_AUTH_TYPES`](workflow_state.py:118)), a unique `name`, and a non-empty `xsoar_params` array (unless the entry is `NoneRequired`-shaped). +2. Each `auth_types[]` entry has a `type` (one of the [`AuthType`](auth_config_parser/types.py:11) enum values, also re-exported as `VALID_AUTH_TYPES`), a unique `name`, and a non-empty `xsoar_params` array (unless the entry is `NoneRequired`-shaped). 3. `auth_types[]` entries are sorted by `(type, name)` ascending. 4. `config` is either the literal `NoneRequired`, or one or more clauses joined with ` + `, each clause being `REQUIRED(...)`, `OPTIONAL(...)`, or `CHOICE(...)`. 5. Every operand name appearing inside `config`'s parens MUST exist as some `auth_types[].name` (the most common cause of `set-auth` rejection). @@ -698,9 +703,9 @@ python3 connectus/workflow_state.py set-auth "" ' \ Where `` is the directory containing the integration's `.yml` and `.py` files (e.g., `Packs/QRadar/Integrations/QRadar_v3`). 
-The `--integration-id ""` flag is **strongly recommended inside the migration workflow.** When supplied, the analyzer additionally calls [`workflow_state.py auth-params `](workflow_state.py:1) and unions every YML param id declared in the integration's `Auth Details` cell (auth-secret params projected from `auth_types[].xsoar_params` plus every `other_connection` entry) into its own ignore set. This removes the entire burden of "remembering which params already live in `Auth Details`" from the AI — those params will simply not appear in the analyzer's per-command output. The flag is OPTIONAL; standalone runs (outside the migration workflow, or on integrations that haven't been classified yet) can omit it and the analyzer falls back to the file-based ignore set with a single-line stderr WARNING. +The `--integration-id ""` flag is **strongly recommended inside the migration workflow.** When supplied, the analyzer additionally calls [`workflow_state.py auth-params `](workflow_state/cli.py:1) and unions every YML param id declared in the integration's `Auth Details` cell (auth-secret params projected from `auth_types[].xsoar_params` plus every `other_connection` entry) into its own ignore set. This removes the entire burden of "remembering which params already live in `Auth Details`" from the AI — those params will simply not appear in the analyzer's per-command output. The flag is OPTIONAL; standalone runs (outside the migration workflow, or on integrations that haven't been classified yet) can omit it and the analyzer falls back to the file-based ignore set with a single-line stderr WARNING. Optional flags the skill should know about: @@ -783,7 +788,7 @@ Optional flags the skill should know about: - `--timeout SECONDS` — per-command wall-clock timeout (default 30s; the batch runner uses 300s for the whole integration). - `--docker {auto,always,never}` — `auto` (default) uses Docker when available; `never` runs in host Python (will fail on integrations needing third-party deps); `always` requires Docker. - `--use-integration-docker` — opt-in: instead of the pinned `demisto/py3-native` image, use the integration's own `script.dockerimage` from its YML. Use this for a targeted re-run when an integration reports `module_not_found` (see Step 1 of the decision tree in section 6 below). Falls back to `--docker-image` if the YML doesn't declare one. -- `--integration-id ` — OPTIONAL. When supplied, the analyzer pulls the auth-derived ignore set from [`workflow_state.py auth-params `](workflow_state.py:1) and unions it with the file-based ignore set, guaranteeing that any param already declared in the integration's `Auth Details` cell cannot leak into the per-command output. The analyzer logs a single-line stderr INFO with the pulled list. Inside the migration workflow, ALWAYS pass this flag — `set-params-to-commands` will reject overlap regardless, so pulling the exclusion list up front saves a round-trip. If the integration is not in the workflow CSV, or its `Auth Details` is unset, the analyzer logs a single-line stderr WARNING and proceeds with just the file-based ignore set (it is intentionally not a fatal error). +- `--integration-id ` — OPTIONAL. When supplied, the analyzer pulls the auth-derived ignore set from [`workflow_state.py auth-params `](workflow_state/cli.py:1) and unions it with the file-based ignore set, guaranteeing that any param already declared in the integration's `Auth Details` cell cannot leak into the per-command output. 
The analyzer logs a single-line stderr INFO with the pulled list. Inside the migration workflow, ALWAYS pass this flag — `set-params-to-commands` will reject overlap regardless, so pulling the exclusion list up front saves a round-trip. If the integration is not in the workflow CSV, or its `Auth Details` is unset, the analyzer logs a single-line stderr WARNING and proceeds with just the file-based ignore set (it is intentionally not a fatal error). - `--no-sentinel-coercion` — disable automatic sentinel-value coercion. By default the analyzer coerces sentinels for params whose **NAME** (case-insensitive substring match) contains `thumbprint`, `certificate`, or `private_key`, replacing the generic `SENTINEL_PARAM_` string with a syntactically-valid stub (40-char hex thumbprint, stub PEM cert, stub PEM private key). This prevents the cert-thumbprint-hex-validator pattern (see §1.6 row #9) from killing the entire dynamic phase. Pass `--no-sentinel-coercion` for strict-sentinel debug mode. - `--seed-param NAME=VALUE` — repeatable. Operator/AI escape hatch: provide an explicit value to seed for a specific YML param, overriding all other sources (YML default, cert coercion, generic sentinel). Use this when an integration has a param the auto-coercion didn't anticipate (e.g., a different format-validating credential, an enum-value selector that needs a specific value to traverse a code path). Values >= 4 chars long act as ad-hoc sentinels — they're grep-able in captured HTTP and the post-hoc attribution code looks for them too. - `--no-auto-retry-integration-docker` — disable the automatic retry. By default, when the FIRST command's diagnostic comes back as `module_not_found` AND the analyzer is using the default `demisto/py3-native` image, it will automatically restart the dynamic phase with `--use-integration-docker` (which uses the integration's own production image, usually with the missing package preinstalled). Pass `--no-auto-retry-integration-docker` to disable, in which case the analyzer fast-fails the remaining commands as `module_not_found` (~30s × N saved) and returns immediately. @@ -1252,19 +1257,25 @@ python3 connectus/workflow_state.py markpass "" "code merged" ## Error Recovery Commands -### Fail a checkpoint (resets it and all subsequent checkpoints) +`fail` and `reset-to` share semantics. Both clear the named step and every later step that is **not** tagged `preserve_on_reset: true` in [`connectus/workflow_state_config.yml`](workflow_state_config.yml). Today only the three Params\* data columns carry that tag — they survive a failed checkpoint so per-command param research is not lost. The CLI prints `Preserved (preserve_on_reset=true): [...]` listing what was kept. + +**Explicit-target carve-out:** if the user names a preserved step EXPLICITLY as the target of `fail`/`reset-to`, that one step IS cleared (the user's intent wins). Later preserved steps in the same blast radius are still preserved. Example: `fail "Auth Details"` keeps Params\*; `fail "Params to Commands"` clears `Params to Commands` but keeps `Params for test with default in code` and `Params same in other handlers`. + +`set-auth` and plain `reset` IGNORE `preserve_on_reset` — see the description of each. 
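For scripted use, a minimal illustrative wrapper (not part of the existing tooling) that shells out to the documented CLI verb and collects the `Preserved (preserve_on_reset=true): [...]` line:

```python
# Illustrative wrapper only — the CLI verb and the "Preserved (...)" output
# line are the documented interface; this helper itself is not existing code.
import subprocess


def fail_step(integration_id: str, step: str) -> list[str]:
    """Run `workflow_state.py fail` and return any reported preserved steps."""
    proc = subprocess.run(
        ["python3", "connectus/workflow_state.py", "fail", integration_id, step],
        capture_output=True, text=True, check=False,
    )
    prefix = "Preserved (preserve_on_reset=true):"
    return [line.removeprefix(prefix).strip()
            for line in proc.stdout.splitlines() if line.startswith(prefix)]
```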
+ +### Fail a checkpoint (clears it and all subsequent non-preserved steps) ```bash python3 connectus/workflow_state.py fail "" "" ``` -### Reset to a specific checkpoint +### Reset to a specific checkpoint (alias of fail) ```bash python3 connectus/workflow_state.py reset-to "" "" ``` -### Reset all workflow columns +### Reset all workflow columns (no preserve carve-out) ```bash python3 connectus/workflow_state.py reset "" diff --git a/connectus/connectus-migration-pipeline.csv b/connectus/connectus-migration-pipeline.csv index 02d03d984ef..a70efa4dcfb 100644 --- a/connectus/connectus-migration-pipeline.csv +++ b/connectus/connectus-migration-pipeline.csv @@ -1,7 +1,7 @@ Integration ID,Integration File Path,Connector ID,assignee,Auth Details,Params to Commands,Params for test with default in code,Params same in other handlers,generated manifest,run manifest make validate,wrote/checked code,shadowed command test passes,write tests,precommit/validate/unit tests passed,requires auth parity test,auth parity test passes,param parity test passes,code reviewed,code merged AMP,Packs/AMP/Integrations/AMP/AMP.yml,Cisco Security,,,,,,,,,,,,,,,, AMPv2,Packs/AMP/Integrations/AMPv2/AMPv2.yml,Cisco Security,,,,,,,,,,,,,,,, -APIVoid,Packs/APIVoid/Integrations/APIVoid/APIVoid.yml,APIVoid,juschwartz,"{""auth_types"":[{""type"":""APIKey"",""name"":""api_key"",""xsoar_params"":[""credentials.password""]}],""config"":""REQUIRED(api_key)"",""other_connection"":[""insecure"",""proxy"",""url""]}",,,,,,,,,,,,,, +APIVoid,Packs/APIVoid/Integrations/APIVoid/APIVoid.yml,APIVoid,,,,,,,,,,,,,,,, AWS - ACM,Packs/AWS-ACM/Integrations/AWS-ACM/AWS-ACM.yml,AWS,,,,,,,,,,,,,,,, AWS - AccessAnalyzer,Packs/AWS-AccessAnalyzer/Integrations/AWS-AccessAnalyzer/AWS-AccessAnalyzer.yml,AWS,,,,,,,,,,,,,,,, AWS - Athena - Beta,Packs/AWS-Athena/Integrations/AWS-Athena/AWS-Athena.yml,AWS,,,,,,,,,,,,,,,, @@ -29,14 +29,14 @@ AWS-EKS,Packs/AWS-EKS/Integrations/AWSEKS/AWSEKS.yml,AWS,,,,,,,,,,,,,,,, AWS-ILM,Packs/AWS-ILM/Integrations/AWSILM/AWSILM.yml,AWS,,,,,,,,,,,,,,,, AWS-SNS-Listener,Packs/AWS-SNS-Listener/Integrations/AWSSNSListener/AWSSNSListener.yml,AWS,,,,,,,,,,,,,,,, AWS-WAF,Packs/AWS_WAF/Integrations/AWSWAF/AWSWAF.yml,AWS,,,,,,,,,,,,,,,, -Abnormal Security Event Collector,Packs/AbnormalSecurity/Integrations/AbnormalSecurityEventCollector/AbnormalSecurityEventCollector.yml,Abnormal Security,juschwartz,"{""auth_types"":[{""type"":""APIKey"",""name"":""token"",""xsoar_params"":[""token.password""]}],""config"":""REQUIRED(token)"",""other_connection"":[""proxy"",""verify""]}","{""integration"":""Abnormal Security Event Collector"",""commands"":{""test-module"":[],""fetch-events"":[],""abnormal-security-event-collector-get-events"":[]}}",[],,,,,,,,,,,, +Abnormal Security Event Collector,Packs/AbnormalSecurity/Integrations/AbnormalSecurityEventCollector/AbnormalSecurityEventCollector.yml,Abnormal Security,,,,,,,,,,,,,,,, Absolute,Packs/Absolute/Integrations/Absolute/Absolute.yml,Absolute Software Absolute,,,,,,,,,,,,,,,, AbuseIPDB,Packs/AbuseDB/Integrations/AbuseDB/AbuseDB.yml,AbuseIPDB,,,,,,,,,,,,,,,, Active Directory Query v2,Packs/Active_Directory_Query/Integrations/Active_Directory_Query/Active_Directory_Query.yml,Microsoft Active Directory,,,,,,,,,,,,,,,, ActiveMQ,Packs/ActiveMQ/Integrations/ActiveMQ/ActiveMQ.yml,Apache ActiveMQ,,,,,,,,,,,,,,,, AdminByRequest,Packs/AdminByRequest/Integrations/AdminByRequestEventCollector/AdminByRequestEventCollector.yml,AdminByRequest,,,,,,,,,,,,,,,, 
Aha,Packs/AHA/Integrations/AHA/AHA.yml,Aha,,,,,,,,,,,,,,,, -Akamai WAF,Packs/Akamai_WAF/Integrations/Akamai_WAF/Akamai_WAF.yml,Akamai,juschwartz,"{""auth_types"":[{""type"":""APIKey"",""name"":""edgegrid"",""xsoar_params"":[""accessToken"",""clientSecret"",""clientToken"",""credentials_access_token.password"",""credentials_client_secret.password"",""credentials_client_token.password""]}],""config"":""REQUIRED(edgegrid)"",""other_connection"":[""host"",""insecure"",""proxy""]}","{""integration"":""Akamai WAF"",""commands"":{}}",[],,,,,,,,,,,, +Akamai WAF,Packs/Akamai_WAF/Integrations/Akamai_WAF/Akamai_WAF.yml,Akamai,,,,,,,,,,,,,,,, Akamai WAF SIEM,Packs/Akamai_SIEM/Integrations/Akamai_SIEM/Akamai_SIEM.yml,Akamai,,,,,,,,,,,,,,,, AlgoSec,Packs/Algosec/Integrations/AlgoSec/AlgoSec.yml,AlgoSec,,,,,,,,,,,,,,,, Alibaba Action Trail Event Collector,Packs/AlibabaActionTrail/Integrations/AlibabaActionTrailEventCollector/AlibabaActionTrailEventCollector.yml,Alibaba Alibaba Cloud,,,,,,,,,,,,,,,, @@ -83,7 +83,7 @@ Azure Network Security Groups,Packs/AzureNetworkSecurityGroups/Integrations/Azur Azure Resource Graph,Packs/AzureResourceGraph/Integrations/AzureResourceGraph/AzureResourceGraph.yml,Microsoft Azure,,,,,,,,,,,,,,,, Azure SQL Management,Packs/AzureSQLManagement/Integrations/AzureSQLManagement/AzureSQLManagement.yml,Microsoft Azure,,,,,,,,,,,,,,,, Azure Security Center v2,Packs/AzureSecurityCenter/Integrations/AzureSecurityCenter_v2/AzureSecurityCenter_v2.yml,Microsoft Security,,,,,,,,,,,,,,,, -Azure Sentinel,Packs/AzureSentinel/Integrations/AzureSentinel/AzureSentinel.yml,Microsoft Security,juschwartz,"{""auth_types"":[{""type"":""OAuth2ClientCreds"",""name"":""creds_certificate"",""xsoar_params"":[""creds_certificate.identifier"",""creds_certificate.password"",""creds_client_id.password"",""creds_tenant_id.password""]},{""type"":""OAuth2ClientCreds"",""name"":""creds_client_secret"",""xsoar_params"":[""creds_client_id.password"",""creds_tenant_id.password"",""credentials.password""]},{""type"":""Other"",""name"":""managed_identities"",""xsoar_params"":[""managed_identities_client_id.password"",""use_managed_identities""]}],""config"":""CHOICE(creds_certificate, creds_client_secret, managed_identities)"",""other_connection"":[""azure_cloud"",""insecure"",""proxy"",""resourceGroupName"",""server_url"",""subscriptionID"",""workspaceName""]}","{""integration"":""Microsoft 
Sentinel"",""commands"":{""azure-sentinel-auth-reset"":[""mirror_direction""],""azure-sentinel-create-alert-rule"":[""mirror_direction""],""azure-sentinel-create-incident"":[""mirror_direction""],""azure-sentinel-create-update-watchlist-item"":[""mirror_direction""],""azure-sentinel-delete-alert-rule"":[""mirror_direction""],""azure-sentinel-delete-incident"":[""mirror_direction""],""azure-sentinel-delete-watchlist"":[""mirror_direction""],""azure-sentinel-delete-watchlist-item"":[""mirror_direction""],""azure-sentinel-get-incident-by-id"":[""mirror_direction""],""azure-sentinel-incident-add-comment"":[""mirror_direction""],""azure-sentinel-incident-delete-comment"":[""mirror_direction""],""azure-sentinel-list-alert-rule"":[""mirror_direction""],""azure-sentinel-list-alert-rule-template"":[""mirror_direction""],""azure-sentinel-list-incident-alerts"":[""mirror_direction""],""azure-sentinel-list-incident-comments"":[""mirror_direction""],""azure-sentinel-list-incident-entities"":[""mirror_direction""],""azure-sentinel-list-incident-relations"":[""mirror_direction""],""azure-sentinel-list-incidents"":[""mirror_direction""],""azure-sentinel-list-watchlist-items"":[""mirror_direction""],""azure-sentinel-list-watchlists"":[""mirror_direction""],""azure-sentinel-resource-group-list"":[""mirror_direction""],""azure-sentinel-subscriptions-list"":[""mirror_direction""],""azure-sentinel-threat-indicator-create"":[""mirror_direction""],""azure-sentinel-threat-indicator-delete"":[""mirror_direction""],""azure-sentinel-threat-indicator-list"":[""mirror_direction""],""azure-sentinel-threat-indicator-query"":[""mirror_direction""],""azure-sentinel-threat-indicator-tags-append"":[""mirror_direction""],""azure-sentinel-threat-indicator-tags-replace"":[""mirror_direction""],""azure-sentinel-threat-indicator-update"":[""mirror_direction""],""azure-sentinel-update-alert-rule"":[""mirror_direction""],""azure-sentinel-update-incident"":[""mirror_direction""],""azure-sentinel-watchlist-create-update"":[""mirror_direction""],""fetch-incidents"":[""fetch_additional_info"",""fetch_time"",""limit"",""min_severity"",""mirror_direction"",""statuses_to_fetch""],""get-mapping-fields"":[""mirror_direction""],""get-modified-remote-data"":[""mirror_direction""],""get-remote-data"":[""close_incident"",""mirror_direction""],""test-module"":[""mirror_direction""],""update-remote-system"":[""close_ticket"",""mirror_direction""]}}","[""fetch_time"",""min_severity""]",,,,,,,,,,,, +Azure Sentinel,Packs/AzureSentinel/Integrations/AzureSentinel/AzureSentinel.yml,Microsoft Security,,,,,,,,,,,,,,,, Azure Storage,Packs/AzureStorage/Integrations/AzureStorage/AzureStorage.yml,Microsoft Azure,,,,,,,,,,,,,,,, Azure Storage Container,Packs/AzureStorageContainer/Integrations/AzureStorageContainer/AzureStorageContainer.yml,Microsoft Azure,,,,,,,,,,,,,,,, Azure Storage FileShare,Packs/AzureStorageFileShare/Integrations/AzureStorageFileShare/AzureStorageFileShare.yml,Microsoft Azure,,,,,,,,,,,,,,,, @@ -114,7 +114,7 @@ CIRCL,Packs/CIRCL/Integrations/CIRCL/CIRCL.yml,CIRCL,,,,,,,,,,,,,,,, CIRCL CVE Search,Packs/CIRCL/Integrations/CirclCVESearch/CirclCVESearch.yml,CIRCL CVE Search,,,,,,,,,,,,,,,, CSVFeed,Packs/FeedCSV/Integrations/FeedCSV/FeedCSV.yml,CSVFeed,,,,,,,,,,,,,,,, CapeSandbox,Packs/CapeSandbox/Integrations/CapeSandbox/CapeSandbox.yml,CapeSandbox,,,,,,,,,,,,,,,, -Carbon Black Endpoint Standard,Packs/CarbonBlackDefense/Integrations/CarbonBlackEndpointStandard/CarbonBlackEndpointStandard.yml,Carbon 
Black,juschwartz,"{""auth_types"":[{""type"":""APIKey"",""name"":""custom_credentials"",""xsoar_params"":[""custom_credentials.identifier"",""custom_credentials.password""]},{""type"":""APIKey"",""name"":""live_response_credentials"",""xsoar_params"":[""live_response_credentials.identifier"",""live_response_credentials.password""]}],""config"":""REQUIRED(custom_credentials, live_response_credentials)"",""other_connection"":[""insecure"",""organization_key"",""proxy"",""url""]}","{""integration"":""Carbon Black Endpoint Standard v2"",""commands"":{""cbd-add-rule-to-policy"":[],""cbd-alerts-search"":[],""cbd-create-policy"":[],""cbd-delete-policy"":[],""cbd-delete-rule-from-policy"":[],""cbd-device-background-scan"":[],""cbd-device-background-scan-stop"":[],""cbd-device-bypass"":[],""cbd-device-policy-update"":[],""cbd-device-quarantine"":[],""cbd-device-search"":[],""cbd-device-unbypass"":[],""cbd-device-unquarantine"":[],""cbd-device-update-sensor-version"":[],""cbd-find-events"":[],""cbd-find-events-details"":[],""cbd-find-events-details-results"":[],""cbd-find-events-results"":[],""cbd-find-processes"":[],""cbd-find-processes-results"":[],""cbd-get-alert-details"":[],""cbd-get-policies"":[],""cbd-get-policy"":[],""cbd-set-policy"":[],""cbd-update-policy"":[],""cbd-update-rule-in-policy"":[],""fetch-incidents"":[""category"",""device_id"",""device_username"",""first_fetch"",""max_fetch"",""min_severity"",""policy_id"",""query"",""suffix_url_path""],""test-module"":[""category"",""device_id"",""device_username"",""isFetch"",""min_severity"",""policy_id"",""query"",""suffix_url_path""]}}","[""first_fetch"",""max_fetch"",""suffix_url_path""]",,,,,,,,,,,, +Carbon Black Endpoint Standard,Packs/CarbonBlackDefense/Integrations/CarbonBlackEndpointStandard/CarbonBlackEndpointStandard.yml,Carbon Black,,,,,,,,,,,,,,,, Carbon Black Endpoint Standard v3,Packs/CarbonBlackDefense/Integrations/CarbonBlackEndpointStandardV3/CarbonBlackEndpointStandardV3.yml,Carbon Black,,,,,,,,,,,,,,,, Carbon Black Enterprise EDR,Packs/CarbonBlackEnterpriseEDR/Integrations/CarbonBlackEnterpriseEDR/CarbonBlackEnterpriseEDR.yml,Carbon Black,,,,,,,,,,,,,,,, CarbonBlackEndpointStandardEventCollector,Packs/CarbonBlackDefense/Integrations/CarbonBlackEndpointStandardEventCollector/CarbonBlackEndpointStandardEventCollector.yml,Carbon Black,,,,,,,,,,,,,,,, @@ -136,7 +136,7 @@ Cisco Firepower,Packs/CiscoFirepower/Integrations/CiscoFirepower/CiscoFirepower. 
Cisco ISE,Packs/cisco-ise/Integrations/cisco-ise/cisco-ise.yml,Cisco Security,,,,,,,,,,,,,,,, Cisco Meraki v2,Packs/cisco-meraki/Integrations/CiscoMerakiv2/CiscoMerakiv2.yml,Cisco Meraki,,,,,,,,,,,,,,,, Cisco Secure Malware Analytics,Packs/ThreatGrid/Integrations/FeedCiscoSecureMalwareAnalytics/FeedCiscoSecureMalwareAnalytics.yml,Cisco Security,,,,,,,,,,,,,,,, -Cisco Spark,Packs/CiscoSpark/Integrations/CiscoSpark/CiscoSpark.yml,Cisco Security,juschwartz,"{""auth_types"":[{""type"":""APIKey"",""name"":""apikey_creds"",""xsoar_params"":[""apiKey"",""apikey_creds.identifier"",""apikey_creds.password""]}],""config"":""REQUIRED(apikey_creds)"",""other_connection"":[""insecure"",""proxy"",""server""]}","{""integration"":""Cisco Spark"",""commands"":{""cisco-spark-list-people"":[],""cisco-spark-create-person"":[],""cisco-spark-get-person-details"":[],""cisco-spark-update-person"":[],""cisco-spark-delete-person"":[],""cisco-spark-get-own-details"":[],""cisco-spark-list-rooms"":[],""cisco-spark-create-room"":[],""cisco-spark-get-room-details"":[],""cisco-spark-update-room"":[],""cisco-spark-delete-room"":[],""cisco-spark-list-memberships"":[],""cisco-spark-create-membership"":[],""cisco-spark-get-membership-details"":[],""cisco-spark-update-membership"":[],""cisco-spark-delete-membership"":[],""cisco-spark-list-messages"":[],""cisco-spark-create-message"":[],""cisco-spark-get-message-details"":[],""cisco-spark-delete-message"":[],""cisco-spark-list-teams"":[],""cisco-spark-create-team"":[],""cisco-spark-get-team-details"":[],""cisco-spark-update-team"":[],""cisco-spark-delete-team"":[],""cisco-spark-list-team-memberships"":[],""cisco-spark-create-team-membership"":[],""cisco-spark-get-team-membership-details"":[],""cisco-spark-update-team-membership"":[],""cisco-spark-delete-team-membership"":[],""cisco-spark-list-webhooks"":[],""cisco-spark-create-webhook"":[],""cisco-spark-get-webhook-details"":[],""cisco-spark-update-webhook"":[],""cisco-spark-delete-webhook"":[],""cisco-spark-list-organizations"":[],""cisco-spark-get-organization-details"":[],""cisco-spark-list-licenses"":[],""cisco-spark-get-license-details"":[],""cisco-spark-list-roles"":[],""cisco-spark-get-role-details"":[],""cisco-spark-send-message-to-person"":[],""cisco-spark-send-message-to-room"":[],""test-module"":[]}}",[],,,,,,,,,,,, +Cisco Spark,Packs/CiscoSpark/Integrations/CiscoSpark/CiscoSpark.yml,Cisco Security,,,,,,,,,,,,,,,, Cisco Stealthwatch,Packs/CiscoStealthwatch/Integrations/CiscoStealthwatch/CiscoStealthwatch.yml,Cisco Security,,,,,,,,,,,,,,,, Cisco Umbrella Cloud Security v2,Packs/Cisco-umbrella-cloud-security/Integrations/CiscoUmbrellaCloudSecurityv2/CiscoUmbrellaCloudSecurityv2.yml,Cisco Cisco Umbrella,,,,,,,,,,,,,,,, Cisco Umbrella Enforcement,Packs/Cisco-umbrella-enforcement/Integrations/CiscoUmbrellaEnforcement/CiscoUmbrellaEnforcement.yml,Cisco Cisco Umbrella,,,,,,,,,,,,,,,, @@ -166,8 +166,8 @@ Cortex XDR - IOC,Packs/CortexXDR/Integrations/XDR_iocs/XDR_iocs.yml,Palo Alto Ne Cortex XDR - IR,Packs/CortexXDR/Integrations/CortexXDRIR/CortexXDRIR.yml,Palo Alto Networks Cortex,,,,,,,,,,,,,,,, Cortex XDR - XQL Query Engine,Packs/CortexXDR/Integrations/XQLQueryingEngine/XQLQueryingEngine.yml,Palo Alto Networks Cortex,,,,,,,,,,,,,,,, CounterTack,Packs/CounterTack/Integrations/CounterTack/CounterTack.yml,CounterTack,,,,,,,,,,,,,,,, -CrowdStrike Falcon Intel v2,Packs/CrowdStrikeIntel/Integrations/CrowdStrikeFalconIntel_v2/CrowdStrikeFalconIntel_v2.yml,CrowdStrike Falcon,juschwartz,,,,,,,,,,,,,,, 
-CrowdstrikeFalcon,Packs/CrowdStrikeFalcon/Integrations/CrowdStrikeFalcon/CrowdStrikeFalcon.yml,CrowdStrike Falcon,juschwartz,"{""auth_types"":[{""type"":""OAuth2ClientCreds"",""name"":""credentials"",""xsoar_params"":[""credentials.identifier"",""credentials.password""]}],""config"":""REQUIRED(credentials)"",""other_connection"":[""insecure"",""proxy"",""url""]}","{""integration"": ""CrowdStrike Falcon"", ""commands"": {""cs-device-ran-on"": [], ""cs-falcon-add-case-tag"": [], ""cs-falcon-add-host-group-members"": [], ""cs-falcon-apply-quarantine-file-action"": [], ""cs-falcon-batch-upload-custom-ioc"": [], ""cs-falcon-contain-host"": [], ""cs-falcon-create-host-group"": [], ""cs-falcon-create-ioa-exclusion"": [], ""cs-falcon-create-ml-exclusion"": [], ""cs-falcon-cspm-list-policy-details"": [], ""cs-falcon-cspm-list-service-policy-settings"": [], ""cs-falcon-cspm-update-policy_settings"": [], ""cs-falcon-delete-case-tag"": [], ""cs-falcon-delete-custom-ioc"": [], ""cs-falcon-delete-file"": [], ""cs-falcon-delete-host-groups"": [], ""cs-falcon-delete-ioa-exclusion"": [], ""cs-falcon-delete-ioc"": [], ""cs-falcon-delete-ml-exclusion"": [], ""cs-falcon-delete-script"": [], ""cs-falcon-device-count-ioc"": [], ""cs-falcon-device-ran-on"": [], ""cs-falcon-get-behavior"": [], ""cs-falcon-get-custom-ioc"": [], ""cs-falcon-get-evidence-for-case"": [], ""cs-falcon-get-extracted-file"": [], ""cs-falcon-get-file"": [], ""cs-falcon-get-ioarules"": [], ""cs-falcon-get-ioc"": [], ""cs-falcon-get-script"": [], ""cs-falcon-lift-host-containment"": [], ""cs-falcon-list-case-summaries"": [], ""cs-falcon-list-cnapp-alerts"": [], ""cs-falcon-list-detection-summaries"": [], ""cs-falcon-list-files"": [], ""cs-falcon-list-host-files"": [], ""cs-falcon-list-host-group-members"": [], ""cs-falcon-list-host-groups"": [], ""cs-falcon-list-identity-entities"": [], ""cs-falcon-list-quarantined-file"": [], ""cs-falcon-list-scripts"": [], ""cs-falcon-list-users"": [], ""cs-falcon-ods-create-scan"": [], ""cs-falcon-ods-create-scheduled-scan"": [], ""cs-falcon-ods-delete-scheduled-scan"": [], ""cs-falcon-ods-query-malicious-files"": [], ""cs-falcon-ods-query-scan"": [], ""cs-falcon-ods-query-scan-host"": [], ""cs-falcon-ods-query-scheduled-scan"": [], ""cs-falcon-process-details"": [], ""cs-falcon-processes-ran-on"": [], ""cs-falcon-refresh-session"": [], ""cs-falcon-remove-host-group-members"": [], ""cs-falcon-resolve-case"": [], ""cs-falcon-resolve-detection"": [], ""cs-falcon-resolve-identity-detection"": [], ""cs-falcon-resolve-mobile-detection"": [], ""cs-falcon-rtr-kill-process"": [], ""cs-falcon-rtr-list-network-stats"": [], ""cs-falcon-rtr-list-processes"": [], ""cs-falcon-rtr-list-scheduled-tasks"": [], ""cs-falcon-rtr-read-registry"": [], ""cs-falcon-rtr-remove-file"": [], ""cs-falcon-rtr-retrieve-file"": [], ""cs-falcon-run-command"": [], ""cs-falcon-run-get-command"": [], ""cs-falcon-run-script"": [], ""cs-falcon-search-custom-iocs"": [], ""cs-falcon-search-detection"": [], ""cs-falcon-search-device"": [], ""cs-falcon-search-ioa-exclusion"": [], ""cs-falcon-search-iocs"": [], ""cs-falcon-search-ml-exclusion"": [], ""cs-falcon-search-ngsiem-events"": [], ""cs-falcon-spotlight-list-host-by-vulnerability"": [], ""cs-falcon-spotlight-search-vulnerability"": [], ""cs-falcon-status-command"": [], ""cs-falcon-status-get-command"": [], ""cs-falcon-update-custom-ioc"": [], ""cs-falcon-update-host-group"": [], ""cs-falcon-update-ioa-exclusion"": [], ""cs-falcon-update-ioc"": [], ""cs-falcon-update-ml-exclusion"": [], 
""cs-falcon-upload-custom-ioc"": [], ""cs-falcon-upload-file"": [], ""cs-falcon-upload-ioc"": [], ""cs-falcon-upload-script"": [], ""cve"": [], ""endpoint"": [], ""fetch-events"": [""automated_leads_fetch_query"", ""fetch_events_or_detections"", ""fetch_incidents_or_detections"", ""fetch_query"", ""fetch_time"", ""idp_detections_fetch_query"", ""incidents_per_fetch"", ""ioa_fetch_query"", ""iom_fetch_query"", ""look_back"", ""look_back_xsiam"", ""mirror_direction"", ""mobile_detections_fetch_query"", ""ngsiem_cases_fetch_query"", ""ngsiem_detection_fetch_query"", ""ngsiem_incidents_fetch_query"", ""ofp_detection_fetch_query"", ""on_demand_fetch_query"", ""secret"", ""third_party_detection_fetch_query""], ""fetch-incidents"": [""automated_leads_fetch_query"", ""fetch_events_or_detections"", ""fetch_incidents_or_detections"", ""fetch_query"", ""fetch_time"", ""idp_detections_fetch_query"", ""incidents_per_fetch"", ""ioa_fetch_query"", ""iom_fetch_query"", ""look_back"", ""look_back_xsiam"", ""mirror_direction"", ""mobile_detections_fetch_query"", ""ngsiem_cases_fetch_query"", ""ngsiem_detection_fetch_query"", ""ngsiem_incidents_fetch_query"", ""ofp_detection_fetch_query"", ""on_demand_fetch_query"", ""secret"", ""third_party_detection_fetch_query""], ""get-mapping-fields"": [], ""get-modified-remote-data"": [""fetch_incidents_or_detections"", ""fetch_time"", ""incidents_per_fetch"", ""mirror_direction"", ""secret""], ""get-remote-data"": [""close_incident"", ""fetch_time"", ""incidents_per_fetch"", ""mirror_direction"", ""reopen_statuses"", ""secret""], ""test-module"": [""automated_leads_fetch_query"", ""fetch_events_or_detections"", ""fetch_incidents_or_detections"", ""fetch_query"", ""fetch_time"", ""idp_detections_fetch_query"", ""incidents_per_fetch"", ""ioa_fetch_query"", ""iom_fetch_query"", ""isFetch"", ""look_back"", ""look_back_xsiam"", ""mirror_direction"", ""mobile_detections_fetch_query"", ""ngsiem_cases_fetch_query"", ""ngsiem_detection_fetch_query"", ""ngsiem_incidents_fetch_query"", ""ofp_detection_fetch_query"", ""on_demand_fetch_query"", ""secret"", ""third_party_detection_fetch_query""], ""update-remote-system"": []}}","[""fetch_time"",""incidents_per_fetch""]",,,,,,,,,,,, +CrowdStrike Falcon Intel v2,Packs/CrowdStrikeIntel/Integrations/CrowdStrikeFalconIntel_v2/CrowdStrikeFalconIntel_v2.yml,CrowdStrike Falcon,,,,,,,,,,,,,,,, +CrowdstrikeFalcon,Packs/CrowdStrikeFalcon/Integrations/CrowdStrikeFalcon/CrowdStrikeFalcon.yml,CrowdStrike Falcon,juschwartz,"{""auth_types"":[{""type"":""OAuth2ClientCreds"",""name"":""credentials"",""xsoar_params"":[""credentials.identifier"",""credentials.password""]}],""config"":""REQUIRED(credentials)"",""other_connection"":[""insecure"",""proxy"",""url""]}","{""integration"":""CrowdStrike 
Falcon"",""commands"":{""cs-device-ran-on"":[],""cs-falcon-add-case-tag"":[],""cs-falcon-add-host-group-members"":[],""cs-falcon-apply-quarantine-file-action"":[],""cs-falcon-batch-upload-custom-ioc"":[],""cs-falcon-contain-host"":[],""cs-falcon-create-host-group"":[],""cs-falcon-create-ioa-exclusion"":[],""cs-falcon-create-ml-exclusion"":[],""cs-falcon-cspm-list-policy-details"":[],""cs-falcon-cspm-list-service-policy-settings"":[],""cs-falcon-cspm-update-policy_settings"":[],""cs-falcon-delete-case-tag"":[],""cs-falcon-delete-custom-ioc"":[],""cs-falcon-delete-file"":[],""cs-falcon-delete-host-groups"":[],""cs-falcon-delete-ioa-exclusion"":[],""cs-falcon-delete-ioc"":[],""cs-falcon-delete-ml-exclusion"":[],""cs-falcon-delete-script"":[],""cs-falcon-device-count-ioc"":[],""cs-falcon-device-ran-on"":[],""cs-falcon-get-behavior"":[],""cs-falcon-get-custom-ioc"":[],""cs-falcon-get-evidence-for-case"":[],""cs-falcon-get-extracted-file"":[],""cs-falcon-get-file"":[],""cs-falcon-get-ioarules"":[],""cs-falcon-get-ioc"":[],""cs-falcon-get-script"":[],""cs-falcon-lift-host-containment"":[],""cs-falcon-list-case-summaries"":[],""cs-falcon-list-cnapp-alerts"":[],""cs-falcon-list-detection-summaries"":[],""cs-falcon-list-files"":[],""cs-falcon-list-host-files"":[],""cs-falcon-list-host-group-members"":[],""cs-falcon-list-host-groups"":[],""cs-falcon-list-identity-entities"":[],""cs-falcon-list-quarantined-file"":[],""cs-falcon-list-scripts"":[],""cs-falcon-list-users"":[],""cs-falcon-ods-create-scan"":[],""cs-falcon-ods-create-scheduled-scan"":[],""cs-falcon-ods-delete-scheduled-scan"":[],""cs-falcon-ods-query-malicious-files"":[],""cs-falcon-ods-query-scan"":[],""cs-falcon-ods-query-scan-host"":[],""cs-falcon-ods-query-scheduled-scan"":[],""cs-falcon-process-details"":[],""cs-falcon-processes-ran-on"":[],""cs-falcon-refresh-session"":[],""cs-falcon-remove-host-group-members"":[],""cs-falcon-resolve-case"":[],""cs-falcon-resolve-detection"":[],""cs-falcon-resolve-identity-detection"":[],""cs-falcon-resolve-mobile-detection"":[],""cs-falcon-rtr-kill-process"":[],""cs-falcon-rtr-list-network-stats"":[],""cs-falcon-rtr-list-processes"":[],""cs-falcon-rtr-list-scheduled-tasks"":[],""cs-falcon-rtr-read-registry"":[],""cs-falcon-rtr-remove-file"":[],""cs-falcon-rtr-retrieve-file"":[],""cs-falcon-run-command"":[],""cs-falcon-run-get-command"":[],""cs-falcon-run-script"":[],""cs-falcon-search-custom-iocs"":[],""cs-falcon-search-detection"":[],""cs-falcon-search-device"":[],""cs-falcon-search-ioa-exclusion"":[],""cs-falcon-search-iocs"":[],""cs-falcon-search-ml-exclusion"":[],""cs-falcon-search-ngsiem-events"":[],""cs-falcon-spotlight-list-host-by-vulnerability"":[],""cs-falcon-spotlight-search-vulnerability"":[],""cs-falcon-status-command"":[],""cs-falcon-status-get-command"":[],""cs-falcon-update-custom-ioc"":[],""cs-falcon-update-host-group"":[],""cs-falcon-update-ioa-exclusion"":[],""cs-falcon-update-ioc"":[],""cs-falcon-update-ml-exclusion"":[],""cs-falcon-upload-custom-ioc"":[],""cs-falcon-upload-file"":[],""cs-falcon-upload-ioc"":[],""cs-falcon-upload-script"":[],""cve"":[],""endpoint"":[],""fetch-events"":[""automated_leads_fetch_query"",""fetch_query"",""idp_detections_fetch_query"",""mobile_detections_fetch_query"",""ngsiem_cases_fetch_query"",""ngsiem_detection_fetch_query"",""ngsiem_incidents_fetch_query"",""ofp_detection_fetch_query"",""on_demand_fetch_query"",""third_party_detection_fetch_query""],""fetch-incidents"":[""automated_leads_fetch_query"",""fetch_query"",""idp_detections_fetch_query"",
""mobile_detections_fetch_query"",""ngsiem_cases_fetch_query"",""ngsiem_detection_fetch_query"",""ngsiem_incidents_fetch_query"",""ofp_detection_fetch_query"",""on_demand_fetch_query"",""third_party_detection_fetch_query""],""get-mapping-fields"":[],""get-modified-remote-data"":[],""get-remote-data"":[],""test-module"":[""automated_leads_fetch_query"",""fetch_query"",""idp_detections_fetch_query"",""mobile_detections_fetch_query"",""ngsiem_cases_fetch_query"",""ngsiem_detection_fetch_query"",""ngsiem_incidents_fetch_query"",""ofp_detection_fetch_query"",""on_demand_fetch_query"",""third_party_detection_fetch_query""],""update-remote-system"":[]}}",[],,,,,,,,,,,, Cryptocurrency,Packs/Cryptocurrency/Integrations/Cryptocurrency/Cryptocurrency.yml,Cryptocurrency,,,,,,,,,,,,,,,, Cuckoo Sandbox,Packs/CuckooSandbox/Integrations/CuckooSandbox/CuckooSandbox.yml,Cuckoo Cuckoo Sandbox,,,,,,,,,,,,,,,, CybelAngel Event Collector,Packs/CybelAngel/Integrations/CybelAngelEventCollector/CybelAngelEventCollector.yml,CybelAngel,,,,,,,,,,,,,,,, @@ -399,7 +399,7 @@ OktaAuth0EventCollector,Packs/OktaAuth0/Integrations/OktaAuth0EventCollector/Okt OnboardingIntegration,Packs/OnboardingIntegration/Integrations/OnboardingIntegration/OnboardingIntegration.yml,Cortex Automation Developer Tools,,,,,,,,,,,,,,,, OneLogin Event Collector,Packs/OneLogin/Integrations/OneLoginEventCollector/OneLoginEventCollector.yml,OneLogin,,,,,,,,,,,,,,,, OnePassword,Packs/OnePassword/Integrations/OnePassword/OnePassword.yml,1Password,,,,,,,,,,,,,,,, -OpenAi ChatGPT v3,Packs/OpenAI/Integrations/OpenAiChatGPTV3/OpenAiChatGPTV3.yml,OpenAI,juschwartz,"{""auth_types"":[{""type"":""APIKey"",""name"":""apikey"",""xsoar_params"":[""apikey.password""]}],""config"":""REQUIRED(apikey)"",""other_connection"":[""insecure"",""proxy"",""url""]}","{""integration"":""OpenAI GPT"",""commands"":{""test-module"":[""max_tokens"",""model-freetext"",""model-select"",""temperature"",""top_p""],""gpt-send-message"":[""max_tokens"",""model-freetext"",""model-select"",""temperature"",""top_p""],""gpt-check-email-header"":[""max_tokens"",""model-freetext"",""model-select"",""temperature"",""top_p""],""gpt-check-email-body"":[""max_tokens"",""model-freetext"",""model-select"",""temperature"",""top_p""],""gpt-create-soc-email-template"":[""max_tokens"",""model-freetext"",""model-select"",""temperature"",""top_p""]}}",[],,,,,,,,,,,, +OpenAi ChatGPT v3,Packs/OpenAI/Integrations/OpenAiChatGPTV3/OpenAiChatGPTV3.yml,OpenAI,,,,,,,,,,,,,,,, OpenCTI,Packs/OpenCTI/Integrations/OpenCTI/OpenCTI.yml,OpenCTI,,,,,,,,,,,,,,,, OpenCTI Feed 4.X,Packs/FeedOpenCTI/Integrations/FeedOpenCTI_v4/FeedOpenCTI_v4.yml,OpenCTI,,,,,,,,,,,,,,,, OpenCVE,Packs/OpenCVE/Integrations/OpenCVE/OpenCVE.yml,OpenCVE,,,,,,,,,,,,,,,, @@ -441,7 +441,7 @@ ProofpointIsolationEventCollector,Packs/ProofpointIsolation/Integrations/Proofpo ProofpointThreatResponseEventCollector,Packs/ProofpointThreatResponse/Integrations/ProofpointThreatResponseEventCollector/ProofpointThreatResponseEventCollector.yml,Proofpoint,,,,,,,,,,,,,,,, ProtectWise,Packs/ProtectWise/Integrations/ProtectWise/ProtectWise.yml,ProtectWise,,,,,,,,,,,,,,,, Public DNS Feed,Packs/FeedPublicDNS/Integrations/FeedPublicDNS/FeedPublicDNS.yml,Public DNS Feed,,,,,,,,,,,,,,,, -QRadar v3,Packs/QRadar/Integrations/QRadar_v3/QRadar_v3.yml,IBM 
QRadar,juschwartz,"{""auth_types"":[{""type"":""APIKey"",""name"":""api_key"",""xsoar_params"":[""credentials.identifier"",""credentials.password""]},{""type"":""Plain"",""name"":""credentials"",""xsoar_params"":[""credentials.identifier"",""credentials.password""]}],""config"":""CHOICE(api_key, credentials)"",""other_connection"":[""api_version"",""insecure"",""proxy"",""server""]}","{""integration"": ""IBM QRadar v3"", ""commands"": {""get-mapping-fields"": [""adv_params"", ""fetch_interval"", ""timeout""], ""get-modified-remote-data"": [""adv_params"", ""events_columns"", ""events_limit"", ""fetch_interval"", ""fetch_mode"", ""mirror_limit"", ""mirror_options"", ""timeout""], ""get-remote-data"": [""adv_params"", ""close_incident"", ""enrichment"", ""events_columns"", ""events_limit"", ""fetch_interval"", ""fetch_mode"", ""mirror_options"", ""timeout""], ""long-running-execution"": [""adv_params"", ""enrichment"", ""events_columns"", ""events_limit"", ""fetch_interval"", ""fetch_mode"", ""first_fetch"", ""incident_type"", ""limit_assets"", ""mirror_options"", ""offenses_per_fetch"", ""query"", ""retry_events_fetch"", ""timeout""], ""qradar-assets-list"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-closing-reasons"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-create-note"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-create-reference-set"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-create-reference-set-value"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-delete-reference-set"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-delete-reference-set-value"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-disconnected-log-collectors-list"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-domains-list"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-event-collectors-list"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-geolocations-for-ip"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-get-asset-by-id"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-get-assets"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-get-closing-reasons"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-get-custom-properties"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-get-domain-by-id"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-get-domains"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-get-note"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-get-reference-by-name"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-get-search"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-get-search-results"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-indicators-upload"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-ips-local-destination-get"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-ips-source-get"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-log-source-create"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-log-source-delete"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-log-source-extensions-list"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-log-source-groups-list"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-log-source-languages-list"": [""adv_params"", ""fetch_interval"", ""timeout""], 
""qradar-log-source-protocol-types-list"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-log-source-types-list"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-log-source-update"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-log-sources-list"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-offense-by-id"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-offense-note-create"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-offense-notes-list"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-offense-update"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-offenses"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-offenses-list"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-print-context"": [""adv_params"", ""fetch_interval""], ""qradar-reference-set-create"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-reference-set-delete"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-reference-set-value-delete"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-reference-set-value-upsert"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-reference-sets-list"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-remote-network-cidr-create"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-remote-network-cidr-delete"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-remote-network-cidr-list"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-remote-network-cidr-update"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-remote-network-deploy-execution"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-reset-last-run"": [""adv_params"", ""fetch_interval""], ""qradar-rule-groups-list"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-rules-list"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-saved-searches-list"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-search-cancel"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-search-create"": [""adv_params"", ""events_columns"", ""events_limit"", ""fetch_interval"", ""fetch_mode"", ""timeout""], ""qradar-search-delete"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-search-results-get"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-search-retrieve-events"": [""adv_params"", ""events_columns"", ""events_limit"", ""fetch_interval"", ""fetch_mode"", ""timeout""], ""qradar-search-status-get"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-searches"": [""adv_params"", ""events_columns"", ""events_limit"", ""fetch_interval"", ""fetch_mode"", ""timeout""], ""qradar-searches-list"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-update-offense"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-update-reference-set-value"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-upload-indicators"": [""adv_params"", ""fetch_interval"", ""timeout""], ""qradar-wincollect-destinations-list"": [""adv_params"", ""fetch_interval"", ""timeout""], ""test-module"": [""adv_params"", ""fetch_interval"", ""timeout""]}}","[""close_incident"",""enrichment"",""events_limit"",""first_fetch"",""limit_assets"",""mirror_limit"",""mirror_options"",""retry_events_fetch""]",,,,,,,,,,,, +QRadar v3,Packs/QRadar/Integrations/QRadar_v3/QRadar_v3.yml,IBM QRadar,,,,,,,,,,,,,,,, 
QualysV2,Packs/qualys/Integrations/Qualysv2/Qualysv2.yml,Qualys,,,,,,,,,,,,,,,, RSA Archer v2,Packs/ArcherRSA/Integrations/ArcherV2/ArcherV2.yml,RSA,,,,,,,,,,,,,,,, RSA NetWitness Endpoint,Packs/RSANetWitnessEndpoint/Integrations/RSANetWitnessEndpoint/RSANetWitnessEndpoint.yml,RSA,,,,,,,,,,,,,,,, @@ -493,7 +493,7 @@ Snowflake,Packs/Snowflake/Integrations/Snowflake/Snowflake.yml,Snowflake,,,,,,,, SolarWinds,Packs/SolarWinds/Integrations/SolarWinds/SolarWinds.yml,SolarWinds,,,,,,,,,,,,,,,, Sophos Central,Packs/SophosCentral/Integrations/SophosCentral/SophosCentral.yml,Sophos,,,,,,,,,,,,,,,, SpamhausFeed,Packs/FeedSpamhaus/Integrations/FeedSpamhaus/FeedSpamhaus.yml,Spamhaus,,,,,,,,,,,,,,,, -SplunkPy,Packs/SplunkPy/Integrations/SplunkPy/SplunkPy.yml,Splunk,juschwartz,"{""auth_types"":[{""type"":""APIKey"",""name"":""hec_token"",""xsoar_params"":[""cred_hec_token.password"",""hec_token""]},{""type"":""Plain"",""name"":""authentication"",""xsoar_params"":[""authentication.identifier"",""authentication.password""]}],""config"":""REQUIRED(authentication) + OPTIONAL(hec_token)"",""other_connection"":[""app"",""hec_url"",""host"",""port"",""proxy"",""unsecure""]}","{""integration"": ""SplunkPy"", ""commands"": {""fetch-incidents"": [""comment_tag_from_splunk"", ""comment_tag_to_splunk"", ""enabled_enrichments"", ""extensive_logs"", ""extractFields"", ""fetchQuery"", ""fetch_limit"", ""fetch_time"", ""mirror_direction"", ""notable_time_source"", ""occurrence_look_behind"", ""parseNotableEventsRaw"", ""replaceKeys"", ""splunk_user_field"", ""timezone"", ""unique_id_fields"", ""useSplunkTime"", ""userMapping"", ""user_map_lookup_name"", ""xsoar_user_field""], ""get-mapping-fields"": [""asset_enrich_lookup_tables"", ""enabled_enrichments"", ""extractFields"", ""fetchQuery"", ""fetch_limit"", ""fetch_time"", ""identity_enrich_lookup_tables"", ""replaceKeys"", ""timezone"", ""type_field"", ""useSplunkTime"", ""use_cim""], ""get-modified-remote-data"": [""close_end_status_statuses"", ""close_extra_labels"", ""close_incident"", ""comment_tag_from_splunk"", ""comment_tag_to_splunk"", ""enabled_enrichments"", ""extensive_logs"", ""fetch_limit"", ""fetch_time"", ""mirror_direction"", ""parseNotableEventsRaw"", ""replaceKeys"", ""splunk_user_field"", ""timezone"", ""userMapping"", ""user_map_lookup_name"", ""xsoar_user_field""], ""get-remote-data"": [""close_end_status_statuses"", ""close_extra_labels"", ""close_incident"", ""comment_tag_from_splunk"", ""comment_tag_to_splunk"", ""enabled_enrichments"", ""extensive_logs"", ""fetch_limit"", ""fetch_time"", ""mirror_direction"", ""parseNotableEventsRaw"", ""replaceKeys"", ""splunk_user_field"", ""userMapping"", ""user_map_lookup_name"", ""xsoar_user_field""], ""splunk-get-indexes"": [""enabled_enrichments"", ""fetch_limit"", ""fetch_time"", ""replaceKeys""], ""splunk-get-username-by-xsoar-user"": [""enabled_enrichments"", ""fetch_limit"", ""fetch_time"", ""replaceKeys""], ""splunk-job-create"": [""enabled_enrichments"", ""fetch_limit"", ""fetch_time"", ""replaceKeys""], ""splunk-job-share"": [""enabled_enrichments"", ""fetch_limit"", ""fetch_time"", ""replaceKeys""], ""splunk-job-status"": [""enabled_enrichments"", ""fetch_limit"", ""fetch_time"", ""replaceKeys""], ""splunk-kv-store-collection-add-entries"": [""enabled_enrichments"", ""fetch_limit"", ""fetch_time"", ""replaceKeys""], ""splunk-kv-store-collection-config"": [""enabled_enrichments"", ""fetch_limit"", ""fetch_time"", ""replaceKeys""], ""splunk-kv-store-collection-create"": [""enabled_enrichments"", 
""fetch_limit"", ""fetch_time"", ""replaceKeys""], ""splunk-kv-store-collection-create-transform"": [""enabled_enrichments"", ""fetch_limit"", ""fetch_time"", ""replaceKeys""], ""splunk-kv-store-collection-data-delete"": [""enabled_enrichments"", ""fetch_limit"", ""fetch_time"", ""replaceKeys""], ""splunk-kv-store-collection-data-list"": [""enabled_enrichments"", ""fetch_limit"", ""fetch_time"", ""replaceKeys""], ""splunk-kv-store-collection-delete"": [""enabled_enrichments"", ""fetch_limit"", ""fetch_time"", ""replaceKeys""], ""splunk-kv-store-collection-delete-entry"": [""enabled_enrichments"", ""fetch_limit"", ""fetch_time"", ""replaceKeys""], ""splunk-kv-store-collection-search-entry"": [""enabled_enrichments"", ""fetch_limit"", ""fetch_time"", ""replaceKeys""], ""splunk-kv-store-collections-list"": [""enabled_enrichments"", ""fetch_limit"", ""fetch_time"", ""replaceKeys""], ""splunk-notable-event-edit"": [""enabled_enrichments"", ""fetch_limit"", ""fetch_time"", ""replaceKeys""], ""splunk-parse-raw"": [""enabled_enrichments"", ""fetch_limit"", ""fetch_time"", ""replaceKeys""], ""splunk-reset-enriching-fetch-mechanism"": [""enabled_enrichments"", ""fetch_limit"", ""fetch_time"", ""replaceKeys""], ""splunk-results"": [""enabled_enrichments"", ""fetch_limit"", ""fetch_time"", ""replaceKeys""], ""splunk-search"": [""enabled_enrichments"", ""fetch_limit"", ""fetch_time"", ""replaceKeys""], ""splunk-submit-event"": [""enabled_enrichments"", ""fetch_limit"", ""fetch_time"", ""replaceKeys""], ""splunk-submit-event-hec"": [""enabled_enrichments"", ""fetch_limit"", ""fetch_time"", ""replaceKeys""], ""test-module"": [""enabled_enrichments"", ""fetchQuery"", ""fetch_limit"", ""fetch_time"", ""isFetch"", ""mirror_direction"", ""replaceKeys"", ""timezone""]}}","[""comment_tag_from_splunk"",""comment_tag_to_splunk"",""fetch_limit"",""notable_time_source"",""type_field""]",,,,,,,,,,,, +SplunkPy,Packs/SplunkPy/Integrations/SplunkPy/SplunkPy.yml,Splunk,,,,,,,,,,,,,,,, SplunkPy v2,Packs/SplunkPy/Integrations/SplunkPyV2/SplunkPyV2.yml,Splunk,,,,,,,,,,,,,,,, Stealthwatch Cloud,Packs/Stealthwatch_Cloud/Integrations/Stealthwatch_Cloud/Stealthwatch_Cloud.yml,Cisco Security,,,,,,,,,,,,,,,, SumoLogic,Packs/SumoLogic/Integrations/SumoLogic/SumoLogic.yml,Sumo Logic,,,,,,,,,,,,,,,, diff --git a/connectus/default_ignore_params.txt b/connectus/default_ignore_params.txt new file mode 100644 index 00000000000..fbf667853fb --- /dev/null +++ b/connectus/default_ignore_params.txt @@ -0,0 +1,37 @@ +# XSOAR framework params (not part of any integration's auth or per-command logic). +# Sourced separately from integration-specific Auth Details (--integration-id flag). +# Do NOT add auth-related params here — those come from Auth Details automatically +# (the analyzer pulls them via `workflow_state.py auth-params ` and unions +# them into its effective ignore set). +# +# Also do NOT add connection-adjacent params here (url, server_url, host, port, +# region, proxy, use_system_proxy, insecure, unsecure, trust_any_certificate, +# credentials, apikey, api_key, client_id, client_secret, username, password, +# token, etc.). Those belong in `Auth Details.other_connection` per integration. 
+ +# Long-running / fetch-loop framework +longRunning +longRunningPort + +# Feed framework (TIM / indicator feeds) +feedReputation +feedExpirationInterval +feedExpirationPolicy +feedReliability +feedTags +feedFetchInterval +feedBypassExclusionList +feedIncremental +tlp_color + +# Incident-mirroring / classifier framework +mirror_options +mirror_direction +incidentType + +# Generic XSOAR fetch-loop toggles (flip switches XSOAR injects independently +# of the integration's own command surface) +isFetch +isFetchEvents +isFetchSamples +isFetchAssets diff --git a/connectus/workflow_state.py b/connectus/workflow_state.py index 489983181c9..07603d5084e 100644 --- a/connectus/workflow_state.py +++ b/connectus/workflow_state.py @@ -1,2914 +1,43 @@ #!/usr/bin/env python3 +"""Backward-compatible shim for the legacy ``workflow_state`` module. + +The real implementation now lives in the ``workflow_state/`` package +sitting next to this file. When Python resolves ``import workflow_state`` +it picks the package (directory) over this file because packages take +precedence — so existing callers keep working unchanged. + +This file remains so that the long-standing CLI invocation +``python3 connectus/workflow_state.py …`` continues to work: the script +runs as ``__main__`` from this file, which then delegates to +:func:`workflow_state.cli.main`. Every public name the package exports +is also re-exported here via the wildcard import for any caller that +loads this file as a path (instead of as the package). """ -Workflow State Machine for connectus-migration-pipeline.csv (UNIFIED 16-STEP MODEL) - -This script manages the workflow tracking columns in the CSV. It models the -workflow as a single linear 16-step sequence, strictly gated. Setting any -step at-or-behind the current step resets every step that follows it -("cascade reset"). The ONLY exception is `set-assignee`, which is treated as -an administrative update and never resets later steps. - -State is purely derived from row contents — there is no explicit -"current step" pointer column. The current step is defined as -``current_step(row) = first STEPS[i] where not is_done(row, STEPS[i])``. - -CSV column groups: - -Identity / metadata columns (3) — NOT managed by this script: - 1. Integration ID - 2. Integration File Path - 3. Connector ID - -Workflow columns (16) — the unified ordered sequence (see ``STEPS``): - 1. assignee (data, admin) - 2. Auth Details (data, JSON) - 3. Params to Commands (data, JSON) - 4. Params for test with default in code (data, JSON) - 5. Params same in other handlers (data, JSON, OPTIONAL — `skip`) - 6. generated manifest (checkpoint) - 7. run manifest make validate (checkpoint) - 8. wrote/checked code (checkpoint) - 9. shadowed command test passes (checkpoint) - 10. write tests (checkpoint) - 11. precommit/validate/unit tests passed (checkpoint) - 12. requires auth parity test (flag: YES/NO/N/A) - 13. auth parity test passes (checkpoint, auto-N/A from #12) - 14. param parity test passes (checkpoint) - 15. code reviewed (checkpoint) - 16. code merged (checkpoint) - -Rules: - - Strict ordering: any set/markpass/skip targeting a step AHEAD of the - current step is rejected. - - Cascade reset: any set/markpass/skip targeting a step AT-OR-BEHIND the - current step writes the new value AND clears every step after it. - (set-assignee is the ONLY exception — see ``cmd_set_assignee``.) - - Optional step #5 may be `skip`-ped (writes the sentinel "N/A"). - - Flag step #12: setting it to NO/N/A auto-writes "N/A" into step #13. 
- - Normalization on read AND write: any later-step value past the first - incomplete step is auto-cleared. A one-line stderr warning is printed - per row that gets normalized. - -Usage examples: - python3 connectus/workflow_state.py status "Cisco Spark" - python3 connectus/workflow_state.py status-all - python3 connectus/workflow_state.py dashboard - python3 connectus/workflow_state.py next - python3 connectus/workflow_state.py next "Cisco Spark" - python3 connectus/workflow_state.py next --all - python3 connectus/workflow_state.py set-assignee "Cisco Spark" "John Doe" - python3 connectus/workflow_state.py set-auth "Cisco Spark" '' - python3 connectus/workflow_state.py set-params-to-commands "Cisco Spark" '' - python3 connectus/workflow_state.py set-params-for-test "Cisco Spark" '' - python3 connectus/workflow_state.py set-shared-params "Cisco Spark" '' - python3 connectus/workflow_state.py skip "Cisco Spark" "Params same in other handlers" - python3 connectus/workflow_state.py markpass "Cisco Spark" "wrote/checked code" - python3 connectus/workflow_state.py set-auth-flag "Cisco Spark" YES - python3 connectus/workflow_state.py fail "Cisco Spark" "write tests" - python3 connectus/workflow_state.py reset-to "Cisco Spark" "wrote/checked code" - python3 connectus/workflow_state.py reset "Cisco Spark" - python3 connectus/workflow_state.py at-step "wrote/checked code" - python3 connectus/workflow_state.py list - python3 connectus/workflow_state.py list-by-assignee "John Doe" - python3 connectus/workflow_state.py list-connectors - python3 connectus/workflow_state.py list-by-connector "abcd1234" - python3 connectus/workflow_state.py set-assignee-by-connector "abcd1234" "John Doe" - python3 connectus/workflow_state.py next --mine - python3 connectus/workflow_state.py next --connector "abcd1234" - python3 connectus/workflow_state.py next --connector "abcd1234" --mine - python3 connectus/workflow_state.py show-step "Cisco Spark" "Params to Commands" - python3 connectus/workflow_state.py files "Cisco Spark" - python3 connectus/workflow_state.py auth-params "Cisco Spark" - python3 connectus/workflow_state.py auth-params "Cisco Spark" --format=json -""" - from __future__ import annotations -import csv -import io -import json -import os -import re -import subprocess -import sys -import tempfile -from dataclasses import dataclass -from typing import Callable, Optional - -# --------------------------------------------------------------------------- -# Constants -# --------------------------------------------------------------------------- - -BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) -CSV_PATH = os.path.join(BASE_DIR, "connectus", "connectus-migration-pipeline.csv") - -CHECK = "✅" -FAIL_MARK = "❌" -NA_MARK = "N/A" - -# Identity / metadata columns (NOT part of the workflow, never cleared). 
-DATA_COLUMNS = [ - "Integration ID", - "Integration File Path", - "Connector ID", -] - -VALID_FLAG_VALUES = {"YES", "NO", "N/A"} - -# Valid auth type enum values for Auth Details schema validation -VALID_AUTH_TYPES = { - "OAuth2AuthCode", - "OAuth2ClientCreds", - "OAuth2JWT", - "APIKey", - "Plain", - "Other", - "NoneRequired", -} - - -# --------------------------------------------------------------------------- -# The unified 16-step sequence (single source of truth) -# --------------------------------------------------------------------------- - -@dataclass(frozen=True) -class Step: - """A single step in the unified workflow sequence.""" - - index: int # 1..16 - name: str # CSV column name AND user-facing identifier - kind: str # "data" | "checkpoint" | "flag" - optional: bool # True only for #5 - setter: Optional[str] # CLI subcommand for setting; None for pure markpass - description: str # short human-readable summary used by `next` - - -# Forward declare validators set below. -_NoOp = lambda _v: [] # noqa: E731 — placeholder set after STEPS is defined - - -def _validate_assignee(value: str) -> list[str]: - if not value.strip(): - return ["Assignee cannot be empty."] - return [] - - -def _validate_json(value: str, what: str) -> list[str]: - try: - json.loads(value) - except json.JSONDecodeError as e: - return [f"'{what}' must be valid JSON. Parse error: {e}"] - return [] - - -def _validate_flag(value: str) -> list[str]: - v = value.strip().upper() - if v not in VALID_FLAG_VALUES: - return [f"Flag must be one of YES, NO, N/A. Got: '{value}'"] - return [] - - -STEPS: list[Step] = [ - Step(1, "assignee", "data", False, "set-assignee", - "Assign an owner to drive this integration's migration."), - Step(2, "Auth Details", "data", False, "set-auth", - "Record the auth classification JSON (validated against the Auth Details schema)."), - Step(3, "Params to Commands", "data", False, "set-params-to-commands", - "Map each integration command to the parameter IDs it consumes (JSON)."), - Step(4, "Params for test with default in code", "data", False, "set-params-for-test", - "List the param IDs whose defaults live in the integration source (JSON)."), - Step(5, "Params same in other handlers", "data", True, "set-shared-params", - "Optional: list params shared verbatim with sibling handlers (or `skip`)."), - Step(6, "generated manifest", "checkpoint", False, None, - "Generate the ConnectUs manifest YAML for the integration."), - Step(7, "run manifest make validate", "checkpoint", False, None, - "Run `make validate` on the generated manifest."), - Step(8, "wrote/checked code", "checkpoint", False, None, - "Write or review the integration source code."), - Step(9, "shadowed command test passes", "checkpoint", False, None, - "Verify there are no shadowed/conflicting commands in the same connector."), - Step(10, "write tests", "checkpoint", False, None, - "Author unit tests for the integration."), - Step(11, "precommit/validate/unit tests passed", "checkpoint", False, None, - "Run pre-commit, validate, and unit tests via demisto-sdk pre-commit."), - Step(12, "requires auth parity test", "flag", False, "set-auth-flag", - "Decide whether the integration needs an auth-parity test (YES/NO/N/A)."), - Step(13, "auth parity test passes", "checkpoint", False, None, - "Run the auth-parity test (auto-N/A when step 12 is NO/N/A)."), - Step(14, "param parity test passes", "checkpoint", False, None, - "Run the parameter-parity test."), - Step(15, "code reviewed", "checkpoint", False, None, - "Complete code 
review."), - Step(16, "code merged", "checkpoint", False, None, - "Merge the integration to the branch."), -] - -assert len(STEPS) == 16, "STEPS must have exactly 16 entries." - -STEP_BY_NAME: dict[str, Step] = {s.name: s for s in STEPS} -STEP_BY_INDEX: dict[int, Step] = {s.index: s for s in STEPS} - -# Derived constants — NEVER hand-maintain these; they reflect STEPS only. -WORKFLOW_COLUMNS: list[str] = [s.name for s in STEPS] -WORKFLOW_DATA_COLUMNS: list[str] = [s.name for s in STEPS if s.kind == "data"] -CHECKPOINT_COLUMNS: list[str] = [s.name for s in STEPS if s.kind == "checkpoint"] -JSON_VALUED_COLUMNS: set[str] = { - s.name for s in STEPS - if s.kind == "data" and s.name != "assignee" -} -AUTH_PARITY_FLAG_COLUMN: str = "requires auth parity test" -ALL_COLUMNS: list[str] = DATA_COLUMNS + WORKFLOW_COLUMNS -EXPECTED_COLUMN_COUNT: int = len(ALL_COLUMNS) - -# Steps that look like they could be markpass'd but actually need a setter. -NON_CHECKPOINT_STEPS: dict[str, str] = { - s.name: s.setter - for s in STEPS - if s.setter is not None -} - - -# --------------------------------------------------------------------------- -# Errors -# --------------------------------------------------------------------------- - -class WorkflowError(Exception): - """User-facing workflow violation. Caller prints `.message` and exits 1.""" - - def __init__(self, message: str): - super().__init__(message) - self.message = message - - -# --------------------------------------------------------------------------- -# State predicates -# --------------------------------------------------------------------------- - -def is_checked(value: str) -> bool: - """Whether a checkpoint cell value represents 'done'.""" - v = value.strip() - return v in (CHECK, "✅", "YES", NA_MARK, "N/A", "true", "True", "done", "Done", "DONE") - - -def is_done(row: dict[str, str], step: Step) -> bool: - """The unified completion predicate for any step kind.""" - val = row.get(step.name, "").strip() - if step.kind == "data": - return val != "" - if step.kind == "flag": - return val.upper() in VALID_FLAG_VALUES - if step.kind == "checkpoint": - return is_checked(val) - raise AssertionError(f"Unknown step kind: {step.kind!r}") - - -def current_step(row: dict[str, str]) -> Optional[Step]: - """First step that is not yet done; ``None`` if every step is done.""" - for step in STEPS: - if not is_done(row, step): - return step - return None - - -# Backward-compatible alias for the legacy public name. Returns the step name -# (str) rather than the Step object. -def get_current_step(row: dict[str, str]) -> Optional[str]: - """Legacy wrapper: returns the current step's name (or None).""" - s = current_step(row) - return s.name if s is not None else None - - -def get_step(name: str) -> Step: - """Look up a Step by name; raise ``WorkflowError`` if unknown.""" - step = STEP_BY_NAME.get(name) - if step is None: - raise WorkflowError( - f"Unknown step: '{name}'.\n" - f" Valid steps:\n" + "\n".join(f" {s.index:2d}. {s.name}" for s in STEPS) - ) - return step - - -def get_step_index(step_name: str) -> int: - """Legacy: return the 0-based index of a checkpoint step within - ``CHECKPOINT_COLUMNS`` (preserves old API for any external callers).""" - try: - return CHECKPOINT_COLUMNS.index(step_name) - except ValueError: - raise ValueError( - f"Unknown checkpoint step: '{step_name}'. 
" - f"Valid steps: {', '.join(CHECKPOINT_COLUMNS)}" - ) - - -# --------------------------------------------------------------------------- -# Cascade reset and normalization -# --------------------------------------------------------------------------- - -def reset_after(row: dict[str, str], step: Step) -> list[str]: - """Clear every step strictly after ``step``. Returns the cleared columns.""" - cleared: list[str] = [] - for s in STEPS: - if s.index > step.index: - if row.get(s.name, "") != "": - cleared.append(s.name) - row[s.name] = "" - return cleared - - -def normalize_row(row: dict[str, str]) -> list[str]: - """Auto-clear any value past the first incomplete step. - - Returns the list of column names that were cleared. The caller is - responsible for printing a stderr warning if the list is non-empty. - """ - # Walk steps in order; once we find the first incomplete one, every - # subsequent step's column must be empty. - cleared: list[str] = [] - found_incomplete = False - for step in STEPS: - if not found_incomplete: - if not is_done(row, step): - found_incomplete = True - continue - # We're past the first incomplete step; any value here is contradictory. - if row.get(step.name, "").strip() != "": - cleared.append(step.name) - row[step.name] = "" - return cleared - - -def _normalize_rows_with_warning(rows: list[dict[str, str]], context: str) -> None: - """Normalize each row in place. Print one stderr warning per modified row.""" - for row in rows: - cleared = normalize_row(row) - if cleared: - integration_id = row.get("Integration ID", "") - print( - f"WARNING: normalized {context} row '{integration_id}': " - f"cleared columns {cleared} (values were past the first incomplete step).", - file=sys.stderr, - ) - - -# --------------------------------------------------------------------------- -# CSV I/O (with normalization on read AND write) -# --------------------------------------------------------------------------- - -def load_csv() -> list[dict[str, str]]: - """Load the CSV and return list of row dicts. Normalizes on read.""" - with open(CSV_PATH, "r", encoding="utf-8") as f: - reader = csv.DictReader(f) - fieldnames = reader.fieldnames or [] - if fieldnames != ALL_COLUMNS: - missing = [c for c in ALL_COLUMNS if c not in fieldnames] - extra = [c for c in fieldnames if c not in ALL_COLUMNS] - print( - "WARNING: CSV header does not match expected schema.\n" - f" Expected {len(ALL_COLUMNS)} columns, got {len(fieldnames)}.\n" - f" Missing: {missing}\n" - f" Extra: {extra}", - file=sys.stderr, - ) - rows = list(reader) - - _normalize_rows_with_warning(rows, context="loaded") - return rows - - -def save_csv(rows: list[dict[str, str]]) -> None: - """Write rows back to CSV atomically. Normalizes on write.""" - if not rows: - return - - _normalize_rows_with_warning(rows, context="saved") - - fieldnames = list(rows[0].keys()) - - output = io.StringIO() - writer = csv.DictWriter( - output, - fieldnames=fieldnames, - quoting=csv.QUOTE_MINIMAL, - lineterminator="\n", - ) - writer.writeheader() - writer.writerows(rows) - - target_dir = os.path.dirname(CSV_PATH) or "." 
- tmp_path: Optional[str] = None - try: - with tempfile.NamedTemporaryFile( - mode="w", - encoding="utf-8", - dir=target_dir, - prefix=".connectus-migration-pipeline.", - suffix=".tmp", - delete=False, - ) as tmp: - tmp_path = tmp.name - tmp.write(output.getvalue()) - os.replace(tmp_path, CSV_PATH) - tmp_path = None - finally: - if tmp_path is not None and os.path.exists(tmp_path): - try: - os.remove(tmp_path) - except OSError: - pass - - -def find_row(rows: list[dict[str, str]], integration_id: str) -> Optional[int]: - """Find a row by Integration ID (case-insensitive). Returns index or None.""" - name_lower = integration_id.lower().strip() - for i, row in enumerate(rows): - if row.get("Integration ID", "").strip().lower() == name_lower: - return i - return None - - -# --------------------------------------------------------------------------- -# Auth Details schema validation -# --------------------------------------------------------------------------- - -# Regexes for the Auth Details `config` mini-grammar. -_AUTH_CONFIG_CLAUSE_RE = re.compile( - r"^\s*(REQUIRED|OPTIONAL|CHOICE)\s*\(\s*([^)]*?)\s*\)\s*$" +# Re-export everything from the package so legacy `from workflow_state +# import …` works whether the loader picked the file or the package. +from workflow_state import * # noqa: F401,F403 +# Underscore-prefixed names are NOT re-exported by `*`; pull them in +# explicitly for the few tests that import them by name. +from workflow_state import ( # noqa: F401 + _auth_other_connection_summary, + _auth_param_sources, + _can_advance_to, + _check_params_to_commands_overlap, + _example_value_for, + _format_step_for_listing, + _git_user_name, + _normalize_rows_with_warning, + _parse_next_flags, + _project_xsoar_param_to_yml_id, + _reset_config_for_testing, + _resolve_row_or_exit, + _set_json_data_step, + _set_step_via_dispatch, + _summary_value, ) -_AUTH_CONFIG_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$") -# Split clauses on `+` surrounded by optional whitespace. We use a manual -# split rather than re.split so we can detect leading/trailing `+` (which -# would produce empty segments). -_AUTH_CONFIG_SPLIT_RE = re.compile(r"\s*\+\s*") - - -def _parse_auth_config(config: str) -> tuple[list[str], list[str]]: - """Parse the Auth Details ``config`` expression mini-grammar. - - Returns ``(referenced_names, parse_errors)`` where ``referenced_names`` - is the (order-preserving, possibly duplicate) list of operand names - appearing inside any ``REQUIRED(...)``, ``OPTIONAL(...)`` or - ``CHOICE(...)`` clause, and ``parse_errors`` is a list of human-readable - issues with the expression itself (malformed clauses, bad operand - names, stray ``+``, etc.). - - Grammar (case-sensitive on keywords): - - config := "NoneRequired" | clause ( " + " clause )* - clause := ("REQUIRED" | "OPTIONAL" | "CHOICE") "(" name_list ")" - name_list := name ("," name)* - name := /[A-Za-z_][A-Za-z0-9_]*/ - - Surrounding whitespace inside clauses and around ``+`` / ``,`` is - tolerated. Empty clauses (``REQUIRED()``) are rejected. - """ - referenced_names: list[str] = [] - parse_errors: list[str] = [] - - stripped = config.strip() - if stripped == "": - parse_errors.append("config expression is empty") - return referenced_names, parse_errors - if stripped == "NoneRequired": - return referenced_names, parse_errors - - # Detect leading/trailing `+` before splitting, so the resulting empty - # segments give a clear error message. 
- if stripped.startswith("+"): - parse_errors.append("config expression starts with '+' (no leading clause)") - if stripped.endswith("+"): - parse_errors.append("config expression ends with '+' (no trailing clause)") - - segments = _AUTH_CONFIG_SPLIT_RE.split(stripped) - for seg_idx, segment in enumerate(segments): - if segment.strip() == "": - # Already covered by the leading/trailing checks above OR a - # genuine "+ +" in the middle. - if not (seg_idx == 0 and stripped.startswith("+")) and not ( - seg_idx == len(segments) - 1 and stripped.endswith("+") - ): - parse_errors.append("empty clause between '+' separators") - continue - m = _AUTH_CONFIG_CLAUSE_RE.match(segment) - if not m: - parse_errors.append( - f"malformed clause '{segment}' (expected " - "REQUIRED(...), OPTIONAL(...), or CHOICE(...))" - ) - continue - keyword, inner = m.group(1), m.group(2) - if inner.strip() == "": - parse_errors.append(f"clause '{keyword}(...)' has no operands") - continue - operands = [op.strip() for op in inner.split(",")] - for op in operands: - if op == "": - parse_errors.append( - f"clause '{keyword}(...)' has an empty operand " - "(stray comma?)" - ) - continue - if not _AUTH_CONFIG_NAME_RE.fullmatch(op): - parse_errors.append( - f"clause '{keyword}(...)' operand '{op}' is not a " - "valid identifier (must match [A-Za-z_][A-Za-z0-9_]*)" - ) - continue - referenced_names.append(op) - - return referenced_names, parse_errors - - -def validate_auth_detail(value: str) -> list[str]: - """Validate Auth Details JSON shape. Returns list of errors ([] = valid). - - Shape: ``{"auth_types": [{"type": , "name": , - "xsoar_params": [, ...], "interpolated"?: }, - ...], "config": , - "other_connection": [, ...]}``. - - Each ``auth_types[]`` entry describes one prospective ConnectUs - connection type that the migrated integration should expose. ``name`` - is a free-form logical id chosen for that connection type and must be - unique across entries within this row. ``xsoar_params`` is the list - of XSOAR parameter ids whose values feed the secrets for that - connection type (the same XSOAR param may appear in multiple entries - if it supplies several connection types). ``config`` references the - entry ``name``s (not the XSOAR param ids). - - ``other_connection`` is a flat sorted list of YML param ids that are - connection-adjacent but not auth secrets — e.g. ``url``, ``proxy``, - ``insecure``, ``port``, ``host``, ``region``. The list captures the - ids exactly as they appear in the integration YML's - ``configuration[].name``. An empty list ``[]`` is valid (= the - integration has no connection-adjacent params besides its auth - secrets). The validator does NOT check overlap with - ``auth_types[].xsoar_params`` — keeping the two lists disjoint is the - classifier's responsibility. - - Validation performed (in addition to the per-entry shape checks): - - - ``xsoar_params`` must be a non-empty list of non-empty strings. - - ``auth_types`` entries must be sorted by ``(type, name)`` - ascending. The first out-of-order pair is reported. - - ``config`` must conform to the mini-grammar parsed by - :func:`_parse_auth_config` (``NoneRequired`` or one or more - ``REQUIRED/OPTIONAL/CHOICE`` clauses joined by ``+``). - - Every operand name referenced by ``config`` must appear as some - ``auth_types[].name`` in the same row. - - If ``config == "NoneRequired"`` then ``auth_types`` must be - empty; otherwise ``auth_types`` must be non-empty. - - ``other_connection`` is REQUIRED on write. 
It must be a list of - non-empty unique strings, sorted ascending. ``[]`` is allowed. - - NOTE on backward compatibility: legacy CSV rows written before this - field existed lack ``other_connection`` entirely. The read/display - path tolerates that (see ``format_status`` / ``format_step_value``) - and renders a ``(not set — re-run set-auth)`` hint, but ``set-auth`` - writes go through this validator and MUST include the key. - """ - errors: list[str] = [] - - try: - detail = json.loads(value) - except json.JSONDecodeError as e: - return [f"Invalid JSON: {e}"] - - if not isinstance(detail, dict): - return [f"Expected a JSON object, got {type(detail).__name__}"] - - required_keys = {"auth_types", "config", "other_connection"} - missing = required_keys - set(detail.keys()) - if missing: - errors.append(f"Missing required keys: {', '.join(sorted(missing))}") - return errors - - seen_names: set[str] = set() - # Track per-entry validity for the sort check (only consider entries - # whose `type` and `name` are both well-formed). - sortable: list[tuple[int, str, str]] = [] - valid_auth_types_list = isinstance(detail["auth_types"], list) - if not valid_auth_types_list: - errors.append(f"'auth_types' must be a list, got {type(detail['auth_types']).__name__}") - else: - for i, entry in enumerate(detail["auth_types"]): - if not isinstance(entry, dict): - errors.append(f"auth_types[{i}]: expected object, got {type(entry).__name__}") - continue - entry_type_ok = False - entry_name_ok = False - if "type" not in entry: - errors.append(f"auth_types[{i}]: missing 'type'") - elif entry["type"] not in VALID_AUTH_TYPES: - errors.append(f"auth_types[{i}]: invalid type '{entry['type']}'") - else: - entry_type_ok = True - if "name" not in entry: - errors.append(f"auth_types[{i}]: missing 'name'") - elif not isinstance(entry["name"], str): - errors.append(f"auth_types[{i}]: 'name' must be a string") - elif not entry["name"]: - errors.append(f"auth_types[{i}]: 'name' must be a non-empty string") - elif entry["name"] in seen_names: - errors.append( - f"auth_types[{i}]: duplicate 'name' '{entry['name']}' " - "(each entry must have a unique logical name)" - ) - else: - seen_names.add(entry["name"]) - entry_name_ok = True - if "xsoar_params" not in entry: - errors.append(f"auth_types[{i}]: missing 'xsoar_params'") - elif not isinstance(entry["xsoar_params"], list): - errors.append( - f"auth_types[{i}]: 'xsoar_params' must be a list, " - f"got {type(entry['xsoar_params']).__name__}" - ) - elif len(entry["xsoar_params"]) == 0: - errors.append( - f"auth_types[{i}]: 'xsoar_params' must contain at least one entry" - ) - else: - for j, p in enumerate(entry["xsoar_params"]): - if not isinstance(p, str) or not p: - errors.append( - f"auth_types[{i}]: xsoar_params[{j}] must be a non-empty string" - ) - if "interpolated" in entry and not isinstance(entry["interpolated"], bool): - errors.append( - f"auth_types[{i}]: 'interpolated' must be a bool, " - f"got {type(entry['interpolated']).__name__}" - ) - - if entry_type_ok and entry_name_ok: - sortable.append((i, entry["type"], entry["name"])) - - # Sort-order check: report the first out-of-order adjacent pair - # among the entries that have valid `type` and `name`. 
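# Illustrative sketch only: one payload satisfying the documented Auth Details
# shape (entries sorted by (type, name), unique names, 'config' referencing
# the entry names, 'other_connection' sorted ascending). The 'type' strings
# are placeholders and must in practice come from VALID_AUTH_TYPES; every
# param id below is hypothetical.
_EXAMPLE_AUTH_DETAILS = {
    "auth_types": [
        {"type": "api-key", "name": "api_key", "xsoar_params": ["apikey"]},
        {"type": "basic", "name": "basic",
         "xsoar_params": ["credentials.identifier", "credentials.password"]},
    ],
    "config": "CHOICE(api_key, basic)",
    "other_connection": ["insecure", "proxy", "url"],
}
# json.dumps(_EXAMPLE_AUTH_DETAILS) is the string form that set-auth expects.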
- for k in range(len(sortable) - 1): - i_a, type_a, name_a = sortable[k] - i_b, type_b, name_b = sortable[k + 1] - if (type_a, name_a) > (type_b, name_b): - errors.append( - f"auth_types must be sorted by (type, name); entry " - f"[{i_a}] '{type_a}'/'{name_a}' should come after " - f"entry [{i_b}] '{type_b}'/'{name_b}'" - ) - break - - if not isinstance(detail["config"], str): - errors.append(f"'config' must be a string, got {type(detail['config']).__name__}") - else: - config_str = detail["config"] - referenced_names, parse_errors = _parse_auth_config(config_str) - for pe in parse_errors: - errors.append(f"'config': {pe}") - for n in referenced_names: - if n not in seen_names: - errors.append( - f"'config' references unknown connection-type name " - f"'{n}' (must match an auth_types[].name)" - ) - # Coherence between `config` and `auth_types`. - if valid_auth_types_list: - auth_types_empty = len(detail["auth_types"]) == 0 - if config_str.strip() == "NoneRequired": - if not auth_types_empty: - errors.append( - "'config' is 'NoneRequired' but 'auth_types' " - "contains entries; remove the entries or change " - "'config'" - ) - else: - # Only flag the empty-auth_types mismatch if the config - # itself parsed cleanly (otherwise the parse error is - # the more informative signal). - if not parse_errors and auth_types_empty: - errors.append( - "'config' is not 'NoneRequired' but 'auth_types' is empty" - ) - - other_connection = detail["other_connection"] - if not isinstance(other_connection, list): - errors.append( - f"'other_connection' must be a list, got " - f"{type(other_connection).__name__}" - ) - else: - all_strings = True - for j, item in enumerate(other_connection): - if not isinstance(item, str): - errors.append( - f"'other_connection'[{j}]: must be a string, got " - f"{type(item).__name__}" - ) - all_strings = False - elif not item: - errors.append( - f"'other_connection'[{j}]: must be a non-empty string" - ) - all_strings = False - if all_strings: - if len(set(other_connection)) != len(other_connection): - seen: set[str] = set() - dups: list[str] = [] - for item in other_connection: - if item in seen and item not in dups: - dups.append(item) - seen.add(item) - errors.append( - "'other_connection' contains duplicate entries: " - f"{dups}" - ) - sorted_oc = sorted(other_connection) - if other_connection != sorted_oc: - errors.append( - "'other_connection' must be sorted ascending; got " - f"{other_connection}, expected {sorted_oc}" - ) - - return errors - - -# Hint embedded in every "extra top-level key" error reported by -# :func:`validate_params_to_commands`. The one-liner is documented as -# the canonical strip recipe so the calling agent can recover from a -# polluted analyzer payload (e.g. ``check_command_params.py`` invoked -# without ``--with-diagnostics`` was the historical leak source) by -# re-piping the JSON through ``json.load`` / ``pop`` / ``json.dumps`` -# without re-running the analyzer. Kept in sync with -# ``connectus/column-schemas.md`` §Params to Commands. -_PARAMS_TO_COMMANDS_STRIP_HINT = ( - "strip it before persisting (see column-schemas.md " - "§Params to Commands). One-liner: python3 -c " - "\"import sys, json; o = json.load(sys.stdin); " - "o.pop('diagnostics', None); print(json.dumps(o))\"" -) - - -def validate_params_to_commands(value: str) -> list[str]: - """Validate Params to Commands JSON shape. Returns errors ([] = valid). 
- - Strict shape (per ``connectus/column-schemas.md`` §Params to Commands):: - - { - "integration": "", - "commands": { - "": ["", ...], - ... - } - } - - Validation rules: - - - The top level must be a JSON object. - - The set of top-level keys MUST equal exactly - ``{"integration", "commands"}``. Missing keys are reported. - Extra top-level keys (the historical leak: ``diagnostics``, - ``status``, ``failure_excerpt``, ``captured_requests``, - ``error``, ``stderr``, etc.) are ALL named in a single error - and the error embeds the canonical strip recipe (see - :data:`_PARAMS_TO_COMMANDS_STRIP_HINT`). - - ``integration`` must be a non-empty string. - - ``commands`` must be a dict. Each value must be a list, and - every element of every list must be a non-empty string. - - Mirrors :func:`validate_auth_detail`'s contract: returns a list of - human-readable error strings. An empty list means the payload is - valid. Multiple errors are accumulated rather than bailing on the - first — callers print all of them so the operator can fix the - payload in a single pass. - """ - errors: list[str] = [] - - try: - payload = json.loads(value) - except json.JSONDecodeError as e: - return [f"Invalid JSON: {e}"] - - if not isinstance(payload, dict): - return [f"Expected a JSON object, got {type(payload).__name__}"] - - expected_keys = {"integration", "commands"} - actual_keys = set(payload.keys()) - missing = expected_keys - actual_keys - extras = actual_keys - expected_keys - - if missing: - errors.append( - f"Missing required top-level key(s): {sorted(missing)}; " - f"payload must contain exactly {sorted(expected_keys)}." - ) - - if extras: - sorted_extras = sorted(extras) - # Call out diagnostics by name when present — it is the known - # common offender (analyzer leaked it under the old default). 
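# Illustrative sketch only: the documented strip recipe expanded into plain
# Python, for recovering a 'Params to Commands' payload polluted with the
# leaked 'diagnostics' key. The integration and command names are hypothetical.
import json

_polluted = (
    '{"integration": "ExampleIntegration",'
    ' "commands": {"example-command": ["limit", "page_size"]},'
    ' "diagnostics": {"captured_requests": []}}'
)
_obj = json.loads(_polluted)
_obj.pop("diagnostics", None)
_clean = json.dumps(_obj)   # top-level keys are now exactly {"integration", "commands"}
assert set(json.loads(_clean)) == {"integration", "commands"}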
- if "diagnostics" in extras: - errors.append( - f"Extra top-level key 'diagnostics' is forbidden in " - f"'Params to Commands' (it is internal analyzer " - f"metadata, not pipeline data); " - f"{_PARAMS_TO_COMMANDS_STRIP_HINT}" - ) - other_extras = [k for k in sorted_extras if k != "diagnostics"] - if other_extras: - errors.append( - f"Extra top-level key(s) {other_extras} are " - f"forbidden; {_PARAMS_TO_COMMANDS_STRIP_HINT}" - ) - else: - errors.append( - f"Extra top-level key(s) {sorted_extras} are forbidden; " - f"{_PARAMS_TO_COMMANDS_STRIP_HINT}" - ) - - if "integration" in payload: - integration = payload["integration"] - if not isinstance(integration, str): - errors.append( - f"'integration' must be a string, got " - f"{type(integration).__name__}" - ) - elif integration == "": - errors.append("'integration' must be a non-empty string") - - if "commands" in payload: - commands = payload["commands"] - if not isinstance(commands, dict): - errors.append( - f"'commands' must be a JSON object, got " - f"{type(commands).__name__}" - ) - else: - for cmd, param_list in commands.items(): - if not isinstance(param_list, list): - errors.append( - f"commands[{cmd!r}]: expected a list of param " - f"ids, got {type(param_list).__name__}" - ) - continue - for i, p in enumerate(param_list): - if not isinstance(p, str): - errors.append( - f"commands[{cmd!r}][{i}]: param id must be " - f"a string, got {type(p).__name__}" - ) - continue - if p == "": - errors.append( - f"commands[{cmd!r}][{i}]: param id must be " - f"a non-empty string" - ) - - return errors - - -# --------------------------------------------------------------------------- -# Auth-derived ignore set (cross-step exclusion plumbing) -# --------------------------------------------------------------------------- - -def _project_xsoar_param_to_yml_id(xsoar_param: str) -> str: - """Project a single ``auth_types[].xsoar_params`` entry to its YML param id. - - Bare ids (``api_key``) pass through unchanged. Dotted forms like - ``credentials.identifier`` / ``credentials.password`` collapse to the - segment before the first ``.`` (``credentials``) — that's the actual - YML ``configuration[].name`` so it can be cross-checked against the - ``Params to Commands`` payload (whose values are bare YML ids). - """ - if not isinstance(xsoar_param, str): - return "" - return xsoar_param.split(".", 1)[0] - - -def _auth_param_sources(auth_detail: dict) -> dict[str, list[str]]: - """Return ``{yml_param_id: [, ...]}`` for an Auth - Details object. - - Used by :func:`auth_param_ids` and the ``set-params-to-commands`` - overlap-rejection error message — the latter needs to name *where* - each offending param was declared (``auth_types[].name='credentials' - (xsoar_params=[...])`` vs ``other_connection``). - - Tolerates legacy-shape Auth Details that lack ``other_connection`` - by simply omitting that source — see :func:`auth_param_ids` for - the user-visible error/warning behaviour. 
- """ - sources: dict[str, list[str]] = {} - - auth_types = auth_detail.get("auth_types") - if isinstance(auth_types, list): - for entry in auth_types: - if not isinstance(entry, dict): - continue - entry_name = entry.get("name", "") - xsoar_params = entry.get("xsoar_params") - if not isinstance(xsoar_params, list): - continue - projected_for_entry: list[str] = [] - for xp in xsoar_params: - yml_id = _project_xsoar_param_to_yml_id(xp) - if yml_id: - projected_for_entry.append(yml_id) - # Group source description by entry — every projected id - # cites the same entry-level (name, xsoar_params) pair so - # the overlap message can quote the dotted forms verbatim. - # Dedupe per-yml_id so dotted forms collapsing to the same - # bare id (credentials.identifier + credentials.password → - # credentials) don't repeat the same descriptor twice. - descriptor = ( - f"auth_types[].name={entry_name!r} " - f"(xsoar_params={list(xsoar_params)!r})" - ) - seen_for_entry: set[str] = set() - for yml_id in projected_for_entry: - if yml_id in seen_for_entry: - continue - seen_for_entry.add(yml_id) - sources.setdefault(yml_id, []).append(descriptor) - - other_connection = auth_detail.get("other_connection") - if isinstance(other_connection, list): - for item in other_connection: - if isinstance(item, str) and item: - sources.setdefault(item, []).append("other_connection") - - return sources - - -def auth_param_ids(integration_id: str) -> list[str]: - """Return the union of YML param ids declared in an integration's - ``Auth Details``. - - Returns the deduplicated, ascending-sorted list of bare YML - ``configuration[].name`` values composed from: - - * Every ``auth_types[].xsoar_params`` entry, projected via - :func:`_project_xsoar_param_to_yml_id` (bare ids pass through; - dotted forms like ``credentials.identifier`` collapse to - ``credentials``). - * Every entry in ``other_connection`` (already bare YML ids — no - projection needed). - - Behaviour for edge cases: - - * Integration not in the CSV → :class:`WorkflowError`. - * ``Auth Details`` cell empty (the workflow prerequisite for - populating ``Params to Commands``) → :class:`WorkflowError` with - a clear "set auth first" message. - * ``Auth Details`` JSON unparseable → :class:`WorkflowError`. - * Legacy ``Auth Details`` row that lacks the ``other_connection`` - key entirely → degrade gracefully: log a one-line stderr hint - and return only the auth_types-derived ids. The downstream - analyzer / set-params-to-commands callers must keep working on - these legacy rows; surfacing this as a hard error would block - the existing CSV from loading. - """ - rows = load_csv() - idx = find_row(rows, integration_id) - if idx is None: - raise WorkflowError( - f"Integration '{integration_id}' not found in the CSV." - ) - - raw = rows[idx].get("Auth Details", "").strip() - if not raw: - raise WorkflowError( - f"'Auth Details' is not set for integration " - f"'{rows[idx].get('Integration ID', integration_id)}'. " - f"Run 'set-auth' first — populating 'Params to Commands' " - f"requires the auth classification to be in place so the " - f"two columns stay disjoint." - ) - - try: - parsed = json.loads(raw) - except json.JSONDecodeError as e: - raise WorkflowError( - f"'Auth Details' for integration '{integration_id}' is not " - f"valid JSON: {e}. Re-run 'set-auth' with a corrected payload." - ) - if not isinstance(parsed, dict): - raise WorkflowError( - f"'Auth Details' for integration '{integration_id}' is not a " - f"JSON object (got {type(parsed).__name__}). 
Re-run 'set-auth'." - ) - - if "other_connection" not in parsed: - # Legacy-shape row from before the field existed. Don't crash - # — the helper is consumed by tools that must keep working on - # historical rows. Surface a stderr hint so the next set-auth - # run gets it right. - print( - f"WARNING: Auth Details for '{integration_id}' is missing " - f"'other_connection' (legacy shape). Re-run 'set-auth' to " - f"populate it; auth_param_ids() returning only the " - f"auth_types-derived ids in the meantime.", - file=sys.stderr, - ) - - sources = _auth_param_sources(parsed) - return sorted(sources.keys()) - - -# --------------------------------------------------------------------------- -# Unified dispatch — the heart of the cascade-reset rule -# --------------------------------------------------------------------------- - -def _can_advance_to(row: dict[str, str], target: Step) -> tuple[bool, str]: - """True iff every step strictly before ``target`` is done.""" - for s in STEPS: - if s.index >= target.index: - break - if not is_done(row, s): - verb = s.setter if s.setter else "markpass" - return False, ( - f"Cannot advance to '{target.name}' (step {target.index}/16) yet — " - f"prior step #{s.index} '{s.name}' is not done.\n" - f" Run: workflow_state.py {verb} " - + ("" if s.setter else f'"{s.name}"') - ) - return True, "" - - -def apply_step_action( - row: dict[str, str], - target: Step, - new_value: str, - *, - verb: str, -) -> tuple[list[str], bool]: - """Apply a step action with cascade-reset semantics. - - Returns ``(cleared_columns, was_no_op)``. - - Behavior: - - If ``target`` is AHEAD of the current step: raise ``WorkflowError``. - - If ``target`` is AT current step: write the value (no clearing — there - was nothing past current that wasn't already empty). - - If ``target`` is BEHIND current (or is the same as current and was - already done): write the new value AND ``reset_after(target)``. - - Special case for the flag step #12: setting the same value is a no-op - (no reset). - - NOTE: This function does NOT enforce the special ``set-assignee`` carve-out. - The caller (``cmd_set_assignee``) bypasses this dispatch and writes - the assignee directly, on purpose. See override #5 in the design overrides. - """ - cur = current_step(row) - cur_idx = cur.index if cur is not None else len(STEPS) + 1 - - # AHEAD of current — reject. - if cur is not None and target.index > cur_idx: - raise WorkflowError( - f"Cannot {verb} '{target.name}' (step {target.index}/16) yet — " - f"current step is #{cur.index} '{cur.name}'.\n" - f" Complete it first via " - f"'{cur.setter or 'markpass'}'." - ) - - # Flag-step idempotency: same value, no reset. - if target.kind == "flag": - existing = row.get(target.name, "").strip().upper() - if existing == new_value.strip().upper() and existing in VALID_FLAG_VALUES: - return [], True - - # AT or BEHIND current. Write then cascade-reset. - row[target.name] = new_value - cleared = reset_after(row, target) - return cleared, False - - -# --------------------------------------------------------------------------- -# Display helpers -# --------------------------------------------------------------------------- - -def _summary_value(step: Step, raw: str) -> str: - """Short inline display for status output.""" - val = raw.strip() - if not val: - if step.kind == "checkpoint": - return "⬜" - return "(not set)" - if step.kind == "data" and step.name in JSON_VALUED_COLUMNS: - # Long JSON values get summarized. 
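# Illustrative sketch only: how dotted xsoar_params collapse to bare YML ids
# and union with other_connection to form the auth-derived ignore set that
# auth_param_ids() above returns. Every id below is hypothetical.
def _demo_project(xsoar_param: str) -> str:
    return xsoar_param.split(".", 1)[0]

_demo_auth_detail = {
    "auth_types": [{"type": "basic", "name": "basic",
                    "xsoar_params": ["credentials.identifier", "credentials.password"]}],
    "config": "REQUIRED(basic)",
    "other_connection": ["proxy", "url"],
}
_demo_ignore = sorted(
    {_demo_project(p)
     for entry in _demo_auth_detail["auth_types"]
     for p in entry["xsoar_params"]}
    | set(_demo_auth_detail["other_connection"])
)
assert _demo_ignore == ["credentials", "proxy", "url"]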
- if len(val) > 60: - return f"{val[:57]}… (set; show-step for full)" - return val - return val - - -def _auth_other_connection_summary(raw: str) -> str: - """Return a one-line ``other_connection`` summary for an Auth Details - JSON blob. Tolerates legacy rows that predate the field by returning - a clear ``(not set — re-run set-auth)`` hint instead of crashing.""" - val = raw.strip() - if not val: - return "(not set)" - try: - parsed = json.loads(val) - except json.JSONDecodeError: - return "(invalid JSON — cannot extract other_connection)" - if not isinstance(parsed, dict): - return "(invalid Auth Details object)" - if "other_connection" not in parsed: - return "(not set — re-run set-auth)" - oc = parsed["other_connection"] - if not isinstance(oc, list): - return f"(malformed: expected list, got {type(oc).__name__})" - if not oc: - return "[] (none)" - return json.dumps(oc) - - -def format_status(row: dict[str, str]) -> str: - """Format the workflow status of a single integration.""" - integration_id = row.get("Integration ID", "") - cur = current_step(row) - done_count = sum(1 for s in STEPS if is_done(row, s)) - - lines = [ - f"\n{'=' * 60}", - f" {integration_id}", - f"{'=' * 60}", - ] - - file_path = row.get("Integration File Path", "").strip() - connector_id = row.get("Connector ID", "").strip() - assignee = row.get("assignee", "").strip() - - lines.append(f" Assignee: {assignee if assignee else '(unassigned)'}") - lines.append(f" File Path: {file_path if file_path else '(not set)'}") - if file_path: - lines.append(f" (run 'workflow_state.py files {integration_id}' to list all source files)") - lines.append(f" Connector ID: {connector_id if connector_id else '(not set)'}") - lines.append("") - - lines.append(f" Workflow ([{done_count}/16]):") - lines.append(" " + "-" * 40) - for step in STEPS: - marker = " " - if cur is not None and step.index == cur.index: - marker = "▶" - raw = row.get(step.name, "") - display = _summary_value(step, raw) - lines.append(f" {marker}{step.index:2d}. {step.name:38s} : {display}") - # Surface other_connection inline for Auth Details (legacy-tolerant). - if step.name == "Auth Details" and raw.strip(): - oc_summary = _auth_other_connection_summary(raw) - lines.append(f" {'other_connection':38s} : {oc_summary}") - - lines.append("") - if cur is None: - if has_workflow_progress(row): - lines.append(" 🎉 All 16 steps complete!") - else: - lines.append(" ⏳ Not started") - else: - verb = cur.setter or "markpass" - lines.append(f" ➡️ Current step: #{cur.index} {cur.name} (run: {verb})") - - return "\n".join(lines) - - -def format_dashboard_row(row: dict[str, str]) -> Optional[str]: - """Compact dashboard line. 
Returns None for not-started rows.""" - if not has_workflow_progress(row): - return None - - integration_id = row.get("Integration ID", "") - cur = current_step(row) - done_count = sum(1 for s in STEPS if is_done(row, s)) - total = len(STEPS) - - bar = "".join("█" if is_done(row, s) else "░" for s in STEPS) - status = cur.name if cur is not None else "✅ DONE" - return f" {integration_id:45s} [{bar}] {done_count}/{total} → {status}" - - -def format_step_value(row: dict[str, str], step_name: str) -> str: - """Pretty-print the value at ``step_name`` for ``row``.""" - name = row.get("Integration ID", "") - raw = row.get(step_name, "") - value = raw.strip() - - header = ( - f"\n{'=' * 60}\n" - f" {name} — {step_name}\n" - f"{'=' * 60}" - ) - - if not value: - return f"{header}\n (not set)" - - if step_name in JSON_VALUED_COLUMNS: - try: - parsed = json.loads(value) - pretty = json.dumps(parsed, indent=2, sort_keys=False) - # Legacy-row tolerance: pre-other_connection Auth Details rows - # don't include the new key. Don't crash; surface the gap so - # the user knows to re-run set-auth. - if ( - step_name == "Auth Details" - and isinstance(parsed, dict) - and "other_connection" not in parsed - ): - pretty += ( - "\n\n other_connection: (not set — re-run set-auth)" - ) - return f"{header}\n{pretty}" - except json.JSONDecodeError: - return f"{header}\n {value}" - - return f"{header}\n {value}" - - -# --------------------------------------------------------------------------- -# Helpers used by multiple commands -# --------------------------------------------------------------------------- - -def has_workflow_progress(row: dict[str, str]) -> bool: - """Return True if the row has any non-trivial workflow progress. - - Being merely assigned does NOT count as progress. - """ - return any( - row.get(s.name, "").strip() - for s in STEPS - if s.name != "assignee" - ) - - -def list_by_assignee(rows: list[dict[str, str]], assignee_name: str) -> list[dict[str, str]]: - """Filter rows to those whose assignee matches (case-insensitive).""" - target = assignee_name.strip().lower() - return [row for row in rows if row.get("assignee", "").strip().lower() == target] - - -def list_by_connector(rows: list[dict[str, str]], connector_id: str) -> list[dict[str, str]]: - """Filter rows to those whose Connector ID matches (case-insensitive, trimmed).""" - target = connector_id.strip().lower() - return [ - row for row in rows - if row.get("Connector ID", "").strip().lower() == target - ] - - -def format_by_assignee(rows: list[dict[str, str]], assignee_name: str) -> str: - """Format a list of integrations belonging to an assignee.""" - if not rows: - return f"No integrations found for assignee '{assignee_name}'." 
- - lines = [f"\nIntegrations assigned to '{assignee_name}' ({len(rows)}):"] - for row in rows: - name = row.get("Integration ID", "") - if not has_workflow_progress(row): - step_display = "not started" - else: - cur = current_step(row) - step_display = cur.name if cur is not None else "✅ DONE" - lines.append(f" - {name:45s} → {step_display}") - return "\n".join(lines) - - -def _git_user_name() -> Optional[str]: - """Return ``git config user.name`` or None if unavailable.""" - try: - out = subprocess.run( - ["git", "config", "user.name"], - capture_output=True, text=True, check=False, timeout=5, - ) - name = out.stdout.strip() - return name or None - except (FileNotFoundError, subprocess.SubprocessError): - return None - - -# --------------------------------------------------------------------------- -# Backward-compat shims (for old call sites and tests) -# --------------------------------------------------------------------------- - -def reset_from_step(row: dict[str, str], step_name: str) -> None: - """Legacy API: clear ``step_name`` and every later step. - - Equivalent to ``fail/reset-to step_name`` in the unified model. - """ - step = STEP_BY_NAME.get(step_name) - if step is None: - raise ValueError( - f"Unknown step: '{step_name}'. " - f"Valid steps: {', '.join(WORKFLOW_COLUMNS)}" - ) - # Clear the named step and everything after. - prev_index = step.index - 1 - if prev_index < 1: - # Clear all workflow columns - for s in STEPS: - row[s.name] = "" - return - prev = STEP_BY_INDEX[prev_index] - row[step.name] = "" - reset_after(row, prev) - # Note: reset_after(prev) clears everything strictly after prev, which is - # step.index and onward. We've also explicitly cleared step.name above - # in case it was already cleared but we want to be belt-and-suspenders. - - -def markpass_step(row: dict[str, str], step_name: str) -> str: - """Legacy API: mark a checkpoint step as passed. Returns a status message.""" - integration_id = row.get("Integration ID", "") - - if step_name in NON_CHECKPOINT_STEPS: - correct_cmd = NON_CHECKPOINT_STEPS[step_name] - return ( - f"ERROR: '{step_name}' is not a pass/fail checkpoint.\n" - f" Use '{correct_cmd}' instead.\n" - f" Example: workflow_state.py {correct_cmd} " - f"\"{integration_id}\" " - ) - - step = STEP_BY_NAME.get(step_name) - if step is None: - raise ValueError( - f"Unknown checkpoint step: '{step_name}'. " - f"Valid steps: {', '.join(CHECKPOINT_COLUMNS)}" - ) - - # Already done? - if is_done(row, step): - return f"'{step_name}' is already marked as passed for '{integration_id}'." - - # Special handling for the flag-gated #13 auth parity test. - if step_name == "auth parity test passes": - flag = row.get(AUTH_PARITY_FLAG_COLUMN, "").strip().upper() - if flag in ("NO", "N/A"): - row[step_name] = NA_MARK - return f"'{step_name}' set to N/A (auth parity test not required)." - if flag == "": - return ( - f"ERROR: Cannot mark '{step_name}' as passed — " - f"'requires auth parity test' flag is not set.\n" - f" Use 'set-auth-flag' first.\n" - f" Example: workflow_state.py set-auth-flag " - f"\"{integration_id}\" YES" - ) - - # Verify all prior steps are done (including data steps now). - ok, reason = _can_advance_to(row, step) - if not ok: - cur = current_step(row) - cur_name = cur.name if cur else "(none)" - return ( - f"ERROR: Cannot mark '{step_name}' as passed — " - f"you are not up to that step yet.\n" - f" Current step: '{cur_name}'\n" - f" {reason}" - ) - - row[step.name] = CHECK - return f"✅ '{step_name}' marked as passed for '{integration_id}'." 
- - -# --------------------------------------------------------------------------- -# CLI commands -# --------------------------------------------------------------------------- - -def cmd_status(args: list[str]) -> None: - if not args: - print("Usage: workflow_state.py status [id2 ...]") - sys.exit(1) - - rows = load_csv() - for name in args: - idx = find_row(rows, name) - if idx is None: - print(f"ERROR: Integration '{name}' not found.") - continue - print(format_status(rows[idx])) - - -def cmd_status_all(_args: list[str]) -> None: - rows = load_csv() - found = False - for row in rows: - if has_workflow_progress(row): - print(format_status(row)) - found = True - if not found: - print("No integrations have workflow progress yet.") - - -def cmd_dashboard(_args: list[str]) -> None: - rows = load_csv() - print(f"\n{'=' * 80}") - print(" WORKFLOW DASHBOARD") - print(f"{'=' * 80}") - print(f" {'Integration ID':45s} {'Progress':18s} → Current Step") - print(f" {'-' * 75}") - - in_progress = 0 - completed = 0 - not_started = 0 - - for row in rows: - line = format_dashboard_row(row) - if line: - print(line) - if current_step(row) is not None: - in_progress += 1 - else: - completed += 1 - else: - not_started += 1 - - print(f"\n Summary: {completed} complete, {in_progress} in progress, " - f"{not_started} not started") - - -def _set_step_via_dispatch( - row: dict[str, str], - target: Step, - new_value: str, - verb: str, -) -> str: - """Apply step action and return a user-facing message.""" - integration_id = row.get("Integration ID", "") - cleared, no_op = apply_step_action(row, target, new_value, verb=verb) - if no_op: - return f"'{target.name}' already set to '{new_value}' for '{integration_id}'. No change." - msg = f"Set '{target.name}' (step {target.index}/16) for '{integration_id}'." - if cleared: - msg += f"\n Cleared {len(cleared)} subsequent step(s): {cleared}" - return msg - - -def _resolve_row_or_exit(rows: list[dict[str, str]], name: str) -> int: - idx = find_row(rows, name) - if idx is None: - print(f"ERROR: Integration '{name}' not found.") - sys.exit(1) - return idx - - -def _set_json_data_step(args: list[str], step_name: str, setter_cmd: str) -> None: - """Shared CLI handler for set-auth / set-params-* / set-shared-params.""" - if len(args) < 2: - print(f"Usage: workflow_state.py {setter_cmd} ''") - print(f" The value must be valid JSON (see connectus/column-schemas.md).") - sys.exit(1) - - name = args[0] - raw = " ".join(args[1:]) - - # JSON validation - try: - json.loads(raw) - except json.JSONDecodeError as e: - print(f"ERROR: '{step_name}' must be valid JSON.") - print(f" Got: {raw}") - print(f" Parse error: {e}") - print(f" Example: workflow_state.py {setter_cmd} \"{name}\" '{{}}'") - sys.exit(1) - - # set-auth has a richer schema check on top. - if step_name == "Auth Details": - schema_errors = validate_auth_detail(raw) - if schema_errors: - print("ERROR: Auth Details does not match the required schema.") - for err in schema_errors: - print(f" - {err}") - sys.exit(1) - # Defense-in-depth: catch a polluted "Params to Commands" payload - # even if the caller bypassed cmd_set_params_to_commands and - # invoked _set_json_data_step directly. 
- elif step_name == "Params to Commands": - schema_errors = validate_params_to_commands(raw) - if schema_errors: - print("ERROR: Params to Commands does not match the required schema.") - for err in schema_errors: - print(f" - {err}") - sys.exit(1) - - rows = load_csv() - idx = _resolve_row_or_exit(rows, name) - target = STEP_BY_NAME[step_name] - - try: - msg = _set_step_via_dispatch(rows[idx], target, raw, verb=setter_cmd) - except WorkflowError as e: - print(f"ERROR: {e.message}") - sys.exit(1) - - save_csv(rows) - print(msg) - cur = current_step(rows[idx]) - if cur is not None: - print(f" Current step: #{cur.index} {cur.name}") - elif has_workflow_progress(rows[idx]): - print(" 🎉 All 16 steps complete!") - - -def cmd_set_auth(args: list[str]) -> None: - _set_json_data_step(args, "Auth Details", "set-auth") - - -def _check_params_to_commands_overlap( - integration_id: str, payload: dict -) -> None: - """Reject ``set-params-to-commands`` payloads that overlap with auth. - - The workflow tool is the single source of truth for the per-integration - "auth ignore set" — :func:`auth_param_ids` is consulted (which reads - the same ``Auth Details`` cell that ``set-auth`` populated). If ANY - ``(command, param_id)`` in the payload references a param that is - already declared in ``Auth Details`` (either as a projected - ``auth_types[].xsoar_params`` entry or in ``other_connection``), - raise :class:`WorkflowError` with: - - * every offending pair, AND - * for each offending param, the precise auth-detail source it came - from (so the agent can decide whether to strip the param from the - per-command payload OR revert to ``set-auth`` and remove it from - ``Auth Details``). - - The caller is :func:`cmd_set_params_to_commands`; ``Auth Details`` - being unset is an upstream prerequisite enforced by - :func:`auth_param_ids` (raises a clearer "set auth first" error). - """ - # Re-load the Auth Details JSON once so we can attribute the source - # of each offending param (auth_types vs other_connection). - rows = load_csv() - idx = find_row(rows, integration_id) - if idx is None: - # Defensive — caller already resolved the row, but the helper - # can be invoked outside that context too. - raise WorkflowError( - f"Integration '{integration_id}' not found in the CSV." - ) - raw_auth = rows[idx].get("Auth Details", "").strip() - auth_detail: dict = {} - if raw_auth: - try: - parsed = json.loads(raw_auth) - if isinstance(parsed, dict): - auth_detail = parsed - except json.JSONDecodeError: - pass - sources = _auth_param_sources(auth_detail) if auth_detail else {} - - # The helper raises when Auth Details is unset; let that propagate. - auth_ids = set(auth_param_ids(integration_id)) - - commands_block = payload.get("commands") if isinstance(payload, dict) else None - if not isinstance(commands_block, dict): - # Shape mismatch is not THIS check's concern; let the - # downstream consumer (or future schema validator) surface it. - return - - offenders: list[tuple[str, str]] = [] - for cmd, param_list in commands_block.items(): - if not isinstance(param_list, list): - continue - for p in param_list: - if isinstance(p, str) and p in auth_ids: - offenders.append((str(cmd), p)) - - if not offenders: - return - - # Build a deterministic, human-readable error. - lines = [ - f"'Params to Commands' for '{integration_id}' contains " - f"{len(offenders)} param(s) that are already declared in " - f"'Auth Details'. 
The two columns MUST be disjoint.", - "", - "Offending (command, param) pairs:", - ] - for cmd, p in sorted(offenders): - lines.append(f" - ({cmd!r}, {p!r})") - - # One source line per distinct offending param. - lines.append("") - lines.append("Source of each offending param in 'Auth Details':") - seen_params: set[str] = set() - for _cmd, p in sorted(offenders): - if p in seen_params: - continue - seen_params.add(p) - srcs = sources.get(p) - if srcs: - for src in srcs: - lines.append(f" - param {p!r} overlaps with {src}") - else: - # Defensive — overlap was reported but source attribution - # missed it (e.g. legacy row without other_connection). - lines.append( - f" - param {p!r} overlaps with Auth Details " - f"(source not attributable; legacy row?)" - ) - - lines.extend([ - "", - "Fix:", - f" Re-derive the per-command lists with the auth-aware ignore " - f"set — run:", - f" python3 connectus/workflow_state.py auth-params " - f"\"{integration_id}\"", - f" to see exactly what to exclude. The analyzer can pull this " - f"list automatically: pass --integration-id " - f"\"{integration_id}\" to " - f"connectus/check_command_params.py.", - "", - f" If a listed param is *truly* used per-command and was " - f"misclassified into 'Auth Details', revert to Step 1 with " - f"'set-auth' and remove it from 'auth_types[].xsoar_params' " - f"or 'other_connection' first. Do NOT bypass this rejection " - f"by hand-stripping just to make the call go through.", - ]) - - raise WorkflowError("\n".join(lines)) - - -def cmd_set_params_to_commands(args: list[str]) -> None: - # Two pre-flight checks ahead of the cascade-write so a bad payload - # can never partially mutate the row: - # - # (1) STRICT SCHEMA: top-level keys MUST equal exactly - # {"integration", "commands"}; the historical leak was the - # analyzer emitting a top-level "diagnostics" key that the - # agent piped verbatim. Reported FIRST because shape errors - # are the more common mistake and the overlap check is a - # deeper semantic check that only makes sense once the - # payload shape is valid. - # - # (2) OVERLAP: reject payloads whose per-command param lists - # overlap with the integration's auth-derived ignore set. - # Auth Details being unset is already an upstream - # prerequisite (apply_step_action would reject the call - # ahead-of-current); auth_param_ids() re-asserts it with a - # more specific error if we reach the overlap check first. - if len(args) >= 2: - name = args[0] - raw = " ".join(args[1:]) - # (1) Strict schema check — done up-front and on the raw text - # so we report extra/missing top-level keys (esp. the leaked - # "diagnostics" key) before any other check looks at the body. - schema_errors = validate_params_to_commands(raw) - if schema_errors: - print("ERROR: Params to Commands does not match the required schema.") - for err in schema_errors: - print(f" - {err}") - sys.exit(1) - # (2) Overlap check — only meaningful once the payload shape is - # valid (validator above guaranteed parseability + dict shape). 
- payload = json.loads(raw) - if isinstance(payload, dict): - try: - _check_params_to_commands_overlap(name, payload) - except WorkflowError as e: - print(f"ERROR: {e.message}") - sys.exit(1) - _set_json_data_step(args, "Params to Commands", "set-params-to-commands") - - -def cmd_set_params_for_test(args: list[str]) -> None: - _set_json_data_step(args, "Params for test with default in code", "set-params-for-test") - - -def cmd_set_shared_params(args: list[str]) -> None: - _set_json_data_step(args, "Params same in other handlers", "set-shared-params") - - -def cmd_set_assignee(args: list[str]) -> None: - """Set the assignee for an integration. - - SPECIAL CARVE-OUT (override #5 of the design): set-assignee is the ONLY - setter that does NOT trigger ``reset_after``. Re-assigning an integration - is administrative — it must not wipe migration progress. - """ - if len(args) < 2: - print("Usage: workflow_state.py set-assignee ") - sys.exit(1) - - name = args[0] - assignee = " ".join(args[1:]) - - if not assignee.strip(): - print("ERROR: Assignee cannot be empty.") - sys.exit(1) - - rows = load_csv() - idx = _resolve_row_or_exit(rows, name) - - # Direct write — no apply_step_action, no cascade reset. - rows[idx]["assignee"] = assignee - save_csv(rows) - print(f"Set assignee for '{rows[idx]['Integration ID']}' to: {assignee}") - cur = current_step(rows[idx]) - if cur is not None: - print(f" Current step: #{cur.index} {cur.name}") - - -def cmd_set_auth_flag(args: list[str]) -> None: - """Set the 'requires auth parity test' flag (step #12). - - When the new value is NO/N/A, also write 'N/A' into step #13 so the - user is auto-advanced past it (per design §1.4). - """ - if len(args) < 2: - print("Usage: workflow_state.py set-auth-flag ") - sys.exit(1) - - name = args[0] - flag = args[1].upper().strip() - - if flag not in VALID_FLAG_VALUES: - print(f"ERROR: Flag must be YES, NO, or N/A. Got: '{args[1]}'") - sys.exit(1) - - rows = load_csv() - idx = _resolve_row_or_exit(rows, name) - target = STEP_BY_NAME[AUTH_PARITY_FLAG_COLUMN] - - try: - cleared, no_op = apply_step_action(rows[idx], target, flag, verb="set-auth-flag") - except WorkflowError as e: - print(f"ERROR: {e.message}") - sys.exit(1) - - # After the cascade reset, write step #13 if NO/N/A. - auth_parity_step = STEP_BY_NAME["auth parity test passes"] - if flag in ("NO", "N/A"): - rows[idx][auth_parity_step.name] = NA_MARK - - save_csv(rows) - - if no_op: - print(f"'{AUTH_PARITY_FLAG_COLUMN}' already set to '{flag}' " - f"for '{rows[idx]['Integration ID']}'. No change.") - else: - print(f"Set '{AUTH_PARITY_FLAG_COLUMN}' = {flag} " - f"for '{rows[idx]['Integration ID']}'.") - if cleared: - print(f" Cleared {len(cleared)} subsequent step(s): {cleared}") - if flag in ("NO", "N/A"): - print(f" Auto-set 'auth parity test passes' = N/A.") - - cur = current_step(rows[idx]) - if cur is not None: - print(f" Current step: #{cur.index} {cur.name}") - elif has_workflow_progress(rows[idx]): - print(" 🎉 All 16 steps complete!") - - -def cmd_markpass(args: list[str]) -> None: - if len(args) < 2: - print("Usage: workflow_state.py markpass ") - print("\nCheckpoint steps (in order):") - for s in STEPS: - if s.kind == "checkpoint": - print(f" {s.index:2d}. {s.name}") - print("\nNon-checkpoint columns (use a different command):") - for step_name, cmd in NON_CHECKPOINT_STEPS.items(): - print(f" - '{step_name}' → use '{cmd}'") - sys.exit(1) - - name = args[0] - step_name = " ".join(args[1:]) - - # Reject non-checkpoint steps with corrective guidance. 
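# Illustrative sketch only: the flag coupling implemented by cmd_set_auth_flag()
# above. NO or N/A on 'requires auth parity test' auto-fills 'auth parity test
# passes' with "N/A"; YES leaves it empty so it still needs an explicit markpass.
# A plain dict stands in for a CSV row.
def _demo_set_auth_flag(row: dict, flag: str) -> dict:
    row = dict(row)
    row["requires auth parity test"] = flag
    if flag in ("NO", "N/A"):
        row["auth parity test passes"] = "N/A"
    return row

assert _demo_set_auth_flag({}, "NO")["auth parity test passes"] == "N/A"
assert _demo_set_auth_flag({}, "YES").get("auth parity test passes", "") == ""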
- if step_name in NON_CHECKPOINT_STEPS: - correct = NON_CHECKPOINT_STEPS[step_name] - print( - f"ERROR: '{step_name}' is not a pass/fail checkpoint.\n" - f" Use '{correct}' instead.\n" - f" Example: workflow_state.py {correct} \"{name}\" " - ) - sys.exit(1) - - target = STEP_BY_NAME.get(step_name) - if target is None: - print(f"ERROR: Unknown step '{step_name}'.") - print(f"Valid checkpoint steps: {', '.join(CHECKPOINT_COLUMNS)}") - sys.exit(1) - - rows = load_csv() - idx = _resolve_row_or_exit(rows, name) - row = rows[idx] - - # Special prerequisites for #13. - if step_name == "auth parity test passes": - flag = row.get(AUTH_PARITY_FLAG_COLUMN, "").strip().upper() - if flag == "": - print( - f"ERROR: Cannot mark '{step_name}' as passed — " - f"'requires auth parity test' flag is not set.\n" - f" Use 'set-auth-flag' first.\n" - f" Example: workflow_state.py set-auth-flag " - f"\"{row['Integration ID']}\" YES" - ) - sys.exit(1) - if flag in ("NO", "N/A"): - # Already auto-N/A'd; treat as already done. - row[step_name] = NA_MARK - save_csv(rows) - print(f"'{step_name}' set to N/A (auth parity test not required).") - return - - # Already done — re-pass means cascade-reset behind current. - try: - cleared, no_op = apply_step_action(row, target, CHECK, verb="markpass") - except WorkflowError as e: - print(f"ERROR: {e.message}") - sys.exit(1) - - save_csv(rows) - if no_op: - print(f"'{step_name}' already passed. No change.") - else: - print(f"✅ '{step_name}' (step {target.index}/16) marked as passed " - f"for '{row['Integration ID']}'.") - if cleared: - print(f" Cleared {len(cleared)} subsequent step(s): {cleared}") - - cur = current_step(row) - if cur is not None: - print(f" Next step: #{cur.index} {cur.name}") - elif has_workflow_progress(row): - print(" 🎉 All 16 steps complete!") - - -def cmd_skip(args: list[str]) -> None: - """Mark an OPTIONAL step as skipped (writes 'N/A' into the column).""" - if len(args) < 2: - print("Usage: workflow_state.py skip ") - print("Skippable (optional) steps:") - for s in STEPS: - if s.optional: - print(f" {s.index:2d}. {s.name}") - sys.exit(1) - - name = args[0] - step_name = " ".join(args[1:]) - - target = STEP_BY_NAME.get(step_name) - if target is None: - print(f"ERROR: Unknown step '{step_name}'.") - sys.exit(1) - - if not target.optional: - print(f"ERROR: step '{step_name}' is not optional and cannot be skipped.") - sys.exit(1) - - rows = load_csv() - idx = _resolve_row_or_exit(rows, name) - row = rows[idx] - - try: - cleared, _no_op = apply_step_action(row, target, NA_MARK, verb="skip") - except WorkflowError as e: - print(f"ERROR: {e.message}") - sys.exit(1) - - save_csv(rows) - print(f"✓ Skipped step {target.index} ('{target.name}') for '{row['Integration ID']}'.") - if cleared: - print(f" Cleared {len(cleared)} subsequent step(s): {cleared}") - cur = current_step(row) - if cur is not None: - print(f" Next step: #{cur.index} {cur.name}") - - -def _do_reset_to(rows: list[dict[str, str]], idx: int, step_name: str, verb: str) -> None: - """Shared implementation for ``fail`` and ``reset-to``: clear step + after.""" - target = STEP_BY_NAME.get(step_name) - if target is None: - print(f"ERROR: Unknown step '{step_name}'.") - print(f"Valid steps: {', '.join(WORKFLOW_COLUMNS)}") - sys.exit(1) - - row = rows[idx] - integration_id = row.get("Integration ID", "") - - # Clear named step plus everything after (i.e. reset_after(prev)). 
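# Illustrative sketch only: the "clear the named step and everything after it"
# rule that the code below realises via reset_after(previous step). A plain
# list of values stands in for the 16 workflow columns; indices are 0-based
# here, unlike the module's 1-based Step.index.
def _demo_reset_to(values: list, target_idx: int) -> list:
    values = list(values)
    for i in range(target_idx, len(values)):
        values[i] = ""          # the named step becomes the new current step
    return values

assert _demo_reset_to(["✅", "✅", "✅", "✅"], 2) == ["✅", "✅", "", ""]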
- if target.index == 1: - for s in STEPS: - row[s.name] = "" - else: - prev = STEP_BY_INDEX[target.index - 1] - row[target.name] = "" - reset_after(row, prev) - - save_csv(rows) - print(f"{verb}: cleared step {target.index} ('{target.name}') and all " - f"subsequent steps for '{integration_id}'.") - cur = current_step(row) - if cur is not None: - print(f" Current step is now: #{cur.index} {cur.name}") - - -def cmd_fail(args: list[str]) -> None: - if len(args) < 2: - print("Usage: workflow_state.py fail ") - print(f"Valid steps: {', '.join(WORKFLOW_COLUMNS)}") - sys.exit(1) - name = args[0] - step_name = " ".join(args[1:]) - rows = load_csv() - idx = _resolve_row_or_exit(rows, name) - _do_reset_to(rows, idx, step_name, verb="Reset (fail)") - - -def cmd_reset_to(args: list[str]) -> None: - if len(args) < 2: - print("Usage: workflow_state.py reset-to ") - print(f"Valid steps: {', '.join(WORKFLOW_COLUMNS)}") - sys.exit(1) - name = args[0] - step_name = " ".join(args[1:]) - rows = load_csv() - idx = _resolve_row_or_exit(rows, name) - _do_reset_to(rows, idx, step_name, verb="Reset-to") - - -def cmd_reset(args: list[str]) -> None: - """Clear all 16 workflow columns. Identity columns and assignee preserved. - - Per override #10: ``reset`` clears ALL workflow columns (assignee - included), per the existing behavior. - """ - if not args: - print("Usage: workflow_state.py reset ") - sys.exit(1) - - name = args[0] - rows = load_csv() - idx = _resolve_row_or_exit(rows, name) - - for col in WORKFLOW_COLUMNS: - rows[idx][col] = "" - - save_csv(rows) - print(f"Reset all workflow columns for '{rows[idx]['Integration ID']}'.") - - -def cmd_at_step(args: list[str]) -> None: - """List all integrations currently at a specific step.""" - if not args: - print("Usage: workflow_state.py at-step ") - print(f"Valid steps: {', '.join(WORKFLOW_COLUMNS)}") - sys.exit(1) - - step_name = " ".join(args) - if step_name not in STEP_BY_NAME: - print(f"ERROR: Unknown step '{step_name}'.") - print(f"Valid steps: {', '.join(WORKFLOW_COLUMNS)}") - sys.exit(1) - - rows = load_csv() - matches = [ - row["Integration ID"] - for row in rows - if (cur := current_step(row)) is not None and cur.name == step_name - ] - - if matches: - print(f"\nIntegrations currently at step '{step_name}' ({len(matches)}):") - for name in matches: - print(f" - {name}") - else: - print(f"No integrations are currently at step '{step_name}'.") - - -def cmd_list(_args: list[str]) -> None: - rows = load_csv() - for row in rows: - print(row.get("Integration ID", "")) - - -def cmd_list_by_assignee(args: list[str]) -> None: - if not args: - print("Usage: workflow_state.py list-by-assignee ") - sys.exit(1) - assignee_name = " ".join(args) - rows = load_csv() - matches = list_by_assignee(rows, assignee_name) - print(format_by_assignee(matches, assignee_name)) - - -def _format_step_for_listing(row: dict[str, str]) -> str: - """Return the user-facing step display: 'not started' / step name / '✅ DONE'.""" - if not has_workflow_progress(row): - return "not started" - cur = current_step(row) - return cur.name if cur is not None else "✅ DONE" - - -def cmd_list_by_connector(args: list[str]) -> None: - """List every integration whose Connector ID matches (case-insensitive).""" - if not args: - print("Usage: workflow_state.py list-by-connector ") - sys.exit(1) - - connector_id = " ".join(args) - rows = load_csv() - matches = list_by_connector(rows, connector_id) - - if not matches: - print(f"No integrations found for connector '{connector_id}'.") - print(" Tip: run 
'workflow_state.py list-connectors' to see all known Connector IDs.") - return - - print(f"\nIntegrations in connector '{connector_id}' ({len(matches)}):") - for row in matches: - integration_id = row.get("Integration ID", "") - assignee = row.get("assignee", "").strip() or "unassigned" - step_display = _format_step_for_listing(row) - print(f" - {integration_id} [assignee: {assignee}] → {step_display}") - - -def cmd_list_connectors(_args: list[str]) -> None: - """Print every distinct non-empty Connector ID with counts.""" - rows = load_csv() - - # Group by connector id (preserving the first-seen original casing for display). - buckets: dict[str, dict] = {} - for row in rows: - cid_raw = row.get("Connector ID", "").strip() - if not cid_raw: - continue - key = cid_raw.lower() - bucket = buckets.setdefault( - key, - {"display": cid_raw, "rows": []}, - ) - bucket["rows"].append(row) - - if not buckets: - print("No connectors found in the CSV.") - return - - # Sort by display name (case-insensitive). - sorted_keys = sorted(buckets.keys(), key=lambda k: buckets[k]["display"].lower()) - - # Compute column width for the connector id. - max_id_len = max(len(buckets[k]["display"]) for k in sorted_keys) - id_col_width = max(max_id_len, len("Connector ID")) - - header = ( - f"{'Connector ID':<{id_col_width}} {'Integrations':>12} " - f"{'In Progress':>11} {'Complete':>8}" - ) - rule = ( - f"{'-' * id_col_width} {'-' * 12} {'-' * 11} {'-' * 8}" - ) - print(header) - print(rule) - for key in sorted_keys: - bucket = buckets[key] - bucket_rows: list[dict[str, str]] = bucket["rows"] - total = len(bucket_rows) - in_progress = 0 - complete = 0 - for r in bucket_rows: - if not has_workflow_progress(r): - continue - if current_step(r) is None: - complete += 1 - else: - in_progress += 1 - print( - f"{bucket['display']:<{id_col_width}} {total:>12} " - f"{in_progress:>11} {complete:>8}" - ) - - -def cmd_set_assignee_by_connector(args: list[str]) -> None: - """Assign an owner to every integration in a given connector. - - SPECIAL CARVE-OUT (override #5): like ``cmd_set_assignee``, this writes - the assignee column directly with NO cascade reset. Re-assigning is an - administrative action; existing migration progress is preserved. - """ - if len(args) < 2: - print( - "Usage: workflow_state.py set-assignee-by-connector " - " " - ) - sys.exit(1) - - connector_id = args[0] - assignee = " ".join(args[1:]) - - if not assignee.strip(): - print("ERROR: Assignee cannot be empty.") - sys.exit(1) - - rows = load_csv() - matches = list_by_connector(rows, connector_id) - - if not matches: - print(f"ERROR: No integrations found for connector '{connector_id}'.") - print( - " Tip: run 'workflow_state.py list-connectors' to see all known " - "Connector IDs." - ) - sys.exit(1) - - # Direct write per row — no apply_step_action, no cascade reset. 
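# Illustrative sketch only: the grouping pattern used by cmd_list_connectors()
# above. Rows are bucketed by lower-cased Connector ID, the first-seen casing
# is kept for display, and counts come from the bucket sizes. The connector
# and integration ids below are hypothetical.
_demo_rows = [
    {"Integration ID": "Example A", "Connector ID": "ExampleConnector"},
    {"Integration ID": "Example B", "Connector ID": "exampleconnector"},
    {"Integration ID": "Example C", "Connector ID": "OtherConnector"},
]
_demo_buckets: dict = {}
for _demo_row in _demo_rows:
    _demo_key = _demo_row["Connector ID"].strip().lower()
    _demo_buckets.setdefault(
        _demo_key, {"display": _demo_row["Connector ID"], "rows": []}
    )["rows"].append(_demo_row)

assert len(_demo_buckets["exampleconnector"]["rows"]) == 2
assert _demo_buckets["exampleconnector"]["display"] == "ExampleConnector"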
- for row in matches: - row["assignee"] = assignee - - save_csv(rows) - print( - f"Assigned {len(matches)} integration(s) in connector " - f"'{connector_id}' to '{assignee}':" - ) - for row in matches: - print(f" - {row.get('Integration ID', '')}") - - -def cmd_show_step(args: list[str]) -> None: - """Show the value of a specific column for an integration.""" - if len(args) < 2: - print("Usage: workflow_state.py show-step ") - print("\nValid columns:") - for col in DATA_COLUMNS: - print(f" - {col} (data)") - for col in WORKFLOW_COLUMNS: - print(f" - {col}") - sys.exit(1) - - name = args[0] - step = " ".join(args[1:]) - - rows = load_csv() - idx = find_row(rows, name) - if idx is None: - print(f"ERROR: Integration '{name}' not found.") - sys.exit(1) - - valid_steps = set(WORKFLOW_COLUMNS) | set(DATA_COLUMNS) - if step not in valid_steps: - print(f"ERROR: Unknown column '{step}' for integration '{rows[idx]['Integration ID']}'.") - print(f"Valid columns: {', '.join(sorted(valid_steps))}") - sys.exit(1) - - print(format_step_value(rows[idx], step)) - - -# --------------------------------------------------------------------------- -# `files` command — resolve all source files for an integration -# --------------------------------------------------------------------------- - -# Filename extensions that should NOT be included in the `extras` map -# (binary blobs, images, archives — not useful as text source files). -_EXTRAS_BINARY_EXTENSIONS = { - ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico", ".zip", -} - - -def cmd_files(args: list[str]) -> None: - """Print all known source-file paths for an integration. - - Usage: workflow_state.py files [--format=text|json|paths] - """ - fmt = "text" - positional: list[str] = [] - for a in args: - if a.startswith("--format="): - fmt = a[len("--format="):] - else: - positional.append(a) - - if not positional: - print("Usage: workflow_state.py files [--format=text|json|paths]") - sys.exit(1) - - if fmt not in {"text", "json", "paths"}: - print(f"ERROR: Unknown --format value '{fmt}'. 
Valid: text, json, paths.", file=sys.stderr) - sys.exit(1) - - integration_id = " ".join(positional) - info = get_integration_files(integration_id) - - if "error" in info: - print(f"ERROR: {info['error']}", file=sys.stderr) - sys.exit(1) - - if fmt == "json": - print(json.dumps(info, indent=2)) - return - - if fmt == "paths": - for key in ("yml", "code", "description", "readme", "test"): - val = info.get(key) - if val: - print(val) - return - - # Default: text - name = info["integration_id"] - lines = [ - f"\n{'=' * 60}", - f" {name} — source files", - f"{'=' * 60}", - f" Directory: {info['directory']}", - f" Base: {info['base']}", - f" Language: {info['code_language'] if info['code_language'] else '(unknown)'}", - "", - f" YML: {info['yml'] if info['yml'] else '(missing)'}", - f" Code: {info['code'] if info['code'] else '(missing)'}", - f" Description: {info['description'] if info['description'] else '(missing)'}", - f" README: {info['readme'] if info['readme'] else '(missing)'}", - f" Test: {info['test'] if info['test'] else '(missing)'}", - ] - extras = info.get("extras") or {} - if extras: - lines.append("") - lines.append(" Other files in directory:") - for fname in sorted(extras.keys()): - lines.append(f" - {fname}") - print("\n".join(lines)) - - -# --------------------------------------------------------------------------- -# `auth-params` command — print the auth-derived YML param ignore set -# --------------------------------------------------------------------------- - -def cmd_auth_params(args: list[str]) -> None: - """Print the union of YML param ids declared in the integration's - ``Auth Details``. - - Usage: workflow_state.py auth-params [--format=text|json] - - Default format is ``text`` (one param id per line — easy to pipe - into ``grep -vFf`` / ``xargs``). ``--format=json`` prints - ``{"integration_id": "...", "params": [...]}`` for programmatic - consumption (mirrors the ``files`` subcommand's format flag). - """ - fmt = "text" - positional: list[str] = [] - for a in args: - if a.startswith("--format="): - fmt = a[len("--format="):] - else: - positional.append(a) - - if not positional: - print( - "Usage: workflow_state.py auth-params " - "[--format=text|json]" - ) - sys.exit(1) - - if fmt not in {"text", "json"}: - print( - f"ERROR: Unknown --format value '{fmt}'. Valid: text, json.", - file=sys.stderr, - ) - sys.exit(1) - - integration_id = " ".join(positional) - try: - params = auth_param_ids(integration_id) - except WorkflowError as e: - print(f"ERROR: {e.message}", file=sys.stderr) - sys.exit(1) - - if fmt == "json": - print(json.dumps( - {"integration_id": integration_id, "params": params}, - indent=2, - )) - return - - # Default: one param id per line. Empty list → nothing printed - # (consistent with `grep -vFf`-friendly output). 
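# Illustrative sketch only: applying the auth-params text output as an ignore
# set. One id per line is convenient for `grep -vFf`; the in-Python equivalent
# is a plain set difference. All ids below are hypothetical.
_demo_auth_params_output = "apikey\ncredentials\nproxy\nurl\n"
_demo_ignore_set = set(_demo_auth_params_output.split())
_demo_candidates = ["limit", "page_size", "proxy", "url"]
_demo_kept = [p for p in _demo_candidates if p not in _demo_ignore_set]
assert _demo_kept == ["limit", "page_size"]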
- for p in params: - print(p) - - -# --------------------------------------------------------------------------- -# `next` command -# --------------------------------------------------------------------------- - -def _example_value_for(step: Step) -> str: - """Return a canonical example value for the example CLI line.""" - if step.kind == "data" and step.name in JSON_VALUED_COLUMNS: - if step.name == "Auth Details": - return ("'{\"auth_types\":[],\"config\":\"NoneRequired\"," - "\"other_connection\":[]}'") - if step.name == "Params for test with default in code": - return "'[]'" - if step.name == "Params same in other handlers": - return "'[]'" - return "'{}'" - if step.name == "assignee": - return '""' - if step.kind == "flag": - return "YES" - return "" - - -def format_next_line(row: dict[str, str]) -> str: - """Format the literal next action for a row.""" - integration_id = row.get("Integration ID", "") - cur = current_step(row) - if cur is None: - return f"{integration_id} — all 16 steps complete. 🎉" - - lines = [f"{integration_id} — step {cur.index} of 16: {cur.name}"] - if cur.setter: - example = _example_value_for(cur) - cmd = (f"python3 connectus/workflow_state.py {cur.setter} " - f"\"{integration_id}\" {example}".rstrip()) - lines.append(f" Run: {cmd}") - if cur.optional: - lines.append( - f" Or: python3 connectus/workflow_state.py skip " - f"\"{integration_id}\" \"{cur.name}\"" - ) - else: - lines.append( - f" Run: python3 connectus/workflow_state.py markpass " - f"\"{integration_id}\" \"{cur.name}\"" - ) - lines.append(f" About: {cur.description}") - return "\n".join(lines) - - -def _parse_next_flags(args: list[str]) -> tuple[Optional[str], bool, list[str]]: - """Parse `--connector ` and `--mine` out of args (order-independent). - - Returns ``(connector_id, mine_flag, leftover_args)``. Leftover args are - the positional arguments not consumed by recognized flags; the caller - decides what to do with them (e.g. treat as an integration ID, or as - ``--all``). - """ - connector_id: Optional[str] = None - mine = False - leftover: list[str] = [] - i = 0 - while i < len(args): - a = args[i] - if a == "--mine": - mine = True - i += 1 - continue - if a == "--connector": - if i + 1 >= len(args): - print("ERROR: --connector requires a connector id argument.") - sys.exit(1) - connector_id = args[i + 1] - i += 2 - continue - # Allow `--connector=` form too, just in case. - if a.startswith("--connector="): - connector_id = a[len("--connector="):] - i += 1 - continue - leftover.append(a) - i += 1 - return connector_id, mine, leftover - - -def cmd_next(args: list[str]) -> None: - """Print the literal next action. - - Forms: - next → that integration only - next → in-progress integrations assigned to current git user - next --mine → explicit alias for the no-arg form - next --all → in-progress integrations for everyone - next --connector → in-progress integrations in that connector - next --connector --mine → intersection of the above - """ - rows = load_csv() - - if not rows: - print("(no rows in CSV — nothing to do)") - return - - connector_id, mine, leftover = _parse_next_flags(args) - - # Form 1: explicit integration ID — only when no flags consumed. 
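To make the `next` forms above concrete, here is an illustrative check of how the flag parser splits arguments. The expected tuples follow the parsing rules shown in `_parse_next_flags` (they are not captured output), and `"slack"` is a hypothetical connector id:

```python
from workflow_state import _parse_next_flags

# Flags are order-independent and may be mixed with positionals.
connector_id, mine, leftover = _parse_next_flags(["--connector", "slack", "--mine"])
assert (connector_id, mine, leftover) == ("slack", True, [])

# A bare integration ID is not consumed; the caller joins the leftovers.
connector_id, mine, leftover = _parse_next_flags(["Cisco", "Spark"])
assert (connector_id, mine, leftover) == (None, False, ["Cisco", "Spark"])
```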
- if leftover and leftover[0] != "--all" and connector_id is None and not mine: - name = " ".join(leftover) - idx = find_row(rows, name) - if idx is None: - print(f"ERROR: Integration '{name}' not found.") - sys.exit(1) - print(format_next_line(rows[idx])) - return - - show_all = bool(leftover and leftover[0] == "--all") - if show_all and (mine or connector_id is not None): - # --all combined with selectors makes no semantic sense; let --mine / - # --connector win and ignore --all. - show_all = False - - # Determine the assignee filter. - target_assignee: Optional[str] = None - use_assignee_filter = (not show_all) and (mine or connector_id is None) - if use_assignee_filter: - target_assignee = _git_user_name() - if not target_assignee: - # If the user explicitly asked for --connector without --mine, we - # don't need a git user. But the no-arg form does. - if connector_id is None: - print( - "ERROR: cannot determine current user via 'git config user.name'.\n" - " Pass an integration ID, or use 'next --all' to list everyone's work." - ) - sys.exit(1) - # User passed --connector without --mine but we also have no git - # user; that's fine because we're not filtering by assignee in - # that branch (use_assignee_filter would be False). Defensive. - target_assignee = None - use_assignee_filter = False - - # If --connector was given, narrow the candidate rows first. - candidate_rows = rows - if connector_id is not None: - candidate_rows = list_by_connector(rows, connector_id) - if not candidate_rows: - print(f"No integrations found for connector '{connector_id}'.") - print( - " Tip: run 'workflow_state.py list-connectors' to see all known " - "Connector IDs." - ) - return - - matched_any = False - any_in_progress_in_connector = False - for row in candidate_rows: - if not has_workflow_progress(row): - continue - if current_step(row) is None: - continue - any_in_progress_in_connector = True - if use_assignee_filter: - if row.get("assignee", "").strip().lower() != (target_assignee or "").lower(): - continue - print(format_next_line(row)) - print() - matched_any = True - - if matched_any: - return - - # No matches — produce a targeted message. - if connector_id is not None and not any_in_progress_in_connector: - print( - f"No in-progress integrations in connector '{connector_id}' " - f"(all are either unstarted or done)." - ) - return - if connector_id is not None and use_assignee_filter: - print( - f"No in-progress integrations in connector '{connector_id}' " - f"assigned to '{target_assignee}'." 
- ) - return - if connector_id is not None: - print(f"No in-progress integrations in connector '{connector_id}'.") - return - if show_all: - print("No in-progress integrations.") - return - print(f"No in-progress integrations assigned to '{target_assignee}'.") - - -# --------------------------------------------------------------------------- -# Help -# --------------------------------------------------------------------------- - -def cmd_help(_args: list[str]) -> None: - print(__doc__) - - -# --------------------------------------------------------------------------- -# Programmatic API (for use by AI agents / other scripts) -# --------------------------------------------------------------------------- - -def get_integration_status(integration_id: str) -> dict: - """Return a dict summary of an integration's workflow state.""" - rows = load_csv() - idx = find_row(rows, integration_id) - if idx is None: - return {"error": f"Integration '{integration_id}' not found."} - - row = rows[idx] - cur = current_step(row) - completed = sum(1 for s in STEPS if is_done(row, s)) - return { - "name": row.get("Integration ID", ""), - "current_step": cur.name if cur else None, - "current_step_index": cur.index if cur else None, - "workflow": {col: row.get(col, "") for col in WORKFLOW_COLUMNS}, - "completed_steps": completed, - "total_steps": len(STEPS), - "progress_pct": round(completed / len(STEPS) * 100, 1), - "all_complete": cur is None and completed > 0, - } - - -def get_integration_files(integration_id: str) -> dict: - """Return all known source-file paths for an integration. - - The integration's ``Integration File Path`` column holds the YML file - path (relative to the repo root). All other integration source files - live in the same directory and follow demisto-sdk conventions:: - - /.yml ← YML (manifest) - /.py ← Python source (or .js / .ps1) - /_description.md ← short UI blurb - /README.md ← long-form docs (filename is fixed) - /_test.py ← unit tests (Python only) - - Returns a dict with the following keys (str values are repo-relative - paths; ``None`` means "not present on disk"):: - - { - "integration_id": "", - "directory": "", - "base": "", - "yml": "" | None, - "code": "" | None, - "code_language": "python" | "javascript" | "powershell" | None, - "description": "" | None, - "readme": "" | None, - "test": "" | None, - "extras": {"": "", ...}, - } - - Errors return ``{"error": "..."}`` (matching the convention used by - ``get_integration_status`` and ``next_step_for``). - """ - rows = load_csv() - idx = find_row(rows, integration_id) - if idx is None: - return {"error": f"Integration '{integration_id}' not found."} - - row = rows[idx] - yml_rel = row.get("Integration File Path", "").strip() - if not yml_rel: - return { - "error": ( - f"Integration '{integration_id}' has no Integration File Path " - f"set in the CSV." - ) - } - - # Normalize separators for the relative path components, but resolve - # against BASE_DIR for existence checks. - directory_rel = os.path.dirname(yml_rel) - basename = os.path.basename(yml_rel) - # Strip a `.yml` extension specifically — leave anything else intact. - if basename.lower().endswith(".yml"): - base = basename[:-4] - else: - base = os.path.splitext(basename)[0] - - abs_dir = os.path.join(BASE_DIR, directory_rel) - if not os.path.isdir(abs_dir): - return { - "error": ( - f"Integration directory '{directory_rel}' (from CSV) does " - f"not exist on disk." 
- ) - } - - def _rel_if_exists(filename: str) -> Optional[str]: - abs_path = os.path.join(abs_dir, filename) - if os.path.isfile(abs_path): - return os.path.join(directory_rel, filename) if directory_rel else filename - return None - - yml_path = _rel_if_exists(basename) - - code_path: Optional[str] = None - code_language: Optional[str] = None - for ext, lang in (("py", "python"), ("js", "javascript"), ("ps1", "powershell")): - candidate = _rel_if_exists(f"{base}.{ext}") - if candidate is not None: - code_path = candidate - code_language = lang - break - - description_path = _rel_if_exists(f"{base}_description.md") - readme_path = _rel_if_exists("README.md") - - test_path: Optional[str] = None - if code_language == "python": - test_path = _rel_if_exists(f"{base}_test.py") - - canonical_filenames = { - basename, - f"{base}.py", - f"{base}.js", - f"{base}.ps1", - f"{base}_description.md", - "README.md", - f"{base}_test.py", - } - - extras: dict[str, str] = {} - try: - entries = os.listdir(abs_dir) - except OSError: - entries = [] - for fname in entries: - if fname in canonical_filenames: - continue - abs_entry = os.path.join(abs_dir, fname) - # Only list regular files (skip subdirectories like test_data/). - if not os.path.isfile(abs_entry): - continue - ext = os.path.splitext(fname)[1].lower() - if ext in _EXTRAS_BINARY_EXTENSIONS: - continue - extras[fname] = ( - os.path.join(directory_rel, fname) if directory_rel else fname - ) - - return { - "integration_id": row.get("Integration ID", ""), - "directory": directory_rel, - "base": base, - "yml": yml_path, - "code": code_path, - "code_language": code_language, - "description": description_path, - "readme": readme_path, - "test": test_path, - "extras": extras, - } - - -def next_step_for(integration_id: str) -> dict: - """Return the next-action info for an integration.""" - rows = load_csv() - idx = find_row(rows, integration_id) - if idx is None: - return {"error": f"Integration '{integration_id}' not found."} - row = rows[idx] - cur = current_step(row) - if cur is None: - return {"complete": True, "message": format_next_line(row)} - return { - "complete": False, - "step_index": cur.index, - "step_name": cur.name, - "setter": cur.setter, - "description": cur.description, - "message": format_next_line(row), - } - - -def _row_summary_dict(row: dict[str, str]) -> dict: - """JSON-serializable snapshot of an integration row's workflow state.""" - cur = current_step(row) - completed = sum(1 for s in STEPS if is_done(row, s)) - return { - "integration_id": row.get("Integration ID", ""), - "connector_id": row.get("Connector ID", "").strip(), - "assignee": row.get("assignee", "").strip(), - "current_step": cur.name if cur else None, - "current_step_index": cur.index if cur else None, - "completed_steps": completed, - "all_complete": cur is None and has_workflow_progress(row), - "has_progress": has_workflow_progress(row), - } - - -def list_integrations_by_connector(connector_id: str) -> list[dict]: - """Return one summary dict per integration matching ``connector_id``. - - Match is case-insensitive on the trimmed Connector ID. - """ - rows = load_csv() - matches = list_by_connector(rows, connector_id) - return [_row_summary_dict(row) for row in matches] - - -def integrations_for_assignee(assignee_name: str) -> list[dict]: - """Return one summary dict per integration assigned to ``assignee_name``. - - Match is case-insensitive on the trimmed assignee column. 
- """ - rows = load_csv() - matches = list_by_assignee(rows, assignee_name) - return [_row_summary_dict(row) for row in matches] - - -def assign_connector(connector_id: str, assignee_name: str) -> dict: - """Assign every integration in ``connector_id`` to ``assignee_name``. - - Mirrors ``cmd_set_assignee_by_connector``: NO cascade reset. Returns - ``{"connector_id", "assignee", "assigned": [], "count": N}`` on - success, or ``{"error": "..."}`` if no rows match or the assignee is - empty. - """ - if not assignee_name or not assignee_name.strip(): - return {"error": "Assignee cannot be empty."} - - rows = load_csv() - matches = list_by_connector(rows, connector_id) - if not matches: - return { - "error": ( - f"No integrations found for connector '{connector_id}'. " - "Use list-connectors to see all known Connector IDs." - ) - } - - assigned_ids: list[str] = [] - for row in matches: - row["assignee"] = assignee_name - assigned_ids.append(row.get("Integration ID", "")) - - save_csv(rows) - return { - "connector_id": connector_id, - "assignee": assignee_name, - "assigned": assigned_ids, - "count": len(assigned_ids), - } - - -def markpass_integration_step(integration_id: str, step_name: str) -> dict: - """Mark a checkpoint as passed via the unified dispatch.""" - rows = load_csv() - idx = find_row(rows, integration_id) - if idx is None: - return {"error": f"Integration '{integration_id}' not found."} - - row = rows[idx] - target = STEP_BY_NAME.get(step_name) - if target is None: - return {"error": f"Unknown step '{step_name}'."} - if step_name in NON_CHECKPOINT_STEPS: - return {"error": f"'{step_name}' is not a checkpoint; use {NON_CHECKPOINT_STEPS[step_name]}."} - - if step_name == "auth parity test passes": - flag = row.get(AUTH_PARITY_FLAG_COLUMN, "").strip().upper() - if flag in ("NO", "N/A"): - row[step_name] = NA_MARK - save_csv(rows) - cur = current_step(row) - return { - "message": f"'{step_name}' set to N/A.", - "completed_step": step_name, - "current_step": cur.name if cur else None, - } - if flag == "": - return {"error": f"'{step_name}' requires the flag to be set first."} - - try: - cleared, no_op = apply_step_action(row, target, CHECK, verb="markpass") - except WorkflowError as e: - return {"error": e.message} - - save_csv(rows) - cur = current_step(row) - return { - "message": (f"'{step_name}' marked passed." 
- + (f" Cleared: {cleared}" if cleared else "") - + (" (no-op)" if no_op else "")), - "completed_step": step_name, - "current_step": cur.name if cur else None, - } - - -def fail_integration_step(integration_id: str, step_name: str) -> dict: - rows = load_csv() - idx = find_row(rows, integration_id) - if idx is None: - return {"error": f"Integration '{integration_id}' not found."} - target = STEP_BY_NAME.get(step_name) - if target is None: - return {"error": f"Unknown step '{step_name}'."} - row = rows[idx] - if target.index == 1: - for s in STEPS: - row[s.name] = "" - else: - prev = STEP_BY_INDEX[target.index - 1] - row[target.name] = "" - reset_after(row, prev) - save_csv(rows) - cur = current_step(row) - return { - "message": f"Reset '{step_name}' and subsequent steps.", - "current_step": cur.name if cur else None, - } - - -def reset_integration_to_step(integration_id: str, step_name: str) -> dict: - return fail_integration_step(integration_id, step_name) - - -def skip_integration_step(integration_id: str, step_name: str) -> dict: - """Skip an optional step (writes 'N/A').""" - target = STEP_BY_NAME.get(step_name) - if target is None: - return {"error": f"Unknown step '{step_name}'."} - if not target.optional: - return {"error": f"Step '{step_name}' is not optional and cannot be skipped."} - rows = load_csv() - idx = find_row(rows, integration_id) - if idx is None: - return {"error": f"Integration '{integration_id}' not found."} - row = rows[idx] - try: - cleared, _ = apply_step_action(row, target, NA_MARK, verb="skip") - except WorkflowError as e: - return {"error": e.message} - save_csv(rows) - cur = current_step(row) - return { - "message": f"Skipped '{step_name}'." + (f" Cleared: {cleared}" if cleared else ""), - "current_step": cur.name if cur else None, - } - - -def set_integration_auth(integration_id: str, auth_detail_json: str) -> dict: - """Set Auth Details and cascade-reset every later step.""" - schema_errors = validate_auth_detail(auth_detail_json) - if schema_errors: - return {"error": "Auth Details schema validation failed:\n" - + "\n".join(f" - {e}" for e in schema_errors)} - - rows = load_csv() - idx = find_row(rows, integration_id) - if idx is None: - return {"error": f"Integration '{integration_id}' not found."} - - row = rows[idx] - target = STEP_BY_NAME["Auth Details"] - try: - cleared, _ = apply_step_action(row, target, auth_detail_json, verb="set-auth") - except WorkflowError as e: - return {"error": e.message} - save_csv(rows) - cur = current_step(row) - return { - "message": f"Set 'Auth Details' for '{row.get('Integration ID', '')}'." 
- + (f" Cleared: {cleared}" if cleared else ""), - "current_step": cur.name if cur else None, - } - - -# --------------------------------------------------------------------------- -# Main dispatch -# --------------------------------------------------------------------------- - -COMMANDS: dict[str, Callable[[list[str]], None]] = { - "status": cmd_status, - "status-all": cmd_status_all, - "dashboard": cmd_dashboard, - "next": cmd_next, - "set-assignee": cmd_set_assignee, - "set-auth": cmd_set_auth, - "set-params-to-commands": cmd_set_params_to_commands, - "set-params-for-test": cmd_set_params_for_test, - "set-shared-params": cmd_set_shared_params, - "set-auth-flag": cmd_set_auth_flag, - "markpass": cmd_markpass, - "skip": cmd_skip, - "fail": cmd_fail, - "reset-to": cmd_reset_to, - "reset": cmd_reset, - "at-step": cmd_at_step, - "list": cmd_list, - "list-by-assignee": cmd_list_by_assignee, - "list-by-connector": cmd_list_by_connector, - "list-connectors": cmd_list_connectors, - "set-assignee-by-connector": cmd_set_assignee_by_connector, - "show-step": cmd_show_step, - "files": cmd_files, - "auth-params": cmd_auth_params, - "help": cmd_help, -} - - -def main() -> None: - if len(sys.argv) < 2: - cmd_help([]) - sys.exit(1) - - command = sys.argv[1] - args = sys.argv[2:] - - if command not in COMMANDS: - print(f"ERROR: Unknown command '{command}'.") - print(f"Available commands: {', '.join(COMMANDS.keys())}") - sys.exit(1) - - COMMANDS[command](args) +from workflow_state.cli import main if __name__ == "__main__": diff --git a/connectus/workflow_state/__init__.py b/connectus/workflow_state/__init__.py new file mode 100644 index 00000000000..318f41bba96 --- /dev/null +++ b/connectus/workflow_state/__init__.py @@ -0,0 +1,309 @@ +"""workflow_state — Config-driven workflow state machine for the +connectus migration pipeline. + +The shape of the workflow (steps, columns, markers, cross-step +interactions) is declared in +``connectus/workflow_state_config.yml`` and loaded at import time by +:func:`workflow_state.config_loader.get_config`. The runtime engine +(cascade reset, normalization, CSV I/O, CLI dispatch) lives in this +package. + +External callers should keep using ``from workflow_state import …`` +(via the thin shim at ``connectus/workflow_state.py``); every public +name is re-exported here for back-compat. 
+""" +from __future__ import annotations + +# ---- Exceptions ---------------------------------------------------------- + +from workflow_state.exceptions import ( + ConfigLoadError, + WorkflowError, +) + +# ---- Types / dataclasses ------------------------------------------------ + +from workflow_state.types import ( + IdentityColumn, + MarkerSet, + Step, + StepInteraction, + WorkflowConfig, +) + +# ---- Config loader ------------------------------------------------------- + +from workflow_state.config_loader import ( + _reset_config_for_testing, + default_config_path, + get_config, + load_config, +) + +# ---- State engine -------------------------------------------------------- + +from workflow_state.state_machine import ( + _can_advance_to, + _normalize_rows_with_warning, + apply_step_action, + current_step, + get_current_step, + get_step, + get_step_index, + has_workflow_progress, + is_checked, + is_done, + markpass_step, + normalize_row, + reset_after, + reset_from_step, +) + +# ---- CSV I/O ------------------------------------------------------------- + +from workflow_state.csv_io import ( + BASE_DIR, + CSV_PATH, + find_row, + load_csv, + os, # re-exported for tests that monkey-patch ``workflow_state.os.replace`` + save_csv, + wipe_workflow_data, +) + +# ---- Validators ---------------------------------------------------------- + +from workflow_state.validators import ( + auth_param_sources as _auth_param_sources, + get_named_validator, + is_known_cross_check, + known_cross_check_names, + known_validator_names, + validate_any_json, + validate_auth_detail, + validate_params_to_commands, +) + +# ---- Display ------------------------------------------------------------- + +from workflow_state.display import ( + _auth_other_connection_summary, + _example_value_for, + _format_step_for_listing, + _summary_value, + format_by_assignee, + format_dashboard_row, + format_next_line, + format_status, + format_step_for_listing, + format_step_value, +) + +# ---- Programmatic API ---------------------------------------------------- + +from workflow_state.api import ( + _check_params_to_commands_overlap, + _project_xsoar_param_to_yml_id, + assign_connector, + auth_param_ids, + fail_integration_step, + get_integration_files, + get_integration_status, + integrations_for_assignee, + list_by_assignee, + list_by_connector, + list_integrations_by_connector, + markpass_integration_step, + next_step_for, + reset_integration_to_step, + set_integration_auth, + skip_integration_step, +) + +# ---- CLI ----------------------------------------------------------------- + +from workflow_state.cli import ( + COMMANDS, + _git_user_name, + _parse_next_flags, + _resolve_row_or_exit, + _set_json_data_step, + _set_step_via_dispatch, + cmd_at_step, + cmd_auth_params, + cmd_dashboard, + cmd_fail, + cmd_files, + cmd_help, + cmd_list, + cmd_list_by_assignee, + cmd_list_by_connector, + cmd_list_connectors, + cmd_markpass, + cmd_next, + cmd_reset, + cmd_reset_to, + cmd_set_assignee, + cmd_set_assignee_by_connector, + cmd_set_auth, + cmd_set_auth_flag, + cmd_set_params_for_test, + cmd_set_params_to_commands, + cmd_set_shared_params, + cmd_show_step, + cmd_skip, + cmd_status, + cmd_status_all, + cmd_wipe_workflow_data, + main, +) + + +# ---- Derived legacy module-level constants ------------------------------ +# These are computed once at import time from the loaded config so that +# `from workflow_state import STEPS` and friends keep working unchanged. +# (Tests at workflow_state_test.py:22 import all of these by name.) 
+ +from auth_config_parser import AuthType # re-exported for back-compat + + +def _compute_legacy_constants() -> None: + """Populate module-level legacy constants from the loaded config. + + Triggers the YAML load. If the YAML is malformed, ``ConfigLoadError`` + is raised here, fast — no `cmd_*` will run. + """ + cfg = get_config() + g = globals() + g["CHECK"] = cfg.markers.check + g["FAIL_MARK"] = cfg.markers.fail + g["NA_MARK"] = cfg.markers.na + g["VALID_FLAG_VALUES"] = set(cfg.markers.flag_values) + g["VALID_AUTH_TYPES"] = {t.value for t in AuthType} + g["DATA_COLUMNS"] = list(cfg.identity_column_names) + g["STEPS"] = list(cfg.steps) + g["STEP_BY_NAME"] = dict(cfg.step_by_name) + g["STEP_BY_INDEX"] = dict(cfg.step_by_index) + g["WORKFLOW_COLUMNS"] = list(cfg.workflow_columns) + g["WORKFLOW_DATA_COLUMNS"] = list(cfg.workflow_data_columns) + g["CHECKPOINT_COLUMNS"] = list(cfg.checkpoint_columns) + g["JSON_VALUED_COLUMNS"] = set(cfg.json_valued_columns) + g["AUTH_PARITY_FLAG_COLUMN"] = cfg.auth_parity_flag_column or "" + g["ALL_COLUMNS"] = list(cfg.all_columns) + g["EXPECTED_COLUMN_COUNT"] = cfg.expected_column_count + g["NON_CHECKPOINT_STEPS"] = dict(cfg.non_checkpoint_steps) + + +_compute_legacy_constants() + + +__all__ = [ + # Exceptions + "ConfigLoadError", + "WorkflowError", + # Types + "IdentityColumn", + "MarkerSet", + "Step", + "StepInteraction", + "WorkflowConfig", + "AuthType", + # Config loader + "default_config_path", + "get_config", + "load_config", + # State engine + "apply_step_action", + "current_step", + "get_current_step", + "get_step", + "get_step_index", + "has_workflow_progress", + "is_checked", + "is_done", + "markpass_step", + "normalize_row", + "reset_after", + "reset_from_step", + # CSV I/O + "BASE_DIR", + "CSV_PATH", + "find_row", + "load_csv", + "save_csv", + "wipe_workflow_data", + # Validators + "get_named_validator", + "validate_any_json", + "validate_auth_detail", + "validate_params_to_commands", + # Display + "format_by_assignee", + "format_dashboard_row", + "format_next_line", + "format_status", + "format_step_value", + "format_step_for_listing", + # API + "assign_connector", + "auth_param_ids", + "fail_integration_step", + "get_integration_files", + "get_integration_status", + "integrations_for_assignee", + "list_by_assignee", + "list_by_connector", + "list_integrations_by_connector", + "markpass_integration_step", + "next_step_for", + "reset_integration_to_step", + "set_integration_auth", + "skip_integration_step", + # CLI + "COMMANDS", + "main", + "cmd_at_step", + "cmd_auth_params", + "cmd_dashboard", + "cmd_fail", + "cmd_files", + "cmd_help", + "cmd_list", + "cmd_list_by_assignee", + "cmd_list_by_connector", + "cmd_list_connectors", + "cmd_markpass", + "cmd_next", + "cmd_reset", + "cmd_reset_to", + "cmd_set_assignee", + "cmd_set_assignee_by_connector", + "cmd_set_auth", + "cmd_set_auth_flag", + "cmd_set_params_for_test", + "cmd_set_params_to_commands", + "cmd_set_shared_params", + "cmd_show_step", + "cmd_skip", + "cmd_status", + "cmd_status_all", + "cmd_wipe_workflow_data", + # Derived legacy constants + "CHECK", + "FAIL_MARK", + "NA_MARK", + "VALID_FLAG_VALUES", + "VALID_AUTH_TYPES", + "DATA_COLUMNS", + "STEPS", + "STEP_BY_NAME", + "STEP_BY_INDEX", + "WORKFLOW_COLUMNS", + "WORKFLOW_DATA_COLUMNS", + "CHECKPOINT_COLUMNS", + "JSON_VALUED_COLUMNS", + "AUTH_PARITY_FLAG_COLUMN", + "ALL_COLUMNS", + "EXPECTED_COLUMN_COUNT", + "NON_CHECKPOINT_STEPS", +] diff --git a/connectus/workflow_state/api.py b/connectus/workflow_state/api.py new file mode 100644 index 
00000000000..fd343895e7d --- /dev/null +++ b/connectus/workflow_state/api.py @@ -0,0 +1,566 @@ +"""Programmatic API and auth-derived helpers. + +Returns plain dicts. Consumed by the SKILL via subprocess and (when +imported in-process) by other Python callers. +""" +from __future__ import annotations + +import json +import os +import sys +from typing import Optional + +from auth_config_parser import ( + project_xsoar_param_to_yml_id as _pkg_project_xsoar_param_to_yml_id, +) +from workflow_state.config_loader import get_config +from workflow_state.csv_io import ( + BASE_DIR, + find_row, +) + + +def load_csv(): # type: ignore[no-redef] + """Indirect to ``workflow_state.load_csv`` so tests can monkey-patch.""" + import workflow_state as _ws + return _ws.load_csv() + + +def save_csv(rows): # type: ignore[no-redef] + """Indirect to ``workflow_state.save_csv`` so tests can monkey-patch.""" + import workflow_state as _ws + return _ws.save_csv(rows) +from workflow_state.exceptions import WorkflowError +from workflow_state.state_machine import ( + apply_step_action, + current_step, + has_workflow_progress, + is_done, + reset_after, +) +from workflow_state.validators import ( + auth_param_sources as _auth_param_sources, + validate_auth_detail, +) + + +# --------------------------------------------------------------------------- +# Auth-derived ignore set (cross-step exclusion plumbing) +# --------------------------------------------------------------------------- + +def _project_xsoar_param_to_yml_id(xsoar_param: str) -> str: + """Backward-compatible wrapper — delegates to the package.""" + return _pkg_project_xsoar_param_to_yml_id(xsoar_param) + + +def auth_param_ids(integration_id: str) -> list[str]: + """Return the union of YML param ids declared in an integration's + ``Auth Details``. + """ + rows = load_csv() + idx = find_row(rows, integration_id) + if idx is None: + raise WorkflowError( + f"Integration '{integration_id}' not found in the CSV." + ) + + raw = rows[idx].get("Auth Details", "").strip() + if not raw: + raise WorkflowError( + f"'Auth Details' is not set for integration " + f"'{rows[idx].get('Integration ID', integration_id)}'. " + f"Run 'set-auth' first — populating 'Params to Commands' " + f"requires the auth classification to be in place so the " + f"two columns stay disjoint." + ) + + try: + parsed = json.loads(raw) + except json.JSONDecodeError as e: + raise WorkflowError( + f"'Auth Details' for integration '{integration_id}' is not " + f"valid JSON: {e}. Re-run 'set-auth' with a corrected payload." + ) + if not isinstance(parsed, dict): + raise WorkflowError( + f"'Auth Details' for integration '{integration_id}' is not a " + f"JSON object (got {type(parsed).__name__}). Re-run 'set-auth'." + ) + + if "other_connection" not in parsed: + print( + f"WARNING: Auth Details for '{integration_id}' is missing " + f"'other_connection' (legacy shape). 
Re-run 'set-auth' to " + f"populate it; auth_param_ids() returning only the " + f"auth_types-derived ids in the meantime.", + file=sys.stderr, + ) + + sources = _auth_param_sources(parsed) + return sorted(sources.keys()) + + +# --------------------------------------------------------------------------- +# Programmatic dict-returning API +# --------------------------------------------------------------------------- + +def get_integration_status(integration_id: str) -> dict: + """Return a dict summary of an integration's workflow state.""" + cfg = get_config() + rows = load_csv() + idx = find_row(rows, integration_id) + if idx is None: + return {"error": f"Integration '{integration_id}' not found."} + + row = rows[idx] + cur = current_step(row) + completed = sum(1 for s in cfg.steps if is_done(row, s)) + return { + "name": row.get("Integration ID", ""), + "current_step": cur.name if cur else None, + "current_step_index": cur.index if cur else None, + "workflow": {col: row.get(col, "") for col in cfg.workflow_columns}, + "completed_steps": completed, + "total_steps": len(cfg.steps), + "progress_pct": round(completed / len(cfg.steps) * 100, 1), + "all_complete": cur is None and completed > 0, + } + + +# Filename extensions that should NOT be included in the `extras` map. +_EXTRAS_BINARY_EXTENSIONS = { + ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico", ".zip", +} + + +def get_integration_files(integration_id: str) -> dict: + """Return all known source-file paths for an integration.""" + rows = load_csv() + idx = find_row(rows, integration_id) + if idx is None: + return {"error": f"Integration '{integration_id}' not found."} + + row = rows[idx] + yml_rel = row.get("Integration File Path", "").strip() + if not yml_rel: + return { + "error": ( + f"Integration '{integration_id}' has no Integration File Path " + f"set in the CSV." + ) + } + + directory_rel = os.path.dirname(yml_rel) + basename = os.path.basename(yml_rel) + if basename.lower().endswith(".yml"): + base = basename[:-4] + else: + base = os.path.splitext(basename)[0] + + abs_dir = os.path.join(BASE_DIR, directory_rel) + if not os.path.isdir(abs_dir): + return { + "error": ( + f"Integration directory '{directory_rel}' (from CSV) does " + f"not exist on disk." 
+ ) + } + + def _rel_if_exists(filename: str) -> Optional[str]: + abs_path = os.path.join(abs_dir, filename) + if os.path.isfile(abs_path): + return os.path.join(directory_rel, filename) if directory_rel else filename + return None + + yml_path = _rel_if_exists(basename) + + code_path: Optional[str] = None + code_language: Optional[str] = None + for ext, lang in (("py", "python"), ("js", "javascript"), ("ps1", "powershell")): + candidate = _rel_if_exists(f"{base}.{ext}") + if candidate is not None: + code_path = candidate + code_language = lang + break + + description_path = _rel_if_exists(f"{base}_description.md") + readme_path = _rel_if_exists("README.md") + + test_path: Optional[str] = None + if code_language == "python": + test_path = _rel_if_exists(f"{base}_test.py") + + canonical_filenames = { + basename, + f"{base}.py", + f"{base}.js", + f"{base}.ps1", + f"{base}_description.md", + "README.md", + f"{base}_test.py", + } + + extras: dict[str, str] = {} + try: + entries = os.listdir(abs_dir) + except OSError: + entries = [] + for fname in entries: + if fname in canonical_filenames: + continue + abs_entry = os.path.join(abs_dir, fname) + if not os.path.isfile(abs_entry): + continue + ext = os.path.splitext(fname)[1].lower() + if ext in _EXTRAS_BINARY_EXTENSIONS: + continue + extras[fname] = ( + os.path.join(directory_rel, fname) if directory_rel else fname + ) + + return { + "integration_id": row.get("Integration ID", ""), + "directory": directory_rel, + "base": base, + "yml": yml_path, + "code": code_path, + "code_language": code_language, + "description": description_path, + "readme": readme_path, + "test": test_path, + "extras": extras, + } + + +# Forward declaration needed for next_step_for; defined in display.py. +def _format_next_line(row: dict[str, str]) -> str: + from workflow_state.display import format_next_line + return format_next_line(row) + + +def next_step_for(integration_id: str) -> dict: + """Return the next-action info for an integration.""" + rows = load_csv() + idx = find_row(rows, integration_id) + if idx is None: + return {"error": f"Integration '{integration_id}' not found."} + row = rows[idx] + cur = current_step(row) + if cur is None: + return {"complete": True, "message": _format_next_line(row)} + return { + "complete": False, + "step_index": cur.index, + "step_name": cur.name, + "setter": cur.setter, + "description": cur.description, + "message": _format_next_line(row), + } + + +def _row_summary_dict(row: dict[str, str]) -> dict: + cfg = get_config() + cur = current_step(row) + completed = sum(1 for s in cfg.steps if is_done(row, s)) + return { + "integration_id": row.get("Integration ID", ""), + "connector_id": row.get("Connector ID", "").strip(), + "assignee": row.get("assignee", "").strip(), + "current_step": cur.name if cur else None, + "current_step_index": cur.index if cur else None, + "completed_steps": completed, + "all_complete": cur is None and has_workflow_progress(row), + "has_progress": has_workflow_progress(row), + } + + +def list_by_assignee(rows: list[dict[str, str]], assignee_name: str) -> list[dict[str, str]]: + """Filter rows to those whose assignee matches (case-insensitive).""" + target = assignee_name.strip().lower() + return [row for row in rows if row.get("assignee", "").strip().lower() == target] + + +def list_by_connector(rows: list[dict[str, str]], connector_id: str) -> list[dict[str, str]]: + """Filter rows to those whose Connector ID matches (case-insensitive).""" + target = connector_id.strip().lower() + return [ + row for row in 
rows + if row.get("Connector ID", "").strip().lower() == target + ] + + +def list_integrations_by_connector(connector_id: str) -> list[dict]: + rows = load_csv() + matches = list_by_connector(rows, connector_id) + return [_row_summary_dict(row) for row in matches] + + +def integrations_for_assignee(assignee_name: str) -> list[dict]: + rows = load_csv() + matches = list_by_assignee(rows, assignee_name) + return [_row_summary_dict(row) for row in matches] + + +def assign_connector(connector_id: str, assignee_name: str) -> dict: + """Assign every integration in ``connector_id`` to ``assignee_name``. + + Mirrors the ``set-assignee-by-connector`` carve-out: NO cascade reset. + """ + if not assignee_name or not assignee_name.strip(): + return {"error": "Assignee cannot be empty."} + + rows = load_csv() + matches = list_by_connector(rows, connector_id) + if not matches: + return { + "error": ( + f"No integrations found for connector '{connector_id}'. " + "Use list-connectors to see all known Connector IDs." + ) + } + + assigned_ids: list[str] = [] + for row in matches: + row["assignee"] = assignee_name + assigned_ids.append(row.get("Integration ID", "")) + + save_csv(rows) + return { + "connector_id": connector_id, + "assignee": assignee_name, + "assigned": assigned_ids, + "count": len(assigned_ids), + } + + +def markpass_integration_step(integration_id: str, step_name: str) -> dict: + """Mark a checkpoint as passed via the unified dispatch.""" + cfg = get_config() + rows = load_csv() + idx = find_row(rows, integration_id) + if idx is None: + return {"error": f"Integration '{integration_id}' not found."} + + row = rows[idx] + target = cfg.step_by_name.get(step_name) + if target is None: + return {"error": f"Unknown step '{step_name}'."} + non_checkpoint = cfg.non_checkpoint_steps + if step_name in non_checkpoint: + return {"error": f"'{step_name}' is not a checkpoint; use {non_checkpoint[step_name]}."} + + # Honour any flag_auto_na_target interaction whose target_step matches. + for inter in cfg.step_interactions: + if inter.kind == "flag_auto_na_target" and inter.target_step == step_name: + flag = row.get(inter.when_step, "").strip().upper() + if flag in {v.upper() for v in inter.when_value_in}: + row[step_name] = inter.write_value + save_csv(rows) + cur = current_step(row) + return { + "message": f"'{step_name}' set to {inter.write_value}.", + "completed_step": step_name, + "current_step": cur.name if cur else None, + } + if flag == "": + return {"error": f"'{step_name}' requires the flag to be set first."} + + try: + cleared, no_op = apply_step_action(row, target, cfg.markers.check, verb="markpass") + except WorkflowError as e: + return {"error": e.message} + + save_csv(rows) + cur = current_step(row) + return { + "message": ( + f"'{step_name}' marked passed." + + (f" Cleared: {cleared}" if cleared else "") + + (" (no-op)" if no_op else "") + ), + "completed_step": step_name, + "current_step": cur.name if cur else None, + } + + +def fail_integration_step(integration_id: str, step_name: str) -> dict: + """Programmatic ``fail`` / ``reset-to`` (they share semantics). + + Honours ``preserve_on_reset`` on later steps. The named ``step_name`` + is always cleared even if it is itself preserved (explicit-target + carve-out). 
+ """ + cfg = get_config() + rows = load_csv() + idx = find_row(rows, integration_id) + if idx is None: + return {"error": f"Integration '{integration_id}' not found."} + target = cfg.step_by_name.get(step_name) + if target is None: + return {"error": f"Unknown step '{step_name}'."} + row = rows[idx] + # Explicit-target carve-out: clear the named target even if it's + # tagged preserve_on_reset. + row[target.name] = "" + cleared, preserved = reset_after(row, target, respect_preserve=True) + save_csv(rows) + cur = current_step(row) + msg = f"Reset '{step_name}' and subsequent non-preserved steps." + if preserved: + msg += f" Preserved (preserve_on_reset=true): {preserved}." + return { + "message": msg, + "current_step": cur.name if cur else None, + "preserved": preserved, + } + + +def reset_integration_to_step(integration_id: str, step_name: str) -> dict: + return fail_integration_step(integration_id, step_name) + + +def skip_integration_step(integration_id: str, step_name: str) -> dict: + cfg = get_config() + target = cfg.step_by_name.get(step_name) + if target is None: + return {"error": f"Unknown step '{step_name}'."} + if not target.optional: + return {"error": f"Step '{step_name}' is not optional and cannot be skipped."} + rows = load_csv() + idx = find_row(rows, integration_id) + if idx is None: + return {"error": f"Integration '{integration_id}' not found."} + row = rows[idx] + try: + cleared, _ = apply_step_action(row, target, cfg.markers.na, verb="skip") + except WorkflowError as e: + return {"error": e.message} + save_csv(rows) + cur = current_step(row) + return { + "message": f"Skipped '{step_name}'." + (f" Cleared: {cleared}" if cleared else ""), + "current_step": cur.name if cur else None, + } + + +def set_integration_auth(integration_id: str, auth_detail_json: str) -> dict: + cfg = get_config() + schema_errors = validate_auth_detail(auth_detail_json) + if schema_errors: + return { + "error": "Auth Details schema validation failed:\n" + + "\n".join(f" - {e}" for e in schema_errors) + } + + rows = load_csv() + idx = find_row(rows, integration_id) + if idx is None: + return {"error": f"Integration '{integration_id}' not found."} + + row = rows[idx] + target = cfg.step_by_name["Auth Details"] + try: + cleared, _ = apply_step_action(row, target, auth_detail_json, verb="set-auth") + except WorkflowError as e: + return {"error": e.message} + save_csv(rows) + cur = current_step(row) + return { + "message": ( + f"Set 'Auth Details' for '{row.get('Integration ID', '')}'." + + (f" Cleared: {cleared}" if cleared else "") + ), + "current_step": cur.name if cur else None, + } + + +# --------------------------------------------------------------------------- +# Cross-check: implementation referenced by the validators registry name +# `params_to_commands_no_auth_overlap`. Lives here because it consults the +# CSV (load_csv) and auth_param_ids; importing those into validators.py +# would create a cycle. +# --------------------------------------------------------------------------- + +def _check_params_to_commands_overlap(integration_id: str, payload: dict) -> None: + """Reject ``set-params-to-commands`` payloads that overlap with auth.""" + rows = load_csv() + idx = find_row(rows, integration_id) + if idx is None: + raise WorkflowError( + f"Integration '{integration_id}' not found in the CSV." 
+ ) + raw_auth = rows[idx].get("Auth Details", "").strip() + auth_detail: dict = {} + if raw_auth: + try: + parsed = json.loads(raw_auth) + if isinstance(parsed, dict): + auth_detail = parsed + except json.JSONDecodeError: + pass + sources = _auth_param_sources(auth_detail) if auth_detail else {} + + auth_ids = set(auth_param_ids(integration_id)) + + commands_block = payload.get("commands") if isinstance(payload, dict) else None + if not isinstance(commands_block, dict): + return + + offenders: list[tuple[str, str]] = [] + for cmd, param_list in commands_block.items(): + if not isinstance(param_list, list): + continue + for p in param_list: + if isinstance(p, str) and p in auth_ids: + offenders.append((str(cmd), p)) + + if not offenders: + return + + lines = [ + f"'Params to Commands' for '{integration_id}' contains " + f"{len(offenders)} param(s) that are already declared in " + f"'Auth Details'. The two columns MUST be disjoint.", + "", + "Offending (command, param) pairs:", + ] + for cmd, p in sorted(offenders): + lines.append(f" - ({cmd!r}, {p!r})") + + lines.append("") + lines.append("Source of each offending param in 'Auth Details':") + seen_params: set[str] = set() + for _cmd, p in sorted(offenders): + if p in seen_params: + continue + seen_params.add(p) + srcs = sources.get(p) + if srcs: + for src in srcs: + lines.append(f" - param {p!r} overlaps with {src}") + else: + lines.append( + f" - param {p!r} overlaps with Auth Details " + f"(source not attributable; legacy row?)" + ) + + lines.extend([ + "", + "Fix:", + f" Re-derive the per-command lists with the auth-aware ignore " + f"set — run:", + f" python3 connectus/workflow_state.py auth-params " + f"\"{integration_id}\"", + f" to see exactly what to exclude. The analyzer can pull this " + f"list automatically: pass --integration-id " + f"\"{integration_id}\" to " + f"connectus/check_command_params.py.", + "", + f" If a listed param is *truly* used per-command and was " + f"misclassified into 'Auth Details', revert to Step 1 with " + f"'set-auth' and remove it from 'auth_types[].xsoar_params' " + f"or 'other_connection' first. Do NOT bypass this rejection " + f"by hand-stripping just to make the call go through.", + ]) + + raise WorkflowError("\n".join(lines)) diff --git a/connectus/workflow_state/cli.py b/connectus/workflow_state/cli.py new file mode 100644 index 00000000000..ccc96bbede0 --- /dev/null +++ b/connectus/workflow_state/cli.py @@ -0,0 +1,1130 @@ +"""CLI commands and main dispatch. + +Each ``cmd_*`` function is the implementation of one CLI verb. They look +up validators / cross-checks by their ``Step.json_schema`` / +``Step.cross_check`` field rather than hardcoding step names. +""" +from __future__ import annotations + +import json +import subprocess +import sys +from typing import Callable, Optional + +from workflow_state.api import ( + _check_params_to_commands_overlap, + auth_param_ids, + get_integration_files, +) +from workflow_state.config_loader import get_config +from workflow_state.csv_io import ( + find_row, + wipe_workflow_data, +) + + +def load_csv(): # type: ignore[no-redef] + """Indirect to ``workflow_state.load_csv`` so tests can monkey-patch + the package-level binding without having to know which submodule + actually owns the function. 
+ """ + import workflow_state as _ws + return _ws.load_csv() + + +def save_csv(rows): # type: ignore[no-redef] + """Indirect to ``workflow_state.save_csv`` so tests can monkey-patch.""" + import workflow_state as _ws + return _ws.save_csv(rows) +from workflow_state.display import ( + format_by_assignee, + format_dashboard_row, + format_next_line, + format_status, + format_step_for_listing, + format_step_value, +) +from workflow_state.exceptions import WorkflowError +from workflow_state.state_machine import ( + apply_step_action, + current_step, + has_workflow_progress, + reset_after, +) +from workflow_state.types import Step +from workflow_state.validators import ( + get_named_validator, + validate_auth_detail, + validate_params_to_commands, +) + + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + +def _git_user_name() -> Optional[str]: + """Return ``git config user.name`` or None if unavailable.""" + try: + out = subprocess.run( + ["git", "config", "user.name"], + capture_output=True, text=True, check=False, timeout=5, + ) + name = out.stdout.strip() + return name or None + except (FileNotFoundError, subprocess.SubprocessError): + return None + + +def _resolve_git_user_name() -> Optional[str]: + """Indirect to ``workflow_state._git_user_name`` so tests can monkey-patch.""" + import workflow_state as _ws + return _ws._git_user_name() + + +def _resolve_row_or_exit(rows: list[dict[str, str]], name: str) -> int: + idx = find_row(rows, name) + if idx is None: + print(f"ERROR: Integration '{name}' not found.") + sys.exit(1) + return idx + + +def _set_step_via_dispatch( + row: dict[str, str], + target: Step, + new_value: str, + verb: str, +) -> str: + """Apply step action and return a user-facing message.""" + cfg = get_config() + integration_id = row.get("Integration ID", "") + cleared, no_op = apply_step_action(row, target, new_value, verb=verb) + if no_op: + return f"'{target.name}' already set to '{new_value}' for '{integration_id}'. No change." + msg = f"Set '{target.name}' (step {target.index}/{len(cfg.steps)}) for '{integration_id}'." 
+ if cleared: + msg += f"\n Cleared {len(cleared)} subsequent step(s): {cleared}" + return msg + + +# --------------------------------------------------------------------------- +# Status / dashboard / next +# --------------------------------------------------------------------------- + +def cmd_status(args: list[str]) -> None: + if not args: + print("Usage: workflow_state.py status [id2 ...]") + sys.exit(1) + + rows = load_csv() + for name in args: + idx = find_row(rows, name) + if idx is None: + print(f"ERROR: Integration '{name}' not found.") + continue + print(format_status(rows[idx])) + + +def cmd_status_all(_args: list[str]) -> None: + rows = load_csv() + found = False + for row in rows: + if has_workflow_progress(row): + print(format_status(row)) + found = True + if not found: + print("No integrations have workflow progress yet.") + + +def cmd_dashboard(_args: list[str]) -> None: + rows = load_csv() + print(f"\n{'=' * 80}") + print(" WORKFLOW DASHBOARD") + print(f"{'=' * 80}") + print(f" {'Integration ID':45s} {'Progress':18s} → Current Step") + print(f" {'-' * 75}") + + in_progress = 0 + completed = 0 + not_started = 0 + + for row in rows: + line = format_dashboard_row(row) + if line: + print(line) + if current_step(row) is not None: + in_progress += 1 + else: + completed += 1 + else: + not_started += 1 + + print(f"\n Summary: {completed} complete, {in_progress} in progress, " + f"{not_started} not started") + + +# --------------------------------------------------------------------------- +# Setters for JSON-shaped data steps +# --------------------------------------------------------------------------- + +def _set_json_data_step(args: list[str], step_name: str, setter_cmd: str) -> None: + """Shared CLI handler for set-auth / set-params-* / set-shared-params.""" + cfg = get_config() + if len(args) < 2: + print(f"Usage: workflow_state.py {setter_cmd} ''") + print(" The value must be valid JSON (see connectus/column-schemas.md).") + sys.exit(1) + + name = args[0] + raw = " ".join(args[1:]) + + # JSON validation (always required for any json_schema-bound step) + try: + json.loads(raw) + except json.JSONDecodeError as e: + print(f"ERROR: '{step_name}' must be valid JSON.") + print(f" Got: {raw}") + print(f" Parse error: {e}") + print(f" Example: workflow_state.py {setter_cmd} \"{name}\" '{{}}'") + sys.exit(1) + + target = cfg.step_by_name.get(step_name) + if target is None: + print(f"ERROR: Unknown step '{step_name}'.") + sys.exit(1) + + # Look up the validator from the YAML config (Q3: bound by name). 
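The same name-based binding can be exercised outside the CLI. A rough sketch — the exact `json_schema` name attached to `Auth Details` is taken on trust from the YAML config, and the payload mirrors the canonical example value used elsewhere in this script:

```python
from workflow_state import get_config, get_named_validator

step = get_config().step_by_name["Auth Details"]
validator = get_named_validator(step.json_schema) if step.json_schema else None
if validator is not None and step.json_schema != "any_json":
    errors = validator('{"auth_types": [], "config": "NoneRequired", "other_connection": []}')
    print(errors or "schema OK")
```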
+ validator = get_named_validator(target.json_schema) if target.json_schema else None + if validator is not None and target.json_schema not in (None, "any_json"): + schema_errors = validator(raw) + if schema_errors: + label = step_name + print(f"ERROR: {label} does not match the required schema.") + for err in schema_errors: + print(f" - {err}") + sys.exit(1) + + rows = load_csv() + idx = _resolve_row_or_exit(rows, name) + + try: + msg = _set_step_via_dispatch(rows[idx], target, raw, verb=setter_cmd) + except WorkflowError as e: + print(f"ERROR: {e.message}") + sys.exit(1) + + save_csv(rows) + print(msg) + cur = current_step(rows[idx]) + if cur is not None: + print(f" Current step: #{cur.index} {cur.name}") + elif has_workflow_progress(rows[idx]): + print(f" 🎉 All {len(cfg.steps)} steps complete!") + + +def cmd_set_auth(args: list[str]) -> None: + _set_json_data_step(args, "Auth Details", "set-auth") + + +def cmd_set_params_to_commands(args: list[str]) -> None: + cfg = get_config() + if len(args) >= 2: + name = args[0] + raw = " ".join(args[1:]) + # Strict schema check + schema_errors = validate_params_to_commands(raw) + if schema_errors: + print("ERROR: Params to Commands does not match the required schema.") + for err in schema_errors: + print(f" - {err}") + sys.exit(1) + # Cross-check (overlap with Auth Details) + target = cfg.step_by_name.get("Params to Commands") + if target is not None and target.cross_check == "params_to_commands_no_auth_overlap": + payload = json.loads(raw) + if isinstance(payload, dict): + try: + _check_params_to_commands_overlap(name, payload) + except WorkflowError as e: + print(f"ERROR: {e.message}") + sys.exit(1) + _set_json_data_step(args, "Params to Commands", "set-params-to-commands") + + +def cmd_set_params_for_test(args: list[str]) -> None: + _set_json_data_step(args, "Params for test with default in code", "set-params-for-test") + + +def cmd_set_shared_params(args: list[str]) -> None: + _set_json_data_step(args, "Params same in other handlers", "set-shared-params") + + +# --------------------------------------------------------------------------- +# Assignee (with carve-out: cascade_on_set=False on the YAML step) +# --------------------------------------------------------------------------- + +def cmd_set_assignee(args: list[str]) -> None: + """Set the assignee for an integration. + + The carve-out (no cascade reset) is now driven by the YAML + ``cascade_on_set: false`` field on the ``assignee`` step, which the + state machine honours in :func:`apply_step_action`. We still write + the cell directly to keep behaviour identical (no normalization + surprises) and to bypass `apply_step_action`'s kind-specific paths. + """ + if len(args) < 2: + print("Usage: workflow_state.py set-assignee ") + sys.exit(1) + + name = args[0] + assignee = " ".join(args[1:]) + + if not assignee.strip(): + print("ERROR: Assignee cannot be empty.") + sys.exit(1) + + rows = load_csv() + idx = _resolve_row_or_exit(rows, name) + + rows[idx]["assignee"] = assignee + save_csv(rows) + print(f"Set assignee for '{rows[idx]['Integration ID']}' to: {assignee}") + cur = current_step(rows[idx]) + if cur is not None: + print(f" Current step: #{cur.index} {cur.name}") + + +def cmd_set_assignee_by_connector(args: list[str]) -> None: + """Bulk-assign every integration in a connector. 
NO cascade reset.""" + if len(args) < 2: + print( + "Usage: workflow_state.py set-assignee-by-connector " + " " + ) + sys.exit(1) + + connector_id = args[0] + assignee = " ".join(args[1:]) + + if not assignee.strip(): + print("ERROR: Assignee cannot be empty.") + sys.exit(1) + + from workflow_state.api import list_by_connector + rows = load_csv() + matches = list_by_connector(rows, connector_id) + + if not matches: + print(f"ERROR: No integrations found for connector '{connector_id}'.") + print( + " Tip: run 'workflow_state.py list-connectors' to see all known " + "Connector IDs." + ) + sys.exit(1) + + for row in matches: + row["assignee"] = assignee + + save_csv(rows) + print( + f"Assigned {len(matches)} integration(s) in connector " + f"'{connector_id}' to '{assignee}':" + ) + for row in matches: + print(f" - {row.get('Integration ID', '')}") + + +# --------------------------------------------------------------------------- +# Flag setters / markpass / skip / fail / reset +# --------------------------------------------------------------------------- + +def cmd_set_auth_flag(args: list[str]) -> None: + """Set the 'requires auth parity test' flag (or whatever YAML configures). + + When the new value triggers a configured ``flag_auto_na_target`` + interaction, also write that interaction's ``write_value`` into the + target step so the user is auto-advanced past it. + """ + cfg = get_config() + if len(args) < 2: + print("Usage: workflow_state.py set-auth-flag ") + sys.exit(1) + + name = args[0] + flag = args[1].upper().strip() + + if flag not in set(cfg.markers.flag_values): + print(f"ERROR: Flag must be one of {list(cfg.markers.flag_values)}. Got: '{args[1]}'") + sys.exit(1) + + flag_col = cfg.auth_parity_flag_column + if flag_col is None: + print("ERROR: no flag_auto_na_target interaction configured.") + sys.exit(1) + + rows = load_csv() + idx = _resolve_row_or_exit(rows, name) + target = cfg.step_by_name[flag_col] + + try: + cleared, no_op = apply_step_action(rows[idx], target, flag, verb="set-auth-flag") + except WorkflowError as e: + print(f"ERROR: {e.message}") + sys.exit(1) + + interaction = cfg.find_flag_auto_na_target(flag_col) + if interaction is not None and flag in {v.upper() for v in interaction.when_value_in}: + rows[idx][interaction.target_step] = interaction.write_value + + save_csv(rows) + + if no_op: + print(f"'{flag_col}' already set to '{flag}' " + f"for '{rows[idx]['Integration ID']}'. No change.") + else: + print(f"Set '{flag_col}' = {flag} " + f"for '{rows[idx]['Integration ID']}'.") + if cleared: + print(f" Cleared {len(cleared)} subsequent step(s): {cleared}") + if interaction is not None and flag in {v.upper() for v in interaction.when_value_in}: + print(f" Auto-set '{interaction.target_step}' = {interaction.write_value}.") + + cur = current_step(rows[idx]) + if cur is not None: + print(f" Current step: #{cur.index} {cur.name}") + elif has_workflow_progress(rows[idx]): + print(f" 🎉 All {len(cfg.steps)} steps complete!") + + +def cmd_markpass(args: list[str]) -> None: + cfg = get_config() + non_checkpoint = cfg.non_checkpoint_steps + + if len(args) < 2: + print("Usage: workflow_state.py markpass ") + print("\nCheckpoint steps (in order):") + for s in cfg.steps: + if s.kind == "checkpoint": + print(f" {s.index:2d}. 
{s.name}") + print("\nNon-checkpoint columns (use a different command):") + for step_name, cmd in non_checkpoint.items(): + print(f" - '{step_name}' → use '{cmd}'") + sys.exit(1) + + name = args[0] + step_name = " ".join(args[1:]) + + if step_name in non_checkpoint: + correct = non_checkpoint[step_name] + print( + f"ERROR: '{step_name}' is not a pass/fail checkpoint.\n" + f" Use '{correct}' instead.\n" + f" Example: workflow_state.py {correct} \"{name}\" " + ) + sys.exit(1) + + target = cfg.step_by_name.get(step_name) + if target is None: + print(f"ERROR: Unknown step '{step_name}'.") + print(f"Valid checkpoint steps: {', '.join(cfg.checkpoint_columns)}") + sys.exit(1) + + rows = load_csv() + idx = _resolve_row_or_exit(rows, name) + row = rows[idx] + + # Honour any configured flag_auto_na_target interaction whose + # target_step matches. + for inter in cfg.step_interactions: + if inter.kind == "flag_auto_na_target" and inter.target_step == step_name: + flag = row.get(inter.when_step, "").strip().upper() + if flag == "": + print( + f"ERROR: Cannot mark '{step_name}' as passed — " + f"'{inter.when_step}' flag is not set.\n" + f" Use 'set-auth-flag' first.\n" + f" Example: workflow_state.py set-auth-flag " + f"\"{row['Integration ID']}\" YES" + ) + sys.exit(1) + if flag in {v.upper() for v in inter.when_value_in}: + row[step_name] = inter.write_value + save_csv(rows) + print( + f"'{step_name}' set to {inter.write_value} (auth parity test not required)." + ) + return + + try: + cleared, no_op = apply_step_action(row, target, cfg.markers.check, verb="markpass") + except WorkflowError as e: + print(f"ERROR: {e.message}") + sys.exit(1) + + save_csv(rows) + if no_op: + print(f"'{step_name}' already passed. No change.") + else: + print(f"✅ '{step_name}' (step {target.index}/{len(cfg.steps)}) marked as passed " + f"for '{row['Integration ID']}'.") + if cleared: + print(f" Cleared {len(cleared)} subsequent step(s): {cleared}") + + cur = current_step(row) + if cur is not None: + print(f" Next step: #{cur.index} {cur.name}") + elif has_workflow_progress(row): + print(f" 🎉 All {len(cfg.steps)} steps complete!") + + +def cmd_skip(args: list[str]) -> None: + cfg = get_config() + if len(args) < 2: + print("Usage: workflow_state.py skip ") + print("Skippable (optional) steps:") + for s in cfg.steps: + if s.optional: + print(f" {s.index:2d}. {s.name}") + sys.exit(1) + + name = args[0] + step_name = " ".join(args[1:]) + + target = cfg.step_by_name.get(step_name) + if target is None: + print(f"ERROR: Unknown step '{step_name}'.") + sys.exit(1) + + if not target.optional: + print(f"ERROR: step '{step_name}' is not optional and cannot be skipped.") + sys.exit(1) + + rows = load_csv() + idx = _resolve_row_or_exit(rows, name) + row = rows[idx] + + try: + cleared, _no_op = apply_step_action(row, target, cfg.markers.na, verb="skip") + except WorkflowError as e: + print(f"ERROR: {e.message}") + sys.exit(1) + + save_csv(rows) + print(f"✓ Skipped step {target.index} ('{target.name}') for '{row['Integration ID']}'.") + if cleared: + print(f" Cleared {len(cleared)} subsequent step(s): {cleared}") + cur = current_step(row) + if cur is not None: + print(f" Next step: #{cur.index} {cur.name}") + + +def _do_reset_to(rows: list[dict[str, str]], idx: int, step_name: str, verb: str) -> None: + """Shared implementation for ``fail`` and ``reset-to``. 
+ + Honours ``preserve_on_reset``: steps tagged as preserved retain + their value across this operation, EXCEPT when the user names the + preserved step explicitly as ``target`` — in that case the user's + intent wins for that one step (it is cleared) but later preserved + steps in the same blast radius are still preserved. + """ + cfg = get_config() + target = cfg.step_by_name.get(step_name) + if target is None: + print(f"ERROR: Unknown step '{step_name}'.") + print(f"Valid steps: {', '.join(cfg.workflow_columns)}") + sys.exit(1) + + row = rows[idx] + integration_id = row.get("Integration ID", "") + + # Explicit-target carve-out: clear the named target even if it's + # tagged preserve_on_reset (the user named it on purpose). + row[target.name] = "" + + # Then clear everything strictly after, honouring preserve_on_reset + # for those later steps. When target.index == 1 there is nothing + # before it; otherwise we still pivot on `target` itself (which has + # already been cleared above) and rely on reset_after's strict + # "index > step.index" filter. + cleared, preserved = reset_after(row, target, respect_preserve=True) + + save_csv(rows) + print(f"{verb}: cleared step {target.index} ('{target.name}') and all " + f"subsequent non-preserved steps for '{integration_id}'.") + if preserved: + print( + f" Preserved (preserve_on_reset=true): {preserved}" + ) + cur = current_step(row) + if cur is not None: + print(f" Current step is now: #{cur.index} {cur.name}") + + +def cmd_fail(args: list[str]) -> None: + cfg = get_config() + if len(args) < 2: + print("Usage: workflow_state.py fail ") + print(f"Valid steps: {', '.join(cfg.workflow_columns)}") + sys.exit(1) + name = args[0] + step_name = " ".join(args[1:]) + rows = load_csv() + idx = _resolve_row_or_exit(rows, name) + _do_reset_to(rows, idx, step_name, verb="Reset (fail)") + + +def cmd_reset_to(args: list[str]) -> None: + cfg = get_config() + if len(args) < 2: + print("Usage: workflow_state.py reset-to ") + print(f"Valid steps: {', '.join(cfg.workflow_columns)}") + sys.exit(1) + name = args[0] + step_name = " ".join(args[1:]) + rows = load_csv() + idx = _resolve_row_or_exit(rows, name) + _do_reset_to(rows, idx, step_name, verb="Reset-to") + + +def cmd_wipe_workflow_data(args: list[str]) -> None: + """⚠️ Bulk-wipe every workflow column in the pipeline CSV. + + Identity columns (Integration ID, Integration File Path, Connector ID, + plus any future identity columns from the YAML) are preserved + verbatim. The header is regenerated from the YAML config so the + columns are re-aligned to the current workflow plan. + + Use this only when the workflow plan changes shape and you want to + keep the integration roster but drop every per-row state cell. To + reset a single integration, use 'reset' instead. + + Requires --yes-i-am-sure to proceed. Writes a sibling backup at + ``.bak.`` unless --no-backup is given. + """ + sure = "--yes-i-am-sure" in args + no_backup = "--no-backup" in args + + banner = ( + "\n" + "╔══════════════════════════════════════════════════════════════════════════╗\n" + "║ ⚠️ DESTRUCTIVE OPERATION: wipe-workflow-data ║\n" + "║ ║\n" + "║ This will CLEAR every workflow column for EVERY row in the ║\n" + "║ connectus pipeline CSV. Identity columns are preserved. ║\n" + "║ The header is rewritten from connectus/workflow_state_config.yml ║\n" + "║ so the file aligns with the current workflow plan. ║\n" + "║ ║\n" + "║ Use 'reset ' instead if you only want to reset ║\n" + "║ one row. 
There is no undo for this operation other than the ║\n" + "║ timestamped backup file written next to the CSV. ║\n" + "╚══════════════════════════════════════════════════════════════════════════╝\n" + ) + print(banner) + + if not sure: + print( + "Refusing to run without --yes-i-am-sure.\n" + " Re-run as: workflow_state.py wipe-workflow-data --yes-i-am-sure\n" + " Add --no-backup to skip the timestamped backup file." + ) + sys.exit(1) + + try: + result = wipe_workflow_data(confirm=True, backup=not no_backup) + except FileNotFoundError as e: + print(f"ERROR: pipeline CSV not found: {e}") + sys.exit(1) + + header_cols = result["header"] + n_header = len(header_cols) if isinstance(header_cols, list) else 0 + print(f"✅ Wiped {result['cells_cleared']} workflow cell(s) " + f"across {result['rows_touched']} row(s).") + print(f" Rows preserved: {result['rows']}") + print(f" Header columns: {n_header}") + if result["backup_path"]: + print(f" Backup written: {result['backup_path']}") + else: + print(" Backup written: (skipped via --no-backup)") + print(f" CSV path: {result['csv_path']}") + + +def cmd_reset(args: list[str]) -> None: + cfg = get_config() + if not args: + print("Usage: workflow_state.py reset ") + sys.exit(1) + + name = args[0] + rows = load_csv() + idx = _resolve_row_or_exit(rows, name) + + for col in cfg.workflow_columns: + rows[idx][col] = "" + + save_csv(rows) + print(f"Reset all workflow columns for '{rows[idx]['Integration ID']}'.") + + +# --------------------------------------------------------------------------- +# Listing commands +# --------------------------------------------------------------------------- + +def cmd_at_step(args: list[str]) -> None: + cfg = get_config() + if not args: + print("Usage: workflow_state.py at-step ") + print(f"Valid steps: {', '.join(cfg.workflow_columns)}") + sys.exit(1) + + step_name = " ".join(args) + if step_name not in cfg.step_by_name: + print(f"ERROR: Unknown step '{step_name}'.") + print(f"Valid steps: {', '.join(cfg.workflow_columns)}") + sys.exit(1) + + rows = load_csv() + matches = [ + row["Integration ID"] + for row in rows + if (cur := current_step(row)) is not None and cur.name == step_name + ] + + if matches: + print(f"\nIntegrations currently at step '{step_name}' ({len(matches)}):") + for name in matches: + print(f" - {name}") + else: + print(f"No integrations are currently at step '{step_name}'.") + + +def cmd_list(_args: list[str]) -> None: + rows = load_csv() + for row in rows: + print(row.get("Integration ID", "")) + + +def cmd_list_by_assignee(args: list[str]) -> None: + if not args: + print("Usage: workflow_state.py list-by-assignee ") + sys.exit(1) + assignee_name = " ".join(args) + rows = load_csv() + from workflow_state.api import list_by_assignee + matches = list_by_assignee(rows, assignee_name) + print(format_by_assignee(matches, assignee_name)) + + +def cmd_list_by_connector(args: list[str]) -> None: + if not args: + print("Usage: workflow_state.py list-by-connector ") + sys.exit(1) + + connector_id = " ".join(args) + rows = load_csv() + from workflow_state.api import list_by_connector + matches = list_by_connector(rows, connector_id) + + if not matches: + print(f"No integrations found for connector '{connector_id}'.") + print(" Tip: run 'workflow_state.py list-connectors' to see all known Connector IDs.") + return + + print(f"\nIntegrations in connector '{connector_id}' ({len(matches)}):") + for row in matches: + integration_id = row.get("Integration ID", "") + assignee = row.get("assignee", "").strip() or 
"unassigned" + step_display = format_step_for_listing(row) + print(f" - {integration_id} [assignee: {assignee}] → {step_display}") + + +def cmd_list_connectors(_args: list[str]) -> None: + rows = load_csv() + + buckets: dict[str, dict] = {} + for row in rows: + cid_raw = row.get("Connector ID", "").strip() + if not cid_raw: + continue + key = cid_raw.lower() + bucket = buckets.setdefault( + key, + {"display": cid_raw, "rows": []}, + ) + bucket["rows"].append(row) + + if not buckets: + print("No connectors found in the CSV.") + return + + sorted_keys = sorted(buckets.keys(), key=lambda k: buckets[k]["display"].lower()) + + max_id_len = max(len(buckets[k]["display"]) for k in sorted_keys) + id_col_width = max(max_id_len, len("Connector ID")) + + header = ( + f"{'Connector ID':<{id_col_width}} {'Integrations':>12} " + f"{'In Progress':>11} {'Complete':>8}" + ) + rule = ( + f"{'-' * id_col_width} {'-' * 12} {'-' * 11} {'-' * 8}" + ) + print(header) + print(rule) + for key in sorted_keys: + bucket = buckets[key] + bucket_rows: list[dict[str, str]] = bucket["rows"] + total = len(bucket_rows) + in_progress = 0 + complete = 0 + for r in bucket_rows: + if not has_workflow_progress(r): + continue + if current_step(r) is None: + complete += 1 + else: + in_progress += 1 + print( + f"{bucket['display']:<{id_col_width}} {total:>12} " + f"{in_progress:>11} {complete:>8}" + ) + + +def cmd_show_step(args: list[str]) -> None: + cfg = get_config() + if len(args) < 2: + print("Usage: workflow_state.py show-step ") + print("\nValid columns:") + for col in cfg.identity_column_names: + print(f" - {col} (data)") + for col in cfg.workflow_columns: + print(f" - {col}") + sys.exit(1) + + name = args[0] + step = " ".join(args[1:]) + + rows = load_csv() + idx = find_row(rows, name) + if idx is None: + print(f"ERROR: Integration '{name}' not found.") + sys.exit(1) + + valid_steps = set(cfg.workflow_columns) | set(cfg.identity_column_names) + if step not in valid_steps: + print(f"ERROR: Unknown column '{step}' for integration '{rows[idx]['Integration ID']}'.") + print(f"Valid columns: {', '.join(sorted(valid_steps))}") + sys.exit(1) + + print(format_step_value(rows[idx], step)) + + +# --------------------------------------------------------------------------- +# files / auth-params +# --------------------------------------------------------------------------- + +def cmd_files(args: list[str]) -> None: + fmt = "text" + positional: list[str] = [] + for a in args: + if a.startswith("--format="): + fmt = a[len("--format="):] + else: + positional.append(a) + + if not positional: + print("Usage: workflow_state.py files [--format=text|json|paths]") + sys.exit(1) + + if fmt not in {"text", "json", "paths"}: + print(f"ERROR: Unknown --format value '{fmt}'. 
Valid: text, json, paths.", file=sys.stderr) + sys.exit(1) + + integration_id = " ".join(positional) + info = get_integration_files(integration_id) + + if "error" in info: + print(f"ERROR: {info['error']}", file=sys.stderr) + sys.exit(1) + + if fmt == "json": + print(json.dumps(info, indent=2)) + return + + if fmt == "paths": + for key in ("yml", "code", "description", "readme", "test"): + val = info.get(key) + if val: + print(val) + return + + name = info["integration_id"] + lines = [ + f"\n{'=' * 60}", + f" {name} — source files", + f"{'=' * 60}", + f" Directory: {info['directory']}", + f" Base: {info['base']}", + f" Language: {info['code_language'] if info['code_language'] else '(unknown)'}", + "", + f" YML: {info['yml'] if info['yml'] else '(missing)'}", + f" Code: {info['code'] if info['code'] else '(missing)'}", + f" Description: {info['description'] if info['description'] else '(missing)'}", + f" README: {info['readme'] if info['readme'] else '(missing)'}", + f" Test: {info['test'] if info['test'] else '(missing)'}", + ] + extras = info.get("extras") or {} + if extras: + lines.append("") + lines.append(" Other files in directory:") + for fname in sorted(extras.keys()): + lines.append(f" - {fname}") + print("\n".join(lines)) + + +def cmd_auth_params(args: list[str]) -> None: + fmt = "text" + positional: list[str] = [] + for a in args: + if a.startswith("--format="): + fmt = a[len("--format="):] + else: + positional.append(a) + + if not positional: + print( + "Usage: workflow_state.py auth-params " + "[--format=text|json]" + ) + sys.exit(1) + + if fmt not in {"text", "json"}: + print( + f"ERROR: Unknown --format value '{fmt}'. Valid: text, json.", + file=sys.stderr, + ) + sys.exit(1) + + integration_id = " ".join(positional) + try: + params = auth_param_ids(integration_id) + except WorkflowError as e: + print(f"ERROR: {e.message}", file=sys.stderr) + sys.exit(1) + + if fmt == "json": + print(json.dumps( + {"integration_id": integration_id, "params": params}, + indent=2, + )) + return + + for p in params: + print(p) + + +# --------------------------------------------------------------------------- +# next +# --------------------------------------------------------------------------- + +def _parse_next_flags(args: list[str]) -> tuple[Optional[str], bool, list[str]]: + """Parse `--connector ` and `--mine` out of args (order-independent).""" + connector_id: Optional[str] = None + mine = False + leftover: list[str] = [] + i = 0 + while i < len(args): + a = args[i] + if a == "--mine": + mine = True + i += 1 + continue + if a == "--connector": + if i + 1 >= len(args): + print("ERROR: --connector requires a connector id argument.") + sys.exit(1) + connector_id = args[i + 1] + i += 2 + continue + if a.startswith("--connector="): + connector_id = a[len("--connector="):] + i += 1 + continue + leftover.append(a) + i += 1 + return connector_id, mine, leftover + + +def cmd_next(args: list[str]) -> None: + rows = load_csv() + + if not rows: + print("(no rows in CSV — nothing to do)") + return + + connector_id, mine, leftover = _parse_next_flags(args) + + if leftover and leftover[0] != "--all" and connector_id is None and not mine: + name = " ".join(leftover) + idx = find_row(rows, name) + if idx is None: + print(f"ERROR: Integration '{name}' not found.") + sys.exit(1) + print(format_next_line(rows[idx])) + return + + show_all = bool(leftover and leftover[0] == "--all") + if show_all and (mine or connector_id is not None): + show_all = False + + target_assignee: Optional[str] = None + 
use_assignee_filter = (not show_all) and (mine or connector_id is None) + if use_assignee_filter: + target_assignee = _resolve_git_user_name() + if not target_assignee: + if connector_id is None: + print( + "ERROR: cannot determine current user via 'git config user.name'.\n" + " Pass an integration ID, or use 'next --all' to list everyone's work." + ) + sys.exit(1) + target_assignee = None + use_assignee_filter = False + + candidate_rows = rows + if connector_id is not None: + from workflow_state.api import list_by_connector + candidate_rows = list_by_connector(rows, connector_id) + if not candidate_rows: + print(f"No integrations found for connector '{connector_id}'.") + print( + " Tip: run 'workflow_state.py list-connectors' to see all known " + "Connector IDs." + ) + return + + matched_any = False + any_in_progress_in_connector = False + for row in candidate_rows: + if not has_workflow_progress(row): + continue + if current_step(row) is None: + continue + any_in_progress_in_connector = True + if use_assignee_filter: + if row.get("assignee", "").strip().lower() != (target_assignee or "").lower(): + continue + print(format_next_line(row)) + print() + matched_any = True + + if matched_any: + return + + if connector_id is not None and not any_in_progress_in_connector: + print( + f"No in-progress integrations in connector '{connector_id}' " + f"(all are either unstarted or done)." + ) + return + if connector_id is not None and use_assignee_filter: + print( + f"No in-progress integrations in connector '{connector_id}' " + f"assigned to '{target_assignee}'." + ) + return + if connector_id is not None: + print(f"No in-progress integrations in connector '{connector_id}'.") + return + if show_all: + print("No in-progress integrations.") + return + print(f"No in-progress integrations assigned to '{target_assignee}'.") + + +# --------------------------------------------------------------------------- +# Help & main dispatch +# --------------------------------------------------------------------------- + +_DOC = """Workflow State Machine for connectus-migration-pipeline.csv (UNIFIED 16-STEP MODEL) + +This script manages the workflow tracking columns in the CSV. The shape +of the workflow (steps, columns, markers) is declared in +connectus/workflow_state_config.yml. The runtime engine lives in the +connectus/workflow_state/ Python package. 
+ +Usage examples: + python3 connectus/workflow_state.py status "Cisco Spark" + python3 connectus/workflow_state.py dashboard + python3 connectus/workflow_state.py next + python3 connectus/workflow_state.py set-assignee "Cisco Spark" "John Doe" + python3 connectus/workflow_state.py set-auth "Cisco Spark" '' + python3 connectus/workflow_state.py markpass "Cisco Spark" "wrote/checked code" +""" + + +def cmd_help(_args: list[str]) -> None: + print(_DOC) + + +COMMANDS: dict[str, Callable[[list[str]], None]] = { + "status": cmd_status, + "status-all": cmd_status_all, + "dashboard": cmd_dashboard, + "next": cmd_next, + "set-assignee": cmd_set_assignee, + "set-auth": cmd_set_auth, + "set-params-to-commands": cmd_set_params_to_commands, + "set-params-for-test": cmd_set_params_for_test, + "set-shared-params": cmd_set_shared_params, + "set-auth-flag": cmd_set_auth_flag, + "markpass": cmd_markpass, + "skip": cmd_skip, + "fail": cmd_fail, + "reset-to": cmd_reset_to, + "reset": cmd_reset, + "wipe-workflow-data": cmd_wipe_workflow_data, + "at-step": cmd_at_step, + "list": cmd_list, + "list-by-assignee": cmd_list_by_assignee, + "list-by-connector": cmd_list_by_connector, + "list-connectors": cmd_list_connectors, + "set-assignee-by-connector": cmd_set_assignee_by_connector, + "show-step": cmd_show_step, + "files": cmd_files, + "auth-params": cmd_auth_params, + "help": cmd_help, +} + + +def main() -> None: + if len(sys.argv) < 2: + cmd_help([]) + sys.exit(1) + + command = sys.argv[1] + args = sys.argv[2:] + + if command not in COMMANDS: + print(f"ERROR: Unknown command '{command}'.") + print(f"Available commands: {', '.join(COMMANDS.keys())}") + sys.exit(1) + + COMMANDS[command](args) + + +# Re-exports for back-compat: tests and external callers can import +# `validate_auth_detail` / `validate_params_to_commands` from the module +# (both were CLI-side names in the legacy file). +__all__ = sorted({ + *COMMANDS.keys(), + "main", + "_set_json_data_step", + "_check_params_to_commands_overlap", + "_resolve_row_or_exit", + "_set_step_via_dispatch", + "_parse_next_flags", + "_git_user_name", + "validate_auth_detail", + "validate_params_to_commands", +}) diff --git a/connectus/workflow_state/config_loader.py b/connectus/workflow_state/config_loader.py new file mode 100644 index 00000000000..5309b1eab4b --- /dev/null +++ b/connectus/workflow_state/config_loader.py @@ -0,0 +1,596 @@ +"""YAML config loader for the workflow_state package. + +Reads ``connectus/workflow_state_config.yml``, validates the schema +described in ``workflow_state_DESIGN.md`` §5.2, and returns a fully +typed :class:`~workflow_state.types.WorkflowConfig` value. + +All errors are collected before the loader raises; callers see a single +:class:`~workflow_state.exceptions.ConfigLoadError` whose ``.errors`` +attribute lists every individual problem. 
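+
+A minimal usage sketch (illustrative; assumes the bundled YAML ships next to
+the package as described above)::
+
+    from workflow_state.config_loader import get_config
+
+    cfg = get_config()  # first call loads and caches the YAML config
+    print([step.name for step in cfg.steps])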
+""" +from __future__ import annotations + +import os +from typing import Any, Optional + +import yaml + +from workflow_state.exceptions import ConfigLoadError +from workflow_state.types import ( + IdentityColumn, + MarkerSet, + Step, + StepInteraction, + WorkflowConfig, +) +from workflow_state.validators import ( + is_known_cross_check, + known_cross_check_names, + known_validator_names, +) + + +SUPPORTED_SCHEMA_VERSIONS = (1,) +_VALID_STEP_KINDS = {"data", "checkpoint", "flag"} +_VALID_INTERACTION_KINDS = {"flag_auto_na_target"} + +_TOP_LEVEL_REQUIRED = {"schema_version", "identity_columns", "markers", "steps"} +_TOP_LEVEL_OPTIONAL = {"step_interactions"} +_TOP_LEVEL_ALLOWED = _TOP_LEVEL_REQUIRED | _TOP_LEVEL_OPTIONAL + +# Default path: connectus/workflow_state_config.yml, sitting next to the +# `connectus/` package directory. +_DEFAULT_CONFIG_PATH = os.path.join( + os.path.dirname(os.path.dirname(os.path.abspath(__file__))), + "workflow_state_config.yml", +) + + +# Module-level singleton cache. +_CACHED_CONFIG: Optional[WorkflowConfig] = None +_CACHED_PATH: Optional[str] = None + + +# --------------------------------------------------------------------------- +# Public API +# --------------------------------------------------------------------------- + +def default_config_path() -> str: + """Return the on-disk path to the bundled workflow config YAML.""" + return _DEFAULT_CONFIG_PATH + + +def load_config(path: Optional[str] = None) -> WorkflowConfig: + """Load and validate the YAML config; return a :class:`WorkflowConfig`. + + Args: + path: Optional explicit path. Defaults to the file shipped next + to the package (``connectus/workflow_state_config.yml``). + + Raises: + ConfigLoadError: When the file is missing, unparseable, or the + schema validation fails. ``ConfigLoadError.errors`` lists + every individual problem. + """ + global _CACHED_CONFIG, _CACHED_PATH + + resolved = os.path.abspath(path) if path else _DEFAULT_CONFIG_PATH + + if _CACHED_CONFIG is not None and _CACHED_PATH == resolved: + return _CACHED_CONFIG + + raw = _read_yaml(resolved) + config = _build_and_validate(raw, resolved) + + _CACHED_CONFIG = config + _CACHED_PATH = resolved + return config + + +def get_config() -> WorkflowConfig: + """Return the cached :class:`WorkflowConfig`. First call triggers load.""" + if _CACHED_CONFIG is None: + return load_config() + return _CACHED_CONFIG + + +def _reset_config_for_testing() -> None: + """Clear the singleton cache. Tests use this between fixture YAMLs.""" + global _CACHED_CONFIG, _CACHED_PATH + _CACHED_CONFIG = None + _CACHED_PATH = None + + +# --------------------------------------------------------------------------- +# Internals +# --------------------------------------------------------------------------- + +def _read_yaml(path: str) -> dict: + """Open the YAML file and return the parsed top-level mapping. + + Wraps both file-not-found and YAML parse errors in + :class:`ConfigLoadError` with the full path embedded. 
+ """ + if not os.path.exists(path): + raise ConfigLoadError( + f"workflow config file not found: {path}", + errors=[f"workflow config file not found: {path}"], + ) + try: + with open(path, "r", encoding="utf-8") as f: + data = yaml.safe_load(f) + except yaml.YAMLError as e: + msg = f"YAML parse error in {path}: {e}" + raise ConfigLoadError(msg, errors=[msg]) + + if data is None: + msg = f"workflow config file is empty: {path}" + raise ConfigLoadError(msg, errors=[msg]) + + if not isinstance(data, dict): + msg = f"workflow config root must be a mapping; got {type(data).__name__} in {path}" + raise ConfigLoadError(msg, errors=[msg]) + + return data + + +def _raise_if_errors(errors: list[str], path: str) -> None: + if not errors: + return + summary = ( + f"{path} has {len(errors)} problem(s):\n" + + "\n".join(f" - {e}" for e in errors) + ) + raise ConfigLoadError(summary, errors=list(errors)) + + +def _build_and_validate(raw: dict, path: str) -> WorkflowConfig: + """Validate the raw mapping and assemble a :class:`WorkflowConfig`.""" + errors: list[str] = [] + + # ---- Top-level shape ------------------------------------------------ + raw_keys = set(raw.keys()) + missing_top = _TOP_LEVEL_REQUIRED - raw_keys + extra_top = raw_keys - _TOP_LEVEL_ALLOWED + for k in sorted(missing_top): + errors.append(f"missing required top-level key: {k!r}") + for k in sorted(extra_top): + errors.append(f"unknown top-level key: {k!r}") + + # If the top-level shape is broken, bail early so we don't crash + # below trying to access missing keys. + if missing_top: + _raise_if_errors(errors, path) + + # ---- schema_version ------------------------------------------------- + schema_version = raw.get("schema_version") + if not isinstance(schema_version, int): + errors.append( + f"schema_version must be an int; got {type(schema_version).__name__}" + ) + elif schema_version not in SUPPORTED_SCHEMA_VERSIONS: + errors.append( + f"unsupported schema_version: {schema_version} " + f"(supported: {list(SUPPORTED_SCHEMA_VERSIONS)})" + ) + + # ---- identity_columns ---------------------------------------------- + identity_columns, ic_errors = _build_identity_columns(raw.get("identity_columns")) + errors.extend(ic_errors) + + # ---- markers -------------------------------------------------------- + markers, marker_errors = _build_markers(raw.get("markers")) + errors.extend(marker_errors) + + # ---- steps ---------------------------------------------------------- + steps, step_errors = _build_steps(raw.get("steps")) + errors.extend(step_errors) + + # Cross-validation: identity column names must not collide with step names. 
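+    # (With the bundled config, for example, a step named "Integration ID"
+    # would collide with the identity column of the same name and be rejected.)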
+ if identity_columns is not None and steps is not None: + ic_names = {c.name for c in identity_columns} + for s in steps: + if s.name in ic_names: + errors.append( + f"steps[{s.index}] ({s.name!r}): step name collides " + f"with an identity_columns entry" + ) + + # ---- step_interactions --------------------------------------------- + interactions, inter_errors = _build_interactions( + raw.get("step_interactions"), + steps, + markers, + ) + errors.extend(inter_errors) + + _raise_if_errors(errors, path) + + return WorkflowConfig( + schema_version=schema_version, + identity_columns=tuple(identity_columns), + markers=markers, + steps=tuple(steps), + step_interactions=tuple(interactions), + ) + + +def _build_identity_columns( + raw: Any, +) -> tuple[Optional[list[IdentityColumn]], list[str]]: + errors: list[str] = [] + if not isinstance(raw, list) or not raw: + errors.append("identity_columns must be a non-empty list") + return None, errors + + out: list[IdentityColumn] = [] + seen: set[str] = set() + for i, item in enumerate(raw): + if not isinstance(item, dict): + errors.append( + f"identity_columns[{i}] must be a mapping; got " + f"{type(item).__name__}" + ) + continue + name = item.get("name") + description = item.get("description", "") + if not isinstance(name, str) or not name.strip(): + errors.append( + f"identity_columns[{i}].name must be a non-empty string" + ) + continue + if name in seen: + errors.append( + f"identity_columns[{i}].name {name!r} is duplicated" + ) + seen.add(name) + if description is None: + description = "" + if not isinstance(description, str): + errors.append( + f"identity_columns[{i}].description must be a string" + ) + description = "" + out.append(IdentityColumn(name=name, description=description)) + + if not out and not errors: + errors.append("identity_columns must contain at least one valid entry") + return (out if out else None), errors + + +def _build_markers(raw: Any) -> tuple[Optional[MarkerSet], list[str]]: + errors: list[str] = [] + if not isinstance(raw, dict): + errors.append("markers must be a mapping") + return None, errors + + required_keys = {"check", "fail", "na", "checkpoint_done_values", "flag_values"} + missing = required_keys - set(raw.keys()) + for k in sorted(missing): + errors.append(f"markers.{k}: missing required key") + if missing: + return None, errors + + check = raw["check"] + fail = raw["fail"] + na = raw["na"] + done_values = raw["checkpoint_done_values"] + flag_values = raw["flag_values"] + + for field_name, value in (("check", check), ("fail", fail), ("na", na)): + if not isinstance(value, str) or not value: + errors.append(f"markers.{field_name} must be a non-empty string") + + if not isinstance(done_values, list) or not done_values: + errors.append("markers.checkpoint_done_values must be a non-empty list of strings") + return None, errors + for i, v in enumerate(done_values): + if not isinstance(v, str) or not v: + errors.append( + f"markers.checkpoint_done_values[{i}] must be a non-empty string" + ) + + if not isinstance(flag_values, list) or not flag_values: + errors.append("markers.flag_values must be a non-empty list of strings") + return None, errors + if len(set(flag_values)) != len(flag_values): + errors.append("markers.flag_values must contain unique values") + for i, v in enumerate(flag_values): + if not isinstance(v, str) or not v: + errors.append(f"markers.flag_values[{i}] must be a non-empty string") + + if isinstance(check, str) and check not in done_values: + errors.append( + f"markers.checkpoint_done_values: 
missing required value " + f"{check!r} (markers.check)" + ) + if isinstance(na, str) and na not in done_values: + errors.append( + f"markers.checkpoint_done_values: missing required value " + f"{na!r} (markers.na)" + ) + + if errors: + return None, errors + + return ( + MarkerSet( + check=check, + fail=fail, + na=na, + checkpoint_done_values=tuple(done_values), + flag_values=tuple(flag_values), + ), + errors, + ) + + +def _build_steps(raw: Any) -> tuple[Optional[list[Step]], list[str]]: + errors: list[str] = [] + if not isinstance(raw, list) or not raw: + errors.append("steps must be a non-empty list") + return None, errors + + out: list[Step] = [] + seen_names: set[str] = set() + seen_setters: set[str] = set() + for idx, item in enumerate(raw, start=1): + if not isinstance(item, dict): + errors.append(f"steps[{idx}] must be a mapping; got {type(item).__name__}") + continue + step, step_errors = _build_one_step(idx, item) + errors.extend(step_errors) + if step is None: + continue + if step.name in seen_names: + errors.append(f"steps[{idx}] ({step.name!r}): duplicate step name") + seen_names.add(step.name) + if step.setter is not None: + if step.setter in seen_setters: + errors.append( + f"steps[{idx}] ({step.name!r}): duplicate setter " + f"{step.setter!r} (already used by an earlier step)" + ) + seen_setters.add(step.setter) + out.append(step) + + if not out and not errors: + errors.append("steps must contain at least one valid entry") + return (out if out else None), errors + + +def _coerce_validator_name(item: Any, field_label: str, errors: list[str]) -> Optional[str]: + """Accept either ``{"validator": "name"}`` (preferred per Q3) or a bare string.""" + if item is None: + return None + if isinstance(item, str): + return item + if isinstance(item, dict): + if "validator" in item and isinstance(item["validator"], str): + return item["validator"] + errors.append( + f"{field_label}: dict form must contain a 'validator' key with " + f"a string value (e.g. 
{{'validator': 'auth_details'}})" + ) + return None + errors.append( + f"{field_label}: must be a string or {{'validator': ''}} mapping" + ) + return None + + +def _build_one_step(index: int, item: dict) -> tuple[Optional[Step], list[str]]: + errors: list[str] = [] + name = item.get("name") + if not isinstance(name, str) or not name.strip(): + errors.append(f"steps[{index}].name must be a non-empty string") + return None, errors + + label = f"steps[{index}] ({name!r})" + + kind = item.get("kind") + if kind not in _VALID_STEP_KINDS: + errors.append( + f"{label}.kind must be one of {sorted(_VALID_STEP_KINDS)}; got {kind!r}" + ) + return None, errors + + optional = item.get("optional") + if not isinstance(optional, bool): + errors.append(f"{label}.optional must be a bool; got {type(optional).__name__}") + optional = False + + description = item.get("description") + if not isinstance(description, str) or not description.strip(): + errors.append(f"{label}.description must be a non-empty string") + description = "" + + setter = item.get("setter", None) + if kind in ("data", "flag"): + if not isinstance(setter, str) or not setter.strip(): + errors.append( + f"{label}.setter must be a non-empty string for kind={kind!r}; " + f"got {setter!r}" + ) + setter = None + elif kind == "checkpoint": + if setter is not None: + errors.append( + f"{label}.setter must be null/absent for kind=checkpoint; " + f"got {setter!r}" + ) + setter = None + + cascade_on_set = item.get("cascade_on_set", True) + if not isinstance(cascade_on_set, bool): + errors.append( + f"{label}.cascade_on_set must be a bool; got {type(cascade_on_set).__name__}" + ) + cascade_on_set = True + + preserve_on_reset = item.get("preserve_on_reset", False) + if not isinstance(preserve_on_reset, bool): + errors.append( + f"{label}.preserve_on_reset must be a bool; got " + f"{type(preserve_on_reset).__name__}" + ) + preserve_on_reset = False + + json_schema_name = _coerce_validator_name( + item.get("json_schema"), f"{label}.json_schema", errors + ) + if json_schema_name is not None and json_schema_name not in known_validator_names(): + errors.append( + f"{label}.json_schema: unknown validator name {json_schema_name!r}; " + f"valid: {known_validator_names()}" + ) + json_schema_name = None + + cross_check_name = _coerce_validator_name( + item.get("cross_check"), f"{label}.cross_check", errors + ) + if cross_check_name is not None and not is_known_cross_check(cross_check_name): + errors.append( + f"{label}.cross_check: unknown cross_check name {cross_check_name!r}; " + f"valid: {known_cross_check_names()}" + ) + cross_check_name = None + + if errors and (kind not in _VALID_STEP_KINDS or not name): + return None, errors + + return ( + Step( + index=index, + name=name, + kind=kind, + optional=optional, + setter=setter, + description=description, + cascade_on_set=cascade_on_set, + json_schema=json_schema_name, + cross_check=cross_check_name, + preserve_on_reset=preserve_on_reset, + ), + errors, + ) + + +def _build_interactions( + raw: Any, + steps: Optional[list[Step]], + markers: Optional[MarkerSet], +) -> tuple[list[StepInteraction], list[str]]: + errors: list[str] = [] + out: list[StepInteraction] = [] + if raw is None: + return out, errors + if not isinstance(raw, list): + errors.append("step_interactions must be a list (or omitted)") + return out, errors + + by_name = {s.name: s for s in (steps or [])} + flag_values = set(markers.flag_values) if markers else set() + done_values = set(markers.checkpoint_done_values) if markers else set() + 
seen_when_steps: set[str] = set() + + for i, item in enumerate(raw): + if not isinstance(item, dict): + errors.append( + f"step_interactions[{i}] must be a mapping; got {type(item).__name__}" + ) + continue + kind = item.get("kind") + if kind not in _VALID_INTERACTION_KINDS: + errors.append( + f"step_interactions[{i}].kind must be one of " + f"{sorted(_VALID_INTERACTION_KINDS)}; got {kind!r}" + ) + continue + + when_step = item.get("when_step") + target_step = item.get("target_step") + when_value_in = item.get("when_value_in") + write_value = item.get("write_value") + + # when_step + if not isinstance(when_step, str) or not when_step.strip(): + errors.append( + f"step_interactions[{i}].when_step must be a non-empty string" + ) + continue + when_obj = by_name.get(when_step) + if when_obj is None: + errors.append( + f"step_interactions[{i}].when_step references unknown step " + f"{when_step!r}" + ) + elif when_obj.kind != "flag": + errors.append( + f"step_interactions[{i}].when_step ({when_step!r}) must be " + f"kind=flag; got kind={when_obj.kind!r}" + ) + + # target_step + if not isinstance(target_step, str) or not target_step.strip(): + errors.append( + f"step_interactions[{i}].target_step must be a non-empty string" + ) + continue + target_obj = by_name.get(target_step) + if target_obj is None: + errors.append( + f"step_interactions[{i}].target_step references unknown step " + f"{target_step!r}" + ) + elif target_obj.kind != "checkpoint": + errors.append( + f"step_interactions[{i}].target_step ({target_step!r}) must " + f"be kind=checkpoint; got kind={target_obj.kind!r}" + ) + + # when_value_in + if not isinstance(when_value_in, list) or not when_value_in: + errors.append( + f"step_interactions[{i}].when_value_in must be a non-empty list" + ) + when_value_in_clean: tuple[str, ...] = () + else: + when_value_in_clean = tuple(when_value_in) + for v in when_value_in: + if v not in flag_values: + errors.append( + f"step_interactions[{i}].when_value_in: value {v!r} " + f"is not in markers.flag_values" + ) + + # write_value + if not isinstance(write_value, str) or not write_value: + errors.append( + f"step_interactions[{i}].write_value must be a non-empty string" + ) + elif write_value not in done_values: + errors.append( + f"step_interactions[{i}].write_value: {write_value!r} is not " + f"in markers.checkpoint_done_values" + ) + + if kind == "flag_auto_na_target" and isinstance(when_step, str): + if when_step in seen_when_steps: + errors.append( + f"step_interactions[{i}]: duplicate flag_auto_na_target " + f"for when_step {when_step!r} (only one is allowed per source)" + ) + seen_when_steps.add(when_step) + + out.append( + StepInteraction( + kind=kind, + when_step=when_step, + when_value_in=when_value_in_clean, + target_step=target_step, + write_value=write_value if isinstance(write_value, str) else "", + ) + ) + + return out, errors diff --git a/connectus/workflow_state/csv_io.py b/connectus/workflow_state/csv_io.py new file mode 100644 index 00000000000..9e3ed9337c3 --- /dev/null +++ b/connectus/workflow_state/csv_io.py @@ -0,0 +1,294 @@ +"""CSV I/O for the workflow_state package. + +Reads and writes the bundled ``connectus/connectus-migration-pipeline.csv`` +file. Both read and write paths normalize each row via +:func:`~workflow_state.state_machine.normalize_row` so contradictory +"value-past-incomplete-step" cells get cleaned up automatically. + +Per Q4 (design overrides): ``CSV_PATH`` stays hardcoded here — it is +not driven by YAML config. 
+ +The module reads ``CSV_PATH`` from the package namespace at call time +so that tests using ``monkeypatch.setattr(workflow_state, "CSV_PATH", ...)`` +work transparently. Same for ``os`` (some tests patch ``os.replace``). +""" +from __future__ import annotations + +import csv +import io +import os as _os_module +import sys +import tempfile +from typing import Optional + +from workflow_state.config_loader import get_config +from workflow_state.state_machine import _normalize_rows_with_warning + + +# This file is connectus/workflow_state/csv_io.py — go up TWO dirs to +# reach the workspace root, matching the legacy module's BASE_DIR. +BASE_DIR = _os_module.path.dirname( + _os_module.path.dirname(_os_module.path.dirname(_os_module.path.abspath(__file__))) +) +CSV_PATH = _os_module.path.join(BASE_DIR, "connectus", "connectus-migration-pipeline.csv") + +# Re-exposed for monkey-patch parity with the legacy module +# (tests do ``monkeypatch.setattr(workflow_state.os, "replace", _boom)``). +os = _os_module + + +def _csv_path() -> str: + """Look up CSV_PATH from the package namespace at call time.""" + import workflow_state as _ws + return _ws.CSV_PATH + + +def _os() -> object: + """Look up the ``os`` module via the package namespace.""" + import workflow_state as _ws + return _ws.os + + +def load_csv() -> list[dict[str, str]]: + """Load the CSV and return list of row dicts. Normalizes on read.""" + cfg = get_config() + expected = cfg.all_columns + csv_path = _csv_path() + with open(csv_path, "r", encoding="utf-8") as f: + reader = csv.DictReader(f) + fieldnames = reader.fieldnames or [] + if fieldnames != expected: + missing = [c for c in expected if c not in fieldnames] + extra = [c for c in fieldnames if c not in expected] + print( + "WARNING: CSV header does not match expected schema.\n" + f" Expected {len(expected)} columns, got {len(fieldnames)}.\n" + f" Missing: {missing}\n" + f" Extra: {extra}", + file=sys.stderr, + ) + rows = list(reader) + + _normalize_rows_with_warning(rows, context="loaded") + return rows + + +def save_csv(rows: list[dict[str, str]]) -> None: + """Write rows back to CSV atomically. Normalizes on write.""" + if not rows: + return + + _normalize_rows_with_warning(rows, context="saved") + + fieldnames = list(rows[0].keys()) + + output = io.StringIO() + writer = csv.DictWriter( + output, + fieldnames=fieldnames, + quoting=csv.QUOTE_MINIMAL, + lineterminator="\n", + ) + writer.writeheader() + writer.writerows(rows) + + csv_path = _csv_path() + os_mod = _os() + target_dir = os_mod.path.dirname(csv_path) or "." + tmp_path: Optional[str] = None + try: + with tempfile.NamedTemporaryFile( + mode="w", + encoding="utf-8", + dir=target_dir, + prefix=".connectus-migration-pipeline.", + suffix=".tmp", + delete=False, + ) as tmp: + tmp_path = tmp.name + tmp.write(output.getvalue()) + os_mod.replace(tmp_path, csv_path) + tmp_path = None + finally: + if tmp_path is not None and os_mod.path.exists(tmp_path): + try: + os_mod.remove(tmp_path) + except OSError: + pass + + +def find_row(rows: list[dict[str, str]], integration_id: str) -> Optional[int]: + """Find a row by Integration ID (case-insensitive). 
Returns index or None.""" + name_lower = integration_id.lower().strip() + for i, row in enumerate(rows): + if row.get("Integration ID", "").strip().lower() == name_lower: + return i + return None + + +# --------------------------------------------------------------------------- +# Destructive helpers +# --------------------------------------------------------------------------- + +def wipe_workflow_data( + *, + confirm: bool = False, + backup: bool = True, +) -> dict[str, object]: + """⚠️ DESTRUCTIVE: wipe all workflow columns from the pipeline CSV. + + For each existing data row, preserves every identity column verbatim + (``Integration ID``, ``Integration File Path``, ``Connector ID`` and + any future identity columns declared in + :data:`connectus/workflow_state_config.yml`) and clears every + workflow column to the empty string. The header is regenerated from + the YAML config so it always matches the current workflow plan + (i.e. running this after adding/removing/renaming a step in the + YAML re-aligns the CSV columns to that plan). + + This is intended for the rare case where the workflow plan changes + shape and you want to keep the integration roster but throw away + every per-row state cell. **Do not use this to "reset" a single + integration** — use the ``reset`` CLI command for that. + + Parameters + ---------- + confirm: + Must be ``True`` or this function raises :class:`RuntimeError` + without touching disk. This is a guardrail so accidental + callers (typos, misconfigured scripts, an LLM auto-completing) + cannot blow the file away. + backup: + When ``True`` (default), write a sibling backup file at + ``.bak.`` (preserving the *current* CSV + contents) before rewriting. The backup path is returned in the + result dict. + + Returns + ------- + dict + A summary of what changed:: + + { + "csv_path": str, # path that was rewritten + "backup_path": str | None, # backup copy, if backup=True + "rows": int, # data row count (preserved) + "header": list[str], # YAML-derived header written + "cells_cleared": int, # workflow cells that had data + "rows_touched": int, # rows whose workflow cols had data + } + + Raises + ------ + RuntimeError + If ``confirm`` is not ``True``. + FileNotFoundError + If the pipeline CSV does not exist. + """ + if confirm is not True: + raise RuntimeError( + "wipe_workflow_data() refused to run: pass confirm=True to opt in. " + "This call would have erased every workflow cell from " + "the connectus pipeline CSV." + ) + + cfg = get_config() + expected_header = list(cfg.all_columns) + identity_cols = list(cfg.identity_column_names) + n_workflow = len(cfg.workflow_columns) + + csv_path = _csv_path() + if not _os().path.exists(csv_path): + raise FileNotFoundError(csv_path) + + # Read the existing rows. Use the raw csv reader rather than the + # package's normalising load_csv() because we want to preserve + # identity columns even if the on-disk header has drifted from the + # current YAML schema (which is the most common reason this + # function is called). + with open(csv_path, "r", encoding="utf-8", newline="") as f: + reader = csv.reader(f) + try: + old_header = next(reader) + except StopIteration as e: # pragma: no cover - empty file + raise RuntimeError(f"{csv_path} is empty; nothing to wipe") from e + old_rows = [list(r) for r in reader] + + # Index identity columns by name in the OLD header so we can carry + # them forward by name (not by position). Missing identity columns + # become empty strings (caller will see them in the diagnostics). 
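+    # (For example, if "Connector ID" sits at a different position in the
+    # on-disk header than the YAML expects, it is still carried over because
+    # the lookup below is by name, not by position.)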
+ old_idx = {name: i for i, name in enumerate(old_header)} + workflow_old_indices = [ + old_idx[name] for name in cfg.workflow_columns if name in old_idx + ] + + cells_cleared = 0 + rows_touched = 0 + for r in old_rows: + had_data = False + for i in workflow_old_indices: + if i < len(r) and r[i].strip(): + cells_cleared += 1 + had_data = True + if had_data: + rows_touched += 1 + + # Optional sibling backup of the *current* file before rewrite. + backup_path: Optional[str] = None + if backup: + ts = int(__import__("time").time()) + backup_path = f"{csv_path}.bak.{ts}" + # Use shutil.copy2 to preserve mode/mtime, similar to `cp`. + import shutil + shutil.copy2(csv_path, backup_path) + + # Build the new rows: identity columns by name, then 16 empty cells. + new_rows: list[list[str]] = [] + for r in old_rows: + ident = [ + (r[old_idx[name]] if name in old_idx and old_idx[name] < len(r) else "") + for name in identity_cols + ] + new_rows.append(ident + [""] * n_workflow) + + # Atomic write via a tempfile sibling, mirroring save_csv()'s pattern. + output = io.StringIO() + writer = csv.writer( + output, + quoting=csv.QUOTE_MINIMAL, + lineterminator="\n", + ) + writer.writerow(expected_header) + writer.writerows(new_rows) + + os_mod = _os() + target_dir = os_mod.path.dirname(csv_path) or "." + tmp_path: Optional[str] = None + try: + with tempfile.NamedTemporaryFile( + mode="w", + encoding="utf-8", + dir=target_dir, + prefix=".connectus-migration-pipeline.", + suffix=".tmp", + delete=False, + ) as tmp: + tmp_path = tmp.name + tmp.write(output.getvalue()) + os_mod.replace(tmp_path, csv_path) + tmp_path = None + finally: + if tmp_path is not None and os_mod.path.exists(tmp_path): + try: + os_mod.remove(tmp_path) + except OSError: + pass + + return { + "csv_path": csv_path, + "backup_path": backup_path, + "rows": len(new_rows), + "header": expected_header, + "cells_cleared": cells_cleared, + "rows_touched": rows_touched, + } diff --git a/connectus/workflow_state/display.py b/connectus/workflow_state/display.py new file mode 100644 index 00000000000..11b9a2455d0 --- /dev/null +++ b/connectus/workflow_state/display.py @@ -0,0 +1,233 @@ +"""Pretty-print and rendering helpers for workflow_state.""" +from __future__ import annotations + +import json +from typing import Optional + +from workflow_state.config_loader import get_config +from workflow_state.state_machine import ( + current_step, + has_workflow_progress, + is_done, +) +from workflow_state.types import Step + + +def _summary_value(step: Step, raw: str) -> str: + """Short inline display for status output.""" + cfg = get_config() + val = raw.strip() + if not val: + if step.kind == "checkpoint": + return "⬜" + return "(not set)" + if step.kind == "data" and step.name in cfg.json_valued_columns: + if len(val) > 60: + return f"{val[:57]}… (set; show-step for full)" + return val + return val + + +def _auth_other_connection_summary(raw: str) -> str: + """One-line ``other_connection`` summary for an Auth Details JSON blob.""" + val = raw.strip() + if not val: + return "(not set)" + try: + parsed = json.loads(val) + except json.JSONDecodeError: + return "(invalid JSON — cannot extract other_connection)" + if not isinstance(parsed, dict): + return "(invalid Auth Details object)" + if "other_connection" not in parsed: + return "(not set — re-run set-auth)" + oc = parsed["other_connection"] + if not isinstance(oc, list): + return f"(malformed: expected list, got {type(oc).__name__})" + if not oc: + return "[] (none)" + return json.dumps(oc) + + +def 
format_status(row: dict[str, str]) -> str: + """Format the workflow status of a single integration.""" + cfg = get_config() + integration_id = row.get("Integration ID", "") + cur = current_step(row) + done_count = sum(1 for s in cfg.steps if is_done(row, s)) + total = len(cfg.steps) + + lines = [ + f"\n{'=' * 60}", + f" {integration_id}", + f"{'=' * 60}", + ] + + file_path = row.get("Integration File Path", "").strip() + connector_id = row.get("Connector ID", "").strip() + assignee = row.get("assignee", "").strip() + + lines.append(f" Assignee: {assignee if assignee else '(unassigned)'}") + lines.append(f" File Path: {file_path if file_path else '(not set)'}") + if file_path: + lines.append( + f" (run 'workflow_state.py files {integration_id}' " + f"to list all source files)" + ) + lines.append(f" Connector ID: {connector_id if connector_id else '(not set)'}") + lines.append("") + + lines.append(f" Workflow ([{done_count}/{total}]):") + lines.append(" " + "-" * 40) + for step in cfg.steps: + marker = " " + if cur is not None and step.index == cur.index: + marker = "▶" + raw = row.get(step.name, "") + display = _summary_value(step, raw) + lines.append(f" {marker}{step.index:2d}. {step.name:38s} : {display}") + if step.name == "Auth Details" and raw.strip(): + oc_summary = _auth_other_connection_summary(raw) + lines.append(f" {'other_connection':38s} : {oc_summary}") + + lines.append("") + if cur is None: + if has_workflow_progress(row): + lines.append(f" 🎉 All {total} steps complete!") + else: + lines.append(" ⏳ Not started") + else: + verb = cur.setter or "markpass" + lines.append(f" ➡️ Current step: #{cur.index} {cur.name} (run: {verb})") + + return "\n".join(lines) + + +def format_dashboard_row(row: dict[str, str]) -> Optional[str]: + """Compact dashboard line. Returns None for not-started rows.""" + cfg = get_config() + if not has_workflow_progress(row): + return None + + integration_id = row.get("Integration ID", "") + cur = current_step(row) + done_count = sum(1 for s in cfg.steps if is_done(row, s)) + total = len(cfg.steps) + + bar = "".join("█" if is_done(row, s) else "░" for s in cfg.steps) + status = cur.name if cur is not None else "✅ DONE" + return f" {integration_id:45s} [{bar}] {done_count}/{total} → {status}" + + +def format_step_value(row: dict[str, str], step_name: str) -> str: + """Pretty-print the value at ``step_name`` for ``row``.""" + cfg = get_config() + name = row.get("Integration ID", "") + raw = row.get(step_name, "") + value = raw.strip() + + header = ( + f"\n{'=' * 60}\n" + f" {name} — {step_name}\n" + f"{'=' * 60}" + ) + + if not value: + return f"{header}\n (not set)" + + if step_name in cfg.json_valued_columns: + try: + parsed = json.loads(value) + pretty = json.dumps(parsed, indent=2, sort_keys=False) + if ( + step_name == "Auth Details" + and isinstance(parsed, dict) + and "other_connection" not in parsed + ): + pretty += ( + "\n\n other_connection: (not set — re-run set-auth)" + ) + return f"{header}\n{pretty}" + except json.JSONDecodeError: + return f"{header}\n {value}" + + return f"{header}\n {value}" + + +def format_by_assignee(rows: list[dict[str, str]], assignee_name: str) -> str: + if not rows: + return f"No integrations found for assignee '{assignee_name}'." 
+ + lines = [f"\nIntegrations assigned to '{assignee_name}' ({len(rows)}):"] + for row in rows: + name = row.get("Integration ID", "") + if not has_workflow_progress(row): + step_display = "not started" + else: + cur = current_step(row) + step_display = cur.name if cur is not None else "✅ DONE" + lines.append(f" - {name:45s} → {step_display}") + return "\n".join(lines) + + +# --------------------------------------------------------------------------- +# `next` command formatting +# --------------------------------------------------------------------------- + +def _example_value_for(step: Step) -> str: + """Return a canonical example value for the example CLI line.""" + cfg = get_config() + if step.kind == "data" and step.name in cfg.json_valued_columns: + if step.name == "Auth Details": + return ("'{\"auth_types\":[],\"config\":\"NoneRequired\"," + "\"other_connection\":[]}'") + if step.name == "Params for test with default in code": + return "'[]'" + if step.name == "Params same in other handlers": + return "'[]'" + return "'{}'" + if step.name == "assignee": + return '""' + if step.kind == "flag": + return "YES" + return "" + + +def format_next_line(row: dict[str, str]) -> str: + """Format the literal next action for a row.""" + cfg = get_config() + integration_id = row.get("Integration ID", "") + cur = current_step(row) + if cur is None: + return f"{integration_id} — all {len(cfg.steps)} steps complete. 🎉" + + lines = [f"{integration_id} — step {cur.index} of {len(cfg.steps)}: {cur.name}"] + if cur.setter: + example = _example_value_for(cur) + cmd = (f"python3 connectus/workflow_state.py {cur.setter} " + f"\"{integration_id}\" {example}".rstrip()) + lines.append(f" Run: {cmd}") + if cur.optional: + lines.append( + f" Or: python3 connectus/workflow_state.py skip " + f"\"{integration_id}\" \"{cur.name}\"" + ) + else: + lines.append( + f" Run: python3 connectus/workflow_state.py markpass " + f"\"{integration_id}\" \"{cur.name}\"" + ) + lines.append(f" About: {cur.description}") + return "\n".join(lines) + + +def format_step_for_listing(row: dict[str, str]) -> str: + """Return the user-facing step display: 'not started' / step name / '✅ DONE'.""" + if not has_workflow_progress(row): + return "not started" + cur = current_step(row) + return cur.name if cur is not None else "✅ DONE" + + +# Legacy alias (private name used internally). +_format_step_for_listing = format_step_for_listing diff --git a/connectus/workflow_state/exceptions.py b/connectus/workflow_state/exceptions.py new file mode 100644 index 00000000000..a3ec59a7955 --- /dev/null +++ b/connectus/workflow_state/exceptions.py @@ -0,0 +1,34 @@ +"""Custom exceptions for the workflow_state package.""" +from __future__ import annotations + + +class WorkflowError(Exception): + """User-facing workflow violation. Caller prints `.message` and exits 1. + + Preserved verbatim from the legacy ``workflow_state.py`` module so that + external consumers (notably ``connectus/check_command_params.py``) + that catch this exception continue to work after the refactor. + """ + + def __init__(self, message: str) -> None: + super().__init__(message) + self.message = message + + +class ConfigLoadError(Exception): + """Raised by the YAML config loader when the config is missing, + malformed, or fails schema validation. + + Mirrors :class:`auth_config_parser.AuthConfigParseError`: collects + every individual problem in ``.errors`` so the caller can see all of + them in one pass instead of fixing them one at a time. 
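+
+    Illustrative handling sketch (assumes ``load_config`` is imported from
+    ``workflow_state.config_loader``)::
+
+        try:
+            cfg = load_config()
+        except ConfigLoadError as exc:
+            for problem in exc.errors:
+                print(f"  - {problem}")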
+ + Attributes: + message: Human-readable summary (also the ``str(exc)`` value). + errors: List of individual error strings (>=1). + """ + + def __init__(self, message: str, errors: list[str] | None = None) -> None: + super().__init__(message) + self.message = message + self.errors = errors or [message] diff --git a/connectus/workflow_state/state_machine.py b/connectus/workflow_state/state_machine.py new file mode 100644 index 00000000000..5db7bca736b --- /dev/null +++ b/connectus/workflow_state/state_machine.py @@ -0,0 +1,344 @@ +"""The cascade-reset engine and state predicates. + +All step-shape decisions (which step indices exist, which are flags, +which are checkpoints, which markers count as "done") flow from the +:class:`~workflow_state.types.WorkflowConfig` returned by +:func:`~workflow_state.config_loader.get_config`. The legacy module-level +constants (``STEPS``, ``CHECK``, …) are derived from it via +:mod:`workflow_state.api`. +""" +from __future__ import annotations + +import sys +from typing import Optional + +from workflow_state.config_loader import get_config +from workflow_state.exceptions import WorkflowError +from workflow_state.types import Step + + +# --------------------------------------------------------------------------- +# State predicates +# --------------------------------------------------------------------------- + +def is_checked(value: str) -> bool: + """Whether a checkpoint cell value represents 'done'. + + Q2 BREAKING CHANGE (2026-05): only the canonical values listed in + ``markers.checkpoint_done_values`` are accepted (default: ``"✅"`` + and ``"N/A"``). Historical aliases (``YES``, ``true``, ``True``, + ``done``, ``Done``, ``DONE``) are NO LONGER recognized. + """ + cfg = get_config() + return value.strip() in cfg.markers.checkpoint_done_values + + +def is_done(row: dict[str, str], step: Step) -> bool: + """The unified completion predicate for any step kind.""" + cfg = get_config() + val = row.get(step.name, "").strip() + if step.kind == "data": + return val != "" + if step.kind == "flag": + return val.upper() in set(cfg.markers.flag_values) + if step.kind == "checkpoint": + return is_checked(val) + raise AssertionError(f"Unknown step kind: {step.kind!r}") + + +def current_step(row: dict[str, str]) -> Optional[Step]: + """First step that is not yet done; ``None`` if every step is done.""" + cfg = get_config() + for step in cfg.steps: + if not is_done(row, step): + return step + return None + + +def get_current_step(row: dict[str, str]) -> Optional[str]: + """Legacy wrapper: returns the current step's name (or None).""" + s = current_step(row) + return s.name if s is not None else None + + +def get_step(name: str) -> Step: + """Look up a Step by name; raise :class:`WorkflowError` if unknown.""" + cfg = get_config() + step = cfg.step_by_name.get(name) + if step is None: + raise WorkflowError( + f"Unknown step: '{name}'.\n" + f" Valid steps:\n" + + "\n".join(f" {s.index:2d}. {s.name}" for s in cfg.steps) + ) + return step + + +def get_step_index(step_name: str) -> int: + """Return the 0-based index of a checkpoint step within + ``CHECKPOINT_COLUMNS`` (preserves old API for any external callers). + """ + cfg = get_config() + checkpoint_columns = cfg.checkpoint_columns + try: + return checkpoint_columns.index(step_name) + except ValueError: + raise ValueError( + f"Unknown checkpoint step: '{step_name}'. 
" + f"Valid steps: {', '.join(checkpoint_columns)}" + ) + + +# --------------------------------------------------------------------------- +# Cascade reset and normalization +# --------------------------------------------------------------------------- + +def reset_after( + row: dict[str, str], + step: Step, + *, + respect_preserve: bool = False, +) -> tuple[list[str], list[str]]: + """Clear every step strictly after ``step``. + + Args: + row: The integration row, mutated in place. + step: The pivot step. Steps with ``index > step.index`` are + candidates for clearing. + respect_preserve: When ``True``, candidate steps whose + ``preserve_on_reset`` flag is True are kept intact (their + names are returned in the second tuple element so the + caller can warn the user). When ``False`` (the default), + preserve flags are ignored and every later step is cleared. + This default keeps the legacy ``set-auth``/``apply_step_action`` + cascade behavior unchanged: auth changes invalidate every + downstream artifact and must wipe Params* too. + + Returns: + ``(cleared, preserved)`` — both lists of column names. ``cleared`` + contains every step that was non-empty before the call AND was + wiped; ``preserved`` lists every step that was non-empty AND was + kept due to ``respect_preserve=True``. Empty values are not + reported in either list. + """ + cfg = get_config() + cleared: list[str] = [] + preserved: list[str] = [] + for s in cfg.steps: + if s.index <= step.index: + continue + had_value = row.get(s.name, "") != "" + if respect_preserve and s.preserve_on_reset: + if had_value: + preserved.append(s.name) + continue + if had_value: + cleared.append(s.name) + row[s.name] = "" + return cleared, preserved + + +def normalize_row(row: dict[str, str]) -> list[str]: + """Auto-clear any value past the first incomplete step. + + Returns the list of column names that were cleared. The caller is + responsible for printing a stderr warning if the list is non-empty. + """ + cfg = get_config() + cleared: list[str] = [] + found_incomplete = False + for step in cfg.steps: + if not found_incomplete: + if not is_done(row, step): + found_incomplete = True + continue + if row.get(step.name, "").strip() != "": + cleared.append(step.name) + row[step.name] = "" + return cleared + + +def _normalize_rows_with_warning(rows: list[dict[str, str]], context: str) -> None: + """Normalize each row in place. 
Print one stderr warning per modified row.""" + for row in rows: + cleared = normalize_row(row) + if cleared: + integration_id = row.get("Integration ID", "") + print( + f"WARNING: normalized {context} row '{integration_id}': " + f"cleared columns {cleared} (values were past the first incomplete step).", + file=sys.stderr, + ) + + +# --------------------------------------------------------------------------- +# Unified dispatch — the heart of the cascade-reset rule +# --------------------------------------------------------------------------- + +def _can_advance_to(row: dict[str, str], target: Step) -> tuple[bool, str]: + """True iff every step strictly before ``target`` is done.""" + cfg = get_config() + for s in cfg.steps: + if s.index >= target.index: + break + if not is_done(row, s): + verb = s.setter if s.setter else "markpass" + return False, ( + f"Cannot advance to '{target.name}' (step {target.index}/{len(cfg.steps)}) yet — " + f"prior step #{s.index} '{s.name}' is not done.\n" + f" Run: workflow_state.py {verb} " + + ("" if s.setter else f'"{s.name}"') + ) + return True, "" + + +def apply_step_action( + row: dict[str, str], + target: Step, + new_value: str, + *, + verb: str, +) -> tuple[list[str], bool]: + """Apply a step action with cascade-reset semantics. + + Returns ``(cleared_columns, was_no_op)``. + + Behavior: + - If ``target`` is AHEAD of the current step: raise :class:`WorkflowError`. + - If ``target`` is AT current step: write the value (no clearing). + - If ``target`` is BEHIND current (or already done): write the new + value AND ``reset_after(target)`` — UNLESS ``target.cascade_on_set`` + is False (the YAML-driven assignee carve-out), in which case the + write is performed without resetting. + - For ``flag`` steps: setting the same value is a no-op (no reset). + """ + cfg = get_config() + cur = current_step(row) + cur_idx = cur.index if cur is not None else len(cfg.steps) + 1 + + if cur is not None and target.index > cur_idx: + raise WorkflowError( + f"Cannot {verb} '{target.name}' (step {target.index}/{len(cfg.steps)}) yet — " + f"current step is #{cur.index} '{cur.name}'.\n" + f" Complete it first via " + f"'{cur.setter or 'markpass'}'." + ) + + if target.kind == "flag": + existing = row.get(target.name, "").strip().upper() + if existing == new_value.strip().upper() and existing in set(cfg.markers.flag_values): + return [], True + + # YAML-driven carve-out: when cascade_on_set=False (e.g. assignee), + # write but DO NOT cascade-reset later steps. + if not target.cascade_on_set: + row[target.name] = new_value + return [], False + + row[target.name] = new_value + # set-auth/markpass cascade: do NOT honor preserve_on_reset. + # Auth-classification changes invalidate every downstream artifact. + cleared, _preserved = reset_after(row, target, respect_preserve=False) + return cleared, False + + +# --------------------------------------------------------------------------- +# Helpers used by multiple commands +# --------------------------------------------------------------------------- + +def has_workflow_progress(row: dict[str, str]) -> bool: + """Return True if the row has any non-trivial workflow progress. + + Being merely assigned does NOT count as progress. 
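    (A row whose only populated workflow cell is ``assignee`` therefore
    reports False and shows up as "not started" in the listings.)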
+ """ + cfg = get_config() + return any( + row.get(s.name, "").strip() + for s in cfg.steps + if s.name != "assignee" + ) + + +# --------------------------------------------------------------------------- +# Backward-compat shims (for old call sites and tests) +# --------------------------------------------------------------------------- + +def reset_from_step(row: dict[str, str], step_name: str) -> None: + """Legacy API: clear ``step_name`` and every later step. + + NOTE: This legacy helper is the unmodified pre-``preserve_on_reset`` + behaviour — it wipes blindly. New callers should use the CLI + ``reset-to`` / ``fail`` commands (which honour ``preserve_on_reset``) + or call :func:`reset_after` directly with ``respect_preserve=True``. + """ + cfg = get_config() + step = cfg.step_by_name.get(step_name) + if step is None: + raise ValueError( + f"Unknown step: '{step_name}'. " + f"Valid steps: {', '.join(cfg.workflow_columns)}" + ) + prev_index = step.index - 1 + if prev_index < 1: + for s in cfg.steps: + row[s.name] = "" + return + prev = cfg.step_by_index[prev_index] + row[step.name] = "" + reset_after(row, prev) + + +def markpass_step(row: dict[str, str], step_name: str) -> str: + """Legacy API: mark a checkpoint step as passed. Returns a status message.""" + cfg = get_config() + integration_id = row.get("Integration ID", "") + non_checkpoint = cfg.non_checkpoint_steps + + if step_name in non_checkpoint: + correct_cmd = non_checkpoint[step_name] + return ( + f"ERROR: '{step_name}' is not a pass/fail checkpoint.\n" + f" Use '{correct_cmd}' instead.\n" + f" Example: workflow_state.py {correct_cmd} " + f"\"{integration_id}\" " + ) + + step = cfg.step_by_name.get(step_name) + if step is None: + raise ValueError( + f"Unknown checkpoint step: '{step_name}'. " + f"Valid steps: {', '.join(cfg.checkpoint_columns)}" + ) + + if is_done(row, step): + return f"'{step_name}' is already marked as passed for '{integration_id}'." + + # Honour any flag_auto_na_target interaction whose target_step matches. + for inter in cfg.step_interactions: + if inter.kind == "flag_auto_na_target" and inter.target_step == step_name: + flag = row.get(inter.when_step, "").strip().upper() + if flag in {v.upper() for v in inter.when_value_in}: + row[step_name] = inter.write_value + return f"'{step_name}' set to {inter.write_value} (auth parity test not required)." + if flag == "": + return ( + f"ERROR: Cannot mark '{step_name}' as passed — " + f"'{inter.when_step}' flag is not set.\n" + f" Use 'set-auth-flag' first.\n" + f" Example: workflow_state.py set-auth-flag " + f"\"{integration_id}\" YES" + ) + + ok, reason = _can_advance_to(row, step) + if not ok: + cur = current_step(row) + cur_name = cur.name if cur else "(none)" + return ( + f"ERROR: Cannot mark '{step_name}' as passed — " + f"you are not up to that step yet.\n" + f" Current step: '{cur_name}'\n" + f" {reason}" + ) + + row[step.name] = cfg.markers.check + return f"✅ '{step_name}' marked as passed for '{integration_id}'." diff --git a/connectus/workflow_state/tests/__init__.py b/connectus/workflow_state/tests/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/connectus/workflow_state/tests/test_config_loader.py b/connectus/workflow_state/tests/test_config_loader.py new file mode 100644 index 00000000000..626940f79b5 --- /dev/null +++ b/connectus/workflow_state/tests/test_config_loader.py @@ -0,0 +1,399 @@ +"""Tests for :mod:`workflow_state.config_loader`. 
+ +Covers the YAML loader's happy path, the multi-error collection +pattern, and every validation rule listed in +``workflow_state_DESIGN.md`` §5.2. +""" +from __future__ import annotations + +import os +from pathlib import Path + +import pytest + +from workflow_state.config_loader import ( + _reset_config_for_testing, + default_config_path, + get_config, + load_config, +) +from workflow_state.exceptions import ConfigLoadError +from workflow_state.types import ( + IdentityColumn, + MarkerSet, + Step, + StepInteraction, + WorkflowConfig, +) +from workflow_state.validators import ( + get_named_validator, + validate_auth_detail, +) + + +# --------------------------------------------------------------------------- +# Fixtures +# --------------------------------------------------------------------------- + +MINIMAL_VALID_YAML = """\ +schema_version: 1 +identity_columns: + - {"name": "Integration ID", "description": "primary key"} +markers: + check: "✅" + fail: "❌" + na: "N/A" + checkpoint_done_values: ["✅", "N/A"] + flag_values: ["YES", "NO", "N/A"] +steps: + - name: "assignee" + kind: data + optional: false + setter: set-assignee + cascade_on_set: false + description: "owner" + - name: "manifest" + kind: checkpoint + optional: false + setter: null + description: "make manifest" + - name: "needs parity" + kind: flag + optional: false + setter: set-parity + description: "decide" + - name: "parity passes" + kind: checkpoint + optional: false + setter: null + description: "run parity" +step_interactions: + - kind: flag_auto_na_target + when_step: "needs parity" + when_value_in: ["NO", "N/A"] + target_step: "parity passes" + write_value: "N/A" +""" + + +def _write_yaml(tmp_path: Path, body: str) -> str: + """Write ``body`` to a fixture YAML and return its absolute path.""" + p = tmp_path / "wf.yml" + p.write_text(body, encoding="utf-8") + return str(p) + + +@pytest.fixture(autouse=True) +def _reset_singleton(): + """Clear the cached config before AND after each test so we don't leak.""" + _reset_config_for_testing() + yield + _reset_config_for_testing() + + +# --------------------------------------------------------------------------- +# Happy paths +# --------------------------------------------------------------------------- + +class TestLoadDefault: + def test_default_config_path_exists(self) -> None: + assert os.path.isfile(default_config_path()) + + def test_default_yaml_loads(self) -> None: + cfg = load_config() + assert isinstance(cfg, WorkflowConfig) + # The bundled YAML has 16 steps and 3 identity columns. + assert len(cfg.steps) == 16 + assert len(cfg.identity_columns) == 3 + # Markers match the expected sentinels. 
+ assert cfg.markers.check == "✅" + assert cfg.markers.na == "N/A" + + def test_default_yaml_first_step_is_assignee_with_cascade_off(self) -> None: + cfg = load_config() + first = cfg.steps[0] + assert first.name == "assignee" + assert first.cascade_on_set is False + + def test_default_yaml_has_one_flag_auto_na_interaction(self) -> None: + cfg = load_config() + flag_inters = [i for i in cfg.step_interactions if i.kind == "flag_auto_na_target"] + assert len(flag_inters) == 1 + inter = flag_inters[0] + assert inter.when_step == "requires auth parity test" + assert inter.target_step == "auth parity test passes" + assert inter.write_value == "N/A" + + def test_get_config_singleton_caches(self) -> None: + a = get_config() + b = get_config() + assert a is b + + +class TestMinimalFixture: + def test_loads_minimal_valid_yaml(self, tmp_path) -> None: + p = _write_yaml(tmp_path, MINIMAL_VALID_YAML) + cfg = load_config(p) + assert len(cfg.steps) == 4 + assert cfg.steps[0].cascade_on_set is False + assert cfg.steps[1].cascade_on_set is True # default + + def test_reset_for_testing_clears_cache(self, tmp_path) -> None: + p = _write_yaml(tmp_path, MINIMAL_VALID_YAML) + cfg1 = load_config(p) + # Same path, second call → singleton hit (same instance). + cfg2 = load_config(p) + assert cfg1 is cfg2 + # Reset → new instance. + _reset_config_for_testing() + cfg3 = load_config(p) + assert cfg3 is not cfg1 + assert cfg3.steps == cfg1.steps + + def test_step_interaction_resolves(self, tmp_path) -> None: + p = _write_yaml(tmp_path, MINIMAL_VALID_YAML) + cfg = load_config(p) + inter = cfg.find_flag_auto_na_target("needs parity") + assert inter is not None + assert inter.target_step == "parity passes" + assert inter.write_value == "N/A" + + def test_validator_binding_resolves_to_callable(self) -> None: + # The default YAML binds Auth Details → 'auth_details' validator. + cfg = load_config() + auth_step = cfg.step_by_name["Auth Details"] + assert auth_step.json_schema == "auth_details" + validator = get_named_validator(auth_step.json_schema) + assert validator is validate_auth_detail + + +# --------------------------------------------------------------------------- +# Error paths +# --------------------------------------------------------------------------- + +class TestLoadErrors: + def test_missing_file_raises(self, tmp_path) -> None: + bogus = str(tmp_path / "does_not_exist.yml") + with pytest.raises(ConfigLoadError) as exc: + load_config(bogus) + assert "not found" in exc.value.message + assert bogus in exc.value.message + + def test_invalid_yaml_syntax_raises(self, tmp_path) -> None: + p = tmp_path / "bad.yml" + p.write_text("this: is: not valid yaml: [", encoding="utf-8") + with pytest.raises(ConfigLoadError) as exc: + load_config(str(p)) + assert "YAML parse error" in exc.value.message + + def test_unknown_schema_version_rejected(self, tmp_path) -> None: + body = MINIMAL_VALID_YAML.replace("schema_version: 1", "schema_version: 99") + p = _write_yaml(tmp_path, body) + with pytest.raises(ConfigLoadError) as exc: + load_config(p) + assert any("schema_version" in e for e in exc.value.errors) + + def test_missing_required_top_level_section(self, tmp_path) -> None: + # Drop the `steps:` key entirely. 
+ body = MINIMAL_VALID_YAML.split("steps:")[0] + p = _write_yaml(tmp_path, body) + with pytest.raises(ConfigLoadError) as exc: + load_config(p) + assert any("steps" in e for e in exc.value.errors) + + def test_extra_top_level_key_rejected(self, tmp_path) -> None: + body = MINIMAL_VALID_YAML + "\nrandom_extra_key: value\n" + p = _write_yaml(tmp_path, body) + with pytest.raises(ConfigLoadError) as exc: + load_config(p) + assert any("random_extra_key" in e for e in exc.value.errors) + + def test_identity_column_duplicate_name_rejected(self, tmp_path) -> None: + body = MINIMAL_VALID_YAML.replace( + 'identity_columns:\n - {"name": "Integration ID", "description": "primary key"}', + 'identity_columns:\n' + ' - {"name": "Integration ID", "description": "a"}\n' + ' - {"name": "Integration ID", "description": "b"}', + ) + p = _write_yaml(tmp_path, body) + with pytest.raises(ConfigLoadError) as exc: + load_config(p) + assert any("duplicate" in e.lower() for e in exc.value.errors) + + def test_identity_column_collides_with_step_name(self, tmp_path) -> None: + body = MINIMAL_VALID_YAML.replace( + 'identity_columns:\n - {"name": "Integration ID", "description": "primary key"}', + 'identity_columns:\n - {"name": "assignee", "description": "collides"}', + ) + p = _write_yaml(tmp_path, body) + with pytest.raises(ConfigLoadError) as exc: + load_config(p) + assert any("collide" in e.lower() for e in exc.value.errors) + + def test_step_kind_invalid_rejected(self, tmp_path) -> None: + body = MINIMAL_VALID_YAML.replace("kind: data", "kind: bogus", 1) + p = _write_yaml(tmp_path, body) + with pytest.raises(ConfigLoadError) as exc: + load_config(p) + assert any("bogus" in e for e in exc.value.errors) + + def test_data_step_without_setter_rejected(self, tmp_path) -> None: + body = MINIMAL_VALID_YAML.replace( + " setter: set-assignee\n", + " setter: null\n", + 1, + ) + p = _write_yaml(tmp_path, body) + with pytest.raises(ConfigLoadError) as exc: + load_config(p) + assert any("setter" in e for e in exc.value.errors) + + def test_checkpoint_step_with_setter_rejected(self, tmp_path) -> None: + body = MINIMAL_VALID_YAML.replace( + ' - name: "manifest"\n' + ' kind: checkpoint\n' + ' optional: false\n' + ' setter: null\n', + ' - name: "manifest"\n' + ' kind: checkpoint\n' + ' optional: false\n' + ' setter: set-manifest\n', + ) + p = _write_yaml(tmp_path, body) + with pytest.raises(ConfigLoadError) as exc: + load_config(p) + assert any("setter must be null" in e or "set-manifest" in e for e in exc.value.errors) + + def test_duplicate_step_name_rejected(self, tmp_path) -> None: + # Rename "parity passes" -> "manifest" so two steps share that name. 
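        # (The replace also renames the interaction's target_step; "manifest" is
        # still a valid checkpoint target, so the duplicate-name check is what trips.)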
+ body = MINIMAL_VALID_YAML.replace('"parity passes"', '"manifest"') + p = _write_yaml(tmp_path, body) + with pytest.raises(ConfigLoadError) as exc: + load_config(p) + assert any("duplicate step name" in e for e in exc.value.errors) + + def test_duplicate_setter_rejected(self, tmp_path) -> None: + body = MINIMAL_VALID_YAML.replace( + " setter: set-parity", + " setter: set-assignee", # collide with first step's setter + ) + p = _write_yaml(tmp_path, body) + with pytest.raises(ConfigLoadError) as exc: + load_config(p) + assert any("duplicate setter" in e for e in exc.value.errors) + + def test_unknown_json_schema_name_rejected(self, tmp_path) -> None: + body = MINIMAL_VALID_YAML.replace( + ' setter: set-assignee\n' + ' cascade_on_set: false\n', + ' setter: set-assignee\n' + ' cascade_on_set: false\n' + ' json_schema: {"validator": "definitely_unknown"}\n', + ) + p = _write_yaml(tmp_path, body) + with pytest.raises(ConfigLoadError) as exc: + load_config(p) + assert any("definitely_unknown" in e for e in exc.value.errors) + + def test_unknown_cross_check_name_rejected(self, tmp_path) -> None: + body = MINIMAL_VALID_YAML.replace( + ' setter: set-assignee\n' + ' cascade_on_set: false\n', + ' setter: set-assignee\n' + ' cascade_on_set: false\n' + ' cross_check: {"validator": "no_such_check"}\n', + ) + p = _write_yaml(tmp_path, body) + with pytest.raises(ConfigLoadError) as exc: + load_config(p) + assert any("no_such_check" in e for e in exc.value.errors) + + def test_markers_check_must_be_in_done_values(self, tmp_path) -> None: + body = MINIMAL_VALID_YAML.replace( + 'checkpoint_done_values: ["✅", "N/A"]', + 'checkpoint_done_values: ["N/A"]', + ) + p = _write_yaml(tmp_path, body) + with pytest.raises(ConfigLoadError) as exc: + load_config(p) + assert any("✅" in e for e in exc.value.errors) + + def test_step_interaction_unknown_step_rejected(self, tmp_path) -> None: + body = MINIMAL_VALID_YAML.replace( + 'when_step: "needs parity"', + 'when_step: "ghost step"', + ) + p = _write_yaml(tmp_path, body) + with pytest.raises(ConfigLoadError) as exc: + load_config(p) + assert any("ghost step" in e for e in exc.value.errors) + + def test_step_interaction_when_step_must_be_flag(self, tmp_path) -> None: + body = MINIMAL_VALID_YAML.replace( + 'when_step: "needs parity"', + 'when_step: "manifest"', + ) + p = _write_yaml(tmp_path, body) + with pytest.raises(ConfigLoadError) as exc: + load_config(p) + assert any("must be" in e and "flag" in e for e in exc.value.errors) + + def test_step_interaction_target_step_must_be_checkpoint(self, tmp_path) -> None: + body = MINIMAL_VALID_YAML.replace( + 'target_step: "parity passes"', + 'target_step: "needs parity"', + ) + p = _write_yaml(tmp_path, body) + with pytest.raises(ConfigLoadError) as exc: + load_config(p) + assert any("checkpoint" in e for e in exc.value.errors) + + def test_step_interaction_when_value_in_must_be_subset(self, tmp_path) -> None: + body = MINIMAL_VALID_YAML.replace( + 'when_value_in: ["NO", "N/A"]', + 'when_value_in: ["INVALID"]', + ) + p = _write_yaml(tmp_path, body) + with pytest.raises(ConfigLoadError) as exc: + load_config(p) + assert any("INVALID" in e for e in exc.value.errors) + + def test_step_interaction_write_value_must_be_in_done_values(self, tmp_path) -> None: + body = MINIMAL_VALID_YAML.replace( + 'write_value: "N/A"', + 'write_value: "GARBAGE"', + ) + p = _write_yaml(tmp_path, body) + with pytest.raises(ConfigLoadError) as exc: + load_config(p) + assert any("GARBAGE" in e for e in exc.value.errors) + + def 
test_multi_error_collection(self, tmp_path) -> None: + # Three problems at once. + body = MINIMAL_VALID_YAML + body = body.replace("schema_version: 1", "schema_version: 42") + body = body.replace("kind: data", "kind: bogus", 1) + body = body + "\nrandom_extra_key: x\n" + p = _write_yaml(tmp_path, body) + with pytest.raises(ConfigLoadError) as exc: + load_config(p) + assert len(exc.value.errors) >= 3 + + +# --------------------------------------------------------------------------- +# Cascade-on-set defaulting +# --------------------------------------------------------------------------- + +class TestCascadeOnSet: + def test_default_true_when_unspecified(self, tmp_path) -> None: + body = MINIMAL_VALID_YAML + p = _write_yaml(tmp_path, body) + cfg = load_config(p) + # Step 2 ("manifest") doesn't specify cascade_on_set; default is True. + assert cfg.step_by_name["manifest"].cascade_on_set is True + + def test_explicit_false_on_assignee(self, tmp_path) -> None: + body = MINIMAL_VALID_YAML + p = _write_yaml(tmp_path, body) + cfg = load_config(p) + assert cfg.step_by_name["assignee"].cascade_on_set is False diff --git a/connectus/workflow_state/tests/test_state_machine.py b/connectus/workflow_state/tests/test_state_machine.py new file mode 100644 index 00000000000..894635da993 --- /dev/null +++ b/connectus/workflow_state/tests/test_state_machine.py @@ -0,0 +1,273 @@ +"""Focused tests of the cascade-reset engine driven by the YAML config. + +These tests build an in-process WorkflowConfig fixture to prove that the +state-machine reads behaviour from the YAML rather than from Python +literals. +""" +from __future__ import annotations + +import pytest + +from workflow_state.config_loader import ( + _reset_config_for_testing, + load_config, +) +from workflow_state.state_machine import ( + apply_step_action, + is_checked, + reset_after, +) + + +_FIXTURE_YAML = """\ +schema_version: 1 +identity_columns: + - {"name": "Integration ID", "description": "id"} +markers: + check: "✅" + fail: "❌" + na: "N/A" + checkpoint_done_values: ["✅", "N/A"] + flag_values: ["YES", "NO", "N/A"] +steps: + - name: "alpha" + kind: data + optional: false + setter: set-alpha + cascade_on_set: true + description: "first" + - name: "beta" + kind: checkpoint + optional: false + setter: null + description: "second" + - name: "gamma" + kind: checkpoint + optional: false + setter: null + description: "third" +""" + + +@pytest.fixture(autouse=True) +def _reset_singleton(): + _reset_config_for_testing() + yield + _reset_config_for_testing() + + +def _write_and_load(tmp_path, body: str): + p = tmp_path / "wf.yml" + p.write_text(body, encoding="utf-8") + return load_config(str(p)) + + +class TestCascadeReset: + def test_setting_alpha_clears_beta_and_gamma(self, tmp_path) -> None: + cfg = _write_and_load(tmp_path, _FIXTURE_YAML) + row = { + "Integration ID": "X", + "alpha": "old", + "beta": "✅", + "gamma": "✅", + } + target = cfg.step_by_name["alpha"] + cleared, no_op = apply_step_action(row, target, "new", verb="set-alpha") + assert no_op is False + assert row["alpha"] == "new" + assert row["beta"] == "" + assert row["gamma"] == "" + assert "beta" in cleared and "gamma" in cleared + + def test_setting_a_step_with_cascade_on_set_false_does_not_reset( + self, tmp_path + ) -> None: + body = _FIXTURE_YAML.replace( + " cascade_on_set: true\n", + " cascade_on_set: false\n", + ) + cfg = _write_and_load(tmp_path, body) + row = { + "Integration ID": "X", + "alpha": "old", + "beta": "✅", + "gamma": "✅", + } + target = cfg.step_by_name["alpha"] + 
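        # cascade_on_set=False is the assignee-style carve-out: the write must
        # land while beta/gamma keep their ✅ checkpoints.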
cleared, no_op = apply_step_action(row, target, "new", verb="set-alpha") + assert no_op is False + assert row["alpha"] == "new" + # Carve-out: beta/gamma untouched. + assert row["beta"] == "✅" + assert row["gamma"] == "✅" + assert cleared == [] + + +class TestIsCheckedQ2BreakingChange: + """Q2 BREAKING CHANGE: only canonical done values are accepted.""" + + def test_canonical_check_is_done(self, tmp_path) -> None: + _write_and_load(tmp_path, _FIXTURE_YAML) + assert is_checked("✅") is True + + def test_canonical_na_is_done(self, tmp_path) -> None: + _write_and_load(tmp_path, _FIXTURE_YAML) + assert is_checked("N/A") is True + + @pytest.mark.parametrize("alias", ["YES", "true", "True", "done", "Done", "DONE"]) + def test_dropped_aliases_are_not_done(self, tmp_path, alias: str) -> None: + # Q2 breaking change: dropped historical alias support. + _write_and_load(tmp_path, _FIXTURE_YAML) + assert is_checked(alias) is False, ( + f"Q2 breaking change: alias {alias!r} should no longer be " + f"recognized as 'done'." + ) + + +_PRESERVE_FIXTURE_YAML = """\ +schema_version: 1 +identity_columns: + - {"name": "Integration ID", "description": "id"} +markers: + check: "✅" + fail: "❌" + na: "N/A" + checkpoint_done_values: ["✅", "N/A"] + flag_values: ["YES", "NO", "N/A"] +steps: + - name: "alpha" + kind: data + optional: false + setter: set-alpha + cascade_on_set: true + description: "first" + - name: "beta" + kind: data + optional: false + setter: set-beta + preserve_on_reset: true + description: "second (preserved)" + - name: "gamma" + kind: checkpoint + optional: false + setter: null + description: "third" + - name: "delta" + kind: checkpoint + optional: false + setter: null + description: "fourth" +""" + + +class TestResetAfter: + def test_reset_after_clears_only_strictly_later_steps(self, tmp_path) -> None: + cfg = _write_and_load(tmp_path, _FIXTURE_YAML) + row = { + "Integration ID": "X", + "alpha": "v", + "beta": "✅", + "gamma": "✅", + } + cleared, preserved = reset_after(row, cfg.step_by_name["beta"]) + assert row["alpha"] == "v" + assert row["beta"] == "✅" + assert row["gamma"] == "" + assert cleared == ["gamma"] + assert preserved == [] + + def test_reset_after_default_ignores_preserve_flag(self, tmp_path) -> None: + """Default ``respect_preserve=False`` keeps legacy set-auth cascade behaviour: + every later step is wiped regardless of preserve_on_reset.""" + cfg = _write_and_load(tmp_path, _PRESERVE_FIXTURE_YAML) + row = { + "Integration ID": "X", + "alpha": "v", + "beta": '["x"]', # preserve_on_reset=true + "gamma": "✅", + "delta": "✅", + } + cleared, preserved = reset_after(row, cfg.step_by_name["alpha"]) + # Legacy default: beta is wiped despite preserve_on_reset. + assert row["beta"] == "" + assert row["gamma"] == "" + assert row["delta"] == "" + assert "beta" in cleared and "gamma" in cleared and "delta" in cleared + assert preserved == [] + + def test_reset_after_with_respect_preserve_keeps_tagged_columns( + self, tmp_path + ) -> None: + """``respect_preserve=True`` keeps preserve_on_reset columns intact and + reports them in the second tuple element.""" + cfg = _write_and_load(tmp_path, _PRESERVE_FIXTURE_YAML) + row = { + "Integration ID": "X", + "alpha": "v", + "beta": '["x"]', # preserve_on_reset=true + "gamma": "✅", + "delta": "✅", + } + cleared, preserved = reset_after( + row, cfg.step_by_name["alpha"], respect_preserve=True + ) + # beta is preserved; gamma + delta are still wiped. 
+ assert row["beta"] == '["x"]' + assert row["gamma"] == "" + assert row["delta"] == "" + assert preserved == ["beta"] + assert "gamma" in cleared and "delta" in cleared + assert "beta" not in cleared + + def test_reset_after_respect_preserve_only_reports_non_empty_preserved( + self, tmp_path + ) -> None: + """An empty preserved column is not noisy-reported.""" + cfg = _write_and_load(tmp_path, _PRESERVE_FIXTURE_YAML) + row = { + "Integration ID": "X", + "alpha": "v", + "beta": "", # preserve_on_reset=true but empty + "gamma": "✅", + "delta": "", + } + cleared, preserved = reset_after( + row, cfg.step_by_name["alpha"], respect_preserve=True + ) + # beta is still preserved (untouched), but not reported because empty. + assert row["beta"] == "" + assert row["gamma"] == "" + assert preserved == [] + assert cleared == ["gamma"] + + +class TestPreserveOnResetIntegration: + """End-to-end behaviour: set-auth still wipes preserved columns, + but a hypothetical reset-to (callers that pass respect_preserve=True) + keeps them. + """ + + def test_apply_step_action_set_auth_cascade_still_wipes_preserved( + self, tmp_path + ) -> None: + """The set-auth cascade goes through apply_step_action, which calls + reset_after with the legacy default (respect_preserve=False). Even + a preserve_on_reset=true column gets wiped — by design (auth changes + invalidate downstream artifacts).""" + cfg = _write_and_load(tmp_path, _PRESERVE_FIXTURE_YAML) + row = { + "Integration ID": "X", + "alpha": "old", + "beta": '["preserved-data"]', + "gamma": "✅", + "delta": "✅", + } + target = cfg.step_by_name["alpha"] + cleared, no_op = apply_step_action(row, target, "new", verb="set-alpha") + assert no_op is False + assert row["alpha"] == "new" + # Even preserve_on_reset=true beta is wiped on set-alpha cascade. + assert row["beta"] == "" + assert row["gamma"] == "" + assert row["delta"] == "" + assert "beta" in cleared diff --git a/connectus/workflow_state/tests/test_wipe_workflow_data.py b/connectus/workflow_state/tests/test_wipe_workflow_data.py new file mode 100644 index 00000000000..87a76ce9a04 --- /dev/null +++ b/connectus/workflow_state/tests/test_wipe_workflow_data.py @@ -0,0 +1,150 @@ +"""Tests for :func:`workflow_state.csv_io.wipe_workflow_data`. + +These tests redirect ``workflow_state.CSV_PATH`` at the package level +(matching the existing ``csv_io._csv_path`` indirection) so the real +pipeline file is never touched. 
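(The ``temp_csv`` fixture below performs that redirection via
``monkeypatch.setattr(workflow_state, "CSV_PATH", ...)``.)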
+""" +from __future__ import annotations + +import csv as _csv +from pathlib import Path + +import pytest + +import workflow_state +from workflow_state.config_loader import _reset_config_for_testing +from workflow_state.csv_io import wipe_workflow_data + + +@pytest.fixture(autouse=True) +def _reset_singleton(): + _reset_config_for_testing() + yield + _reset_config_for_testing() + + +@pytest.fixture +def temp_csv(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> Path: + """Point workflow_state at a throwaway CSV inside ``tmp_path``.""" + p = tmp_path / "pipeline.csv" + monkeypatch.setattr(workflow_state, "CSV_PATH", str(p)) + return p + + +def _seed_csv(path: Path, header: list[str], rows: list[list[str]]) -> None: + with open(path, "w", encoding="utf-8", newline="") as f: + w = _csv.writer(f) + w.writerow(header) + w.writerows(rows) + + +def _read_csv(path: Path) -> tuple[list[str], list[list[str]]]: + with open(path, "r", encoding="utf-8", newline="") as f: + r = _csv.reader(f) + header = next(r) + return header, [list(row) for row in r] + + +def test_refuses_without_confirm(temp_csv: Path) -> None: + # Need a file on disk so the missing-confirm error fires before the + # FileNotFoundError check. + _seed_csv( + temp_csv, + ["Integration ID", "Integration File Path", "Connector ID"], + [["a", "p/a.yml", "ConnA"]], + ) + with pytest.raises(RuntimeError, match="confirm=True"): + wipe_workflow_data() # confirm defaults to False + # The file is untouched. + header, rows = _read_csv(temp_csv) + assert header == ["Integration ID", "Integration File Path", "Connector ID"] + assert rows == [["a", "p/a.yml", "ConnA"]] + + +def test_missing_csv_raises_file_not_found(temp_csv: Path) -> None: + # temp_csv path is set but no file exists. + with pytest.raises(FileNotFoundError): + wipe_workflow_data(confirm=True, backup=False) + + +def test_wipes_workflow_columns_preserves_identity(temp_csv: Path) -> None: + cfg = workflow_state.get_config() + header = list(cfg.all_columns) + # Two rows: one totally clean, one with several workflow cells filled. + n_workflow = len(cfg.workflow_columns) + row_clean = ["int-1", "Packs/X/x.yml", "ConnX"] + [""] * n_workflow + row_dirty_values = ["✅"] * n_workflow + row_dirty = ["int-2", "Packs/Y/y.yml", "ConnY"] + row_dirty_values + _seed_csv(temp_csv, header, [row_clean, row_dirty]) + + result = wipe_workflow_data(confirm=True, backup=False) + + assert result["rows"] == 2 + assert result["cells_cleared"] == n_workflow # only row_dirty had data + assert result["rows_touched"] == 1 + assert result["backup_path"] is None + assert result["header"] == header + + new_header, new_rows = _read_csv(temp_csv) + assert new_header == header + assert len(new_rows) == 2 + # Identity columns intact. + assert new_rows[0][:3] == ["int-1", "Packs/X/x.yml", "ConnX"] + assert new_rows[1][:3] == ["int-2", "Packs/Y/y.yml", "ConnY"] + # Every workflow column is empty for every row. 
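    # (r[3:] is everything after the three identity columns in this header layout.)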
+ for r in new_rows: + assert len(r) == len(header) + assert all(cell == "" for cell in r[3:]) + + +def test_writes_backup_when_requested(temp_csv: Path) -> None: + cfg = workflow_state.get_config() + header = list(cfg.all_columns) + n_workflow = len(cfg.workflow_columns) + _seed_csv( + temp_csv, + header, + [["int-1", "p/1.yml", "ConnA"] + [""] * n_workflow], + ) + original_bytes = temp_csv.read_bytes() + + result = wipe_workflow_data(confirm=True, backup=True) + + backup_path = result["backup_path"] + assert isinstance(backup_path, str) + assert backup_path.startswith(str(temp_csv) + ".bak.") + # Backup is byte-identical to the pre-wipe state. + assert Path(backup_path).read_bytes() == original_bytes + + +def test_realigns_header_to_yaml_when_old_header_drifted( + temp_csv: Path, +) -> None: + """If the on-disk header has bogus extra columns, the rewrite should + rebuild the header from the YAML and only carry forward identity + columns by name.""" + cfg = workflow_state.get_config() + expected_header = list(cfg.all_columns) + drifted_header = ["Integration ID", "Integration File Path", + "Connector ID", "obsolete_step_a", "obsolete_step_b"] + _seed_csv( + temp_csv, + drifted_header, + [ + ["int-1", "p/1.yml", "ConnA", "junk1", "junk2"], + ["int-2", "p/2.yml", "ConnB", "", ""], + ], + ) + + result = wipe_workflow_data(confirm=True, backup=False) + + assert result["header"] == expected_header + assert result["rows"] == 2 + new_header, new_rows = _read_csv(temp_csv) + assert new_header == expected_header + assert new_rows[0][:3] == ["int-1", "p/1.yml", "ConnA"] + assert new_rows[1][:3] == ["int-2", "p/2.yml", "ConnB"] + # No leftover obsolete columns. + for r in new_rows: + assert len(r) == len(expected_header) + assert all(cell == "" for cell in r[3:]) diff --git a/connectus/workflow_state/types.py b/connectus/workflow_state/types.py new file mode 100644 index 00000000000..5b58f2ed31d --- /dev/null +++ b/connectus/workflow_state/types.py @@ -0,0 +1,167 @@ +"""Typed dataclasses for the workflow_state package. + +Pure data — no I/O, no validation logic. The :func:`config_loader.load_config` +function constructs ``WorkflowConfig`` instances from the YAML file and the +engine consumes those objects. +""" +from __future__ import annotations + +from dataclasses import dataclass, field +from typing import Optional + + +@dataclass(frozen=True) +class Step: + """A single step in the unified workflow sequence. + + Backward-compatible: the original positional signature + ``Step(index, name, kind, optional, setter, description)`` still + works because the trailing fields all have defaults. + + Two carve-out flags govern the cascade-reset rule: + + - ``cascade_on_set`` (set-write side): when False, a successful + ``set-X`` write to THIS step does NOT cascade-reset later steps. + Example: ``assignee`` (changing the owner shouldn't nuke their + progress). + - ``preserve_on_reset`` (reset side): when True, this step's value + is PRESERVED across ``reset-to``/``fail`` operations whose blast + radius would otherwise include it. The ``set-auth`` cascade + (which calls ``reset_after`` directly) ignores this flag — auth + changes invalidate downstream artifacts and must continue to + wipe everything. Plain ``reset`` (the "wipe the whole row" verb) + also ignores this flag. 
+ + The single carve-out: if the user names a preserved step + EXPLICITLY as the target of ``reset-to``/``fail``, the user's + intent wins for that one step (the named target is cleared), but + LATER preserved steps in the same operation are still preserved. + """ + + index: int # 1..N + name: str # CSV column AND user-facing identifier + kind: str # "data" | "checkpoint" | "flag" + optional: bool # True only for steps that may be `skip`-ped + setter: Optional[str] # CLI subcommand for setting; None for pure markpass + description: str # short human-readable summary + cascade_on_set: bool = True # if False, setting this step does NOT cascade-reset + json_schema: Optional[str] = None # named JSON validator key (or None) + cross_check: Optional[str] = None # named cross-step validator key (or None) + preserve_on_reset: bool = False # if True, reset-to/fail preserve this column's value + + +@dataclass(frozen=True) +class IdentityColumn: + """One identity / metadata CSV column entry (never managed by the workflow).""" + + name: str + description: str = "" + + +@dataclass(frozen=True) +class MarkerSet: + """Sentinel and marker values used by the engine.""" + + check: str + fail: str + na: str + checkpoint_done_values: tuple[str, ...] + flag_values: tuple[str, ...] + + +@dataclass(frozen=True) +class StepInteraction: + """One cross-step interaction rule (today only ``flag_auto_na_target``).""" + + kind: str + when_step: str + when_value_in: tuple[str, ...] + target_step: str + write_value: str + + +@dataclass(frozen=True) +class WorkflowConfig: + """The fully-loaded, validated workflow configuration. + + All derived collections (``step_by_name``, ``workflow_columns``, …) + are computed lazily via @property so the dataclass remains a pure + record of what was in the YAML. + """ + + schema_version: int + identity_columns: tuple[IdentityColumn, ...] + markers: MarkerSet + steps: tuple[Step, ...] + step_interactions: tuple[StepInteraction, ...] = field(default_factory=tuple) + + # ---- Derived helpers ------------------------------------------------ + + @property + def step_by_name(self) -> dict[str, Step]: + return {s.name: s for s in self.steps} + + @property + def step_by_index(self) -> dict[int, Step]: + return {s.index: s for s in self.steps} + + @property + def identity_column_names(self) -> list[str]: + return [c.name for c in self.identity_columns] + + @property + def workflow_columns(self) -> list[str]: + return [s.name for s in self.steps] + + @property + def workflow_data_columns(self) -> list[str]: + return [s.name for s in self.steps if s.kind == "data"] + + @property + def checkpoint_columns(self) -> list[str]: + return [s.name for s in self.steps if s.kind == "checkpoint"] + + @property + def json_valued_columns(self) -> set[str]: + # A "JSON-valued column" is any data step that has a named + # json_schema validator. This deliberately matches the legacy + # `JSON_VALUED_COLUMNS` (data steps minus the `assignee` step, + # which has no json_schema in YAML). 
+ return { + s.name for s in self.steps + if s.kind == "data" and s.json_schema is not None + } + + @property + def all_columns(self) -> list[str]: + return self.identity_column_names + self.workflow_columns + + @property + def expected_column_count(self) -> int: + return len(self.all_columns) + + @property + def non_checkpoint_steps(self) -> dict[str, str]: + """Mapping of step name → setter command for steps that have a setter.""" + return {s.name: s.setter for s in self.steps if s.setter is not None} + + @property + def auth_parity_flag_column(self) -> Optional[str]: + """The ``when_step`` of the (single) ``flag_auto_na_target`` interaction. + + Returns None if no such interaction is configured. The legacy + constant ``AUTH_PARITY_FLAG_COLUMN`` is derived from this. + """ + for inter in self.step_interactions: + if inter.kind == "flag_auto_na_target": + return inter.when_step + return None + + def find_flag_auto_na_target(self, when_step: str) -> Optional[StepInteraction]: + """Return the ``flag_auto_na_target`` interaction whose ``when_step`` + matches the argument, or ``None``. + """ + for inter in self.step_interactions: + if inter.kind == "flag_auto_na_target" and inter.when_step == when_step: + return inter + return None diff --git a/connectus/workflow_state/validators.py b/connectus/workflow_state/validators.py new file mode 100644 index 00000000000..3bb26a40d39 --- /dev/null +++ b/connectus/workflow_state/validators.py @@ -0,0 +1,221 @@ +"""Per-cell JSON validators and the named-validator registry. + +The state machine and CLI never call these functions by name directly — +they look them up by the YAML config's ``json_schema`` / ``cross_check`` +fields via :func:`get_named_validator` / :func:`get_named_cross_check`. +This makes the binding declarative without giving up the rich, hand-written +validation logic for the two real schemas. +""" +from __future__ import annotations + +import json +from typing import Callable, Optional + +from auth_config_parser import ( + auth_param_ids_with_sources as _pkg_auth_param_ids_with_sources, + parse_auth_details as _pkg_parse_auth_details, + validate_auth_details as _pkg_validate_auth_details, +) + + +# --------------------------------------------------------------------------- +# Auth Details — delegates to the auth_config_parser package +# --------------------------------------------------------------------------- + +def validate_auth_detail(value: str) -> list[str]: + """Validate Auth Details JSON shape. Returns list of errors ([] = valid). + + Backward-compatible wrapper that delegates to + :func:`auth_config_parser.validate_auth_details`. + """ + return _pkg_validate_auth_details(value) + + +# --------------------------------------------------------------------------- +# Params to Commands +# --------------------------------------------------------------------------- + +# Hint embedded in every "extra top-level key" error reported by +# :func:`validate_params_to_commands`. Kept in sync with +# ``connectus/column-schemas.md`` §Params to Commands. +_PARAMS_TO_COMMANDS_STRIP_HINT = ( + "strip it before persisting (see column-schemas.md " + "§Params to Commands). One-liner: python3 -c " + "\"import sys, json; o = json.load(sys.stdin); " + "o.pop('diagnostics', None); print(json.dumps(o))\"" +) + + +def validate_params_to_commands(value: str) -> list[str]: + """Validate Params to Commands JSON shape. Returns errors ([] = valid). 
+ + Strict shape (per ``connectus/column-schemas.md`` §Params to Commands):: + + { + "integration": "", + "commands": { + "": ["", ...], + ... + } + } + """ + errors: list[str] = [] + + try: + payload = json.loads(value) + except json.JSONDecodeError as e: + return [f"Invalid JSON: {e}"] + + if not isinstance(payload, dict): + return [f"Expected a JSON object, got {type(payload).__name__}"] + + expected_keys = {"integration", "commands"} + actual_keys = set(payload.keys()) + missing = expected_keys - actual_keys + extras = actual_keys - expected_keys + + if missing: + errors.append( + f"Missing required top-level key(s): {sorted(missing)}; " + f"payload must contain exactly {sorted(expected_keys)}." + ) + + if extras: + sorted_extras = sorted(extras) + if "diagnostics" in extras: + errors.append( + f"Extra top-level key 'diagnostics' is forbidden in " + f"'Params to Commands' (it is internal analyzer " + f"metadata, not pipeline data); " + f"{_PARAMS_TO_COMMANDS_STRIP_HINT}" + ) + other_extras = [k for k in sorted_extras if k != "diagnostics"] + if other_extras: + errors.append( + f"Extra top-level key(s) {other_extras} are " + f"forbidden; {_PARAMS_TO_COMMANDS_STRIP_HINT}" + ) + else: + errors.append( + f"Extra top-level key(s) {sorted_extras} are forbidden; " + f"{_PARAMS_TO_COMMANDS_STRIP_HINT}" + ) + + if "integration" in payload: + integration = payload["integration"] + if not isinstance(integration, str): + errors.append( + f"'integration' must be a string, got " + f"{type(integration).__name__}" + ) + elif integration == "": + errors.append("'integration' must be a non-empty string") + + if "commands" in payload: + commands = payload["commands"] + if not isinstance(commands, dict): + errors.append( + f"'commands' must be a JSON object, got " + f"{type(commands).__name__}" + ) + else: + for cmd, param_list in commands.items(): + if not isinstance(param_list, list): + errors.append( + f"commands[{cmd!r}]: expected a list of param " + f"ids, got {type(param_list).__name__}" + ) + continue + for i, p in enumerate(param_list): + if not isinstance(p, str): + errors.append( + f"commands[{cmd!r}][{i}]: param id must be " + f"a string, got {type(p).__name__}" + ) + continue + if p == "": + errors.append( + f"commands[{cmd!r}][{i}]: param id must be " + f"a non-empty string" + ) + + return errors + + +# --------------------------------------------------------------------------- +# Generic "any JSON" validator (used by the JSON-shaped data steps that +# don't have a richer schema, e.g. "Params for test with default in code"). +# --------------------------------------------------------------------------- + +def validate_any_json(value: str) -> list[str]: + """Accept anything that parses as JSON; return errors ([] = valid).""" + try: + json.loads(value) + except json.JSONDecodeError as e: + return [f"Invalid JSON: {e}"] + return [] + + +# --------------------------------------------------------------------------- +# Auth-derived param sources (helper for cross-check + auth_param_ids API) +# --------------------------------------------------------------------------- + +def auth_param_sources(auth_detail: dict) -> dict[str, list[str]]: + """Return ``{yml_param_id: [, ...]}`` for a raw + Auth Details dict. + + Returns an empty dict if the dict is structurally invalid. 
+ """ + from auth_config_parser import AuthConfigParseError + + try: + details = _pkg_parse_auth_details(auth_detail) + except AuthConfigParseError: + return {} + return _pkg_auth_param_ids_with_sources(details) + + +# --------------------------------------------------------------------------- +# Named-validator registries (consulted by the state machine / CLI) +# --------------------------------------------------------------------------- + +ValidatorFn = Callable[[str], list[str]] + + +_NAMED_VALIDATORS: dict[str, ValidatorFn] = { + "auth_details": validate_auth_detail, + "params_to_commands": validate_params_to_commands, + "any_json": validate_any_json, +} + + +def get_named_validator(name: str) -> Optional[ValidatorFn]: + """Look up a per-cell validator by its YAML name.""" + return _NAMED_VALIDATORS.get(name) + + +def known_validator_names() -> list[str]: + """Return the sorted list of known per-cell validator names.""" + return sorted(_NAMED_VALIDATORS.keys()) + + +# Cross-check registry: takes (integration_id, payload_dict) and raises +# WorkflowError on conflict. Implementations are wired in +# :mod:`state_machine` because they consult the CSV and ``auth_param_ids``. + +_NAMED_CROSS_CHECKS: dict[str, str] = { + # Maps the YAML name → a stable identifier the engine uses to look up + # the actual implementation. The implementation lives in the state + # machine module to avoid an import cycle (cross-checks consult the + # CSV which lives in csv_io). + "params_to_commands_no_auth_overlap": "params_to_commands_no_auth_overlap", +} + + +def known_cross_check_names() -> list[str]: + """Return the sorted list of known cross-check validator names.""" + return sorted(_NAMED_CROSS_CHECKS.keys()) + + +def is_known_cross_check(name: str) -> bool: + return name in _NAMED_CROSS_CHECKS diff --git a/connectus/workflow_state_DESIGN.md b/connectus/workflow_state_DESIGN.md new file mode 100644 index 00000000000..7a2f42e50de --- /dev/null +++ b/connectus/workflow_state_DESIGN.md @@ -0,0 +1,915 @@ +> **STATUS: implemented.** This document is the original design proposal for the `workflow_state` config-driven refactor. The refactor has been completed and the design described here is now reality — see [`connectus/workflow_state/`](workflow_state/__init__.py:1) for the implementation, [`connectus/workflow_state_config.yml`](workflow_state_config.yml:1) for the live YAML config, and [`connectus/Readme.md`](Readme.md:1) for current-state documentation. The "Open Questions" in §11 have been decided and are noted inline below. Line-number references throughout the doc point at the pre-refactor monolith and are kept for historical traceability. + +# `workflow_state` — Config-Driven Refactor Design + +A design for splitting [`connectus/workflow_state.py`](workflow_state.py) into a +small package whose **shape of the workflow** (the 16 ordered steps, their +kinds, the column schema, optional/skippable behaviour, the CLI verbs that set +each step) lives in a hardcoded YAML file rather than in Python literals. + +The runtime engine (cascade reset, normalization, CSV I/O, CLI dispatch) stays +in Python. Only the *declarative* parts move out. + +--- + +## 1. Current State Analysis + +> *This section describes the codebase before the refactor; preserved for historical context. 
Today the package is split as described in §3 and the implementation lives in [`connectus/workflow_state/`](workflow_state/__init__.py:1) — `connectus/workflow_state.py` is now a thin backward-compatibility shim.* + +[`connectus/workflow_state.py`](workflow_state.py) is a single ~2 585-line +script (historical; today's implementation is split across the +[`workflow_state/`](workflow_state/__init__.py:1) package) that did seven +distinct jobs: + +| Concern | Where it lives today | Lines | +|---|---|---| +| Step / column declarations | [`STEPS`](workflow_state.py:171) literal + derived constants | 105–230 | +| Step model dataclass | [`Step`](workflow_state.py:134) | 134–144 | +| State predicates (`is_done`, `current_step`, `is_checked`) | top-level helpers | 247–290 | +| Cascade-reset / normalization engine | [`reset_after`](workflow_state.py:308), [`normalize_row`](workflow_state.py:319), [`apply_step_action`](workflow_state.py:704) | 304–750 | +| CSV I/O (atomic save, normalization on read/write) | [`load_csv`](workflow_state.py:358), [`save_csv`](workflow_state.py:379) | 354–428 | +| Per-cell schema validators (`Auth Details`, `Params to Commands`) | [`validate_auth_detail`](workflow_state.py:434), [`validate_params_to_commands`](workflow_state.py:462) | 430–578 | +| CLI commands & programmatic API | `cmd_*` + dispatch dict | 1042–2580 | + +External callers (verified in the workspace): + +- [`connectus/check_command_params.py`](check_command_params.py:589) — lazy + `from workflow_state import auth_param_ids, WorkflowError`. +- [`connectus/check_command_params_test.py`](check_command_params_test.py:2123) — patches `workflow_state.auth_param_ids`. +- The CLI itself (the `__main__` entrypoint and the + [`connectus-migration-SKILL.md`](connectus-migration-SKILL.md) which shells + out via `python3 connectus/workflow_state.py …`). +- [`workflow_state_test.py`](workflow_state_test.py:22) — imports ~30 public + names by name; this is the most comprehensive consumer and the strongest + back-compat constraint. + +### What is hardcoded today that should move to YAML + +The ranking below is by "amount of repeated literal data per item": + +1. **The 16 [`Step`](workflow_state.py:134) entries** ([`STEPS`](workflow_state.py:171)) — `index`, `name`, `kind`, `optional`, `setter`, `description`. Currently 33 lines of dense literal that every change touches. **Highest payoff.** +2. **The non-workflow data columns** ([`DATA_COLUMNS`](workflow_state.py:117)) — three identity/metadata column names. +3. **Sentinel / marker constants** ([`CHECK`](workflow_state.py:112), [`FAIL_MARK`](workflow_state.py:113), [`NA_MARK`](workflow_state.py:114), [`VALID_FLAG_VALUES`](workflow_state.py:123)) — all of these are part of the workflow's "shape", not its engine. +4. **The set of values that count as "done" for a checkpoint** ([`is_checked`](workflow_state.py:247) hardcodes `("✅", "✅", "YES", "N/A", "N/A", "true", "True", "done", "Done", "DONE")`). +5. **The "auth-parity flag → auto-N/A target" coupling** — the special case + between step #12 and step #13. Today implemented by string-comparing + [`AUTH_PARITY_FLAG_COLUMN`](workflow_state.py:219) and `"auth parity test passes"` in + [`cmd_set_auth_flag`](workflow_state.py:1382), [`cmd_markpass`](workflow_state.py:1469), + [`markpass_step`](workflow_state.py:987), and the programmatic API + ([`markpass_integration_step`](workflow_state.py:2425)). +6. **The `set-assignee` carve-out** — the rule that `set-assignee` and + `set-assignee-by-connector` skip the cascade-reset. 
Today expressed by + bypassing [`apply_step_action`](workflow_state.py:704) in + [`cmd_set_assignee`](workflow_state.py:1352) and + [`cmd_set_assignee_by_connector`](workflow_state.py:1756) and described in a comment. +7. **CLI verb → step-name mappings** — implicit in + [`NON_CHECKPOINT_STEPS`](workflow_state.py:224), explicit in the + [`COMMANDS`](workflow_state.py:2538) dict. The verb name *for setting a step* is + already on the [`Step`](workflow_state.py:134) dataclass; the verbs *that aren't + per-step* (`status`, `dashboard`, `next`, `list*`, `files`, `auth-params`, + `reset`, …) stay in code. +8. **JSON-cell schema rules for `Auth Details` and `Params to Commands`** — + delegated to [`auth_config_parser`](auth_config_parser/DESIGN.md) and to + [`validate_params_to_commands`](workflow_state.py:462). These are NOT in scope + for the YAML extraction; see §11 Open Questions. + +What we explicitly leave **in code**: + +- The cascade-reset semantics and the `is_done` predicate dispatch by step + `kind`. These are the engine; YAML declares which kind a step is and the + engine knows what to do with each kind. +- All CSV I/O, the `next/--mine/--connector` arg parser, all `cmd_*` + functions, the formatting/display helpers, and the programmatic API. +- The `Auth Details` / `Params to Commands` JSON validators (they have their + own grammars; the YAML config only names *which* validator runs on *which* + step). + +--- + +## 2. Proposed YAML Schema — `connectus/workflow_state_config.yml` + +The file is a single YAML document with three top-level sections: +[`identity_columns`](workflow_state_config.yml:1), [`steps`](workflow_state_config.yml:1), +and [`markers`](workflow_state_config.yml:1). All other current behaviour (cascade +reset, normalization, CLI dispatch) is engine, not config. + +The schema is intentionally flat and explicit — a human reading the YAML +should be able to reconstruct the entire workflow without consulting the +Python. + +### 2.1 Top-level structure + +```yaml +# connectus/workflow_state_config.yml +# Hardcoded declarative configuration for the connectus migration workflow +# state machine. The runtime engine lives in connectus/workflow_state/. +# Editing this file changes the workflow's shape; engine code does not need +# to change for declarative edits (adding/removing/reordering steps, +# changing descriptions, toggling optional, etc.). + +schema_version: 1 + +# Identity / metadata columns — present in the CSV but NOT part of the +# 16-step workflow. Never cleared by cascade reset. Order matters: this +# is the leftmost prefix of the CSV header. +identity_columns: + - name: "Integration ID" + description: "Unique human-readable id; primary key for find_row()." + - name: "Integration File Path" + description: "Repo-relative path to the integration's YML manifest." + - name: "Connector ID" + description: "ConnectUs connector this integration belongs to." + +# Sentinels and marker strings. These are the literal values written into +# CSV cells by the engine. Changing them is a data migration, not a code +# change — bump schema_version when you do. +markers: + check: "✅" + fail: "❌" + na: "N/A" + # Values that count as "done" when seen in a checkpoint cell on read. + # Includes the canonical `check` and `na` values plus historical aliases + # we tolerate from human-edited rows. + checkpoint_done_values: + - "✅" + - "YES" + - "N/A" + - "true" + - "True" + - "done" + - "Done" + - "DONE" + # Valid values for any step whose kind is `flag`. 
+ flag_values: + - "YES" + - "NO" + - "N/A" + +# Cross-step interactions that don't fit the linear cascade model. +# Today there is exactly one: the auth-parity flag (#12) auto-fills the +# auth-parity test (#13) when set to NO/N/A. Modelled here so the engine +# can find the rule by name instead of string-comparing column titles. +step_interactions: + - kind: flag_auto_na_target + when_step: "requires auth parity test" # source step (must be kind=flag) + when_value_in: ["NO", "N/A"] + target_step: "auth parity test passes" # target step (must be kind=checkpoint) + write_value: "N/A" + +# The unified ordered sequence. Order in this list IS the step index +# (1-based). The engine asserts len(steps) >= 1 and unique names. +steps: + - name: "assignee" + kind: data + optional: false + setter: set-assignee + cascade_on_set: false # the set-assignee carve-out (override #5) + description: "Assign an owner to drive this integration's migration." + + - name: "Auth Details" + kind: data + optional: false + setter: set-auth + json_schema: auth_details # named validator the engine looks up + description: "Record the auth classification JSON (validated against the Auth Details schema)." + + - name: "Params to Commands" + kind: data + optional: false + setter: set-params-to-commands + json_schema: params_to_commands + cross_check: params_to_commands_no_auth_overlap + description: "Map each integration command to the parameter IDs it consumes (JSON)." + + - name: "Params for test with default in code" + kind: data + optional: false + setter: set-params-for-test + json_schema: any_json + description: "List the param IDs whose defaults live in the integration source (JSON)." + + - name: "Params same in other handlers" + kind: data + optional: true + setter: set-shared-params + json_schema: any_json + description: "Optional: list params shared verbatim with sibling handlers (or `skip`)." + + - name: "generated manifest" + kind: checkpoint + optional: false + setter: null + description: "Generate the ConnectUs manifest YAML for the integration." + + - name: "run manifest make validate" + kind: checkpoint + optional: false + setter: null + description: "Run `make validate` on the generated manifest." + + - name: "wrote/checked code" + kind: checkpoint + optional: false + setter: null + description: "Write or review the integration source code." + + - name: "shadowed command test passes" + kind: checkpoint + optional: false + setter: null + description: "Verify there are no shadowed/conflicting commands in the same connector." + + - name: "write tests" + kind: checkpoint + optional: false + setter: null + description: "Author unit tests for the integration." + + - name: "precommit/validate/unit tests passed" + kind: checkpoint + optional: false + setter: null + description: "Run pre-commit, validate, and unit tests via demisto-sdk pre-commit." + + - name: "requires auth parity test" + kind: flag + optional: false + setter: set-auth-flag + description: "Decide whether the integration needs an auth-parity test (YES/NO/N/A)." + + - name: "auth parity test passes" + kind: checkpoint + optional: false + setter: null + description: "Run the auth-parity test (auto-N/A when step 12 is NO/N/A)." + + - name: "param parity test passes" + kind: checkpoint + optional: false + setter: null + description: "Run the parameter-parity test." + + - name: "code reviewed" + kind: checkpoint + optional: false + setter: null + description: "Complete code review." 
+ + - name: "code merged" + kind: checkpoint + optional: false + setter: null + description: "Merge the integration to the branch." +``` + +### 2.2 Per-field semantics (the loader enforces these) + +| Field | Required | Type | Meaning / constraint | +|---|---|---|---| +| `schema_version` | yes | int | Currently `1`. Loader rejects anything it doesn't know. | +| `identity_columns` | yes | list[obj] | ≥1 entry. Never modified by the engine. Order = leftmost CSV columns. | +| `identity_columns[].name` | yes | str | Non-empty, unique within the list. | +| `identity_columns[].description` | no | str | Documentation only. | +| `markers.check` | yes | str | The "passed" sentinel; written by `markpass` and accepted on read. | +| `markers.fail` | yes | str | Reserved (today only used as a constant; unused in writes). | +| `markers.na` | yes | str | The "N/A" sentinel; written by `skip`, by the flag-auto-NA interaction, and accepted on read. | +| `markers.checkpoint_done_values` | yes | list[str] | Superset that `is_checked` accepts. Must contain `markers.check` and `markers.na`. | +| `markers.flag_values` | yes | list[str] | The exact set a `flag`-kind step accepts. | +| `step_interactions[]` | no | list[obj] | Optional cross-step rules. Today only `flag_auto_na_target`. | +| `steps[]` | yes | list[obj] | ≥1 entry. Order = 1-based step index. | +| `steps[].name` | yes | str | Non-empty, unique across all steps. **This is also the CSV column title** — exact match required. | +| `steps[].kind` | yes | enum | One of `data`, `checkpoint`, `flag`. | +| `steps[].optional` | yes | bool | If `true`, the engine accepts `skip` (writes `markers.na`). | +| `steps[].setter` | conditional | str \| null | Required for `data` and `flag` kinds; must be `null` for `checkpoint`. The CLI subcommand the user runs to set this step. | +| `steps[].cascade_on_set` | no | bool | Default `true`. When `false`, setting this step does NOT cascade-reset later steps (the `set-assignee` carve-out). | +| `steps[].json_schema` | no | str | Named validator: `auth_details`, `params_to_commands`, `any_json`, or absent (no JSON validation; raw string). The engine has a small registry mapping name → callable. Future cells slot in here. | +| `steps[].cross_check` | no | str | Named cross-step semantic check (e.g. `params_to_commands_no_auth_overlap`). Same registry pattern as `json_schema`. | +| `steps[].description` | yes | str | Used by `next` and `format_status`. | + +### 2.3 What the YAML does NOT encode + +- The cascade-reset rule itself (the engine's job). +- The `is_done` semantics per kind (engine). +- The grammar of `Auth Details` config expressions (lives in + [`auth_config_parser`](auth_config_parser/DESIGN.md)). +- CLI verbs that aren't per-step setters (`status`, `dashboard`, `next`, + `list*`, `files`, `auth-params`, `reset`, `reset-to`, `fail`, `markpass`, + `skip`, `at-step`, `show-step`, `set-assignee-by-connector`, `help`). + These are engine commands; only the per-step *setters* are listed in + YAML on each `Step`. +- Display formatting (the `format_*` helpers stay in code). + +--- + +## 3. Module Structure + +The new layout mirrors [`connectus/auth_config_parser/`](auth_config_parser/) at +a comparable scope, but stays small. Pragmatic over speculative. + +``` +connectus/ +├── workflow_state.py # Thin shim — re-exports the package's public API +│ # for back-compat. Imports * from the package. +│ # Existing callers `from workflow_state import …` +│ # keep working unmodified. 
+├── workflow_state_config.yml # NEW: hardcoded YAML, the source of truth +│ # for steps/columns/markers/interactions. +└── workflow_state/ + ├── __init__.py # Public API re-exports + CLI entrypoint glue. + ├── DESIGN.md # This file (moved or symlinked here). + ├── config_loader.py # Reads YAML, validates schema, builds typed config. + ├── schema.py # Dataclasses: Step, MarkerSet, IdentityColumn, + │ # StepInteraction, WorkflowConfig. + ├── state_machine.py # is_done / current_step / reset_after / + │ # apply_step_action / normalize_row. + ├── csv_io.py # load_csv / save_csv / find_row. + ├── validators.py # validate_auth_detail / validate_params_to_commands + │ # + the named-validator registry that maps + │ # YAML's `json_schema:` strings to callables. + ├── display.py # format_status / format_dashboard_row / + │ # format_step_value / format_next_line / … + ├── api.py # Programmatic API: get_integration_status, + │ # next_step_for, set_integration_auth, … + ├── cli.py # All cmd_* functions + COMMANDS dispatch + │ # + main(). The __main__ block lives here. + └── tests/ + ├── __init__.py + ├── test_config_loader.py # YAML loader & validator tests (NEW). + ├── test_state_machine.py # Existing engine tests, ported. + ├── test_csv_io.py # CSV round-trip / atomic save tests, ported. + ├── test_cli.py # cmd_* tests, ported. + └── test_api.py # Programmatic API tests, ported. +``` + +### Why this split (and not finer) + +- **`config_loader` separate from `schema`** — typed dataclasses (pure data, + no I/O) deserve their own module so tests can build a `WorkflowConfig` + directly without touching the YAML on disk. Mirrors the + [`types.py`](auth_config_parser/types.py) / [`parser.py`](auth_config_parser/parser.py) split. +- **`validators` separate from `state_machine`** — JSON schema validation + is per-cell and largely delegates to + [`auth_config_parser`](auth_config_parser/__init__.py); the state machine + doesn't import the validator implementations, only looks them up by name. +- **`display` and `cli` separate from `api`** — the programmatic API + ([`get_integration_status`](workflow_state.py:2165), + [`set_integration_auth`](workflow_state.py:2507), etc.) returns plain dicts + and is consumed by the SKILL via subprocess and (if it switches to + in-process) by Python imports. Display/CLI is print-only. +- **`workflow_state.py` shim stays at the old path** — preserves + `from workflow_state import auth_param_ids, WorkflowError` (used by + [`check_command_params.py:589`](check_command_params.py:589)) and the + ~30 imports in [`workflow_state_test.py:22`](workflow_state_test.py:22) + without touching either file. The shim is ~25 lines of `from + workflow_state.* import *` plus an `if __name__ == "__main__": + workflow_state.cli.main()` so the CLI invocation `python3 + connectus/workflow_state.py …` continues to work. + +--- + +## 4. Public API + +The shim at [`connectus/workflow_state.py`](workflow_state.py) re-exports +everything below from the new package. Tests and external callers see no +change. 
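+
+Before the detailed tables, here is a minimal sketch of what "preserved but
+now derived from YAML" means in practice. It is illustrative only: the
+`get_config()` accessor and the `WorkflowConfig` convenience properties are
+the ones proposed in §4.4, but the exact expressions are assumptions.
+
+```python
+from dataclasses import dataclass
+from typing import Optional
+
+from workflow_state.config_loader import get_config
+
+
+@dataclass(frozen=True)
+class Step:
+    # Field order preserves the old positional constructor:
+    # Step(index, name, kind, optional, setter, description).
+    index: int
+    name: str
+    kind: str                    # "data" | "checkpoint" | "flag"
+    optional: bool
+    setter: Optional[str]
+    description: str
+    cascade_on_set: bool = True        # new, defaulted (carve-out flag)
+    json_schema: Optional[str] = None  # new, defaulted (named validator)
+
+
+# Constants rebuilt from the loaded config instead of module literals;
+# the shim keeps re-exporting them under the same names.
+_CFG = get_config()
+CHECK = _CFG.markers.check
+NA_MARK = _CFG.markers.na
+DATA_COLUMNS = [c.name for c in _CFG.identity_columns]
+STEPS = list(_CFG.steps)
+STEP_BY_NAME = {s.name: s for s in STEPS}
+WORKFLOW_COLUMNS = [s.name for s in STEPS]
+JSON_VALUED_COLUMNS = {s.name for s in STEPS if s.json_schema is not None}
+NON_CHECKPOINT_STEPS = {s.name: s.setter for s in STEPS if s.setter}
+```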
+ +### 4.1 Constants — preserved (now derived from YAML) + +| Symbol | Today | After | +|---|---|---| +| [`CHECK`](workflow_state.py:112) | str literal | `_CFG.markers.check` | +| [`FAIL_MARK`](workflow_state.py:113) | str literal | `_CFG.markers.fail` | +| [`NA_MARK`](workflow_state.py:114) | str literal | `_CFG.markers.na` | +| [`VALID_FLAG_VALUES`](workflow_state.py:123) | set literal | `set(_CFG.markers.flag_values)` | +| [`VALID_AUTH_TYPES`](workflow_state.py:127) | derived from `AuthType` enum | unchanged (still imported from [`auth_config_parser`](auth_config_parser/__init__.py)) | +| [`DATA_COLUMNS`](workflow_state.py:117) | list literal | `[c.name for c in _CFG.identity_columns]` | +| [`STEPS`](workflow_state.py:171) | list[Step] literal | built by `config_loader.load()` | +| [`STEP_BY_NAME`](workflow_state.py:208) | dict | derived | +| [`STEP_BY_INDEX`](workflow_state.py:209) | dict | derived | +| [`WORKFLOW_COLUMNS`](workflow_state.py:212) | list | derived | +| [`WORKFLOW_DATA_COLUMNS`](workflow_state.py:213) | list | derived | +| [`CHECKPOINT_COLUMNS`](workflow_state.py:214) | list | derived | +| [`JSON_VALUED_COLUMNS`](workflow_state.py:215) | set | derived (`s.json_schema is not None`) | +| [`AUTH_PARITY_FLAG_COLUMN`](workflow_state.py:219) | str literal | derived: the `when_step` of the single `flag_auto_na_target` interaction (asserted unique). Falls back to the literal name on legacy YAML. | +| [`ALL_COLUMNS`](workflow_state.py:220) | list | derived | +| [`EXPECTED_COLUMN_COUNT`](workflow_state.py:221) | int | `len(ALL_COLUMNS)` | +| [`NON_CHECKPOINT_STEPS`](workflow_state.py:224) | dict | derived (`{s.name: s.setter for s in steps if s.setter}`) | + +### 4.2 Types — preserved + +| Symbol | Today | After | +|---|---|---| +| [`Step`](workflow_state.py:134) | dataclass | moved to `workflow_state.schema.Step`. Two new optional fields: `cascade_on_set: bool = True` and `json_schema: Optional[str] = None`. **Old constructor signature `Step(index, name, kind, optional, setter, description)` continues to work positionally** — defaults supply the new fields. | +| [`WorkflowError`](workflow_state.py:235) | exception | moved to `workflow_state.state_machine.WorkflowError` (re-exported). Unchanged. 
| + +### 4.3 Functions — preserved (signatures identical) + +State engine: +- [`is_checked(value: str) -> bool`](workflow_state.py:247) +- [`is_done(row, step) -> bool`](workflow_state.py:253) +- [`current_step(row) -> Optional[Step]`](workflow_state.py:265) +- [`get_current_step(row) -> Optional[str]`](workflow_state.py:275) (legacy alias) +- [`get_step(name) -> Step`](workflow_state.py:281) +- [`get_step_index(name) -> int`](workflow_state.py:292) +- [`reset_after(row, step) -> list[str]`](workflow_state.py:308) +- [`normalize_row(row) -> list[str]`](workflow_state.py:319) +- [`apply_step_action(row, target, new_value, *, verb) -> tuple[list[str], bool]`](workflow_state.py:704) +- [`reset_from_step(row, step_name) -> None`](workflow_state.py:961) (legacy) +- [`markpass_step(row, step_name) -> str`](workflow_state.py:987) (legacy) +- [`has_workflow_progress(row) -> bool`](workflow_state.py:900) + +CSV I/O: +- [`load_csv() -> list[dict]`](workflow_state.py:358) +- [`save_csv(rows) -> None`](workflow_state.py:379) +- [`find_row(rows, integration_id) -> Optional[int]`](workflow_state.py:421) + +Validators: +- [`validate_auth_detail(value) -> list[str]`](workflow_state.py:434) +- [`validate_params_to_commands(value) -> list[str]`](workflow_state.py:462) +- [`auth_param_ids(integration_id) -> list[str]`](workflow_state.py:609) — used by [`check_command_params.py`](check_command_params.py:589); back-compat critical. + +Display: +- [`format_status(row)`](workflow_state.py:794), [`format_dashboard_row(row)`](workflow_state.py:844), + [`format_step_value(row, step_name)`](workflow_state.py:859), + [`format_next_line(row)`](workflow_state.py:1984), [`format_by_assignee(rows, name)`](workflow_state.py:927). + +Listing/grouping: +- [`list_by_assignee`](workflow_state.py:912), [`list_by_connector`](workflow_state.py:918). + +Programmatic API (returns dicts): +- [`get_integration_status`](workflow_state.py:2165), [`get_integration_files`](workflow_state.py:2187), [`next_step_for`](workflow_state.py:2320), [`list_integrations_by_connector`](workflow_state.py:2356), [`integrations_for_assignee`](workflow_state.py:2366), [`assign_connector`](workflow_state.py:2376), [`markpass_integration_step`](workflow_state.py:2411), [`fail_integration_step`](workflow_state.py:2455), [`reset_integration_to_step`](workflow_state.py:2479), [`skip_integration_step`](workflow_state.py:2483), [`set_integration_auth`](workflow_state.py:2507). + +CLI (`cmd_*`) — all preserved by name. Unchanged behaviour. + +### 4.4 New public API (added by this refactor) + +| Symbol | Module | Purpose | +|---|---|---| +| `WorkflowConfig` (frozen dataclass) | `workflow_state.schema` | The fully-loaded, validated config. Has `.identity_columns`, `.markers`, `.steps`, `.step_interactions`, plus convenience properties (`step_by_name`, `step_by_index`, `workflow_columns`, …). | +| `MarkerSet` (frozen dataclass) | `workflow_state.schema` | The `markers:` block. | +| `IdentityColumn` (frozen dataclass) | `workflow_state.schema` | One identity column entry. | +| `StepInteraction` (frozen dataclass) | `workflow_state.schema` | One `step_interactions[]` entry; today only `flag_auto_na_target`. | +| `ConfigLoadError` (exception) | `workflow_state.config_loader` | Raised when the YAML is missing / malformed / fails schema validation. Carries `.errors: list[str]` like [`AuthConfigParseError`](auth_config_parser/types.py:159). 
| +| `load_config(path: str \| None = None) -> WorkflowConfig` | `workflow_state.config_loader` | Loads from `path` or from the default location next to the package. Cached behind a module-level singleton accessor. | +| `get_config() -> WorkflowConfig` | `workflow_state.config_loader` | Returns the singleton. First call triggers `load_config()`. Tests can call `_reset_config_for_testing()` (private) to swap in a fixture. | + +### 4.5 Removed — none + +Nothing is removed in the back-compat shim. Internally, the +`STEPS` list literal and the `Step()` constructor calls in +[`workflow_state.py:171-204`](workflow_state.py:171) get deleted (replaced by +the loader's output), but the names `STEPS`, `STEP_BY_NAME`, etc. remain +exported. + +--- + +## 5. Config Loading & Validation Strategy + +### 5.1 Lifecycle + +- **When**: lazily, on the first access to any derived constant (`STEPS`, + `CHECK`, etc.) or via explicit `get_config()` / `load_config()` calls. + In practice this is at module import time of the shim, because the shim + re-exports the constants. **An invalid YAML therefore fails *fast*** — + the very first `import workflow_state` raises `ConfigLoadError` with all + problems listed. +- **Cached**: `load_config()` memoizes by absolute path. Tests can call + `_reset_config_for_testing()` to clear the cache and load a fixture YAML. +- **Default path**: the file lives next to the package + (`connectus/workflow_state_config.yml`). The loader resolves it from + `__file__` so it works whether the package is on PYTHONPATH or invoked + directly. + +### 5.2 Validation rules (all enforced by the loader) + +The loader collects ALL errors and raises one `ConfigLoadError` whose +`.errors` lists every problem (matches the multi-error pattern in +[`validate_auth_details`](auth_config_parser/validator.py)): + +1. **YAML parses cleanly** — surfaces `yaml.YAMLError` line/column. +2. **Top-level shape** — exactly the keys + `{schema_version, identity_columns, markers, steps}` (plus optional + `step_interactions`). +3. **`schema_version == 1`** — anything else is rejected with a "this loader + doesn't know schema_version=N" message. +4. **`identity_columns`**: list of 1+ items, each with non-empty unique + `name`. +5. **`markers`**: all required keys present (`check`, `fail`, `na`, + `checkpoint_done_values`, `flag_values`); `markers.checkpoint_done_values` + is a non-empty list of strings and CONTAINS `markers.check` and + `markers.na`; `markers.flag_values` is a non-empty list of unique strings. +6. **`steps`**: list of 1+ items. For each step: + - `name` is a non-empty string, unique across all steps, and does NOT + collide with any `identity_columns[].name`. + - `kind ∈ {data, checkpoint, flag}`. + - `optional` is a bool. + - `description` is a non-empty string. + - **kind-specific rules**: + - `data`: `setter` must be a non-empty string. + - `flag`: `setter` must be a non-empty string. + - `checkpoint`: `setter` must be `null`/absent. + - `cascade_on_set` (if present) is a bool. + - `json_schema` (if present) is a known validator name (`auth_details`, + `params_to_commands`, `any_json`). + - `cross_check` (if present) is a known cross-check name + (`params_to_commands_no_auth_overlap`). +7. **Setters are unique** across steps (no two steps share a setter name). +8. 
**`step_interactions`**: each interaction's `when_step` and `target_step` + must reference real step names; `when_step` must be `kind=flag`; + `target_step` must be `kind=checkpoint`; `when_value_in` must be a + subset of `markers.flag_values`; `write_value` must be in + `markers.checkpoint_done_values`. **At most ONE + `flag_auto_na_target` interaction** per `when_step` (engine assumes + uniqueness when looking up the rule). + +### 5.3 Error surfacing + +- `ConfigLoadError` is raised with a multi-line message naming the file, + the section, and the field, e.g. + + ``` + ConfigLoadError: connectus/workflow_state_config.yml has 2 problem(s): + - steps[6] ('generated manifest'): kind=checkpoint requires setter to be null, got 'set-foo' + - markers.checkpoint_done_values: missing required value '✅' (markers.check) + ``` + +- Because import-time loading is the trigger, a malformed YAML stops the + CLI before any subcommand runs — no risk of partial state. +- Tests assert specific error substrings (mirroring the pattern at + [`workflow_state_test.py`](workflow_state_test.py:1128) for + `validate_auth_detail`). + +### 5.4 Why YAML and not JSON / TOML + +- Multi-line `description:` strings read better in YAML. +- The file is hand-edited, never machine-generated. +- The project's manifest files (`pack_metadata.json`, `*.yml`) are already + YAML-heavy; team familiarity is high. +- We add `pyyaml` as the only new dependency. (`PyYAML` is already + transitively present via `demisto-sdk`; no real new install.) + +--- + +## 6. Migration Plan (ordered) + +Each step is independently testable; the test suite stays green at every +step. + +1. **Create `connectus/workflow_state_config.yml`** with the exact 16 steps, + markers, identity columns, and the one `step_interactions` entry that + correspond to today's literal data. Bit-for-bit identical to current + `STEPS`. + +2. **Create `connectus/workflow_state/schema.py`** — the dataclasses + (`Step`, `MarkerSet`, `IdentityColumn`, `StepInteraction`, + `WorkflowConfig`). `Step` stays positionally compatible + (`Step(index, name, kind, optional, setter, description, cascade_on_set=True, json_schema=None)`). + No I/O, pure types, mirroring [`auth_config_parser/types.py`](auth_config_parser/types.py). + +3. **Create `connectus/workflow_state/config_loader.py`** — `load_config()`, + `get_config()`, `ConfigLoadError`, and `_reset_config_for_testing()`. + Implement all §5.2 rules. Add `tests/test_config_loader.py` with + one passing test per rule. + +4. **Create `connectus/workflow_state/state_machine.py`** by moving + [`is_checked`](workflow_state.py:247), [`is_done`](workflow_state.py:253), + [`current_step`](workflow_state.py:265), [`reset_after`](workflow_state.py:308), + [`normalize_row`](workflow_state.py:319), [`apply_step_action`](workflow_state.py:704), + [`_can_advance_to`](workflow_state.py:688), [`reset_from_step`](workflow_state.py:961), + [`markpass_step`](workflow_state.py:987), [`has_workflow_progress`](workflow_state.py:900), + [`get_step`](workflow_state.py:281), [`get_step_index`](workflow_state.py:292), + [`get_current_step`](workflow_state.py:275), [`WorkflowError`](workflow_state.py:235). + These functions consult `get_config()` instead of module-level literals. + The `flag_auto_na_target` interaction is read from config in + `markpass_step` and `markpass_integration_step` (the two places that + currently hardcode the #12→#13 coupling). + +5. 
**Create `connectus/workflow_state/csv_io.py`** by moving + [`load_csv`](workflow_state.py:358), [`save_csv`](workflow_state.py:379), + [`find_row`](workflow_state.py:421), [`_normalize_rows_with_warning`](workflow_state.py:341), + [`CSV_PATH`](workflow_state.py:110), [`BASE_DIR`](workflow_state.py:109). + +6. **Create `connectus/workflow_state/validators.py`** by moving + [`validate_auth_detail`](workflow_state.py:434), + [`validate_params_to_commands`](workflow_state.py:462), and the small + `_PARAMS_TO_COMMANDS_STRIP_HINT` constant. Add a registry: + + ```python + _NAMED_VALIDATORS = { + "auth_details": validate_auth_detail, + "params_to_commands": validate_params_to_commands, + "any_json": _validate_any_json, + } + + def get_named_validator(name: str) -> Callable[[str], list[str]]: + ... + ``` + + Cross-checks similarly: + + ```python + _NAMED_CROSS_CHECKS = { + "params_to_commands_no_auth_overlap": _check_params_to_commands_overlap, + } + ``` + + The `cmd_set_*` handlers look up their validator/cross-check by the + step's `json_schema`/`cross_check` field rather than hardcoding + `step_name == "Auth Details"`/`"Params to Commands"`. + +7. **Create `connectus/workflow_state/display.py`** by moving the + `format_*` helpers and the helpers they call + ([`_summary_value`](workflow_state.py:756), + [`_auth_other_connection_summary`](workflow_state.py:771), + [`_example_value_for`](workflow_state.py:1966), + [`_format_step_for_listing`](workflow_state.py:1669)). + +8. **Create `connectus/workflow_state/api.py`** by moving the dict-returning + helpers ([`get_integration_status`](workflow_state.py:2165) … through … + [`set_integration_auth`](workflow_state.py:2507)) and + [`get_integration_files`](workflow_state.py:2187). + +9. **Create `connectus/workflow_state/cli.py`** by moving every `cmd_*` + function, the [`COMMANDS`](workflow_state.py:2538) dict, the + [`_parse_next_flags`](workflow_state.py:2011) helper, and `main()`. The + CLI handlers look up their validator via the step's `json_schema` field. + +10. **Create `connectus/workflow_state/__init__.py`** with explicit + re-exports — every name listed in §4. + +11. **Replace `connectus/workflow_state.py` with a thin shim** that does + `from workflow_state import *` (the *package*) and runs `main()` when + invoked as `__main__`. Total ~25 lines. + +12. **Run the full test suite** [`workflow_state_test.py`](workflow_state_test.py). + Expected to pass unchanged — every imported name is still exported + from the shim. Fix any drift. + +13. **Add new tests** under `connectus/workflow_state/tests/test_config_loader.py` + (see §7 below). Optionally split the existing 2 619-line monolith into + `test_state_machine.py`/`test_cli.py`/`test_api.py`/`test_csv_io.py` — + not required by this refactor. + +14. **Update [`connectus/column-schemas.md`](column-schemas.md)** to + cross-link the new YAML as the canonical source of column names and + sentinels (the schema rules for individual cells stay in that file). + +15. **Update [`connectus/Readme.md`](Readme.md)** (skim suggests it + documents the workflow at length) to point at the YAML for the + "what are the steps" section. 
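+
+As a sketch of the validator dispatch described in steps 6 and 9 above, a
+setter handler can resolve its per-cell validator from the step's declared
+`json_schema` name instead of comparing column titles. The handler name and
+the exact `apply_step_action` call are assumptions; `get_config()`,
+`get_named_validator`, and `WorkflowError` are names this design defines.
+
+```python
+from workflow_state.config_loader import get_config
+from workflow_state.state_machine import WorkflowError, apply_step_action
+from workflow_state.validators import get_named_validator
+
+
+def _set_json_step_sketch(row: dict, step_name: str, raw_value: str) -> None:
+    """Hypothetical setter path: validate the cell per the step's declared
+    json_schema name, then hand the write to the cascade engine."""
+    step = get_config().step_by_name[step_name]
+
+    if step.json_schema is not None:
+        validator = get_named_validator(step.json_schema)
+        errors = validator(raw_value) if validator else []
+        if errors:
+            raise WorkflowError(
+                f"{step_name}: invalid value:\n  " + "\n  ".join(errors)
+            )
+
+    # Cross-checks (e.g. params_to_commands_no_auth_overlap) are resolved
+    # the same way by name, but wired inside the state machine to avoid
+    # the import cycle noted in validators.py.
+    apply_step_action(row, step, raw_value, verb=step.setter or "set")
+```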
+ +### Files created +- `connectus/workflow_state_config.yml` +- `connectus/workflow_state/__init__.py` +- `connectus/workflow_state/config_loader.py` +- `connectus/workflow_state/schema.py` +- `connectus/workflow_state/state_machine.py` +- `connectus/workflow_state/csv_io.py` +- `connectus/workflow_state/validators.py` +- `connectus/workflow_state/display.py` +- `connectus/workflow_state/api.py` +- `connectus/workflow_state/cli.py` +- `connectus/workflow_state/tests/__init__.py` +- `connectus/workflow_state/tests/test_config_loader.py` + +### Files modified +- `connectus/workflow_state.py` → reduced to ~25-line shim. +- `connectus/workflow_state_test.py` → ideally unchanged; if any tests + imported `_set_json_data_step` etc. (one does, line 2600), keep that + re-exported from the shim. +- `connectus/column-schemas.md` → cross-links updated. +- `connectus/Readme.md` → cross-links updated. + +### Functions deleted (replaced) +- The `STEPS = [...]` literal block in [`workflow_state.py:171-204`](workflow_state.py:171) (replaced by config-loader output). +- The `Step` `@dataclass` definition at [`workflow_state.py:134-144`](workflow_state.py:134) (moved to `schema.py`, augmented). + +No public function/method is deleted. + +--- + +## 7. Test Plan + +### 7.1 Existing test file ([`workflow_state_test.py`](workflow_state_test.py)) + +**Goal: zero changes required.** The shim re-exports every imported name +([`workflow_state_test.py:22-82`](workflow_state_test.py:22)) so the test +file is the strongest signal that back-compat held. + +The few specific assertions that hardcode current behavior need a sanity +re-check: + +| Test | Assertion | After refactor | +|---|---|---| +| [`test_steps_has_exactly_16_entries`](workflow_state_test.py:141) | `len(STEPS) == 16` | passes — YAML has 16 entries | +| [`test_first_step_is_assignee`](workflow_state_test.py:144) | `STEPS[0].name == "assignee"` | passes | +| [`test_only_step_5_is_optional`](workflow_state_test.py:152) | step 5 optional, others not | passes | +| [`test_step_names_match_workflow_columns_in_order`](workflow_state_test.py:169) | derived order | passes | +| [`test_workflow_data_columns_derived`](workflow_state_test.py:172) | exact list | passes | +| [`test_checkpoint_columns_derived`](workflow_state_test.py:181) | exact list | passes | +| [`test_json_valued_columns_derived`](workflow_state_test.py:195) | exact set | passes (the four data-with-json_schema steps) | +| [`test_non_checkpoint_steps_mapping`](workflow_state_test.py:203) | exact dict | passes | +| [`test_total_column_count_unchanged`](workflow_state_test.py:213) | `EXPECTED_COLUMN_COUNT == 19` | passes (3 identity + 16 workflow) | +| [`test_data_columns_unchanged`](workflow_state_test.py:217) | exact list | passes (matches `identity_columns:`) | + +The atomic-save and CSV-normalization tests +([`TestAtomicSaveCsv`](workflow_state_test.py:1457)) move with `csv_io.py` +unchanged. + +The one place to watch: the test at +[`workflow_state_test.py:2600`](workflow_state_test.py:2600) does +`from workflow_state import _set_json_data_step`. The shim must re-export +private names too — easiest is `from workflow_state.cli import *` plus an +explicit `from workflow_state.cli import _set_json_data_step` line in the +shim. Document this in the shim. 
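+
+Concretely, the shim could look roughly like this (an illustrative sketch,
+not the actual file; it mirrors the §3 layout, the §4 re-export contract,
+and the explicit underscore re-export discussed just above):
+
+```python
+# connectus/workflow_state.py: back-compat shim (sketch only).
+# Re-exports the package's public API so `from workflow_state import …`
+# and `python3 connectus/workflow_state.py …` keep working unchanged.
+from workflow_state.state_machine import *  # noqa: F401,F403
+from workflow_state.csv_io import *         # noqa: F401,F403
+from workflow_state.validators import *     # noqa: F401,F403
+from workflow_state.display import *        # noqa: F401,F403
+from workflow_state.api import *            # noqa: F401,F403
+from workflow_state.cli import *            # noqa: F401,F403
+
+# Underscore-prefixed names are skipped by `import *`; re-export the one
+# the test suite imports by name.
+from workflow_state.cli import _set_json_data_step  # noqa: F401
+
+if __name__ == "__main__":
+    from workflow_state.cli import main
+    main()
+```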
+ +### 7.2 New tests — `connectus/workflow_state/tests/test_config_loader.py` + +| Test | Description | +|---|---| +| `test_loads_default_yaml` | The hardcoded `workflow_state_config.yml` loads cleanly and produces 16 steps, 3 identity columns, the expected markers. | +| `test_singleton_caches` | `get_config()` returns the same instance on repeated calls. | +| `test_reset_for_testing_clears_cache` | `_reset_config_for_testing()` causes the next `get_config()` to re-read. | +| `test_missing_file_raises_clear_error` | Pointing the loader at a non-existent path raises `ConfigLoadError` with the path in the message. | +| `test_invalid_yaml_syntax_raises` | Malformed YAML produces a `ConfigLoadError` mentioning line/column. | +| `test_unknown_schema_version_rejected` | `schema_version: 99` → error. | +| `test_extra_top_level_keys_rejected` | `something_else: foo` → error naming the extra key. | +| `test_missing_top_level_keys_rejected` | Drop `steps` → error naming `steps`. | +| `test_identity_column_duplicate_name_rejected` | Two with the same name → error. | +| `test_identity_column_collides_with_step_name_rejected` | `identity_columns[].name == steps[].name` → error. | +| `test_step_kind_invalid_rejected` | `kind: foo` → error. | +| `test_data_step_without_setter_rejected` | data kind + missing setter → error. | +| `test_flag_step_without_setter_rejected` | flag kind + missing setter → error. | +| `test_checkpoint_step_with_setter_rejected` | checkpoint kind + setter present → error. | +| `test_duplicate_step_name_rejected` | Two steps with same name → error. | +| `test_duplicate_setter_rejected` | Two steps with same setter → error. | +| `test_unknown_json_schema_name_rejected` | `json_schema: nope` → error listing valid names. | +| `test_unknown_cross_check_name_rejected` | `cross_check: nope` → error. | +| `test_markers_check_must_be_in_done_values` | `markers.check` not in `checkpoint_done_values` → error. | +| `test_markers_na_must_be_in_done_values` | `markers.na` not in `checkpoint_done_values` → error. | +| `test_step_interactions_unknown_step_rejected` | `when_step` references a non-existent step → error. | +| `test_step_interactions_when_step_must_be_flag` | `when_step` is a checkpoint → error. | +| `test_step_interactions_target_step_must_be_checkpoint` | `target_step` is data → error. | +| `test_step_interactions_when_value_in_must_be_subset_of_flag_values` | Value not in `markers.flag_values` → error. | +| `test_step_interactions_write_value_must_be_in_done_values` | Bad `write_value` → error. | +| `test_step_interactions_at_most_one_per_when_step` | Two interactions referencing same `when_step` → error. | +| `test_multi_error_collection` | Three problems in one YAML → `ConfigLoadError.errors` has 3 entries. | +| `test_cascade_on_set_default_true` | Step without `cascade_on_set` field → engine treats as `True`. | +| `test_cascade_on_set_false_assignee` | The `assignee` step is loaded with `cascade_on_set: False`. | + +### 7.3 New engine tests (in `test_state_machine.py` if split, otherwise added to `workflow_state_test.py`) + +| Test | Description | +|---|---| +| `test_set_assignee_uses_config_carve_out` | With a config where `cascade_on_set: false` is moved to a different step, `set-assignee` would cascade-reset (proves the rule comes from config, not hardcode). | +| `test_flag_auto_na_target_interaction_drives_step_13` | Mock a config where the `flag_auto_na_target` writes a different value, verify the engine honours it. 
| +| `test_engine_handles_alternative_marker_check` | Build a `WorkflowConfig` with `markers.check = "DONE"` and verify `is_done`/`apply_step_action` use it. | + +These tests use `_reset_config_for_testing()` + a fixture YAML written to +`tmp_path`, then assert on engine behavior. + +--- + +## 8. Risks & Mitigations + +| Risk | Mitigation | +|---|---| +| Shim doesn't re-export every name some test/external caller uses | The shim does `from workflow_state.X import *` for every submodule + explicit re-imports for known underscore-prefixed names ([`_set_json_data_step`](workflow_state_test.py:2600)). The full test suite catches the rest. | +| Import-time YAML load slows CLI startup | `pyyaml.safe_load` on a ~150-line file is sub-millisecond; negligible vs. CSV I/O. | +| YAML and CSV header drift | The loader asserts `identity_columns + steps` produce the exact CSV header on `load_csv()` — the existing `if fieldnames != ALL_COLUMNS` warning at [`workflow_state.py:363`](workflow_state.py:363) keeps working. | +| The `Step` dataclass picks up new optional fields and a downstream caller breaks | New fields have defaults; existing positional `Step(...)` constructions keep working. The struct is `frozen=True` so behavior is unchanged. No external caller constructs `Step` (verified by search). | +| YAML edits accidentally break the engine (e.g., remove the `flag_auto_na_target` interaction without removing #12 from `steps`) | The validator in §5.2 rule 8 catches it: a `flag` step with no interaction is allowed (engine just won't auto-fill #13), but the SKILL documents the coupling. Bigger structural changes (renaming step #13) get caught by the existing tests' explicit name asserts. | +| YAML is hand-edited and someone breaks the column header order | Engine raises at first `load_csv()` call with the existing +"CSV header does not match expected schema" warning. | + +--- + +## 9. Diagrams + +```mermaid +flowchart TD + YAML[workflow_state_config.yml] -->|load + validate| LD[config_loader.load_config] + LD -->|raises on error| ERR[ConfigLoadError] + LD --> CFG[WorkflowConfig dataclass] + CFG --> SCH[schema.py: Step / MarkerSet / IdentityColumn / StepInteraction] + + CFG --> SM[state_machine.py] + CFG --> IO[csv_io.py] + CFG --> VAL[validators.py] + CFG --> DSP[display.py] + CFG --> API[api.py] + CFG --> CLI[cli.py] + + SHIM[workflow_state.py shim] -->|from workflow_state import *| PKG[workflow_state package init] + PKG --> SM + PKG --> IO + PKG --> VAL + PKG --> DSP + PKG --> API + PKG --> CLI + + EXT1[check_command_params.py] -->|from workflow_state import auth_param_ids| SHIM + EXT2[workflow_state_test.py] -->|imports 30+ names| SHIM + EXT3[connectus-migration-SKILL.md] -->|python3 connectus/workflow_state.py …| SHIM + EXT3 --> CLI +``` + +```mermaid +flowchart LR + A[CSV row dict] --> B[is_done step] + B -->|kind=data| B1[non-empty cell] + B -->|kind=checkpoint| B2[value in markers.checkpoint_done_values] + B -->|kind=flag| B3[value in markers.flag_values] + + C[setter CLI cmd] --> D{step.cascade_on_set?} + D -->|true| E[apply_step_action: write + reset_after] + D -->|false| F[direct row write — no reset] + + G[set flag step] --> H{matches step_interactions flag_auto_na_target?} + H -->|yes and value in when_value_in| I[also write target_step = write_value] + H -->|no| J[no extra writes] +``` + +--- + +## 10. Design Constraints Checklist + +- [x] Pragmatic split — 7 modules, mirrors [`auth_config_parser/`](auth_config_parser/) without over-fragmenting. 
+- [x] Public API preserved verbatim via the shim. +- [x] No external caller (`check_command_params.py`) needs to change. +- [x] [`workflow_state_test.py`](workflow_state_test.py) does not need to change (modulo split-file imports if we choose to split). +- [x] YAML is the single source for steps/columns/markers/interactions. +- [x] Engine code knows nothing about specific step names; it works with whatever a valid YAML gives it. +- [x] Per-cell JSON validators stay in code (they're grammar, not config), but the *binding* of step → validator moves to YAML. +- [x] All errors collected and surfaced together (matches existing pattern in [`auth_config_parser`](auth_config_parser/DESIGN.md)). +- [x] No new runtime dependency beyond `pyyaml` (already transitively present). + +--- + +## 11. Open Questions + +1. **Should `cmd_set_assignee` and `cmd_set_assignee_by_connector`'s + carve-out be expressed as `cascade_on_set: false` on the `assignee` + step (current proposal), OR as a separate `interactions[]` rule + ("admin updates")?** The dataclass field is simpler and the engine + already needs to read it; the interactions list is only justified if + we expect more carve-outs. **Recommendation: go with + `cascade_on_set: false` (current proposal) and revisit if a second + carve-out shows up.** + + **DECIDED: `cascade_on_set: false` on the assignee step.** Implemented + exactly as proposed — see the `assignee` step in + [`workflow_state_config.yml`](workflow_state_config.yml:54). + +2. **Should `markers.checkpoint_done_values` include or exclude the + historical aliases (`"YES"`, `"true"`, `"True"`, `"done"`, `"Done"`, + `"DONE"`)?** Today they are accepted on read but never written. The + YAML lists them for parity with [`is_checked`](workflow_state.py:247); + if we want a strict-only mode (reject unknown values when reading), we + add a future field like `markers.strict_checkpoint_values: bool`. + **Recommendation: keep the historical aliases for now (no behavior + change); a stricter mode is a separate cleanup.** + + **DECIDED (REVERSED FROM ORIGINAL RECOMMENDATION): strict `"✅"` and + `"N/A"` only as of Q2 2026-05.** Historical aliases are no longer + recognized by [`is_checked()`](workflow_state/state_machine.py:24). + The canonical list lives in `markers.checkpoint_done_values` — + see [`workflow_state_config.yml:22-24`](workflow_state_config.yml:22) + for the breaking-change comment. + +3. **Should the per-cell JSON schema validators (`validate_auth_detail`, + `validate_params_to_commands`) ALSO move to YAML eventually + (e.g., declared via JSONSchema)?** Out of scope here; they are + grammar-rich (Auth Details has a config-expression mini-language) and + declarative validation would be much more code than the current + imperative validators. **Recommendation: leave in code, look up by + name from YAML.** + + **DECIDED: validators stay in code, looked up by name from YAML.** + The `json_schema.validator` field on each step in + [`workflow_state_config.yml`](workflow_state_config.yml:65) names the + in-code validator at + [`workflow_state/validators.py`](workflow_state/validators.py:1) + (`auth_details` delegates to + [`auth_config_parser.validate_auth_details`](auth_config_parser/validator.py:47); + `params_to_commands` lives at + [`workflow_state/validators.py:49`](workflow_state/validators.py:49)). + +4. **Should `CSV_PATH` itself be config-driven (currently a module-level + `os.path.join`)?** Today there's exactly one CSV. Moving it to YAML + is overkill until we have a second pipeline. 
**Recommendation: leave + `CSV_PATH` in `csv_io.py`.** + + **DECIDED: `CSV_PATH` stays in code.** Implemented at + [`workflow_state/csv_io.py`](workflow_state/csv_io.py:1) as proposed. + +5. **Should we split the existing 2 619-line + [`workflow_state_test.py`](workflow_state_test.py) into per-module test + files as part of this refactor?** Not required for correctness; would + make future edits easier. **Recommendation: defer to a separate + cleanup; this refactor's success criterion is "the existing test file + passes unchanged".** + + **DECIDED: deferred — single test file kept.** The success criterion + was met (562/562 tests pass against the split package via the shim); + a per-module test split remains open as future cleanup. diff --git a/connectus/workflow_state_config.yml b/connectus/workflow_state_config.yml new file mode 100644 index 00000000000..58b30b8f1ed --- /dev/null +++ b/connectus/workflow_state_config.yml @@ -0,0 +1,157 @@ +# connectus/workflow_state_config.yml +# Hardcoded declarative configuration for the connectus migration workflow +# state machine. The runtime engine lives in connectus/workflow_state/. +# Editing this file changes the workflow's shape; engine code does not need +# to change for declarative edits (adding/removing/reordering steps, +# changing descriptions, toggling optional, etc.). + +schema_version: 1 + +# Identity / metadata columns — present in the CSV but NOT part of the +# 16-step workflow. Never cleared by cascade reset. Order matters: this +# is the leftmost prefix of the CSV header. +identity_columns: + - {"name": "Integration ID", "description": "Unique human-readable id; primary key for find_row()."} + - {"name": "Integration File Path", "description": "Repo-relative path to the integration's YML manifest."} + - {"name": "Connector ID", "description": "ConnectUs connector this integration belongs to."} + +# Sentinels and marker strings. These are the literal values written into +# CSV cells by the engine. Changing them is a data migration, not a code +# change — bump schema_version when you do. +# +# Q2 BREAKING CHANGE (2026-05): historical aliases for "done" +# ("YES", "true", "True", "done", "Done", "DONE") have been DROPPED. +# Only the strict canonical values are accepted on read AND write. +markers: + check: "✅" + fail: "❌" + na: "N/A" + # Values that count as "done" when seen in a checkpoint cell on read. + # Strict-only as of Q2 breaking change. + checkpoint_done_values: + - "✅" + - "N/A" + # Valid values for any step whose kind is `flag`. + flag_values: + - "YES" + - "NO" + - "N/A" + +# Cross-step interactions that don't fit the linear cascade model. +# Today there is exactly one: the auth-parity flag (#12) auto-fills the +# auth-parity test (#13) when set to NO/N/A. Modelled here so the engine +# can find the rule by name instead of string-comparing column titles. +step_interactions: + - kind: flag_auto_na_target + when_step: "requires auth parity test" + when_value_in: ["NO", "N/A"] + target_step: "auth parity test passes" + write_value: "N/A" + +# The unified ordered sequence. Order in this list IS the step index +# (1-based). The engine asserts len(steps) >= 1 and unique names. +steps: + - name: "assignee" + kind: data + optional: false + setter: set-assignee + cascade_on_set: false + description: "Assign an owner to drive this integration's migration." 
+ + - name: "Auth Details" + kind: data + optional: false + setter: set-auth + json_schema: {"validator": "auth_details"} + description: "Record the auth classification JSON (validated against the Auth Details schema)." + + - name: "Params to Commands" + kind: data + optional: false + setter: set-params-to-commands + json_schema: {"validator": "params_to_commands"} + cross_check: {"validator": "params_to_commands_no_auth_overlap"} + preserve_on_reset: true + description: "Map each integration command to the parameter IDs it consumes (JSON). Preserved across reset-to/fail (still wiped by set-auth and plain reset)." + + - name: "Params for test with default in code" + kind: data + optional: false + setter: set-params-for-test + json_schema: {"validator": "any_json"} + preserve_on_reset: true + description: "List the param IDs whose defaults live in the integration source (JSON). Preserved across reset-to/fail (still wiped by set-auth and plain reset)." + + - name: "Params same in other handlers" + kind: data + optional: true + setter: set-shared-params + json_schema: {"validator": "any_json"} + preserve_on_reset: true + description: "Optional: list params shared verbatim with sibling handlers (or `skip`). Preserved across reset-to/fail (still wiped by set-auth and plain reset)." + + - name: "generated manifest" + kind: checkpoint + optional: false + setter: null + description: "Generate the ConnectUs manifest YAML for the integration." + + - name: "run manifest make validate" + kind: checkpoint + optional: false + setter: null + description: "Run `make validate` on the generated manifest." + + - name: "wrote/checked code" + kind: checkpoint + optional: false + setter: null + description: "Write or review the integration source code." + + - name: "shadowed command test passes" + kind: checkpoint + optional: false + setter: null + description: "Verify there are no shadowed/conflicting commands in the same connector." + + - name: "write tests" + kind: checkpoint + optional: false + setter: null + description: "Author unit tests for the integration." + + - name: "precommit/validate/unit tests passed" + kind: checkpoint + optional: false + setter: null + description: "Run pre-commit, validate, and unit tests via demisto-sdk pre-commit." + + - name: "requires auth parity test" + kind: flag + optional: false + setter: set-auth-flag + description: "Decide whether the integration needs an auth-parity test (YES/NO/N/A)." + + - name: "auth parity test passes" + kind: checkpoint + optional: false + setter: null + description: "Run the auth-parity test (auto-N/A when step 12 is NO/N/A)." + + - name: "param parity test passes" + kind: checkpoint + optional: false + setter: null + description: "Run the parameter-parity test." + + - name: "code reviewed" + kind: checkpoint + optional: false + setter: null + description: "Complete code review." + + - name: "code merged" + kind: checkpoint + optional: false + setter: null + description: "Merge the integration to the branch." 
diff --git a/connectus/workflow_state_test.py b/connectus/workflow_state_test.py index 00cb846a1e6..d295c371c31 100644 --- a/connectus/workflow_state_test.py +++ b/connectus/workflow_state_test.py @@ -977,13 +977,17 @@ def test_reset_clears_all_workflow_columns_via_full_clear(self) -> None: def test_reset_after_clears_subsequent_only(self) -> None: row = _fully_complete_row() - cleared = reset_after(row, STEP_BY_NAME["wrote/checked code"]) + # Default respect_preserve=False keeps legacy "wipe everything after" + # semantics (used by the set-auth cascade path). + cleared, preserved = reset_after(row, STEP_BY_NAME["wrote/checked code"]) # Steps 1-8 untouched (assignee, Auth, P2C, P4T, Pshared, manifest, # validate, wrote) assert row["wrote/checked code"] == CHECK # 9+ cleared assert "shadowed command test passes" in cleared assert row["code merged"] == "" + # Legacy default does not honour preserve_on_reset: nothing reported. + assert preserved == [] # --------------------------------------------------------------------------- @@ -1420,6 +1424,161 @@ def test_set_integration_auth_not_found(self) -> None: result = set_integration_auth("Nope", VALID_AUTH_JSON_NONE) assert "error" in result and "not found" in result["error"].lower() + # ------------------------------------------------------------------ + # preserve_on_reset semantics — Option A + # ------------------------------------------------------------------ + # + # The three Params* data columns are tagged `preserve_on_reset: true` + # in workflow_state_config.yml. The semantics are: + # + # - reset-to / fail PRESERVE them (caller's blast radius shrinks). + # Exception: if the user explicitly names a preserved step as the + # reset-to target, that one step IS cleared; later preserved + # steps in the same operation are still kept. + # - set-auth STILL WIPES them (auth changes invalidate downstream + # artifacts; this is by design). + # - plain `reset` STILL WIPES everything (it's the "wipe the row" + # verb; no carve-outs). + # + # These tests pin the contract end-to-end through the programmatic + # API surface so a future refactor that breaks the wiring is caught. + + def test_fail_preserves_params_columns(self) -> None: + """fail-ing a checkpoint past Params* keeps the Params* values + intact. Demonstrates the central use case: "my pre-commit failed, + I shouldn't have to redo my per-command param research".""" + row = _fully_complete_row("X") + # Sanity — the fixture does set the Params* columns. + assert row["Params to Commands"] != "" + assert row["Params for test with default in code"] != "" + assert row["Params same in other handlers"] != "" + p2c_before = row["Params to Commands"] + p4t_before = row["Params for test with default in code"] + psh_before = row["Params same in other handlers"] + with patch("workflow_state.load_csv", return_value=[row]), \ + patch("workflow_state.save_csv"): + result = fail_integration_step( + "X", "precommit/validate/unit tests passed" + ) + assert "error" not in result + # Failed checkpoint cleared. + assert row["precommit/validate/unit tests passed"] == "" + # Later checkpoints cleared. + assert row["code merged"] == "" + # Params* columns preserved verbatim. + assert row["Params to Commands"] == p2c_before + assert row["Params for test with default in code"] == p4t_before + assert row["Params same in other handlers"] == psh_before + # The api response advertises which preserved columns retained + # values (only those that were non-empty). Order is workflow order. 
+        assert "preserved" in result
+        assert result["preserved"] == []  # Params* are BEFORE the failed step.
+
+    def test_fail_at_step_after_params_leaves_params_untouched(self) -> None:
+        """When the fail target comes AFTER Params* in the workflow, the
+        Params* columns are outside the blast radius: reset never visits
+        them, so they stay intact and result["preserved"] stays empty."""
+        row = _fully_complete_row("X")
+        p2c_before = row["Params to Commands"]
+        p4t_before = row["Params for test with default in code"]
+        psh_before = row["Params same in other handlers"]
+        with patch("workflow_state.load_csv", return_value=[row]), \
+             patch("workflow_state.save_csv"):
+            # Reset back to step 6 (`generated manifest`), the first
+            # checkpoint after the Params* trio.
+            result = fail_integration_step("X", "generated manifest")
+        assert "error" not in result
+        # Failed step + downstream non-preserved cleared.
+        assert row["generated manifest"] == ""
+        assert row["code merged"] == ""
+        # Params* (#3, #4, #5) come BEFORE `generated manifest` (#6),
+        # so reset_after never visits them; they're trivially intact.
+        assert row["Params to Commands"] == p2c_before
+        assert row["Params for test with default in code"] == p4t_before
+        assert row["Params same in other handlers"] == psh_before
+        assert result["preserved"] == []
+
+    def test_fail_at_auth_preserves_params_via_respect_preserve(self) -> None:
+        """When the fail target is `Auth Details` (#2), Params* (#3-5)
+        ARE in the blast radius. preserve_on_reset must keep them."""
+        row = _fully_complete_row("X")
+        p2c_before = row["Params to Commands"]
+        p4t_before = row["Params for test with default in code"]
+        psh_before = row["Params same in other handlers"]
+        with patch("workflow_state.load_csv", return_value=[row]), \
+             patch("workflow_state.save_csv"):
+            result = fail_integration_step("X", "Auth Details")
+        assert "error" not in result
+        # Auth Details cleared (named explicitly).
+        assert row["Auth Details"] == ""
+        # Later non-preserved cleared.
+        assert row["generated manifest"] == ""
+        # Params* preserved.
+        assert row["Params to Commands"] == p2c_before
+        assert row["Params for test with default in code"] == p4t_before
+        assert row["Params same in other handlers"] == psh_before
+        # API response surfaces what was preserved.
+        assert set(result["preserved"]) == {
+            "Params to Commands",
+            "Params for test with default in code",
+            "Params same in other handlers",
+        }
+
+    def test_fail_at_a_preserved_step_clears_THAT_step(self) -> None:
+        """Explicit-target carve-out: naming a preserved step as the fail
+        target overrides preservation FOR THAT STEP only. Later preserved
+        steps in the same operation are still preserved."""
+        row = _fully_complete_row("X")
+        p4t_before = row["Params for test with default in code"]
+        psh_before = row["Params same in other handlers"]
+        with patch("workflow_state.load_csv", return_value=[row]), \
+             patch("workflow_state.save_csv"):
+            result = fail_integration_step("X", "Params to Commands")
+        assert "error" not in result
+        # Named target IS cleared even though preserve_on_reset=true.
+        assert row["Params to Commands"] == ""
+        # Later preserved siblings (#4, #5) still preserved.
+        assert row["Params for test with default in code"] == p4t_before
+        assert row["Params same in other handlers"] == psh_before
+        # Later non-preserved (checkpoints) cleared as usual.
+ assert row["generated manifest"] == "" + assert set(result["preserved"]) == { + "Params for test with default in code", + "Params same in other handlers", + } + + def test_set_auth_still_wipes_params_columns(self) -> None: + """set-auth must continue to wipe Params* — auth changes + invalidate the per-command param contract by design.""" + row = _fully_complete_row("X") + assert row["Params to Commands"] != "" + assert row["Params for test with default in code"] != "" + assert row["Params same in other handlers"] != "" + with patch("workflow_state.load_csv", return_value=[row]), \ + patch("workflow_state.save_csv"): + result = set_integration_auth("X", VALID_AUTH_JSON_NONE) + assert "error" not in result + # Auth set. + assert row["Auth Details"] == VALID_AUTH_JSON_NONE + # Params* WIPED — preserve_on_reset is intentionally ignored on + # the set-auth cascade. + assert row["Params to Commands"] == "" + assert row["Params for test with default in code"] == "" + assert row["Params same in other handlers"] == "" + # Workflow rewinds to step #3. + assert result["current_step"] == "Params to Commands" + + def test_reset_integration_to_step_is_alias_for_fail(self) -> None: + """reset_integration_to_step is documented as a fail() alias. + Pin that it inherits the same preserve_on_reset behaviour.""" + from workflow_state import reset_integration_to_step + row = _fully_complete_row("X") + p2c_before = row["Params to Commands"] + with patch("workflow_state.load_csv", return_value=[row]), \ + patch("workflow_state.save_csv"): + result = reset_integration_to_step("X", "Auth Details") + assert "error" not in result + assert row["Auth Details"] == "" + assert row["Params to Commands"] == p2c_before + assert "Params to Commands" in result["preserved"] + def test_skip_integration_step_optional(self, monkeypatch) -> None: row = _blank_row("X") row["assignee"] = "A"