Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent naming of Github Status Check related predicates #921

Open
FloThinksPi opened this issue Feb 25, 2025 · 5 comments
Open

Inconsistent naming of Github Status Check related predicates #921

FloThinksPi opened this issue Feb 25, 2025 · 5 comments

Comments

@FloThinksPi
Copy link

FloThinksPi commented Feb 25, 2025

According to how github names their functionality according to their doc

The current predicates:

has_status:
...
has_workflow_result:
...

do not really fit into the naming from github.
For github there are checks which have an attribute called status that can have multiple values. In case the status value is completed, a check has an additional attribute called conclusion which gives more information over the termination reason.

So i would see some config similar to following example be more understandable as it uses the same terminology and hierarchy.

has_checks:
 status: ["completed"] # By default contains all possible statuses
 conclusion: ["success", "skipped"] # default, logical and with status in case status contains "completed"
 checks:
      - "check-name-1"
      - "check-name-2"
      - "check-regex-name-.*"
has_workflows:
 status: ["completed"]  # By default contains all possible statuses
 conclusion: ["success", "skipped"] # default, logical and with status in case status contains "completed"
 workflows:
      - ".github/workflows/a.yml"
      - ".github/workflows/b.yml"

Which would i suppose result in the same, current behaviour yet giving a lot more options.

It would btw also allow to put a has_check predicate into the if condition of a policybot rule allowing e.g. to enforce a rule if a check exists with any state even when not having a conclusion yet and skipping it if there is no check at all by the if condition of a rule.
I currently am fiddling around tying to implement support for that scenario where i want to require a check/workflow having certain conclusions but only if the check/workflow actually was triggered/created and im struggling to add this into the current naming schema of has_workflow_results as well as has_status.

Whats the opinions regarding the naming of these predicates ?

@bluekeyes
Copy link
Member

The naming of the has_status predicate is reflects how these features have evolved in GitHub over time. Before adding the Checks API, GitHub originally only had commit statuses, which are sometimes also called status checks (for instance in the branch protection configuration.) When we added this predicate to Policy Bot, it only supported commit statuses. We added support for checks to the existing predicate later on, once they became available in GitHub Enterprise and we started using them.

We combined the predicates because for the average user, the distinction between commit statuses and checks is not very obvious and is usually not relevant. While the API makes this distinction, the pull request UI combines these in to a single element and they all need to have a neutral or positive conclusion for the pull request to be "green" and look like everything is good.

When we added has_workflows, the interest was in completed workflows, so it copied the style of the existing has_status predicate.

I currently am fiddling around tying to implement support for that scenario where i want to require a check/workflow having certain conclusions but only if the check/workflow actually was triggered/created

Could you expand on this, maybe with an example of the specific problem you are trying to solve? I don't think I've encountered this before. Usually knowing the conclusion of the check or workflow is sufficient

@FloThinksPi
Copy link
Author

@bluekeyes Ah understandable i did not know github also evolved the naming of their features there over time.
I hacked something together i still need to try out in #922, maybe i also overlooked something in the current rules/predicates that could match the usecase.

Given one has many many github action workflows lets say 50. They all run conditionally based on path filters e.g.

name: My Workflow
on:
  push:
    branches:
      - master
    tags:
      - 2[0-9][0-9][0-9].[0-1][0-9]
    paths: # But only when dependencyies change
      - '.github/workflows/myworkflow.yaml'
      - '.github/actions/build/action.yaml'
      - '.github/actions/prepare/action.yaml'
      - 'image/*'
      - '!image/**/*.md'
      - 'requirements.txt'
      - 'dev-requirements.txt'
      - 'library/python/**/setup.py'
  pull_request:
    branches:
      - release-2[0-9][0-9][0-9].[0-1][0-9]
    paths: # But only when dependencyies change
      - '.github/workflows/myworkflow.yaml'
      - '.github/actions/build/action.yaml'
      - '.github/actions/prepare/action.yaml'
      - 'image/*'
      - '!image/**/*.md'
      - 'requirements.txt'
      - 'dev-requirements.txt'
      - 'library/python/**/setup.py'

I now want to require that workflow to be successful for Pull Requests in case it was triggered(due to files in the PR are matching the path filters). In case it is not triggered(PR does not change something within those path filters), i also dont want to require it as i want to handle it similar as the skipped conclusion of a workflow/status check. However when a path filter in a workflow did not match, we dont have the conclusion skipped actually we have no status check/workflow run in the GitHub api at all. This is described/discussed in length over at https://github.com/orgs/community/discussions/13690.

In the discussion solutions arise like creating a workflow that checks that all others finished - which i want to avoid since it blocks compute capacity for the time the longest workflow/job takes to finish.
Also e.g. https://engineering.mixpanel.com/enforcing-required-checks-on-conditional-ci-jobs-in-a-github-monorepo-8d4949694340 descibes this pattern to solve this long lacking feature from github as an github app.
Since policybot does the same esentially but as github app which does not waste resources i came up with following policy config.

policy:
  approval:
    - and:
      - My Test
      - ...
approval_rules:
  - name: My Test
    description: My Description
    options:
      invalidate_on_push: true
    if:
      changed_files:
        paths:
          - '.github/workflows/myworkflow.yaml'
          - '.github/actions/build/action.yaml'
          - '.github/actions/prepare/action.yaml'
          - 'image/.*'
          - 'requirements.txt'
          - 'dev-requirements.txt'
          - 'library/python/.*/setup.py'
        ignore:
           - 'image/.*/.*.md'
    requires:
      conditions:
        has_workflow_result:
          conclusions: ["success", "skipped"]
          workflows:
            - ".github/workflows/myworkflow.yml"

And that works perfectly when the policybot changed_file predicates logic exaclty matches the workflows path filte logic. However it duplicates the logic of the running condition in an expensive/error prone way. While github actions uses file globs in policybot one uses regex. The conversion(getting the same behaviour right) has turned out to be quite error prone. Also writing/maintaining that logic is quite time consuming for 50+ workflows.

So i also tried out different methods to use the has_workflow_result predicate.

  1. When having no if condition it all it awaits a conclusion for the workflow as it was never triggert it will wait forever
  2. When putting has_workflow_result predicate into the if condition and expecting a empty conclusion it will default the conclusion to success. So i cannot "activate" that rule whenever a workflow run exists without conclusion yet this way.

What i could imagine would work out enforcing such roules would be e.g. having a timeout for the predicate somehow

policy:
  approval:
    - and:
      - My Test
      - ...
approval_rules:
  - name: My Test
    description: My Description
    options:
      invalidate_on_push: true
    requires:
      conditions:
        has_workflow_result:
          conclusions: ["success", "skipped"]
          timeout: 300 # Wait at most 300 seconds for an appearance of this workflow(not a conclusion but at least an entry in github) - default to infinite
          timeout_behaviour: success # Or default to fail
          workflows:
            - ".github/workflows/myworkflow.yml"

Or e.g. having a predicate that can be put into the if condition to check for the existence of a run and enforce conclusions success or skipped just in case there is really a run. Otherwise skip the entire rule.

policy:
  approval:
    - and:
      - My Test
      - ...
approval_rules:
  - name: My Test
    description: My Description
    options:
      invalidate_on_push: true
    if:
      has_workflow:
         workflows:
            - ".github/workflows/myworkflow.yml"
    requires:
      conditions:
        has_workflow_result:
          conclusions: ["success", "skipped"]
          workflows:
            - ".github/workflows/myworkflow.yml"

What do you tink, is this something already possible and i oversee some predicate/usage here or does it make sense to add support for this in one or the other way ?

@bluekeyes
Copy link
Member

Supporting this workflow is tricky because it's difficult to know for sure if a status will appear without encoding the same conditions (like the modified file paths) in the policy. If a status is not present at the moment we evaluate the pull request, does that mean it will never appear or does it only mean that GitHub hasn't started running the workflow yet?

In the current architecture, anything based around a timeout is not really feasible because Policy Bot only responds to events from GitHub. Without maintaining internal state, we can't easily tell when to "wake up" and evaluate a policy again.

In a previous comment on a different issue, I suggested a policy that might work here. It enforces the condition that if a check conclusion is present, it must be success or skipped. Unfortunately, it doesn't wait for pending checks to complete - as soon as any check passes, the policy passes.

Supporting pending / incomplete checks might be possible. We discussed this a bit on a previous PR and decided to restrict the check to completed checks only as an optimization. Handling every status event can generate a lot of extra work, so any changes here would have to be extra careful about performance. Limiting this to the has_workflow / has_workflow_result predicate might be more feasible than doing to for all checks from all sources.

@FloThinksPi
Copy link
Author

FloThinksPi commented Mar 3, 2025

Ah understandable, i wasn't sure when the policies are evaluated by the raw code. But it being event driven and stateless makes this really difficult to implement indeed. Status checks usually are not there from the very beginning and this could very well lead to a rule acceptance even if it takes several seconds to start a check.

When the predicates would considder the state(which they already have in the response of the github api) i think this could cover the scenario like you proposed in your comment in issue #760 without jobs in progress counting as successfull rules.
Also policybot already recieves the webhooks for:

workflow_run.requested
workflow_run.completed
checkrun.created
check_run.completed
pull_request.synchronize
status

Which would already be enough to implement a predicate that works in the if condition without doing extra API calls or recieving additional webhooks i guess ? The raw go code itself that itterates trough the rules should be blazingly fast no matter how many conditions one has compared to doing an extra api call etc.

The sole thing that is not covered is e.g. on a webhook event pull_request.synchronize there may be no checks at all yet and the rule would evaluate to true non the less. Maybe it makes sense to delay the evaluation of status_ckeck and workflow_result predicates by some seconds - sleeping in go isnt expensive due to go rutines scheduling - and it could be configureable. Policybot then states your workflows/status check predicates are evaluated with a 10s delay(0s by default to not alter current behaviour) after the webhook was recieved - if your CI/CD system fails to register its statuses/workflows within that timeframe they are considdered not present ? Would you think that a submission in this direction could be accepted ?

@bluekeyes
Copy link
Member

My performance concern with pending statuses is that while we already receive the necessary webhooks, the handlers currently exit immediately if the status is not for a completed run (there's a similar condition for traditional commit statuses.) If the policy does not use pending statuses, I'd like to avoid all of the API calls required to run a full policy evaluation. This especially important when dealing with integrations that post multiple pending statuses to provide progress updates.

There are probably ways to solve this (see for instance the existing idea of triggers that skips evaluating events when it is known that the policy will not change), but it makes the implementation more involved.

One option that might work and be easier to add is to not evaluate the policy on pending events, but to still consider pending statuses when evaluating based on other events. Basically, if a pending status appears in a rule's predicates, mark the rule as pending rather than skipped. I'd have to think about if this would have other consequences.

I'd like to also avoid sleeping if possible since it increases the latency from making a change in GitHub to receiving feedback from Policy Bot. It's also hard to get this value right - 10s is probably too much in most cases, leading to extra waiting, but there may still be cases where it wasn't enough.

What if we allowed matching (completed) statuses/workflows by pattern? I think this would allow you to add a rule like this:

- name: has workflow results
  requires:
    condition:
      has_workflow_result:
        conclusions: ["success", "skipped"]
        workflow_patterns:
          - "\\.github/workflows/.*"

This would block approval until at least one workflow finished, without caring about which one. You'd then combine it in an and block with the rules for specific workflows. The idea is that by the time any workflow completes, we can assume that the statuses for the other workflows are also present and therefore the other rules will also have the correct state. I think we still need to support pending statuses for this to work though.

You could also look at #853 (glob support), which doesn't directly help, but might make it easier to keep the file paths synchronized between your workflows and Policy Bot.

In the end though, it's unfortunate that GitHub doesn't support requiring conditional workflows natively. Policy Bot really wasn't built to handle this and is still primarily focused on code review usage, which is why all of the solutions feel like hacks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants