
OSD - 29470: To create E2E Tests for CAD - Cluster has gone missing - Infra Nodes turned off #441


Open: wants to merge 6 commits into base: main


ratnam915
Contributor

OSD - 29470: To create E2E Tests for CAD - Cluster has gone missing - Infra Nodes turned off

Attached is the successful test case execution.
OSD-29470_test.txt

openshift-ci bot requested review from bng0y and rafael-azevedo May 12, 2025 13:50
Contributor

openshift-ci bot commented May 12, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ratnam915
Once this PR has been reviewed and has the lgtm label, please assign fahlmant for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


codecov-commenter commented May 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 31.86%. Comparing base (7052a13) to head (4ea9c58).
Report is 2 commits behind head on main.


@@           Coverage Diff           @@
##             main     #441   +/-   ##
=======================================
  Coverage   31.86%   31.86%           
=======================================
  Files          35       35           
  Lines        2445     2445           
=======================================
  Hits          779      779           
  Misses       1607     1607           
  Partials       59       59           

```go
			// ... (excerpt from the reviewed diff)
		}
	}
	if !newLogsFound {
		fmt.Println("No new service logs found.")
		// ...
```
Contributor


As this is the failure case, I'd love for this to actually fail but still clean up.

Maybe starting all stopped infra nodes could be an AfterEach function in the context?

Contributor Author


Hey @bergmannf: the code above now restarts the nodes regardless of the test status. I also reran the test case after the change, and it passed.
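The "fail but still clean up" behaviour discussed above can be sketched with Go's `defer`, which still runs when the test body fails by panicking (under Ginkgo, a failed Gomega assertion calls Ginkgo's `Fail`, which panics). This is a minimal stdlib illustration, not the PR's code: `stopInfraNodes`, `startInstances`, `runWithCleanup`, and the instance IDs are hypothetical stand-ins for the real AWS calls. In Ginkgo itself the same effect would typically come from an `AfterEach` or `DeferCleanup`.

```go
package main

import (
	"errors"
	"fmt"
)

// stopped records instances the test has stopped, so cleanup knows what to
// restart. All names here are hypothetical placeholders for real AWS calls.
var stopped []string

func stopInfraNodes(ids []string) {
	stopped = append(stopped, ids...)
}

func startInstances(ids []string) {
	fmt.Println("restarting:", ids)
	stopped = nil
}

// runWithCleanup runs body and always restarts stopped instances afterwards,
// even if body panics. The failure is returned rather than swallowed, so the
// test still fails.
func runWithCleanup(body func()) (err error) {
	// Deferred calls run last-in, first-out: the restart below runs first,
	// then this recover converts the panic into an error.
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("test failed: %v", r)
		}
	}()
	defer func() {
		if len(stopped) > 0 {
			startInstances(stopped)
		}
	}()
	body()
	return nil
}

func main() {
	err := runWithCleanup(func() {
		stopInfraNodes([]string{"i-infra-1", "i-infra-2"})
		panic(errors.New("no new service logs found"))
	})
	fmt.Println(err)
	fmt.Println("still stopped:", len(stopped))
}
```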

Contributor

RaphaelBut left a comment


Otherwise looks great to me. Good stuff!

```go
			// ... (excerpt from the reviewed diff)
			fmt.Printf("ID: %s\nSummary: %s\nDescription: %s\n\n", log.ID(), log.Summary(), log.Description())
		}
	}
	Expect(newLogsFound).To(BeTrue(), "No new service logs were found after infrastructure node shutdown")
```
Contributor


I like how we keep the test flexible by not checking the content of the service logs. On the other hand, could other automations interfere with this test by sending unrelated service logs, which would then make it pass?
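One way to address this concern, sketched under assumptions: count only new service logs whose summary matches the expected investigation. The `ServiceLog` type, the `matchingLogs` helper, and the sample summaries below are hypothetical; the real match text would have to come from the CHGM investigation in configuration-anomaly-detection.

```go
package main

import (
	"fmt"
	"strings"
)

// ServiceLog is a hypothetical, minimal stand-in for an OCM service log entry.
type ServiceLog struct {
	ID      string
	Summary string
}

// matchingLogs keeps only the logs whose summary contains want
// (case-insensitive), so unrelated service logs sent by other automations
// cannot make the test pass on their own.
func matchingLogs(logs []ServiceLog, want string) []ServiceLog {
	var out []ServiceLog
	for _, l := range logs {
		if strings.Contains(strings.ToLower(l.Summary), strings.ToLower(want)) {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	// Illustrative data only; these are not real CAD service log texts.
	newLogs := []ServiceLog{
		{ID: "1", Summary: "Unrelated maintenance notice"},
		{ID: "2", Summary: "Example: infrastructure nodes stopped"},
	}
	fmt.Println(len(matchingLogs(newLogs, "infrastructure"))) // prints 1
}
```

The trade-off is the one the comment points out: matching on content makes the test stricter, but it breaks if the service log wording changes.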

Contributor

RaphaelBut, May 16, 2025


Uh, btw, I just took a look: it seems to be a limited support reason, so maybe it's worth checking whether limited support has been set? (I'm not sure whether OCM sends a service log when limited support is set, but that seems to be the case, otherwise your test would have failed? :D)
https://github.com/openshift/configuration-anomaly-detection/blob/main/pkg/investigations/chgm/chgm.go#L25

Contributor Author


Hi @RaphaelBut: the test now counts the service logs before and after stopping the infra nodes. New service logs are printed only if any are present, and the test fails if there are none.
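The before/after comparison described above could look roughly like this minimal sketch. It assumes service logs are listed oldest first, so every entry beyond the baseline count is new; the `ServiceLog` type and `newServiceLogs` helper are hypothetical, not taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// ServiceLog is a hypothetical stand-in for an OCM service log entry.
type ServiceLog struct {
	ID, Summary, Description string
}

// newServiceLogs returns the entries added since a baseline count was taken.
// Assumption: logs are listed oldest first, so anything past beforeCount is new.
func newServiceLogs(after []ServiceLog, beforeCount int) ([]ServiceLog, error) {
	if len(after) <= beforeCount {
		return nil, errors.New("no new service logs were found after infrastructure node shutdown")
	}
	return after[beforeCount:], nil
}

func main() {
	beforeCount := 2 // count taken before stopping the infra nodes
	after := []ServiceLog{
		{ID: "a"}, {ID: "b"},
		{ID: "c", Summary: "example summary", Description: "example description"},
	}
	logs, err := newServiceLogs(after, beforeCount)
	if err != nil {
		// In the PR's Ginkgo test, this is where Expect(...) fails the test.
		fmt.Println(err)
		return
	}
	for _, l := range logs {
		fmt.Printf("ID: %s\nSummary: %s\nDescription: %s\n\n", l.ID, l.Summary, l.Description)
	}
}
```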

Also, this is the directive we received for this test case:

AWS CCS: cluster has gone missing (infra nodes turned off)

Can be triggered by continuously turning off infrastructure nodes for 20 minutes. 
Expectation: [new service log for turned off infra](https://github.com/openshift/configuration-anomaly-detection/blob/179db6ae2797352e6485ce75e9e3c0f256075418/pkg/investigations/chgm/chgm.go#L29) in OCM
Recovery after the test: start the stopped instances again.

Hence, that is what this test looks for. For the other test cases, a Limited Support Reason is the expectation, so those tests check for it.

Contributor

openshift-ci bot commented May 16, 2025

@ratnam915: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
None yet
Projects
None yet

4 participants