OSD - 29470: To create E2E Tests for CAD - Cluster has gone missing - Infra Nodes turned off #441
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: ratnam915
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #441   +/-   ##
=======================================
  Coverage   31.86%   31.86%
=======================================
  Files          35       35
  Lines        2445     2445
=======================================
  Hits          779      779
  Misses       1607     1607
  Partials       59       59
		}
	}
	if !newLogsFound {
		fmt.Println("No new service logs found.")
As this is the failure case, I'd love for this to actually fail
but still clean up.
Maybe starting all stopped infra nodes could be an AfterEach
function in the context?
Hey @bergmannf: The above code has been fixed to restart the nodes irrespective of the test status. The test case was also re-run after the change and it passed.
Otherwise looks great to me. Good stuff!
			fmt.Printf("ID: %s\nSummary: %s\nDescription: %s\n\n", log.ID(), log.Summary(), log.Description())
		}
	}
	Expect(newLogsFound).To(BeTrue(), "No new service logs were found after infrastructure node shutdown")
I like how we keep the test flexible by not checking the content of the servicelogs, but on the other hand, would it be possible for other automations to interfere with this test by sending unrelated servicelogs which would make this test pass then?
Uh btw, I just took a look and it seems to be a limited support reason, so maybe it's worth checking whether limited support has been set? (I am not sure if OCM sends a servicelog when limited support is set, but that seems to be the case, otherwise your test would have failed? :D)
https://github.com/openshift/configuration-anomaly-detection/blob/main/pkg/investigations/chgm/chgm.go#L25
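One way to reduce the risk of unrelated servicelogs making the test pass is to accept only entries that are both new and match an expected summary. The following is a minimal sketch of that idea, not the PR's implementation: `serviceLog`, `newMatchingLogs`, and the summary strings are hypothetical placeholders, and the real summary text would have to come from the CHGM investigation.

```go
package main

import (
	"fmt"
	"strings"
)

// serviceLog is a hypothetical, simplified view of a service log entry.
type serviceLog struct {
	ID      string
	Summary string
}

// newMatchingLogs returns the entries of after whose ID does not appear in
// before and whose summary contains want (compared case-insensitively).
func newMatchingLogs(before, after []serviceLog, want string) []serviceLog {
	seen := make(map[string]struct{}, len(before))
	for _, l := range before {
		seen[l.ID] = struct{}{}
	}
	var out []serviceLog
	for _, l := range after {
		if _, ok := seen[l.ID]; ok {
			continue // already existed before the shutdown
		}
		if strings.Contains(strings.ToLower(l.Summary), strings.ToLower(want)) {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	// Placeholder data; the summaries are invented for illustration.
	before := []serviceLog{{ID: "1", Summary: "Cluster upgrade scheduled"}}
	after := []serviceLog{
		{ID: "1", Summary: "Cluster upgrade scheduled"},
		{ID: "2", Summary: "Unrelated notification"},
		{ID: "3", Summary: "Action required: infrastructure nodes stopped"},
	}
	for _, l := range newMatchingLogs(before, after, "nodes stopped") {
		fmt.Printf("ID: %s\nSummary: %s\n", l.ID, l.Summary)
	}
}
```

The trade-off is the one raised above: matching on content makes the test stricter but also couples it to the wording of the servicelog.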
Hi @RaphaelBut: Changes have been made to compare the number of service logs before and after the node shutdown. New service logs are printed only if any are present, and the test case fails if there are none.
Also, for this test case, this is the directive that we got:
AWS CCS: cluster has gone missing (infra nodes turned off)
Can be triggered by continuously turning off infrastructure nodes for 20 minutes.
Expectation: [new service log for turned off infra](https://github.com/openshift/configuration-anomaly-detection/blob/179db6ae2797352e6485ce75e9e3c0f256075418/pkg/investigations/chgm/chgm.go#L29) in OCM
Recovery after the test: start the stopped instances again.
Hence that is what we are looking for here. For the other test cases, a Limited Support reason is the expectation, and so that is what we check there.
@ratnam915: all tests passed!
OSD - 29470: To create E2E Tests for CAD - Cluster has gone missing - Infra Nodes turned off
Attached is the successful test case execution.
OSD-29470_test.txt