Resource Health fine tuning lacking #208

Open
1 task done
stevedistef opened this issue May 2, 2024 · 6 comments
Assignees
Labels
Pattern: ALZ 🚁 Issues / PR's related to the ALZ Pattern question Further information is requested

Comments

@stevedistef

Check for previous/existing GitHub issues

  • I have checked for previous/existing GitHub issues

Description

One of my customers implemented AMBA GA on their production subscriptions, and they are seeing a lot of ResourceHealthUnhealthyAlert alerts. We can see from the policy that the alert can be turned off, but they would like to fine-tune it instead (for example, with thresholds). Is that possible?
image

When checking the alert itself, we see only these options. Is there any way to fine tune this alert?

image

@stevedistef stevedistef added the question Further information is requested label May 2, 2024
@Brunoga-MS
Collaborator

Hello @stevedistef,
thanks for your feedback. Based on the UI provided for Service / Resource Health alerts, there appear to be no options to fine-tune these alerts further. The only tuning I can see is reducing the statuses listed under Previous resource status, which I need to investigate internally. One question, though: are these alerts fired for the same resource, or for resources that share a name but are located in different subscriptions or resource groups?

Thanks,
Bruno.
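
One way to answer the same-resource question from an exported list of alerts is to compare full resource IDs rather than display names. The following is a minimal sketch, assuming the alerts expose the standard Azure resource ID (`/subscriptions/<id>/resourceGroups/<rg>/providers/...`); the helper names and the sample IDs are hypothetical and not part of AMBA.

```python
from collections import defaultdict


def parse_resource_id(resource_id: str) -> dict:
    """Split an Azure resource ID into subscription, resource group, and name."""
    parts = resource_id.strip("/").split("/")
    lowered = [p.lower() for p in parts]
    # Segments alternate key/value: subscriptions/<id>/resourceGroups/<rg>/...
    subscription = parts[lowered.index("subscriptions") + 1]
    resource_group = parts[lowered.index("resourcegroups") + 1]
    return {
        "subscription": subscription,
        "resource_group": resource_group,
        "name": parts[-1],
    }


def group_by_name(resource_ids):
    """Map each short resource name to the set of distinct full IDs that use it."""
    groups = defaultdict(set)
    for rid in resource_ids:
        name = parse_resource_id(rid)["name"].lower()
        # Resource IDs are case-insensitive, so normalise before comparing.
        groups[name].add(rid.lower())
    return dict(groups)


# Hypothetical sample: two different VMs that happen to share the name "vm01".
sample_ids = [
    "/subscriptions/sub-a/resourceGroups/rg-app/providers/Microsoft.Compute/virtualMachines/vm01",
    "/subscriptions/sub-b/resourceGroups/rg-app/providers/Microsoft.Compute/virtualMachines/vm01",
    "/subscriptions/sub-a/resourceGroups/rg-app/providers/Microsoft.Compute/virtualMachines/vm01",
]

shared_names = {n: ids for n, ids in group_by_name(sample_ids).items() if len(ids) > 1}
```

A name that maps to more than one ID in `shared_names` points to distinct resources that only look identical in the alert list.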

@Brunoga-MS Brunoga-MS self-assigned this May 3, 2024
@stevedistef
Author

Good morning @bruno, and thanks as always for helping us adopt AMBA. You and the whole tiger team under @paulgrimley really make this much easier than a DIY project :-)

OK, so on this one, as we discussed, I can see that this environment has these resource health alerts. I filtered for one of the resources that shows up a lot, sometimes with the same timestamp, limited the view to the last 30 days, and sorted by time:
image

When we checked the two that seemed to be redundant, they turned out to be two different alerts.
When we check the first one with that timestamp, we see this:
image

When we check the other one, which fired at the same time, we see this different alert:
image

So the question becomes: do we really need to see both?
When examining the actual alert rule, we see this (I clicked on Alert Rule in the previous screenshot to get here):
image

And then, after clicking Edit,
we see that perhaps we have set up too many "previous conditions":
image

I am going to ask the team using AMBA to go to Monitor > Alerts > Alert rules, edit the Resource Health alert for each of their subscriptions, remove the two previous conditions, and save it.
So they will repeat this step 4 times in this case:
image

image

We will see if this is acceptable....
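
The effect of trimming the previous statuses can be reasoned about as a filter on (previous status, current status) pairs. The sketch below is illustrative only: the thread does not say which two previous statuses are being removed, so the status sets, the `HealthEvent` type, and the sample events are assumptions. The status names (Available, Unavailable, Degraded, Unknown) are the usual Resource Health values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthEvent:
    resource: str
    previous: str  # health status before the event
    current: str   # health status after the event


def matches(event, previous_statuses, current_statuses):
    """Return True when the rule would fire for this status transition."""
    return event.previous in previous_statuses and event.current in current_statuses


# Assumed rule settings, not taken from the AMBA policy definition.
current_statuses = {"Unavailable", "Degraded"}
broad_previous = {"Available", "Degraded", "Unknown"}
narrow_previous = {"Available"}

# Hypothetical events for a single resource.
events = [
    HealthEvent("vm01", "Available", "Unavailable"),
    HealthEvent("vm01", "Unknown", "Unavailable"),
    HealthEvent("vm01", "Degraded", "Unavailable"),
    HealthEvent("vm01", "Unavailable", "Available"),  # recovery, never alerts here
]

broad_count = sum(matches(e, broad_previous, current_statuses) for e in events)
narrow_count = sum(matches(e, narrow_previous, current_statuses) for e in events)
print(f"broad rule: {broad_count} alerts, narrowed rule: {narrow_count} alerts")
```

With these sample events the broad rule fires three times and the narrowed rule once, which is the kind of reduction the manual edit is aiming for. It only helps when the noisy events really do start from one of the removed statuses.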

@stevedistef
Author

The customer is trying this over the weekend!

@dbelso

dbelso commented May 13, 2024

Hi all,
I tried the workaround and it seemed to work for a few days, but now the issue has gotten worse and is flooding our Monitoring page.
image

@Brunoga-MS
Collaborator

Brunoga-MS commented May 13, 2024

Hello @dbelso and @stevedistef,
from your messages it looks like the fine-tuning we applied only partially worked. We need to investigate further to understand why this is happening. We will keep you posted.

Thanks,
Bruno.

@JoeyBarnes JoeyBarnes added the Pattern: ALZ 🚁 Issues / PR's related to the ALZ Pattern label Jul 10, 2024
@MarcoJanse

I have a similar question. First of all, I was wondering why the ResourceHealthUnhealthyAlert has a target scope of All resources in <Subscription>. I was expecting the MonitorDisable parameter to disable the Resource Health alerts for all resources that carry the tag value I had set (like Dev or Sandbox).

The amount of events from ResourceHealth for just one VM that's being powered off is quite overwhelming.

There's even an alert when the status does not actually transition:

Image

For now, I have created a suppression rule for a couple of test VMs that are stopped/started frequently.
Is there a better way to do this that I'm not seeing?
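
For anyone post-processing these events outside the portal (for example in a script behind an action group), the two noise sources described above can be expressed as a small predicate. This is a sketch of the filtering logic only, not an AMBA feature or a replacement for a suppression rule. It assumes the event properties carry `previousHealthStatus`, `currentHealthStatus`, and `cause` fields as in Resource Health activity log entries; treat those field names and the `UserInitiated` value as assumptions to verify against your own event payloads.

```python
def should_alert(props: dict) -> bool:
    """Decide whether a Resource Health event is worth raising.

    Drops events where the status did not actually change and events
    caused by a user action such as deliberately stopping a VM.
    """
    previous = props.get("previousHealthStatus")
    current = props.get("currentHealthStatus")

    # No real transition, e.g. Unavailable -> Unavailable.
    if previous == current:
        return False

    # Assumed value: user-triggered changes such as a manual power-off.
    if props.get("cause") == "UserInitiated":
        return False

    # Only unhealthy target states are interesting.
    return current in {"Unavailable", "Degraded"}


# Hypothetical payload fragments for illustration.
same_status = {"previousHealthStatus": "Unavailable", "currentHealthStatus": "Unavailable"}
user_stop = {
    "previousHealthStatus": "Available",
    "currentHealthStatus": "Unavailable",
    "cause": "UserInitiated",
}
platform_fault = {
    "previousHealthStatus": "Available",
    "currentHealthStatus": "Unavailable",
    "cause": "PlatformInitiated",
}
```

Here only `platform_fault` would pass the predicate; the non-transition event and the user-initiated stop are dropped.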


5 participants