[NVIDIA GPU] Introduce Monitoring Integration #12581
Open: strawgate wants to merge 4,947 commits into elastic:main from strawgate:nvidia_graphics
…tic#12074) Enable the creation of issues for flaky tests in the daily builds triggered using 9.0.0 as stack release.
…astic#12072) * add SQS calls and S3 permissions in docs * bump package version * fix pr id * add SQS GetQueueAttributes sort permissions
Credential construction by the v3.21 alpine results in system test failures with the error: private key should be a PEM or plain PKCS1 or PKCS8; parse error: asn1: structure error: tags don't match (16 vs {class:0 tag:13 length:45 isCompound:true}) {optional:false explicit:false application:false private:false defaultValue:<nil> tag:<nil> stringType:0 timeType:0 set:false omitEmpty:false} pkcs1PrivateKey @2 Pin alpine to v3.20 until the root of the issue is identified and fixed.
…n. (elastic#12092) Qualys can send empty XML response body with 200 success status. Handle this case as valid.
…2071) * Fix broken links * Update packages/google_workspace/_dev/build/docs/README.md Co-authored-by: Krishna Chaitanya Reddy Burri <[email protected]> * Fix tychon link * Fix Lumos link * Fix wiz link * Remove link to vulnerability data stream * Update wiz changelog and manifest * Update bbot changelog and manifest * Update cisco_duo changelog and manifest * Update ti_cybersixgill changelog and manifest * Update google_workspace changelog and manifest * Update lumos changelog and manifest * Update tychon changelog and manifest * Update thycotic_ss changelog and manifest * Update authentik changelog and manifest * update google workspace readme --------- Co-authored-by: Krishna Chaitanya Reddy Burri <[email protected]>
The source.ip field is never set, so this is redundant.
* Fix broken links * Remove the link from the Application insights integration * Update nats link as per shmsr suggestion * Add link on Jolokia parameters * Update citrix references for adc and waf * Add more specific links for adc and waf
…elastic#12103) * Added support for configurable retry options, which were introduced in 8.16
… pipeline (elastic#12028) * fix optional chaining in the replica_status data stream pipeline
…ic#12107) Made with ❤️️ by updatecli Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…tic#12033) Include a new dynamic field for user_agent.version in pipeline tests in order to accept versions values with a trailing dot.
* Initial draft of the o365_metrics package with the `outlook_activity` data stream.
Add docs about retrieving ISAC feeds for Custom Threat Intelligence integration
…ic#12082) No dynamic mapping was being generated for tines.audit_log.inputs.inputs.options.*, and this package uses the tines.audit_log.inputs.inputs.options field directly, without having any mapping for it or its sub-properties. The workaround ensures that there is a mapping for tines.audit_log.inputs.inputs.* that serves for tines.audit_log.inputs.inputs.options as well as for its subobjects. The configured dynamic mapping was not being generated due to some issue in Fleet that we are investigating. We detected this issue while refactoring field mappings tests in elastic-package, more about this in elastic/elastic-package#2214[1]. [1]elastic/elastic-package#2214 (comment) Co-authored-by: Dan Kortschak <[email protected]>
…ase (elastic#12079) * bump CSPM templates URLs to use v8.17.0 * bump Asset Inv. templates URLs to use v8.17.0 * update versions (remove previews) * fix YAML
Change property connection_string to be a secret like in the other integrations.
* Fix broken links * Update changelog and manifest
…ic#12128) Made with ❤️️ by updatecli Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…0.108.0 (elastic#12131) Bumps [github.com/elastic/elastic-package](https://github.com/elastic/elastic-package) from 0.107.2 to 0.108.0. - [Release notes](https://github.com/elastic/elastic-package/releases) - [Changelog](https://github.com/elastic/elastic-package/blob/main/.goreleaser.yml) - [Commits](elastic/elastic-package@v0.107.2...v0.108.0) --- updated-dependencies: - dependency-name: github.com/elastic/elastic-package dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Mario Rodriguez Molins <[email protected]>
Changes added:
- Add a limit parameter that can be used to control the size of responses from TAXII servers (see https://docs.oasis-open.org/cti/taxii/v2.1/os/taxii-v2.1-os.html#_Toc31107517)
- To avoid fetching duplicate indicators every interval, the response header X-Taxii-Date-Added-Last is now stored in the cursor and used to populate the added_after parameter on every iteration (see https://docs.oasis-open.org/cti/taxii/v2.1/os/taxii-v2.1-os.html#_Toc31107519)
* Update link * Update changelog and manifest
elastic#11920) This is enabled per data stream to allow tuning of behaviour.
…nt" tag to documents with event.kind set to "pipeline_error" (elastic#12108) This manually replays the changes in elastic#12046.
…ONN_TIMEDOUT (elastic#12556) - Handle additional parsing cases for SSLVPN HTTPREQUEST and TCPCONN_TIMEDOUT events
Bump golang.org/x/net from 0.23.0 to 0.33.0 for mock service in /packages/websocket/_dev/deploy/docker/websocket-mock-service. Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Add handling of Check Point firewall session logs in accordance with the ECS structure. Session logs aggregate multiple connection logs from the same network activity into a single event.
The aggregation creates the following fields:
- creation_time: UNIX timestamp of the first connection in the session.
- last_hit_time: UNIX timestamp of the last recorded connection in the session.
- duration: Duration (in seconds) of the session.
- aggregated_log_count: Number of connection logs aggregated into the session.
- connection_count: Number of connections recorded in the session.
- update_count: Number of times the session was updated.
This commit will:
1. Interpret creation_time and last_hit_time as dates, storing them in the ECS fields event.start and event.end, respectively.
2. Convert duration to nanoseconds, as per the ECS event.duration specification, and store it in the event.duration field.
3. Ensure checkpoint.aggregated_log_count, checkpoint.connection_count, and checkpoint.update_count are mapped to numeric types.
Note that `checkpoint.aggregated_log_count`, `checkpoint.connection_count`, and `checkpoint.update_count`, which were previously mapped dynamically as keyword data types, are now statically mapped as integer data types.
Closes elastic#11894
…ic#12383) * Add network.protocol for dns and dhcp pipelines
The current data flow for the fields changed here is NetworkMessageId[1] → m365_defender.event.network.message_id → email.message_id and InternetMessageId[1] → m365_defender.event.internet_message_id → email.local_id, but the definition of email.message_id is that it represents the RFC5322 Message-ID[2], corresponding to the Defender InternetMessageId value, and email.local_id[3] is the non-persistent identifier, reasonably corresponding to the Defender NetworkMessageId value. Also add m365_defender.event.internet_message_id to final remove processor. [1]https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-emailevents-table#:~:text=NetworkMessageId,sending%20email%20system [2]https://www.elastic.co/guide/en/ecs/current/ecs-email.html#field-email-message-id [3]https://www.elastic.co/guide/en/ecs/current/ecs-email.html#field-email-local-id
Fix system tests for Custom TI and Tychon for 9.0
* update drives data stream * update managed_volumes data stream * update monitoring_jobs data stream * update mssql_databases data stream * update physical_hosts data stream * update virtualmachines data stream * update docs * remove httpjson from manifest * add changelog entry * format * update docs * improve docs * rename first to pageSize * improve resource_timeout description * remove count metric from managed_volumes * make cluster and sla domain base fields * improve pageSize description * improve changelog * change virtual machines data stream name * update sample events and pipelines * build docs * run format * fix virtual machines tag * fix virtual machines sample event * build docs
… look-back" time option to all data streams (elastic#12382)
…c#12543) sampling.tail.storage_limit is 0 by default in 9.0. See elastic/apm-server#15467. As UI validation requires a unit (e.g. GB), set the apm integration default storage limit to 0GB, which carries the same meaning.
…cs mappings (elastic#12568) [elastic_agent] Add missing apm-server tail sampling monitoring metrics mappings Tail-based sampling monitoring metrics were missed in the bugfix in elastic#10414
This commit updates the Kubernetes Container Logs documentation to better explain that an input is always generated for every container. It also fixes a broken link.
…lastic#12579) - Fixed missing tz_map reference in agent files
💔 Build Failed
Quality Gate passed
Proposed commit message
Introduce NVIDIA GPU Monitoring Integration
Checklist
changelog.yml file.
Author's Checklist
How to test this PR locally
Deploy NVIDIA DCGM on a device with an NVIDIA GPU to get a Prometheus metrics endpoint that you can provide to the integration.
If you have Docker, this only requires running a container.
Configure the integration to point at the host that has the GPU and is running the container:
http://nvidiahost:9400/metrics
Some metrics are not enabled by default in the container; enabling all metrics requires some extra steps.
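Before configuring the integration, it can help to confirm that the endpoint actually serves Prometheus text-format metrics. The following is a minimal sketch, not part of this PR: `parse_metrics` and `fetch_metrics` are hypothetical helper names, and the `SAMPLE` payload is illustrative (the metric names follow the `DCGM_FI_DEV_*` convention, but the label set and values are made up and may differ from what your exporter emits).

```python
import re
import urllib.request

# Matches one sample line: metric name, optional {labels}, then the value.
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# Matches one label pair inside the braces, e.g. gpu="0".
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Illustrative payload only; real exporter output will have different labels and values.
SAMPLE = """\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",modelName="Example GPU"} 42
DCGM_FI_DEV_GPU_TEMP{gpu="0",modelName="Example GPU"} 55
"""


def parse_metrics(text):
    """Parse Prometheus text exposition into {name: [(labels, value), ...]}."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        metrics.setdefault(name, []).append((labels, float(raw_value)))
    return metrics


def fetch_metrics(url, timeout=5.0):
    """Fetch a metrics endpoint over HTTP and parse its body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Calling `fetch_metrics("http://nvidiahost:9400/metrics")` against the host from the steps above should return a non-empty dictionary if the endpoint is reachable; an empty result or a connection error suggests the container is not running or the port is not exposed.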
Related issues
Fixes #11930
Screenshots
WIP: