
feat(ai): First implementation of TritonServer metrics feature #5687

Merged
merged 36 commits into from
Feb 20, 2025

Conversation

pierantoniomerlino
Contributor

@pierantoniomerlino pierantoniomerlino commented Jan 31, 2025

This PR adds a new API for managing status and performance metrics of an Inference Engine. The implementation for the Triton Server is added as well.

Related Issue: This PR fixes/closes N/A

Description of the solution adopted: The new InferenceEngineMetricsService interface provides APIs to get metrics from an inference engine in the form of a Map whose keys are the metric names.

The TritonServer implementation retrieves the GPU metrics from the metrics port via HTTP and the model statistics from the inference port via gRPC. Both are parsed and rearranged into a JSON format. The keys for the GPU metrics are:

gpu_metrics.<gpu_uuid>

where gpu_uuid is an identifier for the GPU device. For the model statistics, instead, the keys are:

model_metrics.<model.name>.<version>

Moreover, the old TritonServer implementation has been deleted, since it was deprecated in previous releases.
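As a rough illustration of the flat key scheme described above, a minimal sketch follows. The class and method names are hypothetical (they are not the PR's actual code); it only shows how keys like gpu_metrics.<gpu_uuid> and model_metrics.<model.name>.<version> could be assembled into the Map returned by the service.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the flat metric key scheme; illustrative only.
public class MetricKeys {

    // Builds a key of the form "gpu_metrics.<gpu_uuid>"
    static String gpuKey(String gpuUuid) {
        return "gpu_metrics." + gpuUuid;
    }

    // Builds a key of the form "model_metrics.<model.name>.<version>"
    static String modelKey(String modelName, String version) {
        return "model_metrics." + modelName + "." + version;
    }

    public static void main(String[] args) {
        // Example values; the real UUID and statistics come from Triton.
        Map<String, String> metrics = new LinkedHashMap<>();
        metrics.put(gpuKey("GPU-5f8e1a2b"), "{\"utilization\": 0.42}");
        metrics.put(modelKey("resnet50", "1"), "{\"inference_count\": 128}");
        System.out.println(metrics.keySet());
    }
}
```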

@pierantoniomerlino
Contributor Author

@mattdibi could you please take a look at this PR, even if it is still in draft? I still have to test some corner cases and review some design choices, but the main work is here.

@pierantoniomerlino pierantoniomerlino marked this pull request as ready for review February 10, 2025 15:06
@pierantoniomerlino
Contributor Author

Tested on an NVIDIA Jetson Orin Nano with the following scenarios:

  • containerized TritonServer on the Orin Nano
  • remote TritonServer
  • old implementation (remote only)

The native and containerized versions expose the metrics configuration, while the remote one has no such properties. However, all of them implement InferenceEngineMetricsService, since the metrics are likely enabled in the Triton Server default configuration.

cardinality="0"
required="true"
default="true"
description="Enable the Triton Server Metrics feature. This property enables the default, CPU and GPU metrics, if available.">
Contributor


Suggested change
description="Enable the Triton Server Metrics feature. This property enables the default, CPU and GPU metrics, if available.">
description="Enable the Triton Server Metrics feature. This property enables the default CPU and GPU metrics, if available.">

Contributor Author


@MMaiero The property enables the metrics feature, which contains the CPU, GPU and standard statistics. I'll rephrase it as:

Enable the Triton Server Metrics feature. This property enables the standard statistics and CPU/GPU metrics, if available.

cardinality="0"
required="false"
default=""
description="A semi-colon separated list of metrics-specific configuration settings for the Triton Server Metrics. (i.e. counter_latencies=true)">
Contributor


Suggested change
description="A semi-colon separated list of metrics-specific configuration settings for the Triton Server Metrics. (i.e. counter_latencies=true)">
description="A semi-colon separated list of Triton Server Metrics. (i.e. counter_latencies=true)">

Contributor Author


@MMaiero This property allows configuring the metrics, enabling or disabling the statistics emitted by the service. For example, when the metrics are enabled, a user can disable the counter_latencies setting with --metrics-config counter_latencies=false. So they aren't new metrics, but configuration settings for metrics that are already there.
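To make the property format concrete, here is a minimal, hypothetical sketch (not the PR's actual code) of parsing the semicolon-separated settings string, e.g. "counter_latencies=false;summary_latencies=true", into key/value pairs that could then be forwarded to Triton as repeated --metrics-config flags. counter_latencies comes from the discussion above; summary_latencies is an assumed second key used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative parser for the semicolon-separated metrics configuration
// property; names and structure are assumptions, not the bundle's real code.
public class MetricsConfigParser {

    // Splits "key1=val1;key2=val2" into an ordered map, skipping malformed entries.
    static Map<String, String> parse(String property) {
        Map<String, String> settings = new LinkedHashMap<>();
        if (property == null || property.isBlank()) {
            return settings;
        }
        for (String entry : property.split(";")) {
            String[] kv = entry.trim().split("=", 2);
            if (kv.length == 2 && !kv[0].trim().isEmpty()) {
                settings.put(kv[0].trim(), kv[1].trim());
            }
        }
        return settings;
    }

    public static void main(String[] args) {
        System.out.println(parse("counter_latencies=false;summary_latencies=true"));
    }
}
```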

@pierantoniomerlino pierantoniomerlino marked this pull request as draft February 17, 2025 16:24
@pierantoniomerlino pierantoniomerlino marked this pull request as ready for review February 19, 2025 09:54
pierantoniomerlino and others added 19 commits February 20, 2025 09:22
Signed-off-by: pierantoniomerlino <[email protected]>
…clipse.kura.ai.triton.server.TritonServerContainerService.xml

Co-authored-by: Matteo Maiero <[email protected]>
Signed-off-by: pierantoniomerlino <[email protected]>
@pierantoniomerlino
Contributor Author

The parser for the model statistics has been updated to use the JsonFormat class provided by com.google.protobuf:protobuf-java-util. The library is embedded in the org.eclipse.kura.ai.triton.server bundle.

The library has been successfully checked using the Eclipse Dash License Tool:

echo "com.google.protobuf:protobuf-java-util:3.25.3" | java -jar org.eclipse.dash.licenses-1.1.1-20250220.065102-441.jar -
[main] INFO Querying Eclipse Foundation for license data for 1 items.
[main] INFO Found 0 items.
[main] INFO Querying ClearlyDefined for license data for 1 items.
[main] INFO Found 1 items.
[main] INFO Vetted license information was found for all content. No further investigation is required.

mattdibi
mattdibi previously approved these changes Feb 20, 2025
@pierantoniomerlino pierantoniomerlino merged commit ddef8d6 into develop Feb 20, 2025
5 checks passed
@pierantoniomerlino pierantoniomerlino deleted the tritonserver_metrics branch February 20, 2025 13:26