Add quality metrics for dotnetup #52792

Open
nagilson wants to merge 30 commits into dotnet:release/dnup from nagilson:nagilson-dotnetup-telem-otl

nagilson commented Feb 2, 2026

Resolves #50609
(Allow #52717 to merge first. This PR needed code from that branch, so it has to be merged back in.)

Add OpenTelemetry Data for dotnetup

Summary

This PR introduces telemetry for the dotnetup CLI tool using OpenTelemetry with Azure Monitor. The telemetry provides actionable insights into installation success rates, user behavior patterns, and error analysis while adhering to PII rules.

The current connection key points at my own local Application Insights resource; we'll switch to the SDK CLI one once someone else on the team has done that work. The key shouldn't be used in production, but since this is a dev branch that we are not yet releasing, that seems OK. I've filed an issue to track this: #52785

https://aka.ms/dotnetup-telemetry points to the doc in the release/dnup branch for now, but eventually the doc should live on Microsoft Learn.
#52784

Key Features

Telemetry Infrastructure

  • OpenTelemetry + Azure Monitor: Integrated Azure.Monitor.OpenTelemetry.Exporter for telemetry collection
  • Activity-based tracing: Commands and operations tracked as OpenTelemetry activities with structured tags
  • Configurable: Telemetry can be disabled via DOTNET_CLI_TELEMETRY_OPTOUT environment variable
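A minimal sketch of how the opt-out and activity-based tracing described above could fit together. The class name, the `dotnetup.sketch` source name, the `command.name` tag, and the set of accepted opt-out values (`1` or `true`) are assumptions for illustration, not the PR's actual code.

```csharp
#nullable enable
using System;
using System.Diagnostics;

public static class TelemetrySketch
{
    // Hypothetical source name; the PR's actual ActivitySource name may differ.
    private static readonly ActivitySource Source = new("dotnetup.sketch");

    // Assumption: "1" or "true" (case-insensitive) means the user opted out.
    public static bool IsOptedOut(string? value) =>
        value is not null &&
        (value == "1" || value.Equals("true", StringComparison.OrdinalIgnoreCase));

    public static bool Enabled =>
        !IsOptedOut(Environment.GetEnvironmentVariable("DOTNET_CLI_TELEMETRY_OPTOUT"));

    // Returns null when telemetry is disabled or when no listener is attached,
    // so callers should always use the null-conditional operator on the result.
    public static Activity? StartCommand(string commandName)
    {
        if (!Enabled)
        {
            return null;
        }

        Activity? activity = Source.StartActivity("command");
        activity?.SetTag("command.name", commandName); // hypothetical tag name
        return activity;
    }
}
```

A command would wrap its work in `using Activity? activity = TelemetrySketch.StartCommand("sdk install");` and add tags through `activity?.SetTag(...)`.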

Error Categorization System

  • Product Errors: Bugs, crashes, server issues - count against quality metrics
  • User Errors: Invalid input, permissions, disk full, network issues - tracked separately for UX improvement
  • 17 specific error codes including VersionNotFound, ManifestFetchFailed, HashMismatch, ArchiveCorrupted, etc.
  • Errors are actually thrown with proper codes throughout the codebase (not just defined)
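The product/user split above could be sketched roughly as follows. This is not the PR's `ErrorCodeMapper`: the type names and the choice of which exception types count as user errors are assumptions drawn from the bullet descriptions (invalid input, permissions, disk full), and the disk-full check only recognizes Windows HRESULTs.

```csharp
#nullable enable
using System;
using System.IO;

public enum ErrorCategory { Product, User }

public static class ErrorCategorySketch
{
    private const int ErrorDiskFull = 112;      // Win32 ERROR_DISK_FULL
    private const int ErrorHandleDiskFull = 39; // Win32 ERROR_HANDLE_DISK_FULL

    // Anything not recognized as a user-environment problem is treated as a
    // product error, so unknown failures count against quality metrics.
    public static ErrorCategory Categorize(Exception ex) => ex switch
    {
        ArgumentException => ErrorCategory.User,            // invalid input
        UnauthorizedAccessException => ErrorCategory.User,  // permissions
        IOException io when IsDiskFull(io) => ErrorCategory.User,
        _ => ErrorCategory.Product,
    };

    // The low 16 bits of a Win32-derived HRESULT carry the Win32 error code.
    private static bool IsDiskFull(IOException ex)
    {
        int code = ex.HResult & 0xFFFF;
        return code == ErrorDiskFull || code == ErrorHandleDiskFull;
    }
}
```

Defaulting to `Product` is a deliberate choice in this sketch: a miscategorized user error inflates the product failure count, which is easier to notice and correct than a hidden product bug.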

Success Rate Metrics

  • Success rate calculation excludes user errors to measure true product quality
  • Tracks install.result (installed vs already_installed) for accurate installation counts
  • Version comparison between latest and prior releases
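One way to read "excludes user errors", as a sketch: user errors are dropped from the denominator entirely, so only successes and product errors are counted. The method name and the exact formula are assumptions; the workbook's actual query is not shown in this description.

```csharp
#nullable enable
public static class SuccessRateSketch
{
    // userErrors is accepted only to make the exclusion explicit; it does not
    // affect the result. Returns null when there is nothing to measure.
    public static double? ProductSuccessRate(int succeeded, int productErrors, int userErrors)
    {
        int counted = succeeded + productErrors;
        if (counted == 0)
        {
            return null;
        }

        return (double)succeeded / counted;
    }
}
```

For example, 90 successes, 5 product errors, and 5 user errors give 90 / 95 under this definition, rather than 90 / 100 if user errors were counted as failures.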

User Behavior Tracking (PII-Safe)

  • install.path_source: Where install path came from (explicit, global_json, default, etc.)
  • install.path_type: Classification of path (system_programfiles, user_profile, local_appdata) - not actual paths
  • install.has_global_json: Whether project has global.json
  • install.existing_install_type: Admin/User/none for existing installations
  • sdk.request_source: How SDK version was specified (explicit, default-latest, default-globaljson)
  • sdk.requested: Sanitized version string

PII Protection (Critical)

  • VersionSanitizer: All user-provided version strings are sanitized before telemetry
    • Known safe patterns pass through (e.g., "9.0", "latest", "9.0.100-preview.1")
    • Unknown patterns replaced with "invalid"
  • No raw exception messages: SetStatus() uses error type, not ex.Message
  • No RecordException(): Full exception objects not recorded (contain paths/PII)
  • Win32 errors: Use error codes (win32_error_5) instead of messages that may contain paths
  • Install paths: Classified by type, actual paths never recorded
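The sanitization rules above might look roughly like this. It is a sketch, not the PR's `VersionSanitizer`: the allow-list regex, the `lts`/`sts` keywords, and all names are assumptions, and only the three example inputs from the list ("9.0", "latest", "9.0.100-preview.1") come from this description.

```csharp
#nullable enable
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class PiiSafeTagsSketch
{
    // Assumed allow-list: a channel keyword, or major.minor[.patch] with an
    // optional prerelease label taken from a fixed set. [0-9] and \A...\z are
    // used so Unicode digits and trailing newlines cannot slip through.
    private static readonly Regex SafeVersion = new(
        @"\A(?:latest|lts|sts|[0-9]{1,2}\.[0-9]{1,2}(?:\.[0-9]{1,3})?" +
        @"(?:-(?:preview|rc|alpha|beta)(?:\.[0-9]{1,5}){1,4})?)\z",
        RegexOptions.CultureInvariant);

    // Anything outside the allow-list is replaced, never echoed.
    public static string SanitizeVersion(string? input) =>
        input is not null && SafeVersion.IsMatch(input) ? input : "invalid";

    // Uses the numeric code or the type name; never ex.Message, which can
    // contain file paths or user names.
    public static string ErrorType(Exception ex) => ex is Win32Exception win32
        ? $"win32_error_{win32.NativeErrorCode}"
        : ex.GetType().Name;

    // Sets the status description to the error type and deliberately does not
    // attach the exception object to the activity.
    public static void RecordFailure(Activity? activity, Exception ex) =>
        activity?.SetStatus(ActivityStatusCode.Error, ErrorType(ex));
}
```

An allow-list is stricter than stripping known-bad characters: an unexpected input such as a pasted file path becomes "invalid" instead of being partially recorded.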

Azure Workbook Dashboard

  • Success rate overview with version comparison
  • Command usage metrics and trends
  • Platform/environment breakdown (OS, architecture, CI vs interactive)
  • SDK installation analytics (most installed versions, request sources)
  • Separate sections for Product Errors vs User Errors (UX opportunities)
  • Performance percentiles (P50/P90/P99)
  • Daily active users tracking

Example Data:

[Two images of example data]

Note: some of this data is wrong; it was collected when the code was still incorrect.

Files Changed

New Files

  • Telemetry/DotnetupTelemetry.cs - Main telemetry singleton
  • Telemetry/ErrorCodeMapper.cs - Exception to error info mapping with categorization
  • Telemetry/VersionSanitizer.cs - PII-safe version string sanitization
  • Telemetry/dotnetup-workbook.json - Azure Workbook dashboard definition

Modified Files

  • CommandBase.cs - Template method for command telemetry
  • SdkInstallCommand.cs - Comprehensive install behavior tracking
  • InstallerOrchestratorSingleton.cs - Returns InstallResult, proper error throwing
  • NonUpdatingProgressTarget.cs / SpectreProgressTarget.cs - Operation-level telemetry
  • DotnetInstallException.cs - Extended error codes
  • DotnetArchiveExtractor.cs, DotnetArchiveDownloader.cs, ReleaseManifest.cs - Error code usage

Follow-up Items

  • Adapt telemetry notice for logging: The telemetry notice displayed to users should also be written to logs for visibility and debugging purposes
  • Create aka.ms URL: Need to create aka.ms/dotnetup-telemetry (or similar) pointing to the telemetry documentation in the release/dnup branch of dotnet/sdk repository
  • Documentation: Add telemetry documentation explaining what data is collected and how to opt out

Testing

  • Unit tests updated for error code categorization (17 error codes tested)
  • Manual testing of telemetry data in Azure Application Insights
  • Verified PII sanitization with various user inputs

Telemetry Notice

Users will see a telemetry notice on first run. The notice should link to documentation (via aka.ms redirect) that explains:

  • What data is collected
  • How data is used (improving dotnetup reliability and UX)
  • How to opt out (DOTNET_CLI_TELEMETRY_OPTOUT=1)

We should investigate whether the error mapping can be outsourced; it seems silly that we need to implement this ourselves.
… is wrong failures

some of the categories may be incorrect, but this is a good starting point
I also initially included the SHA, but I want to be able to sort by error without having to parse out the SHA, which should be mappable to/from the version. Still, I kept the build SHA outsourced to a separate file because I liked that isolated, shareable pattern.
- Add llm detection
- first run disable env var
- stderr over stdout
consolidate error logic code
nagilson marked this pull request as ready for review February 2, 2026 22:32
Copilot AI review requested due to automatic review settings February 2, 2026 22:32

Copilot AI left a comment

Pull request overview

This PR introduces comprehensive telemetry infrastructure for the dotnetup CLI tool using OpenTelemetry with Azure Monitor. The implementation provides actionable insights into installation success rates, user behavior patterns, and error analysis while maintaining strict PII protection rules. The PR includes telemetry collection, error categorization, PII sanitization, test coverage, and Azure Workbook dashboards.

Changes:

  • Added OpenTelemetry telemetry infrastructure with Azure Monitor integration, activity-based tracing, and configurable opt-out
  • Implemented error categorization system with 17 specific error codes distinguishing product errors from user errors
  • Created PII protection mechanisms including version string sanitization, URL domain sanitization, and install path classification

Reviewed changes

Copilot reviewed 71 out of 71 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `test/dotnetup.Tests/dotnetup.Tests.csproj` | Excludes TelemetryIntegrationDemo from test compilation |
| `test/dotnetup.Tests/VersionSanitizerTests.cs` | Comprehensive tests for version string sanitization with 241 test cases covering valid/invalid patterns |
| `test/dotnetup.Tests/TestAssets/TelemetryIntegrationDemo/*` | Demo project showing how library consumers can integrate with telemetry ActivitySource |
| `test/dotnetup.Tests/TelemetryTests.cs` | Tests for telemetry common properties, version sanitizer, URL sanitizer, error code mapper, and ActivitySource integration |
| `test/dotnetup.Tests/Properties/AssemblyInfo.cs` | Module initializer to mark test runs as dev builds via environment variable |
| `test/dotnetup.Tests/ParserTests.cs` | Tests for version option and Parser.Version property |
| `test/dotnetup.Tests/ListCommandTests.cs` | Tests for new list command functionality |
| `test/dotnetup.Tests/InfoCommandTests.cs` | Tests for new --info command functionality |
| `test/dotnetup.Tests/ErrorCodeMapperTests.cs` | Tests for error code mapping and categorization with 17 error codes |
| `test/dotnetup.Tests/ChannelVersionResolverTests.cs` | Tests for channel format validation |
| `src/Installer/dotnetup/xlf/*.xlf` | Localization files updated with new strings for info/list commands and telemetry notice |
| `src/Installer/dotnetup/dotnetup.csproj` | Added telemetry packages and versioning properties |
| `src/Installer/dotnetup/docs/telemetry-notice.txt` | Documentation explaining telemetry data collection and opt-out |
| `src/Installer/dotnetup/Telemetry/*.cs` | Core telemetry infrastructure classes including DotnetupTelemetry, ErrorCodeMapper, VersionSanitizer, etc. |
| `src/Installer/dotnetup/Strings.resx` | New resource strings for info/list commands and telemetry notice |
| `src/Installer/dotnetup/*.cs` | Updated progress targets, orchestrator, and commands to integrate telemetry |
| `src/Installer/dotnetup/Commands/Sdk/Install/SdkInstallCommand.cs` | Comprehensive install behavior tracking with telemetry tags |
| `src/Installer/dotnetup/Commands/List/*.cs` | New list command implementation |
| `src/Installer/dotnetup/Commands/Info/*.cs` | New --info command implementation |
| `src/Installer/dotnetup/CommandBase.cs` | Template method pattern for automatic telemetry in all commands |
| `src/Installer/Microsoft.Dotnet.Installation/*.cs` | Updated library to throw DotnetInstallException with proper error codes and telemetry support |
| `documentation/general/dotnetup/*.md` | Documentation for new list and info commands |
| `Directory.Packages.props` | Added OpenTelemetry package versions |
| `.github/copilot-instructions.md` | Build/test instructions for dotnetup |

nagilson commented Feb 3, 2026

Failing test:

 Microsoft.DotNet.Tools.Bootstrapper.Tests.FirstRunNoticeTests.ShowIfFirstRun_CreatesSentinelFile [FAIL]
  Failed Microsoft.DotNet.Tools.Bootstrapper.Tests.FirstRunNoticeTests.ShowIfFirstRun_CreatesSentinelFile [6 ms]
  Error Message:
   Assert.True() Failure
Expected: True
Actual:   False
  Stack Trace:
