Add doc on OS onboarding #112026

Merged: 13 commits, Feb 10, 2025
Update raw links
richlander authored Feb 6, 2025

commit 0b7d88aa5b528590110c1d7d082c1fd4c044a4d5
56 changes: 29 additions & 27 deletions docs/project/os-onboarding.md

Adding support for new operating system versions is a frequent need. This guide describes how we do that, including the policies we use.

[Porting .NET to a new operating system or architecture](../design/coreclr/botr/guide-for-porting.md) is a related task. The following patterns likely apply, but the overall task is much larger in scope.

References:

- [.NET OS Support Tracking](https://github.com/dotnet/core/issues/9638)
- [.NET Support](https://github.com/dotnet/core/blob/main/support.md)
- [Prereq container image lifecycle](https://github.com/dotnet/dotnet-buildtools-prereqs-docker/blob/main/lifecycle.md)

Internal links:

- [Support for Linux Distros](https://dev.azure.com/dnceng/internal/_wiki/wikis/DNCEng%20Services%20Wiki/940/Support-for-Linux-Distros) (MS internal)
- [Support for Apple Operating Systems](https://dev.azure.com/dnceng/internal/_wiki/wikis/DNCEng%20Services%20Wiki/933/Support-for-Apple-Operating-Systems-(macOS-iOS-and-tvOS)) (MS internal)
- [Support for Windows Operating Systems](https://dev.azure.com/dnceng/internal/_wiki/wikis/DNCEng%20Services%20Wiki/939/Support-for-Windows-Operating-Systems) (MS internal)

## Context

In most cases, we find that new OS versions _may_ uncover problems in dotnet/runtime, but once those problems are resolved, they don't affect up-stack components or apps. A key design point of our runtime is to be a fairly complete cross-platform and cross-architecture abstraction, so resolving OS compatibility breaks on behalf of higher-level code is an enduring intent.

Nearly all the APIs that touch native code (networking, cryptography) and deal with standard formats (time zones, ASN.1) are in dotnet/runtime. In many cases, we only see test breaks when we onboard a new OS, often from code that tests edge cases.

## Approach

Our rule is that we declare support for a new OS version (for all [supported .NET releases](https://github.com/dotnet/core/blob/main/releases.md)) after it is validated in dotnet/runtime `main`. We only make support contingent on additional testing in special cases (which are uncommon).

Our testing philosophy is based on perceived risk and past experience. The effective test matrix is huge, the product of OSes \* supported versions \* architectures. We try to make smart choices to skip testing most of the matrix while retaining much of the practical coverage. We also know where we tend to get bitten most when we don't pay sufficient attention. For example, our bug risk across Linux, macOS, and Windows is not uniform.
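To make the size of that matrix concrete, here is a small Python sketch. The OS names, versions, and the pruning rule are hypothetical placeholders chosen for illustration, not our actual matrix or policy; the point is how quickly the full product grows, and how a risk-based selection can keep the newest version of each OS fully covered while only sampling older ones.

```python
from itertools import product

# Hypothetical inputs (OS family -> versions, oldest first); not the real matrix.
OS_VERSIONS = {
    "alpine": ["3.20", "3.21"],
    "ubuntu": ["22.04", "24.04"],
    "macos": ["14", "15"],
    "windows": ["10", "11"],
}
DOTNET_VERSIONS = ["8.0", "9.0"]
ARCHITECTURES = ["x64", "arm64"]


def full_matrix():
    """Every (OS version, .NET version, architecture) combination."""
    all_os = [f"{family}-{v}" for family, versions in OS_VERSIONS.items() for v in versions]
    return list(product(all_os, DOTNET_VERSIONS, ARCHITECTURES))


def pruned_matrix():
    """A hypothetical risk-based selection: the newest version of each OS gets
    every .NET version and architecture; older versions get only the newest
    .NET version on x64."""
    newest_dotnet = DOTNET_VERSIONS[-1]
    selected = set()
    for family, versions in OS_VERSIONS.items():
        selected.update(product([f"{family}-{versions[-1]}"], DOTNET_VERSIONS, ARCHITECTURES))
        for older in versions[:-1]:
            selected.add((f"{family}-{older}", newest_dotnet, "x64"))
    return sorted(selected)


if __name__ == "__main__":
    print(f"full matrix: {len(full_matrix())} combinations")
    print(f"pruned matrix: {len(pruned_matrix())} combinations")
```

With these placeholder inputs, the full product is 8 OS versions × 2 .NET versions × 2 architectures = 32 combinations, and the pruned selection keeps 20 of them.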

We use pragmatism and efficiency to drive our decision making. All things being equal, we'll choose the lowest cost approach.

## Testing

Testing is the bread and butter of OS onboarding, particularly for a mature runtime like ours. New OS support always needs some form of test enablement.

Linux, Wasm, and some Windows testing is done in container images. This approach enables us to test many and regularly changing OS versions in a fixed/limited VM environment. The container image creation/update process is self-service (discussed later).
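The idea of testing many OS versions from one fixed host can be approximated locally with Docker. This is a minimal sketch, not how our CI works (CI runs tests through Helix): the image names are whatever you pass on the command line, and `run_everywhere` is a hypothetical helper that defaults to a dry run which only prints the `docker run --rm` commands.

```python
import subprocess
import sys


def docker_run_command(image, command):
    """Build a `docker run` invocation that removes the container afterwards."""
    return ["docker", "run", "--rm", image, *command]


def run_everywhere(images, command, dry_run=True):
    """Run `command` in each image; returns {image: exit code, or None in dry-run mode}."""
    results = {}
    for image in images:
        argv = docker_run_command(image, command)
        if dry_run:
            print(" ".join(argv))
            results[image] = None
        else:
            # check=False: record failures instead of stopping at the first one.
            results[image] = subprocess.run(argv, check=False).returncode
    return results


if __name__ == "__main__":
    # Usage: python run_everywhere.py IMAGE [IMAGE ...]
    # Dry run that prints the commands; call with dry_run=False to execute them.
    run_everywhere(sys.argv[1:], ["cat", "/etc/os-release"])
```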

We use VMs (Linux and Windows) and bare-metal hardware (Apple) in cases where containers are not practical or for more direct testing. This is the primary model for Apple and Windows OSes. The VMs and Apple hardware are relatively slow to change and require support from dnceng (discussed later).

### Adding coverage

New OS coverage should be added and tested in `main` first. If changes are required, we should prove them out in `main` before committing to ship them in a servicing release (if one is needed).

There are multiple reasons to add a new OS reference to a release branch:

- Known product (as opposed to test) breaks that require validation and regression testing.
- Past experience suggests that coverage is required to protect against risk.
- OS version is or [will soon go EOL](https://github.com/dotnet/runtime/issues/111818#issuecomment-2613642202) and should be replaced by a newer version.

We should remediate any EOL OS references in our codebase. They provide no benefit and carry some risk.
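Finding EOL references can be partly mechanized. The sketch below assumes you already have two inputs: the OS versions referenced in our pipeline files, and a table of end-of-life dates taken from each vendor's published lifecycle. `eol_references` is a hypothetical helper, and the `contoso-linux` names and dates are placeholders.

```python
from datetime import date


def eol_references(referenced, eol_dates, today):
    """Return the referenced OS versions whose EOL date is on or before `today`.

    `referenced` is an iterable of OS version names found in pipeline files;
    `eol_dates` maps OS version names to end-of-life dates. Versions missing
    from `eol_dates` are treated as still supported.
    """
    return sorted(
        name for name in set(referenced)
        if name in eol_dates and eol_dates[name] <= today
    )


if __name__ == "__main__":
    # Hypothetical OS names and dates, for illustration only.
    eol = {"contoso-linux-1": date(2024, 6, 30), "contoso-linux-2": date(2027, 6, 30)}
    print(eol_references(["contoso-linux-1", "contoso-linux-2"], eol, date(2025, 2, 6)))
```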

In the case that a .NET version will be EOL in less than 6 (and certainly less than 3) months, new coverage can typically be skipped. We may even be able to skip remediating EOL OS references. We often opt to stop updating [Supported OSes](https://github.com/dotnet/core/blob/main/os-lifecycle-policy.md) late in the support period for related reasons. A lazy approach is often the best approach late in the game. Don't upset what's working.
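The rule of thumb above can be written down as a small Python function. It is a sketch of the guidance in this section, not an official tool: `months_until` and `coverage_guidance` are hypothetical names, the month arithmetic counts whole calendar months, and the returned strings simply restate the guidance.

```python
from datetime import date


def months_until(today, eol):
    """Whole calendar months from `today` until `eol` (negative once past EOL)."""
    months = (eol.year - today.year) * 12 + (eol.month - today.month)
    if eol.day < today.day:
        # The last month is not complete yet.
        months -= 1
    return months


def coverage_guidance(today, dotnet_eol):
    """Restate the guidance above for a .NET release with the given EOL date."""
    remaining = months_until(today, dotnet_eol)
    if remaining < 3:
        return "skip new coverage"
    if remaining < 6:
        return "new coverage can typically be skipped"
    return "add new coverage"


if __name__ == "__main__":
    # Hypothetical dates, for illustration only.
    print(coverage_guidance(date(2025, 2, 6), date(2025, 5, 12)))
```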

## Building

Our [build methodology](https://github.com/dotnet/runtime/blob/main/docs/project/linux-build-methodology.md) is oriented around cross-compiling, enabling us to target an old OS version and run on newer ones. It is uncommon for us to need to change the build to address new OS versions; however, there are [rare cases where we need to make adjustments](https://github.com/dotnet/runtime/issues/101944).

We use both containers and VMs for building, depending on the OS. If we test in a container, we likely build in a container. Same for VMs.

Our primary concern is ensuring that we are using [supported operating systems and tools for our build](https://github.com/dotnet/runtime/tree/main/docs/workflow/requirements).

Our Linux build containers are based on Azure Linux. We [typically need to update them](https://github.com/dotnet/runtime/issues/112191) with a new version of Azure Linux once per release. We do not update the toolset, however. That's fixed, per release.

For Apple, we likely need to make an adjustment at each macOS or iOS release to account for an Xcode version no longer being supported.

## Prereqs containers

New images need to be created for each new OS version in the [dotnet/dotnet-buildtools-prereqs-docker](https://github.com/dotnet/dotnet-buildtools-prereqs-docker) repo, for testing. We also need to create and update images for Linux and Wasm build environments.

The repo is self-service and largely self-explanatory. One typically creates a new image using the pattern demonstrated by the previous version. Look at commits and [blame](https://github.com/dotnet/dotnet-buildtools-prereqs-docker/blame/776324ff16d38e22fd9f06c9842ec338a4b98489/src/alpine/3.20/helix/Dockerfile) to find people who are best suited to help.
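The "copy the previous version" pattern can be partly scripted. This Python sketch assumes the `src/<distro>/<version>/helix/Dockerfile` layout visible in the blame link above, and a `FROM <image>:<version>` base line. `next_version_path` and `bump_from_lines` are hypothetical helpers; everything else in the copied Dockerfile (package names, pinned versions, Helix client steps) still needs a manual review.

```python
import re
from pathlib import PurePosixPath


def next_version_path(previous, old_version, new_version):
    """Swap the version directory, e.g. src/alpine/3.20/helix/Dockerfile -> .../3.21/..."""
    parts = [new_version if part == old_version else part
             for part in PurePosixPath(previous).parts]
    return str(PurePosixPath(*parts))


def bump_from_lines(dockerfile_text, image, old_version, new_version):
    """Rewrite `FROM [registry/]<image>:<old_version>` lines to use <new_version>.

    Only FROM lines are changed; other version strings are left untouched.
    """
    pattern = re.compile(
        rf"^(FROM\s+(?:\S+/)?{re.escape(image)}:){re.escape(old_version)}(?=\s|$)",
        re.MULTILINE,
    )
    # A callable replacement avoids escaping issues in the new version string.
    return pattern.sub(lambda m: m.group(1) + new_version, dockerfile_text)
```

Keeping the rewrite limited to `FROM` lines is deliberate: a blanket search-and-replace of the version string could silently change unrelated pins.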

Installing/building the Helix client can be quite involved, particularly for Arm platforms. Don't struggle with that. Just ask for help.

Container images are referenced in our pipeline files:

- [eng/pipelines/coreclr/templates/helix-queues-setup.yml](https://github.com/dotnet/runtime/blob/main/eng/pipelines/coreclr/templates/helix-queues-setup.yml)
- [eng/pipelines/libraries/helix.yml](https://github.com/dotnet/runtime/blob/main/eng/pipelines/libraries/helix.yml)
- [eng/pipelines/common/templates/pipeline-with-resources.yml](https://github.com/dotnet/runtime/blob/main/eng/pipelines/common/templates/pipeline-with-resources.yml)

Notes:

- The first two links are for testing and the last for building.
- The links are for the `main` branch. Release branches should have the same layout.
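When onboarding or retiring an OS version, it helps to list which of these files reference it. The Python sketch below assumes image references of the form `mcr.microsoft.com/dotnet-buildtools/prereqs:<tag>` and that it runs from the root of a dotnet/runtime checkout; `prereqs_tags` and `tags_for_os` are hypothetical helpers.

```python
import re
import sys
from pathlib import Path

# Assumption: image references look like mcr.microsoft.com/dotnet-buildtools/prereqs:<tag>
PREREQS_TAG = re.compile(r"mcr\.microsoft\.com/dotnet-buildtools/prereqs:([\w.\-]+)")

PIPELINE_FILES = [
    "eng/pipelines/coreclr/templates/helix-queues-setup.yml",
    "eng/pipelines/libraries/helix.yml",
    "eng/pipelines/common/templates/pipeline-with-resources.yml",
]


def prereqs_tags(text):
    """Return the distinct prereqs image tags referenced in `text`, sorted."""
    return sorted(set(PREREQS_TAG.findall(text)))


def tags_for_os(tags, os_prefix):
    """Keep tags that start with `os_prefix`, e.g. "alpine-3.21"."""
    return [tag for tag in tags if tag.startswith(os_prefix)]


if __name__ == "__main__":
    # Usage: python find_prereqs.py [OS_PREFIX]   (defaults to "alpine")
    os_prefix = sys.argv[1] if len(sys.argv) > 1 else "alpine"
    for name in PIPELINE_FILES:
        path = Path(name)
        if not path.exists():
            print(f"{name}: not found")
            continue
        for tag in tags_for_os(prereqs_tags(path.read_text(encoding="utf-8")), os_prefix):
            print(f"{name}: {tag}")
```

Because the layout is the same in release branches, the same script can be run from a release-branch checkout.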

Example PRs:

- [dotnet/runtime #111768](https://github.com/dotnet/runtime/pull/111768)
- [dotnet/runtime #111504](https://github.com/dotnet/runtime/pull/111504)
- [dotnet/runtime #110492](https://github.com/dotnet/runtime/pull/110492)
- [dotnet/dotnet-buildtools-prereqs-docker #1282](https://github.com/dotnet/dotnet-buildtools-prereqs-docker/pull/1282)
- [dotnet/dotnet-buildtools-prereqs-docker #1314](https://github.com/dotnet/dotnet-buildtools-prereqs-docker/pull/1314)

### VMs