Skip to content

Enlighten GetAssembliesMetadata task for multithreaded mode #13570

@jankratochvilcz

Description

@jankratochvilcz

Enlighten GetAssembliesMetadata task for multithreaded mode

Parent: #11834

Context

GetAssembliesMetadata is one of the remaining tasks listed in the migration epic (#11834) under "Other (either simple or with unknown issues)". The task body itself is small (one Execute method, ~20 lines), but its cwd dependence is buried in a heavyweight helper — AssemblyInformation in src/Tasks/AssemblyDependency/AssemblyInformation.cs — which GetAssembliesMetadata constructs once per input path:

GetAssembliesMetadata.Execute
  → foreach (string assemblyPath in AssemblyPaths)
       → FileSystems.Default.FileExists(assemblyPath)        // cwd-relative if assemblyPath is relative
       → new AssemblyInformation(assemblyPath)               // stores _sourceFile = assemblyPath
            → IMetaDataDispenser.OpenScope(sourceFile, ...)            (NETFRAMEWORK)
            → Assembly.ReflectionOnlyLoadFrom(sourceFile)              (NETFRAMEWORK fallback)
            → File.OpenRead(_sourceFile)                               (PEReader path)
            → File.OpenRead(path) inside GetRuntimeVersion(_sourceFile)
            → NativeMethods.TryReadMetadataString(_sourceFile, ...)
       → AssemblyAttributes.AssemblyFullPath = _sourceFile   // echoes the input string verbatim
       → CreateItemWithMetadata(attributes) → ItemSpec = attributes.AssemblyFullPath

Five distinct file-system call sites inside AssemblyInformation consume _sourceFile directly via Path-relative APIs. All five silently consume process Environment.CurrentDirectory when _sourceFile is relative. Under multithreaded execution that is wrong — concurrent GetAssembliesMetadata instances may belong to different projects, and there is only one cwd per process.

[SupportedOSPlatform("windows")]

The task is decorated [SupportedOSPlatform("windows")] and the existing tests are gated #if NETFRAMEWORK. The migration must keep this attribute and must not regress the cross-target/cross-framework behavior — AssemblyInformation's NETFRAMEWORK code path uses COM IMetaDataDispenser, while the .NET Core path uses PEReader over File.OpenRead. Both branches need an absolute path string.

Sin 7 — cwd-dependence buried in AssemblyInformation

AssemblyInformation has many other callers (Reference.cs, RAR pipeline). We must not change its public/internal signature to thread TaskEnvironment through — that would either explode blast radius or require an overload that all RAR sites would have to opt into. The contract is already "callers pass an already-resolved path"; GetAssembliesMetadata is simply violating that contract today by trusting assemblyPath to be absolute.

Sin 1 risk — AssemblyFullPath echoes the input

AssemblyAttributes.AssemblyFullPath is set to _sourceFile verbatim (AssemblyInformation.cs line ~293). GetAssembliesMetadata.CreateItemWithMetadata then assigns referenceItem.ItemSpec = attributes.AssemblyFullPath. If we naively absolutize the path before constructing AssemblyInformation, the output ItemSpec will inflate from bin\Release\Foo.dll to C:\repo\Project\bin\Release\Foo.dll — a behavior change visible to every downstream target that consumes @(AssembliesMetadata). Existing tests assert t.AssembliesMetadata[0].ItemSpec.ShouldBe(assemblyPath) against the input value, so this is observable.

Sin 6 risk — silent skip vs. ArgumentException

Today, missing files are silently skipped: FileSystems.Default.FileExists(assemblyPath) simply returns false and the loop continues with no warning. After migration, calling TaskEnvironment.GetAbsolutePath("") or (null) throws ArgumentException and would abort the entire batch. Old behavior must be preserved: invalid/empty entries are quietly skipped, never thrown.

Approach

Apply the PR #13267 (MSBuild/CallTarget) absolutize-at-boundary pattern — pattern #1 in the SKILL's "Patterns for migrating tasks with cwd-dependent helpers" rank. AssemblyInformation is left untouched; the task hands it an already-absolute path string. Output ItemSpec is reconstructed from the original input string.

  1. Mark the task class. Apply [MSBuildMultiThreadableTask] and implement IMultiThreadableTask on GetAssembliesMetadata with public TaskEnvironment TaskEnvironment { get; set; } = TaskEnvironment.Fallback;. Keep the existing [SupportedOSPlatform("windows")] attribute. Keep the TaskExtension base class (the runtime default is supplied by TaskRouter, but the explicit Fallback initializer matches the SKILL sign-off checklist for tasks that consume TaskEnvironment APIs).
  2. Rewrite the loop body in Execute to:
    • Walk AssemblyPaths with a per-entry try/skip guard so one bad input does not abort the batch (matches today's silent-skip semantics — see Sin 6).
    • Skip null/empty entries before calling GetAbsolutePath (defensive — matches today's FileExists returning false).
    • Compute AbsolutePath abs = TaskEnvironment.GetAbsolutePath(assemblyPath); once, and use (string)abs for both the FileSystems.Default.FileExists probe and the new AssemblyInformation(...) construction.
    • Pass the original assemblyPath (not abs) into CreateItemWithMetadata for the ItemSpec (Sin 1 mitigation). The metadata coming back from AssemblyInformation (AssemblyName, RuntimeVersion, etc.) is consumed unchanged.
  3. Modify CreateItemWithMetadata to accept the original input path string as an extra parameter and assign referenceItem.ItemSpec = originalAssemblyPath; (the original string, exactly as the user supplied it). Continue to consume attributes.* for all the metadata KeyValuePairs. The attributes.AssemblyFullPath field is no longer read from attributes for the ItemSpec — it's effectively unused by this task after the migration, but must remain on AssemblyAttributes for RAR.
  4. Do not call .GetCanonicalForm() on the absolutized path. AssemblyInformation's file-system calls (File.OpenRead, OpenScope, ReflectionOnlyLoadFrom) all accept non-canonical absolute paths fine; canonicalization is unnecessary work. (Sin 5 N/A here — there are no dictionary/set keys built from the absolutized value.)

Why no ChangeWave

For absolute-path inputs (the only inputs the existing tests cover and the only inputs SDK-style consumers ever produce — see "Test coverage assessment / Integration callers"), output is byte-identical to today: FileExists(abs) == FileExists(originalAbs), AssemblyInformation(abs) reads the same bytes, and ItemSpec = originalAssemblyPath already round-trips through AssemblyFullPath = _sourceFile = originalAssemblyPath.

For relative inputs, today the path is resolved against Environment.CurrentDirectory; after migration it's resolved against TaskEnvironment.ProjectDirectory. In legacy (non-MT) execution the two are equal, so observable behavior is unchanged. The semantic change only manifests under MT, which is the entire point of the migration.

If unforeseen behavioral diffs are surfaced during review (e.g., a consumer that compared ItemSpec against an absolutized path), gate the diff behind a new ChangeWave following PR #13069's precedent.

Test coverage assessment

Existing direct unit tests (src/Tasks.UnitTests/GetAssembliesMetadata_Tests.cs, 2 cases, NETFRAMEWORK only)

  • CheckPresenceOfCustomCOMAssemblyAttributes — builds a sample COM assembly via RunnerUtilities.ExecMSBuild, runs GetAssembliesMetadata against the absolute output path, asserts every metadata field including ItemSpec.ShouldBe(assemblyPath).
  • CheckPresenceOfCOMAssemblyAttributes — points at %WINDIR%\Microsoft.NET\Framework\v4.0.30319\mscorlib.dll (absolute), asserts the same schema.

Both feed absolute paths. Coverage of relative-path inputs, missing-file-skip behavior, null/empty inputs, and MT-specific concerns (project-relative resolution, concurrency) is zero.

Integration callers in this repo

GetAssembliesMetadata is invoked from Microsoft.Common.CurrentVersion.targets in the COM reference / _GenerateProjectReferenceMetadataCache flow (search for <GetAssembliesMetadata in the targets). End-to-end coverage of that flow lives in the SDK / VS repos, not here. We will not add E2E tests in this repo as part of this issue.

Gaps to fill in this PR

  • G1 — Concurrency test. Two GetAssembliesMetadata instances with different TaskEnvironment.ProjectDirectory values, each given the same relative path string pointing at an assembly that exists under each project dir. Assert each instance reads the assembly in its own project dir (different MajorVersion / Description / etc.) — confirming no cross-talk via Environment.CurrentDirectory. NETFRAMEWORK gate retained. (SKILL red-team Phase 4.)
  • G2 — Project-relative test. TaskEnvironment.ProjectDirectory != Environment.CurrentDirectory, relative input resolves against ProjectDirectory, output ItemSpec is the original relative string (Sin 1 regression guard).
  • G3 — Inject TaskEnvironment into existing tests. Set t.TaskEnvironment = TaskEnvironmentHelper.CreateForTest() in CheckPresenceOfCustomCOMAssemblyAttributes and CheckPresenceOfCOMAssemblyAttributes so the absolute-path assertions remain valid (Fallback uses cwd as project dir; absolute inputs are unaffected by base-dir choice).
  • G4 — Silent-skip regression test. Mix of invalid entries (null, "", a non-existent absolute path, a non-existent relative path) interleaved with one valid absolute path. Assert: Execute() returns true, AssembliesMetadata.Length == 1, no errors logged. Confirms Sin 6 mitigation — GetAbsolutePath's ArgumentException does not escape the loop.
  • G5 — ItemSpec round-trip test (Sin 1 guard). Absolute input with mixed separators / trailing dot (e.g., C:\foo\bar/baz.dll). Assert ItemSpec equals the input verbatim, not a canonicalized or re-rooted form.

Acceptance criteria

  • GetAssembliesMetadata decorated [MSBuildMultiThreadableTask] and implements IMultiThreadableTask with TaskEnvironment initialized to TaskEnvironment.Fallback.
  • [SupportedOSPlatform("windows")] attribute preserved.
  • Execute absolutizes each assemblyPath via TaskEnvironment.GetAbsolutePath before calling FileSystems.Default.FileExists and new AssemblyInformation(...).
  • AssemblyInformation is constructed with the absolutized path string; the original input string is used as the resulting item's ItemSpec (Sin 1).
  • AssemblyInformation's signature is not changed; no overload added; no call site outside GetAssembliesMetadata is touched.
  • Null/empty/non-existent inputs are silently skipped — Execute() still returns true, no ArgumentException escapes, no warning is logged for missing files (matches today's behavior — Sin 6).
  • Both existing tests CheckPresenceOfCustomCOMAssemblyAttributes and CheckPresenceOfCOMAssemblyAttributes pass on net472 with no behavior changes; every existing ShouldBe assertion holds, including ItemSpec.ShouldBe(assemblyPath).
  • No AbsolutePath value leaks into ItemSpec, into the AssemblyFullPath metadata projection, or into any Log.LogError/LogWarning/LogMessage argument (SKILL Sin 1, Sin 2).
  • Tests G1, G2, G3, G4, G5 added and pass under the existing #if NETFRAMEWORK gate.
  • .\build.cmd -v quiet succeeds; Tasks.UnitTests and Build.UnitTests pass.
  • No new compiler warnings (warnings-as-errors in official build).
  • No ChangeWave required; if any test required modification beyond G3, document why in the PR and gate the diff behind the next wave.
  • multithreaded-task-migration SKILL sign-off checklist walked and passes (in particular: Sin 1 [Output] audit on AssembliesMetadata, Sin 6 null/empty-input audit on every GetAbsolutePath call, Sin 7 helper-graph audit confirming no other cwd consumer remains reachable from Execute).

Implementation order

  1. Capture baseline outputs of GetAssembliesMetadata for one absolute path against the existing Custom_COM test asset on current main. Save the ItemSpec and the full metadata KVP set as the byte-for-byte oracle.
  2. Apply attribute + interface to GetAssembliesMetadata. Do not yet change the loop body.
  3. Rewrite the Execute loop to absolutize at the boundary, hand absolutized strings to AssemblyInformation, and reconstruct ItemSpec from the original input. Wrap the per-entry body in a try/skip to preserve silent-skip semantics for null/empty inputs.
  4. Update CreateItemWithMetadata to take the original input string and assign it to ItemSpec.
  5. Update existing tests to inject TaskEnvironment (G3).
  6. Add G1 (concurrency), G2 (project-relative + ItemSpec preservation), G4 (silent-skip), G5 (ItemSpec round-trip) tests.
  7. Run full Tasks.UnitTests (NETFRAMEWORK leg) and Build.UnitTests; verify clean.
  8. Run repo build with -v quiet to ensure no new warnings.

Risks / open questions

  • AssemblyInformation has additional callers in the RAR pipeline (Reference.cs, etc.). This issue's scope explicitly excludes them — only GetAssembliesMetadata's call site is migrated. RAR has its own much larger migration tracked separately (cf. PR Enlighten RAR task #13319 for the heavyweight pattern).
  • The COM/IMetaDataDispenser path inside AssemblyInformation uses process-wide COM apartment state. This is orthogonal to cwd; it is not in scope for this issue but should be noted if a future GetAssembliesMetadata parallelism PR lands.
  • End-to-end coverage of the _GenerateProjectReferenceMetadataCache flow under MT lives in the SDK/VS repo. Coordination recommended before MT mode ships generally, but not blocking for this PR — the task is correct in isolation once migrated.

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions