[.NET 9] Potential (~9% chance) AccessViolationException in ReadyToRun executable targeting RID osx-arm64 when running a child process for the first time, and both standard output and standard error are redirected #112167

Open
hach-que opened this issue Feb 5, 2025 · 3 comments
Labels
area-ReadyToRun-coreclr tenet-reliability Reliability/stability related issue (stress, load problems, etc.) untriaged New issue has not been triaged by the area owner

Comments

@hach-que
Contributor

hach-que commented Feb 5, 2025

Description

#88288 still reproduces in .NET 9, but I cannot reopen that issue because it is now locked. The only difference between .NET 8 and .NET 9 is that the crash call stack has changed.

Redirecting both standard output and standard error of a child process launched from a ReadyToRun executable on an M1 Mac can result in:

Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Diagnostics.AsyncStreamReader+<ReadBufferAsync>d__16, System.Diagnostics.Process, Version=9.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]].ExecuteFromThreadPool(System.Threading.Thread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
   at System.Threading.Thread+StartHelper.RunWorker()
   at System.Threading.Thread+StartHelper.Run()
   at System.Threading.Thread.StartCallback()

Some important notes about this bug:

  • The key element here is PublishReadyToRun. If ReadyToRun is turned off, this crash won't happen.
  • This happens for RID osx-arm64. I do not have an x64 Mac so I can't say whether it reproduces on the Intel architecture.
  • It only happens the first time the .NET process starts a child process. Once one child process has started successfully, all subsequent child processes start without this crash. The reproduction steps therefore run the resulting executable repeatedly in a Bash while true loop.
  • Repeated testing shows that this crash occurs in roughly 5% to 9% of runs of the reproduction test case, so it is not consistent, but it is not rare either.
  • I can reproduce this error for any of the following child processes:
    • /usr/bin/git init <path>,
    • /usr/bin/git --version, and
    • /bin/bash -c true
  • This bug only happens when you redirect both standard output and standard error for a child process. If you redirect only one of them, it doesn't seem to happen.
  • This bug replicates in:

I managed to reproduce this issue on the following system:

  • Mac mini (M1, 2020), M1 chip
  • 16GB RAM
  • macOS 14.4.1
  • .NET SDK 9.0.102, installed via the official .NET SDK installer for macOS

I could also reproduce it on a second system with the same OS version and hardware configuration (rebuilding the binary there rather than copying it over), so it is not specific to a single machine or environment.

Reproduction Steps

Create Program.cs with this content:

using System;
using System.Diagnostics;

var cts = new CancellationTokenSource();
Console.CancelKeyPress += (_, _) =>
{
    cts.Cancel();
};
var cancellationToken = cts.Token;

{
    if (Directory.Exists("/tmp/git-test"))
    {
        Directory.Delete("/tmp/git-test", true);
    }
    Directory.CreateDirectory("/tmp/git-test");
    var startInfo = new ProcessStartInfo
    {
        FileName = "/usr/bin/git",
        UseShellExecute = false,
        CreateNoWindow = false,
    };
    startInfo.RedirectStandardInput = false;
    startInfo.RedirectStandardOutput = true;
    startInfo.RedirectStandardError = true;
    startInfo.ArgumentList.Add("init");
    startInfo.ArgumentList.Add("/tmp/git-test");
    var process = Process.Start(startInfo)!;
    process.OutputDataReceived += (sender, e) =>
    {
        var line = e?.Data?.TrimEnd();
        if (!string.IsNullOrWhiteSpace(line))
        {
            Console.WriteLine(line);
        }
    };
    process.BeginOutputReadLine();
    process.ErrorDataReceived += (sender, e) =>
    {
        var line = e?.Data?.TrimEnd();
        if (!string.IsNullOrWhiteSpace(line))
        {
            Console.WriteLine(line);
        }
    };
    process.BeginErrorReadLine();
    try
    {
        // Use our own semaphore and the Exited event
        // instead of Process.WaitForExitAsync, since that
        // function seems to be buggy and can stall.
        var exitSemaphore = new SemaphoreSlim(0);
        process.Exited += (sender, args) =>
        {
            exitSemaphore.Release();
        };
        process.EnableRaisingEvents = true;
        if (process.HasExited)
        {
            exitSemaphore.Release();
        }

        // Wait for the process to exit or until cancellation.
        await exitSemaphore.WaitAsync(cancellationToken);
    }
    finally
    {
        if (cancellationToken.IsCancellationRequested)
        {
            if (!process.HasExited)
            {
                process.Kill(true);
            }
        }
    }
    if (!process.HasExited)
    {
        // Give the process one last chance to exit normally
        // so we can try to get the exit code.
        process.WaitForExit(1000);
        if (!process.HasExited)
        {
            // We can't get the return code for this process.
            return int.MaxValue;
        }
    }
    Console.WriteLine($"git init exited with {process.ExitCode}");
}

Console.WriteLine("testing complete.");
return 0;

Create the procrepo.csproj project with this content:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net9.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
    <PublishSingleFile>true</PublishSingleFile>
    <SelfContained>true</SelfContained>
    <RuntimeIdentifiers>osx-arm64</RuntimeIdentifiers>
    <IncludeNativeLibrariesForSelfExtract>true</IncludeNativeLibrariesForSelfExtract>
    <PublishReadyToRun>true</PublishReadyToRun>
    <PublishTrimmed>true</PublishTrimmed>
    <EnableCompressionInSingleFile>true</EnableCompressionInSingleFile>
    <DebuggerSupport>false</DebuggerSupport>
    <TrimmerRemoveSymbols>true</TrimmerRemoveSymbols>
    <EnableUnsafeBinaryFormatterSerialization>false</EnableUnsafeBinaryFormatterSerialization>
    <EnableUnsafeUTF7Encoding>false</EnableUnsafeUTF7Encoding>
    <EventSourceSupport>false</EventSourceSupport>
    <HttpActivityPropagationSupport>false</HttpActivityPropagationSupport>
    <InvariantGlobalization>true</InvariantGlobalization>
    <MetadataUpdaterSupport>false</MetadataUpdaterSupport>
    <ShowLinkerSizeComparison>true</ShowLinkerSizeComparison>
  </PropertyGroup>

</Project>

Build the project with:

dotnet publish -c Release -r osx-arm64

Run the executable in a loop to reproduce the crash:

while true; do ./bin/Release/net9.0/osx-arm64/publish/procrepo ; done
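Because the crash can only occur on the first child-process launch, every attempt needs a fresh process, which is why the loop above restarts the executable forever. As a convenience, a loop that stops at the first crash and reports which run failed can be sketched as follows. run_until_failure is a hypothetical helper, not part of the original report, and it assumes that any non-zero exit status means the process crashed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original report): run a command
# repeatedly until it exits with a non-zero status or max_runs is reached.
run_until_failure() {
  local max_runs=$1
  shift
  local i status
  for ((i = 1; i <= max_runs; i++)); do
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
      # Any non-zero exit status is treated as a crash.
      echo "Run $i failed with exit status $status"
      return "$status"
    fi
  done
  echo "No failure in $max_runs runs"
  return 0
}

# Usage (commented out so the definition can be sourced on its own):
# run_until_failure 1000 ./bin/Release/net9.0/osx-arm64/publish/procrepo
```

The cap on the number of runs keeps the loop from spinning forever on a build where the bug does not reproduce.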

Reproduction rate

When I ran the program with this Bash command:

SUCCESS=0
FAILURE=0
for ((i=1;i<=100;i++)); do ./bin/Release/net9.0/osx-arm64/publish/procrepo; if [ $? -eq 0 ]; then SUCCESS=$((SUCCESS+1)); else FAILURE=$((FAILURE+1)); fi; done
echo "Success: $SUCCESS"
echo "Failure: $FAILURE"

the crash occurred in 9 of the 100 runs (9%).
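The same measurement can be wrapped in a reusable function that also prints the failure percentage, which makes it easier to compare builds (for example with and without ReadyToRun). measure_failure_rate is a hypothetical helper, not part of the original report; like the loop above, it counts any non-zero exit status as a failure, and it discards each run's output so the summary stays readable.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original report): run a command a fixed
# number of times and count how many runs exited with status 0 versus non-zero.
measure_failure_rate() {
  local runs=$1
  shift
  if [ "$runs" -le 0 ]; then
    echo "runs must be a positive integer" >&2
    return 1
  fi
  local success=0 failure=0 i
  for ((i = 1; i <= runs; i++)); do
    # Output of each run (including any crash stack) is discarded.
    if "$@" > /dev/null 2>&1; then
      success=$((success + 1))
    else
      failure=$((failure + 1))
    fi
  done
  echo "Success: $success"
  echo "Failure: $failure"
  # Integer division, so the percentage is rounded down.
  echo "Failure rate: $((failure * 100 / runs))%"
}

# Usage (commented out so the definition can be sourced on its own):
# measure_failure_rate 100 ./bin/Release/net9.0/osx-arm64/publish/procrepo
```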

Expected behavior

The .NET process should not crash with AccessViolationException.

Actual behavior

The process crashes with a call stack similar to one of the following. The exact stack varies between runs, and I have seen call stacks other than the ones below, but these are the most common:

Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.IO.Pipes.PipeStream+<ReadAsyncCore>d__82, System.IO.Pipes, Version=9.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]](<ReadAsyncCore>d__82 ByRef)
   at System.IO.Pipes.PipeStream.ReadAsyncCore(System.Memory`1<Byte>, System.Threading.CancellationToken)
   at System.IO.Pipes.PipeStream.ReadAsync(System.Memory`1<Byte>, System.Threading.CancellationToken)
   at System.Diagnostics.AsyncStreamReader+<ReadBufferAsync>d__16.MoveNext()
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Diagnostics.AsyncStreamReader+<ReadBufferAsync>d__16, System.Diagnostics.Process, Version=9.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]].MoveNext(System.Threading.Thread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
   at System.Threading.Thread+StartHelper.RunWorker()
   at System.Threading.Thread+StartHelper.Run()
   at System.Threading.Thread.StartCallback()
Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Diagnostics.AsyncStreamReader+<ReadBufferAsync>d__16, System.Diagnostics.Process, Version=9.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]].ExecuteFromThreadPool(System.Threading.Thread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
   at System.Threading.Thread+StartHelper.RunWorker()
   at System.Threading.Thread+StartHelper.Run()
   at System.Threading.Thread.StartCallback()
Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.Runtime.CompilerServices.AsyncValueTaskMethodBuilder`1[[System.Int32, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].get_Task()
   at System.IO.Pipes.PipeStream.ReadAsyncCore(System.Memory`1<Byte>, System.Threading.CancellationToken)
   at System.IO.Pipes.PipeStream.ReadAsync(System.Memory`1<Byte>, System.Threading.CancellationToken)
   at System.Diagnostics.AsyncStreamReader+<ReadBufferAsync>d__16.MoveNext()
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Diagnostics.AsyncStreamReader+<ReadBufferAsync>d__16, System.Diagnostics.Process, Version=9.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]].MoveNext(System.Threading.Thread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
   at System.Threading.Thread+StartHelper.RunWorker()
   at System.Threading.Thread+StartHelper.Run()
   at System.Threading.Thread.StartCallback()
Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.Net.Sockets.SafeSocketHandle.SetHandleAndValid(IntPtr)
   at Microsoft.Win32.SafeHandles.SafePipeHandle.CreatePipeSocket(Boolean)
   at System.IO.Pipes.PipeStream+<ReadAsyncCore>d__82.MoveNext()
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.IO.Pipes.PipeStream+<ReadAsyncCore>d__82, System.IO.Pipes, Version=9.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]](<ReadAsyncCore>d__82 ByRef)
   at System.IO.Pipes.PipeStream.ReadAsyncCore(System.Memory`1<Byte>, System.Threading.CancellationToken)
   at System.IO.Pipes.PipeStream.ReadAsync(System.Memory`1<Byte>, System.Threading.CancellationToken)
   at System.Diagnostics.AsyncStreamReader+<ReadBufferAsync>d__16.MoveNext()
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Diagnostics.AsyncStreamReader+<ReadBufferAsync>d__16, System.Diagnostics.Process, Version=9.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]].MoveNext(System.Threading.Thread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
   at System.Threading.Thread+StartHelper.RunWorker()
   at System.Threading.Thread+StartHelper.Run()
   at System.Threading.Thread.StartCallback()
Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.Net.Sockets.Socket.LoadSocketTypeFromHandle(System.Net.Sockets.SafeSocketHandle, System.Net.Sockets.AddressFamily ByRef, System.Net.Sockets.SocketType ByRef, System.Net.Sockets.ProtocolType ByRef, Boolean ByRef, Boolean ByRef, Boolean ByRef)
   at System.Net.Sockets.Socket..ctor(System.Net.Sockets.SafeSocketHandle, Boolean)
   at Microsoft.Win32.SafeHandles.SafePipeHandle.CreatePipeSocket(Boolean)
   at System.IO.Pipes.PipeStream+<ReadAsyncCore>d__82.MoveNext()
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.IO.Pipes.PipeStream+<ReadAsyncCore>d__82, System.IO.Pipes, Version=9.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]](<ReadAsyncCore>d__82 ByRef)
   at System.IO.Pipes.PipeStream.ReadAsyncCore(System.Memory`1<Byte>, System.Threading.CancellationToken)
   at System.IO.Pipes.PipeStream.ReadAsync(System.Memory`1<Byte>, System.Threading.CancellationToken)
   at System.Diagnostics.AsyncStreamReader+<ReadBufferAsync>d__16.MoveNext()
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Diagnostics.AsyncStreamReader+<ReadBufferAsync>d__16, System.Diagnostics.Process, Version=9.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]].MoveNext(System.Threading.Thread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
   at System.Threading.Thread+StartHelper.RunWorker()
   at System.Threading.Thread+StartHelper.Run()
   at System.Threading.Thread.StartCallback()

Regression?

No response

Known Workarounds

Turn off PublishReadyToRun in the project file when targeting macOS.
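If editing the project file is inconvenient, the same workaround can be applied at publish time by overriding the MSBuild property on the command line. This is a sketch that relies on standard MSBuild behavior, where a global property passed with -p takes precedence over the value set in procrepo.csproj.

```shell
# Publish the osx-arm64 build with ReadyToRun disabled; the -p global
# property overrides <PublishReadyToRun>true</PublishReadyToRun> in the .csproj.
dotnet publish -c Release -r osx-arm64 -p:PublishReadyToRun=false
```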

Configuration

.NET 9.0

Other information

No response

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Feb 5, 2025
Contributor

Tagging subscribers to this area: @dotnet/area-system-threading-tasks
See info in area-owners.md if you want to be subscribed.

@hach-que hach-que changed the title Rare AccessViolationException in ReadyToRun executable targeting RID osx-arm64 when async tasks are continued Potential (~5% chance) AccessViolationException in ReadyToRun executable targeting RID osx-arm64 when running a child process for the first time, and both standard output and standard error are redirected Feb 5, 2025
@hach-que hach-que changed the title Potential (~5% chance) AccessViolationException in ReadyToRun executable targeting RID osx-arm64 when running a child process for the first time, and both standard output and standard error are redirected [.NET 9] Potential (~5% chance) AccessViolationException in ReadyToRun executable targeting RID osx-arm64 when running a child process for the first time, and both standard output and standard error are redirected Feb 5, 2025
@hach-que
Contributor Author

hach-que commented Feb 5, 2025

Given that #88288 was incorrectly closed on the assumption that it had been resolved, would it be worth adding this reproduction case to an automated test suite to verify that it is actually fixed? The bug is not that rare, and this reproduction case should run on any macOS runner on GitHub Actions.

@hach-que hach-que changed the title [.NET 9] Potential (~5% chance) AccessViolationException in ReadyToRun executable targeting RID osx-arm64 when running a child process for the first time, and both standard output and standard error are redirected [.NET 9] Potential (~9% chance) AccessViolationException in ReadyToRun executable targeting RID osx-arm64 when running a child process for the first time, and both standard output and standard error are redirected Feb 5, 2025
@jkotas jkotas added area-ReadyToRun-coreclr tenet-reliability Reliability/stability related issue (stress, load problems, etc.) and removed area-System.Threading.Tasks labels Feb 5, 2025
@janvorli
Member

janvorli commented Feb 5, 2025

I can confirm it repros on my M1 Mac.

Projects
None yet
Development

No branches or pull requests

3 participants