Releases: dotnet/orleans

v9.1.2

13 Feb 17:55
6905fa9

What's Changed

  • [Test] Skip flaky Rem_Azure_Basic test by @ReubenBond in #9345
  • Use UTC DateTime instead of DateTimeOffset in IAmAliveTime comparison by @ReubenBond in #9341
  • ADO.NET: Treat DateTime values as UTC by @ReubenBond in #9342
  • Provide more descriptive error messages in RuntimeTypeNameParser by @ReubenBond in #9343
  • Remove restriction limiting reminders to a maximum of 49 days by @willg1983 in #9319
  • Do not set cached hash code from GetConsistentHashCode(int) by @ReubenBond in #9349

Full Changelog: v9.1.1...v9.1.2

v9.1.1

11 Feb 16:27
7d14734

Full Changelog: v9.1.0...v9.1.1

v9.1.0

11 Feb 01:38
d2af146

What's Changed

  • Add dependabot configuration for .NET SDK updates by @JamieMagee in #9249
  • Ignore semver major dotnet sdk updates by @JamieMagee in #9252
  • Add optional default TTL to Cassandra clustering by @dmorganMsft in #9221
  • Correcting exception message by @rkargMsft in #9262
  • Use Activator.CreateInstance in CosmosGrainStorage when record does not exist by @egil in #9277
  • Explicitly target net8.0 for tests by @rkargMsft in #9278
  • Disable warning in TestClusterHostFactory by @benjaminpetit in #9284
  • Reference analyzers & code generator from all packages by @ReubenBond in #9294
  • Bump dotnet-sdk from 8.0.101 to 8.0.405 by @dependabot in #9283
  • Fix CorrelationId generation in MessageFactory by @ReubenBond in #9300
  • Ensure that exceptions propagated to Connection's Initialized task is observed by @d-jagoda in #9280
  • Add optional Methods w/ CancellationToken to IStorage by @Chris-Eckhardt in #9279
  • Route client observer messages through the local gateway when possible by @fanxiao92 in #9310
  • [Membership] Preserve latest IAmAliveTime across updates by @ReubenBond in #9303
  • [Membership] Monitor all stale silos by @ReubenBond in #9304
  • [Membership] When indirect probe fails, include intermediary's vote in suspecter list by @ReubenBond in #9302
  • [Membership] Use an expander graph to improve eviction speed when multiple hosts fail simultaneously by @ReubenBond in #9301
  • [Membership] Reduce default 'stale' silo detection time from 10min to 90s by @ReubenBond in #9305
  • Improve CancellationTokenSource hygiene and use preferred LINQ & StringBuilder methods by @ivanvyd in #9080
  • [Membership] Monitor 10 silos by default instead of 3 by @ReubenBond in #9306
  • [Membership] Ignore stale silos when calculating vote count requirements by @ReubenBond in #9307
  • Add Microsoft.Extensions.Configuration provider for azure queue streaming by @tskimmett in #8929
  • Use AssemblyLoadContext consistently when loading referenced assemblies by @ReubenBond in #8492
  • Add default OnCompletedAsync implementations for IAsyncObserver and IAsyncBatchObserver by @bluexo in #8783
  • Add static modifier to members when possible, private and internal only (no public API change) by @cybertyche in #8534
  • [Membership] Limit ValidateInitialConnectivity to MaxJoinAttemptTime by @ReubenBond in #9312
  • Enable MerlinBot autobaselining for GitHub by @ReubenBond in #9313
  • Additional logic for Silo Cleanup test by @rkargMsft in #9314
  • [Samples] Link to F# Grain Service sample by @HamzaFarooq95 in #9315
  • [Test] Fix URI passed to Azure Queue Streaming tests by @ReubenBond in #9317
  • Suppress status updates for local-only messages by @ReubenBond in #9321
  • Fix Azure Queue Streaming IConfiguration provider by @ReubenBond in #9320
  • PersistentStreamPullingAgent: suppress execution context in message pump by @ReubenBond in #9322
  • Fix skipping in AQStreamingTests by @ReubenBond in #9325
  • [Test] Fix AQStreamingTests cleanup by @ReubenBond in #9327
  • Terminate ClusterManifestProvider quickly during shutdown by @ReubenBond in #9329
  • Flow CancellationToken more appropriately in lifecycle by @ReubenBond in #9330
  • Improve graceful shutdown of ClientDirectory by @ReubenBond in #9333
  • Avoid accessing IServiceProvider after disposal in Catalog and MembershipGossiper by @ReubenBond in #9334
  • [Membership] Ignore stale silos when picking a probing intermediary by @ReubenBond in #9335
  • Enable GitHub Actions test logger by @ReubenBond in #9331
  • Fix skipping events on resume handshake by @benjaminpetit in #9336

Full Changelog: v9.0.1...v9.1.0

v9.0.1

23 Nov 17:30
b21af24

What's Changed

  • Change EventCounterIntervalSec to 1 sec to fix issue caused in dotnet-counters by @ntovas in #9235
  • Downgrade dependencies to .NET 8.0 group by @ReubenBond in #9246

Full Changelog: v9.0.0...v9.0.1

v9.0.0

14 Nov 19:07
cea079a

What's Changed since v8.2.0

Full Changelog: v8.2.0...v9.0.0

v7.2.7

15 Oct 23:00
55f292f

What's Changed

  • Fix potential grain timer deadlock during disposal by @ReubenBond in #8951
  • [7.x] Cherry-picked commits from [main] by @ReubenBond in #8995
    • Ensure reminder table is initialized before access (#8982)
    • Update Npgsql (#8994)
    • Fix behavior of DictionaryBaseCodec when values are added from constructor (#8993)
  • [7.x] Fix build + signing by @ReubenBond in #9174
  • Log argument types instead of values by @ReubenBond in #9177
  • [7.x] Azure DevOps: upload logs, blame/crash dumps, and publish to nuget by @ReubenBond in #9180
  • [7.x] Fix SourceLink repository by @ReubenBond in #9182

Full Changelog: v7.2.6...v7.2.7

v8.2.0

12 Jul 21:54
92e0bf3

New features

Activation repartitioning

[Video: ActivationRepartitioning.mp4]

Above: a demonstration showing Activation Repartitioning in action. The red lines represent cross-silo communication. As the red lines are eliminated by the partitioning algorithm, throughput improves to over 2x the initial throughput.

Ledjon Behluli and @ReubenBond implemented activation repartitioning in #8877. When enabled, activation repartitioning collocates grains based on observed communication patterns to improve performance while keeping load balanced across your cluster. In initial benchmarks, we observe throughput improvements in the range of 30% to 110%. The following paragraphs provide more background and implementation details for those who are interested. The feature is currently experimental: to enable it, you need to opt in on every silo in your cluster using the ISiloBuilder.AddActivationRepartitioner() extension method, suppressing the experimental feature warning:

#pragma warning disable ORLEANSEXP001 // Type is for evaluation purposes only and is subject to change or removal in future updates. Suppress this diagnostic to proceed.
siloBuilder.AddActivationRepartitioner();
#pragma warning restore ORLEANSEXP001 // Type is for evaluation purposes only and is subject to change or removal in future updates. Suppress this diagnostic to proceed.

The fastest and cheapest grain calls are the ones which don't cross process boundaries. These grain calls do not need to be serialized and do not incur network transmission costs. For that reason, collocating related grains within the same host can significantly improve the performance of your application. On the other hand, if all grains were placed in a single host, that host may become overloaded and crash, and you would not be able to scale your application across multiple hosts. How can we maximize collocation of related grains while keeping load across your hosts balanced? Before describing our solution, we need to provide some background.

Grain placement in Orleans is flexible: Orleans executes a user-defined function when deciding where in a cluster to place each grain, providing your function with a list of the compatible silos in your cluster, that is, the silos which support the grain type and interface version which triggered placement. Grain calls are location-transparent, so callers do not need to know where a grain is located, allowing grains to be placed anywhere across your cluster of hosts. Each grain's current location is stored in a distributed directory and lookups to the directory are cached for performance.

Resource-optimized placement was implemented by @ledjon-behluli in #8815. Resource-optimized placement uses runtime statistics such as total and available memory, CPU usage, and grain count, collected from all hosts in the cluster, smooths them, and combines them to calculate a load score. It selects the least-loaded silo from a subset of hosts to balance load evenly across the cluster[^4]. If the load score of the local silo is within some configured range of the best candidate's load score, the local silo is chosen preferentially. This improves grain locality by leveraging the knowledge that the local silo initiated a call to the grain and therefore has some relation to that grain.
Ledjon wrote more about Resource-optimized placement in this blog post.
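The scoring and local-preference steps described above can be sketched as follows. This is a minimal illustration, not the Orleans implementation: the SiloStats record, the 0.4/0.4/0.2 weights, and the localPreferenceMargin parameter are all assumptions made for the example, and the smoothing step is omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-silo statistics; names are illustrative, not Orleans types.
public sealed record SiloStats(string Name, double CpuUsagePercent, double MemoryUsedFraction, int ActivationCount);

public static class PlacementSketch
{
    // Combine the statistics into a single load score; lower means less loaded.
    // The weights are arbitrary assumptions chosen for the example.
    public static double Score(SiloStats s, int maxActivations) =>
        0.4 * (s.CpuUsagePercent / 100.0)
      + 0.4 * s.MemoryUsedFraction
      + 0.2 * (maxActivations == 0 ? 0.0 : (double)s.ActivationCount / maxActivations);

    // Pick the least-loaded candidate, unless the local silo's score is within
    // 'localPreferenceMargin' of the best score, in which case prefer the local silo.
    public static SiloStats Pick(IReadOnlyList<SiloStats> candidates, SiloStats local, double localPreferenceMargin)
    {
        int maxActivations = candidates.Max(c => c.ActivationCount);
        SiloStats best = candidates.MinBy(c => Score(c, maxActivations))!;
        double bestScore = Score(best, maxActivations);
        double localScore = Score(local, maxActivations);
        return localScore - bestScore <= localPreferenceMargin ? local : best;
    }
}
```

With a margin of zero the sketch always returns the least-loaded candidate; a larger margin trades some balance for locality.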

Originally, there was no straightforward way to move an active grain from one host to another without needing to fully deactivate the grain, unregister it from the grain directory, contend with concurrent callers on where to place the new activation, and reload its state from the database when the new activation is created. Live grain migration was introduced in #8452, allowing grains to transparently migrate from one silo to another on-demand without needing to reload state from the database, and without affecting pending requests. Live grain migration introduced two new lifecycle stages: dehydration and rehydration. The grain's in-memory state (application state, enqueued messages, metadata) is dehydrated into a migration packet which is sent to the destination silo where it's rehydrated. Live grain migration provided the mechanism for grains to migrate across hosts, but did not provide any out-of-the-box policies to automate migration. Users trigger grain migration by calling this.MigrateOnIdle() from within a grain, optionally providing a placement hint which the grain's configured placement director can use to select a destination host for the grain activation.

Finally, we have the pieces in place for activation repartitioning: grain activations are load-balanced across the cluster, and they are able to migrate from host to host quickly. While live grain migration gives developers a mechanism to migrate grain activations from one host to another, it does not provide any automated policy to do so. Remember, we want grains to be balanced across the cluster and collocated with related grains to reduce networking and serialization cost. This is a difficult challenge since:

  • An application can have millions of in-memory grains spread across tens or hundreds of silos.
  • Each grain can message any other grain.
  • The set of grains which each grain communicates with can change from minute to minute. For example, in an online game, player grains may join one match and communicate with each other for some time and then join a different match with an entirely different set of players afterwards.
  • Computing a balanced minimum edge-cut partition of an arbitrary graph is NP-hard.
  • No single host has full knowledge of which grains are hosted on which other host and which grains they communicate with: the graph is distributed across the cluster and changes dynamically.
  • Storing the entire communication graph in memory could be prohibitively expensive.

Folks at Microsoft Research studied this problem and proposed a solution in a paper titled Optimizing Distributed Actor Systems for Dynamic Interactive Services. The paper, dubbed ActOp, proposes a decentralized approximate solution which achieves good results in their benchmarks. Their implementation was never merged into Orleans and we were unable to find the original implementation on Microsoft's internal network. So, after first implementing resource-optimized placement, community contributor @ledjon-behluli set out to implement activation repartitioning from scratch based on the ActOp paper. The following paragraphs describe the algorithm and the enhancements we made along the way.

The activation repartitioning algorithm involves pair-wise exchange of grains between two hosts at a time. A silo computes a candidate set of grains to send to a peer; the peer then computes its own candidate set and uses a greedy algorithm to determine a final exchange set which minimizes cost while keeping the silos balanced.
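To make the idea of a greedy, balance-constrained exchange concrete, here is a rough sketch. It is not the algorithm used in Orleans: the Candidate record, the single Benefit number per grain, and the tolerance parameter are assumptions invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical candidate: a grain and the estimated net saving from moving it to the other silo.
public sealed record Candidate(string GrainId, long Benefit);

public static class ExchangeSketch
{
    // Consider moves from both silos in descending order of benefit. Accept a move
    // if the resulting activation-count imbalance stays within 'tolerance', or if
    // the move reduces an imbalance that is already over the tolerance.
    public static (List<Candidate> ToPeer, List<Candidate> ToLocal) Plan(
        IEnumerable<Candidate> localCandidates,
        IEnumerable<Candidate> peerCandidates,
        int localCount,
        int peerCount,
        int tolerance)
    {
        var toPeer = new List<Candidate>();
        var toLocal = new List<Candidate>();

        var moves = localCandidates.Select(c => (Candidate: c, FromLocal: true))
            .Concat(peerCandidates.Select(c => (Candidate: c, FromLocal: false)))
            .Where(m => m.Candidate.Benefit > 0)
            .OrderByDescending(m => m.Candidate.Benefit);

        foreach (var (candidate, fromLocal) in moves)
        {
            int delta = fromLocal ? -1 : 1;
            int newLocal = localCount + delta;
            int newPeer = peerCount - delta;
            int currentImbalance = Math.Abs(localCount - peerCount);
            int newImbalance = Math.Abs(newLocal - newPeer);
            if (newImbalance > tolerance && newImbalance >= currentImbalance)
            {
                continue; // this move would unbalance the pair too much
            }

            localCount = newLocal;
            peerCount = newPeer;
            (fromLocal ? toPeer : toLocal).Add(candidate);
        }

        return (toPeer, toLocal);
    }
}
```

Because each accepted move shifts the count difference by two, a tolerance below two only admits moves that reduce an existing imbalance.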

To compute the candidate sets, silos track which grains communicate with which other grains and how frequently. The whole graph would be unwieldy, so we only maintain the top-K communication edges using a variant of the Space-Saving[^1] algorithm. Messages are sampled via a multi-producer, single-consumer ring buffer which drops messages if the partition is full. They are then processed by a single thread, which yields frequently to give other threads CPU time. When the distribution has low skew and the K parameter is fairly small, Space-Saving can require a lot of costly shuffling at the bottom of its max-heap (we use the heap variant to reduce memory). To address this, we use Filtered Space-Saving[^2] instead of Space-Saving. Filtered Space-Saving involves putting a 'sketch' data structure at the bottom of the max heap for the lower end of the distribution, which can greatly reduce churn at the bottom and improve performance by up to ~2x in our tests.
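For readers unfamiliar with Space-Saving, the basic (unfiltered) algorithm can be sketched in a few lines. This is an illustration only: the SpaceSaving class name is made up for the example, it finds the minimum counter with a linear scan instead of the heap mentioned above, and it does not include the Filtered Space-Saving sketch layer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Basic Space-Saving: keep approximate counts for at most 'capacity' keys.
public sealed class SpaceSaving<TKey> where TKey : notnull
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, long> _counts = new();

    public SpaceSaving(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    public void Add(TKey key)
    {
        if (_counts.TryGetValue(key, out long count))
        {
            _counts[key] = count + 1; // already monitored: increment its counter
        }
        else if (_counts.Count < _capacity)
        {
            _counts[key] = 1; // free slot: start monitoring the key
        }
        else
        {
            // Full: evict the key with the smallest counter and give the newcomer
            // that counter plus one, so counts may overestimate but never underestimate.
            KeyValuePair<TKey, long> min = _counts.MinBy(kv => kv.Value);
            _counts.Remove(min.Key);
            _counts[key] = min.Value + 1;
        }
    }

    // Return the k keys with the largest approximate counts.
    public IEnumerable<KeyValuePair<TKey, long>> Top(int k) =>
        _counts.OrderByDescending(kv => kv.Value).Take(k);
}
```

In this setting the key could be a pair of grain identifiers representing a communication edge, for example a (string From, string To) tuple, with Add called once per sampled message.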

If the top-K communication edges are all internal (e.g., because the algorithm has already optimized partitioning somewhat), silos won't find many good transfer candidates. We need to track internal edges to work out which grains should/shouldn't be transferred (cost vs benefit). To address this, we introduced a bloom filter to track grains where the cost of movement is greater than the benefit, removing them from the top-K data structure. From our experiments, this works very well with even a 10x smaller K. This performance improvement will come with a reduced ability to handle dynamic graphs, so in the future we may need to implement a decay strategy to address this as the bloom filter becomes saturated. To improve lookup performance, @ledjon-behluli implemented a blocked bloom filter[^3], which is used inste...

v8.2.0-preview1

22 May 23:40
71ea69e
Pre-release

What's Changed

Full Changelog: v8.1.0...v8.2.0-preview1

v3.7.2

10 May 18:45
b24e446

What's Changed

  • [3.x] Fix directory/cache validation for defunct silos by @ReubenBond in #8498
  • [3.x] Fix potential grain timer deadlock during disposal by @ReubenBond in #8949
  • [3.x] Ensure reminder service is initialized before access by @ReubenBond in #8983

Full Changelog: v3.7.1...v3.7.2

v8.1.0

17 Apr 16:29
deda4ba

New features

Integration with Aspire

This release includes initial integration with .NET Aspire, allowing you to configure an Orleans cluster in your Aspire app host, specifying the resources the cluster uses. For example, you can specify that an Azure Table will be used for cluster membership, an Azure Redis resource will be used for the grain directory, and an Azure Blob Storage resource will be used to store grain state. The integration currently supports Redis and Azure Table & Blob storage resources. Support for other resources will be added later.

In the app host project, an Orleans cluster can be declared using the AddOrleans method, and then configured with clustering, grain storage, grain directory, and other providers using methods on the returned builder:

var storage = builder.AddAzureStorage("storage");
var clusteringTable = storage.AddTables("clustering");
var defaultStorage = storage.AddBlobs("grainstate");
var cartStorage = builder.AddRedis("redis-cart");

var orleans = builder.AddOrleans("my-app")
                     .WithClustering(clusteringTable)
                     .WithGrainStorage("Default", defaultStorage)
                     .WithGrainStorage("cart", cartStorage);

// Add a server project (also called "silo")
builder.AddProject<Projects.OrleansServer>("silo")
       .WithReference(orleans);

// Add a project with a reference to the Orleans client
builder.AddProject<Projects.FrontEnd>("frontend")
       .WithReference(orleans);

In the client and server projects, add Orleans to the host builder as usual.

// For an Orleans server:
builder.UseOrleans();

// Or, for an Orleans client:
builder.UseOrleansClient();

Orleans will read configuration created by your Aspire app host project and configure the providers specified therein. To allow Orleans to access the configured resources, add them as keyed services using the corresponding Aspire component:

builder.AddKeyedAzureTableService("clustering");
builder.AddKeyedAzureBlobService("grainstate");
builder.AddKeyedRedis("redis-cart");

Resource-optimized placement

Resource-optimized placement, enabled via the [ResourceOptimizedPlacement] attribute on a grain class, balances grains across hosts based on available memory and CPU usage. For more details, see the PR: #8815.

What's Changed

Since 8.1.0-preview3

Additional changes since 8.0.0

Full Changelog: v8.0.0...v8.1.0