Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BulkInsertAsync<T> Iterating IEnumerable<T> source multiple times. #1694

Open
dkal-afk opened this issue Mar 2, 2025 · 0 comments
Open

BulkInsertAsync<T> Iterating IEnumerable<T> source multiple times. #1694

dkal-afk opened this issue Mar 2, 2025 · 0 comments

Comments

@dkal-afk
Copy link

dkal-afk commented Mar 2, 2025

Hi Boris (and Team)

Loving this library!

However, ran into this today.

Issue: public static Task BulkInsertAsync<T>(this DbContext context, IEnumerable<T> entities, BulkConfig? bulkConfig = null, Action<decimal>? progress = null, Type? type = null, CancellationToken cancellationToken = default) where T : class Iterates over IEnumerable<T> entities multiple times

Expected: Iterates over IEnumerable<T> only once.

Actual: Appears to Iterate over the IEnumerable multiple times (number of iterations grows with the number of items provided by the IEnumerable<T> (see test output for iteration and item creation counts)

Effect: Lazily Loaded IEnumerable<T> sources (i.e. code using yield return) has unanticipated effects:

  • The source is read multiple times, resulting in more objects than necessary being instantiated.
  • If used in conjunction with Guid.NewGuid() multiple copies of the same data are instantiated with different keys. this can result in subsequent insertions raising Referential Integrity Errors. (not depicted in test)

Remarks:

  • Looks like this was introduced in commit: 27407aae3e24ee337fc916bd1d653045c54cb066
    • Signature was changed from IList<T> to IEnumerable<T> but IEnumerable<T> was treated as an IList<T> of a known length rather than an IEnumerable<T> source of unknown length.
  • Code path at multiple points:
    • Attempts to get the length of the IEnumerable<T> (.Count())
    • Checks to see if there are any records (.Any())
    • Assigned the entirety of the IEnumerable<T> to memory (.ToList())

Notes:

  • Possible relation to 1562

Work Around:

  • Read everything to memory and pass a List<T> note: Memory Intensive with large data sets.

Test Outputs:

  • 1 Item
    • Iterations: 8
    • Items Generated: 8
  • 10 Items
    • Iterations: 17
    • Items Generated: 144
  • 100 Items
    • Iterations: 107
    • Items Generated: 10,404
  • 1000 Items
    • Iterations: 1,007
    • Items Generated: 1,004,004
  • 10000 Items
    • Iterations: 10,007
    • Items Generated: 100,040,004

Test Reproduction (Should drop into the EfCore.BulkExtensions.Tests.csproj)

public class EnumerationsTests(ITestOutputHelper helper)
{
    private delegate T Factory<out T>(int i);

    [Theory(DisplayName = "Enumerates IEnumerable<T> only once.")]
    [MemberData(nameof(ForEnumeratesOnlyOnce))]
    public async Task EnumeratesOnlyOnce(SqlType type, int itemsCount)
    {
        var enumerationHits = 0;
        var itemHits = 0;

        // Simulate Raw data being reconstituted
        Item Factory(int i)
        {
            itemHits += 1;
            return new Item(i, $"Item {i:D5}", "Description", 1, null, null, new List<ItemHistory>(0));
        }
        
        // Simulate reading records from an unknown source and reconstituting them to objects (Factory().
        IEnumerable<Item> Generator(int total, Factory<Item> factory)
        {
            enumerationHits += 1;

            for (var i = 0; i < total; i++)
            {
                yield return factory(i);
            }
        }



        await using var context = new TestContext(type);

        await context.BulkInsertAsync(Generator(itemsCount,  Factory));

        helper.WriteLine("IEnumerable was enumerated {0} times", enumerationHits);
        helper.WriteLine("Generated {0} items", itemHits);
        Assert.Equal(1, enumerationHits);
        Assert.Equal(itemsCount, itemHits);
    }

    public static IEnumerable<object[]> ForEnumeratesOnlyOnce()
    {
        SqlType[] types = [SqlType.Sqlite];
        int[] itemCounts = [1, 10, 100, 1000, 10000];

        foreach (var sqlType in types)
        {
            foreach (var itemCount in itemCounts)
            {
                yield return [sqlType, itemCount];
            }
        }
    }
}

Suggestions for fix:
Note: Apologies, I haven't got the capacity to provide a PR with fixes at the moment.

  • Ensure the IEnumerable<T> is iterated once.
  • Refactor any code that pre-emptively checks for the length of the IEnumerable<T> to count during execution.
  • Don't check for .Any() against the IEnumerable<T> instead optimistically execute the insertion, finishing if there are no Ts
  • If you can, avoid calling .ToList() on the IEnumerable<T>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant