parallelism causes problems with very large datasets #2

Open
scrollsaw opened this issue Jun 6, 2019 · 1 comment

@scrollsaw

When using this on large data sets, I've noticed that once the input gets very large (> 1.5 million rows in my case), SQL Server builds a parallel plan for the query. The cluster connections are then not calculated correctly: you only get a fraction of the clusters back. My guess is that, when running in parallel, each chunk of the query doesn't know about the others. You can test for this by running the query against larger and larger data sets until SQL Server produces a parallel plan.

A solution is to just add OPTION (MAXDOP 1) to the query like so:

select dbo.TCC(id1, id2) from dbo.TestData OPTION (MAXDOP 1)

This forces a serial plan, and the clusters are then returned correctly.
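
To illustrate the guess above (that each parallel chunk doesn't know about the others), here is a minimal sketch in plain Python. It is not the actual TCC implementation; the `clusters` helper and the sample pairs are hypothetical and only mimic what happens when two workers each see part of the rows and their results are never combined.

```python
def clusters(pairs):
    """Group ids into connected components with a simple union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # link the two components

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical data: one chain 1-2-3-4 plus a separate pair 10-11.
pairs = [(1, 2), (3, 4), (2, 3), (10, 11)]

serial = clusters(pairs)        # a single worker sees every row
chunk_a = clusters(pairs[:2])   # worker A only sees (1, 2) and (3, 4)
chunk_b = clusters(pairs[2:])   # worker B only sees (2, 3) and (10, 11)

print(serial)   # [[1, 2, 3, 4], [10, 11]]
print(chunk_a)  # [[1, 2], [3, 4]]
print(chunk_b)  # [[2, 3], [10, 11]]
```

Neither chunk on its own contains the full cluster 1-2-3-4, so if the partial results are not combined correctly, the output is fragmented or incomplete. That would be consistent with the symptom described here, although it does not prove this is the cause.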

@yorek
Owner

yorek commented Jun 6, 2019

Thanks a lot for reporting this. Parallelism will use the Merge method to merge two different partial results into one. I'll try to run some tests as soon as possible to figure out what's not working. In the meantime, thanks for the workaround!
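
As a rough sketch of what a merge step has to guarantee, the Python below combines two partial results and fuses any clusters that share an id. It assumes each partial result is a list of clusters of ids, which is an assumption for illustration only; the real internal state of the aggregate may differ, and `merge_partials` is a hypothetical name, not a function from this project.

```python
def merge_partials(left, right):
    """Combine two lists of partial clusters, fusing any that share an id."""
    merged = []  # invariant: the sets in this list are pairwise disjoint
    for group in list(left) + list(right):
        group = set(group)
        untouched = []
        for existing in merged:
            if existing & group:
                group |= existing  # absorb every overlapping cluster
            else:
                untouched.append(existing)
        merged = untouched + [group]
    return sorted(sorted(g) for g in merged)


# Hypothetical partial results from two parallel workers.
left = [[1, 2], [3, 4]]
right = [[2, 3], [10, 11]]

combined = merge_partials(left, right)
print(combined)  # [[1, 2, 3, 4], [10, 11]]
```

The point of the sketch is that merging cannot simply concatenate the two lists: the pair 2-3 from one worker bridges two clusters from the other, so clusters have to be re-joined whenever they overlap.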
