Skip to content

makeclusters() in DataClusters class undefined #64

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
stevenhua0320 opened this issue Aug 13, 2024 · 5 comments
Open

makeclusters() in DataClusters class undefined #64

stevenhua0320 opened this issue Aug 13, 2024 · 5 comments

Comments

@stevenhua0320
Copy link
Contributor

No description provided.

@sbillinge
Copy link
Contributor

Please can you add more information on this issue so another person could read it and know what to do? Thx

@stevenhua0320
Copy link
Contributor Author

This function is a empty function and needs refactor. It needs to based on DataClusters object's data points and resolution to make clusters.

@Ainamacar
Copy link
Contributor

This function only looks empty, it is actually foundational to the existing implementation. I'll summarize the current behavior and how it could be changed, if you wish to do so.

DataClusters is an iterable because it defines the __iter__ method. Python defines that the next() method returns the next member of an iterable. So when makeclusters() is run it is actually calling next() many times, and that is where the bulk of the work is done. That is why next() is never called explicitly -- it is done implicitly in makeclusters(). So the iteration is not over the clusters that have been found, but the process of clustering. I comment in the source that "This behavior could cause confusion and should perhaps be altered." Maybe that time is now?

If you wanted to change this behavior to make the clustering process more explicit (and thus easier to read/maintain), here is a possible approach.

  • Remove __iter__
  • Rename next() to cluster_next_point() or similar
  • Rewrite makeclusters() into a loop that updates current_idx and calls cluster_next_point()
  • Revise cluster_next_point() so it no longer updates current_idx and (probably) no longer returns self.
  • Write some good tests on non-trivial data to make sure nothing has broken.

@sbillinge
Copy link
Contributor

Thanks for the feeback @Ainamacar .

@Ainamacar
Copy link
Contributor

Ainamacar commented Aug 16, 2024

Edit: I moved a lengthy comment to PR #73, where it was more appropriate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants