Skip to content

[FIX] Statistics.util.stats: Fix negative #nans for sparse#1659

Merged
lanzagar merged 1 commit into
biolab:masterfrom
nikicc:sparse-fix-nans-counting
Oct 17, 2016
Merged

[FIX] Statistics.util.stats: Fix negative #nans for sparse#1659
lanzagar merged 1 commit into
biolab:masterfrom
nikicc:sparse-fix-nans-counting

Conversation

@nikicc
Copy link
Copy Markdown
Contributor

@nikicc nikicc commented Oct 13, 2016

Computing the number of nans for sparse matrices was broken and sometimes returned negative numbers.

Variable non_zero contains the number of defined values in each column. So to calculate the number of undefined (nans) for each column we have to substract non_zero from the number of values in the column i.e. X.shape[0] and not X.shape[1]. The bug was noticed when the number of defined values in some column was larger than the number of features and consequenlty the number of nans was negative. This caused density in OWTable to exceeded 100%.

screen shot 2016-10-13 at 11 55 57

Computing the number of nans for sparse matrices was broken and sometimes returned negative numbers.

Variable `non_zero` contains the number of defined values in each column. So to calculate the number of undefined (nans) for each column we have to substract `non_zero` from the number of values in the column i.e. `X.shape[0]` and not `X.shape[1]`. The bug was noticed when the number of defined values in some column was larger than the number of features and consequenlty the number of nans was negative. This caused density in OWTable to exceeded 100%.
@nikicc nikicc changed the title Statistics.util.stats: Fix negative #nans in sparse [FIX] Statistics.util.stats: Fix negative #nans for sparse Oct 13, 2016
@codecov-io
Copy link
Copy Markdown

codecov-io commented Oct 13, 2016

Current coverage is 88.73% (diff: 100%)

Merging #1659 into master will not change coverage

@@             master      #1659   diff @@
==========================================
  Files            78         78          
  Lines          8150       8150          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits           7232       7232          
  Misses          918        918          
  Partials          0          0          

Sunburst

Powered by Codecov. Last update 282b1b0...7b35830

Copy link
Copy Markdown
Contributor

@lanzagar lanzagar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the wrong computation was unnoticed, that probably means there is no test for nan counting. Could you add a simple one?

@nikicc
Copy link
Copy Markdown
Contributor Author

nikicc commented Oct 17, 2016

@lanzagar IMHO we already have a test for this — the one that I corrected in this PR — just that it was incorrect.

For sparse data "nans" are actually values not present in sparse matrix and the test_stats_sparse already contain those since csr_matrix does not store zeros.

@lanzagar
Copy link
Copy Markdown
Contributor

You are right, the (not very clear) test that was already there (but broken) does test this too. I guess it is enough to cover the basics now that it is fixed.

@lanzagar lanzagar merged commit 20e7e66 into biolab:master Oct 17, 2016
@nikicc nikicc deleted the sparse-fix-nans-counting branch October 17, 2016 14:28
@astaric astaric modified the milestone: 3.3.9 Nov 28, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants