[WIP][ENH] Scatterplot, HeatMap, TreeGraph, ConfusionMatrix and Unsupervised widgets: Output Flagged Data by VesnaT · Pull Request #1655 · biolab/orange3

VesnaT · 2016-10-12T10:40:02Z

No description provided.

janezd · 2016-10-12T10:47:02Z

I haven't yet checked the content of the PR (obviously), but before we merge it we should perhaps talk about the signal name again. I don't think that the verb "to flag" appears in the dictionary with that meaning. Even the use of word "flag" for some kind of marker is IMHO limited to computer science, so the name "flagged data" wouldn't mean anything to an outsider.

What is wrong with "marked"?

Vesna, sorry if this will require some (hopefully trivial) refactoring.

codecov-io · 2016-10-12T10:47:17Z

Current coverage is 89.38% (diff: 100%)

Merging #1655 into master will increase coverage by <.01%

@@             master      #1655   diff @@
==========================================
  Files            79         79          
  Lines          8589       8593     +4   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           7677       7681     +4   
  Misses          912        912          
  Partials          0          0

Powered by Codecov. Last update 4d1ea03...5610750

VesnaT · 2016-10-12T10:55:55Z

No worries. I've intentionally made this PR 'small' (only four widgets are modified).

janezd · 2016-10-14T14:35:24Z

+    if name not in names:
+        return name
+    counts = [int(re.match(r"(" + name + " )(\d{1,}$)", n).group(2))
+              for n in names if re.match(r"(" + name + " )(\d{1,}$)", n)] + [1]


Something like this:

counts = max((int(mo.group(2)) for mo in (re.match(r"(" + name + " )(\d{1,}$)", n) for n in names) if mo), default=1)

Matter of taste: I'd write r"({} )(\d{{1,}}$)".format(name)
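For reference, a self-contained sketch of the regex-based naming helper discussed above. The name get_next_name is hypothetical; it assumes names are numbered as "<name> <n>" with the unnumbered name counting as 1, as in the snippet under review. It also adds re.escape, so names containing regex metacharacters (e.g. "a+b") are matched literally.

```python
import re


def get_next_name(names, name):
    """Return `name` if unused, else `name` followed by the next free number.

    Hypothetical helper mirroring the regex approach from the review,
    not necessarily the code that was merged.
    """
    if name not in names:
        return name
    # re.escape makes metacharacters in `name` match literally
    pattern = re.compile(r"({} )(\d+)$".format(re.escape(name)))
    matches = (pattern.match(n) for n in names)
    # The unnumbered name counts as 1, so the first duplicate becomes "name 2"
    highest = max((int(mo.group(2)) for mo in matches if mo), default=1)
    return "{} {}".format(name, highest + 1)
```

Using default=1 in max() plays the role of the + [1] appended to the list in the original.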

janezd · 2016-10-14T14:54:34Z

        if subsets:
            return self.instances[np.unique(np.hstack(subsets))]

+    def get_indices(self, nodes):


get_instances could call this function. Don't forget to handle the case when get_indices returns None.
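A minimal sketch of get_instances delegating to get_indices, with the None case handled. TreeSelection and subsets_per_node are hypothetical stand-ins for the tree widget's model; only the np.unique(np.hstack(...)) step is taken from the diff.

```python
import numpy as np


class TreeSelection:
    """Hypothetical stand-in: `subsets_per_node` maps a node to the
    indices of the instances that reach it."""

    def __init__(self, instances, subsets_per_node):
        self.instances = instances
        self.subsets_per_node = subsets_per_node

    def get_indices(self, nodes):
        subsets = [self.subsets_per_node[node] for node in nodes]
        if subsets:
            return np.unique(np.hstack(subsets))
        return None  # no nodes selected

    def get_instances(self, nodes):
        indices = self.get_indices(nodes)
        # get_indices returns None when nothing is selected
        if indices is None:
            return None
        return self.instances[indices]
```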

janezd · 2016-10-14T16:41:49Z

+        self.assertEqual(len(flagged), len(self.zoo))
+        self.assertEqual(0, np.sum([i[FLAGGED_FEATURE_NAME] for i in flagged]))
+
+    def test_cascade_flagged_tables(self):


I could be too smart by half and replace your function that uses regular expressions with one that uses the first unoccupied name. This test wouldn't fail because there are no "holes". Can you add some code to this test to remove the second meta and then "flag" the table again, so my smart idea would fail?
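To illustrate why a test with a "hole" matters, here is a sketch of the two strategies (both function names are hypothetical). After the second meta is removed, the first-unoccupied strategy reuses the freed number while numbering past the highest suffix does not, so an assertion on the expected name tells them apart.

```python
def first_unoccupied(names, name):
    """The 'too smart by half' variant: reuse the lowest free number."""
    if name not in names:
        return name
    i = 2
    while "{} {}".format(name, i) in names:
        i += 1
    return "{} {}".format(name, i)


def after_highest(names, name):
    """Number past the highest existing numeric suffix."""
    if name not in names:
        return name
    prefix = name + " "
    numbers = [int(n[len(prefix):]) for n in names
               if n.startswith(prefix) and n[len(prefix):].isdecimal()]
    return "{} {}".format(name, max(numbers, default=1) + 1)


# Metas after flagging three times and then removing the second meta
names = ["Selected", "Selected 3"]
print(first_unoccupied(names, "Selected"))  # Selected 2 (reuses the hole)
print(after_highest(names, "Selected"))     # Selected 4
```

Without a hole (e.g. ["Selected", "Selected 2"]) both return "Selected 3", which is why the current test cannot distinguish them.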

janezd · 2016-10-14T16:56:16Z

+import numpy as np
+from Orange.data import Table, Domain, DiscreteVariable
+
+FLAGGED_SIGNAL_NAME = "Flagged Data"


Thinking about it again, I started liking the constant because it will indeed make us stick to the same name in all widgets.

The name is not perfect, though. Not only will we change "flagged" to something else, but it also suggests that it is a name of a flagged signal. ANNOTATED_DATA_SIGNAL_NAME = "Data" is better (than ANNOTATED_SIGNAL_NAME) but awfully long. Think about it...

If nothing else, the module belongs within Orange.widgets because it includes a name of the signal, hence it is related to widgets. :)

janezd · 2016-10-14T17:21:08Z

+        self.assertEqual(0, np.sum([i[FLAGGED_FEATURE_NAME] for i in flagged]))
+
+        # select data points
+        points = random.sample(range(0, len(self.iris)), 20)


I prefer deterministic tests. Randomly choose some indices instead of choosing some indices at random. (https://xkcd.com/221/).
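A sketch of deterministic index selection for the test, assuming the iris table has 150 rows. Either a fixed, hand-picked list or a generator with a fixed seed keeps the test reproducible; the seed value 42 is arbitrary.

```python
import numpy as np

N_INSTANCES = 150  # assumed size of the iris data set

# Fixed, hand-picked selection: every 7th row, 20 points in total
points = list(range(0, N_INSTANCES, 7))[:20]

# Alternative: arbitrary-looking but reproducible, thanks to the fixed seed
rng = np.random.RandomState(42)
seeded_points = sorted(rng.choice(N_INSTANCES, 20, replace=False))
```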

janezd · 2016-10-14T17:23:43Z

+
+        # check selected data output
+        selected = self.get_output("Data")
+        self.assertEqual(len(selected), len(points))


What about testing that the correct instances were chosen? There may be a better way to do it, but somewhere I used something like np.testing.assert_almost_equal(selected.X, self.iris.X[points]).
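A minimal sketch of the suggested check, with plain NumPy arrays standing in for self.iris.X and selected.X. It assumes the widget outputs selected rows in their original order; indices picked with random.sample come in arbitrary order, so the expected rows are taken with sorted indices.

```python
import numpy as np

# Stand-ins: X plays the role of self.iris.X, selected_X of selected.X
X = np.arange(30, dtype=float).reshape(10, 3)
points = [7, 1, 4]  # indices in the order they were picked

# Assumes the output keeps the original row order, hence sorted()
expected = X[sorted(points)]
selected_X = X[[1, 4, 7]]  # what a correct widget output would contain
np.testing.assert_almost_equal(selected_X, expected)

# A wrong selection is caught: the assertion raises instead of passing
try:
    np.testing.assert_almost_equal(X[[1, 4, 8]], expected)
    wrong_rows_detected = False
except AssertionError:
    wrong_rows_detected = True
```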

janezd · 2016-10-14T17:25:56Z

+        self.assertEqual(0, np.sum([i[FLAGGED_FEATURE_NAME] for i in flagged]))
+
+        # select data points
+        points = random.sample(range(0, len(self.iris)), 20)


Same as in DistanceMap.

janezd · 2016-10-14T17:43:56Z

GitHub is showing some of my comments as outdated, although you haven't made any further commits. Please check those, too.

Apart from these minor suggestions, I like the PR, in particular the way you factored out parts of the tests. It would be even better if you could simulate, say, a selection action, but I know this is probably too hard.

Please tell me/us when you make the changes, so we don't wait too long with merging, since rebasing dozens of widgets is not that much fun.

VesnaT · 2016-10-17T07:42:57Z

The comments are outdated because the code was moved to a Mixin in one of the following commits (441b15a), since it was very similar for all widgets. I could have rebased, but wanted to keep the code in case someone didn't like the Mixin idea.
I thought I did simulate the selection...

Since there are only clusters in Selected Data, 'Other' should be removed from its domain. The value is still present in Flagged Data domain.
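A sketch of dropping the 'Other' value, working on plain value names (without_other_value is a hypothetical helper, not the function from the commit). Discrete values are stored as indices into the list of values, so removing an entry keeps existing codes valid only if 'Other' is the last value; the sketch refuses other cases rather than silently shifting codes.

```python
def without_other_value(values, other="Other"):
    """Return the cluster value names without the catch-all `other` value.

    Hypothetical helper: it only accepts `other` in the last position,
    so the codes of the remaining values stay unchanged.
    """
    if other in values and values[-1] != other:
        raise ValueError("'{}' must be the last value".format(other))
    return [value for value in values if value != other]
```

For example, without_other_value(["C1", "C2", "Other"]) returns ["C1", "C2"], while the annotated output would keep the original variable with all three values.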

VesnaT · 2016-10-18T09:31:49Z

Done.

janezd · 2016-10-18T18:20:36Z

+    domain = Domain(data.domain.attributes, data.domain.class_vars, metas)
+    annotated = np.zeros((len(data), 1))
+    if selected_indices is not None:
+        annotated[selected_indices] = 1


Should this be 0 or 1? If nothing is selected, all instances have to have Selected=No, no?

I'm stupid. Please ignore.
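A NumPy-only sketch of how the annotation column in the snippet above behaves (annotation_column is a hypothetical name; the real create_annotated_table also builds the domain). Because the column starts as zeros, both None and an empty list leave every instance unselected.

```python
import numpy as np


def annotation_column(n_rows, selected_indices):
    """0/1 column: 1 for selected rows, 0 otherwise (hypothetical helper)."""
    annotated = np.zeros((n_rows, 1))
    if selected_indices is not None:
        # An empty list is a valid (empty) index, so nothing is set
        annotated[selected_indices] = 1
    return annotated
```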

janezd · 2016-10-18T18:33:08Z

+        selected = [i for i, t in enumerate(zip(
+            self.widget.results.actual, self.widget.results.predicted[0]))
+                    if t in indices]
+        self.selected_indices = self.widget.results.row_indices[selected]


Would it be nicer if _select_data returned selected_indices instead of (ab?)using the instance's attributes for semi-global data storage?
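A sketch of the helper returning the indices instead of storing them on self. The parameters are hypothetical stand-ins for self.widget.results.actual, results.predicted[0] and results.row_indices; cells holds the selected (actual, predicted) pairs.

```python
import numpy as np


def select_data(actual, predicted, row_indices, cells):
    """Return row indices of instances falling into the chosen cells.

    Returning the result keeps the test helper free of hidden state.
    """
    selected = [i for i, pair in enumerate(zip(actual, predicted))
                if pair in cells]
    return row_indices[selected]
```

In the test, the call site would then read indices = self._select_data(...) rather than relying on self.selected_indices.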

janezd · 2016-10-18T18:43:25Z

+        self.same_input_output_domain = True
+        self.selected_indices = []
+
+    def test_outputs(self):


I like the way you factor out the tests.

janezd · 2016-10-18T18:46:58Z

+    def _compare_selected_annotated_domains(self, selected, annotated):
+        selected_vars = selected.domain.variables + selected.domain.metas
+        annotated_vars = annotated.domain.variables + annotated.domain.metas
+        self.assertTrue(all((var in annotated_vars for var in selected_vars)))


This tests whether selected.domain.variables + selected.domain.metas are a subset (<=) of annotated.domain.variables + annotated.domain.metas. Doing it explicitly, using sets, would be more obvious and easier to read, I guess.
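A sketch of the same check written with sets, asserting that every selected variable also appears in the annotated domain. Strings stand in for Orange variables and it assumes variables are hashable; with real tables the arguments would be selected.domain.variables + selected.domain.metas and the annotated counterpart.

```python
import unittest


class CompareDomainsTest(unittest.TestCase):
    def _compare_selected_annotated_domains(self, selected_vars, annotated_vars):
        # Subset check: every variable of the selected output must also be
        # present in the annotated output (assumes hashable variables)
        self.assertLessEqual(set(selected_vars), set(annotated_vars))

    def test_selected_is_subset_of_annotated(self):
        selected = ("sepal length", "iris", "Cluster")
        annotated = ("sepal length", "iris", "Cluster", "Selected")
        self._compare_selected_annotated_domains(selected, annotated)
```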

janezd · 2016-10-18T19:00:07Z

        if not selected_indices:
            self.send("Selected Data", None)
-            self.send("Other Data", None)
+            annotated_data = create_annotated_table(items, selected_indices) \


Not really an issue, but since you're going to make another commit anyway, can you replace selected_indices with [], so that it's obvious this call will never select any instances.

janezd · 2016-10-18T19:03:37Z

-                unselected_data = data[~mask]
+                if self.append_clusters:
+                    def remove_other_value(vars_):
+                        vars_ = [var for var in vars_]


Why not copy?
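On copying: list(vars_) does the same as the comprehension. A sketch with strings standing in for variables; if vars_ may be a tuple, .copy() is not available, so list() is the form that works for both tuples and lists.

```python
vars_ = ("Cluster", "Selected")  # stand-in for a tuple of variables

copied = list(vars_)  # equivalent to [var for var in vars_]
copied.remove("Selected")  # modifies the copy, not the original tuple

# For a list, .copy() and slicing give the same shallow copy
values = ["C1", "C2", "Other"]
assert values.copy() == values[:] == list(values)
```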

janezd

I went through all changes and widgets. Since none of my suggestions are substantial, I'm approving the request, but you can still follow them if you decide so.

VesnaT · 2016-10-19T08:55:28Z

I will address the suggestions in the next PR.

VesnaT changed the title ~~[ENH] Scatterplot and Unsupervised widgets: Output Flagged Data~~ [WIP][ENH] Scatterplot and Unsupervised widgets: Output Flagged Data Oct 12, 2016

VesnaT force-pushed the flagged_data branch from a05f360 to 924dde3 on October 13, 2016 08:42

VesnaT changed the title ~~[WIP][ENH] Scatterplot and Unsupervised widgets: Output Flagged Data~~ [ENH] Scatterplot and Unsupervised widgets: Output Flagged Data Oct 13, 2016

VesnaT force-pushed the flagged_data branch 4 times, most recently from cd69ccc to cafa07c on October 14, 2016 12:38

VesnaT changed the title ~~[ENH] Scatterplot and Unsupervised widgets: Output Flagged Data~~ [ENH] Scatterplot, TreeGraph, ConfusionMatrix and Unsupervised widgets: Output Flagged Data Oct 14, 2016

VesnaT force-pushed the flagged_data branch from cafa07c to 7dc0295 on October 14, 2016 13:10

VesnaT changed the title ~~[ENH] Scatterplot, TreeGraph, ConfusionMatrix and Unsupervised widgets: Output Flagged Data~~ [ENH] Scatterplot, ConfusionMatrix and Unsupervised widgets: Output Flagged Data Oct 14, 2016

VesnaT changed the title ~~[ENH] Scatterplot, ConfusionMatrix and Unsupervised widgets: Output Flagged Data~~ [ENH] Scatterplot, TreeGraph, ConfusionMatrix and Unsupervised widgets: Output Flagged Data Oct 14, 2016

VesnaT force-pushed the flagged_data branch from ceb2a5e to b232484 on October 14, 2016 14:21

VesnaT changed the title ~~[ENH] Scatterplot, TreeGraph, ConfusionMatrix and Unsupervised widgets: Output Flagged Data~~ [ENH] Scatterplot, HeatMap, TreeGraph, ConfusionMatrix and Unsupervised widgets: Output Flagged Data Oct 14, 2016

janezd reviewed Oct 14, 2016

VesnaT changed the title ~~[ENH] Scatterplot, HeatMap, TreeGraph, ConfusionMatrix and Unsupervised widgets: Output Flagged Data~~ [WIP][ENH] Scatterplot, HeatMap, TreeGraph, ConfusionMatrix and Unsupervised widgets: Output Flagged Data Oct 17, 2016

VesnaT force-pushed the flagged_data branch from b232484 to 8246eb9 on October 17, 2016 15:20

VesnaT added 3 commits October 18, 2016 09:53

misc: Add a module for 'Flagged Data' creation

6e29391

OWScatterPlot: Output Flagged Data

eafb96a

OWScatterPlot: Remove Other Data output

65b7525

VesnaT added 3 commits October 18, 2016 09:53

OWScatterPlot: Refactor send_data()

ac9c422

OWHierarchicalClustering: Output Flagged Data

f62e4e9

OWHierarchicalClustering: Remove Other Data output

dd266ba

VesnaT force-pushed the flagged_data branch from 8246eb9 to 24405e0 on October 18, 2016 08:00

VesnaT added 5 commits October 18, 2016 10:17

OWHierarchicalClustering: Set Outputs to None when data is removed

762d1d4

OWHierarchicalClustering: Remove 'Other' value from Cluster variable

b2553ab

Since there are only clusters in Selected Data, 'Other' should be removed from its domain. The value is still present in Flagged Data domain.

OWDistanceMap: Output Flagged Data

8e422a0

OWDistanceMap: Rename Data to Selected Data

793e81f

OWMDS: Output Flagged Data instead of Data

83b7bfd

VesnaT force-pushed the flagged_data branch from 24405e0 to 050aa33 on October 18, 2016 08:17

VesnaT added 4 commits October 18, 2016 11:19

Unittests: Refactoring

69ceb53

OWConfusionMatrix: Output Flagged Data

deb8585

OWTreeGraph: Output Flagged Data

4435821

OWHeatMap: Output Flagged Data

5610750

VesnaT force-pushed the flagged_data branch from 050aa33 to 5610750 on October 18, 2016 09:19

janezd reviewed Oct 18, 2016

janezd approved these changes Oct 18, 2016

janezd mentioned this pull request Oct 18, 2016

[ENH] Canvas: Always show the link dialog if the user holds Shift #1673

Merged

astaric merged commit caa0ff2 into biolab:master Oct 19, 2016

nikicc mentioned this pull request Nov 14, 2016

OWConfusionMatrix: Fix predicitons order #1751

Closed

astaric modified the milestone: 3.3.9 Nov 28, 2016

4 participants
