Solve universe target issues #371

antoniocarlon · 2017-11-09T15:49:02Z

Restored universe target
Restored universe target for median and average aggregated measurements
Added a check to avoid having median and average aggregated measurements without universe target
Fixed other minor issues

Fixes #326

Note for the Acceptance: BLS (US), Australia and Canada (NHS and census) had some median and average aggregated measurements without universe target

javitonino

LGTM.

Questions:

What do you mean by "Note for the Acceptance: BLS (US) and Canada (NHS and census) had some median and average aggregated measurements without universe target"? I guess that "had" means that it's fixed in this PR?
I don't think we should deploy a dump with universes until we have the extensions changes ready.

javitonino · 2017-11-10T11:22:51Z

tasks/util.py

+                                 "WHERE t.tablename = '{table}' "
+                                 "AND c.aggregate IN ('average', 'median') "
+                                 "AND (reltype IS NULL OR reltype <> 'universe')".format(
+                                     table=self.output()._tablename)).fetchall()


I think this query also returns an error if an average/median column has a denominator. Does this case make sense?

I think it does (dividing an average of a class by the average of the total population, e.g. average income of engineers / average income).

Currently there isn't any case like this in the ETL, but it makes sense: a median/average with both a universe and a denominator shouldn't trigger the check.
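
The distinction discussed above can be sketched in plain Python. This is an illustration only, not the query from tasks/util.py: the `columns_missing_universe` helper and the sample column ids are hypothetical. Filtering row by row on `reltype <> 'universe'` would flag a column as soon as it has a denominator row, whereas grouping the targets per column and asking whether any of them is a universe does not.

```python
from collections import defaultdict


def columns_missing_universe(rows):
    """Return ids of columns whose targets include no 'universe' reltype.

    `rows` is an iterable of (column_id, reltype) pairs, one per
    column-to-target relation; reltype is None when a column has no target.
    """
    reltypes_by_column = defaultdict(set)
    for column_id, reltype in rows:
        reltypes_by_column[column_id].add(reltype)
    return sorted(
        column_id
        for column_id, reltypes in reltypes_by_column.items()
        if 'universe' not in reltypes
    )


# Hypothetical sample data: all three columns are medians or averages.
rows = [
    ('median_income', 'universe'),
    ('median_income', 'denominator'),  # both targets: must not be flagged
    ('average_rent', 'denominator'),   # denominator only: flagged
    ('median_age', None),              # no targets at all: flagged
]

missing = columns_missing_universe(rows)
```

With this sample data, only the columns that lack a universe target end up in `missing`; the column that has both a universe and a denominator passes.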

javitonino · 2017-11-10T11:24:45Z

tasks/util.py

@@ -1665,6 +1669,23 @@ def check_null_columns(self):
            raise ValueError('The following columns of the table "{table}" contain only NULL values: {columns}'.format(
                table=self.output().table, columns=', '.join([x[0] for x in result])))

+    def check_universe_in_aggregations(self):


Now that I see test changes, why don't we try to add a test for this check?

Added a test that forces the ValueError to be raised when a median or average aggregation has no universe target.
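
A test along those lines might look like the following sketch. It does not reproduce the project's test suite: `check_universe_in_aggregations` here is a simplified, hypothetical stand-in that receives the offending column names directly instead of querying the database, and its error message is invented for illustration.

```python
import unittest


def check_universe_in_aggregations(table, columns_without_universe):
    """Hypothetical stand-in for the real check: raise if any median or
    average column was found without a universe target."""
    if columns_without_universe:
        raise ValueError(
            'The following columns of the table "{table}" are median or '
            'average aggregations without a universe target: '
            '{columns}'.format(table=table,
                               columns=', '.join(columns_without_universe)))


class TestUniverseCheck(unittest.TestCase):
    def test_missing_universe_raises(self):
        with self.assertRaises(ValueError):
            check_universe_in_aggregations('some_table', ['median_income'])

    def test_all_columns_have_universe(self):
        # No offending columns: the check passes silently.
        check_universe_in_aggregations('some_table', [])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestUniverseCheck)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```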

antoniocarlon · 2017-11-14T10:54:51Z

This PR should fix all the median and average aggregated measurements without a universe target. The note for acceptance is meant to indicate what to test.
There's no need for changes in the extension, as I have updated the OBS_Meta generation (see also the comments here).

antoniocarlon · 2017-11-14T11:43:06Z

tasks/carto.py

+        AND NOT EXISTS (
+          SELECT 1 FROM agg_wo_universe u
+          WHERE u.id = numer_c.id
+        )


I have also tested the possible performance loss from adding this check, and it's insignificant.

javitonino · 2017-11-16T09:54:56Z

tasks/carto.py

+        AND NOT EXISTS (
+          SELECT 1 FROM agg_wo_universe u
+          WHERE u.id = numer_c.id
+        )


Nice to see. Not sure if we want to keep this in this query, as it is redundant with the tasks check. My concern is not performance but readability: my head almost exploded reading this. If you prefer to keep it, let's at least add some comments.

javitonino · 2017-11-16T09:59:08Z

tasks/carto.py

@@ -490,6 +500,10 @@ class OBSMeta(Task):
        AND numer_ctag.column_id = numer_c.id
        AND numer_ctag.tag_id = numer_tag.id
        AND numer_c.id = leftjoined_denoms.all_numer_id
+        AND NOT EXISTS (


We may want to add a check to remove the non-denominated versions of the universe numerators:

AND NOT (numer_agg IN ('median', 'average') AND denom_id IS NULL) or something along those lines.

Edit: There are no NULL denom_id entries in the obs_meta table (except if a column has no denominators at all). This idea doesn't work.

javitonino · 2017-11-16T09:59:37Z

tasks/util.py

+                                 "WHERE t.tablename = '{table}' "
+                                 "AND c.aggregate IN ('average', 'median') "
+                                 "GROUP BY 1, 2 "
+                                 "HAVING LOWER(STRING_AGG(COALESCE(cc.reltype,''), ',')) NOT LIKE '%universe%'".format(


You could make this prettier with ARRAY_AGG and ANY
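
One possible shape for that rewrite, as a sketch only: the `HAVING_CLAUSE` text below is an assumption about how the condition could be phrased in PostgreSQL, not the query that was eventually committed. The `sql_eq_any` and `flagged` helpers are hypothetical; they emulate the three-valued result of `= ANY` in Python to show why the `COALESCE` from the original condition is still worth keeping.

```python
# Assumed rewrite of the HAVING condition using ARRAY_AGG and ANY.
HAVING_CLAUSE = (
    "HAVING NOT ('universe' = "
    "ANY(ARRAY_AGG(LOWER(COALESCE(cc.reltype, '')))))"
)


def sql_eq_any(value, array):
    """Emulate PostgreSQL `value = ANY(array)` for a non-NULL value.

    Returns True on a match, None (SQL NULL) when there is no match but the
    array contains a NULL, and False otherwise.
    """
    if any(item == value for item in array if item is not None):
        return True
    if any(item is None for item in array):
        return None
    return False


def flagged(reltypes, coalesce=True):
    """Would HAVING NOT ('universe' = ANY(...)) keep this column's group?"""
    if coalesce:
        reltypes = ['' if r is None else r for r in reltypes]
    lowered = [r if r is None else r.lower() for r in reltypes]
    matched = sql_eq_any('universe', lowered)
    # HAVING keeps a group only when the condition is exactly TRUE;
    # NOT NULL is still NULL, so a NULL result drops the group.
    return matched is False
```

Without the `COALESCE`, a column whose aggregated reltypes include a NULL and no universe would yield NULL rather than FALSE and silently escape the check.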

javitonino

Cool, let's go test it!

javitonino · 2017-11-17T13:08:31Z

Like a charm, merging.

Small issues:

au.data is very slow generating columns due to the yields. A return plus a list comprehension seems to be faster.
I manually reduced the version for some columns in order to reimport, since this was missing some obscure dumps (e.g. ACS quantiles need to be bumped in order to be able to easily run the normal columns).

Also, I ran into the slow meta generation again and fixed it with an ANALYZE. See #311.
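
The yield versus return-plus-list-comprehension remark can be illustrated with a generic sketch. The function names and the column-building logic below are hypothetical and are not taken from au.data; the sketch only shows that the two forms produce the same items, so swapping one for the other is a local change. Whether it is actually faster should be measured on the real task.

```python
def columns_with_yield(names):
    """Generator form: items are produced lazily, one per iteration."""
    for name in names:
        yield (name, 'column_' + name)


def columns_with_return(names):
    """List form: everything is built up front and returned at once."""
    return [(name, 'column_' + name) for name in names]


names = ['a', 'b', 'c']
same_items = list(columns_with_yield(names)) == columns_with_return(names)
```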

Antonio added 9 commits November 8, 2017 11:18

Added universe

93ec6b0

Restoring universes

90e7705

Py3 and minor fixes

9d23c93

Added check for medians/averages without universe target

b1e12b6

Added missing universe targets

ab7c107

Merge branch 'master' into 326_Bring_back_the_universe

c338557

Restored UNIVERSE import

4ee9f75

Added UNIVERSE target to tests

388a9bc

Fixed tests

19645d6

CartoDB deleted a comment from houndci-bot Nov 10, 2017

javitonino reviewed Nov 10, 2017

Antonio added 7 commits November 13, 2017 10:27

Test added

c960854

Fixed check universe query

2fbd770

Merge branch 'master' into 326_Bring_back_the_universe

731fb35

Fixed averages medians without universe target in AU

1d4ea73

Remove unused requirement for AU

0688532

Avoid averages/medians without universe target in OBS_Meta

074dec0

Merge branch 'master' into 326_Bring_back_the_universe

44be029

antoniocarlon commented Nov 14, 2017

javitonino reviewed Nov 16, 2017

Antonio added 3 commits November 17, 2017 09:16

Solving merge conflicts

5ebd9c7

Removed redundant check

f5b8a9e

Added ARRAY_AGG + ANY

d3ba65b

javitonino approved these changes Nov 17, 2017

javitonino merged commit 7ab1c37 into master Nov 17, 2017

javitonino deleted the 326_Bring_back_the_universe branch November 17, 2017 13:08

antoniocarlon mentioned this pull request Nov 28, 2017

Fix universe numerators #389

Merged
