Commit
Add groupings column to leaderboard_cube
When players have never (for example) played in a league match, the cube groupings that include league will have NULLs in the league column, just like the groupings without league. This causes these players' stats to be overcounted when not filtering by league. To prevent this, add a grouping column so we know for sure which rows to include in the results.

In addition to fixing correctness problems, the grouping column also lets us rework our indexes to take advantage of bloom filters. B-tree indices are only efficient when the filtered columns form a leading prefix of the index's column order. Bloom indices have no ordering preference: they filter equally well on any combination of columns, as long as enough columns are being filtered on. The bloom index implementation in postgres cannot filter on NULLs. However, by including the grouping column in the index we can select the correct rows without requiring NULL support.

Following the general outline of [1], the entropy (in bits) of each of the filtering columns is:

    column    entropy
    ========  =======
    league       0.40
    formatid     2.20
    classid      2.69
    mapid        7.07
    grouping     3.56

Setting one bit in the default 80-bit signature conveys log2(80) ≈ 6.322 bits of information, so we can use one bit for each column (slightly shortchanging mapid). This gives a total number of set bits (I) of 5. Using the formula for the signature length (s_r), and assuming 4K pages and a random read cost 4x that of a sequential read, the optimal signature length for a given number of filtered columns (Q) is:

    I  Q    s_r
    =  =  =====
    5  1  809.5
    5  2  169.4
    5  3   75.8
    5  4   46.6
    5  5   33.6

This indicates that the default signature length of 80 supports efficient querying when we filter on at least 3 columns (s_r = 75.8 ≤ 80). Since we always filter on at least one column (grouping), the bloom index will be efficient for queries that filter on 2 or more other columns. This means we need more efficient indices for the single-filter case. Fortunately, b-tree indices are a great fit here.
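The entropy column in the first table presumably follows [1] in using the Shannon entropy H = -Σ p·log2(p) of each column's value distribution. A minimal sketch of that computation, run on hypothetical toy data (the view name entropy_demo is illustrative, and restricting a real measurement to the bloom index's `playerid NOTNULL` rows is an assumption):

```sql
-- Hypothetical toy data standing in for one filter column. To measure a real
-- column, replace the VALUES list with something like
--   SELECT mapid FROM leaderboard_cube WHERE playerid NOTNULL
CREATE TEMP VIEW entropy_demo AS
WITH col(v) AS (
    VALUES (1), (1), (2), (3)
), freq AS (
    -- Probability of each distinct value (a NULL would count as its own value)
    SELECT count(*)::numeric / sum(count(*)) OVER () AS p
    FROM col
    GROUP BY v
)
-- Shannon entropy in bits: H = -sum(p * log2(p))
SELECT -sum(p * log(2, p)) AS entropy_bits
FROM freq;

SELECT entropy_bits FROM entropy_demo;  -- about 1.5 bits for the toy data
```

The toy column has p = 0.5, 0.25 and 0.25, so H = 0.5·1 + 2·(0.25·2) = 1.5 bits.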
In the case where we aren't filtering on any columns, we still want to filter by groupings, so we can use a b-tree index for that as well. This indexing strategy roughly halves the index space, and should be much more robust to arbitrary filter combinations.

[1] https://web.archive.org/web/20190201134134/https://blog.coelho.net/database/2016/12/11/postgresql-bloom-index.html

Fixes: 44be5a5 ("Optimize leaderboard")
Signed-off-by: Sean Anderson <[email protected]>
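A minimal sketch of the overcounting problem that motivates this commit, on hypothetical toy data (null_ambiguity_demo is an illustrative name). Player 2 has never played a league match, so their source rows already have a NULL league, and the cube then produces two rows for them that differ only in their grouping value:

```sql
-- Hypothetical toy data: player 1 played in a league, player 2 never did.
CREATE TEMP VIEW null_ambiguity_demo AS
SELECT playerid, league,
       grouping(playerid, league) AS grouping,
       sum(kills) AS kills
FROM (VALUES (1, 'rgl', 10), (2, NULL, 5)) AS t(playerid, league, kills)
GROUP BY CUBE (playerid, league);

-- Both rows have league = NULL: one comes from the (playerid) grouping set
-- (grouping = 1) and one from (playerid, league) (grouping = 0).
-- Selecting rows by "league IS NULL" alone counts player 2's 5 kills twice.
SELECT * FROM null_ambiguity_demo WHERE playerid = 2 AND league IS NULL;
```

Adding `AND grouping = 1` to the last query picks out exactly the per-player total, which is the disambiguation the new column provides.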
Showing 7 changed files with 153 additions and 17 deletions.
@@ -0,0 +1,9 @@

```sql
CREATE EXTENSION IF NOT EXISTS bloom;

DO $$ BEGIN
    CREATE OPERATOR CLASS enum_ops DEFAULT FOR TYPE anyenum USING bloom AS
        OPERATOR 1 =(anyenum, anyenum),
        FUNCTION 1 hashenum(anyenum);
EXCEPTION WHEN duplicate_object THEN
    NULL;
END $$;
```
@@ -0,0 +1,73 @@

```sql
BEGIN;

DROP MATERIALIZED VIEW leaderboard_cube;
CREATE MATERIALIZED VIEW leaderboard_cube AS SELECT
    playerid,
    league,
    formatid,
    primary_classid AS classid,
    mapid,
    grouping(playerid, league, formatid, primary_classid, mapid) AS grouping,
    sum(log.duration) AS duration,
    sum((wins > losses)::INT) AS wins,
    sum((wins = losses)::INT) AS ties,
    sum((wins < losses)::INT) AS losses,
    sum(kills) AS kills,
    sum(deaths) AS deaths,
    sum(assists) AS assists,
    sum(dmg) AS dmg,
    sum(dt) AS dt,
    sum(shots) AS shots,
    sum(hits) AS hits
FROM log_nodups AS log
JOIN player_stats USING (logid)
GROUP BY CUBE (playerid, league, formatid, classid, mapid)
ORDER BY mapid, classid, formatid, playerid, league;

-- To help out the query planner
CREATE STATISTICS IF NOT EXISTS leaderboard_stats (dependencies, ndistinct, mcv)
    ON league, formatid, classid, mapid, grouping
    FROM leaderboard_cube;

-- When we have no filters (or nothing better)
CREATE INDEX IF NOT EXISTS leaderboard_grouping ON leaderboard_cube (grouping);

-- When we have a single filter. grouping() sets a bit for each argument that
-- is not part of the row's grouping set; from most to least significant bit:
-- playerid, league, formatid, classid, mapid.
CREATE INDEX IF NOT EXISTS leaderboard_league ON leaderboard_cube (league)
    WHERE playerid NOTNULL
        AND league NOTNULL
        AND formatid ISNULL
        AND classid ISNULL
        AND mapid ISNULL
        AND grouping = b'00111'::INT;
CREATE INDEX IF NOT EXISTS leaderboard_format ON leaderboard_cube (formatid)
    WHERE playerid NOTNULL
        AND league ISNULL
        AND formatid NOTNULL
        AND classid ISNULL
        AND mapid ISNULL
        AND grouping = b'01011'::INT;
CREATE INDEX IF NOT EXISTS leaderboard_class ON leaderboard_cube (classid)
    WHERE playerid NOTNULL
        AND league ISNULL
        AND formatid ISNULL
        AND classid NOTNULL
        AND mapid ISNULL
        AND grouping = b'01101'::INT;
CREATE INDEX IF NOT EXISTS leaderboard_map ON leaderboard_cube (mapid)
    WHERE playerid NOTNULL
        AND league ISNULL
        AND formatid ISNULL
        AND classid ISNULL
        AND mapid NOTNULL
        AND grouping = b'01110'::INT;

-- When we have multiple filters
CREATE INDEX IF NOT EXISTS leaderboard_bloom ON leaderboard_cube
    USING bloom (grouping, mapid, classid, formatid, league)
    WITH (col1=1, col2=1, col3=1, col4=1, col5=1)
    WHERE playerid NOTNULL;

COMMIT;

ANALYZE VERBOSE leaderboard_cube;
```
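The single-filter partial indexes only help if their `grouping = …` predicates match the values grouping() actually produces. Per the PostgreSQL documentation, each bit is 1 when the corresponding argument is not part of the row's grouping set, and the rightmost argument is the least-significant bit. A minimal self-contained check on a hypothetical one-row table (grouping_demo is an illustrative name; CUBE emits every grouping set even for a single input row):

```sql
-- Same argument order as leaderboard_cube: playerid, league, formatid,
-- classid, mapid. The single VALUES row is hypothetical toy data.
CREATE TEMP VIEW grouping_demo AS
SELECT playerid, league, formatid, classid, mapid,
       grouping(playerid, league, formatid, classid, mapid) AS grouping
FROM (VALUES (1, 'rgl', 2, 3, 4)) AS t(playerid, league, formatid, classid, mapid)
GROUP BY CUBE (playerid, league, formatid, classid, mapid);

-- The four single-filter grouping sets: a player plus exactly one filter column
SELECT grouping, grouping::bit(5) AS bits, league, formatid, classid, mapid
FROM grouping_demo
WHERE playerid NOTNULL
  AND num_nonnulls(league, formatid, classid, mapid) = 1
ORDER BY grouping;
```

For this column order the expected values are 7 (b'00111') for league, 11 (b'01011') for formatid, 13 (b'01101') for classid and 14 (b'01110') for mapid, with 15 (b'01111') for the player-only rows that an unfiltered query would presumably select. The same rule gives, for example, 12 (b'01100') for a query filtering on both classid and mapid, the kind of multi-column query the bloom index targets.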