Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question : table consume disk space #153

Open
PBAtTheRingIo opened this issue Oct 25, 2023 · 2 comments
Open

Question : table consume disk space #153

PBAtTheRingIo opened this issue Oct 25, 2023 · 2 comments

Comments

@PBAtTheRingIo
Copy link

Hello,
I do not know if it's the way to ask a question on this project, if not sorry about that.
I'm using Debian GNU/Linux 12 (bookworm), Kernel Linux debian 6.1.0-9-amd64, PostgreSQL 15 from package 15.3-0+deb12u1, and the package postgresql-15-hll extension from package 2.17-1+b1.
We would like to see the result of the cardinality after adding a lot of UUID in using the function "hll_add(hll, hll_hash_bytea(?)) with a binary uuid", and I see that my disk space is consumed by the table.
Do you have any idea why, even if the HLL record is small (few kb), the result of "\d+" command and the size on disk can reach several Mb ? Do I have to configure PostgreSQL with dedicated parameters or do something else to not have this issue ?
Thanks a lot in advance

@emelsimsek
Copy link
Contributor

Can you explain your scenario a bit more? What is the data model and the commands that I can use to reproduce what you are observing?

@PBAtTheRingIo
Copy link
Author

I created a simple table with 1 integer and 1 hll like for instance:

CREATE TABLE views_hll (
    item_id integer NOT NULL,
    hll hll(14,5,8,1) DEFAULT hll_empty(14, 5, (8)::bigint, 1) NOT NULL
);

then I use a loop to insert a bunch of UUIDs one by one in a loop (all uuids are different) thanks to a SQl request like this one:
UPDATE "views_hll" AS v0 SET "hll" = hll_add(hll, hll_hash_bytea($1)) WHERE (v0."item_id" = $2);
where $1 is a binary UUID and $2 the id (1 for instance and always 1 for each update).

I check the size after 1, 10, 100, 1_000_1000 update in doing for instance:

SELECT pg_total_relation_size('views_hll');
or
SELECT pg_size_pretty(pg_total_relation_size('views_hll'));

And I check also the value directly on my disk, and I see on the disk and thanks to the previous commands the value growing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants