Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

btrfs-balance-least-used: sort block groups by bytes used, not percentage used #33

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

intelfx
Copy link

@intelfx intelfx commented Sep 5, 2021

This way, we can quickly clean up completely empty block groups.

Additionally, if btrfs-balance-least-used -u 0 is invoked, only
consider fully empty block groups because this seems a useful edge case
(there can be a lot of almost-empty block groups which are technically
used_pct==0 but still take a lot of time to balance, so the user may as
well invoke with -u 1).

…tage used

This way, we can quickly clean up completely empty block groups.

Additionally, if `btrfs-balance-least-used -u 0` is invoked, only
consider fully empty block groups because this seems a useful edge case
(there can be a lot of almost-empty block groups which are technically
used_pct==0 but still take a lot of time to balance, so the user may as
well invoke with `-u 1`).
@knorrie
Copy link
Owner

knorrie commented Nov 20, 2021

Hi! That's interesting.

Do I understand correctly that:

  1. the actual goal of the change is to have -u 0 not act on block groups that are not completely empty
  2. using bytes instead of pct is a means to accomplish that?

This question because the title of the whole merge request is "sort block groups by bytes used, not percentage".

@knorrie
Copy link
Owner

knorrie commented Nov 20, 2021

Ok, I think I figured out what happens here.

So, the current code for used_pct in BlockGroupItem is:

@property
def used_pct(self):
    """Convenience property that calculates the percentage of usage."""
    return int(round((self.used * 100) / self.length))

(side note: the int() is redundant, since Python 3 round() already returns int, and can be dropped now.)

The round() causes usage 0.49% to display as 0 and 99.51% as 100. I don't know why I did this, it has been like that from the very beginning. The number was also meant as a convenience thing for quickly displaying something, not for providing a very accurate number to further act on.

When writing b-b-l-u much later, I didn't really think about this. So, yes, everything up to 0.49999..% usage is mixed with 0% in the same step and all of them display as 0%.

However, the suggested improvement to use actual bytes used also has a problem, because block groups can have different sizes. We'd rather first reclaim the unused space in a 10GiB RAID0 bg with 1023MiB used (~10%), than a 1GiB RAID0 bg with 1023MiB used (~100%).

If I try to think as a new user who uses b-b-l-u for the first time, and sees the -u option and percentage numbers flying by... Then I would kind of intuitively implicitly think it would work like this:

  • 0% is really empty, zero bytes used
  • 100% is really full, all bytes used
  • 1% means just a little bit of stuff, but not completely empty
  • 99% means almost full, but not totally
  • for values closer to the middle: ok, whatever, there's stuff inside

Does this make sense?

So, neither round() nor floor() or ceil() do the trick, but we could do something yolo by making 50% a bit magnetic, which results in the intuitively expected behaviour...

#!/usr/bin/python3

import math

SZ_1M = 0x00100000
SZ_1G = 0x40000000


def bla(used, length, fn):
    float_pct = (used * 100) / length
    print("{} {} {}".format(used, float_pct, fn(float_pct)))


def doit(fn):
    bla(0, SZ_1G, fn)
    bla(SZ_1M, SZ_1G, fn)
    bla(8 * SZ_1M, SZ_1G, fn)
    bla(1019 * SZ_1M, SZ_1G, fn)
    bla(1024 * SZ_1M, SZ_1G, fn)


print("h'okay, so, what a 1GiB BlockGroupItem used_pct does now is:")
doit(round)
print()


def yolo(float_pct):
    if float_pct < 50:
        return math.ceil(float_pct)
    return math.floor(float_pct)


print("why not this:")
doit(yolo)

output:

h'okay, so, what a 1GiB BlockGroupItem used_pct does now is:
0 0.0 0
1048576 0.09765625 0
8388608 0.78125 1
1068498944 99.51171875 100
1073741824 100.0 100

why not this:
0 0.0 0
1048576 0.09765625 1
8388608 0.78125 1
1068498944 99.51171875 99
1073741824 100.0 100

@knorrie
Copy link
Owner

knorrie commented Dec 17, 2021

@intelfx What do you think of the above suggestion to always make 0% really 0% and 100% really exactly 100%?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants