Add support for defragmenting files #49

Open
Forza-tng opened this issue Jan 4, 2025 · 0 comments

Forza-tng commented Jan 4, 2025

It would be great if we could get support for BTRFS_IOC_DEFRAG_RANGE in python-btrfs. It would give developers easier access to file defragmentation from their programs or scripts.

By targeting defragmentation to specific portions of files where there are lots of small extents, instead of defragmenting the entire file, we need less I/O for a large gain (similar to btrfs-balance-least-used), and we avoid breaking reflinks for portions of files that weren't touched.

Here is a 32GiB VM image that I tested this approach on. The file's length is mapped over the x-axis in buckets/bars, with roughly 550MiB per bucket. The amplitude represents the count of extents that begin within the bucket's byte range. It is clear from the histogram that a majority of the extents are located in a relatively small part of the file.

Histogram: bytes 0...32GiB in steps of 546MiB
Extents: 102,044
Peak bar represents: 33,572 extents

▁▁▁▁▁▁▁▃▁▁█▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
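A histogram like this can be built roughly as follows. This is only a sketch: it assumes the extent start offsets have already been collected (for example via FIEMAP or a tree search), and the function names and the linear scaling of the bars are illustrative choices, not necessarily how the output above was produced.

```python
BARS = "▁▂▃▄▅▆▇█"


def extent_histogram(extent_offsets, file_size, buckets=60):
    """Count how many extents begin within each bucket's byte range."""
    counts = [0] * buckets
    for offset in extent_offsets:
        index = min(offset * buckets // file_size, buckets - 1)
        counts[index] += 1
    return counts


def render_histogram(counts):
    """Render counts as one bar per bucket, scaled linearly to the peak."""
    peak = max(counts, default=0) or 1  # avoid dividing by zero
    return "".join(BARS[c * (len(BARS) - 1) // peak] for c in counts)
```

With 60 buckets, a 32GiB file gives about 546MiB per bucket, which matches the step size shown above.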

With DEFRAG_RANGE we could target the bucket range that contains the roughly 33,000 extents, substantially reducing the extent count while rewriting only a portion of the file's data. (Note that the amplitude scale differs between the two histograms.)

Histogram: bytes 0...32GiB in steps of 546MiB
Extents: 69,639
Peak bar represents: 10,083 extents
▂▃▃▁▁▁▃▇▂▃▁█▇▄▂▁▃▂▁▁▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
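Picking the range to defragment from per-bucket counts could be as simple as the sketch below. `peak_bucket_range` is a hypothetical helper; a real tool would probably select every bucket above some threshold rather than only the single peak.

```python
def peak_bucket_range(counts, file_size):
    """Return (start, length) in bytes of the bucket with the most extents."""
    buckets = len(counts)
    peak = max(range(buckets), key=counts.__getitem__)
    start = peak * file_size // buckets
    end = (peak + 1) * file_size // buckets
    return start, end - start
```

The returned offset and length are what would be handed to a DEFRAG_RANGE call.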

Here is a 2.1GiB archive that was written out in one go, so extents are much larger and evenly spread out.

Histogram: bytes 0...2.1GiB in steps of 35.7MiB
Extents: 20
Peak bar represents: 3 extents
▃▁▁▃▁▁▁▃▁▁▃▃▁▁▃▁▁▁▃▁▁▁▃▁▁▃▁▁▁▃▁▁▃▁▁▁▃▁▁▃▁▁▁▃▁▁▁▃▁█▁▁▁▃▁▁▁▃▁▁

Expanding on this idea, it would be possible to do a reflink-aware defragmentation where we check extents for their shared status, and only issue defrag on ranges containing unshared extents.
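The range selection for that could look something like this sketch. It assumes the extents are available as `(logical_offset, length, flags)` tuples, for example from a FIEMAP query where shared extents carry the `FIEMAP_EXTENT_SHARED` flag; the tuple shape and the name `unshared_ranges` are illustrative assumptions.

```python
FIEMAP_EXTENT_SHARED = 0x2000  # from linux/fiemap.h


def unshared_ranges(extents):
    """Merge adjacent unshared extents into (start, length) byte ranges."""
    ranges = []
    for logical, length, flags in sorted(extents):
        if flags & FIEMAP_EXTENT_SHARED:
            continue  # skip shared extents so their reflinks stay intact
        if ranges and ranges[-1][0] + ranges[-1][1] == logical:
            # Contiguous with the previous unshared range: extend it.
            ranges[-1] = (ranges[-1][0], ranges[-1][1] + length)
        else:
            ranges.append((logical, length))
    return ranges
```

Each resulting range could then be defragmented separately, leaving the shared extents in between untouched.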
