Skip to content

Latest commit

 

History

History
173 lines (128 loc) · 6.76 KB

dirstat.mkd

File metadata and controls

173 lines (128 loc) · 6.76 KB

"dirstat" -- statistics for a directory (or tar file)

I often find I need a quick, birds-eye view of a directory, and it should include not just space, but also number of files and directories. If you've ever used qdirstat or similar tools, you know what I mean, but I don't know of anything that does that on the terminal.

Apart from that, it's nice to see a "flattened" view, upto a certain depth.

Oh, and before I get to the actual features, I should mention that dirstat can do the same kind of summarisation using the output of tar -tvf too! I'm not sure I know of any tool that can do that.



requirements

Uses GNU datamash. Admittedly, it could be replaced by any other similar script or I could even replace it with pure perl, but I was lazy and I happened to have had it installed already.

flattened view

Let me explain with an example. Here's what ncdu shows for one of my directories:

    ncdu 1.14.1 ~ Use the arrow keys to navigate, press ? for help
    --- /path-elided ---------------------------------------------------
        5.6 GiB [##########] /events
        2.4 GiB [####      ] /cameras
        1.0 GiB [#         ] /others
      139.4 MiB [          ] /scans
        ... snipped the rest ...

And here's what dirstat shows, with the default "depth" of 2:

      MB              NFiles           NDirs    Path
    2289.978             696              16    cameras/s3is
    1259.249              12               1    events/2012-elided
    1113.458             121               2    events/2009-elided
     836.024              40               4    events/elided-convocation
     527.447             225               1    events/elided-shimla
     336.254             136               2    others/mixed
     312.072             248               1    events/2012-elided-trip
     255.445              47               3    others/rnr
     227.087             132               1    events/2010-elided-and-elided
     185.888             143              13    cameras/a80
        ... snipped the rest ...

(Ignore the differences caused by MiB versus MB and so on for now).

Notice that even though ncdu shows events to be the largest directory, in terms of what is interesting to me -- which is the next level in the hierarchy -- the largest directory is actually cameras/s3is.

This is what I meant by "flattening out" the hierarchy, upto a given depth. And I believe this is very useful in a lot of situations. Especially when you can vary the depth and get different perspectives.

You cannot get this kind of insight from either ncdu or qdirstat.

unusual patterns

Here's another example. This is output of dirstat from my git repos directory, with some items removed for brevity:

MB          NFiles  NDirs   Path
6.094       251     185     gitolite
4.449       193      46     gitolite-doc
    [...]
0.883        46      25     active-aliases
0.779       349     196     dr.git
0.491        43      21     notes
    [...]

The "dr.git" tree seems to have a lot more directories and files than its neighbours (of similar size). Most likely suspect: lots of git activity and git gc or something similar is due.

You cannot get this kind of insight from ncdu, though qdirstat will show it fine.

tar files

I mentioned doing this on tar files up at the top. That's actually saved my butt a couple of times, when I noticed that the last few days backups of one of my VPSs was far larger than a week before:

    total 442812
    drwx------. 2 sitaram sitaram     4096 Aug 20 02:27 .
    drwxr-xr-x. 6 sitaram sitaram     4096 Aug 21 06:54 ..
        [...]
    -rw-r--r--. 1 sitaram sitaram 50688000 Aug 14 05:32 elided-2019-08-14.tar
    -rw-r--r--. 1 sitaram sitaram 76387520 Aug 15 05:32 elided-2019-08-15.tar
        [...]

Simply running tar -tvf elided-2019-08-14.tar | dirstat -d 3 and tar -tvf elided-2019-08-15.tar | dirstat -d 3 on side by side windows showed me what the problem was (I had downloaded some documents and put them in the wrong directory on the VPS).

"interactive"? Well no, but...

Of course, not being smart enough to write ncurses stuff, dirstat is a single shot (i.e., non-interactive) command. But, using the try command I can make it fairly interactive anyway, and pretty cheaply, in terms of amount of code I had to write.

In fact the "documentation" for the try command consists of a video demonstrating its use in playing with dirstat; see try.webm.

how to run it

So anyway, here's dirstat. It has the following options:

-d N        # depth to show; default 2
-s s|f|d    # sort by size(default), file count, or directory count

and you run it in one of these ways:

dirstat
# same as: find -ls | dirstat
# runs dirstat on the current directory
# remember default depth is "2"

cd /usr; dirstat -d 1
# stats for /usr, depth 1
find /usr -ls | dirstat -d 3
# same results as previous command; see notes on "depth" later

tar -tvf foo.tar | dirstat -d 3
# contents of tar, depth of 3

The output looks like the examples you have seen above, where the first column is the size in MB, next is the number of files, followed by the number of directories, and finally the "path", which is truncated to the depth requested.

You can use -s f or -s d to override the default of sort-by-size and sort instead by file count or directory count. Sort is always descending.

In all cases, if the output is to a terminal, dirstat will invoke a head -20 on the output. If you want to see all of the output, either use a pager like less, redirect it to a file, or simply pipe it to cat.

a quick warning about how "depth" is computed

I'm lazy. And the code is simple. "Depth" is handled by simply saying "keep only $DEPTH /-separated fields" (literally, cut -f1-$DEPTH -d/).

So if you run find /usr -ls | dirstat (and since the default depth is 2, you will get only one line of output -- for /usr. To go deeper, you either cd /usr then run dirstat, or specify a depth greater than 2 with -d.

Just to be clear, with depth=2, "/usr/foo" becomes "/usr", because there's a slash at the start, you've already lost one field to that, so that leaves one more, which is usr.