I often find I need a quick, bird's-eye view of a directory, and it should
include not just space, but also number of files and directories. If you've
ever used qdirstat or similar tools, you know what I mean, but I don't know
of anything that does that on the terminal.
Apart from that, it's nice to see a "flattened" view, up to a certain depth.
Oh, and before I get to the actual features, I should mention that dirstat
can do the same kind of summarisation using the output of tar -tvf too! I'm
not sure I know of any other tool that can do that.
- requirements
- flattened view
- unusual patterns
- tar files
- "interactive"? Well no, but...
- how to run it
- a quick warning about how "depth" is computed
dirstat uses GNU datamash. Admittedly, it could be replaced by any similar program, or I could even replace it with pure perl, but I was lazy and I happened to have it installed already.
Let me explain with an example. Here's what ncdu shows for one of my directories:
ncdu 1.14.1 ~ Use the arrow keys to navigate, press ? for help
--- /path-elided ---------------------------------------------------
5.6 GiB [##########] /events
2.4 GiB [#### ] /cameras
1.0 GiB [# ] /others
139.4 MiB [ ] /scans
... snipped the rest ...
And here's what dirstat shows, with the default "depth" of 2:
MB NFiles NDirs Path
2289.978 696 16 cameras/s3is
1259.249 12 1 events/2012-elided
1113.458 121 2 events/2009-elided
836.024 40 4 events/elided-convocation
527.447 225 1 events/elided-shimla
336.254 136 2 others/mixed
312.072 248 1 events/2012-elided-trip
255.445 47 3 others/rnr
227.087 132 1 events/2010-elided-and-elided
185.888 143 13 cameras/a80
... snipped the rest ...
(Ignore the differences caused by MiB versus MB and so on for now).
Notice that even though ncdu shows events to be the largest directory, in
terms of what is interesting to me -- which is the next level in the
hierarchy -- the largest directory is actually cameras/s3is.
This is what I meant by "flattening out" the hierarchy, up to a given depth. I believe this is very useful in a lot of situations, especially when you can vary the depth and get different perspectives.
You cannot get this kind of insight from either ncdu or qdirstat.
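To make the idea of flattening concrete, here is a minimal sketch in awk. It is not how dirstat itself is written (dirstat uses datamash); the flatten helper, the "type size path" input records, and the choice of 1 MB = 1,000,000 bytes are all assumptions made for illustration, and paths containing spaces are not handled.

```shell
# Hypothetical helper: summarise "type size path" records by truncated path.
# type is "d" for a directory, anything else counts as a file; size is bytes.
flatten() {
    depth=$1
    awk -v depth="$depth" '
    {
        # keep only the first "depth" slash-separated components
        n = split($3, parts, "/")
        key = parts[1]
        for (i = 2; i <= depth && i <= n; i++) key = key "/" parts[i]
        bytes[key] += $2
        if ($1 == "d") dirs[key]++; else files[key]++
    }
    END {
        # columns: MB, file count, directory count, truncated path
        for (k in bytes)
            printf "%.3f\t%d\t%d\t%s\n", bytes[k] / 1000000, files[k], dirs[k], k
    }' | sort -t "$(printf '\t')" -k1,1nr    # largest first
}

# Made-up sample data, summarised at depth 2.
flatten 2 <<'EOF'
d 4096 cameras/s3is
f 9000000 cameras/s3is/x.jpg
d 4096 events/2012
f 3000000 events/2012/a.jpg
f 1000000 events/2012/b.jpg
EOF
```

Changing the first argument to 1 collapses everything into cameras and events, which is roughly the view ncdu gives at the top level.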
Here's another example. This is the output of dirstat from my git repos
directory, with some items removed for brevity:
MB NFiles NDirs Path
6.094 251 185 gitolite
4.449 193 46 gitolite-doc
[...]
0.883 46 25 active-aliases
0.779 349 196 dr.git
0.491 43 21 notes
[...]
The "dr.git" tree seems to have a lot more directories and files than its
neighbours of similar size. The most likely explanation: lots of git activity,
and a git gc or something similar is due.
You cannot get this kind of insight from ncdu, though qdirstat will show it fine.
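One way to make such outliers stand out is to compute files per MB for each row. The density helper below is hypothetical (it is not a dirstat feature); it only assumes the four-column layout shown above and paths without spaces.

```shell
# Hypothetical helper: print each path with its files-per-MB ratio.
# "$1 + 0 > 0" skips the header row and any row with a zero size.
density() {
    awk '$1 + 0 > 0 { printf "%s\t%.1f\n", $4, $2 / $1 }'
}

# Three rows copied from the output above.
printf '%s\n' \
    'MB NFiles NDirs Path' \
    '0.883 46 25 active-aliases' \
    '0.779 349 196 dr.git' \
    '0.491 43 21 notes' | density
```

With these rows, dr.git comes out at roughly 448 files per MB, against about 52 and 88 for its neighbours.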
I mentioned doing this on tar files up at the top. That's actually saved my butt a couple of times, such as when I noticed that the last few days' backups of one of my VPSs were far larger than those from a week before:
total 442812
drwx------. 2 sitaram sitaram 4096 Aug 20 02:27 .
drwxr-xr-x. 6 sitaram sitaram 4096 Aug 21 06:54 ..
[...]
-rw-r--r--. 1 sitaram sitaram 50688000 Aug 14 05:32 elided-2019-08-14.tar
-rw-r--r--. 1 sitaram sitaram 76387520 Aug 15 05:32 elided-2019-08-15.tar
[...]
Simply running tar -tvf elided-2019-08-14.tar | dirstat -d 3 and
tar -tvf elided-2019-08-15.tar | dirstat -d 3 in side-by-side windows showed
me what the problem was (I had downloaded some documents and put them in the
wrong directory on the VPS).
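For the curious, here is a rough sketch of how a verbose tar listing can be reduced to the pieces a summary needs. It assumes GNU tar's listing layout (mode, owner/group, size, date, time, name) and names without spaces; the tar_records helper and its "type size path" output are hypothetical, not dirstat's actual parsing code.

```shell
# Hypothetical helper: turn GNU "tar -tvf" lines into "type size path" records.
tar_records() {
    awk '{
        # the mode string starts with "d" for directories
        type = (substr($1, 1, 1) == "d") ? "d" : "f"
        path = $6
        sub(/\/$/, "", path)    # directories are listed with a trailing slash
        print type, $3, path
    }'
}

# Made-up listing lines in GNU tar's format.
printf '%s\n' \
    'drwxr-xr-x user/group 0 2019-08-14 05:30 home/docs/' \
    '-rw-r--r-- user/group 1048576 2019-08-14 05:31 home/docs/a.pdf' | tar_records
```

Other tar implementations print the listing differently, so the field numbers would need adjusting there.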
Of course, since I'm not smart enough to write ncurses stuff, dirstat is a
single-shot (i.e., non-interactive) command. But using the try command I can
make it fairly interactive anyway, and pretty cheaply in terms of the amount
of code I had to write.
In fact the "documentation" for the try command consists of a video
demonstrating its use in playing with dirstat; see try.webm.
So anyway, here's dirstat. It has the following options:
-d N # depth to show; default 2
-s s|f|d # sort by size(default), file count, or directory count
and you run it in one of these ways:
dirstat
# same as: find -ls | dirstat
# runs dirstat on the current directory
# remember default depth is "2"
cd /usr; dirstat -d 1
# stats for /usr, depth 1
find /usr -ls | dirstat -d 3
# same results as previous command; see notes on "depth" later
tar -tvf foo.tar | dirstat -d 3
# contents of tar, depth of 3
The output looks like the examples you have seen above, where the first column is the size in MB, next is the number of files, followed by the number of directories, and finally the "path", which is truncated to the depth requested.
You can use -s f or -s d to override the default of sort-by-size and sort
instead by file count or directory count. Sort is always descending.
In all cases, if the output is to a terminal, dirstat will invoke a head -20
on the output. If you want to see all of the output, either use a pager like
less, redirect it to a file, or simply pipe it to cat.
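The terminal check behind this kind of behaviour is a common shell idiom. The sketch below shows the idiom, not dirstat's actual code, and limit_if_tty is a hypothetical name.

```shell
# Hypothetical helper: truncate to 20 lines only when stdout is a terminal.
limit_if_tty() {
    if [ -t 1 ]; then
        head -n 20    # interactive use: show just the top of the list
    else
        cat           # piped or redirected: pass everything through
    fi
}

# Piped into wc, stdout is not a terminal, so all 30 lines pass through.
awk 'BEGIN { for (i = 1; i <= 30; i++) print i }' | limit_if_tty | wc -l
```

This is why piping to cat or less is enough to get the full output: the last command in the pipeline is the only one whose stdout is the terminal.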
I'm lazy. And the code is simple. "Depth" is handled by simply saying "keep
only $DEPTH /-separated fields" (literally, cut -f1-$DEPTH -d/).
So if you run find /usr -ls | dirstat, then since the default depth is 2, you
will get only one line of output, for /usr. To go deeper, you either cd /usr
and then run dirstat, or specify a depth greater than 2 with -d.
Just to be clear, with depth=2, "/usr/foo" becomes "/usr": because there's a
slash at the start, you've already lost one field to the empty string before
it, and that leaves only one more, which is usr.
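You can see the effect with cut alone. The trim_depth wrapper below is a hypothetical name used for illustration; the cut invocation is the one quoted above.

```shell
# Hypothetical wrapper around the cut call: keep the first N "/"-separated fields.
trim_depth() {
    cut -d/ -f1-"$1"
}

# An absolute path spends its first field on the empty string before the
# leading slash; a relative path (as in a tar listing) does not.
printf '%s\n' /usr/foo/bar usr/foo/bar | trim_depth 2
printf '%s\n' /usr/foo/bar usr/foo/bar | trim_depth 3
```

At depth 2 the absolute path is cut down to /usr while the relative one keeps usr/foo; it takes depth 3 for the absolute path to reach /usr/foo.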