			The KDirStat Cache File Format
		       ================================

Author:  Stefan Hundhammer <sh@suse.de>
Updated: 2006-01-03


KDirStat can read cache files in either gzip or plain text (uncompressed)
format. The file format is line oriented.

Empty lines as well as lines with a '#' character as their first
non-whitespace character are ignored.

Example:


[kdirstat 2.5.1-devel cache file]
# Do not edit!
#
# Type	path		size	mtime	<optional fields>

D /work/home/sh/kde/kdirstat	159	0x43aea3d3
F 	ChangeLog	19288	0x43ae9a3b
F 	.cvsignore	207	0x41c194f3
F 	stamp-h1	23	0x43ae9d73
F 	acinclude.m4	357171	0x43ae9d52
F 	Makefile.am	1237	0x43ad58a0
F 	config.h	5460	0x43ae9977
D /work/home/sh/kde/kdirstat/kdirstat	549	0x43aea73b
F 	.cvsignore	108	0x3b7bda58
F 	kcleanupcollection.cpp	7615	0x41a3322d
F 	kdirstatapp.cpp	24254	0x43aea372
F 	kdirtree.cpp	33083	0x43adc843
F 	kdirstatsettings.cpp	30519	0x3e39540c
F 	kdirtreeiterators.cpp	7191	0x3e1ad131
F 	ktreemapview.cpp	17100	0x3f93ee7b
D /work/home/sh/kde/kdirstat/kdirstat/.libs	24	0x43ae9900
D /work/home/sh/kde/kdirstat/kdirstat/.deps	504	0x43aea73a
F 	ktreemaptile.Po	18074	0x43ae99a0
F 	kdirtree.Po	18134	0x43aea0ba
D /work/home/sh/kde/kdirstat/kdirstat/pics	144	0x43ae9d72
F 	hi32-action-symlink.png	1141	0x3c0b5152
F 	Makefile.am	59	0x41a47537
D /work/home/sh/kde/kdirstat/kdirstat/pics/CVS	64	0x41c194fc
F 	Entries	447	0x41c194fc
F 	Repository	23	0x41c194fb
F 	Root	54	0x41c194fb
D /work/home/sh/kde/kdirstat/kdirstat/CVS	64	0x434657ca
F 	Entries	2020	0x434657ca
F 	Repository	18	0x41c194f8
F 	Root	54	0x41c194f8


(End of example)


Header
======

The first line ( "[kdirstat 2.5.1-devel cache file]" ) is a header
identifying the file format. Future versions of KDirStat may or may not
check the version number (the second word of the header line) to make sure
the file format is compatible with that particular version of KDirStat.



Data Lines
==========

The data lines are separated into fields by whitespace (blanks or tabs).
Fields are not surrounded by single or double quotes.
The maximum line length is 1024 bytes.
Mandatory fields are:

- Type
- Path or name
- Size
- MTime (time of last modification)

After those mandatory fields there may be optional fields in this order:

- "blocks:" followed by a field with the number of blocks
- "links:"  followed by a field with the number of links

The identifiers of those optional fields ("blocks:", "links:") are case
insensitive.



Fields
======


Type
----

Any of:

"F"		plain file
"D"		directory
"L"		(symbolic) link
"BlockDev"	block device i-node
"CharDev"	character device i-node
"FIFO"		FIFO (named pipe)
"Socket"	socket

The type field is case insensitive.



Path or Name
------------

Either an absolute path (starting with "/") or only a base name relative to
the last preceding full path in the file.

Directory entries are required to have an absolute path. Entries for plain
files, symlinks, or special files (devices, FIFOs, sockets) may have an
absolute or a relative path.

    Hint: To save some disk space with relative paths, it makes sense to
    list the plain files in a directory first and then descend into any
    subdirectories when writing a cache file.


Paths and names are URL-encoded, i.e. any character (in particular whitespace)
that might otherwise be some kind of delimiter is specified as its hex code
with preceding "%":

	with blank	->	with%20blank
	with%percent	->	with%25percent

Take special care for blanks, tabs, newlines, and percent characters. It
does not hurt to escape a few more characters than would strictly be
neccessary.

As for encoding, unfortunately this is one big mess. 7 bit ASCII works
allright, but if there are any special characters, everything depends on the
locale in which the user created a file. There is no standard for file name
encodings in file systems; it all Special characters may come in all kinds of
flavours - in Latin-1 (ISO-8859-1), Latin-2, UTF-8, Japanese, Korean, Chinese,
whatever. In an ideal world, the file system would take care about this and
normalize file names with non-ASCII (7 bit) characters, but that doesn't
happen. So if one user uses, say, Latin-1 and another uses UTF-8, a file system
may have files with different encodings for each file name. Worse yet, the same
(special, i.e. non-7-bit-ASCII) character will be be stored in different
character representations for different file names.

Those character representations is what readdir() returns. There is no way to
tell in which encoding a name may come, so there is also no way to convert it
to a well-defined standard encoding like, say, UTF-8. So what gets stored in
the cache file is the same byte sequence as returned by readdir(). Those names
encoded in something other than the current locale of KDE where KDirStat is
running will of course be displayed with garbage letters, but this cannot be
helped. This is just the same as when the name is read directly from the file
system with readdir(). Bad luck.


Size
----

The entry's size (st_size in struct stat as returned by lstat() ).
Note: This is the entry's own size, not the accumulated size of all
children!

This size is given in bytes. It may also have a trailing unit (directly
following the number, without whitespace):

- "K" for kB (1024 bytes)
- "M" for MB (1024 kB)
- "G" for GB (1024 MB)

The size is always specified in integer numbers, never in fractional
numbers. So if it cannot be divided by a bigger unit without a fractional
part, use the next-lower unit that fits without fraction.

Examples:

1024		->	1K
1025		->	1025	(NOT 1.01K or something like this!)
8589934592	->	8G
8589934593	->	8589934593	(bad luck)


MTime
------

The entry's last modification time as time_t, i.e., in seconds since
1970-01-01 00:00:00.

May be specified in hex (with preceding 0x) or decimal.



Blocks
------

If a file is a sparse file (and only then) it has a "blocks:" field.

This is the content of the st_blocks field of struct stat as returned by
lstat(): The number of disk blocks actually allocated.

This number multiplied by the block size may be less than what st_size
returns; in this case that file is considered to be a "sparse" file or a
file with "holes".

A block size of 512 bytes is assumed.

Example:

	blocks:	17

This file has 17*512 bytes allocated.


Links
-----

If a non-directory entry has more than one hard link, the entry has a
"links:" field indicating the number of hard links:

	links:	7

