So I’m currently considering a move to CryFS. It is relatively young and unproven (has it been audited yet?), but it seems less risky than encfs + cryptkeeper from what we currently know.
IMHO CryFS’ main downsides are the lack of a GUI and the lack of a security audit.
Yeah, I was just discussing this with Calmh on another thread. If you pad every file up to a 32k, 64k, or 128k block, with each file treated as a block, and you have thousands of small files of just a few bytes (not as uncommon as you think, especially for OS installations and some games), then you get massive overhead on disk, possibly more than double the total file size in some situations.
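To make the overhead concrete, here’s a small sketch (names and block size are just illustrative, not anything from CryFS itself) of what padding every file up to a fixed block costs:

```python
BLOCK_SIZE = 32 * 1024  # 32 KiB blocks, as in the discussion above

def padded_size(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Size on disk when each file occupies at least one full block."""
    blocks = max(1, -(-file_size // block_size))  # ceiling division
    return blocks * block_size

# A file of a few bytes still costs a whole block:
tiny = padded_size(100)  # 32768 bytes on disk for 100 bytes of content

# Thousands of such files multiply the waste:
waste = 10_000 * (padded_size(100) - 100)  # over 300 MiB of pure padding
```

With 10,000 tiny files this is hundreds of megabytes of padding for a few megabytes of actual data, which is the “massive overhead” scenario described above.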
Not really a CryFS-only problem. It’s an issue for anyone who has to tackle the problem of “not giving away file size” by padding. My view is that not a lot of padding is really necessary; maybe just a few extra bytes at most, as long as it’s a non-deterministic amount, should suffice. Of course someone will disagree with me, I’m sure; this is a highly subjective argument. You could always not bother padding and allow leakage of file size, but then the RIAA / MPAA could fingerprint that illegal album of MP3s or movies you have downloaded and stored to some cloud.
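The “few random extra bytes” idea could be sketched like this (a hypothetical scheme for illustration, not CryFS’ actual padding; the last byte records the pad length so it can be stripped again):

```python
import secrets

def pad_nondeterministic(data: bytes, max_pad: int = 16) -> bytes:
    """Append a small, random amount of padding (1..max_pad bytes).
    The final byte stores the total pad length for later removal."""
    n = 1 + secrets.randbelow(max_pad)  # at least 1 byte, for the length field
    return data + secrets.token_bytes(n - 1) + bytes([n])

def unpad(padded: bytes) -> bytes:
    """Strip the padding added by pad_nondeterministic."""
    return padded[:-padded[-1]]
```

Since the pad length varies per file, two files of identical size no longer produce identical ciphertext sizes, at the cost of only a handful of bytes.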
Just as you said, creating many files or directories on disk also adds overhead.
A while ago I worked on a pooling implementation which reminds me of this problem. In some applications you want to pre-allocate buffers to improve performance. Buffers may be discarded and reused many times, so you want to maximize re-use of buffers to avoid memory overhead.
I found that restricting buffer sizes to the next larger power of 2 works nicely, and the overhead is typically 20% or lower.
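The rounding itself is a one-liner (a sketch; the worst case is a size just above a power of 2, which nearly doubles, hence the factor-of-2 bound mentioned below):

```python
def round_up_pow2(n: int) -> int:
    """Smallest power of two >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()

# Examples: sizes are bucketed into at most a factor-of-2 larger slot.
assert round_up_pow2(5) == 8
assert round_up_pow2(4096) == 4096
assert round_up_pow2(4097) == 8192
```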
This might be a possible trade-off for our situation. It would leak the approximate file size (approximate to within a factor of 2), but would greatly reduce the number of files on disk and minimize padding.
Update: in our case we might want to round down to the next smaller power of 2 instead, creating multiple blocks of decreasing size until we reach the smallest block size (4k, 16k, or 32k).
This would avoid creating too many files: for K files this gives O(K * log(size)) blocks instead of O(K * size) with the fixed-block-size approach. This would also reduce padding overhead.
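The decreasing-powers-of-2 idea amounts to splitting the file size along its binary representation, so the block count is at most the number of set bits, i.e. O(log(size)). A sketch of that decomposition (parameters are illustrative, not from CryFS):

```python
def pow2_blocks(size: int, min_block: int = 4 * 1024) -> list:
    """Split `size` into blocks of decreasing power-of-two sizes,
    never going below `min_block`; the tail is padded up to min_block."""
    blocks = []
    remaining = size
    while remaining >= min_block:
        b = 1 << (remaining.bit_length() - 1)  # largest power of 2 <= remaining
        blocks.append(b)
        remaining -= b
    if remaining > 0:
        blocks.append(min_block)  # pad the leftover tail to the smallest block
    return blocks

# A 100,000-byte file becomes 3 blocks instead of 4 fixed 32k blocks,
# with only ~2.4% padding overhead:
print(pow2_blocks(100_000))  # [65536, 32768, 4096]
```

Only the tail block carries padding, so the overhead is bounded by one `min_block`, while the number of on-disk files per source file grows logarithmically with its size.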