The overhead is not only about padding. I investigated further and added my findings on the CryFS bug tracker: https://github.com/cryfs/cryfs/issues/11
Just as you said, creating many files or directories on disk also adds overhead.
A while ago I worked on a pooling implementation which reminds me of this problem. In some applications you want to pre-allocate buffers to improve performance. Buffers may be discarded and reused multiple times, so you want to maximize buffer re-use to avoid memory overhead.
I found that restricting buffer sizes to the next larger power of 2 works nicely, and the overhead is typically 20% or lower.
This might be a possible trade-off for our situation. It would leak the approximate file size (to within a factor of 2), but would greatly reduce the number of files on disk and minimize padding.
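To make the rounding concrete, here is a minimal sketch of rounding a size up to the next power of 2 and measuring how much of the result is padding. The function names `next_power_of_two` and `padding_overhead` are made up for illustration and are not taken from CryFS or from my pooling code.

```python
def next_power_of_two(n: int) -> int:
    """Return the smallest power of 2 that is >= n (requires n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # (n - 1).bit_length() is the number of bits needed to represent n - 1,
    # so shifting 1 by that amount gives the next power of 2 at or above n.
    return 1 << (n - 1).bit_length()


def padding_overhead(n: int) -> float:
    """Fraction of the rounded-up size that is padding."""
    rounded = next_power_of_two(n)
    return (rounded - n) / rounded
```

For example, a 5000-byte buffer would be rounded up to 8192 bytes, while a 4096-byte buffer stays at 4096 with no padding.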
Update: in our case we might want to round to the next smaller power of 2, thus creating multiple blocks of decreasing size, until you reach the smallest block (4k, or 16k, or 32k).
This would avoid creating too many files: the count would be on the order of O(K * log(size)) instead of O(K * size) with the fixed block size approach. This would also reduce padding overhead.