I have an interesting issue, probably caused by me not understanding something. I have a 20 GB file I want to sync between devices. It is static, and no other process is modifying it.
Here is what I see. With “Use Large Blocks” disabled, I get a consistent 188.5 MiB/s indexing performance:
2019-02-22 15:43:51 Walk 5rslf-uzcoq [] current progress 1964900352/23638202528 at 188.5 MiB/s (8%)
When I enable “Use Large Blocks”, hashing only shows 3.4 MiB/s:
2019-02-22 15:38:52 Walk 5rslf-uzcoq [] current progress 167772160/23638202528 at 3.4 MiB/s (0%)
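For anyone comparing runs, the two progress lines above can be pulled apart with a small script. This is only a sketch: the regular expression is written against the two lines quoted here, not against any documented log format, and the function names are made up for illustration.

```python
import re

# Pattern written against the "current progress" lines quoted in this post;
# the real log format may vary between Syncthing versions (assumption).
PROGRESS_RE = re.compile(
    r"current progress (?P<done>\d+)/(?P<total>\d+) at (?P<rate>[\d.]+) MiB/s"
)


def parse_progress(line):
    """Return (bytes_done, bytes_total, rate_mib_s), or None if the line does not match."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    return (
        int(match.group("done")),
        int(match.group("total")),
        float(match.group("rate")),
    )


def percent_done(line):
    """Percentage of the file already hashed, or None for a non-matching line."""
    parsed = parse_progress(line)
    if parsed is None:
        return None
    done, total, _rate = parsed
    return 100.0 * done / total


fast = "2019-02-22 15:43:51 Walk 5rslf-uzcoq [] current progress 1964900352/23638202528 at 188.5 MiB/s (8%)"
slow = "2019-02-22 15:38:52 Walk 5rslf-uzcoq [] current progress 167772160/23638202528 at 3.4 MiB/s (0%)"
print(parse_progress(fast))
print(parse_progress(slow))
```

Feeding it a full log and keeping only the non-`None` results gives a quick rate-over-time series for each run.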
On startup, the benchmark shows:
2019-02-22 11:14:05 Single thread SHA256 performance is 497 MB/s using minio/sha256-simd (94 MB/s using crypto/sha256).
2019-02-22 11:14:06 Hashing performance is 205.38 MB/s
All other settings for the folder have default values, and the installed version is 1.0.1. What am I doing wrong? Thank you.
Update: I am observing the same behavior with 1.1.0-rc.1
Indexing a 22 GB file takes about 2 hours with large blocks. Indexing the same file without large blocks takes 2 minutes. Kind of hard to miss a roughly 60 times difference in time.
Indexing a 4 GB file takes 6 minutes (13 MiB/s) with large blocks vs 30 seconds without (same 180 MiB/s).
It does look like the larger the file (and so the dynamic block size), the longer it takes to index.
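As a sanity check, the reported times follow directly from the logged rates. A minimal sketch, using the byte count from the progress lines above and treating the logged MiB/s figures as sustained over the whole scan (an assumption):

```python
MIB = 1024 * 1024


def scan_seconds(size_bytes, rate_mib_s):
    """Time to hash size_bytes at a constant rate given in MiB/s."""
    return size_bytes / (rate_mib_s * MIB)


def slowdown(fast_rate, slow_rate):
    """How many times longer the slow scan takes for the same file."""
    return fast_rate / slow_rate


size = 23638202528  # total bytes from the "current progress" log lines

print(f"without large blocks: {scan_seconds(size, 188.5) / 60:.1f} min")
print(f"with large blocks:    {scan_seconds(size, 3.4) / 3600:.1f} h")
print(f"slowdown:             {slowdown(188.5, 3.4):.0f}x")
```

That works out to roughly 2 minutes versus a little under 2 hours, a factor of about 55, which matches what I see on the wall clock.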
For testing, I have created a folder that is not shared with any other device, to make sure nothing else (like networking) is involved. The results above come from a Synology NAS, but I get consistent results from a Linux box as well, where crypto/sha256 is used (so it is not the SIMD library):
2019-02-22 17:13:17 Single thread SHA256 performance is 205 MB/s using crypto/sha256 (201 MB/s using minio/sha256-simd).
2019-02-22 17:13:18 Hashing performance is 154.84 MB/s
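To rule out the hash function itself, one can time plain SHA-256 over the same file in different chunk sizes, completely outside Syncthing. This is a rough sketch and not how Syncthing's scanner works internally; `hash_in_blocks` is a made-up helper, and the 128 KiB and 16 MiB sizes in the usage note are just assumed stand-ins for a "small" and a "large" block.

```python
import hashlib
import time


def hash_in_blocks(path, block_size):
    """SHA-256 every block_size chunk of the file separately.

    Returns (block_count, throughput_mib_s). Each chunk gets its own digest,
    which only approximates a per-block hashing scheme.
    """
    blocks = 0
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            hashlib.sha256(chunk).digest()
            blocks += 1
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small files.
    rate = total_bytes / (1024 * 1024) / elapsed if elapsed > 0 else float("inf")
    return blocks, rate
```

Calling `hash_in_blocks("/path/to/bigfile", 128 * 1024)` and then `hash_in_blocks("/path/to/bigfile", 16 * 1024 * 1024)` (after dropping the page cache between runs, or the second one will look faster) should give similar MiB/s if block size alone is not the problem. If it does, the slowdown is somewhere else in the scan path.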
Just out of curiosity, I dropped in a 50 GB file. It shows the same hashing performance as the 20 GB one (around 3.4 MiB/s), and indexing takes 5 hours. Without large blocks it does ~180 MiB/s and completes in 5 minutes.
Given the default in 1.1 is to always use large blocks, this looks scary.
With large blocks enabled in the current RC, scanning runs at about 3 MiB/s. Surely we need to cherry-pick the fix and roll a new RC? Or am I misunderstanding something?
Right… Yeah I didn’t think about large blocks being enabled by default in 1.1. Change is safe enough that I’m not worried about the late RC, so yeah, will do. This thing really does need an issue though.