v0.14.48 release candidate with large block size

(Jakob Borg) #1

So there’s a candidate for 0.14.48 out that has an interesting feature for those of you with large files. Quoting:

useLargeBlocks is an advanced folder setting that affects the handling of blocks for files larger than 256 MiB. When enabled, the file will be indexed and transferred using blocks larger than the standard 128 KiB. This results in a smaller block list and thus lower overhead. The larger block sizes are powers of two, from 128 KiB up to 16 MiB.

If you have files in the gigabyte range this should give a nice performance boost when syncing, and reduce memory usage in the process. It needs to be enabled per folder and mostly isn’t compatible with older versions. (It’ll become enabled by default at some point in the future.)
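For those who want to try it before a GUI toggle exists, the setting lives as a per-folder element in config.xml. A hedged sketch of what the relevant fragment might look like (the folder id and path are placeholders, and a real folder element carries more attributes than shown):

```xml
<folder id="abcde-12345" path="/data/media">
    <!-- advanced setting: use blocks larger than 128 KiB for files over 256 MiB -->
    <useLargeBlocks>true</useLargeBlocks>
</folder>
```

Remember to restart Syncthing (or use the advanced configuration editor) after editing the file by hand.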

Give it a spin?

(ellnic) #2

I have several folders where the majority of files are over 4 GB. This is great!
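To put rough numbers on what this means for files of that size (a sketch based on the sizes quoted above; the exact block size Syncthing picks for a given file depends on its internal thresholds, so the 2 MiB figure here is an assumption):

```python
KiB = 2**10
MiB = 2**20
GiB = 2**30

file_size = 4 * GiB

# With the standard 128 KiB blocks:
old_blocks = file_size // (128 * KiB)  # block hashes to store and exchange

# With large blocks, a 4 GiB file would plausibly use 2 MiB blocks,
# a power of two that keeps the count in the one-to-two-thousand range:
new_blocks = file_size // (2 * MiB)

print(old_blocks, new_blocks)  # 32768 2048, a 16x smaller block list
```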

Another step in the right direction for Syncthing. Well done 🙂

What would happen in the case of a heavily mixed folder? How would this impact the smaller files with this setting?

(Jakob Borg) #3

It doesn’t; it dynamically selects block size based on the file size, and the first step above the default 128 KiB block kicks in at 256 MiB file size.

Basically it tries to keep the number of blocks per file between one and two thousand, with fewer or more blocks outside those extremes, of course.
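That selection rule can be sketched roughly as follows (a guess at the logic based on the description above, not Syncthing's actual code; the exact block-count target of 2048 is an assumption derived from the "one and two thousand" figure):

```python
MIN_BLOCK_SIZE = 128 * 1024        # 128 KiB, the standard block size
MAX_BLOCK_SIZE = 16 * 1024 * 1024  # 16 MiB, the largest allowed
TARGET_BLOCKS = 2048               # assumed upper target for blocks per file

def block_size(file_size: int) -> int:
    """Pick the smallest power-of-two block size that keeps the block
    count at or below the target, capped at MAX_BLOCK_SIZE."""
    size = MIN_BLOCK_SIZE
    while size < MAX_BLOCK_SIZE and file_size > TARGET_BLOCKS * size:
        size *= 2
    return size

# Files up to 256 MiB keep the default 128 KiB block size:
print(block_size(256 * 2**20) // 1024)   # 128
# The first step up kicks in just above 256 MiB:
print(block_size(300 * 2**20) // 1024)   # 256
# Very large files are capped at 16 MiB blocks:
print(block_size(100 * 2**30) // 2**20)  # 16
```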

(ellnic) #4

Thanks for clarifying. I really like this; it should improve my sync speeds greatly for certain folders 🙂

(Simon) #5

Very interested in this, and I’m (literally) dusting off a WD EX2 low-power NAS to see if it makes that device more capable as an affordable backup for video files.

One question (apologies if it’s a dumb one): do you keep the block signatures separate for different block sizes, or is there a (very) remote chance of a key collision between different-sized blocks?

(Jakob Borg) #6

There is no chance of key collision regardless of block size. There is no relevant difference between “no” and “very remote” in this case.
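For context, Syncthing identifies blocks by a SHA-256 hash of their contents, so keys for small and large blocks live in the same 256-bit space and a collision is cryptographically negligible. A minimal illustration (`block_hash` is a hypothetical helper, not Syncthing's API):

```python
import hashlib

def block_hash(block: bytes) -> str:
    # The key is computed over the block's contents only; a 128 KiB block
    # and a 16 MiB block are hashed into the same 256-bit space.
    return hashlib.sha256(block).hexdigest()

small = block_hash(b"x" * (128 * 1024))
large = block_hash(b"x" * (1024 * 1024))
print(small != large)  # True: different content yields different keys
```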

(Simon) #7

Running v0.14.48-rc.4, macOS (64 bit)

Clean setup with a 400 GB mixed media folder (SD card dumps: RAWs, videos, etc.)

I could not find a tick box in the GUI, so I edited:
/Users/simon/Library/Application Support/Syncthing/config.xml
and set `useLargeBlocks` to true (it was false) in the XML section of the folder in question (the GUI needs a tick box!)

During the initial scan, one CPU core is pegged at 100%, processing about 35 MB/s, which puts the scan at roughly 3 hours; however, it occasionally peaks closer to the drive maximum (260 MB/s), sustaining around 200 MB/s. (2.3 GHz Intel Core i7, 8 GB of RAM; Syncthing is using 120 MB of RAM.)

I have not set up a second node yet, so the scan speed is not contending with remote access (and vice versa).

Will report more shortly, including db size for 400GB of files.

The volume I gave up on using Syncthing with is around 7 TB, with things like disk images, so if this test goes well, hopefully I can use Syncthing for everything (I went back to rsync for that case).

For the second node, do I need to change the XML there, or will it pick it up from the device it is sharing it from?

Thank you for continuing to push Syncthing forward, where is the best place to make a contribution to the project?

(Simon) #8

The data scan rate is pretty spiky: I would say 35 MB/s for 70% of the time and 200 MB/s for the rest. I have no idea which files it is on during the faster stretches. The system is not doing anything else, and no other nodes are shared.

(Simon) #9

After scanning, the database was 163 MB.

Onto the nodes now. I could not get the web UI to connect on the WD EX2 for some reason; I’ll come back to that. So initially I’ll be using a Surface 3 (not Pro): a fanless 1.6 GHz 64-bit 2 GB tablet that I use as a photo frame (nice screen), with a USB 3 external spinning-disk drive (90 MB/s max) attached. I also attached wired network for the initial sync.

v0.14.48-rc.4, Windows (64 bit)

The first thing I noticed after adding the share is that `useLargeBlocks` was set to false by default; I changed it to true and restarted. How does the large block setting propagate between nodes?

Speed is variable, generally 20 MB/s to 30 MB/s, often dropping off entirely, with occasional peaks of no more than 50 MB/s. This is wired Gbit, connected via USB 3 to the Surface; for comparison, a machine-to-machine Windows share copy got around the drive maximum of 90 MB/s (with the Gbit network maxing out at 110 MB/s on a good day).

This is okay, and quite usable once seeded.

I will do a data integrity check and keep trying to get the WD NAS to work, although I am only expecting 5 MB/s from that very low-power device.

Edit – Actually, when I look at the node itself, it’s working the drive fairly hard, implying there may be re-used blocks it is copying locally rather than fetching again. It’s still not pinned, but it is often hitting 100% drive activity at around 80 MB/s. This is great performance.

(Simon) #10

A minor observation, nothing to do with large blocks: for duplicated blocks, it would be faster to have them resent over the LAN than to read and write them from a spinning drive in small chunks. Duplicated files brought the read/write speed down to 5 MB/s, compared to 80 MB/s for new chunks from the master node.

(Jakob Borg) #11

In fact, large blocks should help here in the long run, as you’ll get individual reads and writes in the megabyte range instead of smaller ones. But you’ll want copiers=1 on spinning disks, or concurrency will kill the performance.
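copiers is likewise an advanced per-folder setting in config.xml. A hedged sketch of where it would go (the folder id and path are placeholders, and other folder attributes and elements are omitted):

```xml
<folder id="abcde-12345" path="D:\backup">
    <!-- limit concurrent block-copying routines to 1 for spinning disks -->
    <copiers>1</copiers>
</folder>
```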

(Jakob Borg) #12