Disable data encryption

Let’s test a 1 GB video file from my phone:

    stat Video/VID_0.mp4 | grep Size
      Size: 1106723023  Blocks: 2161576  IO Block: 4096  regular file

    laptop(i7-3520M)# time sha256sum VID_0.mp4
    8c4b354dc38daa47207c8c5730b7e8909f223042c1bdba932f6718077599bd6f  VID_0.mp4

    real    0m4.376s
    user    0m4.250s
    sys     0m0.120s

    phone(Snapdragon 800)# time sha256sum VID_0.mp4
    8c4b354dc38daa47207c8c5730b7e8909f223042c1bdba932f6718077599bd6f  VID_0.mp4

    real    0m24.883s
    user    0m16.650s
    sys     0m1.950s
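(For reference, a minimal Go sketch of the same measurement: stream a file through SHA-256 and report the throughput. The file name is just the example from above.)

    // hashtime.go — rough Go equivalent of `time sha256sum FILE`.
    // Usage: go run hashtime.go Video/VID_0.mp4
    package main

    import (
        "crypto/sha256"
        "fmt"
        "io"
        "os"
        "time"
    )

    func main() {
        f, err := os.Open(os.Args[1])
        if err != nil {
            panic(err)
        }
        defer f.Close()

        h := sha256.New()
        start := time.Now()
        n, err := io.Copy(h, f) // stream the file through the hash
        if err != nil {
            panic(err)
        }
        elapsed := time.Since(start)
        fmt.Printf("%x  %s\n", h.Sum(nil), os.Args[1])
        fmt.Printf("%d bytes in %v (%.1f MB/s)\n", n, elapsed,
            float64(n)/elapsed.Seconds()/1e6)
    }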

Looks awesome to me.

Have you considered LZ4 for compression? Should be significantly faster than deflate.

That’s a good suggestion. I’ll give it a try.


Yep!

BenchmarkDeflateRandom	     300	   5306454 ns/op	  24.70 MB/s
--- BENCH: BenchmarkDeflateRandom
	crypto_test.go:90: In 131072, out 81986, f=0.625504
	crypto_test.go:90: In 13107200, out 13067723, f=0.996988
	crypto_test.go:90: In 39321600, out 39285128, f=0.999072
BenchmarkDeflateASCII	     200	   6483742 ns/op	  20.22 MB/s
--- BENCH: BenchmarkDeflateASCII
	crypto_test.go:90: In 131072, out 72128, f=0.550293
	crypto_test.go:90: In 13107200, out 11464640, f=0.874683
	crypto_test.go:90: In 26214400, out 22972608, f=0.876335
BenchmarkDeflateSparse	     300	   5027997 ns/op	  26.07 MB/s
--- BENCH: BenchmarkDeflateSparse
	crypto_test.go:90: In 131072, out 31936, f=0.243652
	crypto_test.go:90: In 13107200, out 5033472, f=0.384023
	crypto_test.go:90: In 39321600, out 15148480, f=0.385246
BenchmarkDeflateZeros	   10000	    215761 ns/op	 607.49 MB/s
--- BENCH: BenchmarkDeflateZeros
	crypto_test.go:90: In 131072, out 0, f=0.000000
	crypto_test.go:90: In 13107200, out 55296, f=0.004219
	crypto_test.go:90: In 1310720000, out 5718016, f=0.004363
BenchmarkLZ4Random	   30000	     45711 ns/op	2867.39 MB/s
--- BENCH: BenchmarkLZ4Random
	crypto_test.go:134: In 131072, out 131587, f=1.003929
	crypto_test.go:134: In 13107200, out 13158700, f=1.003929
	crypto_test.go:134: In 1310720000, out 1315870000, f=1.003929
	crypto_test.go:134: In 3932160000, out 3947610000, f=1.003929
BenchmarkLZ4ASCII	   30000	     40092 ns/op	3269.26 MB/s
--- BENCH: BenchmarkLZ4ASCII
	crypto_test.go:134: In 131072, out 131587, f=1.003929
	crypto_test.go:134: In 13107200, out 13158700, f=1.003929
	crypto_test.go:134: In 1310720000, out 1315870000, f=1.003929
	crypto_test.go:134: In 3932160000, out 3947610000, f=1.003929
BenchmarkLZ4Sparse	    5000	    350328 ns/op	 374.14 MB/s
--- BENCH: BenchmarkLZ4Sparse
	crypto_test.go:134: In 131072, out 95931, f=0.731895
	crypto_test.go:134: In 13107200, out 9593100, f=0.731895
	crypto_test.go:134: In 655360000, out 479655000, f=0.731895
BenchmarkLZ4Zeros	   50000	     34311 ns/op	3820.02 MB/s
--- BENCH: BenchmarkLZ4Zeros
	crypto_test.go:134: In 131072, out 524, f=0.003998
	crypto_test.go:134: In 13107200, out 52400, f=0.003998
	crypto_test.go:134: In 1310720000, out 5240000, f=0.003998
	crypto_test.go:134: In 6553600000, out 26200000, f=0.003998

LZ4 becomes essentially a no-op on the incompressible-ish data.
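For context, here is a rough sketch of what benchmarks like the ones above could look like. This is not the actual crypto_test.go: the 128 KiB block size is inferred from the “In 131072” lines, and the go-lz4 import is an assumption (it is the LZ4 package syncthing ended up adopting).

    // compress_bench_test.go — hypothetical reconstruction of the kind of
    // benchmark behind the output above. Run with: go test -bench .
    package main

    import (
        "bytes"
        "compress/flate"
        "crypto/rand"
        "testing"

        lz4 "github.com/bkaradzic/go-lz4" // assumption: the LZ4 package syncthing adopted
    )

    var block = make([]byte, 128<<10) // one 128 KiB block of random data

    func init() { rand.Read(block) }

    func BenchmarkDeflateRandom(b *testing.B) {
        b.SetBytes(int64(len(block)))
        var in, out int64
        for i := 0; i < b.N; i++ {
            var buf bytes.Buffer
            w, _ := flate.NewWriter(&buf, flate.BestSpeed)
            w.Write(block)
            w.Close()
            in += int64(len(block))
            out += int64(buf.Len())
        }
        b.Logf("In %d, out %d, f=%f", in, out, float64(out)/float64(in))
    }

    func BenchmarkLZ4Random(b *testing.B) {
        b.SetBytes(int64(len(block)))
        var in, out int64
        for i := 0; i < b.N; i++ {
            enc, err := lz4.Encode(nil, block)
            if err != nil {
                b.Fatal(err)
            }
            in += int64(len(block))
            out += int64(len(enc))
        }
        b.Logf("In %d, out %d, f=%f", in, out, float64(out)/float64(in))
    }

The ASCII, sparse and zeros variants would just fill the block with differently compressible data.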

I would suggest looking into AES coupled with hardware AES-NI acceleration to reduce that bottleneck; I’m pretty sure I remember reading somewhere that the Go crypto AES implementation includes it. BTW: thanks for all the work you’ve put into syncthing, it’s becoming a very viable alternative to btsync for most purposes.

LZ4 was introduced in v0.9.0-beta5, and is indeed a pretty big win. It has a lower compression ratio than deflate, but the net result is a much faster sync when the available bandwidth exceeds 30 Mbps or so.

Ooh, nice! Those benchmarks are wild.

Hello,

I went through the thread, and even if @calmh does not want to leave encryption behind for LAN transfers, I am still wondering how the encryption impacts the transfer speed. I see in the code:

    tlsCfg := &tls.Config{
        Certificates:           []tls.Certificate{cert},
        NextProtos:             []string{"bep/1.0"},
        ClientAuth:             tls.RequestClientCert,
        SessionTicketsDisabled: true,
        InsecureSkipVerify:     true,
        MinVersion:             tls.VersionTLS12,
        CipherSuites: []uint16{
            tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
            tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
            tls.TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,
            tls.TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,
            tls.TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,
            tls.TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,
        },
    }

How do you select TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 from this list?

By the way, I see in the Go documentation which cipher suite IDs are supported. The official list for the protocol also includes TLS_NULL_WITH_NULL_NULL and TLS_ECDHE_RSA_WITH_NULL_SHA. Do you have any idea whether these have been implemented in Go? I assume not, since searching for these keywords in the Go sources gives no results.

If you leave only that one suite, it should use only that.
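I.e., a sketch of that change against the config above (everything else unchanged):

    CipherSuites: []uint16{
        tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, // the only suite the handshake can negotiate
    },

The RC4 experiment discussed below swaps in TLS_RSA_WITH_RC4_128_SHA the same way.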

I don’t think null ciphers are implemented in Go. I was puzzled why at first, but I guess the rationale is that you can always drop down to plain TCP for the same effect. The problem for you is that syncthing relies on the TLS certificate exchange, so plain TCP is not an option here.

Also, I guess this discourages the use of insecure suites.

Probably the closest you can get to disabling crypto is replacing that entire list with TLS_RSA_WITH_RC4_128_SHA. That’s still good for hiding against some prying eyes, and should be a bit faster than AES.

I compiled Syncthing with only TLS_RSA_WITH_RC4_128_SHA in the list, and the results are pretty surprising. Here is my setup (both machines run Linux Mint 17 = Ubuntu 14.04):

  • a desktop, CPU Intel Core i3 (LAN)
  • a laptop, CPU Intel Core2 Duo (LAN), much less powerful than the desktop (it runs a 32-bit OS, and Syncthing only uses one core)

I’ve completely cleaned the ~/.config/syncthing folder on both machines and set everything up from scratch. The speed test uses two files: the first is ~1.3 GB and the second ~4.7 GB. These files are added on the desktop and transferred to the laptop.

In each case, I turn off Syncthing on the laptop, add the file on the desktop, and once it is hashed, I turn Syncthing back on on the laptop. That way I’m sure the CPU load comes only from the data transfer.

File 1, ~1.3 GB:
  Transfer speed: 30-35 MB/s (8 MB/s with AES)
  CPU load on the desktop (sending): 70-80 % of 4 cores
  CPU load on the laptop (receiving): 95-100 % of 1 core

I thought at first that the encryption algorithm was THE reason, but then I tested with another file.

File 2, ~4.7 GB:
  Transfer speed: 16 MB/s, quickly decreasing to an average of 7-8 MB/s (speed with AES unknown)
  CPU load on the desktop (sending): 95-100 % of 4 cores (?)
  CPU load on the laptop (receiving): 20-30 % of 1 core (?!?)

Damn, this is weird: the transfer speed depends on the file size, and the CPU load on the sending peer even more!

To be sure, I transferred a third file, ~1.5 GB. Result as expected: a transfer speed of about 30-35 MB/s. But then I noticed something on this transfer (CPU load on top, transfer speed at bottom):

At the level of the (beautiful) red arrow, the CPU load starts increasing while the transfer speed decreases. Unfortunately, the transfer finishes quite fast, and I cannot see whether the trend is confirmed over a longer run.

I am puzzled now: obviously, the encryption algorithm plays a role. Indeed, I reach a transfer speed of ~30-35 MB/s with RC4 while it is limited to 8 MB/s with AES-128. But there also seems to be something weird in Syncthing itself: why would the transfer speed depend on the file size? And why is the CPU load much higher on the sending peer, even though it is clearly more powerful than the receiving peer? Finally, why do I see a decreasing transfer speed while the CPU load increases on the sending node? @calmh, do you have an idea?

I’m guessing the increased load and slower transfer is purely because it takes longer to split the larger file into blocks?

I doubt it, since:

  • the split time is quite short; for example, on my system it takes 10 seconds to split a 1.3 GB file into 5 MB pieces.
  • I don’t expect an increasing CPU load with a larger file; I would expect something like split_time = file_size / system_speed, with system_speed a constant independent of the file size.

No idea. To summarize, “benchmarking is hard”. What I’d suggest, though, is letting the sender get everything under control (online, scanned, idle). Then start the receiver with syncthing -reset to get a completely clean slate; it’ll transfer the entire repo in one go. Do this many times, without changing your data, and average your results.

Also, it would be interesting to see the CPU profile on the sender, as I don’t see what could be consuming that much CPU; we are just using the standard TLS library, opening a file, reading a chunk, and writing it to the socket…

I’ll have a look at that, might take some time to gather the data.

However, it would be very helpful to have Syncthing display the transfer time, i.e. the duration between the beginning and the end of the synchronization, on the receiver. Any idea how that can be done?

I guess you can just track the completion API, or watch the event interface (the DownloadProgress event becoming non-empty marks the start, and it becoming empty again marks the finish).
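For example, here is a rough sketch of watching the event interface from Go; the port, endpoint and JSON shapes are assumptions based on the default REST API setup, so adjust them for your version:

    // synctime.go — time a sync by watching DownloadProgress events:
    // first non-empty event marks the start, the next empty one the finish.
    package main

    import (
        "encoding/json"
        "fmt"
        "net/http"
        "time"
    )

    type event struct {
        ID   int             `json:"id"`
        Time time.Time       `json:"time"`
        Type string          `json:"type"`
        Data json.RawMessage `json:"data"`
    }

    func main() {
        const apiKey = "..." // placeholder: copy yours from the GUI settings
        since := 0
        var start time.Time
        for {
            url := fmt.Sprintf("http://localhost:8384/rest/events?since=%d", since)
            req, err := http.NewRequest("GET", url, nil)
            if err != nil {
                panic(err)
            }
            req.Header.Set("X-API-Key", apiKey)
            resp, err := http.DefaultClient.Do(req) // long-polls until events arrive
            if err != nil {
                panic(err)
            }
            var evs []event
            if err := json.NewDecoder(resp.Body).Decode(&evs); err != nil {
                panic(err)
            }
            resp.Body.Close()
            for _, ev := range evs {
                since = ev.ID
                if ev.Type != "DownloadProgress" {
                    continue
                }
                empty := string(ev.Data) == "{}" || string(ev.Data) == "null"
                switch {
                case !empty && start.IsZero():
                    start = ev.Time // first non-empty event: transfer started
                case empty && !start.IsZero():
                    fmt.Println("transfer took", ev.Time.Sub(start))
                    return
                }
            }
        }
    }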

Hello,

I have gathered some statistics on transfer speeds.

Setup

Sender:   Desktop | CPU: Intel Core i3 530 @ 2.93 GHz | RAM: 4 GB | Linux Mint 17 64-bit
Receiver: Laptop  | CPU: Intel Core2 Duo T7100 @ 1.8 GHz | RAM: 2 GB | Linux Mint 13 32-bit
Syncthing: v10.11, standard version from GitHub

Files to transfer: 3 files containing random data (/dev/urandom) of size 100 MB, 1000 MB and 5000 MB

Methodology

  • Desktop and laptop are set up with one shared folder (default), compression disabled.

  • When the setup is complete, the receiver is turned off and its configuration folder is backed up.

  • Only one file is copied into the shared folder, then:

  1. The sender is turned on with the options STCPUPROFILE=1 and STTRACE="events".
  2. On the sender side, wait until Syncthing has finished "working".
  3. On the receiver side, remove any file in the shared folder (except .stfolder) and replace the configuration directory with the backup.
  4. The receiver is turned on with the options STCPUPROFILE=1 and STTRACE="events". Logs are kept for later analysis.
  5. When the transfer is finished, receiver and sender are turned off.
  6. Repeat several times for each file.

At the end of the test, the log files are analyzed with a specific tool (a Python script). The file size is known, and the transfer duration is obtained by subtracting the time of the ‘ItemStarted’ event from that of the ‘LocalIndexUpdated’ event. An average and a standard deviation are computed.
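For what it’s worth, the averaging step is straightforward; here is a sketch of the same computation in Go (the actual Python script is not shown in this thread):

    // stats.go — sketch of the averaging step: mean and sample standard
    // deviation over the per-run speeds, as in the tables below.
    package main

    import (
        "fmt"
        "math"
    )

    func main() {
        // Example: the per-run speeds for the 100 MB file (MBps).
        speeds := []float64{
            7.692, 7.692, 7.143, 7.143, 7.692, 7.692, 7.692,
            7.692, 7.143, 7.692, 7.143, 7.143, 7.692, 7.143,
        }
        var sum float64
        for _, s := range speeds {
            sum += s
        }
        mean := sum / float64(len(speeds))
        var sq float64
        for _, s := range speeds {
            sq += (s - mean) * (s - mean)
        }
        std := math.Sqrt(sq / float64(len(speeds)-1)) // sample std. dev. (n-1)
        fmt.Printf("Speed average: %.2f MBps  std. dev.: %.2f MBps\n", mean, std)
    }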

Results

The following tables give, for each file, the transfer speed of each run as well as the average and standard deviation.

File size: 100.0 MB

Run #   Speed (MBps)
1       7.692
2       7.692
3       7.143
4       7.143
5       7.692
6       7.692
7       7.692
8       7.692
9       7.143
10      7.692
11      7.143
12      7.143
13      7.692
14      7.143

Speed average: 7.46 MBps, std. dev.: 0.28 MBps

File size: 1000.0 MB

Run #   Speed (MBps)
1       7.407
2       7.463
3       7.407
4       7.463
5       7.353
6       7.246
7       7.463
8       7.463
9       7.519
10      7.407

Speed average: 7.42 MBps, std. dev.: 0.08 MBps

File size: 5000.0 MB

Run #   Speed (MBps)
1       5.605
2       5.695
3       5.828
4       5.734
5       5.118

Speed average: 5.60 MBps, std. dev.: 0.28 MBps

CPU profiles are provided for the sender and the receiver for every file (see pprof.tar.gz (3.3 MB)). For the 100 MB file, the sender CPU profiles of all runs are gathered in one single file, because an empty file was produced otherwise. In the other cases, the CPU profile is provided separately for each run.

Analysis

At my level, I can only note a drop in the transfer speed for the 5000 MB file. The scope of this test is limited to the standard version of Syncthing, but it would be interesting to compare with a modified version using the less resource-consuming RC4 algorithm. Indeed, preliminary tests have shown that the drop in transfer speed is more significant there (transfer speeds going from ~30-35 MBps for a 100 MB file to ~6-7 MBps for a 5000 MB file).

@calmh and @AudriusButkevicius: is this useful in any way? Any idea on extra info you might need?

Any feedback on these tests?

Sorry, I am on holiday till Jan, but I’ve bookmarked this.