Disable data encryption

Isn’t SHA-1 (~2-3 times faster than SHA-256) enough for this?

> CityHash appears to be very nearly as fast as a CRC-32 calculated using the Intel crc32 hardware instruction! I tested three CityHash routines and the Intel crc32 instruction on a 434 MB file. The crc32 instruction version (which computes a CRC-32C) took 24 ms of CPU time. CityHash64 took 55 ms, CityHash128 60 ms, and CityHashCrc128 50 ms. CityHashCrc128 makes use of the same hardware instruction, though it does not compute a CRC

Maybe. But it’s being questioned and is slowly being replaced by SHA-2 and friends. I’m not certain that we won’t appreciate the added security of a good hash, and give it a year or so and we’ll have computers twice as fast anyway, while hopefully still using the same protocol for many years to come.

Stop talking about checksumming algorithms and non-cryptographic hashes. That’s not what we do.

OK. Sorry for taking up your time.

Not at all! This gave a valuable result, namely highlighting how fucking slow the compression is! I had assumed that a light compression algorithm in its speed-over-compression mode would be a quick win.

That’ll be going away, unless I find something brutally wrong with my benchmarks.

Finally, an AES (128/256) test from my laptop:

openssl speed aes-128-cbc aes-256-cbc
...
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc     101993.28k   112789.08k   112856.23k   116360.19k   112377.86k
aes-256 cbc      75975.16k    80961.54k    80826.88k    82967.89k    83211.61k

Yep. Transfers are much faster without compression.

I had to stop after this, but one thing is now firmly lodged in my head: SHA-256 is used for the file block checksums. I need more sleep )

This is a tricky subject. :slight_smile: Let me see if I can summarize it correctly:

  • For the connection crypto, we use the Go TLS implementation and it chooses the strongest cipher combination it has. Currently that means TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, which in turn means that data is encrypted with AES-128, authenticated by GCM. The “SHA256” in the string is, as far as I understand, only used as a pseudo random number generator…

  • For block hashes, we do use SHA-256. We want to continue doing this, because it gives us the peace of mind of knowing that if we need a block with hash 1234, and we have a block with hash 1234 somewhere on disk, they are the same. It also doesn’t matter that much since it’s a one time cost per file to hash it, amortized over however long that file lives.

  • In v0.8 and earlier we used expensive compression. In v0.9 (current master) I’ve removed that. Large data files are probably already in a compressed format, where it matters.

Let’s test a 1 GB video file from my phone:

stat Video/VID_0.mp4 | grep Size
  Size: 1106723023  Blocks: 2161576  IO Block: 4096  regular file

laptop(i7-3520M)# time sha256sum VID_0.mp4
8c4b354dc38daa47207c8c5730b7e8909f223042c1bdba932f6718077599bd6f  VID_0.mp4

real	0m4.376s
user	0m4.250s
sys	0m0.120s

phone(Snapdragon 800)# time sha256sum VID_0.mp4
8c4b354dc38daa47207c8c5730b7e8909f223042c1bdba932f6718077599bd6f  VID_0.mp4

real	0m24.883s
user	0m16.650s
sys	0m1.950s

Looks awesome to me.

Have you considered LZ4 for compression? Should be significantly faster than deflate.

That’s a good suggestion. I’ll give it a try.


Yep!

BenchmarkDeflateRandom	     300	   5306454 ns/op	  24.70 MB/s
--- BENCH: BenchmarkDeflateRandom
	crypto_test.go:90: In 131072, out 81986, f=0.625504
	crypto_test.go:90: In 13107200, out 13067723, f=0.996988
	crypto_test.go:90: In 39321600, out 39285128, f=0.999072
BenchmarkDeflateASCII	     200	   6483742 ns/op	  20.22 MB/s
--- BENCH: BenchmarkDeflateASCII
	crypto_test.go:90: In 131072, out 72128, f=0.550293
	crypto_test.go:90: In 13107200, out 11464640, f=0.874683
	crypto_test.go:90: In 26214400, out 22972608, f=0.876335
BenchmarkDeflateSparse	     300	   5027997 ns/op	  26.07 MB/s
--- BENCH: BenchmarkDeflateSparse
	crypto_test.go:90: In 131072, out 31936, f=0.243652
	crypto_test.go:90: In 13107200, out 5033472, f=0.384023
	crypto_test.go:90: In 39321600, out 15148480, f=0.385246
BenchmarkDeflateZeros	   10000	    215761 ns/op	 607.49 MB/s
--- BENCH: BenchmarkDeflateZeros
	crypto_test.go:90: In 131072, out 0, f=0.000000
	crypto_test.go:90: In 13107200, out 55296, f=0.004219
	crypto_test.go:90: In 1310720000, out 5718016, f=0.004363
BenchmarkLZ4Random	   30000	     45711 ns/op	2867.39 MB/s
--- BENCH: BenchmarkLZ4Random
	crypto_test.go:134: In 131072, out 131587, f=1.003929
	crypto_test.go:134: In 13107200, out 13158700, f=1.003929
	crypto_test.go:134: In 1310720000, out 1315870000, f=1.003929
	crypto_test.go:134: In 3932160000, out 3947610000, f=1.003929
BenchmarkLZ4ASCII	   30000	     40092 ns/op	3269.26 MB/s
--- BENCH: BenchmarkLZ4ASCII
	crypto_test.go:134: In 131072, out 131587, f=1.003929
	crypto_test.go:134: In 13107200, out 13158700, f=1.003929
	crypto_test.go:134: In 1310720000, out 1315870000, f=1.003929
	crypto_test.go:134: In 3932160000, out 3947610000, f=1.003929
BenchmarkLZ4Sparse	    5000	    350328 ns/op	 374.14 MB/s
--- BENCH: BenchmarkLZ4Sparse
	crypto_test.go:134: In 131072, out 95931, f=0.731895
	crypto_test.go:134: In 13107200, out 9593100, f=0.731895
	crypto_test.go:134: In 655360000, out 479655000, f=0.731895
BenchmarkLZ4Zeros	   50000	     34311 ns/op	3820.02 MB/s
--- BENCH: BenchmarkLZ4Zeros
	crypto_test.go:134: In 131072, out 524, f=0.003998
	crypto_test.go:134: In 13107200, out 52400, f=0.003998
	crypto_test.go:134: In 1310720000, out 5240000, f=0.003998
	crypto_test.go:134: In 6553600000, out 26200000, f=0.003998

LZ4 becomes a no-op on the incompressible-ish data.

I would suggest looking into AES coupled with hardware AES-NI acceleration to reduce that bottleneck; I’m pretty sure I remember reading somewhere that the Go crypto AES implementation includes it. BTW: thanks for all the work you’ve put into syncthing, it’s becoming a very viable alternative to btsync for most purposes.

LZ4 is introduced in v0.9.0-beta5, and is indeed a pretty big win. Lower compression ratio than deflate, but the net result is a much faster sync when the available bandwidth exceeds 30 Mbps or so.

Ooh, nice! Those benchmarks are wild

Hello,

I went through the thread, and even if @calmh does not want to leave encryption out for LAN transfers, I am still wondering how the encryption impacts the transfer speed. I see in the code:

tlsCfg := &tls.Config{
	Certificates:           []tls.Certificate{cert},
	NextProtos:             []string{"bep/1.0"},
	ClientAuth:             tls.RequestClientCert,
	SessionTicketsDisabled: true,
	InsecureSkipVerify:     true,
	MinVersion:             tls.VersionTLS12,
	CipherSuites: []uint16{
		tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
		tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
		tls.TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,
		tls.TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,
		tls.TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,
		tls.TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,
	},
}

How do you select TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 from this list?

By the way, the Go documentation shows which cipher suite IDs are supported. The official list also includes TLS_NULL_WITH_NULL_NULL and TLS_ECDHE_RSA_WITH_NULL_SHA. Do you have any idea whether these have been implemented in Go? I assume not, since searching for these keywords in the Go sources gives no results.

If you leave only that one suite, it should use only that.

I don’t think null ciphers are implemented in Go, and I guess that’s because you can always drop down to plain TCP for the same effect. The catch is that syncthing relies on the certificate exchange, which would be your problem.

Also, I guess this is to discourage use of insecure suites.

Probably the closest you can get to disabling crypto is replacing that entire list with TLS_RSA_WITH_RC4_128_SHA. That’s still good enough to hide from some prying eyes, and should be a bit faster than AES.

I compiled Syncthing with TLS_RSA_WITH_RC4_128_SHA only in the list, and the results are pretty surprising. Here is my setup (both are Linux Mint 17 = Ubuntu 14.04):

  • a desktop, CPU Intel Core i3 (LAN)
  • a laptop, CPU Intel Core2Duo (LAN), much less powerful than the desktop since it runs a 32-bit OS and Syncthing only uses one core

I completely cleaned the ~/.config/syncthing folder on both machines and did a setup from scratch. The speed test is done with two files, the first ~1.3 GB and the second ~4.7 GB. These files are added on the desktop and transferred to the laptop.

In each case, I turn off Syncthing on the laptop, add the file on the desktop, and once it is hashed, I turn on the laptop. So I’m sure the CPU load comes only from the data transfer.

File 1, ~1.3 GB.
Transfer speed: 30-35 MB/s (8 MB/s with AES)
CPU load on the desktop (sending): 70-80 % of 4 cores
CPU load on the laptop (receiving): 95-100 % of 1 core

I thought at first that the encryption algorithm was THE reason, but then I tested with another file.

File 2, ~4.7 GB.
Transfer speed: 16 MB/s, quickly decreasing to an average of 7-8 MB/s (speed with AES unknown)
CPU load on the desktop (sending): 95-100 % of 4 cores (?)
CPU load on the laptop (receiving): 20-30 % of 1 core (?!?)

Damn, this is weird: the transfer speed depends on the file size, and the CPU load on the sending peer even more so!

To be sure, I transferred a third file, ~1.5 GB. Result as expected: a transfer speed of about 30-35 MB/s. But then I noticed something on this transfer (CPU load on top, transfer speed at bottom):

At the level of the (beautiful) red arrow, the CPU load starts increasing while the transfer speed decreases. Unfortunately, the transfer is quite fast, and I cannot see whether the trend is confirmed over the long term.

I am puzzled now: obviously, the encryption algorithm plays a role. Indeed, I reach a transfer speed of ~30-35 MB/s with RC4 while it is limited to 8 MB/s with AES-128. But there seems to be something weird in Syncthing itself: why would the transfer speed depend on the file size? And why is the CPU load much higher on the sending peer, while it is clearly more powerful than the receiving peer? Finally, why do I see a decreasing transfer speed while the CPU load increases on the sending node? @calmh, do you have an idea?