Disable data encryption

calmh · December 18, 2014, 2:39pm

I’ve added and tweaked a benchmark run to the integration tests, that transfers the exact same set of files each time and measures the time it takes. That could be used. But I’m not really disputing that running without crypto (or with weak crypto) is faster than doing real crypto…

AudriusButkevicius · December 18, 2014, 3:05pm

The claim of different file sizes having different speeds is the one thst intrigues me. Plus it’s interesting to see where the bottlenecks are in general in his case.

AudriusButkevicius · December 29, 2014, 7:51pm

Receiver: 100MB all files empty - no data 1GB:

(pprof) top
81.35s of 81.74s total (99.52%)
Dropped 17 nodes (cum <= 0.41s)
Showing top 10 nodes out of 24 (cum >= 1.43s)
      flat  flat%   sum%        cum   cum%
    50.56s 61.85% 61.85%     50.56s 61.85%  crypto/aes.encryptBlockGo
    26.09s 31.92% 93.77%     26.09s 31.92%  crypto/cipher.(*gcm).mul
     2.50s  3.06% 96.83%      2.50s  3.06%  crypto/cipher.getUint64
     1.25s  1.53% 98.36%      1.25s  1.53%  scanblock
     0.45s  0.55% 98.91%      0.45s  0.55%  crypto/cipher.fastXORWords
     0.41s   0.5% 99.41%        29s 35.48%  crypto/cipher.(*gcm).updateBlocks
     0.05s 0.061% 99.47%     50.64s 61.95%  crypto/aes.(*aesCipher).Encrypt
     0.03s 0.037% 99.51%     50.59s 61.89%  crypto/aes.encryptBlock
     0.01s 0.012% 99.52%     51.10s 62.52%  crypto/cipher.(*gcm).counterCrypt
         0     0% 99.52%      1.43s  1.75%  GC

5GB:

447.07s of 456.51s total (97.93%)
Dropped 392 nodes (cum <= 2.28s)
Showing top 10 nodes out of 32 (cum >= 274.18s)
      flat  flat%   sum%        cum   cum%
   270.42s 59.24% 59.24%    270.42s 59.24%  crypto/aes.encryptBlockGo
   142.48s 31.21% 90.45%    142.48s 31.21%  crypto/cipher.(*gcm).mul
    13.33s  2.92% 93.37%     13.33s  2.92%  crypto/cipher.getUint64
     7.99s  1.75% 95.12%      7.99s  1.75%  scanblock
     5.92s  1.30% 96.41%      5.92s  1.30%  crypto/sha256.block
     2.88s  0.63% 97.04%    158.60s 34.74%  crypto/cipher.(*gcm).updateBlocks
     2.67s  0.58% 97.63%      2.67s  0.58%  crypto/cipher.fastXORWords
     0.71s  0.16% 97.79%    271.13s 59.39%  crypto/aes.encryptBlock
     0.48s  0.11% 97.89%    271.61s 59.50%  crypto/aes.(*aesCipher).Encrypt
     0.19s 0.042% 97.93%    274.18s 60.06%  crypto/cipher.(*gcm).counterCrypt

Sender: 100MB:

47.57s of 51.07s total (93.15%)
Dropped 133 nodes (cum <= 0.26s)
Showing top 10 nodes out of 36 (cum >= 0.90s)
      flat  flat%   sum%        cum   cum%
    22.53s 44.12% 44.12%     22.53s 44.12%  crypto/aes.encryptBlockG
    17.59s 34.44% 78.56%     17.59s 34.44%  crypto/cipher.(*gcm).mul
     2.16s  4.23% 82.79%      2.16s  4.23%  crypto/cipher.getUint64
     1.52s  2.98% 85.76%     21.27s 41.65%  crypto/cipher.(*gcm).upd
     0.96s  1.88% 87.64%      0.96s  1.88%  scanblock
     0.74s  1.45% 89.09%      0.74s  1.45%  runtime.MSpan_Sweep
     0.73s  1.43% 90.52%     23.26s 45.55%  crypto/aes.encryptBlock
     0.67s  1.31% 91.83%      0.67s  1.31%  crypto/cipher.fastXORWor
     0.44s  0.86% 92.70%      0.44s  0.86%  crypto/cipher.gcmInc32
     0.23s  0.45% 93.15%      0.90s  1.76%  crypto/cipher.xorWords

1GB:

41.38s of 52.52s total (78.79%)
Dropped 133 nodes (cum <= 0.26s)
Showing top 10 nodes out of 46 (cum >= 1.45s)
      flat  flat%   sum%        cum   cum%
    15.88s 30.24% 30.24%     15.88s 30.24%  crypto/aes.encryptBlockGo
    10.25s 19.52% 49.75%     10.25s 19.52%  crypto/cipher.(*gcm).mul
     2.64s  5.03% 54.78%      6.24s 11.88%  github.com/calmh/xdr.(*Reader).ReadUint32
     2.19s  4.17% 58.95%      2.19s  4.17%  scanblock
     2.03s  3.87% 62.81%      3.45s  6.57%  io.ReadAtLeast
     1.97s  3.75% 66.57%      1.99s  3.79%  runtime.mallocgc
     1.82s  3.47% 70.03%      9.68s 18.43%  github.com/calmh/xdr.(*Reader).ReadBytesMaxInto
     1.62s  3.08% 73.12%      1.62s  3.08%  runtime.MSpan_Sweep
     1.54s  2.93% 76.05%      4.99s  9.50%  io.ReadFull
     1.44s  2.74% 78.79%      1.45s  2.76%  crypto/cipher.getUint64

5GB:

837.92s of 1130.55s total (74.12%)
Dropped 419 nodes (cum <= 5.65s)
Showing top 10 nodes out of 59 (cum >= 58.49s)
      flat  flat%   sum%        cum   cum%
   163.01s 14.42% 14.42%    163.01s 14.42%  scanblock
   116.94s 10.34% 24.76%    116.94s 10.34%  crypto/aes.encryptBlockGo
    90.28s  7.99% 32.75%    228.26s 20.19%  github.com/calmh/xdr.(*Reader).ReadUint32
    83.94s  7.42% 40.17%    142.93s 12.64%  io.ReadAtLeast
    78.10s  6.91% 47.08%     78.72s  6.96%  runtime.mallocgc
    71.81s  6.35% 53.43%     71.81s  6.35%  crypto/cipher.(*gcm).mul
    64.48s  5.70% 59.14%     64.48s  5.70%  runtime.MSpan_Sweep
    62.50s  5.53% 64.66%    338.13s 29.91%  github.com/calmh/xdr.(*Reader).ReadBytesMaxInto
    54.64s  4.83% 69.50%    197.58s 17.48%  io.ReadFull
    52.22s  4.62% 74.12%     58.49s  5.17%  bytes.(*Reader).Read

I am not sure if the profiles are of multiple runs or of a single run (since the profiler produces multiple files during the runtime), I assumed it’s a single run and checked only the last files.

AudriusButkevicius · December 29, 2014, 7:54pm

It seems that we are very inefficient with memory management on the sender with larger files causing load with memory allocations which is now: https://github.com/syncthing/syncthing/issues/1157

bilz · December 31, 2014, 2:39pm

Thanks for testing

cydron · February 27, 2015, 12:34am

I’m not worried about the crypto… we can switch to faster algorithms if we need to like EC-DH and ChaCha20 , which guarantee PFS and are at least 2x faster than AES etc. Or better crypto libraries. The crypto is only gonna be about 3% to 15% of the overhead, but probably closer to 5% with AES hardware acceleration.

The only latency is during keypair generation and session key agreement, not transfer. So I don’t think the crypto is slowing transfers too much.

The slow speeds are a few things… (1) No parallel transfer / saturation of bandwidth to optimum … (2) Lack of efficient multithreading , (3) Lack of a dedicated & distributed cache for writes() , (4) Lack of file locks and signalling, (5) Perhaps we need to check UDP as well, as it’s generalyl faster.

Oh also, the block size needs to be (A) configurable, (B) block id not random, should be ‘rarest first’, © other nodes should share blocks, and (D) we need filesystem locking during changes.

So there’s alot to do… most of it involves testing.