Faster hashing package

AES instructions don’t help with computing SHA256 unfortunately.

Old, slow, 64-bit Atom N450: improved from 49 MB/s to 59 MB/s :slight_smile: 32-bit ARM stays exactly the same. I can test on an old AMD 386 in a few weeks; I currently have no access to that device (I don’t use it for Syncthing and it’s at my parents’ house)…

185 vs 135 on my old i5

On my laptop, it’s 372 vs 412 (i7-6700HQ).

No difference on my old Intel P9600 laptop (~130 MB/s)

Synology NAS DS415+, Processor Intel Atom C2538 (4C/4T Silvermont x86 Cores @ 2.40 GHz)

sha256Perf increase from 68.07 to 91.02 :slight_smile:

Hi guys, I really appreciate that you considered the simd package. For people who don’t dig through pull requests: it has been integrated in PR 3581.

Thanks @AudriusButkevicius and @calmh!

Not sure if this is the right version to test this (I used the link above)

syncthing v0.14.6+2-g5541868 “Dysprosium Dragonfly” (go1.7 windows-386) jenkins@build.syncthing.net 2016-09-06 10:15:04 UTC

=> Atom N2800 @1.86GHz: Single thread hash performance is ~13 MB/s, which is scarily less than the ‘official’ 0.14.6 version (~41 MB/s)

This makes me think of the distributed.net client (many years ago, not sure how their software works nowadays), which had several optimized versions inside the .exe and would select the most appropriate one based on a combination of CPU identification and some quick test runs. You could run a ‘performance test’ that would compare all versions, and then you could manually select a specific one if you wanted; but generally speaking the automatically selected one was usually the best choice. Maybe Syncthing could have something similar? Someday =)

That benchmark is concerning. According to the documentation of the sha256-simd repo, it picks the fastest one your CPU can do, already. Maybe the test can be run again?

(To be clear, one bad test isn’t reason to revert since it gave solid improvements to everyone else. Might just be a bug or other issue)

Since the hashing speed test is pretty short, and will be affected by things like what else your CPU is doing, I reckon that’s within the margin of error. Nothing to be worried about… Running it a few more times will probably make the difference disappear on average.
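A minimal Go sketch of that idea, assuming nothing about how Syncthing's own startup test is written: hash a fixed buffer for a short interval, repeat a few times, and look at the median instead of a single run. The names `hashRate` and `median`, the 128 KiB buffer, and the number of runs are all illustrative choices, not anything taken from Syncthing.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"time"
)

// hashRate hashes a fixed buffer repeatedly for roughly d and
// returns the observed throughput in MB/s.
func hashRate(d time.Duration) float64 {
	buf := make([]byte, 128<<10) // 128 KiB block; an arbitrary choice
	start := time.Now()
	var n int
	for time.Since(start) < d {
		sha256.Sum256(buf)
		n += len(buf)
	}
	return float64(n) / 1e6 / time.Since(start).Seconds()
}

// median returns the middle value of xs, or the mean of the two
// middle values when len(xs) is even. It returns 0 for an empty slice.
func median(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	s := append([]float64(nil), xs...) // copy so the caller's slice is untouched
	sort.Float64s(s)
	m := len(s) / 2
	if len(s)%2 == 1 {
		return s[m]
	}
	return (s[m-1] + s[m]) / 2
}

func main() {
	runs := make([]float64, 5)
	for i := range runs {
		runs[i] = hashRate(200 * time.Millisecond)
	}
	fmt.Printf("runs: %.0f MB/s, median: %.0f MB/s\n", runs, median(runs))
}
```

If the individual runs are close to each other, as in a series like 13, 13, 12, the difference from another build is probably real and not just noise.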

Keep in mind that the OS scheduler and CPU frequency scaling can affect performance measurements significantly. @calmh, @AudriusButkevicius, wouldn’t it be a good idea to have a benchmark where people can submit their measurements together with HW/CPU info?

Maybe. But I don’t see that there should be a downside to this; the code for non-improved platforms is the same that is running today.

I don’t mind retrying, but I did start the .exe 3 times and it gave me 13, 13, 12 as results. The official version gave me 42, 41, 41.

Anyway, starting over a couple of times I get:

C:\temp\syncthing-windows-386-v0.14.6+2-g5541868\syncthing-windows-386-v0.14.6+2-g5541868>syncthing.exe -verbose
[JSUAT] 10:53:56 INFO: Single thread hash performance is ~13 MB/s
[JSUAT] 10:54:02 INFO: Single thread hash performance is ~11 MB/s
[JSUAT] 10:54:08 INFO: Single thread hash performance is ~13 MB/s
[JSUAT] 10:54:12 INFO: Single thread hash performance is ~13 MB/s
[JSUAT] 10:54:17 INFO: Single thread hash performance is ~10 MB/s
[JSUAT] 10:54:22 INFO: Single thread hash performance is ~11 MB/s
[noticed MsMpEng.exe had kicked in (virusscanner), not sure why but waited for it to finish before continuing]
[JSUAT] 10:58:19 INFO: Single thread hash performance is ~13 MB/s
[JSUAT] 10:58:25 INFO: Single thread hash performance is ~12 MB/s

I simply started the .exe from the command line (-verbose didn’t tell me anything relevant, let me know if I should add other options), let it show a couple of log lines (the test is pretty much at the top, so it takes only a couple of seconds), and then killed it with Ctrl-C. I then waited a couple of seconds and repeated the operation.

I had a look at some log files and the ‘normal’ version consistently gives me 40+ MB/s.

Feel free to throw me some homework if you want. I’ll try to do the same test on my Atom N230 later today to see how that one fares; the ‘normal’ code does low tens there IIRC, I’m curious how/if that might be affected.

Is there a Raspberry Pi 3 (64-bit CPU) owner here who could run some benchmarks? Thanks! :slight_smile:

Well, I guess the new code made it into the official release?

[JSUAT] 10:42:24 INFO: syncthing v0.14.7 “Dysprosium Dragonfly” (go1.7.1 windows-386) jenkins@build.syncthing.net 2016-09-18 19:02:42 UTC
[JSUAT] 10:42:25 INFO: Single thread hash performance is ~13 MB/s

(Intel Atom N2800 @ 1.86GHz, Win10 32bit, 4 GB Ram)

It’s no disaster, but the ~41 MB/s wasn’t overly speedy to start with …

Anyway, since the code is now ‘out’ I had a look at my other little machine and there I have

[YBGGA] 10:58:46 INFO: syncthing v0.14.7 “Dysprosium Dragonfly” (go1.7.1 windows-386) jenkins@build.syncthing.net 2016-09-18 19:02:42 UTC
[YBGGA] 10:58:46 INFO: Single thread hash performance is ~11 MB/s

(Intel Atom 230 @ 1.6GHz, Win Server 2008, 32bit, 2GB Ram)

Curious there doesn’t seem to be any effect on that one.

Cu Roby

In next release (you can try the dev build now, linked at the top of the page) we autodetect which method is the fastest and use that.
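For illustration only, here is a rough Go sketch of what such autodetection could look like; it is not the code that went into Syncthing. Each candidate is timed briefly and the one with the highest throughput is selected. The names `impl`, `rate`, `measureAll`, `pickFastest` and `defaultCandidates` are made up for this example. The second candidate is a stand-in from the standard library (`sha512.New512_256`, which is a different hash) so that the sketch runs without extra dependencies; in a real comparison it would be replaced by the alternative SHA-256 implementation, for example the `New` function of minio/sha256-simd, which is assumed here to return a `hash.Hash`.

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"fmt"
	"hash"
	"time"
)

// impl is one hash implementation under test (hypothetical type).
type impl struct {
	name    string
	newHash func() hash.Hash
}

// rate feeds a fixed buffer into a fresh hash for roughly d and
// returns the observed throughput in MB/s.
func rate(newHash func() hash.Hash, d time.Duration) float64 {
	buf := make([]byte, 128<<10) // 128 KiB block; an arbitrary choice
	h := newHash()
	start := time.Now()
	var n int
	for time.Since(start) < d {
		h.Write(buf) // hash.Hash.Write never returns an error
		n += len(buf)
	}
	h.Sum(nil)
	return float64(n) / 1e6 / time.Since(start).Seconds()
}

// measureAll times every candidate and returns name -> MB/s.
func measureAll(cands []impl, d time.Duration) map[string]float64 {
	rates := make(map[string]float64, len(cands))
	for _, c := range cands {
		rates[c.name] = rate(c.newHash, d)
	}
	return rates
}

// pickFastest returns the name with the highest rate. Ties go to the
// lexicographically smaller name; an empty map yields "".
func pickFastest(rates map[string]float64) string {
	best := ""
	for name, r := range rates {
		if best == "" || r > rates[best] || (r == rates[best] && name < best) {
			best = name
		}
	}
	return best
}

// defaultCandidates lists the implementations to compare.
func defaultCandidates() []impl {
	return []impl{
		{"crypto/sha256", sha256.New},
		// ASSUMPTION: stand-in only. Swap in the alternative SHA-256
		// implementation here when comparing real candidates.
		{"stand-in (sha512/256)", sha512.New512_256},
	}
}

func main() {
	rates := measureAll(defaultCandidates(), 100*time.Millisecond)
	best := pickFastest(rates)
	fmt.Printf("using %s (%.0f MB/s)\n", best, rates[best])
}
```

Keeping the selection in a pure function like `pickFastest` makes it easy to check against known numbers, such as 41 MB/s versus 13 MB/s on an Atom N2800.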

(sorry for the slow replies… so much to do, so little time…)

Atom N2800 before the update:

[JSUAT] 11:33:17 INFO: Single thread hash performance is ~11 MB/s

Atom N2800 after the update:

[JSUAT] 11:36:09 INFO: Single thread hash performance is 41 MB/s using crypto/sha256 (13 MB/s using minio/sha256-simd).

Thank you very much! Roby

PS: (FYI) Atom N230 before the update:

[YBGGA] 11:40:00 INFO: Single thread hash performance is ~11 MB/s

Atom N230 after the update:

[YBGGA] 11:45:28 INFO: Single thread hash performance is 27 MB/s using crypto/sha256 (10 MB/s using minio/sha256-simd).

Computers are weird… I’ve never seen that one go over 11 MB/s before!? So I tried again a bit later:

[YBGGA] 13:17:04 INFO: Single thread hash performance is 35 MB/s using crypto/sha256 (11 MB/s using minio/sha256-simd).

Keep in mind that modern CPUs scale their frequency dynamically, so every measurement can be different when scaling is enabled. Under Linux this can be controlled: https://wiki.archlinux.org/index.php/CPU_frequency_scaling

It seems that on most machines it improved the speed a little. I looked in the BEPv1 spec but could not find any reference to SHA256. Is there any room for adding another hashing algorithm, like the much faster Blake2s (https://godoc.org/golang.org/x/crypto/blake2s), and testing out the difference?

There is, and we did at some point try it out. It does improve scanning speed a little (or lower CPU usage, depending), but it had no real effect on the sync speed - which is the thing that matters most of the time.