Looking for benchmarking volunteers

I am looking into writing some benchmarks to compare different hashing algorithms which could be used for Pulse/Syncthing.

I only own a single type of device in my household, so it would be hard to judge the overall effect if I were to propose a change.

Ideally I’d be looking for people with ARM boards of different types, phones, Pi’s, and even smaller devices which might not even have SIMD support, NAS devices, old CPUs of various configurations (no recent SSE, 32bit single core etc).

If you are willing to donate some of your time and CPU cycles for the sake of science, drop a reply listing the devices you have available.

I can offer:
- Windows 7 64 bit with an Intel i5-3570K @ 3.40 GHz, 4 cores, 8 GB RAM
- Android 4.3 with ARM Cortex-A9 @ 1.4 GHz quad-core, 2 GB RAM, rooted
- Synology NAS (kind of Linux) with Marvell Kirkwood mv6281 ARM @ 800 MHz, 1 core, 128 MB RAM
- Synology NAS (kind of Linux) with Marvell Armada XP armv7l @ 1.33 GHz, 2 cores, 1 GB RAM + hardware encryption

I cannot code well enough, so I would like to contribute where I can.

The benchmarking tool would be provided by me.

I can also offer multiple devices:
- Raspberry Pi
- Netbook with Atom N450
- Laptop with some old dual core
- Desktop with Xeon E3 1230 V2

Everything except the Pi runs Linux and Windows. I also have 2 Android devices; if it’s not too hard to run the benchmark on them, I could offer them for testing as well.

Sony Vaio VGN-S260 Intel® Pentium® M Processor 735 (1.70GHz, 2MB L2 cache)

So I’ve implemented a basic benchmarker, though I haven’t yet built the binaries. Those who know what they are doing can already run it, it’s available at:

Short instructions on how to run it are in the README.

Post the CPU type along with the results.

I compiled it because I don’t have Go installed on all my PCs, especially the Windows ones. Here are the first results:

Xeon E3 1230 V2 (windows 8.1 64bit)

Blake2s 256 Modified        2000       1382460 ns/op      94.81 MB/s
Skein 256        2000       1394935 ns/op      93.96 MB/s
Skein 1024        1000       2807835 ns/op      46.68 MB/s
Blake2s 256        1000       1673115 ns/op      78.34 MB/s
Blake2b 512 Modified        2000       1170794 ns/op     111.95 MB/s
Blake2b 256        2000       1204255 ns/op     108.84 MB/s
Blake2b 512        2000       1205334 ns/op     108.74 MB/s
Blake2b 256 Modified        2000       1200307 ns/op     109.20 MB/s
Skein 512        1000       1509083 ns/op      86.86 MB/s
SHA256        5000        638031 ns/op     205.43 MB/s
SHA512        5000        394460 ns/op     332.28 MB/s

Atom N450 (Ubuntu 14.04 64bit)

SHA512        1000       2201983 ns/op      59.52 MB/s
Blake2b 256         200       9284057 ns/op      14.12 MB/s
Blake2b 512 Modified         200       9046488 ns/op      14.49 MB/s
Blake2s 256 Modified         500       6348686 ns/op      20.65 MB/s
Skein 512         200       9230181 ns/op      14.20 MB/s
Skein 1024         100      21644610 ns/op       6.06 MB/s
SHA256         500       3954249 ns/op      33.15 MB/s
Blake2b 512         200       9954141 ns/op      13.17 MB/s
Blake2s 256         500       7016383 ns/op      18.68 MB/s
Blake2b 256 Modified         200       9156908 ns/op      14.31 MB/s
Skein 256         200       7358770 ns/op      17.81 MB/s

Raspberry Pi (Raspbian)

Skein 256          10     137660410 ns/op       0.95 MB/s
Skein 1024           5     484293717 ns/op       0.27 MB/s
SHA256         100      22344678 ns/op       5.87 MB/s
Blake2b 512          10     151706113 ns/op       0.86 MB/s
Blake2s 256          20      74565693 ns/op       1.76 MB/s
Blake2b 256 Modified          10     216545381 ns/op       0.61 MB/s
Skein 512          10     211516223 ns/op       0.62 MB/s
SHA512          50      38009926 ns/op       3.45 MB/s
Blake2b 256          10     151516819 ns/op       0.87 MB/s
Blake2b 512 Modified          10     218767018 ns/op       0.60 MB/s
Blake2s 256 Modified          50      54956267 ns/op       2.39 MB/s

Core i7, 2.3 GHz (Mac)

testing: warning: no tests to run
PASS
Blake2b 512 Modified	    2000	   1398582 ns/op	  93.72 MB/s
Skein 512	    1000	   1757098 ns/op	  74.60 MB/s
Skein 1024	     500	   3216343 ns/op	  40.75 MB/s
Blake2b 256	    2000	   1390337 ns/op	  94.27 MB/s
Blake2b 512	    2000	   1484393 ns/op	  88.30 MB/s
Blake2b 256 Modified	    1000	   1423988 ns/op	  92.05 MB/s
Blake2s 256 Modified	    1000	   1615558 ns/op	  81.13 MB/s
Skein 256	    1000	   1646161 ns/op	  79.62 MB/s
SHA256	    2000	    740704 ns/op	 176.96 MB/s
SHA512	    5000	    470788 ns/op	 278.41 MB/s
Blake2s 256	    1000	   1981872 ns/op	  66.14 MB/s

Old Xeon E5540, 2.53 GHz (Solaris)

testing: warning: no tests to run
PASS
SHA512      3000            579992 ns/op         225.99 MB/s
Blake2b 256         1000           1962913 ns/op          66.77 MB/s
Blake2b 512 Modified        1000           1927548 ns/op          68.00 MB/s
Blake2s 256 Modified         500           2471297 ns/op          53.04 MB/s
Skein 512            500           2631373 ns/op          49.81 MB/s
Skein 1024           500           3185699 ns/op          41.14 MB/s
SHA256      2000            904403 ns/op         144.93 MB/s
Blake2b 512         1000           1963127 ns/op          66.77 MB/s
Blake2s 256          500           2991994 ns/op          43.81 MB/s
Blake2b 256 Modified        1000           1928574 ns/op          67.96 MB/s
Skein 256           1000           2087819 ns/op          62.78 MB/s

SHA512 is looking nice everywhere except on the Raspberry Pi, and even there it is not significantly worse…

I have now compiled a binary for all platforms:

Blake2p should be the best on all platforms, but the Go 1.3 compiler completely messed up the optimizations, making it 300% slower.

Core i5-2520M @ 2 * 2.5 GHz:

testing: warning: no tests to run
PASS
Blake2s 256 Modified        1000           1776909 ns/op          73.76 MB/s
Skein 512           1000           1912932 ns/op          68.52 MB/s
SHA256      2000            966303 ns/op         135.64 MB/s
SHA512      2000            650358 ns/op         201.54 MB/s
Blake2b 256         1000           1835415 ns/op          71.41 MB/s
Blake2b 512         1000           1847027 ns/op          70.96 MB/s
Blake2s 256         1000           2342659 ns/op          55.95 MB/s
Blake2b 256 Modified        1000           1772842 ns/op          73.93 MB/s
Blake2b 512 Modified        1000           1586663 ns/op          82.61 MB/s
Skein 256           1000           2226575 ns/op          58.87 MB/s
Skein 1024           500           3832765 ns/op          34.20 MB/s

Nexus 5, Snapdragon 800 @ 4 * 2.26 GHz (this is ARMv7, I suppose it might be limited by your ARMv5 binary, though I don’t know enough about this).

testing: warning: no tests to run
PASS
SHA256       500           3869984 ns/op          33.87 MB/s
Blake2b 512          100          13774776 ns/op           9.52 MB/s
Blake2s 256          100          10839948 ns/op          12.09 MB/s
Blake2b 256 Modified         100          13472478 ns/op           9.73 MB/s
Skein 256            100          20471165 ns/op           6.40 MB/s
Skein 1024           100          28024591 ns/op           4.68 MB/s
SHA512       200           8558778 ns/op          15.31 MB/s
Blake2b 256          100          13106796 ns/op          10.00 MB/s
Blake2b 512 Modified         100          13689727 ns/op           9.57 MB/s
Blake2s 256 Modified         200           8562868 ns/op          15.31 MB/s
Skein 512            100          19392473 ns/op           6.76 MB/s

For anyone else trying to run on Android, you have to copy it somewhere on system memory (sdcard does not work), then run chmod 755 *file* and execute.

I spoke with the person who worked on the Blake2 port for Go, and it seems that the current SHA256 implementation is written in assembly on x86 and in plain Go on ARM (though he suggested it could be done in assembly there too, as most of the required instructions are available).

If Blake2 was implemented in ASM, it would most likely work for both ARM and x86 and would outperform SHA256.

Also, some very interesting benchmarks for most platforms (warning, large page): http://bench.cr.yp.to/results-sha3.html

AMD Athlon™ XP 1700+ 1,46GHz running Ubuntu 9.10 (Linux 2.6.31):

SHA256	    1000	   2484703 ns/op	  52.75 MB/s
Blake2b 512	     100	  11329656 ns/op	  11.57 MB/s
Blake2s 256	     100	  11861114 ns/op	  11.05 MB/s
Blake2b 256 Modified	     100	  11360263 ns/op	  11.54 MB/s
Skein 256	     100	  12099396 ns/op	  10.83 MB/s
Skein 1024	      50	  46107154 ns/op	   2.84 MB/s
SHA512	     100	  14916757 ns/op	   8.79 MB/s
Blake2b 256	     100	  11350365 ns/op	  11.55 MB/s
Blake2b 512 Modified	     100	  11338172 ns/op	  11.56 MB/s
Blake2s 256 Modified	     100	  10276982 ns/op	  12.75 MB/s
Skein 512	     100	  14857514 ns/op	   8.82 MB/s

AMD Athlon™ XP 1700+ 1,46GHz running Windows XP:

Blake2b 256          200          13046875 ns/op          10.05 MB/s
Blake2b 512 Modified         200          12421875 ns/op          10.55 MB/s
Blake2s 256 Modified         100          11718750 ns/op          11.18 MB/s
Skein 512            100          16718750 ns/op           7.84 MB/s
SHA512       200          11562500 ns/op          11.34 MB/s
Blake2b 512          100          12031250 ns/op          10.89 MB/s
Blake2s 256          100          13125000 ns/op           9.99 MB/s
Blake2b 256 Modified         100          11875000 ns/op          11.04 MB/s
Skein 256            100          12031250 ns/op          10.89 MB/s
Skein 1024            50          48750000 ns/op           2.69 MB/s
SHA256      1000           2656250 ns/op          49.34 MB/s

Raspberry Pi B (ARMv6, 256MB RAM, overclocked to 800MHz) with Raspbian 7:

Blake2b 512 Modified	      10	 153959203 ns/op	   0.85 MB/s
Blake2s 256 Modified	      50	  58865828 ns/op	   2.23 MB/s
Skein 512	      10	 208538257 ns/op	   0.63 MB/s
SHA512	      50	  45860429 ns/op	   2.86 MB/s
Blake2b 256	      10	 157584014 ns/op	   0.83 MB/s
Blake2s 256	      20	  77800581 ns/op	   1.68 MB/s
Blake2b 256 Modified	      10	 153338718 ns/op	   0.85 MB/s
Skein 256	      10	 142405188 ns/op	   0.92 MB/s
Skein 1024	       5	 370104674 ns/op	   0.35 MB/s
SHA256	     100	  25412553 ns/op	   5.16 MB/s
Blake2b 512	      10	 152929328 ns/op	   0.86 MB/s

Intel Atom D2500 2x 1.86GHz running IPFire (my Router/Firewall):

[root@ipfire ~]# uname -a
Linux ipfire 3.10.44-ipfire-pae #1 SMP Mon Jun 23 23:23:33 GMT 2014 i686 pentium2 i386 GNU/Linux
[root@ipfire ~]# ./gohashcompare-v0.1-linux-386
Skein 256	     100	  15242779 ns/op	   8.60 MB/s
Skein 1024	      50	  39473998 ns/op	   3.32 MB/s
SHA256	     500	   4064968 ns/op	  32.24 MB/s
Blake2b 512	     100	  23312438 ns/op	   5.62 MB/s
Blake2s 256	     100	  11275684 ns/op	  11.62 MB/s
Blake2b 256 Modified	     100	  23190138 ns/op	   5.65 MB/s
Skein 512	      50	  33980170 ns/op	   3.86 MB/s
SHA512	     100	  13696257 ns/op	   9.57 MB/s
Blake2b 256	     100	  23317991 ns/op	   5.62 MB/s
Blake2b 512 Modified	     100	  22972929 ns/op	   5.71 MB/s
Blake2s 256 Modified	     100	  10507790 ns/op	  12.47 MB/s

Intel Core2Duo 2x 2.0GHz running OS X 10.6.8:

SHA512	    2000	    790549 ns/op	 165.80 MB/s
Blake2b 256	     500	   3532697 ns/op	  37.10 MB/s
Blake2b 512 Modified	     500	   3524700 ns/op	  37.19 MB/s
Blake2s 256 Modified	     500	   3278720 ns/op	  39.98 MB/s
Skein 512	     500	   3495382 ns/op	  37.50 MB/s
Skein 1024	     200	   8579453 ns/op	  15.28 MB/s
SHA256	    2000	   1226707 ns/op	 106.85 MB/s
Blake2b 512	     500	   3532336 ns/op	  37.11 MB/s
Blake2s 256	     500	   4174911 ns/op	  31.40 MB/s
Blake2b 256 Modified	     500	   3524490 ns/op	  37.19 MB/s
Skein 256	     500	   3194665 ns/op	  41.03 MB/s

Intel Core i7 870 8x 2,9GHz running OS X 10.6.8:

Skein 256	    1000	   2092689 ns/op	  62.63 MB/s
Skein 1024	     500	   3382583 ns/op	  38.75 MB/s
SHA256	    2000	    822559 ns/op	 159.35 MB/s
Blake2b 512	    1000	   1726556 ns/op	  75.92 MB/s
Blake2s 256	    1000	   2749869 ns/op	  47.66 MB/s
Blake2b 256 Modified	    1000	   1723037 ns/op	  76.07 MB/s
Skein 512	    1000	   2580658 ns/op	  50.79 MB/s
SHA512	    5000	    535503 ns/op	 244.76 MB/s
Blake2b 256	    1000	   1680542 ns/op	  77.99 MB/s
Blake2b 512 Modified	    1000	   1696015 ns/op	  77.28 MB/s
Blake2s 256 Modified	    1000	   2161369 ns/op	  60.64 MB/s

Intel Xeon X3360 4x 2,8GHz running elementary OS (GNU/Linux 3.13 64bit):

Skein 256	    1000	   2256484 ns/op	  58.09 MB/s
Skein 1024	     500	   6416628 ns/op	  20.43 MB/s
SHA256	    2000	    865275 ns/op	 151.48 MB/s
Blake2b 512	    1000	   2219384 ns/op	  59.06 MB/s
Blake2s 256	    1000	   2938243 ns/op	  44.61 MB/s
Blake2b 256 Modified	    1000	   2456305 ns/op	  53.36 MB/s
Skein 512	    1000	   2414231 ns/op	  54.29 MB/s
SHA512	    5000	    557967 ns/op	 234.91 MB/s
Blake2b 256	    1000	   2206165 ns/op	  59.41 MB/s
Blake2b 512 Modified	    1000	   2456441 ns/op	  53.36 MB/s
Blake2s 256 Modified	    1000	   2315812 ns/op	  56.60 MB/s

cat /proc/cpuinfo | grep "model name"
model name : Intel® Core™2 Duo CPU E8400 @ 3.00GHz

PASS
SHA512	    5000	    525045 ns/op	 249.64 MB/s
Blake2b 256	    1000	   2249694 ns/op	  58.26 MB/s
Blake2b 512 Modified	    1000	   2253477 ns/op	  58.16 MB/s
Blake2s 256 Modified	    1000	   2174681 ns/op	  60.27 MB/s
Skein 512	    1000	   2274926 ns/op	  57.62 MB/s
SHA256	    2000	    814702 ns/op	 160.88 MB/s
Blake2b 512	    1000	   2249298 ns/op	  58.27 MB/s
Blake2s 256	    1000	   2761611 ns/op	  47.46 MB/s
Blake2b 256 Modified	    1000	   2253390 ns/op	  58.17 MB/s
Skein 256	    1000	   2125172 ns/op	  61.68 MB/s
Skein 1024	     500	   6045747 ns/op	  21.68 MB/s

Allwinner A20 (ARM dual-core Cortex-A7) @ 1GHz

testing: warning: no tests to run
PASS
SHA512	     100	  25968812 ns/op	   5.05 MB/s
Blake2b 256	      50	  38302490 ns/op	   3.42 MB/s
Blake2b 512 Modified	      50	  38495106 ns/op	   3.40 MB/s
Blake2s 256 Modified	     100	  17404004 ns/op	   7.53 MB/s
Skein 512	      50	  46729231 ns/op	   2.80 MB/s
Skein 1024	      20	  85537841 ns/op	   1.53 MB/s
SHA256	     100	  14260295 ns/op	   9.19 MB/s
Blake2b 512	      50	  37509162 ns/op	   3.49 MB/s
Blake2s 256	     100	  19811277 ns/op	   6.62 MB/s
Blake2b 256 Modified	      50	  37638347 ns/op	   3.48 MB/s
Skein 256	      50	  38604284 ns/op	   3.40 MB/s

Intel® Core™2 Duo CPU L9300 @ 1.60GHz Archlinux 64bit:

Blake2b 512 Modified	     500	   5353005 ns/op	  24.49 MB/s
Blake2s 256 Modified	     500	   4873107 ns/op	  26.90 MB/s
Skein 512	     500	   5122144 ns/op	  25.59 MB/s
SHA512	    2000	   1209216 ns/op	 108.39 MB/s
Blake2b 256	     500	   5229393 ns/op	  25.06 MB/s
Blake2s 256	     500	   5978017 ns/op	  21.93 MB/s
Blake2b 256 Modified	     500	   5246482 ns/op	  24.98 MB/s
Skein 256	     500	   4656133 ns/op	  28.15 MB/s
Skein 1024	     100	  11738369 ns/op	  11.17 MB/s
SHA256	    1000	   1833992 ns/op	  71.47 MB/s
Blake2b 512	     500	   5260140 ns/op	  24.92 MB/s

Intel Core i5-3570K 4x @3.40GHz

PASS
Skein1024             1000        2770157ns/op         47.32MB/s
SHA256                5000         622835ns/op        210.44MB/s
Blake2b256            2000        1201070ns/op        109.13MB/s
Blake2s256            1000        1685096ns/op         77.78MB/s
Blake2b256Modified    2000        1165067ns/op        112.50MB/s
Blake2b512Modified    2000        1159065ns/op        113.08MB/s
SHA512                5000         383422ns/op        341.85MB/s
Blake2b512            2000        1198568ns/op        109.36MB/s
Blake2s256Modified    2000        1339076ns/op         97.88MB/s
Skein256              2000        1335076ns/op         98.18MB/s
Skein512              2000        1449082ns/op         90.45MB/s

Intel Core i5-2520M CPU 4x @2.5 GHz on Windows 7 64bit

PASS
SHA256                  2000           1028374 ns/op         127.46 MB/s
Blake2b 256             1000           1787293 ns/op          73.34 MB/s
Blake2b 512             1000           1811514 ns/op          72.35 MB/s
Blake2s 256             1000           2502816 ns/op          52.37 MB/s
Blake2b 256 Modified    1000           1957848 ns/op          66.95 MB/s
Blake2s 256 Modified    1000           2099136 ns/op          62.44 MB/s
Skein 512               1000           2243451 ns/op          58.42 MB/s
Skein 1024               500           4513142 ns/op          29.04 MB/s
SHA512                  5000            670714 ns/op         195.42 MB/s
Blake2b 512 Modified    1000           1830688 ns/op          71.60 MB/s
Skein 256               1000           1987114 ns/op          65.96 MB/s

Synology NAS DS214+:

cat /proc/cpuinfo | grep "model name" 

says: Intel Atom CPU CE5335 4x @ 1.60GHz, and the manual says: Marvell Armada XP armv7l @ 1.33 GHz, 2 cores, 1 GB RAM + hardware encryption … so maybe that’s the same CPU

PASS
SHA512                   100          14876557 ns/op           8.81 MB/s
Blake2b 256              100          25612066 ns/op           5.12 MB/s
Blake2b 512 Modified    100          25612530 ns/op           5.12 MB/s
Blake2s 256 Modified    100          10612570 ns/op          12.35 MB/s
Skein 512                 50           38806945 ns/op           3.38 MB/s
SHA256                   500           3589723 ns/op          36.51 MB/s
Blake2b 512              100          25630447 ns/op           5.11 MB/s
Blake2s 256              100          11481037 ns/op          11.42 MB/s
Blake2b 256 Modified    100          25603787 ns/op           5.12 MB/s
Skein 256                100          16683268 ns/op           7.86 MB/s
Skein 1024                50           44244566 ns/op           2.96 MB/s

Synology NAS DS210j: Marvell Kirkwood mv6281 ARM @ 800 MHz, 1 Core, 128 MB Ram:

Nas> /volume1/Data/gohashcompare-v0.1-linux-arm
runtime: this CPU has no floating point hardware, so it cannot run this GOARM=6 binary. Recompile using GOARM=5.

I cannot compile on that NAS. Could you supply an ARMv5 (GOARM=5) binary? I will run the test again if needed.

I think we now have enough data to conclude that @calmh made the right choice from the very beginning. Unless someone wants to implement Blake2b in ASM for x86 and ARM, I don’t think there is anything better we could use :smile:

Is there a way to close the topic?

This topic is now closed. Thanks everyone! :heart:

(I wasn’t being smart, it’s just a reasonable algorithm which as an effect has ended up being optimized in the standard library.)