Looking for benchmarking volunteers

I spoke with the person who worked on the Blake2 port for Go. It seems the current SHA256 implementation is written in ASM on x86 and in pure Go on ARM (though he suggested it could be done in ASM there too, as most of the required instructions are available).

If Blake2 were implemented in ASM, it would most likely work on both ARM and x86 and would outperform SHA256.

Also, some very interesting benchmarks for most platforms (warning, large page): http://bench.cr.yp.to/results-sha3.html

AMD Athlon™ XP 1700+ 1,46GHz running Ubuntu 9.10 (Linux 2.6.31):

SHA256	    1000	   2484703 ns/op	  52.75 MB/s
Blake2b 512	     100	  11329656 ns/op	  11.57 MB/s
Blake2s 256	     100	  11861114 ns/op	  11.05 MB/s
Blake2b 256 Modified	     100	  11360263 ns/op	  11.54 MB/s
Skein 256	     100	  12099396 ns/op	  10.83 MB/s
Skein 1024	      50	  46107154 ns/op	   2.84 MB/s
SHA512	     100	  14916757 ns/op	   8.79 MB/s
Blake2b 256	     100	  11350365 ns/op	  11.55 MB/s
Blake2b 512 Modified	     100	  11338172 ns/op	  11.56 MB/s
Blake2s 256 Modified	     100	  10276982 ns/op	  12.75 MB/s
Skein 512	     100	  14857514 ns/op	   8.82 MB/s

AMD Athlon™ XP 1700+ 1,46GHz running Windows XP:

Blake2b 256          200          13046875 ns/op          10.05 MB/s
Blake2b 512 Modified         200          12421875 ns/op          10.55 MB/s
Blake2s 256 Modified         100          11718750 ns/op          11.18 MB/s
Skein 512            100          16718750 ns/op           7.84 MB/s
SHA512       200          11562500 ns/op          11.34 MB/s
Blake2b 512          100          12031250 ns/op          10.89 MB/s
Blake2s 256          100          13125000 ns/op           9.99 MB/s
Blake2b 256 Modified         100          11875000 ns/op          11.04 MB/s
Skein 256            100          12031250 ns/op          10.89 MB/s
Skein 1024            50          48750000 ns/op           2.69 MB/s
SHA256      1000           2656250 ns/op          49.34 MB/s

Raspberry Pi B (ARMv6, 256MB RAM, overclocked to 800MHz) with Raspbian 7:

Blake2b 512 Modified	      10	 153959203 ns/op	   0.85 MB/s
Blake2s 256 Modified	      50	  58865828 ns/op	   2.23 MB/s
Skein 512	      10	 208538257 ns/op	   0.63 MB/s
SHA512	      50	  45860429 ns/op	   2.86 MB/s
Blake2b 256	      10	 157584014 ns/op	   0.83 MB/s
Blake2s 256	      20	  77800581 ns/op	   1.68 MB/s
Blake2b 256 Modified	      10	 153338718 ns/op	   0.85 MB/s
Skein 256	      10	 142405188 ns/op	   0.92 MB/s
Skein 1024	       5	 370104674 ns/op	   0.35 MB/s
SHA256	     100	  25412553 ns/op	   5.16 MB/s
Blake2b 512	      10	 152929328 ns/op	   0.86 MB/s

Intel Atom D2500 2x 1.86GHz running IPFire (my Router/Firewall):

[root@ipfire ~]# uname -a
Linux ipfire 3.10.44-ipfire-pae #1 SMP Mon Jun 23 23:23:33 GMT 2014 i686 pentium2 i386 GNU/Linux
[root@ipfire ~]# ./gohashcompare-v0.1-linux-386
Skein 256	     100	  15242779 ns/op	   8.60 MB/s
Skein 1024	      50	  39473998 ns/op	   3.32 MB/s
SHA256	     500	   4064968 ns/op	  32.24 MB/s
Blake2b 512	     100	  23312438 ns/op	   5.62 MB/s
Blake2s 256	     100	  11275684 ns/op	  11.62 MB/s
Blake2b 256 Modified	     100	  23190138 ns/op	   5.65 MB/s
Skein 512	      50	  33980170 ns/op	   3.86 MB/s
SHA512	     100	  13696257 ns/op	   9.57 MB/s
Blake2b 256	     100	  23317991 ns/op	   5.62 MB/s
Blake2b 512 Modified	     100	  22972929 ns/op	   5.71 MB/s
Blake2s 256 Modified	     100	  10507790 ns/op	  12.47 MB/s

Intel Core2Duo 2x 2.0GHz running OS X 10.6.8:

SHA512	    2000	    790549 ns/op	 165.80 MB/s
Blake2b 256	     500	   3532697 ns/op	  37.10 MB/s
Blake2b 512 Modified	     500	   3524700 ns/op	  37.19 MB/s
Blake2s 256 Modified	     500	   3278720 ns/op	  39.98 MB/s
Skein 512	     500	   3495382 ns/op	  37.50 MB/s
Skein 1024	     200	   8579453 ns/op	  15.28 MB/s
SHA256	    2000	   1226707 ns/op	 106.85 MB/s
Blake2b 512	     500	   3532336 ns/op	  37.11 MB/s
Blake2s 256	     500	   4174911 ns/op	  31.40 MB/s
Blake2b 256 Modified	     500	   3524490 ns/op	  37.19 MB/s
Skein 256	     500	   3194665 ns/op	  41.03 MB/s

Intel Core i7 870 8x 2,9GHz running OS X 10.6.8:

Skein 256	    1000	   2092689 ns/op	  62.63 MB/s
Skein 1024	     500	   3382583 ns/op	  38.75 MB/s
SHA256	    2000	    822559 ns/op	 159.35 MB/s
Blake2b 512	    1000	   1726556 ns/op	  75.92 MB/s
Blake2s 256	    1000	   2749869 ns/op	  47.66 MB/s
Blake2b 256 Modified	    1000	   1723037 ns/op	  76.07 MB/s
Skein 512	    1000	   2580658 ns/op	  50.79 MB/s
SHA512	    5000	    535503 ns/op	 244.76 MB/s
Blake2b 256	    1000	   1680542 ns/op	  77.99 MB/s
Blake2b 512 Modified	    1000	   1696015 ns/op	  77.28 MB/s
Blake2s 256 Modified	    1000	   2161369 ns/op	  60.64 MB/s

Intel Xeon X3360 4x 2,8GHz running elementary OS (GNU/Linux 3.13 64bit):

Skein 256	    1000	   2256484 ns/op	  58.09 MB/s
Skein 1024	     500	   6416628 ns/op	  20.43 MB/s
SHA256	    2000	    865275 ns/op	 151.48 MB/s
Blake2b 512	    1000	   2219384 ns/op	  59.06 MB/s
Blake2s 256	    1000	   2938243 ns/op	  44.61 MB/s
Blake2b 256 Modified	    1000	   2456305 ns/op	  53.36 MB/s
Skein 512	    1000	   2414231 ns/op	  54.29 MB/s
SHA512	    5000	    557967 ns/op	 234.91 MB/s
Blake2b 256	    1000	   2206165 ns/op	  59.41 MB/s
Blake2b 512 Modified	    1000	   2456441 ns/op	  53.36 MB/s
Blake2s 256 Modified	    1000	   2315812 ns/op	  56.60 MB/s

cat /proc/cpuinfo | grep "model name"
model name : Intel® Core™2 Duo CPU E8400 @ 3.00GHz

PASS
SHA512	    5000	    525045 ns/op	 249.64 MB/s
Blake2b 256	    1000	   2249694 ns/op	  58.26 MB/s
Blake2b 512 Modified	    1000	   2253477 ns/op	  58.16 MB/s
Blake2s 256 Modified	    1000	   2174681 ns/op	  60.27 MB/s
Skein 512	    1000	   2274926 ns/op	  57.62 MB/s
SHA256	    2000	    814702 ns/op	 160.88 MB/s
Blake2b 512	    1000	   2249298 ns/op	  58.27 MB/s
Blake2s 256	    1000	   2761611 ns/op	  47.46 MB/s
Blake2b 256 Modified	    1000	   2253390 ns/op	  58.17 MB/s
Skein 256	    1000	   2125172 ns/op	  61.68 MB/s
Skein 1024	     500	   6045747 ns/op	  21.68 MB/s

Allwinner A20 (ARM dual-core Cortex-A7) @ 1GHz

testing: warning: no tests to run
PASS
SHA512                   100      25968812 ns/op       5.05 MB/s
Blake2b 256               50      38302490 ns/op       3.42 MB/s
Blake2b 512 Modified      50      38495106 ns/op       3.40 MB/s
Blake2s 256 Modified     100      17404004 ns/op       7.53 MB/s
Skein 512                 50      46729231 ns/op       2.80 MB/s
Skein 1024                20      85537841 ns/op       1.53 MB/s
SHA256                   100      14260295 ns/op       9.19 MB/s
Blake2b 512               50      37509162 ns/op       3.49 MB/s
Blake2s 256              100      19811277 ns/op       6.62 MB/s
Blake2b 256 Modified      50      37638347 ns/op       3.48 MB/s
Skein 256                 50      38604284 ns/op       3.40 MB/s

Intel® Core™2 Duo CPU L9300 @ 1.60GHz Archlinux 64bit:

Blake2b 512 Modified     500       5353005 ns/op      24.49 MB/s
Blake2s 256 Modified     500       4873107 ns/op      26.90 MB/s
Skein 512                500       5122144 ns/op      25.59 MB/s
SHA512                  2000       1209216 ns/op     108.39 MB/s
Blake2b 256              500       5229393 ns/op      25.06 MB/s
Blake2s 256              500       5978017 ns/op      21.93 MB/s
Blake2b 256 Modified     500       5246482 ns/op      24.98 MB/s
Skein 256                500       4656133 ns/op      28.15 MB/s
Skein 1024               100      11738369 ns/op      11.17 MB/s
SHA256                  1000       1833992 ns/op      71.47 MB/s
Blake2b 512              500       5260140 ns/op      24.92 MB/s

Intel Core i5-3570K 4x @3.40GHz

PASS
Skein1024             1000        2770157ns/op         47.32MB/s
SHA256                5000         622835ns/op        210.44MB/s
Blake2b256            2000        1201070ns/op        109.13MB/s
Blake2s256            1000        1685096ns/op         77.78MB/s
Blake2b256Modified    2000        1165067ns/op        112.50MB/s
Blake2b512Modified    2000        1159065ns/op        113.08MB/s
SHA512                5000         383422ns/op        341.85MB/s
Blake2b512            2000        1198568ns/op        109.36MB/s
Blake2s256Modified    2000        1339076ns/op         97.88MB/s
Skein256              2000        1335076ns/op         98.18MB/s
Skein512              2000        1449082ns/op         90.45MB/s

Intel Core i5-2520M CPU 4x @2.5 GHz on Windows 7 64bit

PASS
SHA256                  2000           1028374 ns/op         127.46 MB/s
Blake2b 256             1000           1787293 ns/op          73.34 MB/s
Blake2b 512             1000           1811514 ns/op          72.35 MB/s
Blake2s 256             1000           2502816 ns/op          52.37 MB/s
Blake2b 256 Modified    1000           1957848 ns/op          66.95 MB/s
Blake2s 256 Modified    1000           2099136 ns/op          62.44 MB/s
Skein 512               1000           2243451 ns/op          58.42 MB/s
Skein 1024               500           4513142 ns/op          29.04 MB/s
SHA512                  5000            670714 ns/op         195.42 MB/s
Blake2b 512 Modified    1000           1830688 ns/op          71.60 MB/s
Skein 256               1000           1987114 ns/op          65.96 MB/s

Synology NAS DS214+:

cat /proc/cpuinfo | grep "model name" 

says: Intel Atom CPU CE5335 4x @ 1.60GHz and the manual says: Marvell Armada XP armv7l @ 1.33 GHz, 2 cores, 1 GB RAM + hardware encryption … so maybe that's the same CPU

PASS
SHA512                   100          14876557 ns/op           8.81 MB/s
Blake2b 256              100          25612066 ns/op           5.12 MB/s
Blake2b 512 Modified    100          25612530 ns/op           5.12 MB/s
Blake2s 256 Modified    100          10612570 ns/op          12.35 MB/s
Skein 512                 50           38806945 ns/op           3.38 MB/s
SHA256                   500           3589723 ns/op          36.51 MB/s
Blake2b 512              100          25630447 ns/op           5.11 MB/s
Blake2s 256              100          11481037 ns/op          11.42 MB/s
Blake2b 256 Modified    100          25603787 ns/op           5.12 MB/s
Skein 256                100          16683268 ns/op           7.86 MB/s
Skein 1024                50           44244566 ns/op           2.96 MB/s

Synology NAS DS210j: Marvell Kirkwood mv6281 ARM @ 800 MHz, 1 core, 128 MB RAM:

Nas> /volume1/Data/gohashcompare-v0.1-linux-arm
runtime: this CPU has no floating point hardware, so it cannot run this GOARM=6 binary. Recompile using GOARM=5.

I cannot compile on that NAS, but I will run the test again if needed. Could you supply me with a GOARM=5 binary?

I think we now have enough data to conclude that @calmh made the right choice from the very beginning. Unless someone wants to implement Blake2b in ASM for x86 and ARM, I don’t think there is anything better we could use :smile:

Is there a way to close the topic?

This topic is now closed. Thanks everyone! :heart:

(I wasn’t being smart, it’s just a reasonable algorithm which as a result has ended up being optimized in the standard library.)

Edit: Post removed since the C library relies on SSE2. Will reopen and post once I have a cross-arch package available.

I want to perform some further tests with the Blake2b implementation in C. This means that you need to compile the package for your platform. I am mainly interested in benchmarks on weaker devices (especially ARM boards and Raspberry Pis). This requires Go and gcc.

The instructions are as follows:

mkdir -p /tmp/src/github.com/AudriusButkevicius/blake2b-opt
cd /tmp/src/github.com/AudriusButkevicius/blake2b-opt
curl -L -O https://github.com/AudriusButkevicius/blake2b-opt/archive/master.tar.gz
tar -zxvf master.tar.gz --strip-components=1
./configure
cat /tmp/src/github.com/AudriusButkevicius/blake2b-opt/framework/include/asmopt.h > /tmp/benchmark.log
make install-lib util
go build blake2b.go
./bin/blake2b-util bench >> /tmp/benchmark.log 

mkdir -p /tmp/src/github.com/AudriusButkevicius/gohashcompare
cd /tmp/src/github.com/AudriusButkevicius/gohashcompare
curl -L -O https://github.com/AudriusButkevicius/gohashcompare/archive/master.tar.gz
tar -zxvf master.tar.gz --strip-components=1
export GOPATH=/tmp

go build --ldflags '-extldflags "-static"' ./main.go
./main >> /tmp/benchmark.log 

Afterwards, post the contents of /tmp/benchmark.log here or, ideally, upload it to a pastebin. You can also help us by uploading the file /tmp/src/github.com/AudriusButkevicius/blake2b-opt/bin/blake2b.lib somewhere; please also provide the name of your device.

Here are the benchmark.log and blake2b.lib from my Raspberry Pi (model B, 512MB) and my netbook (Intel Atom N450, 64-bit but pretty slow): http://alex-graf.de/public/st/benchmark.zip

Saving people the need to download it themselves:

RPI B 512MB:

Build: go1.4 linux-arm

Blake2b CGO 512     200	   8344897 ns/op  15.71 MB/s     320 B/op       2 allocs/op
SHA512		     30	  38897250 ns/op   3.37 MB/s      64 B/op       1 allocs/op
SHA256		    100	  22712797 ns/op   5.77 MB/s      32 B/op       1 allocs/op

Intel Atom N450:

Build: go1.4 linux-amd64

SHA256		     500   3335126 ns/op  39.30 MB/s      32 B/op       1 allocs/op
SHA512		    1000   2258643 ns/op  58.03 MB/s      64 B/op       1 allocs/op
Blake2b CGO 512-2   2000   1033368 ns/op 126.84 MB/s     320 B/op       2 allocs/op

Thanks for the benchmarks.

http://davidak.de/tmp/Benchmark.zip

Ubuntu 12.04 VM (64-Bit), 4x Intel Xeon E5520 @ 2.27GHz, 990 MB RAM

Build: go1.1.1 linux-amd64

SHA256-4	     500	   2438435 ns/op	  53.75 MB/s	      32 B/op	       1 allocs/op
SHA512-4	    1000	   1440279 ns/op	  91.00 MB/s	      64 B/op	       1 allocs/op
Blake2b CGO 512-4	    5000	    358430 ns/op	 365.68 MB/s	     386 B/op	       3 allocs/op

elementary OS (Ubuntu 14.04 based) VM (64-Bit) with 8x 2,6GHz (Intel Core i7), 4 GB RAM

Build: go1.2.1 linux-amd64

SHA256-8	    2000	   1206111 ns/op	 108.67 MB/s	      32 B/op	       1 allocs/op
SHA512-8	    2000	    817385 ns/op	 160.36 MB/s	      64 B/op	       1 allocs/op
Blake2b CGO 512-8	   10000	    228387 ns/op	 573.90 MB/s	     384 B/op	       3 allocs/op

elementary OS (Ubuntu 12.04 based) (32-Bit) with 2x 3,16GHz (Intel Core 2 Duo E8500), 4 GB RAM

Build: go1.1.1 linux-386

SHA256-2	    1000	   2646573 ns/op	  49.53 MB/s	      32 B/op	       1 allocs/op
SHA512-2	     500	   4801462 ns/op	  27.30 MB/s	      64 B/op	       1 allocs/op
Blake2b CGO 512-2	   10000	    307482 ns/op	 426.27 MB/s	     387 B/op	       3 allocs/op

I had to install build-essential and golang from a PPA: https://stackoverflow.com/questions/17480044/how-to-install-the-current-version-of-go-in-ubuntu

Raspberry Pi B with 700MHz and 256MB RAM

Build: go1.4 linux-arm

SHA256	      50	  27734445 ns/op	   4.73 MB/s	      32 B/op	       1 allocs/op
SHA512	      30	  52249703 ns/op	   2.51 MB/s	      64 B/op	       1 allocs/op
Blake2b CGO 512	     100	  10274646 ns/op	  12.76 MB/s	     320 B/op	       2 allocs/op

I had to use an unofficial build of Go: http://dave.cheney.net/unofficial-arm-tarballs http://tip.golang.org/doc/install

@AudriusButkevicius What value is important? The MB/s?

Compare Hashing Performance.ods (19.4 KB)

We see that Blake2b (the C implementation called through CGO) is much faster than SHA256 on every platform tested in this round.
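To put a number on "much faster", here is a small sketch that divides the Blake2b CGO throughput by the SHA256 throughput for each machine. The MB/s figures are copied from the CGO results posted above; the `result` type, the `speedup` function and the shortened platform labels are just illustrative names.

```go
package main

import "fmt"

// result holds the MB/s figures copied from the CGO benchmark posts above.
type result struct {
	platform string
	sha256   float64 // SHA256 throughput in MB/s
	blake2b  float64 // Blake2b CGO 512 throughput in MB/s
}

// speedup reports how many times faster Blake2b CGO is than SHA256.
func speedup(r result) float64 { return r.blake2b / r.sha256 }

var results = []result{
	{"Raspberry Pi B 512MB (linux-arm)", 5.77, 15.71},
	{"Intel Atom N450 (linux-amd64)", 39.30, 126.84},
	{"Xeon E5520 VM (linux-amd64)", 53.75, 365.68},
	{"Core i7 VM (linux-amd64)", 108.67, 573.90},
	{"Core 2 Duo E8500 (linux-386)", 49.53, 426.27},
	{"Raspberry Pi B 256MB (linux-arm)", 4.73, 12.76},
}

func main() {
	for _, r := range results {
		fmt.Printf("%-34s %.1fx\n", r.platform, speedup(r))
	}
}
```

The gap is smallest on the two Raspberry Pis and largest on the 32-bit Core 2 Duo build, but the ratio stays above 1 everywhere.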

I guess you can just do go run main.go to be honest.

Yes, MB/s is the value that matters: it tells us how fast we can crunch through the files.
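In case the relation between the two columns is unclear: MB/s is just bytes per operation divided by time per operation. A minimal sketch, assuming each operation hashes one 128 KiB block (inferred from the numbers in this thread, not confirmed by the benchmark source) and 1 MB = 1e6 bytes, as Go's testing package counts it. `throughputMBps` is a hypothetical helper name.

```go
package main

import "fmt"

// throughputMBps converts bytes per operation and nanoseconds per operation
// into MB/s, with 1 MB = 1e6 bytes.
func throughputMBps(bytesPerOp int64, nsPerOp float64) float64 {
	return float64(bytesPerOp) / 1e6 / (nsPerOp / 1e9)
}

func main() {
	const block = 128 * 1024 // assumed bytes hashed per operation

	// ns/op values taken from the SHA256 lines of the Athlon XP (Ubuntu)
	// and Raspberry Pi (800MHz) results in this topic.
	fmt.Printf("%.2f MB/s\n", throughputMBps(block, 2484703))
	fmt.Printf("%.2f MB/s\n", throughputMBps(block, 25412553))
}
```

Under that assumption the two lines come out as 52.75 MB/s and 5.16 MB/s, matching the values reported for those machines, so comparing either ns/op or MB/s gives the same ranking.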

9 posts were split to a new topic: Faster hashing package