Weak hash?
Correct, it's the weakhash. Comment that out and I'm back in business. I haven't figured out why yet. It's supposed to be faster, I think?
No, itâs slow. Very slow. Ouch.
func BenchmarkWeakHash(b *testing.B) {
	const size = 16384
	data := make([]byte, size)
	hf := NewHash(protocol.BlockSize)

	for i := 0; i < b.N; i++ {
		hf.Write(data)
	}

	_ = hf.Sum32()
	b.SetBytes(size)
}
BenchmarkWeakHash-8 5000 251595 ns/op 65.12 MB/s
I can make it about 20% faster by tweaking variables to help the compiler allocate registers, but that's still an order of magnitude too slow for it to be fun…
@wucke13 you could compare this build if you like: https://build.syncthing.net/job/syncthing-branch/357/artifact/
The hashing should obviously be faster, but I hope the rest of the behavior is identical (if you deploy it on the receiving side as well).
The displayed hashing performance did not change much, only a slight increase.
[ETMJ5] 16:36:30 INFO: Single thread hash performance is 159 MB/s using minio/sha256-simd (117 MB/s using crypto/sha256).
From a subjective view, scanning goes faster. The estimated time to scan now dropped to ~14m 20s, which seems fine. Like half the time. Syncthing CPU usage dropped to ~4%, which is half the usage from before. So, your change doubled the speed and halved the CPU usage on this scan. Looks like a pretty good improvement to me! When will it make its way into the official release?
EDIT: The jumping scan percentage seems to be reproducible by changing the scan interval during a running scan, but most of the time the scan started after changing the interval takes over after a few seconds. I still don't know how to reproduce it so that both scans keep running the whole time.
Thanks, that's good data. Changing the scan interval indeed restarts the folder, which can lead to parallel scans (although that's a bug).
Another thing about the sync speed: what speed should I get syncing a lot (80GB) of little files (2-10MB) via Gbit LAN? I never get over 30MiB/s and averaging around 20MiB/s doing that. It increases by around 5 MiB/s when setting both to 128 Pullers, but that is still less than half the speed which this connection is capable of. Normal behavior?
EDIT: that (128 pullers) causes uncomfortable spikes in CPU usage, even on the 12 core machine.
Try disabling weak hash in the advanced options and increase pullers. When syncing many small files, most of the overhead is in managing the files, so 20-30 MiB/s probably makes sense.
Yes. Those 300 Mbps or so are about what you can expect, given crypto and hashing and overhead and stuff.
diff --git a/lib/weakhash/benchmark_test.go b/lib/weakhash/benchmark_test.go
index 79f5ce9..2f0c2ba 100644
--- a/lib/weakhash/benchmark_test.go
+++ b/lib/weakhash/benchmark_test.go
@@ -9,9 +9,14 @@ package weakhash
import (
"os"
"testing"
+
+ "github.com/chmduquesne/rollinghash/adler32"
+ "github.com/chmduquesne/rollinghash/buzhash"
+ "github.com/chmduquesne/rollinghash/rabinkarp32"
)
const testFile = "../model/testdata/~syncthing~file.tmp"
+const size = 128 << 10
func BenchmarkFind1MFile(b *testing.B) {
b.ReportAllocs()
@@ -28,3 +33,51 @@ func BenchmarkFind1MFile(b *testing.B) {
fd.Close()
}
}
+
+func BenchmarkWeakHashOurs(b *testing.B) {
+ data := make([]byte, size)
+ hf := NewHash(size)
+
+ for i := 0; i < b.N; i++ {
+ hf.Write(data)
+ }
+
+ _ = hf.Sum32()
+ b.SetBytes(size)
+}
+
+func BenchmarkWeakHashRabinKarp(b *testing.B) {
+ data := make([]byte, size)
+ hf := rabinkarp32.New()
+
+ for i := 0; i < b.N; i++ {
+ hf.Write(data)
+ }
+
+ _ = hf.Sum32()
+ b.SetBytes(size)
+}
+
+func BenchmarkWeakHashAdler32(b *testing.B) {
+ data := make([]byte, size)
+ hf := adler32.New()
+
+ for i := 0; i < b.N; i++ {
+ hf.Write(data)
+ }
+
+ _ = hf.Sum32()
+ b.SetBytes(size)
+}
+
+func BenchmarkWeakHashBuzHash(b *testing.B) {
+ data := make([]byte, size)
+ hf := buzhash.New()
+
+ for i := 0; i < b.N; i++ {
+ hf.Write(data)
+ }
+
+ _ = hf.Sum32()
+ b.SetBytes(size)
+}
C:/go/bin/go.exe test -test.bench=.* [C:/gohome/src/github.com/syncthing/syncthing/lib/weakhash]
BenchmarkFind1MFile-4 20 64061805 ns/op 16.37 MB/s 6952658 B/op 917549 allocs/op
BenchmarkWeakHashOurs-4 1000 2220122 ns/op 59.04 MB/s
BenchmarkWeakHashRabinKarp-4 3000 513630 ns/op 255.19 MB/s
BenchmarkWeakHashAdler32-4 10000 108073 ns/op 1212.81 MB/s
BenchmarkWeakHashBuzHash-4 3000 396835 ns/op 330.29 MB/s
PASS
ok github.com/syncthing/syncthing/lib/weakhash 7.768s
Should we switch to adler32?
Indeed we should. I think we should be able to do so directly and seamlessly, as a weak hash mismatch doesn't really matter.
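For reference, what makes Adler-32 attractive here is that its state can be rolled forward one byte in O(1) rather than rehashing the whole window. A minimal sketch of that roll, cross-checked against the stdlib `hash/adler32` (toy window size for illustration; the real code would use the block size and the rollinghash package):

```go
package main

import (
	"fmt"
	"hash/adler32"
)

const mod = 65521 // largest prime below 2^16, per RFC 1950

// roll advances the Adler-32 state (a, b) of an n-byte window when
// byte out leaves the window and byte in enters it, in O(1).
func roll(a, b uint32, n int, out, in byte) (uint32, uint32) {
	a = (a + mod + uint32(in) - uint32(out)) % mod
	t := uint32(n) % mod * uint32(out) % mod
	b = (b + a + mod - 1 + mod - t) % mod
	return a, b
}

func main() {
	data := []byte("the quick brown fox jumps over the lazy dog")
	const n = 8 // toy window; Syncthing would use the block size

	// Prime the state from the first window using the stdlib checksum,
	// which packs the state as b<<16 | a.
	sum := adler32.Checksum(data[:n])
	a, b := sum&0xffff, sum>>16

	// Slide the window across the rest, cross-checking every rolled
	// state against a from-scratch stdlib checksum.
	for i := n; i < len(data); i++ {
		a, b = roll(a, b, n, data[i-n], data[i])
		if b<<16|a != adler32.Checksum(data[i-n+1:i+1]) {
			panic("rolled state diverged from reference checksum")
		}
	}
	fmt.Println("rolled", len(data)-n, "windows; all match hash/adler32")
}
```

Since the rolled state always equals the from-scratch checksum, switching the stored weak hashes over is just a matter of old hashes failing to match, which costs a block transfer, not correctness.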
I did a
diff --git a/lib/scanner/blocks.go b/lib/scanner/blocks.go
index 89a5da6..073e0c5 100644
--- a/lib/scanner/blocks.go
+++ b/lib/scanner/blocks.go
@@ -27,6 +27,7 @@ func Blocks(r io.Reader, blocksize int, sizehint int64, counter Counter) ([]prot
hf := sha256.New()
hashLength := hf.Size()
whf := weakhash.NewHash(blocksize)
+ combinedHF := io.MultiWriter(hf, whf)
var blocks []protocol.BlockInfo
var hashes, thisHash []byte
@@ -46,7 +47,7 @@ func Blocks(r io.Reader, blocksize int, sizehint int64, counter Counter) ([]prot
var offset int64
for {
lr := io.LimitReader(r, int64(blocksize))
- n, err := io.CopyBuffer(hf, io.TeeReader(lr, whf), buf)
+ n, err := io.CopyBuffer(combinedHF, lr, buf)
if err != nil {
return nil, err
}
to get the right buffering towards the weak hash; it had a minor positive performance impact as well.
We might cause weak hash collisions by switching over, but that should be fine.
Yeah, exactly. Dgryski has another ten hash functions or so over here, but I haven't checked if any are rolling and fast: https://github.com/dgryski?utf8=✓&tab=repositories&q=hash&type=&language=
diff --git a/lib/weakhash/benchmark_test.go b/lib/weakhash/benchmark_test.go
index 79f5ce9..1ab3cb6 100644
--- a/lib/weakhash/benchmark_test.go
+++ b/lib/weakhash/benchmark_test.go
@@ -9,9 +9,14 @@ package weakhash
import (
"os"
"testing"
+
+ "github.com/chmduquesne/rollinghash/adler32"
+ "github.com/chmduquesne/rollinghash/buzhash"
+ "github.com/chmduquesne/rollinghash/rabinkarp32"
)
const testFile = "../model/testdata/~syncthing~file.tmp"
+const size = 128 << 10
func BenchmarkFind1MFile(b *testing.B) {
b.ReportAllocs()
@@ -28,3 +33,163 @@ func BenchmarkFind1MFile(b *testing.B) {
fd.Close()
}
}
+
+func BenchmarkFind1MFileRabinKarp(b *testing.B) {
+ b.ReportAllocs()
+ b.SetBytes(1 << 20)
+ for i := 0; i < b.N; i++ {
+ fd, err := os.Open(testFile)
+ if err != nil {
+ b.Fatal(err)
+ }
+ _, err = FindCustom(fd, []uint32{0, 1, 2}, 128<<10, rabinkarp32.New)
+ if err != nil {
+ b.Fatal(err)
+ }
+ fd.Close()
+ }
+}
+
+func BenchmarkFind1MFileAdler32(b *testing.B) {
+ b.ReportAllocs()
+ b.SetBytes(1 << 20)
+ for i := 0; i < b.N; i++ {
+ fd, err := os.Open(testFile)
+ if err != nil {
+ b.Fatal(err)
+ }
+ _, err = FindCustom(fd, []uint32{0, 1, 2}, 128<<10, adler32.New)
+ if err != nil {
+ b.Fatal(err)
+ }
+ fd.Close()
+ }
+}
+
+func BenchmarkFind1MFileBuzHash(b *testing.B) {
+ b.ReportAllocs()
+ b.SetBytes(1 << 20)
+ for i := 0; i < b.N; i++ {
+ fd, err := os.Open(testFile)
+ if err != nil {
+ b.Fatal(err)
+ }
+ _, err = FindCustom(fd, []uint32{0, 1, 2}, 128<<10, buzhash.New)
+ if err != nil {
+ b.Fatal(err)
+ }
+ fd.Close()
+ }
+}
+
+func BenchmarkWeakHashOurs(b *testing.B) {
+ data := make([]byte, size)
+ hf := NewHash(size)
+
+ for i := 0; i < b.N; i++ {
+ hf.Write(data)
+ }
+
+ _ = hf.Sum32()
+ b.SetBytes(size)
+}
+
+func BenchmarkWeakHashRabinKarp(b *testing.B) {
+ data := make([]byte, size)
+ hf := rabinkarp32.New()
+
+ for i := 0; i < b.N; i++ {
+ hf.Write(data)
+ }
+
+ _ = hf.Sum32()
+ b.SetBytes(size)
+}
+
+func BenchmarkWeakHashAdler32(b *testing.B) {
+ data := make([]byte, size)
+ hf := adler32.New()
+
+ for i := 0; i < b.N; i++ {
+ hf.Write(data)
+ }
+
+ _ = hf.Sum32()
+ b.SetBytes(size)
+}
+
+func BenchmarkWeakHashBuzHash(b *testing.B) {
+ data := make([]byte, size)
+ hf := buzhash.New()
+
+ for i := 0; i < b.N; i++ {
+ hf.Write(data)
+ }
+
+ _ = hf.Sum32()
+ b.SetBytes(size)
+}
+
+func BenchmarkWeakHashOursRoll(b *testing.B) {
+ data := make([]byte, size)
+ hf := NewHash(size)
+ hf.Write(data)
+
+ b.ResetTimer()
+
+ for i := 0; i < b.N; i++ {
+ for j := 0; j < size; j++ {
+ hf.Write([]byte{'a'})
+ }
+ }
+
+ b.SetBytes(size)
+}
+
+func BenchmarkWeakHashRabinKarpRoll(b *testing.B) {
+ data := make([]byte, size)
+ hf := rabinkarp32.New()
+ hf.Write(data)
+
+ b.ResetTimer()
+
+ for i := 0; i < b.N; i++ {
+ for j := 0; j < size; j++ {
+ hf.Roll('a')
+ }
+ }
+
+ b.SetBytes(size)
+}
+
+func BenchmarkWeakHashAdler32Roll(b *testing.B) {
+ data := make([]byte, size)
+ hf := adler32.New()
+ hf.Write(data)
+
+ b.ResetTimer()
+
+ for i := 0; i < b.N; i++ {
+ for j := 0; j < size; j++ {
+ hf.Roll('a')
+ }
+ }
+
+ b.SetBytes(size)
+}
+
+func BenchmarkWeakHashBuzHashRoll(b *testing.B) {
+ data := make([]byte, size)
+ hf := buzhash.New()
+ hf.Write(data)
+
+ b.ResetTimer()
+
+ for i := 0; i < b.N; i++ {
+ for j := 0; j < size; j++ {
+ hf.Roll('a')
+ }
+ }
+
+ b.SetBytes(size)
+}
diff --git a/lib/weakhash/weakhash.go b/lib/weakhash/weakhash.go
index 5c210a5..9411b54 100644
--- a/lib/weakhash/weakhash.go
+++ b/lib/weakhash/weakhash.go
@@ -11,6 +11,8 @@ import (
"hash"
"io"
"os"
+
+ "github.com/chmduquesne/rollinghash"
)
const (
@@ -71,6 +73,53 @@ func Find(ir io.Reader, hashesToFind []uint32, size int) (map[uint32][]int64, er
return offsets, nil
}
+// FindCustom finds all the blocks of the given size within io.Reader that
+// match the hashes provided, using the rolling hash constructed by hashMaker,
+// and returns a map from hash to the offsets within the reader that produce it.
+func FindCustom(ir io.Reader, hashesToFind []uint32, size int, hashMaker func() rollinghash.Hash32) (map[uint32][]int64, error) {
+ if ir == nil {
+ return nil, nil
+ }
+
+ r := bufio.NewReader(ir)
+ hf := hashMaker()
+
+ n, err := io.CopyN(hf, r, int64(size))
+ if err == io.EOF {
+ return nil, nil
+ }
+ if err != nil {
+ return nil, err
+ }
+ if n != int64(size) {
+ return nil, io.ErrShortBuffer
+ }
+
+ offsets := make(map[uint32][]int64)
+ for _, hashToFind := range hashesToFind {
+ offsets[hashToFind] = nil
+ }
+
+ var i int64
+ var hash uint32
+ for {
+ hash = hf.Sum32()
+ if existing, ok := offsets[hash]; ok {
+ offsets[hash] = append(existing, i)
+ }
+ i++
+
+ bt, err := r.ReadByte()
+ if err == io.EOF {
+ break
+ } else if err != nil {
+ return offsets, err
+ }
+ hf.Roll(bt)
+ }
+ return offsets, nil
+}
+
BenchmarkFind1MFile-4 20 59880225 ns/op 17.51 MB/s 6952674 B/op 917549 allocs/op
BenchmarkFind1MFileRabinKarp-4 50 28291430 ns/op 37.06 MB/s 168810 B/op 18 allocs/op
BenchmarkFind1MFileAdler32-4 50 33253970 ns/op 31.53 MB/s 168816 B/op 21 allocs/op
BenchmarkFind1MFileBuzHash-4 30 43864000 ns/op 23.91 MB/s 14800502 B/op 52 allocs/op
BenchmarkWeakHashOurs-4 1000 2108324 ns/op 62.17 MB/s
BenchmarkWeakHashRabinKarp-4 3000 468214 ns/op 279.94 MB/s
BenchmarkWeakHashAdler32-4 20000 89862 ns/op 1458.58 MB/s
BenchmarkWeakHashBuzHash-4 5000 368838 ns/op 355.36 MB/s
BenchmarkWeakHashOursRoll-4 300 4396461 ns/op 29.81 MB/s
BenchmarkWeakHashRabinKarpRoll-4 2000 1029349 ns/op 127.33 MB/s
BenchmarkWeakHashAdler32Roll-4 1000 1489260 ns/op 88.01 MB/s
BenchmarkWeakHashBuzHashRoll-4 1000 1154916 ns/op 113.49 MB/s
PASS
ok github.com/syncthing/syncthing/lib/weakhash 21.007s
But the actual rolling performance is still quite poor overall.
Yeah. Anything we need to do byte by byte for an entire file is going to suck. But that part of it can at least be switched off by configuration, as opposed to what we do per block in the scanner.
If we wanted to go the complicated route we could add some of the heuristics we talked about when introducing the weak hash: only do the rolling if we see a lot of blocks that differ after a certain point in the file. That would avoid weak-hashing a 75 GiB file at 88 MB/s just to see if we can find the one 128 KiB block we are looking for, instead of transferring it over the network in a millisecond or so…
I think we could optimize it a fair bit by hooking into the digest directly as well, for the rolling case. We're doing the same amount of work really; it's just that we have a bunch of function calls for every byte hashed. If we instead passed the entire block to Write (or a function like it) and let it return 128k weak hashes in one pass, it would probably be about the same speed.
I actually also encountered doubled scanning progress a few days ago when I changed .stignore mid-scan. I assumed (apparently incorrectly) that this was expected behaviour, however weird (which is par for the course with .stignore).
Hi! Hopefully this helps: where I work, a few years ago, we needed to hash a LOT of data, and we spent some time investigating which algorithm would work best for us (I wasn't involved in this myself, so I don't know exactly whether our needs are similar to Syncthing's needs).
BUT: We ended up using MurmurHash; I just checked, there is a Go implementation of it. I googled around a bit to find some benchmarks comparing adler32 to MurmurHash, and found one (which was comparing Java implementations, but still):
http://jpountz.github.io/lz4-java/1.3.0/xxhash-benchmark/
Which indicated that Murmur(2) is faster (at least in Java) than adler32.
BUT: it also pointed to xxHash being even faster.
It also has a Go implementation, and its webpage is here:
http://cyan4973.github.io/xxHash/
Hope this helps!
With 0.14.19 we are back in business
25.8 GiB vdi file, Core i5-4590S ("sha256Perf": 361.6), VM and ST DB on system SSD (960 EVO NVMe).
Single thread hash performance is 385 MB/s using crypto/sha256 (379 MB/s using minio/sha256-simd).
Scan time: ~1:40 Scan Speed: 271 MB/s
Still ~100 MB/s less than the test at startup, but a huge improvement over the 47 MB/s.
There is some overhead that brings down the scan speed a bit from the theoretical.
One we can calculate, although it's not displayed at startup or in the GUI: the weak hash checksum. Each block is hashed by SHA-256 (385 MB/s above) and then by the weak hash (~1200 MB/s on my computer). The resulting theoretical hash speed is 1/(1/385 + 1/1200) = 291 MB/s. We could potentially do these two in parallel, in the future…
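The arithmetic behind that figure, for reference (rates taken from the startup benchmark above):

```go
package main

import "fmt"

func main() {
	sha := 385.0   // MB/s, single-thread SHA-256 rate
	weak := 1200.0 // MB/s, weak hash rate

	// Hashing serially, every byte pays both per-byte costs, so the
	// rates combine harmonically: time per MB adds up.
	serial := 1 / (1/sha + 1/weak)
	fmt.Printf("serial: %.0f MB/s\n", serial) // ~291 MB/s

	// If the two hashes ran fully in parallel, the slower one would
	// set the pace instead.
	fmt.Printf("parallel bound: %.0f MB/s\n", sha)
}
```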
Then there is some memory allocation, shuffling, waiting for disk/SSD etc that I think accounts for the rest.
I got around 15:40 for 75 GB, which is around 80 MB/s. More or less doubled the speed for me. Note the printed hashing speed is still what it was in 0.14.18-1:
INFO: Single thread hash performance is 156 MB/s using minio/sha256-simd (117 MB/s using crypto/sha256).
As already mentioned by @wweich, that's a pretty good improvement. Still, 80 MB/s is about 10% less than half of the speed my disk can deliver. Is there a way to let at least two threads work on a single file, maybe by letting them run from top to bottom and vice versa, meeting in the middle?