Node initialization bandwidth optimization

zukoo · February 26, 2015, 9:20am

Hi everyone,

I installed Syncthing two weeks ago and I’m very happy with it.

Now that I have some data and a few nodes connected (280GB / 7 nodes), I’ve noticed that turning on a node causes it to download and send about 15MB of data in the first minute (with only 1 node online).

This might not be a lot compared to 280GB, but my files are mostly years of photos which almost never change (and the changes are small when they do), which means that over the year I exchange less than 60MB per day on average. However, I turn my laptop on and off more than once a day. That means I end up transferring more index data than actual data.

From what I read about the protocol, the implementation must exchange the full index of the directories shared with the device?

If I’m correct here, maybe it would be better, if possible, to either:

send a hash of the index, which may or may not trigger a full download
send a diff of the index since the device was last synced, maybe still with a hash of the full index to guard against corruption
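
The two options above can be sketched roughly as follows. This is only an illustration, not how Syncthing's protocol works: the index layout (`{path: (size, mtime, version)}`) and the function names `index_digest`, `index_delta`, and `needs_full_index` are assumptions made up for the example.

```python
import hashlib


def index_digest(index):
    """Hash a folder index given as {path: (size, mtime, version)}.

    Paths are sorted first so that two devices holding the same
    entries get the same digest regardless of insertion order.
    """
    h = hashlib.sha256()
    for path in sorted(index):
        size, mtime, version = index[path]
        h.update(f"{path}\0{size}\0{mtime}\0{version}\n".encode("utf-8"))
    return h.hexdigest()


def index_delta(old, new):
    """Return (changed_or_added_entries, removed_paths) going from old to new."""
    changed = {p: e for p, e in new.items() if old.get(p) != e}
    removed = sorted(p for p in old if p not in new)
    return changed, removed


def needs_full_index(local_index, remote_digest):
    """Option 1: only send the full index when the digests disagree."""
    return index_digest(local_index) != remote_digest
```

With a scheme like this, a laptop that wakes up with an unchanged photo archive would exchange one short digest instead of the whole index, and would fall back to a delta or a full index only on a mismatch.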

What do you think? Anyway, thanks for the good work.

calmh · February 26, 2015, 9:31am

I think this is

I’ve almost implemented it twice, but there are some things to sort out that I get stuck on before I get distracted…

It’s coming.

cydron · February 26, 2015, 4:07pm

We should brainstorm and work on some cool new features based on what people generally want… then it will become better.

Anyway, I may start doing bandwidth optimization, but it seems to make most sense to me there to do the reference implementation in Java or C#/C++ first. Changing the data structures of the initialization handshake would help… But this is hard stuff. This is why I think it’s so cool.

I think we need to put in our own sort of “DHT” of the files on the system that can be easily serialized to disk and to the network. We also need to have the state of the files on the local machine shared with the other machines, which triggers the update as mentioned.

Also perhaps we store the same descriptors of one’s peers as well. So you have LocalState_Machine1, LocalState_Machine2, as objects. GlobalState is the union of these objects.

My personal inclination is this… Store the descriptor of the filesystem you are mirroring locally in RAM, a RAMdisk, a networked encrypted cache, a local disk cache, distributed hash table, whatever – store it OUTSIDE the FS we are monitoring – and also store it internally as an object while the program is running. Save this object to disk in serialized form, both regularly and on program close.

This way, on startup, you’ve got ‘where we left off’ in terms of who has what (files) before we even begin transmitting packets!
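
A minimal sketch of that idea, assuming each device's state is a plain `{path: version}` mapping and that JSON is an acceptable on-disk format (both are assumptions for illustration; `global_state`, `save_state`, and `load_state` are hypothetical names, not Syncthing functions):

```python
import json
import os


def global_state(local_states):
    """Union of per-device states.

    local_states: {device_name: {path: version}}
    Returns {path: highest version seen on any device}.
    """
    merged = {}
    for files in local_states.values():
        for path, version in files.items():
            if version > merged.get(path, -1):
                merged[path] = version
    return merged


def save_state(local_states, filename):
    """Write the state next to the target, then atomically swap it in."""
    tmp = filename + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(local_states, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, filename)


def load_state(filename):
    """Return the saved state, or an empty one on first start."""
    try:
        with open(filename, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

The state file would live outside the monitored folder, so that writing it does not itself trigger a rescan.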

FASTEST PROTOCOLS TO TRANSMIT DATA? UDP UDT TSUNAMI UDP UFTP FDT

So we need to ditch TCP if we can avoid it. It works fine with one stream, but if we want multiple parallel streams it’s way too much overhead in terms of latency etc. UDP with a checksum and an ACK is fine for our purposes.

If the packet fails, we can simply move that chunk to the top of the stack and request a retransmit. There are even libraries that will do all this for you – monitor throttling and so forth for UDP. Oh, and UDP can take multiple paths through different routers, so it’s inherently parallel.

Another HUGE issue is we need to change the way data is exchanged. My personal opinion is that we need to switch over to UDP, but that is a big undertaking. In fact, I’d switch to UDP (like bittorrent is using), ditch the TLS entirely, use an open source flow-control library, and then roll my own encryption using EC-DSA and not RSA.

I know there is this animosity towards bittorrent here, and my take would be ‘yeah fine, let’s write a better protocol’, but I think it’d be really neat if I could sync all my files to my machines and encrypted to Google Drive… Then with one click, I can select an album I recorded and publish it to bittorrent by the DHT system.

I really think we ought to take advantage of existing technologies as long as they are decent… A distributed hash table is good idea, especially if people want to share data outside their own clusters … ie giving someone on facebook a link to your data store, etc.

But that’s up for discussion and would be long term. Short term, for speed increase (and the ‘bittorrent-like’ ability for the network to speed up as you add more machines)…

We need three big things for that to happen…

UDP Protocol for sending data chunks. Multiple parallel streams. Up to 10-30 packets ‘in transit’ at once, easy.

I think we can learn a lot here from the SRTP protocol…

REDUCE UDP packet sizes. The max size of UDP packets is 64k anyway. Otherwise they will just get fragmented on the Tier1 backbone… Also select the UDP packet size based on real world testing.
Use a UDP library designed for dealing with congestion, so it throttles up and down. We can still use TCP as the negotiation channel for TLS if we want, but I think the packets should be UDP if possible.

4… Multithreading… Not only does the GUI need its own thread… or the socket code… The app will need tons of threads. All network I/O should be non-blocking, and instead should fire off when data is received or at predetermined intervals. In terms of writing data to files on the disk, we definitely need to consider always calling flush() and fsync() in a nonblocking manner as well.

Utilize in-memory disk model as a hash table… Check disk for changes and if so, update the hash table.
Current ‘version’ of files should NOT just be Folder and Name. At the very least it should also include the length and some sort of hash. It can be something like File_version = SHA2_256(File_Name + File_Data + Folder_Name + Lamport_Clock).

7… Anytime a host confirms reception of a block, immediately share it with authorized peers if they don’t already have that block. Share the block in parallel (threaded), and do it PRIOR to flushing the block out to the disk.

Trap filesystem events at the kernel level if possible.
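
The File_version formula suggested in the list above could be sketched like this. The length prefix on each field is an addition that is not in the post: it keeps two different name/data splits from hashing to the same value. `file_version` is a hypothetical helper, not an existing Syncthing function.

```python
import hashlib


def file_version(folder_name, file_name, file_data, lamport_clock):
    """SHA-256 over folder name, file name, file data and a Lamport clock.

    Each variable-length field is prefixed with its length so that
    ("ab", "c") and ("a", "bc") do not produce the same input stream.
    lamport_clock is assumed to be a non-negative integer below 2**64.
    """
    h = hashlib.sha256()
    fields = (
        folder_name.encode("utf-8"),
        file_name.encode("utf-8"),
        file_data,
    )
    for field in fields:
        h.update(len(field).to_bytes(8, "big"))
        h.update(field)
    h.update(lamport_clock.to_bytes(8, "big"))
    return h.hexdigest()
```

Hashing the full file data on every version bump is expensive for large files, so in practice one would probably hash a list of block hashes instead; that variation is left out here.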

Minimally, we need to lower the ‘chunk’ size dramatically, but have it scale with the file. Chunk size is not as much of a factor over TCP, but it’s critical for UDP. Both are important because of network MTU, which varies, causing IP fragmentation in transit over WAN.

Files of up to about 128k or so should be transferred all at once… Especially if they are part of the protocol control metadata.

But larger files MUST have the chunk size lowered, when we get into GB. And we need to draw inspiration from the bittorrent block mechanism, where each machine knows which files it must find missing blocks for.

For starters, we can just have the UDP threads grab missing blocks… But ideally, we want the hosts to exchange information on the rarity of a particular file’s individual block… Then they should be fetching the rarest blocks first.
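
Rarest-first selection, as described above, can be sketched in a few lines. The inputs are assumptions for the example: `missing` is the set of block indexes we still need, and `peer_blocks` maps each peer to the set of blocks it advertises.

```python
from collections import Counter


def rarest_first(missing, peer_blocks):
    """Order the blocks we still need by how few peers have them.

    missing:     set of block indexes we do not have yet
    peer_blocks: {peer_name: set of block indexes that peer has}
    Blocks that no peer has are left out, since nobody can serve them.
    """
    availability = Counter()
    for blocks in peer_blocks.values():
        availability.update(blocks & missing)
    obtainable = [b for b in missing if availability[b] > 0]
    # Fewest holders first; ties broken by block index for stable output.
    return sorted(obtainable, key=lambda b: (availability[b], b))
```

For example, with peers A holding blocks {0, 1, 2}, B holding {0, 1} and C holding {0}, block 2 is fetched first because only one peer can supply it.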

Also, each block needs to have its own (AT LEAST) message digest like SHA128 or whatever, so we know if it arrived intact and untampered.

Preferably, each block has an HMAC (extremely fast and secure) which we derived during decryption of the data and stored in a local hash table or RAMdisk.

Having the hashes of all the file chunks – expected vs reality – will save immense amounts of time, and will prevent corrupted files. If a block gets corrupted, we can autodetect and have only that block retransmitted. We rescan the file, and then we can say we’re in sync (rather than doing a full GUI rescan all the time).
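
As a rough illustration of the "expected vs reality" comparison, the sketch below hashes a file block by block and reports which blocks need to be fetched again. The 128 KiB block size and the names `block_hashes` and `blocks_to_refetch` are assumptions chosen for the example.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # assumed block size, for illustration only


def block_hashes(data, block_size=BLOCK_SIZE):
    """SHA-256 of each fixed-size block of data (last block may be short)."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_refetch(expected, data, block_size=BLOCK_SIZE):
    """Indexes of blocks whose hash differs from the expected list.

    Blocks that are missing entirely (file too short) are reported too.
    """
    actual = block_hashes(data, block_size)
    return [
        i for i, want in enumerate(expected)
        if i >= len(actual) or actual[i] != want
    ]
```

Only the indexes returned by `blocks_to_refetch` would have to be requested again, instead of the whole file.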

So less filesystem monitoring and more asynchronous parallel multithreading.

Anyway, what I’ve written here is a lot of work, I know, and I’m not proposing that someone else do it, because it’s not easy. But I’m confident this will speed up transfers dramatically.

By the way, are any of the active developers here located in San Francisco? I’d also like to chat over email, Skype, or the phone with any of the people who’ve contributed to this project.

Again, great job, and amazing work.

-Alex

calmh · February 26, 2015, 4:24pm

You have good ideas here, but a lot of what you talk about is either already how it works, almost implemented, planned, or not hugely relevant. I’d suggest digging into the code and getting familiar with it.

As far as I know, the main contributors are usually all in the old world, unfortunately.

A C# implementation would be awesome, as a road to getting a better Windows experience. Otherwise there are ongoing works on both Java and Swift implementations, and there exists at least one closed source implementation for iOS (that I assume is Swift) that may get opened in the future…

AudriusButkevicius · February 26, 2015, 4:42pm

It’s actually a dynamic library written in Rust with a Swift GUI on top using it.

calmh · February 26, 2015, 4:45pm

Oh! Nice, cool, even better. .)

cydron · February 26, 2015, 4:54pm

I think you are the main developer … Hi! Great to meet you, and really impressive work on the code. The reason I’m here brainstorming ideas is because I’m excited about what’s been done so far and the potential.

So mostly I’m thinking out loud here – I agree that a reasonable fraction of what I’ve written is completed or near complete. I’ve been slowly going through the code, and I think porting a Java client will answer a lot of questions.

But there are a few things that I really, really want to fix (if just for selfish reasons) so my own setup works… I need P2P file sync to interface with Google Drive… And I need it to interface to my remote servers. So I get to pick the filesystem, transport protocol, run benchmarks, etc.

But two big issues that stood out when I was testing the other day… Bit-Sync was much faster under identical conditions. That was weird because they also use similar grade encryption. So for Syncthing (1) I noticed that transfer speeds were too slow, then (2) My Syncthing peers weren’t sharing very many packets, if any, with one another… and (3) I got repeated crashes upon interaction with filesystems with any sort of latency or that didn’t have local hw cache (like Amazon S3, Amazon Glacier, Dropbox, etc).

Oh, and (4) during these error states, I had major issues re-syncing stuff, which required deleting and starting over. So I want to build my own Java reference GUI to try out some patches, fixes, new features, etc.

So the above issues are what sparked my interest in the project. But I’m sure I’ll get familiar with the code if I port it to Java. I just want to get a feel for the goals and direction etc.

For example, I’m trying to understand the goals and see if they align. One of my goals is to have an open-source client like BitSync or BitCasa that I can mount as a network drive and instantaneously stream an AVI off cheap cloud sources.

I want the same protocol to be able to do my weekly backups to Dropbox, or to take an image of a partition and upload it to Amazon Glacier. If this sort of stuff sounds cool to anyone, please send me a message so we can chat.

Back to the comments…

I did notice there appears to be some sort of DHT mechanism , but there is no way for me to view it through the GUI.

Also, so you’re saying that we’re already using UDP? Because I think we can substantially improve the speed… so much so, that I think we can even mount the device as a real-time file system.

Also – in terms of UI, I think something like a shared secret authentication / key agreement protocol makes more sense so you don’t have to keep cutting and pasting between the web browsers. I know many people find this difficult to set up.

I think the easiest usability would be SRP or EJAKE w/ Elliptic Curves. Then you just have the user log in to each server, hit start, and the password is used to establish a random shared session key.

Symmetric encryption could occur on a per-key level using AES-CBC+IV… I think a preferable solution would actually be a brand new strong stream cipher in CTR mode, which would actually be faster than AES and have the benefits of an HMAC. On top of that, add the fact that you derive your “user password” key from a combination of user input and IV, through the scrypt password expander, and then you’ve got it made.
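
The scrypt step mentioned above could look something like this, using `hashlib.scrypt` from the Python standard library (available when Python is built against OpenSSL 1.1 or later). The cost parameters and the `derive_key` helper are assumptions for illustration only; this covers just the password-to-key derivation, not a cipher or a key agreement protocol.

```python
import hashlib


def derive_key(password, salt, dklen=32):
    """Stretch a user password into a fixed-length key with scrypt.

    salt should be random and at least 16 bytes; it is not secret and
    can be stored or sent alongside the data. n, r and p are commonly
    used interactive-login cost settings, assumed here, not tuned.
    """
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=2**14,
        r=8,
        p=1,
        dklen=dklen,
    )
```

The same password and salt always give the same key, so two devices that know the password and exchange the salt can arrive at identical key material without ever sending the password itself.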

Low overhead from the TLS on your TCP control channel, low overhead from your stream cipher on your UDP data channels.

Can you send me your email / phone to my personal email? I’d like to chat to get a feel for the direction.

Sincerely, Alex

calmh · February 26, 2015, 4:59pm

Currently no, but there’s a branch for it. However that’s not a speed thing, that’s a NAT thing. I’d be amazed if the speed difference was as large as you say.

My email is on each of my commits and on my github profile. But I much prefer keeping things out here – I’m not great at reading or responding to email.

But before any of that, you mentioned multiple security issues you found. I’d like to get those filed as soon as possible, or if you think they deserve to be confidential, reported as per the instructions on http://syncthing.net/security.html, please…

AudriusButkevicius · February 26, 2015, 5:57pm

@cydron given you are here, and you seem to understand about crypto, I have a question:

I am looking for an encryption scheme which is something like AES ECB, so that it always produces the same size of output for a given input, rounded up to the block boundary, but which would also support separate keys for encryption and decryption.

The reason it has to be a block cipher is because I need to be able to read an arbitrary block from within a file and still be able to encrypt it and decrypt it without having to re-encrypt the whole file.

Any suggestions?
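
To make the random-access requirement concrete, here is a small sketch of the arithmetic involved. It assumes a 16-byte cipher block, as in AES, and does no encryption itself; `covering_blocks` and `padded_size` are hypothetical helper names.

```python
CIPHER_BLOCK = 16  # AES block size in bytes


def covering_blocks(offset, length, block=CIPHER_BLOCK):
    """Cipher block indexes that must be processed to touch
    bytes [offset, offset + length) of the plaintext."""
    if length <= 0:
        return range(0)
    first = offset // block
    last = (offset + length - 1) // block
    return range(first, last + 1)


def padded_size(n, block=CIPHER_BLOCK):
    """Size of n bytes rounded up to the next block boundary."""
    return -(-n // block) * block
```

With a scheme where each cipher block can be processed independently, reading 30 bytes at offset 20 would only require blocks 1 to 3 rather than the whole file.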

cydron · February 26, 2015, 6:28pm

OKAY important part first… You are fine. But Bit-Sync is full of holes. That’s the short answer.

For Syncthing, I don’t remember what the security issue was. I have to go back and check. I don’t think it was anything major. One thing I remember was that you say you are using the hash of your certificate, but you actually should be using the cert ID. Also, you need to re-key the session key after a certain amount of data, or after we start retransmitting the same data over again (a new IV helps as a workaround) if you are in CBC mode.

But your issues were just minor compared to Bit-Sync. First of all, they don’t have PFS (perfect forward secrecy). Neither do you unless you are using the latest version of TLS and everyone has the latest edition of their browsers… But you are ready to implement PFS tonight.

So no PFS for BitSync. The big problem, if I recall, was Bit-Sync got caught red-handed sending users’ personal data to their own private web services that were on their hosted ‘tracker’. It wasn’t just advertising files…

BitSync was (apparently)…

- It was uploading hashes of your folders regardless (even if they are private?)
- It was leaking IPs to the relay and tracker during JSON GETs.
- Not safe against DoS attacks through the DHT/relays via spamming of fake peers
- Leaked ALL your internal interface IPs to the host relay by accident, even if they weren’t configured to share.

They are using the latest SHA-3 winner for their Android client

Bit Sync Android app does tons of data collection via web service endpoints unrelated to the service getting calls. Many are unencrypted http calls.
The GetSync server doesn’t use crypto for sharing its directory… All the hashes are in the clear as GET params.
For a while, their IP leakage occurred with DNS lookup, so it actually dumped private folder hashes during DNS lookup of a host that was on another interface.
At one point you could get the secret keys through just two quick URL web service calls to their php and json code.
Several buffer overflows discovered; more data leakage through bootstrap traffic
Client listens on all TCP and most UDP interfaces rather than sections
Crypto specs sucked.
Bt-sync secret keys were a joke… Same key for all keys, just prefixed with A, B, C, or D, etc… keys were re-used for the same traffic. No dynamic or manual re-keying of internal keys.
Data was visible through a bt-sync web site without any api key. You could get the master secret key from a folder key.
BT Sync infrastructure is on both AWS EC2 and Utorrent domains.
BT Sync database stored as a local SQLite file, unencrypted.

Okay, well it’s a combination of problems I’m sure. Because I’ve noticed some academic papers say bittorrent block size is up to 512kb or more, but the protocol spec says 16kb to 23kb tops. So that needs to be checked with Wireshark. http://2014.hackitoergosum.org/bittorrentsync-security-privacy-analysis-hackito-session-results/

I will review your security code again ASAP. I think you are fine, if I remember the bug I found was minor.

What I meant regarding traffic is this – because we don’t need a lot of the features of TCP, we will gain speed. We don’t need all the extra packets, and it doesn’t matter if data arrives out of order. Since we are dealing with a p2p ‘block’ file system, p2p will immediately shed like 10% to 30% overhead just from getting rid of the TCP stateful connection and its traffic.

Not only that, but TCP takes up more process / CPU time, file descriptors, etc. It’s more overhead – but we aren’t viewing a web page. All that needs to happen is that we send the data in chunks. If it arrives out of order, that was going to happen anyway. If it’s corrupted, well, we need to detect that anyway because we need a CRC, hash, or MAC of each chunk for error checking. And if the chunk never arrives, it will just be queued for re-transmit.
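
The receive side of that argument can be sketched without any networking. Chunks are modelled as `(index, crc, payload)` tuples, which is an assumption for the example rather than any real wire format, and `make_chunk` and `receive` are hypothetical names.

```python
import zlib


def make_chunk(index, payload):
    """Tag a payload with its position and a CRC32 of its bytes."""
    return (index, zlib.crc32(payload), payload)


def receive(chunks, total):
    """Accept chunks in any order and report what is still needed.

    Returns (data, missing): data is the reassembled bytes once every
    chunk has arrived intact, otherwise None; missing lists the chunk
    indexes to queue for retransmission (lost or failed the CRC check).
    """
    good = {}
    for index, crc, payload in chunks:
        if 0 <= index < total and zlib.crc32(payload) == crc:
            good[index] = payload
    missing = [i for i in range(total) if i not in good]
    data = None if missing else b"".join(good[i] for i in range(total))
    return data, missing
```

Note that this leaves out everything calmh's reply to this post raises: timeouts, windowing and congestion control still have to live somewhere once the kernel's TCP stack is out of the picture.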

On top of that, the design of your protocol is very similar to RDP or early NFS. It looks like an RFC. It just seems to be a UDP protocol… PLUS all the fastest transfer protocols in the research are UDP… all the p2p file sharing protocols are UDP.

It just seems to make sense. If it’s that big a deal, the next best thing is to run some experiments with # of TCP sockets, packet size, vs throughput.

I may use TCP to understand since it’s easier to debug in wireshark… But UDP is the way to go for speed and parallelization I think.

NOW: The NAT thing you mentioned is a really big problem. The biggest problem, in fact. Luckily, most home networks or consumer equipment can be fixed with NAT hole punching. Kind of like how Bit-Sync uses the DHT to learn all the external IPs of the machines…

You can use UPnP along with NAT hole punching to set up direct UDP to UDP interconnects. The UPnP will give you the external IP and possibly available ports if it’s running on the default gateway. NAT hole punching simply requires an outbound UDP connection using the SIP protocol to a friendly server.

Let’s say we make the public Syncthing servers also act to help connect networks. So I connect out to the server from behind the NAT, and the firewall lets me. Somehow I tell the server that I need to connect to Target B via UDP (either through a UDP message or a TCP side channel)… The SIP protocol is compatible with TCP, UDP, whatever.

This helps the initiator and the target agree on their respective IP addresses… and more importantly, source and destination port addresses. Once connected by SIP, the UDP channel now goes straight through the NAT on both sides.

TONS of stuff uses this. Skype uses it. P2P networks use it. GSM cell networks. Mobile VoIP. Streaming Video. Signalling Transport. Even some cloud drive and remote desktop VPN apps use UDP hole punching for NAT…

I think this is not the most important feature, but I think it makes sense to test it.

More importantly, I’d like to get the chunk sharing working over a TCP cluster with all the existing setup.

calmh · February 26, 2015, 7:02pm

Ok, cool. The device ID is a teensy bit more involved, but not much. We check the hash of the certificate, and the certificate common name. I’m not sure what the cert ID you refer to is in this case, but I think it’s usually the hash of it? Rekeying could theoretically be a good thing. The Go TLS implementation doesn’t do it, and we use AES in GCM mode so I think we’re cool anyway though. At least for all practical purposes…

Oh no, we’ve done perfect forward secrecy since the initial release day. The GUI may not, depending on the browser as you say, but all sync connections are locked down to TLS 1.2 and strong cipher suites.

I had no idea, but it smelled bad from the beginning, hence this project.

It does though. Well, currently it does since we talk TLS on top of it, and TLS requires that. But even if we invent something of our own on top of single datagrams we still need to take care of retransmissions, timeouts, windowing, congestion control etc and in practice it doesn’t end up being that much more simple. And it ends up happening in user space instead of in the kernel which is a performance penalty.

I implemented dst (calmh/dst on GitHub, the Datagram Stream Transfer protocol) for this purpose, in the same vein as UDT, uTP etc. In the end it’s just a slower, crappier TCP, and the syscall latency per packet ends up being a severe performance limiter, but it does enable better NAT busting. We use UPnP successfully today to get TCP through, but that breaks down when UPnP isn’t possible.

But TCP as such is very rarely the bottleneck for syncthing. We do so much expensive crap on top of it that it just disappears into the background, unless you are on a lossy high latency link of some kind, which is somewhat of a corner case.

uok · February 26, 2015, 7:39pm

… that’s a long list of “bugs” and Wikipedia says

that’s a troubling combination

generalmanager · February 27, 2015, 4:18pm

@cydron Rolling your own crypto is absolutely not a good idea and as @calmh already mentioned, Syncthing had PFS from the beginning. Why not choose something that has been used in the wild and reviewed for use with UDP like DTLS ( https://en.wikipedia.org/wiki/Datagram_Transport_Layer_Security ), which already has a Go implementation?

Sure, there were (and are) some problems with this (see http://blog.cryptographyengineering.com/2012/01/attack-of-week-datagram-tls.html for a nice writeup), but inventing your own thing is going to be a lot worse, if you don’t happen to be a very knowledgeable cryptographer with a ton of implementation experience. And even then, there should be a team of similarly skilled cryptographers who pick it apart. And even then things can and have gone wrong.

QUIC (https://en.wikipedia.org/wiki/QUIC) would probably be a lot better, but AFAIK there is no golang implementation yet.

Abusing ZRTP to do the job is probably not the best idea, as explained in the comments of Matthew Green’s blog post I linked to above.

Thus we could either use DTLS and possibly have to do it all again after a QUIC implementation in golang comes around or wait until it does and spare us the effort.

One other alternative for the time being could be spipe and spiped, which are used in what is probably the most secure backup system out there: tarsnap. Oh, and the guy who builds it also invented scrypt, just to use it for tarsnap. https://www.tarsnap.com/spiped.html It seems to be very secure, well reviewed and fast, but it’s written in C…

Edit: A sidenote: Why would you use ECDSA instead of Ed25519 in a new implementation? I am not that concerned about manipulated curves, but the other problems of ECDSA. For more info see http://ed25519.cr.yp.to and http://safecurves.cr.yp.to

Harlok · February 27, 2015, 6:25pm

I strongly agree with that ! (A single LIKE wouldn’t be enough.)

AudriusButkevicius · February 27, 2015, 9:25pm

Can you link to the implementation of DTLS?

generalmanager · February 27, 2015, 9:54pm

@AudriusButkevicius sure, this is what I found:

With the documentation on godoc:

There also seems to be a very early implementation in the boringssl repo: https://boringssl.googlesource.com/boringssl/+/master/ssl/test/runner/dtls.go

AudriusButkevicius · February 27, 2015, 10:02pm

It seems to be unstable: https://github.com/cfromknecht/dtls/issues/1 And so is the other thing you’ve linked.

generalmanager · February 27, 2015, 10:34pm

Oh yeah, sorry about that. I did notice that the Google implementation still needed some significant work (which I mentioned), but I assumed that the first one was finished, because there was no mention of the opposite in the description.

But I am sure at least the boringssl one will get some attention in the next months.

And to be honest: UDP and especially QUIC would certainly bring some significant improvements, but getting the delta-updates for the index running should be the top priority in this regard.

Other features like the rewrite of the ignore logic to allow mobile clients which only load files on request are also probably more important for usability (and therefore adoption).

In short: I would be stoked to see a very efficient TLS-free protocol, but I don’t think we should invest too much time and effort now, if big improvements like QUIC are probably just a few months away.

cydron · February 27, 2015, 11:17pm

Sorry, keyboard got coffee in it… Will fix shortly.

I also will post TONS of academic research that explains the options we can take for making an arbitrary number of nodes sync their filesystems. There’s a whole spectrum of solutions, from modifying the existing protocol to include better object serialization and a CRC for every file, with file locking via software or FUSE, to getting as ridiculous as replacing it with something like Hadoop (the latter is definitely overkill).

We need to decide how many simultaneous nodes we want to support based on the needs and requests of people on here. I’m thinking about 8 to 16 nodes overall, tops, in parallel, but we could add a DHT for more advanced features.

No, I was mistaken on that first issue. You are correct. You should be using the ID of that hash of your X509 cert as a node identifier. Just a side note… don’t migrate this system onto the normal commercial CA PKI (like Verisign).

It’s actually more secure to either (1) not use a CA, or (2) make the CA the syncthing discovery server with a self-signed cert. Then you stay out of the PKI with all the root cert backdoors.

Pick a Cipher, HMAC, and Key Exchange Protocol that will yield PFS. Key agreement is fine in most cases. You’ve got a session key that’s good for a lot of gigs, perhaps up to a terabyte or more.

Second part… Here, I figured it out… We are probably okay with TCP for now. It’s good enough to test to see if it’s all working. I found out the ‘big’ BitTorrent packets being sent around are ALL in TCP. The little packets are all modified UDP where they send back ACK packets. Using the UDP swarming for data transfer results in an overall speed improvement of about 40% to 60%.

WE CAN’T just use UDP packets out of the box because it will be in the clear. Well, here’s a great solution!!! Generate a random key for CHACHA20 on each of the servers. Key up a ChaCha stream cipher using OpenSSL or whatever… Exchange the keys over the TCP TLS channel!! Then it’s still PFS, assuming we never re-use the keys for the UDP side channel. Then start up the IV with the stream cipher, and do your large file transmit, using ChaCha to encrypt each byte in the packet. The key is formed by K=(Kh + IVh + CTR). Thus they are stream ciphers running in CTR mode with an IV for good measure. This tactic was published in practical crypto. R good 64 byte IV on each of the two nodes about to do UDP burst. S

But you’re right… even if we leave the existing TCP/TLS alone, and just fix the file store, add file locking, import a standard version of the BT DHT cache etc… Fine… But we might end up having to use our own custom FUSE modules over a friendly filesystem like XFS… that way we can get the index of blocks quickly and discard the data… Alternately, we can store the file data in the distributed hash table perhaps. Next steps for me are to test TCP vs UDP for file transfers, i.e. NFS TCP vs NFS UDP, SFTP TCP vs SFTP UDP etc. Then I’ll try the same test with the "super UDP" software.

Also working in the meantime on my Java code. Started stubbing out the GUI. I’m sure I’ll have lots of questions.

So I generally agree, let’s not ‘roll our own’ protocol for TCP… Let’s try to use the off-the-shelf product for UDP unless it’s too much of a speed hit… if so, then we use a fast implementation of the CHACHA stream cipher with an IV… but that will depend on whether UDP is really 50% faster. If it is, we can try TLS/UDP… if that’s crap, we can try CHACHA20, because it’s faster than AES but stronger. Anyway, don’t worry about the speed over TCP. We keep the TLS connection going as a low bandwidth control protocol, since it has all the crypto stuff built in, and that’s a pain to debug and test.

If we add UDP (and I think we should eventually… but I am doing a benchmark this weekend to answer our scientific question for good…) to see… Another note: UDP rapid swarm is more efficient as you scale out from like 4 TCP peers to like 10 to 40 peers if there is high output UDP. I suspect there will be, especially using UTP