Syncthing performance issue with remote directory

Hello there, I am running Syncthing on a node “Server” which uses a remote directory mounted over the network via SSHFS. Here is the network:

#########                      ##################
#   PC  #  -----Syncthing----- #     Server     #
#########                      #----------------#                ###############
                               # repository dir #-----SSHFS------# Data Server #
                               ##################                ###############

Syncthing works fine between the “PC” and the “Server” with classic repositories (directories located on the server itself).

Syncing goes wrong with directories and files located on the remote server: scanning and transferring take far longer.

I share two repositories with the same data; the first uses the remote directory and the second does not.

It takes around 3 seconds to scan the first repo (20k files), while the remote one takes around 4 minutes. If there’s data to propagate, it takes hours.
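A rough back-of-envelope check (the per-file latency below is an assumed figure, not a measurement): if the scanner issues one synchronous round trip per file over the SSHFS link, latency alone is enough to explain this kind of slowdown.

```python
# Hypothetical estimate: scan time dominated by per-file round trips
# on a high-latency mount. The 12 ms round trip is an assumption,
# roughly what an ADSL link might show.
FILES = 20_000
RTT_S = 0.012  # assumed SSHFS round trip per stat() call

remote_scan_s = FILES * RTT_S
print(f"estimated remote scan: {remote_scan_s:.0f} s (~{remote_scan_s / 60:.0f} min)")
```

With those assumed numbers the estimate lands near the observed ~4 minutes, versus seconds on a local disk where a stat() costs microseconds.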

(Gwan is “server”, TOMMAINPC is “PC”, “/home/tlatch” is the remote directory)

Here is a benchmark from the “Server” to the “Data Server” through SSHFS. It basically saturates my ADSL connection.

Syncthing seems to stall on that repo because of the SSHFS remote directory; the SSHFS connection is almost unused (less than 5 KB/s in both download and upload).

I have done an STTRACE=beacon,discover,files,net,model,scanner,upnp log but I didn’t notice anything wrong. sctrace.log (569.3 KB)

What I did during logging, in this order: changing a file from PC in repo Bench_test and in repo Bench_tes2 to compare; removing some stuff from Mics from PC (also a remote SSHFS directory, smaller).

Is there something wrong with Syncthing on that network architecture?

I don’t understand, really…

The graphs show a lot of traffic, and you speak of overload, but there’s no sshfs traffic?

The sshfs mount will be slow (especially over ADSL!), so scanning and syncing that repo will be no faster…?

I guess he is speaking of overload via pure sshfs.

Plus, we open and close the file on every read when providing data to someone, which means three packets each way per 128 KiB block.
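The difference being discussed can be sketched like this (a minimal illustration, not Syncthing’s actual code): reopening the file for every 128 KiB block versus holding one handle open. On a local disk both are cheap, but on a network filesystem every open/close adds round trips.

```python
import os
import tempfile

BLOCK = 128 * 1024  # the fixed block size mentioned in the thread

def read_block_reopen(path, index):
    # Reopen the file for every block request, as described above:
    # each open + seek/read + close costs round trips on a network FS.
    with open(path, "rb") as f:
        f.seek(index * BLOCK)
        return f.read(BLOCK)

def read_all_single_open(path):
    # Keeping a single handle open avoids the per-block open/close cost.
    blocks = []
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK):
            blocks.append(chunk)
    return blocks

# Tiny demonstration on a temporary file (3 full blocks + 100 bytes).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(3 * BLOCK + 100))
path = tmp.name

reopened = [read_block_reopen(path, i) for i in range(4)]
assert b"".join(reopened) == b"".join(read_all_single_open(path))
os.unlink(path)
```

Both strategies return identical data; the open-per-block variant simply pays the latency tax once per block, which dominates on a slow mount.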

The graph shows only SSHFS traffic, yes. The purpose was to show that the SSHFS mount is fast enough for syncing files. But it is not with Syncthing, which hardly uses the (SSHFS) bandwidth at all.

What I don’t get is: why would reading (with the opening and closing of files, then) not be as fast as it can be?

Because on Windows the file would be locked until closed.

Plus, this also helps us escape situations where we are reading from a file which is now deleted (due to how deletion on Linux works)

Use NFS. NFS has kernel-level caching with read-ahead etc., which might help in this case.
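For reference, an NFS mount with larger per-request transfer sizes might look like this (the host name, export path, and option values here are examples, not tested recommendations):

```shell
# Example only: adjust host, export, and mount point to your setup.
# rsize/wsize enlarge the transfer size per NFS request; the kernel's
# page cache and read-ahead then batch sequential reads, which SSHFS
# cannot do as effectively in userspace.
sudo mount -t nfs -o rsize=1048576,wsize=1048576,hard,tcp \
    dataserver:/home/tlatch /mnt/tlatch
```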

This is between two Linux machines; does it keep the same behavior? I will try to set up NFS to compare, even though it is much harder to route all those ports. I’ll append the results here if they are useful to know.

Well the devices do not matter, the implementation does, and I’ve explained why I believe it’s implemented like that.

I have encountered a similar bug – not only with Syncthing but with the latest version of Bit-Sync as well! I really think Syncthing could give Bit-Sync a run for its money, but that’s a different story…

First, I think Syncthing is an AMAZING idea. Respect to everyone who created and worked on it, and everyone who’s involved… I admire the hard work and initiative.

Syncthing is buggy but reasonably usable at this point. Most of the bugs are not critical, which is fortunate. Performance (and UX design), however, remains a major issue… But other than a few things, it’s an impressive project.

The only thing that bothers me is that Syncthing is simply not as fast as Bit-Sync right now, and unfortunately I think that is a protocol issue. Eventually we’ll need BEPv2… But I think we can make it faster than any file sharing protocol to date. And better overall.

One quick suggestion I have is (1) introduce an adaptable data block size to Syncthing, which currently has a fixed 128 KiB block size… BitTorrent is way faster, and part of that is due to its dynamic block size and its ratio to the file size.

Keep in mind that BitTorrent adapts the block (chunk) size to the size of the file being transferred… the block size should be about 1000 to 2000 times smaller than the file being transferred, according to the academic literature and BitTorrent experience…
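As an illustration of that suggestion (this is not how Syncthing actually works, and the ceiling value is arbitrary), a block size picker aiming for roughly 1/1000 to 1/2000 of the file size, rounded to a power of two:

```python
MIN_BLOCK = 128 * 1024        # keep the current fixed size as a floor
MAX_BLOCK = 16 * 1024 * 1024  # arbitrary ceiling chosen for this sketch

def pick_block_size(file_size: int) -> int:
    """Aim for about file_size/2000 .. file_size/1000, rounded down
    to a power of two and clamped to [MIN_BLOCK, MAX_BLOCK]."""
    target = file_size // 1500  # middle of the suggested ratio range
    size = MIN_BLOCK
    while size * 2 <= target and size * 2 <= MAX_BLOCK:
        size *= 2
    return size

# e.g. a 1 GiB file gets 512 KiB blocks (~2048 blocks per file),
# while small files keep the 128 KiB minimum.
```

Larger blocks on large files would mean fewer per-block request round trips, which is exactly the overhead discussed earlier in this thread for network-mounted repos.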

Anyway, I’ll save the protocol for another thread.

Here is one such issue I’ve identified that fits the subject and may be related… This is the setup: a local computer and two Amazon EC2 instances running Ubuntu 12.04 LTS.

Local machine: Windows 7, master, regular directory. Remote machine: Amazon EC2 instance #1 running WebDAV & Syncthing (the Syncthing download directory is set to a WebDAV network mount)… The network mount is connected to EC2 instance #2 running Bitnami ownCloud, which runs a pass-through WebDAV setup that syncs up with a paid 1 TB Dropbox.com account.

My setup worked mostly fine between my desktop and EC2 with local storage, with mostly a few minor or moderate bugs. But once I started adding network mounts to either side, it all went haywire.

When I upload a large file from my local machine to the Amazon ‘cluster’, I get either failures or incredibly slow bandwidth. That’s another bug that happens, but this time it’s consistent.

(1) I see in my error logs that Syncthing is trying to do file operations on the newly-uploaded data in the networked WebDAV directory on EC2… I think the networked directory does not perform fast enough (in other words, Syncthing can’t handle network-mount lag).

(2) The ‘network-style’ file systems like SSHFS, NFS, etc. create lag, which appears as blocking I/O to an application like Syncthing.

Anyway, the net result is errors: files don’t get deleted off the network drive in time and Syncthing doesn’t know what to do, so it tries to rename and copy files, etc. Here’s an example of the type of error message it would throw… not exact, this is just the type.

[B2DEO] 03:48:11 INFO: Puller (folder “dbox”, file “Enc/vDJuHEgyLLFCbiiD0DoASDL8”): dst create: open /home/ubuntu/mnt/Enc/.syncthing.vDJuHEgyLLFCbiiD0DoASDL8: file exists

This is actually fairly interesting. Obviously, if the underlying FS is very slow, syncing (and maybe scanning) will be very slow, but that shouldn’t result in any more errors than usual unless that FS is broken somehow. Sounds like something that would be worth reproducing in a test.