Problems with Syncthing

This isn't my first time using Syncthing. I've used it from phone to PC and vice versa and it works great, with no sync issues. The issues I'm experiencing are between two Ubuntu 16.04 servers running Syncthing from the PPA.

  • One folder is shared between Server A and Server B, with Server A in Send Only mode
  • Both servers already have an exact copy of that folder, containing many TB of data

1. First and foremost, it takes about 6-7 hours to build the index, during which the web GUI on Server A becomes mostly unresponsive or shows "connection error" when I try to view or refresh it; I have to refresh a couple of times or wait a few minutes before it loads. I've reported an issue about this (the first time I synced, I could not get the GUI back at all, so I restarted the server), but no one has responded: 9203

2. Second, after it supposedly finished indexing and hashing the files, I was left with this dreadful message on Server A: "Out of Sync Items 8615 items, ~970 GiB"

The message above is an example; the number varies, as this is the second time I am resyncing everything (i.e. I deleted the index folder on both servers and started again).

3. The Override button does nothing on the master (Server A). Those "Out of Sync Items" just sit there on Server A and won't change, yet Server B eventually becomes "Up to Date".

4. In Server A's Remote Devices section, Server B sits at a certain percentage and never moves, yet the folder says "Up to Date". That should not happen, because Server A's folder, to my understanding, is in Send Only mode.

5. The address shown on Server A or Server B sometimes changes to a different port, even though I have specified which IP and port it must connect on. I don't understand why that happens, and especially how it can communicate on a different port when only port 22000 is open on both servers.

Server A (in Send Only mode) should not show Out of Sync items if Server B claims everything is "Up to Date". I'd say the folders on both servers are up to date; it's just the "Out of Sync" message that gets stuck on a certain number of items. The Override button should work as expected, but it doesn't.

I've tried removing the index folders on both servers, only to wait another 7 hours and end up in the same situation. It claimed 7 hours at the start, but it took longer than that.

Can someone suggest a way to force items on Server A out of the "Out of Sync" state other than removing the index folders again? If this is a bug, can it be fixed ASAP? I'm relying on Syncthing heavily to keep these servers in sync, with Server A in Send Only mode.

  1. Hashing is a one-off operation; once it's done, only new changes get hashed.
  2. If the node is master, it will not download files by definition and will simply stay out of sync. Send Only does not mean "enforce my state everywhere else"; it means "if someone changes something, don't download it" (as the docs explain).
  3. It sounds like quite a few files, so it might take a while; also try refreshing the page.
  4. As I said, Send Only is not enforced, so if B modifies files, A will naturally become out of sync, as it will refuse to download B's changes.
  5. If you really care about this, read up on how ports work in general. Connections have a listening side and a dialing side. The dialing side always picks a random port to dial out of, towards the listening port; so on one side the port will always be 22000, while on the other it's random.
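The listening-vs-dialing distinction can be sketched with plain Python sockets on localhost (the OS-assigned ports here just stand in for Syncthing's configured 22000):

```python
import socket

# Listening side: bound to one fixed, well-known port
# (bind to port 0 so the OS picks a free one for this demo).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
listen_port = listener.getsockname()[1]

# Dialing side: the OS picks a random ephemeral source port.
dialer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
dialer.connect(("127.0.0.1", listen_port))
source_port = dialer.getsockname()[1]

# The listener sees the connection arrive *from* that ephemeral port.
conn, addr = listener.accept()
print(f"listening on {listen_port}, peer dialed from {addr[1]}")

conn.close()
dialer.close()
listener.close()
```

Only the listening port needs to be open in the firewall; the ephemeral source port on the dialing side belongs to an already-established outbound connection, which stateful firewalls allow through.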

A folder showing "Up to Date" means "I have all the files I need" (local device state), while a device showing "Out of Sync" means "it does not have everything I have" (remote device state).

Removing the index is a nuclear option that should probably only be used when corruption happens.

Honestly, I don't see any bugs. I think you just haven't read the relevant sections of the docs and have a lot of mistaken expectations.

Audrius, thanks for the response.

1. I was trying to ask why the web GUI becomes unresponsive during hashing?

2 & 4. OK, so that means the "Out of Sync Items 8615 items, ~970 GiB" message is normal on the master? Because, like I said, 970 GB is only part of the data, not the total. Why would only 970 GB be out of sync on the master? I've checked both servers with: du -s /media/user1/hdd1 -> 3670016000 KB and du -s /media/user2/hdd1 -> 3670015990 KB

In other words, both have the same amount of data, and even after the data is hashed on both sides, Server A still shows "Out of Sync Items 8615 items, ~970 GiB". Sometimes it gets stuck on a different number if I remove the index db folder and start from scratch. That doesn't make sense, since you said Server A would only show out-of-sync items if changes were made on B, but there were no changes on B. It's just showing out-of-sync items incorrectly on A.

Did you check that these out-of-sync items are actually incorrect? If you click on "Out of Sync Items" you get a list. Pick some of those files and test whether they exist on both hosts and do not differ.
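One way to spot-check such files is to hash them in chunks on each host and compare the digests; a minimal sketch (the path and chunk size below are just placeholders):

```python
import hashlib

def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-GB files never need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()

# Run this on each server against the same relative path taken from the
# "Out of Sync Items" list, then compare the printed digests by hand:
# print(file_sha256("/media/user1/hdd1/some/listed/file"))
```

If the digests match on both hosts, the disagreement is about something other than file content (metadata, for example).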

They are both out of sync from each other's perspective: at the folder level on the master, and at the remote device level on the non-master.

Out of sync does not always mean the content is different; it could just be metadata, such as permissions or modification timestamps. The reason the master is out of sync is that B claims it has newer files (in terms of metadata) than A, which A refuses to accept because A is Send Only.
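That distinction can be demonstrated with a small sketch using temporary files (this is an illustration of the general principle, not Syncthing's actual comparison code): two files can be byte-identical yet still differ in metadata.

```python
import hashlib
import os
import tempfile
import time

# Create two byte-identical files...
a = tempfile.NamedTemporaryFile(delete=False)
a.write(b"identical payload")
a.close()
b = tempfile.NamedTemporaryFile(delete=False)
b.write(b"identical payload")
b.close()

# ...then give them different permissions and modification times.
os.chmod(a.name, 0o644)
os.chmod(b.name, 0o600)
os.utime(b.name, (time.time() - 3600, time.time() - 3600))  # one hour ago

def sha256(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

same_content = sha256(a.name) == sha256(b.name)
st_a, st_b = os.stat(a.name), os.stat(b.name)
same_mode = st_a.st_mode == st_b.st_mode
same_mtime = int(st_a.st_mtime) == int(st_b.st_mtime)
print(same_content, same_mode, same_mtime)  # content matches, metadata doesn't

os.unlink(a.name)
os.unlink(b.name)
```

A sync tool that tracks permissions and timestamps will treat these two copies as different even though a checksum comparison says they are the same.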

The reason the web UI becomes unresponsive is probably that your machine is maxed out while it's hashing and simply isn't getting any time to reply to UI requests, especially on weak devices (which I bet is the case, given how long hashing takes).

@AudriusButkevicius Server B finished hashing from scratch and says "Up to Date". As I reported previously, I couldn't access Server A's web GUI at all to see what was happening. I restarted Server A, and when I opened the web GUI it needed to do all the hashing from scratch, i.e. it hadn't done it at the same time Server B did; I don't know why that happened. Currently Server A reports "Out of Sync Items 15761 items, ~3500 GiB", which is the total amount shown in Global State. The Local State is counting up until it reaches 3500 GB. I'm going to wait and see whether "Out of Sync Items" goes away this time. @imsodin Then I'll check what the difference is, if any items remain "Out of Sync".

By the way, the devices each have an i7 CPU and more than 16 GB of RAM.

Now at 26%, I just saw this error in the log: table@build error I·39553 "leveldb/table: corruption on data-block (pos=1964228): checksum mismatch

Why this, all of a sudden? And what do I do now... delete the index folder again?

The index database has become corrupted. We’ve seen this happen on machines with bad RAM or bad disks. It’s likely your system is not healthy.

Sorry guys, but there was another corruption on a data-block. What a waste of time. I can assure you it's definitely not the drives or the RAM; they are all fairly new and I've tested them. CPU usage wasn't even that high; it was hashing on both servers at about 20%.

I really hope Syncthing can handle huge amounts of data in future versions, at least in Send Only mode when the same data already exists on both PCs before the first hashing starts. For small amounts of data, such as 2 GB or 5 GB here and there, Syncthing works OK.

If anyone would like to reproduce this issue, do the following and prove me wrong:

  1. Server A's folder must be master (i.e. Send Only).
  2. Server A must already have more than 3 TB of data.
  3. Server B must already have exactly the same data as Server A before the folder and device are added in Syncthing on both servers.
  4. After checking that both Server A and Server B have exactly the same data, share the folder between them in Syncthing and wait for hashing to finish on both ends. Then tell me the result.

If anyone has already tried the procedure above with more than 3 TB of data, I'd like to hear the result.

See data.syncthing.net; there are people with 25 TB of data and a few hundred devices.

If you are getting corruption errors repeatedly, I am sure you have either hardware errors or software errors (caching software, buggy exotic filesystems, etc.).


Indeed, I see the same behavior. I have looked at the list that is shown and checksummed the files to verify they are identical.

I started from an rsync, but it's possible I'm using ignore owner and group.

Is it possible that mismatched metadata triggers this?

Well, I'm not sure if I used ignore owner and group; I left everything else at the defaults described in post 10. After hitting this issue every time I tried from scratch, I used LuckyBackup with rsync (as you did) and the --checksum option, and the results were perfect: Server A and Server B were identical by checksum. The best part about rsync with --checksum is that it took less than a day for 3 TB of data.

One of the developers would have to reproduce the issue exactly as I stated in post 10, and they would see it for themselves, if they are willing to go through the process to find the problem. For now, Syncthing does not work with terabytes of data from Ubuntu to Ubuntu.

I recreated it with owner and group synced, and saw the same behavior. It's almost as if new nodes announce they have file X without checking the global model and recognizing that it's unneeded, causing all the other nodes to write out temp tables and do hash comparisons.

This is only speculation, though, so I'll look through the docs and code to see what's up.

The project is wonderfully documented compared to many.

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.