Real-world scenario: Syncthing needs more thorough testing and robustness

Hi there,

A brief thread about a very simple real-world scenario and the lack of robustness in Syncthing, i.e. plenty of bugs and weird messages.

Situation: everything fresh, Syncthing never used before on either machine.

Machine 1: Linux, x64, initially Syncthing 0.14.49-rc.2
Machine 2: Windows, x64, initially Syncthing 0.14.48

Fresh setup: empty Syncthing installation/binary and first startup of both instances. No filters, no ignores, nothing.

I needed to transfer around 180 GB from the Linux machine to the Windows machine.

One share (one directory structure), added on the Linux machine, where the 180 GB reside. It's all normal, readable files; these files originally came from an NTFS drive and were moved over to the Linux drive (ext4 or something). So all files, filenames and path structures existed on Windows before and landed on the Linux filesystem without problems.

Mainly family pictures, videos, some documents and similar things: a family archive.

The Linux machine first scanned the 180 GB structure and finished scanning. Only then did I add the two machines to each other (the handshake), and only after this step did I share the folder from the Linux machine with the Windows machine.

Eventually the Windows machine displayed a notification in the Syncthing GUI that there was a folder to be received.

I set a simple directory as the destination on the Windows side, namely F:\data\

It then started the initial metadata exchange and the other first steps.

I did not touch any special settings such as large blocks or anything complicated, only very basic ones: point to a folder, scan it, share it. Oh, and yes, the Linux machine is set to send-only, with the normal random file order setting. So far so good.

But pretty much immediately, on the Windows side, I saw dozens or hundreds of new error lines in the console window of syncthing.exe, reporting mkdir errors on \\?\F:\data\syncthing\structure\certain\folder\for\Pictures…

And similar errors for random other folders: apparently it could not always create the full destination path.

There is absolutely no reason why the Windows machine should be prevented from creating these folder structures.

The path names are not too long or anything; the F: drive is NTFS, an external USB drive with full permissions, so nothing special.

I noticed that Syncthing started writing data to F:… and those directory structures in random order, and only certain folders (maybe pictures, or folders for particular past years or family events) were affected, while others at the same directory depth were not affected at all, and so forth. Which folders were affected and which were not looked completely chaotic.

But it was not thousands of objects, only a few dozen to a hundred.

It looked to me as if Syncthing was maybe racing against itself, or as if some part of the metadata exchange, or the ordering of events when files need to be created (with their containing folders created first), failed every now and then…

I think this is very unfortunate, and it does not reflect well on the robustness and dependability of Syncthing. These are elementary, essential operations that fail, and I cannot come up with a reason for them to fail.

While Syncthing was working, I browsed into those failing folders (or their parent folders): the other folders in there (except the failing one) had already been created, some files (for example family pictures, camera photos, etc.) had already been placed in them, and the amount of data and number of objects kept growing.

After some minutes I restarted Syncthing on Windows, later I restarted Syncthing on Linux, and still yesterday I also upgraded the Windows Syncthing to the same 0.14.49-rc.2, so both were on the same version.

Some hours later they even updated themselves to -rc.3.

Summary: a clean setup with two Syncthing instances and a lot of objects/folders troubles the end user with strange or invalid error messages and failures.

Eventually, after some restarts of Syncthing, they apparently managed to create the folders and structures they first complained about: those folders now exist, and data is still coming in and being transferred.

Please, guys, do look into real-world scenarios with lots of data, lots of folders and a clean state, starting the Syncthing experience fresh. It is a pity that a fresh Syncthing setup causes these kinds of severe error messages for the end user.

And yes, the Syncthing GUI also showed a lot of red and failure messages in the status of the share; it was not only the console window that displayed the errors.

Thanks.

There are over 50k people using Syncthing with terabytes of data and millions of files; if things were as bad as you describe them, this forum would be full of people with pitchforks. Instead of providing detailed error messages or something that can help us help you, you just come and rant. What response do you expect to get?

As my topic title says, this was first and foremost a request for better testing methodology and real-world testing, not just unit testing or whatever the state of the art is these days for the Go language; I am not much of an expert in this.

We used to have race conditions and odd ordering of things in the algorithms, if I recall correctly. I described a very simple, most basic situation.

One large share, syncing to a single other machine, and the results it immediately produced.

I will try to see what logs or messages I can provide.

One more thing happened pretty much immediately, after a few minutes: the Linux side (it is a send-only share) now shows four objects that supposedly need to be synced back.

Two of them are picture/movie files, and one or two are weird entries that look like plain directory objects.

This is also weird, as the target Windows machine does not change a single bit in the directory it puts the data into, and these objects/directories originate on the Linux (sending) machine. So why claim that there were changes on the Windows side?

There were also messages in the GUI about “objects went away that certain machine had” or similar. None of this reflects what actually happened on the machines; it looks like what bugs, races or bad algorithms would cause.

I am speaking here about what happens in reality on the machines, and about what Syncthing makes of those situations.

I wonder why or how these kinds of things can go undetected, as you say, be it on your developer/testing machines or on other people's setups. So maybe it must be some rare scenarios, or some fuzzy logic or race-like conditions and so on; I can only speculate.

We have a few thousand people running release candidates that we release weekly. If there are issues, we get reports, what else do you want us to do?

No distributed system is free of edge cases, and you might have hit one (or an issue with your setup, which is more likely, as mkdir errors imply that the filesystem is preventing us from doing something), yet with a statistical sample of one you make blanket statements about reliability.

My problem with your report is that it’s mainly pointing fingers. I strongly disagree that Syncthing’s testing is poor, but even if it were, what’s the point in telling the people doing the work that they suck without providing actionable feedback/advice?

If you look around in this forum, you see that problems are taken seriously and investigated. I am not even doubting that you encountered a problem. You have already spent quite a bit of your time on lengthy posts; spending the same time on the problems themselves might get them solved, even pretty soon. You and every other user will benefit from that.


Which “debugging facilities” should I select to help best in this situation? I guess it is filesystem-related or thereabouts?

Most specifically the failing mkdir calls on the Windows target machine. But in the end, now that nearly all of the 180 GB have synced, there are also some objects pending on the originating Linux machine: it lists 5 directory objects, all of size zero, that it wants to create/recreate on the Linux side, even though the directories are already fine there, as everything started there.

Thanks for any hints.

You should provide the log with the errors first.

