File transfer stopped early on stable local network

I use Syncthing to sync setup.exe files from my build server to my production servers. They are local to each other inside a VPC.

Last night the overnight update failed, and when I went to check it was because one of the setup.exe files on the production server failed to run - it said it was corrupted.

I checked that I could execute it on the build server, and it was fine. I made a copy of the file on the build server (in the same Syncthing folder) and that synced down to my development machine and is fine, and so I have both copies here.

Both files are the same length, but there is a big chunk of zeros at the end of the corrupted version. It looks like Syncthing just stopped early.

This has happened once before, a few weeks ago, but I deleted the corrupted version of the file in my efforts to get sites back online so I wasn’t sure what had gone wrong.

I’m not currently on v2 - I was waiting for things to calm down before updating. Everything is on 1.30.0. But this has only just started happening after years of trouble-free service.

Is there any useful debugging information that I can share? Does this sound like a problem already fixed in v2? Could it be because I haven’t upgraded but relay servers have?

This does not fit any known problem in any version.

Edit: which is to say, the root cause needs to be determined. Syncthing did transfer the file, but it’s quite particular about hashing every block of data, from reading it on the sending side to receiving it on the other side, just before writing it. How big was the block of zeroes, precisely, and at what offset?
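Since both copies of the file are still available, one way to answer that question is to compare them block by block. The following is a minimal Python sketch, not Syncthing's own code: it assumes SHA-256 hashes over fixed 128 KiB blocks (the size Syncthing would be expected to use for a file this small), and the function names are made up for illustration.

```python
import hashlib

# Assumption: 128 KiB blocks. Syncthing picks larger blocks for larger files.
BLOCK_SIZE = 128 * 1024


def block_hashes(path, block_size=BLOCK_SIZE):
    """Return the SHA-256 hex digest of each fixed-size block of the file."""
    hashes = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            hashes.append(hashlib.sha256(block).hexdigest())
    return hashes


def differing_blocks(good_path, bad_path, block_size=BLOCK_SIZE):
    """Return the indices of blocks whose hashes differ between the two files."""
    good = block_hashes(good_path, block_size)
    bad = block_hashes(bad_path, block_size)
    diffs = [i for i, (g, b) in enumerate(zip(good, bad)) if g != b]
    # Blocks that exist in only one of the files also count as differing.
    diffs.extend(range(min(len(good), len(bad)), max(len(good), len(bad))))
    return diffs
```

Calling `differing_blocks("good-setup.exe", "bad-setup.exe")` (hypothetical file names) returns a list of block indices; multiplying an index by the block size gives the byte offset where that block starts.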

The file is 33,332,598 bytes long (that is 0x1FC9D76).

The zeros are not actually at the very end - I misread it.

The zero section starts at byte 0x01E00000 and ends at byte 0x01F3FFFF, so it’s definitely a neat chunk, but also there’s a chunk of valid data right at the end of the file.
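To put numbers on how neat that chunk is, here is a small Python sketch of the arithmetic. The offsets and file size are the ones quoted above; the 128 KiB block size is an assumption about what Syncthing uses for a file of this size, and `describe_zero_run` is a made-up helper name.

```python
BLOCK_SIZE = 128 * 1024  # assumption: 128 KiB blocks for a file of this size


def describe_zero_run(file_size, zero_start, zero_end, block_size=BLOCK_SIZE):
    """Summarise a zero run given its first and last (inclusive) byte offsets."""
    zero_length = zero_end - zero_start + 1
    first_block, start_remainder = divmod(zero_start, block_size)
    block_count, length_remainder = divmod(zero_length, block_size)
    return {
        "zero_length": zero_length,
        "first_block": first_block,
        "block_count": block_count,
        # True only if the run starts and ends exactly on block boundaries.
        "block_aligned": start_remainder == 0 and length_remainder == 0,
        # Bytes of (valid) data after the zero run.
        "tail_bytes": file_size - (zero_end + 1),
    }


# Figures from the post above.
summary = describe_zero_run(
    file_size=33_332_598,   # 0x1FC9D76
    zero_start=0x01E00000,
    zero_end=0x01F3FFFF,
)
print(summary)
```

With these figures the run is 1,310,720 bytes (1.25 MiB) starting exactly at the 30 MiB mark. Under the 128 KiB assumption that is ten whole blocks, numbers 240 through 249, followed by 564,598 bytes of valid data.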

I ran a build and then set the site to update overnight and went home for the day, so it had hours to finish syncing. The version of the file that appeared on all servers was corrupted, not just one, so it was the “sending” server not the receiving one.

The log file complains about some files “changing during hashing” at the time of the build, but that’s normal, and it wasn’t the problem file anyway.

Is there any logging I should change to generate more useful data in case this happens again?

Cool, good info. My guess is the file was written in stages and Syncthing read and synced it in an intermediate state, and then the size and modtime never changed so it didn’t have a reason to think it changed from that state.
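To illustrate that guess, here is a minimal Python sketch of why a check based only on size and modification time would miss such a change. It is not how Syncthing or any particular copy tool is implemented: `write_in_stages` is a hypothetical writer that first produces a full-length file with a zeroed hole, then fills the hole in, carrying over the same modification time both times (as a copy that preserves the source's timestamp would).

```python
import os


def fingerprint(path):
    """Size and modification time: the metadata a scanner can compare cheaply."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def write_in_stages(path, data, mtime_ns, hole_start, hole_end):
    """Hypothetical staged writer; returns the fingerprint after each stage."""
    # Stage 1: the file already has its full length, but part of it is zeros.
    partial = bytearray(data)
    partial[hole_start:hole_end] = bytes(hole_end - hole_start)
    with open(path, "wb") as f:
        f.write(partial)
    os.utime(path, ns=(mtime_ns, mtime_ns))  # carry over the source's mtime
    intermediate = fingerprint(path)

    # Stage 2: the missing content arrives; size and mtime end up the same.
    with open(path, "r+b") as f:
        f.seek(hole_start)
        f.write(data[hole_start:hole_end])
    os.utime(path, ns=(mtime_ns, mtime_ns))
    final = fingerprint(path)
    return intermediate, final
```

Both fingerprints come out equal even though the content changed between the two stages, so anything that scanned the file after stage 1 would have no metadata-based reason to look at it again.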

The alternative is some sort of bug that would repeatedly result in those specific blocks being read as zeroes instead of their actual content – once when scanning, and then once more each time a client asked to sync them, since Syncthing reads and hashes the blocks each time. I have a hard time envisioning how that would happen.

OK, that makes sense. I used to build my setup files in one step, and then copy them all into the Syncthing folder in another step once the whole thing had finished.

But that meant it didn’t start syncing until the last file had finished building, so to speed things up I started copying output files into the Syncthing folder during the build process as I went along.

But I was being lazy, so instead of copying files one at a time, I just dumped

Copy-Item "$buildFolder\Deployment\Builds$version" "$targetFolder" -force -recurse | Out-Host

all the way through the build script. So I was actually overwriting the files several more times than I needed to. If the timing was really bad, I could have copied a file in, Syncthing could have started scanning it, and then I could have copied it in again, replacing the original while Syncthing was in the middle of uploading a chunk, and it would have got a load of zeros.

I will fix my build scripts to ensure that I only copy each setup file into the target Syncthing folder once.
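One possible shape for that fix, sketched in Python rather than in the PowerShell of the real build script: copy each file in only if an identical copy is not already there, and write it under a temporary name first so the final name never holds partial content. The `publish_once` helper and the `.partial` suffix are made up for illustration; the temporary name would also need to be excluded from syncing, for example with an ignore pattern.

```python
import filecmp
import os
import shutil


def publish_once(src, dest_dir, tmp_suffix=".partial"):
    """Copy src into dest_dir unless an identical copy is already there.

    Returns True if the file was copied, False if it was skipped.
    """
    dest = os.path.join(dest_dir, os.path.basename(src))
    # shallow=False compares the actual bytes, not just size and mtime.
    if os.path.exists(dest) and filecmp.cmp(src, dest, shallow=False):
        return False
    tmp = dest + tmp_suffix  # hypothetical temporary name
    shutil.copy2(src, tmp)   # the slow part happens under the temporary name
    os.replace(tmp, dest)    # rename into place in a single step
    return True
```

Calling `publish_once` repeatedly through a build is then harmless: only the first call for a given version of a file touches the target folder, and later calls return `False` without rewriting anything.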

Thanks!
