Syncthing helped diagnose a faulty CPU

This is just a quick story from last week. We built a new machine for video editing using an AMD Ryzen 4750G, 32 GB of RAM, an NVMe SSD, etc. The computer ran fine during the initial setup, and both the OS and the software, including Syncthing, seemed to install and work without problems during the first few hours.

However, at night the user suddenly reported that his file synchronisation had stopped working. I checked the situation, and Syncthing was indeed in an error state. What was unusual was that it simply refused to start at all. A panic log was created on each start, but the text file was completely empty. I then tried to start Syncthing manually from the command line, but right from the get-go there was a weird Go thread exception error. Unfortunately, I do not have the screenshot anymore, but searching for it on Google did not bring up any useful information anyway.

Then, after another hour or so, the computer started acting up. Windows began to throw random BSODs for no particular reason, and eventually bluescreened on every boot attempt. Safe mode bluescreened as well, and so did the Windows installer that we tried to run to reinstall the OS.

In the end, we tested the CPU with a different motherboard, different RAM, and a different PSU. It did not work correctly there either, so we came to the conclusion that the unit was simply faulty. We sent it back to AMD for RMA and got a replacement immediately. The computer is now running well with no issues, using the exact same OS installation.

The conclusion is that errors in Syncthing do not necessarily indicate issues in the program itself, but can be hardware related too. Also, Syncthing was the first piece of software to start showing those symptoms, which gave us some time to prepare for the worst.


Not exactly related, but I wanted to add that just a few days ago Syncthing also helped diagnose a dying HDD.

Multiple errors like the one below suddenly popped up,

Fatal error: gdwva-7bfwn WithNeedTruncated(XXXXX): leveldb/table: corruption on data-block (pos=2014954): checksum mismatch, want=0xffffffff got=0x6170dc89 [file=001359.ldb]

and sure enough, we checked the HDD and the thing was toast.

Yet again, Syncthing was the first to report the problems, as the OS and the rest of the software seemed to still work fine. The disk is now going through its last zero-format before being thrown away (which is taking forever due to all the write errors).
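For anyone wondering why the database noticed before anything else did: each stored block carries a checksum that is re-verified on every read, so even a single flipped bit gets reported instead of silently passing through. Below is a minimal sketch of that idea, not the actual LevelDB code. The function names `store_block` and `read_block` are made up for illustration, and `zlib.crc32` is only a stand-in (as far as I know, LevelDB itself uses a masked CRC32C over the block contents).

```python
import zlib


def store_block(payload: bytes) -> bytes:
    """Append a 4-byte little-endian CRC32 trailer to the payload (hypothetical format)."""
    checksum = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + checksum.to_bytes(4, "little")


def read_block(block: bytes) -> bytes:
    """Return the payload, or raise if the stored and recomputed checksums differ."""
    payload, trailer = block[:-4], block[-4:]
    want = int.from_bytes(trailer, "little")  # checksum stored alongside the data
    got = zlib.crc32(payload) & 0xFFFFFFFF    # checksum recomputed from what was read
    if want != got:
        raise ValueError(f"checksum mismatch, want={want:#x} got={got:#x}")
    return payload


block = store_block(b"some index data")
assert read_block(block) == b"some index data"

# Simulate a bad sector by flipping a single bit in the stored payload.
damaged = bytearray(block)
damaged[3] ^= 0x01
try:
    read_block(bytes(damaged))
except ValueError as err:
    print(err)
```

Most ordinary files have no per-block check like this, which is probably why the OS and other programs carried on as if nothing was wrong.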

re: zero-write – a drill and hammer might be faster and just as effective…


I just preferred to do it the clean way :sweat_smile:. It is a 500 GB drive, so formatting should not normally take long. It actually managed to finish overnight anyway. Only the beginning was very slow, with the HDD making some suspicious noises, but then it fortunately sped up.