A problem with storage issues is that they’re often subtle and gradual. A while back I had a drive that seemed to be working just fine. It wasn’t until I noticed that one file had failed checksum verification that I was prompted to check every file: out of 30,000+ files, ~30% were corrupted, all without any filesystem errors. The corruption was caused by media errors that didn’t trigger any alerts until I ran a bad block scan that put a high I/O load on the drive.
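One way to run that kind of whole-tree check is sketched below. It assumes GNU coreutils (sha256sum with --check and --quiet) plus find. The function names make_manifest and verify_manifest, and the paths in the usage note, are made up for illustration. It only helps if the manifest was written while the files were still known to be good.

```shell
#!/bin/sh
# Rough sketch, assuming GNU coreutils; function names are hypothetical.

# Record a SHA-256 checksum for every regular file under directory $1
# and write the list to manifest file $2.
# Keep the manifest outside $1, otherwise it ends up hashing itself.
make_manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + ) > "$2"
}

# Re-read every file listed in manifest $2 (paths are relative to $1).
# Prints only the files that fail; returns non-zero if any file fails.
verify_manifest() {
    ( cd "$1" && sha256sum --check --quiet - ) < "$2"
}
```

Usage would look something like `make_manifest /mnt/data /root/data.sha256` while the data is known to be good, then `verify_manifest /mnt/data /root/data.sha256` later on. Reading every file back also puts load on the drive, which can surface media errors that light use never touches.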
In Syncthing 1.x, LevelDB doesn’t have the same level of integrity checking as SQLite does, so corruption can go unnoticed.
The quality and types of S.M.A.R.T. metrics vary a lot between drive models, so a drive can get a “Pass” while it’s still failing due to media errors. It’d be helpful to see the full S.M.A.R.T. results.
Clarity on the pool and RAID configuration would also be very helpful, e.g. is it pure Btrfs mirroring between the pair of NVMe drives, or is it an Unraid array with Btrfs on top?
What does the following command say?
btrfs filesystem show