Receive Only folder marks everything as Locally Changed on restart.

I’m working with a probably unique setup and wouldn’t be surprised if this turned out to be an appropriately unique issue.

I’m looking to transport several TB of mixed data across a satellite link (~700-900 ms latency, up to 3 Mbps throughput). The source is NTFS and the target is Azure Blob through a Blobfuse2 mount on a Rocky Linux server in the same Azure region.

To test this setup without the expensive satellite, I mapped a NAS across the Atlantic to a server on the US east coast, then used Syncthing to transmit the data to Azure West Europe. This worked well, if appropriately slowly, but the deduplication made up for that. At one point I’d sent 19.7GiB of data over the WAN and 65.9GiB was stored in Azure, and this ratio naturally just gets better as more data is loaded.
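As a quick sanity check on those two figures, here is a small Python sketch of the arithmetic. The function name is purely illustrative and nothing here comes from Syncthing itself.

```python
GIB = 1024 ** 3

def transfer_savings(sent_bytes, stored_bytes):
    """Return (stored/sent ratio, fraction of WAN traffic avoided)."""
    ratio = stored_bytes / sent_bytes
    saved = 1 - sent_bytes / stored_bytes
    return ratio, saved

# Figures quoted above: 19.7GiB sent over the WAN, 65.9GiB stored in Azure.
ratio, saved = transfer_savings(19.7 * GIB, 65.9 * GIB)
print(f"{ratio:.2f}x stored per byte sent, {saved:.0%} of the traffic avoided")
```

That works out to a little over three bytes stored for every byte sent, or roughly 70% of the transfer avoided at that point in the test.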

Pleased with this, I went ahead and tested with an existing data set that had previously been loaded to Azure. I configured the folder on the Azure node as Receive Only, with no partners yet. Once the Azure Syncthing node had scanned the 1.97TiB, I restarted the OS. After the restart, both folders (the US test and the existing data) showed all items in Azure as locally changed.

Blobfuse as a file system has several limitations (no xattrs, etc.), but these seem to be already addressed by Syncthing, so I’m unsure why this happened.

Any help or advice would be gratefully received.

Depending on the end goal, there might be more appropriate options than Syncthing.

Is the Azure Blob backup storage?

So the topology is the following?

NTFS -> sat link -> (Rocky Linux + Azure Blob)

What’s hosting Syncthing on the NTFS side?

Not wholly. The data will be tiered within the containers, with current data being Hot and falling down through Cool to Archive over a year or more. The Hot data will be accessed directly or through a Samba host for Windows clients. Some of the content will need to be shared back from Azure to the original host, but it’s a small fraction.

Pretty much. There are a few more hops, but it’s nothing fancier than SD-WAN/VPN. Both nodes are on the same WAN, just with a really long physical distance to cover.

In the initial test it was a Windows 2008 R2 server (I figured it couldn’t get much worse than that); for the seeded test it’s a Windows 10 client. On a side note, the client won’t connect over QUIC, which I thought would be beneficial over satellite (I suspect it’s a network rule in the SD-WAN portion). I had looked at creating Docker containers for the Windows servers, but the Syncthing server worked well. Later, I’m looking at different endpoints, potentially NAS or back to Docker.

I wasn’t able to find something better. Although I was thinking there was global deduplication, it seems to be limited to per folder. Is that right?

Thanks again.

Something about those files looks different than it did previously. Most likely timestamps or permissions changed, for whatever reason. You probably want to check “ignore permissions” and perhaps set a modification time window of 1s or more. You can also use syncthing cli debug file to get all the metadata for one file and we can see what the difference is.


I didn’t need to run the debug command, just look with my eyeballs. Each item in Blob initially shows its modified time as the creation time, and that is then updated to the time of any change. There’s apparently no way to maintain the original timestamp, as it’s a read-only value.
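For anyone who would rather script that check than eyeball it, a minimal Python sketch follows. It sets a modification time with os.utime and reads it back. The helper name, the one-second tolerance and the probe file are all illustrative assumptions, and on a mount like this one the value might read back correctly at first and only differ after a remount or restart, so an immediate pass does not prove much.

```python
import os
import tempfile

def mtime_survives(path, wanted_mtime, tolerance_s=1.0):
    """Set path's mtime and report whether the filesystem reads it back unchanged.

    Returns (kept, observed_mtime). Hypothetical helper, not part of Syncthing.
    """
    st = os.stat(path)
    # Keep the access time, change only the modification time.
    # Some mounts may raise OSError here instead of silently ignoring the call.
    os.utime(path, (st.st_atime, wanted_mtime))
    observed = os.stat(path).st_mtime
    return abs(observed - wanted_mtime) <= tolerance_s, observed

if __name__ == "__main__":
    # Demonstration against a local temp directory; point it at a file on the
    # mount under test instead to check that filesystem.
    with tempfile.TemporaryDirectory() as d:
        probe = os.path.join(d, "probe.txt")
        with open(probe, "w") as f:
            f.write("x")
        kept, observed = mtime_survives(probe, 1_000_000_000)
        print("mtime kept:", kept, "observed:", observed)
```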

Although, I did see somewhere that Syncthing dealt with this by recording an alias when the timestamp it attempted to write was read back differently. Or was that perhaps just someone’s wishlist?

I think I’ve seen that there’s now an allowance for ignoring timestamps. Could this be done by setting the allowed delta to a billion seconds?

Actually, I see that the timestamp delta is “modTimeWindowS” and is really for detecting changes locally. Still, since this is a Receive Only folder, perhaps I could use that. I realise that’s a bit dirty, but is it wrong?

Syncthing handles the case where setting a modtime fails or is ignored, but not if it appears to work and is then reverted at some later time (which sounds like it’s the case here). Setting the mod time window to a billion seconds or so should work as an ignore-modtime setting, I think. Clearly it’s not beautiful, but honestly neither is the fuse mount of a blob store. :) Have you seen rclone?
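To illustrate why a huge window behaves like "ignore modtime", here is a simplified Python model of a windowed timestamp comparison. It is only a sketch of the idea, not Syncthing's actual comparison code, and the function name and example timestamps are made up.

```python
def looks_modified(recorded_mtime_s, observed_mtime_s, window_s=0):
    """Flag a change only if the two mtimes differ by more than window_s seconds.

    Simplified model for illustration; the real scanner may compare other
    attributes as well.
    """
    return abs(observed_mtime_s - recorded_mtime_s) > window_s

recorded = 1_400_000_000   # hypothetical mtime stored in the index
observed = 1_700_000_000   # hypothetical mtime the blob store reports later

print(looks_modified(recorded, observed))                          # True
print(looks_modified(recorded, observed, window_s=1_000_000_000))  # False
```

A billion seconds is a bit under 32 years, so any pair of realistic timestamps falls inside the window. The flip side, in this model, is that the modification time stops being a useful change signal for that folder at all.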

I did, but it was the absence of deduplication in transit that marked it down. There are a lot of duplicate chunks across the whole data set, but there would be no duplicate files. So I was thinking that with a large enough repository I’d be able to minimise the data sent across the satellite.
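The difference between file-level and chunk-level duplication can be sketched in a few lines of Python. This assumes fixed-size, boundary-aligned blocks hashed with SHA-256; the 128 KiB block size and the function name are illustrative choices, and how Syncthing actually picks block sizes or reuses blocks is not covered here.

```python
import hashlib

def unique_block_bytes(files, block_size=128 * 1024):
    """Return (total_bytes, bytes_in_unique_blocks) over an iterable of bytes objects."""
    seen = set()
    total = unique = 0
    for data in files:
        for off in range(0, len(data), block_size):
            block = data[off:off + block_size]
            total += len(block)
            digest = hashlib.sha256(block).digest()
            if digest not in seen:
                # First time this block content appears: it would need sending.
                seen.add(digest)
                unique += len(block)
    return total, unique

bs = 128 * 1024
# Two different files that share their first block.
file_a = b"A" * bs + b"B" * bs
file_b = b"A" * bs + b"C" * bs
total, unique = unique_block_bytes([file_a, file_b], block_size=bs)
print(total // bs, "blocks in total,", unique // bs, "unique")
```

Neither file is a duplicate of the other, yet one of the four blocks never needs to be sent twice. With fixed-size blocks like this, duplicates are only found when they line up on block boundaries, so shifted or re-packed data would not benefit in the same way.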
