Optimize the transfer of duplicate content?

Matteo_Raggi · January 10, 2017, 12:10am

Example: I will soon manage the content of 9 devices through Syncthing for a first test:

1 or 2 Windows 10 devices with occasional changes
2 devices with the same content (we call them P1 and P2)
2 other devices with nominally different content, though in practice the content of the two folders is the same 99% of the time (we call them G1 and G2)
5 other devices with nominally different content, though in practice the content of these folders is also exactly the same 99% of the time (we call them V1, V2, V3, V4, V5)

All of this content is shared with a server (we call it VPS). So the VPS will send 7 different files (V1, V2, V3, V4, V5 + G1 + G2), plus one duplicated content (P1 and P2) for 2 devices. In total we will have 8 transfers when in reality 3 would be enough. Is there an idea or plugin to optimize these transfers when there is a lot of unofficial, but real, duplicate content?

rumpelsepp · January 10, 2017, 6:49am

I am not sure if I got your question right, but Syncthing operates at the block level and does some sort of deduplication per share (is this still valid?). Every file is chunked into equally sized blocks, which are hashed with SHA-256. So, if you define your shares appropriately, you should end up with efficient bandwidth utilization out of the box, since duplicated blocks can easily be reused.
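
To make the block idea concrete, here is a minimal Python sketch. It is not Syncthing's code; the 128 KiB block size and the helper names `block_hashes` and `blocks_to_fetch` are assumptions made up for illustration. It only shows why a file that shares blocks with something already on disk needs fewer blocks transferred.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # assumed fixed block size, for illustration only


def block_hashes(data, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks and return the SHA-256 hex digest of each."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_fetch(wanted, available):
    """Return the wanted block hashes that are not already available locally."""
    have = set(available)
    return [h for h in wanted if h not in have]


old = b"a" * BLOCK_SIZE + b"b" * BLOCK_SIZE
new = old + b"c" * 100  # the same two blocks plus one short new block
missing = blocks_to_fetch(block_hashes(new), block_hashes(old))
print(len(block_hashes(new)), len(missing))  # prints: 3 1
```

Only the one new block would have to come over the network in this toy case; the other two can be reused from the old data.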

calmh · January 10, 2017, 7:19am

I’ll echo @rumpelsepp - the question is complicated so I’m not sure I get it either. There is a rough description of the sync method Syncthing uses here, perhaps that answers your question.

More specifically, when a file needs to be synced Syncthing knows the block hashes that make up the desired end result. For each such block it’ll try to find it in a file it already has (including but not limited to the old version of the file being synced) - this is a database lookup. If it is nowhere to be found it will try to request the block from one of the other devices that are connected and have advertised that they have it. The other devices will advertise the blocks they have for fully synced files, and files in progress on some interval (it’s not instantaneous).
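
As a rough sketch of that lookup order (local first, then a connected device that has advertised the block), something like the following Python could model it. The function `plan_block_sources`, the dictionary layout, and the "pick the first peer alphabetically" rule are all assumptions for illustration, not how Syncthing actually chooses a peer.

```python
def plan_block_sources(wanted, local_blocks, advertised):
    """Decide, per wanted block hash, where the block could come from.

    local_blocks: hash -> local file that already contains the block
    advertised:   device id -> set of block hashes that device has announced
    """
    plan = []
    for h in wanted:
        if h in local_blocks:
            # Cheapest case: copy the block from a file we already have.
            plan.append((h, "local", local_blocks[h]))
            continue
        # Otherwise look for connected devices that announced this block.
        peers = sorted(dev for dev, hashes in advertised.items() if h in hashes)
        if peers:
            plan.append((h, "remote", peers[0]))  # hypothetical selection rule
        else:
            plan.append((h, "unavailable", None))
    return plan


local_blocks = {"h1": "old/video.mp4"}
advertised = {"A": {"h1", "h2", "h3"}, "E": {"h2"}}
plan = plan_block_sources(["h1", "h2", "h3", "h4"], local_blocks, advertised)
for entry in plan:
    print(entry)
```

Because the advertisements arrive on an interval, the `advertised` map in a real cluster lags behind what the peers actually hold.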

So yes, there is some optimization going on. But if a file changes on device A and devices B, C, D, E, and F all want it, there is going to be some duplication in the data they request from A before something C wants happens to be available on E as well.

(There are two opposing directions of optimization here; we currently optimize for sharing data in the cluster at the price of doing less efficient transfers when there are just two devices: #3692)

Matteo_Raggi · January 10, 2017, 10:33am

A simpler example: if one device shares the same file with 7 other devices through different folders, will there be any optimization of the sync data transfer?

AudriusButkevicius · January 10, 2017, 11:51am

It’s still not clear, do all 7 devices have all 7 folders?

Matteo_Raggi · January 10, 2017, 12:04pm

Device 0 produces an mp4 file and copies the same file into 7 different folders; each of these 7 folders is shared with a different device. I think that officially there will be no optimization of the data transfers, but in practice it would be nice to have transfer optimization for the duplicated content…

AudriusButkevicius · January 10, 2017, 12:11pm

There won’t be any optimization as:

Potentially devices are not even aware of each other.
Even if devices are aware of each other, they do not have a folder in common that would have the file in question.
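
A toy Python model of the folder layout may help show the second point. The names (`possible_sources`, `folder1` to `folder7`, `device0` to `device7`) are hypothetical and this is only a sketch of who shares what, not of Syncthing's internals.

```python
def possible_sources(shares, device, folder):
    """Devices other than `device` that share `folder` and could serve its blocks."""
    return shares[folder] - {device}


receivers = [f"device{i}" for i in range(1, 8)]

# Layout from the question: 7 folders, each shared between device0 and one receiver.
separate = {f"folder{i}": {"device0", f"device{i}"} for i in range(1, 8)}

# Alternative layout: one folder shared by device0 and all 7 receivers.
single = {"shared": {"device0", *receivers}}

print(possible_sources(separate, "device3", "folder3"))
print(len(possible_sources(single, "device3", "shared")))
```

With separate folders, the only possible source for each receiver is device0, so device0 uploads the file once per receiver. With a single shared folder, the other receivers share the folder too, so they are at least candidates for serving blocks once they have them.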
