Here’s a true story. Let’s say we have devices A and B. I copy a 500 MB folder Dupa to ~/Sync on A. Since my uplink is 50 KB/s, it will take some time before Dupa gets uploaded to B. In the meantime someone modifies a file phones.txt on B (whose unmodified version is also on A).
Now my problem is that when I go to the Syncthing GUI on A, the status is “Up to Date”, but in reality it’s not! phones.txt will get updated on A only after folder Dupa has been completely uploaded to B.
Ideally phones.txt would start uploading to A before Dupa is completely uploaded to B. If that’s really not possible (which is annoying, because phones.txt is a tiny file for which I have to wait hours), then perhaps A at least shouldn’t show “Up to Date” as its status.
The thing is that B is not looking for local changes while it’s pulling down changes from A, so it doesn’t know phones.txt has changed yet.
Thanks Jakob for the info. Is it part of the design, i.e. would it be difficult to change at this stage of Syncthing’s development?
I believe it’s there so that Syncthing doesn’t step on its own toes by classifying changes it makes on behalf of a remote device as its own local changes.
It’s possible to change that, but it’s a fair amount of work to make sure we don’t hit race conditions, and it doesn’t necessarily bring a lot of benefit.
One objective benefit is that Dropbox gets it right :-), the file phones.txt gets synchronized on A almost immediately. And I think your potential users are very heavily biased towards comparing to Dropbox before switching definitively.
At the very least I really think A should not report “Up to Date”.
Well, Dropbox has a central authority which decides what’s newer and what’s not and so on, which solves a lot of problems. We don’t have this luxury as all nodes are of equal authority, hence some sacrifices need to be made.
Then again, this is annoying precisely because this is such a clear-cut situation where a conflict can’t possibly arise, i.e. the folder Dupa and the file phones.txt are each modified only on one of the nodes and only once. (Of course if phones.txt was in the folder Dupa, I’d be much more sympathetic.)
Not sure how familiar you are with coding, but imagine the following two threads and the race conditions between them.
T1 is downloading files.
T2 is scanning.
First ordering (file moved into place before the database is updated):
T1 downloads the file and puts it in its final location.
T2 scans the file, checks the db, realizes the file has changed because the entry in the db does not match what’s on disk, assumes it was modified locally, and tells the others.
T1 updates the database for the file it just downloaded.
This potentially generates a conflict.
Second ordering (database updated before the file is moved into place):
T1 downloads the file and updates the database.
T2 scans the file, sees that it does not match what’s in the database, assumes a local change, and tells the others.
T1 moves the file to its final location.
This causes other nodes to become out of sync and try to pull a file which doesn’t exist, causing puller errors until the next rescan happens (if it ever does).
Avoiding both would need per-file locking, etc.
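To make the first ordering and the per-file locking idea concrete, here is a minimal toy sketch in Go. It is not taken from the Syncthing code base: `Folder`, `Pull`, `Scan` and the two maps standing in for the disk and the database are all hypothetical names invented for illustration. `Pull` deliberately lets a scan run between "file in final location" and "database updated"; with the per-file lock held across both steps, the scan has to wait and no longer sees a mismatch.

```go
package main

import (
	"fmt"
	"sync"
)

// Folder is a toy model: disk and db map a file name to a content version.
// All names are hypothetical, not Syncthing's real types.
type Folder struct {
	mu    sync.Mutex             // guards the contents of the maps below
	disk  map[string]string      // what is on disk
	db    map[string]string      // what the index database believes
	locks map[string]*sync.Mutex // one lock per file name
}

func NewFolder() *Folder {
	return &Folder{
		disk:  map[string]string{},
		db:    map[string]string{},
		locks: map[string]*sync.Mutex{},
	}
}

// fileLock returns the lock for name, creating it on first use.
func (f *Folder) fileLock(name string) *sync.Mutex {
	f.mu.Lock()
	defer f.mu.Unlock()
	l, ok := f.locks[name]
	if !ok {
		l = &sync.Mutex{}
		f.locks[name] = l
	}
	return l
}

// set stores version for name in one of the folder's maps.
func (f *Folder) set(m map[string]string, name, version string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	m[name] = version
}

// Scan reports whether the scanner would announce name as locally changed,
// i.e. whether disk and database disagree. With useLock it first waits for
// the per-file lock.
func (f *Folder) Scan(name string, useLock bool) bool {
	if useLock {
		l := f.fileLock(name)
		l.Lock()
		defer l.Unlock()
	}
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.disk[name] != f.db[name]
}

// Pull puts a new version on disk, starts a scan "in the middle", then
// updates the database. It returns what that scan concluded.
func (f *Folder) Pull(name, version string, useLock bool) bool {
	l := f.fileLock(name)
	if useLock {
		l.Lock()
	}
	f.set(f.disk, name, version) // T1: file is in its final location

	result := make(chan bool, 1)
	go func() { result <- f.Scan(name, useLock) }() // T2: scanner

	if !useLock {
		changed := <-result        // force the bad interleaving: scan runs now
		f.set(f.db, name, version) // T1: database updated too late
		return changed
	}
	f.set(f.db, name, version) // T1: database updated while the lock is held
	l.Unlock()
	return <-result // the scan could only look after the unlock
}

func main() {
	fmt.Println("no lock, bogus local change:", NewFolder().Pull("phones.txt", "v2", false))
	fmt.Println("per-file lock, bogus local change:", NewFolder().Pull("phones.txt", "v2", true))
}
```

Without the lock the scanner reports a local change that never happened; with it, the same scan comes back clean. The cost is the bookkeeping of one lock per file, which is presumably part of why this was described as a fair amount of work.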
Audrius, I’m familiar with coding, but I’m not familiar with syncthing design, which makes your message somewhat useless to me - I’m sure given sufficient time both of us could come up with p2p designs which work fine in the situation I described in the first post.
However if you’re telling me that this sort of situation was not thought of when syncthing was designed, and that right now it’s too late without either introducing ugly hacks or extensive redesigning, then of course that’s perfectly ok - I just need to know such things so that I can make an informed decision whether moving away from dropbox is feasible for me.
Nothing is ever too late, and it’s not impossible, and does not require hacks.
It needs a lot of thinking through to work out the possible race conditions in the existing code paths, and the man-hours to do that, which weren’t available when it was initially written.
If you are interested in fixing this, I will gladly point you to the places where this all happens, and give you any guidance you need.
I’m not familiar with golang and I’m not a professional dev, but if you describe what would need to be changed, then I’d try to put some serious effort into it (although only after I’ve used Syncthing for a week in my day-to-day routine and nothing serious has come up, so you might prefer to wait with the explanations).
If we were to allow scanning and pulling in parallel, I think the easiest way forward (at least from where we are currently) would be to still not really do it in parallel but instead have mutexes held around smaller sections. I.e. we could have a folder mutex that needs to be held to scan or perform changes on a folder. The puller would hold it for short times (i.e. batches of x files, handled and flushed to the database before the mutex is released), and the scanner would have an opportunity to run thereafter / in between. This would give the impression of parallelism while maintaining our well-defined state.
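As a rough illustration of the folder-mutex idea described above, here is a small Go sketch. It is only a toy under stated assumptions, not how Syncthing is implemented: `Folder`, `PullBatches`, `Scan`, `Demo`, the batch size of 5 and the two maps standing in for the disk and the database are all hypothetical. The puller holds the mutex for one batch at a time and flushes that batch to the database before releasing it, so a scan that gets the mutex in between only ever sees genuine local changes.

```go
package main

import (
	"fmt"
	"sync"
)

// Folder is a toy model of one synced folder (hypothetical names throughout).
type Folder struct {
	mu   sync.Mutex        // folder mutex: held to scan or to apply changes
	disk map[string]string // file name -> version on disk
	db   map[string]string // file name -> version in the index database
}

// PullBatches applies the incoming files in batches. Each batch is written to
// disk and flushed to the database before the mutex is released.
func (f *Folder) PullBatches(names []string, version string, batchSize int) {
	for start := 0; start < len(names); start += batchSize {
		end := start + batchSize
		if end > len(names) {
			end = len(names)
		}
		f.mu.Lock()
		for _, n := range names[start:end] {
			f.disk[n] = version // file placed in its final location
		}
		for _, n := range names[start:end] {
			f.db[n] = version // batch flushed to the database
		}
		f.mu.Unlock() // the scanner may take its turn here
	}
}

// Scan returns the files whose on-disk version differs from the database and
// records them in the database as local changes.
func (f *Folder) Scan() []string {
	f.mu.Lock()
	defer f.mu.Unlock()
	var changed []string
	for n, v := range f.disk {
		if f.db[n] != v {
			changed = append(changed, n)
			f.db[n] = v
		}
	}
	return changed
}

// Demo pulls 20 files into Dupa while phones.txt has been edited locally, and
// returns everything the scanner reported as a local change.
func Demo() []string {
	f := &Folder{
		disk: map[string]string{"phones.txt": "edited"},
		db:   map[string]string{"phones.txt": "old"},
	}
	names := make([]string, 20)
	for i := range names {
		names[i] = fmt.Sprintf("Dupa/file%02d", i)
	}
	done := make(chan struct{})
	go func() {
		f.PullBatches(names, "remote", 5)
		close(done)
	}()
	var seen []string
	for {
		seen = append(seen, f.Scan()...) // scan whenever the mutex is free
		select {
		case <-done:
			return append(seen, f.Scan()...) // one final scan after the pull
		default:
		}
	}
}

func main() {
	fmt.Println(Demo())
}
```

However the goroutines happen to interleave, only phones.txt should be reported: the pulled files are never visible in a half-applied state, and the local edit is picked up on the first scan instead of waiting for the whole pull to finish. How often the scanner actually gets a turn between batches is up to the Go scheduler in this sketch; a real implementation would want something more deliberate than a busy loop.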
The time where we need exclusion could be made smaller as well. It’s only really necessary while there is a new/updated file on disk, and it’s not yet in the database. We have a delay here on purpose as we like to batch changes into the database for performance, but it’s fixed to a few seconds. The rest of the time there are just temp files on disk and those are ignored by the scanner anyway.
(This could, in fact, simplify some sections as we now have a fair bit of song and dance to make sure pulling and scanning happens only in serial and from the same place, when they should not really be so closely coupled.)
Then there are of course always going to be tricky corner cases. What if the results from a scan should affect an ongoing pull? And so on. But solving these things is why we get paid the big bucks.
If what you’re looking for is a Dropbox-like solution, then what you need to do is create a Dropbox-like environment. I also wanted a similar solution, but without having my files stored outside my own domain. What I did was to run Syncthing on a machine I use as a family server. All my machines sync to the server, but not to each other. That means all my machines will get all the same content, even if it takes 2 hops. Since the server runs 24/7, I know the latest content is always there and available, even if the location I’m currently at is a site I haven’t visited for a long time.