Device completion and ignores

imsodin · October 14, 2017, 1:41pm

Different ignores between devices causes the completion percentage to never reach 100% and thus stay seemingly syncing forever. This is nothing new, in fact it’s almost ancient in syncthing-time.

An approach mentioned by @calmh and as far as I remember already once existing in a prototype was to base the status on syncing activity instead of what they have according to index. That came up a long(ish) time ago as well. I can’t find the detailed discussion, but issue #623 mentions it.

Another course of action I would like to hear opinions about:
Register every ignored file in the index for the ignoring device when it is encountered on scanning or pulling. Today that is already done if a file is already in the index but becomes ignored: Then it is marked invalid. The same could be done when in index is received/while pulling: Every file that is in the received index but ignored locally could be added to the index for the local device as well (as invalid or a new property specific for ignoring). Then the ambiguity in the completion based on the index is removed: If you don’t count entries that are invalid/ignored on the device in question you will get the real completion of that device.

I did a quick prototype, which to my surprise already seems to do what it is supposed to. It is just a proof of concept, needing several database lookups when calculation completion or pulling is hardly the way to go.

I was worrying this would blow the database set up. I did the tests on my production system (yes I know, but it so nicely had the problem to be solved already…) and the database size even decrease (841->839MB) which is probably due to non-related changes, but it shows that it doesn’t increase the size significantly.

Diff of prototype: https://github.com/syncthing/syncthing/compare/master...imsodin:invalid

What do you think?

calmh · October 14, 2017, 2:20pm

I misunderstood what you were saying when reading the description above (thinking you meant to scan ignored files), but when looking at the code it’s actually quite elegant. I think it deserves cleaning and merging.

imsodin · October 14, 2017, 2:43pm

Elegant? Cool! xD

What I don’t like, or more accurately am unsure about whether it needs significantly more resources, are the frequent database calls: Before only db.WithNeed was needed, which is light as it only queries the global VersionList and then files that are actually needed. Now an additional file needs to be loaded via explicitly calling FileSet.Get. As pulling happens every 10s (see another PR ) that happens very often so unless this call is very cheap it should probably be optimized.

I was thinking of extending FileVersion by an invalid field to be able to exclude invalid files from db.WithNeed and thus not need to the additional FileSet.Get calls.

canton7 · October 14, 2017, 2:48pm

I’m afraid I’m too lazy to read your code, but anything which bases completion % off synchronization progress rather than number of files in common gets a big from me in principle.

AudriusButkevicius · October 14, 2017, 3:35pm

I think this does a wasteful lookup, as the FileInfoTruncated should just expose the invalidity, as it’s just a bit on the mode (or it used to be).

Also, I am not sure I understand how this works. I can see it marks stuff as invalid on up, but if I unignore the file, I don’t see the inverse happening?

Also, why not invalidate when the index is received?

imsodin · October 14, 2017, 3:44pm

The FileInfoTruncated from WithNeed is from the file is needed, not from the file the device has and is thus not invalid.

When unignore the new code in the case is not invoked and the file goes through the normal pulling process. I didn’t check any details (it’s just a prototype), I just did what already happens when scanning: https://github.com/syncthing/syncthing/blob/master/lib/model/model.go#L1972

I guess it could work there as well. I just did it when pulling because this also updates the local fileset, while on receiving the index the remote fileset is updated.

AudriusButkevicius · October 14, 2017, 5:36pm

I’ll leave it for you and @calmh to iron out the details. To me, something smells, yet I can’t put my finger on it. Just feels that we are doing it at the wrong place.

calmh · October 14, 2017, 6:03pm

I think it makes sense from an architecture point of view that the puller makes the note that “yep, I’ve seen this file, and I’m not going to pull it ever”. I’m not keen on the extra database lookup, but I think that can be avoided indeed.

imsodin · October 16, 2017, 10:31pm

After some debugging (my database was luckily quite deranged, so I found corner cases) I have a new version that does not need additional database lookups, on the other side I had to change more code. I will post a PR once https://github.com/syncthing/syncthing/pull/4426 is reviewed (not connected changes that came up while hacking).