Ignore everything but certain paths after 0.14.59

Hello,

I just realized that some of my ignore patterns don’t work the way they used to and, after a bit of research, I found that this MR might be the culprit.

My use case is exactly the one described here and I solved it exactly this way. Typically I want to back up some app settings and other files stored in my home folder and in ~/Library/Application Support, so I created a Syncthing folder with the path set to my home folder and a .stignore file that looked like this:

!/Library/Application Support/Sublime Text 3/Packages/User
/Library/Application Support/Sublime Text 3/Packages/
!/Library/Application Support/Sublime Text 3/Packages
/Library/Application Support/Sublime Text 3/
!/Library/Application Support/Sublime Text 3
/Library/Application Support/
!/Library/Application Support
/Library/
!/Library
!/.ssh
!/.zshrc
!/.gitignore
!/.vimrc
!/.gitconfig
**

And it worked fine.
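
For readers who still need the older style, the ladder above follows a mechanical rule: un-ignore the path to keep, then for each parent directory ignore its contents and un-ignore the directory itself, working up to the folder root. Here is a minimal Python sketch that generates those lines. The function names `ladder` and `build_stignore` are hypothetical helpers for illustration, not part of Syncthing, and the sketch assumes plain anchored paths without wildcards.

```python
def ladder(keep_path):
    """Return the old-style pattern ladder for one anchored path to keep."""
    parts = keep_path.strip("/").split("/")
    lines = ["!/" + "/".join(parts)]
    # Walk up from the direct parent to the top-level directory.
    for depth in range(len(parts) - 1, 0, -1):
        parent = "/" + "/".join(parts[:depth])
        lines.append(parent + "/")  # ignore everything inside the parent...
        lines.append("!" + parent)  # ...but keep the parent itself reachable
    return lines


def build_stignore(keep_paths):
    """Concatenate the ladders and finish with the catch-all ignore."""
    lines = []
    for path in keep_paths:
        lines.extend(ladder(path))
    lines.append("**")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_stignore([
        "/Library/Application Support/Sublime Text 3/Packages/User",
        "/.ssh",
    ]))
```

A top-level entry such as /.ssh has no parent below the folder root, so its ladder is the single line !/.ssh.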

Now I understand that this MR intended to make writing such ignore patterns easier, and from what I can see it works (yeah!), since I’m now able to shrink my patterns to just:

!/Library/Application Support/Sublime Text 3/Packages/User
!/.ssh
!/.zshrc
!/.gitignore
!/.vimrc
!/.gitconfig
**

However, I feel that what @imsodin foresaw (i.e. “Potentially a lot more data to traverse/stat”) is a real issue, especially in my case: an initial scan of my folder (I reset it with the API between my pattern tweaks for accurate results) now takes forever, whereas it was near-instant before the update, and the scanner logs fill up with an endless stream of lines like this:

2019-01-28 23:05:22 ignored (patterns): <some ignored path in my ~ directory>

I believe the scanner is now traversing my whole home folder, despite the ** in my patterns and I can’t figure out how to prevent this behavior.

Is there something I’m missing? How can we efficiently handle this kind of use case after this update?

Thanks

Edit : I actually noticed the problem while trying to fix another issue by upgrading to 1.0.1-rc.2, just so you know what version I’m using right now.

Your concern is not clear. Why is it a problem that it’s traversing the whole tree? If you are concerned by scanner logs, I assume you can disable them.

I only used the logs to diagnose the problem; they don’t bother me per se.

My concern is that the initial scan now takes forever since it’s traversing the whole tree. I’m not sure whether it impacts subsequent scans or not, but either way I’m not really a fan of so much work being done for nothing. Remember that the folder’s path in my case is the home folder: that’s a huge tree being traversed to only keep a few thousand files for a total of less than 100MB. This takes time, computing power, battery life (if “on the go”), and HDD/SSD life for nothing (okay I’m exaggerating a bit, but I’m sure you get the idea ;)). And even if the folder wasn’t that big, the point is that in my opinion it should only scan what the user wants: that’s what I expect as a user from ignore filters, conceptually speaking.

So the way I see it, either I made a mistake writing my patterns, or there is some optimization missing from the ignore mechanism since the update. If it’s the former (which I doubt), could you help me fix my patterns? If it’s the latter and you don’t agree with what I’ve said, could you explain your point of view? And if you agree, do you think this is feasible in the near future?

Thanks

The scan is a one-off; after that, it works based on inotify, which only scans the directory where the action happens, so all these scary things you listed are not that scary. Scanning mostly just does a bunch of stat() calls, which are almost free.

If someone has patterns like:

!*.jpg
**

How do you not traverse everything? You are forced to traverse everything to find all the jpegs.

Sure, your case is different, as supposedly all your excludes are anchored and without wildcards in the middle, which would allow optimising for this one case, by essentially checking that all includes have no wildcards, but you can’t optimise this for the general case.
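
To make that check concrete, here is a rough Python sketch of the test described above: a full traversal is needed as soon as any include (a line starting with !) is unanchored or contains a wildcard. This is only an illustration under simplifying assumptions, not how Syncthing is implemented. The names `is_anchored_literal` and `full_traversal_required` are made up, the characters `*`, `?`, `[` and `{` are treated as the wildcard set, and prefixes, comments and #include lines are not handled.

```python
# Characters assumed to make a pattern non-literal (an assumption of this sketch).
WILDCARDS = set("*?[{")


def is_anchored_literal(include):
    """True for an include such as '!/.ssh': rooted and free of wildcards."""
    body = include[1:]  # drop the leading '!'
    return body.startswith("/") and not (WILDCARDS & set(body))


def full_traversal_required(patterns):
    """True if any include could match at an unknown location in the tree."""
    includes = [p for p in patterns if p.startswith("!")]
    return any(not is_anchored_literal(p) for p in includes)


if __name__ == "__main__":
    # '!*.jpg' can match anywhere, so everything has to be walked.
    print(full_traversal_required(["!*.jpg", "**"]))
    # Only rooted, literal includes: the walk could in principle be limited.
    print(full_traversal_required(["!/.ssh", "!/.zshrc", "**"]))
```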

That’s nice. Like I said, I wasn’t sure whether subsequent scans were impacted. But just to be sure: the thing about inotify is basically the “Watch for Changes” feature, right? In the case of periodic “Full Rescans”, it still traverses the whole tree, doesn’t it? If that’s the case, even if those are almost free like you said, I guess I can just increase the rescan interval to make them even less noticeable (relying essentially on inotify). Just making sure I understood correctly here :wink:

That makes sense. I didn’t think of these use cases and I should be fine with this implementation.

Thank you for taking the time to explain.

Yes

Yes

Not sure, I thought the interval was fixed when watch for changes is enabled, but perhaps I am wrong.

It isn’t fixed, there are just separate default values with or without watching for changes.

Indeed. I increased the rescan interval so that it’s only done once a day and everything seems to work fine.

The only downside while I was setting this up was obviously the initial/manual scans, which took several minutes. I ended up copying parts of the tree and creating test folders on two devices, just so that I could quickly tweak my patterns without having to wait for the scan to finish every time I made a change.

So in the end, I still believe that it would be nice to have that optimization and, while it doesn’t seem easy to do, I’m not sure it’s impossible as long as all patterns are anchored: if there are no wildcards, well, bingo, and if there are, there should still be a way to narrow down the tree (i.e. determine which paths have to be traversed “to be sure”, and which can be safely avoided).
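
To illustrate what I mean by narrowing down the tree, here is a rough Python sketch (pure guesswork on my side, not based on Syncthing’s code). For each include it takes the literal part of the path before the first wildcard component, and a directory is traversed only if it lies on the way to that literal part or inside it. The names `literal_prefix` and `must_descend` are hypothetical, the wildcard set is an assumption, and an unanchored include simply falls back to traversing everything.

```python
# Characters assumed to make a path component non-literal (an assumption).
WILDCARDS = set("*?[{")


def literal_prefix(include):
    """Longest wildcard-free leading directory path of an include pattern."""
    body = include[1:] if include.startswith("!") else include
    if not body.startswith("/"):
        return "/"  # unanchored: may match at any depth
    prefix = []
    for component in body.strip("/").split("/"):
        if WILDCARDS & set(component):
            break  # everything below this point is uncertain
        prefix.append(component)
    return "/" + "/".join(prefix)


def must_descend(directory, includes):
    """directory is folder-relative with a leading slash, e.g. '/Library'."""
    d = directory.rstrip("/") + "/"
    for include in includes:
        root = literal_prefix(include).rstrip("/") + "/"
        # Descend if the directory leads to the literal root, or lies inside it.
        if root.startswith(d) or d.startswith(root):
            return True
    return False


if __name__ == "__main__":
    includes = ["!/Library/Application Support/*/Packages/User", "!/.ssh"]
    for candidate in ["/Library", "/Library/Application Support/Foo", "/Documents"]:
        print(candidate, must_descend(candidate, includes))
```

With these includes, everything under /Library/Application Support is still walked “to be sure”, while unrelated directories such as /Documents could be skipped without a stat on their contents.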

Anyway, I don’t know the code, so it’s just a guess and a suggestion. Plus I’m still pleased with the current implementation, so this thread can be closed as far as I’m concerned.

Thanks again
