I was wondering whether changing the ignore patterns requires a full scan, or whether Syncthing can rely on the index when a folder is in a stable state (i.e., neither syncing nor out of sync)? Complete rescans should be avoided at all costs! The index should be maintained, not recreated, right? What is the philosophy here?
It’s a full rescan, which happens often anyway - it’s a light operation (under normal circumstances). It doesn’t mean the index is recreated or files necessarily re-hashed, just that the metadata of files on disk is compared with what’s in the db. Files are then ignored or un-ignored according to your changes to the ignore patterns.
On a 5400 RPM HDD, a full rescan is not that light. This is still common storage media for NAS units, even today, given the price of large SSD storage. My personal setup has ~600k files served from an ARM64 quad-core NAS with 1 GB of RAM (not great, I know). Resources are limited in this environment, and I feel a more optimized approach could be taken. Consistency is key, but so is usability. Isn’t there more that could be done? A full rescan occurs every hour by default, right? I would say the mindset should be for it to occur once a day at most, or perhaps even once a week, and it should not be required to run when ignore patterns are changed. It is viable to use the index for this, assuming it’s up to date; we should always make sure it is, atomically during sync operations, after a first scan.
Just a few (hopefully constructive) ideas. What do you think?
Syncthing is AAA software, by the way. Congrats, and thanks!
It isn’t though, because there might be files on disk that were ignored previously that now aren’t, and those aren’t in the index. Maybe you mean that we could do some sort of analysis on the ignore pattern change and try to figure out if it’s more or less restrictive than before and only do a rescan if it’s less restrictive, but that seems like a lot of effort for marginal return.
Otherwise, you can absolutely run with rescans every week, trusting FS notifications to do their thing.
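For anyone wanting to try this, the scan interval is set per folder. A sketch of the relevant attributes in `config.xml` (the attribute names `rescanIntervalS` and `fsWatcherEnabled` are Syncthing's; the path and folder id are illustrative):

```xml
<!-- 604800 s = one week between full scans; the FS watcher
     picks up changes in between. -->
<folder id="default" path="/srv/data"
        rescanIntervalS="604800" fsWatcherEnabled="true">
</folder>
```

The same settings are available per folder in the web GUI under Edit → Advanced.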
I guess the question is why we can’t trust FS notifications. Assuming they work, we can update the index accordingly, even for previously ignored files, avoiding the hassle of doing a full rescan when we edit the ignore patterns. We can thus rely on the index and then simply set a personal best interval for full rescans to run, ensuring overall consistency.
Let’s just think about it. I think it’s worth considering.
FS notifications mostly work, but they can overflow and events can be missed so periodic scans are recommended. You can decide that for yourself though, running with one scan per week will almost certainly be perfectly fine.
We shouldn’t index ignored files – ignoring means the user told us they don’t want them indexed. For example, we might not have permission to access them, or they may be temporary files we shouldn’t bother with, or they may be secrets which should not be recorded in the database.
In any case, you just think it would be a performance gain in your setup, but we could imagine a scenario where almost all of your 600k files are ignored and only a handful are not. You would not be happy with us if we indexed all 600k files for some potential future performance boost in case you elected to change ignore patterns. Especially not since you might need to increase kernel variables to allow watching of them when all we really need is to watch the handful of non-ignored files.
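For reference, the kernel variables in question are, on Linux, the inotify sysctls. A sketch of checking and raising them (these sysctl names are standard Linux ones; the value is illustrative):

```shell
# Current inotify limits (defaults vary by distro):
sysctl fs.inotify.max_user_watches   # how many files/dirs one user may watch
sysctl fs.inotify.max_queued_events  # queue depth before events overflow

# Raise the watch limit for a large folder tree (as root):
sysctl -w fs.inotify.max_user_watches=204800
# Persist across reboots:
echo 'fs.inotify.max_user_watches=204800' >> /etc/sysctl.conf
```

Watching only the non-ignored files keeps these limits comfortable; watching all 600k entries might not.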
I am not saying we should index ignored files, but rather remove them from the index when they are added to the ignore patterns. This would require an index scan and subsequent update, rather than a full filesystem rescan. We could then add the files back whenever there is an FS notification, or a remote sync attempt, and the files are not in the ignore patterns anymore.
There is, however, one case this doesn’t cover: an ignored file is added locally, the FS notification fires, but the file is not added to the index because it is ignored. Removing it from the ignore patterns later would then not be enough to index it and add it back; it would simply be picked up in the next rescan. This could be optional, trading the level of consistency the user requires against a performance gain in the common case.
This would affect all systems, not just my particular case.
I do understand, however, if you want to keep it simple, which also ensures code robustness. Really just thinking “out loud” here.
Thanks for the discussion,
Yeah… We generally optimize for correctness rather than performance, so we do the scan to pick up files that are no longer ignored.