When is it beneficial to use cacheIgnoredFiles?

tomasz86 · February 9, 2021, 9:34am

I am curious about the cacheIgnoredFiles option. When exactly would it be beneficial to use it?

Obviously, I understand that it should not be enabled when we are running low on memory, but what about when RAM is plentiful?

To be more specific, I can think of several configurations as follows.

fast CPU, fast storage
fast CPU, slow storage
slow CPU, fast storage
slow CPU, slow storage

Would enabling cacheIgnoredFiles help in any of these setups? Also, does the complexity of ignore patterns matter? I do have some quite complex ones.

Andy · February 9, 2021, 9:51am

My understanding is that it works like any other cache in a computer: it stores data so that they can be reused. Here the data are the results of evaluating the ignore patterns. From the documentation:

cacheIgnoredFiles

Whether to cache the results of ignore pattern evaluation. Performance at the price of memory. Defaults to false as the cost for evaluating ignores is usually not significant.

So it means that if you use ignore patterns, the scan and sync process is faster in each setup.

AudriusButkevicius · February 9, 2021, 12:37pm

This was originally created when we used regular expressions for patterns.

RPis and all the other fancy calculators would simply choke on running the regular expression comparisons while walking through the filesystem on every scan, so we started caching the previous match result and trusting it as long as the ignores did not change. This removed a lot of the CPU load at the cost of RAM.
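To illustrate the idea of trusting a cached match result until the ignores change, here is a minimal Python sketch. It is not Syncthing's actual code: the `CachedMatcher` class, its method names, and the use of `fnmatch` globbing in place of Syncthing's real pattern syntax are all assumptions made for the example.

```python
import fnmatch
import hashlib


class CachedMatcher:
    """Hypothetical matcher that remembers results until the patterns change."""

    def __init__(self, patterns):
        self.pattern_hash = None
        self.cache = {}
        self.set_patterns(patterns)

    def set_patterns(self, patterns):
        self.patterns = list(patterns)
        # Fingerprint the pattern list; a different fingerprint means the
        # ignores changed, so every cached result can no longer be trusted.
        new_hash = hashlib.sha256("\n".join(self.patterns).encode()).hexdigest()
        if new_hash != self.pattern_hash:
            self.cache = {}
        self.pattern_hash = new_hash

    def is_ignored(self, path):
        # Cache hit: one map lookup, no pattern evaluation at all.
        if path in self.cache:
            return self.cache[path]
        # Cache miss: check the path against the patterns (stand-in for the
        # expensive regular expression comparisons of the old days).
        result = any(fnmatch.fnmatchcase(path, p) for p in self.patterns)
        self.cache[path] = result
        return result
```

Reloading an identical pattern list keeps the cache, while any change to the list empties it, which is the "trust it as long as ignores did not change" behaviour described above.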

These days we don’t use regular expressions, so re-checking the same paths against the patterns is much cheaper, and the cache probably does not yield a big benefit.

In general, every file in the filesystem (N) has to potentially be checked against every pattern (M), so the worst case computational complexity is N*M.

The cache uses a map, which turns N*M into N. So for this to be meaningful your M has to be reasonably big, so that it makes sense to save that cost at the expense of extra RAM.

Your cache size is, however, linear in N, not in M. So if your N is very big, it will blow up RAM usage, etc.
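The trade-off can be made concrete with a small Python sketch that counts pattern comparisons with and without a map. This is only an illustration under assumptions: the `scan` function is hypothetical, `fnmatch` globbing stands in for the real pattern matcher, and the worst case is forced by using patterns that never match.

```python
import fnmatch


def scan(paths, patterns, cache=None):
    """Return (ignored_paths, number_of_pattern_comparisons) for one scan."""
    comparisons = 0
    ignored = []
    for path in paths:
        if cache is not None and path in cache:
            hit = cache[path]          # one map lookup per file
        else:
            hit = False
            for pattern in patterns:   # up to M comparisons per file
                comparisons += 1
                if fnmatch.fnmatchcase(path, pattern):
                    hit = True
                    break
            if cache is not None:
                cache[path] = hit      # one cache entry per file
        if hit:
            ignored.append(path)
    return ignored, comparisons


paths = [f"dir{i}/file{i}.txt" for i in range(1000)]   # N = 1000 files
patterns = [f"*.ext{j}" for j in range(50)]            # M = 50 patterns, none match

_, uncached = scan(paths, patterns)        # no cache: N*M on every scan
cache = {}
_, first = scan(paths, patterns, cache)    # first cached scan still pays N*M
_, second = scan(paths, patterns, cache)   # later scans: lookups only
print(uncached, first, second, len(cache))  # 50000 50000 0 1000
```

The number of cache entries equals N no matter how many patterns there are, so the memory cost grows with the number of files while the CPU saving grows with the number of patterns.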

So effectively there is only a very limited scenario where it makes sense to use it: a large M, a not very large N, and potentially a very weak CPU.

tomasz86 · February 9, 2021, 1:11pm

Thank you very much for such a detailed explanation, especially about the whole background and history. It sounds like the option does not have much use nowadays then…

I do have 1 device that could potentially match the criteria (very weak CPU, more than necessary RAM, not too many folders), but it does not use complex ignore patterns anyway, so I would not be surprised if there was no noticeable difference between having the option enabled or disabled.