High CPU usage because of .stignore

  1. The question is: why do the asterisk (*) and ? in patterns match any character except the path separator (/)?

  2. BTW, you might consider the scanning optimization that BTSync calls a directory hash. In other words, if a directory’s update timestamp hasn’t changed (compared to what is stored in the database/index), then that directory is not scanned at all. That should dramatically improve scanning performance, if I understand what they mean by “directory hash”.
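To make the idea in point 2 concrete, here is a minimal sketch of a scan that skips listing a directory when its modification time matches the stored one. It is not how BTSync or Syncthing actually do it: the `scan` function and the layout of `index` are hypothetical. Note also that on POSIX systems a directory's mtime changes only when entries are added, removed or renamed, not when an existing file is edited in place, so a real scanner could not rely on this check alone.

```python
import os

def scan(root, index):
    """Walk the tree under root, re-listing only directories whose mtime changed.

    index is a hypothetical dict: directory path -> {"mtime": int, "subdirs": [paths]}.
    Returns the directories that actually had to be listed again.
    """
    rescanned = []
    pending = [root]
    while pending:
        path = pending.pop()
        mtime = os.stat(path).st_mtime_ns
        entry = index.get(path)
        if entry is not None and entry["mtime"] == mtime:
            # Unchanged: reuse the remembered subdirectories, skip the listing.
            subdirs = entry["subdirs"]
        else:
            # New or changed: list the directory and remember what we saw.
            with os.scandir(path) as it:
                subdirs = [e.path for e in it if e.is_dir(follow_symlinks=False)]
            index[path] = {"mtime": mtime, "subdirs": subdirs}
            rescanned.append(path)
        # Subdirectories are still visited, because a change deeper in the
        # tree does not update the mtime of the parent directory.
        pending.extend(subdirs)
    return rescanned
```

Calling `scan` twice in a row on an untouched tree returns an empty list the second time, which is where the saving would come from.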

If you mean what I think, the reason is that those wildcards should not match a path separator. That is, the pattern a*a should match a file called alpha but not a path aleph/omega.

Yes, that is exactly what I mean. My question is what is the reason for special handling of path separator characters? What does it “buy” you?

It buys the behaviour I described above, which is necessary. :wink:

(Because the ignore pattern is applied to a path as a whole, rather than individually to each component)
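A small sketch of that behaviour, assuming a simplified glob-to-regexp translation (the helper names `glob_to_regex` and `matches` are made up for illustration and are not Syncthing's code). Because the pattern is tested against the whole path, the wildcards have to exclude `/`, otherwise `a*a` would also match `aleph/omega`.

```python
import re

def glob_to_regex(pattern):
    """Translate a glob with * and ? into a regexp; neither wildcard crosses '/'."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[^/]*")   # any run of characters without a separator
        elif ch == "?":
            parts.append("[^/]")    # exactly one character, not a separator
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def matches(pattern, path):
    """True if the pattern matches the path as a whole."""
    return glob_to_regex(pattern).fullmatch(path) is not None

print(matches("a*a", "alpha"))        # file name: matches
print(matches("a*a", "aleph/omega"))  # path with a separator: does not match
```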

I guess I am missing something here. I thought that if these characters are handled as an exception to the “match any char” rule, that would imply that you can NOT match the entire path, but only its subcomponents.

Btw, what does ^Folder/.+ mean? I do not recall any handling of the ^ or . (dot) chars in your specification as shown here:

https://forum.syncthing.net/t/excluding-files-from-synchronization-ignoring/80?source_topic_id=1113

That expression (^Folder/.+) is a pure regexp, where the dot char means match any character and + means one or more times. But the syncthing specification says that star (*) by itself matches any number of chars and ? means any single char, except for the path separator char. What am I missing here?

You are mixing up usage (what you see and enter as a user), which is described by what you link to above, and implementation details, which is what Audrius and I are describing because you’re asking about them.

The external interface (for users) is glob-like, with * and ? as wildcards similar to a shell. The internal implementation is regexp based.

It’s worth noting that rsync treats slashes specially though:

   o      a ’*’ matches any path component, but it stops at slashes.
   o      use ’**’ to match anything, including slashes.
   o      a ’?’ matches any character except a slash (/).

i.e. A double asterisk matches both.
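The three quoted rules can be sketched as a regexp translation like the one below. This only illustrates the wildcard rules as quoted; it ignores the rest of rsync's matching logic (anchoring, character classes and so on), and `rsync_like_to_regex` is a made-up name.

```python
import re

def rsync_like_to_regex(pattern):
    """Translate *, ** and ? following the three rules quoted above."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")        # '**' matches anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")     # '*' stops at slashes
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")      # '?' is one character, never a slash
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out), re.DOTALL)

path = "a/b/c.txt"
print(bool(rsync_like_to_regex("a/*.txt").fullmatch(path)))   # single star cannot cross b/
print(bool(rsync_like_to_regex("a/**.txt").fullmatch(path)))  # double star can
```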

I’d like to clarify the matching rules to make sure we fully understand how they work.

Basically, it reduces to a question: If some match pattern matches just a part of the full path, is it considered to be a match?

For example, assume the full path is: .SyncArchive/Level_2/L3/file_01.txt

Is it true to say that if I specify

1) /.SyncArchive // matches any path that starts with .SyncArchive, but ONLY if it is in the top dir of the share. So, in effect, it would match like the /.SyncArchive** pattern.

2) /.SyncArchive/ // matches any path that starts with .SyncArchive as a top level dir, and all its subfolders and files (but not a file named .SyncArchive in the top level dir).

3) .SyncArchive // matches any path that contains .SyncArchive, regardless of its depth, which would be equivalent to specification /.SyncArchive/

4) /.SyncArchive/* // matches any file in the top level dir and all its subfolders, regardless of their file name and the depth of that subfolder on the path. In effect, it is equivalent to specifications 1 and 2.

5) Specifications 1 and 2 are in fact equivalent for the most part, except that 1 would also match /.SyncArchive[string], where [string] is any string, so it would also match any file in the top level dir whose file name starts with .SyncArchive.

Is this correct?

If not, would you state why exactly?

Thanx in advance. I’d like to reduce the CPU load caused by .stignore patterns as soon as I can manage.

master branch now has the .stignore optimization if you are really in a hurry.

Oh, great. Thanks. I am not THAT in a hurry. I could wait for a couple of days. I hope there will be a new release in a few days. I hope it is not going to be weeks though :smile:

I can confirm that with v0.10.2 the CPU usage, reported as a 5-minute average by my server (Ubuntu Linux), dropped drastically. Now it basically does not show any significant impact of syncthing on the overall CPU consumption of the whole system. It shows an average of 7-8% CPU load, which is quite normal considering the load from the web server, search engine and other not so light apps. Before 0.10.2 it would show at least double that CPU load, which made it look like the system was overloaded somewhere.

So, thanx a bunch. That helps.

My question would be: can I add more than just 2-3 patterns, and does it cause a significant increase in CPU load if I do not specify the patterns relative to the root folder (like /dir/file)?

Another issue: in my particular case I would like to avoid using absolute paths (relative to the root folder) because the .stignore files might get synced by BTSync. The installation is somewhat complex, and things might happen that prevent .stignore from functioning normally if it is synced to a Windows path specification.

Or, to put it a different way: can I assume that a pattern with a leading Unix-style path separator (/) will work on both Windows and Linux with the same .stignore?

Thanx in advance.

Adding more patterns should increase CPU load only for one iteration, and after that should just require a bit of extra RAM. It doesn’t matter what the patterns look like.
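One way a cost profile like that (CPU once, then some RAM) can come about is by caching the match result per path. The sketch below shows the general idea only; it is an assumption for illustration, not a description of Syncthing's actual code, and the `CachedMatcher` class with its `evaluations` counter is hypothetical.

```python
import re

class CachedMatcher:
    """Remember the ignore decision per path so patterns run only once per path."""

    def __init__(self, regexes):
        self.regexes = [re.compile(r) for r in regexes]
        self.cache = {}         # path -> bool; this is the extra RAM
        self.evaluations = 0    # how many pattern tests were actually executed

    def is_ignored(self, path):
        if path in self.cache:
            return self.cache[path]     # later iterations: a dict lookup only
        result = False
        for rx in self.regexes:
            self.evaluations += 1       # first iteration: pay for each pattern tried
            if rx.fullmatch(path):
                result = True
                break
        self.cache[path] = result
        return result

m = CachedMatcher([r"[^/]*\.tmp", r"\.SyncArchive/.*"])
m.is_ignored("notes.txt")   # evaluates both patterns
m.is_ignored("notes.txt")   # answered from the cache, no pattern is run
```

With this shape, adding patterns makes the first pass over the files more expensive, while repeated passes cost the same no matter how many patterns there are.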

Regarding the paths, I think the format is the same for both Windows and Linux, so I don’t think it should break.