The "Slow Re-scan Issue" Puzzle.

Context

I’m aware that the “slow re-scan issue” has been discussed many times, but something still doesn’t make sense given my understanding of system resource utilisation.

System Specifications

  • Environment: Supercomputer running Windows Server 2016
  • File System: NTFS
  • Hardware:
    • RAM: 1.5TB
    • CPU: Xeon Gold processors
    • Storage: M.2 array (20GB+/s read/write, millions of IOPS)
  • Sync Folder: 30TB, containing 20 million files

Observation

The initial scan took ~5 hours, which is understandable. However, the re-scan still requires about an hour to complete, and this is where my confusion lies.

The Puzzle

During the re-scan process, the system appears to be largely idle:

  • CPU usage is minimal on all cores (no core exceeds 10%)
  • Disk activity is negligible (almost zero read/write operations)
  • RAM and GPU resources are freely available

In essence, the entire computer seems unused, yet the re-scan takes a considerable amount of time: it appears to just idle there, and then suddenly finishes.

I am aware that the rescan is a single-threaded process, but that thread is not using much of anything on any CPU core.

Key Questions

  1. Which component of the system is secretly working so hard during the re-scan? What is the actual bottleneck?
  2. Can we better utilize the available resources (CPU, disk, etc.), or purchase new hardware, to make the re-scan faster?

I’m seeking insights into the underlying mechanics of this re-scan process and potential strategies for optimization.

In the Web GUI, you could try going to Actions → Logs → Debugging Facilities, enabling “scanner”, and then watching in real time what it is actually doing.

Probably listing directories takes a (comparatively) long time and we do it a gazillion times due to the fairly inefficient caching in the case-insensitive-filesystem protections…
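For context on why the case protection implies directory listings at all: to verify that a path exists on disk with exactly the expected casing, each parent directory has to be listed and its entries compared case-insensitively. The following is a minimal sketch of that idea with simplified, made-up helpers; it is not Syncthing’s actual code.

// Illustrative sketch: verifying the on-disk casing of a path on a
// case-insensitive filesystem by listing every parent directory.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// checkCase reports an error if any component of rel exists on disk under
// root with a different casing. Every component costs one directory listing,
// which is why a scan over millions of files repeats this work constantly.
func checkCase(root, rel string) error {
	dir := root
	for _, comp := range strings.Split(rel, string(filepath.Separator)) {
		entries, err := os.ReadDir(dir)
		if err != nil {
			return err
		}
		for _, e := range entries {
			if strings.EqualFold(e.Name(), comp) && e.Name() != comp {
				return fmt.Errorf("case mismatch: want %q, disk has %q", comp, e.Name())
			}
		}
		dir = filepath.Join(dir, comp)
	}
	return nil
}

func main() {
	// Hypothetical paths, for illustration only.
	fmt.Println(checkCase(`C:\Sync`, filepath.Join("Photos", "IMG_0001.jpg")))
}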


Thank you! I learn something new every day. This helps me confirm the time-consuming stage.

Listing all the files and directories is fast, because I notice that other programs handle it quickly:

  • Everything.exe can re-index the same volume in 30 seconds
  • Chkdsk can check all the directories and file metadata in ~2 minutes

So I guess the listing doesn’t take too much time, if it happens at all. As a matter of fact, in the Syncthing log I failed to see a listing stage; it jumps straight to the file-by-file comparison.

The main time-consuming stage appears to be an hour-long loop of:

  • Checking xxxx file
  • Unchanged xxxx file (then looping to the next file)

So either the “database query” or the “file metadata fetching” is not utilizing the full potential of the hardware.

Expectation: At least 1 CPU core or the disk reading should be pushed to ~100% utilization.

Since this doesn’t happen, there may be something in the code artificially limiting the speed. The question is: where?
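A back-of-the-envelope calculation hints at why nothing looks busy: a single-threaded loop over 20 million files that finishes in roughly 3,600 seconds spends about 3,600 s / 20,000,000 ≈ 180 µs per file. If each file costs a few serial round trips (a metadata syscall, a database lookup, possibly a directory listing on a cache miss), that latency alone accounts for the hour while no single resource ever approaches saturation; the limit would be per-item latency rather than CPU or disk throughput.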


What calmh meant is not in the scanner debug log, and I don’t get the relevance of other tools doing something fast - Syncthing almost certainly does something else. There is a mechanism to deal with Windows case-insensitivity which requires listing directory contents. We cache the results of these directory listings for 1 s, or up to 1000 items in the cache. If you get unlucky with timing and folder structure, you might get very few cache hits. You can collect some profiles to see what it’s doing: Profiling — Syncthing documentation
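To make the cache behaviour described above concrete, here is a minimal sketch of a directory-listing cache with a 1-second TTL and a 1000-entry cap. The names, the eviction strategy and the structure are illustrative assumptions, not Syncthing’s actual implementation.

// Illustrative sketch of a bounded, time-limited directory-listing cache,
// roughly matching the “1 s or up to 1000 items” behaviour described above.
package main

import (
	"fmt"
	"os"
	"time"
)

type cachedListing struct {
	names []string
	when  time.Time
}

type listingCache struct {
	ttl     time.Duration
	maxSize int
	entries map[string]cachedListing
}

func newListingCache() *listingCache {
	return &listingCache{
		ttl:     time.Second, // assumption: 1 s TTL, as mentioned above
		maxSize: 1000,        // assumption: 1000-entry cap, as mentioned above
		entries: make(map[string]cachedListing),
	}
}

// DirNames returns the cached listing of dir if it is still fresh, otherwise
// it lists the directory again. With unlucky timing and folder structure,
// almost every call falls through to the real (comparatively slow) listing.
func (c *listingCache) DirNames(dir string) ([]string, error) {
	if e, ok := c.entries[dir]; ok && time.Since(e.when) < c.ttl {
		return e.names, nil // cache hit: no filesystem access
	}
	des, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(des))
	for _, de := range des {
		names = append(names, de.Name())
	}
	if len(c.entries) >= c.maxSize {
		// Crude eviction for the sketch: drop the whole cache when full.
		c.entries = make(map[string]cachedListing)
	}
	c.entries[dir] = cachedListing{names: names, when: time.Now()}
	return names, nil
}

func main() {
	c := newListingCache()
	names, err := c.DirNames(".")
	fmt.Println(len(names), err)
}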

The below is a related, technical-ish story. Anyone who knows the Syncthing filesystem code/design is very welcome to read it and point out flaws in the reasoning; everyone else please skip, unless you feel inclined to read what probably amounts to gibberish (I didn’t make any attempt at contextualisation). I intend to reread it myself soon and, if it still makes sense, act on it :slight_smile:

I just had a series of revelations/discoveries, each inverting the outcome. In between I thought we weren’t actually hitting the case checking when scanning, but in the end I think we do, though we shouldn’t (and there’s a bug, resp. code that’s unintentionally unreachable):

Initial thought: Scanning means walking the filesystem, which lists directory contents, so the casing of all paths/names falling out of it is correct, even if we don’t do any case checks - yeiii, big, simple optimisation.

Let’s check that the optimisation is indeed simple and have a look at caseFilesystem.Walk:

func (f *caseFilesystem) Walk(root string, walkFn WalkFunc) error {
	// Walking the filesystem is likely (in Syncthing's case certainly) done
	// to pick up external changes, for which caching is undesirable.
	f.dropCache()
	if err := f.checkCase(root); err != nil {
		return err
	}
	return f.Filesystem.Walk(root, walkFn)
}

It’s already using the underlying filesystem, not the case filesystem, so we are already good - are we though? What is the underlying filesystem?

In fs.NewFilesystem we first apply the options, including the case one, and then the last step is wrapping in a walk filesystem. That means caseFs.Walk is actually never called. Looking at e.g. BasicFilesystem seems to confirm this: its Walk method returns a “not implemented” error.
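A tiny sketch of why that wrapping order matters: the wrapper applied last becomes the outermost layer, so only its Walk is reachable from callers, and a Walk on an inner layer is effectively dead code. The type names below are made up for illustration; this is not the actual fs.NewFilesystem code.

// Illustrative sketch: whichever wrapper is applied last is the outermost
// layer, so only its Walk is ever called.
package main

import "fmt"

type Filesystem interface {
	Walk(root string) error
}

type basicFS struct{}

func (basicFS) Walk(string) error { return fmt.Errorf("not implemented") }

type caseFS struct{ Filesystem }

func (f caseFS) Walk(root string) error {
	fmt.Println("caseFS.Walk: dropping cache and checking case of", root)
	return f.Filesystem.Walk(root)
}

type walkFS struct{ Filesystem }

func (f walkFS) Walk(root string) error {
	fmt.Println("walkFS.Walk: doing the actual walk of", root)
	return nil
}

func main() {
	// Order as described above: the case wrapper is applied first,
	// the walk wrapper last, so walkFS ends up outermost.
	var fs Filesystem = basicFS{}
	fs = caseFS{fs}
	fs = walkFS{fs}
	_ = fs.Walk("/data") // prints only the walkFS line; caseFS.Walk is never reached
}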

