Reduce verbosity of the puller

To get it out of the way first: I know I am a bad boy for syncing a database, but it nicely illustrates the problem:

Jul 28 14:47:19 homeserver syncthing[9084]: [BPDFD] INFO: Folder "Programme" isn't making progress. Pausing puller for 1m0s.
Jul 28 14:47:19 homeserver syncthing[9084]: [BPDFD] INFO: Puller: final: peers who had this file went away, or the file has changed while syncing. will retry later
Jul 28 14:47:19 homeserver syncthing[9084]: [BPDFD] INFO: Puller (folder "Programme", file "AppDaten/digikam5/.mysql.digikam/db_data/ibdata1"): pull: peers who had this file went away, or the file has changed while syncing. will retry later
Jul 28 14:47:18 homeserver syncthing[9084]: [BPDFD] INFO: Puller: final: peers who had this file went away, or the file has changed while syncing. will retry later
Jul 28 14:47:18 homeserver syncthing[9084]: [BPDFD] INFO: Puller (folder "Programme", file "AppDaten/digikam5/.mysql.digikam/db_data/ibdata1"): pull: peers who had this file went away, or the file has changed while syncing. will retry later
Jul 28 14:47:17 homeserver syncthing[9084]: [BPDFD] INFO: Puller: final: peers who had this file went away, or the file has changed while syncing. will retry later
Jul 28 14:47:17 homeserver syncthing[9084]: [BPDFD] INFO: Puller (folder "Programme", file "AppDaten/digikam5/.mysql.digikam/db_data/ibdata1"): pull: peers who had this file went away, or the file has changed while syncing. will retry later
Jul 28 14:47:17 homeserver syncthing[9084]: [BPDFD] INFO: Puller: final: peers who had this file went away, or the file has changed while syncing. will retry later
Jul 28 14:47:17 homeserver syncthing[9084]: [BPDFD] INFO: Puller (folder "Programme", file "AppDaten/digikam5/.mysql.digikam/db_data/ibdata1"): pull: peers who had this file went away, or the file has changed while syncing. will retry later
Jul 28 14:47:16 homeserver syncthing[9084]: [BPDFD] INFO: Puller: final: peers who had this file went away, or the file has changed while syncing. will retry later
Jul 28 14:47:16 homeserver syncthing[9084]: [BPDFD] INFO: Puller (folder "Programme", file "AppDaten/digikam5/.mysql.digikam/db_data/ibdata1"): pull: peers who had this file went away, or the file has changed while syncing. will retry later
Jul 28 14:47:15 homeserver syncthing[9084]: [BPDFD] INFO: Puller: final: peers who had this file went away, or the file has changed while syncing. will retry later
Jul 28 14:47:15 homeserver syncthing[9084]: [BPDFD] INFO: Puller (folder "Programme", file "AppDaten/digikam5/.mysql.digikam/db_data/ibdata1"): pull: peers who had this file went away, or the file has changed while syncing. will retry later
Jul 28 14:47:14 homeserver syncthing[9084]: [BPDFD] INFO: Puller: final: peers who had this file went away, or the file has changed while syncing. will retry later
Jul 28 14:47:14 homeserver syncthing[9084]: [BPDFD] INFO: Puller (folder "Programme", file "AppDaten/digikam5/.mysql.digikam/db_data/ibdata1"): pull: peers who had this file went away, or the file has changed while syncing. will retry later
Jul 28 14:47:14 homeserver syncthing[9084]: [BPDFD] INFO: Puller: final: peers who had this file went away, or the file has changed while syncing. will retry later
Jul 28 14:47:14 homeserver syncthing[9084]: [BPDFD] INFO: Puller (folder "Programme", file "AppDaten/digikam5/.mysql.digikam/db_data/ibdata1"): pull: peers who had this file went away, or the file has changed while syncing. will retry later
Jul 28 14:47:13 homeserver syncthing[9084]: [BPDFD] INFO: Puller: final: peers who had this file went away, or the file has changed while syncing. will retry later
Jul 28 14:47:13 homeserver syncthing[9084]: [BPDFD] INFO: Puller (folder "Programme", file "AppDaten/digikam5/.mysql.digikam/db_data/ibdata1"): pull: peers who had this file went away, or the file has changed while syncing. will retry later
Jul 28 14:47:12 homeserver syncthing[9084]: [BPDFD] INFO: Puller: final: peers who had this file went away, or the file has changed while syncing. will retry later
Jul 28 14:47:12 homeserver syncthing[9084]: [BPDFD] INFO: Puller (folder "Programme", file "AppDaten/digikam5/.mysql.digikam/db_data/ibdata1"): pull: peers who had this file went away, or the file has changed while syncing. will retry later
Jul 28 14:47:09 homeserver syncthing[9084]: [BPDFD] INFO: Puller: final: peers who had this file went away, or the file has changed while syncing. will retry later
Jul 28 14:47:09 homeserver syncthing[9084]: [BPDFD] INFO: Puller (folder "Programme", file "AppDaten/digikam5/.mysql.digikam/db_data/ibdata1"): pull: peers who had this file went away, or the file has changed while syncing. will retry later

21 lines per minute for a single file is a bit much. There are two separate issues: first, the same error is logged twice (once in the pullerRoutine, once in the finisherRoutine); second, it is logged again on every retry, since the pullerIteration is repeated 10 times on errors.

What I would like to do is either not log at all, or only log at debug level, within a pullerIteration, and instead return errors/store state so that logging happens only once, after all the successive tries have finished. I haven't looked at the implementation details yet; I would like to know whether I am missing something that requires this level of verbosity and whether the general idea is sound.

And out of curiosity: what kind of problem is repeating a failing pullerIteration 10 times back-to-back meant to catch?

Set the retry interval to something higher; it's somewhere in the config.

Also, you are probably right: we should log only the last attempt, and only in the finisher. But I think it's more than just changing one number; you'll probably have to pass down the iteration/attempt number somewhere.
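Roughly what I mean, as a sketch only - the names, signatures and log wording here are made up, not the actual puller code. The point is that the attempt number travels down to where the error occurs, and per-file errors are logged only on the last attempt:

```go
package main

import (
	"errors"
	"log"
)

const maxPullerIterations = 10

// pull makes up to maxPullerIterations passes over the needed files and
// logs per-file errors only on the final attempt; earlier failures are
// retried silently because the next pass processes them again anyway.
func pull(needed []string, pullFile func(name string) error) bool {
	for attempt := 1; attempt <= maxPullerIterations; attempt++ {
		lastAttempt := attempt == maxPullerIterations
		var failed []string
		for _, name := range needed {
			if err := pullFile(name); err != nil {
				failed = append(failed, name)
				if lastAttempt {
					log.Printf("Puller: file %q: %v (giving up)", name, err)
				}
			}
		}
		if len(failed) == 0 {
			return true // everything synced; nothing was logged
		}
		needed = failed // retry only what failed in the previous attempt
	}
	return false
}

func main() {
	// A per-file pull that always fails, to show that only the tenth
	// attempt produces a log line instead of ten of them.
	alwaysFail := func(name string) error {
		return errors.New("peers who had this file went away")
	}
	pull([]string{"db_data/ibdata1"}, alwaysFail)
}
```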

Generally speaking, at the point when we log something we don’t know if it’s going to happen again in the future. When we retry something it’s because we’re hoping an error was transient. Sometimes these things interact negatively. We could possibly suppress logging things we recently logged, much like syslog does.
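Something in the direction of syslog's "last message repeated N times" could look like this minimal sketch; the type and method names are made up and this is not our actual logger:

```go
package main

import (
	"fmt"
	"log"
)

// dedupLogger collapses consecutive identical messages, roughly like
// syslog's "last message repeated N times" suppression.
type dedupLogger struct {
	last    string
	repeats int
}

func (d *dedupLogger) Infof(format string, args ...interface{}) {
	msg := fmt.Sprintf(format, args...)
	if msg == d.last {
		d.repeats++ // identical to the previous message, suppress it
		return
	}
	d.flush()
	d.last = msg
	log.Print(msg)
}

// flush emits a summary line for any suppressed repeats.
func (d *dedupLogger) flush() {
	if d.repeats > 0 {
		log.Printf("last message repeated %d times", d.repeats)
		d.repeats = 0
	}
}

func main() {
	var l dedupLogger
	for i := 0; i < 10; i++ {
		l.Infof("Puller: file %q: %s", "db_data/ibdata1", "will retry later")
	}
	l.Infof("Puller: done")
	l.flush()
}
```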

The iterations aren't independent though: anything that failed in the previous attempt will be processed again in the following one. So any error (log) that doesn't make it to the last iteration has been resolved. In a debug scenario, the intermediate errors can be useful to show the kind of negative interaction you mentioned. However, in "normal" operation I don't really care about problems that resolved themselves immediately anyway.

Fair enough. So I think nowadays ten iterations is probably way overkill - I can't think of anything offhand that isn't resolved in one - and we wouldn't need to log anything until the final one.

This has come up before. See "Make puller more patient with locked files".

Another question: Why is there periodic pulling?

The way I understand it, this is pointless, as any change in the index or in the ignores schedules a manual pull, and scanning and pulling cannot occur at the same time anyway. So instead of periodic pulling, scanning could schedule a pull as well. There is the small difference that on changed ignores the pull should happen regardless of the sequence number, but that doesn't require periodic pulling. Am I missing something?

There is no periodic pulling like you are probably imagining. Pulling is triggered by index updates. But if it fails, and the global state indicates we should update some files to a newer version, we will continue trying to do so indefinitely in the hope that whatever is blocking us might have been resolved in the meantime.

(There is a periodic check to see if the index sequence number has increased, indicating there may be stuff to sync, in addition to the actual trigger on index updates. But this is just comparing an integer in memory to another integer in memory so no real “pull” to it.)

What I am referring to is the periodic pull that happens e.g. when the previous pull worked out fine, i.e. the pull timer firing on its regular interval.

The default value is 10s, so that happens pretty often. It does not have much overhead, as such a pull is aborted early when nothing has changed and just schedules the next one. Sure, removing it is a micro-optimization, but I don't see any reason to do this in the first place: the only way the return value of model.RemoteSequence can change is a scan or an index exchange happening (or is there something else?).
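Roughly what I mean, as a sketch; apart from model.RemoteSequence, which I mentioned above, the names here (pullIntv, prevSeq, pull, ...) are placeholders rather than the real identifiers:

```go
package main

import (
	"log"
	"time"
)

type folder struct {
	pullIntv       time.Duration
	prevSeq        int64        // last remote sequence we completed a pull for
	remoteSequence func() int64 // stands in for model.RemoteSequence(folderID)
	pull           func() bool  // returns true when everything synced
	stop           chan struct{}
}

// serve sketches the timer-driven check: every pullIntv (10s by default)
// it compares two in-memory integers and only starts an actual pull when
// the remote sequence number has advanced, which can only happen after a
// scan or an index exchange.
func (f *folder) serve() {
	timer := time.NewTimer(f.pullIntv)
	defer timer.Stop()
	for {
		select {
		case <-timer.C:
			if cur := f.remoteSequence(); cur > f.prevSeq {
				log.Println("remote sequence advanced, pulling")
				if f.pull() {
					f.prevSeq = cur
				}
			}
			timer.Reset(f.pullIntv)
		case <-f.stop:
			return
		}
	}
}

func main() {
	f := &folder{
		pullIntv:       10 * time.Second,
		remoteSequence: func() int64 { return 1 },
		pull:           func() bool { return true },
		stop:           make(chan struct{}),
	}
	go f.serve()
	time.Sleep(50 * time.Millisecond)
	close(f.stop)
}
```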

That’s the integer I’m talking about, yes. It’s more to make sure we haven’t missed something. Plus the check above for changed ignores. Possibly this timer could be significantly increased; it used to be the only driving factor before the hook for incoming index updates was added.

That's my question: what could be missed if a pull was scheduled every time an index update comes in or a scan happens (which is where a changed ignore is picked up)? Or put differently: why do index and ignore changes trigger a pull (IndexUpdated), while scans are only picked up by the periodic pull? See the sketch below the edit for what I have in mind.
I am aware that IndexUpdated resets prevSec, but that does not prevent triggered pulls for scans.

Edit:
There is a scenario where picking up scans via the timer leads to unnecessary delay: a puller iteration fails and thus delays the timer. In the meantime a scan happens that potentially solves the puller's problem, but it isn't picked up until the delayed timer fires.
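A rough sketch of what I have in mind, with all names made up: every event that can make a pull necessary signals the same puller, so scans, index updates and ignore changes all trigger a pull directly instead of waiting for the timer.

```go
package sketch

// puller is poked through a channel with capacity 1; repeated requests
// while a pull is already pending collapse into a single pending pull.
type puller struct {
	sched chan struct{}
}

func newPuller() *puller {
	return &puller{sched: make(chan struct{}, 1)}
}

// scheduleNow requests a pull. The send is non-blocking, so callers never
// wait on the puller.
func (p *puller) scheduleNow() {
	select {
	case p.sched <- struct{}{}:
	default:
	}
}

// Callers: the index-update hook, the scanner when a scan finishes, and
// the ignore-update path. With all three wired up, a scan that fixes a
// stuck file triggers a retry immediately instead of waiting for a
// delayed timer.
func (p *puller) IndexUpdated()   { p.scheduleNow() }
func (p *puller) ScanCompleted()  { p.scheduleNow() }
func (p *puller) IgnoresUpdated() { p.scheduleNow() }
```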

So one thing is that the index-updated notification is best effort. If it comes in while we're already busy pulling or scanning, it won't be processed, so we need at minimum one extra check after such things. Ignore updates are another case where we could add a hook. (I'm not sure updating ignores from the API triggers a scan.) There are corner cases like removing a device, which changes file availability, but that should only matter if we've failed to pull the newest version from somewhere else, and then we should already be in the retry loop.

More than that I can’t think of anything off hand. Refactor it and see what happens.

Done: https://github.com/syncthing/syncthing/pull/4340