Too Many Open Files


(Jakob Borg) #21

Do you have a lot of sync failures? (Visible in the GUI as failed items.)

I haven’t looked, but maybe there’s snuck in a bug somewhere so that files don’t get closed when a pull fails.


#22

Hi Audrius:

Can you post the output from /fd directory (fdinfo bears no purpose).

I previously quoted a snippet of this:

root@NAS02:~# ls -lah /proc/13857/fd*
/proc/13857/fd:
total 0
dr-x------  2 syncthing users  0 Nov 29 08:39 .
dr-xr-xr-x  8 syncthing users  0 Nov 29 06:02 ..
lr-x------  1 syncthing users 64 Nov 29 08:39 0 -> /dev/null
l-wx------  1 syncthing users 64 Nov 29 08:39 1 -> pipe:[60052884]
lrwx------+ 1 syncthing users 64 Nov 29 08:39 10 -> /file/path1/.syncthing.file1.tmp
lrwx------+ 1 syncthing users 64 Nov 29 08:39 100 -> /file/path2/.syncthing.file2.tmp
lrwx------+ 1 syncthing users 64 Nov 29 21:39 1000 -> /file/path3/.syncthing.file3.tmp
...

The file and folder names are confidential, so I can’t post them unredacted here. Also the output is currently 3278 lines long - so I’m not able to put it all up. But suffice to say:

  • The vast majority of entries are references to .syncthing.xyz.tmp files;
  • There are a few database or socket entries scattered here and there - mostly amongst the low numbers - e.g.
l-wx------  1 syncthing users 64 Nov 30 18:08 8 -> /volume1/@appstore/syncthing/var/index-v0.14.0.db/MANIFEST-5196825

I’m happy to send you the full, unredacted list privately if that would be helpful.

Thanks,

Pants.


#23

Hi Jakob:

Do you have a lot of sync failures? (Visible in the GUI as failed items.)

I haven’t seen any evidence of sync failures at all - either amongst the masses of logs, or in the GUI.

And, FWIW, I’ve just searched for a file which reports sync success in the /fd list - and it wasn’t found, thankfully…

Thanks,

Pants.


(Audrius Butkevicius) #24

So feels like this deserves a bug report, yet I am not sure how to repro this, and without a repro this is useless.


(Jakob Borg) #25

Mmm. This might be the time where you run with -verbose and maybe STTRACE=model and see if you find anything interesting related to those files. There is no reason it should leave temp files around unless a sync fails, and no reason at all to keep them open.


#26

Many thanks for your continued input.

My apologies for being a bit of an amateur here - but I’m having trouble passing STTRACE=model to the syncthing executable.

The Synology distribution starts syncthing using the following command:

start-stop-daemon -b -o -c ${USER} -S -u ${USER} -k 002 -x env HOME=${CONFIG_DIR} ${SYNCTHING} -- ${SYNCTHING_OPTIONS}

Does anyone know how I can pass STTRACE=model as an environment variable to start-stop-daemon, in addition to the existing HOME variable? I’ve tried adding STTRACE=model to the very beginning of the command, and after {CONFIG_DIR} - but I don’t believe I’m seeing the expected result (e.g. seeing microsecond timestamps in the log file).

Thanks,

Pants.


(Audrius Butkevicius) #27

It should work. Alternatively, you can just run syncthing yourself by just running the binary.


#28

Hi Audrius:

Thanks - I’m just working that out, and will update you shortly!

Pants.


#29

Hello again:

Still working on this - got the binary running, the debug log happening and reproduced the problem. Now just got to get the massive log off the NAS and into a readable form… :face_with_raised_eyebrow:


#30

Hello:

So I’m wading through the massive log file, looking for anything that might be useful. Here are a couple of snippets from (as far as I can see) the first times that ‘too many open files’ are reported:

[OVPYX] 2017/12/07 03:48:38.808621 rwfolder.go:1569: DEBUG: fsync "file/path/1" failed: open /volume1/NAS/file/path/1: too many open files
[OVPYX] 2017/12/07 03:48:38.808685 rwfolder.go:1569: DEBUG: fsync "file/path/2" failed: open /volume1/NAS/file/path/2: too many open files

…and…

[OVPYX] 2017/12/07 03:48:38.828511 rwfolder.go:1089: DEBUG: sendReceiveFolder/bvkmk-*****@0xc42009e480 need file file/path/2; copy 1, reused 0
...
[OVPYX] 2017/12/07 03:48:38.828836 verboseservice.go:48: VERBOSE: Started syncing "bvkmk-*****" / "file/path/2" (update file)
[OVPYX] 2017/12/07 03:48:38.828872 rwfolder.go:1476: DEBUG: sendReceiveFolder/bvkmk-*****@0xc42009e480 closing file/path/2
[OVPYX] 2017/12/07 03:48:38.828932 rwfolder.go:1701: INFO: Puller (folder "\"NAS\" (bvkmk-*****)", file "file/path/2"): finisher: dst create: open /volume1/NAS/file/path/.syncthing.2.tmp: too many open files
[OVPYX] 2017/12/07 03:48:38.828980 progressemitter.go:225: DEBUG: progress emitter: deregistering bvkmk-***** file/path/2
[OVPYX] 2017/12/07 03:48:38.829018 verboseservice.go:48: VERBOSE: Finished syncing "bvkmk-*****" / "file/path/2" (update file): dst create: open /volume1/NAS/file/path/.syncthing.2.tmp: too many open files

Is that useful at all? If not, are you able to give me any pointers as to what I should be looking for?

Many thanks,

Pants.


(Jakob Borg) #31

Do you have any complaints in the log along the lines of dst truncate: <whatever error>, the keyword being “truncate”?


#32

Hi Jakob:

Thanks for the thought. I’m running a search through the log file for dst truncate: - it’s about 25% of the way through so far, with no hits.

I’ll let it complete, but thought you might want an early report.

Thanks,

Pants.


(Audrius Butkevicius) #33

Are you sure all the open fds are pointing to different files?

You could try running with STTRACE=fs, and see if every open is followed by a close.


#34

Hi Audrius:

Are you sure all the open fds are pointing to different files?

I can’t be 100% sure - but they look different from an overview. I’ve specifically searched for the first 10 in the list - and they were unique.

You could try running with STTRACE=fs, and see if every open is followed by a close.

Will do that and report back. Thanks!


(Jakob Borg) #35

Give this build a spin as well maybe. Even if you don’t see the log entries I expect, it does fix a file descriptor leak that I suspect could apply to you.


#36

Thanks Jakob - that’s amazing!

I’ll give the system chance to go bad with the current build, running under STTRACE=fs first - this should happen in about 24 hours, if previous experience is anything to go by. Once that’s done, I’ll try the new build.

Thanks,

Pants.


#37

Hello:

Thanks for the thought. I’m running a search through the log file for dst truncate: - it’s about 25% of the way through so far, with no hits.

I’ll let it complete, but thought you might want an early report.

For completeness, I’ll let this go through and confirm. So far it’s 40% of the way through - still with no hits…

Pants.


#38

Hello:

I’ve been running with STTRACE=fs - but I’m not seeing any filesystem-related log entries… however, I have reached a useful correlation between the list of open files and what’s in the (reasonably-sized) log file this time:

I’ve only gone through a few of the listed open files - but I’m seeing a repeated pattern - for example: here’s an entry from the open file list:

lrwx------+ 1 syncthing users 64 Dec 13 09:09 10 -> /volume1/NAS/Computer 1.sparsebundle/bands/.syncthing.11122.tmp

If I search through the log file for Computer 1.sparsebundle/bands/.syncthing.11122.tmp I get no results - but searching for Computer 1.sparsebundle/bands/11122 produces the following:

[OVPYX] 2017/12/12 23:05:34.123373 verboseservice.go:48: VERBOSE: Started syncing "bvkmk-*****" / "Computer 1.sparsebundle/bands/11122" (update file)

Normally I would expect to see a second log entry, ending in (update file): Success - but I’m not seeing this with the files in the open file list. The timestamps for the (update file) entry are many hours ago - so I wouldn’t expect these files to still be being updated.

I hope that’s of some help - I’ll give the new build a test now, and report back later.

Many thanks for your assistance in this - it’s very much appreciated!

Pants.


(Simon) #39

That should be because the keyword is filesystem not fs.


#40

That should be because the keyword is filesystem not fs.

Oops! :open_mouth: Thanks Simon!