Slow scanning and hash rate

Hello,

I have been encountering this issue for quite some time and I really don’t know what’s going on. Here are the basics.

I have a 3-node setup: a server and two client machines. Here is a screenshot of the server's web UI.

As you can see, we deal with a lot of small files. These are primarily 1 KB to 250 KB in size and there are a lot of them, although they're generally organised into a structure so there are at most around 50 in any given folder. The joys of the old implementation of the Potree point cloud format.

The server is running unRAID and Syncthing is running in the linuxserver/syncthing Docker container. Files are stored on the XFS unRAID array, which has two HDDs.

My first thought was a bottleneck of some description; however, CPU usage is always quite low, there is plenty of RAM, and the system isn't swapping.

root@files ~ # free -m
              total        used        free      shared  buff/cache   available
Mem:          32145        5767       16294        1022       10084       24998
Swap:             0           0           0
top - 12:02:28 up 33 min,  2 users,  load average: 1.96, 2.47, 2.61
Tasks: 373 total,   1 running, 372 sleeping,   0 stopped,   0 zombie
%Cpu(s):  7.8 us, 10.0 sy,  0.9 ni, 80.7 id,  0.0 wa,  0.0 hi,  0.5 si,  0.0 st
MiB Mem :  32145.5 total,  16250.8 free,   5771.4 used,  10123.3 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  24994.5 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 8097 root      20   0 1040988  96956   1032 S  47.9   0.3  18:04.56 shfs
 9327 root      20   0 4689708   4.1g  22112 S  44.6  13.1  33:23.88 qemu-system-x86
13995 nobody    31  11 1072924 336532  17976 S  27.1   1.0   9:39.04 syncthing
 8358 root      20   0 1066080  75988  42468 S   4.3   0.2   1:48.57 dockerd
 8375 root      20   0 1044508  33504  18276 S   3.6   0.1   1:06.56 containerd
28818 root      20   0  151228  37860  27040 S   2.6   0.1   0:00.08 docker
 7830 root      20   0  283592   4116   3460 S   1.0   0.0   0:12.62 emhttpd
28519 root      20   0    7032   3360   2524 R   0.7   0.0   0:00.08 top
   53 root      20   0       0      0      0 I   0.3   0.0   0:01.62 kworker/4:1-events
 7649 root      20   0  109104  10264   3892 S   0.3   0.0   0:04.69 containerd-shim
 9227 root      20   0  109104  10324   3956 S   0.3   0.0   0:07.70 containerd-shim
 9521 root      20   0    3856   2464   2200 S   0.3   0.0   0:00.71 docker_load
12072 root      20   0  109104  10324   3956 S   0.3   0.0   0:07.23 containerd-shim
13615 root      20   0  109104  10324   3956 S   0.3   0.0   0:11.85 containerd-shim
17587 root      20   0  109104  11096   4544 S   0.3   0.0   0:06.82 containerd-shim
18352 nobody    20   0 2031168 134088  22628 S   0.3   0.4   0:01.71 mysqld

The above statistics were taken while hashing was running, and nothing looks like it's being pushed hard. My next thought was a disk I/O bottleneck of some description.

root@files ~ # iostat -m
Linux 4.19.98-Unraid (files)    01/05/2021      _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          13.55    1.12   12.52    2.67    0.00   70.14

Device             tps    MB_read/s    MB_wrtn/s    MB_dscd/s    MB_read    MB_wrtn    MB_dscd
loop0             1.36         0.00         0.00         0.00          3          0          0
loop1             0.02         0.00         0.00         0.00          0          0          0
loop2             8.39         0.23         0.15         0.00        502        332          0
loop3             0.04         0.00         0.00         0.00          1          0          0
md1               9.82         0.13         0.00         0.00        276          2          0
sda               0.62         0.06         0.02         0.00        134         49          0
sdb               0.74         0.01         0.01         0.00         30         25          0
sdc               0.67         0.01         0.01         0.00         25         25          0
sdd               0.06         0.00         0.00         0.00          3          2          0
sde               9.52         0.13         0.00         0.00        278          2          0
sdf             113.02         1.65         4.95         0.00       3632      10923          0
sdg             100.69         0.47         4.95         0.00       1041      10923          0

From what I could find, the tps values point towards fairly heavy IOPS for the spinning-rust disks. However, it's possible I'm interpreting those results incorrectly.
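For a more detailed view, the extended statistics show per-device latency and utilisation as well. Something along these lines should do it (exact column names depend on the sysstat version):

# sample extended per-device stats every 5 seconds; the await and %util
# columns show latency and saturation (column names vary by sysstat version)
iostat -xm 5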

I also enabled STTRACE=scanner and here is an excerpt of the log.

root@files ~ # cat syncthing_redacted.log
[IDID_0] 2021/01/05 12:09:34.642662 walk.go:171: DEBUG: Walk FOLDER_ID_1 [] current progress 1318514/3476255146 at 0.0 MiB/s (0%)
[IDID_0] 2021/01/05 12:09:36.643536 walk.go:171: DEBUG: Walk FOLDER_ID_1 [] current progress 1318514/3476255146 at 0.0 MiB/s (0%)
[IDID_0] 2021/01/05 12:09:38.641049 walk.go:171: DEBUG: Walk FOLDER_ID_1 [] current progress 1318514/3476255146 at 0.0 MiB/s (0%)
[IDID_0] 2021/01/05 12:09:40.642674 walk.go:171: DEBUG: Walk FOLDER_ID_1 [] current progress 1318514/3476255146 at 0.0 MiB/s (0%)
[IDID_0] 2021/01/05 12:09:40.923316 walk.go:187: DEBUG: real to hash: SD340 GSSC Handrails & Balustrades/PointClouds/201229 A3 Wing 1 25M.potree/data/r/60255/r60255.bin
[IDID_0] 2021/01/05 12:09:40.923824 walk.go:187: DEBUG: real to hash: SD340 GSSC Handrails & Balustrades/PointClouds/201229 A3 Wing 1 25M.potree/data/r/60255/r60255.hrc
[IDID_0] 2021/01/05 12:09:40.926027 walk.go:187: DEBUG: real to hash: SD340 GSSC Handrails & Balustrades/PointClouds/201229 A3 Wing 1 25M.potree/data/r/60256/r60256.bin
[IDID_0] 2021/01/05 12:09:42.643804 walk.go:171: DEBUG: Walk FOLDER_ID_1 [] current progress 1374860/3476255146 at 0.0 MiB/s (0%)
[IDID_0] 2021/01/05 12:09:44.642547 walk.go:171: DEBUG: Walk FOLDER_ID_1 [] current progress 1374860/3476255146 at 0.0 MiB/s (0%)
[IDID_0] 2021/01/05 12:09:46.641152 walk.go:171: DEBUG: Walk FOLDER_ID_1 [] current progress 1374860/3476255146 at 0.0 MiB/s (0%)

There are lots of "current progress" lines randomly interspersed with the occasional "real to hash" line. There are usually around 2.5 minutes between "real to hash" lines, which doesn't seem right for hashing such small files.
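For what it's worth, here's a rough way to quantify those gaps from the same log (the field position assumes the log format shown above, and the filename is just my redacted copy):

# print the gaps, in seconds, between consecutive "real to hash" lines, largest last
grep 'real to hash' syncthing_redacted.log \
  | awk '{ split($3, t, ":"); s = t[1]*3600 + t[2]*60 + t[3];
           if (prev) printf "%.1f\n", s - prev; prev = s }' \
  | sort -n | tail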

Any assistance on this issue would be greatly appreciated. Unfortunately these files have to be synced, so ignoring them isn’t really an option.

Thank you.

Some further information: the two drives that the data is stored on are sdd and sde. The database is stored on the unRAID cache pool (btrfs), which is the SSDs sdf and sdg. So there's definitely some activity on the SSDs, but nothing that would suggest such a slow rate of hashing files. So I did indeed misread the IOPS in my original post: the heavy tps is on the SSDs holding the database, not on the spinning disks.

It also seems to be just the "20 Current Tekla Models" folder that is slow. That might suggest it has something to do with accessing/reading the database; possibly it's taking a long time to perform a database lookup before moving on to the next file.

Typically this is caused by slow access to the files in question. Is it fast to list (ls -lR) and cat (find /wherever -type f -print0 | xargs -0 cat > /dev/null, or something along those lines) the files in question?
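For example, something along these lines (with /wherever standing in for the folder in question):

# list and read everything under the folder, timing both passes
time ls -lR /wherever > /dev/null
time find /wherever -type f -print0 | xargs -0 cat > /dev/null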

File access speeds seem reasonable. I've compared a node against the server, both on bare metal and from inside the Docker container that Syncthing runs in, to see if there is something really wrong. It's a bit of an apples-and-oranges comparison, however, because the node is running on an NVMe SSD while the server uses HDDs. So I expected the server to be slower, but it wasn't extremely slow.

Server (Bare-metal)

# time ls -lR > /dev/null

real    12m12.311s
user    0m9.941s
sys     1m59.614s

Server (Docker)

# time ls -lR > /dev/null

real    17m29.263s
user    0m13.391s
sys     3m0.405s

Node

$ time ls -lR > /dev/null

real    9m19.668s
user    0m6.106s
sys     1m30.736s

Interestingly, there is definitely a performance hit when running in Docker. I wouldn't have thought it would be that significant, though, which could point to a larger underlying issue somewhere.

Here I'm cat-ing one of the folders that has a heap of point clouds in it, to compare.

Server (Bare-metal)

# time find SD340\ GSSC\ Handrails\ \&\ Balustrades/ -type f -print0 | xargs -0 cat > /dev/null

real    1m32.667s
user    0m2.052s
sys     0m25.048s

Server (Docker)

# time find SD340\ GSSC\ Handrails\ \&\ Balustrades/ -type f -print0 | xargs -0 cat > /dev/null

real    1m44.297s
user    0m2.786s
sys     0m45.102s

Node

$ time find SD340\ GSSC\ Handrails\ \&\ Balustrades/ -type f -print0 | xargs -0 cat > /dev/null

real    0m56.155s
user    0m0.563s
sys     0m11.053s

Again, the results point to an overhead with Docker. However, none of them are extremely slow, and general usability of the files is fine; I'm never waiting for a file to load or anything along those lines.

The iostat output, with huge tps values for the two drives where the db resides together with low load on the other drives and the CPU, does indeed very much suggest that the db I/O is the bottleneck.

Is the Syncthing setup new? What’s the content of your db dir (output of ls -lh on the index-v0.14.0.db directory in your Syncthing config dir)?
Syncthing uses two different leveldb configurations, depending on the db size. Maybe you recently crossed the threshold to use “large” settings and leveldb doesn’t handle that well (does a ton of compaction). The “LOG” file in the db directory should show a log of activity then too.
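Concretely, something like this should do (the path assumes the usual /config mount of the linuxserver image; adjust to wherever your Syncthing home actually is):

# list the database tables and peek at recent leveldb activity
ls -lh /config/index-v0.14.0.db
tail -n 50 /config/index-v0.14.0.db/LOG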

And completely in the realm of unfounded speculation: maybe it's something in the unRAID cache/btrfs setup that doesn't play well with db I/O.

This setup has been running for a couple of years. I have encountered this problem in the past, where everything was working fine and then it would just grind to a halt. I would go through the folder, try to remove anything that wasn't absolutely necessary, and then delete the folder from Syncthing and re-add it. Unfortunately that's not really an option at this point.

Overnight and it’s up to 4% on the scan, it seems it’s slowed down on the disk IO.

# iostat -m
Linux 4.19.98-Unraid (files)    01/06/2021      _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.42    1.11   13.11    0.50    0.00   76.87

Device             tps    MB_read/s    MB_wrtn/s    MB_dscd/s    MB_read    MB_wrtn    MB_dscd
loop0             0.04         0.00         0.00         0.00          3          0          0
loop1             0.00         0.00         0.00         0.00          0          0          0
loop2             2.58         0.01         0.08         0.00        572       6326          0
loop3             0.00         0.00         0.00         0.00          1          0          0
md1               1.73         0.07         0.00         0.00       5098        161          0
sda               0.02         0.00         0.00         0.00        134         49          0
sdb               0.53         0.00         0.04         0.00        271       3349          0
sdc               1.60         0.02         0.04         0.00       1463       3349          0
sdd               0.04         0.00         0.00         0.00          3        163          0
sde               1.58         0.07         0.00         0.00       5102        163          0
sdf              29.72         0.08         1.59         0.00       6413     120045          0
sdg              28.79         0.02         1.59         0.00       1164     120045          0

Here is a listing of the index directory.

# ls -lh
total 233M
-rw-r--r-- 1 nobody users 2.1M Dec 30 04:04 070375.ldb
-rw-r--r-- 1 nobody users 2.1M Dec 30 04:04 070376.ldb
-rw-r--r-- 1 nobody users 112K Jan  5 17:44 071944.ldb
-rw-r--r-- 1 nobody users  11M Jan  6 08:30 071959.log
-rw-r--r-- 1 nobody users 129M Jan  6 06:49 071961.ldb
-rw-r--r-- 1 nobody users  89M Jan  6 06:49 071962.ldb
-rw-r--r-- 1 nobody users   16 Jan  5 17:44 CURRENT
-rw-r--r-- 1 nobody users   16 Jan  5 17:44 CURRENT.bak
-rw-r--r-- 1 nobody users    0 Jan 24  2019 LOCK
-rw-r--r-- 1 nobody users 311K Jan  6 06:51 LOG
-rw-r--r-- 1 nobody users 1.1M Dec 30 04:04 LOG.old
-rw-r--r-- 1 nobody users 6.5K Jan  6 06:49 MANIFEST-071946

Yesterday I ended up needing to create an additional folder; however, it's a very small one with 935 files totalling 112 MB. Here is an updated screenshot of the web UI for completeness.

I think you're right about the database going over a limit. I checked the LOG file; here's an excerpt, and the complete file as well. LOG (310.3 KB)

Hopefully it doesn’t need to be sanitised.

=============== Jan 6, 2021 (AEDT) ===============
05:16:12.149627 memdb@flush N·11321 S·2MiB
05:16:12.206090 memdb@flush created L0@71950 N·11321 S·593KiB "\x00\x00\x00..bin,v368359217":"\x0e\x00\x00..bin,v368363464"
05:16:12.206459 version@stat F·[1 0 0 4] S·213MiB[593KiB 0B 0B 212MiB] Sc·[0.12 0.00 0.00 0.02]
05:16:12.211929 memdb@flush committed F·1 T·61.580573ms
05:16:12.214599 journal@remove removed @71945
05:16:12.215048 table@compaction range L-1 "":""
05:16:12.215902 table@compaction L0·1 -> L1·0 S·593KiB Q·368360301
05:16:12.268314 table@build created L1@71951 N·10615 S·583KiB "\x00\x00\x00..bin,v368359217":"\x0e\x00\x00..bin,v368363464"
05:16:12.268679 version@stat F·[0 1 0 4] S·213MiB[0B 583KiB 0B 212MiB] Sc·[0.00 0.01 0.00 0.02]
05:16:12.278075 table@compaction committed F~ S-10KiB Ke·0 D·706 T·61.855404ms
05:16:12.278853 table@compaction L1·1 -> L2·0 S·583KiB Q·368360301
05:16:12.281830 table@remove removed @71950
05:16:12.341365 table@build created L2@71952 N·10615 S·583KiB "\x00\x00\x00..bin,v368359217":"\x0e\x00\x00..bin,v368363464"
05:16:12.341764 version@stat F·[0 0 1 4] S·213MiB[0B 0B 583KiB 212MiB] Sc·[0.00 0.00 0.00 0.02]
05:16:12.345738 table@compaction committed F~ S~ Ke·0 D·0 T·66.565145ms
05:16:12.346094 table@compaction L2·1 -> L3·2 S·209MiB Q·368360301
05:16:12.348146 table@remove removed @71951
05:16:22.344360 table@build created L3@71953 N·1671935 S·128MiB "\x00\x00\x00..dnf,v355289080":"\x01\x00\x00..txt,v368087933"
05:16:30.104627 table@build created L3@71954 N·1522624 S·80MiB "\x01\x00\x00..bmp,v368076762":"\x0e\x00\x00...wd,v368339040"
05:16:30.104979 version@stat F·[0 0 0 4] S·213MiB[0B 0B 0B 213MiB] Sc·[0.00 0.00 0.00 0.02]
05:16:30.109291 table@compaction committed F-1 S-18KiB Ke·0 D·466 T·17.762851748s
05:16:30.110647 table@remove removed @71952
05:16:36.015153 table@remove removed @71947
05:16:36.048286 table@remove removed @71948
06:48:50.610085 memdb@flush N·465839 S·63MiB
06:48:51.506600 memdb@flush created L0@71956 N·465839 S·6MiB "\x00\x00\x00..eet,d368368847":"\f\x00\x00..bak,d368595962"
06:48:51.507187 version@stat F·[1 0 0 4] S·220MiB[6MiB 0B 0B 213MiB] Sc·[0.12 0.00 0.00 0.02]
06:48:51.515899 memdb@flush committed F·1 T·905.456666ms
06:48:51.548712 journal@remove removed @71949
06:48:51.596348 table@move L0@71956 -> L1
06:48:51.596825 version@stat F·[0 1 0 4] S·220MiB[0B 6MiB 0B 213MiB] Sc·[0.00 0.07 0.00 0.02]
06:48:51.650410 table@move L1@71956 -> L2
06:48:51.650841 version@stat F·[0 0 1 4] S·220MiB[0B 0B 6MiB 213MiB] Sc·[0.00 0.00 0.01 0.02]
06:48:51.702018 table@compaction L2·1 -> L3·2 S·216MiB Q·368360301
06:49:02.185164 table@build created L3@71957 N·1881838 S·128MiB "\x00\x00\x00..dnf,v355289080":"\x01\x00\x00..bin,v367756224"
06:49:11.504831 table@build created L3@71958 N·1778560 S·84MiB "\x01\x00\x00..bin,v368637944":"\x0e\x00\x00...wd,v368339040"
06:49:11.505212 version@stat F·[0 0 0 4] S·217MiB[0B 0B 0B 217MiB] Sc·[0.00 0.00 0.00 0.02]
06:49:11.531918 table@compaction committed F-1 S-3MiB Ke·0 D·0 T·19.829439391s
06:49:17.780949 memdb@flush N·446584 S·63MiB
06:49:18.711048 memdb@flush created L0@71960 N·446584 S·6MiB "\x00\x00\x00..tes,d369110603":"\f\x00\x00..bak,d369103456"
06:49:18.711391 version@stat F·[1 0 0 4] S·223MiB[6MiB 0B 0B 217MiB] Sc·[0.12 0.00 0.00 0.02]
06:49:18.721895 memdb@flush committed F·1 T·940.460226ms
06:49:18.756130 journal@remove removed @71955
06:49:18.773672 table@move L0@71960 -> L1
06:49:18.774063 version@stat F·[0 1 0 4] S·223MiB[0B 6MiB 0B 217MiB] Sc·[0.00 0.06 0.00 0.02]
06:49:18.801492 table@move L1@71960 -> L2
06:49:18.801846 version@stat F·[0 0 1 4] S·223MiB[0B 0B 6MiB 217MiB] Sc·[0.00 0.00 0.01 0.02]
06:49:18.831069 table@compaction L2·1 -> L3·2 S·219MiB Q·368360301
06:49:35.670059 table@build created L3@71961 N·2057687 S·128MiB "\x00\x00\x00..dnf,v355289080":"\x01\x00\x00..bin,v367596280"
06:49:43.826375 table@remove removed @71956
06:49:47.694204 table@build created L3@71962 N·2049295 S·88MiB "\x01\x00\x00..bin,v368506929":"\x0e\x00\x00...wd,v368339040"
06:49:47.694656 version@stat F·[0 0 0 4] S·220MiB[0B 0B 0B 220MiB] Sc·[0.00 0.00 0.00 0.02]
06:49:47.785984 table@compaction committed F-1 S-2MiB Ke·0 D·0 T·28.954560982s
06:49:47.797274 table@remove removed @71960
06:49:47.872629 table@remove removed @71957
06:49:47.939681 table@remove removed @71958
06:51:20.689904 table@remove removed @71953
06:51:20.724584 table@remove removed @71954

It's quite possible there's something with Docker/btrfs and the db I/O. All indications point towards a bottleneck in the db I/O, especially given how the other folders scan and perform fine; it seems to be just this one folder. Is there anything that can be done to raise the limit at which the db switches to the large settings, or to load it into RAM?
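(For my own notes: if I'm reading the advanced configuration docs right, there is a databaseTuning option that can be pinned to small or large instead of auto. Purely a sketch of how I'd check what it's currently set to, with the path as per the container's /config mount:)

# show the current database tuning option, if present
# ("auto"/"small"/"large" going by the docs; this is an assumption on my part)
grep -i databasetuning /config/config.xml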

All folders share the db and synchronise access to it, and essentially a file update is equally expensive regardless of the file size. So to me this doesn't feel like a database I/O issue, rather more like an actual folder I/O issue.

An incorrect assumption on my part. I saw the separate .ldb files and thought each folder had its own separate database.

All folders are on the same drives, and the other two folders (screenshots below) scan and sync correctly.

Just thinking about when these problems started: I imagine there's a bit of overhead when you add new files. For example, a normal scan takes 2 minutes; you add some new files, and the first scan after they're added takes 5 minutes; then it's back to 2 minutes after that, or maybe 2 minutes 5 seconds because it's also checking the new files.

So if I were to add 4.4 GB across 141,571 files, the initial "oh hey, new files, let's scan and hash them" stage could take considerably longer. However, according to the STTRACE=scanner logs it's taking 2.5 minutes to hash a <1 MB file, which doesn't seem right. Is the hashing process multi-threaded or parallelised, so that it hashes multiple files at a time? During the folder-structure walk, when new files are found, are they added to the same database or to a separate queue somewhere?
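(Partly answering myself: if I'm reading the advanced configuration docs right, there is a per-folder hashers option, with 0 meaning automatic, so the hashing itself should be able to run in parallel. A sketch of how I'd check what it's set to, path again as per the container's /config mount:)

# per-folder hashing concurrency, going by the docs (0 = automatic);
# the option name and default are my assumption, not something confirmed here
grep '<hashers>' /config/config.xml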

Okay, I had an idea that I had to test, to see whether something is going on specifically with the "20 Current Tekla Models" folder in Syncthing. As @calmh stated, the database is shared between all folders configured in Syncthing, and all folders are on the same storage, so different performance between them doesn't make much sense.

I paused "20 Current Tekla Models" and added one of the problem sub-folders, the SD340... one inside it, as its own folder. With STTRACE=scanner enabled, it very quickly walked the folder structure, then started hashing like mad and finished the entire folder in 3 minutes. The log has been sanitised.

[IDID_0] 2021/01/06 10:39:27.435713 walk.go:187: DEBUG: real to hash: PointClouds/210104 ENT North 20M.potree/data/r/34024/r34024.bin
[IDID_0] 2021/01/06 10:39:27.438286 walk.go:187: DEBUG: real to hash: PointClouds/210104 ENT North 20M.potree/data/r/34024/r34024.hrc
[IDID_0] 2021/01/06 10:39:27.442039 walk.go:187: DEBUG: real to hash: PointClouds/210104 ENT North 20M.potree/data/r/34025/r34025.bin
[IDID_0] 2021/01/06 10:39:27.442949 walk.go:187: DEBUG: real to hash: PointClouds/210104 ENT North 20M.potree/data/r/34025/r34025.hrc
[IDID_0] 2021/01/06 10:39:27.444767 walk.go:187: DEBUG: real to hash: PointClouds/210104 ENT North 20M.potree/data/r/34027/r34027.bin
[IDID_0] 2021/01/06 10:39:27.458822 walk.go:187: DEBUG: real to hash: PointClouds/210104 ENT North 20M.potree/data/r/34027/r34027.hrc
[IDID_0] 2021/01/06 10:39:27.472215 walk.go:187: DEBUG: real to hash: PointClouds/210104 ENT North 20M.potree/data/r/34030/r34030.bin
[IDID_0] 2021/01/06 10:39:27.472296 walk.go:187: DEBUG: real to hash: PointClouds/210104 ENT North 20M.potree/data/r/34030/r34030.hrc
[IDID_0] 2021/01/06 10:39:27.479594 walk.go:187: DEBUG: real to hash: PointClouds/210104 ENT North 20M.potree/data/r/34031/r34031.bin
[IDID_0] 2021/01/06 10:39:27.492130 walk.go:187: DEBUG: real to hash: PointClouds/210104 ENT North 20M.potree/data/r/34031/r34031.hrc
[IDID_0] 2021/01/06 10:39:27.492206 walk.go:187: DEBUG: real to hash: PointClouds/210104 ENT North 20M.potree/data/r/34031/r340314.bin
[IDID_0] 2021/01/06 10:39:27.506248 walk.go:187: DEBUG: real to hash: PointClouds/210104 ENT North 20M.potree/data/r/34031/r340315.bin
[IDID_0] 2021/01/06 10:39:27.506307 walk.go:187: DEBUG: real to hash: PointClouds/210104 ENT North 20M.potree/data/r/34034/r34034.bin
[IDID_0] 2021/01/06 10:39:27.512718 walk.go:187: DEBUG: real to hash: PointClouds/210104 ENT North 20M.potree/data/r/34034/r34034.hrc
[IDID_0] 2021/01/06 10:39:27.513977 walk.go:187: DEBUG: real to hash: PointClouds/210104 ENT North 20M.potree/data/r/34035/r34035.bin
[IDID_0] 2021/01/06 10:39:27.517102 walk.go:187: DEBUG: real to hash: PointClouds/210104 ENT North 20M.potree/data/r/34035/r34035.hrc
[IDID_0] 2021/01/06 10:39:27.517945 walk.go:187: DEBUG: real to hash: PointClouds/210104 ENT North 20M.potree/data/r/34100/r34100.bin
[IDID_0] 2021/01/06 10:39:27.532710 walk.go:187: DEBUG: real to hash: PointClouds/210104 ENT North 20M.potree/data/r/34100/r34100.hrc
[IDID_0] 2021/01/06 10:39:27.532963 walk.go:187: DEBUG: real to hash: PointClouds/210104 ENT North 20M.potree/data/r/34101/r34101.bin
[IDID_0] 2021/01/06 10:39:27.546733 walk.go:187: DEBUG: real to hash: PointClouds/210104 ENT North 20M.potree/data/r/34101/r34101.hrc
[IDID_0] 2021/01/06 10:39:27.560781 walk.go:187: DEBUG: real to hash: PointClouds/210104 ENT North 20M.potree/data/r/34102/r34102.bin
[IDID_0] 2021/01/06 10:39:27.568536 walk.go:187: DEBUG: real to hash: PointClouds/210104 ENT North 20M.potree/data/r/34102/r34102.hrc

Does each folder have its own table within the database, or is everything in one large table? I'm not sure what technology Syncthing uses. The best way I can describe the symptoms is that once a table gets too big, lookups in it slow down (possibly non-linearly).

It’s a leveldb (lsm-tree) database, and all data is stored in the same db.

The db and table sizes do look peculiar: you have both 2 MB and 10 MB tables, which correspond to the small and large settings respectively. And the full size is not much above the 200 MB threshold, so that squares with having exceeded the threshold recently. However, the 129 MB and 89 MB tables are weird: their size suggests they are level-4 tables (> 10 MB × 2³ = 80 MB), but there are no L1/L2/L3 tables at all.
One way I can imagine that happening is if you had a much larger db in the past, it then got a lot smaller (removed folders?) to below 200 MB, and now it has grown past that threshold again. And the "20 …" folder is an ancient one, thus its information is stored in the large tables and takes a long time to look up.
Still, lookups in a big table shouldn't be bad: it's sorted, thus O(log n).

It's quite likely that this is what has happened. However, as you said, it shouldn't impact performance that much.

Is there any way we could confirm that it's the database lookups that are taking the time? Could the database become unsorted? As this install is a couple of years old, could updates to Syncthing have caused issues with the database structure?

I am talking about the db because I asked for info on it and I don't have any better ideas right now. Don't mistake that for me saying "db lookups are the cause" - I am not. :)

E.g. additional debug statements in the scanner code around db lookups, to see if that's really the cause here. Can you apply a patch and build Syncthing yourself, or are you willing to run a custom binary?

No.

There was a significant reduction in db size in an update last year, which might explain the db size dropping below the threshold. However, that shouldn't hurt performance.

As I’m running Syncthing in a Docker container, I have a feeling this will be somewhat more complicated. However, I’m certainly willing to give it a whirl. I’m happy to try either option.

Actually, before going there: running with the walkfs, db and scanner debug facilities should already give further insight into where the time is spent - let's try that first (beware of the huge log volume).
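With the linuxserver image that should just be an environment variable on the container, e.g. (only a sketch; merge it with whatever run options or compose file you already use):

# pass the debug facilities to the container via STTRACE (sketch only;
# container name, volumes etc. are whatever your existing setup uses)
docker run -d --name=syncthing \
  -e STTRACE=walkfs,db,scanner \
  linuxserver/syncthing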

Once the initial startup logs and the first directory walk had settled down, here are the logs from after the hashing started. The log has been sanitised.

[IDID_0] 2021/01/06 21:32:39.992358 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/PointClouds/201214 Hawthorn Bridge 30M.potree {0xc001039790} <nil>
[IDID_0] 2021/01/06 21:32:39.993444 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/PointClouds/201214 Hawthorn Bridge 30M.potree/data {0xc001039a00} <nil>
[IDID_0] 2021/01/06 21:32:39.994791 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/PointClouds/201214 Hawthorn Bridge 30M.potree/data/r {0xc001039c70} <nil>
[IDID_0] 2021/01/06 21:32:39.996183 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/PointClouds/201214 Hawthorn Bridge 30M.potree/data/r/41075 {0xc001039ee0} <nil>
[IDID_0] 2021/01/06 21:32:39.997741 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/PointClouds/201214 Hawthorn Bridge 30M.potree/data/r/41077/r41077.hrc {0xc00103e1a0} <nil>
[IDID_0] 2021/01/06 21:32:39.998397 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx {0xc00103e410} <nil>
[IDID_0] 2021/01/06 21:32:39.999176 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/PointClouds {0xc00103e680} <nil>
[IDID_0] 2021/01/06 21:32:40.000139 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/PointClouds/201214 Hawthorn Bridge 30M.potree {0xc00103e8f0} <nil>
[IDID_0] 2021/01/06 21:32:40.001126 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/PointClouds/201214 Hawthorn Bridge 30M.potree/data {0xc00103eb60} <nil>
[IDID_0] 2021/01/06 21:32:40.002266 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/PointClouds/201214 Hawthorn Bridge 30M.potree/data/r {0xc00103edd0} <nil>
[IDID_0] 2021/01/06 21:32:40.004179 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/PointClouds/201214 Hawthorn Bridge 30M.potree/data/r/41077 {0xc00103f040} <nil>
[IDID_0] 2021/01/06 21:32:40.004320 set.go:376: DEBUG: FOLDER_ID_1 WithBlocksHash("c7267123c3a183c22a8b1c5e8f0ebd5730146dc1f1179f1735bd438fd1f1b207")
[IDID_0] 2021/01/06 21:32:40.004568 walk.go:187: DEBUG: real to hash: SD340 GSSC Handrails & Balustrades/PointClouds/201229 A3 Wing 3 25M.potree/data/r/07612/r07612.bin
[IDID_0] 2021/01/06 21:32:40.005156 set.go:376: DEBUG: FOLDER_ID_1 WithBlocksHash("edd41546f80c33eb286bf6a624f043227f499bf1fae5dd83430a4cc5528b3ff6")
[IDID_0] 2021/01/06 21:32:40.005255 walk.go:187: DEBUG: real to hash: SD340 GSSC Handrails & Balustrades/PointClouds/201229 A3 Wing 3 25M.potree/data/r/07612/r07612.hrc
[IDID_0] 2021/01/06 21:32:40.005856 logfs.go:79: DEBUG: mtimefs.go:137 basic /sync/Strata/FOLDER_LABEL_1 Open SD340 GSSC Handrails & Balustrades/PointClouds/201229 A3 Wing 3 25M.potree/data/r/07611/r07611.hrc {0xc0143cc000 SD340 GSSC Handrails & Balustrades/PointClouds/201229 A3 Wing 3 25M.potree/data/r/07611/r07611.hrc} <nil>
[IDID_0] 2021/01/06 21:32:40.006634 set.go:376: DEBUG: FOLDER_ID_1 WithBlocksHash("afa052583465eddcbe65a75afce288204ee11e1a07ba41626457e878e6ba787b")
[IDID_0] 2021/01/06 21:32:40.006868 walk.go:187: DEBUG: real to hash: SD340 GSSC Handrails & Balustrades/PointClouds/201229 A3 Wing 3 25M.potree/data/r/07614/r07614.bin
[IDID_0] 2021/01/06 21:32:40.006956 logfs.go:79: DEBUG: mtimefs.go:137 basic /sync/Strata/FOLDER_LABEL_1 Open SD340 GSSC Handrails & Balustrades/PointClouds/201229 A3 Wing 3 25M.potree/data/r/07612/r07612.bin {0xc00a360038 SD340 GSSC Handrails & Balustrades/PointClouds/201229 A3 Wing 3 25M.potree/data/r/07612/r07612.bin} <nil>
[IDID_0] 2021/01/06 21:32:40.008259 logfs.go:79: DEBUG: mtimefs.go:137 basic /sync/Strata/FOLDER_LABEL_1 Open SD340 GSSC Handrails & Balustrades/PointClouds/201229 A3 Wing 3 25M.potree/data/r/07612/r07612.hrc {0xc0143cc048 SD340 GSSC Handrails & Balustrades/PointClouds/201229 A3 Wing 3 25M.potree/data/r/07612/r07612.hrc} <nil>
[IDID_0] 2021/01/06 21:32:40.008827 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy/PointClouds/191023 Combined.potree/data/r/00050/r00050.hrc {0xc00103f520} <nil>
[IDID_0] 2021/01/06 21:32:40.009499 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy {0xc00103f790} <nil>
[IDID_0] 2021/01/06 21:32:40.010306 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy/PointClouds {0xc00103fa00} <nil>
[IDID_0] 2021/01/06 21:32:40.011229 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy/PointClouds/191023 Combined.potree {0xc00103fc70} <nil>
[IDID_0] 2021/01/06 21:32:40.015318 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy/PointClouds/191023 Combined.potree/data {0xc00103fee0} <nil>
[IDID_0] 2021/01/06 21:32:40.016438 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy/PointClouds/191023 Combined.potree/data/r {0xc0010461a0} <nil>
[IDID_0] 2021/01/06 21:32:40.017687 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy/PointClouds/191023 Combined.potree/data/r/00050 {0xc001046410} <nil>
[IDID_0] 2021/01/06 21:32:40.019464 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy/PointClouds/191023 Combined.potree/data/r/00051/r00051.hrc {0xc001046680} <nil>
[IDID_0] 2021/01/06 21:32:40.020088 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy {0xc0010468f0} <nil>
[IDID_0] 2021/01/06 21:32:40.020711 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy/PointClouds {0xc001046b60} <nil>
[IDID_0] 2021/01/06 21:32:40.021482 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy/PointClouds/191023 Combined.potree {0xc001046dd0} <nil>
[IDID_0] 2021/01/06 21:32:40.022343 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy/PointClouds/191023 Combined.potree/data {0xc001047040} <nil>
[IDID_0] 2021/01/06 21:32:40.023335 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy/PointClouds/191023 Combined.potree/data/r {0xc0010472b0} <nil>
[IDID_0] 2021/01/06 21:32:40.024643 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy/PointClouds/191023 Combined.potree/data/r/00051 {0xc001047520} <nil>
[IDID_0] 2021/01/06 21:32:40.026069 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy/PointClouds/191023 Combined.potree/data/r/00052/r00052.hrc {0xc001047790} <nil>
[IDID_0] 2021/01/06 21:32:40.026782 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy {0xc001047a00} <nil>
[IDID_0] 2021/01/06 21:32:40.027454 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy/PointClouds {0xc001047c70} <nil>
[IDID_0] 2021/01/06 21:32:40.028246 logfs.go:61: DEBUG: mtimefs.go:90 basic /sync/Strata/FOLDER_LABEL_1 Lstat yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy/PointClouds/191023 Combined.potree {0xc001047ee0} <nil>

Watching the log scroll past, there are a ludicrous number of mtimefs.go lines and it seems to continually loop through the entire folder.
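A quick way to tally which call sites dominate (the field position assumes the log format above, and the filename is just a placeholder for wherever the capture is saved):

# count log lines per call site, most frequent first
awk '{ print $4 }' syncthing.log | sort | uniq -c | sort -rn | head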

The ton of mtimefs calls comes from model.folder.findRename. There are WithBlocksHash calls, and the log shows Lstat calls on all path components, which points at osutil.IsDeleted or osutil.TraversesSymlink in turn.
We look up all files with the same hash and check whether they are deleted. It looks like, due to the nature of your data, there are quite a few distinct files with the same contents. That also fits with your test in which an initial scan of a sub-path is fast, because there is nothing in the db to check for renames yet (and/or the identical files are in different subfolders).
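A quick way to sanity-check that theory would be to count how many files share identical contents in one of the affected trees, e.g. (any checksum tool will do, md5sum is just convenient; the path is from your earlier test):

# count files with identical contents, most duplicated first
find "SD340 GSSC Handrails & Balustrades" -type f -print0 \
  | xargs -0 md5sum \
  | awk '{ print $1 }' | sort | uniq -c | sort -rn | head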

To really see where the time is spent, a log portion containing everything between two occurrences of “real to hash” that are not back-to-back (the three in your latest post are within less than 0.01s) is needed. Please provide that. If it turns out that the costly operation is findRename, I have an idea for optimization. But I want to be sure about what’s going on first.
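Something like this would pull out one such stretch; the occurrence number and the filename are placeholders to adjust:

# print everything from, say, the 50th "real to hash" line through the 51st
# (pick an occurrence that sits just before a long gap)
awk '/real to hash/ { n++ } n == 50; n == 51 { print; exit }' syncthing.log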

I’ve sent you a private message with the logs.

Just following up on this issue to see if you have had any luck.

Thanks again.

I had an offline weekend and have come back to a big pile of notifications, emails and topics. I received your log; it's huge. Not complaining, I asked for that. However, this is neither an easy nor a fast problem to analyse. And nothing personal: it's also only a performance issue, and a very niche one, so not particularly pressing.

I do intend to come back to this topic. ;)

No worries mate. I completely understand and I hope you had a restful weekend. I’ll wait to hear from you. Thanks again for your help.
