Reducing v2 DB context and footprint - Yes or No

A dedicated thread for something that still bothers me:

People keep asking “why so slow now?” every week or so. The proposals are clear: pass one, reduce index pressure by indexing a hash instead of the full value; pass two, reduce data pressure by storing a parent_id instead of the full path.
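To make the two passes concrete, here is a back-of-the-envelope sketch in Go. Everything in it is made up for illustration (the paths, the `nameHash` helper, the assumed column sizes); it is not Syncthing’s actual schema, just the shape of the argument:

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"path"
)

// Pass one: index a fixed-size hash of the value instead of the value
// itself, so index entries stop growing with path length.
// (Hypothetical helper, for illustration only.)
func nameHash(name string) uint64 {
	sum := sha256.Sum256([]byte(name))
	return binary.BigEndian.Uint64(sum[:8]) // 8 bytes per entry, regardless of name length
}

func main() {
	files := []string{
		"photos/2024/vacation/img_0001.jpg",
		"photos/2024/vacation/img_0002.jpg",
		"photos/2024/vacation/img_0003.jpg",
	}

	// Pass two: store parent_id + leaf name instead of the full path,
	// so the shared directory prefix is not repeated for every file.
	const idSize = 8 // assumed size of an integer parent_id column
	fullPath, parentRef := 0, 0
	for _, f := range files {
		fullPath += len(f)
		parentRef += idSize + len(path.Base(f))
	}
	fmt.Printf("full-path rows: %d bytes, parent_id rows: %d bytes\n", fullPath, parentRef)

	// A hash index entry is a constant 8 bytes vs len(path) for a full index:
	fmt.Printf("hash index entry for %q: 8 bytes (hash %x)\n", files[0], nameHash(files[0]))
}
```

The savings obviously grow with directory depth and sibling count; the point is only that both passes trade per-row size for an extra lookup step.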

Pass one is extremely straightforward, and I really don’t see why we shouldn’t go there.

Pass one:

PROS: Right now all data in the DB is essentially duplicated in indexes, an almost exact 50/50 data-to-index ratio; this would change to something like 80/20 at least, 95/5 at best.

CONS: GlobalDirectoryTree will become inefficient. But I don’t quite get who uses it. We deleted it (the efficient version of it) in our internal build and it was fine.

Am I missing something?

And I am trying to figure out whether it would be considered if I posted a PR for that.

2 Likes

Edit by @imsodin: Moved the relevant posts from Reducing v2 DB context and footprint - yes or now - information is partially redundant with the OP here, but this way all the context is preserved as is.

Off-topic here, but my 2 cents:

I am very excited about any scalability fix, but it escapes me why this sharding would help.

My two cents on this topic:

  1. I never see anything CPU-bound, so there is already enough parallelism; I am looking for the opposite;

  2. with sharding like that you will only duplicate common index tree nodes, most probably increasing IO/RAM pressure even more. The only way it helps is if shards share some locality, say, one subfolder shards to here, another to there, but with the proposed design this is unlikely to happen. This point is open to question: it may help, it may not, it may make things worse. But I do not see any easy-to-follow proof for any of these outcomes.
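The duplication worry in point 2 can be shown with a toy sketch: if rows are sharded by a hash of the full path, files from the same directory scatter across shards, and every shard that receives one of them must also carry the shared ancestor directories. Shard count, paths, and the `shard` function here are all made up, and this is only one possible sharding scheme, not necessarily the proposed one:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"path"
)

// shard assigns a path to one of n shards by hashing the full path.
// (Hypothetical; for illustration of the locality argument only.)
func shard(p string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(p))
	return int(h.Sum32() % uint32(n))
}

func main() {
	const shards = 4
	perShard := make([]map[string]bool, shards)
	for i := range perShard {
		perShard[i] = map[string]bool{}
	}
	// Eight files in the same directory a/b:
	for i := 0; i < 8; i++ {
		f := fmt.Sprintf("a/b/file-%d", i)
		s := shard(f, shards)
		// Each shard holding a file also needs the file's ancestor dirs.
		for d := path.Dir(f); d != "."; d = path.Dir(d) {
			perShard[s][d] = true
		}
	}
	copies := 0
	for _, m := range perShard {
		if m["a/b"] {
			copies++
		}
	}
	fmt.Printf("directory a/b is present in %d of %d shards\n", copies, shards)
}
```

With path-hash sharding the count tends toward the shard count as sibling files accumulate; a scheme that shards whole subtrees together would avoid that, which is exactly the locality question above.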

Maybe I am missing something here.

Another question, by the way: if we prepare PRs with fixes, starting with the easy ones, to decrease index IO pressure, would you consider them? They will make the DB design a bit messier in any case. Or do you think clarity is paramount here?

– sorry for posting to an outdated topic and off the subject; I just got it as “new messages” and hadn’t noticed that the post I am replying to is from 27 days ago

IMHO we got rid of all actionable issues. Either they were fixed or, in the case of the Android app, they turned out to be bugs that were not tied to the switch of the database layer.

People keep asking “why so slow now?” every week or so. The proposals are clear: pass one, reduce index pressure by indexing a hash instead of the full value; pass two, reduce data pressure by storing a parent_id instead of the full path.

Pass one is extremely straightforward, and I really don’t see why we shouldn’t go there.

And I am trying to figure out whether it would be considered if I posted a PR for that.

If it’s clean enough and benchmarks show it’s beneficial enough, it’ll be merged.

Benchmarks on small datasets will only show a slowdown. The improvement is expected when IO gets throttled by the disk (millions of objects), and that is tricky to benchmark. To treat it properly, you need to consider global OS cache pressure.

Do you still agree, given this? Or do you have any ideas?

I made some fairly decent benchmarks, I think, for the sharding PR, showing the effect in the cases where it was supposed to show an effect. From what I’ve experienced, the performance tradeoffs are not always intuitive, so a complicating PR without benchmarks showing a benefit in at least the targeted scenario, and no terrible regression in the common scenarios, is certainly dead in the water. It can’t be both extremely straightforward to implement and impossible to show a benefit with benchmarks.

1 Like

Valid; can you please briefly describe the benchmark setup and process you usually follow?

Since I was trying to improve insert performance for many files/blocks, there is a benchmark that inserts many files and prints some stats: syncthing/internal/db/sqlite/blocksdb_bench_test.go at e17ef24c22f612ff09641fb076e5d8973917a7e1 · calmh/syncthing · GitHub

That can be compiled (locally) and run on a couple of different machines, like my laptop, my Linux server and my Raspberry Pi. The expectations on each are of course different; I don’t expect to be able to run to the full (simulated) 24 TB in a reasonable time on the Raspberry, for example, but I expect a reasonably specced server to be faster than in the unmodified case, or there is no gain.

There are also existing benchmarks in the sqlite package, which give some indication of the effect on other things. If something sticks out there, it could be further investigated. (cd src/syncthing/syncthing/internal/db/sqlite; go test -bench .)
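For anyone who wants the shape of that approach without the repo at hand, here is a crude standalone sketch of the same idea: insert a growing number of simulated file records and report the rate, so a slowdown at larger sizes becomes visible. `insertFile` is a hypothetical stand-in for the real sqlite insert, not the benchmark linked above:

```go
package main

import (
	"fmt"
	"time"
)

// insertFile simulates recording one file with a block count.
// (Stand-in for the real DB insert; a map has none of sqlite's costs,
// so only the harness shape carries over, not the numbers.)
func insertFile(db map[string]int, name string, blocks int) {
	db[name] = blocks
}

func main() {
	for _, n := range []int{10_000, 100_000} {
		db := make(map[string]int, n)
		start := time.Now()
		for i := 0; i < n; i++ {
			insertFile(db, fmt.Sprintf("folder/file-%d", i), 128)
		}
		elapsed := time.Since(start)
		fmt.Printf("%7d inserts: %v total, %.0f inserts/s\n",
			n, elapsed, float64(n)/elapsed.Seconds())
	}
}
```

In the real thing you would of course use the `testing` package’s benchmark framework (as the linked `blocksdb_bench_test.go` does) and run it on the target hardware, since the whole point is behavior once the dataset outgrows the cache.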

1 Like

Hi:

For what little value I can add here: speaking as someone who’s been running v1 and v2 in parallel on two identical Synology NAS units with (near) identical datasets (probably about 20 million files, nearly 200 TB), I saw a big drop in performance with v2 (initial folder scanning taking literally months, sluggish response to UI operations, slow updates to UI data), and any improvement that can be made would be extremely welcome.

Of course, I realise I’m something of an edge case and am on underpowered hardware (I’m currently splitting the dataset in half across another pair of NAS units), so I didn’t like to shout too loudly.

Sorry for not appearing excited - quite the opposite of the truth, but I didn’t want to add to the volume of complaints you’ve been receiving, and didn’t want to come across as ungrateful for the improvements that v2 is bringing!

2 Likes

On GlobalDirectoryTree, what is it for? Who really uses it?

There’s an API method, and it looks like it’s exposed to the iOS app via the internals interface. A git grep reveals this.

Except that most of these boil down to a few types of setups that we can’t fix on our end:

The “whales” with datasets in the multi-terabyte range are either happy with the performance or don’t participate that vocally on the forum :man_shrugging:

1 Like

Yes, and mostly all of this boils down to “it doesn’t fit into the cache anymore”, and that can be addressed. That’s what this is all about.

My NAS is still running 1.3 after comments from maintainers about the slowdown of systems with large databases. I’ve upgraded the smaller systems that each share less, but my large data store with 500k files and 8TB with a database on SSD (and file storage on spinning disks) is still running 1.3.

I haven’t tried 2.x on that machine.

I have fewer files, but more blocks, and this runs perfectly smoothly on my Raspberry Pi. I don’t think that should qualify as a lot of data, really.

3 Likes

I’m using it if you’re talking about GET /rest/db/browse. Maybe other users of Syncthing Tray are also using it as Syncthing Tray provides a UI for it that also allows managing ignore patterns based on it. So please don’t delete this API without replacement. A slight performance regression would be acceptable of course. (I think this has gotten faster with v2; it would be ok if it was again as slow as it was with v1.)
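For reference, the endpoint in question can be queried directly. A minimal sketch of building the request a client like Syncthing Tray would make (GET /rest/db/browse is the real endpoint; the base URL, folder ID and `levels` value below are illustrative, and the real request also needs the X-API-Key header):

```go
package main

import (
	"fmt"
	"net/url"
)

// browseURL builds the /rest/db/browse request used to list a folder's
// directory tree; levels limits how deep the returned tree goes.
func browseURL(base, folder string, levels int) string {
	v := url.Values{}
	v.Set("folder", folder)
	v.Set("levels", fmt.Sprint(levels))
	return base + "/rest/db/browse?" + v.Encode()
}

func main() {
	fmt.Println(browseURL("http://localhost:8384", "default", 1))
	// → http://localhost:8384/rest/db/browse?folder=default&levels=1
}
```

Anything serving this endpoint has to be able to enumerate a directory level of the global tree, which is presumably why GlobalDirectoryTree (or some replacement for it) matters to these clients.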

1 Like

Well, since the flow of use cases has started:

On a 16 GB RAM, NVMe system, I had to remove a 2M-file folder from sync since v2, due to the high resource usage it caused; it was OK and working, but not good for a portable laptop. The size seems OK; nobody ever scans the hash stores, it seems. The item count seems not OK. The DB size was 5 GB with that folder, 2 GB without.

For spinning-rust stores I just don’t care about battery and IO drain; it doesn’t matter there for me. They handle my 30 TB / 5M files fine. However, the “resource usage is way up now” graphs are still there, as everyone has posted.

You might wanna check if you’re affected by the ZFS bug

I’m using mdadm for raid on the boot disk which also has the database. And I’m using ceph for the main file storage. So I should be clear of any ZFS specific issues.

I’ll go back and reread the comments that led me to wait and maybe I’ll go for the upgrade.