@imsodin
(as a new user I reached the daily post limit on this forum and had to wait 2 hours, sorry)
In most use-cases it works as is, and otherwise you can remove the scan progress updates (as mentioned a few times), and the problem disappears.
Sure, I'll do that later, check memory consumption, and give feedback.
That looks like less abstraction, not more. If you implement a nice and stable Go library doing something of the sort, we can create a shim for our filesystem interface and use it for change monitoring.
- Not in Go, I'm afraid - it's not my area. And osquery is viable mostly in server environments where there are people capable of installing that beast. But IF Syncthing evolved into a layered solution with an API at each layer, it would be possible to write integration code in any technology. What do I mean? If you used a database, for example SQLite or Postgres, as the repository of input files, then any process could populate it with data (based on osquery or on its own file list). Imagine I have a server with 2M files created daily: you would not need to scan or hash them, I just need you to distribute them to 12 other servers in 6 places around the world. It would be the responsibility of my process to update the records in the DB, and Syncthing would be responsible for the efficient distribution of those files. As it stands this is hard, because scanning, hashing and distribution in Syncthing are tightly coupled.
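To make the layering idea concrete, here is a minimal sketch of what I mean by "any process could populate the repository". The table name, columns and `state` values are all invented for this example - this is not Syncthing's actual database - it just illustrates an external producer writing file records that a separate distribution layer reads, with no scanning or hashing in between:

```python
import sqlite3

# Hypothetical shared table: an external process records file state here,
# and the distribution layer only reads it. None of these names come from
# Syncthing -- they exist purely to illustrate the layering.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE files (
        path     TEXT PRIMARY KEY,
        size     INTEGER NOT NULL,
        mtime_ns INTEGER NOT NULL,               -- modification time, ns
        state    TEXT NOT NULL DEFAULT 'dirty'   -- 'dirty' = needs distribution
    )
""")

# The producing process upserts rows as it creates files...
conn.execute(
    "INSERT INTO files (path, size, mtime_ns) VALUES (?, ?, ?) "
    "ON CONFLICT(path) DO UPDATE SET size=excluded.size, "
    "mtime_ns=excluded.mtime_ns, state='dirty'",
    ("/data/export/report-0001.tif", 52_428_800, 1_700_000_000_000_000_000),
)

# ...and the distribution layer simply picks up whatever is dirty,
# without any scanning or hashing of its own.
dirty = conn.execute("SELECT path, size FROM files WHERE state='dirty'").fetchall()
print(dirty)
```

The point is the division of responsibility: my process guarantees the rows are correct, and the sync engine only consumes them.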
I don’t see how a history makes scans unnecessary - you still need to discover the present to add it to your history.
I guess what you really mean here is 1. again: Use fancy, efficient methods to detect changes.
Yes, basically: when I have an event stream of changes from a modern file system and a history of past state, I do not need to repeatedly compute the actual state of every file at every point in time. I could elaborate, but I think you know what I mean and what is possible with journaling technology. osquery is nice because it wraps this event stream on Linux, macOS and Windows. The persistent state should be stored in a standard DB (SQLite, Postgres, etc.).
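A tiny sketch of that combination, persisted state plus an event stream. The events below are stand-ins for what a journal/watcher (inotify, FSEvents, the NTFS USN journal, or osquery's file events) would deliver; the schema is invented for this example. Note that only the touched rows change - no directory walk happens at all:

```python
import sqlite3

# Persisted state table, seeded from some earlier point in time.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE state (path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER)")
db.execute("INSERT INTO state VALUES ('/srv/a.txt', 10, 1), ('/srv/b.txt', 20, 1)")

# Stand-in for an OS journal / osquery event stream: (action, path, size, mtime_ns)
events = [
    ("MODIFIED", "/srv/a.txt", 12, 2),
    ("CREATED",  "/srv/c.txt", 5,  2),
    ("DELETED",  "/srv/b.txt", 0,  2),
]

for action, path, size, mtime_ns in events:
    if action == "DELETED":
        db.execute("DELETE FROM state WHERE path=?", (path,))
    else:  # CREATED / MODIFIED
        db.execute(
            "INSERT INTO state VALUES (?, ?, ?) "
            "ON CONFLICT(path) DO UPDATE SET size=excluded.size, mtime_ns=excluded.mtime_ns",
            (path, size, mtime_ns),
        )

# Three events touched three rows; nothing else was read from disk.
print(sorted(db.execute("SELECT path FROM state")))
```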
3./4./5. You are aware of the B in BEP - Block Exchange Protocol?
Syncthing doesn’t just handle entire files, but data in the form of blocks. You address such a block by its SHA-256 hash.
As I mentioned, I haven't analyzed the distribution layer, because for now I'm more interested in the scanning/hashing layer. Anyway, distribution should also be a layer, because fixed-block hashing would not always be optimal. Synchronization could be file-format specific: for example, TIFF files have sections, VM images have their own blocks, and some file formats carry their own embedded hashes, checksums, etc. If this part of Syncthing (block detection by file type) were its own layer, maybe somebody would write a synchronization plugin for TIFF files, or VMware images (which would be a HUGE deal in the backup industry), or whatever else. I assume such block synchronization would have to work in some kind of transactional mode, for obvious reasons (this is non-trivial - hard, even).
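A toy sketch of "block detection as its own layer": the core asks a format-specific chunker where the block boundaries are, then hashes each block as usual. The per-extension registry and both chunkers are invented for this illustration (this is not how BEP's scanner works); the point is that a format-aware boundary keeps earlier block hashes stable when data is appended:

```python
import hashlib

def fixed_chunks(data, size=128 * 1024):
    """Default strategy: plain fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def newline_chunks(data):
    """Toy 'format-aware' strategy: split a log-like file on record
    boundaries, so appending a record only adds a new final block."""
    return [rec + b"\n" for rec in data.split(b"\n") if rec]

CHUNKERS = {".log": newline_chunks}  # hypothetical per-extension plugin registry

def block_hashes(name, data):
    chunker = CHUNKERS.get(name[name.rfind("."):], fixed_chunks)
    return [hashlib.sha256(block).hexdigest() for block in chunker(data)]

v1 = block_hashes("app.log", b"rec1\nrec2\n")
v2 = block_hashes("app.log", b"rec1\nrec2\nrec3\n")
# The first two block hashes are unchanged, so only one new block
# would need to be transferred.
print(v2[:2] == v1)
```

A real plugin for TIFF or VM images would do the same thing with IFD offsets or guest block tables instead of newlines, but the interface - "give me boundaries for this file type" - stays the same.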
(AudriusButkevicius)
What I am saying is that it’s easy to be a critic, (…) Most commonly they lack context as to why things are done the way they are, hence it becomes easy to criticize.
Wow, what’s wrong with you? Nobody is criticizing anything, and please do not bully me by calling me names (“critic”). Why do you take everything personally and keep attacking? Listen, you got a TECHNICAL text from an enthusiast, based on a technical problem - are you capable of staying within the technical boundaries of a discussion about a technical project, and not talking about yourself or explaining how awful people are (which is not interesting to me at all)? I want to help, but your form of communication fills me with disgust. Maybe it is my/your cultural thing, maybe not, but it is what it is, and it comes from you specifically.
PRs are nice, yet I do not write Go, so I do not plan to change YOUR codebase (honestly: or deal with you personally, given your current attitude, lol).
(calmh)
like “I think Syncthing shouldn’t use block hashes by default”) are ignored by necessity
I didn’t say that you should not use block hashes for the network transmission protocol - if anything, that you shouldn’t block synchronization while hashing.
I want to NOT generate hashes during SCANNING, where those hashes are used as a means to determine whether a file has CHANGED (remember the 50 GB file). What I would like is to use - in my situation - only timestamps, file sizes, etc., because IN MY situation that is sufficient. I didn’t find a command line option for that, so my proposal is simply to add an option that creates simple and fast hashes/IDs based on the natural attributes of files (timestamp, size, etc.), which would drastically improve scanning in my situation while still correctly detecting file changes (because MY filesystem will take care of that).
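A minimal sketch of what such a "cheap fingerprint" mode could compute, assuming (as in my situation) the filesystem reliably bumps mtime/size on every change. The function name and fingerprint format are invented; the point is that it derives a change marker from `stat()` metadata only and never reads file contents, so a 50 GB file costs the same to "scan" as a 5-byte one:

```python
import hashlib, os

def cheap_fingerprint(path):
    # O(1) per file: hash only the natural attributes (size + mtime),
    # never the contents. Invented for illustration, not a Syncthing API.
    st = os.stat(path)
    token = f"{st.st_size}:{st.st_mtime_ns}".encode()
    return hashlib.sha256(token).hexdigest()

# Usage: any modification that changes size or mtime changes the fingerprint.
with open("demo.bin", "wb") as f:
    f.write(b"hello")
before = cheap_fingerprint("demo.bin")
with open("demo.bin", "ab") as f:
    f.write(b"!")
after = cheap_fingerprint("demo.bin")
print(before != after)  # size (and mtime) changed, so the fingerprint did too
os.remove("demo.bin")
```

The obvious trade-off is that a write which preserves both size and mtime goes undetected - which is exactly why this should be an opt-in flag for setups where the filesystem guarantees make that impossible.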
(devs)
What I can do - please consider this:
- If you open the scanning/hashing layer and persist it in PostgreSQL/SQLite, I could help you with data structures, triggers, SQL queries, indexes, analytics, optimization, etc.
- I could write an external wrapper around osquery in another language as an example of integration, and populate your persisted state in an SQL table with lots of file records to sync. That should allow almost INSTANT synchronization between, for example, web servers in distant locations.
- I could write an example of how to integrate other (distributed) apps that need to exchange input files and distribute output files at specific moments in their life cycle, write some analytics based on the status of data synchronization, and trigger events when something happens, e.g. all output files have been transferred.
- I would experiment with federated sharding of a PostgreSQL database driven by your synchronization layer - however, this would need strict transactional support in your network layer, because it is essentially the holy grail of master-master synchronization.
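As one small example of the "trigger events when all output files are transferred" point above, here is a sketch using an SQLite trigger (the table and trigger names are invented; in PostgreSQL the same idea would use a trigger function plus NOTIFY). The trigger fires on every status update and records a batch as done only once no file in it is still pending:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE transfers (batch TEXT, path TEXT, status TEXT DEFAULT 'pending');
    CREATE TABLE batch_done (batch TEXT, done_at TEXT DEFAULT CURRENT_TIMESTAMP);

    -- Hypothetical completion trigger: when the last file of a batch
    -- reaches 'synced', record the whole batch as done.
    CREATE TRIGGER mark_batch_done AFTER UPDATE OF status ON transfers
    WHEN NOT EXISTS (SELECT 1 FROM transfers
                     WHERE batch = NEW.batch AND status != 'synced')
    BEGIN
        INSERT INTO batch_done (batch) VALUES (NEW.batch);
    END;
""")
db.executemany("INSERT INTO transfers (batch, path) VALUES (?, ?)",
               [("nightly", "/out/a"), ("nightly", "/out/b")])

db.execute("UPDATE transfers SET status='synced' WHERE path='/out/a'")
db.execute("UPDATE transfers SET status='synced' WHERE path='/out/b'")

# Only after the second update does the batch count as done.
done = [row[0] for row in db.execute("SELECT batch FROM batch_done")]
print(done)
```

An external process (monitoring, CI, backup rotation) could then react to rows in `batch_done` instead of polling individual file states.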
This would open the way to writing plugins for MANY tools - for example continuous integration, Docker, monitoring, backups, VS Code, SSH, admin panels - every single one of them at some point needs to send/receive data, logs, backups, installations, updates, environments, or whatever else.