I’m running v1.3.3 on a Raspberry Pi. It worked well for a year or more, but today I removed a 50–60 GB sync folder using the normal GUI. The GUI locked up and is now unresponsive. I rebooted the computer; Syncthing runs, but htop shows it taking 30–100% of the CPU, and the GUI still appears locked up.
The log says “Cleaning data for dropped folder ‘xxxx-xxxx’” (which I assume is because I deleted a big sync folder), so I figured I’d wait a while. But then I noticed the log shows Syncthing restarting over and over with a “fatal error: runtime: out of memory”. I don’t think any progress is being made.
prints node ID
fatal error: runtime: out of memory
prints node ID
fatal error: runtime: out of memory
(repeat over and over)
In ~/.config/syncthing, I see dozens of panic log files, but I don’t know how to make sense of them, what to look for, or what to change.
I looked in the config.xml file and the folder I deleted is not listed, leaving only the two other ones I had defined.
How do I recover from this situation? Is there some manual way to clean out whatever it’s trying to clean out so Syncthing won’t run out of memory?
Well, yes, “out of memory” sort of points to adding more memory, but that might be hitting with a hammer what could be done with more finesse. On a Pi3, that means taking space away from the virtual drives on the SD card, or making a hardware change to add more SD storage and rebuilding the system, which I’ve grown quite accustomed to as is.
Besides, this Pi3 and Syncthing have been running happily for more than a year, and it seems odd that removing a shared folder locks up the program - not the computer, mind you. Everything else kept working; only Syncthing locked up, so the issue seems to be something internal to Syncthing.
It seems deleting a shared folder should not ask more of the hardware than creating and syncing that folder did. So I thought it might be good to consider why such a huge memory load is required, and whether there are operational workarounds that don’t need a hardware change.
I’ll tune my question: What in the architecture of Syncthing creates a huge memory requirement when removing a shared folder? Can this be changed to have a gentler requirement on the host system, or is this trapped in the Go framework and beyond the design options of the Syncthing team?
Well thank you for an idea I could implement now. However, I don’t have another SD card handy right now, so after your first reply, I moved to “Now Option B”…
I stopped the Syncthing process using htop and removed it from the cron boot-time startup. I rebooted, then renamed the ~/.config/syncthing/index-v0.14.0.db directory to something Syncthing would not find. Then I ran Syncthing. It seems to be happily chunking through the two remaining shared folders, doing all indexing ab initio. So far so good.
After everything else you’re working on, it would be great to ensure deleting something never costs more resources than creating it.
Deleting a folder requires loading and deleting or rewriting index entries for all files in the folder. Creating those entries typically happened over time in lots of small steps. Cleaning out happens in larger transactions. Ensuring the resource usage is identical would mean deleting a million files in a million transactions which isn’t a great way to do it. There is a tradeoff here, as there often is.
Your explanation makes sense; however, who is really comfortable with insidious software?
Perhaps there is some way to gradually delete index entries whenever they would otherwise be visited. We’re all used to new folders “coming online” slowly as scans are done - why not have them go away slowly, too? This may sound foolish at first, but consider this analogy: Syncthing has optional bandwidth limits because some people need things to happen at limited rates. Memory could be seen the same way; files could be deleted at a limited rate. That sounds hokey, but when the alternative is catastrophic failure of the entire Syncthing environment, maybe the idea is not so bad. If designers would say “go buy more memory”, why do they not say “go buy more bandwidth”?
I look back at some of the Syncthing design goals, which all point toward designing “safe folder deletion” for people whose memory is the same as when they built the folder:
Safe From Data Loss, Easy to Use (approachable, understandable and inclusive), For Individuals, Automatic (User interaction should be required only when absolutely necessary).
I had a look at the code dropping the folder: we don’t do any checkpointing there. In principle that’s correct, as we don’t need to ensure grouping of writes. However, we have also used checkpointing for resource-usage reasons in recent database transitions. Looking at the interface comment for Checkpoint, I am not entirely sure what our intended semantics are:
Should Checkpoint always be called routinely or only when we have a reason to commit at specific points?
In the former case I’d check all our readWriteTransactions for compliance (e.g. the drop ones don’t comply) and make the wording of the comment a bit stronger.
In the latter case I think the current flush defaults of 1 MiB for checkpointed transactions and 128 MiB otherwise are too far apart, resulting in vastly differing memory requirements. I am not sure how this should relate to leveldb’s defaultWriteBuffer; my gut feeling is transaction <= buffer, but maybe a bigger one skips the in-memory leveldb part, which might be desirable.
And on a related, but not really related (bikeshedding), note: maybe some heuristics based on system memory in the db settings would also be a good idea, given Leveldb uses too much memory · Issue #6374 · syncthing/syncthing · GitHub. On the other hand, that can quickly get out of hand in terms of complexity (e.g. folder count would also matter, which means maybe updating at runtime, which I don’t want).
I think checkpointing should be for correctness only. The default transaction size might be too large, though. At one point we had it time based as well (at least for batches in the puller), so that a transaction wouldn’t grow to more than a couple of seconds of data. The trick is that in code that uses checkpointing we explicitly don’t want a surprise commit at other times. So maybe there should be two kinds of write transactions:
- checkpointed ones, which are essentially the ones we have now. The expectation is that the transaction is committed periodically at a good time, and otherwise it can grow to a large size (which we expect will never happen);
- other ones, where it’s just about batching for performance and we otherwise don’t care. These can be smaller and/or have a time limit, for example.
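A minimal sketch of the two proposed kinds, using hypothetical type names (not the real readWriteTransaction API): the checkpointed kind commits only at explicit Checkpoint calls, while the batching kind auto-commits whenever its pending size crosses a limit.

```go
package main

import "fmt"

// commitFn stands in for writing a pending batch to the database.
type commitFn func(pending int)

// checkpointedTx commits only at explicit Checkpoint calls, so the
// caller fully controls where the commit boundaries fall.
type checkpointedTx struct {
	pending int
	commit  commitFn
}

func (t *checkpointedTx) Put(size int) { t.pending += size }

// Checkpoint commits anything pending; it is the only place this
// transaction kind ever flushes.
func (t *checkpointedTx) Checkpoint() {
	if t.pending > 0 {
		t.commit(t.pending)
		t.pending = 0
	}
}

// batchingTx is only about batching for performance: it auto-commits
// as soon as the pending data passes sizeLimit, keeping memory low.
type batchingTx struct {
	pending, sizeLimit int
	commit             commitFn
}

func (t *batchingTx) Put(size int) {
	t.pending += size
	if t.pending >= t.sizeLimit {
		t.commit(t.pending)
		t.pending = 0
	}
}

func main() {
	commits := 0
	count := func(int) { commits++ }

	// Checkpointed kind: 100 writes, one commit at the checkpoint.
	ct := &checkpointedTx{commit: count}
	for i := 0; i < 100; i++ {
		ct.Put(1 << 10)
	}
	ct.Checkpoint()
	fmt.Println("checkpointed commits:", commits)

	// Batching kind with a 16 KiB limit: commits several times,
	// with no commit ever surprising the caller mid-checkpoint.
	commits = 0
	bt := &batchingTx{sizeLimit: 16 << 10, commit: count}
	for i := 0; i < 100; i++ {
		bt.Put(1 << 10)
	}
	fmt.Println("batching commits:", commits)
}
```

With the same 100 writes, the checkpointed transaction commits exactly once while the batching one commits several times; which kind is appropriate depends on whether the commit boundary carries correctness meaning.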
This is touching a (raw) nerve. I deleted a 40 GB folder for unrelated reasons. AGAIN, the entire Syncthing program crashed and locked up. If someone clicks to remove a big folder, PLEASE at least put up a warning message to remind people what could happen. It is taking DAYS to rescan all my folders after I did the only thing that recovers: deleted my entire database again and let it start rescanning.
No “trustworthy” software should exhibit this type of behavior. As is, it just seems a no-go for a production environment. Please consider mitigating or fixing this problem.
Syncthing is used in plenty of production environments, though indeed I doubt many of those involve a Raspberry.
I believe it is used in production environments, but I also suspect those admins don’t know that deleting a big folder can bring down the entire Syncthing installation, requiring a rebuild of the database.
You suggest that the Raspberry is the problem. If this is a memory-requirement issue, then please publish minimum memory requirements for safe use. Otherwise, the issue of a surprise loss that takes days to rebuild remains - and *that’s* what a business admin would not like.
I see your GitHub comment about adjusting parameters. I don’t have the skill set to build a custom version from source, even if I understood the implications. Flush what? Which transactions are less than 1 MiB? How can 128 MiB be the problem with gigabytes of memory on the system? What would be enough?
After the crash a few days ago, 4 folders of about 100 GB are scanning ab initio. The first is 2% done, predicting 13 days to finish scanning. What?! Syncthing’s features are wonderful; the performance is very sad. I’ve stopped all other user programs and paused all other folders. Maybe something is wrong for it to take that long to sync, but I don’t have a clue what to change. I hate to go back to cron-job rsyncs, but I’m being boxed into a corner…
More acutely, maybe there is a simpler way to help. Tell me how to make re-syncing the database happen fast. So far, my process is to rsync the directories so they’re identical and then put Syncthing to work.
I would think all it needs to do is calculate a hash of each file, compare, and be done. What’s happening to make it take days for each folder?
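For context, Syncthing hashes files in fixed-size blocks with SHA-256 (128 KiB is the smallest block size) rather than taking one hash per whole file, so it can later transfer only the blocks that changed - but it also means a full rescan must read and hash every byte of every file. A simplified sketch of that block-hashing model, with hypothetical helper names:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

const blockSize = 128 << 10 // 128 KiB, Syncthing's minimum block size

// blockHashes splits data into fixed-size blocks and hashes each one.
// Hashing per block (rather than per file) is what lets a sync tool
// transfer only the blocks that changed, but during a rescan every
// byte still has to be read and hashed - which is why rebuilding the
// index from scratch is expensive.
func blockHashes(data []byte) [][32]byte {
	var hashes [][32]byte
	for off := 0; off < len(data); off += blockSize {
		end := off + blockSize
		if end > len(data) {
			end = len(data)
		}
		hashes = append(hashes, sha256.Sum256(data[off:end]))
	}
	return hashes
}

// changedBlocks compares two block-hash lists and reports which block
// indices differ and would therefore need to be transferred.
func changedBlocks(a, b [][32]byte) []int {
	var diff []int
	n := len(a)
	if len(b) > n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if i >= len(a) || i >= len(b) || a[i] != b[i] {
			diff = append(diff, i)
		}
	}
	return diff
}

func main() {
	old := make([]byte, 3*blockSize)
	cur := append([]byte(nil), old...)
	cur[blockSize+1] = 0xFF // modify one byte in the second block

	diff := changedBlocks(blockHashes(old), blockHashes(cur))
	fmt.Println(diff) // only the changed block needs transferring
}
```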
It doesn’t. It may require a few hundred megs of RAM for a folder of the mentioned 40+ GiB in size. It may be possible to tweak this to require a bit less memory, but frankly it’s your turn to put some effort into this. If throwing either more hardware or more effort at it isn’t a possibility, then I think Syncthing is the wrong setup for your current hardware, requirements, and limitations. Rsync may be far superior in many respects if it checks the feature boxes you need.
Yes, I’d love to put in more effort that could take a load off the developers, but I don’t have the skill set for that. Rsync is a different angle I hadn’t considered, because I wanted Syncthing to work.
What’s frustrating is that I’m sitting here watching Syncthing do essentially nothing while rebuilding the database for the second time (single-digit bytes per second with a folder global size of 5 GB), stuck at 2% for hours.
For local folders (within my LAN), that should be screaming fast, and I can’t explain why it’s not. It’s not an issue of effort - I’ve spent HOURS on this problem. It’s that I don’t know what else to try. For a folder over the internet, I bring up rsync and it saturates my DSL link; Syncthing dribbles along at a few bytes per second for no obvious reason.
I wish there was a way to understand and fix this.
Well, I suggested adding some swap a month ago. Adding a swap file is literally three commands and takes less than half a minute. Yet you’re accustomed to your system and don’t want to do that, so we’re still here talking about it a month later.
There isn’t such a limit, it depends on what you do with it. Very roughly speaking, the more data and more peers the more resources will be used.
You don’t need to partition anything to add swap, you can create a swap file on any mounted file system. But if you’re not running out of memory I don’t know what the problem is, and certainly can’t help. Sorry.