Deleting remnants of ignored files


#1

Hello,

After syncing a few gigabytes of data, I found that I needed to filter out portions of it that I don’t really need. However, this leaves me with remnant files that I am not sure how to get rid of, because I am not certain which files are ignored and which aren’t: I have at least 20 folders and many ignore patterns.

My first thought was to delete everything in the index and resync the data, but unfortunately this is not an option for me because the remote machines won’t be available for some time. My other thought is to scan the audit log, determine which files are ignored, and then pipe that list to rm or a similar tool.

Does anyone have other suggestions on how to do this?

Thank you!


(Jakob Borg) #2

If you mean …syncthing.tmp files that were created while syncing, chill for 24 hours and Syncthing will remove them.


#3

I meant normal files and directories which I put in .stignore after they synced.

Unless I am mistaken and Syncthing is supposed to delete these files? I have experimented with the (?d) prefix, but the files weren’t removed.


(Jakob Borg) #4

Syncthing will not remove those. Ignoring a file does not mean it gets removed on other devices. (And obviously Syncthing doesn’t delete files on your local computer just because you mention them in the ignore file.)


#5

Apologies if you could not understand what I meant, perhaps I could try explaining in simpler terms:

  • (1) [Global State] on (Device A) consists of ~51000 files and ~5000 directories totalling ~5GB

  • (2) sync of [Global State] from (Device A) to (Device B)

  • (3) on (Device B) a large portion of the data, with relatively complex .stignore rules, is ignored

  • (4) [Local State] on (Device B) now consists of ~4500 files and ~700 directories totalling ~600MB

  • (5) (Device B) has remnant ignored files on the physical disk, totalling > ~2GB

  • (6) (Device A) is not available anymore

There are ~20 such folders for which the above is valid.

I want to remove this extra data (5) on (Device B) which occupies my physical disk. Removing files manually on (Device B) is dangerous because I could make a mistake i.e. change the [Global State].

How can I remove (5) on (Device B) without making a mistake?

Is it possible to use Syncthing for this, since it must know about these ignored files in order to construct the [Local State]? Or do I just have to be incredibly patient?


(Jakob Borg) #6

I can’t think of an easy way to have Syncthing delete the files. I can think of a couple of annoying or inefficient ways: set up another instance and sync a copy of the files (which will exclude the ignored ones), or set some trace options for the scanner, parse the logs for ignored files, and feed that list to a bulk delete tool or script.


#7

Thanks, I think my best option is to write a script that parses the logs and moves the files pending deletion.


#8

Hi analogflow,

I’ve got the same problem that you describe here. Did you manage to write a script that does the deletion correctly? It would be very cool if you could share it somewhere so that I can use it too.

regards, creature


#9

Hey @creature , sorry for the late response.

I actually gave up on the idea of parsing log files with scripts to determine this.

However, I did implement my own Syncthing REST API endpoint to fetch that information. It is appended at the bottom of this post. My approach is quite possibly dangerously idiotic, since I am familiar with neither Syncthing’s codebase nor the Go language. It was an interesting exercise, and it can definitely be improved. Maybe @calmh can take a quick look?

Nonetheless, it seems to be accurate (apparently there are ~35000 ignored files in one of my folders) and I will most certainly attempt to use it after I do some sanity checking (probably with a test case or something).

You use the API by issuing something like this:

curl -X GET -H "X-API-Key: <YOUR-API-KEY>" 'http://localhost:8384/rest/db/ignored?folder=<FOLDER-ID>'

The results (truncated file info) can be piped to jq or something, which is quite nice.
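For anyone who prefers a small program over curl and jq, here is a minimal Go sketch of the same call. It assumes the patched build is running (the /rest/db/ignored endpoint only exists with the patch from this post applied) and that each returned file info carries its path under a JSON key called "name"; that key is an assumption, so check the real output first. The helper names decodeNames and fetchIgnored are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
)

// ignoredFile holds the only field this sketch reads. The JSON key "name" is
// an assumption about how the truncated file info is serialized.
type ignoredFile struct {
	Name string `json:"name"`
}

// decodeNames extracts the file names from a JSON array of file infos.
func decodeNames(body []byte) ([]string, error) {
	var files []ignoredFile
	if err := json.Unmarshal(body, &files); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(files))
	for _, f := range files {
		names = append(names, f.Name)
	}
	return names, nil
}

// fetchIgnored queries the patched endpoint and returns the ignored names.
func fetchIgnored(baseURL, apiKey, folder string) ([]string, error) {
	endpoint := baseURL + "/rest/db/ignored?folder=" + url.QueryEscape(folder)
	req, err := http.NewRequest(http.MethodGet, endpoint, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-API-Key", apiKey)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return decodeNames(body)
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: listignored <api-key> <folder-id>")
		return
	}
	names, err := fetchIgnored("http://localhost:8384", os.Args[1], os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
	for _, n := range names {
		fmt.Println(n)
	}
}
```

The output is one path per line, relative to the folder root, so it can be reviewed before being fed to anything that deletes files.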

Cheers!


gui, model: add api for listing ignored files
From e4fb1eea15c67eaea416be4c4ea20578b21e5537 Mon Sep 17 00:00:00 2001
From: analogflow <analogflow@protonmail.com>
Date: Wed, 25 Apr 2018 00:05:37 +0200
Subject: [PATCH] gui, model: add api for listing ignored files

---
 cmd/syncthing/gui.go | 11 +++++++++++
 lib/model/model.go   | 27 +++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/cmd/syncthing/gui.go b/cmd/syncthing/gui.go
index 29fc35b9..bec4e442 100644
--- a/cmd/syncthing/gui.go
+++ b/cmd/syncthing/gui.go
@@ -85,6 +85,7 @@ type modelIntf interface {
 	GlobalDirectoryTree(folder, prefix string, levels int, dirsonly bool) map[string]interface{}
 	Completion(device protocol.DeviceID, folder string) model.FolderCompletion
 	Override(folder string)
+	IgnoredFolderFiles(folder string) ([]db.FileInfoTruncated)
 	NeedFolderFiles(folder string, page, perpage int) ([]db.FileInfoTruncated, []db.FileInfoTruncated, []db.FileInfoTruncated)
 	RemoteNeedFolderFiles(device protocol.DeviceID, folder string, page, perpage int) ([]db.FileInfoTruncated, error)
 	NeedSize(folder string) db.Counts
@@ -259,6 +260,7 @@ func (s *apiService) Serve() {
 	getRestMux := http.NewServeMux()
 	getRestMux.HandleFunc("/rest/db/completion", s.getDBCompletion)              // device folder
 	getRestMux.HandleFunc("/rest/db/file", s.getDBFile)                          // folder file
+	getRestMux.HandleFunc("/rest/db/ignored", s.getDBIgnored)                    // folder
 	getRestMux.HandleFunc("/rest/db/ignores", s.getDBIgnores)                    // folder
 	getRestMux.HandleFunc("/rest/db/need", s.getDBNeed)                          // folder [perpage] [page]
 	getRestMux.HandleFunc("/rest/db/remoteneed", s.getDBRemoteNeed)              // device folder [perpage] [page]
@@ -760,6 +762,15 @@ func getPagingParams(qs url.Values) (int, int) {
 	return page, perpage
 }
 
+func (s *apiService) getDBIgnored(w http.ResponseWriter, r *http.Request) {
+	qs := r.URL.Query()
+
+	folder := qs.Get("folder")
+	files := s.model.IgnoredFolderFiles(folder)
+
+	sendJSON(w, files)
+}
+
 func (s *apiService) getDBNeed(w http.ResponseWriter, r *http.Request) {
 	qs := r.URL.Query()
 
diff --git a/lib/model/model.go b/lib/model/model.go
index c80149af..29486986 100644
--- a/lib/model/model.go
+++ b/lib/model/model.go
@@ -719,6 +719,33 @@ func (m *Model) NeedSize(folder string) db.Counts {
 	return result
 }
 
+// IgnoredFolderFiles returns a list of currently ignored files
+func (m *Model) IgnoredFolderFiles(folder string) ([]db.FileInfoTruncated) {
+	m.fmut.RLock()
+	defer m.fmut.RUnlock()
+
+	ignores := m.folderIgnores[folder]
+	rf, ok := m.folderFiles[folder]
+	if !ok {
+		return nil
+	}
+
+	// files := make([]db.FileInfoTruncated, 0, maxBatchSizeFiles)
+	var files []db.FileInfoTruncated
+	rf.WithHaveTruncated(protocol.LocalDeviceID, func(fi db.FileIntf) bool {
+		f := fi.(db.FileInfoTruncated)
+		l.Debugf("%v IgnoredFolderFiles(%q): %v", m, folder, f)
+
+		if ignores.Match(f.Name).IsIgnored() {
+			files = append(files, f)
+		}
+
+		return true
+	})
+
+	return files
+}
+
 // NeedFolderFiles returns paginated list of currently needed files in
 // progress, queued, and to be queued on next puller iteration, as well as the
 // total number of files currently needed.
-- 
2.17.0


(Simon) #10

That looks good for finding all ignored files in the database. There may be ignored files on disk that are not in the database. As you are interested in ignored files on disk, you’d probably have to walk the filesystem instead of the db to find them.


#11

Yeah, that’s completely right. So would invoking this API have to trigger a database update / rescan of the folder, or could we get away with just walking the filesystem directly?


(Simon) #12

You’d have to walk the filesystem directly, as a scan would again just skip ignored files: https://github.com/syncthing/syncthing/blob/master/lib/fs/filesystem.go#L37


#13

Ahhh, now I understand: there are two “types” of ignored files. The first type are files that are part of a [Global State] and they have to exist in the database to prevent Syncthing from receiving them from remote devices. I am probably wrong about this…

The other ignored files are simply those which aren’t part of [Global State] but they are present on the filesystem. Thus, we can discern three distinct states of an ignored file:

  1. Ignored, is in [Global State], is present on the filesystem
  2. Ignored, is in [Global State], is not present on the filesystem (deleted?)
  3. Ignored, is present on the filesystem

So, what I have been doing is merely walking through the database and selecting files from (1) and (2) where I should have been selecting files (1) and (3) by using Walk()?


(Simon) #14

Yes (btw (1) is a subset of (3) ).

The part about why ignored files exist in the database is a bit complicated. They don’t need to be part of the global state. Essentially every file that was ever not ignored in the database (locally or remotely) and is now ignored, exists as an ignored file in the database.

For your purposes (removing ignored files on disk), it would be enough to select (3) by walking the filesystem ( (1) is a subset of (3) ). I think this is anyway the main use case. There may also be a use-case to know ignored files in the database (what your diff above does) or all ignored files, both on disk and in db - however I don’t see what these use cases would be.
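To make the relationship between the three states concrete, here is a small Go sketch. It assumes two sets of ignored paths have already been collected somehow, one from the database and one from a filesystem walk; the function name classify and the sample paths are purely illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// classify splits ignored paths by where they were found. Both maps are used
// as sets: a path is a member when its value is true.
//   both     = state (1): in the database and on disk
//   dbOnly   = state (2): in the database, not on disk
//   diskOnly = on disk but unknown to the database
// State (3), everything ignored on disk, is both plus diskOnly.
func classify(inDB, onDisk map[string]bool) (both, dbOnly, diskOnly []string) {
	for p, member := range inDB {
		if !member {
			continue
		}
		if onDisk[p] {
			both = append(both, p)
		} else {
			dbOnly = append(dbOnly, p)
		}
	}
	for p, member := range onDisk {
		if member && !inDB[p] {
			diskOnly = append(diskOnly, p)
		}
	}
	sort.Strings(both)
	sort.Strings(dbOnly)
	sort.Strings(diskOnly)
	return both, dbOnly, diskOnly
}

func main() {
	inDB := map[string]bool{"old/a.bin": true, "old/b.bin": true}
	onDisk := map[string]bool{"old/b.bin": true, "scratch/c.bin": true}
	both, dbOnly, diskOnly := classify(inDB, onDisk)
	fmt.Println("state 1 (db and disk):", both)
	fmt.Println("state 2 (db only):   ", dbOnly)
	fmt.Println("disk only:            ", diskOnly)
}
```

For cleaning up disk space only the first and last groups matter, which is why walking the filesystem alone is sufficient.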


#15

Thanks for the explanation @imsodin !

I’ve decided to remove ignored files from within Syncthing because it was quite easy to add a button with the following action:

model.go diff:
diff --git a/lib/model/model.go b/lib/model/model.go
index 1b47d3c4..4e3a8472 100644
--- a/lib/model/model.go
+++ b/lib/model/model.go
@@ -2230,6 +2230,35 @@ func (m *Model) WatchError(folder string) error {
 	return m.folderRunners[folder].WatchError()
 }
 
+func (m *Model) RemoveIgnored(folder string) {
+	m.fmut.RLock()
+	fcfg, ok := m.cfg.Folder(folder)
+	runner  := m.folderRunners[folder]
+	ignores := m.folderIgnores[folder]
+	m.fmut.RUnlock()
+	if !ok {
+		return
+	}
+
+	filesystem := fcfg.Filesystem()
+
+	runner.setState(FolderScanning)
+	filesystem.Walk(".", func(path string, info fs.FileInfo, err error) error {
+		if err != nil {
+			return err
+		}
+
+		if ignores.Match(path).IsIgnored() {
+			l.Debugf("%v RemoveIgnored(%q): %v %v", m, folder, path, info)
+			filesystem.RemoveAll(path)
+		}
+
+		return nil
+	})
+	runner.setState(FolderIdle)
+}
+
 func (m *Model) Override(folder string) {
 	m.fmut.RLock()
 	fs, ok := m.folderFiles[folder]

I am not exactly sure what other locks or checks I should apply to the files, but it does seem to work and isn’t terribly slow. The runner.setState() calls are probably unnecessary as well, but at least they provide visual feedback.

Anyway, I would consider this matter resolved. Whether or not this should be a feature is another matter entirely.

Cheers!


(system) #16

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.