Run script:
redactLogs.sh syncthing.log cleanup_table.csv
After it runs, every string in the log file that matches an entry in the left column of cleanup_table.csv is replaced with the corresponding entry in the right column.
A cleaned-up file is created: syncthing_redacted.log
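To illustrate the table-driven replacement described above, here is a minimal sketch, not the actual redactLogs.sh. It assumes a plain two-column CSV with no quoting or embedded commas, and the function name `redact_with_table` is made up for this example. Matching is done on literal strings, so dots in IP addresses are not treated as regex wildcards.

```shell
# Hypothetical helper (not the original script): replace every left-column
# string from a two-column CSV with its right-column counterpart.
# Assumes a non-empty table without quoting or commas inside fields.
redact_with_table() (
    log=$1
    table=$2
    awk -F, '
        # First file: load the replacement pairs.
        NR == FNR {
            if (NF >= 2 && $1 != "") { from[++n] = $1; to[n] = $2 }
            next
        }
        # Second file: apply each pair as a literal (non-regex) replacement.
        {
            line = $0
            for (i = 1; i <= n; i++) {
                out = ""
                while ((pos = index(line, from[i])) > 0) {
                    out = out substr(line, 1, pos - 1) to[i]
                    line = substr(line, pos + length(from[i]))
                }
                line = out line
            }
            print line
        }
    ' "$table" "$log"
)
```

Usage would look like `redact_with_table syncthing.log cleanup_table.csv > syncthing_redacted.log`.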
Logs don’t contain any “super-personal” info, but I felt uncomfortable posting IPs, IDs, folder names, etc. when it’s not really necessary. So I wrote this for my own use.
Great. I always wanted something like this. It could save a lot of time masking, changing, or replacing information that I’d rather not tell the world, while still letting me get help with my logfiles.
I’ll give it a shot when I need to send my logs.
Fast forward several months, and it turns out that maintaining all the stuff to be redacted in CSV sheets is a huge PITA. For every new folder pair or device, I have to remember to add its IDs, labels, etc.
Eventually it just didn’t seem useful anymore.
So I decided to rewrite the cleanup script to require almost no manual involvement.
After all, most of the stuff I want to remove from the logs can actually be found in the config, and IP addresses have a recognizable pattern.
Operation principle:
You give it the path to Syncthing’s config file and the log you want to redact. The config is then scanned for device IDs, device names, folder IDs, folder labels, and folder paths. Then the script walks through the provided log file, replacing all of those with consistently enumerated generic placeholders. In addition, it looks for IP address and port number patterns and replaces those as well.
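The config-driven idea can be sketched for one kind of data, device IDs. This is a rough illustration under stated assumptions, not the actual script: it assumes device IDs appear in config.xml as `id="..."` attributes in the usual 8x7 character form, and that logs may also show only the first 7-character group. The function name `redact_device_ids` and the `DEVICE_n` placeholder format are invented for this example.

```shell
# Hypothetical helper (not the author's script): replace each device ID
# found in the config with DEVICE_1, DEVICE_2, ... in the log.
redact_device_ids() (
    config=$1
    log=$2
    # Build one sed substitution per unique ID. The full ID is replaced
    # first, then its first 7-character group (assumed short form).
    script=$(
        grep -oE 'id="[A-Z2-7]{7}(-[A-Z2-7]{7}){7}"' "$config" |
        sed -E 's/^id="//; s/"$//' |
        awk '!seen[$0]++ {
            n++
            split($0, group, "-")
            printf "s/%s/DEVICE_%d/g\ns/%s/DEVICE_%d/g\n", $0, n, group[1], n
        }'
    )
    # The same ID always maps to the same number, so the redacted log
    # stays readable. Output goes to stdout.
    sed -e "$script" "$log"
)
```

Device names, folder IDs, labels, and paths could be handled the same way, though free-form names would need their special characters escaped before being handed to sed.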
The only shortcoming (that I know of so far, there may be more stuff I’m not aware of) is that if syncthing complains about specific filenames or sub-directories, I haven’t figured out how to detect or replace those in a pretty way automatically.
I noticed, though, that most of this data comes from the Puller module, so I check for it and prompt the user to remove its messages entirely. That may not always be a good idea, since those messages may be essential to debugging. That’s why it is optional.
So, it turns out macOS’s cut implementation doesn’t recognize certain flags. I don’t have a Mac, but I’ll check the documentation and try to make the script more compatible.
On sudo: the points above are correct. As a user, you want to run programs with as few permissions as possible, to minimize the damage they can do if a program misbehaves for one reason or another.
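A minimal sketch of such an optional step is below. It assumes the messages in question can be recognized by the literal word "Puller" on the line; the function name `drop_puller_messages` and the exact prompt wording are made up, and the real script may detect them differently.

```shell
# Hypothetical helper: ask before dropping log lines that mention "Puller".
# The answer is read from stdin; the resulting log goes to stdout.
drop_puller_messages() (
    log=$1
    count=$(grep -c 'Puller' "$log" || true)
    if [ "$count" -eq 0 ]; then
        cat "$log"
        exit 0
    fi
    printf '%s "Puller" lines found, they may contain file names. Remove? [Y/n] ' "$count" >&2
    read -r answer
    case $answer in
        [Nn]*) cat "$log" ;;                       # keep everything
        *)     grep -v 'Puller' "$log" || true ;;  # default: remove
    esac
)
```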
The script doesn’t require root. It needs read-only access to the config and the log files, and write access to the log’s directory in order to create the redacted version there.
Also, cleaning a 1-2 MB log should take seconds, so there’s no need to wait longer than a minute.
/Users/user/Downloads/redact_st_logs.sh ~/Library/Application\ Support/Syncthing/config.xml ~/Library/Logs/Syncthing.log
-bash: /Users/user/Downloads/redact_st_logs.sh: /bin/bash^M: bad interpreter: No such file or directory
but after changing the EOLs in an editor, it seems to work…
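The `/bin/bash^M: bad interpreter` message means the script was saved with Windows (CRLF) line endings, so the kernel looks for an interpreter literally named `/bin/bash` followed by a carriage return. Instead of an editor, the carriage returns can also be stripped on the command line. A small sketch, where the function name `fix_line_endings` is made up for this example:

```shell
# Hypothetical helper: remove carriage returns from a file in place.
# Writing back with cat keeps the file's permissions (e.g. the exec bit).
fix_line_endings() (
    file=$1
    tmp=$(mktemp) || exit 1
    tr -d '\r' < "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    exit "$status"
)
```

For the case above that would be `fix_line_endings /Users/user/Downloads/redact_st_logs.sh`.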
user$ /Users/user/Downloads/redact_st_logs.sh ~/Library/Application\ Support/Syncthing/config.xml ~/Library/Logs/Syncthing.log
Starting cleanup of /Users/user/Library/Logs/Syncthing.log...
Loading configuration data from /Users/user/Library/Application Support/Syncthing/config.xml... Done
Redacting device info...
I’ll wait and see how it progresses.
Also, if I want to create a report with a log and have to wait more than 10 minutes for the output, I’ll surely have forgotten what I wanted to report…
Should I delete “useless” log entries before I call the script to speed it up? But then I might delete important lines too… hmmm…
user$ time /Users/user/Downloads/redact_st_logs.sh ~/Library/Application\ Support/Syncthing/config.xml ~/Library/Logs/Syncthing.log
Starting cleanup of /Users/user/Library/Logs/Syncthing.log...
Loading configuration data from /Users/user/Library/Application Support/Syncthing/config.xml... Done
Redacting device info...
^C
real 156m0.488s
user 155m11.674s
sys 0m18.510s
Yeah @sisa, you’re right, 10 minutes is a really unreasonable time to wait.
That being said, I don’t see a simple way to speed it up significantly without rewriting everything in a compiled language or rethinking the approach in some other way.
I miscalculated before, assuming 2-3 MB of log would take a few seconds. It is in fact going to take quite a while.
So the practical thing to do is to use only the relevant portion of the log: a section of no more than a few thousand lines around the time when the issue occurred (that’s what I usually do). Copy it to a separate file, and then clean that file.
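Cutting out such a section can be scripted too. Here is a sketch that takes the first line matching a pattern (for example a timestamp) and prints a window of lines around it. The function name `extract_excerpt` and the default of 2000 lines of context are arbitrary choices for this example.

```shell
# Hypothetical helper: print CONTEXT lines before and after the first
# line of LOG that matches PATTERN (a basic grep regex).
# Usage: extract_excerpt LOG PATTERN [CONTEXT]
extract_excerpt() (
    log=$1
    pattern=$2
    context=${3:-2000}
    line=$(grep -n -m 1 -- "$pattern" "$log" | cut -d: -f1)
    if [ -z "$line" ]; then
        echo "pattern not found: $pattern" >&2
        exit 1
    fi
    start=$((line - context))
    [ "$start" -ge 1 ] || start=1
    end=$((line + context))
    # Print the window, then quit so the rest of the file is not read.
    sed -n "${start},${end}p;${end}q" "$log"
)
```

Something like `extract_excerpt ~/Library/Logs/Syncthing.log '12:34:5' > excerpt.log` (the timestamp here is only a placeholder) would give a small file to feed to the redaction script.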
It just would have been good to know in advance that this script will take an hour or so for a small logfile.
And/or some kind of “progress bar”, so the user can see that the script is actually working and not stuck.
(I once wrote a script with a “spinning wheel”, shamelessly stolen from the guys at stackoverflow.com)
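For reference, a spinner of that kind can be a small wrapper that runs the slow command in the background and animates on stderr until it finishes. This is a generic sketch, not code from the redaction script; the name `with_spinner` is made up, and the fractional `sleep` assumes a sleep implementation that accepts it (GNU and macOS do).

```shell
# Hypothetical helper: run a command in the background and show a
# spinning wheel on stderr while it is alive.
# Usage: with_spinner COMMAND [ARGS...]
with_spinner() (
    "$@" &
    pid=$!
    frames='|/-\'
    i=0
    while kill -0 "$pid" 2>/dev/null; do
        i=$(( (i + 1) % 4 ))
        frame=$(printf '%s' "$frames" | cut -c "$((i + 1))")
        printf '\r%s working...' "$frame" >&2
        sleep 0.2
    done
    printf '\r              \r' >&2   # wipe the spinner line
    wait "$pid"                        # pass on the command exit status
)
```

Because the spinner writes to stderr only, the wrapped command’s normal output can still be redirected or piped.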
In the meantime, I tested with a much smaller logfile and it worked great!
I came across:
! Warning ! : "Puller" messages detected: the log may contain unredacted filenames/paths
Notice: Deleting those messages will remove potentially useful debugging information
Remove "Puller" log messages? [Y/n] y
"Puller" log messages removed
Would you mind adding a “save both” option, so that the user can compare both versions and decide which one to upload to the forum, or make some further changes?
Anyway, I’m really thankful that I can now mask my logfiles if I want to.
Hi, thanks for the feedback! I’ll check the probable cause (probably not before tomorrow though).
And regarding the filetypes: yes, the file extensions are hardcoded, exactly as shown in the “usage” hint, to avoid mistakes like mixing up the config and the log. Yeah, it’s a bit dumb and ugly… I’ll probably add txt as an allowed extension for log files for now.
Now, regarding the un-sanitized device_ID and device_name you mentioned:
Is this device’s info present in the config.xml that you passed to the script?
Or more precisely, is it possible that the log and the config are from different devices? Or maybe they are both from the same device but from different time periods, e.g. the “un-sanitized” device is no longer linked to the reporting device, and hence not in its config anymore?