Shell script to clean (redact) log files before posting


#1

I wrote a small script a while ago to anonymize log-files before posting them in public.

For it to work you need to provide a comma-separated table with strings to replace.

Example:

cleanup_table.csv:

My-Fancy-Shared-Folder-Name1, FOLDER_01
My-Fancy-Shared-Folder-Name2, FOLDER_02
My-Fancy-Shared-Folder-Name3, FOLDER_03
QW123E-CV123E-TY123E-WW123U, ID-01
QAZS3E-CSD23E-ASD23E-QWEE3U, ID-02

Run script: redactLogs.sh syncthing.log cleanup_table.csv

After execution, every string in the log file that matches a left-hand entry in cleanup_table.csv is replaced with the corresponding right-hand entry.

The cleaned-up file is written to: syncthing_redacted.log
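
For readers who just want the idea without downloading the attachment, here is a minimal sketch of table-driven replacement. It is not the attached redactLogs.sh: the function name `redact_with_table` is made up, and it assumes each row is `search, replacement` with no commas inside the search string and a log file name that has an extension.

```shell
#!/bin/sh
# Hypothetical sketch, not the attached script.
# Usage: redact_with_table <log_file> <table.csv>
redact_with_table() {
    log=$1
    table=$2
    out="${log%.*}_redacted.${log##*.}"   # syncthing.log -> syncthing_redacted.log
    cp "$log" "$out" || return 1

    # IFS=',' splits each row at the first comma into search / replacement.
    while IFS=',' read -r search replace; do
        # Trim spaces around both fields and skip empty rows.
        search=$(printf '%s' "$search" | sed 's/^ *//; s/ *$//')
        replace=$(printf '%s' "$replace" | sed 's/^ *//; s/ *$//')
        [ -n "$search" ] || continue
        # Escape characters that are special in a sed pattern or replacement.
        search_esc=$(printf '%s' "$search" | sed 's|[][\\/.*^$]|\\&|g')
        replace_esc=$(printf '%s' "$replace" | sed 's|[\\/&]|\\&|g')
        sed "s/$search_esc/$replace_esc/g" "$out" > "$out.tmp" && mv "$out.tmp" "$out"
    done < "$table"
}
```

Calling `redact_with_table syncthing.log cleanup_table.csv` would then leave the original log untouched and write the redacted copy next to it.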

–

Logs don’t contain any “super-personal” info, but I felt uncomfortable posting IPs, IDs, folder names, etc. when it’s not really necessary. So I wrote this for my own use.

Figured maybe others will find it helpful :wink:

redactLogs.sh (611 Bytes)


(totoba) #2

They are personal enough to be used to recognize you. I think the host name, domain name, etc. might be in the logs too.

I will give it a try.


#3

Great. I’ve always wanted something like that. It could save a lot of time masking, changing, or replacing information I’d rather not tell the world, while still letting me share my logfiles.

I’ll give it a shot when I need to send my logs.

Thanks! Good thinking!


#4

Hi!

Fast forward several months and it turns out that maintaining all the stuff to be redacted in CSV sheets is a huge PITA: for every new folder pair or device, you have to remember to add its IDs, labels, etc. Eventually it just didn’t seem useful anymore.

So I decided to rewrite the cleanup script to require almost no manual involvement. After all, most of the stuff I want to remove from the logs can actually be found in the config, and IP addresses follow a recognizable pattern.

Operation principle:

You give it the path to Syncthing’s config file and the log you want to redact. The config is then scanned for device IDs, device names, folder IDs, folder labels, and folder paths. The script then walks through the provided log file, replacing all of those with consistently enumerated generic placeholders. In addition, it looks for IP address and port number patterns and replaces those as well.
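
To illustrate the principle (this is a sketch, not the attached script), the snippet below handles only device IDs and IPv4 addresses. It assumes the config lists devices as `<device id="...">` tags; the function names and the `DEVICE_ID_nn` / `IP_ADDR` placeholders are made up for the example. Device names, folder IDs, labels, and paths would need the same treatment plus escaping.

```shell
# Hypothetical sketch: turn the device IDs found in config.xml into a list of
# sed substitutions, then apply them plus an IPv4[:port] rule to the log.
build_device_rules() {
    config=$1
    # Pull every id="..." value out of <device ...> tags, de-duplicate in
    # first-seen order, and emit one numbered substitution per ID.
    grep -o '<device id="[^"]*"' "$config" \
        | sed 's/.*id="\([^"]*\)"/\1/' \
        | awk '!seen[$0]++ { printf "s/%s/DEVICE_ID_%02d/g\n", $0, ++n }'
}

redact_log() {
    config=$1
    log=$2
    rules=$(mktemp) || return 1
    build_device_rules "$config" > "$rules"
    # Four dot-separated number groups with an optional :port.
    # The numeric ranges are not validated.
    printf '%s\n' 's/[0-9]\{1,3\}\(\.[0-9]\{1,3\}\)\{3\}\(:[0-9]\{1,5\}\)\{0,1\}/IP_ADDR/g' >> "$rules"
    sed -f "$rules" "$log"
    rm -f "$rules"
}
```

`redact_log config.xml syncthing.log > syncthing_redacted.log` prints the redacted log to stdout. Because the same ID always maps to the same numbered placeholder, the log stays readable after redaction.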

The only shortcoming (that I know of so far, there may be more I’m not aware of) is that if Syncthing complains about specific filenames or sub-directories, I haven’t figured out how to detect or replace those automatically in a clean way.

I noticed, though, that most of such data comes from the Puller module, so I check for it and prompt the user to remove its messages entirely. That may not always be a good idea, since those messages may be essential to debugging. That’s why it is optional.

So here we go - version 2.0!
redact_st_logs.sh (4.2 KB)

Usage:

redact_st_logs.sh <syncthing_config_file.xml> <syncthing_log_file.log>

Here’s a comparison of a sample log before/after redaction: https://www.diffchecker.com/N0Iyj69U


#5

Hello, I just tried this on a Mac with macOS Sierra 10.12.6 and got this:

user$ sudo /Users/user/Downloads/redact_st_logs.sh ~/Library/Application\ Support/Syncthing/config.xml ~/Library/Logs/Syncthing.log
   Starting cleanup of /Users/user/Library/Logs/Syncthing.log...
cut: illegal option -- -
usage: cut -b list [-n] [file ...]
       cut -c list [file ...]
       cut -f list [-s] [-d delim] [file ...]
   Configuration loaded from /Users/user/Library/Application Support/Syncthing/config.xml...
   Device info redacted...

Same output with or without sudo. After that, nothing more happens.

Let me know if I can be of further help.


#6

Hi, thanks for the feedback :slight_smile:

Do you mind trying this version:
redact_st_logs_v2.2.sh (4.5 KB)


#7

Same output:

user$ time sudo /Users/user/Downloads/redact_st_logs_v2.2.sh ~/Library/Application\ Support/Syncthing/config.xml ~/Library/Logs/Syncthing.log
   Starting cleanup of /Users/user/Library/Logs/Syncthing.log...
   Loading configuration data from /Users/user/Library/Application Support/Syncthing/config.xml...cut: illegal option -- -
usage: cut -b list [-n] [file ...]
       cut -c list [file ...]
       cut -f list [-s] [-d delim] [file ...]
   Done
   Redacting device info...^C

real	52m6.236s
user	51m57.143s
sys	0m6.255s

I canceled after an hour. The log file is about 1.2 MB in size.

The cut executable is at /usr/bin/cut. There seems to be no version info in the file.


(Antony Male) #8

Using sudo as a matter of habit whenever something doesn’t appear to work is probably a bad idea… Worth breaking that habit!


#9

You may have a good point, but you don’t explain it, so I can’t see why it is a bad idea or why the habit would be worth breaking.

Would you mind elaborating, so I can add another point of view to mine?


(Antony Male) #10

sudo runs things as root.

  1. If something is broken and doesn’t work properly, running it as root allows it to do even more damage
  2. If something is malicious, pretending to break is a good way to get users like yourself to give it root privileges

A script to clean log files should never need to alter your system. It should never need root. There should be no reason to ever give it root.


#11

So, it turns out macOS’s cut implementation doesn’t recognize certain flags. I don’t have a Mac, but I’ll check the documentation and try to make the script more compatible.
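
For context: the `cut: illegal option -- -` message is what BSD cut prints when it is given a GNU-style long option (anything starting with `--`). The short POSIX options work on both. A small sketch of the portable form; the helper name `get_attr_value` is hypothetical and not necessarily how the script uses cut:

```shell
# GNU-only long options such as
#   cut --delimiter='"' --fields=2
# fail on macOS/BSD cut. The short forms below are understood by both.
get_attr_value() {
    # Print the text between the first pair of double quotes on each line.
    cut -d '"' -f 2
}
```

For example, `printf '%s\n' '<device id="ABC" name="x">' | get_attr_value` prints the value of the first quoted attribute.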

On sudo: the points above are correct. As a user, you want to run programs with as few permissions as possible, to minimize the damage they can do if a program misbehaves for one reason or another.

The script doesn’t require root; it requires read-only access to the config and the log files, and write access to the log’s directory in order to create a redacted version there.
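
If you want to check those permissions up front instead of reaching for sudo, something along these lines would do. It is only a sketch; `check_access` is a made-up name and is not part of the script.

```shell
# Hypothetical pre-flight check: verify the access the script needs
# and warn when it is being run as root.
check_access() {
    config=$1
    log=$2
    [ -r "$config" ] || { echo "Cannot read config: $config" >&2; return 1; }
    [ -r "$log" ]    || { echo "Cannot read log: $log" >&2; return 1; }
    # The redacted copy is created next to the log, so its directory
    # must be writable.
    [ -w "$(dirname "$log")" ] || { echo "Cannot write to $(dirname "$log")" >&2; return 1; }
    if [ "$(id -u)" -eq 0 ]; then
        echo "Warning: running as root is not needed here" >&2
    fi
}
```

If `check_access` fails for your normal user, fixing the file permissions is the safer route than running the whole script as root.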

Also, cleaning a 1-2 MB log should take seconds, so there is no need to wait longer than a minute.


#12

Hey @sisa, it should work on macOS now as well:

v2.3 - redact_st_logs.sh (4.6 KB)


#13

It throws:

/Users/user/Downloads/redact_st_logs.sh ~/Library/Application\ Support/Syncthing/config.xml ~/Library/Logs/Syncthing.log
-bash: /Users/user/Downloads/redact_st_logs.sh: /bin/bash^M: bad interpreter: No such file or directory

but after changing the EOLs in an editor, it seems to work…
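
For anyone hitting the same error: the `^M` is a carriage return, meaning the file was saved with Windows (CRLF) line endings. Instead of an editor, the line endings can also be fixed from the terminal. A small sketch (`fix_line_endings` is just a name I picked):

```shell
# Strip carriage returns so the first line reads "#!/bin/bash"
# instead of "#!/bin/bash\r".
fix_line_endings() {
    script=$1
    tr -d '\r' < "$script" > "$script.unix" && mv "$script.unix" "$script"
    # The rewritten file is a new file, so restore the executable bit.
    chmod +x "$script"
}
```

Usage: `fix_line_endings ~/Downloads/redact_st_logs.sh`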

user$ /Users/user/Downloads/redact_st_logs.sh ~/Library/Application\ Support/Syncthing/config.xml ~/Library/Logs/Syncthing.log
   Starting cleanup of /Users/user/Library/Logs/Syncthing.log...
   Loading configuration data from /Users/user/Library/Application Support/Syncthing/config.xml...   Done
   Redacting device info...

I’ll let it run and we’ll see what happens :slight_smile:

Thank you very much!


#14

Well, I cancelled.

How long can it take? (Or is it even working? How will I know?)

The logfile was about 2.4 MB and now it is 3.4 MB with 17186 lines. So I guess it had about 11000 lines when the process started.

Unfortunately, I cannot run the MacBook Pro with a fully loaded CPU all the time. The fans are too noisy :-}

Also, if I want to create a report with a log and have to wait more than 10 minutes for the output, I will surely have forgotten what I wanted to report… :slight_smile:

Should I delete “useless” log entries before I call the script to speed it up? But then I might delete important lines too… hmmm…

user$ time /Users/user/Downloads/redact_st_logs.sh ~/Library/Application\ Support/Syncthing/config.xml ~/Library/Logs/Syncthing.log
   Starting cleanup of /Users/user/Library/Logs/Syncthing.log...
   Loading configuration data from /Users/user/Library/Application Support/Syncthing/config.xml...   Done
   Redacting device info...
^C

real    156m0.488s
user    155m11.674s
sys    0m18.510s

#15

Yeah @sisa, you’re right, 10 minutes is a really unreasonable time to wait. That being said, I don’t see a simple way to optimize it significantly without rewriting everything in a compiled language or rethinking the approach in some other way.

I miscalculated before, assuming 2-3 MB of log would take a few seconds; it is in fact going to take quite a while.

So the practical thing to do would be to use only the relevant portion of the log instead: a section of no more than a few thousand lines around the time when the issue occurred (that’s what I usually do). Copy it to a separate file, and then clean that file.
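
If you’d rather not do that by hand, here is a rough sketch of a helper that cuts out a window of lines around the first occurrence of a pattern (a timestamp, for example). `extract_window` and its argument order are made up for illustration and are not part of the script.

```shell
# Hypothetical helper.
# Usage: extract_window <log> <pattern> <output_file> [context_lines]
extract_window() {
    log=$1
    pattern=$2
    out=$3
    context=${4:-1000}      # lines to keep before and after the match

    # Line number of the first match; empty if the pattern does not occur.
    hit=$(grep -n -- "$pattern" "$log" | head -n 1 | cut -d: -f1)
    [ -n "$hit" ] || { echo "Pattern not found: $pattern" >&2; return 1; }

    start=$((hit - context))
    [ "$start" -ge 1 ] || start=1
    end=$((hit + context))
    # Print only lines start..end of the log into the output file.
    sed -n "${start},${end}p" "$log" > "$out"
}
```

For example, `extract_window Syncthing.log '2017/10/01 14:3' excerpt.log 2000` would keep up to 2000 lines on either side of the first line containing that (made-up) timestamp.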

Sorry for not coming up with something simpler…


#16

No problem at all :slight_smile:

It just would have been “good to know” in advance that this script will take an hour or so for a small logfile.

And/or some kind of “progress bar”, so the user can see the script is actually working and not stuck. (I once wrote a script with a “spinning wheel”, shamelessly stolen from the folks at stackoverflow.com)
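
Roughly what I mean by a spinning wheel, as a sketch from memory (`spin_while` is a made-up name; it watches a background process and redraws one character per second on stderr):

```shell
# Hypothetical spinner: show activity while the process with the given PID
# is still alive.
spin_while() {
    pid=$1
    frames='|/-\'
    i=0
    # kill -0 sends no signal; it only tests whether the process exists.
    while kill -0 "$pid" 2>/dev/null; do
        i=$(( (i + 1) % 4 ))
        # cut -c is 1-based, so frame i lives at position i + 1.
        frame=$(printf '%s' "$frames" | cut -c $((i + 1)))
        printf '\r   Working... %s' "$frame" >&2
        sleep 1
    done
    printf '\r   Done          \n' >&2
}
```

Usage would be something like `some_long_step & spin_while $!`, where `some_long_step` stands for whatever part of the script takes long.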

In the meantime, I tested with a much smaller logfile and it worked great!

I came across:

  ! Warning ! : "Puller" messages detected: the log may contain unredacted filenames/paths
                 Notice: Deleting those messages will remove potentially useful debugging information
                 Remove "Puller" log messages? [Y/n] y
                 "Puller" log messages removed

Would you mind adding a “save both” option, so that the user can compare both versions and decide which one to upload to the forum, or make further changes?
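
Something like this is what I have in mind, as a sketch only. It assumes the messages in question can be found by the literal word "Puller" and that the file name has an extension; `save_both` and the `_no_puller` suffix are made up.

```shell
# Hypothetical "save both": keep the full redacted log and also write a copy
# with every line mentioning "Puller" removed.
save_both() {
    redacted=$1
    stripped="${redacted%.*}_no_puller.${redacted##*.}"
    if grep -q 'Puller' "$redacted"; then
        # grep -v prints only the lines that do NOT match.
        grep -v 'Puller' "$redacted" > "$stripped"
        echo "Kept $redacted and wrote $stripped"
    else
        echo "No \"Puller\" messages found in $redacted"
    fi
}
```

The two files can then be compared with `diff` before deciding which one to upload.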

Anyways, I’m really thankful that now I can mask logfiles, if I want to. :slight_smile:

Thanks for developing this.

btw: maybe this (the LC parts) helps to speed up the process a little bit: https://stackoverflow.com/questions/13913014/grepping-a-huge-file-80gb-any-way-to-speed-it-up
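
The trick from that link boils down to running the text tools in the C locale, so they compare raw bytes instead of decoding multibyte characters. I have not measured whether it helps with this script; as a sketch (`fast_sed` is a made-up wrapper name):

```shell
# Hypothetical wrapper: run sed in the C locale for this one call only.
fast_sed() {
    # The variable assignment in front of the command applies just to sed,
    # not to the rest of the shell session.
    LC_ALL=C sed "$@"
}
```

It takes the same arguments as sed, for example `fast_sed 's/old/new/g' Syncthing.log`. One caveat: in the C locale, patterns see non-ASCII text as individual bytes rather than characters.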


#17

Hi, a smallish issue:

After saving 40 lines of my log to a text file “log.txt”, the script could not open it.

/Users/user/Downloads/redact_st_logs.sh /Users/user/Library/Application\ Support/Syncthing/config.xml /Users/user/Downloads/log.txt
! Error ! : Wrong input parameters!
Usage:   /Users/user/Downloads/redact_st_logs.sh  <syncthing_config_file.xml> <syncthing_log_file.log>

After renaming the file from log.txt to log.log, it worked, but it took me a while to understand and correct the problem.

Wouldn’t it be easier to allow any extension the user supplies?
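
One way to do that without risking a mix-up of the two files would be to look at the content instead of the name. A sketch, assuming the config file contains a `<configuration` root tag and the log does not; the function names are made up:

```shell
# Hypothetical content-based check instead of a file-extension check.
is_syncthing_config() {
    # Assumption: only the config file contains a <configuration ...> tag.
    grep -q '<configuration' "$1"
}

validate_inputs() {
    config=$1
    log=$2
    [ -f "$config" ] && [ -f "$log" ] || {
        echo "! Error ! : file not found" >&2; return 1; }
    is_syncthing_config "$config" || {
        echo "! Error ! : $config does not look like a config file" >&2; return 1; }
    if is_syncthing_config "$log"; then
        echo "! Error ! : $log looks like a config file, not a log" >&2
        return 1
    fi
}
```

With this, `log.txt`, `log.log`, or a file without any extension would all be accepted, while swapped arguments would still be caught.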


#18

Hi again,

After checking the redacted log, I see entries which have not been sanitized:

TTJ76JR-TIEMLJB-4JJFTY2-TNEYLJW-JDRATFQ-TADJ7FE-GHJW2TK-T7J2RAJ

and

###TESTSERVER###

https://pastebin.com/LxT8gDF8


#19

Hi, thanks for the feedback! I’ll check the probable cause (probably not before tomorrow though).

And regarding the filetypes: yes, the file extensions are hardcoded, exactly as shown in the “usage” hint, to avoid mistakes like mixing up the config and the log. Yeah, it’s a bit dumb and ugly… I’ll probably add txt as an allowed extension for log files for now.


#20

Hey @sisa, first, here’s a version that allows both .txt and .log extensions:

v2.4 - redact_st_logs.sh (4.6 KB)

Now, regarding the un-sanitized device ID and device name you mentioned:

Is this device’s info present in the config.xml that you passed to the script?

Or more precisely, is it possible that the log and the config are from different devices? Or maybe they are both from the same device, but from different time periods, e.g. the “un-sanitized” device is no longer linked to the reporting device, and hence not in its config anymore?
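
A quick way to check this yourself is sketched below. It assumes device IDs look like the one you posted (eight dash-separated groups of seven characters from A-Z and 2-7); `find_unknown_device_ids` is a made-up helper, not part of the script.

```shell
# Hypothetical check: list device-ID-shaped strings that appear in the log
# but nowhere in the config.
find_unknown_device_ids() {
    config=$1
    log=$2
    grep -oE '[A-Z2-7]{7}(-[A-Z2-7]{7}){7}' "$log" | sort -u |
    while read -r id; do
        # -F matches the ID as a fixed string; -q only sets the exit status.
        grep -qF -- "$id" "$config" || printf '%s\n' "$id"
    done
}
```

Run it as `find_unknown_device_ids config.xml Syncthing.log`. Any ID it prints is one the script could not have learned from that config.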