audit log parser

grin · February 5, 2017, 4:05pm

If anyone’s interested, here’s a simple parser for the audit log. It parses and/or follows the file and shows only the relevant info about file add/modify/delete. It reads from STDIN, so you can concatenate your logs as you please and pipe them into it (set $forever_mode=0 if you do, or use ctrl-c).
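For readers who want to see the general idea, here is a minimal sketch of such a filter in Python. It is not the script from this post. It assumes the audit log holds one JSON event per line with `time`, `type` and `data` fields, that file changes appear as `LocalChangeDetected` / `RemoteChangeDetected` events, and that `data` carries `action`, `label` (or `folder` / `folderID`), `modifiedBy` and `path`. Check those names against the Syncthing version you run. The function names `summarize` and `parse_lines` are made up for this example.

```python
import json
from typing import Iterable, Iterator, Optional

# Assumed event type names; verify them against your Syncthing version.
CHANGE_EVENTS = {"LocalChangeDetected", "RemoteChangeDetected"}


def summarize(line: str) -> Optional[str]:
    """Return a one-line summary of a file-change event, or None for anything else."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None  # blank, truncated or non-JSON line
    if not isinstance(event, dict) or event.get("type") not in CHANGE_EVENTS:
        return None
    data = event.get("data")
    if not isinstance(data, dict):
        data = {}
    # Prefer the human-readable label, fall back to the folder ID (assumed keys).
    folder = data.get("label") or data.get("folder") or data.get("folderID") or "?"
    return "{} {} {} {} {}".format(
        event.get("time", "?"),
        data.get("action", "?"),
        folder,
        data.get("modifiedBy", "-"),
        data.get("path", "?"),
    )


def parse_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield summaries for the change events found in an iterable of log lines."""
    for line in lines:
        summary = summarize(line)
        if summary is not None:
            yield summary
```

To use it as a pipe filter, loop over `parse_lines(sys.stdin)` and print each result. Following a growing file (the "forever mode" mentioned above) is left out of this sketch.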

Yes it would be extremely useful if this was in the GUI, but what’s there is not really, um, useful.

Maybe I’ll try to rewrite it in Go one day, to learn that language.

nrm21 · February 9, 2017, 10:15pm

Yes it would be extremely useful if this was in the GUI, but what’s there is not really, um, useful.

Care to elaborate on why what’s in the GUI isn’t useful? I haven’t run the script but I’m not really sure what’s missing from the GUI global change log.

Is it that it doesn’t keep it forever?

grin · February 13, 2017, 11:58am

Gladly.

It doesn’t keep it forever. Well, to elaborate a bit more: it doesn’t even keep it across restarts, or possibly not even across web GUI reloads. Which means that it gets lost every few hours/days here, because the web GUI sometimes gets stuck and has to be reloaded.

The other thing is that the info is not detailed enough and often completely useless. Let’s see my current example: Source: "unknown"; File name, size, etc.: empty; Date: 2017-02-13 11:22:34. And I have two dozen such lines.

My code, on the other hand, spits out (for the same transactions) the (microsecond-precise) timestamp, what happened (add/modify/delete), which share, which node originated it (both the descriptive and the hashed name, including my own node), and which dir or file it was. It does so for the whole length of the audit trail. And since it follows the file, every change is immediately visible.

Obviously I support having this in the GUI, and if you’d like to have it specified, I gladly would; in fact I just did above.

canton7 · February 13, 2017, 12:15pm

For the empty entries, there was a bug a while back which caused them. That’s been fixed for a couple of releases. Make sure you’re running an up to date version.

It’s true that refreshing the browser will clear the list, up until the next event where it will be populated again. That’s an easy fix: I was planning to do it, but I might actually get around to it…

grin · February 13, 2017, 12:23pm

That was 0.14.21; I’ll see what an upgrade will change.

Making it persistent is the first step towards World Peace™ indeed. The next one could be to dedicate a separate page to it which automagically refreshes as the list changes, and a simple string filter would be really nice as well.

(As a sidenote: a related problem is that I believe I have heard that the node in the audit trail is not always the node which originated the change, but possibly the node which relayed it. That would be an extremely unfortunate case if it’s true, and should indeed be fixed; technically both originator and relay would be interesting info to know, especially when tracking ghost deletes or updates, or generally doing the happy task of blaming someone for an unwanted change [in case of a share between a large group of people].)

canton7 · February 13, 2017, 12:30pm

The list should update itself while you’re looking at it already.

nrm21 · February 13, 2017, 12:34pm

It already does this.

grin · February 13, 2017, 12:35pm

Refresh: yes, I see it now with the new version. Thanks.

Is there any hidden reason why it is in an overlay window? I would like to follow it independently, since it’s the only real indicator of change history. And it would be neat to include the share name and the hashed name of the node.

nrm21 · February 13, 2017, 12:44pm

That is my understanding as well (though @AudriusButkevicius can probably confirm). It’s actually why the GUI log feature was added.

The code seen is only a small part of the feature added (and also was kinda seen as a first step). Much of it went into code on the backend finding out who made the change and having them announce themselves in an API entry. Then the GUI just fetches that from the API and displays it. Before the feature it was very difficult telling who was the original modifier of a file since it was just done by someone, not formally recorded, and passed from computer to computer. So I believe the audit file doesn’t guarantee that who you received a mod from was actually the creator (though if all your nodes are always online I guess it is highly likely, just not a guarantee).

calmh · February 13, 2017, 12:56pm

It is the “originator”, but what that means is a bit fluid. Device A creates a file, device B modifies it, device C comes online with an old version that conflicts with the changes and resolves the conflict; the resolution is synced back to A, B and you (D). Who’s the “originator”? C, in this case, because they changed it last while resolving the conflict. You may have wanted that to be “A” (who created it) or “B” (who altered the contents). Syncthing has one opinion, doesn’t know yours.

grin · February 13, 2017, 5:27pm

I started to reply, then went and read the protocol, then removed my reply and started rewriting.

So, if I understand correctly, the Index contains every file, and every file contains a Version index, which in turn contains all the versions that each of the nodes sees of a given file. But these versions are independently sequenced, so there is no way to tell which came before any other one.

(If there’s a new connection, does it result in a new Index being received? Does it simply override the previous one in case of different values? I mean cases like having two clusters with no connection to one another, and I connect them together.)

I see no sign in the protocol of trying to synchronise time and no requirement of NTP sync, so I guess nodes may use any wild time settings they like. (It’s weird that not even ClusterConfig tries to get info about that.) The only time I see is modified_s (and there is a possible typo in the document about it, it’s mistyped as “modified_ns”; I have sent a PR), but that seems to be whatever local time the node believes in.

Is there any assurance that all nodes see the same Version list? How do they decide which is last? When I Request block1 of the file, which one from the Versions list do I expect to get? (Or maybe the Version Value is a globally enforced sequence? If not, how to tell which happened last? If yes, then we have a globally enforced order?)

So, basically, audit log contains the last action on the file as one of any of those modifications which were not yet received by the given node for the given file (so it’s quite possible that there were plenty of actions on the file which are lost because they were not received either but only [random] one of those can be received), but on the record it is the real originator of the last change (where “last” means the one we received labeled as “last”, independent of the real sequence of time of the actions), and not any of the relays.

And if I understand correctly, there is no sign of conflicts and conflict resolution anywhere in the protocol, apart from the filenames created by failed resolutions, so there’s no way to tell whether a file was really updated or is just the result of a conflict resolution. This matters when a file gets screwed up and you look for the one to blame: it is quite possible then that the one on the audit trail was simply doing automatic conflict resolution.

calmh · February 13, 2017, 5:33pm

The version is a version vector that describes the steps taken to get the file into the current state. It gives us the ability to determine if one version of a file is a direct descendent of another, or if they have been changed in conflict. When there is a conflict, the resolution of it does indeed create a new version - this corresponds to a “merge” in a regular software versioning tree for example.
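To make the descendant-versus-conflict distinction concrete, here is a small Python sketch of the generic version vector technique, run on the A/B/C example from earlier in the thread. It is a simplified textbook model, not Syncthing's actual implementation: a vector is assumed to be a plain mapping from device name to counter, and the helper names `compare` and `merge` are invented for this illustration.

```python
from typing import Dict

Vector = Dict[str, int]  # device name -> counter (simplified model)


def compare(a: Vector, b: Vector) -> str:
    """Classify a relative to b: 'equal', 'newer', 'older' or 'conflict'."""
    devices = a.keys() | b.keys()
    a_ahead = any(a.get(d, 0) > b.get(d, 0) for d in devices)
    b_ahead = any(b.get(d, 0) > a.get(d, 0) for d in devices)
    if a_ahead and b_ahead:
        return "conflict"  # each side has changes the other has not seen
    if a_ahead:
        return "newer"     # a is a direct descendant of b
    if b_ahead:
        return "older"     # b is a direct descendant of a
    return "equal"


def merge(a: Vector, b: Vector, resolver: str) -> Vector:
    """Resolve a conflict: take the per-device maximum, then bump the resolver."""
    merged = {d: max(a.get(d, 0), b.get(d, 0)) for d in a.keys() | b.keys()}
    merged[resolver] = merged.get(resolver, 0) + 1
    return merged


created = {"A": 1}             # device A creates the file
by_b = {"A": 1, "B": 1}        # device B modifies it
by_c = {"A": 1, "C": 1}        # device C changed its old copy while offline
resolved = merge(by_b, by_c, "C")  # C resolves the conflict
```

In this model `by_b` and `by_c` compare as a conflict, while `resolved` is newer than both, which is why the device that resolved the conflict shows up as the last modifier.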

There is no requirement or attempt to synchronize times, and attempting to do so would be both futile and not really gain us anything.

I’m not really sure what you mean about seeing the same version list and things being lost and so on - perhaps you can exemplify. Every device strives to get the latest version of every given file, and when they have that they are “up to date”.

The audit log describes what the device writing the log does. It does not attempt to describe the history of events that happened on other devices - we don’t know about those.

grin · February 13, 2017, 5:50pm

Thanks for the version vector link. I get it now.

So if I understand right, my node connects to others and gets the result of their “last synchronised state”, and if I connect to multiple nodes with different states, I decide whether there is a last one or they’re conflicting and I have to do a conflict resolution. My audit log only contains the result of my peers’ already resolved last state, or the conflicts I’m seeing, but I have no information about any state changes which happen between my last connected/synced state and the “synchronised” state at my reconnect.

In this way the audit log is fine as it is now, but it would be useful to generate some human-readable FAQ from this.

calmh · February 13, 2017, 5:54pm

Correct. And please do.

nrm21 · February 15, 2017, 5:12am

Also, there was talk a while back about expanding the global change log (GUI feature) to sync between all clients somehow with persistence. Perhaps create a log file that could be synced to all clients in a share. Or a world writable file (at least writable by all nodes) in a share somewhere where those disk events could be coalesced, deduped, and appended to the end, to create a continuous log of all changes.

The only tricky issue would be that the act of writing to the file would cause the file itself to change, thus generating another entry (and starting an endless recursion). But this can probably be worked around easily if one keeps it in mind.

Maybe make sure the modifier of the file is the only one to make a change to the log file, and ensure that that particular file modification is ignored by the disk event buffer (so it doesn’t record itself recording).
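A rough Python sketch of that idea, under stated assumptions: events are plain dicts with `time`, `modifiedBy`, `path` and `action` keys, and the shared log lives at a made-up path, `.changelog.jsonl`. None of this exists in Syncthing; it only illustrates skipping the log's own changes and dropping duplicates before appending.

```python
from typing import Dict, Iterable, List, Set, Tuple

LOG_PATH = ".changelog.jsonl"  # hypothetical name for the shared log file

Key = Tuple[object, object, object, object]


def new_entries(
    events: Iterable[Dict[str, object]],
    seen: Set[Key],
    log_path: str = LOG_PATH,
) -> List[Dict[str, object]]:
    """Return the events worth appending to the shared log.

    Changes to the log file itself are skipped, so that writing the log
    never produces a further log entry. Events already recorded in `seen`
    are dropped, and `seen` is updated in place.
    """
    fresh = []
    for event in events:
        if event.get("path") == log_path:
            continue  # do not record ourselves recording
        key = (
            event.get("time"),
            event.get("modifiedBy"),
            event.get("path"),
            event.get("action"),
        )
        if key in seen:
            continue  # same event already reported, e.g. by another node
        seen.add(key)
        fresh.append(event)
    return fresh
```

The caller would append the returned entries to the log file and keep `seen` around between batches. How several nodes could safely append to one synced file at the same time is a separate problem that this sketch does not address.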