Post-processing of completed files


#1

Hi all,

This topic has come up a number of times. One common use case of Syncthing is to act as a component in a larger automation system that performs a one-way (or mainly one-way) sync of files from a remote host to a local machine.

As a part of this automation it might be desirable to trigger further processing of these files upon completion.

As has been suggested in the past, one possible solution is simply to poll the file system for completed files. In a low-power system, however, we may want to avoid unnecessarily spinning up the disk simply to check whether any incoming files are available. The usual response at this point is to take advantage of the long-polling events API.

As I set out to do exactly this over the past weekend, I started to realize that this is a bit easier said than done - primarily because the consumer of the API may itself be unreliable. To build a truly reliable system, your API client must contend with a number of issues. To name a few:

For one, you need to figure out how to checkpoint - as replaying the stream of events from the beginning of the trim horizon (i.e., the oldest event in history) may not be very practical. Secondly, we must also contend with the fact that the majority of the events are stateful, in the sense that they depend upon a particular state of the filesystem at a particular moment in time. For example, the FolderComplete event may only be valid at and shortly after the moment at which the event is emitted. Additionally, there are locking/synchronization issues where an ItemFinished event may occur, but be invalidated by additional changes from the remote peer before the post-processing script has finished processing the file.
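To make the checkpointing and invalidation concerns concrete, here is a minimal sketch, not a definitive implementation. It assumes the consumer stores the last processed event id in a small JSON file and treats a change in size or modification time as a sign that the file was touched again while it was being processed. All function names and the checkpoint file layout are hypothetical and not part of Syncthing.

```python
import json
import os
from pathlib import Path


def load_checkpoint(path):
    """Return the last processed event id, or 0 if no usable checkpoint exists."""
    try:
        return int(json.loads(Path(path).read_text())["last_id"])
    except (FileNotFoundError, KeyError, TypeError, ValueError):
        return 0


def save_checkpoint(path, last_id):
    """Write the checkpoint via a temp file and rename, so it is never half-written."""
    tmp = Path(str(path) + ".tmp")
    tmp.write_text(json.dumps({"last_id": last_id}))
    os.replace(tmp, path)


def fingerprint(file_path):
    """Return (size, mtime in ns) for a file, or None if it does not exist."""
    try:
        st = os.stat(file_path)
    except FileNotFoundError:
        return None
    return (st.st_size, st.st_mtime_ns)


def process_if_stable(file_path, handler):
    """Run handler(file_path) and report whether the file stayed unchanged.

    Returns False if the file was missing (stale event) or if its size or
    mtime changed while the handler ran, in which case the caller should
    discard the result and wait for the next event for this file.
    The handler is assumed to be non-destructive (copy, hard link, ...).
    """
    before = fingerprint(file_path)
    if before is None:
        return False
    handler(file_path)
    return fingerprint(file_path) == before
```

A consumer would call `save_checkpoint` only after `process_if_stable` returns True, so that a crash mid-processing causes the event to be handled again rather than lost.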

As it stands, it appears to me that the events API is best suited to being consumed in real time, alongside a mechanism for dealing with a full state refresh and a healthy amount of error checking. While all of these have well-known, stable solutions, we are inching into a fair amount of complexity to accomplish even fairly simple automations.

I am writing to inquire about the appetite for introducing direct support for post-processing synchronized files within Syncthing. Many of the issues above can be addressed in a far simpler manner from within the Syncthing process, as it likely already has an accurate representation of file system state. If such an appetite does exist for introducing first-party support of post-processing scripts, I would be more than happy to submit a PR for review. Otherwise, I am open to discussing alternative methods of supporting these use cases - perhaps more APIs or a reference client that implements post-processing behavior?


(Audrius Butkevicius) #2

I am pretty sure there is no appetite for post processing support.

Events API is the way to go and you should explain the issues you are having with it, as it’s not obvious. I don’t think you really need checkpointing. You do a full sweep on startup and then just follow the events…
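As a rough illustration of "full sweep on startup, then follow the events", here is a sketch under stated assumptions rather than a reference client. It assumes the API listens on the default address 127.0.0.1:8384, that the API key is supplied in a hypothetical SYNCTHING_API_KEY environment variable, and that only ItemFinished events with action "update" are of interest. The helper names are made up for this example, and `handle` must be idempotent because a file can be seen by both the sweep and the event stream.

```python
import json
import os
import time
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8384"                 # assumed: default GUI/API address
API_KEY = os.environ.get("SYNCTHING_API_KEY", "")  # assumed: hypothetical env variable


def events_url(since, timeout=60):
    """Build the long-polling URL for ItemFinished events after id `since`."""
    query = urllib.parse.urlencode(
        {"since": since, "events": "ItemFinished", "timeout": timeout})
    return f"{BASE_URL}/rest/events?{query}"


def fetch_events(since):
    """Block until events arrive or the long poll times out; return a list."""
    req = urllib.request.Request(events_url(since), headers={"X-API-Key": API_KEY})
    with urllib.request.urlopen(req, timeout=90) as resp:
        return json.load(resp) or []


def finished_files(events, folder_id):
    """Folder-relative paths of files that finished syncing without error."""
    out = []
    for ev in events:
        data = ev.get("data") or {}
        if (ev.get("type") == "ItemFinished"
                and data.get("folder") == folder_id
                and data.get("type") == "file"
                and data.get("action") == "update"
                and not data.get("error")):
            out.append(data["item"])
    return out


def sweep(folder_path):
    """Full sweep: every regular file in the folder, skipping the .stfolder
    marker and in-progress temp files (assumed to start with '.syncthing.')."""
    found = []
    for root, dirs, files in os.walk(folder_path):
        dirs[:] = [d for d in dirs if d != ".stfolder"]
        for name in files:
            if not name.startswith(".syncthing."):
                found.append(os.path.relpath(os.path.join(root, name), folder_path))
    return sorted(found)


def follow(folder_id, folder_path, handle):
    """Sweep once, then follow events; on a lost connection, sweep again."""
    while True:
        for rel in sweep(folder_path):
            handle(rel)
        last_id = 0
        try:
            while True:
                events = fetch_events(last_id)
                for rel in finished_files(events, folder_id):
                    handle(rel)
                if events:
                    last_id = events[-1]["id"]
        except OSError:
            # Connection lost (possibly a Syncthing restart): back off, re-sweep.
            time.sleep(5)
```

Re-sweeping after any connection error is a deliberate simplification: it trades some repeated work for not having to reason about which events were missed while the consumer or Syncthing was down.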


(Jakob Borg) #3

You might be able to jury rig the external versioner for this - it’s a script that gets called on each file sync operation, after all. The downside is that it’s called at the wrong time - before the file is replaced, rather than after. However, if your notification system is slow enough or you can rig it to understand a notification that essentially says “this file will change in about a millisecond”, then there you are.

For what it’s worth, since this mechanism is already there, I could imagine there being a second part to the external versioner. That is, I don’t see it making things much uglier if you could configure both a pre-replace and a post-replace script, instead of just the current pre-replace one.


#4

Thanks for the replies.

I’ll take a look at the versioning script and see if I can adapt it to my needs.

I don’t really have any issue with the events API. It is a well defined pattern (very similar to consuming a Kafka/Kinesis queue). All challenges have stable and mature solutions. The only inconvenience is that it is a fair amount of complexity to do something as simple as move a couple of files around, and/or create a couple of hard links. I believe there are many use-cases for Syncthing that can be achieved by introducing a simple, in-process, synchronous mechanism that post-processes files upon completion. Such a mechanism may also make Syncthing more accessible to those who are not able to write a full-blown app/service to consume the events API (sysadmins, for example, who may be more comfortable with scripting in bash, as opposed to developing an app).

For example, the issue in this topic can be very simply addressed by a post-processing script that moves a file out of the sync folder after it completes (therefore deleting it from the server).
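The post-processing step itself can be very small. Below is a hedged sketch of such a "move it out of the sync folder" action; the function name and arguments are hypothetical, and it assumes the caller already knows the file has finished syncing.

```python
import shutil
from pathlib import Path


def move_out(sync_root, rel_path, dest_root):
    """Move a completed file out of the sync folder, keeping its relative path.

    Creates any missing destination directories and returns the new location.
    """
    src = Path(sync_root) / rel_path
    dest = Path(dest_root) / rel_path
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dest))
    return dest
```

Because the file disappears from the sync folder, Syncthing sees a local deletion, which is what leads to the file being removed on the server side as described above.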

Nonetheless, I’ll return to my original plan of building something on top of the events API. Thanks again!