[Poll] The sending of crash data

calmh · May 23, 2019, 1:34pm

We will be including crash reporting soon. The way it will work is that when a crash (“panic” usually, in Go parlance) is detected, the Syncthing version and backtrace are reported to a server, where they’re stored and aggregated. The data that will be sent looks like this:

07:48:24 INFO: syncthing v1.1.4 "Erbium Earthworm" (go1.12.5 darwin-amd64) teamcity@build.sycnthing 2019-05-21 20:36:38 UTC
Panic at 2019-05-22T07:48:25+02:00
panic: interface conversion: *pfilter.FilteredConn is not net.Conn: missing method Read

goroutine 106 [running]:
github.com/syncthing/syncthing/lib/connections.(*quicListener).Serve(0xc000158000)
        github.com/syncthing/syncthing/lib/connections/quic_listen.go:74 +0x41b
github.com/thejerf/suture.(*Supervisor).runService.func1(0xc0001c6690, 0xc000000000, 0x54b4728, 0xc000158000)
        github.com/thejerf/suture@v3.0.2+incompatible/supervisor.go:600 +0x47
created by github.com/thejerf/suture.(*Supervisor).runService
        github.com/thejerf/suture@v3.0.2+incompatible/supervisor.go:588 +0x5b
... more of same gibberish

That is, it does not include any log data, user or other metadata, nor the ID of the sending machine. Excluding that data makes it somewhat less valuable for us, but I think we’ll be OK.
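
As a rough illustration of the idea described above, here is a minimal Go sketch that turns a panic into a report holding only a version line, a timestamp, the panic message and the backtrace. It is not Syncthing's actual implementation: the `crashReport` type, the `capture` helper and the placeholder version string are assumptions made for this example, and an in-process `recover` is used purely for simplicity (it only sees panics on the goroutine that calls it).

```go
package main

import (
	"fmt"
	"runtime/debug"
	"time"
)

// crashReport carries only what the post lists: a version line, the
// time of the panic, the panic message and the backtrace.
// The type and its field names are hypothetical.
type crashReport struct {
	Version string
	At      time.Time
	Message string
	Stack   string
}

// capture runs fn. If fn panics, capture recovers and returns a report;
// otherwise it returns nil. Only panics on the calling goroutine are seen.
func capture(version string, fn func()) (rep *crashReport) {
	defer func() {
		if r := recover(); r != nil {
			rep = &crashReport{
				Version: version,
				At:      time.Now(),
				Message: fmt.Sprint(r),
				Stack:   string(debug.Stack()),
			}
		}
	}()
	fn()
	return nil
}

func main() {
	rep := capture("syncthing v1.1.4 (placeholder version line)", func() {
		var v interface{} = 42
		_ = v.(string) // a failed type assertion panics
	})
	if rep != nil {
		// Print the report in roughly the layout shown in the post.
		fmt.Printf("%s\nPanic at %s\npanic: %s\n\n%s",
			rep.Version, rep.At.Format(time.RFC3339), rep.Message, rep.Stack)
	}
}
```

Note that nothing in the report struct has room for log lines, a device ID or user metadata, which mirrors the exclusions listed above.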

All in all this means sending us less data than global discovery does, which is a feature that is enabled by default. Given that, how do you reason around having this enabled by default or not? Not having it enabled by default means we will lose lots of panic reports, so will be less likely to fix those bugs.

Enabled by default is fine; the very privacy-conscious can disable it, as they must with global discovery
No, this must be explicitly opt-in, because …

Feel free to explain your thinking below.

uok · May 24, 2019, 9:01am

I think it is best to ask the user (like usage reporting) during the next update.

calmh · May 24, 2019, 10:51am

Why?

JKing · May 25, 2019, 11:20am

I have no objection to its being enabled by default. It’s the sort of thing which can lead to misunderstandings with people who are, shall we say, abundantly cautious, so it and its rationale should probably be clearly documented, though.

AudriusButkevicius · May 25, 2019, 11:44am

I think we will have a one-off notification, like we had with filesystem watching, that has to be dismissed.

canton7 · May 25, 2019, 7:34pm

I would ask the user after the first crash, just before sending the first report. At that point they might be a bit annoyed, and offering to help diagnose the issue will probably go down well.

It also means you can preview exactly what will be sent.

Even entities like Mozilla and Microsoft will ask before sending crash data, I believe.

calmh · May 26, 2019, 7:03am

Firefox sends everything by default and opens a tab on first start (in the background, so you have to notice it yourself) explaining their privacy focus;

Chrome on the other hand actually asks on first start, but abstracts it all away to a single checkbox, which covers the full spectrum from crash reporting to keylogging.

Edge Canary didn’t say anything and just sends it by default. But it’s the Canary build, so the real thing might act differently - we also enable this stuff in our candidate builds.

But! That doesn’t mean we can’t do better of course. I think we already do better in that the stuff we do send (after asking) is much less invasive than what browsers send, both by the nature of the program and by decision. The crash reports are specifically designed to include zero personal data.

I would like to not have to ask about it, because that makes it more likely we’ll actually get the report, and there are situations where it’s not possible to ask the user (headless, or crash on startup). But if we do ask, then on the other hand we could add in a lot more data… At minimum I covet those log entries we have in the on-disk crash report…

bege · May 26, 2019, 4:06pm

Privacy is a right of every person. That others don’t respect it does not mean that it is okay. Please, show a pop-up and let users decide. Thank you very much.

calmh · May 26, 2019, 5:20pm

There is literally no privacy concern here that is not completely overshadowed by normal usage of the app.

wscott · May 28, 2019, 11:08am

This actually includes less privacy-sensitive data than the discovery connections that are needed for correct operation. And requiring the user to opt in to discovery would make this a much less useful product. Making the user explicitly OK error reporting would be deceptive.

I would vote enabling by default, but have a clear statement in the documentation. You already have something like that: https://docs.syncthing.net/users/security.html#information-leakage

It might be good to have a section describing how to configure Syncthing to share data between machines on the same local network (or with hardcoded addresses), where you wouldn’t expect anything to leak to the outside world. I assume that is possible, but I didn’t see a definitive list.

calmh · May 28, 2019, 11:19am

Thanks for the thoughts on the documentation; we should do that. I think we will default to on for the reasons described, and in the migration (existing users) do something smarter. We should show the popup Audrius mentions, and default to following, for example, global discovery and anonymous usage reporting. If either of those is enabled, crash reporting will probably also be fine. If global discovery is disabled we can probably assume the user won’t like to be opted in to crash reports for now.
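
The migration rule sketched in this post can be written down as a small decision function. This is only one reading of the proposal, not the shipped logic: the `settings` struct, its field names and `crashReportingDefault` are hypothetical, and the case "usage reporting on, global discovery off" is resolved here in favour of staying off, since disabled global discovery is the one condition the post singles out.

```go
package main

import "fmt"

// settings models the two existing options the post refers to.
// The struct and its field names are hypothetical.
type settings struct {
	GlobalDiscovery bool
	UsageReporting  bool
}

// crashReportingDefault returns the proposed initial value of the crash
// reporting option. A nil argument means a fresh install with no
// existing configuration to migrate.
func crashReportingDefault(existing *settings) bool {
	if existing == nil {
		return true // new installs: on by default
	}
	if !existing.GlobalDiscovery {
		// Assumption: a user who turned off global discovery is not
		// opted in, regardless of the usage reporting setting.
		return false
	}
	return true
}

func main() {
	fmt.Println("fresh install:", crashReportingDefault(nil))
	fmt.Println("discovery on:", crashReportingDefault(&settings{GlobalDiscovery: true}))
	fmt.Println("discovery off:", crashReportingDefault(&settings{UsageReporting: true}))
}
```

Under the looser reading ("either of those is enabled"), the last case would come out as enabled instead; the function body is the only place that would need to change.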

ellnic · May 28, 2019, 11:38am

Enabled by default with the user informed upon first run. I don’t see any issue with non-sensitive data and if the user is informed at the start (or upgrade) there’s no ‘hidden agenda’.

4world · May 30, 2019, 7:18am

Google, Facebook, Amazon and numerous other companies, for example, track user data and present ads. I am not saying you should follow them but there is no point going overboard on “privacy” either.

The developers are doing a great public service by providing this software for free so in return, it ought to be completely acceptable to get some data to make the product better (I don’t even think any permission is necessary for the kind of data that was shown by Jakob above; it should be silently collected. Just put a statement to this effect in the “Agreement” during install like other companies do.).

These companies make profit on public data, and that is totally unforgivable.

“Privacy” like any other socially relevant word, needs a lot of analytical discussion to understand how much of it is justifiable and where to draw the line. For example, one can argue that collection of anonymous data for public benefit is the collective right of society (and there are numerous laws that use this in several countries.)

bege · May 30, 2019, 6:07pm

Yes, and privacy is looked upon very differently in different countries all around the world. That’s why it should be left to every user to decide. I understand your intentions and I am sure that you will get enough information if you respect the decision of every user.

AudriusButkevicius · May 30, 2019, 7:06pm

It’s already somewhat agreed that if global discovery is disabled, we’d disable crash reports. Past that point (leaking IPs, which we will not collect anyway), I don’t see anything that undermines privacy.

Let’s cut all the “privacy is important in the modern society” stuff, and focus on the proposed implementation at hand.

If you have objections to that, please state how you believe your privacy is undermined (pointing at parts of the log that was provided or scenarios or whatever), so we can understand something we don’t currently understand.

If you just have a general feeling towards privacy, this is not the thread to argue about it.

AudriusButkevicius · May 30, 2019, 7:14pm

On the other hand, global discovery is enabled out of the box, so perhaps this should be enabled too, as there is no personally identifiable information in the payload apart from the IP address.

Aranjedeath · May 30, 2019, 10:18pm

I would default it on, without warning, because of how little information you’re actually collecting. If the amount of information reported changes, please reauthorize the sending of the data (via the must-be-dismissed-to-work dialog). I would put a checkbox in the webui with a nice, short explanation of what’s being reported and an example which can be shown for interested users.

I also agree that it’s a sane thing to disable reporting for clients who already have global discovery off. If nothing else, so syncthing does not violate the principle of least surprise with those users.
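
One way to picture the "reauthorize if the reported data changes" suggestion is a version number on the report format. The sketch below is purely illustrative: `baselineFormat`, `maySend` and the idea of numbering formats are assumptions made for this example, not something the thread or Syncthing defines.

```go
package main

import "fmt"

// baselineFormat is the report format that would be sent by default,
// without a prompt. The numbering scheme is hypothetical.
const baselineFormat = 1

// maySend reports whether a crash report in format `current` may be
// sent, given the highest format the user has acknowledged in the
// must-be-dismissed dialog (0 if they have never seen one).
func maySend(current, acknowledged int) bool {
	if current <= baselineFormat {
		return true // the minimal report needs no prompt
	}
	return acknowledged >= current
}

func main() {
	fmt.Println(maySend(1, 0)) // baseline format, never prompted
	fmt.Println(maySend(2, 0)) // richer format, not yet acknowledged
	fmt.Println(maySend(2, 2)) // richer format, acknowledged
}
```

Any format newer than what the user last acknowledged is held back until the dialog is dismissed again.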

4world · May 30, 2019, 10:40pm

1. PURPOSE OF PRIVACY: Public outcry over privacy has generally been due to:

Govt tracking – to discriminate based on religion, race, etc., to take political advantage from citizen data, etc.

Private company tracking – to bombard people with ads to increase revenue and sales, to gain advantage over competitors, increase prices, etc.

In all cases, there was some damaging effect felt by the citizens so privacy was justified.

Here there is absolutely NO such undesirable effect. Just wanting privacy without a purpose has no meaning.

2. INDIVIDUAL vs COLLECTIVE: On one side is a collective benefit of making the software more robust for all. And to support the developers who are creating a viable competition against profit-hungry corporations that want money for such software.

On the other side is individual “privacy” (I don’t even know what is private in such data) demanded rhetorically by some. Which side should we take?

And IMO, it is not the users who should have a greater say in decision-making but the developers, particularly those who are putting in substantial time for all of us (although it’s awfully nice of them to ask us).

Thank you for this great software. (Sorry for the long note; I won’t say anything further on this.)

grin · June 18, 2019, 7:57am

Very simply phrased: I want to know about any data going out anywhere for any purpose, and I want to allow it, and unless I do I would like no data getting out anywhere for any purpose.

That’s the theory of control over my activities.

And, indeed, I would enable this reporting, after reading why it is sent, where it is sent and what it contains. (For example including specific path elements could cause a privacy problem, as well as any data about files, including hashes, and lots of other possible problems I’ve seen with similar features.)

It’s not that I don’t trust every developer on Planet Earth, but… I don’t. This is rather a policy than a personal mistrust.

I see no problem with connecting it to global discovery as long as it is obvious that it means “global discovery and crash reports with the following content”.

calmh · June 18, 2019, 4:45pm

I know I’ll get bashed and/or quoted out of context for this, again, but: this is not how Syncthing operates, and if this is a requirement Syncthing is probably not for you.

At the simplest level, it would be impossible to have two devices connect to each other with the above policy without requiring the user to enter the device ID, IP number, and port of the other device (and update it whenever that changes). This is not something we want to subject our users to.

You can run Syncthing in “stealth mode” but it requires some tinkering to actually work that way, for the above reason among others. Hence it will not be the default.