Mutual introducers race condition?

nekr0z · October 1, 2019, 8:10pm

I think I’ve run into some kind of unexpected behaviour today. Or maybe it’s some kind of unsupported configuration. Anyway, I thought I’d better bring this to the forum just in case.

Devices A and B are introducers for each other (I know this sounds weird, but it makes good sense in my case), and B is also an introducer for device C. A folder that’s shared between these devices also needed to be shared with device D (not initially known to any of the three). The discovering and sharing was done by device B, so device D got introduced to A and C - as planned.

Later the need emerged for D to be removed from the swarm. Since it was introduced by B, the GUI on A and C wouldn’t allow it to be removed (because it would then be reintroduced). Makes sense.

While A was powered down for maintenance, I removed D from device B. Device C promptly removed D, too, because no other folders were shared between C and D. So far, so good, and everything works like clockwork. But then device A came online…

Both A and B went offline for the rest of the swarm. For the next hour, one of them would occasionally pop online only to disappear within the next 10 seconds. Then I got curious and started investigating.

It looked like as soon as A connected to B, it would try to drop D, which was no longer introduced, but at the same moment B would pick up D as an introduction from A. Then B would seem to “remember” (this is the part I didn’t quite understand, mostly because I didn’t have online access to B’s logs) that D had just been dropped and re-drop D. By that time the list of devices had changed on both A and B, so they would both drop all connections and start re-scanning the folder. By the time they were done they were back to square one: A knows D is introduced by B for this folder, and B knows D is dropped. Then they connect to each other and the dance repeats, ad infinitum.

I ended up walking to B and unchecking A (so that it would no longer be an introducer), waiting a minute for everything to settle down and checking A back into introducer again. Issue was solved.

But it got me thinking: why would B re-drop D that got “re-introduced” to it by A? Why this endless loop? And what would happen in case of a longer introducer ring chain (i.e. J is introducer for K, K for L, L for M, M for J)… Can’t wait till weekend to have time to fire up some VMs and test!

AudriusButkevicius · October 1, 2019, 8:48pm

Sadly this is sort of working as intended. Each device checks introductions every connection cycle, so it’s perfectly possible that in a single cycle:

Device B adds device C as A advertises it
Device A drops device C as B no longer advertises it (as advertising would only start happening on the next connection attempt)

Next connection cycle you just swap the letters A and B.
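To make the flip-flop concrete, here is a toy simulation. It is only a sketch of the behaviour described above, not Syncthing's actual code: the function names and the "device -> who introduced it" table are made up for illustration, and it assumes both sides act on what the peer advertised before the cycle started.

```python
def apply_introductions(table, peer_name, advertised):
    """Return a new device table after hearing one introducer's advertisement.

    table maps device name -> name of the introducer that gave it to us.
    (Hypothetical data model, chosen only for this illustration.)
    """
    updated = dict(table)
    # Add anything the introducer advertises that we do not know yet.
    for dev in advertised:
        updated.setdefault(dev, peer_name)
    # Drop anything this introducer gave us that it no longer advertises.
    for dev, introduced_by in table.items():
        if introduced_by == peer_name and dev not in advertised:
            del updated[dev]
    return updated


def connection_cycle(a, b):
    """One cycle between mutual introducers A and B."""
    # Both sides advertise the state they had *before* this cycle.
    adv_a, adv_b = set(a), set(b)
    return (apply_introductions(a, "B", adv_b),
            apply_introductions(b, "A", adv_a))


a = {"C": "B"}  # A still holds C, introduced by B
b = {}          # C was removed on B by hand
history = []
for _ in range(4):
    history.append((sorted(a), sorted(b)))
    a, b = connection_cycle(a, b)

for step, (on_a, on_b) in enumerate(history):
    print(step, "A knows", on_a, "| B knows", on_b)
```

In this model C just hops from one side to the other every cycle and never goes away, which matches the "swap the letters" description.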

There is no easy way to solve it other than sending the whole introduction chain around, to see that C is being introduced by B, and B got it from A, so given I am A and I no longer have it, I should not add it back in just because B says he has it.
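A rough sketch of what sending the chain around could look like, again as a toy model and not anything that exists in Syncthing today. The table layout (device -> tuple of introducer names, empty for manually added devices) and the helper names are assumptions made for this example.

```python
def advertise(my_name, table):
    """Advertise every known device with our own name appended to its chain."""
    return {dev: chain + (my_name,) for dev, chain in table.items()}


def apply_introductions(my_name, table, peer_name, advertised):
    """Update our table from one introducer's chain-carrying advertisement."""
    updated = dict(table)
    for dev, chain in advertised.items():
        if dev in updated:
            continue
        if my_name in chain:
            # The introduction went through us and we no longer have the
            # device, so do not add it back just because the peer has it.
            continue
        updated[dev] = chain
    # Drop devices that came directly from this peer and are no longer advertised.
    for dev, chain in table.items():
        if chain and chain[-1] == peer_name and dev not in advertised:
            del updated[dev]
    return updated


def connection_cycle(a, b):
    adv_a, adv_b = advertise("A", a), advertise("B", b)
    return (apply_introductions("A", a, "B", adv_b),
            apply_introductions("B", b, "A", adv_a))


# Normal introduction still works: B added D by hand, A learns it from B.
a, b = connection_cycle({}, {"D": ()})
learned = (a, b)

# The problem case: A holds C via B, but B has already removed C.
a, b = connection_cycle({"C": ("B",)}, {})
settled = (a, b)
print(learned, settled)
```

With the chain attached, B can see that the C it is being offered originally came from B itself, so it refuses it, and A drops C in the same cycle. The cost is that every introduction now has to carry its history.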

vincentardern · October 2, 2019, 2:34am

Is it considered good practice to only have one introducer in most networks or am I making too much of an assumption about other people’s networks here?

nekr0z · October 2, 2019, 4:03am

I see, yes, this could very well be the flip-flop situation, and I might have mistaken two cycles for one.

nekr0z · October 2, 2019, 4:27am

I think you are.

I’m not explicitly saying the “A to B, B to A” configuration I was dealing with is a sane thing for the general public (it actually took me a while to understand what was going on because this particular swarm has been growing organically for quite some time, and no one had noticed this clash between A and B). So this exact situation may be looked at as a misconfiguration after all.

Only one introducer for a given swarm is generally a good idea, and one definitely should try to stick to it. However, there is a known notion that for big swarms with a lot of machines it makes sense to go from “everyone connects to everyone” to a tree-like network structure (especially if some machines are weaker ones, as CPU and memory overhead becomes significant). As @AudriusButkevicius mentions above, it’s a good idea then to make sure the introducer network is a tree-like one, too (hmmm, we should definitely document this somewhere, shouldn’t we?). But…

But as adoption grows (and as swarms grow) beyond control of one admin or group of admins, all kinds of introducer loops may be inadvertently created.

I have yet to find a way to solve the introducer loop dilemma or at least alert the user to introducer loops (which would be useful but may not be possible), but I think it’s quite clear that at least some documentation on the topic would be nice. I’ll make that happen as soon as I have some spare time, and for now there’s this forum thread for those googling for the solution.
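For what it’s worth, if someone did have the whole introducer graph in one place, spotting a loop is a plain cycle search. The sketch below assumes exactly that (a hand-written map of “X is introducer for Y”), which no single device actually has, so treat it as an admin-side sanity check idea and not as something a device could run on its own. The function name and the input format are made up for this example.

```python
def find_introducer_loop(introducer_for):
    """Return one loop as a list of device names, or None if there is none.

    introducer_for maps a device to the devices that trust it as introducer,
    e.g. {"J": ["K"]} means "J is introducer for K".
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in introducer_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: that closes a loop.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                loop = visit(nxt)
                if loop:
                    return loop
        path.pop()
        color[node] = BLACK
        return None

    for node in list(introducer_for):
        if color.get(node, WHITE) == WHITE:
            loop = visit(node)
            if loop:
                return loop
    return None


ring = {"J": ["K"], "K": ["L"], "L": ["M"], "M": ["J"]}
mutual = {"A": ["B"], "B": ["A", "C"]}
tree = {"B": ["A", "C"]}

print(find_introducer_loop(ring))
print(find_introducer_loop(mutual))
print(find_introducer_loop(tree))
```

The `ring` and `mutual` inputs are the J/K/L/M ring and the A/B setup from earlier in this thread; `tree` is the single-introducer layout.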

vincentardern · October 2, 2019, 11:16pm

I agree that a warning, or at least some documentation, would be nice. I’m asking questions to try and help sort out what the best practice might be, so that whoever writes the documentation has somewhere to start.
