Connection dropped due to "decompressing message: lz4: invalid source or destination buffer too short" and "unmarshalling message: string field contains invalid UTF-8"

Today, I’ve been getting these particular errors on a Windows and an Android device, which basically breaks synchronisation completely.

What happens is as soon as the two devices connect, the connection gets dropped immediately, and the log gets filled with the following on Android:

Connection to 3KHRD6O at 192.168.50.2:22000-192.168.50.253:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10-61VQ1QRVBC86IF90LCDSIAQSCQ closed: decompressing message: lz4: invalid source or destination buffer too short

At the same time, the Windows side that is trying to send the file in question has these errors logged:

Connection to VZZQ7PU at 192.168.50.253:22000-192.168.50.2:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10-61VQ0HKLSP60Q2JNAEGEKKQJE6 closed: unmarshalling message: string field contains invalid UTF-8

Syncthing is v2.0.0-rc14 (arm64 on Android, and amd64 on Windows).

Have you got any idea what may the culprit be here? I’ve never seen these kind of errors before :frowning:.

In fact I have seen these, though unfortunately I have as little clue to the cause as you do.

I got the “invalid utf-8” connection failures between my main Linux box and a closet Raspberry Pi and spent quite some time debugging it. Initially I expected some file naming issue but it turned out it was two bytes of the folder ID in the cluster config message that were overwritten with binary garbage – i.e., memory corruption of some kind. (That folder ID comes from the config, not the database, fwiw.) I didn’t manage to narrow down the cause, though it was reproducible for a while accross restarts. It seemed a little too reproducible to be plain hardware error.

Looking back in the logs on that rpi5 I do see some lz4 failures as well.

Linux-arm64 could be the common factor between our setups.

In fact this happened right now, again, on that same rpi5 after a restart with some new files to sync. So it definitely has something bad going on…

It’s something at the protocol/compression layer, not sure what. I tested builds with both the C and Go versions of SQLite with no difference, and with Go 1.24 and 1.23. Adding some debug messages, we have the sending side making a request (prior to compression):

[3C3QQ] 2025/05/16 07:48:05 INFO: Sending message: type:MESSAGE_TYPE_REQUEST id:1 folder:"ml3yu-v9kaa" name:"Mortal Kombat Annihilation (1997)/Mortal Kombat Annihilation (1997) Bluray-1080p.mkv" offset:1472200704 size:4194304 hash:"\x18\x9d݁\x1a\x17\\\xd0.ч\x049b\x98\xee>kP\xbf\xc9\xd0x[\xe2\x1a\xc3\x02\xc1\x85\xed\xfb" block_no:351
[3C3QQ] 2025/05/16 07:48:05 INFO: Marshalled:
00000000  08 01 12 0b 6d 6c 33 79  75 2d 76 39 6b 61 61 1a  |....ml3yu-v9kaa.|
00000010  54 4d 6f 72 74 61 6c 20  4b 6f 6d 62 61 74 20 41  |TMortal Kombat A|
00000020  6e 6e 69 68 69 6c 61 74  69 6f 6e 20 28 31 39 39  |nnihilation (199|
00000030  37 29 2f 4d 6f 72 74 61  6c 20 4b 6f 6d 62 61 74  |7)/Mortal Kombat|
00000040  20 41 6e 6e 69 68 69 6c  61 74 69 6f 6e 20 28 31  | Annihilation (1|
00000050  39 39 37 29 20 42 6c 75  72 61 79 2d 31 30 38 30  |997) Bluray-1080|
00000060  70 2e 6d 6b 76 20 80 80  80 be 05 28 80 80 80 02  |p.mkv .....(....|
00000070  32 20 18 9d dd 81 1a 17  5c d0 2e d1 87 04 39 62  |2 ......\.....9b|
00000080  98 ee 3e 6b 50 bf c9 d0  78 5b e2 1a c3 02 c1 85  |..>kP...x[......|
00000090  ed fb 48 df 02                                    |..H..|

and the receiving side getting it (post decompression):

[PSEUD] 2025/05/16 06:48:05 INFO: Error in received message type:MESSAGE_TYPE_REQUEST  compression:MESSAGE_COMPRESSION_LZ4:
00000000  08 01 12 0b 6d 6c ff 24  75 2d 76 39 6b 61 61 1a  |....ml.$u-v9kaa.|
00000010  54 4d 6f 72 74 61 6c 20  4b 6f 6d 62 61 74 20 41  |TMortal Kombat A|
00000020  6e 6e 69 68 69 6c 61 74  69 6f 6e 20 28 31 39 39  |nnihilation (199|
00000030  37 29 2f 4d 6f 72 74 61  6c 20 4b 6f 6d 62 61 74  |7)/Mortal Kombat|
00000040  20 41 6e 6e 69 68 69 6c  61 74 69 6f 6e 20 28 31  | Annihilation (1|
00000050  39 39 37 29 20 42 6c 75  72 61 79 2d 31 30 38 30  |997) Bluray-1080|
00000060  70 2e 6d 6b 76 20 80 80  80 be 05 28 80 80 80 02  |p.mkv .....(....|
00000070  32 20 18 9d dd 81 1a 17  5c d0 2e d1 87 04 39 62  |2 ......\.....9b|
00000080  98 ee 3e 6b 50 bf c9 d0  78 5b e2 1a c3 02 c1 85  |..>kP...x[......|
00000090  ed fb 48 df 02                                    |..H..|
[PSEUD] 2025/05/16 06:48:05 INFO: Lost primary connection to 3C3QQU4 at 172.16.32.4:22000-172.16.33.198:22000/tcp-server/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10-61VTSDU8LUB90K3L4OIT4HE27G: unmarshalling message: string field contains invalid UTF-8 (0 remain)

Note how bytes 7 & 8 in the message, which are in the middle of the folder ID (ml3yu-v9kaa), change from 33 79 (3y) to ff 24

2 Likes

Disabling compression on both sides makes the problem go away…

1 Like

Thanks a lot for investigating this quickly and all the information!

Maybe yet another issue similar to https://github.com/syncthing/syncthing/issues/8484? Although there it was related to the 32-bit i386 architecture.

Just for the record, I do use “All Data” compression for the connection between these two devices specifically.

Possibly, though this one doesn’t reproduce trivially for me.

Maybe with a specific type of file? Unfortunately, I cannot share the one causing the issue for me, as it’s a large zip archive with a lot of private information.

Yeah, no, I mean I have the issue on my own data above, it’s just that when writing a test on that specifically in the lz4 package I don’t see an issue. So I’m still not sure what causes it, specifically.

1 Like

I had this happen on v14…

[RTF25] 2025/05/07 17:30:02 INFO: Lost primary connection to 3QBOJOM at 192.168.16.118:22000-188.xxx.xxx.104:22000/tcp-server/TLS1.3-TLS_AES_128_GCM_SHA256/WAN-P30-61T9EID581F98AB426R5JKSN3I: decompressing message: lz4: invalid source or destination buffer too short (0 remain)

this happened across most of the remote devices causing loads of disconnects. It got so bad that I have been back on v1 since the 7th. I figured as i’m an extreme user it might have been down to resources so been waiting to see if anyone else reported it

My set up is windows to windows and compression set to all data on both ends

edit: only the receiving end is on v2, the sending ends are all on v1

When you say Windows, you mean Windows on x86-64 and not a Windows-on-ARM thing, right? Would rule out an ARM-specific bug if this happens on x86.

1 Like

pure intel core i 64bit windows, all regular desktops

1 Like

You got the decompression error on the v2-beta sida, and downgrading that side resolved it?

1 Like

yes

Also, the issue only started on v14

2 Likes

That’s my impression as well, yet there are no real changes to the protocol code in v2 (rc.14 or any other). This very much feels like a problem caused by corruption on the message sending side, or something very low level in the TLS stack which is also worrying.

1 Like

FWIW it doesn’t easily reproduce with a few hundred gigs of synthetic data on my boxes :confused:

1 Like

I will migrate back up to v2-rc14 to do further testing. might be a while !!

1 Like

I’m testing under Windows now, with both “devices” running Syncthing v2.0.0-rc.14 (amd64), using the exact same file that caused problems on Android, and it also fails here. One difference between this and what happened on Android is that the Windows devices don’t disconnect, only the file transfer stops.

The file is stuck in transfer on the receiving side as below:

and the log complains about hash mismatch:

2025-05-16 18:42:30 request: jptt5-eglh2 uj~20250515-123351.zip 30670848 262144 hash mismatch
2025-05-16 18:42:30 request: jptt5-eglh2 uj~20250515-123351.zip 49283072 262144 hash mismatch
2025-05-16 18:42:31 receiveonly/jptt5-eglh2@0xc00062a008 closing uj~20250515-123351.zip
2025-05-16 18:42:31 progress emitter: bytes completed for jptt5-eglh2: 26993247
2025-05-16 18:42:31 sharedPullerState jptt5-eglh2 uj~20250515-123351.zip pullNeeded done -> 85
2025-05-16 18:42:31 sharedPullerState jptt5-eglh2 uj~20250515-123351.zip pullNeeded done -> 84
2025-05-16 18:42:31 receiveonly/jptt5-eglh2@0xc00062a008 new error for uj~20250515-123351.zip: finishing: pull: hash mismatch b8f7b9936bb378961d3f7c2e638d2c20331d38a81482e9a87792120150cc5c05 != 6b880559688b8c957791cdc739e9cf22c751ebab9195ed7d0aac0e1a2bd16a21

This happens with “Metadata Only” compression on both sides. I can provide debug logs, etc. if necessary.

Edit:

Something very strange seems to be going on. I removed the file, re-added it, and tried to sync again. That time, there were no issues. Then, I repeated the same process one more time, and now it fails with the connection between the two devices getting dropped immediately with a flood of these errors:

2025-05-16 19:01:16 Lost primary connection to DHK2GDE at [::]:50440-[fe80::de3d:ad5d:fa83:f4bc%Wi-Fi 2]:51355/quic-server/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P20-620228PAV43T0MC93L2933SNII: unmarshalling header: proto: cannot parse invalid wire-format data 0803000000550813120b6a707474352d65676c68321a16756a7e32303235303531352d3132333335312e7a6970208080803b28808010322008a05aa75761bc4503a1cd638aab851c9f9932c54e4811e723eb97ecd9a78e3548d80300020803000000550814120b6a707474352d65676c68321a16756a7e32303235303531352d3132333335312e7a6970208080903b2880801032202db37b3b6f758fa17399e37b23c4809072a53a895976c889ff41949932bed86e48d90300020803000000550815120b6a707474352d65676c68321a16756a7e32303235303531352d3132333335312e7a6970208080a03b288080103220269ce4d3f8e4d27293a1b76f42da9d3af4a11caa6e5190375e7cc716a23e61b548da0300020803000000550816120b6a707474352d65676c68321a16756a7e32303235303531352d3132333335312e7a6970208080b03b28808010322014fc5e8f4011973212f207a5e689db841745e430368233f0e81d06bb2ae54b8148db0300020803000000550817120b6a707474352d65676c68321a16756a7e32303235303531352d3132333335312e7a6970208080c03b28808010322061d04db1b8246a465361582db2d624e857b4070b5a15249c4537c7b1878a72fa48dc0300020803000000550818120b6a707474352d65676c68321a16756a7e32303235303531352d3132333335312e7a6970208080d03b2880801032201951e12d7b261840e3ecf7d675ee14b0300a018216b2add3a3f7338b279a44d348dd0300020803000000550819120b6a707474352d65676c68321a16756a7e32303235303531352d3132333335312e7a6970208080e03b28808010322020099494ecb9b7ffef6005b7320a76689c0fb60440c1a307c6da9729da67a94c48de030002080300000055081a120b6a707474352d65676c68321a16756a7e32303235303531352d3132333335312e7a6970208080f03b2880801032201dc9995bc641a674dbf3b1fdec9811d3c91203c929526f4e5255b494bcb947e648df030002080300000055081b120b6a707474352d65676c68321a16756a7e32303235303531352d3132333335312e7a6970208080903c288080103220295512a9d9abfb14c6a360b8321b8d4d492499d1885e5ae620a69cdf88cfb23848e1030002080300000055081c120b6a707474352d65676c68321a16756a7e32303235303531352d3132333335312e7a6970208080a03c288080103220752788a3616dc38534f03769a19bfe6c58260ebf07108cbca856a33b5c60fb8e48e2030002080300000055081d120b6a707474352d65676c68321a16756a7e32303235303531352d3132333335312e7a6970208080b03c2880801032208ae458284b644144811444af00e87af07ec21762b3c38927c3b8aecf60b0d64048e3030002080300000055081e120b6a707474352d65676c68321a16756a7e32303235303531352d3132333335312e7a6970208080c03c288080103220dd8e1beea5b71511ed568108c27bff26b2649d0485792f6720189740c9be19d648e4030002080300000055081f120b6a707474352d65676c68321a16756a7e32303235303531352d3132333335312e7a6970208080803c288080103220b51bab75c55722d832eb18d91d3337b1cbb8127e0ab56d44e1054df1c21e580248e00300020803000000550820120b6a707474352d65676c68321a16756a7e32303235303531352d3132333335312e7a6970208080d03c288080103220ada6cb55747d9ee076d5c96f943e31b57214365b0bce3f1fa4af6b18af6dfc0948e50300020803000000550821120b6a707474352d65676c68321a16756a7e32303235303531352d3132333335312e7a6970208080e03c288080103220d111da9fae813c8142984bc0d2609780190041f719116ad3daa00524917aee7848e60300020803000000550822120b6a707474352d65676c68321a16756a7e32303235303531352d3132333335312e7a6970208080f03c288080103220cf75b758ae4aa315b14f8fadcc82bbe02c191a3f1f6fe74f7f0bc4f8bf47b95948e70300020803000000550823120b6a707474352d65676c68321a16756a7e32303235303531352d3132333335312e7a6970208080803d288080103220eb7b3651a02f9dbbb5ecf003e3cc101e377cbc1e444d29ca12c68dc6dfeb228e48e80300020803000000550824120b6a707474352d65676c68321a16756a7e32303235303531352d3132333335312e7a6970208080903d288080103220105f972b44d48b95f846fa8127134e4183742097c551d65c01e4203161cab33648e90300020803000000550825120b6a707474352d65676c68321a16756a7e32303235303531352d3132333335312e7a6970208080a03d28808010322034d7bb2c5f0072473632edd7860530cf877ffb1c449739c79c177985b2a0aef948ea0300020803000000550826120b6a707474352d65676c68321a16756a7e32303235303531352d3132333335312e7a6970208080b03d2880801032206c362217485bd8f92db5fe732b23dc36aadefe7516a36cf46f40ae989363a15a48eb0300020803000000550827120b6a707474352d65676c68321a16756a7e32303235303531352d3132333335312e7a6970208080c03d288080103220a882fce3b88050d5945f3cdcd4d974ac94e2d3298e017fb01832275c25cae38548ec0300020803000000550828120b6a707474352d65676c68321a16756a7e32303235303531352d3132333335312e7a6970208080d03d2880801032207a0fa0628f15f4c84a71d42b01c8a33894d4876e3506d48ca2ff501a71fbc80148ed030002080300000055 (0 remain)
1 Like

So essentially something is causing on-the-wire corruption of messages. If it’s file data, it gets caught by hashing, if it corrupts strings in protobuf messages it gets caught by that, etc.

Since you can provoke this reasonably reliably, can you check that this really doesn’t happen with current main (i.e., v1)? (I know proving a negative is impossible, but just some checks to see if it seems likely…)

1 Like

Yeah, with the exact same config, I replaced the binaries with v1.29.7-dev.6.g1dd26489 and tried to sync the file about 10 times (by repeating the same copy-sync-delete steps). No problems at all.

Next, I shut Syncthing v1 down, replaced the binaries with v2 again, and the file got stuck in transfer right on the very first Syncthing launch :confused:.

2 Likes