How to generate dummy device IDs?

doronbehar · October 11, 2023, 9:18pm

I’m trying to set up a VM to test how Syncthing handles many devices added via the API. Initially I tried reusing one device ID multiple times, replacing the last character with 1, 2, etc. as needed. This did not work: only a single device with the original ID was added successfully. My suspicion is that this is due to some kind of check bit or check digit, but I’m not sure.

I also read this paragraph from here:

The hashing results in a 256 bit hash which we encode using base32. Base32 encodes five bits per character so we need 256 / 5 = 51.2 characters to encode the device ID. This becomes 52 characters in practice, but 52 characters of base32 would decode to 260 bits which is not a whole number of bytes. The base32 encoding adds padding to 280 bits (the next multiple of both 5 and 8 bits) so the resulting ID looks something like:
MFZWI3DBONSGYYLTMRWGC43ENRQXGZDMMFZWI3DBONSGYYLTMRWA====
The padding (====) is stripped away, the device ID split into four groups, and check digits are added for each group. For presentation purposes the device ID is grouped with dashes, resulting in the final value:
MFZWI3D-BONSGYC-YLTMRWG-C43ENR5-QXGZDMM-FZWI3DP-BONSGYY-LTMRWAD

I don’t understand the base32 math… I tried to use this online calculator with the output of the command echo bla | sha256sum, and I got a weird result:

734OD6S3BPOCKK8MMDAG7R5C3HGUGDPTG1E1ISH0A020K3LHJV

And I didn’t see what the docs mention about the padding and the non-integer number of bits. My simple mathematical sense also tells me that the length I got for the base32 sha256 hash makes sense, because:

$ echo bla | sha256sum | cut -d' ' -f1 | wc -c
65

While the number of characters in the result I got from the website is 51. This satisfies the simple calculation of 65*4/5 = 51, where 4 is the number of bits per base16 character, and 5 is the number of bits per base32 character.

Can anyone find the mistake in my math?

gadget · October 11, 2023, 9:37pm

The key is actually in the paragraph immediately before the paragraph you quoted:

Understanding Device IDs: To form a device ID the SHA-256 hash of the certificate data in DER form is calculated. This means the hash covers all information under the Certificate: section above.

In other words, the device ID is essentially a fingerprint of the TLS certificate.

In practical terms, the probability of making up an arbitrary device ID that matches a TLS certificate you have in hand is close to zero.

doronbehar · October 11, 2023, 9:53pm

Thanks for trying to help @gadget. I understand that the certificate is the data source for the device ID. My difficulty is with the base16 to base32 conversion. Whether the certificate is the input for the sha256 function, or if it’s echo bla or echo hello world, shouldn’t matter for the discussion.

I don’t want to create an arbitrary device ID that will match a TLS certificate at hand. I don’t care about the certificates, I just want a valid, but not real, device ID.

Nummer378 · October 11, 2023, 9:57pm

You’re probably base32 encoding the hex string of the SHA256 output, but you should base32 encode the raw bytes instead. This gives you different lengths and data.
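
The difference can be seen with a few lines of Python. This is only a sketch using the standard hashlib and base64 modules, with "bla\n" standing in for the output of echo bla. Note that the 65 reported by wc -c above includes the trailing newline, so the hex digest itself is 64 characters. Also, RFC 4648 base32 only uses A-Z and 2-7, so the 0, 1 and 8 in the calculator result suggest it did a numeric base conversion instead; that part is a guess.

```python
import base64
import hashlib

# "echo bla" writes the text plus a trailing newline
digest = hashlib.sha256(b"bla\n").digest()   # 32 raw bytes
hex_text = digest.hex()                      # 64 hex characters

# base32 of the raw bytes: what the documentation describes
from_raw = base64.b32encode(digest).decode()
# base32 of the hex *text*: a different, much longer string
from_hex = base64.b32encode(hex_text.encode()).decode()

print(len(digest), len(hex_text))               # 32 64
print(len(from_raw), len(from_raw.rstrip("=")))  # 56 52
print(len(from_hex.rstrip("=")))                 # 103
```

Only the first variant gives the 52 characters plus ==== padding that the docs talk about.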

dave22 · October 12, 2023, 2:24pm

Or you could do it the brute-force way:

Install syncthing on a stand-alone test system.
Start syncthing.
Extract its newly generated device ID.

[ grep "device id=" .config/syncthing/config.xml ]
Kill syncthing and delete its config.xml and pem certificate files.

[ pkill -9 syncthing ; rm .config/syncthing/*.{xml,pem} ]
Restart syncthing and loop through the above extraction procedure for as many device IDs as you need.

This is on a linux system. Mod as needed for a M$, or other system type. And you could probably script this if you are looking for a large quantity…

calmh · October 12, 2023, 2:38pm

% python3 -c 'import base64;\
  import random;\
  print(base64.b32encode(bytearray([random.getrandbits(8) for _ in range(32)])).decode().replace("=",""))'
LMB6QNQ6PHBS74RGBKJUHUP4V5LXCXNKK2H3WW7JLIX5Z45P5PDQ

No check digits, because those are annoying and you don’t need them.

Or, if you want a “real” one:

% (syncthing generate --home /tmp/foo | grep ID: | sed 's/.*ID: *//') && rm -rf /tmp/foo
6YBSGRD-NPLTJXE-EMGVRUW-BBUO7M6-AC73E7M-27RIL7D-LR3TPQY-UI6HZQQ
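
If many IDs are needed, the syncthing generate one-liner above could be looped from a script. A minimal sketch, assuming syncthing is on the PATH and that the ID appears somewhere in the command's output in the dashed form shown above (eight groups of seven characters). The helper names extract_device_id and generate_device_ids are made up for illustration.

```python
import re
import shutil
import subprocess
import tempfile

# Eight dash-separated groups of seven base32 characters (A-Z, 2-7)
ID_PATTERN = re.compile(r"\b(?:[A-Z2-7]{7}-){7}[A-Z2-7]{7}\b")


def extract_device_id(output: str) -> str:
    """Return the first device-ID-shaped string found in the given text."""
    match = ID_PATTERN.search(output)
    if match is None:
        raise ValueError("no device ID found in output")
    return match.group(0)


def generate_device_ids(count: int, binary: str = "syncthing") -> list[str]:
    """Run 'syncthing generate' in throwaway home directories, count times."""
    ids = []
    for _ in range(count):
        home = tempfile.mkdtemp(prefix="st-dummy-")
        try:
            result = subprocess.run(
                [binary, "generate", "--home", home],
                capture_output=True, text=True, check=True,
            )
            # Search both streams, since where the ID is printed is an assumption
            ids.append(extract_device_id(result.stdout + result.stderr))
        finally:
            # Same cleanup as the "rm -rf /tmp/foo" above
            shutil.rmtree(home, ignore_errors=True)
    return ids
```

Calling generate_device_ids(100) would then return 100 distinct, fully valid IDs, at the cost of generating a certificate for each one.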

doronbehar · October 12, 2023, 8:45pm

Thanks @calmh for this idea, I’m sure it’ll work! Still just for the fun of it:

I did notice that the check digits are missing. I don’t know the algorithm for computing them, and I do want them. Do you know if the classic sha256sum command adds them? I ran this code to split the output of the Python snippet into groups of 7 characters:

#!/usr/bin/env python

from base64 import b32encode
from random import getrandbits
count = 0
for c in b32encode(bytearray([getrandbits(8) for _ in range(32)])).decode().replace("=", ""):
    count+=1
    print(c, end="-" if count % 7 == 0 else "")

And it prints an ID with 4 characters missing:

TS5LWYA-2C53AKU-OWMVLFE-JGRBMB6-2OJ424T-VPCGHVL-6CE6YKN-MAA

calmh · October 13, 2023, 7:10am

The check digits are a Syncthing addition to better handle manually typed device IDs. They are not created by sha256 etc.

doronbehar · October 14, 2023, 11:17pm

Hmm, interesting! If that’s so, doesn’t it mean the documentation is a bit wrong? If all of the characters besides the last 4 are random (as random as the sha256 algorithm), then what is the story behind the non-integer number of bits per character and so on?

I also took the sha256 converted to base32 from my first comment in this thread and split it into 7-character sections (first line below), next to a real ID (second line):

734OD6S-3BPOCKK-8MMDAG7-R5C3HGU-GDPTG1E-1ISH0A0-20K3LHJ-V
MFZWI3D-BONSGYC-YLTMRWG-C43ENR5-QXGZDMM-FZWI3DP-BONSGYY-LTMRWAD

Does this mean 6 characters are reserved for check bits? That sounds like too much to me…

This topic is not so important to me anymore, as the syncthing generate command you gave works great. I just wish the documentation were correct :). Thanks again.

Nummer378 · October 14, 2023, 11:58pm

If we take the bytes of a SHA256 output - and not any character representation or base conversions - all of the documentation checks out:

# echo "foo" | openssl dgst -binary -sha256 | base32
WW5Z3AAUUD43DVQ6EHTZNV4NZTPRGUXSHTJSQEXUQUFYPCXESRGA====

Remove the padding as instructed by the documentation and you get exactly 52 base32 characters as foretold by the docs.

Now, compute 4 check digits (each covering 1/4 of the 52 characters) and place them according to the following scheme (see this post [v0.9.0] New node ID format, also linked in the docs):

aaaaaaa-aaaaaaA-bbbbbbb-bbbbbbB-ccccccc-ccccccC-ddddddd-ddddddD

For our example, we get (check digits replaced by ?):

WW5Z3AA-UUD43D?-VQ6EHTZ-NV4NZT?-PRGUXSH-TJSQEX?-UQUFYPC-XESRGA?
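
For anyone who wants to fill in the ? positions: the thread does not spell out the check-digit algorithm, so the following is only a sketch. It assumes a Luhn mod 32 variant over the alphabet A-Z, 2-7, processed left to right and starting with factor 1. Under that assumption it reproduces the documentation example quoted in the first post. The function names luhn32 and format_device_id are chosen here for illustration.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def luhn32(group: str) -> str:
    """Check character for one group (assumed Luhn mod 32 variant)."""
    n = len(ALPHABET)
    factor = 1
    total = 0
    for ch in group:
        addend = factor * ALPHABET.index(ch)
        factor = 1 if factor == 2 else 2      # alternate 1, 2, 1, 2, ...
        total += addend // n + addend % n     # sum the two base-32 "digits"
    return ALPHABET[(n - total % n) % n]


def format_device_id(raw: str) -> str:
    """Turn 52 base32 characters into the dashed 8x7 form with check digits."""
    raw = raw.rstrip("=")
    if len(raw) != 52:
        raise ValueError("expected 52 base32 characters")
    # Four groups of 13 characters, each followed by its check character
    groups = [raw[i:i + 13] for i in range(0, 52, 13)]
    checked = "".join(g + luhn32(g) for g in groups)
    # 56 characters, presented as eight groups of seven
    return "-".join(checked[i:i + 7] for i in range(0, 56, 7))


print(format_device_id("MFZWI3DBONSGYYLTMRWGC43ENRQXGZDMMFZWI3DBONSGYYLTMRWA===="))
# → MFZWI3D-BONSGYC-YLTMRWG-C43ENR5-QXGZDMM-FZWI3DP-BONSGYY-LTMRWAD
```

Passing the WW5Z3AA... string from above to format_device_id would fill in its four check positions the same way. This also accounts for the length: 52 characters plus 4 check characters is 56, which splits evenly into 8 groups of 7.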

doronbehar · October 15, 2023, 8:24am

Thanks for the info! I hadn’t noticed this link when I originally posted the question… I won’t go ahead and implement the algorithm in Python just for fun, but it’s nice to know. Thanks again!