Generating a Short Random ID in MySQL

From time to time, I have had to go back to a legacy database and for one reason or another add a non-integer surrogate key. Sometimes this has come up when working on an old API that exposes a primary key in the URL (GET /account/123). This generally can be bad for security, as a user could attempt to access other resources by incrementing the ID (GET /account/124).

Generating a random string as a surrogate key in code can be easy enough, by using a UUID or a class like RandomStringUtils (careful, look up the Scunthorpe problem). But if you have lots of rows in a database that need to be updated, you will likely end up backfilling values in the database itself.

With Postgres 13, this is fairly simple:

select gen_random_uuid()::varchar
from some_table

UUID

7fc2113c-d471-4138-8e8e-b8aba6bcf3fc

b14bdeec-81eb-466f-b5b0-a2440e05b667

645aa4d2-4c3e-48ee-adb5-2d28f2c33ec9

MySQL's UUID algorithm is unfortunately a bit more predictable. It generates values that are sequential or near-sequential when run on the same database server at approximately the same time. Only the last characters of the first grouping will change. So, if you do a substring and select 6 characters, or only select the last 12 characters, you will end up with duplicate values. The same is true for the uuid_short function - the values will be sequential.

Some possible solutions in MySQL is to either run the UUID through a hash like sha, or use random_bytes. Just be sure to chose enough bytes to provide room for sufficient randomness to reduce the chance of collisions, and so that the probability of sequential IDs are low (ie: AABBCC, AABBCD). With hex (0-9,A-F), 6 characters allows for 16 million unique combinations, 8 over 4 billion, and 10 characters over 1 trillion.

select UUID(),
       uuid_short(),
       substr(sha(UUID()) from 1 for 6),
       hex(random_bytes(3))
from some_table

UUID	UUID_SHORT	SHA	RANDOM_BYTES
d54aced3-cfd3-11ec-afbc-0242ac130002	27717705796681743	2f18bd	43E6A4
d54ad161-cfd3-11ec-afbc-0242ac130002	27717705796681744	02b27c	7BD5CD
d54ad645-cfd3-11ec-afbc-0242ac130002	27717705796681745	85620e	95835B
d54ad7d7-cfd3-11ec-afbc-0242ac130002	27717705796681746	0c2a7b	26D02E

Just keep in mind that relying on security by obscurity, by having a long identifier in a URL that would be hard to guess, is not the only solution you should rely on. Sensitive or protected resources should still be protected by good authentication and authorization.

Related Posts

Leave a ReplyCancel reply