I recently discovered another form of UUID that I’m pretty excited about: ULIDs. ULID stands for Universally Unique Lexicographically Sortable Identifier (so, UULSIDs?), but what does that actually mean? Like UUIDs, ULIDs are 128-bit identifiers that can safely be treated as unique. As with other UUIDs, collisions are possible in theory but vanishingly unlikely in practice.
What sets ULIDs apart is that clients can generate them in sortable, roughly sequential order (with millisecond precision) without any central coordination. This is achieved by dedicating the first 48 bits to a Unix timestamp in milliseconds and the remaining 80 bits to a random value.
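To make the layout concrete, here is a minimal Python sketch that packs a ULID into a single 128-bit integer. It assumes the current Unix time in milliseconds for the timestamp and Python’s secrets module for the 80 random bits; new_ulid_int is a hypothetical name, not part of any ULID library.

```python
import secrets
import time

TIMESTAMP_BITS = 48  # Unix time in milliseconds
RANDOM_BITS = 80     # randomness that makes collisions unlikely


def new_ulid_int(timestamp_ms=None):
    """Hypothetical helper: build a ULID as a 128-bit integer."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    if not 0 <= timestamp_ms < 2 ** TIMESTAMP_BITS:
        raise ValueError("timestamp does not fit in 48 bits")
    randomness = secrets.randbits(RANDOM_BITS)
    # Timestamp in the high 48 bits, random value in the low 80 bits.
    return (timestamp_ms << RANDOM_BITS) | randomness


earlier = new_ulid_int(1_000)
later = new_ulid_int(1_001)
# A later millisecond always sorts higher, whatever the random bits are.
assert earlier < later
assert later >> RANDOM_BITS == 1_001
```

Two values created in the same millisecond compare in random order with this sketch. The ULID specification describes a monotonic mode that increments the random part instead, for cases where ordering within a millisecond matters.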
Another feature that sets ULIDs apart from other UUIDs is how they are usually presented. A typical UUID is written as a 36-character hexadecimal string with hyphens between segments, while a ULID is written as a 26-character string with no hyphens, using Crockford’s Base32 encoding. This encoding makes the value more compact and reduces the risk of transcription errors: it contains no special characters, and the letters I, L, O, and U are left out of the alphabet, so look-alike pairs such as O and 0 cannot be confused.
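The encoding step can be sketched in a few lines of Python. This assumes the ULID is already available as a 128-bit integer; encode_ulid is a hypothetical helper name. Each character carries 5 bits, so 26 characters hold 130 bits, and the top 2 bits are always zero, which is why the first character is never above 7.

```python
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # no I, L, O, or U


def encode_ulid(value: int) -> str:
    """Encode a 128-bit integer as a 26-character Crockford Base32 string."""
    if not 0 <= value < 2 ** 128:
        raise ValueError("a ULID is a 128-bit unsigned integer")
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])  # take the lowest 5 bits
        value >>= 5
    return "".join(reversed(chars))  # most significant character first


print(encode_ulid(2 ** 128 - 1))  # 7ZZZZZZZZZZZZZZZZZZZZZZZZZ
```

Because the alphabet is in ASCII order and every encoded ULID has the same length, comparing the strings gives the same order as comparing the underlying integers. That is what makes ULIDs lexicographically sortable.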
Combined, these features create some desirable properties. Identifiers can be created and assigned in a distributed fashion without consulting a central authority. This means my application can build graphs of related records offline before persisting them, and in general it reduces the number of round trips I need to make to the database. A classic example in service-oriented (SOA) applications is creating an identifier for an entity on the client and then using that identifier to persist data to multiple services without waiting for a response from any of them.
Since the identifiers are roughly sequential, inserting them into a database index tends to be much more efficient than inserting random UUIDs: new values land near the end of the index instead of at random positions, which speeds up inserts and reduces page splits and fragmentation. In practice this can mean faster inserts, faster queries, and less disk space used, usually with little downside. Also, because the timestamp is encoded into the identifier, we automatically get a “created on” value that we can use when querying the data. The main weakness of this design is that the identifier reveals that timestamp, so if you send it to a client, they can infer at least when the record was created. This is acceptable in most cases, but it is something to keep in mind.
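Reading the “created on” value back out only requires the first 10 characters, which hold the 48-bit timestamp (10 × 5 = 50 bits, with the top 2 always zero). The following Python sketch assumes a well-formed ULID string; ulid_created_at is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ulid_created_at(ulid: str) -> datetime:
    """Hypothetical helper: decode the timestamp from the first 10 characters."""
    if len(ulid) != 26:
        raise ValueError("a ULID string has 26 characters")
    ms = 0
    for ch in ulid[:10].upper():
        # str.index raises ValueError for characters outside the alphabet.
        ms = ms * 32 + CROCKFORD.index(ch)
    return UNIX_EPOCH + timedelta(milliseconds=ms)


print(ulid_created_at("01ARZ3NDEKTSV4RRFFQ69G5FAV"))
```

Since the timestamp leads the identifier, a filter on creation time can, in principle, be expressed as a range over the identifiers themselves.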
The choice of encoding also has advantages over other UUID formats. First, because the value is shorter and contains no hyphens or easily confused characters, it is much easier to read and transcribe. Second, the value is compact and URL-safe (it contains only digits and uppercase letters), which makes it ideal for web applications, where it is common to put the identifier in the URL path.
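Crockford’s Base32 also defines forgiving decoding: input is case-insensitive, I and L are read as 1, and O is read as 0. Whether a particular ULID library applies these rules varies, so treat this Python sketch (normalize_ulid is a hypothetical name) as one possible way to accept hand-typed identifiers, for example from a URL path.

```python
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
# Crockford's decoding rules map look-alike letters to the digits they resemble.
LOOK_ALIKES = {"I": "1", "L": "1", "O": "0"}


def normalize_ulid(text: str) -> str:
    """Return the canonical uppercase form of a possibly hand-typed ULID."""
    cleaned = []
    for ch in text.strip().upper():
        ch = LOOK_ALIKES.get(ch, ch)
        if ch not in CROCKFORD:
            raise ValueError(f"invalid ULID character: {ch!r}")
        cleaned.append(ch)
    if len(cleaned) != 26:
        raise ValueError("a ULID has 26 characters")
    if cleaned[0] > "7":
        raise ValueError("value does not fit in 128 bits")
    return "".join(cleaned)


print(normalize_ulid("o1arz3ndektsv4rrffq69g5fav"))  # 01ARZ3NDEKTSV4RRFFQ69G5FAV
```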
A Word Of Warning
A word of warning, though: be careful about storing ULIDs in SQL Server’s uniqueidentifier type. A ULID is a 128-bit value and otherwise meets the requirements of that type, but you may not gain the advantages of sequential inserts. The reason is that SQL Server orders uniqueidentifier values by its own proprietary byte order, comparing the last six bytes first and the first four bytes last. Since a ULID’s timestamp occupies its first six bytes, the timestamp ends up among the least significant bytes for sorting, and the random portion decides where each new row lands.
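To see the effect, here is a Python sketch that mimics SQL Server’s ordering. It assumes the comparison order used by .NET’s SqlGuid (bytes 10 through 15 first, then 8 and 9, 6 and 7, 4 and 5, and finally 0 through 3) and that the ULID’s 16 bytes are stored in big-endian order; ulid_bytes and sql_server_sort_key are hypothetical helpers.

```python
# Assumption: SQL Server compares uniqueidentifier bytes in this order
# (most significant first), matching the order used by .NET's SqlGuid.
SQL_SERVER_BYTE_ORDER = [10, 11, 12, 13, 14, 15, 8, 9, 6, 7, 4, 5, 0, 1, 2, 3]


def ulid_bytes(timestamp_ms: int, randomness: int) -> bytes:
    """Hypothetical helper: 48-bit timestamp then 80 random bits, big-endian."""
    return ((timestamp_ms << 80) | randomness).to_bytes(16, "big")


def sql_server_sort_key(raw: bytes) -> bytes:
    """Reorder bytes so plain byte comparison mimics SQL Server's ordering."""
    return bytes(raw[i] for i in SQL_SERVER_BYTE_ORDER)


earlier = ulid_bytes(1_000, 2 ** 80 - 1)  # older record, large random part
later = ulid_bytes(1_001, 0)              # newer record, small random part

assert earlier < later  # natural byte order follows the timestamp
assert sql_server_sort_key(later) < sql_server_sort_key(earlier)  # SQL Server order does not
```

Byte-swapping the first eight bytes, as .NET’s Guid does for its first three fields, does not change the outcome: bytes 10 through 15 come from the random part either way.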