Let's talk about GUIDs


I've been using GUIDs for a long time, even though I had no real idea of what they were. Well, I did know what they looked like: strings of 32 hex digits separated by hyphens, such as A1F5B524-B197-4787-A6FF-38BC0C8D2B01. And I knew they were unique, so if I had lots of them in my database tables and inserted yet another one, it would be different from all the rest. Then one day it occurred to me that it might be interesting to find out what GUIDs actually are, so I did some research.
 
 
What is a GUID?
 
GUIDs are stored as 128-bit values but are usually displayed as 32 hexadecimal digits separated by hyphens so that they are easier for a human to read, for example A1F5B524-B197-4787-A6FF-38BC0C8D2B01. The digits are split into five groups containing 8, 4, 4, 4 and 12 hex digits respectively.
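
As a small illustration of that layout, here is a sketch using Python's standard uuid module (not the .NET API discussed later in this post). The variable names are mine; it parses the example GUID and checks the group sizes and the storage size.

```python
import uuid

# The example GUID from the text, parsed with Python's standard library.
g = uuid.UUID("A1F5B524-B197-4787-A6FF-38BC0C8D2B01")

# The canonical text form has five hyphen-separated groups.
group_lengths = [len(part) for part in str(g).split("-")]
print(group_lengths)      # [8, 4, 4, 4, 12]

# The stored form is 16 bytes, i.e. 128 bits.
print(len(g.bytes) * 8)   # 128

# Without the hyphens there are exactly 32 hex digits.
print(len(g.hex))         # 32
```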
 
GUID stands for Globally Unique Identifier. GUIDs are also called UUIDs (Universally Unique Identifiers). UUID was actually the first term to show up; when Microsoft adopted UUIDs, it preferred to coin the name GUID.
 
 
Now, the interesting thing about GUIDs is that a 128-bit value can take a huge number of different values: 2^128 of them, which is roughly 3.4 × 10^38, a 39-digit number. A number that large is hard to even conceive, and it makes it extremely unlikely for two randomly chosen values to be equal. You could be creating GUIDs all the time and never get a single match. In practice the number of possible GUIDs is smaller than that, since some of the 128 bits are reserved, but we'll get to that later on.
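
To put a concrete figure on that, here is a quick check in Python, whose integers have arbitrary precision. This is only a sketch of the arithmetic, and the variable name is mine.

```python
# Number of distinct values a 128-bit field can hold.
total = 2 ** 128

print(total)            # 340282366920938463463374607431768211456
print(len(str(total)))  # 39 decimal digits
```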
 
In the meantime, let's take a look at a bit of history.
 
 
GUID versions
 
 
The reason for creating GUIDs is to get an identifier that is different from all the rest. Suppose I had two database tables using auto-incrementing integers as primary keys and wished to merge them: I wouldn't be able to retain both tables' IDs, because the same numbers appear in both. Had I used GUIDs as primary keys, I would have had no trouble at all, since the GUIDs in the two tables are, for all practical purposes, different. GUIDs can be found in more places than databases, including COM objects and hard drive partitioning.
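
The merge problem can be sketched with two in-memory "tables". This is an illustration in Python, not database code. The table contents and names are hypothetical, and dictionaries stand in for tables keyed by primary key.

```python
import uuid

# Two hypothetical tables keyed by auto-incrementing integers.
customers_a = {1: "Alice", 2: "Bob"}
customers_b = {1: "Carol", 2: "Dave"}

# Merging them loses rows, because keys 1 and 2 exist in both tables.
merged_int = {**customers_a, **customers_b}
print(len(merged_int))    # 2

# The same rows keyed by freshly generated version 4 GUIDs.
guid_a = {uuid.uuid4(): name for name in ("Alice", "Bob")}
guid_b = {uuid.uuid4(): name for name in ("Carol", "Dave")}

# No keys clash, so every row survives the merge.
merged_guid = {**guid_a, **guid_b}
print(len(merged_guid))   # 4
```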
 
In short, GUIDs let distributed systems assign unique identifiers even when no central coordinating authority is present.
 
GUIDs, as mentioned, are meant to be unique identifiers. The catch is that even if it's extremely improbable that two GUIDs are identical, you can never be completely sure. To address that, version 1 GUIDs were built from two ingredients: the time the GUID was created and the MAC address of the computer that created it.
 
In theory, that works fine: two different computers cannot produce the same GUID, and a single computer cannot produce the same GUID twice, so all GUIDs are bound to be different. The scheme might have worked, if it weren't for some problems with both the time and the space parts.
 
As for time, suppose I create a GUID and then turn the machine clock back. Or suppose I have a multiprocessor computer that can create GUIDs on more than one processor at the same instant.
 
The MAC address has problems too: what happens if someone sets the computer's MAC address manually, or if the computer has no MAC address at all?
 
To mitigate such problems, version 1 GUIDs contain extra bits (apart from the ones derived from the time and the MAC address) that try to compensate. Still, uniqueness cannot be guaranteed.
 
Furthermore, such GUIDs can be examined to extract the MAC address and the creation time, which is a privacy concern for the people creating them.
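
To show how little effort that extraction takes, here is a sketch using Python's standard uuid module (again, not the .NET API). The MAC address is a made-up value passed in explicitly, and the variable names are mine. The timestamp layout, 100-nanosecond ticks counted from 15 October 1582, comes from the UUID specification.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Hypothetical MAC address, passed explicitly so the example is reproducible.
FAKE_MAC = 0x001122334455
g = uuid.uuid1(node=FAKE_MAC)

# The node field (last 48 bits) holds the MAC address.
mac = ":".join(f"{(g.node >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))

# The 60-bit timestamp counts 100-nanosecond ticks since 15 October 1582 (UTC).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)
created = GREGORIAN_EPOCH + timedelta(microseconds=g.time // 10)

print(g.version)  # 1
print(mac)        # 00:11:22:33:44:55
print(created)    # roughly the moment the GUID was generated
```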
 
To avoid such problems, more GUID versions were created. Microsoft switched to version 4 GUIDs by default in Windows 2000 and has stuck with them since.
 
Version 4 GUIDs, in contrast to the other versions, are created pseudo-randomly, apart from some reserved bits. Let's take a look at what the different versions look like. This is the general GUID format:
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
 
In version 1 GUIDs the 13th hex digit is always 1:
xxxxxxxx-xxxx-1xxx-xxxx-xxxxxxxxxxxx
 
Similarly, version 4 GUIDs contain a 4 at the same position:
xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
In addition, the 17th hex digit (shown as y) must be 8, 9, A, or B.
 
By the looks of it, the GUID we saw earlier, A1F5B524-B197-4787-A6FF-38BC0C8D2B01, is a version 4 GUID.
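
That check is easy to automate. The sketch below is in Python rather than .NET, and the variable names are mine. It reads the 13th and 17th hex digits directly, then cross-checks the result against the standard uuid module.

```python
import uuid

text = "A1F5B524-B197-4787-A6FF-38BC0C8D2B01"

# Strip the hyphens so positions match the "13th" and "17th" hex digits.
hex_digits = text.replace("-", "")
version_digit = hex_digits[12]   # 13th hex digit
variant_digit = hex_digits[16]   # 17th hex digit
print(version_digit, variant_digit)   # 4 A

# Cross-check with the standard library's own parsing.
g = uuid.UUID(text)
print(g.version)                      # 4
print(g.variant == uuid.RFC_4122)     # True
```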
 
Earlier we mentioned that version 4 GUIDs are created pseudo-randomly. Randomness does not guarantee uniqueness, but with so many possible values a collision is extremely improbable. However, since the sequence is only pseudo-random, someone who knows the generator's internal state may be able to work out previous and future GUID values.
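
How improbable is a collision? A rough estimate comes from the standard birthday approximation. The sketch below makes two assumptions: a version 4 GUID has 122 random bits (128 minus the 4 version bits and the 2 bits fixed by the 8/9/A/B digit), and the generator is perfectly uniform. The function name is mine.

```python
import math

RANDOM_BITS = 122            # assumption: 128 bits minus 4 version and 2 variant bits
POSSIBLE = 2 ** RANDOM_BITS

def collision_probability(n: int) -> float:
    """Birthday approximation: chance of at least one duplicate among n GUIDs."""
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(-n * (n - 1) / (2 * POSSIBLE))

# Chance of any duplicate among one billion random GUIDs.
p = collision_probability(10 ** 9)
print(p)
```

Under those assumptions, even a billion GUIDs give a collision chance far below one in a billion billion.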
 
 
 
GUIDs and .NET
 
To create a new GUID we may use the NewGuid static method.
Guid g = Guid.NewGuid();
 
NewGuid uses the Windows CoCreateGuid function. CoCreateGuid in turn calls the RPC function UuidCreate, which returns a UUID (or GUID, if you prefer that name). As mentioned earlier, this will be a version 4 GUID, which is more or less random.
 
In addition to UuidCreate there is another function called UuidCreateSequential. It is used instead of UuidCreate if you want to make sure that GUIDs created on a single machine are unique. To accomplish this, UuidCreateSequential creates version 1 GUIDs as described earlier. Since version 1 GUIDs contain the computer's MAC address and a timestamp, every GUID created is bound to be unique compared to all other GUIDs created by that machine. And since the timestamp is part of the GUID and time always marches on, every GUID created that way holds a value greater than all previous ones. This is what the term sequential in UuidCreateSequential stands for.
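
The idea of ever-increasing timestamps can be sketched with version 1 GUIDs in Python's standard uuid module. This is not UuidCreateSequential itself, just the same version 1 scheme. The MAC address is a made-up value, and the strict ordering relies on the generator bumping the timestamp when two calls land on the same tick, which CPython's uuid1 does.

```python
import uuid

# Hypothetical MAC address shared by all GUIDs from "this machine".
FAKE_MAC = 0x001122334455
batch = [uuid.uuid1(node=FAKE_MAC) for _ in range(5)]

# Each GUID embeds a later timestamp than the one generated before it.
timestamps = [g.time for g in batch]
increasing = all(a < b for a, b in zip(timestamps, timestamps[1:]))
print(increasing)                      # True

# Same node plus distinct timestamps means no duplicates.
print(len(set(batch)) == len(batch))   # True
```

Note that the comparison is made on the timestamp field rather than on the GUID's text form, where the fastest-changing part of the timestamp comes first.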
 
By default .NET uses UuidCreate. You can use UuidCreateSequential if you'd like, but it's not as easy as the one-line call to NewGuid.
 
You can also use SQL Server to create GUIDs. In that case the GUID type is called uniqueidentifier.
NEWID() creates a version 4 GUID using UuidCreate.
NEWSEQUENTIALID() creates a version 1 GUID using UuidCreateSequential.
 
 
There's one more thing to talk about. Should database tables use GUIDs as primary keys or stick to traditional integer values? The answer is: it depends. A GUID takes 16 bytes, four times the space of a 32-bit integer. Indexing, fragmentation, joins and sorting are all less efficient with GUIDs (using sequential GUIDs makes things somewhat better). GUIDs are also far from user friendly and can make manual debugging harder.
 
Still, you may choose GUIDs because they are unique across tables and databases, which makes merging tables or databases much easier. They can also be generated outside the database, which decouples parts of a project from each other.
 
People find it hard to agree on which approach is better. Some say GUIDs are way cool; others think they should never be used as primary keys. The right primary key style depends on your requirements.
 
Summary
 
GUIDs are 128-bit values used to provide unique identifiers across distributed systems. Since the number of possible GUIDs is enormous, each newly created GUID can be assumed to be unique. There are two GUID versions currently used by Microsoft, 1 and 4. Version 1 uses the computer's MAC address and a timestamp to create a GUID, while version 4 uses a pseudo-random algorithm.
 

 
