Many times, while working on a project, questions about code efficiency or about why things were built a certain way pop up all around. Since project deadlines usually matter more than my questions, I always end up forgetting them. To put an end to this, I decided to write them down so I could look into them later. The result is this article, which covers six topics about strings that I think many others might find interesting as well.
String.Empty VS empty string
This issue is common even in code written by a single team. Somewhere in the source code you will probably find str = "" and somewhere else (probably written by a different person) str = String.Empty. I mention the author on purpose, since each developer seems to favor one form or the other. So, is there any real difference between the two?
To begin with, String.Empty is an empty string. Reading it gives you "", the same value as the empty string literal. Nothing more, nothing less. String.Empty holds no secrets.
Empty is a static read-only field of the String class that holds a zero-length (empty) string. The literal "", on the other hand, looks as if it creates a new string object every time it is used. Both are about the same speed, so the only open question is that extra object. So, how bad could this be?
Actually, not bad at all, since .NET supports string interning. In case you haven't heard of string interning before, the CLR keeps a table called the intern pool, which holds a single reference to each unique string literal in your program (strings built at run time are added only if you call String.Intern). Whenever the same literal shows up again, the reference is taken from the intern pool instead of a new object being allocated. So every "" in your code ends up pointing to the same string object, and it's really no big deal.
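To make the interning behavior concrete, here is a minimal sketch. The class and method names are made up for illustration, and the reference comparisons reflect how the mainstream runtimes behave rather than something you should rely on in production code.

```csharp
using System;

static class InternDemo
{
    // Two identical literals in the same assembly share one instance.
    public static bool LiteralsShareInstance()
    {
        string a = "valar";
        string b = "valar";
        return object.ReferenceEquals(a, b);
    }

    // A string built at run time is a separate object, even if its value
    // matches a literal.
    public static bool RuntimeStringIsSeparate()
    {
        string built = new string('a', 3); // "aaa", created at run time
        return !object.ReferenceEquals(built, "aaa");
    }

    // String.Intern hands back the pooled instance for that value.
    public static bool InternReturnsPooledInstance()
    {
        string built = new string('a', 3);
        return object.ReferenceEquals(string.Intern(built), "aaa");
    }

    static void Main()
    {
        Console.WriteLine(string.Empty == "");      // True
        Console.WriteLine(string.Empty.Length);     // 0
        Console.WriteLine(LiteralsShareInstance());
        Console.WriteLine(RuntimeStringIsSeparate());
        Console.WriteLine(InternReturnsPooledInstance());
    }
}
```

The value comparison on the first line of Main is the part that matters in everyday code; the reference checks are only there to show what the intern pool does.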
So there is no real resource issue whether you use String.Empty or the empty string literal. It's up to you to pick the one you like. People have argued that "" is easily confused with " ", or that String.Empty does not look like a string value at first sight when you are scanning through your source code. Personally I'd choose the empty string literal, but only because that is what I am used to. So, it's all up to you.
String.IsNullOrEmpty
IsNullOrEmpty is a widely used static method of the System.String class. Everyone knows what this method does. It returns true if the given string is null or empty; otherwise it returns false. String.IsNullOrEmpty is a great way to find out whether a string holds a value that your algorithm can use. Over the years I've heard people claim that String.IsNullOrEmpty is optimized and much faster than hand-written code, and others claim that there is no difference between using it and (str == null || str == ""). So, let's see what's really going on.
String.IsNullOrEmpty(str) is actually the same as (str == null || str.Length == 0).
The first thing to notice is that str.Length == 0 is faster than str == "". Length == 0 simply reads the Length property of the string and compares it to 0, whereas == calls the Equals method and therefore executes significantly more instructions (more, in fact, than the whole String.IsNullOrEmpty method). So String.IsNullOrEmpty(str) is faster than (str == null || str == "").
OK, comparison to the empty string is out of the question. Now what about comparing to null only, or checking only the Length property? As mentioned, String.IsNullOrEmpty performs both the null comparison and the zero-length comparison. As a result, either comparison on its own is indeed faster than String.IsNullOrEmpty.
So, what about the next time a friend of yours tells you to use str == null instead of String.IsNullOrEmpty(str) when you only need to check for null, because it is faster? Well, he is right. It takes less time. Slightly less time, actually: the time it takes to evaluate the second half of the or operator. In other words, you are comparing str == null to str == null || str.Length == 0.
The same applies if you only want to test for the empty string. str.Length == 0 is faster, but it is of no help when the string is null: it throws a NullReferenceException.
To sum up, if you want to check for null alone, or for the empty string alone, you may use your own conditions. The same applies if you want to check for both, as the result will be the same, but in that case why not use the built-in method? Anyway, it is up to you whether you use str == null, str.Length == 0 or String.IsNullOrEmpty(str). Many people think it is better to use the same method everywhere, others believe String.IsNullOrEmpty makes it easier for a human being to understand what is going on at a glance, and others think String.IsNullOrEmpty will save you from random exceptions when an unexpected null string appears. So, it's all up to you to decide.
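The three checks can be compared side by side in a short sketch. IsNullOrEmptyByHand and the other names below are hypothetical helpers written for this article, not part of the framework.

```csharp
using System;

static class NullOrEmptyDemo
{
    // Hand-written equivalent of the check described above.
    public static bool IsNullOrEmptyByHand(string str)
    {
        return str == null || str.Length == 0;
    }

    // Checking Length alone throws when the string is null.
    public static bool LengthCheckThrowsOnNull()
    {
        string str = null;
        try
        {
            return str.Length < 0; // never reached: the property access throws
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    static void Main()
    {
        foreach (string s in new string[] { null, "", "x" })
        {
            // Both columns agree for every input.
            Console.WriteLine(IsNullOrEmptyByHand(s) + " " + string.IsNullOrEmpty(s));
        }
        Console.WriteLine(LengthCheckThrowsOnNull()); // True
    }
}
```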
string VS String
In order to create a string we can write string s = "" or String s = "". Even though both end up creating the same string, is there any real reason to pick one over the other?
String is a class located in the System namespace. string is a C# keyword that acts as an alias for that class, meaning that whichever way you choose to write your code, the compiled code will refer to System.String in the end.
So, if string is the same as String, why do both of them exist? Well, apart from string, C# includes a few more aliases, such as int for Int32 and bool for Boolean. These aliases were created to give C# its own style, but people have argued over whether that is a good thing. Some said that C# should only allow developers to use the aliases and not the underlying type names, in order to avoid mixing them up; others said that the aliases simply confuse them, e.g. they are not sure whether int refers to Int32 or Int64.
Anyway, aliases do exist, and the typical convention is to use the alias when declaring a variable of that type and the actual class name when calling a static member of the class. For example, we write string s = String.Format("string vs {0}", "String").
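A quick way to convince yourself that the alias and the class are one and the same is to compare their types. This is just a sketch; AliasDemo and its members are illustrative names.

```csharp
using System;

static class AliasDemo
{
    // The keyword and the class name resolve to the same type.
    public static bool SameType()
    {
        return typeof(string) == typeof(String);
    }

    // The convention from the text: alias for the variable,
    // class name for the static member.
    public static string Convention()
    {
        string s = String.Format("string vs {0}", "String");
        return s;
    }

    static void Main()
    {
        Console.WriteLine(SameType());               // True
        Console.WriteLine(typeof(string).FullName);  // System.String
        Console.WriteLine(Convention());             // string vs String
    }
}
```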
Strings are reference types
Strings are indeed reference types, yet there's something strange about them, as they look more like a value type than a reference type. (If you would like to know more about value and reference types before moving on, you may refer to an older article.)
For example
bool IntsAreEqual(int a, int b)
{
    if (a == b)
        return true;
    else
        return false;
}
This method returns true if a = 5 and b = 5. However
bool ClassesAreEqual(RandomClass a, RandomClass b)
{
    if (a == b)
        return true;
    else
        return false;
}
This method returns true if a and b refer to the same place on the heap, no matter what values they contain (assuming RandomClass does not overload the == operator).
Since string comparison str1 == str2 looks like the first method, string seems to look like a value type.
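Similarly, when initializing a new string from an already existing one, e.g.
string str1 = str2;
str1 does receive a reference to the same object on the heap as str2, just as with any other reference type. It only feels as if a copy was made, much like a value type would do, because nothing you later do through str2 can alter what str1 sees: every operation that appears to modify a string produces a new one.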
In fact, string is a special type that arguably should have been a value type, since we want it to behave like one. Unlike the real value types, however, a string has no fixed size. An Int32 always takes four bytes. The string "Answer to life the universe and everything" needs far more memory than "42" (even though they may refer to the same thing). A string may be enormous, so storing it on the stack would not just get in the way of the stack's normal operation; it might not fit at all.
So, strings had to be placed on the heap. However, they also had to keep some characteristics of their fellow value types. Developers would not be happy if they had to call a value-comparing method for every simple string comparison. So the .NET architects overloaded the basic operators, for example == and !=, so that for strings they behave as they do for value types.
In the same way, special behavior applies when a new string is derived from an existing one. Operations such as concatenation or Replace never touch the old string's data on the heap; they create a brand new string object instead.
At the end of the day, a developer may actually state that strings are reference types having value type features, because that's what they were created for.
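The mix of reference-type storage and value-type behavior can be seen in a few lines. The sketch below uses made-up names and builds the second string from a char array only to guarantee that it is a separate object.

```csharp
using System;

static class ValueLikeDemo
{
    // == on strings compares the characters, like a value type would.
    public static bool EqualByValue()
    {
        string a = "hello";
        string b = new string(new[] { 'h', 'e', 'l', 'l', 'o' });
        return a == b;
    }

    // The two variables nevertheless point to different objects.
    public static bool SameObject()
    {
        string a = "hello";
        string b = new string(new[] { 'h', 'e', 'l', 'l', 'o' });
        return object.ReferenceEquals(a, b);
    }

    // Plain assignment copies the reference, as for any reference type.
    public static bool AssignmentSharesReference()
    {
        string str2 = "hello";
        string str1 = str2;
        return object.ReferenceEquals(str1, str2);
    }

    static void Main()
    {
        Console.WriteLine(EqualByValue());              // True
        Console.WriteLine(SameObject());                // False
        Console.WriteLine(AssignmentSharesReference()); // True
    }
}
```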
Strings are immutable
Well, most people know that. Strings are immutable, meaning that when a string value is created on the heap, this value is never supposed to change.
So what actually happens when we write
string str = " valar morghulis -";
str += " All men must die.";
When we create the string str, the reference is placed on the stack and the value is stored on the heap. Appending another string to it does not change the stored value. Instead, the runtime reads the stored value, appends the new text and stores the result in a new block on the heap, which the stack reference now points to. The previous block on the heap remains intact, waiting for the garbage collector to remove it.
So, strings are reference types that are immutable. The question is, why is that? Why is being immutable a good thing and, if it is, why are other reference types mutable?
One part of the answer lies in the essence of the string type. Strings in .NET are supposed to hold their values. Developers who wish to have easily mutable strings may use the StringBuilder class instead. As an example, if I create the string
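The effect of += can be observed by keeping a second reference to the original string. This is only a sketch with illustrative names.

```csharp
using System;

static class ImmutabilityDemo
{
    // Returns true when the original object survives the append unchanged
    // and str ends up pointing at a different object.
    public static bool AppendLeavesOriginalIntact()
    {
        string str = " valar morghulis -";
        string before = str;              // second reference to the same object
        str += " All men must die.";      // allocates a new string
        return before == " valar morghulis -"
            && !object.ReferenceEquals(before, str);
    }

    public static string AppendedValue()
    {
        string str = " valar morghulis -";
        str += " All men must die.";
        return str;
    }

    static void Main()
    {
        Console.WriteLine(AppendLeavesOriginalIntact()); // True
        Console.WriteLine(AppendedValue());
    }
}
```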
string myName = "Kostas";
then well, that's my name and there's no point in changing it. Of course, you may object that you've written thousands of lines of source code containing strings that change all the time, so that can't be right. And you may have a point; I'm not saying that every string whose value is bound to change should be replaced by a StringBuilder. We will talk about choosing between string and StringBuilder later on. Just keep in mind that if strings were mutable there would be no need for StringBuilder. Strings are meant to be mostly read, not edited.
Considering that, suppose you have
string myName = MyName;
myName now refers to the same string object as MyName. If strings were mutable and some external action changed that object through MyName while myName was in use, myName would change as well. That is not a good thing, as the method might already be halfway through its work when the value changes. Because strings are immutable, MyName can only be pointed at a new string, which leaves myName untouched.
A string can also be passed as an argument to a method call. If it were a standard mutable reference type, the method would be able to change the value of the caller's string. We would probably want that string to keep the value it had before the call (since, as we mentioned, strings are mostly read), so such an edit would do no good. Of course, we can use the ref keyword if we do wish to change the string's value from within the method.
We have already mentioned string interning. Sharing a single object between every use of the same value is only safe because strings are immutable, and techniques like this make working with immutable strings considerably faster than you might expect at first.
There is yet another important reason. Strings do not have a standard size. When a string is created, it reserves exactly as much heap memory as its initial size requires. Suppose that, before the string is modified, more variables are created and more heap space is reserved. What would happen if we then wanted to make the string longer? Would we rearrange everything on the heap just to update the string value? Actually no; it is easier to create a new value elsewhere on the heap.
Of course, there are ways to change the value of those immutable strings, namely reflection or pointers. The point is not that you cannot change an already created string, but that, since this is not the proper way to deal with strings, it is not the straightforward way to change one either.
Since the merits of string immutability are largely a matter of personal opinion, those interested in learning more will find a great deal of information online.
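The difference between passing a string normally and passing it with ref can be sketched as follows. TryToChange and ChangeByRef are hypothetical methods written for this example.

```csharp
using System;

static class ArgumentDemo
{
    // Reassigning the parameter only rebinds the method's local copy
    // of the reference; the caller's variable is not affected.
    static void TryToChange(string s)
    {
        s = s + " (changed)";
    }

    // With ref, the caller's variable itself is pointed at the new string.
    static void ChangeByRef(ref string s)
    {
        s = s + " (changed)";
    }

    public static string AfterByValueCall()
    {
        string name = "Kostas";
        TryToChange(name);
        return name;
    }

    public static string AfterByRefCall()
    {
        string name = "Kostas";
        ChangeByRef(ref name);
        return name;
    }

    static void Main()
    {
        Console.WriteLine(AfterByValueCall()); // Kostas
        Console.WriteLine(AfterByRefCall());   // Kostas (changed)
    }
}
```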
String concatenation
Since strings are immutable, creating a new string by adding two existing ones does not look like the most efficient approach. Some people use String.Concat instead. Others use the StringBuilder class. Let's take a look at them all and find out which one actually works best.
Behind the scenes, adding strings with the + operator ends up calling String.Concat.
string str1 = "Never send a human ";
string str2 = "to do a machine's job";
string str = str1 + str2;
is the same as
str = String.Concat(str1, str2);
So, it makes no difference which one you choose. However, when you add string literals rather than variables, e.g.
string str = "Never send a human " + "to do a machine's job";
the compiler will automatically replace it with
string str = "Never send a human to do a machine's job";
which is faster than calling String.Concat.
So, the + operator is faster when adding string literals, and pretty much the same in all other cases. As a result, choosing between the + operator and String.Concat is a matter of personal taste, as the Concat method seems to have no real advantage apart from being supported by a great many programming languages.
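Both points can be checked with a small sketch. The names are illustrative; the const declaration is used because C# only accepts it when the right-hand side can be evaluated at compile time, which is exactly the folding described above.

```csharp
using System;

static class ConcatDemo
{
    // Compiles only because the compiler can fold the two literals
    // into a single constant.
    const string Folded = "Never send a human " + "to do a machine's job";

    // For variables, + and String.Concat produce the same value.
    public static bool PlusMatchesConcat()
    {
        string str1 = "Never send a human ";
        string str2 = "to do a machine's job";
        return (str1 + str2) == String.Concat(str1, str2);
    }

    public static string FoldedValue()
    {
        return Folded;
    }

    static void Main()
    {
        Console.WriteLine(PlusMatchesConcat()); // True
        Console.WriteLine(FoldedValue());       // Never send a human to do a machine's job
    }
}
```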
Moving on to StringBuilder. StringBuilder is supposed to optimize string modification. However, should we always use a StringBuilder instead of concatenating strings? To answer that, we should first find out how a StringBuilder works.
When a string is created, it takes up exactly as much heap memory as its size requires. A StringBuilder object, on the other hand, reserves extra space when it is created, so that it can absorb later modifications. Now, suppose we append extra characters to the StringBuilder. If the new content still fits within the reserved space, it is simply stored alongside the existing content. If it does not, new heap memory is allocated and the StringBuilder's capacity is doubled. Capacity is the property that returns the number of characters the current StringBuilder can hold. Its default value is 16; however, if you know the content will grow much larger, you can set your own value on initialization or later on.
So, StringBuilder is indeed built to avoid creating new strings all the time. In that case, should we always use a StringBuilder when we know our strings are bound to change?
Well, the answer is no. You see, even though StringBuilder seems much more flexible, creating a StringBuilder object takes more time and resources than creating a simple string. So, for just a few string concatenations, String.Concat() or the + operator will do much better. As a result, when you know your code contains a fixed or small number of string modifications, stick to the string path; otherwise go for StringBuilder.
An extra reason to choose string over StringBuilder is that String offers a few extra methods (e.g. IndexOf, StartsWith etc.) which are absent from StringBuilder. If you need them, you may choose to give up the performance boost you would otherwise get.
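The Capacity property makes this growth visible. The sketch below uses illustrative names and deliberately avoids asserting the exact capacity after growth, since that is an implementation detail of the runtime.

```csharp
using System;
using System.Text;

static class CapacityDemo
{
    // Capacity of a StringBuilder created with no arguments.
    public static int DefaultCapacity()
    {
        return new StringBuilder().Capacity;
    }

    // Capacity can be requested up front when the final size is known.
    public static int ExplicitCapacity()
    {
        return new StringBuilder(256).Capacity;
    }

    // Appending more characters than currently fit makes the capacity grow.
    public static bool CapacityGrowsWithContent()
    {
        var sb = new StringBuilder();
        int initial = sb.Capacity;
        sb.Append('x', initial + 1); // one character more than fits
        return sb.Capacity > initial && sb.Capacity >= sb.Length;
    }

    static void Main()
    {
        Console.WriteLine(DefaultCapacity());          // 16
        Console.WriteLine(ExplicitCapacity());         // 256
        Console.WriteLine(CapacityGrowsWithContent()); // True
    }
}
```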
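As a rough illustration of the two paths, the sketch below builds the same comma-separated text both ways. The method names are made up, and no timings are claimed; where the break-even point lies depends on your data and is best measured in your own code.

```csharp
using System;
using System.Text;

static class BuilderVsConcatDemo
{
    // Fine for a handful of pieces: every += allocates a new string.
    public static string JoinWithPlus(int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
        {
            result += i + ",";
        }
        return result;
    }

    // Better suited to many pieces: the buffer is reused between appends.
    public static string JoinWithBuilder(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            sb.Append(i).Append(',');
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(JoinWithPlus(5));    // 0,1,2,3,4,
        Console.WriteLine(JoinWithBuilder(5)); // 0,1,2,3,4,
    }
}
```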