Miłosz Orzeł

.net, js, html, arduino, java... no rants or clickbaits.

Ref modifier for reference types and a bit of SOS

Take a look at the following code and think what value will be displayed on the console (note that string is a reference type)?

using System;
               
class Program
{
    static void Test(string y)
    {
        y = "bbb";
    }

    static void Main()
    {
        string x = "aaa";
        Test(x);
        Console.WriteLine(x);
    }
}

The correct answer (aaa) is not all that obvious. You will see the words aaa, because without a ref modifier, a program written in C# provides a copy of the parameter value (for value types) or a copy of a reference (for reference types).

When parameter y in method Test receives a new text value, CLR does not modify the array of chars. Instead, a new string is created and a reference to it is assigned to variable y (more info here). Variable y contained in method Test is, however, just a copy of a reference hold under x variable from method named Main.

To actually change the text hidden under x variable, use the ref modifier (you have to set it both in the method declaration and its invocation - C# enforces such behavior for clarity):

using System;
               
class Program
{
    static void Test(ref string y)
    {
        y = "bbb";
    }

    static void Main()
    {
        string x = "aaa";
        Test(ref x);
        Console.WriteLine(x);
    }
}

After this change, console will show bbb text.

 

SOS

Way in which parameters are passed to a method can be examined by using tool called SOS (Son of Strike). We will use CLRStack -a command, which displays information about parameters and local variables on managed code stack (if you don't know how to use SOS look here and here, if you wonder where the name "Son of Strike" came from, click here)...

Below are the results of CLRStack -a command executed at the time of entry to the Test method.

For code without ref modifier:

!CLRStack -a
OS Thread Id: 0x176c (5996)
Child SP IP       Call Site
0031f114 00390104 Program.Test(System.String)
    PARAMETERS:
        y (0x0031f114) = 0x025cb948

0031f158 003900af Program.Main()
    LOCALS:
        0x0031f158 = 0x025cb948

0031f3c0 656721bb [GCFrame: 0031f3c0]

For code with ref modifier:

!CLRStack -a
OS Thread Id: 0x934 (2356)
Child SP IP       Call Site
001dee34 002f00f4 Program.Test(System.String ByRef)
    PARAMETERS:
        y (0x001dee34) = 0x001dee78

001dee78 002f00aa Program.Main()
    LOCALS:
        0x001dee78 = 0x027fb948

001df0ec 656721bb [GCFrame: 001df0ec]

An important difference that is exhibited by these results is the value of y parameter. In the case of code without ref modifier, it is the address of aaa string (0x025cb948). For the code with ref modifier, the value of y parameter is the address of x variable (0x001dee78) from Main method (that variable points to aaa string).

Why strings are immutable and what are the implications of it?

String type (System.String) stores text values as a sequence of char (System.Char) elements that represent Unicode characters (encoded in UTF-16). Usually one char element stands for one symbol.

When working with text one has to remember that strings in .NET are immutable! This simply means that once created, strings cannot be modified (without reflection or unsafe code), and the methods that apparently modify a string, really return a new object with the desired value.

Immutability of strings has many advantages (more about it soon), but it can cause problems if programmer forgets that any "change" to the string actually causes creation of a new instance of String class. Although the CLR treats strings in a special way, they are still a reference type, for which the memory is allocated on the managed heap.

Operation of this loop will create 10 000 string variables, all of which except the last are trash that will need to be collected by Garbage Collector:

string s = string.Empty;

for (int i = 0; i < 10000; i++)
{
    s += "x";
}

The following picture shows part of the "Histogram by Size for Allocated Objects" window from CLR Profiler application. You can see how subsequent iterations caused heap allocations for ever larger strings.

To avoid creating many unwanted objects use StringBuilder class, which allows you to modify the text without making new String class instances.

StringBuilder sb = new StringBuilder();

for (int i = 0; i < 10000; i++)
{
    sb.Append("x");
}

string x = sb.ToString();

This simple change has a huge impact on the amount of allocated and freed memory. See the following comparison of fragments of Profiler’s “Summary” window:

Why do designers of .NET (just like the creators of Java) decided to implement immutable text strings?

For optimization reasons (mainly because of comparison speed), texts can be stored in a special table (intern pool) maintained by the CLR. The idea is to avoid creating a number of variables defining the same string. Below is a piece of code proving that variables that have the same string value can point to the same* object:

string a = "xx";
string b = "xx";
string c = "x";
string d = String.Intern(c + c);

Console.WriteLine((object)a == (object)b); // True
Console.WriteLine((object)a == (object)d); // True

If the strings were modifiable, change to the value of variable a, would also change the value of b and d.

Immutability of strings has positive significance in multithreaded applications – any text amendment causes creation of a new variable so there is no need to set up the lock to avoid conflicts while multiple threads simultaneously access text. This is really important because quite often authorization of some operations is based on particular string value (for example, you can block specific functionality of a service based on client’s address).

Another important reason for immutability is the widespread use of strings as keys in hash tables. Would calculation of position of the element in table make any sense if it would be possible to modify the value of a key? The following listing shows that the "change" to variable used as the key does not affect the operation of the hash table:

string key = "abc";
Hashtable ht = new Hashtable();
ht.Add(key, 123);

key = "xbc";

Console.WriteLine(key); // xbc
Console.WriteLine(ht["abc"]); // 123

What would happen in the case of modifiable strings can be seen through a block of code that uses unsafe mode to actually change the string used as key:

unsafe
{
    string key = "abc";
    Hashtable ht = new Hashtable();
    ht.Add(key, 123);

    fixed (char* p = key)
    {
        p[0] = 'x';
    }

    Console.WriteLine(key); // xbc
    Console.WriteLine(ht["abc"]); // Not found!
}

Immutability is also related to the fact that the string is stored internally as an array - the data structure representing a continuous address space (which for performance reasons does not support insert operation).

* Whether a text literal is automatically added to the pool may be dependent on the use of ngen.exe tool or CompilationRelaxations settings...

Sending message to WebSphere MQ in the UTF-8

If you want to put a message encoded in UTF-8 to WebSphere MQ queue, be sure to set CharacterSet property of MQMessage object to 1208. If you do not, the text will be encoded using UTF-16 (CCSID 1200).

MQQueueManager queueManager = new MQQueueManager(...);
MQQueue queue = queueManager.AccessQueue(...);
MQPutMessageOptions putMessageOptions = new MQPutMessageOptions(...);
MQMessage message = new MQMessage();

message.Format = MQC.MQFMT_STRING;
message.CharacterSet = 1208;
message.WriteString("abcąćę");
queue.Put(message, putMessageOptions);

Strings in .NET are encoded with UTF-16. Sometimes however, it is worthwhile to use UTF-8 for information exchange. Why? Version 8 is less verbose because characters from the US-ASCII table are encoded using 1 byte instead of using 2 as in the case of UTF-16. So if you send the text consisting solely of that character set, you will use two times less space! In the case of Polish letters, characters will be encoded with 2 bytes (as for UTF-16).

Here is a comparison of bytes for the text "abcąćę":

UTF-8    61 62 63 C4 85 C4 87 C4 99
UTF-16   61 00 62 00 63 00 05 01 07 01 19 01