
IBM.WMQ.MQMessage.ReadString and EndOfStreamException

Recently, while modifying a program that communicates with WebSphere MQ (v6.0.2.7), I noticed that the logs contained exceptions of type EndOfStreamException. Since the adapter code was rather complex, it took a while before I found the trivial cause of the problem ;)

System.IO.EndOfStreamException: Unable to read beyond the end of the stream.
   at System.IO.__Error.EndOfFile()
   at System.IO.BinaryReader.ReadByte()
   at System.IO.BinaryReader.Read7BitEncodedInt()
   at System.IO.BinaryReader.ReadString()
   at IBM.WMQ.MQMessage.ReadString(Int32 length)

The error was reported because, in some cases, the ReadString method was called on the same MQMessage object in two different places:

string text = message.ReadString(message.MessageLength);

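To get rid of the problem, simply add one line of code that moves the read position back to the start of the message before reading:

message.Seek(0); // rewind to the beginning of the message data
string text = message.ReadString(message.MessageLength);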

What's the problem?

ReadString is a method that scans a stream of bytes and converts it to a string*. After the entire message content has been read successfully, the current position marker is left at the end of the stream, so the next call to ReadString has to end with an EndOfStreamException.

Why does this have to happen? Internally, ReadString (IBM.WMQ.MQMessage) uses data stored in a MemoryStream object. While getting the text, depending on the current value of the message's DataLength property, it may call the .NET Framework's System.IO.BinaryReader.ReadString method. To read the text, that method must first get the encoded length of the text, which is why Read7BitEncodedInt is visible in the stack trace. This routine in turn uses the ReadByte method, which throws the exception in question when it hits the end of the stream.

* The conversion uses the message's CharacterSet (CCSID) property.
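
The mechanism can be reproduced without WebSphere MQ at all. The sketch below is not IBM's code: it only assumes, as described above, that the message data sits in a MemoryStream read through a BinaryReader. The class name ReadTwiceDemo and the "hello" payload are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

class ReadTwiceDemo
{
    static void Main()
    {
        // Stand-in for the message body: a MemoryStream holding one length-prefixed string.
        var stream = new MemoryStream();
        var writer = new BinaryWriter(stream, Encoding.UTF8);
        writer.Write("hello");   // writes a 7-bit encoded length, then the string bytes
        writer.Flush();

        stream.Position = 0;     // start reading from the beginning
        var reader = new BinaryReader(stream, Encoding.UTF8);

        // First read succeeds and leaves the position at the end of the stream.
        Console.WriteLine(reader.ReadString());

        try
        {
            // Second read: Read7BitEncodedInt -> ReadByte finds no more data.
            reader.ReadString();
        }
        catch (EndOfStreamException)
        {
            Console.WriteLine("second read failed: end of stream");
        }

        // Rewinding the stream makes the data readable again.
        stream.Position = 0;
        Console.WriteLine(reader.ReadString());
    }
}
```

Run as a console program, this should print "hello", then the failure line, then "hello" again. The second read fails for the same reason as in the stack trace above: there is no byte left from which to decode the length prefix.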

Sending a message to WebSphere MQ in UTF-8

If you want to put a UTF-8 encoded message on a WebSphere MQ queue, be sure to set the CharacterSet property of the MQMessage object to 1208. If you do not, the text will be encoded using UTF-16 (CCSID 1200).

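MQQueueManager queueManager = new MQQueueManager(...);
MQQueue queue = queueManager.AccessQueue(...);
MQPutMessageOptions putMessageOptions = new MQPutMessageOptions(...);
MQMessage message = new MQMessage();

message.Format = MQC.MQFMT_STRING;   // the message body is text
message.CharacterSet = 1208;         // CCSID 1208 = UTF-8; set it before writing the text
message.WriteString("abcąćę");       // example payload, encoded using the CharacterSet above
queue.Put(message, putMessageOptions);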

Strings in .NET are encoded in UTF-16. Sometimes, however, it is worthwhile to use UTF-8 for information exchange. Why? UTF-8 is more compact for such text because characters from the US-ASCII table are encoded using 1 byte instead of the 2 bytes they take in UTF-16. So if you send text consisting solely of characters from that set, you will use half the space! Polish letters with diacritics are encoded with 2 bytes in UTF-8, the same as in UTF-16.

Here is a comparison of bytes for the text "abcąćę":

UTF-8    61 62 63 C4 85 C4 87 C4 99
UTF-16   61 00 62 00 63 00 05 01 07 01 19 01
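
The comparison above can be reproduced with the standard System.Text.Encoding classes. This is a small sketch that does not involve WebSphere MQ; the class name EncodingSizeDemo and the Hex helper are made up for illustration, and the Polish letters are written as \u escapes so that the result does not depend on the encoding of the source file.

```csharp
using System;
using System.Text;

class EncodingSizeDemo
{
    // Formats bytes as space-separated hex pairs, e.g. "61 62 63".
    static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    static void Main()
    {
        string text = "abc\u0105\u0107\u0119";            // "abcąćę"

        byte[] utf8 = Encoding.UTF8.GetBytes(text);       // 1 byte per ASCII letter, 2 per Polish letter
        byte[] utf16 = Encoding.Unicode.GetBytes(text);   // UTF-16 little-endian, 2 bytes per character here

        Console.WriteLine("UTF-8  ({0} bytes): {1}", utf8.Length, Hex(utf8));
        Console.WriteLine("UTF-16 ({0} bytes): {1}", utf16.Length, Hex(utf16));
    }
}
```

For this text the UTF-8 form takes 9 bytes and the UTF-16 form takes 12. Note that Encoding.Unicode produces little-endian UTF-16, which is why the low byte of each character comes first in the UTF-16 row.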