.NET string.Split("::") Bug With Duplicate Delimiters


By Robbe Morris

  

In all my years of programming in C#, I've never run across this nice little bug in string.Split() and duplicate character delimiters.



Until today, I never realized the .Split method of the string object didn't support splits on
duplicate delimiters like :: or || when they are passed as a character array. One way around this
is to use Regex.Split and escape any regex-specific characters that might be a part of your pattern.
string[] text = null;

string test = "Note:  This is a string of text :: delimited by a double semi-colon.";

text = test.Split("::".ToCharArray());

Debug.WriteLine(text[0] + "   " + text[1]);

text = System.Text.RegularExpressions.Regex.Split(test, 
       System.Text.RegularExpressions.Regex.Escape("::"));

Debug.WriteLine(text[0] + "   " + text[1]);


This code yields the following:

Note This is a string of text
Note: This is a string of text delimited by a double semi-colon.
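
The escaping step matters more for a delimiter like || than for ::, because a colon has no special meaning in a regular expression while the pipe is the alternation operator. Here is a minimal sketch of the difference; the input string and class name are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class EscapeDemo
{
    static void Main()
    {
        // Hypothetical input using a double-pipe delimiter.
        string input = "left||right";

        // Unescaped, "||" is an alternation of empty patterns, so it
        // matches the empty string rather than the two literal pipes.
        string[] unescaped = Regex.Split(input, "||");

        // Regex.Escape turns the delimiter into the literal pattern \|\|.
        string[] escaped = Regex.Split(input, Regex.Escape("||"));

        Console.WriteLine("unescaped parts: " + unescaped.Length);
        Console.WriteLine("escaped parts:   " + escaped.Length);
        foreach (string part in escaped)
        {
            // Brackets make the boundaries of each part visible.
            Console.WriteLine("[" + part + "]");
        }
    }
}
```

With the escaped pattern the split yields exactly two parts, [left] and [right]. The unescaped pattern breaks the string up in places you almost certainly did not intend.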


Biography
Robbe is a 2004-2008 Microsoft MVP for C# and the .NET Evangelist for Alinean Inc. He is also the co-founder of EggHeadCafe. Robbe enjoys scuba diving with the folks at wet-n-fla.


 
Article Discussion: .NET string.Split("::") Bug With Duplicate Delimiters
Robbe Morris posted at 19-Aug-08 04:56

 
In the interest of completeness...
Peter Bromberg replied to Robbe Morris at 19-Aug-08 07:52
using System;
using System.Text.RegularExpressions;

namespace stringsplit
{
    class Program
    {
        static void Main(string[] args)
        {
            TestSplit split = new TestSplit();
            Console.ReadLine();
        }
    }

    class TestSplit
    {
        string _test = "Note:  This is a string of text :: delimited by a double semi-colon.";
        /// <summary>
        /// Demonstrates three ways of splitting a string on the "::" delimiter.
        /// </summary>
        public TestSplit()
        {
            // 1.
            // Split the string _test on double colon using Regex. The return value from Split
            // will be a string[] array. 
            string[] lines = Regex.Split(_test, "::");

            // 2.
            // Use a new char[] array of two characters (: and :). Split treats each
            // char as a separate delimiter, so this splits on every single ':' rather
            // than on "::". Use "RemoveEmptyEntries" to drop the empty string that
            // would otherwise appear between the two adjacent colons.
            string[] lines2 = _test.Split(new char[] { ':', ':' },
                StringSplitOptions.RemoveEmptyEntries);

            // 3.
            // Pass the delimiter as a string[] instead, so the whole "::" sequence
            // is matched. This input has no back-to-back "::" delimiters, so no empty
            // strings are returned and "None" is an OK value for StringSplitOptions.
            string[] lines3 = _test.Split(new string[] { "::" },
                StringSplitOptions.None);
            Console.WriteLine("lines--->" + lines[0] + " " + lines[1]);
            Console.WriteLine("lines2-->" + lines2[0] + " " + lines2[1] + " " + lines2[2]);
            Console.WriteLine("lines3-->" + lines3[0] + " " + lines3[1]);
        }
    }
}
Output:
lines--->Note:  This is a string of text   delimited by a double semi-colon.
lines2-->Note   This is a string of text   delimited by a double semi-colon.
lines3-->Note:  This is a string of text   delimited by a double semi-colon.
Observe that option "lines2" actually produces three elements in order to get the entire string, because the single colon after "Note" is also treated as a delimiter.

 
Line 2 removes the initial colon next to Note, and it should not
Robbe Morris replied to Peter Bromberg at 19-Aug-08 08:37

eop


 
the real problem
noneof yourbusiness replied to Robbe Morris at 27-Aug-08 03:06
The real problem was that you programmed it to your assumption that it would return 2 items. If you had written the WriteLine statement using a foreach, you would have found that nothing was truncated: Split simply returned more than two items.

foreach (string s in text)
{
    Console.WriteLine(s);
}
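
For anyone who wants to run that check, here is a minimal self-contained sketch. It reuses the test string and the Split call from the article; the class name and the bracketed output format are my own choices for illustration.

```csharp
using System;

class InspectSplit
{
    static void Main()
    {
        string test = "Note:  This is a string of text :: delimited by a double semi-colon.";

        // "::".ToCharArray() is { ':', ':' }, so every single ':' is a separator.
        string[] text = test.Split("::".ToCharArray());

        Console.WriteLine("Count: " + text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            // Brackets make empty and whitespace-only entries visible.
            Console.WriteLine(i + ": [" + text[i] + "]");
        }
    }
}
```

This prints a count of 4. The third entry is the empty string that sits between the two adjacent colons.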


 
This is no bug...
123456 Nonymous replied to Robbe Morris at 28-Aug-08 10:35

This is no bug... at least not in the .NET framework. It is a bug in your code, however.

If you want to split your string by "::" you should pass the delimiter as a string, for example test.Split(new string[] { "::" }, StringSplitOptions.None). With the ToCharArray in place you are basically using test.Split(new char[] { ':', ':' }), which splits the string by any of the characters in the array (which in this case are the same). In your example this results in 4 values:
- "Note"
- "  This is a string of text "
- ""
- " delimited by a double semi-colon."

Check out the documentation and examples of the String.Split method in the MSDN Library...
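
One more thing to watch with the string[] overload: it still returns empty entries when two delimiters sit back to back, unless you ask it not to. Here is a small sketch, using a made-up input rather than the string from the article.

```csharp
using System;

class EmptyEntriesDemo
{
    static void Main()
    {
        // Hypothetical input with two "::" delimiters back to back.
        string input = "a::::b";
        string[] delimiter = new string[] { "::" };

        // None keeps the empty string found between the two delimiters.
        string[] keepEmpty = input.Split(delimiter, StringSplitOptions.None);

        // RemoveEmptyEntries drops it.
        string[] dropEmpty = input.Split(delimiter, StringSplitOptions.RemoveEmptyEntries);

        Console.WriteLine("None:               " + keepEmpty.Length);
        Console.WriteLine("RemoveEmptyEntries: " + dropEmpty.Length);
    }
}
```

The first call returns three elements ("a", "", "b") and the second returns two. Which option is right depends on whether an empty field means something in your data.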