Standard clustering is done using a vector space model
(http://en.wikipedia.org/wiki/Vector_space). The easiest way to do this is to
create a file laid out like a spreadsheet, where each row is a
document/instance and each column is a variable/feature. With your
dataset, the standard starting point would be to
have each feature be "Does this set of tags contain X?", with a 1 if it
does and a 0 if it doesn't. You can then apply k-means, for example with
Weka (http://www.cs.waikato.ac.nz/ml/weka/), to the resulting dataset.
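As a rough sketch of that encoding in Python, assuming each set of tags is stored as a pipe-separated string (the function name `build_binary_matrix` and the sample data are made up for illustration):

```python
def build_binary_matrix(tag_strings):
    """Return (vocabulary, rows), where rows[i][j] is 1 if document i has tag j."""
    tag_sets = [set(s.split("|")) for s in tag_strings]
    # One column per distinct tag, sorted so the column order is stable.
    vocabulary = sorted(set().union(*tag_sets))
    rows = [[1 if tag in tags else 0 for tag in vocabulary] for tags in tag_sets]
    return vocabulary, rows


# Hypothetical sample data: one string per document.
documents = [
    "c#|conversion|datetime|j#",
    "c#|datetime|database|j#",
    "python|numpy",
]
vocabulary, rows = build_binary_matrix(documents)
print(vocabulary)
for row in rows:
    print(row)
```

Each row is then one instance that can be written out to a file and handed to whatever k-means implementation you use.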
What this does in practice, for your dataset, is group together sets of tags that are very similar, such as those that share 75% of their tags (depending, of course, on the parameters). You will probably get a result similar to your example.
Another area you can look at is graph-based clustering. This builds a graph and splits it into subgraphs based on some criterion, which achieves a similar grouping, potentially with better results.
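To make the idea concrete, here is one very simple variant, sketched in Python. It is only one of many possible criteria: it assumes that two sets of tags are linked whenever the share of tags they have in common reaches a threshold, and that each connected group of the resulting graph is a cluster. The names `similarity` and `graph_clusters`, the threshold of 0.5 and the sample data are all assumptions for illustration.

```python
from itertools import combinations


def similarity(a, b):
    """Share of tags the two sets have in common (intersection over union)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def graph_clusters(tag_sets, threshold=0.5):
    """Link sets whose similarity is >= threshold; return connected components."""
    n = len(tag_sets)
    neighbours = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if similarity(tag_sets[i], tag_sets[j]) >= threshold:
            neighbours[i].add(j)
            neighbours[j].add(i)

    clusters, seen = [], set()
    for start in range(n):
        if start in seen:
            continue
        # Depth-first walk collects every node reachable from `start`.
        component, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for other in neighbours[node] - seen:
                seen.add(other)
                stack.append(other)
        clusters.append(sorted(component))
    return clusters


# Hypothetical sample data: each set is one document's tags.
tag_sets = [
    {"c#", "conversion", "datetime", "j#"},
    {"c#", "datetime", "database", "j#"},
    {"python", "numpy"},
    {"python", "numpy", "scipy"},
]
clusters = graph_clusters(tag_sets, threshold=0.5)
print(clusters)
```

Each cluster is a list of indices into `tag_sets`. Real graph clustering methods usually use smarter splitting criteria than a fixed threshold, so treat this as a starting point only.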
Finally, once you have your initial results, you may want to play around with what the features are, or with the method of calculating the distance between them. This gets a bit more advanced, though, and you may need to re-implement k-means to
do this (someone please comment if they know of a good k-means implementation
that takes an arbitrary distance metric!). One such
measure you could try is the ratio of the intersection of the tags to the union of the tags (the Jaccard index). E.g.:
c#|conversion|datetime|j#
c#|datetime|database|j#
These two sets have an intersection size of 3 (they share c#, datetime and j#) and a union size of 5 (there are 5 distinct tags). The similarity is then 3/5 = 0.6. This can be turned into a distance metric by subtracting it from 1, giving 1 - 0.6 = 0.4.
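The same calculation as a small Python sketch. The function name `jaccard_distance` is just illustrative, and treating two empty sets as identical is an assumption you may want to change:

```python
def jaccard_distance(tags_a, tags_b):
    """Return 1 - |A intersect B| / |A union B| for two collections of tags."""
    a, b = set(tags_a), set(tags_b)
    union = a | b
    if not union:
        # Assumption: two empty tag sets count as identical (distance 0).
        return 0.0
    return 1.0 - len(a & b) / len(union)


first = "c#|conversion|datetime|j#".split("|")
second = "c#|datetime|database|j#".split("|")
distance = jaccard_distance(first, second)
print(distance)
```

For the two sets above this comes out at roughly 0.4, matching the hand calculation. A function like this could then be plugged into any clustering implementation that accepts a custom distance.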