Daniel Cazzulino's Blog : Making RegEx more readable


Source code published in this blog is public domain unless otherwise specified.

 


Making RegEx more readable

Compare the following two statements, which define the same regular expression in .NET:

static readonly Regex ParameterReference = new Regex(@"(?\)|\[^\]+)\>|(?\]*(?!\>))",
    RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);

static readonly Regex ParameterReference = new Regex(@"
    # Matches invalid empty brackets #
    (?\)|
    # Matches a valid parameter reference #
    \[^\]+)\>|
    # Matches opened brackets that are not properly closed #
    (?\]*(?!\>))",
    RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);

While the former is still understandable to a fairly regex-aware developer, the latter is far more explicit about the purpose of each part. Placing comments inside the expression is made possible by RegexOptions.IgnorePatternWhitespace, an option that developers don't use often enough: with it, unescaped whitespace in the pattern is ignored and everything from a # to the end of the line is treated as a comment. For this fairly simple expression that may seem unnecessary, but imagine a regex-based parser that processes (CodeSmith-like) template files:

static Regex CodeExpression = new Regex(@"
    # First match the full directives #
    \w*)(?.*?)\#\/>(?:\W*\n)?|
    # Match open tag #
    (?\#\/>)|
    # This is a simple expression that is output as-is to output.Write(); #
    (?:=)(?.*?)(?;.*?)?(?=\#\/>)|
    # Anything previous or after a code tag #
    (?<code>.*?)(?=)|
    # Finally, match everything else that is written as-is #
    (?.*[\r\n]*)",
    RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled | RegexOptions.Singleline);

It's pretty obvious that leaving such complex expressions uncommented makes them almost unreadable to anyone but the person who wrote them (and, after some time, even to that person!). Bottom line: ALWAYS comment your expressions in-line!
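
To try the technique out, here is a minimal sketch of a commented expression in the spirit of ParameterReference. It is an illustration under assumptions, not the original code: it assumes parameter references are written between angle brackets (as in <name>), and the group names (empty, name, open), the ParameterReferenceDemo class and the Classify helper are hypothetical names chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ParameterReferenceDemo
{
    // Hypothetical pattern: assumes references look like <name>.
    // With IgnorePatternWhitespace, unescaped whitespace is ignored and
    // everything from '#' to the end of the line is a comment.
    public static readonly Regex ParameterReference = new Regex(@"
        # Matches invalid empty brackets
        (?<empty> <> ) |
        # Matches a valid parameter reference
        < (?<name> [^<>]+ ) > |
        # Matches opened brackets that are not properly closed
        (?<open> < [^>]* (?!>) )",
        RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);

    // Reports which alternative matched first in the input.
    public static string Classify(string input)
    {
        Match m = ParameterReference.Match(input);
        if (!m.Success) return "none";
        if (m.Groups["empty"].Success) return "empty";
        if (m.Groups["name"].Success) return "name:" + m.Groups["name"].Value;
        return "open";
    }
}
```

With this sketch, ParameterReferenceDemo.Classify("select <id>") returns "name:id", Classify("select <>") returns "empty", and Classify("select <id") returns "open". Note that the pattern string must keep its line breaks: if it were collapsed onto one line, the first # would turn the rest of the expression into a comment.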

posted on Tuesday, February 10, 2004 11:07 AM by kzu