MLTP Contest on writing the shortest correct regular expressions for dates... "Regular expressions for checking dates" By Eric Howland and David Niergarth In Markup Languages: Theory and Practice ISSN 1099-6621 http://mitpress.mit.edu/MLANG Volume 2, Issue 2 (Spring 2000 ), pages 126-132 * Date checking regular expression that catches all bad dates: 245 char long * Same regular expressison without \D characters: 277 characters * Two regular expressions, the first of which must match to ensure that the expression is well formed and the second of which must not match to catch all the bad numbers: 184 characters total Below we present several regular expressions for checking dates including leap years. These expressions were inspired by the article by C.M. Sperberg-McQueen in Markup Languages (**see [Sperberg-McQueen 1999]). Specifically they were inspired by the challenge at the end of the article (and the date on which that challenge expires) to shorten the long regular expression generated by lex. ** http://xml.coverpages.org/mltpTOC14.html#MLTP-14Sperberg The regular expression offered here is, unfortunately, not deterministic but it is more than an order of magnitude shorter than the regular expression generated by lex. The expression is the inverse of the lex expression and actually finds incorrect dates rather than correct dates.The proposed expression uses the \D convention of Perl and Python to detect characters that are not numbers and the .{1,8}notation to indicate a string of one to eight characters. Note that a somewhat longer (and arguably less readable) version of the regular expression is also included in case you find those conventions distasteful. Also note that about 40% of this expression is dedicated to finding poorly formed dates (dates not in the nnnn-nn-nn format where n is a number). This implies that a much shorter total expression is possible if one allows two passes (one pass to insure that the potential date is well-formed and the second pass to detect incorrect dates). The percentage saved by using two passes is even larger when the Python conventions are not allowed. Using two passes is, however, a less aesthetically pleasing response to the challenge. The expression is a series of tests for errors OR'ed together. Perhaps the easiest way to understand this expression is to see how it is built up from the various types of possible errors. This approach turns out to be effective, but it is hard to guarantee that all possible errors have been found. Because the challenge specifies a well defined (and enforcable) format for the input to be tested, it is possible to exaustively test for errors. A Python program has been created (the second listing below) that exhaustively tests all dates of the form nnnn-nn-nn (where n is a number) using both algorithmic and regular expression based tests. A comparison of the results from these two methods exposes any errors in the regular expression and guarantees that the regular expression is as accurate as the algorithm. [...] A Python program to check date-checking regular expressions: A program to check all possible numeric dates in the form nnnn-nn-nn.Compares two methods of validating such dates, one based on a regular expression and one based on an algorithm. Substituting a different regular expression for whole_re would allow it to be checked for accuracy. [...]