Reading Railroad Diagrams

The images generated by Regexper are commonly referred to as "Railroad Diagrams". These diagram are a straight-forward way to illustrate what can sometimes become very complicated processing in a regular expression, with nested looping and optional elements. The easiest way to read these diagrams to to start at the left and follow the lines to the right. If you encounter a branch, then there is the option of following one of multiple paths (and those paths can loop back to earlier parts of the diagram). In order for a string to successfully match the regular expression in a diagram, you must be able to fulfill each part of the diagram as you move from left to right and proceed through the entire diagram to the end.

As an example, this expression will match "Lions and tigers and bears. Oh my!" or the more grammatically correct "Lions, tigers, and bears. Oh my!" (with or without an Oxford comma). The diagram first matches the string "Lions"; you cannot proceed without that in your input. Then there is a choice between a comma or the string " and". No matter what choice you make, the input string must then contain " tigers" followed by an optional comma (your path can either go through the comma or around it). Finally the string must end with " and bears. Oh my!".

Basic parts of these diagrams

The simplest pieces of these diagrams to understand are the parts that match some specific bit of text without an options. They are: Literals, Escape sequences, and "Any charater".

Literals

Literals match an exact string of text. They're displayed in a light blue box, and the contents are quoted (to make it easier to see any leading or trailing whitespace).

Escape sequences

Escape sequences are displayed in a green box and contain a description of the type of character(s) they will match.

"Any character"

"Any character" is similar to an escape sequence. It matches any single character.

Character Sets

Character sets will match or not match a collection of individual characters. They are shown as a box containing literals and escape sequences. The label at the top indicates that the character set will match "One of" the contained items or "None of" the contained items.

Subexpressions

Subexpressions are indicated by a dotted outline around the items that are in the expression. Captured subexpressions are labeled with the group number they will be captured under. Positive and negative lookahead are labeled as such.

Alternation

Alternation provides choices for the regular experssion. It is indicated by the path for the expression fanning out into a number of choices.

Quantifiers

Quantifiers indicate if part of the expression should be repeated or optional. They are displayed similarly to Alternation, by the path through the diagram branching (and possibly looping back on itself). Unless indicated by an arrow on the path, the preferred path is to continue going straight.

Zero-or-more

Greedy quantifier
Non-greedy quantifier

The zero-or-more quantifier matches any number of repetitions of the pattern.

Required

Greedy quantifier
Non-greedy quantifier

The required quantifier matches one or more repetitions of the pattern. Note that it does not have the path that allows the pattern to be skipped like the zero-or-more quantifier.

Optional

Greedy quantifier
Non-greedy quantifier

The optional quantifier matches the pattern at most once. Note that it does not have the path that allows the pattern to loop back on itself like the zero-or-more or required quantifiers.

Range

Greedy quantifier
Non-greedy quantifier

The ranged quantifier specifies a number of times the pattern may be repeated. The two examples provided here both have a range of "{5,10}", the label for the looping branch indicates the number of times that branch may be followed. The values are one less than specified in the expression since the pattern would have to be matched once before repeating it is an option. So, for these examples, the pattern would be matched once and then the loop would be followed 4 to 9 times, for a total of 5 to 10 matches of the pattern.