Tags: Regex, Programming, Text Processing

Regex: Less Scary Than It Looks

Regular expressions have a reputation for being unreadable. They don't have to be.

RunToolz Team · January 8, 2026 · 3 min read

You need to find all email addresses in a document. Or validate phone numbers. Or extract dates from messy text.

Someone says "use regex." You Google the syntax, find something like ^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$, and immediately close the tab.

Regular expressions look intimidating because people write them to be clever instead of clear. Let's fix that.

Start With Literal Matching

Regex finds patterns. The simplest pattern is literal text.

cat matches "cat" in "The cat sat." Nothing fancy.

http matches "http" in any URL. Still straightforward.
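If you want to try literal matching without a separate tool, here is a minimal sketch in JavaScript (Node.js or a browser console). JavaScript is assumed here only because the post's own startsWith example uses it; the variable names are illustrative. The third check shows something worth knowing early: a literal pattern has no idea about whole words.

```javascript
// A literal pattern matches exactly these characters, anywhere in the input.
const literal = /cat/;

const inSentence = literal.test("The cat sat."); // true
const inOther = literal.test("The dog sat.");    // false
// No notion of whole words: "cat" also hides inside "concatenate".
const inWord = literal.test("concatenate");      // true

// String.prototype.match reports what matched and where it starts.
const hit = "The cat sat.".match(literal);
console.log(hit[0], hit.index);                  // cat 4
```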


Add Wildcards Gradually

. matches any single character except, by default, a newline. c.t matches "cat", "cot", "cut".

* means "zero or more of the previous thing". ca*t matches "ct", "cat", "caat", "caaaaaat".

+ means "one or more". ca+t matches "cat" and "caat" but not "ct".

? means "zero or one". colou?r matches both "color" and "colour".
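A quick way to check the four wildcard examples is to run each pattern against the same handful of inputs. This sketch assumes JavaScript; the patterns are wrapped in ^ and $ so the whole input has to fit, not just a piece of it. The object and array names are made up for the example.

```javascript
// The example patterns, anchored so only full-string matches count.
const patterns = {
  dot: /^c.t$/,          // c, any one character, t
  star: /^ca*t$/,        // c, zero or more "a", t
  plus: /^ca+t$/,        // c, one or more "a", t
  optional: /^colou?r$/, // "colo", an optional "u", "r"
};

const inputs = ["ct", "cat", "cot", "caat", "color", "colour"];

// For each pattern, keep the inputs it accepts and print them.
const results = {};
for (const [name, re] of Object.entries(patterns)) {
  results[name] = inputs.filter((s) => re.test(s));
  console.log(`${name}: ${results[name].join(", ")}`);
}
// dot: cat, cot
// star: ct, cat, caat
// plus: cat, caat
// optional: color, colour
```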

Character Classes Are Your Friend

[aeiou] matches any vowel. [0-9] matches any digit. [A-Za-z] matches any letter.

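\d is shorthand for [0-9] in JavaScript and many other engines (Python 3 also counts digits from other scripts). \w matches word characters (letters, digits, underscore). \s matches whitespace (spaces, tabs, newlines).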
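Here is a small sketch, again assuming JavaScript, that pulls different pieces out of one string with each class. The g flag asks for every match instead of just the first. The last line shows that in JavaScript \d really is just [0-9]: an Arabic-Indic digit three (U+0663) does not count.

```javascript
const text = "Room 101, floor 3";

// Character class with ranges: runs of ASCII letters.
const words = text.match(/[A-Za-z]+/g);  // ["Room", "floor"]
// \d: runs of digits.
const numbers = text.match(/\d+/g);      // ["101", "3"]
// \w: letters, digits and underscore, so numbers count as "words" too.
const tokens = text.match(/\w+/g);       // ["Room", "101", "floor", "3"]
// \s: any whitespace character; here, three spaces.
const spaces = text.match(/\s/g).length; // 3

// In JavaScript, \d is exactly [0-9]; digits from other scripts don't match.
const arabicThree = /\d/.test("\u0663"); // false
```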

These building blocks handle most real-world patterns.

Anchors Control Position

^ means "start of the string". $ means "end of the string". With the multiline flag (m), they also match at the start and end of every line.

^Hello matches "Hello world" but not "Say Hello".

world$ matches "Hello world" but not "world peace".
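The two examples above, plus the multiline case, in a short JavaScript sketch. The log text is invented for illustration. Without the m flag, ^ only matches at the very beginning of the input, so a line in the middle of a multi-line string is invisible to it.

```javascript
const startsHello = /^Hello/;
const endsWorld = /world$/;

const a = startsHello.test("Hello world"); // true
const b = startsHello.test("Say Hello");   // false
const c = endsWorld.test("Hello world");   // true
const d = endsWorld.test("world peace");   // false

// By default, ^ and $ anchor to the whole string...
const log = "INFO ready\nERROR disk full\nINFO done";
const plain = /^ERROR/.test(log);          // false: "ERROR" is not at the very start
// ...the m flag lets them match at every line break as well.
const multi = /^ERROR/m.test(log);         // true
// m plus g: every line that starts with INFO (. does not cross the newline).
const infoLines = log.match(/^INFO.*$/gm); // ["INFO ready", "INFO done"]
```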

Groups Capture Parts

Parentheses group things. They also capture for later use.

(\d{3})-(\d{4}) matches "555-1234" and captures "555" and "1234" separately.

This is how you extract specific data from a pattern match.
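In JavaScript (assumed here), match returns an array: index 0 holds the whole match and each capture group follows in order of its opening parenthesis. The "later use" part can be as simple as referring to the groups as $1 and $2 in a replacement. A minimal sketch:

```javascript
const phoneParts = /(\d{3})-(\d{4})/;
const m = "Call 555-1234 after 6pm".match(phoneParts);

// Index 0 is the whole match; 1 and 2 are the capture groups, left to right.
const whole = m[0];  // "555-1234"
const first = m[1];  // "555"
const second = m[2]; // "1234"

// Reuse the captured parts in a replacement string.
const dotted = "555-1234".replace(phoneParts, "$1.$2"); // "555.1234"
```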

Real Examples, Explained

Email (simplified):

[\w.-]+@[\w.-]+\.\w+

Translation: word characters/dots/hyphens, then @, then more of the same, then a dot, then word characters.
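To see the simplified pattern at work, here is a JavaScript sketch that finds every address in a sample sentence (the addresses are invented). The last line is a reminder of why it is "simplified": anchored for validation, it still accepts an address with a double dot.

```javascript
// The simplified pattern from above; g finds every address, not just the first.
const emailPattern = /[\w.-]+@[\w.-]+\.\w+/g;

const doc = "Contact ana.b@example.com or ops-team@mail.example.org today.";
const found = doc.match(emailPattern);
// ["ana.b@example.com", "ops-team@mail.example.org"]

// Anchored for validation, it is still loose: "a..b" is not a valid local part.
const loose = /^[\w.-]+@[\w.-]+\.\w+$/.test("a..b@x.y"); // true
```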

Phone number:

\d{3}[-.\s]?\d{3}[-.\s]?\d{4}

Translation: 3 digits, optional separator, 3 digits, optional separator, 4 digits. Matches "555-123-4567", "555.123.4567", "5551234567".
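A JavaScript sketch that runs the phone pattern over the formats above, plus two cases worth knowing about. Unanchored, the pattern happily finds ten digits inside a longer number; if you are validating a whole field, add ^ and $.

```javascript
const phonePattern = /\d{3}[-.\s]?\d{3}[-.\s]?\d{4}/;

const samples = ["555-123-4567", "555.123.4567", "5551234567", "555 123 4567", "555-1234"];
const accepted = samples.filter((s) => phonePattern.test(s));
// The first four pass; "555-1234" has only 7 digits.

// Unanchored, it also matches 10 digits inside a 12-digit ID.
const insideLonger = phonePattern.test("ID 123456789012"); // true
// Anchored at both ends, the whole input must be one phone number.
const anchored = /^\d{3}[-.\s]?\d{3}[-.\s]?\d{4}$/.test("123456789012"); // false
```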

Date (MM/DD/YYYY):

\d{2}/\d{2}/\d{4}

Translation: 2 digits, slash, 2 digits, slash, 4 digits. It checks the shape, not the calendar: "13/45/2026" matches too.
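One way to split the work, sketched in JavaScript: let the regex check the shape and capture the parts, then check the ranges in ordinary code. The helpers isRealDate and daysInMonth are hypothetical names for this example, and MM/DD/YYYY order is assumed as in the heading.

```javascript
// Shape only, anchored so the whole string must match; groups capture MM, DD, YYYY.
const datePattern = /^(\d{2})\/(\d{2})\/(\d{4})$/;

// Hypothetical helper: days in a month, with Gregorian leap-year rules.
function daysInMonth(month, year) {
  const leap = (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
  return [31, leap ? 29 : 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31][month - 1];
}

// Hypothetical helper: regex for the shape, plain code for the ranges.
function isRealDate(str) {
  const m = datePattern.exec(str);
  if (!m) return false; // wrong shape
  const month = Number(m[1]);
  const day = Number(m[2]);
  const year = Number(m[3]);
  if (month < 1 || month > 12) return false;
  return day >= 1 && day <= daysInMonth(month, year);
}

const shapeOk = datePattern.test("13/45/2026"); // true: right shape
const realOk = isRealDate("13/45/2026");        // false: no month 13
```

Range checks like these are the kind of logic that reads better as code than as a longer pattern.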

Common Mistakes

Forgetting to escape special characters. . means "any character" in regex. To match a literal dot, write \. instead.

Being too greedy. .* matches as much as possible. For HTML tags, <.*> on <b>bold</b> matches the whole thing, not just <b>. Use <.*?> for non-greedy matching.

Overcomplicating. If you need to validate emails for real, use a library. The "correct" email regex is hundreds of characters long.
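The first two mistakes are easy to reproduce. A short JavaScript sketch (the version string is just an example input):

```javascript
const version = /^v1.0$/;       // unescaped: "." is a wildcard
const versionExact = /^v1\.0$/; // escaped: "\." is a literal dot

const dotMatchesDot = version.test("v1.0");       // true
const dotMatchesX = version.test("v1x0");         // true, probably not intended
const escapedMatchesX = versionExact.test("v1x0"); // false

const html = "<b>bold</b>";
const greedy = html.match(/<.*>/)[0]; // "<b>bold</b>": .* runs to the LAST ">"
const lazy = html.match(/<.*?>/)[0];  // "<b>": .*? stops at the first ">"
const allTags = html.match(/<.*?>/g); // ["<b>", "</b>"]
```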

When Not to Use Regex

Parsing HTML or JSON. Use a proper parser.

Complex validation logic. Code is often clearer than a single massive pattern.

When string methods work. "hello".startsWith("he") is clearer than /^he/.
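Two of these cases side by side, sketched in JavaScript. The JSON sample is invented: it holds a name with escaped quotes, which is exactly where a quick regex gives up and the built-in parser does not.

```javascript
const word = "hello";
// Same question, two ways; the string method needs no regex knowledge.
const viaMethod = word.startsWith("he"); // true
const viaRegex = /^he/.test(word);       // true

// A regex on JSON stops at the first quote, even an escaped one.
const json = '{"user": {"name": "Ana \\"A\\" B"}}';
const viaPattern = /"name": "([^"]*)"/.exec(json)[1]; // cut short: Ana followed by a backslash
// JSON.parse understands escapes and nesting.
const viaParser = JSON.parse(json).user.name;         // Ana "A" B
```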


Regex is a tool. Like any tool, it's good for specific jobs and awkward for others. Start simple, test as you build, and don't try to be clever. Readable regex is better than impressive regex.