Regular Expressions - No stress RegEX

Learning regular expressions can be a daunting task, especially if you've been taught them traditionally. Many developers dread having to use a regular expression in their applications. The syntax is confusing at first, and many examples online cover extreme, complicated edge cases.

So today, for your learning pleasure, we're going to demystify regular expressions.

Let's start with a basic string.


firstname lastname

There are several ways to match this string and we'll do it in a way that teaches some regex fundamentals. Here's a simple regex for matching this string.


^[a-zA-Z]+\s[a-zA-Z]+$

Let's break down this example, one piece at a time:

  • The ^ at the beginning of the regex anchors the match to the start of the string, so the next character (or in this case character set) must match the string's first character.
  • The [ and ] define a character set, so the first character after the ^ can be any character inside the brackets. Here a-z and A-Z are ranges covering the lowercase and uppercase ASCII letters.
  • The + symbol following the character set is a quantifier meaning "one or more": at least one character from the set is required, and any number of additional ones can follow. So
    johnnnnnnnn smith

    would be a match as well.
  • \s matches a single whitespace character (a space, tab, or newline).
  • Following the space, we have another character set which is a duplicate of the first, again followed by a +.
  • Ending the expression, we have a $, which anchors the match to the end of the string: the preceding character or character set must be the last thing in the match.
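
To try the breakdown above yourself, here is a minimal sketch in Python using the standard re module. The helper name is_full_name is made up for this illustration; only the pattern comes from the article.

```python
import re

# The pattern from the article: letters, one whitespace character, letters.
NAME_PATTERN = re.compile(r"^[a-zA-Z]+\s[a-zA-Z]+$")


def is_full_name(text: str) -> bool:
    """Return True when text is two runs of ASCII letters split by one whitespace character."""
    return NAME_PATTERN.match(text) is not None


print(is_full_name("firstname lastname"))  # True
print(is_full_name("johnnnnnnnn smith"))   # True: + allows any number of repeats
print(is_full_name("john\tsmith"))         # True: \s also matches a tab
print(is_full_name("john  smith"))         # False: \s matches exactly one character
print(is_full_name("john smith3"))         # False: 3 is not in the character set
```

The two-space case is a handy reminder that \s on its own matches exactly one whitespace character; it would need its own + to accept more.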

Let's take a slightly different example to really nail these fundamentals in.


john-smith

Now we could use something very similar to the above regex to solve this.


^[a-z]+-[a-z]+$

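All we've really changed here is swapping the \s for a - (and dropping A-Z, since this example is all lowercase). But there may be a better way, depending on your use case. Regex has a built-in shorthand, \w, that matches "word" characters: letters, digits and the underscore. In Unicode-aware engines it also covers letters from other alphabets, while some engines limit it to ASCII by default. So if you don't have a specific alphabet to match, this may be the way to go.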


^[\w]+-[\w]+$
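
What \w accepts depends on the regex engine, so it's worth checking in the one you use. The sketch below assumes Python 3, where \w is Unicode-aware for text patterns unless the re.ASCII flag is passed. The names WORD_PAIR, ASCII_WORD_PAIR and matches are made up for this illustration.

```python
import re

# Same pattern twice: once with Python's default (Unicode-aware) \w,
# once restricted to ASCII letters, digits and underscore.
WORD_PAIR = re.compile(r"^[\w]+-[\w]+$")
ASCII_WORD_PAIR = re.compile(r"^[\w]+-[\w]+$", re.ASCII)


def matches(pattern: "re.Pattern[str]", text: str) -> bool:
    """Return True when the whole text fits the given compiled pattern."""
    return pattern.match(text) is not None


print(matches(WORD_PAIR, "john-smith"))                    # True
print(matches(WORD_PAIR, "john_2-smith"))                  # True: digits and _ count as word characters
print(matches(WORD_PAIR, "jos\u00e9-garc\u00eda"))         # True: accented letters match in Unicode mode
print(matches(ASCII_WORD_PAIR, "jos\u00e9-garc\u00eda"))   # False: ASCII mode rejects them
```

Note that \w is looser than [a-z]: it lets digits and underscores through, which may or may not be what you want for a name.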

For our next example, let's match a common IPv4 address.

19.6.21.254

Let's write a matching regex, break it down, then improve upon it a bit.


^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$

Let's break down this regex.

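  • First we start with the ^ signifying the start of the string.
  • Next we define a character set of [0-9] followed by the quantifier {1,3}, which sets the minimum and maximum number of digits for this piece at 1 and 3 respectively.
  • Next, we match the literal dot. We need the \ in front of it because . by itself matches any character (except, by default, a newline) in regex.
  • We repeat this digits-and-dot pair two more times, then end with a final run of 1 to 3 digits and a $ to mark the end of the string. Note that the pattern checks only the shape of the address: each octet can be any 1 to 3 digits, so 999.999.999.999 also matches.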
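
Here is a small sketch of that pattern in Python, mainly to show what the escaped dot and the {1,3} quantifier accept and reject. The helper name looks_like_ipv4 is made up for this illustration, and as the name suggests it only checks the shape of the string, not whether each octet is in range.

```python
import re

# The pattern from the article: four runs of 1 to 3 digits separated by literal dots.
IPV4_SHAPE = re.compile(r"^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$")


def looks_like_ipv4(text: str) -> bool:
    """Return True when text has the dotted-quad shape (octet values are not range checked)."""
    return IPV4_SHAPE.match(text) is not None


print(looks_like_ipv4("19.6.21.254"))      # True
print(looks_like_ipv4("19.6.21"))          # False: only three octets
print(looks_like_ipv4("1234.1.1.1"))       # False: {1,3} caps each octet at three digits
print(looks_like_ipv4("19x6x21x254"))      # False: \. matches only a literal dot
print(looks_like_ipv4("999.999.999.999"))  # True: the shape is right even though the values are not
```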

Not too bad... but kind of hard to read. Let's try to clean it up a bit.


^([\d]{1,3})\.([\d]{1,3})\.([\d]{1,3})\.([\d]{1,3})$

Here I've added a couple of regex features that are super useful.

  • First, I replaced the [0-9] character set with the built-in \d shorthand, which matches any digit, similar to the \w from above. (The brackets around \d are optional here; \d{1,3} means the same thing.)
  • Second, I've added a capture group around each IPv4 octet with the ( and ) symbols. Capture groups not only help with readability, they also capture the matched value for later processing. This would allow, for instance, capturing the network portion of the IP and the first octet of the host portion for filtering offending IP addresses or isolating traffic from a specific area/region of your network or the internet. Here, we're just using it for readability, but it's an important feature of regex to know about.
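
As a rough sketch of what "later processing" can look like, the Python snippet below pulls the four captured octets out of a match and range-checks them. The function name parse_ipv4 and the 0 to 255 check are additions for this illustration, not part of the regex itself; the re.ASCII flag is passed so that \d means exactly [0-9].

```python
import re
from typing import Optional, Tuple

# The grouped pattern from the article; each pair of parentheses captures one octet.
IPV4_GROUPS = re.compile(
    r"^([\d]{1,3})\.([\d]{1,3})\.([\d]{1,3})\.([\d]{1,3})$", re.ASCII
)


def parse_ipv4(text: str) -> Optional[Tuple[int, ...]]:
    """Return the four octets as integers, or None if text is not a valid dotted quad."""
    match = IPV4_GROUPS.match(text)
    if match is None:
        return None
    # groups() returns the captured strings in order: ("19", "6", "21", "254").
    octets = tuple(int(part) for part in match.groups())
    # The regex only checks the shape, so the value range is checked here.
    if any(octet > 255 for octet in octets):
        return None
    return octets


print(parse_ipv4("19.6.21.254"))      # (19, 6, 21, 254)
print(parse_ipv4("999.1.1.1"))        # None: 999 is out of range
print(parse_ipv4("not an address"))   # None: wrong shape

octets = parse_ipv4("19.6.21.254")
if octets is not None:
    print(octets[:2])                 # (19, 6): the first two octets, e.g. for filtering
```

Doing the range check in code keeps the regex short; an all-regex version is possible but much harder to read.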

For our final example, we'll match a basic email address. Email addresses are complicated to match because of the many syntax variations allowed, so for this example we'll match a simplified email, assuming that only letters, numbers and the special characters ., + and _ are allowed before the @ symbol, and that only .com, .net, .info, .biz and .org are allowed after the email provider name.


^[\w\d\.+]+@[\w]+\.(com|net|info|biz|org)$

Hopefully, you can read this example since all of its components were used in earlier parts of this article. Let's quickly break it down.

  • We begin our expression with ^ to anchor the start of the string.
  • We then define the character set [\w\d\.+], which allows letters, digits, underscores, dots and the plus sign. (The \d is redundant, since \w already includes digits, and inside a character set the . doesn't need escaping, but neither does any harm.)
  • We added a plus sign after the set to allow one or more characters from the set, followed by the @ to mark the end of the email user account.
  • After the @ symbol, we have a second character set, [\w]+, which allows one or more word characters for the email service provider name (google, yahoo, bing, etc.).
  • We followed the provider character set with a \. to match the dot before the ending of the email address, then an alternation group (com|net|info|biz|org), which matches exactly one of those endings, and finally a $ so that nothing can follow it.
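
To see the email rules in action, here is a minimal Python sketch. It assumes the provider character set carries a + quantifier and that the pattern ends with $, so provider names longer than one character match and trailing text is rejected. The helper name is_simple_email is made up for this illustration, and the pattern is deliberately far stricter than what real email addresses allow.

```python
import re

# Simplified email pattern: user part, @, provider name, one of five endings.
EMAIL_PATTERN = re.compile(r"^[\w\d\.+]+@[\w]+\.(com|net|info|biz|org)$")


def is_simple_email(text: str) -> bool:
    """Return True when text fits the article's simplified email rules."""
    return EMAIL_PATTERN.match(text) is not None


print(is_simple_email("john.smith+news@example.com"))  # True
print(is_simple_email("john_smith@example.org"))       # True: \w includes the underscore
print(is_simple_email("john@example.io"))              # False: .io is not in the allowed list
print(is_simple_email("john@example.com.evil"))        # False: $ rejects anything after the ending
print(is_simple_email("john@mail.example.com"))        # False: subdomains are not handled

match = EMAIL_PATTERN.match("john.smith+news@example.com")
if match is not None:
    print(match.group(1))                              # com: the captured ending
```

The subdomain case shows the cost of keeping the pattern simple: anything outside the assumptions stated earlier is rejected, even if it is a perfectly valid address.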

Conclusion. RegEx is a powerful programming tool that will pop up again and again in a developer's career. Since many developers are intimidated by regex's complicated syntax, it's important to keep the basics in mind. I hope that you've gained a better understanding and a newfound confidence with regex from reading this article. Cheers!