Regular Expressions Made Easy

/ Updated: Aug 18, 2021 / PHP /

Regular expressions are a very powerful tool for manipulating and extracting strings. However, not all PHP developers know how to use them. This simple tutorial is intended for everyone who wants to get started with regular expressions in PHP.


PHP has several built-in functions for dealing with regular expressions. We'll examine the PCRE functions (preg_replace, preg_match_all), which use a Perl-compatible regular expression syntax. (Please note that the POSIX regex functions (ereg, eregi, ereg_replace, etc.) were deprecated in PHP 5.3 and completely removed in PHP 7.)

Ok, let's begin.

What is a regular expression? It is a string that defines a pattern (a "mask") for the data to search for or replace. Such a string can contain characters with a special meaning: meta characters, anchors, character classes, quantifiers and group modifiers. Let's examine the most commonly used ones (please note that this list is not comprehensive; for full information consult the documentation).

Meta characters

Meta characters are characters with a special meaning. If you need to use a meta character as plain text, you have to escape it (to escape a character, put a backslash "\" before it).

  • . - Any character except a newline (use the "/s" pattern modifier if you need it to match the newline character too)
  • (some expr) - Group. A group captures the text it matches, so you can reuse it in a replacement or extract it as structured information
  • (php|ruby) - Alternation. This group will match the string "php" or the string "ruby".
  • [abc] - Range (character class). Will match a single character if it is one of the listed characters.
  • [^abc] - Not in range. Will match a single character if it is NOT one of the listed characters.
  • [a-f] - If the symbol "-" is specified, the range is "all letters between a and f", inclusive.
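
To make the list above more concrete, here is a minimal sketch that tries each construct with preg_match. The sample strings and variable names are made up for illustration; preg_quote is a standard PHP function that escapes meta characters in plain text for you.

```php
<?php
// Alternation inside a group: matches "php" or "ruby"; the group captures which one.
preg_match('/I like (php|ruby)/', 'I like ruby a lot', $m);
$language = $m[1];                                   // "ruby"

// Range: matches a single letter between "a" and "f".
$hasHexLetter = preg_match('/[a-f]/', 'xyz-c-123');  // 1 (the "c")

// Negated range: the first character that is NOT a, b or c.
preg_match('/[^abc]/', 'abcd', $m);
$firstOther = $m[0];                                 // "d"

// The dot is a meta character, so it must be escaped to match a literal dot.
$unescaped = preg_match('/^1.5$/', '1x5');           // 1: "." matches any character
$escaped   = preg_match('/^1\.5$/', '1x5');          // 0: "\." matches only a dot

// preg_quote() escapes all meta characters in a piece of plain text.
$quoted = preg_quote('1.5 (approx)');

echo "$language $firstOther $quoted\n";
```

Note that preg_match returns 1 when the pattern matches and 0 when it does not, which is why the results above can be used directly in an if condition.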

Anchors

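Anchors are special zero-length constructs in regular expressions. They do not match any characters; they mark special positions within the text being searched:

  • ^ - Start of string (or start of each line in multiline mode, see the /m modifier below)
  • $ - End of string (or end of each line in multiline mode)
  • \b - Word boundary. Matches the position between a word character (\w, see below) and a non-word character, or the start or end of the string next to a word character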
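
As a rough illustration of the anchors above, the following sketch counts matches of "cat" with and without anchors. The sample text is invented for this example.

```php
<?php
$text = "cat concatenate\ncat";

// ^ anchors the match to the start of the string.
$startsWithCat = preg_match('/^cat/', $text);      // 1

// $ anchors the match to the end of the string.
$endsWithCat = preg_match('/cat$/', $text);        // 1

// \b matches only at word boundaries, so the "cat" inside "concatenate" is skipped.
$wholeWords = preg_match_all('/\bcat\b/', $text);  // 2
$anywhere   = preg_match_all('/cat/', $text);      // 3

echo "$wholeWords whole words, $anywhere occurrences\n";
```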

Predefined character classes

There are several predefined character classes, which can be used in regular expressions, the most used are:

  • \s - White space. Includes a space, a tab, a carriage return, a line feed.
  • \d - Digit. Includes numbers [0-9]
  • \w - Word. Includes ASCII characters [A-Za-z0-9_]. If Unicode modifier /u is specified, also includes unicode letters.

Every character class listed above has an "opposite" class, which matches all characters except those in the base class. Just uppercase the letter:

  • \S - Not white space
  • \D - Not digit
  • \W - Not word
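
The classes above are easiest to understand by seeing what they pick out of a string. This is a small sketch with made-up sample data; preg_match_all, preg_split and preg_replace are standard PCRE functions in PHP.

```php
<?php
$order = "Order #1234 shipped to Bob_Smith on 2021-08-18";

// \d+ collects every run of digits.
preg_match_all('/\d+/', $order, $m);
$numbers = $m[0];        // "1234", "2021", "08", "18"

// \w+ collects every run of word characters (letters, digits, underscore).
preg_match_all('/\w+/', $order, $m);
$words = $m[0];          // "Order", "1234", "shipped", "to", "Bob_Smith", ...

// \s+ works well as a separator: any run of spaces, tabs or newlines.
$parts = preg_split('/\s+/', "a  b\tc\nd");

// \D is the opposite of \d: remove everything that is not a digit.
$digitsOnly = preg_replace('/\D/', '', '+1 (555) 010-9999');

print_r($numbers);
print_r($parts);
echo $digitsOnly, "\n";
```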

Number quantifiers

Number quantifiers specify the number of occurrences, that is, how many times the preceding character or group should occur. They are as follows:

  • * - 0 or more
  • + - 1 or more
  • ? - 0 or 1
  • {5} - Exactly 5 times
  • {5,} - 5 or more repetitions
  • {5,10} - from 5 to 10 occurrences
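
Here is a short sketch of the quantifiers above in action. The test strings (a five-digit code, two spellings of a word) are only illustrative assumptions.

```php
<?php
// {5}: exactly five digits between the start and the end of the string.
$fiveOk  = preg_match('/^\d{5}$/', '90210');   // 1
$fiveBad = preg_match('/^\d{5}$/', '9021');    // 0

// {5,}: five or more digits.
$tooShort = preg_match('/^\d{5,}$/', '1234');  // 0

// {5,10}: between 5 and 10 digits; the match stops after 10.
preg_match('/\d{5,10}/', '123456789012', $m);
$chunk = $m[0];                                // "1234567890"

// ?: the "u" is optional, so both spellings match.
$us = preg_match('/^colou?r$/', 'color');      // 1
$uk = preg_match('/^colou?r$/', 'colour');     // 1

// + requires at least one "b", while * also accepts none.
$plus = preg_match('/^ab+c$/', 'ac');          // 0
$star = preg_match('/^ab*c$/', 'ac');          // 1

echo $chunk, "\n";
```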

Quantifiers such as "*" and "+" are "greedy" by default. That means that they will match as many characters as possible. To make a quantifier "not greedy" (lazy), put the "?" modifier after it. The following example shows the difference.

We have the sample text to parse:

<p>Sample paragraph 1</p>
<p>Sample paragraph 2</p>
<p>Sample paragraph 3</p>

Now let's look at the regex: <p>.*</p>

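Because "*" is greedy, this regex will match the whole sample text, from the first <p> to the last </p>! (This assumes the "/s" pattern modifier is used, so that "." also matches the line breaks between the paragraphs; without it, ".*" stops at the end of each line.) If you need to break your sample into paragraphs and process them separately, you can add the "?" modifier to make this regex not greedy: <p>.*?</p>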
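
The difference can be checked with a few lines of PHP. This is a minimal sketch using the sample text above; note the /s modifier, which lets "." match the line breaks between the paragraphs. The variable names are chosen only for this example.

```php
<?php
$html = "<p>Sample paragraph 1</p>\n<p>Sample paragraph 2</p>\n<p>Sample paragraph 3</p>";

// Greedy: .* runs to the end and backs up to the LAST </p>, giving one big match.
$greedyCount = preg_match_all('/<p>.*<\/p>/s', $html, $greedy);

// Not greedy: .*? stops at the FIRST </p> it can, giving one match per paragraph.
$lazyCount = preg_match_all('/<p>(.*?)<\/p>/s', $html, $lazy);
$paragraphs = $lazy[1];   // contents of the capturing group for each match

echo "greedy: $greedyCount match, lazy: $lazyCount matches\n";
print_r($paragraphs);
```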

Pattern Modifiers

Pattern modifiers are used to specify additional options for regular expressions. The following pattern modifiers are supported:

  • /i - Enables case insensitive comparing.
  • /s - Single line mode. If specified, the dot meta character (.) matches all characters, including the newline character (\n), which it normally does not match.
  • /m - Multiline mode. If specified, changes the behaviour of ^ and $ metacharacters from "start of string" to "start of line", and from "end of string" to "end of line", accordingly. Has no effect if there are no newline characters (\n) in a subject string.

In older PHP versions, there also was the /e modifier, which was used for inline PHP code evaluation. It is deprecated since PHP 5.5 and removed in PHP 7.
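
To see what each modifier changes, the sketch below runs the same patterns with and without it. The two-line sample string is an assumption made for this illustration.

```php
<?php
$text = "First line\nsecond LINE";

// /i: case insensitive comparison.
$caseSensitive   = preg_match('/line$/', $text);          // 0: the string ends with "LINE"
$caseInsensitive = preg_match('/line$/i', $text);         // 1

// /s: the dot also matches the newline character.
$dotDefault = preg_match('/First.*second/', $text);       // 0: "." stops at "\n"
$dotAll     = preg_match('/First.*second/s', $text);      // 1

// /m: ^ and $ match at the start and end of every line.
$startsDefault = preg_match_all('/^\w+/', $text);         // 1: only "First"
$startsMulti   = preg_match_all('/^\w+/m', $text);        // 2: "First" and "second"

echo "$caseInsensitive $dotAll $startsMulti\n";
```

Modifiers can be combined, for example /si, simply by writing them one after another.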

PHP regex functions

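Ok, now you know something about regular expressions. Now we need to sum it up and look at some real examples. PHP has several functions for dealing with Perl-compatible regular expressions:

  • preg_match - finds the first match of a pattern
  • preg_match_all - finds all matches of a pattern
  • preg_replace - searches for a pattern and replaces the matches
  • preg_replace_callback - like preg_replace, but the replacement is computed by a callback function
  • preg_split - splits a string by a pattern
  • preg_grep - returns the array entries that match a pattern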

All of these functions take a pattern parameter, which consists of the following sections separated by a delimiter (a forward slash (/) is the most common one):

/regular_expression/pattern_modifiers

Some example patterns:

  • /<title>([^<]*)<\/title>/si - will match the title tag of a webpage
  • /\d{1,2}\/\d{1,2}\/\d{4}/ - will match a date in the format dd/mm/yyyy
  • /\w+@[a-z_]+\.[a-z]{2,}/si - will match a simple email address
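
The following sketch tries a date pattern and a title pattern of the kind listed above. It also shows that any other non-alphanumeric character, such as "#", can be used as the delimiter, which avoids escaping the slashes. The ^ and $ anchors and the sample strings are additions made for this example; note that the date pattern only checks the shape of the text, not whether the date is valid.

```php
<?php
// The same date pattern written with two different delimiters.
$withSlashes = '/^\d{1,2}\/\d{1,2}\/\d{4}$/';  // "/" delimiter: inner slashes must be escaped
$withHashes  = '#^\d{1,2}/\d{1,2}/\d{4}$#';    // "#" delimiter: no escaping needed

$ok1 = preg_match($withSlashes, '18/08/2021'); // 1
$ok2 = preg_match($withHashes, '18/08/2021');  // 1
$bad = preg_match($withHashes, '2021-08-18');  // 0

// Extracting a page title; /i makes the tag name case insensitive.
$page = "<html><head><TITLE>My page</TITLE></head></html>";
preg_match('/<title>([^<]*)<\/title>/si', $page, $m);
$title = $m[1];

echo $title, "\n";
```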

Working with group references

As you know, regex matches can be captured into groups. You can refer to these groups in further regex operations using group references (backreferences). A group reference is the number of the group preceded by the character "$" or "\". Let's look at an example. It changes all HTML links in the variable $s into links that open in a new window:

<?php

//initialize the variable with HTML having several sample links
$s = '<a href="http://www.php.net">PHP web site</a> ';
$s .= '<a href="http://www.wmtips.com">Webmaster Tips</a> ';
$s .= '<a href="http://www.google.com">Google</a>';

//add target="_blank" to each link
$s = preg_replace('/<a[^>]*?href=[\'"](.*?)[\'"][^>]*?>(.*?)<\/a>/si','<a href="$1" target="_blank" rel="noopener">$2</a>',$s);

//output the result
echo $s;

?>

In this example we have assumed that HTML attribute values can be enclosed in either single or double quotes, so we've used the [\'"] construction.
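
Group references are not limited to HTML. As a smaller sketch of the same idea, the code below reorders the parts of a date with "$" references in the replacement string, and uses a "\" reference inside the pattern itself to find a doubled word. The sample strings are invented for illustration.

```php
<?php
// $1, $2 and $3 in the replacement refer to the three captured groups.
$iso = preg_replace('#(\d{1,2})/(\d{1,2})/(\d{4})#', '$3-$2-$1', 'Due 18/08/2021');

// Inside the pattern, \1 means "the same text that group 1 matched".
preg_match('/\b(\w+) \1\b/', 'this is is a test', $m);
$doubled = $m[1];

echo $iso, "\n";       // Due 2021-08-18
echo $doubled, "\n";   // is
```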

Grabbing site contents with PHP

Let's write an example script that grabs the contents of a webpage, parses it with regular expressions and displays the parsed data. Let's take the YouTube "Top Rated" section as an example. First, we need to view the HTML page source and find the data blocks we are interested in. An example HTML block looks as follows:


<div class="vstill"><a href="/watch?v=lJlRrlt8hdM" onclick="_hbLink('Makesomemoney','VidVert');"><img src="http://sjl-static13.sjl.youtube.com/vi/lJlRrlt8hdM/2.jpg" class=" vimg " alt="video" /></a></div>
<div class="vtitle">
<a href="/watch?v=lJlRrlt8hdM" onclick="_hbLink('Makesomemoney','VidVert');">Make some money</a>

So we have all the necessary data in this block and are ready to implement our grabber PHP script:

<?php

 //regex example for article "Regular expressions made easy"
 //copyright (c) www.wmtips.com, 2006
 echo "Regex example for article \"Regular expressions made easy\"<br />&copy; <a href=\"http://www.wmtips.com\">www.wmtips.com</a>, 2006<hr />\n";

 //grab contents of web page into $s variable
 $s = file_get_contents('http://www.youtube.com/browse?s=tr');

 //perform regex
 if (preg_match_all('/<div class="vstill"><a href="(\/watch\?.*?)".*?><img src="(.*?)".*?><\/a><\/div>'.
 '\s*<div class="vtitle">\s*<a[^>]*>([^<>]*)<\/a>/si',$s,$m,PREG_SET_ORDER))
 {
  //iterate through the results and output them
  //we have following groups now:
  //[0] - entire matched text
  //[1] - url of the video
  //[2] - image link
  //[3] - title of the video
  foreach ($m as $val)
  {
   $url='http://www.youtube.com'.$val[1];
   echo "<a href=\"$url\"><img src=\"{$val[2]}\" /> {$val[3]}</a><br /><br />\n";
  }
 }

?>



Please note that the YouTube page format may change in the future, in which case this script will stop working.
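
Because the live page may no longer have this markup, the pattern can also be tried offline. The sketch below applies the same regex to the sample HTML block shown earlier, stored in a string instead of being downloaded. Collecting the results into a $videos array is an assumption of this example, not part of the original script.

```php
<?php
// Sample markup copied from the HTML block above, so no network access is needed.
$s = <<<'HTML'
<div class="vstill"><a href="/watch?v=lJlRrlt8hdM" onclick="_hbLink('Makesomemoney','VidVert');"><img src="http://sjl-static13.sjl.youtube.com/vi/lJlRrlt8hdM/2.jpg" class=" vimg " alt="video" /></a></div>
<div class="vtitle">
<a href="/watch?v=lJlRrlt8hdM" onclick="_hbLink('Makesomemoney','VidVert');">Make some money</a>
HTML;

// Same pattern as in the grabber script: [1] url, [2] image link, [3] title.
$pattern = '/<div class="vstill"><a href="(\/watch\?.*?)".*?><img src="(.*?)".*?><\/a><\/div>'
         . '\s*<div class="vtitle">\s*<a[^>]*>([^<>]*)<\/a>/si';

$videos = [];
if (preg_match_all($pattern, $s, $m, PREG_SET_ORDER)) {
    // With PREG_SET_ORDER every element of $m is one complete match.
    foreach ($m as $val) {
        $videos[] = [
            'url'   => 'http://www.youtube.com' . $val[1],
            'image' => $val[2],
            'title' => trim($val[3]),
        ];
    }
}

print_r($videos);
```

Testing a pattern against a saved sample like this makes it easier to notice when a change in the page format is the reason a grabber returns nothing.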

I hope this simple tutorial was interesting and useful for you. Keep learning and you will find regular expressions to be a very powerful mechanism for processing string data.


About The Author

Webmaster tips and tools. Webmaster tips: HTML, CSS, SEO, AdSense. SEO Tools: Site information tool, Pagerank checker, Keyword Density Analyzer and more.