Nov 19, 2006
wmtips.com

Regular expressions made easy

Regular expressions are a very powerful tool for manipulating and extracting strings. However, not all PHP developers know how to use them, so this simple tutorial is intended for everyone who wants to learn.

PHP has several built-in functions for dealing with regular expressions. We'll examine only the PCRE functions (such as preg_replace and preg_match_all), which use a Perl-compatible regular expression syntax and are often faster than the POSIX alternatives (ereg, ereg_replace, etc.). The POSIX functions were deprecated in PHP 5.3 and removed in PHP 7.0, so the PCRE functions are the ones worth learning.

Ok, let's begin.

What is a regular expression? It is a string that defines a "mask" (pattern) for the data to search for or replace. Such a string can contain meta characters, anchors, character classes, quantifiers and pattern modifiers. Let's examine the most commonly used ones (please note that this list is not comprehensive).

Meta characters

Meta characters are characters with a special meaning in regex syntax. If you need to match a meta character literally, as plain text, you need to escape it (put the "\" character before it).

  • . - Any character except a newline (use the "s" pattern modifier if you need to match the newline character too)
  • (some expr) - Group. You can use groups in replacements or to extract structured information
  • (php|ruby) - Alternation. This group will match the string "php" or the string "ruby".
  • [abc] - Character set. Matches one character if it is in the specified set.
  • [^abc] - Negated set. Matches one character if it is NOT in the specified set.
  • [a-f] - Range. When "-" is placed between two characters, the set contains all characters between them (here, all letters from "a" to "f")
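
The meta characters listed above can be tried out with preg_match, which returns 1 when the pattern is found and 0 when it is not. This is only a small sketch; the sample strings are made up for illustration.

```php
<?php
// Sample strings below are made up for illustration.

// "." matches any single character except a newline.
$dot = preg_match('/c.t/', 'cut');                 // 1

// An escaped dot matches only a literal ".".
$escapedHit  = preg_match('/3\.14/', '3.14');      // 1
$escapedMiss = preg_match('/3\.14/', '3514');      // 0

// Group with alternatives: $m[1] receives whichever alternative matched.
preg_match('/I like (php|ruby)/', 'I like ruby', $m);
$language = $m[1];                                 // "ruby"

// Character sets and ranges.
$inSet    = preg_match('/[abc]/', 'b');            // 1
$notInSet = preg_match('/[^abc]/', 'b');           // 0
$inRange  = preg_match('/[a-f]/', 'e');            // 1

var_dump($dot, $escapedHit, $escapedMiss, $language, $inSet, $notInSet, $inRange);
```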

Anchors

Anchors are characters that mark special positions within the text being searched:

  • ^ - Start of string
  • $ - End of string
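
A short sketch of how the anchors change what matches. The sample strings are assumptions chosen only for this illustration.

```php
<?php
// Without anchors the pattern may match anywhere in the string.
$anywhere  = preg_match('/php/', 'I like php a lot');   // 1

// "^" requires the match to begin at the start of the string.
$startMiss = preg_match('/^php/', 'I like php a lot');  // 0
$startHit  = preg_match('/^php/', 'php is fun');        // 1

// "$" requires the match to finish at the end of the string.
$endHit    = preg_match('/php$/', 'I like php');        // 1

// Both anchors together force the whole string to fit the pattern.
$wholeHit  = preg_match('/^php$/', 'php');              // 1
$wholeMiss = preg_match('/^php$/', 'php5');             // 0

var_dump($anywhere, $startMiss, $startHit, $endHit, $wholeHit, $wholeMiss);
```

Using both anchors is a common way to validate a whole input value rather than to search inside it.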

Character classes

There are several predefined character classes which can be used in regular expressions:

  • \s - White space
  • \d - Digit
  • \w - Word character (letter, digit or underscore)

Every character class listed above has an "opposite" class, which matches all characters except those of the base class. Just uppercase the letter:

  • \S - Not white space
  • \D - Not digit
  • \W - Not a word character
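
The classes and their opposites can be compared side by side with preg_replace and preg_match_all. This is a minimal sketch; the sample text is hypothetical.

```php
<?php
$text = 'Order 66 shipped';   // made-up sample text

// \d picks out the digits one by one.
preg_match_all('/\d/', $text, $found);
$digits = $found[0];                                   // ['6', '6']

// \D is the opposite: removing every non-digit leaves only the number.
$onlyDigits = preg_replace('/\D/', '', $text);         // "66"

// \s matches each white space character.
$underscored = preg_replace('/\s/', '_', $text);       // "Order_66_shipped"

// \W matches everything that is not a letter, digit or underscore.
$wordChars = preg_replace('/\W/', '', 'a-b_c!');       // "ab_c"

var_dump($digits, $onlyDigits, $underscored, $wordChars);
```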

Number quantifiers

Number quantifiers specify how many times the preceding character (or group) should occur. They can be as follows:

  • * - 0 or more
  • + - 1 or more
  • ? - 0 or 1
  • {5} - Exactly 5 times
  • {5,} - 5 or more
  • {5,10} - From 5 to 10 times

Quantifiers such as "*" and "+" are "greedy". This means they will match as many characters as possible. To make them non-greedy (lazy), put the "?" modifier after the quantifier. OK, let's explain.

We have sample text to parse:

<p>Sample paragraph 1</p><p>Sample paragraph 2</p><p>Sample paragraph 3</p>

Now let's look at the regex: <p>.*</p>

This regex will match the whole sample text, because the greedy ".*" runs on to the last </p>! If you need to break your sample into paragraphs and process them separately, you can add the "?" modifier to make this regex non-greedy: <p>.*?</p>
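
The quantifiers and the greedy versus non-greedy behaviour can be checked with a few calls. This is a sketch only: the short test strings in the first part are assumptions, while the HTML in the second part is the sample text from this section.

```php
<?php
// Counts: each pattern is anchored so the whole string must fit.
$star      = preg_match('/^ab*c$/', 'ac');              // 1: "b" may be absent
$plusMiss  = preg_match('/^ab+c$/', 'ac');              // 0: at least one "b" needed
$plusHit   = preg_match('/^ab+c$/', 'abbbc');           // 1
$optional  = preg_match('/^colou?r$/', 'color');        // 1: "u" is optional
$exact     = preg_match('/^\d{5}$/', '12345');          // 1
$atLeast   = preg_match('/^\d{5,}$/', '1234567');       // 1
$rangeMiss = preg_match('/^\d{5,10}$/', '12345678901'); // 0: 11 digits is too many

// Greedy versus non-greedy on the sample paragraphs.
$html = '<p>Sample paragraph 1</p><p>Sample paragraph 2</p><p>Sample paragraph 3</p>';

preg_match_all('/<p>.*<\/p>/', $html, $greedy);   // one match: the whole text
preg_match_all('/<p>.*?<\/p>/', $html, $lazy);    // one match per paragraph

echo count($greedy[0]), "\n";   // 1
echo count($lazy[0]), "\n";     // 3
echo $lazy[0][1], "\n";         // <p>Sample paragraph 2</p>
```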

Pattern Modifiers

Pattern modifiers specify additional options for a regular expression. They can be as follows:

  • /i - Perform case-insensitive matching
  • /s - Treat the string as a single line. With this modifier, the "." meta character also matches the newline character (\n).
  • /e - Makes the regex engine evaluate the replacement parameter as PHP code after the group references have been substituted. This modifier was deprecated in PHP 5.5 and removed in PHP 7.0; use preg_replace_callback instead.
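
A small sketch of the modifiers in use. Because the /e modifier is no longer available in PHP 7 and later, the last part uses preg_replace_callback to compute the replacement with PHP code; the sample strings are made up for illustration.

```php
<?php
// /i: ignore case.
$caseSensitive   = preg_match('/php/', 'I like PHP');    // 0
$caseInsensitive = preg_match('/php/i', 'I like PHP');   // 1

// /s: let "." match newlines too.
$text = "<b>line one\nline two</b>";   // made-up sample
$withoutS = preg_match('/<b>(.*)<\/b>/', $text);         // 0: "." stops at the newline
$withS    = preg_match('/<b>(.*)<\/b>/s', $text, $m);    // 1: $m[1] spans both lines

// Replacement computed by PHP code, without /e:
// the callback receives the match array and returns the replacement text.
$titled = preg_replace_callback(
    '/\w+/',
    function ($match) { return ucfirst($match[0]); },
    'regex is fun'
);

echo $titled, "\n";   // Regex Is Fun
```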

Using in PHP

OK, now you know something about regular expressions. Let's sum it up and look at some real examples. PHP has several functions for dealing with Perl-compatible regular expressions, such as preg_match, preg_match_all, preg_replace, preg_replace_callback and preg_split.

You can examine the syntax of all of these functions in the PHP manual. Each of them takes a pattern parameter, which consists of the following items:

/your_regular_expression/pattern_modifiers

Some example patterns:

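  • /<title>([^>]*)<\/title>/si - matches the title tag of a web page
  • /\d{1,2}\/\d{1,2}\/\d{4}/ - matches a date in the format dd/mm/yyyy
  • /\w+@[a-z_]+\.[a-z]{2,}/si - matches a simple email address (real addresses can contain characters this pattern does not allow, such as dots, hyphens or digits in the domain)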
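
To see these three patterns at work, here is a minimal sketch that runs them against made-up input. The sample page, date and address are assumptions, and every digit class in the date pattern is written as \d.

```php
<?php
// Hypothetical input, for illustration only.
$page = '<html><head><TITLE>My home page</TITLE></head><body>Hi</body></html>';

// Title: the "i" modifier lets <title> match <TITLE>; group 1 is the text.
preg_match('/<title>([^>]*)<\/title>/si', $page, $m);
$title = $m[1];                                   // "My home page"

// Date in dd/mm/yyyy format.
$hasDate = preg_match('/\d{1,2}\/\d{1,2}\/\d{4}/', 'Posted on 19/11/2006');   // 1

// Simple email address; $e[0] is the entire matched text.
preg_match('/\w+@[a-z_]+\.[a-z]{2,}/si', 'Contact: john@example.com today', $e);
$email = $e[0];                                   // "john@example.com"

var_dump($title, $hasDate, $email);
```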

Working with group references

As you know, a regex match can be divided into groups. You can use these groups in further regex operations through group references (backreferences). A group reference is the number of the group preceded by "$" or "\\" (for example $1 or \\1 in a replacement string). Let's look at a real example. It changes all HTML links in the variable $s into links that open in a new window:

<?php
//initialize the variable with HTML having several sample links
$s = '<a href="http://www.php.net">PHP web site</a> ';
$s .= '<a href="http://www.wmtips.com">Webmaster Tips</a> ';
$s .= '<a href="http://www.google.com">Google</a>';

//add target="_blank" to each link:
//$1 is the captured href value, $2 is the captured link text
$s = preg_replace('/<a[^>]*?href=[\'"](.*?)[\'"][^>]*?>(.*?)<\/a>/si','<a href="$1" target="_blank">$2</a>',$s);

//output the result
echo $s;
?>

In this example we have assumed that HTML attribute values can be enclosed in either single or double quotes, so we've used the [\'"] construction.
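
Group references are not limited to HTML. The sketch below reorders a date with them and shows both the "$" and the "\\" notation. The sample date is an assumption; the ${1} form is the syntax PHP accepts when a reference is followed directly by a digit.

```php
<?php
$date    = '19/11/2006';                               // made-up dd/mm/yyyy date
$pattern = '/(\d{1,2})\/(\d{1,2})\/(\d{4})/';          // groups: day, month, year

// Reorder the groups into yyyy-mm-dd using "$" references.
$isoDollar = preg_replace($pattern, '$3-$2-$1', $date);        // "2006-11-19"

// The same replacement written with "\\" references.
$isoSlash  = preg_replace($pattern, '\\3-\\2-\\1', $date);     // "2006-11-19"

// When a digit follows the reference, wrap the group number in braces,
// otherwise "$10" would be read as a reference to group 10.
$timesTen  = preg_replace('/(\d+)/', '${1}0', 'abc 5');        // "abc 50"

var_dump($isoDollar, $isoSlash, $timesTen);
```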

Grabbing site contents with PHP

Let's write an example script that grabs the contents of a web page, parses it with regular expressions and displays the parsed data. Let's take the YouTube "Top Rated" section as an example. First, we need to view the HTML page source and find the data blocks we are interested in. An example HTML block looks as follows:

<div class="vstill"><a href="/watch?v=lJlRrlt8hdM" onclick="_hbLink('Makesomemoney','VidVert');"><img src="http://sjl-static13.sjl.youtube.com/vi/lJlRrlt8hdM/2.jpg" class=" vimg " alt="video" /></a></div>

<div class="vtitle">

<a href="/watch?v=lJlRrlt8hdM" onclick="_hbLink('Makesomemoney','VidVert');">Make some money</a>

This block contains all the data we need, so we are ready to implement our grabber PHP script:

<?php
//regex example for article "Regular expressions made easy"
//copyright (c) www.wmtips.com, 2006
echo "Regex example for article \"Regular expressions made easy\"<br />&copy; <a href=\"http://www.wmtips.com\">www.wmtips.com</a>, 2006<hr />\n";

//grab contents of web page into $s variable (false is returned on failure)
$s = file_get_contents('http://www.youtube.com/browse?s=tr');

//perform regex only if the page was fetched
if ($s !== false && preg_match_all('/<div class="vstill"><a href="(\/watch\?.*?)".*?><img src="(.*?)".*?><\/a><\/div>'.
'\s*<div class="vtitle">\s*<a[^>]*>([^<>]*)<\/a>/si',$s,$m,PREG_SET_ORDER))
{
  //iterate through the results and output them
  //we have following groups now:
  //[0] - entire matched text
  //[1] - url of the video
  //[2] - image link
  //[3] - title of the video
  foreach ($m as $val)
  {
   $url='http://www.youtube.com'.$val[1];
   echo "<a href=\"$url\"><img src=\"{$val[2]}\" /> {$val[3]}</a><br /><br />\n";
  }
}
?>

Please note that the YouTube page format may change in the future, in which case this script will stop working.
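
Since the live page may differ from what the pattern expects, the same regex can also be tried offline. The sketch below stores the example HTML block from this article in a string instead of downloading anything; the $videos array is a hypothetical structure introduced only for this illustration.

```php
<?php
// The example HTML block from the article, kept in a nowdoc string.
$s = <<<'HTML'
<div class="vstill"><a href="/watch?v=lJlRrlt8hdM" onclick="_hbLink('Makesomemoney','VidVert');"><img src="http://sjl-static13.sjl.youtube.com/vi/lJlRrlt8hdM/2.jpg" class=" vimg " alt="video" /></a></div>
<div class="vtitle">
<a href="/watch?v=lJlRrlt8hdM" onclick="_hbLink('Makesomemoney','VidVert');">Make some money</a>
HTML;

// Same pattern as in the grabber script.
$pattern = '/<div class="vstill"><a href="(\/watch\?.*?)".*?><img src="(.*?)".*?><\/a><\/div>'.
           '\s*<div class="vtitle">\s*<a[^>]*>([^<>]*)<\/a>/si';

// Collect the captured groups into a plain array (hypothetical structure).
$videos = [];
if (preg_match_all($pattern, $s, $m, PREG_SET_ORDER)) {
    foreach ($m as $val) {
        $videos[] = [
            'url'   => 'http://www.youtube.com' . $val[1],   // group 1: video url
            'image' => $val[2],                              // group 2: image link
            'title' => $val[3],                              // group 3: video title
        ];
    }
}

print_r($videos);
```

With PREG_SET_ORDER each element of $m holds all groups of one match, which is why a single foreach is enough to read the url, image and title together.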

I hope this simple tutorial was interesting and useful for you. Keep learning and you will find regular expressions to be a very powerful mechanism for processing string data.


About The Author
Webmaster tips and tools. Webmaster tips: HTML, CSS, SEO, AdSense. SEO Tools: Site information tool, Search Engine Update Monitor, Google PR checker, Keyword Density analyzer, AdSense Ads preview and more.