Regular expression in PHP (part 1)

Tuesday, September 11, 2012
We deal with at least one form on almost every project, so we need to validate it. You can do the validation with JavaScript on the client side, but we also need to validate on the server side because the user can turn JavaScript off.

To do form validation on the server side, we need to understand regular expressions.

Let's start with a simple example.
<?php
$string = "webinone";
echo preg_match("/in/", $string);
?>
The above code will echo 1 because the characters "in" are found in our string.
The code below will echo 0 both times: "ni" does not occur in the string, and "IN" fails because matching is case-sensitive.
<?php
$string = "webinone";
echo preg_match("/ni/", $string); //output: 0

echo preg_match("/IN/", $string); //output: 0
?>
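
The examples so far only echo the result. As far as I know, preg_match() returns 1 for a match, 0 for no match, and false when the pattern itself is broken, so in real code it is safer to compare against 1. Here is a small sketch of that idea. The helper name containsPattern is my own, not a PHP function.

```php
<?php
// Hypothetical helper: true only when the pattern actually matched.
function containsPattern(string $pattern, string $subject): bool
{
    // preg_match() returns 1 (match), 0 (no match) or false (error),
    // so a strict comparison with 1 treats errors as "no match".
    return preg_match($pattern, $subject) === 1;
}

$string = "webinone";
var_dump(containsPattern("/in/", $string)); // bool(true)
var_dump(containsPattern("/ni/", $string)); // bool(false)
var_dump(containsPattern("/IN/", $string)); // bool(false), matching is case-sensitive
```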

Metacharacters

caret ^

Start of the string
<?php
$string = 'webinone';
echo preg_match("/^we/", $string); //output: 1

echo preg_match("/^eb/", $string); //output: 0
?>
The caret has another usage. I will show you in a moment.

dollar $

End of the string
<?php
$string = 'webinone';
echo preg_match("/one$/", $string); //output: 1

echo preg_match("/noe$/", $string); //output: 0
?>

Character Class []

If you write [aeiou], it will return 1 when our string contains at least one of those vowels.
<?php
$string = 'big';
echo preg_match("/[aeiou]/", $string); //Output: 1

$string1 = 'baig';
echo preg_match("/[aoiu]/", $string1); //Output: 1

$string2 = 'bag';
echo preg_match("/b[aoiu]g/", $string2); //Output: 1

$string3 = 'beg';
echo preg_match("/b[aoiu]g/", $string3); //Output: 0

$string4 = 'baog';
echo preg_match("/b[aoiu]g/", $string4); //Output: 0
?>
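We used the caret ^ to mean the start of the string. But if you use the caret ^ as the first character within [], it means NOT.
<?php
$string = 'webinone';
// [^a] matches any single character that is not "a".
// It does not check that the whole string is free of "a".
if(preg_match("/[^a]/",$string))
{
 echo 'String has a character that is not a.';
}
?>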
By the way, if you use the $ within [], it does not mean the end of the string. It is just a plain dollar sign with no special meaning.
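
To see both meanings of the caret side by side, together with the literal $ inside brackets, here is a short sketch. The sample strings are made up for illustration. I use single quotes so PHP does not try to read the $ as a variable.

```php
<?php
// Outside brackets, ^ anchors the match to the start of the string.
echo preg_match('/^w/', 'webinone'); // 1

// Inside brackets, a leading ^ negates the class: "any character except w".
// 'www' contains nothing but w, so nothing matches.
echo preg_match('/[^w]/', 'www'); // 0

// Inside brackets, $ is an ordinary dollar sign.
echo preg_match('/[$]/', 'price: $5'); // 1
echo preg_match('/[$]/', 'webinone');  // 0
```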

We can use the - to give a range inside a character class. [a-f] is equal to [abcdef].
<?php
$string = 'webinone';
echo preg_match("/[a-z]/", $string); //Output: 1

$string1 = 'WebInOne';
echo preg_match("/[a-z]/", $string1); //Output: 1

$string2 = 'WEBINONE';
echo preg_match("/[a-z]/", $string2); //Output: 0

$string3 = '12345';
echo preg_match("/[a-zA-Z]/", $string3); //Output: 0

$string4 = '12345';
echo preg_match("/[^0-9]/", $string4); //Output: 0
?>

Dot .

Any single character except a new line (\n).
<?php
$string = 'one';
echo preg_match("/./", $string); //Output: 1

$string1 = 'one';
echo preg_match("/[.]/", $string1); //Output: 0

$string2 = 'one';
echo preg_match("/o.e/", $string2); //Output: 1

$string3 = 'ons';
echo preg_match("/o.e/", $string3); //Output: 0

$string4 = 'onne';
echo preg_match("/o.e/", $string4); //Output: 0

$string5 = "ore";
echo preg_match("/o.e/", $string5); //Output: 1

$string6 = "o\ne"; // a new line sits between o and e
echo preg_match("/o.e/", $string6); //Output: 0
?>
 

Asterisk *

a* means 0 or more of a. We need a slightly more complicated example to see how it is used.
<?php
$string = "<html>";
echo preg_match("/<[A-Za-z][A-Za-z0-9]*>/", $string); //Output: 1

$string1 = "<b>";
echo preg_match("/<[A-Za-z][A-Za-z0-9]*>/", $string1); //Output: 1

$string2 = "<h3>";
echo preg_match("/<[A-Za-z][A-Za-z0-9]*>/", $string2); //Output: 1

$string3 = "<3>";
echo preg_match("/<[A-Za-z][A-Za-z0-9]*>/", $string3); //Output: 0
?>
In the above example we use <[A-Za-z][A-Za-z0-9]*>. In this regex, < and > are literal characters. The first character class matches a letter. The second character class matches a letter or digit. The star repeats the second character class. Because we used the star, it's OK if the second character class matches nothing. So our regex will match a tag like <B>. When matching <HTML>, the first character class will match H. The star will cause the second character class to be repeated three times, matching T, M and L with each step.
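
To make the walk-through above visible, here is a rough sketch that captures what each character class matched. I have added parentheses (explained in the Subpattern section) around the two classes; the pattern is otherwise unchanged. The helper name tagParts is my own invention.

```php
<?php
// Hypothetical helper: splits a tag into the part matched by the first
// character class and the part matched by the starred class.
function tagParts(string $tag): ?array
{
    // The parentheses only capture the two parts for inspection.
    if (preg_match('/<([A-Za-z])([A-Za-z0-9]*)>/', $tag, $m) !== 1) {
        return null;
    }
    return ['first' => $m[1], 'rest' => $m[2]];
}

var_dump(tagParts('<B>'));    // first: "B", rest: "" (the star matched zero times)
var_dump(tagParts('<HTML>')); // first: "H", rest: "TML" (the star repeated three times)
var_dump(tagParts('<3>'));    // NULL, a digit cannot come first
```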

Plus +

a+ means one or more of a.
<?php
$string = "php";
echo preg_match("/ph+p/", $string); //Output: 1

$string1 = "phhp";
echo preg_match("/ph+p/", $string1); //Output: 1

$string2 = "pp";
echo preg_match("/ph+p/", $string2); //Output: 0

$string3 = "12345";
echo preg_match("/[a-z]+/", $string3); //Output: 0
?>

Question mark ?

a? means zero or one of a.
<?php
$string = "123456";
echo preg_match("/123-?456/", $string); //Output: 1

$string1 = "123-456";
echo preg_match("/123-?456/", $string1); //Output: 1

$string2 = "123--456";
echo preg_match("/123-?456/", $string2); //Output: 0
?>

Curly braces {}

a{3} Exactly 3 of a
a{3,} 3 or more of a
a{0,3} Up to 3 of a
a{3,6} 3 to 6 of a

Always write the lower bound. Older PCRE versions do not treat a{,3} as a quantifier and match it as the literal text {,3} instead.

<?php
$string = "google";
echo preg_match("/go{2}gle/", $string); //Output: 1

$string1 = "gooogle";
echo preg_match("/go{2}gle/", $string1); //Output: 0

$string2 = "gooogle";
echo preg_match("/go{2,}gle/", $string2); //Output: 1

$string3 = "google";
echo preg_match("/go{0,2}gle/", $string3); //Output: 1

$string4 = "google";
echo preg_match("/go{2,3}gle/", $string4); //Output: 1
?>
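
Curly braces are handy for length checks in form validation. Below is a minimal sketch, assuming a made-up rule that a PIN must be 4 to 6 digits. The function name isValidPin and the rule itself are only for illustration.

```php
<?php
// Hypothetical rule: a PIN is 4 to 6 digits and nothing else.
function isValidPin(string $pin): bool
{
    // ^ and $ make sure the digits span the whole string.
    return preg_match('/^[0-9]{4,6}$/', $pin) === 1;
}

var_dump(isValidPin('123'));     // bool(false), too short
var_dump(isValidPin('1234'));    // bool(true)
var_dump(isValidPin('123456'));  // bool(true)
var_dump(isValidPin('1234567')); // bool(false), too long
var_dump(isValidPin('12a4'));    // bool(false), contains a letter
```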

Subpattern ()

<?php
$string = "This is PHP.";
echo preg_match("/^(This)/", $string); //Output: 1

$string1 = "That is PHP.";
echo preg_match("/^(This)/", $string1); //Output: 0

$string2 = "That is PHP.";
echo preg_match("/^([0-9])/", $string2); //Output: 0

$string3 = "7 is lucky number.";
echo preg_match("/^([0-9])/", $string3); //Output: 1
?>
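
Parentheses do more than group. If you pass a third argument to preg_match(), PHP fills it with the text each subpattern matched. Here is a small sketch. The pattern is my own variation on the examples above.

```php
<?php
$string = "7 is lucky number.";

// $matches[0] holds the whole match; $matches[1], $matches[2], ...
// hold the text captured by each pair of parentheses, left to right.
if (preg_match('/^([0-9]) is ([a-z]+)/', $string, $matches) === 1) {
    echo $matches[0], "\n"; // 7 is lucky
    echo $matches[1], "\n"; // 7
    echo $matches[2], "\n"; // lucky
}
```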

Logical Or |


<?php
$string = "This is PHP.";
echo preg_match("/^(This|That)/", $string); //Output: 1

$string1 = "That is PHP.";
echo preg_match("/^(This|That)/", $string1); //Output: 1
?>

Backslash \

Where do we use the backslash?
If you want to use any of these eleven metacharacters ^+*.?$()|[\ as literal characters in your regex, you need to escape them with a backslash.
<?php
$string = 'webinone.net';
if(preg_match("/\./",$string))
{
 echo 'String has dot.';
}
$string1 = 'webinone+net';
if(preg_match("/\+/",$string1))
{
 echo 'String has + sign.';
}
?>
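
If the text you want to match literally comes from a variable, escaping by hand gets tedious. PHP has preg_quote() for this. A minimal sketch, with sample strings that are only for illustration:

```php
<?php
$needle = 'webinone.net'; // the dot should be taken literally

// preg_quote() puts a backslash in front of every regex metacharacter.
// The second argument also escapes our delimiter, /.
$pattern = '/' . preg_quote($needle, '/') . '/';

echo $pattern, "\n";                                   // /webinone\.net/
echo preg_match($pattern, 'visit webinone.net'), "\n"; // 1
echo preg_match($pattern, 'visit webinoneXnet'), "\n"; // 0
```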

Now let's try a real-world example. The example below tests email validation. It is not a perfect one, but it will validate the most common email address formats correctly.

Create a blank document in your favourite editor and paste the following code into it. Then save it as mail_regex.php in your www folder (in WAMP).
<html>
<head>
 <title>Mail text</title>
</head>
<body>
<?php
 if(isset($_POST['submit']))
 {
  $email = $_POST['email'];
  if(preg_match("/^[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$/", $email))
  echo "Valid mail";
  else
  echo "Not Valid";
 }
?>
<form method="post" action="mail_regex.php">
 <input type="text" name="email" id="email" size="30" />
 <input name="submit" type="submit" value="Submit"/>
</form>
</body>
</html>

Our regex pattern for the email address is like below:
^[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$
As you know, email addresses always follow a particular format:

username @ domain . extension

For our username part we use

^[a-zA-Z0-9._%-]+

^ means the address must start with a character from the class [a-zA-Z0-9._%-].
The + that follows the character class means you must enter one or more characters from that class.

For our domain part we use

[a-zA-Z0-9.-]+

Our domain name must contain one or more characters from that character class.

Then we escape the . with a backslash (\.) so that it is used as a literal character.

For the extension part we use

[a-zA-Z]{2,4}$

So your extension must consist of 2 to 4 characters from that character class, and the $ means the string must end right after it.
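
To try the pattern without the form, here is a rough sketch that wraps it in a function, with the dot before the extension escaped as \. as described above. The function name isValidEmail and the sample addresses are my own.

```php
<?php
// Hypothetical helper wrapping the email pattern from this post.
function isValidEmail(string $email): bool
{
    $pattern = '/^[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$/';
    return preg_match($pattern, $email) === 1;
}

var_dump(isValidEmail('user@example.com'));           // bool(true)
var_dump(isValidEmail('user.name@mail.example.org')); // bool(true)
var_dump(isValidEmail('user@example'));               // bool(false), no extension
var_dump(isValidEmail('user example.com'));           // bool(false), no @
var_dump(isValidEmail('user@example.museum'));        // bool(false), extension longer than 4 letters
```

The last case shows why the pattern is not a perfect one: the {2,4} limit rejects longer extensions even when they are real.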

Now I think you can create some regex patterns by yourself.

There are many other regex patterns in PHP. I will explain the rest of them in part 2.

Ref:

phpro.org

regular-expressions.info