Hi! Today we are going to review a very important part of developing a web application: the validation of user input. This is one of the trickiest parts of any application. Why is that? Because the developer doesn't control it. You can write the best algorithm in the world, but if it relies on user input there is still room for mistakes. Even if we add complicated logic to block invalid symbols, check the consistency of the data and do whatever is possible to make sure that everything is OK, there is still a chance that the user enters the wrong number. That said, we must try to prevent most human errors, and one of the best ways to do this is by using Regular Expressions.
Basically, Regular Expressions are used for string matching. They are based on searching for and matching patterns in text. A lot of books have been written about them, and there are even some programming languages designed especially for Regular Expressions. But today we are just going to take a brief look at how regular expressions can help us with user input. First of all I suggest that you get familiar with some basic concepts of the regular expression language. Its syntax is fully explained in the PHP Manual --> Pattern Syntax.
Now let's get to work. I'll present some of the most common problems with user input. I'm pretty sure that you have met most of them, if not all.
We are going to create a registration form with required input fields. They are as follows:
- Full Name
- Address
- Passport
- Email
- Phone
- Zip code
- Date
- Username
- Password
Here is the test form that we will use: PHP validation example (download).
We have to define some variables that will hold our error messages. Their values have to be cleared every time we reload our page.
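As a minimal sketch of this step, the error variables could be declared like this. The variable names are hypothetical and are not taken from the downloadable example; the only point is that each one starts out empty on every request.

```php
<?php
// Hypothetical names: one error-message variable per form field.
// Assigning empty strings at the top of the script clears any
// message left over from a previous submission.
$errName     = '';
$errAddress  = '';
$errPassport = '';
$errEmail    = '';
$errPhone    = '';
$errZip      = '';
$errDate     = '';
$errUser     = '';
$errPass     = '';
```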
There are two ways to use regular expressions in PHP. One is the POSIX style, which uses the ereg() function, and the other is the Perl-compatible style, which uses the preg_match() function. In this tutorial we will use preg_match() because it is faster in most cases, supports the most common regular expression syntax and gives us more capabilities. Note that ereg() was deprecated in PHP 5.3 and removed in PHP 7, so on current PHP versions preg_match() is the only one of the two that is available.
We will start with validation of the name of the user. We will allow only letters, spaces and dashes. So we create our regexp (Regular Expression).
We will make a class for our possible values. A class is created by enclosing some symbols in square brackets. This is our class:
[a-zA-Z -] Our class includes all letters between a-z (all lower case letters), A-Z (all upper case letters), space and a dash.
Now we have to make this class apply to every character that we enter, so we add a plus sign (+) after our class definition. We are still missing something: we have not defined the range of our validation test, that is, which part of the text we are validating. If we don't do this, our regular expression will be satisfied as soon as it finds even one matching character in the input, which is of no use to us. How do we do this? We put our pattern between the start and end anchors, ^ and $. "^" means the start of the line and "$" means the end of it.
We are ready to build our regexp.
/^[a-zA-Z -]+$/ The forward slash is used by preg_match to define the start and the end of our regexp.
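To see why the anchors matter, here is a small sketch that runs the same class with and without them. The sample input is made up for illustration.

```php
<?php
// Made-up sample input: letters followed by digits.
$input = 'John99';

// Without anchors the pattern is satisfied by the "John" part alone.
$unanchored = preg_match('/[a-zA-Z -]+/', $input);

// With ^ and $ every character of the input must belong to the class,
// so the digits make the match fail.
$anchored = preg_match('/^[a-zA-Z -]+$/', $input);

var_dump($unanchored); // int(1)
var_dump($anchored);   // int(0)
```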
Now we are finished, aren't we? There is just one more thing to do. The way that we defined our class allows the user to enter a dash at the beginning of the name. This is something we want to prevent, so we have to add something to our regexp to disallow it.
[A-Z] We define a new class for the first letter of the user's name. It can contain only upper case letters.
Now we combine what we have done so far to get the final result. preg_match() returns 0 if there isn't a match. In that case we have to set our error variable, so we can show a meaningful message to the user.
/^[A-Z][a-zA-Z -]+$/
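Put together, the check might look like the sketch below. The function name validateName(), the variable $errName and the message text are assumptions made for illustration; only the pattern and the comparison with 0 come from the text above.

```php
<?php
// Hypothetical helper: returns an error message for a bad name,
// or an empty string when the name passes the check.
function validateName(string $name): string
{
    // First character must be an upper case letter; the rest may be
    // letters, spaces or dashes.
    if (preg_match('/^[A-Z][a-zA-Z -]+$/', $name) === 0) {
        return 'Please enter a valid name.';
    }
    return '';
}

// A missing form field is treated as an empty string, which fails the check.
$errName = validateName($_POST['name'] ?? '');
```

With this sketch, "John Smith" and "Anne-Marie" pass, while "-John" and "john" produce the error message.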
The address is validated in the same way. We translate its regexp as: from the beginning to the end of the address string, check that every character is one of the following: a-z, A-Z, 0-9, space, underscore, dash, dot, comma, semicolon, double and single quotes. You can add any character that you think may be part of an address. The thing to notice here is that when we have quotes, we have to put an escape character before them.
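The address pattern itself is not shown here, so the following is only a sketch reconstructed from the description above, not necessarily the original expression. The function name isValidAddress() is hypothetical. The pattern is written in a single-quoted PHP string, so only the single quote needs a backslash.

```php
<?php
// Hypothetical helper built from the character list in the text:
// letters, digits, space, underscore, dot, comma, semicolon,
// double quote, single quote and dash.
function isValidAddress(string $address): bool
{
    // \' is the PHP escape for a single quote inside a single-quoted
    // string; the dash is placed last so it is taken literally.
    return preg_match('/^[a-zA-Z0-9 _.,;"\'-]+$/', $address) === 1;
}

var_dump(isValidAddress('12 Main St., Apt. 4')); // bool(true)
var_dump(isValidAddress('12 Main St. #4'));      // bool(false)
```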
The next string to validate is the passport number. It can contain only digits and must be either 10 or 12 digits long. But how do we set how many characters we want? We put the desired number of characters in curly braces {}, so our regexps look like this: /^\d{10}$/ and /^\d{12}$/. How do we combine these two expressions so that either one or the other is used? We use OR, whose sign is "|". Our complete statement is /^\d{10}$|^\d{12}$/.
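A short sketch of the passport check, using the combined pattern from the text. The function name isValidPassport() is an assumption made for illustration.

```php
<?php
// Hypothetical helper: true when the input is exactly 10 or exactly
// 12 digits long.
function isValidPassport(string $passport): bool
{
    return preg_match('/^\d{10}$|^\d{12}$/', $passport) === 1;
}

var_dump(isValidPassport('1234567890'));   // bool(true)  10 digits
var_dump(isValidPassport('12345678901'));  // bool(false) 11 digits
var_dump(isValidPassport('123456789012')); // bool(true)  12 digits
```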
I will present a phone mask. Phone formats vary a lot, but this one is simple enough to be easily customized. You just have to define the number of digits in every part of the phone number and choose a delimiter, which can be any symbol you want. The zip code is also very easy to implement.
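The exact phone and zip formats are not given here, so the sketch below assumes a hypothetical phone number of 3, 3 and 4 digits separated by dashes and a 5-digit zip code. The function names are also assumptions. Adjust the digit counts and the delimiter to the format you need.

```php
<?php
// Assumed phone format: 3 digits, dash, 3 digits, dash, 4 digits.
function isValidPhone(string $phone): bool
{
    return preg_match('/^\d{3}-\d{3}-\d{4}$/', $phone) === 1;
}

// Assumed zip format: exactly 5 digits.
function isValidZip(string $zip): bool
{
    return preg_match('/^\d{5}$/', $zip) === 1;
}

var_dump(isValidPhone('555-123-4567')); // bool(true)
var_dump(isValidPhone('5551234567'));   // bool(false) no delimiters
var_dump(isValidZip('90210'));          // bool(true)
```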
Now we will make a date mask. It will look like this: YYYY-MM-DD. Our date will consist only of digits and dashes. You already know how to set the length of the year, but the month and day can be 1 or 2 digits long. We set this by separating the two values with a comma: {1,2}. This means that any number of repetitions within this interval is valid.
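The date mask described above can be sketched as follows. The function name matchesDateMask() is hypothetical. Keep in mind that the pattern checks only the shape of the input, so a string such as 2008-13-45 still passes; checking that the date exists on the calendar would be a separate step.

```php
<?php
// Hypothetical helper: 4-digit year, then month and day of
// 1 or 2 digits each, separated by dashes.
function matchesDateMask(string $date): bool
{
    return preg_match('/^\d{4}-\d{1,2}-\d{1,2}$/', $date) === 1;
}

var_dump(matchesDateMask('2008-05-20')); // bool(true)
var_dump(matchesDateMask('2008-5-2'));   // bool(true)
var_dump(matchesDateMask('08-05-20'));   // bool(false) year too short
```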
The last thing to check in our registration form is the username and password of our user. The username can be any string that consists of letters, digits and the underscore character (the "\w" predefined class). We want the username to be at least 5 characters long. This is accomplished with {5,}. The missing value after the comma means that there is no upper limit: any length equal to or greater than 5 is accepted.
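A sketch of the username check built from the pieces just described. The function name isValidUsername() is an assumption; the password rule is not spelled out in the text, so it is left out here.

```php
<?php
// Hypothetical helper: at least 5 characters, each one a letter,
// a digit or an underscore (the \w class).
function isValidUsername(string $username): bool
{
    return preg_match('/^\w{5,}$/', $username) === 1;
}

var_dump(isValidUsername('john_5'));   // bool(true)
var_dump(isValidUsername('joe'));      // bool(false) too short
var_dump(isValidUsername('john doe')); // bool(false) space not allowed
```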
This concludes our tutorial. You have seen what a powerful tool regular expressions are and how they can help us with form input validation. They are far more complex than what you see here, but knowing at least the basics is essential. So get those heavy books and start reading. I hope that these examples help you with your work.
| Comments |
| Jay posted on 2008-05-20 00:03:59 |
| Thank you for the helpful article. I would add that email addresses can have dashes in the domain name as well. In this case how do you add a dash to the \w without rewriting it as [0-9a-zA-Z_-] ? |
| John posted on 2008-05-19 19:53:52 |
| Thank you for the brilliant explanation. I just want to add that in the case of the surname, we ought to include an apostrophe. We do have names like O'Connor and O'Reilly. ------------- Veselin: you can use if(preg_match("/^[A-Z][a-zA-Z '-]+$/", $_POST["name"]) === 0) |
| satisfied posted on 2008-05-17 06:51:01 |
| this script is what i need, thanks ! |
| Alisa posted on 2008-04-24 07:40:35 |
| I love this website. Very useful and helpful information. Thank you. |
| Ogolotse posted on 2007-12-11 09:34:31 |
| I am impressed with the validation using the regular expression. it is easy and user may understand it good work keep it up.i will like to get new update about php. Ogolotse |