Simple Way to Validate Links

Validate URLs before you accept them in your scripts.

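<?php
// Returns true when $link begins with "http://", false otherwise.
function validLink($link) {
    // ^ anchors the match to the start of the string;
    // \/ escapes each forward slash because / is the pattern delimiter.
    return preg_match("/^http:\/\//", $link) === 1;
}
?>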

This simple little function checks a link for the correct http://. It could be taken a step further and checked against a list of accepted protocols (http, ftp, feed, etc.).
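
As a rough sketch of that extension, the function below accepts a link when it starts with any scheme from a list. The name validLinkScheme() and the default scheme list are assumptions made for illustration; they are not part of the original tutorial.

```php
<?php
// Hypothetical helper: true when $link starts with one of the allowed schemes.
function validLinkScheme($link, array $schemes = array('http', 'https', 'ftp', 'feed')) {
    $quoted = array();
    foreach ($schemes as $scheme) {
        // preg_quote() escapes any regex metacharacters in the scheme name.
        $quoted[] = preg_quote($scheme, '#');
    }
    // '#' is the delimiter, ^ anchors to the start, i ignores case.
    $pattern = '#^(?:' . implode('|', $quoted) . ')://#i';
    return preg_match($pattern, $link) === 1;
}

var_dump(validLinkScheme('ftp://example.com/file.zip'));   // bool(true)
var_dump(validLinkScheme('mailto:someone@example.com'));   // bool(false)
```

Passing a shorter list as the second argument, for example array('http', 'https'), narrows the check without touching the pattern.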

Most of the magic is done in the preg_match function. We check for an http://, but because / is also the pattern delimiter, each forward slash inside the pattern must be escaped with a backslash (\).
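
The escaping can also be avoided altogether, since preg_match() accepts other delimiter characters. The sketch below uses # as the delimiter; the function name is hypothetical, and the ^ anchor is an addition here that limits the match to the start of the string.

```php
<?php
// Hypothetical variant of the http:// check using '#' as the delimiter,
// so the forward slashes need no backslashes.
function validLinkAltDelimiter($link) {
    return preg_match('#^http://#', $link) === 1;
}

// When the text to match comes from a variable, preg_quote() can do the
// escaping; its second argument names the delimiter that must be escaped too.
$escapedPrefix = preg_quote('http://', '/');
$matchesQuoted = preg_match('/^' . $escapedPrefix . '/', 'http://example.com') === 1;

var_dump(validLinkAltDelimiter('http://example.com'));  // bool(true)
var_dump($matchesQuoted);                               // bool(true)
```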

That's all there is to this function. If you have ideas or possible expansions, let me know; maybe they could be added to this tutorial.


Comments: Simple Way to Validate Links

 entity_azirius  Sat Jan 6, 2007 10:55 pm  
Pretty cool, the preg_match() function is pretty wicked! Is there a way to make sure it has a .com/.co.uk/etc?
 Jhecht  Sun Jun 10, 2007 5:47 pm  
Something like this is usually what people do (if it doesn't work, keep in mind I just wrote this right now):

$preg = "/http:\/\/([a-z0-9]{2,}\-?\.[a-z]{2,3})(\.?[a-z]{2,3})?/i";

I haven't tested that. Maybe Matt knows a better one (I'm not too good at regex, and it takes me a while to get it right).
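
A quick way to try the pattern above is to wrap it in a function and feed it a few links. The wrapper name hasDottedHost() is hypothetical. Note that the pattern has no anchor at the end, so it only confirms that http:// is followed by a name, a dot, and two or three letters.

```php
<?php
// Hypothetical wrapper around the pattern from the comment above.
function hasDottedHost($link) {
    $preg = '/http:\/\/([a-z0-9]{2,}\-?\.[a-z]{2,3})(\.?[a-z]{2,3})?/i';
    return preg_match($preg, $link) === 1;
}

var_dump(hasDottedHost('http://example.com'));    // bool(true)
var_dump(hasDottedHost('http://example.co.uk'));  // bool(true)
var_dump(hasDottedHost('http://localhost'));      // bool(false): no dot
var_dump(hasDottedHost('http://my-site.com'));    // bool(false): hyphen inside the name
```

The last case shows a limitation: the optional hyphen is only allowed immediately before the dot, so hosts with a hyphen in the middle are rejected.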
 fqa@vardagsrummet.mine.nu  Sat Dec 1, 2007 12:49 am  
How about this one I made yesterday:

THE CODE - one liner
[PHP]$urlregex = "^(https?|ftp)\:\/\/([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?[a-z0-9+\$_-]+(\.[a-z0-9+\$_-]+)*(\:[0-9]{2,5})?(\/([a-z0-9+\$_-]\.?)+)*\/?(\?[a-z+&\$_.-][a-z0-9;:@/&%=+\$_.-]*)?(#[a-z_.-][a-z0-9+\$_.-]*)?\$";
if (eregi($urlregex, $url)) {echo "good";} else {echo "bad";}[/PHP]

(OPTIONAL: READ BELOW FOR EXPLANATION)

It will validate all of these types of URLs:
[PHP]
// valid urls
$url = "https://user:pass@www.somewhere.com:8080/login.php?do=login&style=%23#pagetop";
$url = "http://user@www.somewhere.com/#pagetop";
$url = "https://somewhere.com/index.html";
$url = "ftp://user:****@somewhere.com:21/";
$url = "http://somewhere.com/index.html/"; //this is valid!!
[/PHP]

THE CODE - broken into sections for easy editing and understanding:
[PHP]
// SCHEME
$urlregex = "^(https?|ftp)\:\/\/";

// USER AND PASS (optional)
$urlregex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?";

// HOSTNAME OR IP
$urlregex .= "[a-z0-9+\$_-]+(\.[a-z0-9+\$_-]+)*"; // http://x = allowed (ex. http://localhost, http://routerlogin)
//$urlregex .= "[a-z0-9+\$_-]+(\.[a-z0-9+\$_-]+)+"; // http://x.x = minimum
//$urlregex .= "([a-z0-9+\$_-]+\.)*[a-z0-9+\$_-]{2,3}"; // http://x.xx(x) = minimum
//use only one of the above

// PORT (optional)
$urlregex .= "(\:[0-9]{2,5})?";
// PATH (optional)
$urlregex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?";
// GET Query (optional)
$urlregex .= "(\?[a-z+&\$_.-][a-z0-9;:@/&%=+\$_.-]*)?";
// ANCHOR (optional)
$urlregex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?\$";

// check
if (eregi($urlregex, $url)) {echo "good";} else {echo "bad";}

[/PHP]

Any of the lines in the code above (except the hostname line) can be safely removed if you don't want to allow that URL segment. For example, if you don't want GET queries in your URLs, just comment out the respective $urlregex .= line. Do not reorder the lines.
"(optional)" means that the part MAY exist, but the URL will be valid even if it doesn't contain that part (see the valid URLs above).

syntax:
[code]scheme:// [user[:pass]@] hostname [:port] [/path] [?getquery] [#anchor][/code]
-taking into account allowed safe characters
-assuming .. (dot dot) is never allowed in hostname or path

FEEDBACK IS APPRECIATED
 Eric  Wed Mar 19, 2008 2:00 pm  
Quite nice, I needed a good way to validate URLs! The last comment is very good! Thanks!
 Toby Wallis  Fri Mar 28, 2008 7:37 am  
Nice. Does it handle multi-part and country-code TLDs like .co.uk, .net.uk, .cn, etc.?

===Toby===
 mkeefe  Fri Mar 28, 2008 1:44 pm  
The TLD can be anything given this example; however, I recommend something more advanced in order to be truly secure.
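
One option for a stricter check, not mentioned in the thread, is PHP's built-in filter_var() with FILTER_VALIDATE_URL (available since PHP 5.2). The sketch below pairs it with a scheme whitelist, because the filter on its own accepts schemes such as mailto:. The function name and the default whitelist are assumptions made for illustration.

```php
<?php
// Hypothetical helper: built-in URL filter plus a scheme whitelist.
function isAcceptableUrl($url, array $allowedSchemes = array('http', 'https', 'ftp')) {
    // filter_var() returns false when the string is not a well-formed URL.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // parse_url() with PHP_URL_SCHEME returns just the scheme part.
    $scheme = parse_url($url, PHP_URL_SCHEME);
    return is_string($scheme) && in_array(strtolower($scheme), $allowedSchemes, true);
}

var_dump(isAcceptableUrl('https://somewhere.com/index.html'));  // bool(true)
var_dump(isAcceptableUrl('mailto:someone@example.com'));        // bool(false)
```

This only checks the shape of the URL. Whether the address is safe to fetch or display is a separate question.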
 Sam  Tue Apr 14, 2009 1:08 pm  
Good job, fqa at vardagsrummet.mine.nu!
Thanks for sharing.
 Josh  Wed Sep 30, 2009 3:14 pm  
Thanks for that. I've been working on that very issue.

See it in action at http://www.MyWeddingHosting.com/blog.php

Josh
 Adidz  Fri Mar 26, 2010 5:53 am  
@fqa: What about URLs like these?
http://www-google-com
http://www_google_com
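
Both of these pass fqa's default hostname rule, because its character class includes - and _ and the dotted part is optional. The sketch below reproduces only the scheme and hostname pieces in PCRE form and compares them with a stricter variant. The function names and the stricter rule (letters, digits, and hyphens, with at least one dot) are assumptions made for illustration.

```php
<?php
// Scheme plus fqa's default hostname rule: dots are optional.
function hostLoose($url) {
    return preg_match('#^(?:https?|ftp)://[a-z0-9+$_-]+(\.[a-z0-9+$_-]+)*$#iD', $url) === 1;
}

// Hypothetical stricter rule: no underscores, at least one dot.
function hostStrict($url) {
    return preg_match('#^(?:https?|ftp)://[a-z0-9-]+(\.[a-z0-9-]+)+$#iD', $url) === 1;
}

foreach (array('http://www-google-com', 'http://www_google_com', 'http://www.google.com') as $url) {
    echo $url, ' loose=', var_export(hostLoose($url), true),
         ' strict=', var_export(hostStrict($url), true), "\n";
}
```

With the stricter rule only http://www.google.com is accepted, at the cost of rejecting single-label hosts such as http://localhost.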
 Jon  Thu Jun 17, 2010 12:21 am  
I've been trying to get this to work with preg_match():

// SCHEME
$urlregex = "^(https?|ftp)\:\/\/";

// USER AND PASS (optional)
$urlregex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?";

// HOSTNAME OR IP
$urlregex .= "[a-z0-9+\$_-]+(\.[a-z0-9+\$_-]+)*"; // http://x = allowed (ex. http://localhost, http://routerlogin)
//$urlregex .= "[a-z0-9+\$_-]+(\.[a-z0-9+\$_-]+)+"; // http://x.x = minimum
//$urlregex .= "([a-z0-9+\$_-]+\.)*[a-z0-9+\$_-]{2,3}"; // http://x.xx(x) = minimum
//use only one of the above

// PORT (optional)
$urlregex .= "(\:[0-9]{2,5})?";
// PATH (optional)
$urlregex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?";
// GET Query (optional)
$urlregex .= "(\?[a-z+&\$_.-][a-z0-9;:@/&%=+\$_.-]*)?";
// ANCHOR (optional)
$urlregex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?\$";

// check
if (eregi($urlregex, $url)) {echo "good";} else {echo "bad";}

The problem seems to be that this is POSIX and for preg it needs to be PCRE. I'm having quite a time with it. Anyone know how to fix it?
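
A possible PCRE port is sketched below. The main differences from the POSIX version are that preg_match() needs delimiters around the pattern and takes case-insensitivity as the i flag instead of a separate function. This also matters for newer PHP, because eregi() was deprecated in PHP 5.3 and removed in PHP 7. The function name isValidUrlPcre() is hypothetical, ~ is chosen as the delimiter because it does not occur in the pattern, and the D flag is an addition so that $ matches only at the very end of the string.

```php
<?php
// Hypothetical PCRE port of the segmented POSIX pattern from fqa's comment.
// Single-quoted strings keep $ and \ literal, so no PHP-level escaping is needed.
function isValidUrlPcre($url) {
    $pattern  = '^(https?|ftp)://';                                        // scheme
    $pattern .= '([a-z0-9+!*(),;?&=$_.-]+(:[a-z0-9+!*(),;?&=$_.-]+)?@)?';  // user and pass (optional)
    $pattern .= '[a-z0-9+$_-]+(\.[a-z0-9+$_-]+)*';                         // hostname or IP
    $pattern .= '(:[0-9]{2,5})?';                                          // port (optional)
    $pattern .= '(/([a-z0-9+$_-]\.?)+)*/?';                                // path (optional)
    $pattern .= '(\?[a-z+&$_.-][a-z0-9;:@/&%=+$_.-]*)?';                   // GET query (optional)
    $pattern .= '(#[a-z_.-][a-z0-9+$_.-]*)?$';                             // anchor (optional)

    // ~ delimits the pattern, i ignores case, D stops $ matching before a trailing newline.
    return preg_match('~' . $pattern . '~iD', $url) === 1;
}

$url = 'https://user:pass@www.somewhere.com:8080/login.php?do=login&style=%23#pagetop';
echo isValidUrlPcre($url) ? "good" : "bad";  // prints "good"
```

The segments are in the same order as in the POSIX version, so the advice about removing a segment but not reordering them still applies.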