Welcome to the Proxy Update, your source of news and information on Proxies and their role in network security.

Thursday, March 20, 2008

URL Lists and webfiltering in the Proxy

There are a lot of vendors out there selling URL lists for webfiltering on the web proxy. Each one claims to be superior to the others, but what are you actually paying for when you subscribe to these lists?

The idea, of course, is to categorize the entire web (so you can block unwanted sites like spyware, porn, etc.), but thousands of new pages are created on the web daily. Wikipedia estimates there are around 100 million websites, containing over 2 billion web pages. If you survey the vendors offering URL lists for webfiltering, some claim to cover as many as 20 million websites, but even that falls well short of Wikipedia's estimate.

The theory behind this, of course, is that by categorizing 20% of the websites, you are covering 80% of the web hits. This is great for the 80% of hits that match a category, but what about the other 20%? Even if you subscribe to a URL list, your proxy needs some way to categorize the new sites and the uncategorized sites on the web. Some proxy vendors offer a way to rate websites dynamically, in real time, when they don't match an entry in the URL list. The key here is to be able to produce this rating without introducing any visible latency for the end-user.
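To make the fallback idea concrete, here is a minimal Python sketch, not any vendor's implementation. The list contents, the function names (`categorize`, `dynamic_rate`, `slow_rater`) and the 50 ms budget are all assumptions made up for illustration. It checks the static list first, and only hands unmatched hosts to a real-time rater, giving up if the rater exceeds the latency budget.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from urllib.parse import urlsplit

# Hypothetical static URL list: hostname -> category.
URL_LIST = {
    "news.example.com": "news",
    "casino.example.net": "gambling",
}

_executor = ThreadPoolExecutor(max_workers=4)

def dynamic_rate(host):
    """Placeholder real-time rater (assumption): a real one would inspect content."""
    if "casino" in host or "poker" in host:
        return "gambling"
    return "uncategorized"

def slow_rater(host):
    """Hypothetical rater that is too slow, used to show the latency budget."""
    time.sleep(0.2)
    return "news"

def categorize(url, budget_seconds=0.05, rater=dynamic_rate):
    """Return a category from the static list, else from the rater within budget."""
    host = urlsplit(url).hostname or ""
    if host in URL_LIST:
        return URL_LIST[host]  # fast path: the list already knows this site
    future = _executor.submit(rater, host)
    try:
        return future.result(timeout=budget_seconds)
    except FutureTimeout:
        # The rating took too long; do not make the user wait for it.
        return "uncategorized"
```

What to do with an "uncategorized" result (allow, block, or warn) is a policy decision that this sketch deliberately leaves open.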

When evaluating lists, there are other criteria to watch out for as well. Can a website exist in more than one category? Being a sports information site doesn't preclude a site from offering gambling in one form or another. You may want to block that site just because it offers gambling, and if your URL list only categorizes it as a sports site, you'll have a less than effective policy.
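The difference is easy to see in a small Python sketch. The hostnames, category names and the `is_blocked` helper are hypothetical, chosen only to illustrate the point: when each site carries a set of categories, a block on gambling catches the sports site that also offers betting.

```python
# Hypothetical multi-category URL list: hostname -> set of categories.
SITE_CATEGORIES = {
    "scores.example.com": {"sports", "gambling"},
    "league.example.org": {"sports"},
}

# Hypothetical policy: categories the organization wants blocked.
BLOCKED = {"gambling", "spyware", "porn"}

def is_blocked(host, site_categories=SITE_CATEGORIES, blocked=BLOCKED):
    """Block the host if any one of its categories is prohibited."""
    categories = site_categories.get(host, set())
    return not categories.isdisjoint(blocked)
```

With a list that records only one category per site, the same host would be filed under sports alone and the gambling rule would never fire.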

You should also check to see how responsive your vendor is when a mis-categorization is found. Are they quick to verify the mis-categorization and change their lists?

The last concern has to do with links, and this one is perhaps more a proxy requirement than a URL list requirement. Any given site can have useful information while also embedding content pulled in from other sites, and those other sites may fall into a category your policy prohibits. The most flexible proxy should still show you the good information, while blocking only the embedded portions of the page.
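As a rough illustration of per-object filtering, the Python sketch below pulls the embedded resources out of a page and decides on each one separately. Everything in it is an assumption for the example: the hostnames, the categories, and the `embedded_decisions` helper. A real proxy would normally see each embedded object as its own request rather than parsing the HTML, but the decision logic is the same.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Hypothetical category data and policy.
HOST_CATEGORIES = {
    "sports.example.com": {"sports"},
    "ads.casino.example.net": {"gambling"},
}
BLOCKED = {"gambling"}

class EmbeddedResourceCollector(HTMLParser):
    """Collect absolute URLs of images, scripts and frames embedded in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        if tag in {"img", "script", "iframe"}:
            src = dict(attrs).get("src")
            if src:
                # Resolve relative links against the page's own URL.
                self.resources.append(urljoin(self.base_url, src))

def embedded_decisions(page_url, html):
    """Return {resource_url: 'allow' | 'block'} for each embedded object."""
    collector = EmbeddedResourceCollector(page_url)
    collector.feed(html)
    decisions = {}
    for resource in collector.resources:
        host = urlsplit(resource).hostname or ""
        categories = HOST_CATEGORIES.get(host, set())
        decisions[resource] = "block" if categories & BLOCKED else "allow"
    return decisions

SAMPLE_PAGE = (
    '<p>Latest scores</p>'
    '<img src="/logo.png">'
    '<iframe src="http://ads.casino.example.net/banner"></iframe>'
)
```

Calling `embedded_decisions("http://sports.example.com/index.html", SAMPLE_PAGE)` allows the site's own logo and blocks only the embedded gambling frame, so the page itself still loads.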

URL lists are necessary to help enforce policy on the proxy, but implementing URL lists alone doesn't guarantee the security policy you expect. Remember to look for the gotchas when selecting your URL list and proxy vendor.
