NetFort Advertising

How to do a URL search using network traffic analysis


What are your options to address URL search requirements?

Before I explain how to do a URL search using network traffic as a data source, I want to recap what a URL string is.

A Uniform Resource Locator (URL) is a type of Uniform Resource Identifier (URI) that specifies where an identified resource is available and the mechanism for retrieving it. Examples of URLs include:

URL: ftp://ftp.netfort.com/doc/languardian-tips.txt
URL: https://www.netfort.com/download-languardian/
URL: mailto:support@netfort.com

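All of the above are also URIs, but a URI may carry extra information such as an anchor (fragment), which is used client side to navigate automatically to a particular section of a web page. Because the anchor is handled client side, the browser does not send it to the server, so it will not show up in captured network traffic.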
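To make the parts of a URL string concrete, here is a minimal sketch that splits the example URLs with Python's standard urllib.parse module. The last entry, with its "#requirements" anchor, is a hypothetical address added purely for illustration.

```python
from urllib.parse import urlsplit

EXAMPLES = [
    "ftp://ftp.netfort.com/doc/languardian-tips.txt",
    "https://www.netfort.com/download-languardian/",
    "mailto:support@netfort.com",
    # Hypothetical URL with an anchor, for illustration only.
    "https://www.netfort.com/download-languardian/#requirements",
]


def describe(url):
    """Return the main components of a URL string as a dictionary."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # retrieval mechanism: ftp, https, mailto
        "host": parts.hostname,      # None when the URL has no host part
        "path": parts.path,
        "fragment": parts.fragment,  # the client-side anchor, if any
    }


if __name__ == "__main__":
    for url in EXAMPLES:
        print(url, "->", describe(url))
```

For most URL searches the host and path are the useful parts, since those are what you match a full or partial website name against.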


For most use cases, a URL search involves searching for either a full or partial website name to see who is accessing it. Here is some feedback a university customer recently sent us after evaluating our LANGuardian product. It describes a very typical use case.

“All those within the test group without exception found it (LANGuardian) to be a very useful tool for detecting suspicious traffic and for discouraging misbehavior.

“A very popular feature mentioned by most users in the test group was the ability to search by URL. All the users were in agreement that it provides a very quick and easy way to extract the exact information management would like visibility of, particularly where cloud services are concerned.”

Building a database of URL search strings

There seems to be an increasing demand for more sophisticated analysis and visibility of the Internet link and the activity on it, probably driven by:

  • Security concerns. Continuous monitoring and rich visibility of activity on this link are an absolute must these days.
  • Cloud, hybrid cloud, etc. Many applications used across organizations today are hosted externally and as a result, the utilization of this link is critical.

Before you can search URL strings you need a data source. The most common ones I come across are:

  1. Local packet capture on a PC or laptop.
  2. Network-wide packet capture through SPAN ports, mirror ports or TAPs.
  3. Log file analysis on firewalls or proxy servers.

I am not including any flow-based tools in this post, as most are not good web usage trackers. Some IPFIX implementations can export HTTP header information, but very few tools actually use it.

Local Packet Capture

Capturing network traffic locally on your PC or laptop is a great way to learn about packet capture and how you can use this to search for URL strings. Wireshark is the most popular tool and it allows you to capture all network traffic going in and out of local network adapters.

If you want to do a URL search, you simply use the display filter within Wireshark to search for a specific text string.
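In Wireshark itself, display filters such as http.host contains "netfort" or http.request.uri contains "download" do this kind of matching on unencrypted HTTP traffic. The sketch below shows roughly what sits behind such a search: it rebuilds the requested URL from the payload of a plain-text HTTP request and checks it for a search string. The function names and the sample request are illustrative assumptions, not part of Wireshark or LANGuardian, and this will not see inside encrypted HTTPS requests.

```python
def url_from_http_request(payload):
    """Rebuild the requested URL from a plain-text HTTP request, or return None."""
    # Only the header block (before the first blank line) is needed.
    head = payload.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    request_line = lines[0].split(" ")
    if len(request_line) != 3 or not request_line[2].startswith("HTTP/"):
        return None  # not an HTTP request
    _method, target, _version = request_line
    if target.lower().startswith("http://"):
        return target  # proxy-style request already carries the full URL
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            return "http://" + value.strip() + target
    return None  # no Host header, so the URL cannot be rebuilt


def matches(payload, needle):
    """Case-insensitive search for a text string in the rebuilt URL."""
    url = url_from_http_request(payload)
    return url is not None and needle.lower() in url.lower()


# Hypothetical captured request payload, for illustration only.
SAMPLE = (
    b"GET /download-languardian/ HTTP/1.1\r\n"
    b"Host: www.netfort.com\r\n"
    b"User-Agent: demo\r\n"
    b"\r\n"
)

if __name__ == "__main__":
    print(url_from_http_request(SAMPLE))
    print(matches(SAMPLE, "netfort"))
```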

Pros

  • Free and easy way to capture local traffic
  • Great for learning about packet capture and traffic analysis

Cons

  • Does not scale up. It is very easy to overload a system if you try to capture traffic at high data rates.
  • While it is fine for real-time analysis, you won't get long-term storage of data unless you have access to lots of disk space.
  • Complex output that is not easy to read and interpret, which makes it difficult to get the ‘big picture’.

Network wide packet capture through a SPAN, mirror port or TAP

If you want to scale up from local packet capture, then you should look at options like SPAN ports or TAPs. This approach will allow you to get a copy of all traffic flowing into and out of your network and so you will get a data source for all web activity on your network.

The video at the link below goes through the steps that are needed to monitor Internet activity via a SPAN port.

Pros

  • Visibility of all Internet activity on your network.
  • SPAN or port mirror options available on most managed switches with no impact on performance.
  • Works effectively whether a web proxy is in place or not.
  • Deploys in minutes with no agents or clients and no network downtime.

Cons

  • Free tools and software that can connect to a SPAN or mirror port are limited, so you need to look at a commercial solution.


Once you have your SPAN port set up, you can use a tool like NetFort LANGuardian to process the packet data. The NetFort DPI engine extracts application-level detail like URL strings from the traffic flows and stores it in the built-in database, discarding the remainder of the packet contents.

This data reduction (400:1 over full packet capture and storage) results in cost-effective, long-term historical storage of network and user activity, which is very useful for forensics, reporting and planning.

It stores all the critical details, including IP address, user name, domain name, URI and bandwidth consumed, in its own database. This gives you access to real-time and historical web usage reports.
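To illustrate why storing only this metadata makes URL searches quick, here is a minimal sketch that uses an in-memory SQLite database. It is not how LANGuardian is implemented; the table name, column names and sample rows are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical metadata records: client IP, user name, domain, URI, bytes.
SAMPLE_ROWS = [
    ("192.0.2.10", "alice", "www.netfort.com", "/download-languardian/", 52000),
    ("192.0.2.11", "bob", "files.example.com", "/upload", 910000),
    ("192.0.2.10", "alice", "files.example.com", "/upload", 120000),
    ("192.0.2.10", "alice", "files.example.com", "/list", 5000),
]


def build_store(rows):
    """Create an in-memory table holding one row per observed web request."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE web_activity ("
        "client_ip TEXT, username TEXT, domain TEXT, uri TEXT, bytes_used INTEGER)"
    )
    conn.executemany("INSERT INTO web_activity VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def search_url(conn, text):
    """Return who accessed a matching domain or URI, with total bytes, largest first."""
    pattern = "%" + text + "%"
    cursor = conn.execute(
        "SELECT client_ip, username, domain, SUM(bytes_used) AS total "
        "FROM web_activity WHERE domain LIKE ? OR uri LIKE ? "
        "GROUP BY client_ip, username, domain ORDER BY total DESC",
        (pattern, pattern),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = build_store(SAMPLE_ROWS)
    for row in search_url(conn, "files.example"):
        print(row)
```

Because each request is reduced to a short row, a partial website name can be matched against months of history without keeping the packets themselves.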

If you are considering other tools, make sure they include both real-time and historical reporting features that match your data retention requirements.

Log file analysis on firewalls or proxy servers

Many firewalls and proxy servers have logging options. These can be very useful for troubleshooting or for checking whether changes to firewall rules are working. However, server log files do have their limitations. They are meant to provide server administrators with data about the behavior of the server, not about the behavior of users, such as which URLs they are accessing.

I recently attended a conference which brought together network and security professionals from colleges and universities all over the UK. During the conference, one IT manager described how their network fell victim to multiple DDoS attacks. Their firewalls were under so much pressure that they could not access the logs or get any visibility. One recommendation from this was not to rely on firewall logs alone; you need another data source to troubleshoot problems.
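Where proxy logs are available, a URL search is essentially a text match on the URL field of each entry. The sketch below assumes entries in the style of Squid's native access log, where the third whitespace-separated field is the client address and the seventh is the requested URL. Other proxies and firewalls use different layouts, and the sample entries are made up for illustration.

```python
from collections import Counter

# Hypothetical log entries in Squid's native access.log layout:
# time elapsed client result/status bytes method URL user hierarchy/peer type
SAMPLE_LOG = """\
1700000000.123    210 192.0.2.10 TCP_MISS/200 5120 GET http://www.netfort.com/download-languardian/ - HIER_DIRECT/198.51.100.7 text/html
1700000003.456     12 192.0.2.11 TCP_TUNNEL/200 88211 CONNECT files.example.com:443 - HIER_DIRECT/203.0.113.9 -
1700000009.789     95 192.0.2.10 TCP_MISS/200 20480 GET http://www.netfort.com/blog/ - HIER_DIRECT/198.51.100.7 text/html
"""


def search_proxy_log(log_text, needle):
    """Count matching requests per client IP for a full or partial URL string."""
    hits = Counter()
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or truncated lines
        client_ip, url = fields[2], fields[6]
        if needle.lower() in url.lower():
            hits[client_ip] += 1
    return hits


if __name__ == "__main__":
    print(search_proxy_log(SAMPLE_LOG, "netfort"))
```

Note that for HTTPS traffic tunnelled through the proxy, an entry like the CONNECT line above records only the host name and port, not the full URL.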

Pros

  • Great for troubleshooting problems or checking if changes to block rules are working.

Cons

  • Enabling logging will impact firewall or proxy performance. These devices were not designed for long-term capture of log information.
  • If your proxy or firewall is having performance issues, you won't be able to access the logs to troubleshoot the problem.

Do you have any other ideas on how to capture and search URL information? Comments welcome.