Limitations of using NetFlow to monitor cloud computing
For a variety of reasons, we’re seeing more and more content distributed via content delivery networks (CDNs). A CDN distributes content so that multiple copies of the data exist on the Internet. These copies are held on servers at points of presence around the world, close to end users, so the data reaches the user’s desk faster.
For a long time, CDNs were only available to large organisations such as Microsoft and Adobe. These companies typically engaged Akamai™ for content delivery. Nowadays, thanks to services such as Amazon CloudFront™, CDNs are available to anyone who has a credit card. This is great news for people who are distributing content, but it’s bad news for network administrators who are relying solely on flow data, such as NetFlow, for visibility into activity on their networks.
Before CDNs became widespread, you could get a good understanding of a traffic flow by doing a reverse DNS lookup of its source and destination IP addresses. Typically, the source address would correspond to a system on your network, while the destination address would correspond to an external host. For example, if the destination address resolved to downloads.AcmeInc.com, it would be clear to the network administrator that the flow was attributable to someone downloading software from Acme, Inc.
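As a rough sketch of that pre-CDN workflow, the Python below labels a flow record by reverse-resolving both endpoints with the standard library's `socket.gethostbyaddr`. The `FlowRecord` type and the `reverse_dns` and `describe_flow` helpers are hypothetical simplifications, not NetFlow's actual record layout. The resolver is a parameter so the example can run against a fake PTR table instead of live DNS.

```python
import socket
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Same shape as socket.gethostbyaddr's return value: (hostname, aliases, addresses).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]

@dataclass
class FlowRecord:
    """Hypothetical, simplified flow record holding only the fields used here."""
    src_ip: str
    dst_ip: str
    dst_port: int
    byte_count: int

def reverse_dns(ip: str, resolver: Resolver = socket.gethostbyaddr) -> Optional[str]:
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None

def describe_flow(flow: FlowRecord, resolver: Resolver = socket.gethostbyaddr) -> str:
    """Render a flow with hostnames in place of IPs wherever reverse DNS succeeds."""
    src = reverse_dns(flow.src_ip, resolver) or flow.src_ip
    dst = reverse_dns(flow.dst_ip, resolver) or flow.dst_ip
    return f"{src} -> {dst}:{flow.dst_port} ({flow.byte_count} bytes)"

# Hypothetical PTR table standing in for live DNS (documentation address ranges).
FAKE_PTR = {"192.0.2.10": "pc42.corp.example", "203.0.113.7": "downloads.AcmeInc.com"}

def fake_resolver(ip: str) -> Tuple[str, List[str], List[str]]:
    if ip in FAKE_PTR:
        return FAKE_PTR[ip], [], [ip]
    raise socket.herror(1, "Unknown host")

flow = FlowRecord("192.0.2.10", "203.0.113.7", 80, 52_428_800)
print(describe_flow(flow, fake_resolver))
# prints "pc42.corp.example -> downloads.AcmeInc.com:80 (52428800 bytes)"
```

In production you would drop the `fake_resolver` argument and let the default `socket.gethostbyaddr` query real DNS.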
Today, the destination address for such a flow is very likely to resolve to 222.h.akamai.net or similar. That hostname identifies a CDN server, not the company whose content it serves, so resolving the IP address provides no further insight and the network administrator is none the wiser as to the real origin of the downloaded data or why the user is downloading it.
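The sketch below makes the problem concrete: once a PTR name is known, a simple suffix check can tell you that a flow ended at a CDN edge server, but nothing more. The suffix list and the helper names are illustrative assumptions; the list is deliberately incomplete and is not an authoritative inventory of CDN domains.

```python
# Illustrative and incomplete: a few hostname suffixes used by CDN edge servers.
CDN_SUFFIXES = ("akamai.net", "akamaiedge.net", "akamaitechnologies.com", "cloudfront.net")

def is_cdn_hostname(hostname: str) -> bool:
    """True if hostname equals, or is a subdomain of, one of the listed CDN suffixes."""
    name = hostname.lower().rstrip(".")  # DNS names are case-insensitive; drop trailing dot
    return any(name == suffix or name.endswith("." + suffix) for suffix in CDN_SUFFIXES)

def what_reverse_dns_tells_us(hostname: str) -> str:
    if is_cdn_hostname(hostname):
        # The name identifies the CDN operator, not the company whose content was served.
        return f"{hostname}: CDN edge server, real origin unknown"
    return f"{hostname}: probably the content owner's own server"

for name in ("downloads.AcmeInc.com", "a1775.g.akamai.net"):
    print(what_reverse_dns_tells_us(name))
```

Matching on `"." + suffix` rather than a bare `endswith` avoids false positives such as a hypothetical `notakamai.net`.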
One way to identify the real origin of the download is to search the access logs on the HTTP proxy server for entries matching the source and destination addresses. This might help the network administrator home in on the time, hostname and URL of the download, but the process is cumbersome and is not guaranteed to yield accurate information.
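To give a feel for that manual search, here is a minimal sketch that scans proxy access-log lines for requests from one client to one upstream server within a time window. It assumes Squid's native access.log format (timestamp, elapsed milliseconds, client address, result code, bytes, method, URL, user, hierarchy/peer, content type); other proxies log different fields. `find_requests`, `ProxyHit` and the sample log lines are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass
class ProxyHit:
    timestamp: float
    client_ip: str
    server_ip: str
    method: str
    url: str
    size: int

def find_requests(lines: Iterable[str], client_ip: str, server_ip: str,
                  start: float, end: float) -> Iterator[ProxyHit]:
    """Yield entries for client_ip -> server_ip with start <= timestamp <= end.

    Assumes Squid native format, where field 9 looks like "HIER_DIRECT/203.0.113.7".
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        try:
            timestamp, size = float(fields[0]), int(fields[4])
        except ValueError:
            continue  # skip lines that don't match the assumed format
        peer = fields[8].split("/", 1)[-1]  # "HIER_DIRECT/203.0.113.7" -> "203.0.113.7"
        if fields[2] == client_ip and peer == server_ip and start <= timestamp <= end:
            yield ProxyHit(timestamp, fields[2], peer, fields[5], fields[6], size)

# Hypothetical log lines; 203.0.113.7 plays the role of a CDN edge server.
SAMPLE_LOG = [
    "1700000000.120    950 192.0.2.10 TCP_MISS/200 52428800 GET "
    "http://downloads.AcmeInc.com/setup.exe - HIER_DIRECT/203.0.113.7 application/octet-stream",
    "1700000005.400     40 192.0.2.11 TCP_MISS/200 5120 GET "
    "http://www.example.com/ - HIER_DIRECT/198.51.100.20 text/html",
]

for hit in find_requests(SAMPLE_LOG, "192.0.2.10", "203.0.113.7", 1699999990, 1700000010):
    print(hit.timestamp, hit.method, hit.url)
```

Even this sketch shows why the approach is unreliable: if the proxy forwards requests to a parent proxy, field 9 records the parent rather than the CDN server, and for HTTPS requests tunnelled with CONNECT, Squid logs only `host:port` rather than the full URL.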
NetFort LANGuardian overcomes this problem by capturing full packets and applying deep packet inspection (DPI) to gather and correlate traffic information. The information is accessible through a browser-based user interface, which lets the administrator drill down to application-level detail and gain a full understanding of each traffic flow. To install LANGuardian, you just need to identify your network core and enable port mirroring (also known as a SPAN port) on the switch there.
In the following example, we see that there has been a peak in bandwidth usage over a remote link.
Clicking the graph enables the network administrator to drill down into details of traffic over the link and see the source and destination addresses that caused the peak to occur.
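Flow data does support this first drill-down step. A minimal sketch, assuming flow records have already been parsed into a hypothetical `Flow` type, sums bytes per source/destination pair inside the peak's time window and ranks the pairs. For simplicity it attributes each flow's bytes to its start time; a real tool would spread them across the flow's duration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

@dataclass
class Flow:
    start: float      # flow start time, epoch seconds
    src_ip: str
    dst_ip: str
    byte_count: int

def top_talkers(flows: Iterable[Flow], window_start: float, window_end: float,
                n: int = 5) -> List[Tuple[Tuple[str, str], int]]:
    """Return the n (src, dst) pairs with the most bytes in [window_start, window_end)."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for flow in flows:
        # Simplification: all of a flow's bytes are counted at its start time.
        if window_start <= flow.start < window_end:
            totals[(flow.src_ip, flow.dst_ip)] += flow.byte_count
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]

# Hypothetical flows around a bandwidth peak.
flows = [
    Flow(1000.0, "192.0.2.10", "203.0.113.7", 40_000_000),
    Flow(1010.0, "192.0.2.10", "203.0.113.7", 35_000_000),
    Flow(1020.0, "192.0.2.11", "198.51.100.20", 2_000_000),
    Flow(5000.0, "192.0.2.12", "198.51.100.30", 90_000_000),  # outside the window
]
for (src, dst), total in top_talkers(flows, 900.0, 1200.0):
    print(f"{src} -> {dst}: {total} bytes")
```

The result identifies the endpoints responsible for the peak; it cannot say what was transferred between them.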
In this scenario, the download came from an IP address whose reverse DNS is a1775.g.akamai.net. Although we now know that the peak was caused by HTTP traffic, we still don’t know what the user was doing.
However, when DPI is enabled, we can easily identify the real origin of the download.
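Why does DPI succeed where flow data fails? A flow record carries only addresses, ports and counters, whereas the payload of a plain HTTP request names the site and file being fetched. The sketch below is only an illustration, not necessarily how LANGuardian works: it pulls the Host header and request target out of an HTTP/1.x request. It assumes the whole request header is in one buffer, whereas a real DPI engine reassembles TCP streams. For HTTPS the request is encrypted, so such tools must rely on other metadata, such as the TLS Server Name Indication (SNI).

```python
from typing import Optional, Tuple

def http_request_origin(payload: bytes) -> Optional[Tuple[str, str]]:
    """Return (host, request_target) from an HTTP/1.x request header, or None.

    Assumes the complete request header is contained in `payload`.
    """
    header = payload.split(b"\r\n\r\n", 1)[0]
    lines = header.split(b"\r\n")
    parts = lines[0].split(b" ")
    # Request line: METHOD SP request-target SP HTTP-version
    if len(parts) != 3 or not parts[2].startswith(b"HTTP/1."):
        return None  # not an HTTP/1.x request (e.g. TLS or another protocol)
    target = parts[1].decode("ascii", "replace")
    for line in lines[1:]:
        name, colon, value = line.partition(b":")
        if colon and name.strip().lower() == b"host":
            return value.strip().decode("ascii", "replace"), target
    return None  # HTTP/1.0 requests may omit Host

# Hypothetical payload: the destination IP may reverse-resolve to a CDN name,
# but the request itself names the real site and file.
packet_payload = (
    b"GET /downloads/setup.exe HTTP/1.1\r\n"
    b"Host: downloads.AcmeInc.com\r\n"
    b"User-Agent: ExampleUpdater/1.0\r\n"
    b"\r\n"
)
print(http_request_origin(packet_payload))
# prints ('downloads.AcmeInc.com', '/downloads/setup.exe')
```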
The network administrators we speak to need this level of detail because they often see remote links congested by software patch deployments, and patches are often delivered via CDNs. Armed with the information LANGuardian provides, they can work with the colleagues who manage desktop deployment to find ways of rolling out patches without using up all the capacity on a remote link.
In summary, the increasing use of CDNs highlights the value of DPI in resolving bandwidth problems that are difficult, if not impossible, to resolve using flow data alone.
If you want to know more about monitoring CDN activity on your network, please don’t hesitate to contact our support team.