Metrics walkthrough: finding DNS failures

The DNS (domain name system) protocol is critical for supporting internet traffic. It often works without issues. However, DNS servers are commonly misconfigured or overloaded in IT environments, which can affect internet performance.

There are many ways to explore DNS metrics in the ExtraHop system. In this walkthrough, we’ll show you how to review DNS metrics in a dashboard, navigate to DNS protocol pages, and drill down on interesting metrics to identify potentially affected devices.

Specifically, you’ll learn how to answer the following questions:
  • Is there a network or DNS issue that is affecting internet performance?
  • How many DNS failures are occurring on my network?
  • Which clients are not responding to my DNS servers?

Additional resources are available for interpreting DNS metrics:
  • Learn about interpreting DNS metrics in the ExtraHop system by viewing our online training module, Quick Peek: DNS.
  • Learn about problem DNS queries and errors that you can monitor in your own environment by installing the ExtraHop DNS Bundle. This bundle contains a dashboard with pre-configured charts and detailed explanations about key DNS errors.


Identify DNS issues with system dashboards

If a slow internet issue is reported, look at the system dashboards to determine whether the issue is related to network throughput or to the DNS protocol.

  1. Log in to the Web UI on the Discover appliance.
  2. Click Last 30 minutes in the top-left navigation bar, select Last week, and then click Save.
    Changing the global time interval gives you a chance to see network and protocol behavior that occurred prior to the detected problem.
  3. Click Dashboards, and then click Network in the System Dashboards section.
  4. Confirm that the Network Throughput and L2 Packets charts show normal or consistent peaks, similar to the figure below.
  5. Click Activity in the System Dashboards section.
  6. Scroll down to the All Activity DNS Server Processing Time and All Activity DNS charts.
    1. The All Activity DNS Server Processing Time chart shows you the time between the last packet of a DNS request from a client and the first packet of a DNS response from the server. Hover over the median to compare it with the 95th percentile at the same point in time. A large difference between the median and the 95th percentile indicates that something might be wrong with a DNS server in your network.
    2. The All Activity DNS chart correlates responses and errors. A spike in errors can add delays of two to four seconds for clients, servers, applications, and customers. In the figure below, the proportion of responses to errors looks consistent.

Based on these dashboard charts, the network throughput appears normal. Next, we should investigate our DNS servers. Click the All Activity DNS chart title to switch to the All Activity page in the Metrics section of the Web UI.
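
The comparison between the median and the 95th percentile described for the All Activity DNS Server Processing Time chart can also be sketched outside the Web UI. The following Python example is a minimal illustration only, not an ExtraHop API: the function names, the sample values, and the ratio threshold of 5 are all assumptions chosen for the example, and a suitable threshold depends on your environment.

```python
import math
import statistics


def percentile(values, pct):
    """Return the nearest-rank percentile (pct in 0-100) of a non-empty list."""
    ordered = sorted(values)
    # Nearest-rank method: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def processing_time_gap(samples_ms, ratio_threshold=5.0):
    """Compare the median and 95th percentile of DNS server processing times.

    ratio_threshold is a hypothetical cutoff, not a documented ExtraHop value.
    """
    median = statistics.median(samples_ms)
    p95 = percentile(samples_ms, 95)
    if median > 0:
        ratio = p95 / median
    else:
        # Avoid dividing by zero when the median is 0 ms.
        ratio = float("inf") if p95 > 0 else 1.0
    return {
        "median_ms": median,
        "p95_ms": p95,
        "ratio": ratio,
        "suspicious": ratio >= ratio_threshold,
    }


# Hypothetical processing times in milliseconds for two time windows.
healthy = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7]
degraded = [2, 3, 3, 4, 4, 5, 5, 6, 900, 1200]

print(processing_time_gap(healthy))
print(processing_time_gap(degraded))
```

In this sketch, both windows have the same median, but only the second one has a 95th percentile far above it, which is the pattern the chart is meant to surface.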

View the number of DNS request timeouts

The Request Timeout metric indicates a failure to fulfill a DNS request. Let’s look at the total number of request timeouts to see if DNS requests are timing out. We can then drill down to see which of our DNS servers are not getting responses.

  1. Click the All Activity DNS chart title.
  2. View the number of Request Timeouts in the DNS Metrics section. In the figure below, the number is high (1,174,645) and worth investigating further.
  3. Hover over the request timeouts number and select By Server IP to view all of the server IP addresses in your network with request timeouts.
  4. Note which devices have the highest number of request timeouts. In the figure below, this is Device
  5. In the Device column, click the name of the device with the highest number of request timeouts. A new page opens to display additional DNS metrics about that device.
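
The drill-down by server IP in the steps above amounts to grouping request timeouts by server and sorting by the total. The following Python example is a minimal sketch of that idea under stated assumptions: the record layout, the function name, and the IP addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical and do not reflect how the ExtraHop system stores or exports metrics.

```python
from collections import Counter


def top_timeout_servers(records, limit=3):
    """Sum request timeouts per server IP and return the highest totals first.

    records is a hypothetical list of (server_ip, timeout_count) pairs.
    """
    totals = Counter()
    for server_ip, timeouts in records:
        totals[server_ip] += timeouts
    # most_common() sorts by total, highest first.
    return totals.most_common(limit)


# Hypothetical per-interval timeout counts for three DNS servers.
records = [
    ("192.0.2.10", 120),
    ("192.0.2.20", 15),
    ("192.0.2.10", 900),
    ("192.0.2.30", 40),
]

for server_ip, total in top_timeout_servers(records):
    print(f"{server_ip}: {total} request timeouts")
```

The first entry in the result corresponds to the device you would click in the Device column.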

Find the clients that are not responding

You can now pinpoint which clients are not responding to your DNS server.

  1. Click Clients near the top of the page to open a new page that lists all of the IP addresses from clients that received requests from your DNS server, the number of requests, and the time it took for the DNS server to process the requests.
  2. Search the Total (ms) column for any blank entries.
    Tip: Click column headings to sort by the highest or lowest values.
  3. In the figure below, note that client-1 has a blank entry in the Total (ms) column, which indicates that this client is not responding to the DNS server request. In addition, client-2 is experiencing long response times from the DNS server.
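
The review of the Total (ms) column in the steps above can be expressed as a simple filter: separate clients with a blank entry from clients with unusually long times. The following Python example is a minimal sketch that assumes the table has been copied into a list of dictionaries. The field names, the request counts and timing values, and the 500 ms cutoff for a "slow" client are all hypothetical.

```python
def triage_clients(rows, slow_ms=500):
    """Split clients into those with no timing data and those with long times.

    rows is a hypothetical list of dicts with "client" and "total_ms" keys,
    where total_ms is None when the Total (ms) column is blank.
    slow_ms is an assumed cutoff, not a documented ExtraHop value.
    """
    no_timing = [row["client"] for row in rows if row["total_ms"] is None]
    slow_rows = [
        row for row in rows
        if row["total_ms"] is not None and row["total_ms"] >= slow_ms
    ]
    # List the slowest clients first.
    slow_rows.sort(key=lambda row: row["total_ms"], reverse=True)
    slow = [row["client"] for row in slow_rows]
    return no_timing, slow


# Hypothetical rows modeled on the clients named in the walkthrough.
rows = [
    {"client": "client-1", "requests": 310, "total_ms": None},
    {"client": "client-2", "requests": 120, "total_ms": 2400.0},
    {"client": "client-3", "requests": 95, "total_ms": 12.5},
]

no_timing, slow = triage_clients(rows)
print("Blank Total (ms):", no_timing)
print("Long response times:", slow)
```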

This information suggests that this DNS server might be misconfigured or having other issues. Contact the team responsible for the DNS server for further investigation.
Published 2019-10-11 14:52