The DNS (domain name system) protocol is critical for supporting internet traffic. It often works without issues. However, DNS servers are commonly misconfigured or overloaded in IT environments, which can affect internet performance.
There are many ways to explore DNS metrics in the ExtraHop system. In this walkthrough, we’ll show you how to review DNS metrics in a dashboard, navigate to DNS protocol pages, and drill down on interesting metrics to identify potentially affected devices.
- Is there a network or DNS issue that is affecting internet performance?
- What is the number of DNS failures on my network?
- Which clients are not responding to my DNS servers?
- Learn about interpreting DNS metrics in the ExtraHop system by viewing our online training module, Quick Peek: DNS.
- Learn about problem DNS queries and errors that you can monitor in your own environment by installing the ExtraHop DNS Bundle. This bundle contains a dashboard with pre-configured charts and detailed explanations about key DNS errors.
If a slow internet issue is reported, look at the system dashboards to determine whether the issue is related to network throughput or to the DNS protocol.
- Log in to the Web UI on the Discover appliance.
Click Last 30 minutes in the top-left navigation bar,
select Last week, and then click
Changing the global time interval gives you a chance to see network and protocol behavior that occurred prior to the detected problem.
- Click Dashboards, and then click Network in the System Dashboards section.
Confirm that the Network Throughput and L2
Packets charts show normal or consistent peaks, similar to the
- Click Activity in the System Dashboards section.
Scroll down to the All Activity DNS Server Processing Time
and All Activity DNS charts.
The All Activity DNS Server Processing Time chart
shows the time between the last packet of a DNS request from a client and the first packet of a DNS response
from the server. Hover over the median to compare it with the 95th percentile at
the same time point. A large difference between the median value and the
95th percentile indicates that something might be wrong with a DNS
server in your network.
The All Activity DNS chart correlates responses
and errors. A spike in errors can add delays of two to four seconds for
clients, servers, applications, and customers. In the figure below, the
ratio of responses to errors looks consistent.
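The comparison between the median and the 95th percentile can also be sketched outside the ExtraHop system. The following Python example is a minimal illustration and not part of the product: the function name `processing_time_gap`, the sample values, and the ratio threshold of 5 are assumptions chosen for demonstration, so pick a threshold that matches what is normal in your own environment.

```python
from statistics import median, quantiles


def processing_time_gap(samples_ms, ratio_threshold=5.0):
    """Return (median, p95, flagged) for DNS server processing times in ms.

    ratio_threshold is a hypothetical cutoff: the result is flagged when the
    95th percentile is at least that many times larger than the median.
    """
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    med = median(samples_ms)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    p95 = quantiles(samples_ms, n=100, method="inclusive")[94]
    flagged = med > 0 and (p95 / med) >= ratio_threshold
    return med, p95, flagged


# Made-up samples: most requests take 2 ms, but one in ten takes 200 ms.
slow_tail = [2] * 90 + [200] * 10
steady = [2] * 100

print(processing_time_gap(slow_tail))
print(processing_time_gap(steady))
```

A flagged result only suggests that a subset of requests is much slower than the typical request; it does not identify which server is responsible.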
The Request Timeout metric indicates a failure to fulfill a DNS request. Let’s look at the total number of request timeouts to see if DNS requests are timing out. We can then drill down to see which of our DNS servers are not getting responses.
- Click the All Activity DNS chart title.
View the number of Request Timeouts in the DNS
Metrics section. In the figure below, the number is high
(1,174,645) and worth investigating further.
Hover over the request timeouts number and select By Server
IP to view all of the server IP addresses in your network with
Note which devices have the highest number of request timeouts. In the figure
below, this is Device 192.168.35.103.
- In the Device column, click the name of the device with the highest number of request timeouts. A new page opens to display additional DNS metrics about that device.
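The drill-down above amounts to grouping request timeouts by server IP address and ranking the servers by count. As a rough sketch of that idea, assuming you have your own records of DNS requests available as `(server_ip, timed_out)` pairs, the grouping could look like the following. The function name `timeouts_by_server`, the record layout, and the sample data are hypothetical and do not represent an ExtraHop export format or API.

```python
from collections import Counter


def timeouts_by_server(records):
    """Rank server IPs by number of request timeouts, highest first.

    records is an iterable of (server_ip, timed_out) pairs, a hypothetical
    layout used only for this illustration.
    """
    counts = Counter()
    for server_ip, timed_out in records:
        if timed_out:
            counts[server_ip] += 1
    # most_common() sorts by count in descending order.
    return counts.most_common()


# Made-up sample data for demonstration.
sample = [
    ("192.168.35.103", True),
    ("192.168.35.103", True),
    ("192.168.35.104", False),
    ("192.168.35.103", True),
    ("192.168.35.104", True),
]

for server_ip, count in timeouts_by_server(sample):
    print(server_ip, count)
```

The first entry in the ranked list corresponds to the device you would click in the Device column.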
You can now pinpoint which clients are not responding to your DNS server.
- Click Clients near the top of the page to open a new page that lists all of the IP addresses from clients that received requests from your DNS server, the number of requests, and the time it took for the DNS server to process the requests.
Search the Total (ms) column for any blank entries.
Tip: Click column headings to sort by the highest or lowest values.
In the figure below, note that client-1 has a blank entry in the Total (ms)
column, which indicates that this client is not responding to the DNS server
request. In addition, client-2 is experiencing long response times from the DNS