Here are some answers to frequently asked questions about System Health.
- How do I check for possible data loss?
- How do I monitor resource consumption?
- How do I check the performance of my RPCAP deployments?
- Are my triggers running properly?
- How do triggers affect my appliance?
- How are my open data streams performing?
- What is the estimated lookback capacity?
- How many devices is the appliance monitoring?
- Are my SSL certificates decrypting as expected?
- How do I add system health metrics to a dashboard?
- What other tools can help me evaluate system health?
The best indicators of data loss are dropped packets, TCP desyncs, and excessively high packet or throughput rates.
- Check the Capture Drop Rate chart for packets dropped at the network card interface, SPAN, or network tap
- Check the Desyncs chart for system-wide desyncs, which indicate that synchronization was lost when processing a TCP connection.
- Monitor the following charts to ensure that the ExtraHop Discover appliance is not exceeding product
A high packet rate or throughput rate might result in packets dropped at the span source or at a span aggregator. Acceptable rates and limits are available on the Datasheets for Discover appliances.
The Discover appliance allocates memory resources for capturing packets, running triggers, transmitting data to remote servers, and recording to the datastore.
Check the following charts for the amount of memory that the Discover appliance dedicates to each resource area over a given time range:
After the initial setup of a remote packet capture (RPCAP) deployment, it is a good idea to make sure your deployment is working as expected.
To get the most out of your triggers, make sure that new and modified triggers are producing accurate data without degrading system performance.
- View the Trigger Executes and Drops chart to ensure that the amount of trigger activity is consistent with your expectations. Look for bursts of trigger activity that might indicate inefficient behavior from one or more triggers. With this chart, you can also track the number of triggers that have been dropped from the trigger queue. The ExtraHop system might drop a long-running trigger that is dominating resource consumption.
- View the Trigger Executes by Trigger chart after you have created a new trigger or modified an existing one to ensure that the trigger is running. Any trigger consuming higher resources than average might have a poorly-optimized script that is affecting performance.
- Check the Trigger Exceptions by Trigger chart to display any unhandled trigger exceptions. Exceptions are a large contributor to system performance issues and should be corrected immediately.
You can monitor whether your datastore triggers, also referred to as bridge triggers, are running properly with the following charts:
In addition to monitoring how well your triggers are running, the System Health page provides charts that enable you to monitor and assess the impact of running triggers to your Discover appliance.
- View the Trigger Load chart to display several measurements of resource consumption by all running triggers. Look for spikes in consumption that can indicate that a new trigger has been introduced or that an existing trigger is having issues.
- Check the Trigger Load by Trigger chart to view the number of cycles consumed by each running trigger. A trigger that runs seldom but consumes more cycles than average can cause other triggers to be dropped from the queue.
- Check the Trigger Cycles by Thread chart to view the number of cycles each thread has allocated to trigger operations. Look for an even amount of consumption among multiple threads. Trigger drops might occur if the consumption of one thread is considerably higher than the others.
You can monitor the impact of datastore triggers, also referred to as bridge triggers, that are running on your Discover appliance with the following charts:
You can monitor charts that pertain to the health and performance of open data stream (ODS) transmissions to a third-party syslog, database, or server.
- Click the Messages Sent chart to view the total number of messages transmitted by all active data streams and the number of errors that occurred during those transmissions. Monitor this chart to ensure that messages are being transmitted as expected. If no bytes are sent, there might be an issue with the configuration of an open data stream or an ODS trigger.
- Click the Message Throughput chart to view the total number of bytes transmitted by all active data streams. Monitor this chart to ensure that bytes are being transmitted as expected. If no bytes are sent, there might be an issue with the configuration of an open data stream or an ODS trigger.
- Check the ODS Connections chart for an at-a-glance view of attempts to connect to ODS targets and errors that occurred during the attempts.
- Monitor the Messages Dropped chart to view the number of messages dropped from an open data stream because the message queue was full. Dropped messages indicate that the message queue is too long.
- Monitor the Message Queue Length chart to display the number of messages waiting in the queue. A long message queue might indicate that the Discover appliance is sending data faster than the remote system can process.
Lookback refers to how far back you are currently able to look up historical data. For example, you might be able to look up 1-hour intervals of data as far back as 9 days.
- Monitor the Metric Data Lookback Estimates chart to determine the current estimated lookback capacity of your Discover appliance. The chart displays lookback metrics for 1 hour, 5 minute, and 30 second time intervals based on the write throughput rate.
The System Health page provides charts that help you determine how many L2, gateway, pseudo, custom, and L3 devices are monitored by the Discover appliance.
You can access a list of all certificates that perform decryption on the Discover appliance by clicking Certificates at the top of the System Health page.
- Check the Certificate Details table to ensure that the correct SSL certificates are installed on the Discover appliance and to view encryption metrics for each certificate. Encryption metrics help you determine if your certificates are performing decryption as expected. For example, you can check the number of successfully encrypted sessions or the number of sessions that were not decrypted due to hardware errors.
|Tip:||If you are unfamiliar with creating and editing dashboards, see our Dashboard Walkthrough.|
The Status and Diagnostics section of the Admin UI provides metrics about the overall health of the ExtraHop system and diagnostic tools that enable ExtraHop Support to troubleshoot system errors.
- Check health statistics to view metrics that indicate the operating efficiency of the ExtraHop system.
- Check the audit log to view event logging data and to change syslog settings.
- Learn about exception files and how to enable or disable them on the ExtraHop system.
- Learn about support scripts and how to upload and run them on the ExtraHop system.
You can also view the following resources to learn more about system health: