Import external data to your ExtraHop system
The ExtraHop Open Data Context API enables you to import data from an external host into the session table on your ExtraHop sensor. That data can then be accessed to create custom metrics that you can add to ExtraHop charts, store in records on a recordstore, or export to an external analysis tool.
After you enable the Open Data Context API on your sensor, you can import data by running a Python script from a memcached client on an external host. That external data is stored in key-value pairs, and can be accessed by writing a trigger.
For example, you might run a memcached client script on an external host to import CPU load data into the session table on your sensor. Then, you can write a trigger that accesses the session table and commits the data as custom metrics.
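As a rough illustration of the CPU load example, the following sketch builds string key-value pairs from the load averages of an external host. The key naming scheme (cpu_load:hostname:window) and both function names are assumptions made for this sketch and are not part of the Open Data Context API. The resulting pairs would then be sent to the sensor with a memcached client.

```python
import os
import socket


def cpu_load_items(hostname, loads):
    """Return key-value pairs for the 1, 5, and 15 minute load averages.

    Values are formatted as plain strings so that the sensor can read them.
    """
    windows = ("1m", "5m", "15m")
    # Hypothetical key scheme: cpu_load:<hostname>:<window>
    return {
        "cpu_load:%s:%s" % (hostname, window): "%.2f" % load
        for window, load in zip(windows, loads)
    }


def local_cpu_load_items():
    """Collect load averages from this host (Unix-like systems only)."""
    return cpu_load_items(socket.gethostname(), os.getloadavg())
```

Calling cpu_load_items("web01", (0.5, 1.25, 2.0)) produces three entries, such as the key cpu_load:web01:1m with the value "0.50".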
Warning: The connection between the external host and the ExtraHop system is not encrypted and should not transmit sensitive information.
Enable the Open Data Context API
You must enable the Open Data Context API on your sensor before it can receive data from an external host.
Before you begin
- You must have unlimited privileges to access the Administration page on your ExtraHop system.
- If you have a firewall, your firewall rules must allow external hosts to access the specified TCP and UDP ports. The default port number is 11211.
Write a Python script to import external data
Before you can import external data into the session table on your sensor, you must write a Python script that identifies your sensor and contains the data you want to import into the session table. The script is then run from a memcached client on the external host.
This topic provides syntax guidance and best practices for writing the Python script. A complete script example is available at the end of this guide.
Before you begin
Ensure that you have a memcached client on the external host machine. You can install any standard memcached client library, such as http://libmemcached.org/ or https://pypi.python.org/pypi/pymemcache. The sensor acts as a memcached version 1.4 server.
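A minimal sketch of the import step follows, assuming the pymemcache library linked above. The function name import_items and the 120-second expiration are illustrative choices, and the address 10.0.0.115 is a placeholder for your own sensor.

```python
def import_items(client, items, expire=120):
    """Write each key-value pair to the session table through a memcached client.

    The client can be any object with a pymemcache-style
    set(key, value, expire=...) method. Values are converted to strings
    before they are sent.
    """
    written = []
    for key, value in items.items():
        client.set(key, str(value), expire=expire)
        written.append(key)
    return written


# Usage against a sensor (requires pymemcache and network access):
#
#   from pymemcache.client.base import Client
#   client = Client(("10.0.0.115", 11211))  # placeholder sensor address
#   import_items(client, {"my_float": 1.5})
```

Passing the client in as an argument keeps the helper independent of any one memcached library, so it can be exercised with a stand-in object before it is pointed at a sensor.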
Here are some important considerations about the Open Data Context API:
- The Open Data Context API supports most memcached commands, such as get, set, and increment.
- All data must be inserted as strings that are readable by the sensor. Some memcached clients attempt to store type information in the values. For example, the Python memcache library stores floats as pickled values, which cause invalid results when calling Session.lookup in triggers. The following Python syntax correctly inserts a float as a string:
  mc.set("my_float", str(1.5))
- Although session table values can be almost unlimited in size, committing large values to the session table might cause performance degradation. In addition, metrics committed to the datastore must be 4096 bytes or fewer, and oversized table values might result in truncated or imprecise metrics.
- Basic statistics reporting is supported, but detailed statistics reporting by item size or key prefix is not supported.
- Setting item expiration when adding or updating items is supported, but bulk expiration through the flush command is not supported.
- Keys expire at 30-second intervals. For example, if a key is set to expire in 50 seconds, it can take from 50 to 79 seconds to expire.
- All keys set with the Open Data Context API are exposed through the SESSION_EXPIRE trigger event as they expire. This behavior is in contrast to the Trigger API, which does not expose expiring keys through the SESSION_EXPIRE event.
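The string, size, and expiration considerations above can be sketched as small helper functions. The function names are hypothetical, and the size check only applies the 4096-byte metric limit described above to the encoded value. It is an illustration, not the sensor's own validation logic.

```python
METRIC_LIMIT_BYTES = 4096       # datastore metric limit stated above
EXPIRY_INTERVAL_SECONDS = 30    # keys expire at 30-second intervals


def to_session_value(value):
    """Convert a Python value to a plain string that the sensor can read."""
    return str(value)


def fits_metric_limit(value):
    """Return True if the encoded value is 4096 bytes or fewer."""
    encoded = to_session_value(value).encode("utf-8")
    return len(encoded) <= METRIC_LIMIT_BYTES


def expiry_window(ttl_seconds):
    """Return the (earliest, latest) number of seconds until a key expires."""
    return ttl_seconds, ttl_seconds + EXPIRY_INTERVAL_SECONDS - 1
```

For a key set to expire in 50 seconds, expiry_window(50) returns (50, 79), which matches the range given above.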
Write a trigger to access imported data
You must write a trigger before you can access the data in the session table.
Before you begin
This topic assumes experience with writing triggers. If you are unfamiliar with triggers, check out the following topics:
Next steps
You must assign the trigger to a device or device group. The trigger will not run until it has been assigned.
Open Data Context API example
In this example, you will learn how to check the reputation score and potential risk of domains that are communicating with devices on your network. First, the example Python script shows you how to import domain reputation data into the session table on your sensor. Then, the example trigger script shows you how to check domain names queried on DNS events against that imported domain reputation data and how to create a custom metric from the results.
Example Python script
This Python script contains a list of 20 popular domain names and can reference domain reputation scores obtained from a source such as DomainTools.
This script runs a REST service that accepts a POST request whose body is a domain name. On each POST request, the memcached client updates the session table with the reputation score for that domain.
#!/usr/bin/python
import flask
import flask_restful
import memcache
import sqlite3

# 20 popular domain names
top20 = { "google.com", "facebook.com", "youtube.com", "twitter.com",
          "microsoft.com", "wikipedia.org", "linkedin.com", "apple.com",
          "adobe.com", "wordpress.org", "instagram.com", "wordpress.com",
          "vimeo.com", "blogspot.com", "youtu.be", "pinterest.com",
          "yahoo.com", "goo.gl", "amazon.com", "bit.ly" }

dnsnames = {}

# Memcached client that points at the sensor
mc = memcache.Client(['10.0.0.115:11211'])

# Popular domains start with a score of 0.0
for dnsname in top20:
    dnsnames[dnsname] = 0.0

# Load known reputation scores from a local SQLite database
dbc = sqlite3.Connection('./dnsreputation.db')
cur = dbc.cursor()
cur.execute('select dnsname, score from dnsreputation;')
for row in cur:
    dnsnames[row[0]] = row[1]
dbc.close()

app = flask.Flask(__name__)
api = flask_restful.Api(app)

class DnsReputation(flask_restful.Resource):
    def post(self):
        dnsname = flask.request.get_data()
        #print dnsname
        # Unknown domains get a score of 50.0; entries expire after 120 seconds
        mc.set(dnsname, str(dnsnames.get(dnsname, 50.0)), 120)
        return 'added to session table'

api.add_resource(DnsReputation, '/dnsreputation')

if __name__ == '__main__':
    app.run(debug=True,host='0.0.0.0')
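To exercise the REST service by hand, a POST request can be built with the Python standard library, as in the following sketch. The base URL is an assumption: it uses the local host and port 5000, which is the Flask default when no port is passed to app.run. The function name build_lookup_request is hypothetical.

```python
import urllib.request


def build_lookup_request(domain, base_url="http://127.0.0.1:5000"):
    """Build a POST request whose body is the domain name."""
    return urllib.request.Request(
        base_url + "/dnsreputation",
        data=domain.encode("utf-8"),
        method="POST",
    )


# To send the request while the example script is running:
#
#   with urllib.request.urlopen(build_lookup_request("google.com")) as resp:
#       print(resp.read().decode())
```

Building the request separately from sending it makes it possible to inspect the method, URL, and body before any network traffic is generated.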
Example trigger script
This example trigger script canonicalizes (or converts) the query names on DNS events into domain names (for example, www.example.com becomes example.com), and then checks for the domain and its reputation score in the session table. If the domain is not in the session table, the trigger sends the domain name to the REST service for lookup. If the score value is greater than 75, the trigger adds the domain to an application container called "DNSReputation" as a detail metric called "Bad DNS reputation".
//Configure the following trigger settings:
//Name: DNSReputation
//Debugging: Enabled
//Events: DNS_REQUEST, DNS_RESPONSE

if (DNS.errorNum != 0 || DNS.qname == null
    || DNS.qname.endsWith("in-addr.arpa") || DNS.qname.endsWith("local")
    || DNS.qname.indexOf('.') == -1 ) {
    // error or null or reverse lookup, or lookup of local name
    return;
}

//var canonicalname = DNS.qname.split('.').slice(-2).join('.');
var canonicalname = DNS.qname.substring(DNS.qname.lastIndexOf('.', DNS.qname.lastIndexOf('.')-1)+1);

//debug(canonicalname);

//Look for this DNS name in the session table
var score = Session.lookup(canonicalname);
if (score === null) {
    // Send to the service for lookup
    Remote.HTTP("dnsrep").post({path: "/dnsreputation", payload: canonicalname});
} else {
    debug(canonicalname + ':' +score);
    if (parseFloat(score) > 75) {
        //Create an application in the ExtraHop system and add custom metrics
        //Note: The application is not displayed in the ExtraHop system after the
        //initial request, but is displayed after subsequent requests.
        Application('DNSReputation').metricAddDetailCount('Bad DNS reputation', canonicalname + ':' + score, 1);
    }
}
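The canonicalization and threshold logic in the trigger can be checked outside the sensor with a small Python equivalent. This is only a sketch for testing the logic on sample names. The function names are hypothetical, and the code does not run on the ExtraHop system.

```python
def canonical_name(qname):
    """Keep only the last two labels of a DNS query name."""
    return ".".join(qname.split(".")[-2:])


def is_bad_reputation(score, threshold=75.0):
    """Mirror the trigger's check; session table values arrive as strings."""
    return float(score) > threshold
```

Note that keeping the last two labels treats a name such as www.example.co.uk as co.uk, so both the trigger and this sketch work best with domains that have a single-label suffix.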