Today was the first (well sort of) day of the 8th annual Splunk .conf convention here in DC. For those of you who aren’t familiar with it, you might be asking what exactly Splunk is.
We take your machine data and make sense of it. IT sense. Security sense. Business sense. Common sense. Splunk products deliver visibility and insights for IT and the business. Splunk was founded to pursue a disruptive new vision: make machine data accessible, usable and valuable to everyone.
Basically, Splunk is a program that takes machine data from nearly any source you can imagine (firewall logs, security entry/exit logs, Active Directory pulls, etc.) and allows you to very easily make sense of it. The query language is easy to understand, the platform handles huge amounts of data with ease, and (one of my favorite features) you don't have to tell it what you're giving it when you ingest the data. That means you can extract the fields much, much later, on the fly. Many people are familiar with ELK; think of Splunk as ELK's big, very overpriced brother who used to make fun of it and give it wedgies.
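To make that "extract fields later" idea concrete, here's a toy Python sketch (the log lines and field names are made up, and this is just the concept, not how Splunk is implemented): raw events are stored untouched, and fields only come into existence when a search runs.

```python
import re

# Raw events are stored as-is; no schema is declared at ingest time.
raw_events = [
    "2017-09-26 10:01:02 action=blocked src=10.0.0.5 dst=8.8.8.8",
    "2017-09-26 10:01:03 action=allowed src=10.0.0.7 dst=1.1.1.1",
    "2017-09-26 10:01:04 action=blocked src=10.0.0.5 dst=9.9.9.9",
]

def extract_fields(event):
    """Pull key=value pairs out of a raw event at search time."""
    return dict(re.findall(r"(\w+)=(\S+)", event))

# A "search-time" question: which sources were blocked?
blocked_sources = [
    extract_fields(e)["src"]
    for e in raw_events
    if extract_fields(e).get("action") == "blocked"
]
print(blocked_sources)  # ['10.0.0.5', '10.0.0.5']
```

If tomorrow you care about `dst` instead of `src`, you just change the search; nothing about the stored data has to be re-ingested.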
.conf covers a range of topics, is three days (well, really 2.5) long, has over 200 technical sessions, and includes over 6,000 participants. In short, it's a bit of a data-science nerd orgy. You can see videos of last year's sessions here. I chose not to follow a particular track, so I'm taking a bit of a hodgepodge of sessions. It's my goal to post a breakdown of the key points at the end of each day, so keep your eyes peeled. So what's on my agenda?
Day 1 (Tue)
.conf2017 Welcome Keynote: Customer success is at the heart of everything we do at Splunk – from empowering data-driven business transformation at the world’s largest companies to helping build the skills and careers of our passionate community advocates in SOCs, NOCs and data centers around the world. CEO Doug Merritt takes the stage to show how Splunk turns machine data into the answers our customers need to reimagine IT, security, the internet of things and business analytics. Special Guest Speaker Michael Ibbitson, Executive Vice President for Technology at Dubai Airports, will join Doug Merritt and Nate McKervey onstage.
Detect Numeric Outliers – Advances: Last year, we showed how the Detect Numeric Outlier Assistant from the MLTK was used and modified for use on metrics with higher dynamics. Since then, we have made two new extensions to it to improve on the capabilities of this solution.
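The abstract doesn't spell out the math, but the gist of numeric outlier detection is flagging points that fall too far from a center statistic. A minimal sketch using median absolute deviation (one of the spread options the MLTK assistant exposes, alongside standard deviation and IQR) with made-up latency numbers:

```python
import statistics

def detect_outliers(values, threshold=3.0):
    """Flag points whose distance from the median exceeds
    threshold * MAD (median absolute deviation)."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread in the data; nothing stands out
    return [v for v in values if abs(v - med) / mad > threshold]

latencies = [120, 125, 118, 122, 119, 121, 950, 123]
print(detect_outliers(latencies))  # [950]
```

The appeal of MAD over standard deviation on "metrics with higher dynamics" is that the median isn't dragged around by the very outliers you're hunting.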
Hunting the Known Unknowns: Finding Evil With SSL Traffic: This year’s “Hunting” session will describe how to find malicious adversaries using SSL. The talk will cover new ways to log SSL/TLS certificates and how to find malware in your network using SSL certificates (and more!). Throughout this session we will show you what TLS certificates are used for, how they can be used to find evildoers on your network and other ways you can use SSL traffic to find the “unknowns.” Finally, we will release a TLS/SSL hunting Splunk app for attendees to take home to start immediately implementing these techniques on their own network!
Analytic Stories or How I Learned to Stop Worrying and Respond to Threats: How do you know what to look for in your environment? Then what do you do when you find it? This session will help you answer these questions and more! Analytic stories provide a way to organize your searches, understand how to respond to events and what data is needed to detect and respond to this threat and detail why you should care about a given threat. They also allow you to map to different security frameworks so business owners can think about their security posture in business terms. This talk will discuss what makes up an analytic story, how they can be used to guide and inform your investigation and how to better understand your security posture.
Using NetFlow for Insider Threat Detection: The use of NetFlow (that is available on select routers) can be a very helpful tool in diagnosing security threats from inside sources. In this 15-minute, hands-on lab, you’ll learn how it’s possible, by utilizing layer 4 type of statistics, to view which endpoints are communicating to whom, which ports and protocols are being used, calculate endpoint deviation in bytes, highlight endpoint outliers, among others. Use cases can include deviation of bandwidth from endpoint to endpoint, possibly signaling an “exfiltration of data” issue, and violation of corporate policy by reporting on forbidden port usage and/or forbidden network communications.
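A back-of-the-envelope version of the byte-deviation idea in Python — the flow records, addresses and threshold are invented for illustration, not anything from the lab itself:

```python
import statistics
from collections import defaultdict

# Hypothetical NetFlow-style records: (src, dst, dst_port, bytes).
flows = [
    ("10.0.0.5", "203.0.113.9", 443, 1_200),
    ("10.0.0.6", "203.0.113.9", 443, 1_500),
    ("10.0.0.7", "198.51.100.2", 22, 900),
    ("10.0.0.8", "192.0.2.77", 443, 95_000_000),  # possible exfiltration
]

# Total outbound bytes per internal endpoint.
out_bytes = defaultdict(int)
for src, dst, port, nbytes in flows:
    out_bytes[src] += nbytes

mean = statistics.mean(out_bytes.values())
stdev = statistics.pstdev(out_bytes.values())

# Flag endpoints well above the group's mean outbound volume.
suspects = [h for h, b in out_bytes.items() if b > mean + 1.5 * stdev]
print(suspects)  # ['10.0.0.8']
```

The same per-endpoint aggregation works for the policy use case too: group by destination port instead of bytes and report any flow touching a forbidden port.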
Tokens in Splunk Web Framework: Use, Abuse and Incantations: This session covers the ins-and-outs of tokens in the Splunk Web Framework: SimpleXML and HTML/JS dashboards. Any dashboard containing more than predefined report searches will likely require tokens to relay user inputs, search states, and user feedback. This session will review how token states are represented, demonstrate how to manipulate tokens to drive dashboards using built-in features and explain how to extend the basic SimpleXML with custom JavaScript to make dashboards really shine. Examples will be provided in both SimpleXML and custom JavaScript/CSS/HTML. This talk is for any Splunk developer that wants to learn how to boost dashboard performance, improve user experience, and add safeguards against misuse. Splunk app developers that must maintain compatibility across versions of Splunk Enterprise are encouraged to attend and contribute to the community discussion.
How’d You Get So Big? Tips & Tricks for Growing Your Splunk Deployment from 50 GB/Day to 1 TB/Day: This session will cover two main subjects: minimizing the amount of hardware your Splunk installation requires through performance tuning and troubleshooting a number of issues that will likely occur as your Splunk installation grows in size and users. This session aims to assist Splunk administrators with troubleshooting and tuning their growing Splunk installation.
Day 2 (Wed)
.conf2017 Technology Keynote: The explosion of machine data presents a massive opportunity for companies able to use that data to meet and exceed the ever-increasing expectations of their customers and stakeholders. Find out what’s new, emerging and transformative across the Splunk platform and solutions to arm customers with the insights and intelligence needed to thrive in a digital marketplace.
Choosing the Right Infrastructure for Your Splunk Deployment: The Splunk platform has become a business-critical application that organizations around the world depend on for security, operations and other needs. But with great power comes great responsibility, as users demand the necessary performance, availability and scalability from their Splunk environment. Deploying and running Splunk on the right infrastructure is critical to success, and there are many paths one can take: on-premises or off-premises, SAN or DAS, virtual or bare metal. This session will explore these different paths and discuss the benefits and potential drawbacks of each, followed by a review of the relevant best practices for deploying Splunk.
Data Science Ops in Practice – Learn How Splunk Enables Fast Science for Cybersecurity Operations: This session will provide real-world examples of how one data-science team has been providing quick turnaround operational support within the federal sector with our client (U.S. Cyber Command). We will walk through how our agile workflow allows flexibility in identifying data analytic needs to complement cyber analysis, include a real-world scenario showing how we fought through cultural barriers to deliver impact-to-security reporting and outline how Splunk can be leveraged for analyzing both big and small data challenges, while leveraging machine learning. Those that attend this session will walk away armed with actionable steps they can employ within their own government organizations that will foster growth and collaboration between cyber analysts, mission directors and data scientists alike! At the conclusion of our talk, we will announce new modular advanced analytics/machine learning apps that were developed with the Booz Allen and Splunk partnership and tested in Operations.
Automating Threat Hunting With Machine Learning: Organizations continue to be challenged by human resource constraints, time constraints and the expanding footprint of IT and security. As a result, conversations about security automation are becoming mainstream. Likewise, machine learning is gaining attention for its threat detection talents. In this talk, we explore the intersection of automation and machine learning in the context of threat hunting. We will demonstrate a Splunk proof of concept that enables hypothesis testing. We will share a model to rationalize extensions of the implementations. And we will discuss the concepts behind the Splunk components used in the examples.
Fields, Indexed Tokens and You: Splunk software does many things to make your searches run fast. Most importantly, Splunk has to narrow down the set of potentially matching events. The fewer events that Splunk must scan, the faster your search will run. In this session, we will explore how Splunk software uses fields and indexed tokens to achieve this and how you can leverage them to your advantage. You will learn how to detect optimization potential in your searches and how to make meaningful changes. Additionally, we will cover how common configurations can have a great impact on search performance.
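A drastically simplified sketch of what "narrowing down the set of potentially matching events" means — a toy inverted index in Python (the events are invented, and real indexed-token handling is far more sophisticated than this):

```python
from collections import defaultdict

events = [
    "user=alice action=login status=success",
    "user=bob action=login status=failure",
    "user=alice action=logout status=success",
]

# Index time: break each event into tokens and record which events
# contain each token (a simplified inverted index).
index = defaultdict(set)
for i, event in enumerate(events):
    for token in event.replace("=", " ").split():
        index[token].add(i)

def search(*terms):
    """Intersect the postings lists, then scan only the survivors."""
    candidates = set.intersection(*(index[t] for t in terms))
    return [events[i] for i in sorted(candidates)]

print(search("alice", "login"))
# ['user=alice action=login status=success']
```

The payoff is that the expensive full scan only touches events whose tokens already matched, which is exactly why searches built on indexed terms run faster than ones that force a scan of everything.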
The Critical Syslog Tricks That No One Seems to Know About: Some of the most important logs an enterprise generates can only be delivered to Splunk in syslog format. In this talk, we’ll guide you through every step you need to follow to get Splunk collecting syslog perfectly in any environment. We’ll provide a ready-to-use syslog-ng.conf along with detailed explanations of why we used the settings we did. We’ll give you working cron jobs that roll old log data over, and explain why you’ll lose a couple of seconds of logs every night if you use logrotate instead. You’ll learn where syslog-ng fits in your network and Splunk architectures to minimize data loss. You’ll also learn about the default Splunk setting that causes major input delays if you don’t know to change it. Finally, we’ll give you the tool we built to manage thousands of syslog inputs and make sure they all get labeled with the right index, source type, host and time zone. In short, we’re going to lay out everything you need to solve the syslog problem for your enterprise once and for all.
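For flavor, here is a minimal illustrative syslog-ng.conf along the same lines — to be clear, the port, paths and macros below are my own assumptions for a bare-bones setup, not the ready-to-use config the speakers are distributing:

```
# Minimal illustrative syslog-ng.conf (port and paths are assumptions).
source s_network {
    udp(port(514));
    tcp(port(514));
};

destination d_per_host {
    # One directory per sending host; a Splunk forwarder can then
    # monitor /var/log/remote/* and tag inputs per host.
    file("/var/log/remote/${HOST}/syslog-${YEAR}${MONTH}${DAY}.log"
         create-dirs(yes));
};

log { source(s_network); destination(d_per_host); };
```

Writing to date-stamped files per host is what makes the nightly rollover a simple cron delete rather than a logrotate rename, which is the data-loss trap the talk warns about.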
Bushfire Alerting Automation System: The Converging Data Bushfire Alerting Automation System is designed to gather data on homes and their surrounding fire-related characteristics. Sensors can measure: smoke, water tank levels, temperature, humidity, wind direction, wind speed, flame characteristics, rain, UV, infrared output, air quality and power. The data generated from these sensors is shared to a Splunk Cloud instance. Communities can securely access their data, which can be shared with emergency services including government fire agencies, police, fire departments, ambulances, hospitals, and infrastructure service providers.
Day 3 (Thu)
.conf2017 Guest Keynote: Featured Special Guest Speaker Billy Beane is considered one of the most progressive and talented baseball executives in the game today. Billy Beane has molded the Oakland Athletics into one of professional baseball’s most consistent winners. Beane’s innovative management style involves utilizing analytics to create and sustain a competitive advantage. By striking parallels between baseball and business, Beane inspires audiences across industries.
APT Splunking: Searching for Adversaries with Quadrants (and other methods): As their name suggests, APT (advanced persistent threat) attacks are among the most pernicious and most damaging attacks an information environment can face. Fortunately, Splunk is here to help. Using real world examples and utilizing statistical analysis tools that are cooked into core Splunk, learn some tricks that you can leverage back home to help find these evildoers in your systems.
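The abstract doesn't say what the "quadrants" are, but a common hunting pattern they likely resemble is plotting two statistics per artifact (say, how often a process runs versus how many hosts run it) and investigating the odd quadrants. A speculative Python sketch with invented data — my guess at the technique, not the speakers' method:

```python
from collections import Counter, defaultdict

# Hypothetical (host, process) events.
events = [
    ("host1", "svchost.exe"), ("host2", "svchost.exe"),
    ("host3", "svchost.exe"), ("host1", "svchost.exe"),
    ("host1", "chrome.exe"), ("host2", "chrome.exe"),
    ("host3", "evil_dropper.exe"), ("host3", "evil_dropper.exe"),
]

count = Counter(proc for _, proc in events)   # axis 1: execution count
hosts = defaultdict(set)                      # axis 2: host prevalence
for host, proc in events:
    hosts[proc].add(host)

median_count = sorted(count.values())[len(count) // 2]
median_hosts = sorted(len(h) for h in hosts.values())[len(hosts) // 2]

def quadrant(proc):
    """Place a process in one of four quadrants around the medians."""
    freq_hi = count[proc] >= median_count
    prev_hi = len(hosts[proc]) >= median_hosts
    return {
        (True, True): "common & widespread",
        (True, False): "busy but rare-host",   # often the quadrant to hunt in
        (False, True): "quiet & widespread",
        (False, False): "quiet & rare",        # also worth a look
    }[(freq_hi, prev_hi)]

for proc in count:
    print(proc, "->", quadrant(proc))
```

With this toy data, `evil_dropper.exe` lands in the "busy but rare-host" quadrant: it runs as often as legitimate software but only on one machine, which is exactly the kind of asymmetry persistent implants tend to show.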
Navigating Data Quality Issues for Better Decision Making: In today’s digital revolution, organizations must be data driven or they will be left behind. Regardless of the analytics techniques used, analysis is ultimately only as useful as the data fed into it. In other words, “garbage in, garbage out.” Not all data is created with downstream usage implications in mind. Furthermore, data quality is highly subjective and what appears as useless for one business decision may actually be the most telling attribute for another decision. The onus is often on the data scientist to bridge the gap between data context and analysis interpretation. In this session, we will delve into various common data-quality issues and how to minimize their impact on analytics quality. We will share best practices for designing data-collection interfaces that mitigate ambiguous and incorrect data semantics. Last, we will discuss various processes that help us ensure data harmony within an organization.
Drive More Value Through Data Source and Use Case Optimization: Whether you’re new to Splunk or a current user, you’re probably wondering what value other organizations are realizing with the Splunk platform and what data sources are most commonly indexed to achieve this value. Come learn about the most common value drivers and a simple approach and tool to identify the right data sources based on your key objectives. We’ll also show you techniques to size your data sources and measure the data overlap between groups, so you can better plan your implementation.