IT security is a dangerous and expensive hellhole. Vast amounts of money are spent protecting company data and networks. Hordes of bad guys are motivated to break in, and the consequences for failure are more painful than the cost of protection.
Worse, the current ways of dealing with security are intrusive. While core security tools such as managed endpoint protection will always be necessary, every one of us has bemoaned the difficulty of managing passwords, cussed about access rights to the software we need, and complained about the barriers between us and the work we need to do. If security procedures worked 100 percent of the time, perhaps we'd be okay with them. But hey, have you noticed how many breaches are still reported? Me, too. Just take a look at how the number of data breaches per year has exploded in the graphic below (by data analytics and visualization blog Sparkling Data). The graphic shows data breaches since 2009, broken out by industry type and by how many millions of records were compromised:
Source: July 24 2016; Analysis of HIPAA Breach Data; Sparkling Data
But there's good news as well. The same machine learning (ML) technologies and predictive analytic algorithms that give you useful book recommendations and power your most advanced self-serve business intelligence (BI) and data visualization tools are being incorporated into IT security tools. Experts report that you probably won't spend less money on your company's IT security because of this, but at least your staff will work more efficiently and have a better chance of finding hackers and malware before damage is done.
The combination of ML and IT security can certainly be labeled as "emerging tech," but what makes it cool is that we're not talking about just one technology. ML comprises several kinds of technology, each applied in various ways. And, because so many vendors are working in this area, we get to watch a whole new technology category compete, evolve, and hopefully provide benefit to all of us.
So, What's Machine Learning?
ML allows a computer to teach itself something without having to be explicitly programmed. It does so by accessing large data sets, often huge ones.
"With machine learning, we can give a computer 10,000 pictures of cats and tell it, 'This is what a cat looks like.' And then you can give the computer 10,000 unlabeled pictures and ask it to find out which ones are cats," explains Adam Porter-Price, a Senior Associate at Booz Allen. The model improves as you give the system feedback, whether its guess is correct or incorrect. Over time, the system gets more accurate at determining if the photo includes a cat (as, of course, all photos should).
This isn't a brand-new technology, though recent advances in faster computers, better algorithms, and Big Data tools have certainly improved things. "Machine learning (especially as applied to modeling human behaviors) has been around for a long time," said Idan Tendler, CEO of Fortscale. "It's a core component of the quantitative sides of many disciplines, ranging from airfare pricing to political polling to fast food marketing as far back as the 1960s."
The most evident and recognizable modern uses are in marketing endeavors. When you buy a book on Amazon, for example, its recommendation engines mine previous sales and suggest additional books you'll likely enjoy (e.g., people who liked Steven Brust's Yendi may also like Jim Butcher's novels), which translates into more book sales. That's applied ML right there. Another example might be a business that uses its customer relationship management (CRM) data to analyze customer churn, or an airline that uses ML to analyze how many reward points incentivize frequent flyers to accept a particular offer.
The more data a computer system gathers and analyzes, the better its insights (and its cat photo identification). Plus, with the advent of Big Data, ML systems can pool information from multiple sources. An online retailer can look beyond its own data sets to include analysis of the customer's web browser data and information from its partner sites, for instance.
ML takes data that's too much for humans to comprehend (such as millions of lines of network log files or a huge number of e-commerce transactions) and turns it into something easier to understand, said Balázs Scheidler, CTO of IT security tool vendor Balabit.
"Machine learning systems recognize patterns and highlight anomolies, which help humans to grasp a situation and, when appropriate, take action on it," Scheidler said. "And machine learning does this analysis in an automated way; you couldn't learn the same things simply from looking only at transaction logs."
Where ML Patches Security Weaknesses
Fortunately, the same ML principles that can help you decide on a new book purchase can make your company network more secure. In fact, said Fortscale's Tendler, the IT vendors are a little late to the ML party. The marketing departments could see financial benefits in early ML adoption, particularly because the cost of being wrong was minimal. Recommending the wrong book won't take down anyone's network. Security specialists needed more certainty about the technology, and it seems they finally have it.
Frankly, it's about time, because the current ways to deal with security are intrusive and reactive. Worse, the sheer volume of new security tools and disparate data collection tools has resulted in too much input even for the watchers.
"Most companies are flooded with thousands of alerts per day, largely dominated by false positives," said David Thompson, Senior Director of Product Management at IT security company LightCyber. "Even if the alert is seen, it would likely be viewed as a singular event and not understood to be part of a larger, orchestrated attack."
Thompson cites a Gartner report that said most attackers go undetected for an average of five months. Those false positives can also produce angry users whenever employees are blocked or flagged in error, pointed out Ting-Fang Yen, a research scientist at DataVisor, not to mention the time the IT team spends resolving the issues.
So the first tack in IT security using ML is analyzing network activity. Algorithms assess activity patterns, comparing them to past behavior, and they determine whether the current activity poses a threat. To help, vendors such as Core Security evaluate network data such as users' DNS lookup behavior and communication protocols within HTTP requests.
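The idea of comparing current activity to past behavior can be sketched in a few lines. This is an illustrative assumption, not Core Security's method: the hourly DNS lookup counts are invented, and a simple standard-deviation test stands in for whatever models commercial tools actually use.

```python
# Flag activity that deviates sharply from a historical baseline.
# The counts and the threshold below are hypothetical values for illustration.
from statistics import mean, stdev

def is_anomalous(history, current, threshold=3.0):
    """True if `current` is more than `threshold` standard deviations from the historical mean."""
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is a deviation.
        return current != baseline
    return abs(current - baseline) / spread > threshold

dns_lookups_per_hour = [120, 135, 110, 128, 140, 125, 131]  # past behavior for one host

quiet_hour = is_anomalous(dns_lookups_per_hour, 130)
burst_hour = is_anomalous(dns_lookups_per_hour, 900)
```

A fixed threshold like this is exactly the kind of setting that produces false positives when chosen badly, which is why real products learn baselines per user or per device instead of using one number for everyone.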
Some analysis happens in real time, and other ML solutions examine transaction records and other log files. For example, Fortscale's product spots insider threats, including threats that involve stolen credentials. "We focus on access and authentication logs, but the logs can come from almost anywhere: Active Directory, Salesforce, Kerberos, your own 'crown jewel applications,'" said Fortscale's Tendler. "The more variety, the better." Where ML makes a key difference here is that it can turn an organization's humble and oft-ignored housekeeping logs into valuable, highly effective, and cheap threat intelligence sources.
And these strategies are making a difference. An Italian bank with under 100,000 users experienced an insider threat involving large-scale exfiltration of sensitive data to a group of unidentified computers. Specifically, legitimate user credentials were used to send large volumes of data outside the organization via Facebook. The bank deployed the ML-powered Darktrace Enterprise Immune System, which detected anomalous behavior within three minutes when a company server connected to Facebook—an uncharacteristic activity, said Dave Palmer, Director of Technology at Darktrace.
The system immediately issued a threat alert, which enabled the bank's security team to respond. Eventually, an inquiry led to a systems administrator who had inadvertently downloaded malware that trapped the bank's server in a bitcoin mining botnet—a group of machines controlled by hackers. In less than three minutes, the company triaged, investigated in real time, and began its response—without corporate data loss or damage to customer operational services, said Palmer.
Monitoring Users, Not Access Control or Devices
But computer systems can investigate any kind of digital footprint. And that's where much vendor attention is going these days: toward creating baselines of "known good" behavior for an organization's users, an approach called User Behavior Analytics (UBA). Access control and device monitoring go only so far. It's far better, say several experts and vendors, to make users the central focus of security, which is what UBA is all about.
"UBA is a way to watch what people are doing and to notice if they are doing something out of the ordinary," said Balabit's Scheiler. The product (in this case, Balabit's Blindspotter and Shell Control Box) builds a digital database of each user's typical behavior, a process that takes about three months. Thereafter, the software recognizes anomalies from that baseline. The ML system creates a score of how "off" a user account is behaving, along with the criticality of the issue. Alerts are generated whenever the score exceeds a threshold.
"Analytics try to decide if you are yourself," said Scheiler. For example, a database analyst regularly uses certain tools. So, if she logs in from an unusual location at an unusual time and accesses unusual-for-her applications, then the system concludes that her account may be compromised.
The UBA characteristics tracked by Balabit include the user's historical habits (login time, commonly used applications, and commands), possessions (screen resolution, trackpad use, operating system version), context (ISP, GPS data, location, network traffic counters), and inherence (something you are). In the last category are mouse movement analysis and keystroke dynamics, whereby the system maps how hard and fast a user's fingers whack the keyboard.
While the mouse and keyboard measurements are fascinating in geek terms, Scheidler cautions that they aren't foolproof yet. For example, he said, identifying someone's keystrokes is about 90 percent reliable, so the company's tools don't rely heavily on an anomaly in that area. Besides, user behavior is slightly different all the time; if you have a stressful day or a pain in your hand, the mouse movements are different.
"Since we work with many aspects of the users' behavior and the aggregated value is the one to be compared to the baseline profile, altogether it has a very high reliability that converges to 100 percent," said Scheidler.
Balabit certainly isn't the only vendor whose products use UBA to identify security events. Cybereason, for instance, uses a similar methodology to identify behavior that makes attentive humans say, "Hmm, that's funny."
Explains Cybereason's CTO Yonatan Striem-Amit: "When our platform sees an anomaly (James working late), we can correlate it with other known behaviors and relevant data. Is he using the same applications and access patterns? Is he sending data to someone he never communicates with or are all communications going to his manager, who is replying back?" Cybereason weighs the anomaly of James working abnormally late against a long list of other observed data to provide a context for determining if an alert is a false positive or a legitimate concern.
It is IT's job to find answers, but it sure helps to have software that can raise the right questions. For instance, two users in a healthcare organization were accessing records of deceased patients. "Why would someone be looking at patients who have passed away two or three years ago, unless you want to do some kind of identity or medical fraud?" asks Amit Kulkarni, CEO of Cognetyx. The Cognetyx system flagged the access as inappropriate based on the normal activities for that department, comparing the two users' behavior against their peers' access patterns and against their own normal behavior.
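The peer-comparison half of that check can be sketched as follows. The user names, counts, and the factor of five are invented for illustration; this is not the Cognetyx algorithm, and it leaves out the second comparison against each user's own history.

```python
# Flag users whose activity far exceeds what is typical for their peers.
# User names, counts, and the factor are hypothetical.
from statistics import median

def peer_outliers(access_counts, factor=5.0):
    """Return users whose count exceeds `factor` times the median of everyone else."""
    flagged = []
    for user, count in access_counts.items():
        peers = [c for other, c in access_counts.items() if other != user]
        typical = median(peers)
        # Floor the typical value at 1 so a department that normally views
        # no such records does not flag a single stray lookup.
        if count > factor * max(typical, 1):
            flagged.append(user)
    return sorted(flagged)

# Views of deceased patients' records in one department over one month.
deceased_record_views = {"alice": 0, "bob": 1, "carol": 42, "dave": 0, "erin": 37}

flagged_users = peer_outliers(deceased_record_views)
```

With these numbers, only the two heavy viewers stand out against colleagues who almost never open such records.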
"By definition, machine learning systems are iterative and automated," said Fortscale's Tendler. "They look to 'match' new data against what they've seen before, but won't 'disqualify' anything out of hand or automatically 'throw away' unexpected or out-of-bounds results."
So Fortscale's algorithms look for hidden structures in a data set, even when they don't know what the structure looks like. "Even if we find the unexpected, it provides fodder on which to potentially build a new pattern map. That's what makes machine learning so much more powerful than deterministic rule sets: Machine learning systems can find security problems that have never been seen before."
What happens when the ML system finds an anomaly? Generally, these tools hand off alerts to a human to make a final call in some way since the side effects of a false positive are damaging to the company and its customers. "Troubleshooting and forensics needs human expertise," asserts Balabit's Scheidler. The ideal is that the generated alerts are accurate and automated, and dashboards give a useful overview of system status with the ability to drill into "hey, that's weird" behavior.
It's Just the Beginning
Don't assume that ML and IT security is a perfect match like chocolate and peanut butter or cats and the internet. This is a work in progress, though it will gain more power and usefulness as products gain more features, application integration, and tech improvements.
In the short term, look for automation advancements so that security and operations teams can gain new data insights faster and with less human intervention. In the next two or three years, said Mike Paquette, VP of products at Prelert, "we expect advancements to come in two forms: an expanded library of preconfigured use cases that identify attack behaviors, and advances in automated feature selection and configuration, reducing the need for consulting engagements."
The next steps are self-learning systems that can fight back against attacks on their own, said Darktrace's Palmer. "They'll respond to emerging risks from malware, hackers, or disaffected employees in a way that understands the full context of normal behavior of individual devices and the overall business processes, rather than making individual binary decisions like traditional defenses. This will be crucial to responding to faster moving attacks, like extortion-based attacks, that will morph into attacking any valuable asset (not just file systems) and will be designed to react faster than is possible by human beings."
This is an exciting area with plenty of promise. The combination of ML and advanced security tools not only gives IT professionals new tools to use but, more importantly, gives them tools that let them do their jobs more accurately and faster than ever before. While not a silver bullet, it's a significant step forward in a scenario in which the bad guys have had all of the advantages for far too long.