Vulnerability management involves discovering, analyzing, and handling new or reported security vulnerabilities in information systems. The services provided by vulnerability management systems are essential to both computer and network security. This blog post evaluates the pros and cons of the Exploit Prediction Scoring System (EPSS), a data-driven model designed to estimate the probability that software vulnerabilities will be exploited in practice.

The EPSS model was initiated in 2019, the year after we published our criticisms of the Common Vulnerability Scoring System (CVSS) in 2018. EPSS was developed in parallel to our own attempt at improving CVSS, the Stakeholder-Specific Vulnerability Categorization (SSVC); 2019 also saw version 1 of SSVC. This post will focus on EPSS version 2, released in February 2022, and on when it is and is not appropriate to use the model. This latest release has created a lot of excitement around EPSS, especially since improvements to CVSS (version 4) are still being developed. Unfortunately, the applicability of EPSS is much narrower than people might expect, so it is not yet a useful tool for most vulnerability managers.

This post assumes you know about the services comprising vulnerability management and why prioritization is important during analysis and response. Response includes remediation (patching or otherwise removing the problem) and mitigation (doing something to reduce exposure of vulnerable systems or reduce impact of exploitation). Within coordinated vulnerability disclosure roles, I’ll focus just on people who deploy systems. These are the folks most likely to be tempted to cut corners in their response prioritization with EPSS, but for most of them this approach will lead to a short circuit rather than a shortcut.

EPSS was semi-formalized as a special interest group (SIG) at FIRST in 2020. I’ve participated in the SIG since it started up. A SIG at FIRST serves to “explore an area of interest or specific technology area, with a goal of collaborating and sharing expertise and experiences to address common challenges.” Basically, this means I’ve been on a lot of calls and email threads with people trying to improve EPSS. In general, I think everyone on the SIG has done a great job working within the constraints of donating their time and resources to a project, which was initially described by this 2020 paper.

However, I have a few concerns about EPSS that I’d like to highlight here. I have raised these concerns within the SIG, but the SIG has no formal voting process, so I can’t be sure whether my views represent a minority opinion.

Here are the two general spheres of problems I see: problems due to model opacity and problems stemming from the details of where data have come from so far (elaborated in sections below). These problems should stop people who deploy systems from using EPSS to prioritize vulnerability response generally. However, EPSS v2 is currently useful in some restricted scenarios, which I’ll highlight below.

EPSS Opacity

EPSS uses machine learning to predict exploitation probabilities for each CVE ID (CVE IDs provide identifiers for vulnerabilities in IT products or protocols). As a result, the process is generally hard to interpret. Both the equations that produce the predictions and the process for creating the equations based on the data are proprietary. Even as a SIG member, I would have to sign an NDA to see them.

EPSS calls itself an “open, data-driven effort”—but it is only open in the sense that anyone can come and ask questions during the meetings to the handful of people who have signed the NDA. Those folks are generally super nice and do their best to answer questions seriously within the constraints of the proprietary aspects of the data collection, training, and modeling. However, because salient operational details of the EPSS prediction mechanism are not open to the SIG generally, we can only rely on the metrics about them that are made available.

In addition, there is no guarantee that either the input data or the work to produce the predictions from the data will continue to be donated to the public indefinitely. It could go away at any time if just a couple key members of the SIG decide to stop or to charge FIRST for the data. Multiple vendors donating data would make the system more robust; multiple vendors would also address some, but not all, of the problems with the data discussed next.

Finally, this opacity makes the clear labeling of the outputs critically important, which is the topic of the next section.

EPSS Data and Outputs

EPSS outputs genuine probabilities. In the phrase “the probability that _____,” that blank needs to be filled in. On the first line of its website, EPSS purports to fill that blank as “probability that a software vulnerability will be exploited in the wild.” This statement is oversimplified enough that I think it is both misleading and wrong. The units must be captured properly from the data that is used.

EPSS got here by attempting to avoid one of our key criticisms of CVSS: CVSS vector elements are not actually numbers, just rankings, and so the whole idea of using mathematics to combine the CVSS vector elements into a final score is unjustified. EPSS takes in qualitative attributes, but the machine learning architecture treats all of these with the right kinds of mathematical formalisms and produces a genuine probability. These outputs still need a correctly specified event and time frame. EPSS forecasts the probability that “a software vulnerability will be exploited in the wild in the next 30 days.”

There are lots of ways the event and time frame can be specified badly. Here is one such example:

I like apples. Maybe there is a 0.33 probability that I eat an apple during any given day. Pittsburgh is relatively cloudy, throughout the year, during daytime hours, so the mean probability that it is sunny here is 0.45. We can multiply 0.33 times 0.45. That’s 0.1485. But what exactly are the units for that value? Well, in this case, it’s a bit unclear, because 0.33 is apples/day and 0.45 is sunshine/hour. If we don’t make the units work together, the value doesn’t make sense. With some additional work, we can say that if I’m awake 16 hours each day, then the probability I’m eating an apple during any given waking hour might be about 0.02. Then we could say 0.45*0.02=0.009 is the probability I’m eating an apple while it is a sunny hour in Pittsburgh.
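The arithmetic in the apple example can be sketched in a few lines of Python. This is only an illustration of the unit problem: the numbers are the ones from the paragraph above, the variable names are mine, and the per-hour conversion is the same rough division used in the text rather than a careful probability calculation.

```python
# Rough arithmetic from the apple example; all numbers come from the text.
p_apple_per_day = 0.33       # probability of eating an apple on a given day
p_sunny_per_hour = 0.45      # mean probability that a daytime hour is sunny
waking_hours_per_day = 16

# Mismatched units: a per-day value times a per-hour value has no clear meaning.
mismatched = p_apple_per_day * p_sunny_per_hour

# Crude conversion to a per-waking-hour probability by simple division,
# rounded to two decimals as in the text.
p_apple_per_waking_hour = round(p_apple_per_day / waking_hours_per_day, 2)

# Both factors are now per hour. Multiplying them also assumes the two
# events are independent, which is its own question.
p_apple_and_sunny_hour = p_sunny_per_hour * p_apple_per_waking_hour

print(round(mismatched, 4))              # the unit-less 0.1485
print(p_apple_per_waking_hour)           # about 0.02
print(round(p_apple_and_sunny_hour, 3))  # about 0.009
```

The first product and the last one are both just multiplications. Only the second has an event and a time frame that can be stated in a sentence.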

There are several further questions we might illustrate with this example that all come down to independence. Is the probability that I’m eating an apple independent of the weather? Probably not. The opposite of “independent” here is “conditional.” There are lots of interdependencies and conditionals related to vulnerabilities and EPSS. But I’m not so worried about those, because the mathematics for handling them, as long as we know about them, is well established. I’m not going to introduce it here.

I’m much more worried about assumptions and connections that get introduced into the probability that we cannot capture with unit conversions (as we did with days to waking hours above) or calculation of conditional probabilities. Here is the crux of the problem. As far as I know, the EPSS phrase “a software vulnerability will be exploited in the wild” actually means the following:

software vulnerability = a CVE ID in the NVD with a CVSSv3 vector string
exploited = an IDS signature triggered for an attempt to exploit the CVE-ID over the network
in the wild = a customer of Kenna Security whose network is instrumented with their IDS and whose data is shared with Kenna (I think it’s specifically and only Kenna, but as they’re a subsidiary of Cisco, it is a little bit hard to say whether other Cisco brands contribute data. But that is really the point: I know it is IDS alert data, and only network IDS alert data, that provides the key signal for EPSS, but we cannot examine the details of how it is sourced, because it is not open.)

Also, EPSS is clear, later in its specification, that the time frame for the prediction is “in the next 30 days.” What is not clear from the documentation is that only about 10 percent of the vulnerabilities with CVE IDs even have IDS signatures. So 90 percent of CVE IDs could never be detected to be actively exploited this way. The way IDS signatures are created is complex. Moreover, the signature curators have their own priorities and own performance aspects to optimize, which means the coverage for the signatures is probably much better than random as long as your environment is similar to the environment the IDS vendor is managing. The flip side is that your coverage is plausibly worse than random if your environment is a mismatch.

In some important way, EPSS is doing something smart. It’s saying, Hey, we saw IDS alerts for attempts to exploit these CVE IDs, and here are a handful of things we didn’t see alerts for but that seem similar. That’s great if you have an environment similar to the environments of Kenna’s main and biggest customers. I don’t know what that is, but my guess is offices and other classic IT shops. They probably run mail and AD servers, databases, and Microsoft endpoints; are midsize; have employees who are English-speaking; are located primarily in North America; and are regular commercial-ish businesses.

If you’re anyone else or running anything else, I wouldn’t rely on EPSS to tell you what is being exploited in your environment. It is almost certainly different. Perhaps not entirely different. But EPSS fairly consistently gives, for instance, low scores to IoT vulnerabilities that we know are being exploited.

For example, there are several CVE IDs in CISA’s known exploited vulnerabilities list with low EPSS scores, and there are plenty of CVE IDs with high EPSS scores not in that list. People seem to think that this discrepancy means one or the other is wrong. Actually, it probably does not inform rightness or wrongness about either. The discrepancy might be telling us that attackers use different methods to attack the organizations in CISA’s constituency than they use to attack Kenna’s constituency. This interpretation would be consistent with the fact that we know attackers target victims using specific infrastructure. Perhaps, however, it is just the result of the expected error rate reported about the EPSS model.
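For readers who want to look at this discrepancy themselves, here is a minimal sketch of the comparison. It assumes you have already loaded the known exploited vulnerabilities list as a set of CVE IDs and the EPSS scores as a dictionary; the function name, the 0.5 threshold, and the placeholder CVE IDs and scores are all invented for illustration and are not real data.

```python
def compare_kev_with_epss(kev_ids, epss_scores, threshold):
    """Return the two kinds of disagreement between a KEV-style list and EPSS.

    kev_ids: iterable of CVE ID strings from a known-exploited list.
    epss_scores: dict mapping CVE ID string to an EPSS score in [0, 1].
    threshold: hypothetical cut-off separating "low" from "high" scores.
    """
    kev_ids = set(kev_ids)
    # Listed as known exploited, yet scored below the threshold.
    kev_but_low = sorted(
        cve for cve in kev_ids
        if cve in epss_scores and epss_scores[cve] < threshold
    )
    # Scored at or above the threshold, yet absent from the list.
    high_but_not_kev = sorted(
        cve for cve, score in epss_scores.items()
        if score >= threshold and cve not in kev_ids
    )
    return kev_but_low, high_but_not_kev


# Placeholder identifiers and scores, invented for illustration only.
example_kev = {"CVE-0000-0001", "CVE-0000-0002"}
example_epss = {
    "CVE-0000-0001": 0.02,
    "CVE-0000-0002": 0.91,
    "CVE-0000-0003": 0.88,
}

low_in_kev, high_not_in_kev = compare_kev_with_epss(
    example_kev, example_epss, threshold=0.5
)
print(low_in_kev)       # known exploited, but low score
print(high_not_in_kev)  # high score, but not in the list
```

Neither output list says which source is wrong. Each one is a prompt to ask whose environment the underlying observations came from.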

How to Use EPSS Now

EPSS is great in that it is bringing attention to threat data. I agree 100 percent that paying attention to what attackers are exploiting is important in prioritizing vulnerabilities. The scores are probably not attuned to your environment or threat landscape, however, unless you are one of Kenna’s customers who is donating their IDS data to the project or know your environment is similar to those who are.

Yet even if they are attuned, threat data is not the only input into a prioritization decision. You probably want to know how exposed the system is and whether your organization can continue to function if the system were compromised, for example. SSVC could use EPSS data and combine it with these other information items right now. CVSSv3 can also account for threat in the temporal metrics.

I happen to not like that CVSSv3 implicitly assumes everything is being exploited (the default is the worst case, and temporal scores can only reduce scores), even though we know from EPSS data and other sources that most vulnerabilities are not exploited. However, properly using the CVSS base, environmental, and temporal scores is probably better than using EPSS alone. When the EPSS website says EPSS is better than CVSSv3, it means CVSSv3 base scores. The CVSS SIG has made it clear you should not be using CVSS base scores by themselves to rank and sort vulnerabilities. EPSS is useful because it calls attention to that shortcoming in the way people have used CVSS base scores.

EPSS is heavily focused on a particular kind of environment, so I recommend not relying on EPSS. A low EPSS score does not mean a vulnerability will not be exploited. A high EPSS score is a signal that many people could pay attention to. If your environment looks a lot like where the EPSS data comes from, you could use a high EPSS score to set values in SSVC or CVSSv3 temporal metrics related to public proof of concept or active exploitation. That would certainly be a win.

If your environment does not resemble the environments the EPSS data comes from, however, then this strategy could produce a lot of false positive prioritizations while likely missing many of the things you should care about. For stakeholder organizations in this situation, improving your asset management system is probably a better use of your time than adopting EPSS.
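The asymmetric advice above (a high score can be a useful signal in a matching environment, while a low score tells you nothing) could be sketched as follows. This is a hedged illustration and not part of EPSS or SSVC: the function name, the 0.5 threshold, and the choice to suggest the SSVC exploitation value "active" are all my assumptions, and you would pick your own.

```python
from typing import Optional


def suggest_ssvc_exploitation(
    epss_score: float,
    environment_matches_epss_data: bool,
    high_threshold: float = 0.5,  # hypothetical cut-off, not an EPSS recommendation
) -> Optional[str]:
    """Suggest an SSVC exploitation value from an EPSS score, or None.

    None means "EPSS gives no usable signal here; use other threat data".
    It never means "not exploited".
    """
    if not 0.0 <= epss_score <= 1.0:
        raise ValueError("EPSS scores are probabilities between 0 and 1")
    # Only a high score in a matching environment is treated as a signal.
    if environment_matches_epss_data and epss_score >= high_threshold:
        return "active"  # assumed mapping; "poc" may be the better fit for you
    # Low score or mismatched environment: draw no conclusion from EPSS.
    return None


print(suggest_ssvc_exploitation(0.9, environment_matches_epss_data=True))
print(suggest_ssvc_exploitation(0.9, environment_matches_epss_data=False))
print(suggest_ssvc_exploitation(0.01, environment_matches_epss_data=True))
```

The point of returning nothing instead of a "none" exploitation value is to keep a low EPSS score from being read as evidence of safety.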

You also might want to know how expensive it will be to remediate the CVE ID. I don’t know of anyone who has a good public system for this, but we know it’s something people need to be able to integrate into the decision.

Additional Resources

Read the SEI white paper, “Towards Improving CVSS,” which I coauthored with Eric Hatleback, Allen Householder, Art Manion, and Deana Shick.

Read the SEI white paper, “Prioritizing Vulnerability Response: A Stakeholder-Specific Vulnerability Categorization (Version 2.0),” which I coauthored with Allen D. Householder, Eric Hatleback, Art Manion, Madison Oliver, Vijay S. Sarvepalli, Laurie Tyzenhaus, and Charles G. Yarbrough.

Read the SEI white paper “Historical Analysis of Exploit Availability Timelines,” which I coauthored with Allen D. Householder, Jeff Chrabaszcz (Govini), Trent Novelly, and David Warren.

The CERT Coordination Center Vulnerability Notes Database provides information about software vulnerabilities. Vulnerability notes include summaries, technical details, remediation information, and lists of affected vendors.
