By Patrick Toomey
If you have been in the security space for any stretch of time, you have undoubtedly run across the Common Vulnerability Scoring System (CVSS). CVSS attempts to provide an “objective” way to calculate a measure of risk associated with a given vulnerability, based on a number of criteria the security community has deemed worthwhile. While I admire the goals of such a scoring system, in practice I think it falls short and over-complicates the issue of assigning risk to vulnerabilities. Before we get into my specific issues with CVSS, let’s briefly review how a CVSS score is calculated. Put simply, the calculation tries to take into account criteria such as:
- Exploitability Metrics (i.e. probability)
- Impact Metrics (i.e. severity)
- Temporal Metrics (extra fudge factors for probability)
- Environmental Metrics (extra fudge factors for severity)
Each of the above categories is composed of a number of questions/criteria that are used as input into a calculation that results in a value between 0.0 and 10.0. This score is often reported with publicly disclosed vulnerabilities as a means of conveying the relative importance of fixing/patching the affected software. The largest source of public CVSS scores is the National Vulnerability Database (NVD), which publishes XML documents containing a CVSS score for every CVE from 2002 to 2012. Beyond the NVD, I’ve also seen CVSS used by various security tools, as well as internally by numerous organizations, since it doesn’t require reinventing the wheel when ranking vulnerabilities.
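To make the mechanics concrete, here is a minimal sketch of the CVSS v2 base-score equation as published in the FIRST v2 specification. The metric weights are the constants from that spec; the function and dictionary names are mine.

```python
# Sketch of the CVSS v2 base-score equation (weights from the FIRST v2 spec).

# Weights for the three exploitability metrics.
ACCESS_VECTOR = {"local": 0.395, "adjacent": 0.646, "network": 1.0}
ACCESS_COMPLEXITY = {"high": 0.35, "medium": 0.61, "low": 0.71}
AUTHENTICATION = {"multiple": 0.45, "single": 0.56, "none": 0.704}
# Weights shared by the confidentiality, integrity, and availability impacts.
IMPACT = {"none": 0.0, "partial": 0.275, "complete": 0.660}


def base_score(av, ac, au, conf, integ, avail):
    """Compute a CVSS v2 base score (0.0-10.0) from the six base metrics."""
    impact = 10.41 * (1 - (1 - IMPACT[conf]) * (1 - IMPACT[integ]) * (1 - IMPACT[avail]))
    exploitability = 20 * ACCESS_VECTOR[av] * ACCESS_COMPLEXITY[ac] * AUTHENTICATION[au]
    f = 0.0 if impact == 0 else 1.176  # zero-impact vulnerabilities score 0.0
    return round(((0.6 * impact) + (0.4 * exploitability) - 1.5) * f, 1)


# AV:N/AC:L/Au:N/C:C/I:C/A:C -- a classic remote code execution profile.
print(base_score("network", "low", "none", "complete", "complete", "complete"))  # 10.0
```

So, what’s wrong with CVSS?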
There are so many things I dislike about CVSS, though I will freely admit I am not steeped in CVSS lore, and would be open to hearing/discussing the reasoning behind the scoring system. That said, here are my issues with CVSS in no particular order.
We don’t measure football fields in inches for a reason
Nobody cares that the distance between goal lines on an American football field is 3,600 inches. Why? Because it is a useless unit of measurement when we are talking about football. Nobody cares if someone has made 2 inches of progress on the field; yards are the only thing that matters. Similarly, what is an organization supposed to take away from a CVSS score that can take on 101 potential values (0.0 through 10.0 in tenths)? Is a 7.2 any better than a 7.3 when it comes down to deciding whether or not to fix something? A reasonable argument against CVSS being too fine-grained is that you can always bubble the result up into a coarser unit of measure. But that leads to my second complaint.
The “fix” is broken
So, sure, 101 distinct values is overkill for ranking vulnerabilities, and CVSS acknowledges this to some degree by mapping the overall score to a “severity score” of High, Medium, or Low. On the surface this seems reasonable, as it abstracts the ugly sausage-making details of the detailed CVSS score into a very actionable severity score. But I feel like they managed to mess this up as well: they started with a pretty fine granularity and bubbled up to something that is too coarse, as it tends to blur together various high-severity vulnerabilities. I’ve always been a fan of a four-point scale that breaks down as follows:
- Critical – The vulnerability needs to have been fixed yesterday. The entire team responsible will not sleep until the vulnerability has been fixed.
- High – This vulnerability is serious and we are going to fix it in the near term, but we also don’t need to make everyone lose sleep over it.
- Medium – This vulnerability is worth fixing, and we will set a relatively fixed date in the near future for when it will be fixed.
- Low – This vulnerability is on our radar and if it fits in our next release schedule we will fix it.
Score a batch of vulnerabilities on a scale like this and some pretty obvious groupings emerge. Without staring at the data too hard, you see four clear clusters that map cleanly onto the four-point system above. The main thing to note is that there is a vast chasm between each grouping and its nearest neighbor(s); there is very little chance of mistaking a Low vulnerability for a Medium one. In contrast, the grouping under the current CVSS scoring system looks quite different.
There are some seemingly arbitrary dividing lines between High, Medium, and Low scores. Particularly troubling is the dividing line between Medium and High: anything scored below 7.0 is a Medium risk and anything at 7.0 or above is a High. Unfortunately, a fair bit of data is clustered at exactly that juncture.
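For concreteness, here is that band mapping as a small Python sketch; the cut-offs are the ones NVD applies to v2 scores, and the function name is mine.

```python
# NVD's mapping of CVSS v2 scores onto severity bands:
# Low is 0.0-3.9, Medium is 4.0-6.9, High is 7.0-10.0.

def nvd_severity(score):
    """Map a CVSS v2 score to NVD's three-band severity rating."""
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    return "High"


# Two nearly indistinguishable scores land in different bands.
print(nvd_severity(6.9))  # Medium
print(nvd_severity(7.0))  # High
```

A 6.9 and a 7.0 are, for all practical purposes, the same vulnerability, yet they land in different buckets. That cliff leads to my final complaint against CVSS.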
Objectivity is in the eye of the beholder
As mentioned at the beginning of this blog entry, a CVSS score starts from a set of base metrics but can be adjusted using a number of “Temporal” and “Environmental” metrics. In other words, given a base score, you can tweak it however you want using a number of fuzzy criteria. This, compounded with the coarse High/Medium/Low severity scores, leads to a troubling amount of score fiddling. I am not going to go all conspiracy theory on you and claim people are fudging numbers for publicly disclosed CVEs. But I have seen internal groups within companies leverage these additional metrics to make the data fit their desired outcome. I can’t blame them, as it is almost a requirement. When presented with a vulnerability, there is generally an internal consensus about how serious it is to the organization and whether it is a Critical, High, Medium, or Low (as I defined them above). However, once all of the base metrics are entered into the CVSS calculator, there is a reasonable chance it will produce a score that doesn’t mesh with that gut feeling. So, adjustments are made to the temporal and environmental metrics until the calculator gives the appropriate score. Again, I blame nobody for “fudging” the data, as oftentimes the base score just doesn’t work.

One could argue that the temporal and environmental scores could be adjusted in a reliable/repeatable way for a given application/environment; then, anytime a vulnerability is identified in that application, the same temporal/environmental adjustments could be applied to produce reliable/repeatable scores. In reality, this doesn’t happen. An organization should be praised for using any kind of scoring system at all. Trying to enforce an extra level of unnecessary/burdensome process is neither worthwhile nor realistic.
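To show how much room there is to fiddle, here is a sketch of the CVSS v2 temporal equation from the FIRST spec. The multiplier weights are the constants from that spec; note that all three multipliers are at most 1.0, so temporal metrics can only pull a score down.

```python
# Sketch of the CVSS v2 temporal equation (weights from the FIRST v2 spec).
# Each multiplier is <= 1.0, so the temporal score never exceeds the base.

EXPLOITABILITY = {"unproven": 0.85, "proof_of_concept": 0.90,
                  "functional": 0.95, "high": 1.0}
REMEDIATION_LEVEL = {"official_fix": 0.87, "temporary_fix": 0.90,
                     "workaround": 0.95, "unavailable": 1.0}
REPORT_CONFIDENCE = {"unconfirmed": 0.90, "uncorroborated": 0.95,
                     "confirmed": 1.0}


def temporal_score(base, e, rl, rc):
    """Adjust a base score with the three temporal multipliers."""
    return round(base * EXPLOITABILITY[e] * REMEDIATION_LEVEL[rl]
                 * REPORT_CONFIDENCE[rc], 1)


# The most forgiving picks drag a High down into Medium territory.
print(temporal_score(7.8, "unproven", "official_fix", "unconfirmed"))  # 5.2
```

Pick the most forgiving value for each metric and a 7.8 (High) quietly becomes a 5.2 (Medium), without touching a single base metric.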
Even with all of the above said, as soon as you pitch the idea of using a four-point scoring system you run into the problem of objectivity. How do we decide what criteria delineate a Critical from a High vulnerability? I am sure that is how CVSS started, as an attempt to provide an approach for scoring things objectively. But, as we already discussed, it is only superficially objective, as there are numerous ways to adjust the score using subjective metrics. So, why bother? I think following a model similar to the Chrome severity guidelines makes more sense. The Chrome team has developed specific criteria they use to group vulnerabilities, and given that they are only trying to place a vulnerability into one of four buckets, it isn’t that difficult. Most organizations could come up with a similar set of organization-specific criteria for assigning a vulnerability score.
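A purely hypothetical sketch of what such criteria might look like; the yes/no questions and the rules below are invented for illustration, not taken from the Chrome guidelines.

```python
# Hypothetical organization-specific criteria for a four-point scale.
# The questions and rules are invented for illustration only.

def classify(remotely_exploitable, needs_auth, data_exposed, code_execution):
    """Place a vulnerability into one of four buckets via yes/no criteria."""
    if remotely_exploitable and code_execution and not needs_auth:
        return "Critical"
    if remotely_exploitable and (code_execution or data_exposed):
        return "High"
    if data_exposed or code_execution:
        return "Medium"
    return "Low"


# Unauthenticated remote code execution is a drop-everything event.
print(classify(True, False, True, True))  # Critical
```

In the end, while I am a fan of standardization in general, I am not a fan of the current standard for vulnerability scoring. Not to be too cliché, but an Albert Einstein quote sums up my thoughts pretty well: “Everything should be made as simple as possible, but no simpler.” I think CVSS could use a little simplifying.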