Are You Prepared for Certificate Authority Breaches?

By Nate Couper

In the last few years, security breaches involving fraudulently signed SSL certificates, as well as compromises of a number of certificate authorities (CA’s) themselves, have exposed gaps in the foundations of online security.

  • DigiNotar
  • Comodo
  • VeriSign
  • others

It is no longer safe to assume that CA’s, large or small, have sufficient stake in their reputation to invest in security that is 100% effective.  In other words, it’s time to start assuming that CA’s can and will be breached again.

Fortunately for the white hats out there, NIST has just released a bulletin on responding to CA breaches.  Find it on NIST’s website at http://csrc.nist.gov/publications/nistbul/july-2012_itl-bulletin.pdf.

The NIST document has great recommendations for responding to CA breaches, including:

  • Document what certificates and CA’s your organization uses.
  • Document logistics and information required to respond to CA compromises.
  • Review and understand CA’s in active use in your organization.
  • Understand “trust anchors” in your organization.
  • Develop policies for application development and procurement, and implement them.
  • Understand and react appropriately to CA breaches.

Let’s dive into these:

1. Document the certificates and CA’s that your organization uses

Any compliance wonk will tell you that inventory is your first and best control.  Does your organization have an inventory?

Let’s count certificates.  There’s http://www.example.com, www2.example.com, admin.example.com, backend.example.com, and there’s mail.example.com.  There may also be vpn.example.com, ftps.example.com, and ssh.example.com.  These are the obvious ones.
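
If you want a starting point, it does not take much to pull the certificate each of those obvious endpoints presents and record the subject, the issuing CA, and the expiration date.  Here is a rough sketch in Python – the hostnames are just the placeholders from above, and a real inventory would also have to cover internal ports and protocols beyond HTTPS.

    import csv
    import socket
    import ssl

    # Placeholder hostnames from the example above; substitute your own.
    HOSTS = [
        ("www.example.com", 443),
        ("mail.example.com", 443),
        ("vpn.example.com", 443),
    ]

    def fetch_cert(host, port):
        """Return (subject CN, issuer CN, expiry string) for the certificate a host presents."""
        ctx = ssl.create_default_context()
        with socket.create_connection((host, port), timeout=5) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
        subject = dict(item[0] for item in cert["subject"])
        issuer = dict(item[0] for item in cert["issuer"])
        return subject.get("commonName"), issuer.get("commonName"), cert["notAfter"]

    with open("cert_inventory.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["host", "port", "subject", "issuer (CA)", "expires"])
        for host, port in HOSTS:
            try:
                writer.writerow([host, port, *fetch_cert(host, port)])
            except (OSError, ssl.SSLError) as exc:
                # Self-signed or otherwise unverifiable certs land here; they belong
                # in the inventory too, with a note about why they failed.
                writer.writerow([host, port, "ERROR", str(exc), ""])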

Practically every embedded device, from the cheapest Wi-Fi router to the lights-out management interface on your big iron, ships with an SSL interface these days.  Count each of those.  Every router, switch, and firewall, every blade server enclosure, every SAN array.  Take a closer look at your desktops.  Windows has a certificate database, Firefox carries its own, Java has its own, and multiple instances of Java on a single system can have multiple CA databases.  Now your servers – every major OS ships with SSL capabilities: Windows, Linux (OpenSSL), Unix.  Look at your applications – chances are every piece of J2EE and .NET middleware has a CA database associated with it.  Every application your organization bought or wrote that uses SSL probably has a CA database.  Every database, every load balancer, every IDS/IPS.  Every temperature sensor, scanner, printer, and badging system that supports SSL probably has a list of CA’s somewhere.

All your mobile devices.  All your cloud providers and all the services they backend to.

If your organization is like most, you probably have an Excel spreadsheet with a list of AD servers, or maybe you query a domain controller when you need a list of systems.  Forget about software and component inventory.  Don’t even think about printers, switches, or cameras.

If you’re lucky enough to have a configuration management database (CMDB), what is its scope?  When was the last time you checked it for accuracy?  In-scope accuracy rates of 75% are “good”, if some of my clients are any measure.  And CMDB scope rarely exceeds production servers.

Each one of these devices may have several SSL certificates, and may trust hundreds of CA’s for no reason other than it shipped that way.

Using my laptop as an example, I’ve got several hundred “trusted” CA’s loaded by default into Java, Firefox, IE and OpenSSL.  Times five or so to account for the virtual machines I frequent.  Of those thousands of CA’s, my system probably uses a dozen or so per day.
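
If you want to put your own number on that, a few lines of Python will count what is sitting in the stores it can see.  The bundle paths below are common defaults and nothing more – adjust them for your platform, and remember that Firefox and every Java installation keep separate stores that need their own tooling (certutil and keytool, respectively).

    import os
    import ssl

    # Common PEM bundle locations; these are assumptions and vary by distribution.
    PEM_BUNDLES = [
        "/etc/ssl/certs/ca-certificates.crt",   # Debian/Ubuntu OpenSSL bundle
        "/etc/pki/tls/certs/ca-bundle.crt",     # Red Hat family
    ]

    def count_pem_certs(path):
        """Count certificates in a PEM bundle by counting BEGIN CERTIFICATE markers."""
        with open(path, errors="ignore") as f:
            return f.read().count("BEGIN CERTIFICATE")

    for path in PEM_BUNDLES:
        if os.path.exists(path):
            print(f"{path}: {count_pem_certs(path)} trusted CA certificates")

    # The ssl module reports the default verify paths the local OpenSSL build uses.
    print(ssl.get_default_verify_paths())

    # On Windows, the system ROOT store can be enumerated directly.
    if hasattr(ssl, "enum_certificates"):
        print("Windows ROOT store:", len(ssl.enum_certificates("ROOT")), "entries")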

2. Document logistics and information required to respond to CA breaches

How exactly do you manage the list of trusted CA’s on your iPad anyway?  Your load balancer?  Who is responsible for these devices, and who depends on them? If you found out that Thawte was compromised tomorrow, would you be able to marshal all the people who manage these systems in less than a day?  In a week?

What would it take to replace certificates, to tweak the list of CA’s across the enterprise?  It will definitely take longer if you’re trying to figure it out as you go.

3. Review and understand CA’s in active use in your organization

Of the hundreds of CA’s on my laptop, I actually use no more than a dozen or so each day.  In fact, it would be noteworthy if more than a handful got used at all.  I could disable hundreds of them and never notice.  After all, I don’t spend a lot of time on Romanian or Singaporean sites, and CA’s from those regions probably don’t see a lot of foreign use.

Most organizations are savvy enough to source their certificates from at most a handful of trusted CA’s.  A server might only need one trusted CA.  Ask your network and application administrators – which CA’s do we trust, and which do we need to trust?  It might make sense to preemptively strike some or all of the CA’s you’re not actually using, if only in the name of reducing attack surface.
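
One low-effort way to answer the “which do we actually use” question is to record the issuer of every certificate your systems really encounter over a week or a month.  Here is a rough sketch along those lines, assuming you can feed it the hostnames your organization genuinely connects to (the list below is hypothetical).

    import socket
    import ssl
    from collections import Counter

    # Hypothetical list of services your organization actually connects to.
    SERVICES = ["www.example.com", "mail.example.com", "partner-api.example.net"]

    issuers = Counter()
    for host in SERVICES:
        ctx = ssl.create_default_context()
        try:
            with socket.create_connection((host, 443), timeout=5) as sock:
                with ctx.wrap_socket(sock, server_hostname=host) as tls:
                    issuer = dict(item[0] for item in tls.getpeercert()["issuer"])
                    # The immediate issuer is usually an intermediate, but its
                    # organizationName still tells you which CA operator you rely on.
                    issuers[issuer.get("organizationName", issuer.get("commonName"))] += 1
        except (OSError, ssl.SSLError):
            pass

    for ca, count in issuers.most_common():
        print(f"{count:4d}  {ca}")

Anything in your trust stores that never shows up in a list like this is a candidate for removal.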

4. Understand “trust anchors” in your organization

Trust anchors are the major agents in a PKI – the root CA’s your systems trust directly.  Trust anchors provide the rules and services that govern the roles of the other players: the intermediate CA’s, the registrars, and the users of certificates.  Go back through your inventory (you made one of those, right?) and document how each trust anchor is configured.  What do the trust anchors allow and disallow with your certificates?  Will revoked certificates get handled correctly, and how is revocation checking configured?
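
On the revocation question specifically, most stacks do not check revocation at all unless you turn it on.  As one illustration, here is a minimal sketch using Python’s ssl module with CRL checking enabled – the hostname and file names are placeholders, and in practice you would also need a process for keeping the CRL fresh (or use OCSP instead).

    import socket
    import ssl

    HOST = "internal-app.example.com"  # placeholder hostname

    ctx = ssl.create_default_context()
    # Fail the handshake if the server's certificate appears on a loaded CRL.
    # Note: if no CRL for the issuer has been loaded, verification also fails.
    ctx.verify_flags |= ssl.VERIFY_CRL_CHECK_LEAF
    ctx.load_verify_locations(cafile="issuing-ca.pem")      # trust anchor (placeholder)
    ctx.load_verify_locations(cafile="issuing-ca.crl.pem")  # that CA's published CRL (placeholder)

    with socket.create_connection((HOST, 443), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
            print("certificate accepted; issuer:", tls.getpeercert()["issuer"])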

Does your organization deploy internal CA’s?  Which parts of the organization control the internal CA’s, and which other parts of the business depend on them?  What internal SLA’s or SLO’s are offered, and what metrics measure them?

5. Develop policies for application development and procurement

How many RSA SecurID customers really understood that RSA was holding on to secret information that could contribute to attacks against RSA’s own customers?  Did your organization ask RIM whether the trusted CA’s on your BlackBerry devices could be replaced?  Do you use external CA’s for purely internal applications, knowing full well the potential implications of an external breach?

Does your purchase and service contract language even oblige your vendor to tell you if they have a breach, or will you have to wait until it turns up on CNN?  Do they make claims about their security, and are those claims verifiable?  Do they coast on vague marketing language, or ride on the coattails of once-hip internet celebrities and gobbled-up startups?

6. Understand CA breaches and react appropriately

Does your incident response program understand CA breaches?  Can you mobilize your organization to do what it needs to when the time comes, and within operational parameters?

CA breaches have happened before and will happen again.  NIST has again delivered a world-class roadmap for achieving enterprise security objectives.  Is your organization equipped?

What Security Teams Can Learn From The Department of Streets & Sanitation

As I sit here typing in beautiful Chicago, a massive blizzard is just starting to hit outside.  This storm is expected to drop between 12 and 20 inches over the next 24 hours, which could be the most snowfall in over 40 years.  Yet in spite of the weather, the citizens and city workers remain calm, confident that they know how to handle an incident like this.  A joke has been going around on Twitter about the weather today – “Other cities call this a snowpocalypse.  Chicago calls it ‘Tuesday’”.

Northern cities like Chicago have been dealing with snowstorms and snow management for decades, and have gotten pretty good at it.  Yet when cities fail at it, there can be dramatic consequences – in 1979, the incumbent mayor of Chicago, Michael Bilandic, lost to a challenger, with his poor response to a blizzard cited as one of the main reasons for his defeat.

The same crisis management practices, as well as the same negative consequences attached to failure, apply to information security organizations today.  Security teams should pay attention to what their better established, less glamorous counterparts in the Department of Streets and Sanitation do to handle a crisis like two feet of snow.

So, how do cities adequately prepare for blizzards?  The preparations can be summarized into four main points:

  • Prepare the cleanup plan in advance
  • Get as much early warning as possible
  • Communicate with the public about how to best protect themselves
  • Handle the incident as it unfolds to reduce loss of continuity

These steps are all also hallmarks of a mature security program:

  • Have an incident response plan in place in advance of a crisis
  • Utilize real-time detection methods to understand the situation
  • Ensure end users are aware of the role that they play in information security
  • Remediate security incidents with as little impact to the business as possible

An information security program that is proactive in preparing for security incidents is one that is going to be the most successful in the long run.  It will gain the appreciation of both end users and business owners by reducing the overall impact of a security event.  Because as winter in Chicago shows us – you can’t stop a snowstorm from coming.  Security incidents are going to happen.  The test of a truly mature organization is how it reacts to the problem once it arrives.

COSO II Event Identification will be a significant challenge for companies

COSO’s evolution from the original framework to COSO II, sometimes referred to as COSO ERM, added requirements for objective setting, risk identification, management and reporting, as well as risk treatment and event identification. These would be regarded as the basic elements of a good ERM (Enterprise Risk Management) program, and tying them into COSO integrates the already established control framework with those ERM practices.

Sounds good! It is good, or at least a good starting point, I believe. A far more granular assessment to derive a risk level is needed if this is to become truly scientific, but that’s for another blog topic!

The issue now is that whilst most companies will probably be able to implement just about all of the elements of COSO II, there is one that I believe will be a significant challenge: the ‘Event Identification’ element.

Under COSO II, Event Identification encompasses incidents (internal or external) that can have a negative effect – or a positive one, so perhaps opportunities too. Much as in Basel II operational risk analysis, the benefit of hindsight informs the prediction of future risk. So, effectively, to adhere to COSO II a company must identify and quantify incidents within the ERM framework, such that predictions of risk can be informed by knowledge of actual losses in the past.

This is a whole new set of business processes and responsibilities that certain individuals must accept as part of their regular job descriptions. But it is more complicated than that, because internal systems and processes will need to be developed to help those individuals obtain the correct data to support event identification.

Take a simple example in IT. Let’s assume we are a pharmaceutical company, and a system falls victim to a security breach. On that system is 20 years of clinical trial information for a product, and we know that an outside organization has potentially accessed that IP. Whose responsibility is it to recognize that the incident has occurred? Who decides what the cost to the organization is? Whose responsibility is it to capture that information? Whose responsibility is it to identify the business and technical risks associated with the incident? And whose responsibility is it to decide what actions should be taken as a result of the incident to prevent it happening again?

Even for this fairly tangible event, there is a whole set of new processes, policies, documentation and responsibilities that needs to be in place to properly implement Event Identification.

So much so that I contend most companies should not declare COSO II compliance just yet.

The case for extending XBRL to encompass a Risk and Control taxonomy

Through the SEC’s ‘21st Century Disclosure Initiative’, announced in January 2009, and its demand that Fortune 500 companies start XBRL tagging of financial statements and footnotes this year, it’s clear that greater transparency in financial reporting and transactions is seen as one of the steps towards improving the ability of investors and lenders to analyse and compare reports of financial performance and strategic declarations. By adopting such a standard, the SEC is seeking to give investors and lenders greater confidence in the results of their analysis, because a defined taxonomy ensures they are analysing and comparing apples to apples in all aspects of the relevant financial statements.

That’s good and helpful, and the derived confidence will be further enhanced through the involvement of an Assurance Working Group (AWG) that is co-operating with the International Audit and Assurance Standards Board (IAASB) to develop standards for how XBRL information can be audited.

Whilst XBRL was initially designed to allow standard tagging of financial reporting, it can also be used for financial statements around transaction information, discrete projects and initiatives, and so on. It would seem, therefore, that if XBRL tagging could be extended to encompass risk and control information through an extended taxonomy, then a far more meaningful value could be associated with those financial statements, or their validity could be better trusted.

When Credit Default Swaps (CDS) were sold on, and on, and on, imagine if along with the financial details of each transaction there had been a clear statement of the associated risks, along with details of what mitigation measures were in place and how effective they were likely to be. Surely that would have helped prevent them from being significantly overvalued, or at least allowed recognition that they were being overvalued despite their associated risks.

Ultimately the whole issue of trust is at the hub of the financial crisis we find ourselves in and, interestingly, it parallels an observation that the American economist John Kenneth Galbraith made in 1954. He observed that fraud is easily hidden in the good times and gets revealed in the bad times, and he called the stock of undiscovered embezzlement the ‘bezzle’. With reference to the great crash of 1929 he wrote, “In good times people are relaxed, trusting, and money is plentiful. But even though money is plentiful, there are always many people who need more. Under these circumstances the rate of embezzlement grows, the rate of discovery falls off, and the bezzle increases rapidly. In depression all this is reversed. Money is watched with a narrow, suspicious eye. The man who handles it is assumed to be dishonest until he proves himself otherwise. Audits are penetrating and meticulous. Commercial morality is enormously improved. The bezzle shrinks.” He also observed that “the bezzle is harder to hide during a tougher economic climate” because of the demand for increased scrutiny.

Applying a similar theory to our CDS example: in the good times the bezzle was large and there were high levels of trust between the banks and the asset management companies, so nobody really worried about the increasing risks. But now the bezzle has been revealed, trust has all but disappeared, and the market has stagnated.

Hence, it is my belief that additional assurance will be required around financial reporting, particularly for specific transactions, such that a high level of trust can be regained. This will not come from a large bezzle sustained by positive market conditions. Rather, it will come from qualified assurance and tangible evidence of the levels of associated risk and how effectively they are being mitigated. Taking the CDS situation as an example, if the level of associated risk and the efficacy of the control strategy accompany the transaction, the buyer will be better informed and the information can be better trusted.

In my view, therefore, the XBRL taxonomy must extend to include taxonomy around risk and control information.

Risk and Understanding All the Variables

One of the things that drives me the most insane is when data is presented as information without properly considering all of the variables. Over dinner with Martin a few weeks ago, I got off on a rant about an example of this. My target that night was the Dow Jones Industrial Average, which we hear about every time we turn on the financial news.

The Dow is a single point of data within the large sea of the economy. And, unfortunately, it presents a relatively skewed picture of the actual state of economic progress. I have been most exasperated by it over the past couple of years, with the 2006-2007 pseudo-bull market that had every financial analyst talking about economic prosperity. (For example, see articles here, here, here, etc.)

Unfortunately, it’s pretty easy (about 2 hours of research and Excel mojo) to show the illusory nature of the “bull market” that we have seen in the past 4 years. The US government has pursued an aggressive strategy of currency devaluation since 2002, and that plays heavily into the value of any asset valued in USD (e.g. the Dow). To understand the true state of the Dow, one must take into account all of the variables – in this case, the value of the currency that the price of the market is quoted in.

[Chart: the Dow, raw and adjusted for the euro]

Martin would point out that this is a somewhat naive analysis, as I have only adjusted for one currency. To do a more robust analysis, we would have to average the currency impact across world regions, including the Chinese yuan, Canadian dollar and British pound. However, even with a simple analysis, the results show a staggering change. While the DJIA enjoyed a raw increase of approximately 75% between 2003 and 2008, when adjusted for changes in currency value against the euro, that increase drops to 35%.

This is significant – assuming that you got in at the absolute low (Feb 2003) and out at the top of each, this suggests that your actual annual rate of return on the investment went from 11.7% when just looking at the DJIA to 6.3% when adjusting for currency.
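
For the curious, those annualized numbers fall out of a simple compound-growth calculation. A quick sketch, assuming a roughly five-year holding period from the February 2003 low (slightly different start and end dates account for the small gap between these figures and the 11.7% and 6.3% above):

    # Annualized (compound) rate of return implied by a total gain over a holding period.
    def cagr(total_gain, years):
        return (1 + total_gain) ** (1 / years) - 1

    years = 5  # roughly Feb 2003 to early 2008
    print(f"raw DJIA:       {cagr(0.75, years):.1%}")  # ~11.8% per year
    print(f"euro-adjusted:  {cagr(0.35, years):.1%}")  # ~6.2% per year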

And I didn’t even factor in goods/services inflation or taxes. If I had, you would see that the currency loss here is the difference between making a profit and just barely breaking even on the investment.

Why am I talking about all this random financial stuff on a blog dedicated to risk and security (especially when I’m not an accountant)? Because this IS risk. This is the same kind of calculation that we make every day, and it is the same sort of mistake that I see risk management professionals make all the time. When calculating risk, we have a tendency to look only at the simple numbers – humans just aren’t good at multivariate analysis, especially in our heads. So we have a tendency to look for simple answers, often resorting to the Tarzan method:

Dow up, good. Dow down, bad.

Unfortunately, it’s never quite that simple. If you don’t take into account all of the variables when considering the risk of your investments (whether financial, information security, or otherwise), you’re likely to significantly misread the potential return on those investments.

Whose Risk?

I often get frustrated when we talk about risk, measurement, metrics, and (my new least favorite buzzword) “key performance indicators”, because we as an industry have a tendency to drop the audience from the statement of risk.

That may sound confusing, but I’ll illustrate by example. This is a real sentence that I hear far too often:

Doing that presents too much risk.

Unfortunately, that sentence is linguistically incomplete. The concept of “risk” requires specification of the audience – risk to whom, or to what? This is a problem similar to the one Lakoff presents in Whose Freedom?: certain concepts require a reference to an audience in order to make sense. Leaving the audience unspecified is productive in marketing (or politics), but it creates massive confusion when you are actually trying to have real, productive discourse.

A recent post at Security Retentive illustrates the kind of confusion that ensues when the audience for risk metrics and measurements isn’t specified. (I have also previously talked (ranted?) about this type of confusion here and here.)

This confusion fundamentally arises from forgetting that risk is relative to an audience. Without that shared perspective, each person in the discourse applies the “risk” to their own situation and comes up with radically different meanings.

So when we talk about, measure, and specify risk, we need to present the data and information to a relevant audience. Asking “risk to whom, or to what?” is an important way of ensuring that we don’t remain mired in the kind of confusion that Security Retentive described.