Oh ho, lookit what the EFF went and did!
The EFF SSL Observatory is a project to investigate the certificates used to secure all of the sites encrypted with HTTPS on the Web. We have downloaded a dataset of all of the publicly-visible SSL certificates, and will be making that data available to the research community in the near future.
This is exciting. I knocked together a less ambitious version of this last year, but the EFF guys are doing it like grown-ups, and are getting some interesting data.
Numbers-wise, they’re in the right ballpark, as far as I can tell. Their numbers (1-2m CA-signed certs) coarsely match ones I’ve seen from private sources. I’ve heard from a few CAs that public-crawl estimates tend to err 50-80% low since they miss intranet dark matter, but at least the EFF is tracking other public-crawls. Given that their collection tools and data are going to be made public, that’s a really big deal. Previously, I haven’t been able to get this kind of data without paying for it or collecting it myself. If the database is actively maintained and updated, this will be a great resource for research.
Their analysis of CA certificate usage is also interesting. I’d like to see more work done, here, and in particular I’d like to see how CA usage breaks down between the Mozilla root store and others. We spend considerable effort managing our root store, and recently removed a whole pile of CA certificates that were idle. In some places, the paper seems to make the claim that fully half of trusted CAs are never used, but in other places, the number of active roots they count outnumbers our entire root program. I understand why they blurred the line for the initial analysis, but it would be swell to see it broken out.
As they mention, there are legit reasons for root certs to be idle, particularly for future-proofing. We have several elliptic curve roots, and some large-modulus RSA roots, which are waiting for technology to catch up before they become active issuers while giving CAs a panic switch in the case of an Interesting Mathematical Result — that feels okay to me. On the other hand, if there are certs which are just redundant, it would be great to know, so that we can have that conversation with the relevant CAs, and understand the need to keep the cert active.
This is exactly what I hoped would come of my crawler last year, but they’ve done a much more thorough job. We’ve seen an uptick in research interest in SSL over the last few years. Having a high quality data source to poke when testing a hunch is going to make it easier to spot trends, positive or otherwise. Interesting work, folks; keep it going!






5
Mar 09
Deep Packet Inspection Considered Harmful?
I was recently asked, in the context of the ongoing Phorm debacle, and with other interested parties, to meet with members of the UK government and discuss deep packet inspection technologies, and their impact on the web. I’m still organizing my thoughts on the subject – I’ve put some here, but I’d love to know where else you think I should look to ensure I have considered the relevant angles.
Brief Background
Phorm‘s technology hooks in at the ISP level, watches and logs user traffic, and uses it to assemble a comprehensive profile for targeting advertising. While an opt-out mechanism was provided, many users have complained that there was no notice, or that it was insufficiently clear what was going on. NebuAd, another company with a similar product, has apparently used its position at the ISP level to not only observe, but also to inject content into the pages before they reached the user. It’s hard to get unbiased information here, but this is what I understand of the situation.
Thoughts
1. Deep packet inspection, in the general case, is a neutral technology. Some technologies are malicious by design (virus code, for instance), but I think DPI has as many positive uses as negative. DPI can let an ISP make better quality of service decisions, and can be done with the full knowledge and support of its users. I don’t think DPI, as a technology, should be treated as insidious.
2. Using deep packet inspection to assemble comprehensive browsing profiles of users without explicit opt-in is substantially more questionable. My browsing history and habits are things I consider private in aggregate, even though any single visit is obviously visible to the site I’m browsing.
It’s possible that I will choose to allow this surveillance in exchange for other things I value, but it must be a deliberate exchange. I would want to have that choice in an explicit way, and not to be opted in by default, even for aggregate data. Moreover, given the complexity of this technology, I would want a great deal of care to go into the quality of the explanation. Explaining this well to non-technical users might be so difficult as to be impossible, which is why it’s so important that it be opt-in.
3. Using deep packet inspection in conjunction with software that modifies the resultant pages to include, for instance, extra advertising content, is profoundly offensive and undermines the web. The content provider and the user have a reasonable expectation that no one else is modifying the content, and a typical user should not be expected to understand the mechanics of the web sufficiently to be able to anticipate such modifications.
Solutions
As a browser, we do some things to help our users here, but we can’t solve the problem. https resists this kind of surveillance and tampering well, but requires sites to provide 100% of their content over SSL. Technologies like signed http content would prevent tampering, if not surveillance, but once again assume that sites (and browsers!) will support the technology. Ad blockers can turn off any injected ads, tools like NoScript can de-fang any injected javascript but, fundamentally, http content is not tamper-proof, and no plaintext protocol is eavesdropping proof.
People trust their ISPs with a huge amount of very personal data. It’s fine to say that customers should vote with their feet if their ISP is breaking that trust, but in many areas, the list of available ISPs is small, and so the need for prudence on the part of ISPs is large.
That’s what I’m thinking so far, what am I missing?