Updated SSL Certificate Database

When I blogged about my database of SSL certs from the top 1M Alexa sites, it got much more reaction than I expected. It’s nice to have peers in this microcosm of nerdspace.

Easily the most often requested improvement was to include intermediates in the database. People wanted to see which issuers had a bunch of subordinate CAs and which issued right from the root. They wanted to see what kind of key sizes and algorithms CAs chose, and how they compared to the key sizes and algorithms used in regular site certs.

I’ve now re-crawled to gather that information, and you can download the zipped db (509M). It’s still an SQLite3 database, though I’ve changed the schema a bit, with certificates now stored in their own table. Let me know in the comments/email if you need help working with the data.
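As a starting point for working with the data, here is a minimal sketch of the kind of query the intermediates enable, answering "which issuers have a bunch of subordinate CAs?" The table and column names here are assumptions for illustration (the real schema may differ; check it with `.schema` first), and the example runs against a tiny in-memory stand-in rather than the downloaded file:

```python
import sqlite3

# Hypothetical mini-schema standing in for the real database;
# substitute the path to the downloaded .db file in practice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE certificates (
    id INTEGER PRIMARY KEY,
    fingerprint TEXT,
    subject TEXT,
    issuer TEXT,
    key_bits INTEGER,
    is_ca INTEGER
);
INSERT INTO certificates VALUES
    (1, 'aa:bb', 'CN=example.com', 'CN=Example Intermediate CA', 2048, 0),
    (2, 'cc:dd', 'CN=Example Intermediate CA', 'CN=Example Root CA', 4096, 1),
    (3, 'ee:ff', 'CN=Example Root CA', 'CN=Example Root CA', 4096, 1);
""")

# How many distinct subordinate CAs does each issuer sign?
rows = conn.execute("""
    SELECT issuer, COUNT(*) AS subordinates
    FROM certificates
    WHERE is_ca = 1 AND subject != issuer   -- exclude self-signed roots
    GROUP BY issuer
    ORDER BY subordinates DESC
""").fetchall()
print(rows)  # -> [('CN=Example Root CA', 1)]
```

The same shape of query, grouped on `key_bits` instead, answers the key-size comparison between CA certs and regular site certs.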

The schema, if you can call it that, was 100% expediency over forethought, so I’d welcome any suggestions on DB organization and performance tweaking. I’ve done no optimizing, so low-hanging fruit abounds; a complicated query can take more than a day right now, so your suggestions will have visible effects!
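One obvious piece of that low-hanging fruit is indexing the columns that filters and joins hit. This is a sketch against an assumed table layout (not the actual published schema), using SQLite's `EXPLAIN QUERY PLAN` to show the difference:

```python
import sqlite3

# Assumed table/column names for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE certificates (id INTEGER PRIMARY KEY, issuer TEXT, key_bits INTEGER)")
conn.execute("INSERT INTO certificates VALUES (1, 'CN=Some CA', 2048)")

query = "SELECT * FROM certificates WHERE issuer = 'CN=Some CA'"

# Without an index, filtering on issuer is a full-table SCAN.
print(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())

# With an index, the same filter becomes an index SEARCH.
conn.execute("CREATE INDEX idx_cert_issuer ON certificates (issuer)")
print(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())
```

On a table with millions of certs, that scan-to-search change is the difference between a day-long query and a fast one for the common "group by issuer" style questions.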

6 thoughts on “Updated SSL Certificate Database”

  1. Nice work! I’m not sure if this is even possible, but something that would be useful would be some statistics on client-cert login sites, and also the confusing question of how many of them are configured correctly.

    (Hmm… I wonder if it is possible to test client-cert login… without a client cert?)

  2. Interesting! Any chance of getting them as flat files with (say) ten thousand per directory? (The last time I processed cert files 100K at a time on a system with NTFS, the amount of disk thrashing was… disturbing.) This would make it easier to run a script over the certs without having to set up SQLite and extract each one in turn.

  3. Sorry, forgot to add: Compressing them with tar+gzip rather than zip alone would probably be a considerable win due to the ability to exploit redundancy between certs. I’m not sure what bzip2 would give you since I don’t think the BWT would be too good with many short redundant strings interspersed with lots of non-redundant data.

  4. Converting to a MySQL DB and adding some good keys might improve search a lot. If you have any intention of running a conversion to MySQL, please let us know. Also, should you want some suggestions on that, feel free to ping me. Thanks!

  5. Interesting arguments. I’ll have to give it a try.
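The cross-file redundancy point in comment 3 is easy to demonstrate. This sketch builds synthetic "certs" that share boilerplate (loosely mimicking real certificate redundancy) and compares zip, which compresses each member independently, against tar+gzip, which compresses one concatenated stream:

```python
import io
import os
import tarfile
import zipfile

# 1,000 synthetic certs: shared boilerplate plus a little unique data.
boilerplate = b"-----BEGIN CERTIFICATE-----" + bytes(range(256)) * 4
certs = [boilerplate + os.urandom(32) for _ in range(1000)]

# zip: each member is deflated independently, so the shared
# boilerplate is stored (compressed) once per file.
zbuf = io.BytesIO()
with zipfile.ZipFile(zbuf, "w", zipfile.ZIP_DEFLATED) as zf:
    for i, cert in enumerate(certs):
        zf.writestr(f"cert{i}.der", cert)

# tar+gzip: one deflate stream over the whole archive, so later certs
# can back-reference boilerplate from earlier ones within the window.
tbuf = io.BytesIO()
with tarfile.open(fileobj=tbuf, mode="w:gz") as tf:
    for i, cert in enumerate(certs):
        info = tarfile.TarInfo(f"cert{i}.der")
        info.size = len(cert)
        tf.addfile(info, io.BytesIO(cert))

print(len(zbuf.getvalue()), len(tbuf.getvalue()))  # tar.gz comes out far smaller
```

The caveat about gzip's 32 KB window is worth noting: redundancy is only exploited between certs that land within the same window, which is still plenty for per-cert boilerplate.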
