When I blogged about my database of SSL certs from the top 1M Alexa sites, it got a much bigger reaction than I expected. It’s nice to have peers in this microcosm of nerdspace.
Easily the most often requested improvement was to include intermediates in the database. People wanted to see which issuers had a bunch of subordinate CAs and which issued right from the root. They wanted to see what kind of key sizes and algorithms CAs chose, and how they compared to the key sizes and algorithms used in regular site certs.
I’ve gone and re-crawled to gather that information now, and you can download the zipped db (509M). It’s still an SQLite3 database, though I’ve changed the schema a bit, with certificates now stored in their own table. Let me know in the comments/email if you need help working with the data.
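If you just want to see what's in the file before writing queries, something like the following should work. It's a minimal sketch using Python's built-in `sqlite3` module. It assumes nothing about the schema and simply asks SQLite which tables and columns exist. The function name `describe_database` and the idea of passing the unzipped file's path on the command line are my own placeholders, not anything baked into the download.

```python
import sqlite3
import sys


def describe_database(db_path):
    """Return {table_name: [column names]} for every table in an SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalog of tables, indexes, etc.
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        layout = {}
        for table in tables:
            # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
            cols = conn.execute('PRAGMA table_info("%s")' % table).fetchall()
            layout[table] = [col[1] for col in cols]
        return layout
    finally:
        conn.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python describe.py path/to/unzipped.db
    for table, columns in describe_database(sys.argv[1]).items():
        print(table, columns)
```

The same information is available from the `sqlite3` command-line shell via `.tables` and `.schema`, if you'd rather poke around interactively.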
The schema, if you can call it that, was 100% expediency over forethought, so I would welcome any suggestions on DB organization/performance tweaking. I have done no optimizing, so low-hanging fruit abounds. A complicated query can take more than a day right now, so your suggestions will have visible effects!
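To give a flavor of the low-hanging fruit I mean, here is a rough sketch of checking whether an index changes a query plan. It runs against a tiny in-memory stand-in, not the real db: the table name `certificates`, the columns `issuer` and `key_bits`, and the sample rows are all made up for illustration, so swap in the real names before trying it on the download.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def build_demo():
    """Create a toy in-memory table (hypothetical names, invented rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE certificates ("
        "id INTEGER PRIMARY KEY, issuer TEXT, key_bits INTEGER)")
    conn.executemany(
        "INSERT INTO certificates (issuer, key_bits) VALUES (?, ?)",
        [("Example CA", 2048), ("Example CA", 1024), ("Other CA", 2048)])
    return conn


if __name__ == "__main__":
    conn = build_demo()
    sql = "SELECT key_bits FROM certificates WHERE issuer = 'Example CA'"

    # Without an index, SQLite has to scan every row of the table.
    print("before:", query_plan(conn, sql))

    # An index on the filtered column lets SQLite seek straight to the matches.
    conn.execute(
        "CREATE INDEX idx_certificates_issuer ON certificates (issuer)")
    print("after: ", query_plan(conn, sql))
```

The exact wording of the plan varies between SQLite versions, but the first plan should describe a table scan and the second should mention the new index. On a table with over a million rows, that difference is where I'd expect the day-long queries to shrink.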