The State of CRLs Today
Certificate Revocation Lists (CRLs) are a way for Certificate Authorities to announce to their relying parties (e.g., users validating the certificates) that a certificate they issued should no longer be trusted, i.e., that it was revoked.
As the name implies, they're just flat lists of revoked certificates. This has advantages and disadvantages:
- It's easy to see how many revocations there are
- It's easy to see differences from day to day
- Since processing the list is up to the client, it doesn't reveal what information you're interested in
- They can quickly get quite big, leading to significant latency while downloading a web page
- They're not particularly compressible
- There's information in there you probably will never care about
CRLs aren't much used anymore; Firefox stopped checking them in version 28 in 2014, in favor of online status checks (OCSP).
The Baseline Requirements nevertheless still require that CRLs, if published, remain available:
4.10.2 Service availability
The CA SHALL operate and maintain its CRL and OCSP capability with resources sufficient to provide a response time of ten seconds or less under normal operating conditions.
Since much has been written about the availability of OCSP, I thought I'd check in on CRLs.
Collecting available CRLs
When a certificate's status will be available in a CRL, that's encoded into the certificate itself in the CRL Distribution Points extension (RFC 5280, section 4.2.1.13). If that field is there, we should expect the CRL to survive for the lifetime of the certificate.
I went to Censys.io and after a quick request to them for SQL access, I ran this query:
SELECT parsed.extensions.crl_distribution_points
FROM certificates.certificates
WHERE validation.nss.valid = true
  AND parsed.extensions.crl_distribution_points LIKE 'http%'
  AND parsed.validity.end >= '2017-07-18 00:00'
GROUP BY parsed.extensions.crl_distribution_points
Today, this yields 3,035 CRLs, the list of which I've posted on GitHub.
Downloading those CRLs into a directory named downloaded_crls can be done serially and quite simply with wget, logging to a file named wget_log-all_crls.txt:
mkdir downloaded_crls
script wget_log-all_crls.txt
wget --recursive --tries 3 --level=1 --force-directories -P downloaded_crls/ --input-file=all_crls.csv
This took 2h 36m 31s on my Internet connection.
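For readers who would rather script the fetch than drive wget, here is a minimal Python sketch of the same serial download. It is not the method used for this post: the function names, the 30-second timeout, and the retry-on-any-error policy are assumptions for illustration, and local_path only approximates the layout that --force-directories produces (host name, then URL path, under the prefix directory).

```python
import os
import urllib.request
from urllib.parse import urlsplit


def local_path(url, prefix="downloaded_crls"):
    """Approximate wget --force-directories: <prefix>/<host>/<url path>."""
    parts = urlsplit(url)
    path = parts.path.lstrip("/")
    # A bare directory URL is saved as index.html, as wget does.
    if not path or path.endswith("/"):
        path += "index.html"
    return "/".join([prefix, parts.netloc, path])


def fetch(url, tries=3, timeout=30):
    """Download one URL, trying up to `tries` times.

    Returns the local path on success, or None if every attempt failed.
    Retrying on every error is an assumed policy, not wget's exact behavior.
    """
    for _ in range(tries):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                body = response.read()
        except OSError:  # URLError and HTTPError are OSError subclasses
            continue
        target = local_path(url)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as handle:
            handle.write(body)
        return target
    return None
```

Calling fetch() on each line of all_crls.csv, one after another, would reproduce the serial loop; the URLs for which it returns None are the failed downloads.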
Analyzing the Download Process
Out of 3,035 CRLs, I ended up downloading 2,993 files. The rest failed.
Ignoring all the times when requesting a file resulted in the file straightaway (hey, those cases are boring), here's the graphical breakdown of the other cases:
There are 40 CRLs that weren't available to me when I checked, or more simply put, 1% of CRLs appear to be dead.
Some of them are dead in temporary-looking ways, like the load balancer giving a 500 Internal Server Error; others have hostnames that aren't resolving in DNS.
These aren't currently resolving for me:
Searching Censys' dataset, these CRLs are only used by intermediate CAs, so presumably if one of the handful of CA certificates they cover needed to be revoked, their IT staff could fix these links.
http://atospki/, which is clearly an internal name. Mistakes like that can only be revoked via technologies like OneCRL and CRLSets.
The complete list of 400s, 404s, and timeouts by URL is available in crl_resolutions.csv.
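A breakdown like this can be tallied with a few lines of Python. The sketch below is only illustrative: the layout of crl_resolutions.csv is not described here, so the input is assumed to be a mapping from URL to either an HTTP status code or one of two made-up error labels ("dns" and "timeout"), and the category names are my own.

```python
from collections import Counter


def classify(status):
    """Bucket one fetch outcome.

    `status` is an HTTP status code (int) or a hypothetical error label
    ("dns" or "timeout") for failures that never produced a response.
    """
    if isinstance(status, str):
        labels = {"dns": "unresolvable host", "timeout": "timeout"}
        return labels.get(status, "other error")
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    if 400 <= status < 500:
        return "client error"
    if 500 <= status < 600:
        return "server error"
    return "other error"


def tally(outcomes):
    """Count how many URLs fall into each category."""
    return Counter(classify(status) for status in outcomes.values())
```

For example, tally({"http://a.example/x.crl": 404, "http://b.example/y.crl": "dns"}) counts one client error and one unresolvable host.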
Are the missing CRLs a problem?
This doesn't attempt to eliminate possible false-positives where the CRL was for a certificate which is revoked by its parent. For example, if there is a chain in which A is revoked by its parent, it may not be important that A's CRL exist. (Thanks, @sleevi for pointing this out!)
As could be expected, there were a fair number of CRLs which are now serviced by redirects. Interestingly, while section 22.214.171.124(b) of the Baseline Requirements requires CRLs to have an "HTTP URL", 13 of the CRL fetches redirect to HTTPS, two of them through HSTS headers.
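Spotting these HTTP-to-HTTPS upgrades is a simple check once you have the sequence of URLs a fetch passed through. The following is a rough sketch, assuming the redirect chain has already been extracted as a list of URLs (first requested, last served); the function name and that input shape are hypothetical.

```python
from urllib.parse import urlsplit


def upgrades_to_https(chain):
    """Return True if a fetch that began over plain HTTP hit an HTTPS URL.

    `chain` lists every URL visited, in order, starting with the URL
    taken from the certificate.
    """
    if not chain or urlsplit(chain[0]).scheme != "http":
        return False
    return any(urlsplit(url).scheme == "https" for url in chain[1:])
```

A CRL that is published at an HTTPS URL to begin with returns False here, since nothing was upgraded along the way.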
There was a recent thread on Mozilla.dev.security.policy about OCSP responders that were only available over HTTPS; these are problematic as OCSP and CRLs are used to decide whether a secure connection is possible. Having to make such a determination for the revocation check leads to a potential deadlock, so most software will refuse to try it.
Interestingly, there's one CRL that is encoded as HTTPS directly in certificates:
https://crl.firmaprofesional.com/fproot.crl [Censys.io search] [Example at crt.sh]
That's pretty clearly a violation of the Baseline Requirements.
I've generally understood that most CRLs are small, but some are very large, so I expected some kind of bi-modal distribution. It's really not, though the retrieved CRLs do have a wild size distribution:
In table form:
| Size bucket | # of CRLs |
|---|---|
| 0.5 KB to 0.625 KB | 264 |
| 0.625 KB to 0.75 KB | 246 |
| 0.75 KB to 1 KB | 310 |
| 1 KB to 2 KB | 366 |
| 2 KB to 4 KB | 237 |
| 4 KB to 8 KB | 232 |
| 8 KB to 32 KB | 500 |
| 32 KB to 64 KB | 297 |
| 64 KB to 128 KB | 218 |
| 128 KB to 1 MB | 106 |
| 1 MB to 8 MB | 33 |
| 8 MB to 128 MB | 9 |
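Bucketing file sizes like this takes only a few lines of Python. This is a sketch rather than the script behind the table: it assumes 1 KB means 1024 bytes (the post does not say), treats each bucket as including its lower edge and excluding its upper one, and uses names of my own choosing.

```python
from bisect import bisect_right
from collections import Counter

KB = 1024          # assumption: binary kilobytes
MB = 1024 * KB

# Lower edge of every bucket in the table, plus the final upper bound.
EDGES = [512, 640, 768, 1 * KB, 2 * KB, 4 * KB, 8 * KB, 32 * KB,
         64 * KB, 128 * KB, 1 * MB, 8 * MB, 128 * MB]
LABELS = ["0.5 KB to 0.625 KB", "0.625 KB to 0.75 KB", "0.75 KB to 1 KB",
          "1 KB to 2 KB", "2 KB to 4 KB", "4 KB to 8 KB", "8 KB to 32 KB",
          "32 KB to 64 KB", "64 KB to 128 KB", "128 KB to 1 MB",
          "1 MB to 8 MB", "8 MB to 128 MB"]


def bucket(size):
    """Return the table label for a file size in bytes."""
    index = bisect_right(EDGES, size) - 1
    if index < 0 or index >= len(LABELS):
        return "out of range"
    return LABELS[index]


def histogram(sizes):
    """Count how many sizes land in each bucket."""
    return Counter(bucket(size) for size in sizes)
```

Feeding it the sizes from os.path.getsize() for every file under downloaded_crls/ would give counts in the same shape as the table.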
I figured that most CRLs would be tiny, and we'd have a handful of outliers. Indeed, 50% of the CRLs are less than 4 Kbytes, and 75% are less than 32 Kbytes:
On the top end, however, are 9 CRLs larger than 8 MB:
Remember, these are part of the WebPKI, not some private hierarchy. For a convenient example of why browsers don't download CRLs when connecting somewhere, just point to these.
Latency matters. I'm on a pretty fast Internet connection, but even so, some of the CRLs that were even reasonable sizes took a while to download. I won't harp on this, but just a quick histogram:
CRLs that took longer than 1 second to download on a really fast Internet connection -- 142 of them, or 4.7% -- are clear reasons for users' software to not check them for live revocation status.
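The share of slow downloads is easy to recompute from a list of per-file download times. Here is a small sketch, assuming the durations are available in seconds; the function name and the way the timings are obtained are hypothetical, since the post does not say how they were measured.

```python
def slow_share(durations, threshold=1.0):
    """Return (count, fraction) of downloads slower than `threshold` seconds."""
    if not durations:
        raise ValueError("no durations given")
    slow = sum(1 for seconds in durations if seconds > threshold)
    return slow, slow / len(durations)
```

As a sanity check on the figure above, 142 slow files out of the 2,993 downloaded is about 4.7%, and the result still rounds to 4.7% if the denominator is all 3,035 CRLs.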
Conclusions (such as there are any)
CRLs are not an exciting technology, but they're still used by the Web PKI. Since they're not exciting, it appears that some CAs believe they don't even need to keep their CRLs online; I mean, who checks these things, anyway?
Oh, yeah, me...
Still, with technologies such as CRLSets depending on CRLs as a means for revocation data, they clearly still have a purpose. It's not particularly convenient to make a habit of crawling OCSP responders to figure out the state of revocations on the Web.
Note, that's not found by the Python script; you'll need to grep the log for "URL transformed to HTTPS due to an HSTS policy".
 I admit that the buckets are a bit arbitrary, but here's what it looks like without some manual massaging:
 Most of these are not realistically going to be reached by browsers, however. The largest contains revocations that appear to belong to a government's national ID card list. GoDaddy's is a master list, but is only referred to by a revoked cert [crt.sh link].