Certificate Revocation Lists (CRLs) are a way for Certificate Authorities to announce to their relying parties (e.g., users validating certificates) that a certificate they issued should no longer be trusted, i.e., that it has been revoked.

As the name implies, they're just flat lists of revoked certificates. This has advantages and disadvantages:

Advantages:

  • It's easy to see how many revocations there are
  • It's easy to see differences from day to day
  • Since processing the list happens on the client, checking a certificate's status doesn't reveal which certificate you're interested in (unlike OCSP)

Disadvantages:

  • They can quickly get quite big, leading to significant latency while downloading a web page
  • They're not particularly compressible
  • There's information in there you probably will never care about

CRLs aren't much used anymore; Firefox stopped checking them in version 28 in 2014, in favor of online status checks (OCSP).

The Baseline Requirements nevertheless still require that CRLs, if published, remain available:

4.10.2 Service availability

The CA SHALL operate and maintain its CRL and OCSP capability with resources sufficient to provide a response time of ten seconds or less under normal operating conditions.

Since much has been written about the availability of OCSP, I thought I'd check in on CRLs.

Collecting available CRLs

When a certificate's status will be available in a CRL, the CRL's location is encoded into the certificate itself (RFC 5280, section 4.2.1.13). If that field is present, we should expect the CRL to remain available for the lifetime of the certificate.
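As a sketch of what reading that field looks like in practice, here's Python using the third-party cryptography package. The certificate below is a throwaway self-signed one with a hypothetical distribution point URL, generated only so the example is self-contained; a crawler would do the same extension lookup on real certificates:

```python
from datetime import datetime, timedelta

from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.x509.oid import NameOID

# Build a throwaway self-signed certificate carrying a (hypothetical)
# CRL Distribution Points extension, just for demonstration.
key = ec.generate_private_key(ec.SECP256R1())
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "crl-demo")])
cert = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)
    .public_key(key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(datetime.utcnow())
    .not_valid_after(datetime.utcnow() + timedelta(days=1))
    .add_extension(
        x509.CRLDistributionPoints([
            x509.DistributionPoint(
                full_name=[x509.UniformResourceIdentifier(
                    "http://example.com/demo.crl")],
                relative_name=None,
                reasons=None,
                crl_issuer=None,
            )
        ]),
        critical=False,
    )
    .sign(key, hashes.SHA256())
)

# The lookup a client (or a crawler) performs: RFC 5280, 4.2.1.13.
cdp = cert.extensions.get_extension_for_class(x509.CRLDistributionPoints).value
urls = [gn.value
        for dp in cdp
        for gn in (dp.full_name or [])
        if isinstance(gn, x509.UniformResourceIdentifier)]
print(urls)  # ['http://example.com/demo.crl']
```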

I went to Censys.io and after a quick request to them for SQL access, I ran this query:

SELECT parsed.extensions.crl_distribution_points
  FROM certificates.certificates
 WHERE validation.nss.valid = true
   AND parsed.extensions.crl_distribution_points LIKE 'http%'
   AND parsed.validity.end >= '2017-07-18 00:00'
 GROUP BY parsed.extensions.crl_distribution_points

Today, this yields 3,035 CRLs, the list of which I've posted on GitHub.

Downloading those CRLs into a directory downloaded_crls can be done serially with wget quite simply, capturing the session log to wget_log-all_crls.txt using script (note this is the BSD/macOS script syntax; the util-linux script takes the command via -c instead):

mkdir downloaded_crls
script wget_log-all_crls.txt wget --recursive --tries 3 --level=1 --force-directories -P downloaded_crls/ --input-file=all_crls.csv

This took 2h 36m 31s on my Internet connection.

Analyzing the Download Process

Out of 3,035 CRLs, I ended up downloading 2,993 files. The rest failed.

I post-processed the command line wget log (wget_log-all_crls.txt) using a small Python script to categorize each CRL download by how it completed.
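That script isn't reproduced in this post, but a minimal sketch of the categorization might look like the following. It assumes wget's English-locale log format ("HTTP request sent, awaiting response... 404 Not Found", "failed: Name or service not known"), and the category names here are illustrative, not the exact ones from my script:

```python
import re
from collections import Counter

# These patterns assume wget's English-locale log output.
URL_RE = re.compile(r"^--[\d: -]+--\s+(?P<url>\S+)")
STATUS_RE = re.compile(r"awaiting response\.\.\. (?P<code>\d{3})")

def categorize(log_text):
    """Map each fetched URL to how its download concluded,
    then tally the outcomes."""
    results = {}
    url = None
    for line in log_text.splitlines():
        m = URL_RE.match(line)
        if m:
            url = m.group("url")
            continue
        m = STATUS_RE.search(line)
        if m and url:
            # Later status lines (e.g., after a redirect) overwrite
            # earlier ones, so we keep the final outcome per URL.
            results[url] = m.group("code")
        elif "failed:" in line and url:
            results.setdefault(url, "dns/connect failure")
    return Counter(results.values())

sample = """\
--2017-07-18 09:00:00--  http://crl.example.com/good.crl
HTTP request sent, awaiting response... 200 OK
--2017-07-18 09:00:05--  http://crl.example.com/gone.crl
HTTP request sent, awaiting response... 404 Not Found
--2017-07-18 09:00:09--  http://crl.dead.example/x.crl
Resolving crl.dead.example... failed: Name or service not known.
"""
print(categorize(sample))
# Counter({'200': 1, '404': 1, 'dns/connect failure': 1})
```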

Ignoring all the times when requesting a file resulted in the file straightaway (hey, those cases are boring), here's the graphical breakdown of the other cases:

Problems with CRL Downloads

Missing CRLs

There were 40 CRLs that weren't available to me when I checked; put more simply, about 1.3% of the CRLs appear to be dead.

Some of them are dead in temporary-looking ways, like a load balancer returning a 500 Internal Server Error, while others have hostnames that don't resolve in DNS.

These aren't currently resolving for me:

Searching Censys' dataset shows these CRLs are only used by intermediate CAs, so presumably if one of the handful of CA certificates covered needed to be revoked, the CAs' IT staff could fix these links.

Except for http://atospki/, which is clearly an internal name. Certificates with mistakes like that can only be revoked via technologies like OneCRL and CRLSets.

The complete list of 400s, 404s, and timeouts by URL is available in crl_resolutions.csv.

Are the missing CRLs a problem?

This analysis doesn't attempt to eliminate possible false positives where the CRL belongs to a certificate that is already revoked by its parent. For example, in a chain Root -> A -> B -> C, if A is revoked, it may not be important that A's CRL still exists. (Thanks, @sleevi, for pointing this out!)

Redirects

As might be expected, a fair number of CRLs are now serviced by redirects. Interestingly, while section 7.1.2.2(b) of the Baseline Requirements requires CRLs to have an "HTTP URL", 13 of the CRL fetches redirected to HTTPS, two of them via HSTS headers [1].
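Those redirect targets can be pulled straight out of the wget log. A rough sketch, again assuming wget's English-locale "Location: <url> [following]" lines:

```python
import re

# wget (English locale) logs redirect targets as:
#   Location: <url> [following]
LOCATION_RE = re.compile(r"^Location: (\S+) \[following\]")

def https_redirect_targets(log_text):
    """Collect redirect targets from a wget log that land on HTTPS."""
    targets = []
    for line in log_text.splitlines():
        m = LOCATION_RE.match(line)
        if m and m.group(1).startswith("https://"):
            targets.append(m.group(1))
    return targets

sample = """\
Location: https://crl.example.com/ca.crl [following]
Location: http://mirror.example.net/ca.crl [following]
"""
print(https_redirect_targets(sample))
# ['https://crl.example.com/ca.crl']
```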

There was a recent thread on mozilla.dev.security.policy about OCSP responders that were only available over HTTPS. These are problematic because OCSP and CRLs are used to decide whether a secure connection is possible in the first place; making the revocation check itself depend on a secure connection leads to a potential deadlock, so most software will refuse to try it.

Interestingly, there's one CRL whose URL is encoded as HTTPS directly in certificates: https://crl.firmaprofesional.com/fproot.crl [Censys.io search] [Example at crt.sh]. That's pretty clearly a violation of the Baseline Requirements.

Sizes

I've generally understood that most CRLs are small but some are very large, so I expected some kind of bi-modal distribution. It turns out not to be bi-modal at all, though the retrieved CRLs do have a wild size distribution:

Size Distribution of CRLs

In table form [2]:

Size bucket            # of CRLs
< 0.5 KB                     174
0.5 KB to 0.625 KB           264
0.625 KB to 0.75 KB          246
0.75 KB to 1 KB              310
1 KB to 2 KB                 366
2 KB to 4 KB                 237
4 KB to 8 KB                 232
8 KB to 32 KB                500
32 KB to 64 KB               297
64 KB to 128 KB              218
128 KB to 1 MB               106
1 MB to 8 MB                  33
8 MB to 128 MB                 9

I figured that most CRLs would be tiny, with a handful of outliers. Indeed, 50% of the CRLs are less than 4 KB, and 75% are less than 32 KB:
Cumulative Distribution of CRL size
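As a sanity check on those two percentages, the cumulative fractions can be recomputed directly from the bucket counts in the size table above:

```python
from itertools import accumulate

# (upper bound in bytes, count) pairs from the size table above.
buckets = [
    (512, 174), (640, 264), (768, 246), (1024, 310),
    (2 * 1024, 366), (4 * 1024, 237), (8 * 1024, 232),
    (32 * 1024, 500), (64 * 1024, 297), (128 * 1024, 218),
    (1024 * 1024, 106), (8 * 1024 * 1024, 33), (128 * 1024 * 1024, 9),
]
total = sum(count for _, count in buckets)

# Running totals keyed by each bucket's upper bound.
cumulative = dict(zip((ub for ub, _ in buckets),
                      accumulate(count for _, count in buckets)))

print(total)                                    # 2992
print(round(cumulative[4 * 1024] / total, 3))   # 0.534 -> ~50% under 4 KB
print(round(cumulative[32 * 1024] / total, 3))  # 0.778 -> ~75% under 32 KB
```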

On the top end, however, are 9 CRLs larger than 8 MB:

URL                                                          Size
http://www.sk.ee/repository/crls/esteid2011.crl              66.57 MB
http://crl.godaddy.com/repository/mastergodaddy2issuing.crl  36.22 MB
http://crl.eid.belgium.be/eidc201208.crl                     16.03 MB
http://crl.eid.belgium.be/eidc201204.crl                     10.84 MB
http://crl.eid.belgium.be/eidc201207.crl                     10.82 MB
http://crl.eid.belgium.be/eidc201202.crl                     10.67 MB
http://crl.eid.belgium.be/eidc201203.crl                     10.66 MB
http://crl.eid.belgium.be/eidc201201.crl                     10.47 MB

Remember, these are part of the WebPKI, not some private hierarchy.[3] For a convenient example of why browsers don't download CRLs when connecting somewhere, just point to these.

Download Latency

Latency matters. I'm on a pretty fast Internet connection, but even so, some reasonably sized CRLs took a while to download. I won't harp on this, but here's a quick histogram:

Histogram of CRLs bucketed by download time

The CRLs that took longer than 1 second to download on a really fast Internet connection, 142 of them, or 4.7%, are a clear reason for users' software not to check them for live revocation status.

Conclusions (such as they are)

CRLs are not an exciting technology, but they're still used by the Web PKI. Since they're not exciting, it appears that some CAs believe they don't even need to keep their CRLs online; I mean, who checks these things, anyway?

Oh, yeah, me...

Still, with technologies such as CRLSets depending on CRLs as a means for revocation data, they clearly still have a purpose. It's not particularly convenient to make a habit of crawling OCSP responders to figure out the state of revocations on the Web.

Footnotes

[1] Note that these aren't found by the Python script; you'll need to grep the log for "URL transformed to HTTPS due to an HSTS policy".

[2] I admit that the buckets are a bit arbitrary, but here's what it looks like without some manual massaging:
Auto-generated buckets

[3] Most of these are not realistically going to be reached by browsers, however. The largest contains revocations that appear to belong to a government's national ID card list. GoDaddy's is a master list, but is only referred to by a revoked cert [crt.sh link].