Technical Details on the Recent Firefox Add-on Outage
Editor’s Note (May 9, 8:22 PT): Updated as follows: (1) Fixed verb tense (2) Clarified the situation with downstream distros. For more detail, see Bug 1549886.
Recently, Firefox had an incident in which most add-ons stopped working. This was due to an error on our end: we let one of the certificates used to sign add-ons expire, which had the effect of disabling the vast majority of add-ons. Now that we’ve fixed the problem for most users and most people’s add-ons are restored, I wanted to walk through the details of what happened, why, and how we repaired it.
Background: Add-Ons and Add-On Signing
Although many people use Firefox out of the box, Firefox also supports a powerful extension mechanism called “add-ons”. Add-ons allow users to add third-party features to Firefox that extend the capabilities we offer by default. Currently there are over 15,000 Firefox add-ons, with capabilities ranging from blocking ads to managing hundreds of tabs.
Firefox requires that all add-ons that are installed be digitally signed. This requirement is intended to protect users from malicious add-ons by requiring some minimal standard of review by Mozilla staff. Before we introduced this requirement in 2015, we had serious problems with malicious add-ons.
The way that add-on signing works is that Firefox is configured with a preinstalled “root certificate”. That root is stored offline in a hardware security module (HSM). Every few years it is used to sign a new “intermediate certificate”, which is kept online and used as part of the signing process. When an add-on is presented for signature, we generate a new temporary “end-entity certificate” and sign that using the intermediate certificate. The end-entity certificate is then used to sign the add-on itself. Shown visually, this looks like this:
Note that each certificate has a “subject” (to whom the certificate belongs) and an “issuer” (the signer). In the case of the root, these are the same entity, but for other certificates, the issuer of a certificate is the subject of the certificate that signed it.
An important point here is that each add-on is signed by its own end-entity certificate, but nearly all add-ons share the same intermediate certificate [1]. It is this certificate that encountered a problem: each certificate has a fixed period during which it is valid. Before or after this window, the certificate won’t be accepted, and an add-on signed with that certificate can’t be loaded into Firefox. Unfortunately, the intermediate certificate we were using expired just after 1AM UTC on May 4, and immediately every add-on that was signed with that certificate became unverifiable and could not be loaded into Firefox.
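To make the validity-window rule concrete, here is a minimal Python sketch. It is only an illustration, not Firefox’s actual code: the `Certificate` class, the `is_within_validity` helper, and the exact timestamps are assumptions made for this example (only “just after 1AM UTC on May 4” comes from the text, and the incident took place in 2019).

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Certificate:
    """Hypothetical, stripped-down certificate: just names and a validity window."""
    subject: str
    issuer: str
    not_before: datetime
    not_after: datetime


def is_within_validity(cert: Certificate, now: datetime) -> bool:
    """Return True only if `now` falls inside the certificate's validity window."""
    return cert.not_before <= now <= cert.not_after


# Illustrative dates only; the real certificate's exact timestamps are not given here.
intermediate = Certificate(
    subject="intermediate",
    issuer="root",
    not_before=datetime(2017, 5, 4, 1, 0, tzinfo=timezone.utc),
    not_after=datetime(2019, 5, 4, 1, 0, tzinfo=timezone.utc),
)

just_before = datetime(2019, 5, 4, 0, 59, tzinfo=timezone.utc)
just_after = datetime(2019, 5, 4, 1, 1, tzinfo=timezone.utc)

print(is_within_validity(intermediate, just_before))  # True
print(is_within_validity(intermediate, just_after))   # False
```

The point of the sketch is that nothing about the add-on or its signature changes at expiry; only the clock moves past `not_after`, and every chain that passes through that one intermediate fails at the same moment.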
Although add-ons all expired around midnight, the impact of the outage wasn’t felt immediately. The reason for this is that Firefox doesn’t continuously check add-ons for validity. Rather, all add-ons are checked about every 24 hours, with the time of the check being different for each user. The result is that some people experienced problems right away, while others didn’t experience them until much later. We at Mozilla first became aware of the problem around 6PM Pacific time on Friday May 3 and immediately assembled a team to try to solve the issue.
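A back-of-the-envelope model shows why the breakage rolled out gradually. The sketch below assumes, purely for illustration, that each user’s daily check time is spread uniformly across the 24-hour interval and that every browser is running; the function name `fraction_rechecked` is hypothetical and real-world numbers would differ.

```python
def fraction_rechecked(hours_since_expiry: float,
                       check_interval_hours: float = 24.0) -> float:
    """Fraction of users whose periodic check has run since the expiry.

    Assumes check times are spread uniformly over the interval, which is a
    simplification made for this example.
    """
    if hours_since_expiry <= 0:
        return 0.0
    return min(hours_since_expiry / check_interval_hours, 1.0)


for hours in (1, 6, 12, 24):
    print(hours, fraction_rechecked(hours))
```

Under these assumptions roughly a quarter of users would have hit the problem six hours after expiry, and essentially everyone within a day, which is why suppressing the re-check early (described in the next section) was worth doing.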
Damage Limitation
Once we realized what we were up against, we took several steps to try to avoid things getting any worse.
First, we disabled signing of new add-ons. This was sensible at the time because we were signing with a certificate that we knew was expired. In retrospect, it might have been OK to leave it up, but it also turned out to interfere with the “hardwiring a date” mitigation we discuss below (though eventually didn’t use), and so it’s good we preserved the option. Signing is now back up.
Second, we immediately pushed a hotfix which suppressed re-validating the signatures on add-ons. The idea here was to avoid breaking users who hadn’t re-validated yet. We did this before we had any other fix, and have removed it now that fixes are available.
Working in Parallel
In theory, fixing a problem like this looks simple: make a new, valid certificate and republish every add-on with that certificate. Unfortunately, we quickly determined that this wouldn’t work for a number of reasons:
- There are a very large number of add-ons (over 15,000) and the signing service isn’t optimized for bulk signing, so just re-signing every add-on would take longer than we wanted.
- Once add-ons were signed, users would need to get a new add-on. Some add-ons are hosted on Mozilla’s servers and Firefox would update those add-ons within 24 hours, but users would have to manually update any add-ons that they had installed from other sources, which would be very inconvenient.
Instead, we focused on trying to develop a fix which we could provide to all our users with little or no manual intervention.
After examining a number of approaches, we quickly converged on two major strategies which we pursued in parallel:
- Patching Firefox to change the date which is used to validate the certificate. This would make existing add-ons magically work again, but required shipping a new build of Firefox (a “dot release”).
- Generating a replacement certificate that was still valid and somehow convincing Firefox to accept it instead of the existing, expired certificate.
We weren’t sure that either of these would work, so we decided to pursue them in parallel and deploy the first one that looked like it was going to work. At the end of the day, we ended up deploying the second fix, the new certificate, which I’ll describe in some more detail below.
A Replacement Certificate
As suggested above, there are two main steps we had to follow here:
- Create a new, valid, certificate.
- Install it remotely in Firefox.
In order to understand why this works, you need to know a little more about how Firefox validates add-ons. The add-on itself comes as a bundle of files that includes the certificate chain used to sign it. The result is that the add-on is independently verifiable as long as you know the root certificate, which is configured into Firefox at build time. However, as I said, the intermediate certificate was broken, so the add-on wasn’t actually verifiable.
However, it turns out that when Firefox tries to validate the add-on, it’s not limited to just using the certificates in the add-on itself. Instead, it tries to build a valid chain of certificates starting at the end-entity certificate and continuing until it gets to the root. The algorithm is complicated, but at a high level, you start with the end-entity certificate and then find a certificate whose subject is equal to the issuer of the end-entity certificate (i.e., the intermediate certificate). In the simple case, that’s just the intermediate that shipped with the add-on, but it could be any certificate that the browser happens to know about. If we can remotely add a new, valid certificate, then Firefox will try that as well. The figure below shows the situation before and after we install the new certificate.
Once the new certificate is installed, Firefox has two choices for how to validate the certificate chain: use the old invalid certificate (which won’t work) or use the new valid certificate (which will work). An important feature here is that the new certificate has the same subject name and public key as the old certificate, so that its signature on the end-entity certificate is valid. Fortunately, Firefox is smart enough to try both until it finds a path that works, so the add-on becomes valid again. Note that this is the same logic we use for validating TLS certificates, so it’s relatively well understood code that we were able to leverage. [2]
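The path-building idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Firefox’s real verifier: the `Cert` class and `build_chain` function are hypothetical, a “signature” is modelled simply as the signer’s public key recorded in `signed_with`, and expiry is a boolean flag rather than a date comparison.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    public_key: str    # stand-in for real key material
    signed_with: str   # signer's public key; stand-in for a real signature
    expired: bool = False


def build_chain(cert: Cert, pool: Sequence[Cert], root: Cert,
                _seen: FrozenSet[Cert] = frozenset()) -> Optional[List[Cert]]:
    """Depth-first search for an unexpired chain from `cert` up to `root`."""
    if cert.expired or cert in _seen:
        return None
    if cert == root:
        return [cert]
    for candidate in pool:
        # A candidate issuer must carry the right subject name...
        if candidate.subject != cert.issuer:
            continue
        # ...and the key that actually produced the signature on `cert`.
        if candidate.public_key != cert.signed_with:
            continue
        rest = build_chain(candidate, pool, root, _seen | {cert})
        if rest is not None:
            return [cert] + rest
    return None  # every candidate path failed


root = Cert("root", "root", "root-key", "root-key")
old_intermediate = Cert("intermediate", "root", "int-key", "root-key", expired=True)
new_intermediate = Cert("intermediate", "root", "int-key", "root-key")
end_entity = Cert("add-on-123", "intermediate", "ee-key", "int-key")

before = build_chain(end_entity, [old_intermediate, root], root)
after = build_chain(end_entity, [old_intermediate, new_intermediate, root], root)

print(before)                           # None
print([c.subject for c in after])       # ['add-on-123', 'intermediate', 'root']
```

Two details carry the argument. The search does not stop at the first candidate whose subject matches, so the expired intermediate is tried, rejected, and skipped. And the replacement is accepted only because it reuses the old subject name and public key, so the existing signature on the end-entity certificate still checks out against it.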
The great thing about this fix is that it doesn’t require us to change any existing add-on. As long as we get the new certificate into Firefox, then even add-ons which are carrying the old certificate will just automatically verify. The tricky bit then becomes getting the new certificate into Firefox, which we need to do automatically and remotely, and then getting Firefox to recheck all the add-ons that may have been disabled.
Normandy and the Studies System
Ironically, the solution to this problem is a special type of add-on called a system add-on (SAO). In order to let us do research studies, we have developed a system called Normandy which lets us serve SAOs to Firefox users. Those SAOs automatically execute on the user’s browser, and while they are usually used for running experiments, they also have extensive access to Firefox internal APIs. Important for this case, they can add new certificates to the certificate database that Firefox uses to verify add-ons. [3]
So the fix here is to build an SAO which does two things:
- Install the new certificate we have made.
- Force the browser to re-verify every add-on so that the ones which were disabled become active.
But wait, you say. Add-ons don’t work, so how do we get it to run? Well, we sign it with the new certificate!
Putting everything together… and what took so long?
OK, so now we’ve got a plan: issue a new certificate to replace the old one, build a system add-on to install it on Firefox, and deploy it via Normandy. Starting from about 6 PM Pacific on Friday May 3, we were shipping the fix in Normandy at 2:44 AM, or after less than 9 hours, and then it took another 6-12 hours before most of our users had it. This is actually quite good from a standing start, but I’ve seen a number of questions on Twitter about why we couldn’t get it done faster. There are a number of steps that were time consuming.
First, it took a while to issue the new intermediate certificate. As I mentioned above, the root certificate is in a hardware security module which is stored offline. This is good security practice, as you use the root very rarely and so you want it to be secure, but it’s obviously somewhat inconvenient if you want to issue a new certificate on an emergency basis. At any rate, one of our engineers had to drive to the secure location where the HSM is stored. Then there were a few false starts where we didn’t issue exactly the right certificate, and each attempt cost an hour or two of testing before we knew exactly what to do.
Second, developing the system add-on takes some time. It’s conceptually very simple, but even simple programs require taking some care, and we really wanted to make sure we didn’t make things worse. And before we shipped the SAO, we had to test it, and that takes time, especially because it has to be signed. But the signing system was disabled, so we had to find some workarounds for that.
Finally, once we had the SAO ready to ship, it still takes time to deploy. Firefox clients check for Normandy updates every 6 hours, and of course many clients are offline, so it takes some time for the fix to propagate through the Firefox population. However, at this point we expect that most people have received the update and/or the dot release we did later.
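The timeline quoted at the start of this section can be checked with a few lines of date arithmetic. The sketch below assumes that both clock times are Pacific Daylight Time (UTC-7) and that the year is 2019; the variable names are made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: early May is daylight time on the US west coast (UTC-7).
PDT = timezone(timedelta(hours=-7), "PDT")

first_report = datetime(2019, 5, 3, 18, 0, tzinfo=PDT)   # about 6 PM Pacific, Friday May 3
fix_shipping = datetime(2019, 5, 4, 2, 44, tzinfo=PDT)   # 2:44 AM, assumed Pacific

elapsed = fix_shipping - first_report
print(elapsed)                                  # 8:44:00
print(first_report.astimezone(timezone.utc))    # 2019-05-04 01:00:00+00:00
```

The first line confirms the “less than 9 hours” figure. The second shows that 6 PM Pacific on May 3 is the same instant as 1 AM UTC on May 4, so the first reports arrived at about the time the certificate expired.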
Final Steps
While the SAO that was deployed with Studies should fix most users, it didn’t get to everyone. In particular, there are a number of types of affected users who will need another approach:
- Users who have disabled either Telemetry or Studies.
- Users on Firefox for Android (Fennec), where we don’t have Studies.
- Users of downstream builds of Firefox ESR that don’t opt in to telemetry reporting.
- Users who are behind HTTPS man-in-the-middle proxies, because our add-on installation systems enforce key pinning for these connections, which proxies interfere with.
- Users of very old builds of Firefox which the Studies system can’t reach.
We can’t really do anything about the last group; they should update to a new version of Firefox anyway, because older versions typically have quite serious unfixed security vulnerabilities. We know that some people have stayed on older versions of Firefox because they want to run old-style add-ons, but many of these now work with newer versions of Firefox. For the other groups we have developed a patch to Firefox that will install the new certificate once people update. This was released as a “dot release”, so people will get it (and probably already have) through the ordinary update channel. If you have a downstream build, you’ll need to wait for your build maintainer to update.
We recognize that none of this is perfect. In particular, in some cases, users lost data associated with their add-ons (an example here is the “multi-account containers” add-on).
We were unable to develop a fix that would avoid this side effect, but we believe this is the best approach for the most users in the short term. Long term, we will be looking at better architectural approaches for dealing with this kind of issue.
Lessons
First, I want to say that the team here did amazing work: they built and shipped a fix in less than 12 hours from the initial report. As someone who sat in the meeting where it happened, I can say that people were working incredibly hard in a tough situation and that very little time was wasted.
With that said, obviously this isn’t an ideal situation and it shouldn’t have happened in the first place. We clearly need to adjust our processes both to make this and similar incidents less likely to happen and to make them easier to fix.
We’ll be running a formal post-mortem next week and will publish the list of changes we intend to make, but in the meantime here are my initial thoughts about what we need to do. First, we should have a much better way of tracking the status of everything in Firefox that is a potential time bomb and making sure that we don’t find ourselves in a situation where one goes off unexpectedly. We’re still working out the details here, but at minimum we need to inventory everything of this nature.
Second, we need a mechanism to be able to quickly push updates to our users even when (especially when) everything else is down. It was great that we were able to use the Studies system, but it was also an imperfect tool that we pressed into service, and that had some undesirable side effects. In particular, we know that many users have auto-updates enabled but would prefer not to participate in Studies, and that’s a reasonable preference (true story: I had it off as well!), but at the same time we need to be able to push updates to our users; whatever the internal technical mechanisms, users should be able to opt in to updates (including hotfixes) but opt out of everything else. Additionally, the update channel should be more responsive than what we have today. Even on Monday, we still had some users who hadn’t picked up either the hotfix or the dot release, which clearly isn’t ideal. There’s been some work on this problem already, but this incident shows just how important it is.
Finally, we’ll be looking more generally at our add-on security architecture to make sure that it’s enforcing the right security properties at the least risk of breakage.
We’ll be following up next week with the results of a more thorough post-mortem, but in the meantime, I’ll be happy to answer questions by email at ekr-blog@mozilla.com.
[1] A few very old add-ons were signed with a different intermediate.
[2] Readers who are familiar with the WebPKI will recognize that this is also the way that cross-certification works.
[3] Technical note: we aren’t adding the certificate with any special privileges; it gets its authority by being signed by the root. We’re just adding it to the pool of certificates which can be used by Firefox. So, it’s not like we are adding a new privileged certificate to Firefox.
Eric is CTO of the Firefox team at Mozilla.