How we built Picture-in-Picture in Firefox Desktop with more control over video

Picture-in-Picture support for videos is a feature that we shipped in Firefox Desktop 71 for Windows users, and in version 72 for macOS and Linux users. It allows the user to pull a <video> element out into an always-on-top window, so that they can switch tabs or applications and still keep the video within sight. This is ideal if, for example, you want to keep an eye on that sports game while also getting some work done.

As always, we designed and developed this feature with user agency in mind. Specifically, we wanted to make it extremely easy for our users to exercise greater control over how they watch video content in Firefox.

Figure: Using Picture-in-Picture in Firefox is this easy! (Firefox is shown playing a video, and a mouse cursor enters the frame. Upon clicking on the Picture-in-Picture toggle on the video, the video pops out into its own always-on-top player window.)

In these next few sections, we’ll talk about how we designed the feature and then we’ll go deeper into details of the implementation.

The design process

Look behind and all around

To begin our design process, we looked back at the past. In 2018, we graduated Min-Vid, one of our Test Pilot experiments. We asked the question: “How might we maximize the learning from Min-Vid?” Thanks to the amazing Firefox User Research team, we had enough prior research to understand the main pain points in the user experience. However, it was important to acknowledge that the competitive landscape had changed quite a bit since 2018. How were users and other browsers solving this problem already? What did users think about those solutions, and how could we improve upon them?

We had two essential guiding principles from the beginning:

  1. We wanted to turn this into a very user-centric feature, and make it available for any type of video content on the web. That meant that implementing the Picture-in-Picture spec wasn’t an option, as it requires developers to opt in first.
  2. Given that it would be available on any video content, the feature needed to be discoverable and straightforward for as many people as possible.

Keeping these principles in mind helped us to evaluate all the different solutions, and was critical for the next phase.

Figure: Exploring different interactions for Picture-in-Picture. (Three sketches showing a possible drag and drop interaction for Picture-in-Picture.)

Try, and try again

Once we had an understanding of how others were solving the problem, it was our turn to try. We wanted to ensure discoverability without making the feature intrusive or annoying. Ultimately, we wanted to augment — and not disrupt — the experience of viewing video. And we definitely didn’t want to cause issues with any of the popular video players or platforms.

Figure: A screenshot of one of our early prototypes. (A YouTube page with a small blue rectangle on the right edge of the video, center aligned.)

This led us to build an interactive, motion-based prototype using Framer X. Our prototype provided a very effective way to get early feedback from real users. In tests, we didn’t focus solely on usability and discoverability. We also took the time to re-learn the problems users are facing. And we learned a lot!

The participants in our first study appreciated the feature, and while it did solve a problem for them, it was too hard to discover on their own.

So, we rolled our sleeves up and tried again. We knew what we were going after, and we now had a better understanding of users’ basic expectations. We explored, brainstormed solutions, and discussed technical limitations until we had a version that offered discoverability without being intrusive. After that, we spent months polishing and refining the final experience!

Stay tuned

From the beginning, our users have been part of the conversation. Early and ongoing user feedback is a critical aspect of product design. It was particularly exciting to keep Picture-in-Picture in our Beta channel as we engaged with users like you to get your input.

We listened, and you helped us uncover new blind spots we might have missed while designing and developing. At every phase of this design process, you’ve been there. And you still are. Thank you!

Implementation detail

The Firefox Picture-in-Picture toggle exists in the same privileged shadow DOM space within the <video> element as the built-in HTML <video> controls. Because this part of the DOM is inaccessible to page JavaScript and CSS stylesheets, it is much more difficult for sites to detect, disable, or hijack the feature.

Into the shadow DOM

Early on, however, we faced a challenge when making the toggle visible on hover. Sites commonly structure their DOM such that mouse events never reach a <video> that the user is watching.

Often, websites place transparent nodes directly on top of <video> elements. These can be used to show a preview image of the underlying video before it begins, or to serve an interstitial advertisement. Sometimes transparent nodes are used for things that only become visible when the user hovers over the player, such as custom player controls. In configurations like this, transparent nodes prevent the underlying <video> from matching the :hover pseudo-class.

Other times, sites make it explicit that they don’t want the underlying <video> to receive mouse events. To do this, they set the pointer-events CSS property to none on the <video> or one of its ancestors.

To work around these problems, we rely on the fact that the web page is being sent events from the browser engine. At Firefox, we control the browser engine! Before sending out a mouse event, we can check to see what sort of DOM nodes are directly underneath the cursor (re-using much of the same code that powers the elementsFromPoint function).

If any of those DOM nodes is a visible <video>, we tell that <video> that it is being hovered, which shows the toggle. We use a similar technique to determine if the user is clicking on the toggle.
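To make the idea concrete, here is a minimal page-level sketch of that kind of hit test. It is not the Firefox implementation, which runs inside the engine. The function names are hypothetical, and the page-facing elementsFromPoint generally skips elements with pointer-events: none, so this approximation only covers the transparent-overlay case.

```javascript
// Sketch only: a page-level approximation of an engine-level hit test.
// `findHoveredVideo` and `isVisible` are hypothetical names, not Firefox internals.

// Treat an element as visible if it occupies a non-empty box on screen.
function isVisible(el) {
  const rect = el.getBoundingClientRect();
  return rect.width > 0 && rect.height > 0;
}

// Return the topmost visible <video> under the given viewport point, or null.
function findHoveredVideo(doc, clientX, clientY) {
  // elementsFromPoint lists every element under the point, topmost first,
  // so a <video> sitting behind a transparent overlay still appears in it.
  const stack = doc.elementsFromPoint(clientX, clientY);
  for (const el of stack) {
    if (el.localName === "video" && isVisible(el)) {
      return el;
    }
  }
  return null;
}
```

In a page, this could be called from a mousemove listener as findHoveredVideo(document, event.clientX, event.clientY).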

We also use some simple heuristics based on the size, length, and type of video to determine if the toggle should be displayed at all. In this way, we avoid showing the toggle in cases where it would likely be more annoying than not.
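A heuristic of this shape might look like the sketch below. Every threshold here is an illustrative assumption rather than a value Firefox uses, the function name is hypothetical, and "type" is interpreted (also as an assumption) as whether the media has a picture at all.

```javascript
// All thresholds are illustrative assumptions, not Firefox's actual values.
const MIN_WIDTH_PX = 160;
const MIN_HEIGHT_PX = 140;
const MIN_DURATION_SECONDS = 45;

// Hypothetical helper: decide whether a toggle is worth showing for `video`.
function shouldShowToggle(video) {
  // Size: skip tiny players such as thumbnails or decorative loops.
  const bigEnough =
    video.clientWidth >= MIN_WIDTH_PX && video.clientHeight >= MIN_HEIGHT_PX;

  // Length: skip very short clips. Live streams report Infinity, which
  // passes; an unknown duration is NaN, which fails the comparison.
  const longEnough = video.duration >= MIN_DURATION_SECONDS;

  // Type (assumed meaning): skip audio-only media, which has no frame size.
  const hasPicture = video.videoWidth > 0 && video.videoHeight > 0;

  return bigEnough && longEnough && hasPicture;
}
```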

A browser window within a browser

The Picture-in-Picture player window itself is a browser window with most of the surrounding window decoration collapsed. Flags tell the operating system to keep it on top. That browser window contains a special <video> element that runs in the same process as the originating tab. The element knows how to show the frames that were destined for the original <video>. As with much of the Firefox browser UI, the Picture-in-Picture player window is written in HTML and powered by JavaScript and CSS.

Other browser implementations

Firefox is not the first desktop browser to ship a Picture-in-Picture implementation. Safari 10 on macOS Sierra shipped with this feature in 2016, and Chrome followed in late 2018 with Chrome 71.

In fact, each browser maker’s implementation is slightly different. In the next few sections we’ll compare Safari and Chrome to Firefox.


Safari’s implementation

Safari’s implementation involves a non-standard WebAPI on <video> elements. Sites that know the user is running Safari can call video.webkitSetPresentationMode("picture-in-picture") to send a video into the native macOS Picture-in-Picture window.
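As a rough sketch of how a site might use this, the function below feature-detects the WebKit-only methods before calling them. The function name is hypothetical, and real sites would normally call it in response to a user gesture such as a button click.

```javascript
// Hypothetical helper: try to move `video` into Safari's native
// Picture-in-Picture window. Returns true if the request was made.
function enterSafariPictureInPicture(video) {
  // These methods are WebKit-specific, so detect them instead of
  // sniffing the user agent string.
  const supported =
    typeof video.webkitSupportsPresentationMode === "function" &&
    video.webkitSupportsPresentationMode("picture-in-picture");
  if (!supported) {
    return false;
  }
  video.webkitSetPresentationMode("picture-in-picture");
  return true;
}
```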

Safari includes a context menu item for <video> elements to open them in the Picture-in-Picture window. Unfortunately, on sites like YouTube that override the default context menu, reaching this item requires an awkward double right-click on the video. This awkwardness is shared with all browsers that implement the context menu option, including Firefox.

Figure: Safari’s video context menu on YouTube.

Safari users can also right-click on the audio indicator in the address bar or the tab strip to trigger Picture-in-Picture:

Figure: Here’s another way to trigger Picture-in-Picture in Safari. (The Safari web browser playing a video, with the context menu for the audio toggle in the address bar displayed. “Enter Picture in Picture” is one of the menu items.)

On newer MacBooks, Safari users might also notice a button on the touchbar, immediately to the right of the volume slider. You can use this button to open the currently playing video in the Picture-in-Picture window:

Figure: Safari users with more recent MacBooks can use the touchbar to enter Picture-in-Picture too. (A close-up photograph of the MacBook Pro touchbar when a video is playing. There is an icon next to the playhead scrubber that opens the video in an always-on-top player window.)

Safari also uses the built-in macOS Picture-in-Picture API, which delivers a very smooth integration with the rest of the operating system.

Comparison to Firefox

Despite this, we think Firefox’s approach has some advantages:

  • When multiple videos are playing at the same time, the Safari implementation is somewhat ambiguous as to which video will be selected when using the audio indicator. It seems to be the most recently focused video, but this isn’t immediately obvious. Firefox’s Picture-in-Picture toggle makes it extremely obvious which video is being placed in the Picture-in-Picture window.
  • Safari appears to have an arbitrary limitation on how large a user can make their Picture-in-Picture player window. Firefox’s player window does not have this limitation.
  • There can only be one Picture-in-Picture window system-wide on macOS. If Safari is showing a video in Picture-in-Picture, and then another application calls into the macOS Picture-in-Picture API, the Safari video will close. Firefox’s window is Firefox-specific. It will stay open even if another application calls the macOS Picture-in-Picture API.

Chrome’s implementation

The PiP WebAPI and WebExtension

Chrome’s implementation of Picture-in-Picture mainly centers around a WebAPI specification being driven by Google. This API is currently going through the W3C standardization process. Superficially, this WebAPI is similar to the Fullscreen WebAPI. In response to user input (like clicking on a button), site authors can request that a <video> be put into a Picture-in-Picture window.
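For illustration, a site-side toggle built on this WebAPI could look roughly like the sketch below. The function name and return values are hypothetical; the property and method names come from the Picture-in-Picture specification, and browsers only honor the request when it follows a user gesture.

```javascript
// Hypothetical helper: enter Picture-in-Picture for `video`, or leave it
// if `video` is already the element shown in the Picture-in-Picture window.
async function togglePictureInPicture(doc, video) {
  // pictureInPictureEnabled is false or undefined where the API is
  // unavailable or blocked.
  if (!doc.pictureInPictureEnabled) {
    return "unsupported";
  }
  if (doc.pictureInPictureElement === video) {
    await doc.exitPictureInPicture();
    return "exited";
  }
  // Rejects if, for example, the call did not follow a user gesture.
  await video.requestPictureInPicture();
  return "entered";
}
```

A page might wire this up with button.addEventListener("click", () => togglePictureInPicture(document, video)), where button and video are elements the site already has references to.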

Like Safari, Chrome also includes a context menu option for <video> elements to open in a Picture-in-Picture window.

Figure: Chrome’s video context menu on YouTube. (The Chrome web browser playing a video, with the context menu for the video element open over it. “Picture in Picture” is one of the menu items.)

This proposed WebAPI is also used by a PiP WebExtension from Google. The extension adds a toolbar button. The button finds the largest video on the page, and uses the WebAPI to open that video in a Picture-in-Picture window.

Figure: There’s also a WebExtension for Chrome that adds a toolbar button for opening Picture-in-Picture. (The Chrome web browser playing a video. The mouse cursor clicks a button in the toolbar provided by a WebExtension, which pops the video out into an always-on-top player window.)

Google’s WebAPI lets sites indicate that a <video> should not be openable in a Picture-in-Picture player window. When Chrome sees this directive, it doesn’t show the context menu item for Picture-in-Picture on the <video>, and the WebExtension ignores it. The user is unable to bypass this restriction unless they modify the DOM to remove the directive.
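The selection behavior described above, combined with the opt-out directive, can be approximated as follows. This is a sketch and not the extension's actual source: the function name is hypothetical, and it assumes the directive is the disablePictureInPicture property defined on <video> elements by the WebAPI specification.

```javascript
// Hypothetical helper: pick the largest on-screen <video> that has not
// opted out of Picture-in-Picture. Returns null if there is none.
function pickLargestAllowedVideo(doc) {
  let best = null;
  let bestArea = 0;
  for (const video of doc.querySelectorAll("video")) {
    // Sites set this to signal that the video should not be opened
    // in a Picture-in-Picture window.
    if (video.disablePictureInPicture) {
      continue;
    }
    const rect = video.getBoundingClientRect();
    const area = rect.width * rect.height;
    if (area > bestArea) {
      best = video;
      bestArea = area;
    }
  }
  return best;
}
```

Note how a smaller video can only be reached this way if every larger one is excluded, which is the limitation the Firefox toggle avoids by sitting on each video individually.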

Comparison to Firefox

Firefox’s implementation has a number of distinct advantages over Chrome’s approach:

  • The Chrome WebExtension only targets the largest <video> on the page. In contrast, the Picture-in-Picture toggle in Firefox makes it easy to choose any <video> on a site to open in a Picture-in-Picture window.
  • Users have access to this capability on all sites right now. Web developers and site maintainers do not need to develop, test and deploy usage of the new WebAPI. This is particularly important for older sites that are not actively maintained.
  • Like Safari, Chrome seems to have an artificial limitation on how big the Picture-in-Picture player window can be made by the user. Firefox’s player window does not have this limitation.
  • Firefox users have access to this Picture-in-Picture capability on all sites. Websites are not able to directly disable it via a WebAPI. This creates a more consistent experience for <video> elements across the entire web, and ultimately more user control.

Recently, Mozilla indicated that we plan to defer implementation of the WebAPI that Google has proposed. We want to see if the built-in capability we just shipped will meet the needs of our users. In the meantime, we’ll monitor the evolution of the WebAPI spec and may revisit our implementation decision in the future.

Future plans

Now that we’ve shipped the first version of Picture-in-Picture in Firefox Desktop on all platforms, we’re paying close attention to user feedback and bug intake. Your input will help determine our next steps.

Beyond bug fixes, we’d like to share some of the things we’re considering for future feature work:

  • Repositioning the toggle when there are visible, clickable elements overlapping it.
  • Supporting video captions and subtitles in the player window.
  • Adding a playhead scrubber to the player window to control the current playing position of a <video>.
  • Adding a control for the volume level of the <video> to the player window.

How are you using Picture-in-Picture?

Are you using the new Picture-in-Picture feature in Firefox? Are you finding it useful? Please let us know in the comments section below, or send us a Tweet with a screenshot! We’d love to hear what you’re using it for. You can also file bugs for the feature here.

Engineer working on Firefox for Desktop

More articles by Mike Conley…

More articles by Emanuela Damiani…
