It was suggested at the Mozilla Summit that there isn’t good information around about what Encrypted Media Extensions (EME) actually is. Since I’m on the HTML working group and have been reading the email threads about EME there, I thought that I could provide an introduction that explains things that may not be apparent from the specification itself.
Major Hollywood studios require that companies that license movies from them for streaming use DRM between the streaming company and the end user. Traditionally, in the Web context, this has been done by using the Microsoft PlayReady DRM component inside the Silverlight plug-in or the Adobe Access DRM component inside Flash Player. As the HTML/CSS/JS platform gains more and more capabilities, the general need to use Silverlight or Flash becomes smaller and smaller, such that soon the video DRM capability will be the only thing that Silverlight and Flash have but the HTML/CSS/JS platform doesn’t.
Proposals have been written to augment <video> with features that enable the Netflix player to be ported from Silverlight to <video> without a loss of features. The additions are split across two specifications: Media Source Extensions (MSE) and Encrypted Media Extensions (EME). The noncontroversial parts (giving JS precise control over media-related networking) are in MSE and the controversial parts (DRM interface) are in EME. I will not cover MSE further.
EME extends the API of <video> and <audio> for dealing with media files that contain encrypted tracks.
EME requires the presence of one or more components called Content Decryption Modules (CDM) which are integrated in some way with the browser. For the purpose of this introduction, the CDM is not considered to be part of the browser. The browser (which, as noted, excludes the CDM) is considered untrusted by copyright holders who require DRM to be used. (The browser is assumed to be trusted by the user as before.) The CDM is trusted by the copyright holders to hide certain pieces of data from the user (and to prevent the user from manipulating that data).
A CDM could be bundled with the browser, downloaded separately, bundled with the operating system, embedded in hardware as firmware running in a second domain of computing (such as ARM TrustZone) or wired into hardware. EME leaves this aspect implementation-dependent.
A CDM implements what is colloquially referred to as a DRM scheme but EME calls a Key System. A CDM implements at minimum a Key System-specific format for messages (byte buffers from the EME point of view) to request and receive keys and the capability to decrypt content with the keys acquired via these messages. The inputs of a CDM are Key System-specific initialization data, Key System-specific messages and encrypted media stream data.
EME specifies a toy Key System called Clear Key, which could be used to demonstrate interoperability of two EME implementations to the point of satisfying the requirements of the W3C Process. So far, there has been no indication that anyone would be interested in deploying Clear Key for non-test purposes.
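To make the idea of a Key System-specific message format concrete: in the version of EME that was later published as a W3C Recommendation, Clear Key messages are JSON. The key request lists base64url-encoded key IDs and the license is a JWK Set. The following is a minimal sketch of what the server side of that exchange could look like. The function name `buildClearKeyLicense` and the in-memory key table are made up for illustration, and earlier drafts of the specification may use a different format.

```typescript
// Shape of a Clear Key license request: key IDs are base64url strings.
interface ClearKeyRequest {
  kids: string[];
  type?: "temporary" | "persistent-license";
}

// One symmetric key in JWK form ("oct" = octet sequence).
interface ClearKeyJwk {
  kty: "oct";
  kid: string; // base64url key ID
  k: string; // base64url key value
}

interface ClearKeyLicense {
  keys: ClearKeyJwk[];
  type: "temporary" | "persistent-license";
}

// Hypothetical key table: base64url key ID -> base64url key value.
type KeyTable = Map<string, string>;

// Answers a license request with the keys the table knows about.
// Unknown key IDs are silently skipped (an illustrative policy choice).
function buildClearKeyLicense(requestJson: string, table: KeyTable): string {
  const request = JSON.parse(requestJson) as ClearKeyRequest;
  const keys: ClearKeyJwk[] = [];
  for (const kid of request.kids) {
    const k = table.get(kid);
    if (k !== undefined) {
      keys.push({ kty: "oct", kid, k });
    }
  }
  const license: ClearKeyLicense = {
    keys,
    type: request.type ?? "temporary",
  };
  return JSON.stringify(license);
}
```

Note that nothing here is hidden from the user: the keys travel in the clear, which is why Clear Key is a toy as far as the copyright holders' requirements go.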
EME does not specify the sort of Key System that one could expect to be deployed for the purpose of streaming Hollywood movies. The non-toy Key System supported by IE 11 on Windows 8.1 is PlayReady (proprietary to Microsoft and bundled with Windows 8.1) and the non-toy Key System supported by Chrome on Chrome OS is Widevine (proprietary to Google and bundled with Chrome OS). Therefore, a Web site that wishes to be cross-browser-compatible needs to support multiple Key Systems.
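Supporting multiple Key Systems from one site amounts to probing for each one in order of preference. Here is a minimal sketch, assuming a probe function supplied by the caller. In a browser that implements the later, promise-based revision of EME, the probe could wrap `navigator.requestMediaKeySystemAccess`. The name `pickKeySystem` is illustrative, and the strings below are the identifiers commonly used for PlayReady, Widevine and Clear Key.

```typescript
// Resolves to true if the browser has a CDM for the given Key System.
type KeySystemProbe = (keySystem: string) => Promise<boolean>;

// Key System identifiers in order of preference (a site-specific choice).
const PREFERRED_KEY_SYSTEMS: readonly string[] = [
  "com.microsoft.playready",
  "com.widevine.alpha",
  "org.w3.clearkey",
];

// Returns the first candidate the probe accepts, or null if none is usable.
async function pickKeySystem(
  probe: KeySystemProbe,
  candidates: readonly string[] = PREFERRED_KEY_SYSTEMS,
): Promise<string | null> {
  for (const keySystem of candidates) {
    try {
      if (await probe(keySystem)) {
        return keySystem;
      }
    } catch (_err) {
      // Treat a failing probe as "not supported" and try the next one.
    }
  }
  return null;
}
```

The selected identifier then decides which server-side component the site talks to for the rest of the session.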
EME does not specify the output abstraction for CDMs. It leaves open several options. The CDM could:
The more the CDM does to conceal the decryption keys, the elementary stream data or the decoded data from software that the user can control, the more likely the CDM is to be approved by the copyright holders for use with content that they hold copyright to. Also, the requirements placed by the copyright holders on CDMs permitted to play HD content may be stricter than the requirements placed on CDMs permitted to play SD content.
<object> element. However, in the EME case there is no standardized analog to NPAPI, since as noted above, even the level of output abstraction isn’t specified.
The media that requires a CDM to play comes in one of the usual container formats. Since the W3C has avoided specifying mandatory formats for <video>, EME doesn’t normatively require support for a specific container format. The EME specification contains guidance for the MP4 (typically used with H.264) and WebM (typically used with VP8) containers. EME does not have normative requirements on whether encryption happens inside or outside the container, but in practice, encryption happens inside the container. Compared to ordinary use of MP4 or WebM, one or more of the elementary streams (“tracks”) inside the container format is encrypted with a key that is not included in the media file. The EME specification does not require a particular encryption scheme, and there are multiple possible ways to have encrypted tracks in MP4. However, when a scheme called Common Encryption is used, one MP4 file can be used with multiple Key Systems.
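To illustrate how one file can serve several Key Systems: with Common Encryption, the MP4 initialization data carries one Protection System Specific Header (`pssh`) box per DRM system, each tagged with a 16-byte system ID. Below is a minimal sketch that lists the system IDs found in a buffer of concatenated `pssh` boxes. The function names are illustrative, the parser reads only the header fields it needs, and it assumes plain 32-bit box sizes.

```typescript
// Formats 16 bytes as a lowercase UUID string.
function toUuid(bytes: Uint8Array): string {
  const hex = Array.from(bytes, (b) => ("0" + b.toString(16)).slice(-2)).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

// Walks a sequence of MP4 boxes and collects the system ID of each 'pssh'.
// Layout used: size(4) type(4) version(1) flags(3) systemID(16) ...
function listPsshSystemIds(initData: Uint8Array): string[] {
  const view = new DataView(initData.buffer, initData.byteOffset, initData.byteLength);
  const ids: string[] = [];
  let offset = 0;
  while (offset + 28 <= initData.byteLength) {
    const size = view.getUint32(offset); // big-endian box size
    const type = String.fromCharCode(
      view.getUint8(offset + 4),
      view.getUint8(offset + 5),
      view.getUint8(offset + 6),
      view.getUint8(offset + 7),
    );
    // Stop on malformed input or on size encodings this sketch does not handle.
    if (size < 28 || offset + size > initData.byteLength) {
      break;
    }
    if (type === "pssh") {
      ids.push(toUuid(initData.subarray(offset + 12, offset + 28)));
    }
    offset += size;
  }
  return ids;
}
```

A player could compare the returned IDs against the Key Systems available in the browser before deciding which license server to contact.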
The server side of the Web app needs Key System-specific software for each Key System that the Web app supports, both to make sense of the messages it receives and to construct responses.
EME doesn’t define how many messages are emitted by the CDM and how many by the server. There might be any number, including new ones during playback, depending on the Key System.
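Because the number of messages is open-ended, the page’s script ends up acting as a relay that shuttles opaque byte buffers between the CDM and the server until the CDM stops asking. The sketch below models that loop with two hypothetical interfaces, `CdmSession` and `LicenseServer`. They are not the EME API (which is event-driven), just stand-ins that make the control flow visible.

```typescript
// Hypothetical stand-in for a CDM session: each call may yield another
// opaque message for the server, or null when the CDM needs nothing more.
interface CdmSession {
  start(initData: Uint8Array): Uint8Array | null;
  update(response: Uint8Array): Uint8Array | null;
}

// Hypothetical stand-in for the Key System-specific server component.
interface LicenseServer {
  handle(message: Uint8Array): Uint8Array;
}

// Relays messages until the CDM is satisfied and returns the number of
// round trips. The script never interprets the buffers it passes along.
function relayUntilDone(
  cdm: CdmSession,
  server: LicenseServer,
  initData: Uint8Array,
  maxRoundTrips: number = 16,
): number {
  let message = cdm.start(initData);
  let roundTrips = 0;
  while (message !== null) {
    if (roundTrips >= maxRoundTrips) {
      throw new Error("too many round trips");
    }
    const response = server.handle(message);
    roundTrips++;
    message = cdm.update(response);
  }
  return roundTrips;
}
```

The cap on round trips is a defensive choice of this sketch, not something EME prescribes.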
EME does not specify any policy about the conditions under which the CDM may be allowed to decrypt media or a vocabulary for expressing such a policy. EME doesn’t even specify whether a policy is enforced on the server based on information that got sent there or whether a policy is enforced on the CDM based on information received back. That’s all Key System-specific.