Position on emerging standard: WebCodecs

# Dan Sanders (9 days ago)

Hello,

I'm reaching out to see if WebKit would like to weigh in on the WebCodecs WICG proposal: discourse.wicg.io/t/webcodecs-proposal/3662.

The WebCodecs API enables web developers to instantiate codecs (audio/video encoders/decoders) and use them to process individual frames.
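
For readers unfamiliar with the proposal, here is a minimal sketch of the kind of usage the explainer describes; the class and method names (`VideoDecoder`, `configure`, `decode`) reflect the proposal's direction rather than finalized IDL, and `ctx` and `encodedBytes` are assumed to be supplied by the page:

```js
// Construct a decoder with callbacks for decoded frames and for errors.
const decoder = new VideoDecoder({
  output: (frame) => {
    ctx.drawImage(frame, 0, 0);  // ctx: a canvas 2D context owned by the page
    frame.close();               // release the frame's memory promptly
  },
  error: (e) => console.error(e),
});

// Configure for a specific codec string before feeding data.
decoder.configure({ codec: 'vp8' });

// Feed one encoded chunk per demuxed frame; demuxing stays in the app.
decoder.decode(new EncodedVideoChunk({
  type: 'key',
  timestamp: 0,        // microseconds
  data: encodedBytes,  // bytes produced by the app's own demuxer
}));

await decoder.flush();
```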

There is a related proposal for image decoders; it enables access to individual animation frames: discourse.wicg.io/t/proposal-imagedecoder-api-extension-for-webcodecs/4418.
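
A rough sketch of how that extension might look, again with illustrative rather than final names (`ImageDecoder` and its `decode({ frameIndex })` method follow the proposal's direction; `ctx` is assumed to be a canvas context):

```js
// Fetch an animated image and decode one specific frame, instead of
// letting an <img> element drive the animation.
const response = await fetch('animation.gif');
const decoder = new ImageDecoder({
  data: await response.arrayBuffer(),
  type: 'image/gif',
});

// Ask for the third animation frame explicitly.
const { image } = await decoder.decode({ frameIndex: 2 });
ctx.drawImage(image, 0, 0);
image.close();
```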

An implementation of these APIs is being developed in Chromium.

Thank you,

Dan Sanders

# youenn fablet (4 days ago)

Thanks for reaching out. Here is some feedback based on the web codecs explainer (let me know if there is a more detailed proposal).

  • I am generally in favour of trying to unify the various media pipelines. It is always sad when one feature ships in the video streaming pipeline but not automatically in the RTC one (or the reverse).
  • Efficient processing of raw video frames (whatever their source: camera, RTC, video streaming) seems like a really useful area to work on.
  • Existing APIs (video/image tags, MSE, WebRTC, recording API...) already provide features that make use of codecs. It should be clear what the goals/key benefits of directly exposing codecs are.
  • Any media pipeline should be off the JS main thread, by default. This does not seem guaranteed by the proposal.
  • Providing deep access to codecs (in terms of capabilities, observability of timing of operations...) requires careful thinking about how much fingerprinting this ends up creating and about how the processing model will keep the whole API fingerprinting-neutral.
  • A codec implementation used for RTC may differ significantly from a codec implementation used for recording/MSE. With this proposal, a web page could try to use, for RTC purposes, a hardware codec dedicated to recording/MSE. The results would be disastrous. That will probably require extensive testing by web developers to ensure their scripts work on a wide variety of devices. At some point, that might require supporting APIs to properly discover and set up encoders/decoders for the various uses. This might further add to both complexity and fingerprinting.
  • The complexity behind WebRTC, MSE or the MediaRecorder API is not to be neglected. There might be drawbacks in solving these issues at the JS level instead of the browser level. I am, for instance, uncertain that the complexity of a WebRTC pipeline can best be handled in JS.
  • Some code might best stay under the control of the browser. Related to WebCodecs is the insertable streams proposal, which could allow deploying end-to-end encryption to RTC quickly with a JS-only solution. A JS solution leaves full control to web pages and limits the ability of user agents to upgrade such security mechanisms, as they can for other security mechanisms like TLS/DTLS.

As background information, I would also note the recent effort in WebKit to move parts of media processing out of the processes running JavaScript.

Hope this helps, Y

On Thu, Apr 30, 2020 at 00:53, Dan Sanders <sandersd at chromium.org> wrote:

# Dan Sanders (3 days ago)

Thanks for the detailed feedback! Responses to some of the items are below.

There is a bi-weekly meeting to discuss WebCodecs development; if you (or others working on WebKit) are interested in participating, please let me know and I can invite you.

(let me know if there is a more detailed proposal).

The explainer is the primary source at this time. There is IDL in Chromium that has yet to be proposed: source.chromium.org/chromium/chromium/src/+/master:third_party/blink/renderer/modules/webcodecs/?q=webcodecs&ss=chromium

  • Any media pipeline should be off the JS main thread, by default. This does not seem guaranteed by the proposal.

I filed WICG/web-codecs#51
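
For illustration, one shape this could take is constructing the codec inside a Worker and transferring decoded frames back to the page. This assumes the codec interfaces are exposed in workers and that decoded frames are transferable; both are assumptions here, not settled behavior:

```js
// main.js: the page only posts encoded data in and draws frames out.
const worker = new Worker('decode-worker.js');
worker.onmessage = ({ data: frame }) => {
  ctx.drawImage(frame, 0, 0);  // frame: a transferred decoded frame
  frame.close();
};
// chunkBytes / timestamp stand in for demuxed data held by the page.
worker.postMessage({ chunkBytes, timestamp }, [chunkBytes]);

// decode-worker.js: the codec and its callbacks live off the main thread.
const decoder = new VideoDecoder({
  output: (frame) => self.postMessage(frame, [frame]),  // transfer, don't copy
  error: (e) => console.error(e),
});
decoder.configure({ codec: 'vp8' });
self.onmessage = ({ data }) => {
  decoder.decode(new EncodedVideoChunk({
    type: 'key',
    timestamp: data.timestamp,
    data: data.chunkBytes,
  }));
};
```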

  • Providing deep access to codecs (in terms of capabilities, observability of timing of operations...) requires careful thinking about how much fingerprinting this ends up creating and about how the processing model will keep the whole API fingerprinting-neutral.

We have avoided providing a codec enumeration API for this reason. Since a site can already run experiments on the <video> implementation, it's not clear that there is substantial new surface, but it may be easier to implement those experiments accurately using WebCodecs.
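
To make the comparison concrete: a page can already probe codec support through existing entry points, and a per-configuration query is the sort of surface WebCodecs might add (the `isConfigSupported` call shown is illustrative, not settled API):

```js
// Existing probes available to any page today.
const video = document.createElement('video');
video.canPlayType('video/mp4; codecs="avc1.42E01E"');     // "", "maybe", "probably"
MediaSource.isTypeSupported('video/webm; codecs="vp9"');  // true / false

// A hypothetical per-configuration query of the kind WebCodecs could add.
const { supported } = await VideoDecoder.isConfigSupported({
  codec: 'vp09.00.10.08',
  hardwareAcceleration: 'prefer-hardware',
});
```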

  • A codec implementation used for RTC may differ significantly from a codec implementation used for recording/MSE.

Chrome's implementation of <video> does not really distinguish here.

There are some tuning parameters (e.g. latencyHint) that would be nice to expose to WebCodecs users.

Do you foresee cases where usage hints are not sufficient?
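
For concreteness, the kind of hint under discussion might look something like this on an encoder; the field name (`latencyMode`) and its values are illustrative only, and `sendToNetwork` is a placeholder:

```js
const encoder = new VideoEncoder({
  output: (chunk, metadata) => sendToNetwork(chunk),  // placeholder sink
  error: (e) => console.error(e),
});

encoder.configure({
  codec: 'vp8',
  width: 640,
  height: 480,
  bitrate: 1_000_000,
  latencyMode: 'realtime',  // vs. 'quality' for recording/MSE-style encoding
});
```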

  • The complexity behind WebRTC, MSE or the MediaRecorder API is not to be neglected. There might be drawbacks in solving these issues at the JS level instead of the browser level. I am, for instance, uncertain that the complexity of a WebRTC pipeline can best be handled in JS.

Agreed; WebCodecs is a low-level API and is unlikely to be a better choice for use cases that are served by current APIs.

# youenn fablet (2 days ago)
  • Providing deep access to codecs (in terms of capabilities, observability of timing of operations...) requires careful thinking about how much fingerprinting this ends up creating and about how the processing model will keep the whole API fingerprinting-neutral.

We have avoided providing a codec enumeration API for this reason. Since a site can already run experiments on the <video> implementation, it's not clear that there is substantial new surface, but it may be easier to implement those experiments accurately using WebCodecs.

Capabilities are indeed an issue. The more you need to understand the codec, the wider the API and the potential fingerprinting surface become. For instance, does the encoder support a realtime mode or not? Some of this might already be retrievable by using a video element, but the potential mitigations might be easier with a video element than with a JS API.

WebCodecs might also expose precise timing information about how long it takes to decode/encode frames. Could that be a way to identify which hardware is being used underneath?
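
To illustrate the concern, a page could time individual operations along these lines (hypothetical sketch; `keyChunk` stands in for an encoded chunk prepared elsewhere):

```js
// Time how long the UA takes to deliver a decoded frame; stable, precise
// timings could hint at which hardware or software decoder is in use.
const decoder = new VideoDecoder({
  output: (frame) => {
    console.log('decode latency (ms):', performance.now() - start);
    frame.close();
  },
  error: (e) => console.error(e),
});
decoder.configure({ codec: 'vp8' });

const start = performance.now();
decoder.decode(keyChunk);  // keyChunk: an EncodedVideoChunk prepared elsewhere
```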

  • A codec implementation used for RTC may differ significantly from a codec implementation used for recording/MSE.

Chrome's implementation of <video> does not really distinguish here. There are some tuning parameters (e.g. latencyHint) that would be nice to expose to WebCodecs users.

Do you foresee cases where usage hints are not sufficient?

I did not look at the usage hints, but here are two examples:

  1. An RTC decoder might want to do real-time error concealment / an MSE decoder might not need to do any error concealment.
  2. A MediaRecorder encoder might buffer frames to compress more aggressively / an RTC encoder should not buffer frames.
