Friday, September 4, 2009

Keyboard control of html5 video elements

HTML 5 introduces the video and audio media elements. The user manipulates playback either with browser-supplied controls (requested via the controls attribute) or with author-supplied controls. This post describes the current Firefox implementation of the browser-supplied controls and suggests an idea for potentially improved keyboard support.

[Image: video with no controls]

Shown here is a screenshot of a video with no controls visible. They are hidden because the mouse pointer is not hovering over the video. Hovering over the screenshot now won't show the controls; it is just an image after all.

This next image shows a blown-up view of the controls that appear when the mouse hovers over the video. I've also hovered over the volume control here to reveal the volume slider.

[Image: video with controls]

Point at what you need, click what you want, move away, and enjoy your video. Nice. When the mouse user doesn't want to interact with the video, the controls slide away, leaving an uncluttered video viewing experience. The gotcha here is that not all users are mouse users, and not all devices have a mouse (or touch screen).

Are you a keyboard user?

We have you covered. You can tab to the video element. The controls are not shown, but you can manipulate the video using some intuitive keystrokes: left and right arrows to go back and forward, space to toggle play and pause, and up and down arrows to control volume. Sighted keyboard users can enjoy uncluttered interaction with the video, while screen reader users can of course enjoy the same interaction regardless of visual clutter.
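To make that key mapping concrete, here is a minimal TypeScript sketch of how such a handler could look. It is not Firefox's actual implementation: the `MediaLike` interface, the `handleMediaKey` function, and both step sizes are assumptions made up for illustration. The key names are the standard `KeyboardEvent.key` values.

```typescript
// Minimal structural view of a media element (hypothetical name).
// A real HTMLVideoElement has all of these members.
interface MediaLike {
  readonly paused: boolean;
  readonly duration: number;
  currentTime: number;
  volume: number;
  play(): unknown;
  pause(): void;
}

// Assumed step sizes, chosen for illustration only.
const SEEK_STEP_SECONDS = 5;
const VOLUME_STEP = 0.1;

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Applies one keystroke to the media object. Returns true when the key was
// handled, so a caller can decide whether to call preventDefault().
function handleMediaKey(media: MediaLike, key: string): boolean {
  // duration is NaN until metadata has loaded; treat that as "nowhere to seek".
  const end = isNaN(media.duration) ? 0 : media.duration;
  switch (key) {
    case " ": // space toggles play and pause
      if (media.paused) media.play();
      else media.pause();
      return true;
    case "ArrowLeft": // go back
      media.currentTime = clamp(media.currentTime - SEEK_STEP_SECONDS, 0, end);
      return true;
    case "ArrowRight": // go forward
      media.currentTime = clamp(media.currentTime + SEEK_STEP_SECONDS, 0, end);
      return true;
    case "ArrowUp": // louder
      media.volume = clamp(media.volume + VOLUME_STEP, 0, 1);
      return true;
    case "ArrowDown": // quieter
      media.volume = clamp(media.volume - VOLUME_STEP, 0, 1);
      return true;
    default:
      return false; // anything else, Tab included, keeps its normal behaviour
  }
}
```

In a page this would be wired to the focused video's keydown event, calling `handleMediaKey(video, event.key)` and `event.preventDefault()` only when it returns true, so that Tab still moves focus as usual.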

We still have some concerns:

1. Discoverability. Once a user has tabbed to a video, it is difficult to tell that the video has focus and there is nothing indicating that the video is keyboard controllable.

2. Feedback. The feedback after a user action is not as rich as the feedback when using the controls. For example, pressing right arrow to advance the video doesn't tell you how far ahead we went, or where we are in the overall length of the video.
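To illustrate the kind of feedback that is missing, here is a small TypeScript sketch that turns a playback position into a short message such as "0:35 of 2:10", which could be shown or announced after a seek. The function names `formatTime` and `describePosition` and the message format are hypothetical; nothing like this exists in the current controls as far as this post describes them.

```typescript
// Formats a number of seconds as m:ss (hypothetical helper).
function formatTime(totalSeconds: number): string {
  const whole = Math.max(0, Math.floor(totalSeconds));
  const minutes = Math.floor(whole / 60);
  const seconds = whole % 60;
  return minutes + ":" + (seconds < 10 ? "0" : "") + seconds;
}

// Describes where playback is within the overall length of the video.
// Falls back to the position alone when the duration is unknown (NaN)
// or unbounded (Infinity, as for a live stream).
function describePosition(currentTime: number, duration: number): string {
  if (!isFinite(duration)) {
    return formatTime(currentTime);
  }
  return formatTime(currentTime) + " of " + formatTime(duration);
}
```

After a right-arrow press, a caller could pass the video's `currentTime` and `duration` to `describePosition` and surface the result to the user.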

An Idea...

Keep the current functionality but add a secondary keyboard interaction model. Once a user has tabbed to the video element, the video is directly controlled via the existing keystrokes. If the user hits tab again, the controls appear and the first control is focused. A regular keyboard interaction model ensues for the controls (tab navigation and per-control keyboard manipulation). Tabbing past the last control leaves the video element entirely, moving to the next element in the document tab order.
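The tab sequence described above can be sketched as a tiny state machine. This TypeScript is only a model of the idea, not a browser patch: the `Focus` type and the `onTab` and `controlsVisible` functions are hypothetical names, and only forward tabbing is covered because that is all the proposal spells out.

```typescript
// Where keyboard focus currently is, relative to one video (hypothetical model).
type Focus =
  | { kind: "video" }                   // video focused, controls hidden
  | { kind: "control"; index: number }  // one of the revealed controls focused
  | { kind: "outside" };                // focus has moved on in the document

// Computes where focus goes when the user presses Tab.
function onTab(focus: Focus, controlCount: number): Focus {
  switch (focus.kind) {
    case "video":
      // Second Tab: reveal the controls and focus the first one.
      return controlCount > 0 ? { kind: "control", index: 0 } : { kind: "outside" };
    case "control":
      // Move through the controls; past the last one, leave the video entirely.
      return focus.index + 1 < controlCount
        ? { kind: "control", index: focus.index + 1 }
        : { kind: "outside" };
    case "outside":
      return focus;
  }
}

// In this model the controls are shown only while one of them has focus.
function controlsVisible(focus: Focus): boolean {
  return focus.kind === "control";
}
```

With two controls, the sequence from the video is: first control, second control, then out to the next element in the document tab order. That is also where the extra tab stops listed under the cons come from.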

Pros: Discoverability is solved. Feedback is solved.

Cons: It increases the number of items in the overall document tab order. Additional source code is required.

Thoughts?