In a world where online video use is soaring and bandwidth remains at a premium, video compression is essential to keep the gears running smoothly.
But conventional techniques have reached the end of the line. The coding algorithm on which all major video compression schemes have been based for 30+ years has been refined and refined, but it is still based on the same original concept.
Even Versatile Video Coding (VVC), which MPEG is targeting at “next-gen” immersive applications like 8K VR, is only an evolutionary step forward from HEVC, itself the latest in a line of codecs that goes back to H.261 in 1988.
What’s more, the physical capacity of the silicon chip is reaching its limit too. Codecs are at an evolutionary cul-de-sac. What we need is a new species.
AI Codecs Developed
As an article for RedShark News makes clear, codec developers are now turning their attention to artificial intelligence, machine learning, and neural networks. These approaches have the benefit of being software-based and are therefore better suited to an environment in which applications run on generic hardware or are virtualized in the cloud.
Among companies with AI-based codecs is V-Nova. Its VC-6 codec, standardized as SMPTE ST 2117, can calculate bitrate at high speed to optimize bandwidth usage while maintaining an appropriate level of quality.
Nvidia’s Maxine system uses an AI to compress video for virtual collaborations like video conferencing.
Haivision offers Lightflow Encode, which uses ML to analyse video content (per title or per scene) and determine the optimal bitrate ladder and encoding configuration for each video.
It also uses a video quality metric called LQI, which represents how well the human visual system perceives video content at different bitrates and resolutions.
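To illustrate what a per-title bitrate ladder is, here is a minimal sketch in Python. It is not Haivision's method and does not compute LQI: the trial-encode scores are made-up numbers, and `build_ladder` is a hypothetical helper that, for each target bitrate, keeps the resolution with the highest perceptual score.

```python
# Hypothetical trial-encode results for one title:
# (frame height, bitrate in kbps) -> perceptual quality score (0-100).
# The scores are invented for illustration only.
TRIAL_SCORES = {
    (540, 1000): 72.0, (720, 1000): 68.0, (1080, 1000): 55.0,
    (540, 3000): 80.0, (720, 3000): 86.0, (1080, 3000): 83.0,
    (540, 6000): 82.0, (720, 6000): 90.0, (1080, 6000): 95.0,
}


def build_ladder(scores):
    """Return [(bitrate_kbps, height)] sorted by bitrate, keeping for each
    bitrate the resolution that achieved the highest quality score."""
    best = {}  # bitrate -> (height, score)
    for (height, bitrate), score in scores.items():
        if bitrate not in best or score > best[bitrate][1]:
            best[bitrate] = (height, score)
    return [(bitrate, best[bitrate][0]) for bitrate in sorted(best)]


if __name__ == "__main__":
    for bitrate, height in build_ladder(TRIAL_SCORES):
        print(f"{bitrate} kbps -> {height}p")
```

With these invented scores, the lowest bitrate is best served at 540p and the highest at 1080p. A different title, with different scores, would produce a different ladder, which is the point of analysing content per title or per scene.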
Perceptual quality rather than “broadcast quality” is increasingly being used to rate video codecs and automate bitrate tuning. Metrics like VMAF (Video Multi-method Assessment Fusion) combine human vision modelling with machine learning and seek to understand how viewers perceive content when streamed on a laptop, connected TV or smartphone.
VMAF was developed by Netflix and is now open source.
Perceptual Quality and VMAF
“VMAF can capture larger differences between codecs, as well as scaling artifacts, in a way that’s better correlated with perceptual quality,” Netflix explains in a recent blog post. “It enables us to compare codecs in the regions which are truly relevant.”
iSize Technologies has developed an encoder to capitalize on the trend for perceptual quality metrics. Its bitrate saving and quality improvements are achieved by incorporating a proprietary deep perceptual optimization and precoding technology as a pre-processing stage of a standard codec pipeline.
This “precoder” stage enhances details of the areas of each frame that affect the perceptual quality score of the content after encoding and dials down details that are less important.
“Our perceptual optimization algorithm seeks to understand what part of the picture triggers our eyes and what we don’t notice at all,” CEO Sergio Grce explains to IBC 365.
This leaves an organization’s existing codec infrastructure and workflow unchanged and is claimed to save 30 to 50 percent on bitrate at a latency cost of just one frame, making it suitable for live as well as VOD.
The company has tested its technology against AVC, HEVC and VVC with “substantial savings” in each case.
“Companies with planet-scale streaming services like YouTube and Netflix have started to talk about hitting the tech walls,” says Grce. “Their content is generating millions and millions of views but they cannot adopt a new codec or build new data centers fast enough to cope with such an increase in streaming demand.”
Using Convolutional Neural Networks
ML techniques which have been used heavily in image recognition will be key to meeting the growing demand for video streaming that we are seeing, according to Christian Timmerer, a co-founder of streaming technology company Bitmovin and a member of the research project Athena Christian Doppler Pilot Laboratory. The lab is currently preparing for large-scale testing of a convolutional neural network (CNN) integrated into production-style video coding solutions.
Timmerer’s team has proposed the use of CNNs to speed the encoding of “multiple representations” of video. In layperson’s terms, videos are stored in versions or ‘representations’ of multiple sizes and qualities. The player, which is requesting the video content from the server on which it resides, chooses the most suitable representation based on whatever the network conditions are at the time.
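The selection step can be sketched in a few lines of Python. This is a simplified illustration, not the logic of any particular player: the ladder values, the `Representation` class, the `choose_representation` function and the `safety_factor` parameter are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    name: str
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # average encoded bitrate


# Hypothetical ladder; the values are illustrative, not from a real service.
LADDER = [
    Representation("low", 360, 800),
    Representation("medium", 720, 2500),
    Representation("high", 1080, 5000),
    Representation("ultra", 2160, 16000),
]


def choose_representation(ladder, throughput_kbps, safety_factor=0.8):
    """Pick the highest-bitrate representation that fits within a fraction
    of the measured throughput; fall back to the lowest-bitrate one."""
    budget = throughput_kbps * safety_factor
    candidates = [r for r in ladder if r.bitrate_kbps <= budget]
    if candidates:
        return max(candidates, key=lambda r: r.bitrate_kbps)
    return min(ladder, key=lambda r: r.bitrate_kbps)


if __name__ == "__main__":
    for measured in (500, 3500, 30000):
        print(measured, "kbps ->", choose_representation(LADDER, measured).name)
```

Real players also weigh buffer level and recent throughput history, but the core idea is the same: every rung of the ladder must already exist on the server, which is why encoding all the representations quickly matters.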
In theory, this adds efficiency to the encoding and streaming process. In practice, however, the most common approach for delivering video over the Internet, HTTP Adaptive Streaming, is limited in its ability to encode the same content at different quality levels.
“Fast multirate encoding approaches leveraging CNNs, we found, may offer the ability to speed the process by referencing information from previously encoded representations,” he explains. “Basing performance on the fastest, not the slowest element in the process.”
IP Protection and Standards
There’s a body looking to wrap a framework around these and future developments in media as well as applications in other industries. MPAI (Moving Picture, Audio and Data Coding by Artificial Intelligence) was founded by MPEG co-founder Leonardo Chiariglione.
In a recent blog post, he writes about the 1997 chess match between IBM Deep Blue and Garry Kasparov, which made headlines when the machine beat the man.
“As with IBM Deep Blue, old coding tools had a priori statistical knowledge modelled and hardwired in the tools, but in AI, knowledge is acquired by learning the statistics.
“This is the reason why AI tools are more promising than traditional data processing tools. For a new age you need new tools and a new organization tuned to use those new tools.”