RTE2021: the evolution and metamorphosis of real-time interactive technology

Running since 2015, 2021 marked the seventh year of the Real-Time Internet Conference. The conference attracted more than 150 forward-looking, hands-on technical leaders from around the world, nearly a thousand practitioners in the real-time internet field, and the attention and participation of thousands of industry developers. It focused on the industry changes and trends in the real-time interaction industry over the past year, with in-depth discussion and sharing across scenarios, technology, products, and ecosystems.

The RTE Wanxiang Atlas is released

The world’s first fully automated multi-scenario acoustics laboratory is completed

On October 22, at the main forum of the RTE2021 conference, Zhao Bin, founder and CEO of Agora, delivered a keynote titled “Wanxiang: Real-Time Evolution.” As online modes are adopted by more and more industries, RTE technology keeps unlocking new application scenarios. From the rise of remote work to the explosion of live audio (LiveAudioCast) scenarios, everyone in 2021 witnessed the potential of real-time interaction and real-time audio and video to transform industries, ignite new media forms, and reshape social communication.

In his speech, Zhao Bin summed up the key words he sees for the future development of the real-time interaction field: twinning and fusion.

From a trend perspective, digitalization is becoming universal. In entertainment, from film and television to the arts, and from live shows to exhibitions, more and more scenarios have completed a striking shift from offline to online. Once digitalization is truly completed and effective, interaction becomes an indispensable link. That applies to existing scenarios; when digital technology and real-time interactive technology collide, even more application scenarios will be born. This twinning of digitalization and interactive technology is the root cause of the current growth in RTE usage, the rising penetration of applications, and the explosion of application scenarios.

Digging into digital scenarios, it is not hard to see that the integration of online and offline experiences is, in essence, a profound technological evolution. In scenarios where the virtual and the real blend, real-time synchronization and sharing of data bring about a fusion of virtual and real data. Whether through technological evolution or environmental change, more room will open up for new possibilities. In the same way, the boundaries of real-time interaction keep expanding: the move from traditional real-time communication to real-time interactive scenarios where people coexist may also create expanded value.

During the speech, Zhao Bin released the “RTE Wanxiang Atlas,” a map of real-time interactive scenarios covering more than 20 industry tracks and more than 200 scenarios, including education, pan-entertainment, IoT, finance, healthcare, enterprise collaboration, digital government, and smart cities. Zhao Bin said that the impact of the pandemic has accelerated the enrichment and maturation of the scenarios in the Atlas. Fields such as education, social networking, live streaming, and conferencing, for example, have already formed mature scenarios and will persist for a long time to come.

Among the 200+ scenarios included in the Wanxiang Atlas are not only many mature scenarios that have been proven in real applications, but also a far larger number of newly emerging ones. In the Atlas, Agora draws on its extensive industry experience and market analysis capabilities to comprehensively map the application scenarios around the world that are still in the budding stage. Developers and entrepreneurs can browse the scenarios in the Atlas to find new inspiration and perspectives, and work with Agora to polish these emerging scenarios and explore their true value and innovation.

In addition to the Wanxiang Atlas, Zhao Bin announced another piece of big news in his speech: Agora has built the world’s first fully automated multi-scenario acoustics laboratory. He said its completion means that the RTE industry’s first professional testing facility and test environment for multi-scenario real-time interaction has been put into operation, bringing a new level of rigor and convenience to acoustic testing.

Where is the next generation of the real-time internet headed?

Dr. Zhong Sheng, Chief Scientist of Agora, delivered a keynote titled “Real-Time Interaction and the Intelligent Internet”:

With the accelerating integration of online and offline, video calls, online classrooms, VR/AR, and live shows have all become part of our daily lives. The emergence of real-time interactive technology has undoubtedly greatly enhanced people’s social experience in the online world, and at the same time increased user stickiness in online application scenarios.

To reproduce the offline experience of “gathering together” in the online world, the communication network must meet very stringent low-latency requirements. To satisfy those requirements, and to process, understand, and reconstruct massive amounts of unstructured data, technology that integrates perception, communication, and computing is essential.

Future real-time interactive scenarios extend the experience into narrative: moving from the simple sensory experiences of the past to immersive, interactive narrative experiences, people will gain a much richer experience. In the online world, we need to build a virtual avatar based on ourselves to act on our behalf. Connecting that avatar to the real “I” requires digital twin technology for the human body as the link. What future real-time interaction demands is a powerful ability to convey, express, and empathize, which includes recognition of facial expressions and emotions, as well as perception of the environment, perception of touch, and AI-based 3D modeling. Together, these technologies constitute the digital twin.

When discussing the key technologies that will shape the future real-time internet, Zhong Sheng explained that Agora focuses on low latency plus edge and cloud acceleration, building real-time capability into its PaaS system, exposing APIs so applications can flexibly build their business on top, and advancing cloud and edge computing; these are certainly important technology directions for the future. Beyond the continued evolution of bandwidth, latency, reliability, and multi-device connectivity in the communications field, terahertz, millimeter wave, and ultra-large-scale MIMO are all key underlying technologies. At the same time, to present more lifelike video, ultra-high-resolution video technology is also inevitable.

At the same time, regarding the application of AI in the real-time internet, Zhong Sheng offered his own thinking: how can all the information be retrieved and restored from a small amount of data? This may seem impossible, but AI algorithms can in fact generalize big data into small data and then use the small data to drive the big data: extract key points at the sending end, and regenerate the video from those key points at the receiving end.
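As a rough illustration of this idea (and not Agora’s actual implementation), the Python sketch below stands in for “extract key points at the sender, regenerate at the receiver” with a toy pipeline: the sender reduces each frame to a small feature grid, and the receiver upsamples it back. The frame size, grid size, and reconstruction method are all invented for the example.

```python
# Toy sketch of "small data drives big data": send compact per-frame features
# instead of full frames, and regenerate an approximation at the receiver.
# Real systems (e.g. keypoint-driven face reenactment) use learned extractors
# and generators; everything here is a stand-in for illustration only.
import numpy as np

FRAME_SHAPE = (480, 640)      # assumed grayscale frame size
GRID = (12, 16)               # "key points": a coarse 12x16 luminance grid

def sender_extract(frame: np.ndarray) -> np.ndarray:
    """Reduce a full frame to a small feature grid (the 'small data')."""
    h, w = FRAME_SHAPE
    gh, gw = GRID
    return frame.reshape(gh, h // gh, gw, w // gw).mean(axis=(1, 3))

def receiver_generate(features: np.ndarray) -> np.ndarray:
    """Regenerate a full-size approximation from the compact features."""
    h, w = FRAME_SHAPE
    gh, gw = GRID
    return np.kron(features, np.ones((h // gh, w // gw)))  # nearest-neighbor upsample

frame = np.random.rand(*FRAME_SHAPE)          # stand-in for a captured frame
features = sender_extract(frame)              # what actually goes on the wire
reconstructed = receiver_generate(features)   # regenerated at the receiving end

print(f"full frame: {frame.nbytes} bytes, features sent: {features.nbytes} bytes")
```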

Now that communication and computing have converged, existing operations and technical architectures can no longer cope with today’s real-time interactive business and experience requirements. Zhong Sheng said in his speech that the next-generation real-time internet requires network-wide collaboration and network awareness, real-time scheduling of global bandwidth and resources, a cloud-native software architecture that supports flexible, dynamic distributed computing, full use of AI algorithms to generalize intelligence so that small data drives big data, and continuous improvement of hardware capabilities at the device, edge, and chip level. Only then can it meet the technical, business, and experience requirements of today’s real-time interactive scenarios.

AI and deep learning continue to penetrate every aspect of RTE

In addition to Dr. Zhong Sheng’s forward-looking research, another important phenomenon visible at the RTE2021 conference is that AI and deep learning are penetrating every aspect of real-time audio and video. Algorithm engineers in audio, video, and networking are all putting AI into practice, using it to optimize and improve performance in their own fields.

Google engineers presented the latest low-bit-rate speech codec, Lyra, at the RTE2021 conference. Lyra compresses and reconstructs speech with a very small amount of data, enabling smooth video calls below 20 kbps. Google’s engineers sparsified the single largest matrix in the model, the matrix in the gated recurrent unit (GRU), into blocks; these blocks can be implemented as small, dense matrices, which roughly doubles the model’s speed.
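The block-sparsity idea can be illustrated with a small sketch: store only the surviving fixed-size blocks of a large GRU weight matrix as dense tiles, and skip the rest during the matrix-vector product. The block size, matrix size, and sparsity level below are arbitrary illustrative choices, not Lyra’s actual parameters.

```python
# Illustrative block-sparse matrix-vector product of the kind used inside a GRU:
# keep only a fraction of fixed-size blocks, store them densely, and skip the
# zero blocks entirely. All sizes below are assumptions for the sketch.
import numpy as np

BLOCK = 16          # assumed block size
DIM = 1024          # assumed GRU state size
KEEP = 0.10         # keep ~10% of blocks (90% block sparsity)

rng = np.random.default_rng(0)
n_blocks = DIM // BLOCK
mask = rng.random((n_blocks, n_blocks)) < KEEP          # which blocks survive pruning
blocks = {                                              # dense storage per kept block
    (i, j): rng.standard_normal((BLOCK, BLOCK))
    for i in range(n_blocks) for j in range(n_blocks) if mask[i, j]
}

def block_sparse_matvec(blocks, x):
    """y = W @ x where W is stored only as its non-zero BLOCK x BLOCK tiles."""
    y = np.zeros(DIM)
    for (i, j), w in blocks.items():
        y[i * BLOCK:(i + 1) * BLOCK] += w @ x[j * BLOCK:(j + 1) * BLOCK]
    return y

x = rng.standard_normal(DIM)
y = block_sparse_matvec(blocks, x)
print(f"kept {len(blocks)} of {n_blocks * n_blocks} blocks, output dim {y.shape[0]}")
```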

Agora’s Silver speech codec uses deep learning to explore the balance between bit rate, computing power, and quality. Agora’s audio algorithm team uses an AI noise suppression (AI-NS) algorithm to raise the speech signal-to-noise ratio, tackle noise problems, and reduce the artifacts caused by low bit rates.

On the encoding side, Silver extracts sub-band features from traditional algorithms (fundamental frequency, sub-band spectral envelope, energy, and so on) and encodes them with methods such as residual vector quantization (RVQ) and range coding to save bit rate. On the decoding side, an autoregressive WaveRNN model combined with a bandwidth extension (BWE) model keeps the model’s compute requirements lean. Built on a self-developed multi-platform AI inference engine, the decoder applies asymmetric model quantization, mixed-precision inference, and computation compression, ultimately ensuring real-time deployment on mobile devices.
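To make the RVQ step concrete, here is a minimal residual vector quantization sketch: each stage quantizes the residual left over by the previous stage, so a feature vector is coded as a handful of small codebook indices. The codebooks here are random and the dimensions are invented; a real codec trains the codebooks on speech features.

```python
# Minimal residual vector quantization (RVQ) sketch. Codebooks are random for
# illustration only; dimensions and stage counts are assumptions, not Silver's.
import numpy as np

rng = np.random.default_rng(0)
DIM = 8            # assumed feature dimension (e.g. a sub-band envelope)
STAGES = 3         # number of residual stages
CODEBOOK = 16      # entries per stage -> 4 bits per stage, 12 bits per vector

codebooks = [rng.standard_normal((CODEBOOK, DIM)) for _ in range(STAGES)]

def rvq_encode(x):
    """Return one codebook index per stage; each stage codes the remaining residual."""
    indices, residual = [], x.copy()
    for cb in codebooks:
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        indices.append(idx)
        residual = residual - cb[idx]
    return indices

def rvq_decode(indices):
    """Sum the chosen codewords to reconstruct the feature vector."""
    return sum(cb[i] for cb, i in zip(codebooks, indices))

x = rng.standard_normal(DIM)
idx = rvq_encode(x)
x_hat = rvq_decode(idx)
print(f"indices={idx}, reconstruction error={np.linalg.norm(x - x_hat):.3f}")
```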

In addition, a senior deep learning solutions architect from NVIDIA shared NVIDIA’s “one increase, two reductions” approach to deep learning: increasing raw computing power, using structured sparsity to reduce wasted computation, and using model quantization to find the optimal compute budget. A network transmission quality engineer for Agora’s SD-RTN™ shared how Agora applies AI to API and platform operations, breaking AIOps down into algorithm, decision, and execution to achieve 24/7 uninterrupted operation and to improve the quality and efficiency of O&M execution.
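The algorithm-decision-execution decomposition can be pictured as a simple closed loop: a detection algorithm flags anomalies in telemetry, a decision step maps them to actions, and an execution step applies them. The metrics, thresholds, and actions in the sketch below are invented for illustration and are not Agora’s operational logic.

```python
# Schematic "algorithm -> decision -> execution" AIOps loop. All metrics,
# thresholds, node names, and actions are made up for this illustration.
import random
import time

def algorithm(metrics):
    """Detection step: flag nodes whose packet loss looks anomalous."""
    return [node for node, loss in metrics.items() if loss > 0.05]

def decision(anomalies):
    """Decision step: map each anomaly to a remediation action."""
    return [("reroute_traffic", node) for node in anomalies]

def execution(actions):
    """Execution step: apply the actions (here, just log them)."""
    for action, node in actions:
        print(f"executing {action} on {node}")

def collect_metrics():
    # Stand-in for real telemetry: random per-node packet-loss ratios.
    return {f"edge-{i}": random.random() * 0.1 for i in range(5)}

for _ in range(3):                      # a real loop would run continuously (24/7)
    metrics = collect_metrics()
    execution(decision(algorithm(metrics)))
    time.sleep(0.1)
```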

Video standards and patents are developing rapidly; looking forward to the arrival of AV2

Beyond the in-depth practice of AI, standards-setting is another important battlefield in real-time audio and video where domestic vendors are investing heavily. At the RTE2021 conference, Dr. Ye Yan, who has participated in formulating international video standards from HD through 4K, shared her latest views on the development of MPEG and the ITU, detailed figures on the performance evolution of VVC, and the commercial implementation of video standards.

MPEG development path

She pointed out frankly that compression technology for video, which already accounts for 80% of internet traffic, is attracting more and more attention, and that this will also bring patent disputes and complex patent licensing scenarios. In addition, MPEG’s video-related work will advance along three lines: the next-generation VVC standard, AI video coding (both higher compression performance within the traditional framework and neural-network-based video compression), and immersive video.

Beyond video standards, Google engineers also presented the latest AV2 codec design and performance optimization results at the RTE2021 conference. While the previous generation, AV1, is still only beginning to reach real-world deployments, Google’s engineers have kept pushing forward, achieving performance gains of 0.4% to 1.5% and exploring the limits of codec efficiency. Scenarios and standards related to AV2 will therefore also be a focus for the industry.

On the basis of WebRTC open source and standards, how does Agora break the “black box”?

This year, WebRTC officially became a formal W3C and IETF standard, and the dust seems to have settled. However, Mao Yujie, a committer in the WebRTC open source community and head of WebRTC at Agora, shared at the conference the ongoing dispute among various organizations between open source and standards around WebRTC codecs, and how to design a Web end-to-end audio and video transmission architecture on top of the WebRTC standard so as to lower the threshold for developers to use real-time audio and video.

He summarized six existing problems with WebRTC: lack of adaptation for devices and peripherals, inconsistent browser compatibility, poor mobile support, non-customizable audio and video modules, performance issues, and a lack of statistics. Not all of these problems can be solved by the standard alone, so Agora combined multiple technical standards such as ORTC, WebRTC Extensions, WebRTC-SVC, WebTransport, and Raw Sockets to form its current Web end-to-end audio and video transmission architecture.

From the launch of the Agora AI real-time acceleration engine to bringing various AI-driven features to entry-level phones; from the standardization of WebRTC to exploring native-equivalent capabilities on the Web; from the aPaaS built on the integration of RTC and IM to the release of the RTE Wanxiang Atlas: the past year has been a year of technological evolution for Agora, and an important turning point as the real-time internet moves toward a new technological stage. The future of the RTE field is worth looking forward to.
