The Gateway to the Next Generation of Super-Intelligent Terminals: Shaped by the Fusion of AI Large Models and AR Technology

June 4, 2024

The arrival of the AI era has never been more certain. Public and industry imagination is now shifting towards the specific forms that next-generation intelligent terminals will take.


PCs, laptops, tablets, smartphones, car systems, watches... Will these existing terminals form the basis of future devices? And in the AI era, can we truly achieve the Internet of Everything?


Amidst the intelligent breakthroughs brought by large model technology, the next-generation interaction gateways are beginning to emerge.

The evolution of human-machine interaction follows a clear path. While "brain-computer interaction" is often seen as the ultimate goal, it remains largely speculative for now. From early punched tape to the mouse and keyboard, and now to touchscreens, machines have remained largely passive recipients of human input. However, with the advent of multimodal interactions such as voice, image, and gesture recognition, we are moving towards machines actively understanding humans.


The digital technologies supporting these interactions show a similar trend. Milestones such as cybernetics, NLP, and deep learning mark AI's journey towards better understanding the world. With generative AI, we have reached a stage where AI not only perceives and understands but also creates.

Human-machine interaction devices are constrained by the maturity of underlying technologies such as AI, yet we can already glimpse the future forms of these terminals. Traditionally, hardware has evolved by upgrading existing forms, for example from phones to smartphones and from watches to smartwatches.

Future intelligent terminals, or AI hardware, will instead have their product forms dictated by interaction methods, aligning with the enhanced experiences brought by new technologies.

Currently, upgrades to existing smart hardware, such as AI PCs and AI phones, face various issues. In the PC industry, hardware is controlled by a few giants, with innovation driven from upstream. Similarly, software and system improvements are often incremental updates from these same giants, leaving hardware and software innovation out of step with each other.


In the smartphone industry, innovation has stagnated, and AI is seen as a new growth point beyond imaging features and hardware specs. So far, however, it has primarily enhanced the application layer rather than introducing truly new product forms.


Native AI hardware such as the AI Pin is still in its early stages, with experiences far from mature. Consequently, smart glasses, as an evolution of a familiar form, are viewed as a promising direction. Companies such as Pico, Bird, Microsoft, and Apple are exploring AR, VR, and XR headsets, which increasingly resemble glasses.

In terms of interaction specifically, optimized deployment of large AI models can give users a more natural experience, with real-time responses in under 2 seconds and voice-activation accuracy above 99%. By delegating tasks to specialized AI agents, these models deliver fast, high-quality interactions.
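
As a rough illustration of that flow, the sketch below routes a wake-word-activated voice command to a specialized agent while checking a 2-second latency budget. The wake word, agent names, and keyword routing are hypothetical placeholders under assumed names, not a real product API.

```python
# Minimal sketch: wake-word check, routing to a specialized agent,
# and a ~2-second latency budget. All names here are illustrative.
import time
from typing import Callable, Dict

LATENCY_BUDGET_S = 2.0     # end-to-end response target from the article
WAKE_WORD = "hey glasses"  # hypothetical activation phrase

# Each "specialized agent" is just a function from a transcript to a reply.
AGENTS: Dict[str, Callable[[str], str]] = {
    "navigation": lambda text: f"Routing request noted: {text}",
    "commentary": lambda text: f"Here is some background on: {text}",
    "chat":       lambda text: f"Let's talk about: {text}",
}

def route_command(transcript: str) -> str:
    """Pick an agent with simple keyword rules (a stand-in for a real
    intent classifier) and return its reply."""
    lowered = transcript.lower()
    if "navigate" in lowered or "route" in lowered:
        agent = AGENTS["navigation"]
    elif "tell me about" in lowered or "what is" in lowered:
        agent = AGENTS["commentary"]
    else:
        agent = AGENTS["chat"]
    return agent(transcript)

def handle_utterance(transcript: str) -> str:
    """Check the wake word, dispatch to an agent, and flag replies
    that exceed the latency budget."""
    if not transcript.lower().startswith(WAKE_WORD):
        return ""  # ignore speech that did not trigger activation
    start = time.monotonic()
    reply = route_command(transcript[len(WAKE_WORD):].strip())
    elapsed = time.monotonic() - start
    if elapsed > LATENCY_BUDGET_S:
        print(f"warning: response took {elapsed:.2f}s, over budget")
    return reply

if __name__ == "__main__":
    print(handle_utterance("hey glasses navigate to the old town square"))
```

On a real device, the keyword rules would be replaced by an intent model, and the budget would have to cover speech recognition and synthesis as well as the agent call.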
On the emotional side, large AI models offer more than cold, mechanical responses.

Capabilities like Emotional Voice Cloning enable the AI to recognize and respond to emotions, engage in deeper, preference-aware conversations, and call on AI agents to address specific needs, providing more emotionally connected companionship.
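
The toy sketch below shows the general shape of such emotion-aware dialogue: classify the user's mood from a transcript and adapt the response style before handing the turn to an agent. The keyword classifier and style table are stand-ins for the emotion-recognition and voice-cloning models mentioned above, not their actual interfaces.

```python
# Toy illustration: detect a coarse emotion from the transcript and pick
# the voice style a TTS/voice-cloning stage would use. All tables are
# illustrative placeholders.
from typing import Tuple

EMOTION_KEYWORDS = {
    "sad":     ("sad", "tired", "lonely"),
    "excited": ("great", "awesome", "amazing"),
}

RESPONSE_STYLE = {
    "sad":     "warm, slower speech, softer cloned voice",
    "excited": "upbeat, faster speech, brighter cloned voice",
    "neutral": "default voice profile",
}

def detect_emotion(transcript: str) -> str:
    """Keyword stand-in for a real emotion-recognition model."""
    lowered = transcript.lower()
    for emotion, keywords in EMOTION_KEYWORDS.items():
        if any(word in lowered for word in keywords):
            return emotion
    return "neutral"

def plan_reply(transcript: str) -> Tuple[str, str]:
    """Return the detected emotion and the voice style to synthesize with."""
    emotion = detect_emotion(transcript)
    return emotion, RESPONSE_STYLE[emotion]

print(plan_reply("I had a long day and I'm really tired"))
# -> ('sad', 'warm, slower speech, softer cloned voice')
```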


In outdoor verticals, multimodal large AI models can deliver more precise services. Data on tourist spots, niche scenic routes, and local cuisine can turn core functions such as navigation, recommendation, and guided tours into real-time experiences on AR glasses, for example dynamic tours and location-based commentary, as sketched below.
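
As a minimal sketch of that location-based commentary, assuming a hypothetical point-of-interest list, trigger radius, and prompt format: given the wearer's coordinates, find a nearby point of interest and build a narration prompt a multimodal model could speak through the glasses.

```python
# Sketch: trigger spoken commentary when the wearer is near a point of
# interest. POI data, radius, and prompt wording are illustrative.
import math
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PointOfInterest:
    name: str
    lat: float
    lon: float
    blurb: str  # short local-knowledge snippet (cuisine, history, route tips)

POIS: List[PointOfInterest] = [
    PointOfInterest("Old Clock Tower", 31.2304, 121.4737, "a restored 1920s landmark"),
    PointOfInterest("Riverside Noodle Shop", 31.2310, 121.4750, "known for hand-pulled noodles"),
]

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two WGS84 coordinates."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def commentary_prompt(lat: float, lon: float, radius_m: float = 200.0) -> Optional[str]:
    """Return a narration prompt for the nearest POI within radius, if any."""
    in_range = [(haversine_m(lat, lon, p.lat, p.lon), p) for p in POIS]
    in_range = [(d, p) for d, p in in_range if d <= radius_m]
    if not in_range:
        return None
    _, poi = min(in_range, key=lambda pair: pair[0])
    return f"You are passing {poi.name}, {poi.blurb}. Offer a 20-second spoken tour note."

print(commentary_prompt(31.2306, 121.4740))
```
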
However, this is not the ultimate form. AR glasses represent a concept and a direction but are not the final result.


This can be seen as an evolution from AR smart glasses to AI glasses, the AI carrier of the current era.