
OpenAI's new model GPT-4o makes an explosive debut: its response speed rivals a real person's, and it's free!

Ronan62054

On Monday, May 13 (US time), OpenAI Chief Technology Officer Mira Murati announced, in a highly anticipated live demonstration, the launch of a new flagship AI model called GPT-4o, an updated version of the GPT-4 model that has been around for over a year. OpenAI also launched a desktop version of ChatGPT and a new user interface (UI).
The GPT-4o model is trained on vast amounts of Internet data, is better at processing text and audio, and supports 50 languages. Notably, GPT-4o can respond to audio input in as little as 232 milliseconds, approaching human response times.
Murati stated that the new model is aimed at everyone, not just paying users, bringing "GPT-4-level intelligence to our free users." However, GPT-4o's application programming interface (API) does not yet offer voice functionality to all customers. Given the risk of abuse, OpenAI plans to first roll out support for GPT-4o's new audio features to a small group of trusted partners in the coming weeks.
After the release of GPT-4o, netizens gave it mixed reviews. Nvidia scientist Jim Fan commented, "From a technical perspective, overall it's a data and system optimization problem." Some netizens felt that OpenAI is no longer as innovative as before, while others believed OpenAI has further widened the gap with Apple, and that it is now Apple's Siri that should be sweating.
How powerful is GPT-4o? It has three core capabilities
The "o" in GPT-4o stands for "omni," meaning "all." According to the OpenAI website, GPT-4o is a step toward more natural human-computer interaction: it accepts any combination of text, audio, and images as input and generates any combination of text, audio, and images as output.
How strong is GPT-4o and what are its core competencies?
OpenAI official website screenshot

Ability 1: "Real-time" interaction, emotional expression, and stronger vision
OpenAI stated that GPT-4o significantly improves the user experience of the AI chatbot ChatGPT. ChatGPT has long supported a voice mode that converts its text replies into speech, but GPT-4o builds on this, letting users converse with ChatGPT as naturally as with a human assistant.
For example, users can now interrupt ChatGPT while it is answering. The new model responds in "real time," can even pick up the emotion in a user's voice, and can generate speech in different emotional styles, just like a real person. GPT-4o also enhances ChatGPT's vision: given a photo or screenshot, ChatGPT can quickly answer related questions, from "What does this code do?" to "What brand of shirt is this person wearing?"
US technology media outlet Quartz reported that OpenAI's newly released GPT-4o technology is impressive: the demonstration shows the chatbot holding real-time conversations with humans at a nearly indistinguishable level. If the final version matches OpenAI's official demonstration, OpenAI seems to have shown, to some extent, just how much AI will change our world.
Ability 2: Excellent multilingual performance with near-human response speed
GPT-4o's multilingual capabilities have been enhanced, with better performance across 50 languages. In the OpenAI API, GPT-4o is twice as fast as GPT-4 (specifically GPT-4 Turbo), costs half as much, and has higher rate limits.
According to the OpenAI website, GPT-4o can respond to audio input in as little as 232 milliseconds, with an average of 320 milliseconds, similar to human response times in conversation. Its performance on English text and code matches that of GPT-4 Turbo, with significant improvement on non-English text.
Users need only say a simple "Hey ChatGPT" voice prompt to receive a spoken response from the agent. They can then submit queries in spoken language, attaching text, audio, or visuals as needed; the latter can include photos, live images from a phone camera, or anything else the agent can "see."
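The combined text-and-image querying described above maps onto OpenAI's Chat Completions message format. Below is a minimal sketch of how such a request body could be assembled; the image URL and question are placeholder examples, no network call is made, and availability and pricing of the model may differ from what is shown here.

```python
# Sketch: assembling a multimodal (text + image) Chat Completions
# request body for the gpt-4o model. The URL and question below are
# placeholders for illustration; this builds the payload only and
# does not contact the OpenAI API.

def build_gpt4o_request(question: str, image_url: str) -> dict:
    """Assemble a text + image request body targeting gpt-4o."""
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_gpt4o_request(
    "What brand of shirt is this person wearing?",
    "https://example.com/photo.jpg",  # placeholder image URL
)
print(payload["model"])
```

With the official `openai` Python client, a payload like this would typically be passed to `client.chat.completions.create(...)`; as the article notes, audio input is being rolled out to a limited set of partners first, so this sketch covers only text and images.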
Ability 3: New benchmarks set in reasoning and audio translation
According to OpenAI researcher William Fedus, GPT-4o is in fact the mysterious "gpt2-chatbot" model that caused a frenzy in the LMSYS Chatbot Arena last week; he attached a benchmark comparison chart showing its score more than 100 points above GPT-4 Turbo's.
In reasoning ability, GPT-4o surpasses frontier models such as GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro on MMLU, GPQA, MATH, and HumanEval, achieving the highest scores.
In audio ASR (automatic speech recognition) performance, GPT-4o significantly improves on Whisper-v3 across all languages, especially lower-resource ones.
In audio translation, GPT-4o also sets a new benchmark, outperforming Whisper-v3 as well as Meta's and Google's speech models on the MLS benchmark.
Reviews are mixed, and some netizens believe the pressure is now on Siri
Although he did not appear in Monday's heavyweight livestream, OpenAI CEO Sam Altman offered a notable summary of the presentation. Altman said that OpenAI offers the world's best model for free in ChatGPT, and that the new voice and video modes are the best computing interface he has ever used. It feels like the AI in the movies, with human-like response speed and expressiveness.
GPT-4o's text and image capabilities are now rolling out for free in ChatGPT, with Plus users getting 5x the usage limits. In the coming weeks, OpenAI will launch a new version of Voice Mode powered by GPT-4o in ChatGPT Plus.
On the social media platform X (formerly Twitter), netizens' reviews of GPT-4o are mixed.
NVIDIA scientist Jim Fan commented, "From a technical perspective, OpenAI has found a way to map audio directly to audio as a first-class modality and to stream video to the transformer in real time. These require some new research on tokenization and architecture, but overall it is a data and system optimization problem (as is often the case)."
Regarding the new model and UI updates OpenAI launched, some netizens said they feel OpenAI is no longer as innovative as before.
Some netizens also pointed out that GPT-4o can not only transcribe speech into text, but also understand and label other features of audio, such as breathing and emotion, though it is unclear how these are expressed in the model's responses.
Most netizens, however, gave very positive reviews.
Altman also posted a single word on X, "her," seemingly implying that ChatGPT has realized the AI of the classic movie "Her." One netizen replied, "You finally did it," attaching a meme that pasted OpenAI onto a still of the AI from "Her."
Another netizen commented, "This is insane. OpenAI just launched GPT-4o, which will completely change the race for AI assistants," and listed 10 "crazy" GPT-4o use cases, such as real-time visual assistance.
Another netizen, commenting on the demo in which Khan Academy founder Sal Khan and his son used GPT-4o to tutor a math problem, said: "Students share their iPad screen with the new GPT-4o-powered ChatGPT, and the AI talks with them and helps them learn in real time. Imagine if every student in the world could learn like this; the future is so bright."
Some netizens also felt that OpenAI has further widened its gap with Apple, posting a sweating-face GIF and claiming that this must be how Apple's voice assistant Siri feels right now.
On this point, Quartz reported that GPT-4o's emotional attributes make the AI chatbot more personable than Apple's Siri. Siri feels like talking to a robot, but OpenAI's demonstration makes clear that GPT-4o has "artificial emotional intelligence": it can recognize a user's emotions and respond in kind. This makes GPT-4o feel like a true companion, adding a human touch to the user's smartphone operating system.
In fact, facing this technological threat, Apple is also in talks to collaborate with OpenAI. Wedbush analyst Dan Ives predicted in a report that Apple will announce a partnership with OpenAI and launch an AI chatbot based on an Apple LLM at the WWDC conference on June 10.
LogoMoney.com is an information publishing platform and provides information storage services only.
Disclaimer: The views in this article are solely the author's, do not represent the position of LogoMoney.com, and do not constitute advice; please treat them with caution.