OpenAI held its annual DevDay conference yesterday, where it announced its Realtime API, as well as features like prompt caching, vision fine-tuning, and model distillation.
The Realtime API is designed for building low-latency, multimodal experiences, and it’s now available as a public beta.
The company shared examples of companies already using the Realtime API, including fitness coaching app Healthify, which uses it to enable more natural conversations with its AI coach, and language learning app Speak, which uses it to let customers practice conversations in the language they are learning.
The API supports the six preset voices in ChatGPT’s Advanced Voice Mode, according to OpenAI.
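As a rough illustration of how the Realtime API is used, the sketch below opens a WebSocket session and asks for a text response. The endpoint, beta header, and event names (response.create, response.text.delta, response.done) are assumptions drawn from the beta announcement and may change, so the exact schema should be checked against OpenAI's documentation.

```python
# Minimal sketch: open a Realtime API session over WebSockets and request a
# text response. Endpoint, headers, and event names are assumptions based on
# the public beta; consult OpenAI's Realtime API docs for exact details.
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main():
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Ask the model to generate a response for the current conversation.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"modalities": ["text"], "instructions": "Say hello."},
        }))
        # Stream server events until the response is complete.
        async for message in ws:
            event = json.loads(message)
            if event.get("type") == "response.text.delta":
                print(event["delta"], end="", flush=True)
            elif event.get("type") == "response.done":
                break

asyncio.run(main())
```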
Audio input and output have also been added to the Chat Completions API to support voice in use cases that don’t require the low-latency benefits of the Realtime API. This enables developers to pass text or audio into GPT-4o and have it respond with text, audio, or both.
According to the company, the Realtime API and the addition of audio to the Chat Completions API will enable developers to build natural conversational experiences using a single API call, rather than needing to combine multiple models to build those experiences.
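Here is a minimal sketch of what that single call might look like with the Python SDK, assuming the audio-capable model is exposed as gpt-4o-audio-preview and that the modalities and audio parameters behave as described in the announcement:

```python
# Minimal sketch of one Chat Completions call that returns both text and audio.
# The model name "gpt-4o-audio-preview" and the modalities/audio parameters are
# assumptions to verify against the current API reference.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[{"role": "user", "content": "Give me a one-sentence pep talk."}],
)

message = completion.choices[0].message
print(message.audio.transcript)          # text transcript of the spoken reply
with open("pep_talk.wav", "wb") as f:    # decoded audio bytes
    f.write(base64.b64decode(message.audio.data))
```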
In the future, OpenAI plans to add new modalities such as vision and video, as well as increased rate limits, official SDK support, prompt caching, and expanded model support.
Speaking of prompt caching, that was another feature announced during DevDay. Prompt caching allows developers to reuse recently processed input tokens to save money and have their prompts processed faster. Cached input tokens cost 50% less than uncached ones, and this functionality is now enabled by default in the latest versions of GPT-4o, GPT-4o mini, o1-preview, and o1-mini, as well as fine-tuned versions of those models.
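Because caching applies to repeated prompt prefixes, the main thing developers need to do is keep the static portion of a prompt (system instructions, few-shot examples) at the front and the variable portion at the end. A minimal sketch is below; the cached-token counter surfaced under usage.prompt_tokens_details is an assumption to verify against the API reference.

```python
# Minimal sketch of taking advantage of prompt caching: keep the long, static
# part of the prompt at the front so repeated calls share the same prefix.
# Reading cached tokens from usage.prompt_tokens_details is an assumption.
from openai import OpenAI

client = OpenAI()

STATIC_SYSTEM_PROMPT = "You are a support assistant for Acme Corp. ..."  # long, unchanging prefix

def answer(question: str) -> str:
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": STATIC_SYSTEM_PROMPT},  # identical every call, so cacheable
            {"role": "user", "content": question},                # variable part goes last
        ],
    )
    usage = completion.usage
    details = getattr(usage, "prompt_tokens_details", None)
    cached = getattr(details, "cached_tokens", 0) or 0
    print(f"prompt tokens: {usage.prompt_tokens}, served from cache: {cached}")
    return completion.choices[0].message.content
```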
Next, OpenAI announced fine-tuning for vision in GPT-4o, allowing users to customize the model for stronger image understanding. This can then be used for scenarios like advanced visual search, improved object detection for autonomous vehicles, or more accurate medical image analysis.
Through the end of the month, the company will be offering 1 million free training tokens per day for fine-tuning GPT-4o with images.
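Training data for vision fine-tuning is expected to follow the same JSONL chat format used for text fine-tuning, with image parts embedded in the user message. The sketch below shows one plausible training example; the exact schema, image URL, and labels are placeholders to check against OpenAI's fine-tuning guide.

```python
# Minimal sketch of preparing one vision fine-tuning example. The JSONL chat
# format with image_url content parts mirrors the Chat Completions message
# format; the URL and labels here are placeholders, not real data.
import json

example = {
    "messages": [
        {"role": "system", "content": "You identify traffic signs in photos."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What sign is shown in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/sign_001.jpg"}},
            ],
        },
        {"role": "assistant", "content": "A yield sign."},
    ]
}

# Each line of the training file is one JSON-encoded example.
with open("vision_training.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```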
And finally, OpenAI announced Model Distillation, which allows developers to use the outputs of more capable models to fine-tune smaller, more cost-efficient models. For example, it would enable GPT-4o or o1-preview outputs to be used to improve GPT-4o mini.
Its Model Distillation suite includes the ability to capture and store input-output pairs generated by a model, the ability to create and run evaluations, and integration with OpenAI’s fine-tuning capabilities.
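A plausible first step in that workflow is capturing the teacher model's outputs as stored completions, sketched below with the Python SDK. The store and metadata parameters reflect the capture mechanism OpenAI described, but the metadata keys here are made-up labels for illustration.

```python
# Minimal sketch of the capture step in distillation: run the stronger model
# with store=True so the input-output pair is saved and can later be turned
# into a fine-tuning dataset for GPT-4o mini. Metadata keys are illustrative.
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o",                       # the stronger "teacher" model
    store=True,                           # persist this input-output pair
    metadata={"task": "summarize", "distillation_run": "v1"},
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}],
)
print(completion.choices[0].message.content)
```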
This feature can be used now on any of OpenAI’s models, and the company will be offering 2 million free training tokens per day on GPT-4o mini and 1 million free training tokens per day on GPT-4o through the end of the month to encourage people to try it out.