Google Ups Its AI Game With Project Astra, AI Overviews and Gemini Updates – CNET

At its Google I/O developer event on Tuesday, Google showed off advances in its artificial intelligence lineup, including a search feature called AI Overviews and an initiative called Project Astra, along with updates to its Gemini chatbot.

The company also introduced Gemini Live, a conversation-driven feature, and Imagen 3, the latest version of its image generation model.


The news comes just one day after ChatGPT maker OpenAI announced its latest flagship model, GPT-4o, and a few weeks before Apple’s own developer event, WWDC, where AI is expected to dominate as well. The field of generative AI has exploded in the last year and a half, since ChatGPT’s debut, with offerings ranging from Google’s Gemini (formerly Bard), Microsoft’s Copilot and Adobe Firefly to entries from startups including Perplexity and Anthropic, maker of the Claude chatbot.

Gemini updates

Google is bringing its Gemini 1.5 Pro model, with a 1 million-token context window, to Gemini Advanced users in 35 languages.

That means you can, for instance, ask Gemini to summarize all recent emails from your child’s school and it can identify relevant messages and analyze attachments such as PDFs to provide a summary of key points. Or you can ask Gemini to look at a lease for a rental property and tell you if you can have pets.

Gemini Advanced subscribers will have access to Gemini 1.5 Pro as of today.


Google plans to expand the context window to 2 million tokens for developers and Gemini Advanced subscribers later this year, said Sissie Hsiao, vice president at Google and general manager for Gemini experiences and Google Assistant.

“We are making progress towards our ultimate goal of an infinite context,” added Sundar Pichai, CEO of Google.

Developers told Google they wanted a model that was faster and more cost-effective than Gemini 1.5 Pro, so Google has added Gemini 1.5 Flash. Demis Hassabis, CEO of Google’s AI research arm, DeepMind, said Flash features the multimodal reasoning capabilities and long context of Pro but is designed for speed and efficiency — it’s “optimized for tasks where low latency and cost matter most.”

Gemini 1.5 Flash is available in public preview in Google AI Studio and Vertex AI today.

AI Overviews

Starting this week, Google will roll out a new search experience in the US with what it calls AI Overviews. The goal is to “take the work out of searching,” said Liz Reid, vice president of search at Google.

With the help of a custom Gemini model designed specifically for search, Google wants to take on some of that legwork for its users. “It is a way for Google to do the searching for you,” Reid said.

Instead of requiring you to ask multiple questions about a topic, like finding a nearby yoga studio, Gemini's multistep reasoning lets Google do more advanced research on the user's behalf — taking into consideration factors like location, hours and offers — so you can get the information you're looking for faster.

Or let’s say you want to make a reservation for an anniversary dinner in Dallas, but you’ve never been to Texas before. This new AI functionality allows Google to “do the brainstorming with you” via an AI-organized search results page, Reid said.

Google uses generative AI to organize the results based on the topic itself and what the user might find interesting.

“We’re really focused on putting AI Overviews where they add value to the user,” she said. “When search works really well today, great, we’ll keep it as is. And we’ll add Overviews when it unlocks new queries for you.”

Gemini is also bringing multimodal understanding of video to help search evolve even further beyond text. This will allow you to share a video of, say, a broken record player and ask how to fix it.



Project Astra

Google has big plans for not only AI assistants, but also AI agents, or “intelligent systems that show reasoning, planning and memory,” Pichai said.

“We’ve always wanted to build a universal agent that will be useful in everyday life,” Hassabis added. “That’s why we made Gemini multimodal from the very beginning.”

This agent would be able to see and hear what we do, and understand the context we’re in to respond to us in conversation. It’s what Google calls Project Astra.

This is possible in part because of Gemini’s long context window, which allows the agent to remember a lot, while multimodality allows it to not only answer questions, but also interact with files on your computer or access your calendar.

It’s still in the prototype stage, but Google shared a video of a woman walking around a London office with the camera on her phone displaying her surroundings, so she could ask the agent questions. These agents will be able to act on your behalf to perform actions like returning a pair of shoes that don’t fit or learning about a new city prior to a move.

“It’s early days, but we are prototyping these experiences,” Pichai said.

Google is going to bring the video understanding capability from Project Astra to Gemini Live later this year, Hsiao said. Google I/O attendees will be able to try out the technology, but it's not clear how it'll ultimately be made available. Google Lens is one possibility.

“The goal is to make Astra seamlessly available across our products, but we’ll be obviously gated by quality, latency, etc.,” Pichai said.

Gemini Live

To further its goal of making Gemini a personal AI assistant that can solve complex problems while also feeling natural and conversational, Google is launching Gemini Live, which allows you to have a conversation with Gemini using your voice.

“We think that this two-way dialogue can make diving into a topic much better — rehearsing for an important event or brainstorming ideas feel very natural,” Hsiao said.

These responses are custom-tuned to be intuitive and let you have an actual back-and-forth conversation with the model. It's meant to provide information more succinctly and answer more conversationally than if you were interacting in text alone.

“Think of Live as just a window into Project Astra,” said Oriol Vinyals, vice president of research at Google DeepMind. “There might be others, but that is the one that we feel, obviously is closest.”

It should be available later this year.

Multimodal generation

Google also introduced Imagen 3, the latest version of its image generation model. Signups for access open today, and it’ll be coming soon to developers and enterprise customers in Vertex AI.

Google also announced a generative video model called Veo, which creates videos from text, image and video prompts.

“It can capture the details of your instructions in different visual and cinematic styles,” Hassabis said. “And you can prompt for things like aerial shots of a landscape or time lapse, and further edit your generated videos using additional prompts.”

These features will be available via waitlist to select creators “over the coming weeks.”

In partnership with YouTube, Google has been building a music AI sandbox, which includes AI tools for music generation.

Editors’ note: CNET used an AI engine to help create several dozen stories, which are labeled accordingly. The note you’re reading is attached to articles that deal substantively with the topic of AI but are created entirely by our expert editors and writers. For more, see our AI policy.

Lisa Lacy
