Google wrapped up the first day of I/O 2024 with a slew of new AI announcements centred on Gemini and how it will make everyday life smarter and more convenient. Gemini is about to become even more deeply rooted in Google’s products and services, including the Android operating system, Google Workspace apps and much more. Here’s a recap of all the announcements made at Google I/O 2024.

Google Unveils Ask Photos (Powered By Gemini)

Now, you can use the Ask Photos feature in the Google Photos app to find the pictures you need. Gemini’s multimodal capabilities can understand the context and subject of photos to pull out details. For instance, you can ask, “What themes have we had for Lena’s birthday parties?” Ask Photos will understand details, like the decorations in the background or on the birthday cake, to give you the answer.

Getting tasks done is also easier in Google Photos with the help of Gemini models. For example, after a long trip with friends or family, Ask Photos can help you put together a trip highlight. All you need to do is ask, and it’ll suggest the top pictures and even write a personalised caption to share on social media. Google also confirms that your personal data in Google Photos is never used for ads.

Google says Ask Photos is an experimental feature that it will start rolling out soon, with more capabilities to come.

Gemini 1.5 Flash, Updates to Gemini 1.5 Pro

Google introduced Gemini 1.5 Pro earlier this year but recognised that some applications need lower latency and a lower cost to serve. As a result, one of the announcements at Google I/O 2024 was Gemini 1.5 Flash, a lighter-weight model designed to be fast and efficient to serve at scale. Although it is lighter than 1.5 Pro, it is highly capable of multimodal reasoning across vast amounts of information and delivers impressive quality for its size.

Both 1.5 Pro and 1.5 Flash are available in public preview with a 1 million token context window in Google AI Studio and Vertex AI. And now, 1.5 Pro is also available with a 2 million token context window via a waitlist for developers using the API and for Google Cloud customers.

Aside from extending the context window to 2 million tokens, Google has also improved 1.5 Pro’s code generation, logical reasoning and planning, multi-turn conversation, and audio and image understanding through data and algorithmic advances.

Gemini Nano Goes Multimodal

Gemini Nano is expanding beyond text-only inputs to include images as well. Starting with Pixel, applications using Gemini Nano with Multimodality will be able to understand inputs not just through text, but also through vision, sound and spoken language.

Gemma 2 (Built With Same Technology Used For Gemini)

Gemma is Google’s family of open models, built from the same research and technology used to create the Gemini models. Gemma 2 is the next generation of these open models for responsible AI innovation: it has a new architecture designed for breakthrough performance and efficiency, and it will be available in new sizes.

Project Astra (A New AI Assistant Built on Gemini)

Google says it is building the future of AI assistants with Project Astra, an advanced, responsive agent that can see and talk. While still a prototype, Project Astra can process information faster by continuously encoding video frames, combining the video and speech input into a timeline of events, and caching this information for efficient recall.

Using its own leading speech models, Google has also enhanced how the agents sound, giving them a wider range of intonations. These agents can better understand the context they’re being used in and respond quickly in conversation.

Gemini Enters Google Workspace

Gemini has now been integrated into Workspace for faster, more efficient workflows. Starting today, Gemini in the side panel of Gmail, Docs, Drive, Slides and Sheets will use Gemini 1.5 Pro. With a longer context window and more advanced reasoning, Gemini can answer a wider variety of questions and provide more insightful responses. It’s also easy to get started, with summaries that appear in the side panel, suggested prompts and more. Gemini in the Workspace side panel is now available for Workspace Labs and Gemini for Workspace Alpha users, and it will be available next month on desktop for businesses and consumers through Gemini for Workspace add-ons and the Google One AI Premium plan.

Next up, the Gmail app is also getting a few Gemini-powered updates. With the “Summarize emails” button, Gemini can analyze an email thread and provide a summarized view directly in the Gmail app. Just tap the summarize button at the top of an email thread to get the highlights. This will be available to Workspace Labs users this month, and to all Gemini for Workspace customers and Google One AI Premium subscribers next month.

Gemini in Gmail will offer even more detailed and nuanced suggested replies based on context from your email thread, thanks to Contextual Smart Reply. You can now edit or simply send the reply as-is. This will be available to Workspace Labs users on mobile and web starting in July.

Finally, with the Gmail Q&A feature, Gemini in Gmail will offer helpful options like “summarize this email,” “list the next steps” or “suggest a reply.” And, similar to the side panel on desktop, you can use the open prompt box for more specific requests. Gmail Q&A will be available to Workspace Labs users on mobile and web starting in July.

Google is also adding language support for more Gemini for Workspace features. In the coming weeks, “Help me write” in Gmail and Docs will support Spanish and Portuguese on desktop, and the company will continue to add more languages over time.

Generative AI in Search (Backed by Gemini)

Last year, Google unveiled AI Overviews as an experiment, and the feature is now graduating from that phase and rolling out to the general public, beginning with the U.S., with more countries to follow soon. You’ll also soon be able to adjust an AI Overview with options to simplify the language or break it down in more detail.

You can also combine multiple questions into a single query. With the help of the custom Gemini model’s multi-step reasoning capabilities, AI Overviews will be able to answer these more complex questions in a single reply. These multi-step reasoning capabilities are coming soon to AI Overviews in Search Labs for English queries in the U.S.

Planning capabilities are also coming to Search, where you can get help creating plans for whatever you need, starting with meals and vacations. Meal and trip planning are available now in Search Labs in English in the U.S. Later this year, Google will add customisation capabilities and more categories like parties, date night and workouts.

In addition, when you’re looking for ideas, Search will now use generative AI to brainstorm with you and create an AI-organized results page that makes it easy to explore. You’ll see helpful results categorized under unique, AI-generated headlines, featuring a wide range of perspectives and content types. For English searches in the U.S., users will start to see this new AI-organized search results page when they look for inspiration — starting soon with dining and recipes, followed by movies, music, books, hotels, shopping and more.

Google Lens Now Supports Video Queries (Thanks To Gemini)

Until now, Google Lens has been limited to photos, but now you can also search with a video. Searching with video saves you the time and trouble of finding the right words to describe an issue, and you’ll get an AI Overview with steps and resources to troubleshoot it. Searching with video will be available soon for Search Labs users in English in the U.S., and Google will expand it to more regions over time.

Veo: The New Video Generation Model

Veo is Google’s latest video generation model, which creates videos from text prompts. It generates high-quality 1080p videos in a wide range of cinematic and visual styles, and they can run beyond a minute in length. You can give it detailed prompts, and it can fulfil them thanks to a deep understanding of natural language and visual semantics. Veo is available to select creators in a private preview in VideoFX, which you can request access to by joining a waitlist. In the future, Veo’s capabilities will also be integrated into YouTube Shorts and other products.

Imagen 3

Succeeding Imagen 2, Imagen 3 is Google’s highest-quality text-to-image model. It generates a high level of detail, producing photorealistic, lifelike images with far fewer distracting visual artefacts than the prior models. Imagen 3 better understands natural language and the intent behind your prompt, and it incorporates small details from longer prompts. This advanced understanding helps it master a range of styles. Imagen 3 also supports high-quality text rendering. Starting today, Imagen 3 is available to select creators in a private preview in ImageFX, and others can sign up by joining a waitlist. Imagen 3 is also coming soon to Vertex AI.

Updates to Gemini on Android

Google also announced at I/O 2024 how Gemini is powering new features on Android, including updates that let Circle to Search handle more complex problems. Circle to Search can now help students with homework directly from their phones and tablets, giving them a deeper understanding rather than just an answer.

When students circle a prompt they’re stuck on, they’ll get step-by-step instructions for solving a range of physics and math word problems without leaving their digital info sheet or syllabus. This capability is already rolling out. Later this year, Circle to Search will be able to help solve even more complex problems involving symbolic formulas, diagrams, graphs and more.

Soon, you’ll also be able to bring up Gemini’s overlay on top of the app you’re in to easily use Gemini in more ways. For example, you can drag and drop generated images into Gmail, Google Messages and other places, or tap “Ask this video” to find specific information in a YouTube video. If you have Gemini Advanced, you’ll also have the option to “Ask this PDF” to quickly get answers without having to scroll through multiple pages. This update will roll out to hundreds of millions of devices over the next few months.

Later this year, Gemini Nano’s multimodal capabilities that were mentioned above are also coming to TalkBack, helping people who experience blindness or low vision get richer and clearer descriptions of what’s happening in an image.

Google is also testing a new feature that uses Gemini Nano to provide real-time alerts during a call if it detects conversation patterns commonly associated with scams. It will be available as an opt-in feature later this year.

Even More Updates to Gemini

First, Gemini 1.5 Pro is coming to Gemini Advanced. It supports an expanded context window starting at 1 million tokens, the longest of any widely available consumer chatbot in the world. A context window this long means Gemini Advanced can make sense of multiple large documents totalling up to 1,500 pages, or summarize 100 emails. Soon it will be able to handle an hour of video content or codebases with more than 30,000 lines.

Users will also be able to upload files to Gemini Advanced via Google Drive or directly from their device. Google also announced Gemini Live, a new mobile conversational experience rolling out to Gemini Advanced subscribers that uses advanced speech technology to make speaking with Gemini more intuitive. With Gemini Live, you can talk to Gemini and choose from a variety of natural-sounding voices for its responses. You can even speak at your own pace or interrupt mid-response with clarifying questions, just as you would in any conversation.

Gemini will also be able to plan trip itineraries for you in the coming months. Additionally, for an even more personal experience, Gemini Advanced subscribers will soon be able to create Gems — customized versions of Gemini.

Google also announced a YouTube Music extension for Gemini that’s rolling out now, and the company will soon connect even more Google tools with Gemini, including Google Calendar, Tasks and Keep.

Updates To SynthID

Google debuted SynthID, its digital toolkit for watermarking AI-generated content, last year. Now, Google is expanding SynthID’s capabilities to watermark AI-generated text in the Gemini app and web experience, as well as video generated by Veo. SynthID for text is designed to complement most widely available AI text generation models and to be deployed at scale, while SynthID for video builds on Google’s image and audio watermarking method to include all frames in generated videos. This method embeds an imperceptible watermark without impacting the quality, accuracy, creativity or speed of the text or video generation process.

New Models in Vertex AI

Vertex AI is Google Cloud’s fully-managed, unified development platform for leveraging models at scale, with a selection of over 150 first-party, open, and third-party foundation models; for customizing models with enterprise-ready tuning, grounding, monitoring, and deployment capabilities; and for building AI agents.

In addition to the new Gemini 1.5 Flash, Imagen 3, Gemma 2 and Gemini 1.5 Pro models, Vertex AI has also added PaliGemma. Available in the Vertex AI Model Garden, PaliGemma is the first vision-language model in the Gemma family of open models and is well suited to tasks like image captioning and visual question answering.


Trillium: Google’s Sixth-Generation TPU

One of the Google I/O 2024 announcements was Trillium, Google’s sixth-generation Tensor Processing Unit (TPU). The company says it’s the most performant and most energy-efficient TPU to date. TPUs are AI-specific hardware that provide the compute power, memory and communication needed to train and fine-tune the most capable models and to serve them interactively to a global user population.

Trillium TPUs achieve a 4.7X increase in peak compute performance per chip compared to TPU v5e. They also double the High Bandwidth Memory (HBM) capacity and bandwidth, as well as the Interchip Interconnect (ICI) bandwidth, over TPU v5e. Additionally, Trillium is equipped with third-generation SparseCore, a specialized accelerator for processing the ultra-large embeddings common in advanced ranking and recommendation workloads.

VideoFX and Updates To ImageFX and MusicFX

Google Labs also introduced new tools to help people create videos, images, and music using AI. The new VideoFX lets you make videos from text descriptions. ImageFX has added editing controls and better-quality images, while MusicFX gets a new DJ mode for mixing beats. These tools are available in more countries and languages, and Google is working with creators to develop them responsibly.

As you may have noticed by now, Google I/O 2024 was all about entering the Gemini era. Most of the announcements had Google’s Gemini AI model at their core in some way, a sign of how much AI is set to transform our everyday experiences.


Di Luigi M