Gemini 1.5 Pro now offers a 2 million token context window for devs

Following education and Workspace news this week, Google has a number of Gemini announcements for developers, including a 2 million token context window for 1.5 Pro.

At I/O 2024, Google announced a 2 million token context window for Gemini 1.5 Pro. It can process 2 hours of video, 22 hours of audio, 60,000+ lines of code, and over 1.4 million words. (Gemini Advanced with 1.5 Pro offers half that today.) After a private preview, all developers can now take advantage of it.

Processing just six minutes of video requires over 100,000 tokens and large code bases can exceed 1 million tokens — so whether the use case involves finding bugs across countless lines of code, locating the right information across libraries of research, or analyzing hours of audio or video, Gemini 1.5 Pro’s expanded context window is helping organizations break new ground.
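As a rough sanity check, the quoted figures line up: at roughly 100,000 tokens per six minutes of video, two hours of footage works out to about 2 million tokens. The sketch below does only that arithmetic. The function names are illustrative, and the per-minute rate treats Google's "over 100,000 tokens" as a flat figure, which is an assumption; actual token counts depend on the content.

```python
# Back-of-the-envelope estimate using only figures quoted in the article.
# Assumption: "over 100,000 tokens" per six minutes is treated as exactly 100,000.
TOKENS_PER_SIX_MINUTES_VIDEO = 100_000
CONTEXT_WINDOW_TOKENS = 2_000_000  # Gemini 1.5 Pro's expanded window


def video_tokens(minutes: float) -> float:
    """Estimate the tokens needed for a video of the given length."""
    return minutes * TOKENS_PER_SIX_MINUTES_VIDEO / 6


def max_video_minutes(context_tokens: int) -> float:
    """Estimate how many minutes of video fit in a context window."""
    return context_tokens * 6 / TOKENS_PER_SIX_MINUTES_VIDEO


if __name__ == "__main__":
    print(video_tokens(120))                         # tokens for 2 hours of video
    print(max_video_minutes(CONTEXT_WINDOW_TOKENS))  # minutes that fit in 2M tokens
```

Under these assumptions, two hours of video fills the whole window, which matches the "2 hours of video" figure Google cites.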

Gemini 1.5 Pro is already being used by a fast food retailer, a financial institution, an insurer, and even a “sports company” that analyzes a player’s swing.

Additionally, Gemini 1.5 Flash is entering general availability. It features a 1 million token context window, low latency, and “competitive pricing.” Ideal use cases include retail chat agents, document processing, and “research agents that can synthesize entire repositories.”

In today’s announcement, Google explicitly compares 1.5 Flash to GPT-3.5 Turbo:

1 million token context window, which is approximately 60x bigger than the context window provided by GPT-3.5 Turbo
On average, 40% faster than GPT-3.5 Turbo when given input of 10,000 characters
Up to 4X lower input price than GPT-3.5 Turbo, with context caching enabled for inputs larger than 32,000 characters
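To put the "60x" claim in concrete terms, the short sketch below backs out the baseline context window it implies. The baseline is derived from Google's ratio, not quoted from the announcement or from OpenAI, and the function name is illustrative.

```python
# Derive the baseline context window implied by Google's "approximately 60x" claim.
# Assumption: the ratio is taken as exactly 60; Google only says "approximately".
GEMINI_FLASH_CONTEXT = 1_000_000
CLAIMED_RATIO = 60


def implied_baseline_context(context_tokens: int, ratio: float) -> float:
    """Return the context window size that a 'ratio x bigger' claim implies."""
    return context_tokens / ratio


if __name__ == "__main__":
    implied = implied_baseline_context(GEMINI_FLASH_CONTEXT, CLAIMED_RATIO)
    print(round(implied))  # roughly 16.7K tokens
```

In other words, the comparison appears to be against a GPT-3.5 Turbo context window of around 16,000 to 17,000 tokens.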

Gemma 2, Google’s open model, is now available globally in 9 billion and 27 billion parameter sizes.

Meanwhile, Imagen 3 is launching in preview (for Vertex AI customers with early access). Compared to Imagen 2, it offers:

“over 40% faster generation for rapid prototyping and iteration”
“better prompt understanding and instruction-following”
“photo-realistic generations of groups of people”
“greater control over text rendering within an image”

This is the prompt for the image below: “a photorealistic image of a woman’s hand reaching up to touch a dandelion seed head, a field of dandelions stretching to the horizon, with the phrase ‘Sometimes letting go is the bravest act’ written in delicate cursive above the hand.”