Google Gemini API integration for building AI-powered applications. Use when working with Google's Gemini API, Python SDK (google-genai), TypeScript SDK (@google/genai), multimodal inputs (image, video, audio, PDF), thinking/reasoning features, streaming responses, structured outputs with JSON schemas, multi-turn chat, system instructions, image generation (Nano Banana), video generation (Veo), music generation (Lyria), embeddings, document/PDF processing, or any Gemini API integration task. Triggers on mentions of Gemini, Gemini 3, Gemini 2.5, Google AI, Nano Banana, Veo, Lyria, google-genai, or @google/genai SDK usage.
apm install @diskd-ai/gemini-api---
name: gemini-api
description: Google Gemini API integration for building AI-powered applications. Use when working with Google's Gemini API, Python SDK (google-genai), TypeScript SDK (@google/genai), multimodal inputs (image, video, audio, PDF), thinking/reasoning features, streaming responses, structured outputs with JSON schemas, multi-turn chat, system instructions, image generation (Nano Banana), video generation (Veo), music generation (Lyria), embeddings, document/PDF processing, or any Gemini API integration task. Triggers on mentions of Gemini, Gemini 3, Gemini 2.5, Google AI, Nano Banana, Veo, Lyria, google-genai, or @google/genai SDK usage.
---
# Gemini API
Generate text from text, images, video, and audio using Google's Gemini API.
## Models
| Model | Code | I/O | Context | Thinking |
|-------|------|-----|---------|----------|
| **Gemini 3 Pro** | `gemini-3-pro-preview` | Text/Image/Video/Audio/PDF -> Text | 1M/64K | Yes |
| **Gemini 3 Flash** | `gemini-3-flash-preview` | Text/Image/Video/Audio/PDF -> Text | 1M/64K | Yes |
| **Gemini 2.5 Pro** | `gemini-2.5-pro` | Text/Image/Video/Audio/PDF -> Text | 1M/65K | Yes |
| **Gemini 2.5 Flash** | `gemini-2.5-flash` | Text/Image/Video/Audio -> Text | 1M/65K | Yes |
| **Nano Banana** | `gemini-2.5-flash-image` | Text/Image -> Image | - | No |
| **Nano Banana Pro** | `gemini-3-pro-image-preview` | Text/Image -> Image (up to 4K) | 65K/32K | Yes |
| **Veo 3.1** | `veo-3.1-generate-preview` | Text/Image/Video -> Video+Audio | - | - |
| **Veo 3** | `veo-3-generate-preview` | Text/Image -> Video+Audio | - | - |
| **Veo 2** | `veo-2.0-generate-001` | Text/Image -> Video (silent) | - | - |
| **Lyria RealTime** | `lyria-realtime-exp` | Text -> Music (streaming) | - | - |
| **Embeddings** | `gemini-embedding-001` | Text -> Embeddings | 2K | No |
**Free Tier**: Flash models only (no free tier for `gemini-3-pro-preview` in API). **Default Temperature**: 1.0 (do not change for Gemini 3).
**Pricing (per 1M tokens)**:
- Gemini 3 Pro: $2/$12 (<200k), $4/$18 (>200k)
- Gemini 3 Flash: $0.50/$3
- Nano Banana Pro: $2 (text) / $0.134 (image)
---
## Basic Text Generation
### Python
```python
from google import genai
client = genai.Client()
response = client.models.generate_content(
model="gemini-3-flash-preview",
contents="How does AI work?"
)
print(response.text)
```
### JavaScript
```javascript
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({});
const response = await ai.models.generateContent({
model: "gemini-3-flash-preview",
contents: "How does AI work?",
});
console.log(response.text);
```
### REST
```bash
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash-preview:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H 'Content-Type: application/json' \
-d '{"contents": [{"parts": [{"text": "How does AI work?"}]}]}'
```
---
## System Instructions
```python
response = client.models.generate_content(
model="gemini-3-flash-preview",
config=types.GenerateContentConfig(
system_instruction="You are a helpful assistant."
),
contents="Hello"
)
```
```javascript
const response = await ai.models.generateContent({
model: "gemini-3-flash-preview",
contents: "Hello",
config: { systemInstruction: "You are a helpful assistant." },
});
```
---
## Streaming
```python
for chunk in client.models.generate_content_stream(
model="gemini-3-flash-preview",
contents="Tell me a story"
):
print(chunk.text, end="")
```
```javascript
const response = await ai.models.generateContentStream({
model: "gemini-3-flash-preview",
contents: "Tell me a story",
});
for await (const chunk of response) {
console.log(chunk.text);
}
```
---
## Multi-turn Chat
```python
chat = client.chats.create(model="gemini-3-flash-preview")
response = chat.send_message("I have 2 dogs.")
print(response.text)
response = chat.send_message("How many paws total?")
print(response.text)
```
```javascript
const chat = ai.chats.create({ model: "gemini-3-flash-preview" });
const response = await chat.sendMessage({ message: "I have 2 dogs." });
console.log(response.text);
```
---
## Multimodal (Image)
```python
from PIL import Image
image = Image.open("/path/to/image.png")
response = client.models.generate_content(
model="gemini-3-flash-preview",
contents=[image, "Describe this image"]
)
```
```javascript
const image = await ai.files.upload({ file: "/path/to/image.png" });
const response = await ai.models.generateContent({
model: "gemini-3-flash-preview",
contents: [
createUserContent([
"Describe this image",
createPartFromUri(image.uri, image.mimeType),
]),
],
});
```
---
## Document Processing (PDF)
Process PDFs with native vision understanding (up to 1000 pages).
```python
from google.genai import types
import pathlib
filepath = pathlib.Path('document.pdf')
response = client.models.generate_content(
model="gemini-3-flash-preview",
contents=[
types.Part.from_bytes(data=filepath.read_bytes(), mime_type='application/pdf'),
"Summarize this document"
]
)
```
```javascript
import * as fs from 'fs';
const response = await ai.models.generateContent({
model: "gemini-3-flash-preview",
contents: [
{ text: "Summarize this document" },
{
inlineData: {
mimeType: 'application/pdf',
data: Buffer.from(fs.readFileSync("document.pdf")).toString("base64")
}
}
]
});
```
**For large PDFs**, use Files API (stored 48 hours):
```python
uploaded_file = client.files.upload(file=pathlib.Path('large.pdf'))
response = client.models.generate_content(
model="gemini-3-flash-preview",
contents=[uploaded_file, "Summarize this document"]
)
```
See [references/documents.md](references/documents.md) for Files API, multiple PDFs, and best practices.
---
## Image Generation (Nano Banana)
Generate and edit images conversationally.
```python
response = client.models.generate_content(
model="gemini-2.5-flash-image",
contents="Create a picture of a sunset over mountains",
)
for part in response.parts:
if part.inline_data is not None:
part.as_image().save("generated.png")
```
```javascript
const response = await ai.models.generateContent({
model: "gemini-2.5-flash-image",
contents: "Create a picture of a sunset over mountains",
});
for (const part of response.candidates[0].content.parts) {
if (part.inlineData) {
const buffer = Buffer.from(part.inlineData.data, "base64");
fs.writeFileSync("generated.png", buffer);
}
}
```
**Nano Banana Pro** (`gemini-3-pro-image-preview`): 4K output, Google Search grounding, up to 14 reference images, conversational editing with thought signatures.
See [references/image-generation.md](references/image-generation.md) for editing, multi-turn, and advanced features.
See [references/gemini-3.md](references/gemini-3.md#nano-banana-pro-image-generation) for Gemini 3 image capabilities.
---
## Video Generation (Veo)
Generate 8-second 720p, 1080p, or 4K videos with native audio using Veo.
```python
import time
from google import genai
client = genai.Client()
operation = client.models.generate_videos(
model="veo-3.1-generate-preview",
prompt="A cinematic shot of a majestic lion in the savannah at golden hour",
)
# Poll until complete (video generation is async)
while not operation.done:
time.sleep(10)
operation = client.operations.get(operation)
# Download the video
video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("lion.mp4")
```
```javascript
let operation = await ai.models.generateVideos({
model: "veo-3.1-generate-preview",
prompt: "A cinematic shot of a majestic lion in the savannah at golden hour",
});
while (!operation.done) {
await new Promise(resolve => setTimeout(resolve, 10000));
operation = await ai.operations.getVideosOperation({ operation });
}
ai.files.download({
file: operation.response.generatedVideos[0].video,
downloadPath: "lion.mp4",
});
```
**Veo 3.1 features**: Portrait (9:16), video extension (up to 148s), 4K resolution, native audio with dialogue/SFX.
See [references/veo.md](references/veo.md) for image-to-video, reference images, video extension, and prompting guide.
---
## Music Generation (Lyria RealTime)
Generate continuous instrumental music in real-time with dynamic steering.
```python
import asyncio
from google import genai
from google.genai import types
client = genai.Client()
async def main():
async with client.aio.live.music.connect(model='models/lyria-realtime-exp') as session:
# Set prompts and config
await session.set_weighted_prompts(
prompts=[types.WeightedPrompt(text='minimal techno', weight=1.0)]
)
await session.set_music_generation_config(
config=types.LiveMusicGenerationConfig(bpm=90, temperature=1.0)
)
# Start streaming
await session.play()
# Receive audio chunks
async for message in session.receive():
if message.server_content and message.server_content.audio_chunks:
audio_data = message.server_content.audio_chunks[0].data
# Process audio...
asyncio.run(main())
```
```javascript
const session = await ai.live.music.connect({
model: "models/lyria-realtime-exp",
callbacks: {
onmessage: (message) => {
if (message.serverContent?.audioChunks) {
for (const chunk of message.serverContent.audioChunks) {
const audioBuffer = Buffer.from(chunk.data, "base64");
// Process audio...
}
}
},
},
});
await session.setWeightedPrompts({
weightedPrompts: [{ text: "minimal techno", weight: 1.0 }],
});
await session.setMusicGenerationConfig({
musicGenerationConfig: { bpm: 90, temperature: 1.0 },
});
await session.play();
```
**Output**: 48kHz stereo 16-bit PCM. **Instrumental only**. Configurable BPM, scale, density, brightness.
See [references/lyria.md](references/lyria.md) for steering music, configuration, and prompting guide.
---
## Embeddings
Generate text embeddings for semantic similarity, search, and classification.
```python
result = client.models.embed_content(
model="gemini-embedding-001",
contents="What is the meaning of life?"
)
print(result.embeddings)
```
```javascript
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
contents: 'What is the meaning of life?',
});
console.log(response.embeddings);
```
**Task types**: `SEMANTIC_SIMILARITY`, `CLASSIFICATION`, `CLUSTERING`, `RETRIEVAL_DOCUMENT`, `RETRIEVAL_QUERY`
**Output dimensions**: 768, 1536, 3072 (default)
See [references/embeddings.md](references/embeddings.md) for batch processing, task types, and normalization.
---
## Thinking (Gemini 3)
Control reasoning depth with `thinking_level`: `minimal` (Flash only), `low`, `medium` (Flash only), `high` (default).
```python
from google.genai import types
response = client.models.generate_content(
model="gemini-3-flash-preview",
contents="Solve this math problem...",
config=types.GenerateContentConfig(
thinking_config=types.ThinkingConfig(thinking_level="high")
),
)
```
```javascript
import { ThinkingLevel } from "@google/genai";
const response = await ai.models.generateContent({
model: "gemini-3-flash-preview",
contents: "Solve this math problem...",
config: { thinkingConfig: { thinkingLevel: ThinkingLevel.HIGH } },
});
```
**Note**: Cannot mix `thinking_level` with legacy `thinking_budget` (returns 400 error).
For Gemini 2.5, use `thinking_budget` (0-32768) instead. See [references/thinking.md](references/thinking.md).
For complete Gemini 3 features (thought signatures, media resolution, etc.), see [references/gemini-3.md](references/gemini-3.md).
---
## Structured Outputs
Generate JSON responses adhering to a schema.
```python
from pydantic import BaseModel
from typing import List
class Recipe(BaseModel):
name: str
ingredients: List[str]
response = client.models.generate_content(
model="gemini-3-flash-preview",
contents="Extract: chocolate chip cookies need flour, sugar, chips",
config={
"response_mime_type": "application/json",
"response_json_schema": Recipe.model_json_schema(),
},
)
recipe = Recipe.model_validate_json(response.text)
```
```javascript
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";
const recipeSchema = z.object({
name: z.string(),
ingredients: z.array(z.string()),
});
const response = await ai.models.generateContent({
model: "gemini-3-flash-preview",
contents: "Extract: chocolate chip cookies need flour, sugar, chips",
config: {
responseMimeType: "application/json",
responseJsonSchema: zodToJsonSchema(recipeSchema),
},
});
```
See [references/structured-outputs.md](references/structured-outputs.md) for advanced patterns.
---
## Built-in Tools (Gemini 3)
**Available**: Google Search, File Search, Code Execution, URL Context, Function Calling
**Not supported**: Google Maps grounding, Computer Use (use Gemini 2.5 for these)
```python
response = client.models.generate_content(
model="gemini-3-pro-preview",
contents="What's the latest news on AI?",
config={"tools": [{"google_search": {}}]},
)
```
```javascript
const response = await ai.models.generateContent({
model: "gemini-3-pro-preview",
contents: "What's the latest news on AI?",
config: { tools: [{ googleSearch: {} }] },
});
```
**Structured outputs + tools**: Gemini 3 supports combining JSON schemas with built-in tools (Google Search, URL Context, Code Execution). See [references/gemini-3.md](references/gemini-3.md#structured-outputs-with-built-in-tools).
See [references/tools.md](references/tools.md) for all tool patterns.
---
## Function Calling
Connect models to external tools and APIs. The model determines when to call functions and provides parameters.
```python
from google.genai import types
# Define function
get_weather = {
"name": "get_weather",
"description": "Get weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string", "description": "City name"},
},
"required": ["location"],
},
}
response = client.models.generate_content(
model="gemini-3-flash-preview",
contents="What's the weather in Tokyo?",
config=types.GenerateContentConfig(
tools=[types.Tool(function_declarations=[get_weather])]
),
)
# Check for function call
if response.function_calls:
fc = response.function_calls[0]
print(f"Call {fc.name} with {fc.args}")
```
```javascript
const response = await ai.models.generateContent({
model: "gemini-3-flash-preview",
contents: "What's the weather in Tokyo?",
config: {
tools: [{ functionDeclarations: [getWeather] }],
},
});
if (response.functionCalls) {
const { name, args } = response.functionCalls[0];
// Execute function and send result back
}
```
**Automatic function calling (Python)**: Pass functions directly as tools for automatic execution.
See [references/function-calling.md](references/function-calling.md) for execution modes, compositional calling, multimodal responses, MCP integration, and best practices.
---
## Quick Reference
| Feature | Python | JavaScript |
|---------|--------|------------|
| Generate | `generate_content()` | `generateContent()` |
| Stream | `generate_content_stream()` | `generateContentStream()` |
| Chat | `chats.create()` | `chats.create()` |
| Structured | `response_json_schema=` | `responseJsonSchema:` |
| Image Gen | `gemini-2.5-flash-image` | `gemini-2.5-flash-image` |
| Video Gen | `generate_videos()` | `generateVideos()` |
| Music Gen | `live.music.connect()` | `live.music.connect()` |
| Function Call | `function_declarations` | `functionDeclarations` |
| Embeddings | `embed_content()` | `embedContent()` |
| Files API | `files.upload()` | `files.upload()` |
---
## Gemini 3 Specific Features
For advanced Gemini 3 features, see [references/gemini-3.md](references/gemini-3.md):
- **Thinking levels**: Control reasoning depth (`minimal`, `low`, `medium`, `high`)
- **Media resolution**: Fine-grained multimodal processing (`media_resolution_low` to `ultra_high`)
- **Thought signatures**: Required for function calling and image editing context
- **Structured outputs + tools**: Combine JSON schemas with Google Search, URL Context
- **Multimodal function responses**: Return images in tool responses
---
## Resources
- [Gemini 3 Guide](https://ai.google.dev/gemini-api/docs/gemini-3)
- [Models Overview](https://ai.google.dev/gemini-api/docs/models)
- [Thinking Guide](https://ai.google.dev/gemini-api/docs/thinking)
- [Thought Signatures](https://ai.google.dev/gemini-api/docs/thought-signatures)
- [Structured Outputs](https://ai.google.dev/gemini-api/docs/structured-output)
- [Image Generation](https://ai.google.dev/gemini-api/docs/image-generation)
- [Video Generation (Veo)](https://ai.google.dev/gemini-api/docs/video)
- [Music Generation (Lyria)](https://ai.google.dev/gemini-api/docs/music-generation)
- [Function Calling](https://ai.google.dev/gemini-api/docs/function-calling)
- [Document Processing](https://ai.google.dev/gemini-api/docs/document-processing)
- [Embeddings](https://ai.google.dev/gemini-api/docs/embeddings)
- [Google AI Studio](https://aistudio.google.com)
- [Gemini 3 Pro in AI Studio](https://aistudio.google.com?model=gemini-3-pro-preview)
- [Gemini 3 Flash in AI Studio](https://aistudio.google.com?model=gemini-3-flash-preview)
- [Nano Banana Pro in AI Studio](https://aistudio.google.com?model=gemini-3-pro-image-preview)
- [Veo Studio](https://aistudio.google.com/apps/bundled/veo_studio)