Google AI Studio API - Free Gemini API Service
Service Information
Provider: Google AI Studio
Service Type: API Service
API Endpoint: https://generativelanguage.googleapis.com
Free Tier: Free Forever (with usage limits)
Service Overview
The Google AI Studio API gives developers access to Gemini models through official SDKs and a REST API, and also exposes an OpenAI-compatible endpoint (partial compatibility), making it easy to integrate into existing applications.
Key Advantages:
- Generous Free Quota - multiple models are available for free; quota varies by model
- Fast Response - the Flash series is optimized for low latency
- OpenAI Compatible - supports the OpenAI API format (partial compatibility)
- Multimodal API - text, image, audio, and video support
- Long Context - up to 2M tokens (Pro series)
- Web Search - Google Search Grounding
Quick Start
Prerequisites
Required:
- API key created
For detailed steps, see: Google AI Studio Registration Guide
Get API Key
- Visit: https://aistudio.google.com
- Click “Get API key” in left menu
- Select or create a Google Cloud project
- Click “Create API key”
- Copy and save the key immediately
5-Minute Quick Example
Python Example:
```python
import google.generativeai as genai

# Configure API key
genai.configure(api_key="YOUR_API_KEY")

# Create model instance (using the latest stable model)
model = genai.GenerativeModel('gemini-2.5-flash')

# Send request
response = model.generate_content("Hello, please introduce yourself")
print(response.text)
```

cURL Example:
```bash
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": [{
      "parts": [{"text": "Hello, please introduce yourself"}]
    }]
  }'
```
Supported Models

Gemini 3 Series (Latest, Preview)
| Model Name | Model ID | Features | Free Tier |
|---|---|---|---|
| Gemini 3 Flash | gemini-3-flash-preview | Smartest + Fastest | Free |
| Gemini 3 Pro | gemini-3-pro-preview | Most Powerful | Paid |
Gemini 2.5 Series (Stable, Recommended)
| Model Name | Model ID | Features | Free Tier |
|---|---|---|---|
| Gemini 2.5 Flash | gemini-2.5-flash | Hybrid reasoning model | Free |
| Gemini 2.5 Pro | gemini-2.5-pro | Advanced multi-purpose | Free |
| Gemini 2.5 Flash-Lite | gemini-2.5-flash-lite | Lightweight and efficient | Free |
Gemini 2.0 Series
| Model Name | Model ID | Features | Free Tier |
|---|---|---|---|
| Gemini 2.0 Flash | gemini-2.0-flash | Balanced multimodal | Free |
| Gemini 2.0 Flash-Lite | gemini-2.0-flash-lite | Compact and efficient | Free |
Gemma Open Source Series
| Model Name | Model ID | Features | Free Tier |
|---|---|---|---|
| Gemma 3 27B | gemma-3-27b-instruct | Large parameter count | Free |
| Gemma 3 12B | gemma-3-12b-instruct | Medium parameter count | Free |
| Gemma 3 4B | gemma-3-4b-instruct | Small and fast | Free |
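Model IDs and availability change over time, so it is worth checking what your key can actually call before hard-coding an ID. A minimal sketch using the official Python SDK:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Print every model that supports text generation
for m in genai.list_models():
    if "generateContent" in m.supported_generation_methods:
        print(m.name)
```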
Quotas and Limits
Free Tier Explanation
Google AI Studio’s free tier is very generous, with all major models available for free:
| Feature | Free Tier |
|---|---|
| Input Tokens | Completely free |
| Output Tokens | Completely free |
| Available Models | Gemini 3 Flash, 2.5 Flash/Pro, 2.0 Flash, Gemma 3, etc. |
| Rate Limits | Varies by model, see table below |
| Quota Reset | Daily quotas reset at midnight Pacific Time |
Rate Limits (Free Tier)
Different models have different rate limits. Here are the limits for major models:
| Model Series | Requests/Minute (RPM) | Tokens/Minute (TPM) | Requests/Day (RPD) |
|---|---|---|---|
| Gemini 3 Flash | 10 RPM | 4M TPM | 1,500 RPD |
| Gemini 2.5 Flash | 15 RPM | 1M TPM | 1,500 RPD |
| Gemini 2.5 Pro | 2 RPM | 32K TPM | 50 RPD |
| Gemini 2.0 Flash | 15 RPM | 1M TPM | 1,500 RPD |
| Gemma 3 Series | 30 RPM | 15K TPM | 14,400 RPD |
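To stay under the per-minute limits, a simple client-side throttle can space out requests. A minimal sketch (the 15 RPM default is illustrative; use your model's actual limit):

```python
import time

class RateLimiter:
    """Naive throttle: enforces a minimum interval between requests."""

    def __init__(self, rpm: int = 15):
        self.min_interval = 60.0 / rpm
        self.last_call = 0.0

    def wait(self):
        elapsed = time.time() - self.last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_call = time.time()

limiter = RateLimiter(rpm=15)
# Call limiter.wait() before each model.generate_content(...) request
```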
Context Length
| Model | Input Context | Output Length |
|---|---|---|
| Gemini 2.5 Flash | 1M tokens | 8K tokens |
| Gemini 2.5 Pro | 2M tokens | 8K tokens |
| Gemini 2.0 Flash | 1M tokens | 8K tokens |
| Gemma Series | 8K-128K tokens | 8K tokens |
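Before sending a very long input, you can check its size against the model's context window with the SDK's count_tokens method:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-2.5-flash')

long_text = "..."  # your long input
token_count = model.count_tokens(long_text).total_tokens
print(f"Input uses {token_count} tokens")
```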
Important Notes
- Completely Free: Both input and output are completely free in the free tier
- Rate Limits: exceeding rate limits results in 429 errors; implementing backoff retries is recommended
- Paid Upgrade: If you need higher quotas, you can request an upgrade in AI Studio
- Data Usage: free-tier data may be used to improve Google products (opt-out available); paid-tier data will not
- View Quota: Real-time quotas can be viewed in AI Studio Dashboard
API Usage Examples
1. Basic Text Generation
Python SDK:
```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-2.5-flash')

# Simple conversation
response = model.generate_content("Explain what machine learning is")
print(response.text)

# Request with parameters
response = model.generate_content(
    "Write a poem about autumn",
    generation_config={
        "temperature": 0.9,
        "top_p": 0.95,
        "max_output_tokens": 1024,
    }
)
print(response.text)
```

REST API:
```bash
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": [{
      "parts": [{"text": "Explain what machine learning is"}]
    }],
    "generationConfig": {
      "temperature": 0.7,
      "maxOutputTokens": 1024
    }
  }'
```
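You can also steer every response by setting a system instruction when creating the model. A short sketch:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel(
    'gemini-2.5-flash',
    system_instruction="You are a concise technical assistant."
)
response = model.generate_content("Explain what machine learning is")
print(response.text)
```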
2. Multi-turn Conversation

Python SDK:
```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-2.5-flash')

# Create chat session
chat = model.start_chat(history=[])

# First round
response = chat.send_message("Hello, I want to learn Python")
print(response.text)

# Second round (the model remembers context)
response = chat.send_message("Where should I start?")
print(response.text)

# View chat history
print(chat.history)
```
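You can also seed the session with earlier turns by passing them as history; each entry carries a role and its parts. A minimal sketch:

```python
chat = model.start_chat(history=[
    {"role": "user", "parts": ["Hello, I want to learn Python"]},
    {"role": "model", "parts": ["Great choice! Python is beginner-friendly."]},
])
response = chat.send_message("Where should I start?")
print(response.text)
```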
3. Image Understanding

Python SDK:
```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-2.5-flash')

# Load image
image = Image.open('example.jpg')

# Send image and question
response = model.generate_content([
    "What's in this image? Please describe it in detail.",
    image
])
print(response.text)
```

Load image from URL:
```python
import google.generativeai as genai
from PIL import Image
import requests
from io import BytesIO

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-2.5-flash')

# Load image from URL
img_response = requests.get('https://example.com/image.jpg')
image = Image.open(BytesIO(img_response.content))

# Analyze image
response = model.generate_content(["Analyze this image", image])
print(response.text)
```
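Over REST, images are sent inline as base64 in an inline_data part. A sketch of the request shape (replace the data value with your own base64-encoded bytes):

```bash
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": [{
      "parts": [
        {"text": "What is in this image?"},
        {"inline_data": {"mime_type": "image/jpeg", "data": "BASE64_ENCODED_IMAGE"}}
      ]
    }]
  }'
```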
4. Streaming Output

Python SDK:
```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel('gemini-2.5-flash')

# Streaming output
response = model.generate_content(
    "Write an article about artificial intelligence",
    stream=True
)
for chunk in response:
    print(chunk.text, end='')
```
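Streaming is also available over REST through the streamGenerateContent method; with alt=sse the chunks arrive as server-sent events:

```bash
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse&key=YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": [{
      "parts": [{"text": "Write an article about artificial intelligence"}]
    }]
  }'
```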
5. Function Calling

Python SDK:
```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Define the function
def get_weather(city: str):
    """Get weather for the specified city"""
    # A real weather API would be called here
    return f"Weather in {city}: Sunny, 25°C"

# Define the function declaration
tools = [{
    "function_declarations": [{
        "name": "get_weather",
        "description": "Get weather information for a specified city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name"
                }
            },
            "required": ["city"]
        }
    }]
}]

model = genai.GenerativeModel(
    'gemini-2.5-flash',
    tools=tools
)

# Send request
chat = model.start_chat()
response = chat.send_message("What's the weather in Shanghai today?")

# Check whether a function call was requested
part = response.candidates[0].content.parts[0]
if part.function_call:
    function_call = part.function_call
    # Call the function
    if function_call.name == "get_weather":
        result = get_weather(function_call.args["city"])
        # Return the result to the model
        response = chat.send_message(genai.protos.Content(parts=[
            genai.protos.Part(function_response=genai.protos.FunctionResponse(
                name="get_weather",
                response={"result": result}
            ))
        ]))
        print(response.text)
```
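The Python SDK can also run this loop for you: pass plain Python functions as tools and enable automatic function calling, and the SDK builds the declaration from the signature and docstring, executes the call, and feeds the result back. A sketch of that variant:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

def get_weather(city: str) -> str:
    """Get weather information for a specified city."""
    return f"Weather in {city}: Sunny, 25°C"

model = genai.GenerativeModel('gemini-2.5-flash', tools=[get_weather])
chat = model.start_chat(enable_automatic_function_calling=True)
response = chat.send_message("What's the weather in Shanghai today?")
print(response.text)  # The SDK already executed get_weather behind the scenes
```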
6. Google Search Grounding

Python SDK:
```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Enable Google Search Grounding
model = genai.GenerativeModel('gemini-2.5-flash')

# Note: grounding may be billed separately; check the latest pricing
response = model.generate_content(
    "What are the latest AI technology trends?",
    tools=[{"google_search_retrieval": {}}]
)
print(response.text)

# View citation sources
for candidate in response.candidates:
    if hasattr(candidate, 'grounding_metadata'):
        print("\nCitation Sources:")
        for attribution in candidate.grounding_metadata.grounding_attributions:
            print(f"- {attribution.source_id.grounding_passage.url}")
```

SDKs and Client Libraries
Official SDKs
Python:
```bash
pip install google-generativeai
```

Node.js:

```bash
npm install @google/generative-ai
```

Go:

```bash
go get github.com/google/generative-ai-go
```

OpenAI Compatibility
The Google AI Studio API provides an OpenAI compatibility layer, so you can use the OpenAI SDK:
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GOOGLE_API_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "Hello"}
    ]
)
print(response.choices[0].message.content)
```

Note: compatibility is not 100% identical; some parameters and features may differ. Refer to the official compatibility documentation.
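Streaming also works through the compatibility layer, reusing the client created above with the standard OpenAI SDK pattern:

```python
stream = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```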
Best Practices
Recommended Practices
Choose the Right Model
- High-frequency calls: Gemini 3 Flash (fastest)
- Daily tasks: Gemini 2.5 Flash (balanced)
- Complex tasks: Gemini 2.5 Pro (stronger)
- Open source needs: Gemma series
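These rules of thumb can live in code as a small lookup so the rest of the application never hard-codes model IDs. The mapping below is illustrative, not an official recommendation:

```python
# Illustrative task-to-model mapping; adjust to your own quota and needs
MODEL_BY_TASK = {
    "high_frequency": "gemini-3-flash-preview",
    "daily": "gemini-2.5-flash",
    "complex": "gemini-2.5-pro",
    "open_source": "gemma-3-27b-instruct",
}

def pick_model(task: str) -> str:
    return MODEL_BY_TASK.get(task, "gemini-2.5-flash")
```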
Optimize Quota Usage
```python
# Use streaming to reduce wait time
response = model.generate_content(prompt, stream=True)

# Set a reasonable max_output_tokens
generation_config = {"max_output_tokens": 512}

# Batch process requests
prompts = ["Question 1", "Question 2", "Question 3"]
responses = [model.generate_content(p) for p in prompts]
```

Error Handling and Retries
```python
from google.api_core import retry

@retry.Retry(predicate=retry.if_exception_type(Exception))
def call_api_with_retry(prompt):
    return model.generate_content(prompt)

try:
    response = call_api_with_retry("Hello")
except Exception as e:
    print(f"API call failed: {e}")
```

Monitor Quota Usage
```python
# Log API calls
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def track_api_call(model, prompt):
    logger.info(f"Calling model: {model}")
    response = model.generate_content(prompt)
    logger.info(f"Tokens used: {response.usage_metadata.total_token_count}")
    return response
```

Securely Manage API Keys
```python
# Use environment variables
import os
api_key = os.getenv('GOOGLE_AI_API_KEY')

# Or use a config file
import json
with open('config.json') as f:
    config = json.load(f)
api_key = config['api_key']
```
Practices to Avoid
- Hard-coding API keys in code
- Repeatedly sending identical requests (use caching)
- Ignoring rate limits (implement backoff retries)
- Not handling errors and exceptions
- Using an excessively large max_output_tokens
Common Issues
1. 403 Error: API key not valid
Cause:
- API key is incorrect or expired
- API is not enabled
Solution:
```python
# Check API key
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# List available models to verify
for model in genai.list_models():
    print(model.name)
```

2. 429 Error: Resource exhausted
Cause:
- Exceeded rate limit
- Exceeded daily quota
Solution:
```python
import time
from google.api_core.exceptions import ResourceExhausted

def call_with_backoff(prompt, max_retries=3):
    for i in range(max_retries):
        try:
            return model.generate_content(prompt)
        except ResourceExhausted:
            if i < max_retries - 1:
                wait_time = 2 ** i  # Exponential backoff
                print(f"Rate limited, waiting {wait_time} seconds...")
                time.sleep(wait_time)
            else:
                raise
```

3. Network Timeout
Cause:
- Unstable network
- Request content too large
Solution:
```python
# Set a per-request timeout (in seconds) via request_options
response = model.generate_content(
    prompt,
    request_options={"timeout": 120}
)
```

4. Access Issues in Mainland China
Solution:
- Use proxy or VPN
- Configure proxy:
```python
import os
os.environ['HTTP_PROXY'] = 'http://your-proxy:port'
os.environ['HTTPS_PROXY'] = 'https://your-proxy:port'
```
Performance Optimization
1. Use Caching
```python
from functools import lru_cache

@lru_cache(maxsize=100)
def cached_api_call(prompt):
    return model.generate_content(prompt).text
```

2. Batch Processing
```python
import asyncio
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

async def batch_generate(prompts):
    model = genai.GenerativeModel('gemini-2.5-flash')
    tasks = [model.generate_content_async(p) for p in prompts]
    return await asyncio.gather(*tasks)

# Usage
prompts = ["Question 1", "Question 2", "Question 3"]
responses = asyncio.run(batch_generate(prompts))
```

3. Streaming Large Text
```python
def process_large_text(text, chunk_size=10000):
    chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]
    results = []
    for chunk in chunks:
        response = model.generate_content(f"Summarize: {chunk}")
        results.append(response.text)
    return " ".join(results)
```

Related Resources
Official Documentation
- Gemini API docs: https://ai.google.dev/gemini-api/docs
Code Examples
Learning Resources
Service Provider: Google AI Studio