GitHub Models API - Free AI Model API Service

šŸ“‹ Service Overview

Service Name: GitHub Models API
Provider: GitHub Models
API Endpoint: https://models.github.ai/inference
Service Type: Free (with rate limits)
Requirements: GitHub account and Personal Access Token


āœ… Service Description

GitHub Models API is a developer-facing API provided by GitHub that allows programmatic access to a range of mainstream AI models. The API is fully compatible with the OpenAI specification and can be called directly with the OpenAI SDK.

Main Features

  • šŸ”Œ OpenAI Compatible - Fully compatible with OpenAI API specifications, easy integration
  • šŸ†“ Completely Free - All models offer free access with rate limits
  • šŸ¤– Multi-Model Support - Supports 10+ mainstream AI models
  • šŸ”’ Secure & Reliable - Authentication based on GitHub Personal Access Token
  • šŸš€ Easy to Use - Use OpenAI SDK or any HTTP client
  • šŸ“š Comprehensive Docs - Detailed official documentation and examples

šŸŽ Available Models

Free Model List

| Model Name | Provider | Context Length | Features | Use Cases |
| --- | --- | --- | --- | --- |
| gpt-4o | OpenAI | 128K | Strongest overall | Complex reasoning, creative writing |
| gpt-4o-mini | OpenAI | 128K | Fast & lightweight | Daily chat, high-frequency calls |
| Llama-3.1-405B | Meta | 128K | Ultra-large open-source | Complex tasks |
| Llama-3.1-70B | Meta | 128K | Balanced performance | General tasks |
| Llama-3.1-8B | Meta | 128K | Fast response | Lightweight apps |
| Phi-3.5-mini | Microsoft | 128K | Small but powerful | Efficiency-focused |
| Phi-3-medium | Microsoft | 128K | Balanced | Medium complexity |
| DeepSeek-R1 | DeepSeek | 64K | Strong reasoning | Logic, Chinese tasks |
| Mistral-Large | Mistral | 128K | European leader | Multilingual |
| Mistral-Nemo | Mistral | 128K | Lightweight & fast | Real-time apps |
| Command-R+ | Cohere | 128K | RAG expert | Knowledge retrieval |

Model Details

GPT-4o (Recommended)

  • Context Window: 128K tokens
  • Primary Use: Complex reasoning, creative writing, professional tasks
  • Advantages: World-leading AI capability, multimodal support
  • Rate Limit: 10 RPM, 50 RPD

Llama-3.1-405B

  • Context Window: 128K tokens
  • Primary Use: Complex reasoning, professional applications
  • Advantages: Most powerful open-source model
  • Rate Limit: 10 RPM, 50 RPD

DeepSeek-R1

  • Context Window: 64K tokens
  • Primary Use: Logic reasoning, Chinese tasks
  • Advantages: Chinese optimized, strong reasoning
  • Rate Limit: 15 RPM, 150 RPD

šŸ”¢ Quotas and Limits

Free Tier Limits

Different models have different rate limits. Here are typical examples:

High-tier Models (GPT-4o, Llama-3.1-405B):

| Limit Item | Quota | Notes |
| --- | --- | --- |
| Requests Per Minute | 10 | RPM (Requests Per Minute) |
| Requests Per Day | 50 | RPD (Requests Per Day) |
| Max Input Tokens | 8,000 | Per-request input limit |
| Max Output Tokens | 4,000 | Per-request output limit |
| Max Concurrent Requests | 2 | Simultaneous requests |
| Credit Card | āŒ | Completely free |

Low-tier Models (Phi-3, Llama-3.1-8B, DeepSeek-R1):

| Limit Item | Quota | Notes |
| --- | --- | --- |
| Requests Per Minute | 15 | RPM |
| Requests Per Day | 150 | RPD |
| Max Input Tokens | 8,000 | Per-request input limit |
| Max Output Tokens | 4,000 | Per-request output limit |
| Max Concurrent Requests | 5 | Simultaneous requests |
| Credit Card | āŒ | Completely free |

āš ļø Important Limits

  1. Rate Limits: Exceeding a limit returns an HTTP 429 error; implement retries with exponential backoff.
  2. Daily Reset: The daily quota resets at 00:00 UTC.
  3. Concurrent Limit: Requests beyond the maximum concurrency are rejected.
  4. Token Limits: Both input and output tokens are capped per request (a prompt-length check sketch follows this list).
  5. Quota Examples: The quotas above are reference examples; actual limits vary by model and account, so check each model's details page for current figures.
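
Because both input and output tokens are capped per request, it helps to measure a prompt before sending it. Below is a minimal sketch using OpenAI's tiktoken library (an assumption not stated above: tiktoken's o200k_base encoding matches GPT-4o's tokenizer, so counts for non-OpenAI models are only rough estimates):

Python
# pip install tiktoken
import tiktoken

MAX_INPUT_TOKENS = 8000  # per-request input limit from the tables above

def count_tokens(text, encoding_name="o200k_base"):
    """Estimate the token count of text; o200k_base is GPT-4o's encoding."""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))

prompt = "Summarize the key points of the following document: ..."
if count_tokens(prompt) > MAX_INPUT_TOKENS:
    print("Prompt exceeds the input limit; trim or split it before sending.")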

Quota Reset Time

  • Daily Quota: Resets at 00:00 UTC
  • Per-Minute Quota: Rolling window; resets continuously
  • Concurrent Limit: Tracked in real time; a slot is freed as soon as a request completes (see the concurrency sketch below)
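
Because concurrency is limited (2 simultaneous requests for high-tier models), a client-side cap avoids rejected requests in multi-threaded code. A minimal sketch using a semaphore (the limit of 2 is taken from the high-tier table above):

Python
import os
import threading
from openai import OpenAI

client = OpenAI(
    base_url="https://models.github.ai/inference",
    api_key=os.getenv("GITHUB_PAT")
)

# Cap in-flight requests at the model's concurrency limit (2 for high-tier models)
semaphore = threading.Semaphore(2)

def bounded_call(messages, model="gpt-4o"):
    """Block until a slot is free, then send the request."""
    with semaphore:
        return client.chat.completions.create(model=model, messages=messages)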

šŸ’° Pricing

Free Quota

  • Free Usage: All models completely free
  • No Credit Card: No card required
  • Rate Limits: Each model has independent rate limits
  • How to Get: Register GitHub account and create PAT

Paid Options

Currently, the GitHub Models API is completely free and has no paid tier. If you need higher quotas, consider:

  • Using multiple models to distribute load (see the fallback sketch below)
  • Optimizing request frequency
  • Using the official APIs from the model providers directly
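
One way to distribute load across models is a simple fallback chain: when one model returns a 429, move on to the next. A minimal sketch (the model order here is an illustrative assumption, not an official recommendation):

Python
import os
from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://models.github.ai/inference",
    api_key=os.getenv("GITHUB_PAT")
)

# Illustrative preference order: strongest model first, cheaper fallbacks after
FALLBACK_MODELS = ["gpt-4o", "gpt-4o-mini", "Llama-3.1-8B"]

def chat_with_fallback(messages):
    """Try each model in turn, skipping any that is currently rate-limited."""
    for model in FALLBACK_MODELS:
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError:
            print(f"{model} is rate-limited, trying the next model...")
    raise RuntimeError("All models in the fallback chain are rate-limited")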

šŸš€ How to Use

Prerequisites

1. Register GitHub Account

If you don’t have a GitHub account, please register first.

2. Create Personal Access Token

Visit GitHub Settings
  1. Login to GitHub
  2. Click avatar in top right > Settings
  3. Select Developer settings in left menu
Create Token
  1. Click Personal access tokens > Tokens (classic)
  2. Click “Generate new token” > “Generate new token (classic)”
  3. Set token name (e.g., GitHub Models API)
  4. Set an expiration (choose an appropriate period)
Select Permission Scope
  1. Important: Check the models scope
  2. This permission is required for the GitHub Models API
  3. No other permissions need to be selected
Generate and Save Token
  1. Click “Generate token” at bottom of page
  2. Copy and save it immediately (it is shown only once!)
  3. Store the token somewhere secure (a password manager is recommended); a quick smoke test follows below
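
After saving the token, a quick smoke test confirms it works before you build anything on top of it. A minimal sketch, assuming the token is exported in the GITHUB_PAT environment variable:

Python
# Smoke test: expects the token in the GITHUB_PAT environment variable
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://models.github.ai/inference",
    api_key=os.getenv("GITHUB_PAT")
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # low-tier model, to conserve quota
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=5
)
print(response.choices[0].message.content)  # any reply means the token works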

šŸ’» Code Examples

Python Example

Install Dependencies:

Bash
pip install openai

Basic Usage:

Python
from openai import OpenAI

# Initialize client
client = OpenAI(
    base_url="https://models.github.ai/inference",
    api_key="YOUR_GITHUB_PAT"  # Replace with your GitHub Personal Access Token
)

# Send request
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, please introduce GitHub Models."}
    ],
    max_tokens=1000,
    temperature=0.7
)

# Print response
print(response.choices[0].message.content)

# View token usage
print(f"\nTotal Tokens: {response.usage.total_tokens}")
print(f"Input Tokens: {response.usage.prompt_tokens}")
print(f"Output Tokens: {response.usage.completion_tokens}")

Streaming Example:

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://models.github.ai/inference",
    api_key="YOUR_GITHUB_PAT"
)

# Streaming output (suitable for real-time display)
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Write a poem about programming"}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

print()  # New line

cURL Example

Basic Request:

Bash
curl -X POST "https://models.github.ai/inference/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_GITHUB_PAT" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello, please introduce GitHub Models."
      }
    ],
    "max_tokens": 1000,
    "temperature": 0.7
  }'

Streaming:

Bash
curl -X POST "https://models.github.ai/inference/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_GITHUB_PAT" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Hello"}
    ],
    "stream": true
  }'

Node.js Example

Install Dependencies:

Bash
npm install openai

Basic Usage:

JavaScript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://models.github.ai/inference',
  apiKey: process.env.GITHUB_PAT,  // Use environment variable
});

async function main() {
  const completion = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Hello, please introduce GitHub Models.' }
    ],
    max_tokens: 1000,
    temperature: 0.7,
  });

  console.log(completion.choices[0].message.content);
  console.log(`\nTotal Tokens: ${completion.usage.total_tokens}`);
}

main();

🌟 Core Advantages

Technical Advantages

  1. OpenAI Compatibility:

    • Fully compatible with OpenAI API specs
    • Can use OpenAI SDK directly
    • Easy migration from other platforms
    • Lower learning curve
  2. Multi-Model Access:

    • One API accesses models from multiple providers
    • Easy to compare and select best model
    • Different models have different strengths
    • Flexible switching for different needs
  3. GitHub Ecosystem Integration:

    • Deep integration with GitHub
    • Can be used in GitHub Actions (see the sketch below)
    • Convenient for development workflow
    • Unified authentication
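
Inside a GitHub Actions job, the workflow's built-in GITHUB_TOKEN can stand in for a PAT, provided the workflow grants the models: read permission (an assumption based on GitHub's Actions integration for Models; verify against the current official docs). A minimal sketch of the Python step:

Python
# Runs inside a GitHub Actions step; assumes the workflow declares
#   permissions:
#     models: read
# and passes the token via env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://models.github.ai/inference",
    api_key=os.environ["GITHUB_TOKEN"]
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize this pull request."}]
)
print(response.choices[0].message.content)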

Comparison with Other APIs

| Feature | GitHub Models | Google AI Studio | Groq |
| --- | --- | --- | --- |
| Free Quota | Varies by model | Completely free | ~14,400 requests/day |
| Model Count | šŸ† 10+ models | 5+ models | 5+ models |
| OpenAI Compatible | āœ… Fully compatible | āŒ Needs adaptation | āœ… Fully compatible |
| Context Length | Up to 128K | Up to 2M | Up to 128K |
| GitHub Integration | šŸ† Deep integration | āŒ None | āŒ None |
| Credit Card | āŒ Not required | āŒ Not required | āš ļø Sometimes required |

šŸ’” Practical Recommendations

āœ… Recommended Practices

  1. Secure Token Management:

    # Requires: pip install python-dotenv
    import os
    from dotenv import load_dotenv
    
    # Load variables from a local .env file into the environment
    load_dotenv()
    api_key = os.getenv('GITHUB_PAT')
    
    # Don't hardcode tokens
    # āŒ api_key = "github_pat_xxxx"  # Don't do this!
  2. Choose the Right Model (a helper encoding this mapping appears after this list):

    • Complex tasks: GPT-4o or Llama-3.1-405B
    • Daily tasks: GPT-4o-mini or Llama-3.1-8B
    • Chinese tasks: DeepSeek-R1
    • Knowledge retrieval: Cohere Command-R+
  3. Implement Error Handling:

    from openai import OpenAI, RateLimitError, APIError
    import os
    import time
    
    client = OpenAI(
        base_url="https://models.github.ai/inference",
        api_key=os.getenv('GITHUB_PAT')
    )
    
    def call_api_with_retry(messages, model="gpt-4o", max_retries=3):
        """API call with retry mechanism"""
        for attempt in range(max_retries):
            try:
                response = client.chat.completions.create(
                    model=model,
                    messages=messages
                )
                return response
            except RateLimitError:
                if attempt < max_retries - 1:
                    wait_time = 2 ** attempt  # Exponential backoff
                    print(f"Rate limit reached, waiting {wait_time}s...")
                    time.sleep(wait_time)
                else:
                    print("Max retries reached")
                    raise
            except APIError as e:
                print(f"API error: {e}")
                if attempt == max_retries - 1:
                    raise
        return None
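
A tiny helper that encodes the model-selection guidance from item 2 above (the task-to-model mapping simply restates this document's recommendations; the task names are illustrative):

Python
# Illustrative mapping of the model-selection guidance above
MODEL_BY_TASK = {
    "complex": "gpt-4o",        # or Llama-3.1-405B
    "daily": "gpt-4o-mini",     # or Llama-3.1-8B
    "chinese": "DeepSeek-R1",
    "retrieval": "Command-R+",  # Cohere Command-R+
}

def pick_model(task_type):
    """Return the recommended model for a task type, defaulting to gpt-4o-mini."""
    return MODEL_BY_TASK.get(task_type, "gpt-4o-mini")

print(pick_model("chinese"))  # DeepSeek-R1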

šŸŽÆ Best Practices

Maximize Free Quota:

  • Choose appropriate model based on task complexity
  • Use caching to avoid duplicate requests (see the caching sketch below)
  • Batch similar tasks
  • Optimize prompts to reduce token usage
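
A minimal in-memory cache keyed by model and messages, so identical requests are only sent once (a sketch; a production version might add persistence or expiry):

import hashlib
import json

# Reuses the `client` initialized in the examples above
_cache = {}

def cached_chat(messages, model="gpt-4o-mini"):
    """Return the cached response for an identical (model, messages) request."""
    key = hashlib.sha256(
        json.dumps({"model": model, "messages": messages}, sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        _cache[key] = client.chat.completions.create(model=model, messages=messages)
    return _cache[key]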

Optimize Token Usage:

# āœ… Concise prompts
messages = [
    {"role": "user", "content": "Summarize key points: [text]"}
]

# āŒ Avoid redundancy
messages = [
    {"role": "system", "content": "You are an excellent assistant..."},
    {"role": "user", "content": "Please help me summarize..."}
]

Monitor Usage:

def log_usage(response):
    """Log token usage"""
    usage = response.usage
    print(f"Input: {usage.prompt_tokens} tokens")
    print(f"Output: {usage.completion_tokens} tokens")
    print(f"Total: {usage.total_tokens} tokens")
    
    # Can save to file or database
    with open('usage_log.txt', 'a') as f:
        f.write(f"{usage.total_tokens}\n")

āš ļø Notes

  1. Rate Limits: Different models have different limits; choose models and allocate traffic accordingly.
  2. Token Security: Never commit your PAT to public repositories; use environment variables.
  3. Error Handling: Implement comprehensive error handling and retry mechanisms.
  4. Cost Control: Although the service is free, use it judiciously to avoid exhausting your quota.
  5. Data Privacy: Do not include sensitive information (passwords, keys, personal data, etc.) in API requests.

šŸŽÆ Real-World Use Cases

Case 1: Smart Code Review

Scenario: Use AI to automatically review code and provide improvement suggestions.

Python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://models.github.ai/inference",
    api_key=os.getenv('GITHUB_PAT')
)

def review_code(code):
    """AI code review"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a professional code review expert."},
            {"role": "user", "content": f"Review this code:\n\n{code}"}
        ]
    )
    return response.choices[0].message.content

# Usage example
code = """
def calc(a, b):
    return a + b
"""

review = review_code(code)
print(review)

Case 2: Auto Documentation

Scenario: Automatically generate docstrings for functions.

Python
# Reuses the client initialized in Case 1
def generate_docstring(function_code):
    """Generate docstring for function"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Generate Python docstrings in Google style."},
            {"role": "user", "content": f"Generate docstring:\n\n{function_code}"}
        ]
    )
    return response.choices[0].message.content

# Usage example
function_code = """
def process_data(data, threshold=0.5):
    filtered = [x for x in data if x > threshold]
    return sum(filtered) / len(filtered) if filtered else 0
"""

docstring = generate_docstring(function_code)
print(docstring)

Case 3: Model Comparison

Scenario: Compare output quality of different models.

Python
# Reuses the client initialized in Case 1
def compare_models(prompt, models=["gpt-4o", "llama-3.1-70b", "deepseek-r1"]):
    """Compare outputs from multiple models"""
    results = {}
    
    for model in models:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}]
            )
            results[model] = {
                "response": response.choices[0].message.content,
                "tokens": response.usage.total_tokens
            }
        except Exception as e:
            results[model] = {"error": str(e)}
    
    return results

# Usage example
prompt = "Explain recursion with a Python example."
comparison = compare_models(prompt)

for model, result in comparison.items():
    print(f"\n{'='*50}")
    print(f"Model: {model}")
    print(f"{'='*50}")
    if "error" in result:
        print(f"Error: {result['error']}")
    else:
        print(result['response'])
        print(f"\nTokens used: {result['tokens']}")

šŸ”§ FAQ

Q: How do I get a GitHub Personal Access Token?
A: Visit GitHub Settings > Developer settings > Personal access tokens, create a new token, and check the models permission. See the registration steps above.

Q: What’s the API endpoint?
A: Base URL is https://models.github.ai/inference, Chat Completions endpoint is https://models.github.ai/inference/chat/completions.

Q: Which models are available?
A: More than 10 models are supported, including GPT-4o, Llama 3.1, Phi-3, DeepSeek-R1, Mistral, and Cohere models. See the Available Models section above.

Q: What are the rate limits?
A: Different models have different limits. E.g., GPT-4o: 10 RPM, 50 RPD; DeepSeek-R1: 15 RPM, 150 RPD. See Quotas and Limits.

Q: How do I handle rate limit (429) errors?
A: Implement retries with exponential backoff, or switch to another model. See the error-handling example above.

Q: Is the API fully compatible with the OpenAI API?
A: Yes, it is fully compatible with the OpenAI Chat Completions API specification, so the OpenAI SDK can be used directly.

Q: Do I need to pay?
A: No. The service is currently completely free, subject to rate limits; no credit card is required.

Q: How do I protect my API token?
A: Use environment variables, never commit tokens to repositories, and rotate them regularly.


šŸ“ Update Log

  • September 2024: GitHub Models API public preview launched
  • October 2024: Added DeepSeek-R1 and other models
  • November 2024: Improved API response speed and stability
  • 2025: New models added continuously; ongoing developer-experience improvements

Service Provider: GitHub Models
