The ChatGPT interface is for conversations. The API is for applications. Send an HTTP request and receive GPT-4o-quality responses inside your own software, workflow, or product. Python SDK, Node.js library, or raw REST calls — pick your approach and start building in minutes.
Function calling, embeddings, fine-tuning, the Assistants API, and vision endpoints give developers the building blocks to create AI-powered applications that go far beyond a chat window.
The ChatGPT API provides RESTful access to GPT-3.5 Turbo, GPT-4, GPT-4 Turbo, and GPT-4o models via the Chat Completions endpoint. Official SDKs ship for Python (the openai package on PyPI) and Node.js (the openai package on npm). Key capabilities include function calling (structured JSON tool invocation), embeddings (text-embedding-3-small and text-embedding-3-large), fine-tuning (custom model training on your data), the Assistants API (persistent threads with file handling and code execution), and Vision API endpoints (image analysis via URL or base64). Pricing follows a pay-as-you-go model billed per token. All API traffic is encrypted with TLS 1.3, and the platform holds SOC 2 Type II certification with GDPR, CCPA, and ISO 27001 compliance.
Every API feature explained with the technical detail developers need to start building.
Every conversation with ChatGPT — whether through the web interface, mobile app, or API — hits the Chat Completions endpoint. Send an array of messages with roles (system, user, assistant) and receive a completion response. The system message sets behavior and constraints. User messages contain the prompt. Assistant messages provide conversation history for multi-turn exchanges.
Key parameters control output: temperature (0.0-2.0) adjusts randomness, max_tokens caps response length, top_p implements nucleus sampling, frequency_penalty and presence_penalty reduce repetition, and seed enables reproducible outputs. Streaming mode delivers tokens incrementally via server-sent events, eliminating the wait for complete responses in user-facing applications.
The endpoint accepts JSON request bodies and returns JSON responses. Response objects include usage statistics (prompt tokens, completion tokens, total tokens) for accurate cost tracking. Error responses follow standard HTTP status codes with descriptive error messages. The National Institute of Standards and Technology (NIST) has published API security guidelines that align with the authentication and encryption patterns used by the ChatGPT API.
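The message-array structure described above can be sketched as a small request builder. This is a minimal illustration: build_chat_request is a hypothetical helper, and the commented send step assumes the official Python SDK and an OPENAI_API_KEY environment variable.

```python
import json

# Hypothetical helper: assemble a Chat Completions request body.
# The parameter names (model, messages, temperature, max_tokens)
# match the documented endpoint fields.
def build_chat_request(system_prompt, history, user_message,
                       model="gpt-4o", temperature=0.7, max_tokens=256):
    messages = [{"role": "system", "content": system_prompt}]
    messages += history  # prior user/assistant turns for multi-turn context
    messages.append({"role": "user", "content": user_message})
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

body = build_chat_request(
    "You are a concise assistant.",
    [{"role": "user", "content": "Hi"},
     {"role": "assistant", "content": "Hello! How can I help?"}],
    "Summarize what an API is in one sentence.",
)

# To actually send the request (requires the openai package and an API key):
# from openai import OpenAI
# client = OpenAI()  # reads OPENAI_API_KEY from the environment
# response = client.chat.completions.create(**body)
# print(response.choices[0].message.content)
print(json.dumps(body, indent=2))
```

Keeping history as an explicit list makes the statelessness of the endpoint visible: every request carries the full conversation.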
Function calling transforms GPT from a text generator into a decision engine. Define functions with JSON Schema descriptions in your API request. When user input matches a function's purpose, GPT returns structured JSON arguments instead of plain text. Your application calls the actual function, then feeds the result back to GPT for a final natural-language response.
A practical example: define a get_weather function with parameters for location and unit. When a user asks "What is the weather in Tokyo?", GPT returns {"location": "Tokyo, Japan", "unit": "celsius"}. Your backend calls a weather API with those arguments, returns the data, and GPT formats a conversational response: "It is currently 22 degrees Celsius in Tokyo with partly cloudy skies."
Parallel function calling lets GPT invoke multiple functions simultaneously when the user's request involves independent data sources. "Compare weather in Tokyo and London" triggers two parallel get_weather calls, reducing total latency. This pattern powers complex agent architectures where GPT orchestrates database queries, API calls, and calculations across multiple systems in a single turn.
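The get_weather round trip above can be sketched as follows. The tool schema follows the Chat Completions tools format; get_weather itself and its stubbed return value are hypothetical stand-ins for a real weather backend, and the model's tool-call arguments are simulated rather than fetched live.

```python
import json

# Tool schema sent in the API request's "tools" field.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}]

def get_weather(location, unit="celsius"):
    # Stub: a real implementation would call a weather API here.
    return {"location": location, "temperature": 22, "unit": unit}

# Suppose the model returned this tool call. Arguments arrive as a JSON
# string that your application must parse before dispatching.
tool_call_arguments = '{"location": "Tokyo, Japan", "unit": "celsius"}'
args = json.loads(tool_call_arguments)
result = get_weather(**args)

# The result is serialized into a "tool" role message for the follow-up
# request, letting the model compose the natural-language reply.
tool_message = {"role": "tool", "content": json.dumps(result)}
print(tool_message)
```

Note the two hops: the model never executes anything itself; your application owns the dispatch and feeds results back.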
The Embeddings API converts text into high-dimensional vectors that capture semantic meaning. Two text passages about the same topic produce similar vectors, even if they use completely different words. This enables semantic search (find documents by meaning, not keywords), recommendation systems, clustering, and classification.
Two models are available: text-embedding-3-small (1,536 dimensions, $0.02 per million tokens) for cost-efficient applications, and text-embedding-3-large (3,072 dimensions, $0.13 per million tokens) for maximum accuracy. Both support dimensional reduction through the dimensions parameter, letting you trade accuracy for storage efficiency.
Common architectures pair embeddings with vector databases (Pinecone, Weaviate, pgvector, Qdrant) for retrieval-augmented generation. Embed your knowledge base once, store the vectors, query by semantic similarity at runtime, and feed relevant context into the Chat Completions API. This is the same RAG pattern that powers Custom GPTs' knowledge-file retrieval.
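The similarity comparison at the heart of this pattern is cosine similarity over the vectors. The three-dimensional vectors below are tiny illustrative stand-ins; real ones come back from client.embeddings.create with 1,536 or 3,072 dimensions.

```python
import math

# Cosine similarity: the standard comparison for semantic search
# over embedding vectors.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "knowledge base": document name -> pretend embedding.
doc_vectors = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of "how do I get my money back"

best = max(doc_vectors, key=lambda k: cosine_similarity(query, doc_vectors[k]))
print(best)  # "refund policy" ranks highest for this query
```

In production, the vector database performs this ranking at scale; the retrieved passages then become context in the Chat Completions prompt.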
Fine-tuning creates a custom model variant trained on your specific examples. Prepare a JSONL file with message arrays demonstrating desired input-output behavior. Upload the file, start a training job, and receive a fine-tuned model identifier that you reference in subsequent API calls like any standard model.
Fine-tuning excels at consistent formatting (always return data in a specific JSON schema), domain terminology (legal, medical, financial jargon), and style matching (matching your brand's writing voice). It does not add new factual knowledge — for that, use embeddings and RAG. Fine-tuning changes how the model writes, not what it knows.
GPT-3.5 Turbo fine-tuning costs $8 per million training tokens plus a 1.6x multiplier on inference pricing. GPT-4o mini fine-tuning is also available. Training jobs typically complete within 1-3 hours depending on dataset size. You can run multiple fine-tuned models simultaneously and A/B test outputs to identify the best performer. The Department of Energy (DOE) has used similar fine-tuning approaches to adapt language models for scientific literature analysis across national laboratories.
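The training-data preparation step described above can be sketched as follows. The example content is hypothetical, and the commented upload-and-launch calls assume the official Python SDK plus an API key; a real dataset needs many such lines, one JSON object per line.

```python
import json

# One fine-tuning training record in the chat JSONL format: a message
# array demonstrating the desired input-output behavior (here, strict
# JSON formatting).
record = {
    "messages": [
        {"role": "system", "content": "Reply only with valid JSON."},
        {"role": "user", "content": "Extract the city from: 'Flights to Osaka'"},
        {"role": "assistant", "content": '{"city": "Osaka"}'},
    ]
}
line = json.dumps(record)  # each record is one line of the .jsonl file

# Upload and launch (requires the openai package and an API key):
# from openai import OpenAI
# client = OpenAI()
# f = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
# job = client.fine_tuning.jobs.create(training_file=f.id,
#                                      model="gpt-3.5-turbo")
print(line)
```

The completed job returns a model identifier you pass to the Chat Completions endpoint exactly like a standard model name.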
The Assistants API packages conversation management, file handling, code execution, and function calling into a managed service. Create an assistant with instructions and tools, open a thread for each user conversation, add messages, and run the assistant. The API maintains conversation history server-side, eliminating the need to resend entire conversation histories with each request.
File search lets assistants query uploaded documents using vector-based retrieval. Code Interpreter executes Python in a sandboxed environment, processing uploaded files and generating outputs. Function calling connects to your external systems. All three tools can activate within a single assistant run based on the user's request.
Think of the Assistants API as the programmatic equivalent of Custom GPTs. Where Custom GPTs serve end users through the ChatGPT interface, the Assistants API serves developers building GPT-powered features into their own applications. Both share the same underlying capabilities but differ in interface and deployment model. Explore the model options available for your assistant configuration.
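The assistant-thread-message-run flow can be sketched as a configuration plus the call sequence. The config below is illustrative ("Support Bot" and its instructions are hypothetical); the tool type names match the documented file search and Code Interpreter tools, and the commented calls assume the official Python SDK with an API key.

```python
# Illustrative Assistants API configuration. The tool types
# ("file_search", "code_interpreter") are the documented built-in tools.
assistant_config = {
    "name": "Support Bot",
    "instructions": "Answer questions using the uploaded product docs.",
    "model": "gpt-4o",
    "tools": [
        {"type": "file_search"},
        {"type": "code_interpreter"},
    ],
}

# With an API key, the flow is: create assistant -> create thread ->
# add message -> run. Roughly:
# from openai import OpenAI
# client = OpenAI()
# assistant = client.beta.assistants.create(**assistant_config)
# thread = client.beta.threads.create()
# client.beta.threads.messages.create(thread.id, role="user",
#                                     content="How do I reset my device?")
# run = client.beta.threads.runs.create_and_poll(
#     thread_id=thread.id, assistant_id=assistant.id)
print([t["type"] for t in assistant_config["tools"]])
```

Because threads persist server-side, each subsequent user message is a single small request rather than a resend of the whole history.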
From zero to first API call in under five minutes.
Every API request requires an API key passed in the Authorization header as a Bearer token. Generate keys from the API settings dashboard. Keys can be scoped to specific projects and have individual rate limits. Rotate keys regularly, never commit them to version control, and use environment variables in production. Organization-level keys allow billing and usage tracking across teams.
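The environment-variable pattern above can be sketched in a few lines. OPENAI_API_KEY is the variable the official SDKs read by default; mask_key is a hypothetical helper for safe logging.

```python
import os

# Load the key from the environment rather than hard-coding it.
# The placeholder default is for demonstration only.
api_key = os.environ.get("OPENAI_API_KEY", "sk-test-placeholder")

def mask_key(key):
    # Show only the first 3 and last 4 characters when logging,
    # so keys never appear in full in log output.
    return key[:3] + "..." + key[-4:] if len(key) > 10 else "***"

print(mask_key(api_key))
```

Pair this with a secrets manager or a gitignored .env file in production so keys stay out of version control entirely.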
Install with pip install openai. The Python SDK provides typed methods for every endpoint: client.chat.completions.create(), client.embeddings.create(), client.fine_tuning.jobs.create(), and client.beta.assistants.create(). Async support is available through the AsyncOpenAI client. Streaming responses use Python generators. The SDK handles retry logic, timeout configuration, and HTTP connection pooling automatically.
Install with npm install openai. The Node.js library provides the same endpoint coverage as the Python SDK with TypeScript type definitions. Streaming uses async iterators. ESM and CommonJS module formats are both supported. The SDK integrates with popular frameworks — Express, Next.js, Fastify — through standard middleware patterns. Response types are fully typed for IDE autocompletion and compile-time error checking.
Any language with HTTP capabilities works. Send POST requests to https://api.openai.com/v1/chat/completions with a JSON body and your API key in the Authorization header. cURL, Go, Ruby, Java, PHP, Rust — if it can make HTTPS requests, it can call the ChatGPT API. Community-maintained SDKs exist for most popular languages beyond the official Python and Node.js libraries.
Pay-as-you-go pricing based on token consumption. No subscription or minimum commitment required.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Best For |
|---|---|---|---|---|
| GPT-3.5 Turbo | $0.50 | $1.50 | 16K tokens | High-volume, simple tasks |
| GPT-4 | $30.00 | $60.00 | 8K / 32K tokens | Complex reasoning, legacy |
| GPT-4 Turbo | $10.00 | $30.00 | 128K tokens | Long context, JSON mode |
| GPT-4o | $5.00 | $15.00 | 128K tokens | Best quality/cost ratio |
| GPT-4o mini | $0.15 | $0.60 | 128K tokens | Budget-friendly, fast |
| text-embedding-3-small | $0.02 | N/A | 8K tokens | Cost-efficient embeddings |
| text-embedding-3-large | $0.13 | N/A | 8K tokens | High-accuracy embeddings |
| DALL-E 3 (Standard) | $0.040 per image (1024x1024) | N/A | N/A | Image generation |
| DALL-E 3 (HD) | $0.080 per image (1024x1024) | N/A | N/A | High-quality images |
| Whisper (STT) | $0.006 per minute | N/A | N/A | Audio transcription |
| TTS | $15.00 per 1M characters | N/A | N/A | Text-to-speech |
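The per-token rates above translate directly into a cost estimate. The helper below is a hypothetical sketch using the listed GPT-4o rates ($5 input / $15 output per million tokens); swap in any row of the table.

```python
# Estimate request cost in dollars from token counts and per-million
# rates. Defaults use the GPT-4o rates from the pricing table.
def estimate_cost(input_tokens, output_tokens, in_rate=5.0, out_rate=15.0):
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A request with a 10K-token prompt and a 2K-token completion:
cost = estimate_cost(10_000, 2_000)
print(f"${cost:.4f}")  # $0.0800
```

The usage object returned with every response (prompt_tokens, completion_tokens) supplies the exact counts to feed into a calculation like this.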
Scaling from prototype to production requires understanding the guardrails.
API rate limits scale automatically with usage. Tier 1 (new accounts) starts at 500 requests per minute (RPM) for GPT-3.5 Turbo and 500 RPM for GPT-4o. As your cumulative spending increases, you advance through five tiers. Tier 5 allows up to 10,000 RPM on GPT-3.5 Turbo, 10,000 RPM on GPT-4o, and 500 RPM on GPT-4. Token-per-minute (TPM) limits also apply. Enterprise agreements offer custom rate limit configurations.
Implement exponential backoff for 429 (rate limit) and 500-series (server error) responses. The official SDKs include built-in retry logic with configurable maximum attempts and backoff multipliers. Cache responses for idempotent queries to reduce unnecessary API calls. Monitor the x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens response headers to proactively throttle before hitting limits.
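The retry strategy above can be sketched as a minimal backoff loop. This is a demonstration under stated assumptions: flaky_call stands in for a real HTTP request, RuntimeError stands in for the SDK's rate-limit exception class, and the sleep is capped tiny so the demo runs instantly; the official SDKs implement this loop for you.

```python
import random
import time

# Exponential backoff with jitter for 429 / 5xx responses.
def with_backoff(call, max_attempts=5, base_delay=1.0):
    for attempt in range(max_attempts):
        try:
            return call()
        except RuntimeError:  # stand-in for a rate-limit/server-error class
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Double the delay each attempt, randomized to avoid
            # synchronized retry storms across clients.
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            time.sleep(min(delay, 0.01))  # capped tiny for this demo

attempts = {"n": 0}
def flaky_call():
    # Fails twice with a simulated 429, then succeeds.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"

result = with_backoff(flaky_call)
print(result)  # succeeds on the third attempt
```

In real code, check the x-ratelimit-remaining-* headers first and throttle proactively so most requests never hit the retry path at all.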
Route simple queries to GPT-3.5 Turbo or GPT-4o mini and reserve GPT-4o for complex tasks. Use max_tokens to cap response length. Implement prompt caching for repeated system messages. Batch requests when latency is not critical. Monitor per-endpoint usage through the API dashboard to identify cost hotspots. A well-architected multi-model pipeline can reduce API costs by 70-80% compared to routing everything through GPT-4. Visit our model comparison page for guidance on model selection, or explore built-in ChatGPT plugins for no-code alternatives.
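The routing idea above can be sketched as a simple dispatcher. The heuristic is hypothetical and deliberately crude (prompt length plus an explicit complexity flag); production routers typically use a classifier or task metadata instead.

```python
# Hypothetical cost-optimization router: cheap model for short, simple
# prompts; GPT-4o for long or reasoning-heavy ones. Thresholds are
# illustrative, not recommendations.
def choose_model(prompt, needs_reasoning=False):
    if needs_reasoning or len(prompt) > 2000:
        return "gpt-4o"
    return "gpt-4o-mini"

print(choose_model("Translate 'hello' to French"))           # gpt-4o-mini
print(choose_model("Analyze this 40-page contract...", True))  # gpt-4o
```

Even a two-tier router like this captures most of the savings, because the bulk of traffic in typical applications is short, simple queries.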
New accounts receive $5 in free API credits. No credit card required to start exploring endpoints.
Technical answers for developers building on the ChatGPT platform.
Create an account on the platform, navigate to the API section, and generate an API key. The API uses pay-as-you-go pricing — no subscription required. New accounts receive $5 in free credits. Install the official Python SDK (pip install openai) or Node.js SDK (npm install openai) and make your first API call in under five minutes. The REST API works with any HTTP client for languages without official SDK support. Read our getting started guide for step-by-step setup.
Pricing varies by model and is billed per token (approximately 4 characters per token). GPT-3.5 Turbo: $0.50/$1.50 per million input/output tokens. GPT-4o: $5/$15. GPT-4 Turbo: $10/$30. GPT-4: $30/$60. GPT-4o mini: $0.15/$0.60. Embeddings: $0.02-$0.13 per million tokens. DALL-E 3: $0.04-$0.12 per image. Fine-tuning adds training costs (GPT-3.5 Turbo: $8 per million training tokens). Volume discounts are available for enterprise accounts. View our plans page for ChatGPT subscription pricing.
Function calling lets you define tool schemas in your API request, and GPT returns structured JSON arguments when it determines a tool should be invoked. Your application executes the function (database query, API call, calculation) and sends the result back to GPT for a natural-language response. Parallel function calling allows multiple simultaneous tool invocations. This pattern enables GPT to act as an orchestrator for external systems. Build custom tool-using agents with the Custom GPTs Builder or the Assistants API.
Rate limits scale with your cumulative spending across five tiers. Tier 1 (new accounts): ~500 RPM for GPT-3.5 Turbo, ~500 RPM for GPT-4o. Tier 5 (high-volume accounts): up to 10,000 RPM for GPT-3.5 Turbo, 10,000 RPM for GPT-4o, 500 RPM for GPT-4. Token-per-minute limits also apply. Tier advancement is automatic based on spending. Enterprise agreements offer custom configurations. Monitor rate limit headers in API responses to stay within bounds.
The Assistants API is a managed service for building AI assistants with persistent conversation threads, file handling, code execution, and function calling. Unlike the Chat Completions API where you manage conversation state client-side, the Assistants API stores threads server-side and supports file uploads for retrieval-augmented generation. It is the programmatic equivalent of Custom GPTs for developers who want to embed AI assistant capabilities within their own applications. Learn about the Vision API for adding image analysis to your assistants.
The API connects to every ChatGPT capability. Explore what you can build.
Compare GPT-3.5, GPT-4, GPT-4 Turbo, and GPT-4o benchmarks, pricing, and capabilities.
Build voice-enabled applications using the Realtime API and GPT-4o native audio endpoints.
Integrate image analysis via the Vision API with URL or base64 image input support.
Understand the built-in tools that the Assistants API can invoke in your applications.
Build no-code AI assistants through the ChatGPT interface as an alternative to API development.
API pricing is separate from ChatGPT subscriptions. See both options side by side.