Token Usage Management¶
Complete guide to managing AI chat assistant token budgets, quota enforcement, and usage tracking in the Reserva platform.
Overview¶
The token usage system provides AI-powered customer support with built-in quota management:
- Quota Checking - Pre-flight checks before AI API calls
- Usage Tracking - Record token consumption after successful AI conversations
- Daily & Monthly Limits - Multi-tier quota enforcement
- Usage History - Analytics and reporting for customers
- Admin Quota Management - Override limits for specific customers
- Automated Resets - Cloud Function-triggered daily/monthly quota resets
Key Concepts:
- Token = Unit of measurement for AI model usage (input + output text)
- Daily Quota = Maximum tokens allowed per day per customer
- Monthly Quota = Maximum tokens allowed per month per customer
- Quota Exceeded = Customer blocked from AI chat until quota resets
- Cloud Function Reset = Automated daily/monthly quota reset jobs
Subscription Plan Limits¶
Token quotas are dynamically allocated based on tenant subscription tiers. Higher plans get more AI assistance capacity. Quotas are configurable via environment variables for flexible scaling.
Token Quota by Plan (Default Configuration)¶
| Plan | Daily Quota | Monthly Quota | Multiplier | Cost per Token (USD) |
|---|---|---|---|---|
| FREE | 16,000 tokens | 480,000 tokens | 1x | Free |
| PRO | 64,000 tokens | 1,920,000 tokens | 4x | Free |
| ENTERPRISE | 128,000 tokens | 3,840,000 tokens | 8x | Free |
✨ NEW: Subscription-Based Allocation (Implemented November 17, 2025)
- Quotas are automatically determined based on tenant's active subscription plan
- Lookup chain: Customer → Tenant → Subscription → Plan → Token Quotas
- Environment-configurable via .env settings for easy adjustment
- Graceful fallback to default quotas (16K daily / 480K monthly) when subscription unavailable
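The lookup chain and its fallback can be sketched as a small resolver. This is a minimal illustration, not the platform's actual code: the dictionary shapes and the field names `tenant_id`, `subscription_id`, `status`, and `plan` are assumptions made for the example.

```python
# Hypothetical in-memory stand-ins for the real collections.
DEFAULT_QUOTA = {"daily": 16_000, "monthly": 480_000}
PLAN_QUOTAS = {
    "FREE": {"daily": 16_000, "monthly": 480_000},
    "PRO": {"daily": 64_000, "monthly": 1_920_000},
    "ENTERPRISE": {"daily": 128_000, "monthly": 3_840_000},
}


def resolve_quota(customer: dict, tenants: dict, subscriptions: dict) -> dict:
    """Walk Customer -> Tenant -> Subscription -> Plan, falling back on any gap."""
    tenant = tenants.get(customer.get("tenant_id"))
    if tenant is None:
        return DEFAULT_QUOTA
    subscription = subscriptions.get(tenant.get("subscription_id"))
    # Assumption: only a subscription marked "active" grants plan quotas.
    if subscription is None or subscription.get("status") != "active":
        return DEFAULT_QUOTA
    return PLAN_QUOTAS.get(subscription.get("plan"), DEFAULT_QUOTA)
```

Returning the default quota at every missing link keeps the chat usable when subscription data is temporarily unavailable.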
Pricing Note: Cost estimates assume Gemini 2.5 Flash rates of ~$0.075 per 1M input tokens and ~$0.30 per 1M output tokens. Published rates change, so confirm them against Google's current price list.
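As a rough illustration of how the quoted rates translate into a per-request estimate, the sketch below applies them to separate input and output counts. The function name is hypothetical, and the backend's actual formula is not shown in this guide and may differ (for instance by applying a single blended rate).

```python
# Approximate rates from the pricing note above (USD per 1M tokens).
INPUT_USD_PER_MILLION = 0.075
OUTPUT_USD_PER_MILLION = 0.30


def estimate_cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one request from its input and output token counts."""
    return (
        prompt_tokens * INPUT_USD_PER_MILLION
        + completion_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000


# One million input tokens cost roughly the input rate itself.
print(f"{estimate_cost_usd(1_000_000, 0):.3f}")
```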
Token Estimation:
- 1,000 tokens ≈ 750 words
- Average chat message: 100-200 tokens
- Typical conversation (10 messages): 1,500-3,000 tokens
- FREE plan allows ~5-10 full conversations per day
- PRO plan allows ~21-42 full conversations per day (4x more than FREE)
- ENTERPRISE plan allows ~42-85 full conversations per day (8x more than FREE)
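The per-plan conversation ranges follow from dividing the daily quota by the 1,500-3,000 token cost of a typical conversation. A short sketch of that arithmetic (the function name is illustrative):

```python
def conversations_per_day(
    daily_quota: int, min_tokens: int = 1_500, max_tokens: int = 3_000
) -> tuple[int, int]:
    """Return (fewest, most) full conversations that fit in a daily quota."""
    return daily_quota // max_tokens, daily_quota // min_tokens


for plan, quota in {"FREE": 16_000, "PRO": 64_000, "ENTERPRISE": 128_000}.items():
    low, high = conversations_per_day(quota)
    print(f"{plan}: ~{low}-{high} conversations/day")
```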
Environment Configuration¶
Quota limits can be customized via environment variables:
# FREE Plan Token Limits
FREE_PLAN_DAILY_TOKENS=16000
FREE_PLAN_MONTHLY_TOKENS=480000
# PRO Plan Token Limits
PRO_PLAN_DAILY_TOKENS=64000
PRO_PLAN_MONTHLY_TOKENS=1920000
# ENTERPRISE Plan Token Limits
ENTERPRISE_PLAN_DAILY_TOKENS=128000
ENTERPRISE_PLAN_MONTHLY_TOKENS=3840000
# Fallback Defaults (when subscription unavailable)
DEFAULT_DAILY_TOKENS=16000
DEFAULT_MONTHLY_TOKENS=480000
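One way a backend could read these variables is sketched below. It is an assumption-laden example rather than the platform's loader: the function name is hypothetical, and unset or empty variables simply fall back to the defaults listed above.

```python
import os
from typing import Mapping

# Defaults mirror the values documented above.
DEFAULTS = {
    "FREE_PLAN_DAILY_TOKENS": 16_000,
    "FREE_PLAN_MONTHLY_TOKENS": 480_000,
    "PRO_PLAN_DAILY_TOKENS": 64_000,
    "PRO_PLAN_MONTHLY_TOKENS": 1_920_000,
    "ENTERPRISE_PLAN_DAILY_TOKENS": 128_000,
    "ENTERPRISE_PLAN_MONTHLY_TOKENS": 3_840_000,
    "DEFAULT_DAILY_TOKENS": 16_000,
    "DEFAULT_MONTHLY_TOKENS": 480_000,
}


def load_quota_settings(env: Mapping[str, str] = os.environ) -> dict[str, int]:
    """Read each quota variable, using the documented default when it is unset."""
    settings = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name)
        settings[name] = int(raw) if raw else default
    return settings
```

Passing the environment as a parameter keeps the loader easy to test without touching real process variables.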
Plan Upgrade Impact¶
When upgrading subscription plans:
- FREE → PRO: Quota immediately increases to 64K daily / 1.92M monthly (4x boost)
- PRO → ENTERPRISE: Quota immediately increases to 128K daily / 3.84M monthly (2x PRO, 8x FREE)
- Downgrade (PRO → FREE): New limits (16K daily / 480K monthly) apply at next billing cycle
Important: Quota changes take effect immediately upon subscription upgrade payment confirmation. The system automatically detects the new subscription plan and adjusts token allocations in real-time.
Architecture¶
Integration with Gemini AI¶
sequenceDiagram
Customer->>Next.js: Send chat message
Next.js->>Backend: GET /api/v1/token-usage/quota
Backend->>Next.js: {allowed: true/false, remaining}
alt Quota Available
Next.js->>Gemini API: Send message + conversation
Gemini API->>Next.js: Response + usageMetadata
Next.js->>Backend: POST /api/v1/token-usage/track
Backend->>Database: Update usage counters
Backend->>Next.js: {success: true, quota_exceeded}
Next.js->>Customer: Display AI response
else Quota Exceeded
Next.js->>Customer: "Daily limit reached"
end
Quota Reset Schedule¶
graph TB
A[Cloud Scheduler] -->|Daily 00:00 UTC| B[Daily Reset Function]
A -->|1st of Month 00:00 UTC| C[Monthly Reset Function]
B -->|POST /api/v1/token-usage/quota-reset| D[Backend API]
C -->|POST /api/v1/token-usage/quota-reset-monthly| D
D -->|Update Database| E[MongoDB]
D -->|Log Results| F[Cloud Logging]
Check Token Quota¶
Check customer's remaining token allowance before making AI request.
Endpoint¶
GET /api/v1/token-usage/quota
Authentication: Required (Customer JWT token)
Portal: 🟦 CUSTOMER
Response¶
{
"allowed": true,
"used_today": 2345,
"used_this_month": 45678,
"daily_quota": 10000,
"monthly_quota": 300000,
"remaining_today": 7655,
"remaining_this_month": 254322,
"percentage_used_today": 23.45,
"percentage_used_this_month": 15.23,
"resets_at": "2025-01-14T00:00:00Z",
"blocked": false,
"blocked_reason": null
}
Fields:
- allowed (boolean) - Whether customer can send messages
- used_today (integer) - Tokens consumed today
- used_this_month (integer) - Tokens consumed this month
- daily_quota (integer) - Daily token limit
- monthly_quota (integer) - Monthly token limit
- remaining_today (integer) - Tokens remaining today
- remaining_this_month (integer) - Tokens remaining this month
- percentage_used_today (float) - Daily usage percentage (normally 0-100; can exceed 100 when the last tracked request overshoots the quota)
- percentage_used_this_month (float) - Monthly usage percentage (normally 0-100; can exceed 100 on overshoot)
- resets_at (string) - ISO 8601 timestamp when quota resets (midnight UTC)
- blocked (boolean) - Whether customer is blocked
- blocked_reason (string|null) - Reason for blocking if applicable
Response Examples¶
Available Quota:
{
"allowed": true,
"used_today": 1234,
"used_this_month": 45678,
"daily_quota": 10000,
"monthly_quota": 300000,
"remaining_today": 8766,
"remaining_this_month": 254322,
"percentage_used_today": 12.34,
"percentage_used_this_month": 15.23,
"resets_at": "2025-01-14T00:00:00Z",
"blocked": false,
"blocked_reason": null
}
Quota Exceeded:
{
"allowed": false,
"used_today": 10234,
"used_this_month": 310456,
"daily_quota": 10000,
"monthly_quota": 300000,
"remaining_today": 0,
"remaining_this_month": 0,
"percentage_used_today": 102.34,
"percentage_used_this_month": 103.49,
"resets_at": "2025-01-14T00:00:00Z",
"blocked": true,
"blocked_reason": "Daily quota exceeded"
}
Unlimited (Enterprise):
{
"allowed": true,
"used_today": 150000,
"used_this_month": 3500000,
"daily_quota": -1,
"monthly_quota": -1,
"remaining_today": -1,
"remaining_this_month": -1,
"percentage_used_today": 0,
"percentage_used_this_month": 0,
"resets_at": "2025-01-14T00:00:00Z",
"blocked": false,
"blocked_reason": null
}
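Note: -1 indicates unlimited quota. The default ENTERPRISE limits are 128,000 daily / 3,840,000 monthly tokens; an unlimited quota is assigned per customer through the admin quota endpoint.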
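The relationship between the counters and the derived response fields can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, a customer is treated as blocked once usage reaches the quota, and the "Monthly quota exceeded" wording is invented for the example.

```python
UNLIMITED = -1  # sentinel the API uses for "no limit"


def _remaining(used: int, quota: int) -> int:
    """Tokens left, floored at 0; unlimited quotas report -1."""
    return UNLIMITED if quota == UNLIMITED else max(quota - used, 0)


def _percentage(used: int, quota: int) -> float:
    """Usage percentage rounded to 2 decimals; unlimited quotas report 0."""
    if quota == UNLIMITED:
        return 0.0
    if quota == 0:
        return 100.0  # assumption: a zero quota counts as fully used
    return round(used / quota * 100, 2)


def build_quota_status(used_today, used_this_month, daily_quota, monthly_quota):
    daily_hit = daily_quota != UNLIMITED and used_today >= daily_quota
    monthly_hit = monthly_quota != UNLIMITED and used_this_month >= monthly_quota
    reason = None
    if daily_hit:
        reason = "Daily quota exceeded"
    elif monthly_hit:
        reason = "Monthly quota exceeded"  # assumed wording
    return {
        "allowed": not (daily_hit or monthly_hit),
        "remaining_today": _remaining(used_today, daily_quota),
        "remaining_this_month": _remaining(used_this_month, monthly_quota),
        "percentage_used_today": _percentage(used_today, daily_quota),
        "percentage_used_this_month": _percentage(used_this_month, monthly_quota),
        "blocked": daily_hit or monthly_hit,
        "blocked_reason": reason,
    }
```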
Usage Flow¶
Frontend Implementation:
// Before sending message to Gemini
const checkQuota = async () => {
  const response = await fetch('/api/v1/token-usage/quota', {
    headers: {
      'Authorization': `Bearer ${customerToken}`
    }
  });
  const quota = await response.json();
  if (!quota.allowed) {
    // Show error: "Daily limit reached. Resets at {quota.resets_at}"
    return false;
  }
  // Show warning if > 80% used
  if (quota.percentage_used_today > 80) {
    console.warn(`AI quota at ${quota.percentage_used_today}%`);
  }
  return true;
};
Business Rules¶
- Always returns 200 OK - Even if quota exceeded (check allowed field)
- Auto-initialization - Creates quota record if customer has none
- Real-time calculation - Fetches current subscription plan limits
- Timezone: All timestamps in UTC
- Soft enforcement - Frontend should respect allowed: false
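Because all timestamps are in UTC and the daily quota resets at midnight UTC, the resets_at value can be derived from the current time. A minimal sketch, assuming a timezone-aware input (the function names are illustrative):

```python
from datetime import datetime, timedelta, timezone


def next_daily_reset(now: datetime) -> datetime:
    """Return the next midnight UTC after a timezone-aware datetime."""
    now_utc = now.astimezone(timezone.utc)
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)


def format_resets_at(now: datetime) -> str:
    """Format the next reset as an ISO 8601 string with a Z suffix."""
    return next_daily_reset(now).strftime("%Y-%m-%dT%H:%M:%SZ")


print(format_resets_at(datetime(2025, 1, 13, 14, 25, tzinfo=timezone.utc)))
```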
Track Token Usage¶
Record token consumption after successful AI conversation.
Endpoint¶
POST /api/v1/token-usage/track
Authentication: Required (Customer JWT token)
Portal: 🟦 CUSTOMER
Request Body¶
{
"tokens": 1234,
"prompt_tokens": 456,
"completion_tokens": 778,
"conversation_id": "conv_abc123xyz",
"message_count": 3,
"function_calls": ["get_service_availability", "book_appointment"],
"system_tokens": 150,
"model": "gemini-2.5-flash"
}
Parameters:
- tokens (required, integer) - Total tokens used (from Gemini usageMetadata.totalTokenCount)
- prompt_tokens (optional, integer) - Input tokens (from usageMetadata.promptTokenCount)
- completion_tokens (optional, integer) - Output tokens (from usageMetadata.candidatesTokenCount)
- conversation_id (required, string) - Unique conversation identifier
- message_count (required, integer) - Number of messages in conversation
- function_calls (optional, array) - List of function names called (for analytics)
- system_tokens (optional, integer) - Tokens used by system prompt
- model (optional, string) - Gemini model used (default: gemini-2.5-flash)
Response¶
Success (Quota Remaining):
{
"success": true,
"recorded": true,
"new_total_today": 2468,
"new_total_month": 67890,
"quota_exceeded": false,
"warning": null
}
Success (Warning at 80%):
{
"success": true,
"recorded": true,
"new_total_today": 8234,
"new_total_month": 267890,
"quota_exceeded": false,
"warning": "You have used 82% of your daily token quota"
}
Quota Exceeded (429 Response):
Fields:
- success (boolean) - Whether tracking succeeded
- recorded (boolean) - Whether data was saved to database
- new_total_today (integer) - Updated daily token total
- new_total_month (integer) - Updated monthly token total
- quota_exceeded (boolean) - Whether quota was exceeded after this request
- warning (string|null) - Warning message if approaching limit (80%+ usage)
Response Codes¶
| Code | Description |
|---|---|
| 200 | Usage tracked successfully, quota available |
| 401 | Not authenticated |
| 429 | Quota exceeded after tracking this request |
| 500 | Internal server error |
Usage Flow¶
Frontend Implementation:
// After receiving Gemini response
// customerToken and conversationMessages come from the surrounding chat component
const trackUsage = async (geminiResponse) => {
  const { usageMetadata, conversationId } = geminiResponse;
  try {
    const response = await fetch('/api/v1/token-usage/track', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${customerToken}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        // Gemini reports usage in the *TokenCount fields of usageMetadata
        tokens: usageMetadata.totalTokenCount,
        prompt_tokens: usageMetadata.promptTokenCount,
        completion_tokens: usageMetadata.candidatesTokenCount,
        conversation_id: conversationId,
        message_count: conversationMessages.length,
        model: 'gemini-2.5-flash'
      })
    });
    if (response.status === 429) {
      // Quota exceeded - disable chat input
      showQuotaExceededMessage();
    } else {
      const result = await response.json();
      if (result.warning) {
        showWarningToast(result.warning);
      }
    }
  } catch (error) {
    console.error('Failed to track token usage:', error);
    // Non-blocking error - conversation continues
  }
};
Business Rules¶
- Returns 429 if quota exceeded - Frontend must handle gracefully
- Updates both daily and monthly counters
- Generates warning at 80% usage
- Creates detailed log entry for analytics
- Current request always tracked - Even if it exceeds quota
- Automatic cost calculation - Based on Gemini 2.5 Flash pricing
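The tracking rules above can be illustrated with an in-memory sketch. It is not the backend's implementation: the function name and the budget dictionary are hypothetical, a quota counts as exceeded once usage reaches it, and the 429 body reuses the success shape because the actual 429 payload is not shown in this guide.

```python
UNLIMITED = -1
WARNING_THRESHOLD_PERCENT = 80


def _limit_hit(used: int, quota: int) -> bool:
    return quota != UNLIMITED and used >= quota


def track_usage(budget: dict, tokens: int) -> tuple[int, dict]:
    """Record usage, then report (HTTP status, response body)."""
    # The current request is always recorded, even when it overshoots the quota.
    budget["used_today"] += tokens
    budget["used_this_month"] += tokens

    exceeded = _limit_hit(budget["used_today"], budget["daily_quota"]) or _limit_hit(
        budget["used_this_month"], budget["monthly_quota"]
    )
    budget["quota_exceeded"] = exceeded

    warning = None
    daily_quota = budget["daily_quota"]
    if not exceeded and daily_quota > 0:
        percent = budget["used_today"] * 100 // daily_quota
        if percent >= WARNING_THRESHOLD_PERCENT:
            warning = f"You have used {percent}% of your daily token quota"

    body = {
        "success": True,
        "recorded": True,
        "new_total_today": budget["used_today"],
        "new_total_month": budget["used_this_month"],
        "quota_exceeded": exceeded,
        "warning": warning,
    }
    return (429 if exceeded else 200), body
```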
Get Usage History¶
Retrieve token usage history for analytics and reporting.
Endpoint¶
GET /api/v1/token-usage/history
Authentication: Required (Customer JWT token)
Portal: 🟦 CUSTOMER
Query Parameters¶
days(optional, integer) - Number of days to retrieve (1-90, default: 30)
Response¶
{
"daily_usage": [
{
"date": "2025-01-13",
"total_tokens": 5678,
"total_conversations": 4,
"total_messages": 12,
"average_tokens_per_message": 473,
"estimated_cost_usd": 0.00142,
"quota_exceeded": false
},
{
"date": "2025-01-12",
"total_tokens": 8234,
"total_conversations": 6,
"total_messages": 18,
"average_tokens_per_message": 457,
"estimated_cost_usd": 0.00206,
"quota_exceeded": false
},
{
"date": "2025-01-11",
"total_tokens": 10234,
"total_conversations": 8,
"total_messages": 24,
"average_tokens_per_message": 426,
"estimated_cost_usd": 0.00256,
"quota_exceeded": true
}
],
"total_tokens": 24146,
"total_cost_usd": 0.00604,
"average_tokens_per_day": 8048
}
Fields:
daily_usage (array):
- date (string) - Date in YYYY-MM-DD format
- total_tokens (integer) - Tokens used on this date
- total_conversations (integer) - Number of conversations
- total_messages (integer) - Number of messages sent
- average_tokens_per_message (integer) - Average tokens per message
- estimated_cost_usd (float) - Estimated cost in USD
- quota_exceeded (boolean) - Whether quota was exceeded on this date
Summary fields:
- total_tokens (integer) - Total tokens across all days
- total_cost_usd (float) - Total estimated cost in USD
- average_tokens_per_day (integer) - Average daily token consumption
Usage Examples¶
Last 7 Days:
GET /api/v1/token-usage/history?days=7
Last 90 Days (Maximum):
GET /api/v1/token-usage/history?days=90
Business Rules¶
- Maximum 90 days history - Returns up to 90 days
- Sorted by date descending - Most recent first
- Includes cost estimates - Based on Gemini pricing
- Analytics-ready format - Easy to chart/graph
- Quota exceeded tracking - Shows days when limit was hit
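The summary fields can be derived from the daily entries as in the sketch below. The function name is hypothetical, and two details are assumptions chosen to match the example response: the daily average uses integer division, and the total cost is rounded to five decimals.

```python
def summarize_history(daily_usage: list[dict]) -> dict:
    """Sort daily entries newest first and compute the summary fields."""
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    ordered = sorted(daily_usage, key=lambda day: day["date"], reverse=True)
    total_tokens = sum(day["total_tokens"] for day in ordered)
    total_cost = round(sum(day["estimated_cost_usd"] for day in ordered), 5)
    average = total_tokens // len(ordered) if ordered else 0
    return {
        "daily_usage": ordered,
        "total_tokens": total_tokens,
        "total_cost_usd": total_cost,
        "average_tokens_per_day": average,
    }
```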
Update Customer Quota (Admin)¶
Modify customer's token quota limits (admin only).
Endpoint¶
PATCH /api/v1/token-usage/quota/{customer_id}
Authentication: 🔴 ADMIN ONLY (TENANT_ADMIN or SUPER_ADMIN required)
Portal: 🟧 ADMIN
Path Parameters¶
customer_id(required, string) - Customer ID (ObjectId)
Request Body¶
Parameters:
- daily_quota (optional, integer) - Daily token limit (use -1 for unlimited)
- monthly_quota (optional, integer) - Monthly token limit (use -1 for unlimited)
- quota_type (optional, string) - Enforcement type: daily, monthly, or unlimited
At least one parameter must be provided.
Response¶
Usage Examples¶
Set Custom Daily Quota:
curl -X PATCH http://localhost:8000/api/v1/token-usage/quota/507f1f77bcf86cd799439010 \
-H "Authorization: Bearer ADMIN_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"daily_quota": 25000
}'
Set Unlimited Quota (Enterprise):
curl -X PATCH http://localhost:8000/api/v1/token-usage/quota/507f1f77bcf86cd799439010 \
-H "Authorization: Bearer ADMIN_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"daily_quota": -1,
"monthly_quota": -1,
"quota_type": "unlimited"
}'
Reduce Quota (Penalty/Downgrade):
curl -X PATCH http://localhost:8000/api/v1/token-usage/quota/507f1f77bcf86cd799439010 \
-H "Authorization: Bearer ADMIN_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"daily_quota": 5000,
"monthly_quota": 150000
}'
Business Rules¶
- Admin authentication required - Only TENANT_ADMIN or SUPER_ADMIN roles can access
- At least one quota parameter must be provided
- Quota values must be non-negative (except -1 for unlimited)
- Changes take effect immediately
- Use -1 for unlimited (Enterprise behavior)
- Supports subscription tier changes - When upgrading, quota is updated
- Tenant isolation enforced - TENANT_ADMIN can only modify customers in their tenant(s)
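The input rules above (at least one parameter, non-negative values or -1, a known quota_type) can be expressed as a small validator. This is an illustrative sketch with a hypothetical function name and error messages; the real endpoint reports such failures as 422 responses.

```python
ALLOWED_QUOTA_TYPES = {"daily", "monthly", "unlimited"}


def validate_quota_update(payload: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the payload is valid."""
    errors = []
    fields = ("daily_quota", "monthly_quota", "quota_type")
    if all(payload.get(name) is None for name in fields):
        errors.append("at least one of daily_quota, monthly_quota, quota_type is required")

    for name in ("daily_quota", "monthly_quota"):
        value = payload.get(name)
        if value is None:
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < -1:
            errors.append(f"{name} must be -1 (unlimited) or a non-negative integer")

    quota_type = payload.get("quota_type")
    if quota_type is not None and quota_type not in ALLOWED_QUOTA_TYPES:
        errors.append("quota_type must be one of: daily, monthly, unlimited")
    return errors
```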
Security¶
Authentication: This endpoint requires a valid JWT token with admin privileges.
Role Requirements:
- SUPER_ADMIN - Can update quotas for any customer across all tenants
- TENANT_ADMIN - Can only update quotas for customers in their assigned tenant(s)
- Other roles - Access denied (403 Forbidden)
Authorization Flow:
1. Request must include valid JWT token in Authorization header
2. Token is validated and user role is checked
3. If TENANT_ADMIN: Verify customer belongs to admin's tenant
4. If SUPER_ADMIN: Allow access to any customer
5. Update quota in database
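The role and tenant checks in steps 2-4 can be sketched as a pure function. The function name and arguments are hypothetical; the role string "tenant_admin" matches the login response shown in this guide, while "super_admin" is assumed by analogy.

```python
def authorize_quota_update(
    role: str, admin_tenant_ids: set[str], customer_tenant_id: str
) -> int:
    """Return the HTTP status the authorization step would produce (200 or 403)."""
    if role == "super_admin":
        return 200  # may update any customer across all tenants
    if role == "tenant_admin" and customer_tenant_id in admin_tenant_ids:
        return 200  # customer belongs to one of the admin's tenants
    return 403  # not an admin, or customer is in another tenant
```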
Response Codes:
- 200 - Quota updated successfully
- 401 - Not authenticated (missing or invalid token)
- 403 - Forbidden (not an admin OR accessing wrong tenant)
- 404 - Customer not found
- 422 - Invalid quota values
Getting Admin JWT Token¶
Admin Login:
# Login as admin to get JWT token
curl -X POST http://localhost:8000/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{
"email": "admin@company.com",
"password": "admin_password",
"tenant_slug": "company"
}'
Response:
{
"access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"token_type": "bearer",
"user": {
"id": "507f1f77bcf86cd799439020",
"email": "admin@company.com",
"role": "tenant_admin"
}
}
Use the token in subsequent requests:
curl -X PATCH http://localhost:8000/api/v1/token-usage/quota/CUSTOMER_ID \
-H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..." \
-H "Content-Type: application/json" \
-d '{"daily_quota": 50000}'
Reset Daily Quotas (Cloud Function)¶
Reset quota_exceeded flags for all customers (automated daily job).
Endpoint¶
POST /api/v1/token-usage/quota-reset
Authentication: 🔴 NONE (Internal Cloud Function only)
Access: Internal only - Do not expose publicly
Request Body¶
No request body required.
Response¶
{
"success": true,
"message": "Successfully reset quotas for 1234 customers",
"reset_count": 1234,
"reset_date": "2025-01-14"
}
Fields:
- success (boolean) - Whether reset was successful
- message (string) - Human-readable result message
- reset_count (integer) - Number of customer records reset
- reset_date (string) - Date of reset (YYYY-MM-DD)
Cloud Function Setup¶
Google Cloud Scheduler Configuration:
name: reset-daily-token-quotas
schedule: "0 0 * * *"  # Daily at midnight UTC
timezone: UTC
httpTarget:
  uri: https://api.myreserva.id/api/v1/token-usage/quota-reset
  httpMethod: POST
  headers:
    Content-Type: application/json
Cloud Function (Node.js):
const functions = require('@google-cloud/functions-framework');
const axios = require('axios');

functions.http('resetDailyQuotas', async (req, res) => {
  try {
    const response = await axios.post(
      'https://api.myreserva.id/api/v1/token-usage/quota-reset',
      {},
      { timeout: 30000 }
    );
    console.log('Daily quota reset completed:', response.data);
    res.status(200).json(response.data);
  } catch (error) {
    console.error('Failed to reset quotas:', error);
    res.status(500).json({ error: error.message });
  }
});
Process Flow¶
- Cloud Scheduler triggers at 00:00 UTC daily
- Cloud Function calls backend API endpoint
- Backend finds all customers with quota_exceeded=True
- Backend resets quota_exceeded to False
- Backend clears blocked_at timestamp
- Backend returns count of reset records
- Cloud Function logs result to Cloud Logging
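The backend part of this flow can be sketched over an in-memory list of budget records. The function name is hypothetical and the sketch mirrors only the steps listed above; any other bookkeeping the real job performs (for example zeroing daily counters) is not shown here.

```python
from datetime import date


def reset_daily_quotas(budgets: list[dict], today: date) -> dict:
    """Clear the exceeded flag and block timestamp, returning the API-style result."""
    reset_count = 0
    for budget in budgets:
        if budget.get("quota_exceeded"):
            budget["quota_exceeded"] = False
            budget["blocked_at"] = None
            reset_count += 1
    return {
        "success": True,
        "message": f"Successfully reset quotas for {reset_count} customers",
        "reset_count": reset_count,
        "reset_date": today.isoformat(),
    }
```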
Security Considerations¶
- No authentication required (by design for Cloud Functions)
- Should be triggered only by trusted Cloud Functions
- Consider implementing IP allowlist in production if needed
- Do not expose this URL publicly (no public documentation)
- Monitor call frequency to prevent abuse
Production Security:
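# Optional: Add IP allowlist in middleware
# Assumes `app` is the existing FastAPI application instance
import ipaddress

from fastapi import Request
from fastapi.responses import JSONResponse

ALLOWED_NETWORKS = [
    ipaddress.ip_network("35.190.0.0/16"),  # Google Cloud IP range
    ipaddress.ip_network("34.96.0.0/16"),
]


@app.middleware("http")
async def verify_cloud_function_ip(request: Request, call_next):
    if request.url.path == "/api/v1/token-usage/quota-reset":
        # Compare by network membership: the entries are CIDR ranges, not single hosts
        client_ip = ipaddress.ip_address(request.client.host)
        if not any(client_ip in network for network in ALLOWED_NETWORKS):
            return JSONResponse(
                status_code=403,
                content={"detail": "Forbidden"}
            )
    return await call_next(request)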
Reset Monthly Quotas (Cloud Function)¶
Reset monthly usage counters (automated monthly job).
Endpoint¶
POST /api/v1/token-usage/quota-reset-monthly
Authentication: 🔴 NONE (Internal Cloud Function only)
Access: Internal only - Do not expose publicly
Request Body¶
No request body required.
Response¶
{
"success": true,
"message": "Monthly quota reset completed successfully",
"reset_count": 0,
"reset_date": "2025-02"
}
Fields:
- success (boolean) - Whether reset was successful
- message (string) - Human-readable result message
- reset_count (integer) - Number of records processed (0 for month initialization)
- reset_date (string) - Month of reset (YYYY-MM)
Cloud Function Setup¶
Google Cloud Scheduler Configuration:
name: reset-monthly-token-quotas
schedule: "0 0 1 * *"  # 1st of every month at midnight UTC
timezone: UTC
httpTarget:
  uri: https://api.myreserva.id/api/v1/token-usage/quota-reset-monthly
  httpMethod: POST
  headers:
    Content-Type: application/json
Cloud Function (Node.js):
const functions = require('@google-cloud/functions-framework');
const axios = require('axios');

functions.http('resetMonthlyQuotas', async (req, res) => {
  try {
    const response = await axios.post(
      'https://api.myreserva.id/api/v1/token-usage/quota-reset-monthly',
      {},
      { timeout: 30000 }
    );
    console.log('Monthly quota reset completed:', response.data);
    res.status(200).json(response.data);
  } catch (error) {
    console.error('Failed to reset monthly quotas:', error);
    res.status(500).json({ error: error.message });
  }
});
Process Flow¶
- Cloud Scheduler triggers at 00:00 UTC on 1st of month
- Cloud Function calls backend API endpoint
- Backend archives previous month's data (optional)
- Backend initializes new month's tracking records
- Backend optionally sends monthly usage reports to customers
- Backend returns success status
- Cloud Function logs result to Cloud Logging
Future Enhancements¶
Planned Features:
- Archive previous month data - For historical reporting
- Send monthly usage reports - Email customers with usage summary
- Generate analytics - Monthly trends, top users, etc.
Security¶
Same security considerations as daily reset endpoint (see above).
Database Schema¶
token_usage_budgets Collection¶
Purpose: Track customer token quotas and usage counters
{
"_id": ObjectId("507f1f77bcf86cd799439011"),
"customer_id": ObjectId("507f1f77bcf86cd799439010"),
"tenant_id": ObjectId("507f1f77bcf86cd799439009"),
"daily_quota": 10000,
"monthly_quota": 300000,
"quota_type": "daily",
"used_today": 2345,
"used_this_month": 45678,
"quota_exceeded": false,
"blocked_at": null,
"last_reset_date": "2025-01-13T00:00:00Z",
"created_at": "2025-01-01T10:30:00Z",
"updated_at": "2025-01-13T14:25:00Z"
}
Indexes:
db.token_usage_budgets.createIndex({ customer_id: 1, tenant_id: 1 }, { unique: true });
db.token_usage_budgets.createIndex({ tenant_id: 1 });
db.token_usage_budgets.createIndex({ quota_exceeded: 1 });
token_usage_logs Collection¶
Purpose: Detailed conversation logs for analytics
{
"_id": ObjectId("507f1f77bcf86cd799439012"),
"customer_id": ObjectId("507f1f77bcf86cd799439010"),
"tenant_id": ObjectId("507f1f77bcf86cd799439009"),
"conversation_id": "conv_abc123xyz",
"tokens": 1234,
"prompt_tokens": 456,
"completion_tokens": 778,
"system_tokens": 150,
"message_count": 3,
"model": "gemini-2.5-flash",
"function_calls": ["get_service_availability", "book_appointment"],
"estimated_cost_usd": 0.0003702,
"timestamp": "2025-01-13T14:25:30Z",
"created_at": "2025-01-13T14:25:30Z"
}
Indexes:
db.token_usage_logs.createIndex({ customer_id: 1, tenant_id: 1 });
db.token_usage_logs.createIndex({ tenant_id: 1 });
db.token_usage_logs.createIndex({ timestamp: -1 });
db.token_usage_logs.createIndex({ conversation_id: 1 });
TTL Index (Auto-delete after 90 days):
Testing¶
Manual Testing Flow¶
1. Check Quota (Before AI Call):
curl -X GET http://localhost:8000/api/v1/token-usage/quota \
-H "Authorization: Bearer CUSTOMER_JWT_TOKEN"
Expected Response:
{
"allowed": true,
"used_today": 0,
"daily_quota": 10000,
"remaining_today": 10000,
"percentage_used_today": 0
}
2. Track Usage (After AI Call):
curl -X POST http://localhost:8000/api/v1/token-usage/track \
-H "Authorization: Bearer CUSTOMER_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"tokens": 1234,
"prompt_tokens": 456,
"completion_tokens": 778,
"conversation_id": "conv_test_123",
"message_count": 3,
"model": "gemini-2.5-flash"
}'
Expected Response:
{
"success": true,
"recorded": true,
"new_total_today": 1234,
"new_total_month": 1234,
"quota_exceeded": false
}
3. Get Usage History:
curl -X GET "http://localhost:8000/api/v1/token-usage/history?days=7" \
-H "Authorization: Bearer CUSTOMER_JWT_TOKEN"
4. Update Customer Quota (Admin Only):
curl -X PATCH http://localhost:8000/api/v1/token-usage/quota/CUSTOMER_ID \
-H "Authorization: Bearer ADMIN_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"daily_quota": 50000,
"monthly_quota": 1500000
}'
Expected Response:
5. Test Quota Exceeded (assumes the 10,000-token daily quota from step 1, so skip step 4 or restore the quota first):
# Track usage that exceeds daily quota
curl -X POST http://localhost:8000/api/v1/token-usage/track \
-H "Authorization: Bearer CUSTOMER_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"tokens": 15000,
"conversation_id": "conv_test_456",
"message_count": 1
}'
Expected Response (429):
6. Check Quota After Exceeded:
curl -X GET http://localhost:8000/api/v1/token-usage/quota \
-H "Authorization: Bearer CUSTOMER_JWT_TOKEN"
Expected Response:
{
"allowed": false,
"used_today": 16234,
"daily_quota": 10000,
"remaining_today": 0,
"percentage_used_today": 162.34,
"blocked": true,
"blocked_reason": "Daily quota exceeded"
}
Unit Tests¶
Location: tests/test_token_usage.py
import pytest
from datetime import datetime, timezone


@pytest.mark.asyncio
async def test_check_quota_success(client, customer_token):
    """Test quota check with available quota"""
    response = await client.get(
        "/api/v1/token-usage/quota",
        headers={"Authorization": f"Bearer {customer_token}"}
    )
    assert response.status_code == 200
    data = response.json()
    assert data["allowed"] is True
    assert data["daily_quota"] > 0


@pytest.mark.asyncio
async def test_track_usage_success(client, customer_token):
    """Test tracking token usage"""
    payload = {
        "tokens": 1234,
        "prompt_tokens": 456,
        "completion_tokens": 778,
        "conversation_id": "conv_test_123",
        "message_count": 3
    }
    response = await client.post(
        "/api/v1/token-usage/track",
        json=payload,
        headers={"Authorization": f"Bearer {customer_token}"}
    )
    assert response.status_code == 200
    data = response.json()
    assert data["success"] is True
    assert data["new_total_today"] == 1234


@pytest.mark.asyncio
async def test_quota_exceeded(client, customer_token):
    """Test quota exceeded returns 429"""
    # First, exceed quota
    payload = {"tokens": 15000, "conversation_id": "conv_test", "message_count": 1}
    response = await client.post(
        "/api/v1/token-usage/track",
        json=payload,
        headers={"Authorization": f"Bearer {customer_token}"}
    )
    assert response.status_code == 429
    # Then, check quota shows blocked
    response = await client.get(
        "/api/v1/token-usage/quota",
        headers={"Authorization": f"Bearer {customer_token}"}
    )
    assert response.status_code == 200
    data = response.json()
    assert data["allowed"] is False
    assert data["blocked"] is True


@pytest.mark.asyncio
async def test_usage_history(client, customer_token):
    """Test fetching usage history"""
    response = await client.get(
        "/api/v1/token-usage/history?days=30",
        headers={"Authorization": f"Bearer {customer_token}"}
    )
    assert response.status_code == 200
    data = response.json()
    assert "daily_usage" in data
    assert "total_tokens" in data
    assert "average_tokens_per_day" in data


@pytest.mark.asyncio
async def test_admin_update_quota(client, admin_token):
    """Test admin updating customer quota"""
    customer_id = "507f1f77bcf86cd799439010"
    payload = {
        "daily_quota": 50000,
        "monthly_quota": 1500000
    }
    response = await client.patch(
        f"/api/v1/token-usage/quota/{customer_id}",
        json=payload,
        headers={"Authorization": f"Bearer {admin_token}"}
    )
    assert response.status_code == 200
    data = response.json()
    assert data["success"] is True
    assert data["updated_count"] == 1


@pytest.mark.asyncio
async def test_daily_quota_reset(client):
    """Test daily quota reset endpoint"""
    response = await client.post("/api/v1/token-usage/quota-reset")
    assert response.status_code == 200
    data = response.json()
    assert data["success"] is True
    assert "reset_count" in data
Best Practices¶
For Frontend Developers¶
✅ DO:
- Always check quota before AI call - Use GET /quota endpoint
- Show warning at 80% usage - Alert customers proactively
- Handle 429 gracefully - Display friendly "quota exceeded" message
- Track usage after successful calls - Use POST /track endpoint
- Respect allowed: false - Disable chat input when quota exceeded
- Show reset time - Display when quota will reset
- Cache quota checks - Don't check on every keystroke (throttle)
❌ DON'T:
- Don't skip quota checks - Always verify before Gemini API call
- Don't ignore tracking errors - Log failures for debugging
- Don't hardcode quotas - Always fetch from backend
- Don't allow chat when blocked - Enforce allowed: false strictly
For Backend Developers¶
✅ DO:
- Use accurate token counts - Extract from Gemini usageMetadata
- Update both daily and monthly counters
- Generate warnings at 80% usage
- Create detailed logs for analytics
- Handle subscription changes - Update quotas on upgrade
- Implement idempotency - Prevent duplicate tracking
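One possible shape for the idempotency recommendation is sketched below. It is a hypothetical in-memory example, not the platform's mechanism: the class name and the idea of a client-supplied idempotency key are assumptions, and a production version would persist the keys in the database rather than in a Python set.

```python
class UsageTracker:
    """Counts tokens once per idempotency key so retries are not double-billed."""

    def __init__(self) -> None:
        self._seen_keys: set[str] = set()
        self.total_tokens = 0

    def track(self, idempotency_key: str, tokens: int) -> bool:
        """Record usage; return False when the key was already processed."""
        if idempotency_key in self._seen_keys:
            return False
        self._seen_keys.add(idempotency_key)
        self.total_tokens += tokens
        return True


tracker = UsageTracker()
tracker.track("conv_abc123xyz:msg-3", 1234)
tracker.track("conv_abc123xyz:msg-3", 1234)  # retry of the same request is ignored
print(tracker.total_tokens)
```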
❌ DON'T:
- Don't estimate tokens - Always use actual counts from Gemini
- Don't skip validation - Validate all input parameters
- Don't expose reset endpoints - Keep Cloud Function URLs private
- Don't trust client-provided quotas - Always fetch from database
For DevOps¶
✅ DO:
- Set up Cloud Scheduler - Daily and monthly reset jobs
- Monitor reset job success - Alert on failures
- Configure IP allowlist - Restrict reset endpoints to Cloud Functions
- Set TTL indexes - Auto-delete logs after 90 days
- Monitor quota usage patterns - Detect abuse
❌ DON'T:
- Don't skip backups - Backup quota data before resets
- Don't ignore failed resets - Investigate immediately
- Don't expose reset URLs publicly - Keep internal only
Error Handling¶
Common Errors¶
401 Unauthorized:
Fix: Ensure customer JWT token is included in Authorization header.
429 Quota Exceeded:
Fix: Customer must wait until daily reset or upgrade subscription plan.
422 Validation Error:
{
"detail": [
{
"loc": ["body", "tokens"],
"msg": "field required",
"type": "value_error.missing"
}
]
}
Fix: Provide all required fields in request body.
500 Internal Server Error:
Fix: Check server logs for specific error. May be database connectivity issue.
Troubleshooting¶
Quota Not Resetting¶
Symptoms: Customer still blocked after midnight UTC
Checks:
- Verify Cloud Scheduler is running:
- Check Cloud Function logs:
- Manually trigger reset:
Fix:
- Ensure Cloud Function has correct backend URL
- Verify backend endpoint is accessible from Cloud Functions
- Check for database connectivity issues
Usage Not Tracking¶
Symptoms: Token usage not updating after AI calls
Checks:
- Verify tracking endpoint returns 200:
- Check database for new log entries:
- Verify customer_id and tenant_id are correct
Fix:
- Ensure frontend sends all required fields
- Verify customer JWT token is valid
- Check database indexes are created
Incorrect Quota Limits¶
Symptoms: Customer has wrong quota (not matching subscription plan)
Checks:
- Verify subscription plan:
- Check quota record:
- Compare with plan limits
Fix:
- Manually update quota using admin endpoint
- Or delete quota record (will auto-initialize on next check)
- Ensure subscription upgrade triggers quota update
API Reference Summary¶
| Endpoint | Method | Purpose | Auth | Portal |
|---|---|---|---|---|
| /token-usage/quota | GET | Check remaining quota | Customer JWT | 🟦 Customer |
| /token-usage/track | POST | Record token usage | Customer JWT | 🟦 Customer |
| /token-usage/history | GET | Get usage history | Customer JWT | 🟦 Customer |
| /token-usage/quota/{customer_id} | PATCH | Update customer quota | Admin JWT (TENANT_ADMIN/SUPER_ADMIN) | 🟧 Admin |
| /token-usage/quota-reset | POST | Reset daily quotas | None (Cloud Function) | 🔴 Internal |
| /token-usage/quota-reset-monthly | POST | Reset monthly quotas | None (Cloud Function) | 🔴 Internal |
Related Documentation¶
- Subscription Management - Plan tiers, upgrades, billing
- Customer Authentication - Customer JWT tokens
- Customer Profile Management - Customer account management
- API Reference (Swagger) - Interactive API testing
Next Steps:
- Check available plans: GET /subscriptions/plans
- Check your quota: GET /token-usage/quota
- Start AI conversation with quota tracking
- View usage history: GET /token-usage/history?days=30
- Upgrade plan if needed: POST /subscriptions/upgrade