Token Usage Management¶

Complete guide to managing AI chat assistant token budgets, quota enforcement, and usage tracking in the Reserva platform.

Overview¶

The token usage system adds built-in quota management to the AI-powered customer support chat:

Quota Checking - Pre-flight checks before AI API calls
Usage Tracking - Record token consumption after successful AI conversations
Daily & Monthly Limits - Multi-tier quota enforcement
Usage History - Analytics and reporting for customers
Admin Quota Management - Override limits for specific customers
Automated Resets - Cloud Function-triggered daily/monthly quota resets

Key Concepts:

Token = Unit of measurement for AI model usage (input + output text)
Daily Quota = Maximum tokens allowed per day per customer
Monthly Quota = Maximum tokens allowed per month per customer
Quota Exceeded = Customer blocked from AI chat until quota resets
Cloud Function Reset = Automated daily/monthly quota reset jobs

Subscription Plan Limits¶

Token quotas are dynamically allocated based on tenant subscription tiers. Higher plans get more AI assistance capacity. Quotas are configurable via environment variables for flexible scaling.

Token Quota by Plan (Default Configuration)¶

Plan	Daily Quota	Monthly Quota	Multiplier	Cost per Token (USD)
FREE	16,000 tokens	480,000 tokens	1x	Free
PRO	64,000 tokens	1,920,000 tokens	4x	Free
ENTERPRISE	128,000 tokens	3,840,000 tokens	8x	Free

✨ NEW: Subscription-Based Allocation (Implemented November 17, 2025)

Quotas are determined automatically from the tenant's active subscription plan
Lookup chain: Customer → Tenant → Subscription → Plan → Token Quotas
Environment-configurable via .env settings for easy adjustment
Graceful fallback to default quotas (16K daily / 480K monthly) when the subscription is unavailable

Pricing Note: Customers are not charged per token on any plan. Internal cost estimates are based on Gemini 2.5 Flash pricing (~$0.075 per 1M input tokens, ~$0.30 per 1M output tokens).

Token Estimation:

1,000 tokens ≈ 750 words
Average chat message: 100-200 tokens
Typical conversation (10 messages): 1,500-3,000 tokens
FREE plan allows ~5-10 full conversations per day
PRO plan allows ~21-42 full conversations per day (4x more than FREE)
ENTERPRISE plan allows ~42-85 full conversations per day (8x more than FREE)
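
The per-plan conversation estimates follow directly from the daily quota and the typical conversation size. A minimal sketch of that arithmetic, assuming integer division (the function and constant names are illustrative, not part of the platform):

```python
# Figures taken from the Token Estimation list above.
TOKENS_PER_CONVERSATION = (1_500, 3_000)  # typical 10-message conversation (low, high)

DAILY_QUOTAS = {"FREE": 16_000, "PRO": 64_000, "ENTERPRISE": 128_000}


def conversations_per_day(daily_quota: int) -> tuple[int, int]:
    """Return (fewest, most) full conversations that fit in a daily quota."""
    low_cost, high_cost = TOKENS_PER_CONVERSATION
    # Expensive conversations give the lower bound, cheap ones the upper bound.
    return daily_quota // high_cost, daily_quota // low_cost


for plan, quota in DAILY_QUOTAS.items():
    fewest, most = conversations_per_day(quota)
    print(f"{plan}: ~{fewest}-{most} conversations per day")
```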

Environment Configuration¶

Quota limits can be customized via environment variables:

# FREE Plan Token Limits
FREE_PLAN_DAILY_TOKENS=16000
FREE_PLAN_MONTHLY_TOKENS=480000

# PRO Plan Token Limits
PRO_PLAN_DAILY_TOKENS=64000
PRO_PLAN_MONTHLY_TOKENS=1920000

# ENTERPRISE Plan Token Limits
ENTERPRISE_PLAN_DAILY_TOKENS=128000
ENTERPRISE_PLAN_MONTHLY_TOKENS=3840000

# Fallback Defaults (when subscription unavailable)
DEFAULT_DAILY_TOKENS=16000
DEFAULT_MONTHLY_TOKENS=480000
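
One way the backend could turn these variables into a per-plan quota is sketched below. It uses the variable names listed above; the `Quota` type, the `resolve_quota` helper, and the choice to fall back to the default quota when a plan variable is missing or malformed are assumptions for illustration, not the platform's actual code.

```python
import os
from typing import Mapping, NamedTuple, Optional


class Quota(NamedTuple):
    daily: int
    monthly: int


# Hard-coded fallback mirrors the documented defaults (16K daily / 480K monthly).
FALLBACK = Quota(daily=16_000, monthly=480_000)
PLANS = ("FREE", "PRO", "ENTERPRISE")


def _read_int(env: Mapping[str, str], name: str, default: int) -> int:
    """Read an integer setting, using the default when unset or malformed."""
    try:
        return int(env[name])
    except (KeyError, ValueError):
        return default


def resolve_quota(plan: Optional[str], env: Mapping[str, str] = os.environ) -> Quota:
    """Return the quota for a plan name, or the default when the plan is unknown."""
    default = Quota(
        _read_int(env, "DEFAULT_DAILY_TOKENS", FALLBACK.daily),
        _read_int(env, "DEFAULT_MONTHLY_TOKENS", FALLBACK.monthly),
    )
    if plan is None or plan.upper() not in PLANS:
        return default  # subscription unavailable
    prefix = f"{plan.upper()}_PLAN"
    return Quota(
        _read_int(env, f"{prefix}_DAILY_TOKENS", default.daily),
        _read_int(env, f"{prefix}_MONTHLY_TOKENS", default.monthly),
    )
```

Passing the environment as a mapping keeps the lookup easy to test without touching the real process environment.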

Plan Upgrade Impact¶

When upgrading subscription plans:

FREE → PRO: Quota immediately increases to 64K daily / 1.92M monthly (4x the FREE quota)
PRO → ENTERPRISE: Quota immediately increases to 128K daily / 3.84M monthly (2x the PRO quota, 8x the FREE quota)
Downgrade (PRO → FREE): New limits (16K daily / 480K monthly) apply at next billing cycle

Important: Quota changes take effect immediately upon subscription upgrade payment confirmation. The system automatically detects the new subscription plan and adjusts token allocations in real-time.
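
The timing rules above can be summarized as a small decision function. This is a sketch only: the `Plan` enum and the return strings are hypothetical, and applying the "next billing cycle" rule to every downgrade (not just PRO → FREE) is an assumption.

```python
from enum import IntEnum


class Plan(IntEnum):
    FREE = 1
    PRO = 2
    ENTERPRISE = 3


def quota_change_timing(current: Plan, new: Plan) -> str:
    """Return when the new plan's quota starts to apply."""
    if new > current:
        return "immediately"         # upgrade: applies on payment confirmation
    if new < current:
        return "next_billing_cycle"  # downgrade: current quota kept until then
    return "no_change"
```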

Architecture¶

Integration with Gemini AI¶

sequenceDiagram
    Customer->>Next.js: Send chat message
    Next.js->>Backend: GET /api/v1/token-usage/quota
    Backend->>Next.js: {allowed: true/false, remaining}
    alt Quota Available
        Next.js->>Gemini API: Send message + conversation
        Gemini API->>Next.js: Response + usageMetadata
        Next.js->>Backend: POST /api/v1/token-usage/track
        Backend->>Database: Update usage counters
        Backend->>Next.js: {success: true, quota_exceeded}
        Next.js->>Customer: Display AI response
    else Quota Exceeded
        Next.js->>Customer: "Daily limit reached"
    end
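
The sequence above can be sketched as a single function with the three network calls injected as callables. The function name and the `{"text", "usage"}` shape of the model reply are hypothetical; the point is the ordering: check first, call the model only when allowed, and never lose the reply because tracking failed.

```python
from typing import Callable, Optional


def send_chat_message(
    message: str,
    check_quota: Callable[[], dict],
    call_model: Callable[[str], dict],
    track_usage: Callable[[dict], None],
) -> Optional[str]:
    """Run one chat turn: pre-flight check, model call, then usage tracking."""
    quota = check_quota()            # GET /api/v1/token-usage/quota
    if not quota.get("allowed", False):
        return None                  # caller shows "Daily limit reached"

    reply = call_model(message)      # Gemini is called only when quota allows
    try:
        track_usage(reply["usage"])  # POST /api/v1/token-usage/track
    except Exception as exc:         # a tracking failure must not drop the reply
        print(f"Failed to track token usage: {exc}")
    return reply["text"]
```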

Quota Reset Schedule¶

graph TB
    A[Cloud Scheduler] -->|Daily 00:00 UTC| B[Daily Reset Function]
    A -->|1st of Month 00:00 UTC| C[Monthly Reset Function]
    B -->|POST /api/v1/token-usage/quota-reset| D[Backend API]
    C -->|POST /api/v1/token-usage/quota-reset-monthly| D
    D -->|Update Database| E[MongoDB]
    D -->|Log Results| F[Cloud Logging]
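
Both schedules are fixed points in UTC, so the next reset time can be computed without consulting the scheduler. A minimal sketch, assuming timezone-aware input (the helper names are illustrative):

```python
from datetime import datetime, timedelta, timezone


def next_daily_reset(now: datetime) -> datetime:
    """Next 00:00 UTC strictly after `now` (which must be timezone-aware)."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)


def next_monthly_reset(now: datetime) -> datetime:
    """Next first-of-month 00:00 UTC strictly after `now`."""
    now = now.astimezone(timezone.utc)
    # December rolls over into January of the following year.
    year, month = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)
    return datetime(year, month, 1, tzinfo=timezone.utc)
```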

Check Token Quota¶

Check a customer's remaining token allowance before making an AI request.

Endpoint¶

GET /api/v1/token-usage/quota

Authentication: Required (Customer JWT token)

Portal: 🟦 CUSTOMER

Response¶

{
  "allowed": true,
  "used_today": 2345,
  "used_this_month": 45678,
  "daily_quota": 10000,
  "monthly_quota": 300000,
  "remaining_today": 7655,
  "remaining_this_month": 254322,
  "percentage_used_today": 23.45,
  "percentage_used_this_month": 15.23,
  "resets_at": "2025-01-14T00:00:00Z",
  "blocked": false,
  "blocked_reason": null
}

Fields:

allowed (boolean) - Whether customer can send messages
used_today (integer) - Tokens consumed today
used_this_month (integer) - Tokens consumed this month
daily_quota (integer) - Daily token limit
monthly_quota (integer) - Monthly token limit
remaining_today (integer) - Tokens remaining today
remaining_this_month (integer) - Tokens remaining this month
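percentage_used_today (float) - Daily usage percentage (normally 0-100; can exceed 100 when a tracked request overshoots the quota)
percentage_used_this_month (float) - Monthly usage percentage (normally 0-100; can exceed 100 when a tracked request overshoots the quota)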
resets_at (string) - ISO 8601 timestamp when quota resets (midnight UTC)
blocked (boolean) - Whether customer is blocked
blocked_reason (string|null) - Reason for blocking if applicable

Response Examples¶

Available Quota:

{
  "allowed": true,
  "used_today": 1234,
  "used_this_month": 45678,
  "daily_quota": 10000,
  "monthly_quota": 300000,
  "remaining_today": 8766,
  "remaining_this_month": 254322,
  "percentage_used_today": 12.34,
  "percentage_used_this_month": 15.23,
  "resets_at": "2025-01-14T00:00:00Z",
  "blocked": false,
  "blocked_reason": null
}

Quota Exceeded:

{
  "allowed": false,
  "used_today": 10234,
  "used_this_month": 310456,
  "daily_quota": 10000,
  "monthly_quota": 300000,
  "remaining_today": 0,
  "remaining_this_month": 0,
  "percentage_used_today": 102.34,
  "percentage_used_this_month": 103.49,
  "resets_at": "2025-01-14T00:00:00Z",
  "blocked": true,
  "blocked_reason": "Daily quota exceeded"
}

Unlimited (Enterprise):

{
  "allowed": true,
  "used_today": 150000,
  "used_this_month": 3500000,
  "daily_quota": -1,
  "monthly_quota": -1,
  "remaining_today": -1,
  "remaining_this_month": -1,
  "percentage_used_today": 0,
  "percentage_used_this_month": 0,
  "resets_at": "2025-01-14T00:00:00Z",
  "blocked": false,
  "blocked_reason": null
}

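Note: -1 indicates unlimited quota. The default ENTERPRISE plan has finite limits (128,000 daily / 3,840,000 monthly); unlimited quota applies when an admin sets daily_quota and monthly_quota to -1 with quota_type unlimited.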
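
The three sample responses can be reproduced from four numbers: the two usage counters and the two limits. The sketch below shows one way to derive the remaining fields. It assumes a limit counts as reached when usage is greater than or equal to the quota, that the daily limit is reported first when both are hit, and that a zero quota reports 100%; the "Monthly quota exceeded" string is hypothetical.

```python
from typing import Optional

UNLIMITED = -1


def _remaining(used: int, quota: int) -> int:
    return UNLIMITED if quota == UNLIMITED else max(0, quota - used)


def _percentage(used: int, quota: int) -> float:
    if quota == UNLIMITED:
        return 0.0    # unlimited quotas report 0, as in the sample response
    if quota == 0:
        return 100.0  # assumption: a zero quota counts as fully used
    return round(used / quota * 100, 2)


def build_quota_status(used_today: int, used_this_month: int,
                       daily_quota: int, monthly_quota: int) -> dict:
    """Derive the quota-check response fields from counters and limits."""
    daily_hit = daily_quota != UNLIMITED and used_today >= daily_quota
    monthly_hit = monthly_quota != UNLIMITED and used_this_month >= monthly_quota
    reason: Optional[str] = None
    if daily_hit:
        reason = "Daily quota exceeded"
    elif monthly_hit:
        reason = "Monthly quota exceeded"
    return {
        "allowed": reason is None,
        "used_today": used_today,
        "used_this_month": used_this_month,
        "daily_quota": daily_quota,
        "monthly_quota": monthly_quota,
        "remaining_today": _remaining(used_today, daily_quota),
        "remaining_this_month": _remaining(used_this_month, monthly_quota),
        "percentage_used_today": _percentage(used_today, daily_quota),
        "percentage_used_this_month": _percentage(used_this_month, monthly_quota),
        "blocked": reason is not None,
        "blocked_reason": reason,
    }
```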

Usage Flow¶

Frontend Implementation:

// Before sending message to Gemini
const checkQuota = async () => {
  const response = await fetch('/api/v1/token-usage/quota', {
    headers: {
      'Authorization': `Bearer ${customerToken}`
    }
  });

  const quota = await response.json();

  if (!quota.allowed) {
    // Show error: "Daily limit reached. Resets at {quota.resets_at}"
    return false;
  }

  // Show warning if > 80% used
  if (quota.percentage_used_today > 80) {
    console.warn(`AI quota at ${quota.percentage_used_today}%`);
  }

  return true;
};

Business Rules¶

Always returns 200 OK - Even if quota exceeded (check allowed field)
Auto-initialization - Creates quota record if customer has none
Real-time calculation - Fetches current subscription plan limits
Timezone: All timestamps in UTC
Soft enforcement - Frontend should respect allowed: false

Track Token Usage¶

Record token consumption after a successful AI conversation.

Endpoint¶

POST /api/v1/token-usage/track

Authentication: Required (Customer JWT token)

Portal: 🟦 CUSTOMER

Request Body¶

{
  "tokens": 1234,
  "prompt_tokens": 456,
  "completion_tokens": 778,
  "conversation_id": "conv_abc123xyz",
  "message_count": 3,
  "function_calls": ["get_service_availability", "book_appointment"],
  "system_tokens": 150,
  "model": "gemini-2.5-flash"
}

Parameters:

tokens (required, integer) - Total tokens used (from Gemini usageMetadata.totalTokenCount)
prompt_tokens (optional, integer) - Input tokens (from usageMetadata.promptTokenCount)
completion_tokens (optional, integer) - Output tokens (from usageMetadata.candidatesTokenCount)
conversation_id (required, string) - Unique conversation identifier
message_count (required, integer) - Number of messages in conversation
function_calls (optional, array) - List of function names called (for analytics)
system_tokens (optional, integer) - Tokens used by system prompt
model (optional, string) - Gemini model used (default: gemini-2.5-flash)
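
A small helper can translate a Gemini `usageMetadata` object into this request body. The sketch below assumes the Gemini API field names `promptTokenCount`, `candidatesTokenCount`, and `totalTokenCount`; the helper name `build_track_payload` is illustrative.

```python
from typing import Any, Mapping, Optional, Sequence


def build_track_payload(
    usage_metadata: Mapping[str, Any],
    conversation_id: str,
    message_count: int,
    function_calls: Optional[Sequence[str]] = None,
    model: str = "gemini-2.5-flash",
) -> dict:
    """Build the POST /api/v1/token-usage/track body from Gemini usage metadata."""
    total = usage_metadata.get("totalTokenCount")
    if not isinstance(total, int) or total < 0:
        raise ValueError("usageMetadata.totalTokenCount must be a non-negative integer")

    payload = {
        "tokens": total,
        "conversation_id": conversation_id,
        "message_count": message_count,
        "model": model,
    }
    # Optional counts are sent only when Gemini reported them.
    optional = {
        "prompt_tokens": usage_metadata.get("promptTokenCount"),
        "completion_tokens": usage_metadata.get("candidatesTokenCount"),
    }
    payload.update({key: value for key, value in optional.items() if value is not None})
    if function_calls:
        payload["function_calls"] = list(function_calls)
    return payload
```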

Response¶

Success (Quota Remaining):

{
  "success": true,
  "recorded": true,
  "new_total_today": 2468,
  "new_total_month": 67890,
  "quota_exceeded": false,
  "warning": null
}

Success (Warning at 80%):

{
  "success": true,
  "recorded": true,
  "new_total_today": 8234,
  "new_total_month": 267890,
  "quota_exceeded": false,
  "warning": "You have used 82% of your daily token quota"
}

Quota Exceeded (429 Response):

{
  "detail": "Daily token quota exceeded. Limit resets at midnight UTC."
}

Fields:

success (boolean) - Whether tracking succeeded
recorded (boolean) - Whether data was saved to database
new_total_today (integer) - Updated daily token total
new_total_month (integer) - Updated monthly token total
quota_exceeded (boolean) - Whether quota was exceeded after this request
warning (string|null) - Warning message if approaching limit (80%+ usage)

Response Codes¶

Code	Description
`200`	Usage tracked successfully, quota available
`401`	Not authenticated
`422`	Validation error (missing or invalid request fields)
`429`	Quota exceeded after tracking this request
`500`	Internal server error

Usage Flow¶

Frontend Implementation:

// After receiving Gemini response
const trackUsage = async (geminiResponse) => {
  const { usageMetadata, conversationId } = geminiResponse;

  try {
    const response = await fetch('/api/v1/token-usage/track', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${customerToken}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        tokens: usageMetadata.totalTokenCount,
        prompt_tokens: usageMetadata.promptTokenCount,
        completion_tokens: usageMetadata.candidatesTokenCount,
        conversation_id: conversationId,
        message_count: conversationMessages.length,
        model: 'gemini-2.5-flash'
      })
    });

    if (response.status === 429) {
      // Quota exceeded - disable chat input
      showQuotaExceededMessage();
    } else {
      const result = await response.json();
      if (result.warning) {
        showWarningToast(result.warning);
      }
    }
  } catch (error) {
    console.error('Failed to track token usage:', error);
    // Non-blocking error - conversation continues
  }
};

Business Rules¶

Returns 429 if quota exceeded - Frontend must handle gracefully
Updates both daily and monthly counters
Generates warning at 80% usage
Creates detailed log entry for analytics
Current request always tracked - Even if it exceeds quota
Automatic cost calculation - Based on Gemini 2.5 Flash pricing
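
The rules above can be sketched as one update step over a budget record. This is an illustration rather than the backend's implementation: the budget is a plain dict with the `token_usage_budgets` field names, a limit is assumed to be reached when usage is greater than or equal to the quota, and the returned `status_code` key is hypothetical.

```python
UNLIMITED = -1
WARNING_PERCENT = 80  # warn once 80% of the daily quota is used


def _limit_hit(used: int, quota: int) -> bool:
    return quota != UNLIMITED and used >= quota


def apply_usage(budget: dict, tokens: int) -> dict:
    """Add tokens to both counters and report the warning / exceeded state."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")

    # The current request is always recorded, even when it crosses the limit.
    budget["used_today"] += tokens
    budget["used_this_month"] += tokens

    exceeded = (
        _limit_hit(budget["used_today"], budget["daily_quota"])
        or _limit_hit(budget["used_this_month"], budget["monthly_quota"])
    )
    budget["quota_exceeded"] = exceeded

    warning = None
    daily = budget["daily_quota"]
    if not exceeded and daily > 0:
        percent = budget["used_today"] * 100 // daily
        if percent >= WARNING_PERCENT:
            warning = f"You have used {percent}% of your daily token quota"

    return {
        "status_code": 429 if exceeded else 200,
        "new_total_today": budget["used_today"],
        "new_total_month": budget["used_this_month"],
        "quota_exceeded": exceeded,
        "warning": warning,
    }
```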

Get Usage History¶

Retrieve token usage history for analytics and reporting.

Endpoint¶

GET /api/v1/token-usage/history?days=30

Authentication: Required (Customer JWT token)

Portal: 🟦 CUSTOMER

Query Parameters¶

days (optional, integer) - Number of days to retrieve (1-90, default: 30)

Response¶

{
  "daily_usage": [
    {
      "date": "2025-01-13",
      "total_tokens": 5678,
      "total_conversations": 4,
      "total_messages": 12,
      "average_tokens_per_message": 473,
      "estimated_cost_usd": 0.00142,
      "quota_exceeded": false
    },
    {
      "date": "2025-01-12",
      "total_tokens": 8234,
      "total_conversations": 6,
      "total_messages": 18,
      "average_tokens_per_message": 457,
      "estimated_cost_usd": 0.00206,
      "quota_exceeded": false
    },
    {
      "date": "2025-01-11",
      "total_tokens": 10234,
      "total_conversations": 8,
      "total_messages": 24,
      "average_tokens_per_message": 426,
      "estimated_cost_usd": 0.00256,
      "quota_exceeded": true
    }
  ],
  "total_tokens": 24146,
  "total_cost_usd": 0.00604,
  "average_tokens_per_day": 8048
}

Fields:

daily_usage (array):

date (string) - Date in YYYY-MM-DD format
total_tokens (integer) - Tokens used on this date
total_conversations (integer) - Number of conversations
total_messages (integer) - Number of messages sent
average_tokens_per_message (integer) - Average tokens per message
estimated_cost_usd (float) - Estimated cost in USD
quota_exceeded (boolean) - Whether quota was exceeded on this date

Summary fields:

total_tokens (integer) - Total tokens across all days
total_cost_usd (float) - Total estimated cost in USD
average_tokens_per_day (integer) - Average daily token consumption
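
The summary fields are aggregates over the `daily_usage` entries. The sketch below reproduces the totals in the sample response; the helper names are illustrative, integer averages and five-decimal cost rounding are assumptions inferred from that sample, and rejecting an out-of-range `days` value (rather than clamping it) is also an assumption.

```python
def validate_days(days: int = 30) -> int:
    """Check the `days` query parameter against the documented 1-90 range."""
    if not 1 <= days <= 90:
        raise ValueError("days must be between 1 and 90")
    return days


def summarize_history(daily_usage: list[dict]) -> dict:
    """Compute the summary fields from per-day usage entries."""
    total_tokens = sum(day["total_tokens"] for day in daily_usage)
    total_cost = sum(day["estimated_cost_usd"] for day in daily_usage)
    day_count = len(daily_usage)
    return {
        # ISO dates sort correctly as strings; most recent day first.
        "daily_usage": sorted(daily_usage, key=lambda day: day["date"], reverse=True),
        "total_tokens": total_tokens,
        "total_cost_usd": round(total_cost, 5),
        "average_tokens_per_day": total_tokens // day_count if day_count else 0,
    }
```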

Usage Examples¶

Last 7 Days:

GET /api/v1/token-usage/history?days=7

Last 90 Days (Maximum):

GET /api/v1/token-usage/history?days=90

Business Rules¶

Maximum 90 days history - Returns up to 90 days
Sorted by date descending - Most recent first
Includes cost estimates - Based on Gemini pricing
Analytics-ready format - Easy to chart/graph
Quota exceeded tracking - Shows days when limit was hit

Update Customer Quota (Admin)¶

Modify a customer's token quota limits (admin only).

Endpoint¶

PATCH /api/v1/token-usage/quota/{customer_id}

Authentication: 🔴 ADMIN ONLY (TENANT_ADMIN or SUPER_ADMIN required)

Portal: 🟧 ADMIN

Path Parameters¶

customer_id (required, string) - Customer ID (ObjectId)

Request Body¶

{
  "daily_quota": 50000,
  "monthly_quota": 1500000,
  "quota_type": "daily"
}

Parameters:

daily_quota (optional, integer) - Daily token limit (use -1 for unlimited)
monthly_quota (optional, integer) - Monthly token limit (use -1 for unlimited)
quota_type (optional, string) - Enforcement type: daily, monthly, or unlimited

At least one parameter must be provided.

Response¶

{
  "success": true,
  "updated_count": 1
}

Usage Examples¶

Set Custom Daily Quota:

curl -X PATCH http://localhost:8000/api/v1/token-usage/quota/507f1f77bcf86cd799439010 \
  -H "Authorization: Bearer ADMIN_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "daily_quota": 25000
  }'

Set Unlimited Quota (Enterprise):

curl -X PATCH http://localhost:8000/api/v1/token-usage/quota/507f1f77bcf86cd799439010 \
  -H "Authorization: Bearer ADMIN_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "daily_quota": -1,
    "monthly_quota": -1,
    "quota_type": "unlimited"
  }'

Reduce Quota (Penalty/Downgrade):

curl -X PATCH http://localhost:8000/api/v1/token-usage/quota/507f1f77bcf86cd799439010 \
  -H "Authorization: Bearer ADMIN_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "daily_quota": 5000,
    "monthly_quota": 150000
  }'

Business Rules¶

Admin authentication required - Only TENANT_ADMIN or SUPER_ADMIN roles can access
At least one quota parameter must be provided
Quota values must be non-negative (except -1 for unlimited)
Changes take effect immediately
Use -1 for unlimited (Enterprise behavior)
Supports subscription tier changes - When upgrading, quota is updated
Tenant isolation enforced - TENANT_ADMIN can only modify customers in their tenant(s)
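
The validation rules above (at least one parameter, non-negative values or -1, a known quota_type) could be checked as follows before any database write. This is a sketch; the function name and error strings are hypothetical, and a failed check would correspond to the 422 response.

```python
ALLOWED_TYPES = {"daily", "monthly", "unlimited"}
UNLIMITED = -1


def validate_quota_update(body: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the body is valid."""
    errors = []
    fields = ("daily_quota", "monthly_quota", "quota_type")
    if not any(body.get(name) is not None for name in fields):
        errors.append("At least one of daily_quota, monthly_quota, quota_type is required")

    for name in ("daily_quota", "monthly_quota"):
        value = body.get(name)
        if value is None:
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or (value < 0 and value != UNLIMITED):
            errors.append(f"{name} must be a non-negative integer or -1 for unlimited")

    quota_type = body.get("quota_type")
    if quota_type is not None and (
        not isinstance(quota_type, str) or quota_type not in ALLOWED_TYPES
    ):
        errors.append("quota_type must be one of: daily, monthly, unlimited")
    return errors
```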

Security¶

Authentication: This endpoint requires a valid JWT token with admin privileges.

Role Requirements:

SUPER_ADMIN - Can update quotas for any customer across all tenants
TENANT_ADMIN - Can only update quotas for customers in their assigned tenant(s)
Other roles - Access denied (403 Forbidden)

Authorization Flow:

1. Request must include valid JWT token in Authorization header
2. Token is validated and user role is checked
3. If TENANT_ADMIN: Verify customer belongs to admin's tenant
4. If SUPER_ADMIN: Allow access to any customer
5. Update quota in database
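
Steps 2 to 4 of this flow reduce to a role and tenant check. A minimal sketch, assuming role strings in the lowercase form shown in the login response (`tenant_admin`; `super_admin` is an assumed spelling) and a hypothetical helper that returns the HTTP status to use:

```python
from typing import Collection, Optional

ADMIN_ROLES = ("super_admin", "tenant_admin")


def authorize_quota_update(
    role: str,
    admin_tenant_ids: Collection[str],
    customer_tenant_id: Optional[str],
) -> int:
    """Return 200 when the update may proceed, else the error status."""
    if role not in ADMIN_ROLES:
        return 403  # not an admin
    if customer_tenant_id is None:
        return 404  # customer not found
    if role == "super_admin":
        return 200  # may update any customer in any tenant
    # tenant_admin: the customer must belong to one of the admin's tenants
    return 200 if customer_tenant_id in admin_tenant_ids else 403
```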

Response Codes:

200 - Quota updated successfully
401 - Not authenticated (missing or invalid token)
403 - Forbidden (not an admin OR accessing wrong tenant)
404 - Customer not found
422 - Invalid quota values

Getting Admin JWT Token¶

Admin Login:

# Login as admin to get JWT token
curl -X POST http://localhost:8000/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{
    "email": "admin@company.com",
    "password": "admin_password",
    "tenant_slug": "company"
  }'

Response:

{
  "access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "bearer",
  "user": {
    "id": "507f1f77bcf86cd799439020",
    "email": "admin@company.com",
    "role": "tenant_admin"
  }
}

Use the token in subsequent requests:

curl -X PATCH http://localhost:8000/api/v1/token-usage/quota/CUSTOMER_ID \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..." \
  -H "Content-Type: application/json" \
  -d '{"daily_quota": 50000}'

Reset Daily Quotas (Cloud Function)¶

Reset quota_exceeded flags for all customers (automated daily job).

Endpoint¶

POST /api/v1/token-usage/quota-reset

Authentication: 🔴 NONE (Internal Cloud Function only)

Access: Internal only - Do not expose publicly

Request Body¶

No request body required.

Response¶

{
  "success": true,
  "message": "Successfully reset quotas for 1234 customers",
  "reset_count": 1234,
  "reset_date": "2025-01-14"
}

Fields:

success (boolean) - Whether reset was successful
message (string) - Human-readable result message
reset_count (integer) - Number of customer records reset
reset_date (string) - Date of reset (YYYY-MM-DD)

Cloud Function Setup¶

Google Cloud Scheduler Configuration:

name: reset-daily-token-quotas
schedule: "0 0 * * *"  # Daily at midnight UTC
timeZone: UTC
httpTarget:
  uri: https://api.myreserva.id/api/v1/token-usage/quota-reset
  httpMethod: POST
  headers:
    Content-Type: application/json

Cloud Function (Node.js):

const functions = require('@google-cloud/functions-framework');
const axios = require('axios');

functions.http('resetDailyQuotas', async (req, res) => {
  try {
    const response = await axios.post(
      'https://api.myreserva.id/api/v1/token-usage/quota-reset',
      {},
      { timeout: 30000 }
    );

    console.log('Daily quota reset completed:', response.data);
    res.status(200).json(response.data);
  } catch (error) {
    console.error('Failed to reset quotas:', error);
    res.status(500).json({ error: error.message });
  }
});

Process Flow¶

1. Cloud Scheduler triggers at 00:00 UTC daily
2. Cloud Function calls the backend API endpoint
3. Backend finds all customers with quota_exceeded=True
4. Backend resets quota_exceeded to False
5. Backend clears the blocked_at timestamp
6. Backend returns the count of reset records
7. Cloud Function logs the result to Cloud Logging
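
The backend part of this flow (find blocked records, clear the flag and timestamp, return a count) is sketched below over in-memory dicts standing in for `token_usage_budgets` documents. The function name is hypothetical; a real backend would run the equivalent update against the database.

```python
from datetime import date


def reset_daily_quotas(budgets: list[dict], today: date) -> dict:
    """Clear quota_exceeded and blocked_at on every blocked budget record."""
    reset_count = 0
    for budget in budgets:
        if budget.get("quota_exceeded"):
            budget["quota_exceeded"] = False
            budget["blocked_at"] = None
            reset_count += 1
    return {
        "success": True,
        "message": f"Successfully reset quotas for {reset_count} customers",
        "reset_count": reset_count,
        "reset_date": today.isoformat(),
    }
```

Records that were not blocked are left untouched, so running the job twice on the same day is harmless: the second run reports a reset_count of 0.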

Security Considerations¶

No authentication required (by design for Cloud Functions)
Should be triggered only by trusted Cloud Functions
Consider implementing IP allowlist in production if needed
Do not expose this URL publicly (no public documentation)
Monitor call frequency to prevent abuse

Production Security:

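# Optional: IP allowlist middleware for the reset endpoints (`app` is the existing FastAPI instance)
import ipaddress

from fastapi import Request
from fastapi.responses import JSONResponse

# Example CIDR ranges; replace with the ranges your Cloud Functions call from
ALLOWED_NETWORKS = [
    ipaddress.ip_network("35.190.0.0/16"),  # Google Cloud IP range
    ipaddress.ip_network("34.96.0.0/16"),
]
RESET_PATHS = {
    "/api/v1/token-usage/quota-reset",
    "/api/v1/token-usage/quota-reset-monthly",
}

def is_allowed_ip(host: str) -> bool:
    """Return True when host is an IP address inside an allowed CIDR range."""
    try:
        client_ip = ipaddress.ip_address(host)
    except ValueError:
        return False
    return any(client_ip in network for network in ALLOWED_NETWORKS)

@app.middleware("http")
async def verify_cloud_function_ip(request: Request, call_next):
    # Behind a proxy or load balancer, request.client.host is the proxy's address
    if request.url.path in RESET_PATHS:
        if request.client is None or not is_allowed_ip(request.client.host):
            return JSONResponse(status_code=403, content={"detail": "Forbidden"})
    return await call_next(request)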

Reset Monthly Quotas (Cloud Function)¶

Reset monthly usage counters (automated monthly job).

Endpoint¶

POST /api/v1/token-usage/quota-reset-monthly

Authentication: 🔴 NONE (Internal Cloud Function only)

Access: Internal only - Do not expose publicly

Request Body¶

No request body required.

Response¶

{
  "success": true,
  "message": "Monthly quota reset completed successfully",
  "reset_count": 0,
  "reset_date": "2025-02"
}

Fields:

success (boolean) - Whether reset was successful
message (string) - Human-readable result message
reset_count (integer) - Number of records processed (0 for month initialization)
reset_date (string) - Month of reset (YYYY-MM)

Cloud Function Setup¶

Google Cloud Scheduler Configuration:

name: reset-monthly-token-quotas
schedule: "0 0 1 * *"  # 1st of every month at midnight UTC
timeZone: UTC
httpTarget:
  uri: https://api.myreserva.id/api/v1/token-usage/quota-reset-monthly
  httpMethod: POST
  headers:
    Content-Type: application/json

Cloud Function (Node.js):

const functions = require('@google-cloud/functions-framework');
const axios = require('axios');

functions.http('resetMonthlyQuotas', async (req, res) => {
  try {
    const response = await axios.post(
      'https://api.myreserva.id/api/v1/token-usage/quota-reset-monthly',
      {},
      { timeout: 30000 }
    );

    console.log('Monthly quota reset completed:', response.data);
    res.status(200).json(response.data);
  } catch (error) {
    console.error('Failed to reset monthly quotas:', error);
    res.status(500).json({ error: error.message });
  }
});

Process Flow¶

1. Cloud Scheduler triggers at 00:00 UTC on the 1st of each month
2. Cloud Function calls the backend API endpoint
3. Backend archives the previous month's data (optional, planned)
4. Backend initializes the new month's tracking records
5. Backend sends monthly usage reports to customers (optional, planned)
6. Backend returns success status
7. Cloud Function logs the result to Cloud Logging

Future Enhancements¶

Planned Features:

Archive previous month data - For historical reporting
Send monthly usage reports - Email customers with usage summary
Generate analytics - Monthly trends, top users, etc.

Security¶

Same security considerations as daily reset endpoint (see above).

Database Schema¶

`token_usage_budgets` Collection¶

Purpose: Track customer token quotas and usage counters

{
  "_id": ObjectId("507f1f77bcf86cd799439011"),
  "customer_id": ObjectId("507f1f77bcf86cd799439010"),
  "tenant_id": ObjectId("507f1f77bcf86cd799439009"),
  "daily_quota": 10000,
  "monthly_quota": 300000,
  "quota_type": "daily",
  "used_today": 2345,
  "used_this_month": 45678,
  "quota_exceeded": false,
  "blocked_at": null,
  "last_reset_date": "2025-01-13T00:00:00Z",
  "created_at": "2025-01-01T10:30:00Z",
  "updated_at": "2025-01-13T14:25:00Z"
}

Indexes:

db.token_usage_budgets.createIndex({ customer_id: 1, tenant_id: 1 }, { unique: true });
db.token_usage_budgets.createIndex({ tenant_id: 1 });
db.token_usage_budgets.createIndex({ quota_exceeded: 1 });

`token_usage_logs` Collection¶

Purpose: Detailed conversation logs for analytics

{
  "_id": ObjectId("507f1f77bcf86cd799439012"),
  "customer_id": ObjectId("507f1f77bcf86cd799439010"),
  "tenant_id": ObjectId("507f1f77bcf86cd799439009"),
  "conversation_id": "conv_abc123xyz",
  "tokens": 1234,
  "prompt_tokens": 456,
  "completion_tokens": 778,
  "system_tokens": 150,
  "message_count": 3,
  "model": "gemini-2.5-flash",
  "function_calls": ["get_service_availability", "book_appointment"],
  "estimated_cost_usd": 0.0003702,
  "timestamp": "2025-01-13T14:25:30Z",
  "created_at": "2025-01-13T14:25:30Z"
}

Indexes:

db.token_usage_logs.createIndex({ customer_id: 1, tenant_id: 1 });
db.token_usage_logs.createIndex({ tenant_id: 1 });
db.token_usage_logs.createIndex({ timestamp: -1 });
db.token_usage_logs.createIndex({ conversation_id: 1 });

TTL Index (auto-deletes logs 90 days after timestamp; the field must be stored as a BSON Date, not a string):

db.token_usage_logs.createIndex({ timestamp: 1 }, { expireAfterSeconds: 7776000 }); // 90 days

Testing¶

Manual Testing Flow¶

1. Check Quota (Before AI Call):

curl -X GET http://localhost:8000/api/v1/token-usage/quota \
  -H "Authorization: Bearer CUSTOMER_JWT_TOKEN"

Expected Response:

{
  "allowed": true,
  "used_today": 0,
  "daily_quota": 10000,
  "remaining_today": 10000,
  "percentage_used_today": 0
}

2. Track Usage (After AI Call):

curl -X POST http://localhost:8000/api/v1/token-usage/track \
  -H "Authorization: Bearer CUSTOMER_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "tokens": 1234,
    "prompt_tokens": 456,
    "completion_tokens": 778,
    "conversation_id": "conv_test_123",
    "message_count": 3,
    "model": "gemini-2.5-flash"
  }'

Expected Response:

{
  "success": true,
  "recorded": true,
  "new_total_today": 1234,
  "new_total_month": 1234,
  "quota_exceeded": false
}

3. Get Usage History:

curl -X GET "http://localhost:8000/api/v1/token-usage/history?days=7" \
  -H "Authorization: Bearer CUSTOMER_JWT_TOKEN"

4. Update Customer Quota (Admin Only):

curl -X PATCH http://localhost:8000/api/v1/token-usage/quota/CUSTOMER_ID \
  -H "Authorization: Bearer ADMIN_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "daily_quota": 50000,
    "monthly_quota": 1500000
  }'

Expected Response:

{
  "success": true,
  "updated_count": 1
}

5. Test Quota Exceeded:

# Track usage that exceeds daily quota
curl -X POST http://localhost:8000/api/v1/token-usage/track \
  -H "Authorization: Bearer CUSTOMER_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "tokens": 15000,
    "conversation_id": "conv_test_456",
    "message_count": 1
  }'

Expected Response (429):

{
  "detail": "Daily token quota exceeded. Limit resets at midnight UTC."
}

6. Check Quota After Exceeded:

curl -X GET http://localhost:8000/api/v1/token-usage/quota \
  -H "Authorization: Bearer CUSTOMER_JWT_TOKEN"

Expected Response:

{
  "allowed": false,
  "used_today": 16234,
  "daily_quota": 10000,
  "remaining_today": 0,
  "percentage_used_today": 162.34,
  "blocked": true,
  "blocked_reason": "Daily quota exceeded"
}

Unit Tests¶

Location: tests/test_token_usage.py

import pytest

@pytest.mark.asyncio
async def test_check_quota_success(client, customer_token):
    """Test quota check with available quota"""
    response = await client.get(
        "/api/v1/token-usage/quota",
        headers={"Authorization": f"Bearer {customer_token}"}
    )
    assert response.status_code == 200
    data = response.json()
    assert data["allowed"] is True
    assert data["daily_quota"] > 0

@pytest.mark.asyncio
async def test_track_usage_success(client, customer_token):
    """Test tracking token usage"""
    payload = {
        "tokens": 1234,
        "prompt_tokens": 456,
        "completion_tokens": 778,
        "conversation_id": "conv_test_123",
        "message_count": 3
    }
    response = await client.post(
        "/api/v1/token-usage/track",
        json=payload,
        headers={"Authorization": f"Bearer {customer_token}"}
    )
    assert response.status_code == 200
    data = response.json()
    assert data["success"] is True
    assert data["new_total_today"] == 1234

@pytest.mark.asyncio
async def test_quota_exceeded(client, customer_token):
    """Test quota exceeded returns 429"""
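    # Read the current limit so the test does not depend on the configured plan
    headers = {"Authorization": f"Bearer {customer_token}"}
    quota = (await client.get("/api/v1/token-usage/quota", headers=headers)).json()

    # First, exceed quota by tracking one token more than the daily limit
    payload = {"tokens": quota["daily_quota"] + 1, "conversation_id": "conv_test", "message_count": 1}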
    response = await client.post(
        "/api/v1/token-usage/track",
        json=payload,
        headers={"Authorization": f"Bearer {customer_token}"}
    )
    assert response.status_code == 429

    # Then, check quota shows blocked
    response = await client.get(
        "/api/v1/token-usage/quota",
        headers={"Authorization": f"Bearer {customer_token}"}
    )
    assert response.status_code == 200
    data = response.json()
    assert data["allowed"] is False
    assert data["blocked"] is True

@pytest.mark.asyncio
async def test_usage_history(client, customer_token):
    """Test fetching usage history"""
    response = await client.get(
        "/api/v1/token-usage/history?days=30",
        headers={"Authorization": f"Bearer {customer_token}"}
    )
    assert response.status_code == 200
    data = response.json()
    assert "daily_usage" in data
    assert "total_tokens" in data
    assert "average_tokens_per_day" in data

@pytest.mark.asyncio
async def test_admin_update_quota(client, admin_token):
    """Test admin updating customer quota"""
    customer_id = "507f1f77bcf86cd799439010"
    payload = {
        "daily_quota": 50000,
        "monthly_quota": 1500000
    }
    response = await client.patch(
        f"/api/v1/token-usage/quota/{customer_id}",
        json=payload,
        headers={"Authorization": f"Bearer {admin_token}"}
    )
    assert response.status_code == 200
    data = response.json()
    assert data["success"] is True
    assert data["updated_count"] == 1

@pytest.mark.asyncio
async def test_daily_quota_reset(client):
    """Test daily quota reset endpoint"""
    response = await client.post("/api/v1/token-usage/quota-reset")
    assert response.status_code == 200
    data = response.json()
    assert data["success"] is True
    assert "reset_count" in data

Best Practices¶

For Frontend Developers¶

✅ DO:

Always check quota before AI call - Use GET /quota endpoint
Show warning at 80% usage - Alert customers proactively
Handle 429 gracefully - Display friendly "quota exceeded" message
Track usage after successful calls - Use POST /track endpoint
Respect allowed: false - Disable chat input when quota exceeded
Show reset time - Display when quota will reset
Cache quota checks - Don't check on every keystroke (throttle)

❌ DON'T:

Don't skip quota checks - Always verify before Gemini API call
Don't ignore tracking errors - Log failures for debugging
Don't hardcode quotas - Always fetch from backend
Don't allow chat when blocked - Enforce allowed: false strictly

For Backend Developers¶

✅ DO:

Use accurate token counts - Extract from Gemini usageMetadata
Update both daily and monthly counters
Generate warnings at 80% usage
Create detailed logs for analytics
Handle subscription changes - Update quotas on upgrade
Implement idempotency - Prevent duplicate tracking

❌ DON'T:

Don't estimate tokens - Always use actual counts from Gemini
Don't skip validation - Validate all input parameters
Don't expose reset endpoints - Keep Cloud Function URLs private
Don't trust client-provided quotas - Always fetch from database
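
The "Implement idempotency" recommendation can be illustrated with a per-request key: a retried tracking call with the same key is ignored instead of being counted twice. This is a sketch only. The `request_id` value is hypothetical (it is not part of the documented request body), and the in-memory set stands in for whatever unique constraint the database would enforce.

```python
class UsageTracker:
    """In-memory stand-in for the tracking store."""

    def __init__(self) -> None:
        self.total_tokens = 0
        self._seen_request_ids: set[str] = set()

    def track(self, request_id: str, tokens: int) -> bool:
        """Record usage once per request_id; return False for a duplicate."""
        if request_id in self._seen_request_ids:
            return False  # retry of a request that was already counted
        self._seen_request_ids.add(request_id)
        self.total_tokens += tokens
        return True


tracker = UsageTracker()
tracker.track("req-1", 1234)
tracker.track("req-1", 1234)  # network retry: ignored
print(tracker.total_tokens)
```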

For DevOps¶

✅ DO:

Set up Cloud Scheduler - Daily and monthly reset jobs
Monitor reset job success - Alert on failures
Configure IP allowlist - Restrict reset endpoints to Cloud Functions
Set TTL indexes - Auto-delete logs after 90 days
Monitor quota usage patterns - Detect abuse

❌ DON'T:

Don't skip backups - Backup quota data before resets
Don't ignore failed resets - Investigate immediately
Don't expose reset URLs publicly - Keep internal only

Error Handling¶

Common Errors¶

401 Unauthorized:

{
  "detail": "Not authenticated"
}

Fix: Ensure customer JWT token is included in Authorization header.

429 Quota Exceeded:

{
  "detail": "Daily token quota exceeded. Limit resets at midnight UTC."
}

Fix: Customer must wait until daily reset or upgrade subscription plan.

422 Validation Error:

{
  "detail": [
    {
      "loc": ["body", "tokens"],
      "msg": "field required",
      "type": "value_error.missing"
    }
  ]
}

Fix: Provide all required fields in request body.

500 Internal Server Error:

{
  "detail": "Failed to track token usage"
}

Fix: Check server logs for the specific error. The cause may be a database connectivity issue.

Troubleshooting¶

Quota Not Resetting¶

Symptoms: Customer still blocked after midnight UTC

Checks:

Verify Cloud Scheduler is running:

gcloud scheduler jobs describe reset-daily-token-quotas

Check Cloud Function logs:

gcloud functions logs read resetDailyQuotas --limit 50

Manually trigger reset:

gcloud scheduler jobs run reset-daily-token-quotas

Fix:

Ensure Cloud Function has correct backend URL
Verify backend endpoint is accessible from Cloud Functions
Check for database connectivity issues

Usage Not Tracking¶

Symptoms: Token usage not updating after AI calls

Checks:

Verify tracking endpoint returns 200:

curl -X POST http://localhost:8000/api/v1/token-usage/track \
  -H "Authorization: Bearer TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"tokens": 100, "conversation_id": "test", "message_count": 1}'

Check database for new log entries:

db.token_usage_logs.find().sort({timestamp: -1}).limit(10)

Verify customer_id and tenant_id are correct

Fix:

Ensure frontend sends all required fields
Verify customer JWT token is valid
Check database indexes are created

Incorrect Quota Limits¶

Symptoms: Customer has wrong quota (not matching subscription plan)

Checks:

Verify subscription plan:

db.subscriptions.findOne({tenant_id: ObjectId("...")})

Check quota record:

db.token_usage_budgets.findOne({customer_id: ObjectId("...")})

Compare with plan limits

Fix:

Manually update quota using admin endpoint
Or delete quota record (will auto-initialize on next check)
Ensure subscription upgrade triggers quota update

API Reference Summary¶

Endpoint	Method	Purpose	Auth	Portal
`/token-usage/quota`	GET	Check remaining quota	Customer JWT	🟦 Customer
`/token-usage/track`	POST	Record token usage	Customer JWT	🟦 Customer
`/token-usage/history`	GET	Get usage history	Customer JWT	🟦 Customer
`/token-usage/quota/{customer_id}`	PATCH	Update customer quota	Admin JWT (TENANT_ADMIN/SUPER_ADMIN)	🟧 Admin
`/token-usage/quota-reset`	POST	Reset daily quotas	None (Cloud Function)	🔴 Internal
`/token-usage/quota-reset-monthly`	POST	Reset monthly quotas	None (Cloud Function)	🔴 Internal

Related Documentation¶

Subscription Management - Plan tiers, upgrades, billing
Customer Authentication - Customer JWT tokens
Customer Profile Management - Customer account management
API Reference (Swagger) - Interactive API testing

Next Steps:

Check available plans: GET /subscriptions/plans
Check your quota: GET /token-usage/quota
Start AI conversation with quota tracking
View usage history: GET /token-usage/history?days=30
Upgrade plan if needed: POST /subscriptions/upgrade
