Token Usage Management

Complete guide to managing AI chat assistant token budgets, quota enforcement, and usage tracking in the Reserva platform.


Overview

The token usage system provides AI-powered customer support with built-in quota management:

  • Quota Checking - Pre-flight checks before AI API calls
  • Usage Tracking - Record token consumption after successful AI conversations
  • Daily & Monthly Limits - Multi-tier quota enforcement
  • Usage History - Analytics and reporting for customers
  • Admin Quota Management - Override limits for specific customers
  • Automated Resets - Cloud Function-triggered daily/monthly quota resets

Key Concepts:

  • Token = Unit of measurement for AI model usage (input + output text)
  • Daily Quota = Maximum tokens allowed per day per customer
  • Monthly Quota = Maximum tokens allowed per month per customer
  • Quota Exceeded = Customer blocked from AI chat until quota resets
  • Cloud Function Reset = Automated daily/monthly quota reset jobs

Subscription Plan Limits

Token quotas are dynamically allocated based on tenant subscription tiers. Higher plans get more AI assistance capacity. Quotas are configurable via environment variables for flexible scaling.

Token Quota by Plan (Default Configuration)

Plan        | Daily Quota    | Monthly Quota     | Multiplier (vs FREE) | Per-Token Charge
FREE        | 16,000 tokens  | 480,000 tokens    | 1x                   | Free
PRO         | 64,000 tokens  | 1,920,000 tokens  | 4x                   | Free
ENTERPRISE  | 128,000 tokens | 3,840,000 tokens  | 8x                   | Free

✨ NEW: Subscription-Based Allocation (Implemented November 17, 2025)

  • Quotas are automatically determined based on tenant's active subscription plan
  • Lookup chain: Customer → Tenant → Subscription → Plan → Token Quotas
  • Environment-configurable via .env settings for easy adjustment
  • Graceful fallback to default quotas (16K daily / 480K monthly) when subscription unavailable

Pricing Note: Cost estimates use rates of ~$0.075 per 1M input tokens and ~$0.30 per 1M output tokens. Check these against Google's current Gemini 2.5 Flash pricing, since published rates change.
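A per-request cost estimate is simple arithmetic on the two rates above. The sketch below assumes the input rate applies to prompt tokens and the output rate applies to completion tokens. The backend's actual formula is not documented here, and the sample estimated_cost_usd values elsewhere in this guide are not necessarily produced by this function. The constant and function names are hypothetical.

```python
# Minimal sketch: estimate the USD cost of one tracked request.
# Assumption: rates from the pricing note, applied separately to
# prompt (input) and completion (output) tokens.
INPUT_USD_PER_MILLION = 0.075
OUTPUT_USD_PER_MILLION = 0.30


def estimate_cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated cost in USD, rounded to 7 decimal places."""
    cost = (
        prompt_tokens * INPUT_USD_PER_MILLION
        + completion_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000
    return round(cost, 7)


# Token counts taken from the tracking request example in this guide
print(estimate_cost_usd(456, 778))
```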

Token Estimation:

  • 1,000 tokens ≈ 750 words
  • Average chat message: 100-200 tokens
  • Typical conversation (10 messages): 1,500-3,000 tokens
  • FREE plan allows ~5-10 full conversations per day
  • PRO plan allows ~21-42 full conversations per day (4x more than FREE)
  • ENTERPRISE plan allows ~42-85 full conversations per day (8x more than FREE)
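The conversation counts above come from dividing each daily quota by the 1,500-3,000 token range of a typical conversation. The sketch below reproduces that arithmetic. Function and constant names are illustrative only.

```python
# Capacity estimate from the rules of thumb above:
# a typical 10-message conversation uses 1,500-3,000 tokens.
CONVERSATION_TOKENS_LOW = 1_500
CONVERSATION_TOKENS_HIGH = 3_000
WORDS_PER_1000_TOKENS = 750


def conversations_per_day(daily_quota: int) -> tuple[int, int]:
    """Return (min, max) full conversations that fit in a daily quota."""
    return (
        daily_quota // CONVERSATION_TOKENS_HIGH,  # every conversation is long
        daily_quota // CONVERSATION_TOKENS_LOW,   # every conversation is short
    )


def approx_words(tokens: int) -> int:
    """Approximate word count (1,000 tokens ~ 750 words)."""
    return tokens * WORDS_PER_1000_TOKENS // 1000


for plan, quota in {"FREE": 16_000, "PRO": 64_000, "ENTERPRISE": 128_000}.items():
    low, high = conversations_per_day(quota)
    print(f"{plan}: ~{low}-{high} conversations/day, ~{approx_words(quota):,} words")
```

Integer division gives the same ~5-10, ~21-42 and ~42-85 ranges listed above.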

Environment Configuration

Quota limits can be customized via environment variables:

# FREE Plan Token Limits
FREE_PLAN_DAILY_TOKENS=16000
FREE_PLAN_MONTHLY_TOKENS=480000

# PRO Plan Token Limits
PRO_PLAN_DAILY_TOKENS=64000
PRO_PLAN_MONTHLY_TOKENS=1920000

# ENTERPRISE Plan Token Limits
ENTERPRISE_PLAN_DAILY_TOKENS=128000
ENTERPRISE_PLAN_MONTHLY_TOKENS=3840000

# Fallback Defaults (when subscription unavailable)
DEFAULT_DAILY_TOKENS=16000
DEFAULT_MONTHLY_TOKENS=480000
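The last step of the lookup chain (Plan → Token Quotas) could work roughly as sketched below. It reads the variables above and falls back to the DEFAULT_* settings when no plan is available. This is a hypothetical sketch, not the backend's code. It assumes the plan name has already been resolved through Customer → Tenant → Subscription, and that an unknown plan name is treated like a missing subscription.

```python
import os
from typing import Optional

# Hypothetical sketch of the Plan -> Token Quotas step.
# Built-in defaults mirror the default configuration table.
_PLAN_DEFAULTS = {
    "FREE": (16_000, 480_000),
    "PRO": (64_000, 1_920_000),
    "ENTERPRISE": (128_000, 3_840_000),
}
_FALLBACK = (16_000, 480_000)


def _env_int(name: str, default: int) -> int:
    """Read an integer environment variable, using `default` if unset or empty."""
    value = os.environ.get(name)
    return int(value) if value else default


def quotas_for_plan(plan: Optional[str]) -> tuple[int, int]:
    """Return (daily, monthly) token quotas for a plan name."""
    if plan is None or plan.upper() not in _PLAN_DEFAULTS:
        # No usable subscription: fall back to DEFAULT_* settings
        return (
            _env_int("DEFAULT_DAILY_TOKENS", _FALLBACK[0]),
            _env_int("DEFAULT_MONTHLY_TOKENS", _FALLBACK[1]),
        )
    key = plan.upper()
    daily_default, monthly_default = _PLAN_DEFAULTS[key]
    return (
        _env_int(f"{key}_PLAN_DAILY_TOKENS", daily_default),
        _env_int(f"{key}_PLAN_MONTHLY_TOKENS", monthly_default),
    )
```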

Plan Upgrade Impact

When upgrading subscription plans:

  1. FREE → PRO: Quota immediately increases to 64K daily / 1.92M monthly (4x FREE)
  2. PRO → ENTERPRISE: Quota immediately increases to 128K daily / 3.84M monthly (2x PRO, 8x FREE)
  3. Downgrade (e.g., PRO → FREE): The lower limits (16K daily / 480K monthly for FREE) apply from the next billing cycle

Important: Quota changes take effect immediately upon subscription upgrade payment confirmation. The system automatically detects the new subscription plan and adjusts token allocations in real-time.
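Upgrades and downgrades follow different timing rules. Upgrades apply once payment is confirmed, and downgrades apply only when the billing cycle ends. The sketch below shows how these rules decide which plan's limits are in force. All parameter names and the plan ranking are hypothetical; the real subscription model is not described in this guide.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical ranking used to tell upgrades from downgrades
PLAN_RANK = {"FREE": 0, "PRO": 1, "ENTERPRISE": 2}


def plan_for_quota(
    current_plan: str,
    requested_plan: str,
    payment_confirmed: bool,
    cycle_end: datetime,
    now: Optional[datetime] = None,
) -> str:
    """Return the plan whose token limits should apply right now."""
    now = now or datetime.now(timezone.utc)
    if PLAN_RANK[requested_plan] > PLAN_RANK[current_plan]:
        # Upgrade: new limits apply as soon as payment is confirmed
        return requested_plan if payment_confirmed else current_plan
    if PLAN_RANK[requested_plan] < PLAN_RANK[current_plan]:
        # Downgrade: keep current limits until the billing cycle ends
        return requested_plan if now >= cycle_end else current_plan
    return current_plan
```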


Architecture

Integration with Gemini AI

sequenceDiagram
    Customer->>Next.js: Send chat message
    Next.js->>Backend: GET /api/v1/token-usage/quota
    Backend->>Next.js: {allowed: true/false, remaining}
    alt Quota Available
        Next.js->>Gemini API: Send message + conversation
        Gemini API->>Next.js: Response + usageMetadata
        Next.js->>Backend: POST /api/v1/token-usage/track
        Backend->>Database: Update usage counters
        Backend->>Next.js: {success: true, quota_exceeded}
        Next.js->>Customer: Display AI response
    else Quota Exceeded
        Next.js->>Customer: "Daily limit reached"
    end

Quota Reset Schedule

graph TB
    A[Cloud Scheduler] -->|Daily 00:00 UTC| B[Daily Reset Function]
    A -->|1st of Month 00:00 UTC| C[Monthly Reset Function]
    B -->|POST /api/v1/token-usage/quota-reset| D[Backend API]
    C -->|POST /api/v1/token-usage/quota-reset-monthly| D
    D -->|Update Database| E[MongoDB]
    D -->|Log Results| F[Cloud Logging]
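The two schedules fire at the next midnight UTC and at 00:00 UTC on the 1st of each month. The sketch below computes both instants, which is also how a client could derive a value like resets_at. It assumes standard cron semantics and the helper names are illustrative. The example prints an explicit +00:00 offset, which is equivalent to the "Z" suffix the API uses.

```python
from datetime import datetime, timedelta, timezone


def next_daily_reset(now: datetime) -> datetime:
    """Next 00:00 UTC strictly after `now` (the "0 0 * * *" schedule)."""
    now = now.astimezone(timezone.utc)
    today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return today + timedelta(days=1)


def next_monthly_reset(now: datetime) -> datetime:
    """Next 1st-of-month 00:00 UTC strictly after `now` (the "0 0 1 * *" schedule)."""
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        return datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(now.year, now.month + 1, 1, tzinfo=timezone.utc)


now = datetime(2025, 1, 13, 14, 25, tzinfo=timezone.utc)
print(next_daily_reset(now).isoformat())    # 2025-01-14T00:00:00+00:00
print(next_monthly_reset(now).isoformat())  # 2025-02-01T00:00:00+00:00
```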

Check Token Quota

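Check a customer's remaining token allowance before making an AI request. The example payloads in this section use illustrative limits of 10,000 tokens daily and 300,000 monthly rather than the plan defaults listed above.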

Endpoint

GET /api/v1/token-usage/quota

Authentication: Required (Customer JWT token)

Portal: 🟦 CUSTOMER

Response

{
  "allowed": true,
  "used_today": 2345,
  "used_this_month": 45678,
  "daily_quota": 10000,
  "monthly_quota": 300000,
  "remaining_today": 7655,
  "remaining_this_month": 254322,
  "percentage_used_today": 23.45,
  "percentage_used_this_month": 15.23,
  "resets_at": "2025-01-14T00:00:00Z",
  "blocked": false,
  "blocked_reason": null
}

Fields:

  • allowed (boolean) - Whether customer can send messages
  • used_today (integer) - Tokens consumed today
  • used_this_month (integer) - Tokens consumed this month
  • daily_quota (integer) - Daily token limit
  • monthly_quota (integer) - Monthly token limit
  • remaining_today (integer) - Tokens remaining today (0 once exhausted, -1 if unlimited)
  • remaining_this_month (integer) - Tokens remaining this month (0 once exhausted, -1 if unlimited)
  • percentage_used_today (float) - Daily usage percentage (can exceed 100 when the quota is overrun)
  • percentage_used_this_month (float) - Monthly usage percentage (can exceed 100 when the quota is overrun)
  • resets_at (string) - ISO 8601 timestamp of the next daily reset (midnight UTC)
  • blocked (boolean) - Whether customer is blocked
  • blocked_reason (string|null) - Reason for blocking if applicable

Response Examples

Available Quota:

{
  "allowed": true,
  "used_today": 1234,
  "used_this_month": 45678,
  "daily_quota": 10000,
  "monthly_quota": 300000,
  "remaining_today": 8766,
  "remaining_this_month": 254322,
  "percentage_used_today": 12.34,
  "percentage_used_this_month": 15.23,
  "resets_at": "2025-01-14T00:00:00Z",
  "blocked": false,
  "blocked_reason": null
}

Quota Exceeded:

{
  "allowed": false,
  "used_today": 10234,
  "used_this_month": 310456,
  "daily_quota": 10000,
  "monthly_quota": 300000,
  "remaining_today": 0,
  "remaining_this_month": 0,
  "percentage_used_today": 102.34,
  "percentage_used_this_month": 103.49,
  "resets_at": "2025-01-14T00:00:00Z",
  "blocked": true,
  "blocked_reason": "Daily quota exceeded"
}

Unlimited (admin-set -1 quota):

{
  "allowed": true,
  "used_today": 150000,
  "used_this_month": 3500000,
  "daily_quota": -1,
  "monthly_quota": -1,
  "remaining_today": -1,
  "remaining_this_month": -1,
  "percentage_used_today": 0,
  "percentage_used_this_month": 0,
  "resets_at": "2025-01-14T00:00:00Z",
  "blocked": false,
  "blocked_reason": null
}

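Note: -1 indicates an unlimited quota. No plan grants this by default (ENTERPRISE defaults to 128,000 daily / 3,840,000 monthly); an admin sets it per customer through the quota update endpoint.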
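The response fields above follow from the usage counters and the quotas. The sketch below derives them under a few assumptions: remaining values are clamped at 0, -1 means unlimited (reported as -1 remaining and 0%), a customer is blocked once nothing remains, and a zero quota counts as fully used. The "Monthly quota exceeded" wording is hypothetical; only the daily message appears in this guide.

```python
from dataclasses import dataclass
from typing import Optional

UNLIMITED = -1  # the API's sentinel for "no limit"


@dataclass
class QuotaStatus:
    allowed: bool
    remaining_today: int
    remaining_this_month: int
    percentage_used_today: float
    percentage_used_this_month: float
    blocked_reason: Optional[str]


def _remaining(used: int, quota: int) -> int:
    # Unlimited stays -1; otherwise clamp at 0 once the quota is overrun
    return UNLIMITED if quota == UNLIMITED else max(quota - used, 0)


def _percentage(used: int, quota: int) -> float:
    if quota == UNLIMITED:
        return 0.0  # unlimited quotas report 0%
    if quota == 0:
        return 100.0  # assumption: a zero quota counts as fully used
    return round(used / quota * 100, 2)  # may exceed 100


def _exhausted(used: int, quota: int) -> bool:
    # Blocked once no tokens remain
    return quota != UNLIMITED and used >= quota


def quota_status(used_today: int, used_this_month: int,
                 daily_quota: int, monthly_quota: int) -> QuotaStatus:
    if _exhausted(used_today, daily_quota):
        reason = "Daily quota exceeded"
    elif _exhausted(used_this_month, monthly_quota):
        reason = "Monthly quota exceeded"  # hypothetical wording
    else:
        reason = None
    return QuotaStatus(
        allowed=reason is None,
        remaining_today=_remaining(used_today, daily_quota),
        remaining_this_month=_remaining(used_this_month, monthly_quota),
        percentage_used_today=_percentage(used_today, daily_quota),
        percentage_used_this_month=_percentage(used_this_month, monthly_quota),
        blocked_reason=reason,
    )
```

With the example figures (2,345 used of 10,000 today, 45,678 of 300,000 this month), this yields 7,655 and 254,322 remaining and 23.45% / 15.23%, as in the first response example.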

Usage Flow

Frontend Implementation:

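// Before sending a message to Gemini.
// customerToken is the customer's JWT, supplied by the caller.
const checkQuota = async (customerToken) => {
  const response = await fetch('/api/v1/token-usage/quota', {
    headers: {
      'Authorization': `Bearer ${customerToken}`
    }
  });

  if (!response.ok) {
    // 401 or 5xx: the quota could not be checked, so do not call Gemini
    throw new Error(`Quota check failed with status ${response.status}`);
  }

  const quota = await response.json();

  if (!quota.allowed) {
    // Show quota.blocked_reason and when the quota resets (quota.resets_at)
    return false;
  }

  // Show a warning once more than 80% of the daily quota is used
  if (quota.percentage_used_today > 80) {
    console.warn(`AI quota at ${quota.percentage_used_today}%`);
  }

  return true;
};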

Business Rules

  1. Always returns 200 OK - Even if quota exceeded (check allowed field)
  2. Auto-initialization - Creates quota record if customer has none
  3. Real-time calculation - Fetches current subscription plan limits
  4. Timezone: All timestamps in UTC
  5. Soft enforcement - Frontend should respect allowed: false

Track Token Usage

Record token consumption after successful AI conversation.

Endpoint

POST /api/v1/token-usage/track

Authentication: Required (Customer JWT token)

Portal: 🟦 CUSTOMER

Request Body

{
  "tokens": 1234,
  "prompt_tokens": 456,
  "completion_tokens": 778,
  "conversation_id": "conv_abc123xyz",
  "message_count": 3,
  "function_calls": ["get_service_availability", "book_appointment"],
  "system_tokens": 150,
  "model": "gemini-2.5-flash"
}

Parameters:

  • tokens (required, integer) - Total tokens used (from Gemini usageMetadata.totalTokenCount)
  • prompt_tokens (optional, integer) - Input tokens (from usageMetadata.promptTokenCount)
  • completion_tokens (optional, integer) - Output tokens (from usageMetadata.candidatesTokenCount)
  • conversation_id (required, string) - Unique conversation identifier
  • message_count (required, integer) - Number of messages in conversation
  • function_calls (optional, array) - List of function names called (for analytics)
  • system_tokens (optional, integer) - Tokens used by system prompt
  • model (optional, string) - Gemini model used (default: gemini-2.5-flash)
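Gemini's REST responses report usage as promptTokenCount, candidatesTokenCount and totalTokenCount. The sketch below maps those fields onto the request body above; the helper name is hypothetical. On thinking models the total can also include thinking tokens, so tokens may exceed prompt_tokens + completion_tokens.

```python
from typing import Any, Optional


def build_track_payload(
    usage_metadata: dict[str, Any],
    conversation_id: str,
    message_count: int,
    model: str = "gemini-2.5-flash",
    function_calls: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Build a POST /api/v1/token-usage/track body from Gemini usage metadata."""
    if "totalTokenCount" not in usage_metadata:
        raise ValueError("usage metadata is missing totalTokenCount")
    if not conversation_id:
        raise ValueError("conversation_id is required")

    payload: dict[str, Any] = {
        "tokens": usage_metadata["totalTokenCount"],
        "conversation_id": conversation_id,
        "message_count": message_count,
        "model": model,
    }
    # Optional fields are sent only when Gemini reported them
    if "promptTokenCount" in usage_metadata:
        payload["prompt_tokens"] = usage_metadata["promptTokenCount"]
    if "candidatesTokenCount" in usage_metadata:
        payload["completion_tokens"] = usage_metadata["candidatesTokenCount"]
    if function_calls:
        payload["function_calls"] = function_calls
    return payload
```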

Response

Success (Quota Remaining):

{
  "success": true,
  "recorded": true,
  "new_total_today": 2468,
  "new_total_month": 67890,
  "quota_exceeded": false,
  "warning": null
}

Success (Warning at 80%):

{
  "success": true,
  "recorded": true,
  "new_total_today": 8234,
  "new_total_month": 267890,
  "quota_exceeded": false,
  "warning": "You have used 82% of your daily token quota"
}

Quota Exceeded (429 Response):

{
  "detail": "Daily token quota exceeded. Limit resets at midnight UTC."
}

Fields:

  • success (boolean) - Whether tracking succeeded
  • recorded (boolean) - Whether data was saved to database
  • new_total_today (integer) - Updated daily token total
  • new_total_month (integer) - Updated monthly token total
  • quota_exceeded (boolean) - Whether quota was exceeded after this request
  • warning (string|null) - Warning message if approaching limit (80%+ usage)

Response Codes

Code | Description
200  | Usage tracked successfully, quota available
401  | Not authenticated
422  | Validation error (missing or invalid fields)
429  | Quota exceeded; this request was still tracked
500  | Internal server error

Usage Flow

Frontend Implementation:

// After receiving a Gemini response.
// usageMetadata comes from the Gemini response; conversationId and
// messageCount come from the app's own chat state.
// showQuotaExceededMessage and showWarningToast are app-provided UI helpers.
const trackUsage = async ({ usageMetadata, conversationId, messageCount, customerToken }) => {
  try {
    const response = await fetch('/api/v1/token-usage/track', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${customerToken}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        tokens: usageMetadata.totalTokenCount,
        prompt_tokens: usageMetadata.promptTokenCount,
        completion_tokens: usageMetadata.candidatesTokenCount,
        conversation_id: conversationId,
        message_count: messageCount,
        model: 'gemini-2.5-flash'
      })
    });

    if (response.status === 429) {
      // Quota exceeded - disable chat input
      showQuotaExceededMessage();
    } else if (!response.ok) {
      console.error(`Token tracking failed with status ${response.status}`);
    } else {
      const result = await response.json();
      if (result.warning) {
        showWarningToast(result.warning);
      }
    }
  } catch (error) {
    console.error('Failed to track token usage:', error);
    // Non-blocking error - conversation continues
  }
};

Business Rules

  1. Returns 429 if quota exceeded - Frontend must handle gracefully
  2. Updates both daily and monthly counters
  3. Generates warning at 80% usage
  4. Creates detailed log entry for analytics
  5. Current request always tracked - Even if it exceeds quota
  6. Automatic cost calculation - Based on Gemini 2.5 Flash pricing
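The sketch below applies these rules to an in-memory budget. It records first, then raises when a counter is overrun (the API's 429) and otherwise returns an 80% warning. The class and function names are hypothetical. Only the daily warning is modeled because that is the only warning text shown in this guide, and the monthly error wording is invented. In MongoDB, the two counter increments would normally be one atomic $inc update rather than read-modify-write.

```python
from dataclasses import dataclass
from typing import Optional

UNLIMITED = -1
WARNING_THRESHOLD = 80  # percent of the daily quota


class QuotaExceeded(Exception):
    """Raised after recording usage that overruns a quota (API answers 429)."""


@dataclass
class Budget:
    daily_quota: int
    monthly_quota: int
    used_today: int = 0
    used_this_month: int = 0
    quota_exceeded: bool = False


def _over(used: int, quota: int) -> bool:
    return quota != UNLIMITED and used > quota


def track_usage(budget: Budget, tokens: int) -> dict:
    # Rule 5: the request is always recorded, even if it overruns the quota
    budget.used_today += tokens
    budget.used_this_month += tokens

    if _over(budget.used_today, budget.daily_quota):
        budget.quota_exceeded = True
        raise QuotaExceeded("Daily token quota exceeded. Limit resets at midnight UTC.")
    if _over(budget.used_this_month, budget.monthly_quota):
        budget.quota_exceeded = True
        raise QuotaExceeded("Monthly token quota exceeded.")  # hypothetical wording

    warning: Optional[str] = None
    if budget.daily_quota > 0:  # skips unlimited (-1) and zero quotas
        pct = budget.used_today / budget.daily_quota * 100
        if pct >= WARNING_THRESHOLD:
            warning = f"You have used {int(pct)}% of your daily token quota"

    return {
        "success": True,
        "recorded": True,
        "new_total_today": budget.used_today,
        "new_total_month": budget.used_this_month,
        "quota_exceeded": False,
        "warning": warning,
    }
```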

Get Usage History

Retrieve token usage history for analytics and reporting.

Endpoint

GET /api/v1/token-usage/history?days=30

Authentication: Required (Customer JWT token)

Portal: 🟦 CUSTOMER

Query Parameters

  • days (optional, integer) - Number of days to retrieve (1-90, default: 30)

Response

{
  "daily_usage": [
    {
      "date": "2025-01-13",
      "total_tokens": 5678,
      "total_conversations": 4,
      "total_messages": 12,
      "average_tokens_per_message": 473,
      "estimated_cost_usd": 0.00142,
      "quota_exceeded": false
    },
    {
      "date": "2025-01-12",
      "total_tokens": 8234,
      "total_conversations": 6,
      "total_messages": 18,
      "average_tokens_per_message": 457,
      "estimated_cost_usd": 0.00206,
      "quota_exceeded": false
    },
    {
      "date": "2025-01-11",
      "total_tokens": 10234,
      "total_conversations": 8,
      "total_messages": 24,
      "average_tokens_per_message": 426,
      "estimated_cost_usd": 0.00256,
      "quota_exceeded": true
    }
  ],
  "total_tokens": 24146,
  "total_cost_usd": 0.00604,
  "average_tokens_per_day": 8048
}

Fields:

daily_usage (array):

  • date (string) - Date in YYYY-MM-DD format
  • total_tokens (integer) - Tokens used on this date
  • total_conversations (integer) - Number of conversations
  • total_messages (integer) - Number of messages sent
  • average_tokens_per_message (integer) - Average tokens per message
  • estimated_cost_usd (float) - Estimated cost in USD
  • quota_exceeded (boolean) - Whether quota was exceeded on this date

Summary fields:

  • total_tokens (integer) - Total tokens across all days
  • total_cost_usd (float) - Total estimated cost in USD
  • average_tokens_per_day (integer) - Average daily token consumption

Usage Examples

Last 7 Days:

GET /api/v1/token-usage/history?days=7

Last 90 Days (Maximum):

GET /api/v1/token-usage/history?days=90

Business Rules

  1. Maximum 90 days history - Returns up to 90 days
  2. Sorted by date descending - Most recent first
  3. Includes cost estimates - Based on Gemini pricing
  4. Analytics-ready format - Easy to chart/graph
  5. Quota exceeded tracking - Shows days when limit was hit
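The history response can be read as a per-day aggregation of token_usage_logs entries. The sketch below rests on three assumptions. Each log's message_count is treated as the messages added by that request. Conversations are counted as distinct conversation_id values. average_tokens_per_day is averaged over days that have usage, which matches the three-day example above (24,146 / 3 ≈ 8,048). The quota_exceeded flag is left out because it depends on the quota in force on each day.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def usage_history(logs: list[dict], today: date, days: int = 30) -> dict:
    """Aggregate log entries into the history response shape (sketch)."""
    if not 1 <= days <= 90:
        raise ValueError("days must be between 1 and 90")
    start = today - timedelta(days=days - 1)

    buckets = defaultdict(lambda: {"tokens": 0, "messages": 0, "cost": 0.0, "convs": set()})
    for log in logs:
        day = log["timestamp"].date()
        if start <= day <= today:
            b = buckets[day]
            b["tokens"] += log["tokens"]
            b["messages"] += log["message_count"]
            b["cost"] += log["estimated_cost_usd"]
            b["convs"].add(log["conversation_id"])

    daily = [
        {
            "date": day.isoformat(),
            "total_tokens": b["tokens"],
            "total_conversations": len(b["convs"]),
            "total_messages": b["messages"],
            "average_tokens_per_message": b["tokens"] // b["messages"] if b["messages"] else 0,
            "estimated_cost_usd": round(b["cost"], 5),
        }
        for day, b in sorted(buckets.items(), reverse=True)  # most recent first
    ]
    total = sum(d["total_tokens"] for d in daily)
    return {
        "daily_usage": daily,
        "total_tokens": total,
        "total_cost_usd": round(sum(b["cost"] for b in buckets.values()), 5),
        "average_tokens_per_day": total // len(daily) if daily else 0,
    }


example = [{"timestamp": datetime(2025, 1, 13, 9, 30), "tokens": 5678, "message_count": 12,
            "conversation_id": "conv_1", "estimated_cost_usd": 0.00142}]
print(usage_history(example, today=date(2025, 1, 13), days=7)["daily_usage"][0])
```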

Update Customer Quota (Admin)

Modify customer's token quota limits (admin only).

Endpoint

PATCH /api/v1/token-usage/quota/{customer_id}

Authentication: 🔴 ADMIN ONLY (TENANT_ADMIN or SUPER_ADMIN required)

Portal: 🟧 ADMIN

Path Parameters

  • customer_id (required, string) - Customer ID (ObjectId)

Request Body

{
  "daily_quota": 50000,
  "monthly_quota": 1500000,
  "quota_type": "daily"
}

Parameters:

  • daily_quota (optional, integer) - Daily token limit (use -1 for unlimited)
  • monthly_quota (optional, integer) - Monthly token limit (use -1 for unlimited)
  • quota_type (optional, string) - Enforcement type: daily, monthly, or unlimited

At least one parameter must be provided.

Response

{
  "success": true,
  "updated_count": 1
}

Usage Examples

Set Custom Daily Quota:

curl -X PATCH http://localhost:8000/api/v1/token-usage/quota/507f1f77bcf86cd799439010 \
  -H "Authorization: Bearer ADMIN_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "daily_quota": 25000
  }'

Set Unlimited Quota (Enterprise):

curl -X PATCH http://localhost:8000/api/v1/token-usage/quota/507f1f77bcf86cd799439010 \
  -H "Authorization: Bearer ADMIN_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "daily_quota": -1,
    "monthly_quota": -1,
    "quota_type": "unlimited"
  }'

Reduce Quota (Penalty/Downgrade):

curl -X PATCH http://localhost:8000/api/v1/token-usage/quota/507f1f77bcf86cd799439010 \
  -H "Authorization: Bearer ADMIN_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "daily_quota": 5000,
    "monthly_quota": 150000
  }'

Business Rules

  1. Admin authentication required - Only TENANT_ADMIN or SUPER_ADMIN roles can access
  2. At least one quota parameter must be provided
  3. Quota values must be zero or greater, or exactly -1 for unlimited
  4. Changes take effect immediately
  5. Use -1 for unlimited (an explicit per-customer override; plan defaults are finite)
  6. Supports subscription tier changes - When upgrading, quota is updated
  7. Tenant isolation enforced - TENANT_ADMIN can only modify customers in their tenant(s)
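Rules 2, 3 and 5 amount to a small validation step on the PATCH body. The sketch below returns the fields a backend could pass to a MongoDB $set update, or raises ValueError where the API would answer 422. The function name and error messages are hypothetical.

```python
from typing import Optional

UNLIMITED = -1
VALID_QUOTA_TYPES = {"daily", "monthly", "unlimited"}


def validate_quota_update(
    daily_quota: Optional[int] = None,
    monthly_quota: Optional[int] = None,
    quota_type: Optional[str] = None,
) -> dict:
    """Return the fields to update, or raise ValueError (API: 422)."""
    if daily_quota is None and monthly_quota is None and quota_type is None:
        raise ValueError("at least one of daily_quota, monthly_quota, quota_type is required")

    update = {}
    for name, value in (("daily_quota", daily_quota), ("monthly_quota", monthly_quota)):
        if value is None:
            continue
        # Zero or greater, or exactly -1 for unlimited
        if value < 0 and value != UNLIMITED:
            raise ValueError(f"{name} must be >= 0, or -1 for unlimited")
        update[name] = value

    if quota_type is not None:
        if quota_type not in VALID_QUOTA_TYPES:
            raise ValueError(f"quota_type must be one of {sorted(VALID_QUOTA_TYPES)}")
        update["quota_type"] = quota_type
    return update
```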

Security

Authentication: This endpoint requires a valid JWT token with admin privileges.

Role Requirements:

  • SUPER_ADMIN - Can update quotas for any customer across all tenants
  • TENANT_ADMIN - Can only update quotas for customers in their assigned tenant(s)
  • Other roles - Access denied (403 Forbidden)

Authorization Flow:

1. Request must include valid JWT token in Authorization header
2. Token is validated and user role is checked
3. If TENANT_ADMIN: Verify customer belongs to admin's tenant
4. If SUPER_ADMIN: Allow access to any customer
5. Update quota in database
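Steps 2-4 of this flow reduce to a role check followed by a tenant check. The sketch below follows the response codes listed next: 403 for non-admins or a wrong tenant, 404 for a missing customer. Role names are upper-cased because the login response returns lowercase values such as "tenant_admin". The exception classes and parameters are hypothetical.

```python
from typing import Optional


class Forbidden(Exception):
    """Maps to HTTP 403."""


class NotFound(Exception):
    """Maps to HTTP 404."""


def authorize_quota_update(
    role: str,
    admin_tenant_ids: set[str],
    customer_tenant_id: Optional[str],
) -> None:
    """Raise if the caller may not update this customer's quota.

    `customer_tenant_id` is None when the customer does not exist.
    """
    role = role.upper()
    # Non-admin roles are rejected before any customer lookup
    if role not in {"SUPER_ADMIN", "TENANT_ADMIN"}:
        raise Forbidden("admin role required")
    if customer_tenant_id is None:
        raise NotFound("customer not found")
    # TENANT_ADMIN may only touch customers in their own tenant(s)
    if role == "TENANT_ADMIN" and customer_tenant_id not in admin_tenant_ids:
        raise Forbidden("customer belongs to another tenant")
    # SUPER_ADMIN, or TENANT_ADMIN within their tenant: allowed
```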

Response Codes:

  • 200 - Quota updated successfully
  • 401 - Not authenticated (missing or invalid token)
  • 403 - Forbidden (not an admin OR accessing wrong tenant)
  • 404 - Customer not found
  • 422 - Invalid quota values

Getting Admin JWT Token

Admin Login:

# Login as admin to get JWT token
curl -X POST http://localhost:8000/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{
    "email": "admin@company.com",
    "password": "admin_password",
    "tenant_slug": "company"
  }'

Response:

{
  "access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "bearer",
  "user": {
    "id": "507f1f77bcf86cd799439020",
    "email": "admin@company.com",
    "role": "tenant_admin"
  }
}

Use the token in subsequent requests:

curl -X PATCH http://localhost:8000/api/v1/token-usage/quota/CUSTOMER_ID \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..." \
  -H "Content-Type: application/json" \
  -d '{"daily_quota": 50000}'

Reset Daily Quotas (Cloud Function)

Reset quota_exceeded flags for all customers (automated daily job).

Endpoint

POST /api/v1/token-usage/quota-reset

Authentication: 🔴 NONE (Internal Cloud Function only)

Access: Internal only - Do not expose publicly

Request Body

No request body required.

Response

{
  "success": true,
  "message": "Successfully reset quotas for 1234 customers",
  "reset_count": 1234,
  "reset_date": "2025-01-14"
}

Fields:

  • success (boolean) - Whether reset was successful
  • message (string) - Human-readable result message
  • reset_count (integer) - Number of customer records reset
  • reset_date (string) - Date of reset (YYYY-MM-DD)

Cloud Function Setup

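Google Cloud Scheduler Configuration (as written, the job calls the backend endpoint directly; to route through the Cloud Function below, set uri to the function's trigger URL):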

name: reset-daily-token-quotas
schedule: "0 0 * * *"  # Daily at midnight UTC
timeZone: UTC
httpTarget:
  uri: https://api.myreserva.id/api/v1/token-usage/quota-reset
  httpMethod: POST
  headers:
    Content-Type: application/json

Cloud Function (Node.js):

const functions = require('@google-cloud/functions-framework');
const axios = require('axios');

functions.http('resetDailyQuotas', async (req, res) => {
  try {
    const response = await axios.post(
      'https://api.myreserva.id/api/v1/token-usage/quota-reset',
      {},
      { timeout: 30000 }
    );

    console.log('Daily quota reset completed:', response.data);
    res.status(200).json(response.data);
  } catch (error) {
    console.error('Failed to reset quotas:', error);
    res.status(500).json({ error: error.message });
  }
});

Process Flow

  1. Cloud Scheduler triggers at 00:00 UTC daily
  2. Cloud Function calls backend API endpoint
  3. Backend finds all customers with quota_exceeded=True
  4. Backend resets quota_exceeded to False
  5. Backend clears blocked_at timestamp
  6. Backend returns count of reset records
  7. Cloud Function logs result to Cloud Logging

Security Considerations

  1. The endpoint performs no authentication (current design for Cloud Function callers), so the measures below are its only protection
  2. Should be triggered only by trusted Cloud Functions
  3. Consider implementing IP allowlist in production if needed
  4. Do not expose this URL publicly (no public documentation)
  5. Monitor call frequency to prevent abuse

Production Security:

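# Optional: IP allowlist middleware for the internal reset endpoints (FastAPI)
import ipaddress

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()  # or reuse the backend's existing app instance

# Example ranges only: Cloud Functions have no fixed egress IPs unless routed
# through Serverless VPC Access and Cloud NAT, so substitute your own ranges.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("35.190.0.0/16"),
    ipaddress.ip_network("34.96.0.0/16"),
]
RESET_PATHS = {
    "/api/v1/token-usage/quota-reset",
    "/api/v1/token-usage/quota-reset-monthly",
}

@app.middleware("http")
async def verify_cloud_function_ip(request: Request, call_next):
    if request.url.path in RESET_PATHS:
        # Test network membership: a string compare against "35.190.0.0/16"
        # never matches. Behind a proxy, client.host is the proxy's address.
        try:
            client_ip = ipaddress.ip_address(request.client.host)
        except (AttributeError, ValueError):
            client_ip = None
        if client_ip is None or not any(client_ip in net for net in ALLOWED_NETWORKS):
            return JSONResponse(status_code=403, content={"detail": "Forbidden"})
    return await call_next(request)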

Reset Monthly Quotas (Cloud Function)

Reset monthly usage counters (automated monthly job).

Endpoint

POST /api/v1/token-usage/quota-reset-monthly

Authentication: 🔴 NONE (Internal Cloud Function only)

Access: Internal only - Do not expose publicly

Request Body

No request body required.

Response

{
  "success": true,
  "message": "Monthly quota reset completed successfully",
  "reset_count": 0,
  "reset_date": "2025-02"
}

Fields:

  • success (boolean) - Whether reset was successful
  • message (string) - Human-readable result message
  • reset_count (integer) - Number of records processed (0 for month initialization)
  • reset_date (string) - Month of reset (YYYY-MM)

Cloud Function Setup

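Google Cloud Scheduler Configuration (as written, the job calls the backend endpoint directly; to route through the Cloud Function below, set uri to the function's trigger URL):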

name: reset-monthly-token-quotas
schedule: "0 0 1 * *"  # 1st of every month at midnight UTC
timeZone: UTC
httpTarget:
  uri: https://api.myreserva.id/api/v1/token-usage/quota-reset-monthly
  httpMethod: POST
  headers:
    Content-Type: application/json

Cloud Function (Node.js):

const functions = require('@google-cloud/functions-framework');
const axios = require('axios');

functions.http('resetMonthlyQuotas', async (req, res) => {
  try {
    const response = await axios.post(
      'https://api.myreserva.id/api/v1/token-usage/quota-reset-monthly',
      {},
      { timeout: 30000 }
    );

    console.log('Monthly quota reset completed:', response.data);
    res.status(200).json(response.data);
  } catch (error) {
    console.error('Failed to reset monthly quotas:', error);
    res.status(500).json({ error: error.message });
  }
});

Process Flow

  1. Cloud Scheduler triggers at 00:00 UTC on 1st of month
  2. Cloud Function calls backend API endpoint
  3. (Planned) Backend archives previous month's data
  4. Backend initializes new month's tracking records
  5. (Planned) Backend sends monthly usage reports to customers
  6. Backend returns success status
  7. Cloud Function logs result to Cloud Logging

Future Enhancements

Planned Features:

  1. Archive previous month data - For historical reporting
  2. Send monthly usage reports - Email customers with usage summary
  3. Generate analytics - Monthly trends, top users, etc.

Security

Same security considerations as daily reset endpoint (see above).


Database Schema

token_usage_budgets Collection

Purpose: Track customer token quotas and usage counters

{
  "_id": ObjectId("507f1f77bcf86cd799439011"),
  "customer_id": ObjectId("507f1f77bcf86cd799439010"),
  "tenant_id": ObjectId("507f1f77bcf86cd799439009"),
  "daily_quota": 10000,
  "monthly_quota": 300000,
  "quota_type": "daily",
  "used_today": 2345,
  "used_this_month": 45678,
  "quota_exceeded": false,
  "blocked_at": null,
  "last_reset_date": "2025-01-13T00:00:00Z",
  "created_at": "2025-01-01T10:30:00Z",
  "updated_at": "2025-01-13T14:25:00Z"
}

Indexes:

db.token_usage_budgets.createIndex({ customer_id: 1, tenant_id: 1 }, { unique: true });
db.token_usage_budgets.createIndex({ tenant_id: 1 });
db.token_usage_budgets.createIndex({ quota_exceeded: 1 });

token_usage_logs Collection

Purpose: Detailed conversation logs for analytics

{
  "_id": ObjectId("507f1f77bcf86cd799439012"),
  "customer_id": ObjectId("507f1f77bcf86cd799439010"),
  "tenant_id": ObjectId("507f1f77bcf86cd799439009"),
  "conversation_id": "conv_abc123xyz",
  "tokens": 1234,
  "prompt_tokens": 456,
  "completion_tokens": 778,
  "system_tokens": 150,
  "message_count": 3,
  "model": "gemini-2.5-flash",
  "function_calls": ["get_service_availability", "book_appointment"],
  "estimated_cost_usd": 0.0003702,
  "timestamp": "2025-01-13T14:25:30Z",
  "created_at": "2025-01-13T14:25:30Z"
}

Indexes:

db.token_usage_logs.createIndex({ customer_id: 1, tenant_id: 1 });
db.token_usage_logs.createIndex({ tenant_id: 1 });
db.token_usage_logs.createIndex({ timestamp: -1 });
db.token_usage_logs.createIndex({ conversation_id: 1 });

TTL Index (auto-delete after 90 days; MongoDB only expires documents whose timestamp is stored as a BSON Date, not as an ISO string):

db.token_usage_logs.createIndex({ timestamp: 1 }, { expireAfterSeconds: 7776000 }); // 90 days

Testing

Manual Testing Flow

1. Check Quota (Before AI Call):

curl -X GET http://localhost:8000/api/v1/token-usage/quota \
  -H "Authorization: Bearer CUSTOMER_JWT_TOKEN"

Expected Response:

{
  "allowed": true,
  "used_today": 0,
  "daily_quota": 10000,
  "remaining_today": 10000,
  "percentage_used_today": 0
}

2. Track Usage (After AI Call):

curl -X POST http://localhost:8000/api/v1/token-usage/track \
  -H "Authorization: Bearer CUSTOMER_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "tokens": 1234,
    "prompt_tokens": 456,
    "completion_tokens": 778,
    "conversation_id": "conv_test_123",
    "message_count": 3,
    "model": "gemini-2.5-flash"
  }'

Expected Response:

{
  "success": true,
  "recorded": true,
  "new_total_today": 1234,
  "new_total_month": 1234,
  "quota_exceeded": false
}

3. Get Usage History:

curl -X GET "http://localhost:8000/api/v1/token-usage/history?days=7" \
  -H "Authorization: Bearer CUSTOMER_JWT_TOKEN"

4. Update Customer Quota (Admin Only):

curl -X PATCH http://localhost:8000/api/v1/token-usage/quota/CUSTOMER_ID \
  -H "Authorization: Bearer ADMIN_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "daily_quota": 50000,
    "monthly_quota": 1500000
  }'

Expected Response:

{
  "success": true,
  "updated_count": 1
}

5. Test Quota Exceeded:

# Track usage that exceeds the daily quota (15,000 exceeds the 10,000 example quota; use a value above your own daily_quota)
curl -X POST http://localhost:8000/api/v1/token-usage/track \
  -H "Authorization: Bearer CUSTOMER_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "tokens": 15000,
    "conversation_id": "conv_test_456",
    "message_count": 1
  }'

Expected Response (429):

{
  "detail": "Daily token quota exceeded. Limit resets at midnight UTC."
}

6. Check Quota After Exceeded:

curl -X GET http://localhost:8000/api/v1/token-usage/quota \
  -H "Authorization: Bearer CUSTOMER_JWT_TOKEN"

Expected Response:

{
  "allowed": false,
  "used_today": 16234,
  "daily_quota": 10000,
  "remaining_today": 0,
  "percentage_used_today": 162.34,
  "blocked": true,
  "blocked_reason": "Daily quota exceeded"
}

Unit Tests

Location: tests/test_token_usage.py

import pytest

@pytest.mark.asyncio
async def test_check_quota_success(client, customer_token):
    """Test quota check with available quota"""
    response = await client.get(
        "/api/v1/token-usage/quota",
        headers={"Authorization": f"Bearer {customer_token}"}
    )
    assert response.status_code == 200
    data = response.json()
    assert data["allowed"] is True
    assert data["daily_quota"] > 0

@pytest.mark.asyncio
async def test_track_usage_success(client, customer_token):
    """Test tracking token usage"""
    payload = {
        "tokens": 1234,
        "prompt_tokens": 456,
        "completion_tokens": 778,
        "conversation_id": "conv_test_123",
        "message_count": 3
    }
    response = await client.post(
        "/api/v1/token-usage/track",
        json=payload,
        headers={"Authorization": f"Bearer {customer_token}"}
    )
    assert response.status_code == 200
    data = response.json()
    assert data["success"] is True
    assert data["new_total_today"] == 1234

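@pytest.mark.asyncio
async def test_quota_exceeded(client, customer_token):
    """Test quota exceeded returns 429"""
    headers = {"Authorization": f"Bearer {customer_token}"}

    # Read the customer's daily quota instead of assuming a fixed value
    # (the FREE plan default of 16,000 is not exceeded by 15,000 tokens)
    quota = (await client.get("/api/v1/token-usage/quota", headers=headers)).json()

    # First, exceed quota
    payload = {
        "tokens": quota["daily_quota"] + 1,
        "conversation_id": "conv_test",
        "message_count": 1,
    }
    response = await client.post(
        "/api/v1/token-usage/track",
        json=payload,
        headers=headers
    )
    assert response.status_code == 429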

    # Then, check quota shows blocked
    response = await client.get(
        "/api/v1/token-usage/quota",
        headers={"Authorization": f"Bearer {customer_token}"}
    )
    assert response.status_code == 200
    data = response.json()
    assert data["allowed"] is False
    assert data["blocked"] is True

@pytest.mark.asyncio
async def test_usage_history(client, customer_token):
    """Test fetching usage history"""
    response = await client.get(
        "/api/v1/token-usage/history?days=30",
        headers={"Authorization": f"Bearer {customer_token}"}
    )
    assert response.status_code == 200
    data = response.json()
    assert "daily_usage" in data
    assert "total_tokens" in data
    assert "average_tokens_per_day" in data

@pytest.mark.asyncio
async def test_admin_update_quota(client, admin_token):
    """Test admin updating customer quota"""
    customer_id = "507f1f77bcf86cd799439010"
    payload = {
        "daily_quota": 50000,
        "monthly_quota": 1500000
    }
    response = await client.patch(
        f"/api/v1/token-usage/quota/{customer_id}",
        json=payload,
        headers={"Authorization": f"Bearer {admin_token}"}
    )
    assert response.status_code == 200
    data = response.json()
    assert data["success"] is True
    assert data["updated_count"] == 1

@pytest.mark.asyncio
async def test_daily_quota_reset(client):
    """Test daily quota reset endpoint"""
    response = await client.post("/api/v1/token-usage/quota-reset")
    assert response.status_code == 200
    data = response.json()
    assert data["success"] is True
    assert "reset_count" in data

Best Practices

For Frontend Developers

DO:

  • Always check quota before AI call - Use GET /quota endpoint
  • Show warning at 80% usage - Alert customers proactively
  • Handle 429 gracefully - Display friendly "quota exceeded" message
  • Track usage after successful calls - Use POST /track endpoint
  • Respect allowed: false - Disable chat input when quota exceeded
  • Show reset time - Display when quota will reset
  • Cache quota checks - Don't check on every keystroke (throttle)

DON'T:

  • Don't skip quota checks - Always verify before Gemini API call
  • Don't ignore tracking errors - Log failures for debugging
  • Don't hardcode quotas - Always fetch from backend
  • Don't allow chat when blocked - Enforce allowed: false strictly

For Backend Developers

DO:

  • Use accurate token counts - Extract from Gemini usageMetadata
  • Update both daily and monthly counters
  • Generate warnings at 80% usage
  • Create detailed logs for analytics
  • Handle subscription changes - Update quotas on upgrade
  • Implement idempotency - Prevent duplicate tracking

DON'T:

  • Don't estimate tokens - Always use actual counts from Gemini
  • Don't skip validation - Validate all input parameters
  • Don't expose reset endpoints - Keep Cloud Function URLs private
  • Don't trust client-provided quotas - Always fetch from database

For DevOps

DO:

  • Set up Cloud Scheduler - Daily and monthly reset jobs
  • Monitor reset job success - Alert on failures
  • Configure IP allowlist - Restrict reset endpoints to Cloud Functions
  • Set TTL indexes - Auto-delete logs after 90 days
  • Monitor quota usage patterns - Detect abuse

DON'T:

  • Don't skip backups - Backup quota data before resets
  • Don't ignore failed resets - Investigate immediately
  • Don't expose reset URLs publicly - Keep internal only

Error Handling

Common Errors

401 Unauthorized:

{
  "detail": "Not authenticated"
}

Fix: Ensure customer JWT token is included in Authorization header.

429 Quota Exceeded:

{
  "detail": "Daily token quota exceeded. Limit resets at midnight UTC."
}

Fix: The customer must wait for the next reset (midnight UTC for the daily quota, the 1st of the month for the monthly quota) or upgrade the subscription plan.

422 Validation Error:

{
  "detail": [
    {
      "loc": ["body", "tokens"],
      "msg": "field required",
      "type": "value_error.missing"
    }
  ]
}

Fix: Provide all required fields in request body.

500 Internal Server Error:

{
  "detail": "Failed to track token usage"
}

Fix: Check server logs for specific error. May be database connectivity issue.


Troubleshooting

Quota Not Resetting

Symptoms: Customer still blocked after midnight UTC

Checks:

  1. Verify Cloud Scheduler is running:
    gcloud scheduler jobs describe reset-daily-token-quotas
    
  2. Check Cloud Function logs:
    gcloud functions logs read resetDailyQuotas --limit 50
    
  3. Manually trigger reset:
    gcloud scheduler jobs run reset-daily-token-quotas
    

Fix:

  • Ensure Cloud Function has correct backend URL
  • Verify backend endpoint is accessible from Cloud Functions
  • Check for database connectivity issues

Usage Not Tracking

Symptoms: Token usage not updating after AI calls

Checks:

  1. Verify tracking endpoint returns 200:
    curl -X POST http://localhost:8000/api/v1/token-usage/track \
      -H "Authorization: Bearer TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"tokens": 100, "conversation_id": "test", "message_count": 1}'
    
  2. Check database for new log entries:
    db.token_usage_logs.find().sort({timestamp: -1}).limit(10)
    
  3. Verify customer_id and tenant_id are correct

Fix:

  • Ensure frontend sends all required fields
  • Verify customer JWT token is valid
  • Check database indexes are created

Incorrect Quota Limits

Symptoms: Customer has wrong quota (not matching subscription plan)

Checks:

  1. Verify subscription plan:
    db.subscriptions.findOne({tenant_id: ObjectId("...")})
    
  2. Check quota record:
    db.token_usage_budgets.findOne({customer_id: ObjectId("...")})
    
  3. Compare with plan limits

Fix:

  • Manually update quota using admin endpoint
  • Or delete quota record (will auto-initialize on next check)
  • Ensure subscription upgrade triggers quota update

API Reference Summary

All paths below are relative to /api/v1.

Endpoint                         | Method | Purpose               | Auth                                 | Portal
/token-usage/quota               | GET    | Check remaining quota | Customer JWT                         | 🟦 Customer
/token-usage/track               | POST   | Record token usage    | Customer JWT                         | 🟦 Customer
/token-usage/history             | GET    | Get usage history     | Customer JWT                         | 🟦 Customer
/token-usage/quota/{customer_id} | PATCH  | Update customer quota | Admin JWT (TENANT_ADMIN/SUPER_ADMIN) | 🟧 Admin
/token-usage/quota-reset         | POST   | Reset daily quotas    | None (Cloud Function)                | 🔴 Internal
/token-usage/quota-reset-monthly | POST   | Reset monthly quotas  | None (Cloud Function)                | 🔴 Internal


Next Steps:

  1. Check available plans: GET /subscriptions/plans
  2. Check your quota: GET /token-usage/quota
  3. Start AI conversation with quota tracking
  4. View usage history: GET /token-usage/history?days=30
  5. Upgrade plan if needed: POST /subscriptions/upgrade