Troubleshooting & Advanced Usage
Comprehensive troubleshooting guide and advanced usage patterns for CrashLens power users
🚨 TROUBLESHOOTING
Common Issues & Solutions
❌ “No log files found” Error
Problem: CrashLens can’t find any log files to analyze.
Solutions:
BASH
# Check if logs exist in expected directories
ls -la .llm_logs/
ls -la logs/
# Generate test data if no logs exist
mkdir -p .llm_logs
crashlens simulate --output .llm_logs/test.jsonl --count 50
# Specify custom log directory
crashlens analyze --log-dir /path/to/your/logs
# Use find to locate JSONL files
find . -name "*.jsonl" -type f
⚠️ “Policy violation not detected” Issue
Problem: Expected policy violations aren’t being caught.
Debugging Steps:
BASH
# Enable debug mode to see detailed policy evaluation
crashlens policy-check --debug logs/
# Test policy syntax
crashlens validate --config crashlens.yml
# Run with verbose output
crashlens policy-check --verbose --detailed-output logs/
# Check if log format matches policy expectations
crashlens analyze --format json logs/ | jq '.entries[0]'
🔧 “Configuration not found” Error
Problem: CrashLens can’t find or parse configuration files.
Resolution:
BASH
# Initialize configuration if missing
crashlens init
# Check configuration syntax
crashlens validate --config crashlens.yml
# Use specific config file
crashlens policy-check --config /path/to/config.yml
# Generate default configuration
crashlens init --template basic --non-interactive
🐌 Performance Issues with Large Log Files
Problem: Analysis is slow or runs out of memory with large datasets.
Optimization Strategies:
BASH
# Use streaming mode for large files
crashlens analyze --stream logs/large-file.jsonl
# Enable parallel processing
crashlens analyze --parallel --workers 4 logs/
# Set memory limits
crashlens analyze --memory-limit 512MB logs/
# Use incremental processing
crashlens analyze --incremental --state-file .crashlens-state.json
# Process in batches
crashlens analyze --batch-size 1000 logs/
# Use time-based filtering to reduce dataset
crashlens analyze --since "2025-08-20T00:00:00Z" logs/
🔑 Authentication & API Key Issues
Problem: API requests fail to authenticate, or CrashLens isn’t picking up your keys.
Solutions:
BASH
# Set environment variables for API keys
export OPENAI_API_KEY="your-api-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
# Use the configuration file for keys (create the directory first)
mkdir -p ~/.crashlens
echo "api_keys:" > ~/.crashlens/config.yml
echo "  openai: your-api-key" >> ~/.crashlens/config.yml
# Test API connectivity
crashlens test-connection --provider openai
# Use different API endpoint
crashlens analyze --api-base https://api.openai.com/v1 logs/
📊 Log Format Compatibility Issues
Problem: CrashLens doesn’t recognize your log format.
Format Solutions:
BASH
# Check supported log formats
crashlens formats --list
# Convert logs to compatible format
crashlens convert --input custom.log --output standard.jsonl --format langfuse
# Use custom parser
crashlens analyze --parser custom-parser.py logs/
# Validate log format
crashlens validate-logs logs/sample.jsonl
# Preview log parsing
crashlens analyze --dry-run --limit 5 logs/
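If you need to see what a well-formed JSONL entry looks like before converting your own logs, a minimal record can be produced and round-tripped by hand. The field names below (`model`, `prompt_tokens`, `cost`, …) are illustrative assumptions — match them to whichever format (e.g. Langfuse) your logs actually use:

```python
import json

# Hypothetical minimal JSONL entry: one JSON object per line.
# Field names are illustrative, not a CrashLens schema guarantee.
entry = {
    "timestamp": "2025-08-20T14:02:11Z",
    "model": "gpt-4",
    "prompt_tokens": 1200,
    "completion_tokens": 350,
    "cost": 0.051,
}

# Write one record per line, then parse it back to confirm valid JSONL
with open("sample.jsonl", "w") as f:
    f.write(json.dumps(entry) + "\n")

with open("sample.jsonl") as f:
    parsed = [json.loads(line) for line in f]

print(parsed[0]["model"])  # gpt-4
```

A file built this way is a useful fixture for `crashlens validate-logs` and the `--dry-run` preview above.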
🔍 Debug Mode & Diagnostics
Enabling Debug Mode
Debug mode provides detailed information about CrashLens operations, policy evaluation, and error conditions.
BASH
# Enable global debug mode
export CRASHLENS_DEBUG=true
crashlens analyze logs/
# Debug specific commands
crashlens --debug policy-check logs/
crashlens --debug --verbose analyze logs/
# Debug policy evaluation
crashlens policy-check --debug --explain logs/
# Debug configuration loading
crashlens --debug init
# Debug log parsing
crashlens --debug analyze --dry-run logs/sample.jsonl
Log Level Configuration
BASH
# Set different log levels
crashlens --log-level debug analyze logs/ # Most verbose
crashlens --log-level info analyze logs/ # Default
crashlens --log-level warning analyze logs/ # Warnings only
crashlens --log-level error analyze logs/ # Errors only
# Save debug logs to file
crashlens --debug analyze logs/ 2> debug.log
# Real-time debug monitoring
tail -f ~/.crashlens/debug.log
Diagnostic Commands
BASH
# System diagnostics
crashlens doctor # Run all diagnostic checks
crashlens doctor --check dependencies # Check Python dependencies
crashlens doctor --check permissions # Check file permissions
crashlens doctor --check configuration # Validate configuration
# Performance diagnostics
crashlens benchmark --dataset-size 1000 # Performance benchmark
crashlens profile analyze logs/ # Profile analysis performance
# Network diagnostics
crashlens test-connection # Test all API connections
crashlens test-connection --provider openai
crashlens ping --endpoint https://api.openai.com/v1
# Environment diagnostics
crashlens env --show # Show environment variables
crashlens version --detailed # Detailed version information
⚡ Performance Optimization
💾 Memory Optimization
BASH
# Set memory limits
--memory-limit 512MB
# Enable disk caching
--use-disk-cache
# Process in smaller chunks
--batch-size 500
# Streaming for large files
--stream
🏃 Speed Optimization
BASH
# Parallel processing
--parallel --workers 4
# Skip unnecessary checks
--fast-mode
# Use incremental analysis
--incremental
# Cache results
--cache-results
Large Dataset Strategies
BASH
# For datasets > 1GB
crashlens analyze \
  --stream \
  --parallel \
  --workers 8 \
  --memory-limit 1GB \
  --batch-size 2000 \
  --use-disk-cache \
  large-logs/
# Time-based chunking for historical data
crashlens analyze --date-range "2025-08-01:2025-08-07" logs/
crashlens analyze --date-range "2025-08-08:2025-08-14" logs/
# Selective analysis by criteria
crashlens analyze --filter "cost > 1.0" logs/ # Only expensive requests
crashlens analyze --models gpt-4,claude-3 logs/ # Specific models only
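When the `--filter` and `--models` flags don’t cover a case, a small pre-filter pass over the JSONL can shrink the dataset before analysis. This is a sketch with illustrative field names (`cost`, `model`, `timestamp`) — adapt them to your log format:

```python
# Hypothetical pre-filter: keep only expensive gpt-4 requests since a cutoff.
def keep(rec):
    return (
        rec.get("cost", 0) > 1.0
        and str(rec.get("model", "")).startswith("gpt-4")
        # ISO-8601 timestamps sort lexicographically, so string compare works
        and rec.get("timestamp", "") >= "2025-08-01"
    )

rows = [
    {"model": "gpt-4", "cost": 2.0, "timestamp": "2025-08-05T10:00:00Z"},
    {"model": "gpt-3.5-turbo", "cost": 0.1, "timestamp": "2025-08-05T10:01:00Z"},
    {"model": "gpt-4", "cost": 0.4, "timestamp": "2025-08-02T09:00:00Z"},
]
filtered = [r for r in rows if keep(r)]
print(len(filtered))  # 1
```

In practice you would read `rows` line by line from the JSONL files and write the survivors to a smaller file for CrashLens to analyze.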
📚 ADVANCED USAGE EXAMPLES
🏢 Enterprise Integration Patterns
Multi-Environment Cost Management
YAML
# Environment-specific configurations
# production.crashlens.yml
policies:
  - enforce: "strict-cost-control"
    rules:
      - daily_budget_limit: 1000
        monthly_budget_limit: 25000
        alert_threshold: 0.8
        block_threshold: 0.95

# staging.crashlens.yml
policies:
  - enforce: "moderate-cost-control"
    rules:
      - daily_budget_limit: 200
        monthly_budget_limit: 5000

# development.crashlens.yml
policies:
  - enforce: "development-guidelines"
    rules:
      - daily_budget_limit: 50
        warn_on_expensive_models: true

# Usage
crashlens policy-check --config production.crashlens.yml logs/
crashlens analyze --config staging.crashlens.yml --env staging logs/
Multi-Team Cost Allocation & Chargeback
BASH
# Team-based cost tracking
crashlens analyze \
  --group-by team,project \
  --include-metadata team_id,project_id \
  --output-format csv \
  --output team-costs-$(date +%Y-%m).csv \
  logs/

# Chargeback report generation
crashlens report \
  --template chargeback \
  --period monthly \
  --breakdown team,department,cost_center \
  --include-budget-variance \
  --output chargeback-$(date +%Y-%m).html

# Budget allocation tracking
crashlens analyze \
  --filter "team_id IN ('team-a', 'team-b')" \
  --budget-allocation team-a:5000,team-b:3000 \
  --alert-on budget-exceeded \
  logs/
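The per-team totals behind a chargeback report can be sanity-checked by hand: summing cost per team straight from the log records reproduces the grouping. The field names and budget figures here are illustrative:

```python
from collections import defaultdict

# Illustrative records; in practice these come from your JSONL log files
records = [
    {"team_id": "team-a", "cost": 2.50},
    {"team_id": "team-a", "cost": 1.25},
    {"team_id": "team-b", "cost": 4.00},
]

# Sum cost per team
totals = defaultdict(float)
for rec in records:
    totals[rec["team_id"]] += rec["cost"]

# Compare spend against a hypothetical per-team monthly budget
budgets = {"team-a": 5000, "team-b": 3000}
for team, spent in sorted(totals.items()):
    pct = 100 * spent / budgets[team]
    print(f"{team}: ${spent:.2f} ({pct:.4f}% of budget)")
```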
Enterprise Alerting & Integration
BASH
# Slack integration with custom webhooks
crashlens policy-check \
  --slack-webhook https://hooks.slack.com/services/YOUR/WEBHOOK/URL \
  --alert-channel "#finops-alerts" \
  --alert-severity high \
  logs/

# Email alerts
crashlens monitor \
  --email-alerts admin@company.com,finops@company.com \
  --email-template enterprise \
  --alert-frequency daily \
  .llm_logs/

# PagerDuty integration
crashlens monitor \
  --pagerduty-key YOUR_PAGERDUTY_INTEGRATION_KEY \
  --alert-on critical-violations,budget-exceeded \
  --escalation-policy high-priority \
  logs/

# JIRA ticket creation for violations
crashlens policy-check \
  --jira-integration \
  --jira-project FINOPS \
  --jira-issue-type Bug \
  --auto-assign finops-team \
  logs/
📊 Advanced Analytics & Reporting
Cost Trend Analysis & Forecasting
BASH
# Trend analysis with forecasting
crashlens analyze \
  --time-series daily \
  --forecast 30 \
  --trend-analysis \
  --include-seasonality \
  --output forecast-report.json \
  logs/

# Cost anomaly detection
crashlens analyze \
  --anomaly-detection \
  --sensitivity medium \
  --baseline-period 30d \
  --alert-on anomalies \
  logs/

# Comparative analysis across time periods
crashlens compare \
  --baseline "2025-07-01:2025-07-31" \
  --current "2025-08-01:2025-08-31" \
  --metrics cost,usage,efficiency \
  --statistical-significance 0.05 \
  logs/
Custom Dashboards & Visualization
BASH
# Generate interactive dashboard
crashlens dashboard \
  --template executive \
  --include-charts cost-trend,model-usage,team-breakdown \
  --refresh-interval 1h \
  --port 8080 \
  --auth-required \
  logs/

# Export data for external visualization tools
crashlens analyze \
  --output-format prometheus \
  --metrics-endpoint /metrics \
  --export-interval 5m \
  logs/

# Grafana integration
crashlens export \
  --format grafana \
  --dashboard-config grafana-dashboard.json \
  --data-source crashlens-metrics \
  logs/

# Power BI integration
crashlens analyze \
  --output-format powerbi \
  --include-relationships \
  --output powerbi-dataset.pbix \
  logs/
Advanced Query & Filtering
BASH
# Complex SQL-like queries
crashlens query \
  --filter "cost > 10 AND model LIKE 'gpt-4%' AND timestamp >= '2025-08-01'" \
  --select "model, AVG(cost) as avg_cost, COUNT(*) as requests" \
  --group-by model \
  --having "AVG(cost) > 5" \
  --order-by avg_cost DESC \
  logs/

# Statistical analysis
crashlens analyze \
  --statistics percentiles,correlation,regression \
  --correlate cost,latency,token_count \
  --percentiles 50,90,95,99 \
  --output stats-report.json \
  logs/

# Pattern mining
crashlens analyze \
  --pattern-mining \
  --min-support 0.1 \
  --find-patterns retry-loops,cost-spikes,efficiency-issues \
  --association-rules \
  logs/
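Percentile output is easy to cross-check with Python’s standard library: `statistics.quantiles` with `n=100` yields the 1st–99th percentile cut points, so the same p50/p90/p95/p99 figures can be reproduced from the raw cost column. The cost values below are illustrative:

```python
import statistics

# Illustrative per-request costs pulled from a JSONL log
costs = [0.02, 0.05, 0.05, 0.10, 0.40, 0.75, 1.20, 2.50, 3.00, 9.80]

# quantiles(..., n=100) returns 99 cut points: cut[p - 1] is the p-th percentile
cut = statistics.quantiles(costs, n=100, method="inclusive")
for p in (50, 90, 95, 99):
    print(f"p{p}: {cut[p - 1]:.3f}")
```

`method="inclusive"` treats the data as the whole population (linear interpolation between observed values), which matches how most analytics tools report percentiles over a complete log file.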
🔧 DevOps & CI/CD Integration
Advanced GitHub Actions Workflows
YAML
# .github/workflows/comprehensive-cost-control.yml
name: Comprehensive LLM Cost Control

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 8 * * *'  # Daily at 8 AM

jobs:
  cost-analysis:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.12'

      - name: Install CrashLens
        run: pip install crashlens

      - name: Download logs from production
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          aws s3 sync s3://your-logs-bucket/llm-logs/ ./logs/

      - name: Run comprehensive analysis
        run: |
          crashlens analyze \
            --parallel \
            --output-format json \
            --include-recommendations \
            --output analysis-${{ github.sha }}.json \
            logs/

      - name: Policy compliance check
        run: |
          crashlens policy-check \
            --config .github/crashlens.yml \
            --fail-on-violation \
            --detailed-output \
            --slack-webhook ${{ secrets.SLACK_WEBHOOK }} \
            logs/

      - name: Generate executive report
        if: github.event_name == 'schedule'
        run: |
          crashlens report \
            --template executive \
            --period daily \
            --include-trends \
            --output daily-report-$(date +%Y-%m-%d).html \
            logs/

      - name: Upload artifacts
        uses: actions/upload-artifact@v3
        with:
          name: cost-analysis-${{ github.sha }}
          path: |
            analysis-${{ github.sha }}.json
            daily-report-*.html
Kubernetes & Container Integration
YAML
# Kubernetes CronJob for cost monitoring
apiVersion: batch/v1
kind: CronJob
metadata:
  name: crashlens-monitor
spec:
  schedule: "0 */6 * * *"  # Every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: crashlens
              image: crashlens/crashlens:latest
              command:
                - /bin/sh
                - -c
                - |
                  crashlens monitor \
                    --log-dir /logs \
                    --alert-on policy-violation,budget-exceeded \
                    --slack-webhook $SLACK_WEBHOOK \
                    --continuous
              env:
                - name: SLACK_WEBHOOK
                  valueFrom:
                    secretKeyRef:
                      name: crashlens-secrets
                      key: slack-webhook
              volumeMounts:
                - name: log-volume
                  mountPath: /logs
          volumes:
            - name: log-volume
              persistentVolumeClaim:
                claimName: llm-logs-pvc
          restartPolicy: OnFailure

# Docker Compose for local development (docker-compose.yml)
version: '3.8'
services:
  crashlens-monitor:
    image: crashlens/crashlens:latest
    command: >
      crashlens watch /logs
      --poll-interval 30
      --alert-on policy-violation
      --webhook http://webhook-service:3000/alerts
    volumes:
      - ./logs:/logs:ro
      - ./crashlens.yml:/app/crashlens.yml:ro
    environment:
      - CRASHLENS_CONFIG=/app/crashlens.yml
    depends_on:
      - webhook-service
Infrastructure as Code Integration
TERRAFORM
# Terraform module for CrashLens monitoring
# modules/crashlens/main.tf
resource "aws_lambda_function" "crashlens_monitor" {
  filename      = "crashlens-lambda.zip"
  function_name = "crashlens-cost-monitor"
  role          = aws_iam_role.crashlens_role.arn
  handler       = "index.handler"
  runtime       = "python3.12"

  environment {
    variables = {
      LOG_BUCKET    = var.log_bucket
      SLACK_WEBHOOK = var.slack_webhook
      POLICY_CONFIG = var.policy_config
    }
  }
}

resource "aws_cloudwatch_event_rule" "crashlens_schedule" {
  name                = "crashlens-daily-check"
  description         = "Daily CrashLens cost analysis"
  schedule_expression = "rate(24 hours)"
}

resource "aws_cloudwatch_event_target" "lambda_target" {
  rule      = aws_cloudwatch_event_rule.crashlens_schedule.name
  target_id = "CrashLensLambdaTarget"
  arn       = aws_lambda_function.crashlens_monitor.arn
}
# Ansible playbook for server deployment
---
- hosts: monitoring_servers
  become: yes
  tasks:
    - name: Install CrashLens
      pip:
        name: crashlens
        state: latest

    - name: Create CrashLens config
      template:
        src: crashlens.yml.j2
        dest: /etc/crashlens/crashlens.yml
        mode: '0644'

    - name: Create systemd service
      template:
        src: crashlens.service.j2
        dest: /etc/systemd/system/crashlens.service
      notify: restart crashlens

    - name: Enable and start CrashLens service
      systemd:
        name: crashlens
        enabled: yes
        state: started

  handlers:
    - name: restart crashlens
      systemd:
        name: crashlens
        state: restarted
        daemon_reload: yes
🔌 API Integration & Automation
REST API Usage
BASH
# Start CrashLens API server
crashlens serve --port 8080 --auth-token your-secret-token
# API endpoints usage
curl -H "Authorization: Bearer your-secret-token" \
  -X POST http://localhost:8080/api/v1/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "log_files": ["logs/app.jsonl"],
    "policies": ["prevent-model-overkill"],
    "options": {
      "include_recommendations": true,
      "output_format": "json"
    }
  }'

# Policy check via API
curl -H "Authorization: Bearer your-secret-token" \
  -X POST http://localhost:8080/api/v1/policy-check \
  -H "Content-Type: application/json" \
  -d '{
    "log_files": ["logs/recent.jsonl"],
    "policy_file": "policies/production.yml",
    "severity": "high"
  }'

# Real-time monitoring webhook
curl -X POST http://localhost:8080/api/v1/webhooks/register \
  -H "Authorization: Bearer your-secret-token" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-app.com/webhooks/crashlens",
    "events": ["policy_violation", "cost_spike"],
    "secret": "webhook-secret"
  }'
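On the receiving end of that webhook, the registered `secret` is what lets you authenticate deliveries. A common pattern — an assumption here, so confirm the exact header name and signing scheme CrashLens uses — is an HMAC-SHA256 hex digest of the raw request body:

```python
import hashlib
import hmac

def verify_signature(payload: bytes, secret: str, signature_header: str) -> bool:
    """Compare a hex HMAC-SHA256 digest of the payload against the header value."""
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    # compare_digest is a constant-time comparison, avoiding timing attacks
    return hmac.compare_digest(expected, signature_header)

# Simulated delivery: the sender signs the raw body with the shared secret
body = b'{"event": "policy_violation", "cost": 12.5}'
secret = "webhook-secret"
header = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

print(verify_signature(body, secret, header))          # True
print(verify_signature(body, "wrong-secret", header))  # False
```

Always verify against the raw bytes of the request body, not a re-serialized copy, since any whitespace difference changes the digest.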
Python SDK Integration
PYTHON
# Python integration example
from crashlens import CrashLens, PolicyConfig
import logging

logger = logging.getLogger(__name__)

# Initialize CrashLens client
client = CrashLens(
    config_file="crashlens.yml",
    log_level=logging.INFO
)

# Programmatic policy checking
async def check_llm_request(request_data):
    """Check LLM request against policies before sending"""
    try:
        result = await client.policy_check_async(
            request_data=request_data,
            policies=["prevent-model-overkill", "budget-enforcement"],
            fail_fast=True
        )
        if result.violations:
            logger.warning(f"Policy violations: {result.violations}")
            return result.suggested_alternatives
        return None  # No violations, proceed with request
    except Exception as e:
        logger.error(f"Policy check failed: {e}")
        return None

# Real-time cost monitoring
# (send_slack_alert, create_incident_ticket and email_report are
# application-defined helpers, not part of the SDK)
def setup_cost_monitoring():
    """Set up real-time cost monitoring"""

    @client.on_cost_threshold(threshold=100, period="daily")
    async def handle_cost_alert(event):
        """Handle cost threshold alerts"""
        await send_slack_alert(
            f"Daily cost threshold exceeded: {event.current_cost}"
        )

    @client.on_policy_violation(severity="high")
    async def handle_violation(violation):
        """Handle policy violations"""
        await create_incident_ticket(violation)

    # Start monitoring
    client.start_monitoring(
        log_sources=["./logs", "s3://company-llm-logs"],
        poll_interval=300  # 5 minutes
    )

# Batch analysis and reporting
async def generate_weekly_report():
    """Generate a comprehensive weekly cost report"""
    analysis = await client.analyze_async(
        log_files=["logs/week-*.jsonl"],
        include_trends=True,
        include_recommendations=True,
        time_range="last-7-days"
    )
    report = await client.generate_report(
        analysis=analysis,
        template="executive",
        format="html",
        include_charts=True
    )
    # Email report to stakeholders
    await email_report(
        recipients=["cto@company.com", "finops@company.com"],
        subject="Weekly LLM Cost Analysis",
        html_content=report.html,
        attachments=[report.data_export]
    )
🤖 Machine Learning & Predictive Analytics
Predictive Cost Modeling
BASH
# Train custom cost prediction models
crashlens ml train \
  --model-type cost-predictor \
  --features token_count,model_type,time_of_day,user_type \
  --target cost \
  --algorithm random-forest \
  --validation-split 0.2 \
  --output-model cost-model.pkl \
  logs/historical-data.jsonl

# Real-time cost prediction
crashlens ml predict \
  --model cost-model.pkl \
  --input-features '{"token_count": 1500, "model_type": "gpt-4", "time_of_day": 14}' \
  --confidence-interval 0.95

# Anomaly detection model
crashlens ml train \
  --model-type anomaly-detector \
  --algorithm isolation-forest \
  --contamination 0.1 \
  --output-model anomaly-model.pkl \
  logs/normal-usage.jsonl

# Auto-scaling prediction
crashlens ml predict \
  --model scaling-model.pkl \
  --forecast-horizon 24h \
  --include-uncertainty \
  --alert-on capacity-exceeded
Intelligent Policy Optimization
BASH
# Auto-optimize policies based on historical data
crashlens ml optimize-policies \
  --current-policies crashlens.yml \
  --historical-data logs/last-90-days/ \
  --objective minimize-cost \
  --constraints maintain-quality \
  --output optimized-policies.yml

# A/B testing for policy effectiveness
crashlens ml ab-test \
  --policy-a current-policies.yml \
  --policy-b optimized-policies.yml \
  --test-data logs/test-set.jsonl \
  --metrics cost,violations,user-satisfaction \
  --duration 7d

# Reinforcement learning for dynamic policies
crashlens ml rl-train \
  --environment production \
  --reward-function cost-efficiency \
  --exploration-strategy epsilon-greedy \
  --episodes 1000 \
  --save-model rl-policy-agent.pkl
Last updated: August 24, 2025