Troubleshooting & Advanced Usage

Comprehensive troubleshooting guide and advanced usage patterns for CrashLens power users

🚨 TROUBLESHOOTING

Common Issues & Solutions

❌ “No log files found” Error

Problem: CrashLens can’t find any log files to analyze.

Solutions:

BASH
# Check if logs exist in expected directories
ls -la .llm_logs/
ls -la logs/

# Generate test data if no logs exist
mkdir -p .llm_logs
crashlens simulate --output .llm_logs/test.jsonl --count 50

# Specify custom log directory
crashlens analyze --log-dir /path/to/your/logs

# Use find to locate JSONL files
find . -name "*.jsonl" -type f
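
If `find` turns up `.jsonl` files but CrashLens still reports nothing, the files may exist without containing valid JSON lines. A quick stdlib-only Python check (an illustrative sketch, not part of CrashLens itself) can separate the two cases:

```python
import json
from pathlib import Path


def find_jsonl_logs(root="."):
    """Recursively locate candidate JSONL log files, like `find . -name "*.jsonl"`."""
    return sorted(Path(root).rglob("*.jsonl"))


def count_valid_entries(path):
    """Count lines that parse as JSON; many invalid lines suggest a format mismatch."""
    valid = invalid = 0
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # blank lines are harmless in JSONL
            try:
                json.loads(line)
                valid += 1
            except json.JSONDecodeError:
                invalid += 1
    return valid, invalid
```

A high invalid count points at the format-compatibility section further down rather than at a missing-file problem.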

⚠️ “Policy violation not detected” Issue

Problem: Expected policy violations aren’t being caught.

Debugging Steps:

BASH
# Enable debug mode to see detailed policy evaluation
crashlens policy-check --debug logs/

# Test policy syntax
crashlens validate --config crashlens.yml

# Run with verbose output
crashlens policy-check --verbose --detailed-output logs/

# Check if log format matches policy expectations
crashlens analyze --format json logs/ | jq '.entries[0]'

🔧 “Configuration not found” Error

Problem: CrashLens can’t find or parse configuration files.

Resolution:

BASH
# Initialize configuration if missing
crashlens init

# Check configuration syntax
crashlens validate --config crashlens.yml

# Use specific config file
crashlens policy-check --config /path/to/config.yml

# Generate default configuration
crashlens init --template basic --non-interactive

🐌 Performance Issues with Large Log Files

Problem: Analysis is slow or runs out of memory with large datasets.

Optimization Strategies:

BASH
# Use streaming mode for large files
crashlens analyze --stream logs/large-file.jsonl

# Enable parallel processing
crashlens analyze --parallel --workers 4 logs/

# Set memory limits
crashlens analyze --memory-limit 512MB logs/

# Use incremental processing
crashlens analyze --incremental --state-file .crashlens-state.json

# Process in batches
crashlens analyze --batch-size 1000 logs/

# Use time-based filtering to reduce dataset
crashlens analyze --since "2025-08-20T00:00:00Z" logs/
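
Why `--stream` and `--batch-size` keep memory flat is easy to see in a stdlib-only Python sketch (illustrative only, not CrashLens internals): entries are parsed one line at a time and grouped into fixed-size batches, so peak memory depends on the batch size, not the file size.

```python
import json
from itertools import islice


def iter_entries(path):
    """Yield parsed entries one at a time; memory use is independent of file size."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def batched(entries, size):
    """Group a lazy stream into fixed-size batches, the effect of --batch-size."""
    it = iter(entries)
    while batch := list(islice(it, size)):
        yield batch
```

Each batch can then be analyzed and discarded before the next one is read, which is the same trade made by the streaming flags above.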

🔑 Authentication & API Key Issues

Problem: Issues with API authentication or key management.

Solutions:

BASH
# Set environment variables for API keys
export OPENAI_API_KEY="your-api-key"
export ANTHROPIC_API_KEY="your-anthropic-key"

# Use configuration file for keys
echo "api_keys:" > ~/.crashlens/config.yml
echo "  openai: your-api-key" >> ~/.crashlens/config.yml

# Test API connectivity
crashlens test-connection --provider openai

# Use different API endpoint
crashlens analyze --api-base https://api.openai.com/v1 logs/
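
With keys in both the environment and a config file, precedence matters. The helper below sketches the convention most CLIs follow (environment variable first, config file as fallback); it is an illustrative assumption, not CrashLens's documented resolution order.

```python
import os


def resolve_api_key(provider, config=None):
    """Prefer PROVIDER_API_KEY from the environment, then fall back to the
    `api_keys` mapping of a parsed config file (precedence assumed, not documented)."""
    env_value = os.environ.get(f"{provider.upper()}_API_KEY")
    if env_value:
        return env_value
    return (config or {}).get("api_keys", {}).get(provider)
```

If `crashlens test-connection` fails with a key you believe is set, checking which source actually wins is usually the fastest diagnosis.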

📊 Log Format Compatibility Issues

Problem: CrashLens doesn’t recognize your log format.

Format Solutions:

BASH
# Check supported log formats
crashlens formats --list

# Convert logs to compatible format
crashlens convert --input custom.log --output standard.jsonl --format langfuse

# Use custom parser
crashlens analyze --parser custom-parser.py logs/

# Validate log format
crashlens validate-logs logs/sample.jsonl

# Preview log parsing
crashlens analyze --dry-run --limit 5 logs/
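
When `crashlens convert` has no built-in support for your source format, a small preprocessing script is usually enough. The sketch below converts a hypothetical pipe-delimited log into JSONL; both the input layout and the output field names here are invented for illustration, so check the schemas listed by `crashlens formats --list` before relying on them.

```python
import json


def convert_line(raw):
    """Turn a hypothetical 'timestamp|model|prompt_tokens|completion_tokens|cost'
    record into one JSONL line. Field names are illustrative, not a known schema."""
    ts, model, p_tok, c_tok, cost = raw.strip().split("|")
    return json.dumps({
        "timestamp": ts,
        "model": model,
        "usage": {"prompt_tokens": int(p_tok), "completion_tokens": int(c_tok)},
        "cost": float(cost),
    })
```

Run the converted file through `crashlens validate-logs` afterward to confirm the target format matches what the parser expects.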

🔍 Debug Mode & Diagnostics

Enabling Debug Mode

Debug mode provides detailed information about CrashLens operations, policy evaluation, and error conditions.

BASH
# Enable global debug mode
export CRASHLENS_DEBUG=true
crashlens analyze logs/

# Debug specific commands
crashlens --debug policy-check logs/
crashlens --debug --verbose analyze logs/

# Debug policy evaluation
crashlens policy-check --debug --explain logs/

# Debug configuration loading
crashlens --debug init

# Debug log parsing
crashlens --debug analyze --dry-run logs/sample.jsonl

Log Level Configuration

BASH
# Set different log levels
crashlens --log-level debug analyze logs/    # Most verbose
crashlens --log-level info analyze logs/     # Default
crashlens --log-level warning analyze logs/  # Warnings only
crashlens --log-level error analyze logs/    # Errors only

# Save debug logs to file
crashlens --debug analyze logs/ 2> debug.log

# Real-time debug monitoring
tail -f ~/.crashlens/debug.log

Diagnostic Commands

BASH
# System diagnostics
crashlens doctor                        # Run all diagnostic checks
crashlens doctor --check dependencies   # Check Python dependencies
crashlens doctor --check permissions    # Check file permissions
crashlens doctor --check configuration  # Validate configuration

# Performance diagnostics
crashlens benchmark --dataset-size 1000  # Performance benchmark
crashlens profile analyze logs/          # Profile analysis performance

# Network diagnostics
crashlens test-connection                # Test all API connections
crashlens test-connection --provider openai
crashlens ping --endpoint https://api.openai.com/v1

# Environment diagnostics
crashlens env --show          # Show environment variables
crashlens version --detailed  # Detailed version information

⚡ Performance Optimization

💾 Memory Optimization

BASH
# Set memory limits
--memory-limit 512MB

# Enable disk caching
--use-disk-cache

# Process in smaller chunks
--batch-size 500

# Streaming for large files
--stream

🏃 Speed Optimization

BASH
# Parallel processing
--parallel --workers 4

# Skip unnecessary checks
--fast-mode

# Use incremental analysis
--incremental

# Cache results
--cache-results

Large Dataset Strategies

BASH
# For datasets > 1GB
crashlens analyze \
  --stream \
  --parallel \
  --workers 8 \
  --memory-limit 1GB \
  --batch-size 2000 \
  --use-disk-cache \
  large-logs/

# Time-based chunking for historical data
crashlens analyze --date-range "2025-08-01:2025-08-07" logs/
crashlens analyze --date-range "2025-08-08:2025-08-14" logs/

# Selective analysis by criteria
crashlens analyze --filter "cost > 1.0" logs/    # Only expensive requests
crashlens analyze --models gpt-4,claude-3 logs/  # Specific models only
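
Time-based chunking works because ISO 8601 timestamps in a single timezone sort lexicographically, so a date-range filter can be a plain string comparison with no date parsing at all. A stdlib sketch of what `--date-range` effectively does (illustrative, not CrashLens internals):

```python
def in_date_range(entry, start, end):
    """True when entry['timestamp'] falls in [start, end).
    ISO 8601 UTC strings compare correctly as plain text."""
    ts = entry.get("timestamp", "")
    return start <= ts < end


entries = [
    {"timestamp": "2025-08-03T10:00:00Z", "cost": 0.2},
    {"timestamp": "2025-08-09T10:00:00Z", "cost": 1.4},
]
# Keep only the first chunk of the historical range
week1 = [e for e in entries if in_date_range(e, "2025-08-01", "2025-08-08")]
```

The caveat is the timezone assumption: mixed-offset timestamps break lexicographic ordering, which is one reason normalizing logs to UTC pays off.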

📚 ADVANCED USAGE EXAMPLES

🏢 Enterprise Integration Patterns

Multi-Environment Cost Management

YAML
# Environment-specific configurations

# production.crashlens.yml
policies:
  - enforce: "strict-cost-control"
    rules:
      - daily_budget_limit: 1000
        monthly_budget_limit: 25000
        alert_threshold: 0.8
        block_threshold: 0.95

# staging.crashlens.yml
policies:
  - enforce: "moderate-cost-control"
    rules:
      - daily_budget_limit: 200
        monthly_budget_limit: 5000

# development.crashlens.yml
policies:
  - enforce: "development-guidelines"
    rules:
      - daily_budget_limit: 50
        warn_on_expensive_models: true

# Usage
# crashlens policy-check --config production.crashlens.yml logs/
# crashlens analyze --config staging.crashlens.yml --env staging logs/

Multi-Team Cost Allocation & Chargeback

BASH
# Team-based cost tracking
crashlens analyze \
  --group-by team,project \
  --include-metadata team_id,project_id \
  --output-format csv \
  --output team-costs-$(date +%Y-%m).csv \
  logs/

# Chargeback report generation
crashlens report \
  --template chargeback \
  --period monthly \
  --breakdown team,department,cost_center \
  --include-budget-variance \
  --output chargeback-$(date +%Y-%m).html

# Budget allocation tracking
crashlens analyze \
  --filter "team_id IN ('team-a', 'team-b')" \
  --budget-allocation team-a:5000,team-b:3000 \
  --alert-on budget-exceeded \
  logs/
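
Conceptually, a chargeback report is a group-by over entry metadata plus a comparison of each team's spend against its allocation. A stdlib sketch of that core (the `team_id` field and the budget figures are illustrative, matching the example flags above):

```python
from collections import defaultdict


def chargeback(entries, allocations):
    """Sum cost per team and flag teams that exceeded their allocated budget."""
    spend = defaultdict(float)
    for e in entries:
        spend[e.get("team_id", "unallocated")] += e.get("cost", 0.0)
    return {
        team: {
            "spend": round(spend[team], 4),
            "budget": budget,
            "over_budget": spend[team] > budget,
        }
        for team, budget in allocations.items()
    }
```

The `--include-budget-variance` output shown above presumably carries the same spend-versus-budget comparison per breakdown dimension.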

Enterprise Alerting & Integration

BASH
# Slack integration with custom webhooks
crashlens policy-check \
  --slack-webhook https://hooks.slack.com/services/YOUR/WEBHOOK/URL \
  --alert-channel "#finops-alerts" \
  --alert-severity high \
  logs/

# Email alerts
crashlens monitor \
  --email-alerts admin@company.com,finops@company.com \
  --email-template enterprise \
  --alert-frequency daily \
  .llm_logs/

# PagerDuty integration
crashlens monitor \
  --pagerduty-key YOUR_PAGERDUTY_INTEGRATION_KEY \
  --alert-on critical-violations,budget-exceeded \
  --escalation-policy high-priority \
  logs/

# JIRA ticket creation for violations
crashlens policy-check \
  --jira-integration \
  --jira-project FINOPS \
  --jira-issue-type Bug \
  --auto-assign finops-team \
  logs/

📊 Advanced Analytics & Reporting

Cost Trend Analysis & Forecasting

BASH
# Trend analysis with forecasting
crashlens analyze \
  --time-series daily \
  --forecast 30 \
  --trend-analysis \
  --include-seasonality \
  --output forecast-report.json \
  logs/

# Cost anomaly detection
crashlens analyze \
  --anomaly-detection \
  --sensitivity medium \
  --baseline-period 30d \
  --alert-on anomalies \
  logs/

# Comparative analysis across time periods
crashlens compare \
  --baseline "2025-07-01:2025-07-31" \
  --current "2025-08-01:2025-08-31" \
  --metrics cost,usage,efficiency \
  --statistical-significance 0.05 \
  logs/
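
At its core, cost anomaly detection asks how far a day's spend deviates from the baseline period. A minimal z-score version in stdlib Python makes the `--sensitivity` knob concrete; the 3-sigma threshold here is a common statistical default, not necessarily what CrashLens uses internally:

```python
import statistics


def cost_anomalies(daily_costs, threshold=3.0):
    """Return indices of days whose cost is more than `threshold` standard
    deviations from the mean of the series (a simple z-score test)."""
    mean = statistics.mean(daily_costs)
    stdev = statistics.pstdev(daily_costs)
    if stdev == 0:
        return []  # perfectly flat spend: nothing can be anomalous
    return [i for i, c in enumerate(daily_costs)
            if abs(c - mean) / stdev > threshold]
```

Lowering the threshold corresponds to a higher sensitivity setting: more days flagged, more false positives.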

Custom Dashboards & Visualization

BASH
# Generate interactive dashboard
crashlens dashboard \
  --template executive \
  --include-charts cost-trend,model-usage,team-breakdown \
  --refresh-interval 1h \
  --port 8080 \
  --auth-required \
  logs/

# Export data for external visualization tools
crashlens analyze \
  --output-format prometheus \
  --metrics-endpoint /metrics \
  --export-interval 5m \
  logs/

# Grafana integration
crashlens export \
  --format grafana \
  --dashboard-config grafana-dashboard.json \
  --data-source crashlens-metrics \
  logs/

# Power BI integration
crashlens analyze \
  --output-format powerbi \
  --include-relationships \
  --output powerbi-dataset.pbix \
  logs/

Advanced Query & Filtering

BASH
# Complex SQL-like queries
crashlens query \
  --filter "cost > 10 AND model LIKE 'gpt-4%' AND timestamp >= '2025-08-01'" \
  --select "model, AVG(cost) as avg_cost, COUNT(*) as requests" \
  --group-by model \
  --having "AVG(cost) > 5" \
  --order-by avg_cost DESC \
  logs/

# Statistical analysis
crashlens analyze \
  --statistics percentiles,correlation,regression \
  --correlate cost,latency,token_count \
  --percentiles 50,90,95,99 \
  --output stats-report.json \
  logs/

# Pattern mining
crashlens analyze \
  --pattern-mining \
  --min-support 0.1 \
  --find-patterns retry-loops,cost-spikes,efficiency-issues \
  --association-rules \
  logs/
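
The percentile figures requested via `--percentiles 50,90,95,99` can be reproduced with the classic nearest-rank method, sketched below (CrashLens may use a different interpolation, so treat small discrepancies at the tails as expected):

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of the
    data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]
```

Tail percentiles (p95, p99) are usually far more informative for cost spikes than the mean, which a handful of cheap requests can drag down.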

🔧 DevOps & CI/CD Integration

Advanced GitHub Actions Workflows

YAML
# .github/workflows/comprehensive-cost-control.yml
name: Comprehensive LLM Cost Control

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 8 * * *'  # Daily at 8 AM

jobs:
  cost-analysis:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.12'

      - name: Install CrashLens
        run: pip install crashlens

      - name: Download logs from production
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          aws s3 sync s3://your-logs-bucket/llm-logs/ ./logs/

      - name: Run comprehensive analysis
        run: |
          crashlens analyze \
            --parallel \
            --output-format json \
            --include-recommendations \
            --output analysis-${{ github.sha }}.json \
            logs/

      - name: Policy compliance check
        run: |
          crashlens policy-check \
            --config .github/crashlens.yml \
            --fail-on-violation \
            --detailed-output \
            --slack-webhook ${{ secrets.SLACK_WEBHOOK }} \
            logs/

      - name: Generate executive report
        if: github.event_name == 'schedule'
        run: |
          crashlens report \
            --template executive \
            --period daily \
            --include-trends \
            --output daily-report-$(date +%Y-%m-%d).html \
            logs/

      - name: Upload artifacts
        uses: actions/upload-artifact@v3
        with:
          name: cost-analysis-${{ github.sha }}
          path: |
            analysis-${{ github.sha }}.json
            daily-report-*.html

Kubernetes & Container Integration

YAML
# Kubernetes CronJob for cost monitoring
apiVersion: batch/v1
kind: CronJob
metadata:
  name: crashlens-monitor
spec:
  schedule: "0 */6 * * *"  # Every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: crashlens
              image: crashlens/crashlens:latest
              command:
                - /bin/sh
                - -c
                - |
                  crashlens monitor \
                    --log-dir /logs \
                    --alert-on policy-violation,budget-exceeded \
                    --slack-webhook $SLACK_WEBHOOK \
                    --continuous
              env:
                - name: SLACK_WEBHOOK
                  valueFrom:
                    secretKeyRef:
                      name: crashlens-secrets
                      key: slack-webhook
              volumeMounts:
                - name: log-volume
                  mountPath: /logs
          volumes:
            - name: log-volume
              persistentVolumeClaim:
                claimName: llm-logs-pvc
          restartPolicy: OnFailure

# Docker Compose for local development (separate file: docker-compose.yml)
version: '3.8'
services:
  crashlens-monitor:
    image: crashlens/crashlens:latest
    command: >
      crashlens watch /logs
      --poll-interval 30
      --alert-on policy-violation
      --webhook http://webhook-service:3000/alerts
    volumes:
      - ./logs:/logs:ro
      - ./crashlens.yml:/app/crashlens.yml:ro
    environment:
      - CRASHLENS_CONFIG=/app/crashlens.yml
    depends_on:
      - webhook-service

Infrastructure as Code Integration

TERRAFORM
# Terraform module for CrashLens monitoring
# modules/crashlens/main.tf
resource "aws_lambda_function" "crashlens_monitor" {
  filename      = "crashlens-lambda.zip"
  function_name = "crashlens-cost-monitor"
  role          = aws_iam_role.crashlens_role.arn
  handler       = "index.handler"
  runtime       = "python3.12"

  environment {
    variables = {
      LOG_BUCKET    = var.log_bucket
      SLACK_WEBHOOK = var.slack_webhook
      POLICY_CONFIG = var.policy_config
    }
  }
}

resource "aws_cloudwatch_event_rule" "crashlens_schedule" {
  name                = "crashlens-daily-check"
  description         = "Daily CrashLens cost analysis"
  schedule_expression = "rate(24 hours)"
}

resource "aws_cloudwatch_event_target" "lambda_target" {
  rule      = aws_cloudwatch_event_rule.crashlens_schedule.name
  target_id = "CrashLensLambdaTarget"
  arn       = aws_lambda_function.crashlens_monitor.arn
}

YAML
# Ansible playbook for server deployment
---
- hosts: monitoring_servers
  become: yes
  tasks:
    - name: Install CrashLens
      pip:
        name: crashlens
        state: latest

    - name: Create CrashLens config
      template:
        src: crashlens.yml.j2
        dest: /etc/crashlens/crashlens.yml
        mode: '0644'

    - name: Create systemd service
      template:
        src: crashlens.service.j2
        dest: /etc/systemd/system/crashlens.service
      notify: restart crashlens

    - name: Enable and start CrashLens service
      systemd:
        name: crashlens
        enabled: yes
        state: started

🔌 API Integration & Automation

REST API Usage

BASH
# Start CrashLens API server
crashlens serve --port 8080 --auth-token your-secret-token

# API endpoints usage
curl -H "Authorization: Bearer your-secret-token" \
  -X POST http://localhost:8080/api/v1/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "log_files": ["logs/app.jsonl"],
    "policies": ["prevent-model-overkill"],
    "options": {
      "include_recommendations": true,
      "output_format": "json"
    }
  }'

# Policy check via API
curl -H "Authorization: Bearer your-secret-token" \
  -X POST http://localhost:8080/api/v1/policy-check \
  -H "Content-Type: application/json" \
  -d '{
    "log_files": ["logs/recent.jsonl"],
    "policy_file": "policies/production.yml",
    "severity": "high"
  }'

# Real-time monitoring webhook
curl -X POST http://localhost:8080/api/v1/webhooks/register \
  -H "Authorization: Bearer your-secret-token" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-app.com/webhooks/crashlens",
    "events": ["policy_violation", "cost_spike"],
    "secret": "webhook-secret"
  }'
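
The `secret` supplied at webhook registration is typically used to sign each delivery so your receiver can authenticate it. The signing scheme isn't shown above, so the HMAC-SHA256 pattern below is an assumption (it is the convention GitHub- and Stripe-style webhooks use), and the `X-CrashLens-Signature` header name is hypothetical:

```python
import hashlib
import hmac


def verify_webhook(payload: bytes, signature: str, secret: str) -> bool:
    """Recompute the hex HMAC-SHA256 digest of the raw request body and
    compare it to the received signature in constant time."""
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

Your receiver would call `verify_webhook(request_body, request.headers["X-CrashLens-Signature"], "webhook-secret")` and reject anything that returns `False`; constant-time comparison matters because naive `==` leaks timing information.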

Python SDK Integration

PYTHON
# Python integration example
from crashlens import CrashLens, PolicyConfig
import logging

logger = logging.getLogger(__name__)

# Initialize CrashLens client
client = CrashLens(
    config_file="crashlens.yml",
    log_level=logging.INFO
)

# Programmatic policy checking
async def check_llm_request(request_data):
    """Check LLM request against policies before sending"""
    try:
        result = await client.policy_check_async(
            request_data=request_data,
            policies=["prevent-model-overkill", "budget-enforcement"],
            fail_fast=True
        )
        if result.violations:
            logger.warning(f"Policy violations: {result.violations}")
            return result.suggested_alternatives
        return None  # No violations, proceed with request
    except Exception as e:
        logger.error(f"Policy check failed: {e}")
        return None

# Real-time cost monitoring
def setup_cost_monitoring():
    """Set up real-time cost monitoring"""

    @client.on_cost_threshold(threshold=100, period="daily")
    async def handle_cost_alert(event):
        """Handle cost threshold alerts"""
        await send_slack_alert(
            f"Daily cost threshold exceeded: {event.current_cost}"
        )

    @client.on_policy_violation(severity="high")
    async def handle_violation(violation):
        """Handle policy violations"""
        await create_incident_ticket(violation)

    # Start monitoring
    client.start_monitoring(
        log_sources=["./logs", "s3://company-llm-logs"],
        poll_interval=300  # 5 minutes
    )

# Batch analysis and reporting
async def generate_weekly_report():
    """Generate comprehensive weekly cost report"""
    analysis = await client.analyze_async(
        log_files=["logs/week-*.jsonl"],
        include_trends=True,
        include_recommendations=True,
        time_range="last-7-days"
    )

    report = await client.generate_report(
        analysis=analysis,
        template="executive",
        format="html",
        include_charts=True
    )

    # Email report to stakeholders
    await email_report(
        recipients=["cto@company.com", "finops@company.com"],
        subject="Weekly LLM Cost Analysis",
        html_content=report.html,
        attachments=[report.data_export]
    )

🤖 Machine Learning & Predictive Analytics

Predictive Cost Modeling

BASH
# Train custom cost prediction models
crashlens ml train \
  --model-type cost-predictor \
  --features token_count,model_type,time_of_day,user_type \
  --target cost \
  --algorithm random-forest \
  --validation-split 0.2 \
  --output-model cost-model.pkl \
  logs/historical-data.jsonl

# Real-time cost prediction
crashlens ml predict \
  --model cost-model.pkl \
  --input-features '{"token_count": 1500, "model_type": "gpt-4", "time_of_day": 14}' \
  --confidence-interval 0.95

# Anomaly detection model
crashlens ml train \
  --model-type anomaly-detector \
  --algorithm isolation-forest \
  --contamination 0.1 \
  --output-model anomaly-model.pkl \
  logs/normal-usage.jsonl

# Auto-scaling prediction
crashlens ml predict \
  --model scaling-model.pkl \
  --forecast-horizon 24h \
  --include-uncertainty \
  --alert-on capacity-exceeded
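
The `cost-predictor` workflow is easiest to reason about with the simplest possible model: ordinary least squares on a single feature. The stdlib sketch below fits cost against token count (real training would use the multi-feature random forest shown above; this is illustration, not CrashLens code):

```python
def fit_cost_model(token_counts, costs):
    """Ordinary least squares for cost ~ slope * tokens + intercept."""
    n = len(token_counts)
    mean_x = sum(token_counts) / n
    mean_y = sum(costs) / n
    sxx = sum((x - mean_x) ** 2 for x in token_counts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(token_counts, costs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_cost(model, tokens):
    """Apply the fitted (slope, intercept) pair to a new token count."""
    slope, intercept = model
    return slope * tokens + intercept
```

The slope is the marginal cost per token and the intercept the fixed per-request overhead, which is exactly the decomposition a more sophisticated model refines per model type and time of day.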

Intelligent Policy Optimization

BASH
# Auto-optimize policies based on historical data
crashlens ml optimize-policies \
  --current-policies crashlens.yml \
  --historical-data logs/last-90-days/ \
  --objective minimize-cost \
  --constraints maintain-quality \
  --output optimized-policies.yml

# A/B testing for policy effectiveness
crashlens ml ab-test \
  --policy-a current-policies.yml \
  --policy-b optimized-policies.yml \
  --test-data logs/test-set.jsonl \
  --metrics cost,violations,user-satisfaction \
  --duration 7d

# Reinforcement learning for dynamic policies
crashlens ml rl-train \
  --environment production \
  --reward-function cost-efficiency \
  --exploration-strategy epsilon-greedy \
  --episodes 1000 \
  --save-model rl-policy-agent.pkl
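
The `epsilon-greedy` exploration strategy named above balances exploiting the option with the best known reward against occasionally trying alternatives. A minimal stdlib sketch of the rule itself (an illustrative re-framing over named policies, not the CLI's internals):

```python
import random


def epsilon_greedy(avg_reward, epsilon=0.1, rng=random):
    """With probability epsilon pick a random option (explore);
    otherwise pick the option with the highest observed reward (exploit)."""
    if rng.random() < epsilon:
        return rng.choice(sorted(avg_reward))
    return max(avg_reward, key=avg_reward.get)
```

Over many episodes the agent's reward estimates converge, and epsilon is typically decayed so late training is mostly exploitation.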
Last updated: August 24, 2025