cloud-services/services/cost/README.md
Chris Rai 87f80d356a cost service v13: lower per-VIN estimates to 3mc/10MB
- Per-VIN marginal: 3 millicores / 10MB (was 50mc/80MB)
- Based on ~3000 VINs using ~10 cores / 30GB marginal on top of platform base
- Total fleet now shows realistic: 87 cores / 569GB for 2463 VINs
- Updated README with v13 output
2026-02-05 12:16:22 -05:00


# Cost Service
Per-vehicle cost estimation service for capacity planning and cloud vs on-prem comparison.
## Overview
This service estimates the cost of running cloud services per VIN by:
1. Querying vehicle activity from ClickHouse (message counts)
2. Estimating resource usage based on activity level
3. Applying cost rates for cloud vs on-prem hosting
4. Storing aggregated cost data for reporting
## Cost Estimation Methodology
### Cost Model
The cost model separates fixed platform costs from variable per-VIN costs:
```
Total Cost = Platform Base Cost + (Per-VIN Cost × Number of VINs) + Managed Services
```
Whether you have 100 vehicles or 100,000, you still need Kafka, databases, and gateway services running. That's your platform base cost. Then each additional vehicle adds a small marginal cost on top.
### Platform Base Resources (Fixed)
What it takes to run the cloud services platform (from migration plan v1.1):
| Component | CPU (cores) | Memory (GB) | Notes |
|-----------|-------------|-------------|-------|
| VM 1-4: Core Services | 16 | 64 | Kafka, OTA, Valet, Auth/APIs |
| VM 5: Analytics Primary | 32 | 256 | ClickHouse, Ditto, Beacon, Jetfire |
| VM 6: Analytics Secondary | 32 | 256 | Optimus, Cargo, Vehicle Analytics |
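| **Total Platform Base** | **80** | **544** | Based on migration plan |

*The memory rows above sum to 576 GB; the service and the report use the 544 GB total from the migration plan.*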
### Per-VIN Resources (Marginal)
Incremental resources needed for each additional connected vehicle:
| Activity Level | Messages/15min | CPU (millicores) | Memory (MB) |
|---------------|----------------|------------------|-------------|
| Low | < 100 | 3 | 10 |
| Medium | 100-1000 | 5 | 15 |
| High | > 1000 | 6 | 20 |
*Based on ~3000 VINs using ~10 cores / 30GB marginal on top of platform base*
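
The activity tiers can be read as a simple lookup on the message count for one 15-minute window. The sketch below copies the thresholds and values from the table. The function and type names are hypothetical, and treating both 100 and 1000 as "medium" is an assumption about how the service handles the boundaries. It also shows where the low-activity figure comes from, using decimal units (1 GB = 1000 MB).

```python
from typing import NamedTuple


class PerVinResources(NamedTuple):
    level: str
    cpu_millicores: int
    memory_mb: int


def per_vin_resources(messages_per_15min: int) -> PerVinResources:
    """Map a VIN's message count in one 15-minute window to its marginal resources.

    Hypothetical helper: thresholds and values copied from the README table;
    boundary handling (100 and 1000 count as medium) is an assumption.
    """
    if messages_per_15min < 100:
        return PerVinResources("low", 3, 10)
    if messages_per_15min <= 1000:
        return PerVinResources("medium", 5, 15)
    return PerVinResources("high", 6, 20)


# Deriving the "low" tier: ~10 cores / ~30 GB observed across ~3000 VINs.
observed_cores, observed_gb, observed_vins = 10, 30, 3000
cpu_mc_per_vin = observed_cores * 1000 / observed_vins   # millicores per VIN
mem_mb_per_vin = observed_gb * 1000 / observed_vins      # MB per VIN (decimal)
print(f"{cpu_mc_per_vin:.2f} mc/VIN, {mem_mb_per_vin:.1f} MB/VIN")
```

The observed marginal works out to about 3.33 mc and 10 MB per VIN, which the table rounds to 3 mc / 10 MB.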
### Cost Rates
| Resource | Cloud (Azure) | On-Prem/Bare Metal |
|----------|---------------|-------------------|
| CPU/core-hour | $0.35 | $0.015 |
| Memory/GB-hour | $0.10 | $0.004 |
| Managed Services/15min | $17.00 | $1.50 |
#### Why On-Prem is ~90-95% Cheaper
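- **Platform base**: Same workload, but cloud CPU and memory rates are ~23-25x the on-prem rates ($0.35 vs $0.015 per core-hour, $0.10 vs $0.004 per GB-hour)
- **Per-VIN compute**: The same CPU and memory rates apply, so each VIN's marginal compute also costs ~23-25x more on cloud VMs than on amortized bare metal
- **Managed services**: Azure Event Hubs, Azure Cosmos DB, and Azure Database for PostgreSQL carry a significant markup vs self-hosted equivalents (~11x: $17.00 vs $1.50 per 15 minutes)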
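
The ratios quoted above follow directly from the Cost Rates table. This small sketch (names are illustrative, not taken from the service) computes how many times the on-prem rate each cloud rate is, and the corresponding on-prem discount.

```python
# Cloud vs on-prem rates from the Cost Rates table: (cloud, on_prem).
RATES = {
    "CPU/core-hour": (0.35, 0.015),
    "Memory/GB-hour": (0.10, 0.004),
    "Managed Services/15min": (17.00, 1.50),
}


def price_ratio(cloud: float, on_prem: float) -> float:
    """How many times the on-prem rate the cloud rate is."""
    return cloud / on_prem


def on_prem_discount_pct(cloud: float, on_prem: float) -> float:
    """Percentage saved by paying the on-prem rate instead of the cloud rate."""
    return (1 - on_prem / cloud) * 100


for name, (cloud, on_prem) in RATES.items():
    print(f"{name:<24} {price_ratio(cloud, on_prem):5.1f}x  "
          f"{on_prem_discount_pct(cloud, on_prem):5.1f}% cheaper on-prem")
```

Compute comes out at roughly 23-25x (about 96% cheaper on-prem) and managed services at roughly 11x (about 91%), so the blended savings land in between, in the ~90-95% range.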
### Savings Calculation
```
Platform Base Cost = (80 cores × rate + 544 GB × rate) × hours
Per-VIN Cost = (0.003 cores × rate + 0.01 GB × rate) × hours × activity_multiplier
Total Cost = Platform Base + (Per-VIN × VIN count) + Managed Services
Cloud Cost = Total with cloud rates
On-Prem Cost = Total with on-prem rates
Savings = Cloud Cost - On-Prem Cost
Savings % = (Savings / Cloud Cost) × 100
```
Expected savings: **~90-95%** with on-prem/bare metal hosting.
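
The formula above can be sketched as runnable Python. This is a minimal illustration, not the service's implementation: constant and function names are hypothetical, all VINs are treated as low activity (`activity_multiplier = 1.0`), memory uses decimal units, and Managed Services is assumed to be charged once per 15-minute collector interval, based on the "/15min" unit of its rate.

```python
from dataclasses import dataclass
from typing import Tuple

# Constants copied from the README tables; names are hypothetical.
PLATFORM_CORES = 80
PLATFORM_MEMORY_GB = 544
PER_VIN_CORES = 0.003      # 3 millicores (low activity)
PER_VIN_MEMORY_GB = 0.01   # 10 MB, decimal units (1 GB = 1000 MB)
INTERVALS_PER_HOUR = 4     # 15-minute collector intervals


@dataclass(frozen=True)
class Rates:
    cpu_core_hour: float
    memory_gb_hour: float
    managed_per_15min: float


CLOUD = Rates(cpu_core_hour=0.35, memory_gb_hour=0.10, managed_per_15min=17.00)
ON_PREM = Rates(cpu_core_hour=0.015, memory_gb_hour=0.004, managed_per_15min=1.50)


def fleet_resources(vin_count: int) -> Tuple[float, float]:
    """Total cores and GB: fixed platform base plus the per-VIN marginal."""
    return (PLATFORM_CORES + PER_VIN_CORES * vin_count,
            PLATFORM_MEMORY_GB + PER_VIN_MEMORY_GB * vin_count)


def total_cost(rates: Rates, vin_count: int, hours: float,
               activity_multiplier: float = 1.0) -> float:
    """Platform Base + (Per-VIN x VIN count) + Managed Services."""
    platform_base = (PLATFORM_CORES * rates.cpu_core_hour
                     + PLATFORM_MEMORY_GB * rates.memory_gb_hour) * hours
    per_vin = (PER_VIN_CORES * rates.cpu_core_hour
               + PER_VIN_MEMORY_GB * rates.memory_gb_hour) * hours * activity_multiplier
    # Assumption: managed services are charged once per 15-minute interval.
    managed = rates.managed_per_15min * INTERVALS_PER_HOUR * hours
    return platform_base + per_vin * vin_count + managed


def savings(vin_count: int, hours: float) -> Tuple[float, float, float, float]:
    """Return (cloud cost, on-prem cost, savings, savings %)."""
    cloud = total_cost(CLOUD, vin_count, hours)
    on_prem = total_cost(ON_PREM, vin_count, hours)
    saved = cloud - on_prem
    return cloud, on_prem, saved, saved / cloud * 100


if __name__ == "__main__":
    cores, memory_gb = fleet_resources(2463)
    print(f"Fleet: {cores:.1f} cores / {memory_gb:.1f} GB")
    cloud, on_prem, saved, pct = savings(2463, hours=1)
    print(f"Per hour: cloud ${cloud:.2f}, on-prem ${on_prem:.2f}, savings {pct:.1f}%")
```

For 2463 VINs the resource totals come out at 87.4 cores / 568.6 GB, matching the v13 report, and one hour of this model costs about $155.45 in the cloud vs $9.59 on-prem (about 93.8% savings), inside the expected range.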
### Projected Annual Costs
Based on ~$1M/year Azure spend (migration plan v1.1):
| Metric | Cloud | On-Prem |
|--------|-------|---------|
| Monthly Cost | ~$83,000 | ~$7,000 |
| Annual Cost | ~$1,000,000 | ~$84,000 |
| Annual Savings | ~$916,000 (92%) | - |
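
The table's figures are related by simple arithmetic; this sketch uses only the values shown above.

```python
# Arithmetic behind the Projected Annual Costs table (values from the table).
monthly_cloud = 83_000
monthly_on_prem = 7_000

annual_cloud_exact = monthly_cloud * 12          # 996,000, shown as ~$1,000,000
annual_cloud = 1_000_000                         # the ~$1M/year Azure spend
annual_on_prem = monthly_on_prem * 12            # 84,000
annual_savings = annual_cloud - annual_on_prem   # 916,000
savings_pct = annual_savings / annual_cloud * 100  # 91.6, shown as ~92%

print(f"Annual savings: ${annual_savings:,} ({savings_pct:.1f}%)")
```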
## API Endpoints
### GET /cost/vin/{vin}
Cost summary for a specific VIN.
### GET /cost/fleet
Fleet-wide cost summary with top cost VINs.
### GET /cost/summary?period=day|week|month
High-level cost summary for a time period.
### GET /cost/comparison
Cloud vs on-prem cost comparison with projected annual savings.
### GET /cost/report
Plain text report for terminal viewing.
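
A minimal client sketch using only the Python standard library. It assumes the port-forward described in "Accessing the Report" is running on `localhost:8077`, and it assumes the non-report endpoints return JSON; this README does not document their response schemas. The helper names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumes `kubectl port-forward svc/cost 8077:8077` is running.
BASE_URL = "http://localhost:8077"
VALID_PERIODS = {"day", "week", "month"}


def cost_url(path: str, **params: str) -> str:
    """Build an endpoint URL, adding query parameters if any are given."""
    url = BASE_URL + path
    if params:
        url += "?" + urlencode(params)
    return url


def fetch_text(url: str, timeout: float = 10.0) -> str:
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def get_report() -> str:
    """GET /cost/report: the plain-text report."""
    return fetch_text(cost_url("/cost/report"))


def get_summary(period: str):
    """GET /cost/summary?period=day|week|month (assumes a JSON body)."""
    if period not in VALID_PERIODS:
        raise ValueError(f"period must be one of {sorted(VALID_PERIODS)}")
    return json.loads(fetch_text(cost_url("/cost/summary", period=period)))


def get_vin(vin: str):
    """GET /cost/vin/{vin} (assumes a JSON body)."""
    return json.loads(fetch_text(cost_url(f"/cost/vin/{quote(vin)}")))
```

For example, `print(get_report())` prints the same text as the curl commands, and `get_summary("week")` requests `/cost/summary?period=week`. Validating `period` locally simply avoids a round trip for values the endpoint does not list.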
## Accessing the Report
The service is deployed internally on cec-prd-cluster-1 (no public ingress). To view the report:
```bash
# Quick one-liner
kubectl --context cec-prd-cluster-1 run curl-test --image=curlimages/curl --rm -it --restart=Never -- curl -s http://cost.default.svc.cluster.local:8077/cost/report
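# Or port-forward and curl locally
kubectl --context cec-prd-cluster-1 port-forward svc/cost 8077:8077 &
PF_PID=$!
sleep 2  # give the port-forward a moment to start listening
curl http://localhost:8077/cost/report
kill "$PF_PID"  # stop the background port-forward when done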
```
## Example Report Output
```
╔══════════════════════════════════════════════════════════════════╗
║ COST SERVICE REPORT ║
╠══════════════════════════════════════════════════════════════════╣
║ Period: 2026-01-05 to 2026-02-05
╠══════════════════════════════════════════════════════════════════╣
║ FLEET OVERVIEW ║
║ ─────────────────────────────────────────────────────────────── ║
║ Active Vehicles: 2463
║ Cloud Cost: $695.52
║ On-Prem Cost: $62.55
║ Savings: $632.97 (91.0%)
╠══════════════════════════════════════════════════════════════════╣
║ RESOURCE USAGE MODEL ║
║ ─────────────────────────────────────────────────────────────── ║
║ Platform Base: 80 cores / 544 GB RAM (fixed)
║ Per-VIN Marginal: 3 millicores / 10 MB RAM
║ Total Fleet: 87.4 cores / 568.6 GB RAM
╠══════════════════════════════════════════════════════════════════╣
║ COST FORMULA ║
║ ─────────────────────────────────────────────────────────────── ║
║ (Platform Base) + (Per-VIN × 2463 VINs) + Managed Services
╠══════════════════════════════════════════════════════════════════╣
║ COST RATES ║
║ ─────────────────────────────────────────────────────────────── ║
║ Cloud: CPU $0.35/core-hr Memory $0.100/GB-hr
║ On-Prem: CPU $0.01/core-hr Memory $0.004/GB-hr
║ Base Infra: Cloud $17.00/15min On-Prem $1.50/15min
╠══════════════════════════════════════════════════════════════════╣
║ ANNUAL PROJECTION (based on current usage) ║
║ ─────────────────────────────────────────────────────────────── ║
║ Cloud Annual: $8346.25
║ On-Prem Annual: $750.61
║ Annual Savings: $7595.65
╚══════════════════════════════════════════════════════════════════╝
TOP COST VEHICLES:
VIN CPU (mc) RAM (MB) Cloud $ On-Prem $ Savings %
─────────────────── ──────── ──────── ────────── ────────── ────────
VCF1UBU21RG013084 94 149 1.20 0.12 90.4%
VCF1EBU21RG012448 94 149 1.20 0.12 90.4%
VCF1EBU22PG011385 94 149 1.20 0.12 90.4%
VCF1EBU24PG011467 94 149 1.20 0.12 90.4%
VCF1EBU29PG007298 94 149 1.20 0.12 90.4%
...
```
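*Note: Report generated 2026-02-05. Costs accumulate over time as the collector runs every 15 minutes, so the period totals, and the annual projection derived from them (which works out to 12 × the period totals), cover only the intervals collected so far. The report shows the on-prem CPU rate as $0.01/core-hr; the $0.015 rate from the Cost Rates table appears to be rounded to two decimals in the output.*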
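
The report's summary figures can be cross-checked with a few lines of arithmetic. The sketch copies the numbers from the example output; treating the annual projection as period total × 12 is an inference from those numbers, not documented behavior.

```python
# Figures copied from the example report above.
period_cloud = 695.52
period_on_prem = 62.55

period_savings = period_cloud - period_on_prem      # 632.97
savings_pct = period_savings / period_cloud * 100   # ~91.0

# Assumption: the annual projection is the one-month period total x 12.
annual_cloud = period_cloud * 12       # 8346.24 (report: 8346.25)
annual_on_prem = period_on_prem * 12   # 750.60 (report: 750.61)
annual_savings = period_savings * 12   # 7595.64 (report: 7595.65)

print(f"{savings_pct:.1f}% savings; annual cloud ${annual_cloud:.2f}")
```

The projections match the report to within a cent, presumably because the service multiplies unrounded totals before formatting.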
## Configuration
| Env Var | Description | Default |
|---------|-------------|---------|
| CLICKHOUSE_HOST | Local ClickHouse instance where cost data is stored | localhost |
| REMOTE_CLICKHOUSE_HOST | Dev cluster ClickHouse instance queried for VIN activity | - |
| COLLECTOR_INTERVAL_MINUTES | How often the collector gathers metrics, in minutes | 15 |
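
A sketch of reading these variables with the documented defaults. The dataclass and `load_config` are hypothetical and only illustrate the table; rejecting a non-positive interval is an added assumption, not documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CostServiceConfig:
    clickhouse_host: str                   # local ClickHouse for storing cost data
    remote_clickhouse_host: Optional[str]  # dev cluster ClickHouse for VIN activity
    collector_interval_minutes: int


def load_config(env: Mapping[str, str] = os.environ) -> CostServiceConfig:
    """Read the documented env vars, applying the documented defaults."""
    interval = int(env.get("COLLECTOR_INTERVAL_MINUTES", "15"))
    if interval <= 0:
        raise ValueError("COLLECTOR_INTERVAL_MINUTES must be a positive integer")
    return CostServiceConfig(
        clickhouse_host=env.get("CLICKHOUSE_HOST", "localhost"),
        # No default is documented, so an unset value stays None.
        remote_clickhouse_host=env.get("REMOTE_CLICKHOUSE_HOST"),
        collector_interval_minutes=interval,
    )
```

Passing a plain dict, e.g. `load_config({"COLLECTOR_INTERVAL_MINUTES": "5"})`, makes the defaults easy to check without touching the real environment.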
## Limitations
- Resource estimates are approximations, not actual measurements
- Cost rates are simplified and don't reflect all real-world factors
- On-prem costs exclude significant operational overhead
- Designed for business case illustration, not precise billing