cloud-services/services/cost/README.md
Chris Rai 27fe5af164 cost service: update estimates from migration plan v1.1
- Platform base: 80 cores / 544GB RAM (from migration plan)
- Cloud rates: $0.35/core-hr, $0.10/GB-hr, $17/15min managed services
- On-prem rates: $0.015/core-hr, $0.004/GB-hr, $1.50/15min
- Based on ~$1M/year Azure spend, ~92% savings with on-prem
- Updated README with migration plan references
2026-02-05 10:30:09 -05:00

# Cost Service
Per-vehicle cost estimation service for capacity planning and cloud vs on-prem comparison.
## Overview
This service estimates the cost of running cloud services per VIN by:
1. Querying vehicle activity from ClickHouse (message counts)
2. Estimating resource usage based on activity level
3. Applying cost rates for cloud vs on-prem hosting
4. Storing aggregated cost data for reporting
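The four steps can be sketched as one collection cycle. This is a minimal illustration, not the service's actual code. The function name `run_collection_cycle` and its four injected callables are hypothetical stand-ins for the ClickHouse query, the resource estimate, the rate lookup, and the storage write.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical shape: VIN -> message count in the last 15-minute window.
Activity = Dict[str, int]


def run_collection_cycle(
    query_activity: Callable[[], Activity],
    estimate_resources: Callable[[int], Tuple[float, float]],
    price: Callable[[float, float], Tuple[float, float]],
    store: Callable[[List[dict]], None],
) -> List[dict]:
    """Run steps 1-4 once and return the rows handed to `store`."""
    rows = []
    for vin, messages in query_activity().items():    # 1. activity per VIN
        cores, mem_gb = estimate_resources(messages)  # 2. resource estimate
        cloud, on_prem = price(cores, mem_gb)         # 3. apply both rate sets
        rows.append({"vin": vin, "messages": messages,
                     "cloud_cost": cloud, "on_prem_cost": on_prem})
    store(rows)                                       # 4. persist for reporting
    return rows
```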
## Cost Estimation Methodology
### Cost Model
The cost model separates fixed platform costs from variable per-VIN costs:
```
Total Cost = Platform Base Cost + (Per-VIN Cost × Number of VINs) + Managed Services
```
Whether you have 100 vehicles or 100,000, you still need Kafka, databases, and gateway services running. That's your platform base cost. Then each additional vehicle adds a small marginal cost on top.
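Dividing the total by the number of VINs shows why the fixed base dominates small fleets: the average cost per vehicle is the base spread across the fleet plus the marginal cost. A minimal sketch; the function name and the dollar figures in the usage lines are illustrative only, not service values.

```python
def average_cost_per_vin(platform_base: float, per_vin: float,
                         managed_services: float, vin_count: int) -> float:
    """Average cost per vehicle under the fixed + marginal cost model."""
    if vin_count <= 0:
        raise ValueError("vin_count must be positive")
    total = platform_base + per_vin * vin_count + managed_services
    return total / vin_count


# Illustrative figures only: $1,000 fixed, $1 per VIN, no managed services.
print(average_cost_per_vin(1000.0, 1.0, 0.0, 100))      # 11.0
print(average_cost_per_vin(1000.0, 1.0, 0.0, 100_000))  # 1.01
```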
### Platform Base Resources (Fixed)
What it takes to run the cloud services platform (from migration plan v1.1):
| Component | CPU (cores) | Memory (GB) | Notes |
|-----------|-------------|-------------|-------|
| VM 1-4: Core Services | 16 | 64 | Kafka, OTA, Valet, Auth/APIs |
| VM 5: Analytics Primary | 32 | 256 | ClickHouse, Ditto, Beacon, Jetfire |
| VM 6: Analytics Secondary | 32 | 256 | Optimus, Cargo, Vehicle Analytics |
| **Total Platform Base** | **80** | **544** | Total as stated in migration plan v1.1; the memory rows above sum to 576 |
### Per-VIN Resources (Marginal)
Incremental resources needed for each additional connected vehicle:
| Activity Level | Messages/15min | CPU (millicores) | Memory (MB) |
|---------------|----------------|------------------|-------------|
| Low | < 100 | 50 | 80 |
| Medium | 100-1000 | 75 | 120 |
| High | > 1000 | 100 | 160 |
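The tiering above can be expressed as a lookup keyed on the 15-minute message count. A sketch, assuming the boundaries are read literally from the table (100 and 1000 both fall in Medium); `per_vin_resources` is a hypothetical name, not the service's function.

```python
from typing import Tuple

# (level, CPU millicores, memory MB) from the Per-VIN Resources table.
LOW = ("low", 50, 80)
MEDIUM = ("medium", 75, 120)
HIGH = ("high", 100, 160)


def per_vin_resources(messages_per_15min: int) -> Tuple[str, int, int]:
    """Map a VIN's message count in one 15-minute window to its resource tier."""
    if messages_per_15min < 0:
        raise ValueError("message count cannot be negative")
    if messages_per_15min < 100:
        return LOW
    if messages_per_15min <= 1000:
        return MEDIUM
    return HIGH
```

Medium and High are 1.5x and 2x the Low tier in both CPU and memory, which is consistent with the `activity_multiplier` applied to the Low-tier figures in the savings formula.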
### Cost Rates
| Resource | Cloud (Azure) | On-Prem/Bare Metal |
|----------|---------------|-------------------|
| CPU/core-hour | $0.35 | $0.015 |
| Memory/GB-hour | $0.10 | $0.004 |
| Managed Services/15min | $17.00 | $1.50 |
#### Why On-Prem is ~90-95% Cheaper
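- **Platform base**: Same workload, but cloud CPU and memory rates are ~23-25x the on-prem rates ($0.35 vs $0.015 per core-hour, $0.10 vs $0.004 per GB-hour)
- **Per-VIN compute**: The same ~23-25x gap applies to each vehicle's marginal CPU and memory, since cloud VMs cost far more than amortized bare metal
- **Managed services**: Event Hubs, CosmosDB, and Azure Database for PostgreSQL carry a ~11x markup over self-hosted equivalents ($17.00 vs $1.50 per 15 minutes)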
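The cloud-to-on-prem multiples follow directly from the Cost Rates table. A quick check (the dictionary layout is only for illustration):

```python
# Rates copied from the Cost Rates table: (cloud, on-prem).
RATES = {
    "cpu_core_hour": (0.35, 0.015),
    "memory_gb_hour": (0.10, 0.004),
    "managed_services_15min": (17.00, 1.50),
}

for resource, (cloud, on_prem) in RATES.items():
    # Prints 23.3x for CPU, 25.0x for memory, 11.3x for managed services.
    print(f"{resource}: cloud is {cloud / on_prem:.1f}x on-prem")
```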
### Savings Calculation
```
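Platform Base Cost = (80 cores × CPU rate + 544 GB × memory rate) × hours
Per-VIN Cost = (0.05 cores × CPU rate + 0.08 GB × memory rate) × hours × activity_multiplier
activity_multiplier = 1.0 (Low), 1.5 (Medium), 2.0 (High)
Managed Services = managed-services rate (per 15 min) × 4 × hours
Total Cost = Platform Base + (Per-VIN × VIN count) + Managed Services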
Cloud Cost = Total with cloud rates
On-Prem Cost = Total with on-prem rates
Savings = Cloud Cost - On-Prem Cost
Savings % = (Savings / Cloud Cost) × 100
```
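Put together, the formulas above reduce to a short function. This is a sketch under stated assumptions, not the service's implementation: managed services are charged once per 15-minute interval, a single `activity_multiplier` is applied to the whole fleet for brevity (a mixed fleet would need a per-VIN sum), and the names `Rates`, `period_cost`, and `compare` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

PLATFORM_CORES = 80        # migration plan v1.1
PLATFORM_MEMORY_GB = 544
PER_VIN_CORES = 0.05       # Low tier: 50 millicores
PER_VIN_MEMORY_GB = 0.08   # Low tier: ~80 MB


@dataclass(frozen=True)
class Rates:
    cpu_core_hour: float
    memory_gb_hour: float
    managed_per_15min: float


CLOUD = Rates(0.35, 0.10, 17.00)
ON_PREM = Rates(0.015, 0.004, 1.50)


def period_cost(rates: Rates, hours: float, vin_count: int,
                activity_multiplier: float = 1.0) -> float:
    """Platform base + (per-VIN cost x VIN count) + managed services over `hours`."""
    platform = (PLATFORM_CORES * rates.cpu_core_hour
                + PLATFORM_MEMORY_GB * rates.memory_gb_hour) * hours
    per_vin = ((PER_VIN_CORES * rates.cpu_core_hour
                + PER_VIN_MEMORY_GB * rates.memory_gb_hour)
               * hours * activity_multiplier)
    managed = rates.managed_per_15min * 4 * hours  # four 15-minute intervals per hour
    return platform + per_vin * vin_count + managed


def compare(hours: float, vin_count: int,
            activity_multiplier: float = 1.0) -> Tuple[float, float, float, float]:
    """Return (cloud cost, on-prem cost, savings, savings %)."""
    cloud = period_cost(CLOUD, hours, vin_count, activity_multiplier)
    on_prem = period_cost(ON_PREM, hours, vin_count, activity_multiplier)
    savings = cloud - on_prem
    return cloud, on_prem, savings, savings / cloud * 100
```

For one hour with no vehicles, `compare(1, 0)` gives roughly $150.40 cloud vs $9.38 on-prem (about 93.8% savings), which isolates the fixed platform and managed-services terms.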
Expected savings: **~90-95%** with on-prem/bare metal hosting.
### Projected Annual Costs
Based on ~$1M/year Azure spend (migration plan v1.1):
| Metric | Cloud | On-Prem |
|--------|-------|---------|
| Monthly Cost | ~$83,000 | ~$7,000 |
| Annual Cost | ~$1,000,000 | ~$84,000 |
| Annual Savings | ~$916,000 (92%) | - |
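The rows of the projection table are related by simple arithmetic (monthly × 12, then savings as a share of cloud spend):

```latex
\begin{aligned}
\text{Cloud annual} &\approx 12 \times \$83{,}000 = \$996{,}000 \approx \$1{,}000{,}000 \\
\text{On-prem annual} &\approx 12 \times \$7{,}000 = \$84{,}000 \\
\text{Annual savings} &\approx \$1{,}000{,}000 - \$84{,}000 = \$916{,}000 \\
\text{Savings \%} &\approx \frac{916{,}000}{1{,}000{,}000} \times 100 = 91.6\% \approx 92\%
\end{aligned}
```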
## API Endpoints
### GET /cost/vin/{vin}
Cost summary for a specific VIN.
### GET /cost/fleet
Fleet-wide cost summary with top cost VINs.
### GET /cost/summary?period=day|week|month
High-level cost summary for a time period.
### GET /cost/comparison
Cloud vs on-prem cost comparison with projected annual savings.
### GET /cost/report
Plain text report for terminal viewing.
## Accessing the Report
The service is deployed internally on cec-prd-cluster-1 (no public ingress). To view the report:
```bash
# Quick one-liner
kubectl --context cec-prd-cluster-1 run curl-test --image=curlimages/curl --rm -it --restart=Never -- curl -s http://cost.default.svc.cluster.local:8077/cost/report
# Or port-forward and curl locally
kubectl --context cec-prd-cluster-1 port-forward svc/cost 8077:8077 &
curl http://localhost:8077/cost/report
```
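For scripting, the same endpoints can be read from Python once the port-forward above is running. A minimal standard-library sketch; it assumes the service is reachable at `http://localhost:8077` and that responses are UTF-8 text, and makes no assumption about the JSON shape of the non-report endpoints, which is not documented here. `cost_url` and `fetch` are hypothetical helper names.

```python
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8077"  # assumes the port-forward above is running
PERIODS = ("day", "week", "month")  # values accepted by /cost/summary


def cost_url(path: str, period: Optional[str] = None, base: str = BASE_URL) -> str:
    """Build a cost-service URL; `period` is only meaningful for /cost/summary."""
    url = base.rstrip("/") + path
    if period is not None:
        if period not in PERIODS:
            raise ValueError(f"period must be one of {PERIODS}")
        url += "?" + urllib.parse.urlencode({"period": period})
    return url


def fetch(path: str, period: Optional[str] = None, timeout: float = 10.0) -> str:
    """GET an endpoint and return the response body, assumed to be UTF-8 text."""
    with urllib.request.urlopen(cost_url(path, period), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example (requires the port-forward): print(fetch("/cost/report"))
```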
## Example Report Output
```
╔══════════════════════════════════════════════════════════════════╗
║ COST SERVICE REPORT ║
╠══════════════════════════════════════════════════════════════════╣
║ Period: 2026-01-05 to 2026-02-05
╠══════════════════════════════════════════════════════════════════╣
║ FLEET OVERVIEW ║
║ ─────────────────────────────────────────────────────────────── ║
║ Active Vehicles: 3229
║ Cloud Cost: $9761.61
║ On-Prem Cost: $677.88
║ Savings: $9083.73 (93.1%)
╠══════════════════════════════════════════════════════════════════╣
║ RESOURCE USAGE MODEL ║
║ ─────────────────────────────────────────────────────────────── ║
║ Platform Base: 176 cores / 896 GB RAM (fixed)
║ Per-VIN Marginal: 50 millicores / 82 MB RAM
║ Total Fleet: 337.5 cores / 1154.3 GB RAM
╠══════════════════════════════════════════════════════════════════╣
║ COST FORMULA ║
║ ─────────────────────────────────────────────────────────────── ║
║ (Platform Base) + (Per-VIN × 3229 VINs) + Managed Services
╠══════════════════════════════════════════════════════════════════╣
║ COST RATES ║
║ ─────────────────────────────────────────────────────────────── ║
║ Cloud: CPU $0.30/core-hr Memory $0.080/GB-hr
║ On-Prem: CPU $0.02/core-hr Memory $0.005/GB-hr
║ Base Infra: Cloud $10.00/15min On-Prem $2.50/15min
╠══════════════════════════════════════════════════════════════════╣
║ ANNUAL PROJECTION (based on current usage) ║
║ ─────────────────────────────────────────────────────────────── ║
║ Cloud Annual: $117139.28
║ On-Prem Annual: $8134.50
║ Annual Savings: $109004.77
╚══════════════════════════════════════════════════════════════════╝
TOP COST VEHICLES:
VIN CPU (mc) RAM (MB) Cloud $ On-Prem $ Savings %
─────────────────── ──────── ──────── ────────── ────────── ────────
VCF1UBU21PG008884 100 164 14.10 1.04 92.6%
VCF1EBU24PG007242 100 164 14.10 1.04 92.6%
VCF1ZBU29PG006267 100 164 14.10 1.04 92.6%
VCF1EBU26PG007307 75 123 13.41 0.99 92.6%
VCF1EBU22PG011967 50 82 12.77 0.93 92.8%
...
```
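*Note: Report generated 2026-02-05. The platform base (176 cores / 896 GB) and rates in this sample differ from the v1.1 values documented above, so treat its dollar figures as illustrative. Costs accumulate over time as the collector runs every 15 minutes.*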
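To use the top-cost rows in a script rather than on screen, the TOP COST VEHICLES table can be parsed line by line. A sketch that assumes the column order shown in the sample above and 17-character VINs; the names `VehicleCost` and `parse_top_vehicles` are hypothetical, and the pattern will need updating if the report format changes.

```python
import re
from typing import List, NamedTuple


class VehicleCost(NamedTuple):
    vin: str
    cpu_millicores: int
    ram_mb: int
    cloud_usd: float
    on_prem_usd: float
    savings_pct: float


# Columns: VIN, CPU (mc), RAM (MB), Cloud $, On-Prem $, Savings %
ROW = re.compile(
    r"^\s*([A-Z0-9]{17})\s+(\d+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)%\s*$"
)


def parse_top_vehicles(report: str) -> List[VehicleCost]:
    """Extract TOP COST VEHICLES rows from plain-text /cost/report output."""
    rows = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m:  # headers, separators, and box lines do not match
            vin, cpu, ram, cloud, on_prem, pct = m.groups()
            rows.append(VehicleCost(vin, int(cpu), int(ram),
                                    float(cloud), float(on_prem), float(pct)))
    return rows
```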
## Configuration
| Env Var | Description | Default |
|---------|-------------|---------|
| CLICKHOUSE_HOST | Local ClickHouse host for storing cost data | localhost |
| REMOTE_CLICKHOUSE_HOST | Dev-cluster ClickHouse host queried for VIN activity | - |
| COLLECTOR_INTERVAL_MINUTES | How often to collect metrics | 15 |
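A sketch of reading these variables with their documented defaults. It assumes `REMOTE_CLICKHOUSE_HOST` is required (the table lists no default) and that the interval must be a positive integer; `CostConfig` and `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CostConfig:
    clickhouse_host: str
    remote_clickhouse_host: str
    collector_interval_minutes: int


def load_config(env: Optional[Mapping[str, str]] = None) -> CostConfig:
    """Read cost-service settings from the environment (or a supplied mapping)."""
    env = os.environ if env is None else env
    remote = env.get("REMOTE_CLICKHOUSE_HOST")
    if not remote:
        raise ValueError("REMOTE_CLICKHOUSE_HOST must be set")
    interval = int(env.get("COLLECTOR_INTERVAL_MINUTES", "15"))
    if interval <= 0:
        raise ValueError("COLLECTOR_INTERVAL_MINUTES must be positive")
    return CostConfig(
        clickhouse_host=env.get("CLICKHOUSE_HOST", "localhost"),
        remote_clickhouse_host=remote,
        collector_interval_minutes=interval,
    )
```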
## Limitations
- Resource estimates are approximations, not actual measurements
- Cost rates are simplified and don't reflect all real-world factors
- On-prem costs exclude significant operational overhead
- Designed for business case illustration, not precise billing