The simulator is purpose-built as an environment for AI agents that need to learn, plan, or reason about cloud infrastructure without touching real resources.
Use Case 1
Put your Canvas Cloud AI lessons into practice by building and stress-testing real architectures in a safe, cost-free simulator.
Without the simulator, every experiment means spinning up real AWS, GCP, Azure, or OCI resources: accumulating cloud bills, waiting for provisioning, and risking misconfigured infrastructure that leaks cost or breaks unexpectedly.
An instant, zero-cost sandbox that mirrors real provider behavior. Drag resources onto the canvas, inject traffic and failures, and watch live metrics respond — all without a cloud account or a dollar of spend.
Key Endpoints
Quick Start
# 1. Create a practice simulation (no API key needed for the demo)
curl -X POST /api/simulations \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-first-arch",
    "provider": "aws",
    "resources": [
      { "type": "ec2", "name": "web-server", "config": { "instanceType": "t3.medium" } },
      { "type": "rds", "name": "database", "config": { "instanceType": "db.t3.micro" } }
    ]
  }'

# 2. Run a simulation step and observe metrics
curl -X POST /api/simulations/<id>/step \
  -H "Content-Type: application/json" \
  -d '{ "trafficRPS": 500 }'
# Returns: { metrics: { cpu, latency, errorRate, cost }, events: [] }

# 3. Inject a failure to test resilience
curl -X POST /api/simulations/<id>/inject-failure \
  -H "Content-Type: application/json" \
  -d '{ "type": "az_outage", "targetResourceId": "<resourceId>" }'
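To stress-test an architecture, step 2 can be repeated at increasing traffic levels. The sketch below is one way to script that loop, not an official client. `BASE_URL` and its default host are placeholders (the page shows only relative paths), and the helper names `step_body` and `sweep_traffic` are hypothetical.

```shell
#!/bin/sh
# Hypothetical setting: the host is an assumption, not taken from the docs.
BASE_URL="${BASE_URL:-http://localhost:3000}"

# Build the JSON body for one step at the given requests-per-second level.
step_body() {
  printf '{ "trafficRPS": %d }' "$1"
}

# Step the simulation once per traffic level and print each raw response.
# Usage: sweep_traffic <simulation_id> <rps> [<rps> ...]
sweep_traffic() {
  sim_id=$1
  shift
  for rps in "$@"; do
    echo "--- trafficRPS=$rps"
    curl -s -X POST "$BASE_URL/api/simulations/$sim_id/step" \
      -H "Content-Type: application/json" \
      -d "$(step_body "$rps")"
    echo
  done
}
```

Calling `sweep_traffic "<id>" 100 500 1000 5000` would issue four step requests, so the returned metrics can be compared as load grows.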
Use Case 2
Train a reinforcement learning agent to optimize cloud autoscaling without real infrastructure costs or risk.
Without the simulator, training takes months of production data collection, thousands of dollars in cloud spend per training run, and the risk of degrading real user traffic during exploration.
Compress months of production traffic patterns into minutes of safe simulation. No AWS bill, no production risk, no waiting for real scaling events to occur.
Key Endpoints
Quick Start
# 1. Create a simulation
curl -X POST /api/simulations \
  -H "Content-Type: application/json" \
  -d '{"name":"autoscale-lab","provider":"aws","resources":[...]}'

# 2. Create an RL environment
curl -X POST /api/rl/environments \
  -H "Authorization: Bearer <key>" \
  -H "Content-Type: application/json" \
  -d '{"simulationId":"<id>","maxSteps":1000}'

# 3. Training loop
curl -X POST /api/rl/environments/<envId>/step \
  -H "Authorization: Bearer <key>" \
  -H "Content-Type: application/json" \
  -d '{"action":"scale_up","targetResourceId":"<resourceId>"}'
# Returns: { observation, reward, done, info }
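The training loop in step 3 amounts to repeating the step request until the environment reports `done`. A minimal sketch of one episode follows. It assumes the response is JSON with a boolean `done` field as listed above. `BASE_URL`, `API_KEY`, and the helper names are hypothetical, and the policy is a placeholder that always sends the single action shown in the docs.

```shell
#!/bin/sh
# Hypothetical settings: host and key variable names are assumptions.
BASE_URL="${BASE_URL:-http://localhost:3000}"
API_KEY="${API_KEY:-}"

# Succeeds when a step response reports "done": true.
# Crude text match for illustration; a real JSON parser is more robust.
is_done() {
  printf '%s' "$1" | grep -Eq '"done"[[:space:]]*:[[:space:]]*true'
}

# Placeholder policy: always returns the one action shown in the docs.
# A learning agent would choose the action from the last observation.
choose_action() {
  echo "scale_up"
}

# Run one episode: step until the environment reports done,
# max_steps is reached, or a request fails.
# Usage: run_episode <env_id> <resource_id> <max_steps>
run_episode() {
  env_id=$1
  resource_id=$2
  max_steps=$3
  step=1
  while [ "$step" -le "$max_steps" ]; do
    action=$(choose_action)
    response=$(curl -s -f -X POST "$BASE_URL/api/rl/environments/$env_id/step" \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -d "{\"action\":\"$action\",\"targetResourceId\":\"$resource_id\"}") || return 1
    echo "step $step: $response"
    if is_done "$response"; then
      break
    fi
    step=$((step + 1))
  done
}
```

In a real agent, `choose_action` would read the previous `observation` and `reward`; only the request/response cycle is illustrated here.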
Use Case 3
Inject failures — AZ outages, DB crashes, network partitions — to find architectural weak points before production.
Without the simulator, resilience testing means expensive game days with real production risk, manual failure simulation, and no repeatable way to measure resilience scores across architecture changes.
Inject any failure type into a virtual architecture in seconds. Get a quantified resilience score, a ranked list of vulnerabilities, and specific remediation recommendations — all without touching production.
Key Endpoints
Quick Start
# 1. Browse built-in failure scenarios
curl /api/chaos/scenarios

# 2. Run a chaos test (AZ outage scenario)
curl -X POST /api/chaos/run \
  -H "Authorization: Bearer <key>" \
  -H "Content-Type: application/json" \
  -d '{"simulationId":"<id>","scenarioId":"az_outage","duration":300}'
# Returns: { job: { id, status } }

# 3. Poll for results
curl /api/chaos/jobs/<jobId>/results \
  -H "Authorization: Bearer <key>"
# Returns: resilienceScore, vulnerabilities[], recommendations[]
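Because the chaos run is asynchronous, step 3 usually has to be repeated until the job finishes. The sketch below shows one way to poll with a bounded number of attempts. It assumes that a finished job's response contains a `resilienceScore` field and that anything else means "not ready yet"; the docs do not say what a pending job returns. `BASE_URL`, `API_KEY`, and the helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical settings: host and key variable names are assumptions.
BASE_URL="${BASE_URL:-http://localhost:3000}"
API_KEY="${API_KEY:-}"

# Fetch the results document for a chaos job (same request as step 3).
fetch_results() {
  curl -s -f "$BASE_URL/api/chaos/jobs/$1/results" \
    -H "Authorization: Bearer $API_KEY"
}

# Assumption: results are ready once the response mentions resilienceScore.
has_results() {
  printf '%s' "$1" | grep -q '"resilienceScore"'
}

# Retry a command until its output passes has_results.
# Usage: poll_until <max_attempts> <delay_seconds> <command> [args...]
poll_until() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if output=$("$@") && has_results "$output"; then
      printf '%s\n' "$output"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `poll_until 30 10 fetch_results "<jobId>"` would check every 10 seconds for up to 30 attempts and exit non-zero if no results appear.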
Use Case 4
Compare AWS, GCP, Azure, and DigitalOcean strategies to find the cheapest architecture that meets your SLAs.
Without the simulator, comparing providers means running parallel production workloads on multiple providers for weeks, or relying on rough estimates that miss provider-specific pricing nuances and latency trade-offs.
Evaluate every provider combination and traffic-split ratio in minutes. Typically uncovers 20–40% cost savings with detailed per-strategy cost, latency, and vendor lock-in scores.
Key Endpoints
Quick Start
# 1. Start a multi-cloud exploration job
curl -X POST /api/multi-cloud/explore \
  -H "Authorization: Bearer <key>" \
  -H "Content-Type: application/json" \
  -d '{
    "simulationId": "<id>",
    "workloadProfile": {
      "computeInstances": 8,
      "trafficRPS": 5000,
      "latencyRequirementMs": 100
    },
    "optimizationWeights": { "cost": 0.5, "latency": 0.3, "vendorLockIn": 0.2 }
  }'

# 2. Get ranked strategies
curl /api/multi-cloud/jobs/<jobId>/results \
  -H "Authorization: Bearer <key>"
# Returns: rankedStrategies[], comparisonReport, estimatedSavings
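The `optimizationWeights` object suggests that strategies are ranked by a weighted combination of cost, latency, and vendor lock-in. How the service actually scores strategies is not documented here. The sketch below shows one common approach, a weighted sum over metrics normalized to the 0 to 1 range where lower is better, using the weights from the request. The `weighted_score` helper and the sample metric values are made up for illustration.

```shell
#!/bin/sh
# Weights copied from the request: cost 0.5, latency 0.3, vendorLockIn 0.2.
W_COST=0.5
W_LATENCY=0.3
W_LOCKIN=0.2

# Weighted sum of three metrics, each assumed to be normalized to 0..1
# with lower meaning better.
# Usage: weighted_score <cost> <latency> <lock_in>
weighted_score() {
  awk -v c="$1" -v l="$2" -v v="$3" \
      -v wc="$W_COST" -v wl="$W_LATENCY" -v wv="$W_LOCKIN" \
      'BEGIN { printf "%.3f\n", wc * c + wl * l + wv * v }'
}

# Two hypothetical strategies; under this scheme the lower score ranks first.
echo "strategy-a $(weighted_score 0.40 0.20 0.50)"
echo "strategy-b $(weighted_score 0.30 0.60 0.10)"
```

Raising the `cost` weight relative to the others would push cheaper strategies up the ranking at the expense of latency and lock-in.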
Use Case 5
Validate autoscaling thresholds against traffic forecasts before deploying changes to production.
Without the simulator, you discover under-provisioning during a real launch or promotion, scramble to scale reactively, and absorb the revenue impact of a degraded user experience.
Run what-if scenarios against any traffic shape in seconds. Get specific SLA violation windows and recommended threshold values before a single line of config changes in production.
Key Endpoints
Quick Start
# 1. Validate infrastructure against a traffic forecast
curl -X POST /api/predictions/validate \
  -H "Authorization: Bearer <key>" \
  -H "Content-Type: application/json" \
  -d '{
    "simulationId": "<id>",
    "trafficForecast": {
      "peakRPS": 12000,
      "rampDurationSeconds": 300,
      "sustainDurationSeconds": 3600
    },
    "optimizeThresholds": true
  }'

# 2. Get results: SLA violations and recommended thresholds
curl /api/predictions/jobs/<jobId>/results \
  -H "Authorization: Bearer <key>"
# Returns: bottlenecks[], slaViolations[], recommendedThresholds
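The `trafficForecast` fields describe a ramp to a peak followed by a sustained plateau. The exact curve the validator uses is not specified. Assuming a linear ramp from zero, the offered load at a given second could be computed as in the sketch below, which uses the values from the request. The `rps_at` helper is hypothetical.

```shell
#!/bin/sh
# Values from the trafficForecast in the request.
PEAK_RPS=12000
RAMP_SECONDS=300
SUSTAIN_SECONDS=3600

# Requests per second at elapsed second $1, assuming a linear ramp from 0
# to the peak, then a flat plateau, then no traffic after the forecast ends.
rps_at() {
  t=$1
  if [ "$t" -lt 0 ] || [ "$t" -gt $((RAMP_SECONDS + SUSTAIN_SECONDS)) ]; then
    echo 0
  elif [ "$t" -lt "$RAMP_SECONDS" ]; then
    echo $((PEAK_RPS * t / RAMP_SECONDS))
  else
    echo "$PEAK_RPS"
  fi
}

# Sample the shape at the start, mid-ramp, end of ramp, and end of plateau.
for t in 0 150 300 3900; do
  echo "t=${t}s rps=$(rps_at "$t")"
done
```

Under this assumption the ramp adds 40 RPS per second, which is the rate autoscaling thresholds would need to keep pace with.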