This guide provides a comprehensive, step-by-step walkthrough for self-hosting Flexprice on AWS in a production-ready setup. It covers VPC networking, ECS compute (EC2 with ARM64), Aurora PostgreSQL, Amazon MSK (Kafka), EKS with ClickHouse, ElastiCache Redis, DynamoDB, IAM, secrets management, and observability.

Prerequisites

Before you begin, ensure you have the following:
An AWS account with administrator or equivalent permissions to create VPCs, ECS, RDS, MSK, EKS, S3, IAM roles, and CloudWatch resources
AWS CLI v2 installed and configured with credentials (aws configure)
Docker installed (for building and pushing images to ECR)
kubectl installed (for EKS/ClickHouse management)
eksctl installed (optional but recommended for EKS cluster creation)
Helm installed (for ClickHouse deployment)

Region selection

Choose an AWS region that:
  • Has all required services (see Cost estimation for the list)
  • Is geographically close to your users for lower latency
  • Meets your compliance requirements (e.g., GDPR for EU data)
This guide uses us-east-1 as the example region. Replace with your preferred region.

Cost estimation

We provide two configurations: a development setup for testing and a production setup for high-throughput workloads (100M+ events/month).
| Component | Configuration | Monthly Cost |
| --- | --- | --- |
| EC2 for ECS | 10x m6g.xlarge (ARM64/Graviton) | ~$1,030 |
| Aurora PostgreSQL | 2x db.r8g.xlarge (Writer + Reader) | ~$650 |
| Amazon MSK | 2 brokers, kafka.m5.large (4 vCPU, 8 GB), 1 TB storage per broker | ~$350 |
| EKS + ClickHouse | Control plane + m5.8xlarge nodes | ~$1,900 |
| ElastiCache Redis | Multi-node cluster (cache.r6g.large, cluster mode) | ~$650 |
| DynamoDB | On-demand, ~100M events | ~$50 |
| Storage (EBS) | 3,000 GB across components (gp3) | ~$290 |
| ALB + NAT Gateway | 2x NAT for HA | ~$130 |
| S3, CloudWatch, Secrets | Storage + logs | ~$50 |
| AWS Subtotal | | ~$5,100 |
| Third-party services | Temporal Cloud, Supabase, Svix, Grafana | ~$400 |
| Total | | ~$5,500/month |
Costs vary by region and usage. Use the AWS Pricing Calculator for accurate estimates. ARM64/Graviton instances provide ~20% cost savings over x86.

Sizing for 100M events/month

| Component | Development | Production (100M events/month) |
| --- | --- | --- |
| ECS API | 1 task, 0.5 vCPU, 1 GB | 6 tasks, 0.75 vCPU, 1.5 GB each |
| ECS Consumer | 1 task, 0.5 vCPU, 1 GB | 30 tasks, 1 vCPU, 1.75 GB each |
| ECS Temporal Worker | 1 task, 1 vCPU, 2 GB | 3 tasks, 2 vCPU, 4 GB each |
| Database | RDS db.t3.small | Aurora 2x db.r8g.xlarge |
| Kafka | 2x kafka.t3.small, 100 GB | 2 brokers, kafka.m5.large, 1 TB per broker |
| ClickHouse | 2x m5.large (8 GB) | m5.8xlarge node(s) |
| Redis | cache.t3.micro | cache.r6g.large, multi-node cluster mode |
Traffic and storage estimates:
  • 100M events/month = ~38.5 events/second average
  • Peak traffic: 150-200 events/second (4-5x burst)
  • ClickHouse storage: ~50 GB/month growth
  • DynamoDB: ~20 GB/month growth
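The averages above can be checked with quick back-of-envelope arithmetic (assuming a ~30-day month):

```shell
# Back-of-envelope check of the traffic estimates above
awk 'BEGIN {
  events = 100000000          # 100M events/month
  secs   = 30 * 24 * 3600     # ~2,592,000 seconds in a 30-day month
  avg    = events / secs
  printf "average: %.1f events/s\n", avg              # ~38.6
  printf "peak (4-5x burst): %.0f-%.0f events/s\n", avg * 4, avg * 5
}'
```

This matches the ~38.5 events/second average and 150-200 events/second peak used for sizing.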

Architecture overview

Flexprice on AWS runs with the following production architecture. Data flow:
  • Clients → Cloudflare (DNS, WAF, rate limiting) → ALB → ECS (API, Consumer, Temporal Worker)
  • API writes to Aurora PostgreSQL, publishes events to MSK (Kafka) and DynamoDB
  • Consumer reads from Kafka and writes to ClickHouse (on EKS) for analytics
  • Temporal Worker connects to Temporal Cloud for workflow orchestration
  • ElastiCache Redis provides caching in cluster mode
  • S3 stores invoice PDFs; CloudWatch and Grafana Cloud collect logs and metrics
This guide uses Temporal Cloud (recommended for production). You can also self-host Temporal, but it requires additional infrastructure. Cloudflare is optional but recommended for DNS and WAF.

Component summary

| Component | AWS Service | Purpose |
| --- | --- | --- |
| Compute | ECS on EC2 (ARM64) | API, Consumer, Temporal Worker services |
| Primary Database | Aurora PostgreSQL | Transactional data, subscriptions, customers |
| Analytics Database | ClickHouse on EKS | Event analytics, usage aggregation |
| Message Queue | Amazon MSK | Event streaming between services |
| Cache | ElastiCache Redis | Session cache, rate limiting |
| Event Store | DynamoDB | Durable event storage |
| Object Storage | S3 | Invoice PDFs, exports |
| Workflow Engine | Temporal Cloud | Billing workflows, scheduled jobs |
| Authentication | Supabase | User authentication (optional) |
| Webhooks | Svix | Webhook delivery (optional) |

Step 1: VPC and networking

Create a VPC with public and private subnets across two Availability Zones for high availability. Unless otherwise specified, create each resource in this guide via AWS Console, CLI, or IaC using the configuration described in the tables.

VPC configuration

| Setting | Value | Purpose |
| --- | --- | --- |
| VPC CIDR | 10.0.0.0/16 | 65,536 IP addresses |
| Availability Zones | 2 (e.g., us-east-1a, us-east-1b) | High availability |
| Public subnets | 2 (10.0.1.0/24, 10.0.2.0/24) | ALB, NAT Gateway |
| Private subnets (compute) | 2 (10.0.10.0/24, 10.0.20.0/24) | ECS tasks |
| Private subnets (data) | 2 (10.0.100.0/24, 10.0.200.0/24) | RDS, MSK, EKS |
| NAT Gateway | 1 (or 2 for HA) | Private subnet internet access |
| Internet Gateway | 1 | Public subnet internet access |

Create VPC with AWS CLI

Create the VPC, enable DNS hostnames, and attach an Internet Gateway.
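A minimal CLI sketch of this step (resource names and tags are illustrative):

```shell
# Create the VPC and capture its ID
VPC_ID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 \
  --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=flexprice-vpc}]' \
  --query 'Vpc.VpcId' --output text)

# Enable DNS hostnames (needed for RDS/MSK endpoint resolution)
aws ec2 modify-vpc-attribute --vpc-id "$VPC_ID" --enable-dns-hostnames

# Create and attach an Internet Gateway for the public subnets
IGW_ID=$(aws ec2 create-internet-gateway \
  --query 'InternetGateway.InternetGatewayId' --output text)
aws ec2 attach-internet-gateway --vpc-id "$VPC_ID" --internet-gateway-id "$IGW_ID"
```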

Create subnets

Create public and private subnets in two Availability Zones using the CIDRs in the VPC configuration table.

Create NAT Gateway

Create an Elastic IP and NAT Gateway in a public subnet.

Create route tables

Create public and private route tables and associate subnets (public: default route to Internet Gateway; private: default route to NAT Gateway).

Create security groups

Create security groups for ALB, ECS, RDS, MSK, and EKS. Use the rules in the summary table below.
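As a sketch, here is how two of those groups and their rules could be created with the CLI (group names match the summary table; the VPC ID is a placeholder):

```shell
VPC_ID=vpc-xxxxxxxx   # your VPC ID from Step 1

# ALB security group: HTTPS and HTTP from the internet
ALB_SG=$(aws ec2 create-security-group --vpc-id "$VPC_ID" \
  --group-name flexprice-alb-sg --description "Flexprice ALB" \
  --query 'GroupId' --output text)
aws ec2 authorize-security-group-ingress --group-id "$ALB_SG" \
  --protocol tcp --port 443 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id "$ALB_SG" \
  --protocol tcp --port 80 --cidr 0.0.0.0/0

# ECS security group: port 8080 only from the ALB group
ECS_SG=$(aws ec2 create-security-group --vpc-id "$VPC_ID" \
  --group-name flexprice-ecs-sg --description "Flexprice ECS tasks" \
  --query 'GroupId' --output text)
aws ec2 authorize-security-group-ingress --group-id "$ECS_SG" \
  --protocol tcp --port 8080 --source-group "$ALB_SG"
```

The RDS, MSK, and EKS groups follow the same `--source-group` pattern against the ECS group.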

Security group rules summary

| Security Group | Inbound | Source | Port(s) | Purpose |
| --- | --- | --- | --- | --- |
| flexprice-alb-sg | HTTPS | 0.0.0.0/0 | 443 | Public API access |
| flexprice-alb-sg | HTTP | 0.0.0.0/0 | 80 | Redirect to HTTPS |
| flexprice-ecs-sg | TCP | alb-sg | 8080 | ALB to API |
| flexprice-ecs-sg | TCP | ecs-sg | All | Inter-task communication |
| flexprice-rds-sg | TCP | ecs-sg | 5432 | PostgreSQL access |
| flexprice-msk-sg | TCP | ecs-sg | 9092, 9094, 9096 | Kafka access |
| flexprice-eks-sg | TCP | ecs-sg | 9000, 8123 | ClickHouse access |
For production, consider restricting the ALB security group to only Cloudflare IP ranges if you’re using Cloudflare for DNS and WAF.

Step 2: IAM roles and policies

Create IAM roles for ECS task execution and task runtime permissions.

ECS Task Execution Role

This role allows ECS to pull container images and write logs. Create the role and attach the managed policy AmazonECSTaskExecutionRolePolicy plus an inline policy for Secrets Manager access.
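A sketch of the role creation (the role name is illustrative; the managed policy ARN is the standard one):

```shell
# Trust policy letting ECS tasks assume the role
cat > ecs-trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "ecs-tasks.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF

aws iam create-role --role-name flexpriceEcsTaskExecutionRole \
  --assume-role-policy-document file://ecs-trust-policy.json

# Managed policy for pulling images from ECR and writing to CloudWatch Logs
aws iam attach-role-policy --role-name flexpriceEcsTaskExecutionRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
```

Add an inline policy granting `secretsmanager:GetSecretValue` on your `flexprice/*` secrets so task definitions can inject secrets at startup.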

ECS Task Role

This role grants permissions for the Flexprice application at runtime (S3, CloudWatch Logs, Secrets Manager). Create the task role and attach the inline policy.

Step 3: Secrets Manager

Store sensitive configuration in AWS Secrets Manager.

Create secrets

Create secrets for PostgreSQL, ClickHouse, Kafka (SASL), auth, and Temporal Cloud. Store postgres (host, username, password, database), clickhouse (username, password), kafka (username, password), auth (64-char hex secret), and temporal (API key, key name, namespace) as needed.
Replace placeholder values with strong, unique credentials. Use a password generator for production secrets.

Step 4: Aurora PostgreSQL

Create an Aurora PostgreSQL cluster for Flexprice’s primary database. Aurora provides higher availability and performance compared to standard RDS.
Create a DB subnet group, Aurora cluster (with Secrets Manager managed credentials), writer instance (db.r8g.xlarge), and reader instance in the other AZ. Retrieve the cluster writer and reader endpoints for application configuration.
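The same sequence as a CLI sketch (identifiers, subnet IDs, and the security group are placeholders):

```shell
# DB subnet group in the data subnets
aws rds create-db-subnet-group \
  --db-subnet-group-name flexprice-db-subnets \
  --db-subnet-group-description "Flexprice data subnets" \
  --subnet-ids subnet-aaaa subnet-bbbb

# Aurora cluster with Secrets Manager managed master credentials
aws rds create-db-cluster \
  --db-cluster-identifier flexprice-prod \
  --engine aurora-postgresql --engine-version 17.4 \
  --master-username flexprice --manage-master-user-password \
  --db-subnet-group-name flexprice-db-subnets \
  --vpc-security-group-ids sg-xxxxxxxx \
  --storage-encrypted --backup-retention-period 7

# Writer instance, then a reader in the other AZ
aws rds create-db-instance --db-instance-identifier flexprice-prod-writer \
  --db-cluster-identifier flexprice-prod \
  --db-instance-class db.r8g.xlarge --engine aurora-postgresql
aws rds create-db-instance --db-instance-identifier flexprice-prod-reader \
  --db-cluster-identifier flexprice-prod \
  --db-instance-class db.r8g.xlarge --engine aurora-postgresql \
  --availability-zone us-east-1b
```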

Aurora configuration summary

| Setting | Development | Production |
| --- | --- | --- |
| Engine | PostgreSQL 15.4 | Aurora PostgreSQL 17.4 |
| Instance class | db.t3.small | db.r8g.xlarge (4 vCPU, 32 GB) |
| Instances | 1 (Single-AZ) | 2 (Writer + Reader, Multi-AZ) |
| Storage | 100 GB gp3 | Aurora I/O-Optimized (auto-scaling) |
| Multi-AZ | No | Yes (2 zones) |
| Encryption | Enabled | Enabled |
| Backup retention | 7 days | 7 days |
| Monthly cost | ~$30 | ~$650 |

Update Secrets Manager with Aurora endpoints

Update the postgres secret in Secrets Manager with the Aurora writer and reader endpoints and the managed master password ARN.
Aurora with Secrets Manager managed credentials automatically rotates the master password. Use the MasterUserSecret ARN to retrieve the current password.

Run database migrations

You can run migrations using a one-off ECS task or from a bastion host. Create a migration task definition and run it via ECS (or run flexprice migrate up from a host with DB access) using the configuration described above.
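The one-off run could look like this (cluster, task definition, subnet, and security group values are illustrative; this assumes the migration task is registered for Fargate):

```shell
# Run the migration task once in the private compute subnets
aws ecs run-task \
  --cluster flexprice-prod \
  --task-definition flexprice-migration \
  --launch-type FARGATE \
  --network-configuration 'awsvpcConfiguration={subnets=[subnet-aaaa,subnet-bbbb],securityGroups=[sg-xxxxxxxx],assignPublicIp=DISABLED}'
```

Watch the task's CloudWatch log group to confirm the migrations completed before starting the services.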

Step 5: Amazon MSK (Kafka)

Create an Amazon MSK cluster for event streaming.

Create MSK configuration

Create an MSK configuration (server properties) and register it.
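A sketch of registering the configuration (the property values are illustrative starting points, not tuned settings):

```shell
# Kafka server properties for the MSK configuration
cat > msk-server.properties <<'EOF'
auto.create.topics.enable=false
default.replication.factor=2
min.insync.replicas=1
num.partitions=6
log.retention.hours=168
EOF

aws kafka create-configuration \
  --name flexprice-msk-config \
  --kafka-versions 3.8.1 \
  --server-properties fileb://msk-server.properties
```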

Create MSK cluster

Create the MSK cluster with 2 brokers (1 per AZ), kafka.m5.large (4 vCPU, 8 GB) instance type, and 1024 GB (1 TB) storage per broker. Enable SASL/SCRAM, TLS, encryption at rest, and enhanced monitoring.

Create SASL/SCRAM secret for MSK

Create a secret in Secrets Manager with the prefix AmazonMSK_ and associate it with the MSK cluster.

Get MSK bootstrap brokers

Retrieve the SASL/SCRAM bootstrap broker string from the MSK cluster (AWS Console or CLI) for application configuration.

Create Kafka topics

Use a bastion host or an EC2 instance with Kafka CLI tools to create the events and events-dlq topics (e.g. 6 partitions, replication factor 2). Use SASL_SSL and SCRAM-SHA-512 in client configuration.
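From that host, topic creation might look like this (the username/password are placeholders for the SASL credentials stored in Secrets Manager):

```shell
# SASL/SCRAM client config for the Kafka CLI tools
cat > client.properties <<'EOF'
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="flexprice" password="YOUR_KAFKA_PASSWORD";
EOF

# Create the topics (6 partitions, replication factor 2)
for topic in events events-dlq; do
  kafka-topics.sh --create \
    --bootstrap-server "$MSK_BOOTSTRAP" \
    --command-config client.properties \
    --topic "$topic" --partitions 6 --replication-factor 2
done
```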

MSK configuration summary

| Setting | Development | Production |
| --- | --- | --- |
| Kafka version | 3.5.1 | 3.8.1 |
| Broker type | kafka.t3.small | kafka.m5.large (4 vCPU, 8 GB) |
| Number of brokers | 2 | 2 (1 per AZ) |
| Storage per broker | 100 GB | 1024 GB (1 TB) |
| Authentication | SASL/SCRAM | SASL/SCRAM + IAM |
| Encryption | TLS in transit | TLS in transit + at rest |
| Monitoring | Basic | Enhanced partition-level + Prometheus |
| Monthly cost | ~$90 | ~$350 |
For development, use kafka.t3.small with 100 GB storage. For production (100M+ events/month), use 2 brokers, kafka.m5.large, and 1 TB storage per broker.

Step 6: EKS with ClickHouse

Create an EKS cluster and deploy ClickHouse for analytics storage. For production (100M+ events/month), use m5.8xlarge nodes for the ClickHouse node group.

Create EKS cluster with eksctl

Create an EKS cluster with a managed node group (m5.8xlarge for production) via eksctl or IaC. Use private subnets and attach the EKS security group.
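A minimal eksctl config sketch (cluster name, region, subnet IDs, and node count are illustrative):

```shell
cat > flexprice-eks.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: flexprice-prod
  region: us-east-1
vpc:
  subnets:
    private:
      us-east-1a: { id: subnet-aaaa }
      us-east-1b: { id: subnet-bbbb }
managedNodeGroups:
  - name: clickhouse
    instanceType: m5.8xlarge
    desiredCapacity: 1
    privateNetworking: true
EOF

eksctl create cluster -f flexprice-eks.yaml
```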

Create gp3 StorageClass

Create a gp3 StorageClass (EBS CSI driver, encrypted, Retain, WaitForFirstConsumer) via kubectl or IaC.
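The StorageClass from this step, applied directly with kubectl:

```shell
# gp3 StorageClass backed by the EBS CSI driver
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
EOF
```

`Retain` keeps ClickHouse data volumes if a PVC is deleted; `WaitForFirstConsumer` ensures volumes are provisioned in the AZ where the pod is scheduled.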

Create ClickHouse namespace and secrets

Create the clickhouse namespace and a Kubernetes secret with credentials from Secrets Manager via kubectl or IaC.

Deploy ClickHouse with Helm

Add the Altinity ClickHouse Helm repo and install the ClickHouse Operator in the clickhouse namespace via Helm.

Create ClickHouse cluster

Deploy a ClickHouseInstallation (Altinity operator) with the credentials secret, gp3 storage, and appropriate resources via kubectl or Helm.

Create ClickHouse service for ECS access

Create a ClusterIP Service for ClickHouse (ports 9000, 8123) targeting the ClickHouse installation via kubectl or IaC.

Get ClickHouse endpoint

For ECS tasks to access ClickHouse, you have several options:
  1. Internal NLB (recommended): Create an internal Network Load Balancer pointing to the ClickHouse service
  2. VPC peering/Transit Gateway: If ECS and EKS are in separate VPCs
  3. AWS PrivateLink: For cross-account access
Create the internal NLB (type LoadBalancer with internal annotation) and use its DNS name as the ClickHouse endpoint (port 9000) for ECS configuration.
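A sketch of that Service manifest (the selector assumes the Altinity operator's `clickhouse.altinity.com/chi` label on a `flexprice` installation):

```shell
# Internal NLB fronting ClickHouse for ECS access
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: clickhouse-nlb
  namespace: clickhouse
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    clickhouse.altinity.com/chi: flexprice
  ports:
    - name: native
      port: 9000
      targetPort: 9000
    - name: http
      port: 8123
      targetPort: 8123
EOF
```

Once provisioned, `kubectl get svc clickhouse-nlb -n clickhouse` shows the NLB DNS name to use as `FLEXPRICE_CLICKHOUSE_ADDRESS`.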

Initialize ClickHouse database

Connect to ClickHouse (e.g. via port-forward or the NLB) and create the flexprice database using clickhouse-client.

Step 7: ElastiCache Redis

Create an ElastiCache Redis cluster for caching and session management.

Create Redis subnet group

Create a cache subnet group in the data subnets.

Create Redis security group

Create a security group for Redis allowing TCP 6379 from the ECS security group.

Create Redis replication group (cluster mode)

Create a Redis replication group with cache.r6g.large, cluster mode, multi-node (e.g. multiple node groups for ~$600/month), TLS and at-rest encryption, and multi-AZ.
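A CLI sketch of this configuration (the shard count of 3 and the identifiers are illustrative):

```shell
aws elasticache create-replication-group \
  --replication-group-id flexprice-prod \
  --replication-group-description "Flexprice Redis" \
  --engine redis \
  --cache-node-type cache.r6g.large \
  --num-node-groups 3 \
  --replicas-per-node-group 1 \
  --cache-subnet-group-name flexprice-redis-subnets \
  --security-group-ids sg-xxxxxxxx \
  --transit-encryption-enabled \
  --at-rest-encryption-enabled \
  --automatic-failover-enabled \
  --multi-az-enabled
```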

Redis configuration summary

| Setting | Development | Production (multi-node cluster) |
| --- | --- | --- |
| Node type | cache.t3.micro | cache.r6g.large (2 vCPU, 13 GB) |
| Cluster mode | Disabled | Enabled |
| Replicas | 0 | 1 per shard |
| Multi-AZ | No | Yes |
| Encryption | Optional | TLS in transit + at rest |
| Monthly cost | ~$15 | ~$600 |

Step 8: DynamoDB

Create a DynamoDB table for durable event storage alongside ClickHouse.

Create events table

Create a DynamoDB table named events with partition key pk (String) and sort key sk (String), on-demand billing.
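The table as a single CLI call:

```shell
aws dynamodb create-table \
  --table-name events \
  --attribute-definitions \
    AttributeName=pk,AttributeType=S \
    AttributeName=sk,AttributeType=S \
  --key-schema \
    AttributeName=pk,KeyType=HASH \
    AttributeName=sk,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST
```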

Enable Point-in-Time Recovery

Enable point-in-time recovery (continuous backups) on the events table.

DynamoDB configuration summary

| Setting | Value | Notes |
| --- | --- | --- |
| Billing mode | On-demand | Pay per request, auto-scales |
| Partition key | pk (String) | Tenant/customer ID |
| Sort key | sk (String) | Event timestamp |
| PITR | Enabled | Point-in-time recovery |
| Encryption | AWS managed | Default encryption |
| Monthly cost | ~$50 | For ~100M events/month |
DynamoDB is used alongside ClickHouse for durable event storage. Events are written to both DynamoDB (for durability) and ClickHouse (for analytics).

Step 9: S3 and CloudWatch

Create S3 bucket for invoices

Create an S3 bucket for invoice PDFs with versioning, AES256 encryption, block public access, and optional lifecycle rules (e.g. transition to STANDARD_IA after 90 days).

Create CloudWatch log groups

Create log groups for ECS services (api, worker, temporal-worker, migration) with a retention policy (e.g. 30 days).

Create CloudWatch alarms

Create alarms for ECS API CPU, RDS CPU, and RDS connections (e.g. threshold 80%, 2 evaluation periods) and associate with an SNS topic for alerts.
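A sketch of one such alarm (cluster, service, and topic names are illustrative):

```shell
# Alarm on sustained API service CPU, notifying an SNS topic
aws cloudwatch put-metric-alarm \
  --alarm-name flexprice-api-cpu-high \
  --namespace AWS/ECS \
  --metric-name CPUUtilization \
  --dimensions Name=ClusterName,Value=flexprice-prod Name=ServiceName,Value=flexprice-api-prod \
  --statistic Average --period 300 \
  --threshold 80 --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:flexprice-alerts
```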

Step 10: ECR and container images

Create ECR repositories

Create ECR repositories for api, worker, and temporal-worker with scan-on-push and AES256 encryption.

Build and push images

Build Flexprice container images (api, worker, temporal-worker), authenticate to ECR, tag and push to your ECR repositories.
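A sketch of the build-and-push loop (the `Dockerfile.<service>` naming is an assumption; adjust to how your images are actually built):

```shell
# Authenticate Docker to ECR
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
REGISTRY="${ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com"
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin "$REGISTRY"

# Build, tag, and push each service image (ARM64 for Graviton hosts)
for svc in api worker temporal-worker; do
  docker build --platform linux/arm64 -t "flexprice-${svc}" -f "Dockerfile.${svc}" .
  docker tag "flexprice-${svc}:latest" "${REGISTRY}/${svc}:latest"
  docker push "${REGISTRY}/${svc}:latest"
done
```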

Step 11: ECS cluster and services

Create ECS cluster

For production (100M+ events/month), create an ECS cluster with EC2 capacity: launch template with m6g.xlarge (ARM64/Graviton), Auto Scaling Group with 10 nodes (min/max as needed), capacity provider with managed scaling, and associate with the cluster.

Create API task definition

Register an ECS task definition for the API service: production uses EC2/ARM64 (768 CPU, 1536 memory) with bridge network; development uses Fargate (1024 CPU, 2048 memory). Include environment variables and secrets from Secrets Manager (auth, postgres, clickhouse, kafka, temporal). Set FLEXPRICE_DEPLOYMENT_MODE=api, health check on :8080/health, and CloudWatch log group. See Step 12 for the full environment variable reference.
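An abbreviated sketch of that task definition (account IDs, ARNs, and the image URI are placeholders; only a representative subset of the Step 12 variables is shown):

```shell
cat > api-task-def.json <<'EOF'
{
  "family": "flexprice-api-prod",
  "requiresCompatibilities": ["EC2"],
  "networkMode": "bridge",
  "cpu": "768",
  "memory": "1536",
  "executionRoleArn": "arn:aws:iam::123456789012:role/flexpriceEcsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789012:role/flexpriceEcsTaskRole",
  "containerDefinitions": [{
    "name": "api",
    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/api:latest",
    "portMappings": [{ "containerPort": 8080 }],
    "environment": [
      { "name": "FLEXPRICE_DEPLOYMENT_MODE", "value": "api" },
      { "name": "FLEXPRICE_SERVER_ADDRESS", "value": ":8080" }
    ],
    "secrets": [{
      "name": "FLEXPRICE_POSTGRES_PASSWORD",
      "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:flexprice/prod/postgres:password::"
    }],
    "healthCheck": {
      "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
      "interval": 30,
      "timeout": 5,
      "retries": 3
    },
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/flexprice-api-prod",
        "awslogs-region": "us-east-1",
        "awslogs-stream-prefix": "api"
      }
    }
  }]
}
EOF
# Register with: aws ecs register-task-definition --cli-input-json file://api-task-def.json
```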

Create Worker task definition

Register an ECS task definition for the Consumer (worker) service: FLEXPRICE_DEPLOYMENT_MODE=consumer, postgres/clickhouse/kafka secrets from Secrets Manager. For production use 30 tasks (100M events/month). See Step 12 for environment variables.

Create Temporal Worker task definition

Register an ECS task definition for the Temporal Worker: FLEXPRICE_DEPLOYMENT_MODE=temporal_worker, postgres/clickhouse/kafka/temporal secrets. See Step 12 for environment variables.

Create Application Load Balancer

Create an internet-facing Application Load Balancer in the public subnets, a target group (HTTP 8080, health check /health), an HTTPS listener with an ACM certificate, and an HTTP listener that redirects to HTTPS.

Create ECS services

Create ECS services for API (desired count 6 for production), Worker/Consumer (desired count 30 for production), and Temporal Worker (e.g. 3 tasks). Attach the API service to the ALB target group. Use private subnets and the ECS security group.

Configure Auto Scaling

Register scalable targets and target-tracking scaling policies for the API (and optionally Worker) services (e.g. min/max desired count, CPU target 70%).
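A sketch for the API service (cluster/service names and capacity bounds are illustrative):

```shell
# Target-tracking on average CPU for the API service
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/flexprice-prod/flexprice-api-prod \
  --min-capacity 6 --max-capacity 12

aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/flexprice-prod/flexprice-api-prod \
  --policy-name flexprice-api-cpu-tt \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
    }
  }'
```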

Step 12: Environment variables reference

Below is a complete reference of environment variables for each service. Variables marked with (secret) should be stored in AWS Secrets Manager.

API service

| Variable | Value | Source |
| --- | --- | --- |
| FLEXPRICE_DEPLOYMENT_MODE | api | Environment |
| FLEXPRICE_SERVER_ADDRESS | :8080 | Environment |
| FLEXPRICE_AUTH_SECRET | 64-char hex | Secret |
| FLEXPRICE_POSTGRES_HOST | RDS endpoint | Secret |
| FLEXPRICE_POSTGRES_PORT | 5432 | Environment |
| FLEXPRICE_POSTGRES_USER | flexprice | Secret |
| FLEXPRICE_POSTGRES_PASSWORD | DB password | Secret |
| FLEXPRICE_POSTGRES_DBNAME | flexprice | Environment |
| FLEXPRICE_POSTGRES_SSLMODE | require | Environment |
| FLEXPRICE_CLICKHOUSE_ADDRESS | ClickHouse NLB endpoint | Environment |
| FLEXPRICE_CLICKHOUSE_USERNAME | flexprice | Secret |
| FLEXPRICE_CLICKHOUSE_PASSWORD | ClickHouse password | Secret |
| FLEXPRICE_CLICKHOUSE_DATABASE | flexprice | Environment |
| FLEXPRICE_CLICKHOUSE_TLS | false | Environment |
| FLEXPRICE_KAFKA_BROKERS | MSK bootstrap brokers | Environment |
| FLEXPRICE_KAFKA_USE_SASL | true | Environment |
| FLEXPRICE_KAFKA_SASL_MECHANISM | SCRAM-SHA-512 | Environment |
| FLEXPRICE_KAFKA_SASL_USER | flexprice | Secret |
| FLEXPRICE_KAFKA_SASL_PASSWORD | Kafka password | Secret |
| FLEXPRICE_KAFKA_TOPIC | events | Environment |
| FLEXPRICE_KAFKA_CONSUMER_GROUP | flexprice-consumer-prod | Environment |
| FLEXPRICE_TEMPORAL_ADDRESS | Temporal Cloud endpoint | Environment |
| FLEXPRICE_TEMPORAL_TLS | true | Environment |
| FLEXPRICE_TEMPORAL_NAMESPACE | Your namespace | Environment |
| FLEXPRICE_TEMPORAL_TASK_QUEUE | billing-task-queue | Environment |
| FLEXPRICE_TEMPORAL_API_KEY | Temporal API key | Secret |
| FLEXPRICE_LOGGING_LEVEL | info | Environment |

Worker and Temporal Worker services

Worker and Temporal Worker use the same variables as API, with FLEXPRICE_DEPLOYMENT_MODE set to consumer or temporal_worker respectively; omit FLEXPRICE_SERVER_ADDRESS for both.

Additional environment variables (Production)

These variables are used in production deployments:
| Variable | Description | Example |
| --- | --- | --- |
| FLEXPRICE_DYNAMODB_IN_USE | Enable DynamoDB for events | true |
| FLEXPRICE_DYNAMODB_REGION | AWS region for DynamoDB | us-west-2 |
| FLEXPRICE_DYNAMODB_EVENT_TABLE_NAME | DynamoDB table name | events |
| FLEXPRICE_REDIS_HOST | ElastiCache Redis endpoint | clustercfg.xxx.cache.amazonaws.com |
| FLEXPRICE_REDIS_PORT | Redis port | 6379 |
| FLEXPRICE_REDIS_CLUSTER_MODE | Enable cluster mode | true |
| FLEXPRICE_REDIS_USE_TLS | Enable TLS | true |
| FLEXPRICE_REDIS_KEY_PREFIX | Key prefix | flexprice:prod |
| FLEXPRICE_EVENT_PUBLISH_DESTINATION | Where to publish events | all (Kafka + DynamoDB) |
| FLEXPRICE_LOGGING_FORMAT | Log format | json |
| FLEXPRICE_POSTGRES_READER_HOST | Aurora reader endpoint | xxx.cluster-ro-xxx.rds.amazonaws.com |

Step 13: Temporal Cloud configuration

Temporal Cloud is the recommended workflow orchestration service for production deployments.

Sign up for Temporal Cloud

  1. Go to temporal.io/cloud
  2. Create an account and organization
  3. Create a namespace (e.g., flexprice-prod-usa)

Create service account and API key

  1. In Temporal Cloud console, go to Settings > API Keys
  2. Create a new API key with appropriate permissions
  3. Note the API key and key name

Store Temporal credentials

aws secretsmanager create-secret \
  --name flexprice/${ENV}/temporal \
  --description "Flexprice Temporal Cloud credentials" \
  --secret-string '{
    "address": "us-west-2.aws.api.temporal.io:7233",
    "namespace": "your-namespace.your-account-id",
    "api_key": "YOUR_TEMPORAL_API_KEY",
    "api_key_name": "your-service-account-name"
  }'

Temporal environment variables

| Variable | Value | Description |
| --- | --- | --- |
| FLEXPRICE_TEMPORAL_ADDRESS | us-west-2.aws.api.temporal.io:7233 | Temporal Cloud endpoint |
| FLEXPRICE_TEMPORAL_NAMESPACE | your-namespace.account-id | Your namespace |
| FLEXPRICE_TEMPORAL_TLS | true | TLS is required |
| FLEXPRICE_TEMPORAL_TASK_QUEUE | billing-task-queue | Task queue name |
| FLEXPRICE_TEMPORAL_API_KEY | (from Secrets Manager) | API key |
| FLEXPRICE_TEMPORAL_API_KEY_NAME | Service account name | Key identifier |
Temporal Cloud provides managed infrastructure, automatic upgrades, and 99.99% SLA. For self-hosted Temporal, refer to the Temporal documentation.

Step 14: Third-party integrations (Optional)

Configure optional third-party services for enhanced functionality.

Supabase (Authentication)

If using Supabase for authentication:
aws secretsmanager create-secret \
  --name flexprice/${ENV}/supabase \
  --secret-string '{
    "base_url": "https://your-project.supabase.co",
    "service_key": "YOUR_SUPABASE_SERVICE_KEY"
  }'
| Variable | Value |
| --- | --- |
| FLEXPRICE_AUTH_PROVIDER | supabase |
| FLEXPRICE_AUTH_SUPABASE_BASE_URL | Supabase project URL |
| FLEXPRICE_AUTH_SUPABASE_SERVICE_KEY | Service role key |

Svix (Webhooks)

For webhook delivery via Svix:
aws secretsmanager create-secret \
  --name flexprice/${ENV}/svix \
  --secret-string '{
    "auth_token": "YOUR_SVIX_AUTH_TOKEN",
    "base_url": "https://api.us.svix.com"
  }'
| Variable | Value |
| --- | --- |
| FLEXPRICE_WEBHOOK_SVIX_CONFIG_ENABLED | true |
| FLEXPRICE_WEBHOOK_SVIX_CONFIG_AUTH_TOKEN | Svix auth token |
| FLEXPRICE_WEBHOOK_SVIX_CONFIG_BASE_URL | https://api.us.svix.com |

Sentry (Error Tracking)

For error tracking with Sentry:
| Variable | Value |
| --- | --- |
| FLEXPRICE_SENTRY_ENABLED | true |
| FLEXPRICE_SENTRY_DSN | Your Sentry DSN |
| FLEXPRICE_SENTRY_ENVIRONMENT | production |
| FLEXPRICE_SENTRY_SAMPLE_RATE | 1 (100% sampling) |

Grafana Cloud (Observability)

For profiling with Pyroscope on Grafana Cloud:
| Variable | Value |
| --- | --- |
| FLEXPRICE_PYROSCOPE_ENABLED | true |
| FLEXPRICE_PYROSCOPE_SERVER_ADDRESS | https://profiles-prod-xxx.grafana.net |
| FLEXPRICE_PYROSCOPE_APPLICATION_NAME | flexprice-prod-api |
| FLEXPRICE_PYROSCOPE_BASIC_AUTH_USER | Grafana user ID |
| FLEXPRICE_PYROSCOPE_BASIC_AUTH_PASSWORD | Grafana API key |

FluentD (Log Aggregation)

For centralized logging with FluentD:
| Variable | Value |
| --- | --- |
| FLEXPRICE_LOGGING_FLUENTD_ENABLED | true |
| FLEXPRICE_LOGGING_FLUENTD_HOST | FluentD service IP |
| FLEXPRICE_LOGGING_FLUENTD_PORT | 30242 |
| FLEXPRICE_LOGGING_FORMAT | json |

Resend (Email)

For transactional emails via Resend:
| Variable | Value |
| --- | --- |
| FLEXPRICE_EMAIL_ENABLED | true |
| FLEXPRICE_EMAIL_RESEND_API_KEY | Your Resend API key |
| FLEXPRICE_EMAIL_FROM_ADDRESS | Sender email |
| FLEXPRICE_EMAIL_REPLY_TO | Reply-to email |

Third-party cost summary

Breakdown below; production total is in the Cost estimation table above.
| Service | Purpose | Monthly Cost |
| --- | --- | --- |
| Temporal Cloud | Workflow orchestration | ~$200 |
| Supabase | Authentication | ~$25 |
| Svix | Webhooks | ~$50 |
| Grafana Cloud | Observability | ~$50 |
| Resend | Email | ~$20 |
| Sentry | Error tracking | $0-29 |
| Total | | ~$345-375 |

Deployment checklist

Use this checklist to verify your deployment:
1. VPC and Networking

  • VPC created with correct CIDR
  • 2 public subnets created
  • 4 private subnets created (2 compute, 2 data)
  • Internet Gateway attached
  • NAT Gateway(s) created and running
  • Route tables configured correctly
  • Security groups created with correct rules
2. IAM

  • ECS Task Execution Role created
  • ECS Task Role created
  • Policies attached correctly (S3, Secrets Manager, CloudWatch, DynamoDB)
3. Secrets Manager

  • PostgreSQL/Aurora credentials stored
  • ClickHouse credentials stored
  • Kafka SASL credentials stored
  • Auth secret stored
  • Temporal Cloud credentials stored
  • Third-party credentials stored (Supabase, Svix, etc.)
4. Aurora PostgreSQL

  • DB subnet group created
  • Aurora cluster created and available
  • Writer and Reader instances running
  • Security group allows ECS access
  • Secrets Manager updated with endpoints
  • Database migrations completed
5. Amazon MSK

  • MSK cluster created and active
  • SASL/SCRAM secret associated
  • Topics created (events, events_lazy, events-dlq)
  • Security group allows ECS access
  • Prometheus exporters enabled
6. EKS and ClickHouse

  • EKS cluster created
  • Node group running
  • gp3 StorageClass created
  • ClickHouse operator installed
  • ClickHouse cluster deployed
  • NLB created for ClickHouse access
  • Database initialized
7. ElastiCache Redis

  • Redis subnet group created
  • Redis replication group created
  • Cluster mode enabled (production)
  • TLS encryption enabled
  • Security group allows ECS access
8. DynamoDB

  • Events table created
  • Point-in-time recovery enabled
  • IAM policy allows ECS access
9. S3 and CloudWatch

  • S3 bucket created with encryption
  • CloudWatch log groups created
  • CloudWatch alarms configured
10. ECR and Images

  • ECR repositories created
  • Container images built and pushed
11. ECS

  • ECS cluster created
  • Task definitions registered
  • ALB created with HTTPS listener
  • Target group configured
  • Services created and healthy
  • Auto Scaling configured
12. Verification

  • API health check passing
  • Worker consuming from Kafka
  • Temporal workflows executing
  • Logs appearing in CloudWatch

Troubleshooting

API unreachable

  1. Check ALB health checks:
    aws elbv2 describe-target-health --target-group-arn $TG_ARN
    
  2. Check ECS task status:
    aws ecs describe-services \
      --cluster flexprice-${ENV} \
      --services flexprice-api-${ENV}
    
  3. Check ECS task logs:
    aws logs tail /ecs/flexprice-api-${ENV} --follow
    
  4. Verify security groups:
    • ALB SG allows inbound 443 from internet
    • ECS SG allows inbound 8080 from ALB SG
    • ECS SG allows outbound to RDS, MSK, ClickHouse

Worker not consuming

  1. Check Kafka connectivity:
    # From a bastion or EC2 instance with Kafka tools
    kafka-consumer-groups.sh \
      --bootstrap-server $MSK_BOOTSTRAP \
      --command-config client.properties \
      --group flexprice-consumer-${ENV} \
      --describe
    
  2. Check consumer lag in MSK CloudWatch metrics
  3. Verify SASL credentials:
    • Ensure AmazonMSK_ prefixed secret is associated with cluster
    • Verify username/password match in Secrets Manager
  4. Check security group:
    • MSK SG allows inbound 9094/9096 from ECS SG

Temporal workflows failing

  1. Check Temporal Worker logs:
    aws logs tail /ecs/flexprice-temporal-worker-${ENV} --follow
    
  2. Verify Temporal Cloud connection:
    • Correct FLEXPRICE_TEMPORAL_ADDRESS
    • Valid API key and namespace
    • TLS enabled
  3. Check Temporal Cloud UI for workflow history and errors

ClickHouse connection errors

  1. Verify ClickHouse pods are running:
    kubectl get pods -n clickhouse
    
  2. Check ClickHouse logs:
    kubectl logs -n clickhouse -l clickhouse.altinity.com/chi=flexprice
    
  3. Verify NLB is healthy:
    kubectl get svc clickhouse-nlb -n clickhouse
    
  4. Test connectivity from ECS:
    • Ensure EKS SG allows inbound 9000 from ECS SG
    • Verify NLB DNS resolves correctly

RDS connection issues

  1. Verify RDS is available:
    aws rds describe-db-instances \
      --db-instance-identifier flexprice-${ENV} \
      --query 'DBInstances[0].DBInstanceStatus'
    
  2. Check security group:
    • RDS SG allows inbound 5432 from ECS SG
  3. Verify credentials:
    • Check Secrets Manager values match RDS configuration
  4. Test from bastion:
    psql -h $RDS_ENDPOINT -U flexprice -d flexprice
    

Scaling guidelines

Scale when metrics exceed the thresholds below.
| Component | Metric | Threshold | Action |
| --- | --- | --- | --- |
| ECS | CPU utilization | > 70% sustained | Scale out |
| ECS | Memory utilization | > 80% sustained | Scale out or increase task memory |
| ECS | API latency (p99) | > 500ms | Scale out API tasks |
| ECS | Kafka consumer lag | Growing | Scale out Worker tasks |
| RDS | CPU utilization | > 80% sustained | Upgrade instance class |
| RDS | Database connections | > 80% of max | Upgrade instance or add read replica |
| RDS | Read IOPS | Hitting limits | Upgrade to gp3 with higher IOPS |
| RDS | Storage | > 80% used | Increase allocated storage |
| MSK | Broker CPU | > 60% sustained | Add brokers |
| MSK | Consumer lag | Growing over time | Add partitions and consumers |
| MSK | Storage | > 80% used | Increase broker storage |
| ClickHouse | Query latency | Degrading | Add replicas or upgrade nodes |
| ClickHouse | Disk usage | > 80% | Expand PVCs or add shards |
| ClickHouse | Memory pressure | OOM events | Increase node memory |

Cost optimization

Reserved instances

  • RDS: Purchase Reserved Instances for 1-3 year commitment (up to 72% savings)
  • MSK: Not available; consider Kafka on EC2 with Reserved Instances for significant savings

Fargate Spot

Use Fargate Spot for non-critical workloads:
# Update service to use Fargate Spot
aws ecs update-service \
  --cluster flexprice-${ENV} \
  --service flexprice-worker-${ENV} \
  --capacity-provider-strategy capacityProvider=FARGATE_SPOT,weight=2 capacityProvider=FARGATE,weight=1

S3 lifecycle policies

Already configured to transition to IA after 90 days. Consider:
  • Glacier for archives > 1 year
  • Intelligent-Tiering for unpredictable access patterns

CloudWatch log retention

Set appropriate retention periods:
  • Production: 30-90 days
  • Development: 7-14 days
  • Archive to S3 for long-term storage

Additional resources

Need help?

If you encounter issues during deployment: