# Self-hosting on AWS

> Complete guide to deploy Flexprice on AWS with ECS, Aurora PostgreSQL, MSK, EKS, and Redis

This guide provides a comprehensive, step-by-step walkthrough for self-hosting Flexprice on **AWS** in a production-ready setup. It covers VPC networking, ECS compute (EC2 with ARM64), Aurora PostgreSQL, Amazon MSK (Kafka), EKS with ClickHouse, ElastiCache Redis, DynamoDB, IAM, secrets management, and observability.

## Prerequisites

Before you begin, ensure you have the following:

<Check>
  An [AWS account](https://aws.amazon.com/) with administrator or equivalent
  permissions to create VPCs, ECS, RDS, MSK, EKS, S3, IAM roles, and CloudWatch
  resources
</Check>

<Check>
  [AWS CLI v2](https://aws.amazon.com/cli/) installed and configured with
  credentials (`aws configure`)
</Check>

<Check>
  [Docker](https://www.docker.com/) installed (for building and pushing images
  to ECR)
</Check>

<Check>
  [kubectl](https://kubernetes.io/docs/tasks/tools/) installed (for
  EKS/ClickHouse management)
</Check>

<Check>
  [eksctl](https://eksctl.io/) installed (optional but recommended for EKS
  cluster creation)
</Check>

<Check>[Helm](https://helm.sh/) installed (for ClickHouse deployment)</Check>

### Region selection

Choose an AWS region that:

* Has all required services (see Cost estimation for the list)
* Is geographically close to your users for lower latency
* Meets your compliance requirements (e.g., GDPR for EU data)

This guide uses `us-east-1` as the example region. Replace with your preferred region.

### Cost estimation

We provide two configurations: a **development** setup for testing and a **production** setup for high-throughput workloads (100M+ events/month).

<Tabs>
  <Tab title="Production (~$5,500/month)">
    | Component               | Configuration                                                     | Monthly Cost        |
    | ----------------------- | ----------------------------------------------------------------- | ------------------- |
    | EC2 for ECS             | 10x m6g.xlarge (ARM64/Graviton)                                   | \~\$1,030           |
    | Aurora PostgreSQL       | 2x db.r8g.xlarge (Writer + Reader)                                | \~\$650             |
    | Amazon MSK              | 2 brokers, kafka.m5.large (2 vCPU, 8 GB), 1 TB storage per broker | \~\$350             |
    | EKS + ClickHouse        | Control plane + m5.8xlarge nodes                                  | \~\$1,900           |
    | ElastiCache Redis       | Multi-node cluster (cache.r6g.large, cluster mode)                | \~\$650             |
    | DynamoDB                | On-demand, \~100M events                                          | \~\$50              |
    | Storage (EBS)           | 3,000 GB across components (gp3)                                  | \~\$290             |
    | ALB + NAT Gateway       | 2x NAT for HA                                                     | \~\$130             |
    | S3, CloudWatch, Secrets | Storage + logs                                                    | \~\$50              |
    | **AWS Subtotal**        |                                                                   | **\~\$5,100**       |
    | Third-party services    | Temporal Cloud, Supabase, Svix, Grafana                           | \~\$400             |
    | **Total**               |                                                                   | **\~\$5,500/month** |
  </Tab>

  <Tab title="Development (~$500-600/month)">
    | Component             | Configuration                  | Monthly Cost          |
    | --------------------- | ------------------------------ | --------------------- |
    | ECS Fargate           | 3 tasks (0.5 vCPU, 1 GB each)  | \~\$80                |
    | RDS PostgreSQL        | db.t3.small, Single-AZ         | \~\$30                |
    | Amazon MSK            | 2x kafka.t3.small, 100 GB each | \~\$90                |
    | EKS + ClickHouse      | 2x m5.large nodes              | \~\$200               |
    | ElastiCache Redis     | cache.t3.micro                 | \~\$15                |
    | NAT Gateway           | 1 gateway                      | \~\$35                |
    | ALB + S3 + CloudWatch | Standard                       | \~\$50                |
    | **Total**             |                                | **\~\$500-600/month** |
  </Tab>
</Tabs>

<Info>
  Costs vary by region and usage. Use the [AWS Pricing
  Calculator](https://calculator.aws/) for accurate estimates. ARM64/Graviton
  instances provide \~20% cost savings over x86.
</Info>

### Sizing for 100M events/month

| Component           | Development               | Production (100M events/month)             |
| ------------------- | ------------------------- | ------------------------------------------ |
| ECS API             | 1 task, 0.5 vCPU, 1 GB    | 6 tasks, 0.75 vCPU, 1.5 GB each            |
| ECS Consumer        | 1 task, 0.5 vCPU, 1 GB    | 30 tasks, 1 vCPU, 1.75 GB each             |
| ECS Temporal Worker | 1 task, 1 vCPU, 2 GB      | 3 tasks, 2 vCPU, 4 GB each                 |
| Database            | RDS db.t3.small           | Aurora 2x db.r8g.xlarge                    |
| Kafka               | 2x kafka.t3.small, 100 GB | 2 brokers, kafka.m5.large, 1 TB per broker |
| ClickHouse          | 2x m5.large (8 GB)        | m5.8xlarge node(s)                         |
| Redis               | cache.t3.micro            | cache.r6g.large, multi-node cluster mode   |

**Traffic and storage estimates:**

* 100M events/month = \~38.6 events/second average (30-day month)
* Peak traffic: 150-200 events/second (4-5x burst)
* ClickHouse storage: \~50 GB/month growth
* DynamoDB: \~20 GB/month growth
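
The averages above follow from dividing the monthly volume by the number of seconds in a month. A quick check, assuming a 30-day month and the 4-5x burst factor stated above:

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

EVENTS_PER_MONTH=100000000
SECONDS_PER_MONTH=$((30 * 24 * 3600))   # 2,592,000 seconds in a 30-day month

# Bash arithmetic is integer-only, so let awk do the floating-point division.
AVG_EPS=$(awk -v e="$EVENTS_PER_MONTH" -v s="$SECONDS_PER_MONTH" 'BEGIN { printf "%.1f", e / s }')
PEAK_LOW=$(awk -v e="$EVENTS_PER_MONTH" -v s="$SECONDS_PER_MONTH" 'BEGIN { printf "%.0f", 4 * e / s }')
PEAK_HIGH=$(awk -v e="$EVENTS_PER_MONTH" -v s="$SECONDS_PER_MONTH" 'BEGIN { printf "%.0f", 5 * e / s }')

echo "average: ${AVG_EPS} events/s"                          # average: 38.6 events/s
echo "peak (4-5x burst): ${PEAK_LOW}-${PEAK_HIGH} events/s"  # peak (4-5x burst): 154-193 events/s
```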

***

## Architecture overview

Flexprice on AWS runs with the following production architecture:

<img src="https://mintcdn.com/flexprice/Q2QfgDkkuWMQz8Vy/infra.png?fit=max&auto=format&n=Q2QfgDkkuWMQz8Vy&q=85&s=036d06645bb0f90234db9755cc876c6f" alt="AWS architecture for Flexprice" width="1455" height="2588" data-path="infra.png" />

**Data flow:**

* **Clients** → **Cloudflare** (DNS, WAF, rate limiting) → **ALB** → **ECS** (API, Consumer, Temporal Worker)
* **API** writes to **Aurora PostgreSQL**, publishes events to **MSK (Kafka)** and **DynamoDB**
* **Consumer** reads from Kafka and writes to **ClickHouse** (on EKS) for analytics
* **Temporal Worker** connects to **Temporal Cloud** for workflow orchestration
* **ElastiCache Redis** provides caching in cluster mode
* **S3** stores invoice PDFs; **CloudWatch** and **Grafana Cloud** collect logs and metrics

<Info>
  This guide uses **Temporal Cloud** (recommended for production). You can also
  self-host Temporal, but it requires additional infrastructure. Cloudflare is
  optional but recommended for DNS and WAF.
</Info>

### Component summary

| Component          | AWS Service        | Purpose                                      |
| ------------------ | ------------------ | -------------------------------------------- |
| Compute            | ECS on EC2 (ARM64) | API, Consumer, Temporal Worker services      |
| Primary Database   | Aurora PostgreSQL  | Transactional data, subscriptions, customers |
| Analytics Database | ClickHouse on EKS  | Event analytics, usage aggregation           |
| Message Queue      | Amazon MSK         | Event streaming between services             |
| Cache              | ElastiCache Redis  | Session cache, rate limiting                 |
| Event Store        | DynamoDB           | Durable event storage                        |
| Object Storage     | S3                 | Invoice PDFs, exports                        |
| Workflow Engine    | Temporal Cloud     | Billing workflows, scheduled jobs            |
| Authentication     | Supabase           | User authentication (optional)               |
| Webhooks           | Svix               | Webhook delivery (optional)                  |

***

## Step 1: VPC and networking

Create a VPC with public and private subnets across two Availability Zones for high availability. Unless otherwise specified, create each resource in this guide via AWS Console, CLI, or IaC using the configuration described in the tables.

### VPC configuration

| Setting                   | Value                                | Purpose                        |
| ------------------------- | ------------------------------------ | ------------------------------ |
| VPC CIDR                  | `10.0.0.0/16`                        | 65,536 IP addresses            |
| Availability Zones        | 2 (e.g., `us-east-1a`, `us-east-1b`) | High availability              |
| Public subnets            | 2 (`10.0.1.0/24`, `10.0.2.0/24`)     | ALB, NAT Gateway               |
| Private subnets (compute) | 2 (`10.0.10.0/24`, `10.0.20.0/24`)   | ECS tasks                      |
| Private subnets (data)    | 2 (`10.0.100.0/24`, `10.0.200.0/24`) | RDS, MSK, EKS, ElastiCache     |
| NAT Gateway               | 1 (or 2 for HA)                      | Private subnet internet access |
| Internet Gateway          | 1                                    | Public subnet internet access  |

### Create VPC with AWS CLI

Create the VPC, enable DNS hostnames, and attach an Internet Gateway.
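
A minimal AWS CLI sketch of this step, wrapped in a function so the file can be sourced and the step run once. The tag value `flexprice-vpc` and the `VPC_ID`/`IGW_ID` shell variables are illustrative choices, not names Flexprice requires.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

AWS_REGION="${AWS_REGION:-us-east-1}"

create_flexprice_vpc() {
  # Create the /16 VPC from the configuration table and capture its ID.
  VPC_ID=$(aws ec2 create-vpc --region "$AWS_REGION" \
    --cidr-block 10.0.0.0/16 \
    --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=flexprice-vpc}]' \
    --query 'Vpc.VpcId' --output text)

  # DNS hostnames are disabled by default in non-default VPCs.
  aws ec2 modify-vpc-attribute --region "$AWS_REGION" \
    --vpc-id "$VPC_ID" --enable-dns-hostnames '{"Value":true}'

  # Create the Internet Gateway and attach it to the VPC.
  IGW_ID=$(aws ec2 create-internet-gateway --region "$AWS_REGION" \
    --query 'InternetGateway.InternetGatewayId' --output text)
  aws ec2 attach-internet-gateway --region "$AWS_REGION" \
    --internet-gateway-id "$IGW_ID" --vpc-id "$VPC_ID"

  echo "VPC_ID=${VPC_ID} IGW_ID=${IGW_ID}"
}
```

Source the file and call `create_flexprice_vpc`; keep the printed IDs for the following steps.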

### Create subnets

Create public and private subnets in two Availability Zones using the CIDRs in the VPC configuration table.
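
A sketch that creates the six subnets from the configuration table. It assumes `VPC_ID` holds the VPC ID and that the region exposes Availability Zones with the `a` and `b` suffixes (true for `us-east-1`). Subnet tag names and the `*_SUBNET_*` variables are hypothetical.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

AWS_REGION="${AWS_REGION:-us-east-1}"
AZ_A="${AWS_REGION}a"
AZ_B="${AWS_REGION}b"

# Create one tagged subnet in the Flexprice VPC and print its ID.
create_subnet() {
  local name="$1" cidr="$2" az="$3"
  aws ec2 create-subnet --region "$AWS_REGION" \
    --vpc-id "$VPC_ID" --cidr-block "$cidr" --availability-zone "$az" \
    --tag-specifications "ResourceType=subnet,Tags=[{Key=Name,Value=${name}}]" \
    --query 'Subnet.SubnetId' --output text
}

create_flexprice_subnets() {
  # Public: ALB and NAT Gateway.
  PUBLIC_SUBNET_A=$(create_subnet flexprice-public-a 10.0.1.0/24 "$AZ_A")
  PUBLIC_SUBNET_B=$(create_subnet flexprice-public-b 10.0.2.0/24 "$AZ_B")
  # Private compute: ECS tasks.
  COMPUTE_SUBNET_A=$(create_subnet flexprice-compute-a 10.0.10.0/24 "$AZ_A")
  COMPUTE_SUBNET_B=$(create_subnet flexprice-compute-b 10.0.20.0/24 "$AZ_B")
  # Private data: RDS, MSK, EKS.
  DATA_SUBNET_A=$(create_subnet flexprice-data-a 10.0.100.0/24 "$AZ_A")
  DATA_SUBNET_B=$(create_subnet flexprice-data-b 10.0.200.0/24 "$AZ_B")
}
```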

### Create NAT Gateway

Create an Elastic IP and NAT Gateway in a public subnet.
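
A sketch of this step that takes the public subnet ID as an argument (for example a hypothetical `PUBLIC_SUBNET_A` variable). Calling it once gives the single-NAT setup; calling it again with the second public subnet gives the HA option from the configuration table.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

AWS_REGION="${AWS_REGION:-us-east-1}"

create_nat_gateway() {
  local public_subnet_id="$1" alloc_id nat_id
  # Elastic IP that becomes the NAT Gateway's public address.
  alloc_id=$(aws ec2 allocate-address --region "$AWS_REGION" --domain vpc \
    --query 'AllocationId' --output text)
  # The NAT Gateway must sit in a public subnet to reach the Internet Gateway.
  nat_id=$(aws ec2 create-nat-gateway --region "$AWS_REGION" \
    --subnet-id "$public_subnet_id" --allocation-id "$alloc_id" \
    --query 'NatGateway.NatGatewayId' --output text)
  # Wait until the gateway is available before routing traffic to it.
  aws ec2 wait nat-gateway-available --region "$AWS_REGION" --nat-gateway-ids "$nat_id"
  echo "$nat_id"
}
```

Usage: `NAT_GW_ID=$(create_nat_gateway "$PUBLIC_SUBNET_A")`.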

### Create route tables

Create public and private route tables and associate subnets (public: default route to Internet Gateway; private: default route to NAT Gateway).
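
A sketch of the routing described above, assuming the IDs from the earlier steps are exported as `VPC_ID`, `IGW_ID`, `NAT_GW_ID`, and the `*_SUBNET_*` variables (all hypothetical names). With two NAT Gateways for HA, create one private route table per AZ, each pointing at the NAT Gateway in its own AZ.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

AWS_REGION="${AWS_REGION:-us-east-1}"

# Create a route table with a single default route. The target flag is either
# --gateway-id (Internet Gateway) or --nat-gateway-id (NAT Gateway).
create_route_table() {
  local target_flag="$1" target_id="$2" rt_id
  rt_id=$(aws ec2 create-route-table --region "$AWS_REGION" --vpc-id "$VPC_ID" \
    --query 'RouteTable.RouteTableId' --output text)
  aws ec2 create-route --region "$AWS_REGION" --route-table-id "$rt_id" \
    --destination-cidr-block 0.0.0.0/0 "$target_flag" "$target_id" >/dev/null
  echo "$rt_id"
}

# Associate every subnet passed after the route table ID.
associate_subnets() {
  local rt_id="$1" subnet
  shift
  for subnet in "$@"; do
    aws ec2 associate-route-table --region "$AWS_REGION" \
      --route-table-id "$rt_id" --subnet-id "$subnet" >/dev/null
  done
}

create_flexprice_route_tables() {
  PUBLIC_RT_ID=$(create_route_table --gateway-id "$IGW_ID")
  associate_subnets "$PUBLIC_RT_ID" "$PUBLIC_SUBNET_A" "$PUBLIC_SUBNET_B"

  PRIVATE_RT_ID=$(create_route_table --nat-gateway-id "$NAT_GW_ID")
  associate_subnets "$PRIVATE_RT_ID" \
    "$COMPUTE_SUBNET_A" "$COMPUTE_SUBNET_B" "$DATA_SUBNET_A" "$DATA_SUBNET_B"
}
```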

### Create security groups

Create security groups for ALB, ECS, RDS, MSK, and EKS. Use the rules in the summary table below.

### Security group rules summary

| Security Group     | Inbound | Source      | Port(s)          | Purpose                  |
| ------------------ | ------- | ----------- | ---------------- | ------------------------ |
| `flexprice-alb-sg` | HTTPS   | `0.0.0.0/0` | 443              | Public API access        |
| `flexprice-alb-sg` | HTTP    | `0.0.0.0/0` | 80               | Redirect to HTTPS        |
| `flexprice-ecs-sg` | TCP     | `alb-sg`    | 8080             | ALB to API               |
| `flexprice-ecs-sg` | TCP     | `ecs-sg`    | All              | Inter-task communication |
| `flexprice-rds-sg` | TCP     | `ecs-sg`    | 5432             | PostgreSQL access        |
| `flexprice-msk-sg` | TCP     | `ecs-sg`    | 9092, 9094, 9096 | Kafka access             |
| `flexprice-eks-sg` | TCP     | `ecs-sg`    | 9000, 8123       | ClickHouse access        |
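
The rules in the table above can be sketched with the AWS CLI as follows. It assumes `VPC_ID` is set; the group descriptions and the `*_SG` variables are hypothetical. The inter-task rule uses `IpProtocol=-1`, which means all protocols and ports.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

AWS_REGION="${AWS_REGION:-us-east-1}"

# Create a security group in the Flexprice VPC and print its ID.
create_sg() {
  aws ec2 create-security-group --region "$AWS_REGION" \
    --group-name "$1" --description "$2" --vpc-id "$VPC_ID" \
    --query 'GroupId' --output text
}

# Allow inbound TCP on each listed port from another security group.
allow_tcp_from_sg() {
  local group_id="$1" source_sg="$2" port
  shift 2
  for port in "$@"; do
    aws ec2 authorize-security-group-ingress --region "$AWS_REGION" \
      --group-id "$group_id" --protocol tcp --port "$port" \
      --source-group "$source_sg" >/dev/null
  done
}

create_flexprice_security_groups() {
  ALB_SG=$(create_sg flexprice-alb-sg "Flexprice ALB")
  ECS_SG=$(create_sg flexprice-ecs-sg "Flexprice ECS tasks")
  RDS_SG=$(create_sg flexprice-rds-sg "Flexprice PostgreSQL")
  MSK_SG=$(create_sg flexprice-msk-sg "Flexprice MSK")
  EKS_SG=$(create_sg flexprice-eks-sg "Flexprice EKS ClickHouse")

  # Public HTTPS, plus HTTP so the ALB can redirect it to HTTPS.
  local port
  for port in 443 80; do
    aws ec2 authorize-security-group-ingress --region "$AWS_REGION" \
      --group-id "$ALB_SG" --protocol tcp --port "$port" --cidr 0.0.0.0/0 >/dev/null
  done

  allow_tcp_from_sg "$ECS_SG" "$ALB_SG" 8080
  # Inter-task traffic: all protocols and ports from the ECS group itself.
  aws ec2 authorize-security-group-ingress --region "$AWS_REGION" --group-id "$ECS_SG" \
    --ip-permissions "IpProtocol=-1,UserIdGroupPairs=[{GroupId=${ECS_SG}}]" >/dev/null
  allow_tcp_from_sg "$RDS_SG" "$ECS_SG" 5432
  allow_tcp_from_sg "$MSK_SG" "$ECS_SG" 9092 9094 9096
  allow_tcp_from_sg "$EKS_SG" "$ECS_SG" 9000 8123
}
```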

<Tip>
  For production, consider restricting the ALB security group to only Cloudflare
  IP ranges if you're using Cloudflare for DNS and WAF.
</Tip>

***

## Step 2: IAM roles and policies

Create IAM roles for ECS task execution and task runtime permissions.

### ECS Task Execution Role

This role allows ECS to pull container images and write logs. Create the role and attach the managed policy `AmazonECSTaskExecutionRolePolicy` plus an inline policy for Secrets Manager access.

### ECS Task Role

This role grants permissions for the Flexprice application at runtime (S3, CloudWatch Logs, Secrets Manager, and DynamoDB for the events table). Create the task role and attach the inline policy.
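
A sketch of both roles from this step. The role and policy names, the `INVOICE_BUCKET` value, the `/ecs/flexprice/*` log group prefix, and the exact S3 and DynamoDB actions are assumptions; trim them to what your deployment actually calls. DynamoDB is included because the API publishes events there (see the data flow above). Only `AmazonECSTaskExecutionRolePolicy` is an AWS-provided name.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

AWS_REGION="${AWS_REGION:-us-east-1}"
INVOICE_BUCKET="${INVOICE_BUCKET:-flexprice-invoices-example}"   # hypothetical bucket name

# Both roles trust only the ECS tasks service principal.
ECS_TASKS_TRUST_POLICY='{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ecs-tasks.amazonaws.com"},"Action":"sts:AssumeRole"}]}'

# Read access to every secret under flexprice/ (secret ARNs carry a random suffix).
secrets_read_policy() {
  printf '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":"secretsmanager:GetSecretValue","Resource":"arn:aws:secretsmanager:%s:%s:secret:flexprice/*"}]}' \
    "$AWS_REGION" "$1"
}

# Runtime permissions for the Flexprice process (actions are assumptions).
task_runtime_policy() {
  local account_id="$1"
  cat <<EOF
{"Version":"2012-10-17","Statement":[
  {"Effect":"Allow","Action":["s3:GetObject","s3:PutObject"],
   "Resource":"arn:aws:s3:::${INVOICE_BUCKET}/*"},
  {"Effect":"Allow","Action":["logs:CreateLogStream","logs:PutLogEvents"],
   "Resource":"arn:aws:logs:${AWS_REGION}:${account_id}:log-group:/ecs/flexprice/*"},
  {"Effect":"Allow","Action":"secretsmanager:GetSecretValue",
   "Resource":"arn:aws:secretsmanager:${AWS_REGION}:${account_id}:secret:flexprice/*"},
  {"Effect":"Allow","Action":["dynamodb:PutItem","dynamodb:BatchWriteItem","dynamodb:GetItem","dynamodb:Query"],
   "Resource":"arn:aws:dynamodb:${AWS_REGION}:${account_id}:table/events"}
]}
EOF
}

create_flexprice_iam_roles() {
  local account_id
  account_id=$(aws sts get-caller-identity --query Account --output text)

  # Execution role: image pulls and log delivery (managed policy) plus secret injection.
  aws iam create-role --role-name flexprice-ecs-execution-role \
    --assume-role-policy-document "$ECS_TASKS_TRUST_POLICY" >/dev/null
  aws iam attach-role-policy --role-name flexprice-ecs-execution-role \
    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
  aws iam put-role-policy --role-name flexprice-ecs-execution-role \
    --policy-name flexprice-secrets-read --policy-document "$(secrets_read_policy "$account_id")"

  # Task role: what the application itself calls at runtime.
  aws iam create-role --role-name flexprice-ecs-task-role \
    --assume-role-policy-document "$ECS_TASKS_TRUST_POLICY" >/dev/null
  aws iam put-role-policy --role-name flexprice-ecs-task-role \
    --policy-name flexprice-runtime --policy-document "$(task_runtime_policy "$account_id")"
}
```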

***

## Step 3: Secrets Manager

Store sensitive configuration in AWS Secrets Manager.

### Create secrets

Create one secret per dependency, storing these values as needed:

* **postgres**: host, username, password, database
* **clickhouse**: username, password
* **kafka** (SASL): username, password
* **auth**: a 64-character hex secret
* **temporal**: API key, key name, namespace

<Warning>
  Replace placeholder values with strong, unique credentials. Use a password
  generator for production secrets.
</Warning>
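
A sketch that generates strong values instead of placeholders. The `flexprice/<env>/<name>` naming follows the Temporal example later in this guide; the JSON key names (`secret`, `username`, `password`) are assumptions and must match the keys your task definitions reference. The Postgres secret is left out because its host only exists after Step 4.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

AWS_REGION="${AWS_REGION:-us-east-1}"
ENV="${ENV:-prod}"

# N random bytes as lowercase hex (2N characters), like `openssl rand -hex N`.
random_hex() {
  od -v -An -N"$1" -tx1 /dev/urandom | tr -d ' \n'
}

# Flexprice auth secret: 32 bytes -> 64 hex characters.
AUTH_SECRET=$(random_hex 32)

# Store a username/password pair as a JSON secret under flexprice/<env>/<name>.
put_credentials() {
  local name="$1" username="$2" password="$3"
  aws secretsmanager create-secret --region "$AWS_REGION" \
    --name "flexprice/${ENV}/${name}" \
    --secret-string "{\"username\":\"${username}\",\"password\":\"${password}\"}" >/dev/null
}

create_flexprice_secrets() {
  aws secretsmanager create-secret --region "$AWS_REGION" \
    --name "flexprice/${ENV}/auth" \
    --secret-string "{\"secret\":\"${AUTH_SECRET}\"}" >/dev/null
  put_credentials clickhouse flexprice "$(random_hex 24)"
  put_credentials kafka flexprice "$(random_hex 24)"
}
```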

***

## Step 4: Aurora PostgreSQL

Create an Aurora PostgreSQL cluster for Flexprice's primary database. Aurora provides higher availability and performance compared to standard RDS.

<Tabs>
  <Tab title="Production (Aurora)">
    Create a DB subnet group, Aurora cluster (with Secrets Manager managed
    credentials), writer instance (db.r8g.xlarge), and reader instance in the
    other AZ. Retrieve the cluster writer and reader endpoints for application
    configuration.
  </Tab>

  <Tab title="Development (RDS)">
    For development, use standard RDS PostgreSQL (e.g. db.t3.small) via AWS
    Console, CLI, or IaC.
  </Tab>
</Tabs>
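
A sketch of the production Aurora setup with the AWS CLI, assuming `DATA_SUBNET_A`, `DATA_SUBNET_B`, and `RDS_SG` hold IDs from Step 1 (hypothetical variable names) and that the engine version from the summary table is offered in your region. `--manage-master-user-password` is what makes RDS keep the master password in Secrets Manager.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

AWS_REGION="${AWS_REGION:-us-east-1}"
CLUSTER_ID="flexprice-aurora"   # hypothetical identifier

create_flexprice_aurora() {
  aws rds create-db-subnet-group --region "$AWS_REGION" \
    --db-subnet-group-name flexprice-db-subnets \
    --db-subnet-group-description "Flexprice data subnets" \
    --subnet-ids "$DATA_SUBNET_A" "$DATA_SUBNET_B" >/dev/null

  # I/O-Optimized storage, encryption, 7-day backups, RDS-managed master password.
  aws rds create-db-cluster --region "$AWS_REGION" \
    --db-cluster-identifier "$CLUSTER_ID" \
    --engine aurora-postgresql --engine-version 17.4 \
    --master-username flexprice --manage-master-user-password \
    --database-name flexprice \
    --db-subnet-group-name flexprice-db-subnets \
    --vpc-security-group-ids "$RDS_SG" \
    --storage-type aurora-iopt1 \
    --storage-encrypted --backup-retention-period 7 >/dev/null

  # The first instance to come up becomes the writer; the second (other AZ) a reader.
  local pair name az
  for pair in "writer:${AWS_REGION}a" "reader:${AWS_REGION}b"; do
    name="${pair%%:*}"
    az="${pair#*:}"
    aws rds create-db-instance --region "$AWS_REGION" \
      --db-instance-identifier "${CLUSTER_ID}-${name}" \
      --db-cluster-identifier "$CLUSTER_ID" \
      --engine aurora-postgresql --db-instance-class db.r8g.xlarge \
      --availability-zone "$az" >/dev/null
    aws rds wait db-instance-available --region "$AWS_REGION" \
      --db-instance-identifier "${CLUSTER_ID}-${name}"
  done

  # Writer endpoint, reader endpoint, and the managed-password secret ARN.
  aws rds describe-db-clusters --region "$AWS_REGION" \
    --db-cluster-identifier "$CLUSTER_ID" \
    --query 'DBClusters[0].[Endpoint,ReaderEndpoint,MasterUserSecret.SecretArn]' \
    --output text
}
```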

### Aurora configuration summary

| Setting          | Development     | Production                          |
| ---------------- | --------------- | ----------------------------------- |
| Engine           | PostgreSQL 15.4 | Aurora PostgreSQL 17.4              |
| Instance class   | `db.t3.small`   | `db.r8g.xlarge` (4 vCPU, 32 GB)     |
| Instances        | 1 (Single-AZ)   | 2 (Writer + Reader, Multi-AZ)       |
| Storage          | 100 GB gp3      | Aurora I/O-Optimized (auto-scaling) |
| Multi-AZ         | No              | Yes (2 zones)                       |
| Encryption       | Enabled         | Enabled                             |
| Backup retention | 7 days          | 7 days                              |
| Monthly cost     | \~\$30          | \~\$650                             |

### Update Secrets Manager with Aurora endpoints

Update the postgres secret in Secrets Manager with the Aurora writer and reader endpoints and the managed master password ARN.

<Tip>
  Aurora with Secrets Manager managed credentials automatically rotates the
  master password. Use the `MasterUserSecret` ARN to retrieve the current
  password.
</Tip>

### Run database migrations

You can run migrations as a one-off ECS task or from a bastion host with network access to the database. Either register a migration task definition and run it once via ECS, or run `flexprice migrate up` from a host that can reach the database.
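
A sketch of the one-off ECS approach. It assumes a task definition named `flexprice-migration` whose container command already runs the migrations, and an EC2 cluster with a default capacity provider strategy (both hypothetical). On Fargate, add `--launch-type FARGATE` and a `--network-configuration` with the private subnets and ECS security group.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

AWS_REGION="${AWS_REGION:-us-east-1}"
ECS_CLUSTER="${ECS_CLUSTER:-flexprice-cluster}"   # hypothetical cluster name

run_flexprice_migrations() {
  local task_arn exit_code
  # No --launch-type: the cluster's default capacity provider strategy is used.
  task_arn=$(aws ecs run-task --region "$AWS_REGION" \
    --cluster "$ECS_CLUSTER" --task-definition flexprice-migration \
    --started-by flexprice-migrations \
    --query 'tasks[0].taskArn' --output text)

  # Block until the one-off task has exited.
  aws ecs wait tasks-stopped --region "$AWS_REGION" \
    --cluster "$ECS_CLUSTER" --tasks "$task_arn"

  exit_code=$(aws ecs describe-tasks --region "$AWS_REGION" \
    --cluster "$ECS_CLUSTER" --tasks "$task_arn" \
    --query 'tasks[0].containers[0].exitCode' --output text)

  if [ "$exit_code" != "0" ]; then
    echo "migration task ${task_arn} exited with ${exit_code}" >&2
    return 1
  fi
  echo "migrations applied"
}
```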

***

## Step 5: Amazon MSK (Kafka)

Create an Amazon MSK cluster for event streaming.

### Create MSK configuration

Create an MSK configuration (server properties) and register it.
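
A sketch of this step. The property values are assumptions suited to the 2-broker layout in this guide: with a replication factor of 2, `min.insync.replicas=1` keeps producers writing while one broker is down. AWS CLI v2 expects the properties file as a binary blob, hence `fileb://`.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

AWS_REGION="${AWS_REGION:-us-east-1}"

# Broker defaults (assumed values; adjust to your retention and durability needs).
MSK_SERVER_PROPERTIES='auto.create.topics.enable=false
default.replication.factor=2
min.insync.replicas=1
num.partitions=6
log.retention.hours=168'

create_flexprice_msk_configuration() {
  local props
  props=$(mktemp)
  printf '%s\n' "$MSK_SERVER_PROPERTIES" > "$props"
  # Prints the configuration ARN to pass when creating the cluster.
  aws kafka create-configuration --region "$AWS_REGION" \
    --name flexprice-msk-config \
    --server-properties "fileb://${props}" \
    --query 'Arn' --output text
  rm -f "$props"
}
```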

### Create MSK cluster

Create the MSK cluster with **2 brokers** (1 per AZ), **kafka.m5.large** (2 vCPU, 8 GB) instance type, and **1024 GB (1 TB) storage per broker**. Enable SASL/SCRAM, TLS, encryption at rest, and enhanced monitoring.

### Create SASL/SCRAM secret for MSK

Create a secret in Secrets Manager with the prefix `AmazonMSK_` and associate it with the MSK cluster.
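
A sketch of this step, assuming `MSK_CLUSTER_ARN` and `KAFKA_PASSWORD` are set (hypothetical variable names). MSK only accepts SCRAM secrets encrypted with a customer managed KMS key, not the default `aws/secretsmanager` key, so the sketch creates one first. The secret holds a `username`/`password` JSON pair.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

AWS_REGION="${AWS_REGION:-us-east-1}"
ENV="${ENV:-prod}"

# JSON shape MSK expects for a SCRAM user.
scram_secret_json() {
  printf '{"username":"%s","password":"%s"}' "$1" "$2"
}

associate_msk_scram_secret() {
  local kms_key_id secret_arn
  # Customer managed key: the default Secrets Manager key is rejected by MSK.
  kms_key_id=$(aws kms create-key --region "$AWS_REGION" \
    --description "Flexprice MSK SCRAM secret" \
    --query 'KeyMetadata.KeyId' --output text)

  # The secret name must start with AmazonMSK_.
  secret_arn=$(aws secretsmanager create-secret --region "$AWS_REGION" \
    --name "AmazonMSK_flexprice_${ENV}" \
    --kms-key-id "$kms_key_id" \
    --secret-string "$(scram_secret_json flexprice "$KAFKA_PASSWORD")" \
    --query 'ARN' --output text)

  aws kafka batch-associate-scram-secret --region "$AWS_REGION" \
    --cluster-arn "$MSK_CLUSTER_ARN" --secret-arn-list "$secret_arn"
}
```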

### Get MSK bootstrap brokers

Retrieve the SASL/SCRAM bootstrap broker string from the MSK cluster (AWS Console or CLI) for application configuration.

### Create Kafka topics

Use a bastion host or an EC2 instance with Kafka CLI tools to create the `events` and `events-dlq` topics (e.g. 6 partitions, replication factor 2). Use SASL\_SSL and SCRAM-SHA-512 in client configuration.
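
A sketch to run on that bastion host. It assumes the Apache Kafka CLI tools are installed under `KAFKA_BIN` and that `MSK_CLUSTER_ARN`, `KAFKA_USER`, and `KAFKA_PASSWORD` are set (hypothetical variable names). The bootstrap string comes from `aws kafka get-bootstrap-brokers`, which also covers the previous subsection.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

AWS_REGION="${AWS_REGION:-us-east-1}"
KAFKA_BIN="${KAFKA_BIN:-/opt/kafka/bin}"   # assumed install location of kafka-topics.sh

# Client settings for SASL/SCRAM over TLS.
write_client_properties() {
  cat > "$1" <<EOF
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="${KAFKA_USER}" password="${KAFKA_PASSWORD}";
EOF
}

create_flexprice_topics() {
  local brokers props topic
  brokers=$(aws kafka get-bootstrap-brokers --region "$AWS_REGION" \
    --cluster-arn "$MSK_CLUSTER_ARN" \
    --query 'BootstrapBrokerStringSaslScram' --output text)

  props=$(mktemp)
  write_client_properties "$props"
  for topic in events events-dlq; do
    "${KAFKA_BIN}/kafka-topics.sh" --bootstrap-server "$brokers" \
      --command-config "$props" --create --if-not-exists \
      --topic "$topic" --partitions 6 --replication-factor 2
  done
  rm -f "$props"
}
```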

### MSK configuration summary

| Setting            | Development      | Production                            |
| ------------------ | ---------------- | ------------------------------------- |
| Kafka version      | 3.5.1            | 3.8.1                                 |
| Broker type        | `kafka.t3.small` | `kafka.m5.large` (2 vCPU, 8 GB)       |
| Number of brokers  | 2                | 2 (1 per AZ)                          |
| Storage per broker | 100 GB           | 1024 GB (1 TB)                        |
| Authentication     | SASL/SCRAM       | SASL/SCRAM + IAM                      |
| Encryption         | TLS in transit   | TLS in transit + at rest              |
| Monitoring         | Basic            | Enhanced partition-level + Prometheus |
| Monthly cost       | \~\$90           | \~\$350                               |

<Tip>
  For development, use `kafka.t3.small` with 100 GB storage. For production
  (100M+ events/month), use **2 brokers**, **kafka.m5.large**, and **1 TB
  storage per broker**.
</Tip>

***

## Step 6: EKS with ClickHouse

Create an EKS cluster and deploy ClickHouse for analytics storage. For production (100M+ events/month), use **m5.8xlarge** nodes for the ClickHouse node group.

### Create EKS cluster with eksctl

Create an EKS cluster with a managed node group (m5.8xlarge for production) via eksctl or IaC. Use private subnets and attach the EKS security group.
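
A sketch with eksctl, assuming the data subnet IDs are in `DATA_SUBNET_A`/`DATA_SUBNET_B` (hypothetical names). The cluster name, node group name, and scaling bounds are illustrative. Attaching `flexprice-eks-sg` to the nodes is easiest through an eksctl config file rather than these flags.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

AWS_REGION="${AWS_REGION:-us-east-1}"
EKS_CLUSTER="${EKS_CLUSTER:-flexprice-eks}"   # hypothetical cluster name

create_flexprice_eks() {
  # Control plane in the existing VPC's private data subnets, no default node group.
  eksctl create cluster \
    --name "$EKS_CLUSTER" \
    --region "$AWS_REGION" \
    --vpc-private-subnets "${DATA_SUBNET_A},${DATA_SUBNET_B}" \
    --without-nodegroup

  # Managed node group for ClickHouse, private networking only.
  eksctl create nodegroup \
    --cluster "$EKS_CLUSTER" \
    --region "$AWS_REGION" \
    --name clickhouse \
    --node-type m5.8xlarge \
    --nodes 1 --nodes-min 1 --nodes-max 2 \
    --node-private-networking \
    --managed
}
```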

### Create gp3 StorageClass

Create a gp3 StorageClass (EBS CSI driver, encrypted, Retain, WaitForFirstConsumer) via kubectl or IaC.
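
The StorageClass described above, applied through kubectl. The `ebs.csi.aws.com` provisioner only works once the Amazon EBS CSI driver add-on (with its IAM permissions) is installed in the cluster. The class name `gp3` and `allowVolumeExpansion` are choices, not requirements.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

# Encrypted gp3 volumes, kept on PVC deletion, provisioned in the pod's AZ.
GP3_STORAGE_CLASS=$(cat <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
EOF
)

apply_gp3_storage_class() {
  kubectl apply -f - <<<"$GP3_STORAGE_CLASS"
}
```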

### Create ClickHouse namespace and secrets

Create the `clickhouse` namespace and a Kubernetes secret with credentials from Secrets Manager via kubectl or IaC.

### Deploy ClickHouse with Helm

Add the Altinity ClickHouse Helm repo and install the ClickHouse Operator in the `clickhouse` namespace via Helm.

### Create ClickHouse cluster

Deploy a ClickHouseInstallation (Altinity operator) with the credentials secret, gp3 storage, and appropriate resources via kubectl or Helm.

### Create ClickHouse service for ECS access

Create a ClusterIP Service for ClickHouse (ports 9000, 8123) targeting the ClickHouse installation via kubectl or IaC.

### Get ClickHouse endpoint

For ECS tasks to access ClickHouse, you have several options:

1. **Internal NLB** (recommended): Create an internal Network Load Balancer pointing to the ClickHouse service
2. **VPC peering/Transit Gateway**: If ECS and EKS are in separate VPCs
3. **AWS PrivateLink**: For cross-account access

Create the internal NLB (type LoadBalancer with internal annotation) and use its DNS name as the ClickHouse endpoint (port 9000) for ECS configuration.
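
A sketch of the internal NLB Service using the in-tree AWS cloud provider annotations. The Service name and the `clickhouse.altinity.com/chi: flexprice` selector are assumptions (the value must match your ClickHouseInstallation name; check with `kubectl get pods -n clickhouse --show-labels`). If the AWS Load Balancer Controller is installed, use its `aws-load-balancer-scheme: internal` annotation instead.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

CLICKHOUSE_NLB_SERVICE=$(cat <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: clickhouse-nlb
  namespace: clickhouse
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    clickhouse.altinity.com/chi: flexprice
  ports:
    - name: native
      port: 9000
      targetPort: 9000
    - name: http
      port: 8123
      targetPort: 8123
EOF
)

expose_clickhouse_nlb() {
  kubectl apply -f - <<<"$CLICKHOUSE_NLB_SERVICE"
  # Empty until AWS has provisioned the NLB, which can take a few minutes.
  kubectl get service clickhouse-nlb -n clickhouse \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
}
```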

### Initialize ClickHouse database

Connect to ClickHouse (e.g. via port-forward or the NLB) and create the `flexprice` database using clickhouse-client.
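
A sketch of the port-forward route, assuming a ClickHouse Service named by `CLICKHOUSE_SVC` (hypothetical) in the `clickhouse` namespace, `clickhouse-client` installed locally, and the password in `CLICKHOUSE_PASSWORD`. The tunnel is closed even if the query fails.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

CLICKHOUSE_SVC="${CLICKHOUSE_SVC:-clickhouse}"   # hypothetical Service name

init_clickhouse_database() {
  local pf_pid rc=0
  # Tunnel the native protocol port to localhost in the background.
  kubectl port-forward -n clickhouse "svc/${CLICKHOUSE_SVC}" 9000:9000 >/dev/null &
  pf_pid=$!
  sleep 3   # give the tunnel a moment to establish

  clickhouse-client --host 127.0.0.1 --port 9000 \
    --user flexprice --password "$CLICKHOUSE_PASSWORD" \
    --query "CREATE DATABASE IF NOT EXISTS flexprice" || rc=$?

  kill "$pf_pid"
  return "$rc"
}
```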

***

## Step 7: ElastiCache Redis

Create an ElastiCache Redis cluster for caching and session management.

### Create Redis subnet group

Create a cache subnet group in the data subnets.

### Create Redis security group

Create a security group for Redis allowing TCP 6379 from the ECS security group.

### Create Redis replication group (cluster mode)

<Tabs>
  <Tab title="Production (Cluster Mode)">
    Create a Redis replication group with cache.r6g.large, cluster mode,
    multi-node (e.g. multiple node groups for \~\$600/month), TLS and at-rest
    encryption, and multi-AZ.
  </Tab>

  <Tab title="Development (Single Node)">
    Create a single-node Redis cluster (cache.t3.micro).
  </Tab>
</Tabs>
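
A sketch of the production path covering the subnet group, security group, and replication group. It assumes `VPC_ID`, `ECS_SG`, and the data subnet IDs are set (hypothetical variable names). Two shards with one replica each gives four `cache.r6g.large` nodes spread across two AZs; the shard count is an assumption to tune.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

AWS_REGION="${AWS_REGION:-us-east-1}"

create_flexprice_redis() {
  aws elasticache create-cache-subnet-group --region "$AWS_REGION" \
    --cache-subnet-group-name flexprice-redis-subnets \
    --cache-subnet-group-description "Flexprice Redis data subnets" \
    --subnet-ids "$DATA_SUBNET_A" "$DATA_SUBNET_B" >/dev/null

  # Redis only accepts connections from the ECS security group.
  REDIS_SG=$(aws ec2 create-security-group --region "$AWS_REGION" \
    --group-name flexprice-redis-sg --description "Flexprice Redis" \
    --vpc-id "$VPC_ID" --query 'GroupId' --output text)
  aws ec2 authorize-security-group-ingress --region "$AWS_REGION" \
    --group-id "$REDIS_SG" --protocol tcp --port 6379 --source-group "$ECS_SG" >/dev/null

  # Cluster mode: 2 shards x (1 primary + 1 replica), TLS and at-rest encryption, Multi-AZ.
  aws elasticache create-replication-group --region "$AWS_REGION" \
    --replication-group-id flexprice-redis \
    --replication-group-description "Flexprice cache (cluster mode)" \
    --engine redis --engine-version 7.1 \
    --cache-node-type cache.r6g.large \
    --cache-parameter-group-name default.redis7.cluster.on \
    --num-node-groups 2 --replicas-per-node-group 1 \
    --automatic-failover-enabled --multi-az-enabled \
    --transit-encryption-enabled --at-rest-encryption-enabled \
    --cache-subnet-group-name flexprice-redis-subnets \
    --security-group-ids "$REDIS_SG" >/dev/null

  aws elasticache wait replication-group-available --region "$AWS_REGION" \
    --replication-group-id flexprice-redis

  # Cluster-mode clients connect via the configuration endpoint.
  aws elasticache describe-replication-groups --region "$AWS_REGION" \
    --replication-group-id flexprice-redis \
    --query 'ReplicationGroups[0].ConfigurationEndpoint.Address' --output text
}
```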

### Redis configuration summary

| Setting      | Development      | Production (multi-node cluster)   |
| ------------ | ---------------- | --------------------------------- |
| Node type    | `cache.t3.micro` | `cache.r6g.large` (2 vCPU, 13 GB) |
| Cluster mode | Disabled         | Enabled                           |
| Replicas     | 0                | 1 per shard                       |
| Multi-AZ     | No               | Yes                               |
| Encryption   | Optional         | TLS in transit + at rest          |
| Monthly cost | \~\$15           | \~\$600                           |

***

## Step 8: DynamoDB

Create a DynamoDB table for durable event storage alongside ClickHouse.

### Create events table

Create a DynamoDB table named `events` with partition key `pk` (String) and sort key `sk` (String), on-demand billing.

### Enable Point-in-Time Recovery

Enable point-in-time recovery (continuous backups) on the events table.
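
Both subsections above as one AWS CLI sketch. The table name and key schema come from this step; PITR can only be enabled once the table is active, so the function waits in between.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

AWS_REGION="${AWS_REGION:-us-east-1}"

create_flexprice_events_table() {
  # pk (partition) and sk (sort), both strings, on-demand billing.
  aws dynamodb create-table --region "$AWS_REGION" \
    --table-name events \
    --attribute-definitions AttributeName=pk,AttributeType=S AttributeName=sk,AttributeType=S \
    --key-schema AttributeName=pk,KeyType=HASH AttributeName=sk,KeyType=RANGE \
    --billing-mode PAY_PER_REQUEST >/dev/null

  aws dynamodb wait table-exists --region "$AWS_REGION" --table-name events

  # Continuous backups with point-in-time recovery.
  aws dynamodb update-continuous-backups --region "$AWS_REGION" --table-name events \
    --point-in-time-recovery-specification PointInTimeRecoveryEnabled=true >/dev/null
}
```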

### DynamoDB configuration summary

| Setting       | Value         | Notes                        |
| ------------- | ------------- | ---------------------------- |
| Billing mode  | On-demand     | Pay per request, auto-scales |
| Partition key | `pk` (String) | Tenant/customer ID           |
| Sort key      | `sk` (String) | Event timestamp              |
| PITR          | Enabled       | Point-in-time recovery       |
| Encryption    | AWS managed   | Default encryption           |
| Monthly cost  | \~\$50        | For \~100M events/month      |

<Info>
  DynamoDB is used alongside ClickHouse for durable event storage. Events are
  written to both DynamoDB (for durability) and ClickHouse (for analytics).
</Info>

***

## Step 9: S3 and CloudWatch

### Create S3 bucket for invoices

Create an S3 bucket for invoice PDFs with versioning, AES256 encryption, block public access, and optional lifecycle rules (e.g. transition to STANDARD\_IA after 90 days).
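
A sketch of the bucket settings listed above; the bucket name is passed in because S3 names are globally unique. The lifecycle rule ID is illustrative. Note that `us-east-1` rejects an explicit `LocationConstraint`, while every other region requires one.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

AWS_REGION="${AWS_REGION:-us-east-1}"

create_invoice_bucket() {
  local bucket="$1"
  if [ "$AWS_REGION" = "us-east-1" ]; then
    aws s3api create-bucket --bucket "$bucket" --region "$AWS_REGION" >/dev/null
  else
    aws s3api create-bucket --bucket "$bucket" --region "$AWS_REGION" \
      --create-bucket-configuration "LocationConstraint=${AWS_REGION}" >/dev/null
  fi

  aws s3api put-bucket-versioning --bucket "$bucket" \
    --versioning-configuration Status=Enabled
  aws s3api put-bucket-encryption --bucket "$bucket" --server-side-encryption-configuration \
    '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
  aws s3api put-public-access-block --bucket "$bucket" --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
  # Optional: move invoices to Standard-IA after 90 days.
  aws s3api put-bucket-lifecycle-configuration --bucket "$bucket" --lifecycle-configuration \
    '{"Rules":[{"ID":"invoices-to-ia","Status":"Enabled","Filter":{"Prefix":""},"Transitions":[{"Days":90,"StorageClass":"STANDARD_IA"}]}]}'
}
```

Usage: `create_invoice_bucket flexprice-invoices-<your-suffix>`.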

### Create CloudWatch log groups

Create log groups for ECS services (api, worker, temporal-worker, migration) with a retention policy (e.g. 30 days).
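
A sketch of this step. The `/ecs/flexprice/<service>` naming is an assumption; whatever you choose must match the `awslogs-group` option in each task definition.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

AWS_REGION="${AWS_REGION:-us-east-1}"

# Hypothetical naming scheme for the per-service log groups.
log_group_name() { printf '/ecs/flexprice/%s' "$1"; }

create_flexprice_log_groups() {
  local svc
  for svc in api worker temporal-worker migration; do
    aws logs create-log-group --region "$AWS_REGION" \
      --log-group-name "$(log_group_name "$svc")"
    # Expire log events after 30 days.
    aws logs put-retention-policy --region "$AWS_REGION" \
      --log-group-name "$(log_group_name "$svc")" --retention-in-days 30
  done
}
```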

### Create CloudWatch alarms

Create alarms for ECS API CPU, RDS CPU, and RDS connections (e.g. 2 evaluation periods, a threshold of 80% for CPU, and a connection count set relative to the instance's `max_connections`) and send them to an SNS topic for alerts.
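
A sketch of the three alarms with `put-metric-alarm`. The cluster, service, and Aurora identifiers and the SNS topic name are hypothetical; `DB_CONNECTIONS_THRESHOLD` has no default on purpose, since the right value depends on your instance's `max_connections`. Subscribe an email or chat endpoint to the topic separately.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

AWS_REGION="${AWS_REGION:-us-east-1}"
ECS_CLUSTER="${ECS_CLUSTER:-flexprice-cluster}"   # hypothetical
DB_CLUSTER_ID="${DB_CLUSTER_ID:-flexprice-aurora}" # hypothetical

# 5-minute average, alarm after 2 consecutive breaches; remaining args are dimensions.
alarm() {
  local name="$1" namespace="$2" metric="$3" threshold="$4"
  shift 4
  aws cloudwatch put-metric-alarm --region "$AWS_REGION" \
    --alarm-name "$name" --namespace "$namespace" --metric-name "$metric" \
    --statistic Average --period 300 --evaluation-periods 2 \
    --threshold "$threshold" --comparison-operator GreaterThanThreshold \
    --alarm-actions "$ALERTS_TOPIC_ARN" --dimensions "$@"
}

create_flexprice_alarms() {
  ALERTS_TOPIC_ARN=$(aws sns create-topic --region "$AWS_REGION" \
    --name flexprice-alerts --query 'TopicArn' --output text)

  alarm flexprice-api-cpu-high AWS/ECS CPUUtilization 80 \
    "Name=ClusterName,Value=${ECS_CLUSTER}" "Name=ServiceName,Value=flexprice-api"
  alarm flexprice-db-cpu-high AWS/RDS CPUUtilization 80 \
    "Name=DBClusterIdentifier,Value=${DB_CLUSTER_ID}"
  alarm flexprice-db-connections-high AWS/RDS DatabaseConnections \
    "${DB_CONNECTIONS_THRESHOLD:?set relative to max_connections}" \
    "Name=DBClusterIdentifier,Value=${DB_CLUSTER_ID}"
}
```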

***

## Step 10: ECR and container images

### Create ECR repositories

Create ECR repositories for api, worker, and temporal-worker with scan-on-push and AES256 encryption.

### Build and push images

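Build the Flexprice container images (api, worker, temporal-worker) for `linux/arm64` when targeting the Graviton (m6g) ECS capacity (Fargate tasks default to x86\_64 unless the task definition sets an ARM64 runtime platform), authenticate Docker to ECR, then tag and push the images to your ECR repositories.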
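
A sketch covering both subsections of this step. It assumes one Dockerfile at the repository root builds a single image whose role is chosen at runtime by `FLEXPRICE_DEPLOYMENT_MODE`, so the same ARM64 build is pushed to all three repositories; the repository names are hypothetical. Cross-building ARM64 on an x86 machine needs Docker Buildx with QEMU emulation.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

AWS_REGION="${AWS_REGION:-us-east-1}"
IMAGE_TAG="${IMAGE_TAG:-latest}"
REPOS=(flexprice-api flexprice-worker flexprice-temporal-worker)   # hypothetical names

ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com' "$1" "$AWS_REGION"
}

push_flexprice_images() {
  local account_id registry repo
  local tags=()
  account_id=$(aws sts get-caller-identity --query Account --output text)
  registry=$(ecr_registry "$account_id")

  for repo in "${REPOS[@]}"; do
    # Create each repository only if it does not exist yet.
    aws ecr describe-repositories --region "$AWS_REGION" --repository-names "$repo" >/dev/null 2>&1 ||
      aws ecr create-repository --region "$AWS_REGION" --repository-name "$repo" \
        --image-scanning-configuration scanOnPush=true \
        --encryption-configuration encryptionType=AES256 >/dev/null
    tags+=(-t "${registry}/${repo}:${IMAGE_TAG}")
  done

  aws ecr get-login-password --region "$AWS_REGION" |
    docker login --username AWS --password-stdin "$registry"

  # One ARM64 build, pushed under every repository tag (run from the repo root).
  docker buildx build --platform linux/arm64 "${tags[@]}" --push .
}
```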

***

## Step 11: ECS cluster and services

### Create ECS cluster

<Tabs>
  <Tab title="Production (EC2/ARM64)">
    For production (100M+ events/month), create an ECS cluster with EC2
    capacity: launch template with **m6g.xlarge** (ARM64/Graviton), Auto Scaling
    Group with **10 nodes** (min/max as needed), capacity provider with managed
    scaling, and associate with the cluster.
  </Tab>

  <Tab title="Development (Fargate)">
    For development, create an ECS cluster with Fargate and FARGATE\_SPOT
    capacity providers.
  </Tab>
</Tabs>

### Create API task definition

Register an ECS task definition for the API service:

* **Production** (EC2/ARM64): 768 CPU units (0.75 vCPU) and 1536 MiB memory, `bridge` network mode.
* **Development** (Fargate): 512 CPU units (0.5 vCPU) and 1024 MiB memory, matching the sizing table.

Inject secrets from Secrets Manager (auth, postgres, clickhouse, kafka, temporal), set `FLEXPRICE_DEPLOYMENT_MODE=api`, add a health check on `:8080/health`, and send logs to the API CloudWatch log group. See Step 12 for the full environment variable reference.

### Create Worker task definition

Register an ECS task definition for the Consumer (worker) service: FLEXPRICE\_DEPLOYMENT\_MODE=consumer, postgres/clickhouse/kafka secrets from Secrets Manager. For production use **30 tasks** (100M events/month). See Step 12 for environment variables.

### Create Temporal Worker task definition

Register an ECS task definition for the Temporal Worker: FLEXPRICE\_DEPLOYMENT\_MODE=temporal\_worker, postgres/clickhouse/kafka/temporal secrets. See Step 12 for environment variables.

### Create Application Load Balancer

Create an internet-facing Application Load Balancer in the public subnets, a target group (HTTP 8080, health check /health), an HTTPS listener with an ACM certificate, and an HTTP listener that redirects to HTTPS.

### Create ECS services

Create ECS services for API (desired count **6** for production), Worker/Consumer (desired count **30** for production), and Temporal Worker (e.g. 3 tasks). Attach the API service to the ALB target group. Use private subnets and the ECS security group.
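
A sketch of the three services. The cluster, service, task definition, and container names and `API_TARGET_GROUP_ARN` are hypothetical. With no `--launch-type`, ECS uses the cluster's default capacity provider strategy. Tasks in `bridge` mode inherit the EC2 instance's subnet and security group, so no `--network-configuration` is passed; for Fargate (`awsvpc`), add one with the private subnets and ECS security group.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

AWS_REGION="${AWS_REGION:-us-east-1}"
ECS_CLUSTER="${ECS_CLUSTER:-flexprice-cluster}"   # hypothetical

# Create one service; any extra arguments go straight to create-service.
create_service() {
  local name="$1" task_def="$2" count="$3"
  shift 3
  aws ecs create-service --region "$AWS_REGION" \
    --cluster "$ECS_CLUSTER" --service-name "$name" \
    --task-definition "$task_def" --desired-count "$count" "$@" >/dev/null
}

create_flexprice_services() {
  # Only the API is registered with the ALB target group.
  create_service flexprice-api flexprice-api 6 \
    --load-balancers "targetGroupArn=${API_TARGET_GROUP_ARN},containerName=api,containerPort=8080" \
    --health-check-grace-period-seconds 60
  create_service flexprice-worker flexprice-worker 30
  create_service flexprice-temporal-worker flexprice-temporal-worker 3
}
```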

### Configure Auto Scaling

Register scalable targets and target-tracking scaling policies for the API (and optionally Worker) services (e.g. min/max desired count, CPU target 70%).
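
A sketch using Application Auto Scaling with target tracking on average service CPU. The cooldown values and the policy name are assumptions: scaling out quickly and in slowly avoids flapping under bursty event traffic.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

AWS_REGION="${AWS_REGION:-us-east-1}"
ECS_CLUSTER="${ECS_CLUSTER:-flexprice-cluster}"   # hypothetical

# Target-tracking configuration on ECSServiceAverageCPUUtilization.
cpu_target_tracking_config() {
  printf '{"TargetValue":%s,"PredefinedMetricSpecification":{"PredefinedMetricType":"ECSServiceAverageCPUUtilization"},"ScaleOutCooldown":60,"ScaleInCooldown":300}' "$1"
}

enable_service_autoscaling() {
  local service="$1" min="$2" max="$3"
  local resource_id="service/${ECS_CLUSTER}/${service}"
  aws application-autoscaling register-scalable-target --region "$AWS_REGION" \
    --service-namespace ecs --resource-id "$resource_id" \
    --scalable-dimension ecs:service:DesiredCount \
    --min-capacity "$min" --max-capacity "$max" >/dev/null
  aws application-autoscaling put-scaling-policy --region "$AWS_REGION" \
    --service-namespace ecs --resource-id "$resource_id" \
    --scalable-dimension ecs:service:DesiredCount \
    --policy-name "${service}-cpu-target" --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration "$(cpu_target_tracking_config 70.0)" >/dev/null
}
```

Usage: `enable_service_autoscaling flexprice-api 6 12` (the maximum here is an illustrative value).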

***

## Step 12: Environment variables reference

Below is a complete reference of environment variables for each service. Variables whose **Source** is `Secret` should be stored in AWS Secrets Manager and injected through the task definition's `secrets` field.

### API service

| Variable                         | Value                     | Source      |
| -------------------------------- | ------------------------- | ----------- |
| `FLEXPRICE_DEPLOYMENT_MODE`      | `api`                     | Environment |
| `FLEXPRICE_SERVER_ADDRESS`       | `:8080`                   | Environment |
| `FLEXPRICE_AUTH_SECRET`          | 64-char hex               | Secret      |
| `FLEXPRICE_POSTGRES_HOST`        | Aurora writer (or RDS) endpoint | Secret      |
| `FLEXPRICE_POSTGRES_PORT`        | `5432`                    | Environment |
| `FLEXPRICE_POSTGRES_USER`        | `flexprice`               | Secret      |
| `FLEXPRICE_POSTGRES_PASSWORD`    | DB password               | Secret      |
| `FLEXPRICE_POSTGRES_DBNAME`      | `flexprice`               | Environment |
| `FLEXPRICE_POSTGRES_SSLMODE`     | `require`                 | Environment |
| `FLEXPRICE_CLICKHOUSE_ADDRESS`   | ClickHouse NLB endpoint   | Environment |
| `FLEXPRICE_CLICKHOUSE_USERNAME`  | `flexprice`               | Secret      |
| `FLEXPRICE_CLICKHOUSE_PASSWORD`  | ClickHouse password       | Secret      |
| `FLEXPRICE_CLICKHOUSE_DATABASE`  | `flexprice`               | Environment |
| `FLEXPRICE_CLICKHOUSE_TLS`       | `false`                   | Environment |
| `FLEXPRICE_KAFKA_BROKERS`        | MSK bootstrap brokers     | Environment |
| `FLEXPRICE_KAFKA_USE_SASL`       | `true`                    | Environment |
| `FLEXPRICE_KAFKA_SASL_MECHANISM` | `SCRAM-SHA-512`           | Environment |
| `FLEXPRICE_KAFKA_SASL_USER`      | `flexprice`               | Secret      |
| `FLEXPRICE_KAFKA_SASL_PASSWORD`  | Kafka password            | Secret      |
| `FLEXPRICE_KAFKA_TOPIC`          | `events`                  | Environment |
| `FLEXPRICE_KAFKA_CONSUMER_GROUP` | `flexprice-consumer-prod` | Environment |
| `FLEXPRICE_TEMPORAL_ADDRESS`     | Temporal Cloud endpoint   | Environment |
| `FLEXPRICE_TEMPORAL_TLS`         | `true`                    | Environment |
| `FLEXPRICE_TEMPORAL_NAMESPACE`   | Your namespace            | Environment |
| `FLEXPRICE_TEMPORAL_TASK_QUEUE`  | `billing-task-queue`      | Environment |
| `FLEXPRICE_TEMPORAL_API_KEY`     | Temporal API key          | Secret      |
| `FLEXPRICE_LOGGING_LEVEL`        | `info`                    | Environment |

### Worker and Temporal Worker services

Worker and Temporal Worker use the same variables as API, with `FLEXPRICE_DEPLOYMENT_MODE` set to `consumer` or `temporal_worker` respectively; omit `FLEXPRICE_SERVER_ADDRESS` for both.

### Additional environment variables (Production)

These variables are used in production deployments:

| Variable                              | Description                | Example                                |
| ------------------------------------- | -------------------------- | -------------------------------------- |
| `FLEXPRICE_DYNAMODB_IN_USE`           | Enable DynamoDB for events | `true`                                 |
| `FLEXPRICE_DYNAMODB_REGION`           | AWS region for DynamoDB    | `us-east-1`                            |
| `FLEXPRICE_DYNAMODB_EVENT_TABLE_NAME` | DynamoDB table name        | `events`                               |
| `FLEXPRICE_REDIS_HOST`                | ElastiCache Redis endpoint | `clustercfg.xxx.cache.amazonaws.com`   |
| `FLEXPRICE_REDIS_PORT`                | Redis port                 | `6379`                                 |
| `FLEXPRICE_REDIS_CLUSTER_MODE`        | Enable cluster mode        | `true`                                 |
| `FLEXPRICE_REDIS_USE_TLS`             | Enable TLS                 | `true`                                 |
| `FLEXPRICE_REDIS_KEY_PREFIX`          | Key prefix                 | `flexprice:prod`                       |
| `FLEXPRICE_EVENT_PUBLISH_DESTINATION` | Where to publish events    | `all` (Kafka + DynamoDB)               |
| `FLEXPRICE_LOGGING_FORMAT`            | Log format                 | `json`                                 |
| `FLEXPRICE_POSTGRES_READER_HOST`      | Aurora reader endpoint     | `xxx.cluster-ro-xxx.rds.amazonaws.com` |

***

## Step 13: Temporal Cloud configuration

Temporal Cloud is the recommended workflow orchestration service for production deployments.

### Sign up for Temporal Cloud

1. Go to [temporal.io/cloud](https://temporal.io/cloud)
2. Create an account and organization
3. Create a namespace (e.g., `flexprice-prod-usa`)

### Create service account and API key

1. In Temporal Cloud console, go to **Settings** > **API Keys**
2. Create a new API key with appropriate permissions
3. Note the API key and key name

### Store Temporal credentials

```bash theme={null}
aws secretsmanager create-secret \
  --name flexprice/${ENV}/temporal \
  --description "Flexprice Temporal Cloud credentials" \
  --secret-string '{
    "address": "us-west-2.aws.api.temporal.io:7233",
    "namespace": "your-namespace.your-account-id",
    "api_key": "YOUR_TEMPORAL_API_KEY",
    "api_key_name": "your-service-account-name"
  }'
```

### Temporal environment variables

| Variable                          | Value                                | Description             |
| --------------------------------- | ------------------------------------ | ----------------------- |
| `FLEXPRICE_TEMPORAL_ADDRESS`      | `us-west-2.aws.api.temporal.io:7233` | Temporal Cloud endpoint |
| `FLEXPRICE_TEMPORAL_NAMESPACE`    | `your-namespace.account-id`          | Your namespace          |
| `FLEXPRICE_TEMPORAL_TLS`          | `true`                               | TLS is required         |
| `FLEXPRICE_TEMPORAL_TASK_QUEUE`   | `billing-task-queue`                 | Task queue name         |
| `FLEXPRICE_TEMPORAL_API_KEY`      | (from Secrets Manager)               | API key                 |
| `FLEXPRICE_TEMPORAL_API_KEY_NAME` | Service account name                 | Key identifier          |

<Info>
  Temporal Cloud provides managed infrastructure, automatic upgrades, and 99.99%
  SLA. For self-hosted Temporal, refer to the [Temporal
  documentation](https://docs.temporal.io/self-hosted-guide).
</Info>

***

## Step 14: Third-party integrations (Optional)

Configure optional third-party services for enhanced functionality.

### Supabase (Authentication)

If using Supabase for authentication:

```bash theme={null}
aws secretsmanager create-secret \
  --name flexprice/${ENV}/supabase \
  --secret-string '{
    "base_url": "https://your-project.supabase.co",
    "service_key": "YOUR_SUPABASE_SERVICE_KEY"
  }'
```

| Variable                              | Value                |
| ------------------------------------- | -------------------- |
| `FLEXPRICE_AUTH_PROVIDER`             | `supabase`           |
| `FLEXPRICE_AUTH_SUPABASE_BASE_URL`    | Supabase project URL |
| `FLEXPRICE_AUTH_SUPABASE_SERVICE_KEY` | Service role key     |

### Svix (Webhooks)

For webhook delivery via Svix:

```bash theme={null}
aws secretsmanager create-secret \
  --name flexprice/${ENV}/svix \
  --secret-string '{
    "auth_token": "YOUR_SVIX_AUTH_TOKEN",
    "base_url": "https://api.us.svix.com"
  }'
```

| Variable                                   | Value                     |
| ------------------------------------------ | ------------------------- |
| `FLEXPRICE_WEBHOOK_SVIX_CONFIG_ENABLED`    | `true`                    |
| `FLEXPRICE_WEBHOOK_SVIX_CONFIG_AUTH_TOKEN` | Svix auth token           |
| `FLEXPRICE_WEBHOOK_SVIX_CONFIG_BASE_URL`   | `https://api.us.svix.com` |

### Sentry (Error Tracking)

For error tracking with Sentry:

| Variable                       | Value               |
| ------------------------------ | ------------------- |
| `FLEXPRICE_SENTRY_ENABLED`     | `true`              |
| `FLEXPRICE_SENTRY_DSN`         | Your Sentry DSN     |
| `FLEXPRICE_SENTRY_ENVIRONMENT` | `production`        |
| `FLEXPRICE_SENTRY_SAMPLE_RATE` | `1` (100% sampling) |

### Grafana Cloud (Observability)

For profiling with Pyroscope on Grafana Cloud:

| Variable                                  | Value                                   |
| ----------------------------------------- | --------------------------------------- |
| `FLEXPRICE_PYROSCOPE_ENABLED`             | `true`                                  |
| `FLEXPRICE_PYROSCOPE_SERVER_ADDRESS`      | `https://profiles-prod-xxx.grafana.net` |
| `FLEXPRICE_PYROSCOPE_APPLICATION_NAME`    | `flexprice-prod-api`                    |
| `FLEXPRICE_PYROSCOPE_BASIC_AUTH_USER`     | Grafana user ID                         |
| `FLEXPRICE_PYROSCOPE_BASIC_AUTH_PASSWORD` | Grafana API key                         |

### FluentD (Log Aggregation)

For centralized logging with FluentD:

| Variable                            | Value              |
| ----------------------------------- | ------------------ |
| `FLEXPRICE_LOGGING_FLUENTD_ENABLED` | `true`             |
| `FLEXPRICE_LOGGING_FLUENTD_HOST`    | FluentD service IP |
| `FLEXPRICE_LOGGING_FLUENTD_PORT`    | `30242`            |
| `FLEXPRICE_LOGGING_FORMAT`          | `json`             |

### Resend (Email)

For transactional emails via Resend:

| Variable                         | Value               |
| -------------------------------- | ------------------- |
| `FLEXPRICE_EMAIL_ENABLED`        | `true`              |
| `FLEXPRICE_EMAIL_RESEND_API_KEY` | Your Resend API key |
| `FLEXPRICE_EMAIL_FROM_ADDRESS`   | Sender email        |
| `FLEXPRICE_EMAIL_REPLY_TO`       | Reply-to email      |

### Third-party cost summary

The table below covers third-party services only. For the full production total, see the Cost estimation table above.

| Service        | Purpose                | Monthly Cost    |
| -------------- | ---------------------- | --------------- |
| Temporal Cloud | Workflow orchestration | \~\$200         |
| Supabase       | Authentication         | \~\$25          |
| Svix           | Webhooks               | \~\$50          |
| Grafana Cloud  | Observability          | \~\$50          |
| Resend         | Email                  | \~\$20          |
| Sentry         | Error tracking         | \$0-29          |
| **Total**      |                        | **\~\$345-375** |

***

## Deployment checklist

Use this checklist to verify your deployment:

<Steps>
  <Step title="VPC and Networking">
    * [ ] VPC created with correct CIDR
    * [ ] 2 public subnets created
    * [ ] 4 private subnets created (2 compute, 2 data)
    * [ ] Internet Gateway attached
    * [ ] NAT Gateway(s) created and running
    * [ ] Route tables configured correctly
    * [ ] Security groups created with correct rules
  </Step>

  <Step title="IAM">
    * [ ] ECS Task Execution Role created
    * [ ] ECS Task Role created
    * [ ] Policies attached correctly (S3, Secrets Manager, CloudWatch, DynamoDB)
  </Step>

  <Step title="Secrets Manager">
    * [ ] PostgreSQL/Aurora credentials stored
    * [ ] ClickHouse credentials stored
    * [ ] Kafka SASL credentials stored
    * [ ] Auth secret stored
    * [ ] Temporal Cloud credentials stored
    * [ ] Third-party credentials stored (Supabase, Svix, etc.)
  </Step>

  <Step title="Aurora PostgreSQL">
    * [ ] DB subnet group created
    * [ ] Aurora cluster created and available
    * [ ] Writer and Reader instances running
    * [ ] Security group allows ECS access
    * [ ] Secrets Manager updated with endpoints
    * [ ] Database migrations completed
  </Step>

  <Step title="Amazon MSK">
    * [ ] MSK cluster created and active
    * [ ] SASL/SCRAM secret associated
    * [ ] Topics created (events, events\_lazy, events-dlq)
    * [ ] Security group allows ECS access
    * [ ] Prometheus exporters enabled
  </Step>

  <Step title="EKS and ClickHouse">
    * [ ] EKS cluster created
    * [ ] Node group running
    * [ ] gp3 StorageClass created
    * [ ] ClickHouse operator installed
    * [ ] ClickHouse cluster deployed
    * [ ] NLB created for ClickHouse access
    * [ ] Database initialized
  </Step>

  <Step title="ElastiCache Redis">
    * [ ] Redis subnet group created
    * [ ] Redis replication group created
    * [ ] Cluster mode enabled (production)
    * [ ] TLS encryption enabled
    * [ ] Security group allows ECS access
  </Step>

  <Step title="DynamoDB">
    * [ ] Events table created
    * [ ] Point-in-time recovery enabled
    * [ ] IAM policy allows ECS access
  </Step>

  <Step title="S3 and CloudWatch">
    * [ ] S3 bucket created with encryption
    * [ ] CloudWatch log groups created
    * [ ] CloudWatch alarms configured
  </Step>

  <Step title="ECR and Images">
    * [ ] ECR repositories created
    * [ ] Container images built and pushed
  </Step>

  <Step title="ECS">
    * [ ] ECS cluster created
    * [ ] Task definitions registered
    * [ ] ALB created with HTTPS listener
    * [ ] Target group configured
    * [ ] Services created and healthy
    * [ ] Auto Scaling configured
  </Step>

  <Step title="Verification">
    * [ ] API health check passing
    * [ ] Worker consuming from Kafka
    * [ ] Temporal workflows executing
    * [ ] Logs appearing in CloudWatch
  </Step>
</Steps>
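
You can check the "Services created and healthy" item from the CLI by comparing each service's running and desired task counts. This is a sketch. The worker and Temporal worker service names are assumptions inferred from the `flexprice-<name>-${ENV}` pattern used elsewhere in this guide.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

# Read "name<TAB>running<TAB>desired" lines and report each service.
# Returns non-zero if any service is running fewer tasks than desired.
report_counts() {
  local failed=0 name running desired
  while read -r name running desired; do
    if [[ "$running" -ge "$desired" ]]; then
      echo "OK   $name ($running/$desired)"
    else
      echo "FAIL $name ($running/$desired)"
      failed=1
    fi
  done
  return "$failed"
}

check_ecs_services() {
  local env="${1:-${ENV:-prod}}"
  local cluster="flexprice-${env}"
  # Assumed service names; adjust to match your deployment.
  local services=("flexprice-api-${env}" "flexprice-worker-${env}" "flexprice-temporal-worker-${env}")
  aws ecs describe-services --cluster "$cluster" --services "${services[@]}" \
    --query 'services[].[serviceName,runningCount,desiredCount]' --output text \
    | report_counts
}

# Usage: check_ecs_services prod
```

`describe-services` reports names it cannot find under `failures` rather than `services`, so a service that was never created does not appear in this output.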

***

## Troubleshooting

### API unreachable

1. **Check ALB health checks**:

   ```bash theme={null}
   aws elbv2 describe-target-health --target-group-arn $TG_ARN
   ```

2. **Check ECS service status and recent events**:

   ```bash theme={null}
   aws ecs describe-services \
     --cluster flexprice-${ENV} \
     --services flexprice-api-${ENV}
   ```

3. **Check ECS task logs**:

   ```bash theme={null}
   aws logs tail /ecs/flexprice-api-${ENV} --follow
   ```

4. **Verify security groups**:
   * ALB SG allows inbound 443 from the internet
   * ECS SG allows inbound 8080 from ALB SG
   * ECS SG allows outbound to Aurora/RDS (5432), MSK (9094/9096), ClickHouse (9000), and Redis (6379)
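
When `describe-target-health` reports unhealthy targets, the `Reason` field usually points to the cause. The sketch below lists only targets that are not healthy and maps a few common ELB reason codes to the checks above. The hint wording is ours, and `TG_ARN` is the same target group ARN used in step 1.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

# Map an ELB target health reason code to the most likely check from the list above.
hint_for_reason() {
  case "$1" in
    Target.Timeout)
      echo "no response on the health-check port: check that the ECS SG allows 8080 from the ALB SG" ;;
    Target.ResponseCodeMismatch)
      echo "task responds with an unexpected status: check the health-check path and the task logs" ;;
    Target.FailedHealthChecks)
      echo "health checks failing: check the task logs and that the container listens on 8080" ;;
    Elb.InitialHealthChecking)
      echo "health checks still in progress: wait and re-run" ;;
    Target.NotRegistered|Target.DeregistrationInProgress)
      echo "target not registered: check the ECS service events" ;;
    *)
      echo "see the ELB target health reason codes" ;;
  esac
}

# List targets that are not healthy (id, port, state, reason) with a hint for each.
unhealthy_targets() {
  local tg_arn="$1" id port state reason
  aws elbv2 describe-target-health --target-group-arn "$tg_arn" \
    --query 'TargetHealthDescriptions[?TargetHealth.State!=`healthy`].[Target.Id,Target.Port,TargetHealth.State,TargetHealth.Reason]' \
    --output text |
  while read -r id port state reason; do
    echo "$id:$port $state $reason -> $(hint_for_reason "$reason")"
  done
}

# Usage: unhealthy_targets "$TG_ARN"
```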

### Worker not consuming

1. **Check Kafka connectivity**:

   ```bash theme={null}
   # From a bastion or EC2 instance with Kafka tools
   kafka-consumer-groups.sh \
     --bootstrap-server $MSK_BOOTSTRAP \
     --command-config client.properties \
     --group flexprice-consumer-${ENV} \
     --describe
   ```

2. **Check consumer lag in MSK CloudWatch metrics**

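3. **Verify SASL credentials**:

   * Ensure the secret associated with the cluster has a name starting with `AmazonMSK_` and is encrypted with a customer managed KMS key (MSK can't use secrets encrypted with the default `aws/secretsmanager` key)
   * Verify that the username and password in Secrets Manager match the ones the Worker uses

4. **Check security group**:
   * MSK SG allows inbound 9096 (SASL/SCRAM) and 9094 (TLS) from ECS SG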
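
The `client.properties` file passed to `--command-config` in step 1 tells the Kafka CLI how to authenticate. For MSK with SASL/SCRAM (port 9096), a minimal version looks like the sketch below. The secret name `AmazonMSK_flexprice_${ENV}` and `MSK_CLUSTER_ARN` are assumptions, so use the secret and cluster from your deployment. MSK SCRAM secrets store `username` and `password` keys.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

# Write a Kafka client config for MSK SASL/SCRAM (MSK uses SCRAM-SHA-512 over TLS).
# Passwords containing double quotes or backslashes would need escaping for JAAS.
write_client_properties() {
  local user="$1" pass="$2" out="${3:-client.properties}"
  (
    umask 077   # the file contains a password; keep it readable by the owner only
    cat > "$out" <<EOF
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="${user}" password="${pass}";
EOF
  )
}

# Usage (needs AWS credentials and jq; secret name and cluster ARN are assumptions):
#   creds=$(aws secretsmanager get-secret-value \
#     --secret-id "AmazonMSK_flexprice_${ENV}" --query SecretString --output text)
#   write_client_properties "$(jq -r .username <<< "$creds")" "$(jq -r .password <<< "$creds")"
#   MSK_BOOTSTRAP=$(aws kafka get-bootstrap-brokers --cluster-arn "$MSK_CLUSTER_ARN" \
#     --query BootstrapBrokerStringSaslScram --output text)
```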

### Temporal workflows failing

1. **Check Temporal Worker logs**:

   ```bash theme={null}
   aws logs tail /ecs/flexprice-temporal-worker-${ENV} --follow
   ```

2. **Verify Temporal Cloud connection**:

   * Correct `FLEXPRICE_TEMPORAL_ADDRESS`
   * Valid API key and namespace
   * TLS enabled

3. **Check Temporal Cloud UI** for workflow history and errors
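
A malformed or unreachable `FLEXPRICE_TEMPORAL_ADDRESS` is a common cause of failures. Temporal Cloud addresses take the form `host:port`, and the gRPC port is usually 7233. The sketch below checks the format and then opens a TCP connection using bash's `/dev/tcp`. Run it from a host in the same private subnets as the ECS tasks so it uses the same NAT path. It only shows that the port is reachable, not that TLS or the API key is valid.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

# Check that an address is "host:port" and that the port accepts TCP connections.
# Returns 0 if reachable, 1 if unreachable, 2 if the address is malformed.
check_tcp_address() {
  local addr="$1" host port
  host="${addr%:*}"
  port="${addr##*:}"
  if [[ -z "$host" || "$host" == "$addr" || ! "$port" =~ ^[0-9]+$ ]]; then
    echo "malformed address (expected host:port): $addr" >&2
    return 2
  fi
  # bash's /dev/tcp opens a TCP connection; timeout stops it hanging on filtered ports.
  if timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "reachable: ${host}:${port}"
  else
    echo "unreachable: ${host}:${port}" >&2
    return 1
  fi
}

# Usage: check_tcp_address "$FLEXPRICE_TEMPORAL_ADDRESS"
```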

### ClickHouse connection errors

1. **Verify ClickHouse pods are running**:

   ```bash theme={null}
   kubectl get pods -n clickhouse
   ```

2. **Check ClickHouse logs**:

   ```bash theme={null}
   kubectl logs -n clickhouse -l clickhouse.altinity.com/chi=flexprice
   ```

3. **Verify the NLB has been provisioned** (the `EXTERNAL-IP` column should show the load balancer's DNS name):

   ```bash theme={null}
   kubectl get svc clickhouse-nlb -n clickhouse
   ```

4. **Test connectivity from ECS**:
   * Ensure EKS SG allows inbound 9000 from ECS SG
   * Verify NLB DNS resolves correctly
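
The NLB hostname can be extracted with a JSONPath query and checked for DNS resolution. The sketch below uses the Service name and namespace shown above. Run it from a host inside the VPC, and pair it with a TCP check on port 9000 to confirm the security group path.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

# Print the DNS name AWS assigned to the ClickHouse NLB Service (empty until provisioned).
clickhouse_nlb_host() {
  kubectl get svc clickhouse-nlb -n clickhouse \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
}

# Check that the NLB hostname exists and resolves from this host.
check_clickhouse_nlb() {
  local host
  host="$(clickhouse_nlb_host)" || host=""
  if [[ -z "$host" ]]; then
    echo "no NLB hostname yet: run 'kubectl describe svc clickhouse-nlb -n clickhouse' and check its events" >&2
    return 1
  fi
  if getent hosts "$host" > /dev/null; then
    echo "resolves: $host (ClickHouse native protocol on port 9000)"
  else
    echo "does not resolve: $host" >&2
    return 1
  fi
}

# Usage: check_clickhouse_nlb
```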

### RDS connection issues

1. **Verify RDS is available**:

   ```bash theme={null}
   aws rds describe-db-instances \
     --db-instance-identifier flexprice-${ENV} \
     --query 'DBInstances[0].DBInstanceStatus'
   ```

2. **Check security group**:

   * RDS SG allows inbound 5432 from ECS SG

3. **Verify credentials**:

   * Check Secrets Manager values match RDS configuration

4. **Test from bastion**:
   ```bash theme={null}
   psql -h $RDS_ENDPOINT -U flexprice -d flexprice
   ```
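
The `describe-db-instances` call above checks a single instance. For the Aurora cluster in this guide, the cluster and each of its writer and reader instances report their own status. The sketch below prints both. Using `flexprice-${ENV}` as the cluster identifier is an assumption, so substitute the identifier you chose when you created the cluster.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

# Print the Aurora cluster status and endpoints, then each member instance's status.
check_aurora() {
  local cluster_id="$1"
  aws rds describe-db-clusters --db-cluster-identifier "$cluster_id" \
    --query 'DBClusters[0].[Status,Endpoint,ReaderEndpoint]' --output text
  aws rds describe-db-instances --filters "Name=db-cluster-id,Values=${cluster_id}" \
    --query 'DBInstances[].[DBInstanceIdentifier,DBInstanceStatus]' --output text
}

# Usage: check_aurora "flexprice-${ENV}"   # cluster identifier is an assumption
```

A healthy cluster and its instances all report `available`. Use the writer `Endpoint` as `$RDS_ENDPOINT` for the `psql` test.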

***

## Scaling guidelines

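Scale when metrics exceed the thresholds below. The RDS Read IOPS and Storage rows apply to standard RDS instances. Aurora storage scales automatically and is not configured with gp3 volumes or allocated storage.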

| Component  | Metric               | Threshold         | Action                               |
| ---------- | -------------------- | ----------------- | ------------------------------------ |
| ECS        | CPU utilization      | > 70% sustained   | Scale out                            |
| ECS        | Memory utilization   | > 80% sustained   | Scale out or increase task memory    |
| ECS        | API latency (p99)    | > 500ms           | Scale out API tasks                  |
| ECS        | Kafka consumer lag   | Growing           | Scale out Worker tasks               |
| RDS        | CPU utilization      | > 80% sustained   | Upgrade instance class               |
| RDS        | Database connections | > 80% of max      | Upgrade instance or add read replica |
| RDS        | Read IOPS            | Hitting limits    | Upgrade to gp3 with higher IOPS      |
| RDS        | Storage              | > 80% used        | Increase allocated storage           |
| MSK        | Broker CPU           | > 60% sustained   | Add brokers                          |
| MSK        | Consumer lag         | Growing over time | Add partitions and consumers         |
| MSK        | Storage              | > 80% used        | Increase broker storage              |
| ClickHouse | Query latency        | Degrading         | Add replicas or upgrade nodes        |
| ClickHouse | Disk usage           | > 80%             | Expand PVCs or add shards            |
| ClickHouse | Memory pressure      | OOM events        | Increase node memory                 |
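
Application Auto Scaling target tracking can enforce the ECS CPU threshold automatically. It adds or removes tasks to keep average CPU near a target. The sketch below configures this for one service and assumes the cluster and service names used in Troubleshooting. The minimum and maximum capacities (2 and 10) are placeholders, not recommendations from this guide.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

# Keep an ECS service's average CPU near a target using target tracking.
enable_cpu_autoscaling() {
  local env="$1" service="$2" target="${3:-70}"
  local resource_id="service/flexprice-${env}/${service}"

  # Register the service's desired count as a scalable target (placeholder bounds).
  aws application-autoscaling register-scalable-target \
    --service-namespace ecs \
    --scalable-dimension ecs:service:DesiredCount \
    --resource-id "$resource_id" \
    --min-capacity 2 --max-capacity 10

  # Attach a target tracking policy on the predefined ECS CPU metric.
  aws application-autoscaling put-scaling-policy \
    --service-namespace ecs \
    --scalable-dimension ecs:service:DesiredCount \
    --resource-id "$resource_id" \
    --policy-name "${service}-cpu-target" \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration \
    "{\"TargetValue\": ${target}, \"PredefinedMetricSpecification\": {\"PredefinedMetricType\": \"ECSServiceAverageCPUUtilization\"}}"
}

# Usage: enable_cpu_autoscaling prod flexprice-api-prod 70
```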

***

## Cost optimization

### Reserved instances

* **RDS**: Purchase Reserved Instances for 1-3 year commitment (up to 72% savings)
* **MSK**: Reserved pricing is not available. Running Kafka yourself on EC2 Reserved Instances can cost less, but you take on operating the brokers yourself

### Fargate Spot

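Use Fargate Spot for non-critical workloads that run on Fargate. Fargate Spot only supports Linux tasks on the X86\_64 architecture, so it does not apply to the ARM64 images or EC2 capacity used elsewhere in this guide. For EC2-backed services, use Spot Instances in the Auto Scaling group behind the cluster's capacity provider instead.

```bash theme={null}
# Prerequisite: FARGATE and FARGATE_SPOT are associated with the cluster, and the
# worker task definition is Fargate-compatible (X86_64).
# The first task runs on regular Fargate (base=1); after that, roughly 2 of every 3
# tasks run on Fargate Spot.
# --force-new-deployment replaces running tasks so the new strategy takes effect.
aws ecs update-service \
  --cluster flexprice-${ENV} \
  --service flexprice-worker-${ENV} \
  --capacity-provider-strategy capacityProvider=FARGATE,weight=1,base=1 capacityProvider=FARGATE_SPOT,weight=2 \
  --force-new-deployment
```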
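
A capacity provider strategy only works with providers already associated with the cluster. `put-cluster-capacity-providers` replaces the cluster's whole provider list and default strategy. The sketch below therefore keeps any existing providers, such as an EC2 Auto Scaling group provider, and takes the default strategy as an explicit argument. The provider name in the usage line is hypothetical.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

# Union of the existing provider names ($1, whitespace-separated) and the Fargate providers.
merge_providers() {
  local -a out=()
  local p
  for p in $1 FARGATE FARGATE_SPOT; do
    [[ " ${out[*]-} " == *" $p "* ]] || out+=("$p")
  done
  echo "${out[*]}"
}

# Add FARGATE and FARGATE_SPOT to a cluster without dropping its existing providers.
# The call replaces the full provider list and default strategy, so pass the default in.
enable_fargate_providers() {
  local cluster="$1" default_strategy="$2" existing providers
  existing="$(aws ecs describe-clusters --clusters "$cluster" \
    --query 'clusters[0].capacityProviders' --output text)"
  if [[ "$existing" == "None" ]]; then existing=""; fi
  providers="$(merge_providers "$existing")"
  # Word splitting of $providers is intended: one argument per provider name.
  # shellcheck disable=SC2086
  aws ecs put-cluster-capacity-providers --cluster "$cluster" \
    --capacity-providers $providers \
    --default-capacity-provider-strategy "$default_strategy"
}

# Usage (provider name is hypothetical):
#   enable_fargate_providers "flexprice-${ENV}" "capacityProvider=flexprice-ec2-arm64,weight=1"
```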

### S3 lifecycle policies

The bucket is already configured to transition objects to S3 Standard-IA after 90 days. Consider:

* Glacier for archives > 1 year
* Intelligent-Tiering for unpredictable access patterns
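
To add a Glacier transition for objects older than a year, keep the 90-day Standard-IA transition in the same configuration. `put-bucket-lifecycle-configuration` replaces the bucket's entire lifecycle configuration instead of merging rules. The sketch below uses a hypothetical rule ID and applies the rule to every object in the bucket. Narrow the `Filter` if only some prefixes should be archived.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

# Write a lifecycle configuration: Standard-IA after 90 days, Glacier Flexible Retrieval after 365.
write_lifecycle_json() {
  local out="$1"
  cat > "$out" <<'EOF'
{
  "Rules": [
    {
      "ID": "flexprice-tiering",
      "Filter": {"Prefix": ""},
      "Status": "Enabled",
      "Transitions": [
        {"Days": 90, "StorageClass": "STANDARD_IA"},
        {"Days": 365, "StorageClass": "GLACIER"}
      ]
    }
  ]
}
EOF
}

# Replaces the bucket's whole lifecycle configuration; review existing rules first with
#   aws s3api get-bucket-lifecycle-configuration --bucket "$bucket"
apply_lifecycle() {
  local bucket="$1" file rc=0
  file="$(mktemp)"
  write_lifecycle_json "$file"
  aws s3api put-bucket-lifecycle-configuration --bucket "$bucket" \
    --lifecycle-configuration "file://${file}" || rc=$?
  rm -f "$file"
  return "$rc"
}
```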

### CloudWatch log retention

Set appropriate retention periods:

* Production: 30-90 days
* Development: 7-14 days
* Archive to S3 for long-term storage
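
Retention is set per log group. The sketch below applies one retention period to every log group under the `/ecs/flexprice-` prefix that ends in `-${ENV}`, which matches the log group names used in Troubleshooting. CloudWatch Logs accepts only specific values, for example 7, 14, 30, 60, or 90 days.

```bash theme={null}
#!/usr/bin/env bash
set -euo pipefail

# Apply a retention period to all Flexprice ECS log groups for one environment.
set_log_retention() {
  local env="$1" days="$2" groups group
  groups="$(aws logs describe-log-groups --log-group-name-prefix /ecs/flexprice- \
    --query 'logGroups[].logGroupName' --output text)"
  for group in $groups; do
    [[ "$group" == *"-${env}" ]] || continue
    aws logs put-retention-policy --log-group-name "$group" --retention-in-days "$days"
    echo "retention ${days}d: $group"
  done
}

# Usage: set_log_retention prod 30   # development: set_log_retention dev 7
```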

***

## Additional resources

<CardGroup cols={2}>
  <Card title="Configuration Reference" icon="gear" href="/docs/getting-started/configuration">
    Complete list of Flexprice environment variables
  </Card>

  <Card title="Architecture Overview" icon="diagram-project" href="/docs/getting-started/architecture">
    Understand Flexprice's internal architecture
  </Card>

  <Card title="Monitoring" icon="chart-line" href="/docs/event-ingestion/monitoring">
    Set up monitoring and observability
  </Card>

  <Card title="Troubleshooting" icon="wrench" href="/docs/event-ingestion/troubleshooting">
    Common issues and solutions
  </Card>
</CardGroup>

## Need help?

If you encounter issues during deployment:

* Check our [GitHub Issues](https://github.com/flexprice/flexprice/issues) for similar problems
* Join our [Slack community](https://join.slack.com/t/flexpricecommunity/shared_invite/zt-39uat51l0-n8JmSikHZP~bHJNXladeaQ) for real-time support
* Contact us at [support@flexprice.io](mailto:support@flexprice.io)
