Cloud-Native Application Development

Cloud-Native Principles

Cloud-native applications are designed from the ground up to exploit cloud infrastructure: elastic scaling, managed services, pay-per-use pricing, and global distribution. The 12-Factor App methodology provides foundational guidelines. Ten of its twelve factors are summarised below as they apply on AWS (port binding and admin processes are omitted):

Codebase: One codebase, many deploys. Track in version control.
Dependencies: Explicitly declare and isolate dependencies. Never rely on system-wide packages being implicitly present.
Config: Store config in environment variables, not code. Keep secrets in AWS Secrets Manager or SSM Parameter Store.
Backing services: Treat databases, caches, and queues as attached resources.
Build/release/run: Strictly separate build (compile), release (config injection), and run stages.
Processes: Execute as stateless, share-nothing processes. Persist state in backing services.
Concurrency: Scale out via the process model. Lambda scales to thousands of concurrent invocations automatically, subject to account concurrency limits.
Disposability: Fast startup and graceful shutdown. Lambda cold start optimisation matters here.
Dev/prod parity: Keep development, staging, and production as similar as possible. Terraform helps by provisioning every environment from the same code.
Logs: Treat logs as event streams. Write to stdout; Lambda forwards the output to CloudWatch Logs.
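
As a rough illustration of the Config factor, the sketch below reads settings from an environment map once at startup and fails fast when a required value is missing. The AppConfig shape and the LOG_LEVEL variable are assumptions made for this example; USERS_TABLE and AWS_REGION are the variables used elsewhere in this guide.

```typescript
// Hypothetical config shape for illustration; field names are assumptions.
interface AppConfig {
  usersTable: string;
  awsRegion: string;
  logLevel: string;
}

type Env = Record<string, string | undefined>;

// Read one required variable, failing fast when it is missing or empty.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Build the config object once at startup from the environment.
function loadConfig(env: Env): AppConfig {
  return {
    usersTable: requireEnv(env, 'USERS_TABLE'),
    awsRegion: requireEnv(env, 'AWS_REGION'),
    logLevel: env['LOG_LEVEL'] ?? 'info', // optional, with a default
  };
}
```

In a Lambda function this would typically be called once at module scope as loadConfig(process.env), so a misconfigured deploy fails at cold start rather than midway through a request.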

Immutable Infrastructure

Never SSH into production servers to apply changes. All infrastructure changes go through code (Terraform), and servers or containers are replaced with new versions rather than patched in place. This makes your infrastructure auditable and reproducible, and it simplifies disaster recovery because an environment can be rebuilt from code.

AWS Serverless Architecture

A serverless architecture on AWS eliminates server management entirely. You write functions; AWS handles provisioning, scaling, patching, and availability.

TypeScript: Lambda handler with API Gateway

import { APIGatewayProxyHandlerV2 } from 'aws-lambda';
import { DynamoDBClient, GetItemCommand } from '@aws-sdk/client-dynamodb';
import { marshall, unmarshall } from '@aws-sdk/util-dynamodb';

const dynamo = new DynamoDBClient({ region: process.env.AWS_REGION });

export const handler: APIGatewayProxyHandlerV2 = async (event) => {
  const userId = event.pathParameters?.userId;
  if (!userId) {
    return { statusCode: 400, body: JSON.stringify({ error: 'userId required' }) };
  }

  try {
    const result = await dynamo.send(new GetItemCommand({
      TableName: process.env.USERS_TABLE!,
      Key: marshall({ PK: `USER#${userId}`, SK: 'PROFILE' }),
    }));

    if (!result.Item) {
      return { statusCode: 404, body: JSON.stringify({ error: 'User not found' }) };
    }

    const user = unmarshall(result.Item);
    return {
      statusCode: 200,
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(user),
    };
  } catch (err) {
    console.error('DynamoDB error:', err);
    return { statusCode: 500, body: JSON.stringify({ error: 'Internal server error' }) };
  }
};

Lambda Cold Start Optimisation

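Initialise AWS SDK clients and DB connections outside the handler function so they're reused across warm invocations.
Use Lambda SnapStart (originally Java, with Python and .NET support added later) or Provisioned Concurrency (all runtimes) for latency-sensitive APIs.
Keep deployment packages small: bundle and tree-shake dependencies. Lambda Layers help share dependencies across functions, but their contents still count toward the unzipped package size limit.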
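
The first point can be sketched without any AWS dependency. In the example below, FakeClient is a hypothetical stand-in for an SDK client or database connection that simply counts how often it is constructed; the point is only where the construction happens.

```typescript
// Stand-in for an expensive resource such as an SDK client or DB connection.
// FakeClient is hypothetical; it only counts how often it is constructed.
class FakeClient {
  static created = 0;

  constructor() {
    FakeClient.created += 1;
  }

  async get(id: string): Promise<{ id: string }> {
    return { id };
  }
}

// Module scope: runs once per execution environment (at cold start),
// then stays in memory for later warm invocations.
const client = new FakeClient();

// Handler scope: runs on every invocation and reuses the shared client.
async function handler(event: { id: string }): Promise<{ id: string }> {
  return client.get(event.id);
}
```

Constructing the client inside handler instead would repeat the setup cost on every request, which is the pattern the list above advises against.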

Infrastructure as Code with Terraform

Terraform declares AWS infrastructure as HCL code. Running terraform apply creates or updates resources to match your declaration. State is stored remotely in S3 with DynamoDB locking.

HCL: infra/main.tf

terraform {
  required_version = ">= 1.7"
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }
  backend "s3" {
    bucket         = "myorg-terraform-state"
    key            = "prod/main.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

provider "aws" {
  region = var.aws_region
  default_tags {
    tags = { Environment = var.environment, ManagedBy = "terraform" }
  }
}

# Lambda function
resource "aws_lambda_function" "api" {
  function_name = "${var.project}-api-${var.environment}"
  role          = aws_iam_role.lambda_exec.arn
  runtime       = "nodejs20.x"
  handler       = "dist/handler.handler"
  filename      = data.archive_file.lambda_zip.output_path
  timeout       = 30
  memory_size   = 512

  environment {
    variables = {
      USERS_TABLE = aws_dynamodb_table.users.name
      AWS_NODEJS_CONNECTION_REUSE_ENABLED = "1"
    }
  }

  tracing_config { mode = "Active" }  # X-Ray tracing
}

# DynamoDB table (single-table design)
resource "aws_dynamodb_table" "users" {
  name           = "${var.project}-users-${var.environment}"
  billing_mode   = "PAY_PER_REQUEST"
  hash_key       = "PK"
  range_key      = "SK"

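  attribute {
    name = "PK"
    type = "S"
  }
  attribute {
    name = "SK"
    type = "S"
  }
  attribute {
    name = "GSI1PK"
    type = "S"
  }
  # GSI1SK holds a sortable value (e.g. an ISO 8601 timestamp) for index queries
  attribute {
    name = "GSI1SK"
    type = "S"
  }

  global_secondary_index {
    name            = "GSI1"
    hash_key        = "GSI1PK"
    range_key       = "GSI1SK"
    projection_type = "ALL"
  }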

  point_in_time_recovery { enabled = true }
  server_side_encryption  { enabled = true }
  deletion_protection_enabled = var.environment == "production"
}

Bash: Common Terraform commands

# Initialise (download providers, configure backend)
terraform init

# Preview changes
terraform plan -var-file=envs/production.tfvars -out=tfplan

# Apply changes
terraform apply tfplan

# Destroy (use with caution!)
terraform destroy -var-file=envs/staging.tfvars

# Import existing AWS resource into state
terraform import aws_s3_bucket.assets my-existing-bucket-name

Event-Driven Architecture

AWS provides first-class event infrastructure. Lambda functions react to events from dozens of sources, enabling loosely coupled, asynchronous workflows without managing message brokers.

HCL: Event-driven pipeline: S3 → Lambda → SQS → Lambda

# Trigger Lambda when a file is uploaded to S3
resource "aws_s3_bucket_notification" "upload_trigger" {
  bucket = aws_s3_bucket.uploads.id
  lambda_function {
    lambda_function_arn = aws_lambda_function.process_upload.arn
    events              = ["s3:ObjectCreated:*"]
    filter_suffix       = ".csv"
  }
  depends_on = [aws_lambda_permission.s3_invoke]
}

# EventBridge rule: run Lambda on a schedule
resource "aws_cloudwatch_event_rule" "daily_report" {
  name                = "daily-report"
  schedule_expression = "cron(0 8 * * ? *)"   # 8 AM UTC daily
}
resource "aws_cloudwatch_event_target" "report_lambda" {
  rule = aws_cloudwatch_event_rule.daily_report.name
  arn  = aws_lambda_function.generate_report.arn
}

# SQS queue with DLQ for failed processing
resource "aws_sqs_queue" "jobs" {
  name                       = "${var.project}-jobs"
  visibility_timeout_seconds = 300
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.jobs_dlq.arn
    maxReceiveCount     = 3
  })
}

# Lambda SQS trigger
resource "aws_lambda_event_source_mapping" "job_processor" {
  event_source_arn = aws_sqs_queue.jobs.arn
  function_name    = aws_lambda_function.job_processor.arn
  batch_size       = 10
  function_response_types = ["ReportBatchItemFailures"]
}
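
Because the event source mapping enables ReportBatchItemFailures, the consuming function is expected to return the IDs of only the messages that failed. The following is a minimal sketch of such a handler; the local types mirror the SQS event and partial-batch response shapes, while processJob and its jobId field are hypothetical placeholders for real business logic.

```typescript
// Minimal local types mirroring the SQS event and partial-batch response shapes.
interface SqsRecord {
  messageId: string;
  body: string;
}
interface SqsEvent {
  Records: SqsRecord[];
}
interface SqsBatchResponse {
  batchItemFailures: { itemIdentifier: string }[];
}

// Hypothetical job processor: throws on malformed JSON or a missing jobId.
async function processJob(body: string): Promise<void> {
  const job = JSON.parse(body) as { jobId?: string };
  if (!job.jobId) throw new Error('jobId missing');
}

async function handler(event: SqsEvent): Promise<SqsBatchResponse> {
  const batchItemFailures: { itemIdentifier: string }[] = [];
  for (const record of event.Records) {
    try {
      await processJob(record.body);
    } catch {
      // Report only this message as failed; the rest of the batch is
      // treated as processed and removed from the queue.
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
}
```

Failed messages become visible again after the visibility timeout and move to the dead-letter queue once they exceed maxReceiveCount.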

Managed Databases

Choose the right managed database for each workload. AWS offers relational, key-value, document, and in-memory options — all fully managed with automated backups and multi-AZ failover.

Service	Type	Best For	Scaling
DynamoDB	Key-value / Document	Single-digit-ms latency, serverless, unpredictable traffic	Automatic, on-demand
Aurora Serverless v2	Relational (MySQL/PG)	Relational data model, complex joins, ACID transactions	Auto scales ACUs
RDS PostgreSQL	Relational	Steady workloads, complex queries, PostGIS, full-text search	Manual + Read Replicas
ElastiCache	In-memory	Session store, rate limiting, leaderboards, pub/sub	Cluster mode

TypeScript: DynamoDB single-table design pattern

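// Access patterns for a blog: get user, list user posts, get post by id
// All stored in one DynamoDB table using entity prefixes
import { DynamoDBClient, QueryCommand } from '@aws-sdk/client-dynamodb';
import { unmarshall } from '@aws-sdk/util-dynamodb';

const dynamo = new DynamoDBClient({ region: process.env.AWS_REGION });

// User record
const userItem = { PK: 'USER#alice', SK: 'PROFILE', name: 'Alice', email: 'alice@example.com' };

// Post record; the GSI1PK/GSI1SK attributes project it into GSI1
// so that all posts by one author can be queried together
const postItem = {
  PK: 'POST#post-1', SK: 'METADATA', title: 'Hello', authorId: 'alice',
  GSI1PK: 'AUTHOR#alice', GSI1SK: '2025-01-15T10:00:00Z',
};

// Query: get all posts by Alice, newest first
// (assumes GSI1 uses GSI1PK as partition key and GSI1SK as sort key)
const result = await dynamo.send(new QueryCommand({
  TableName: 'my-app',
  IndexName: 'GSI1',
  KeyConditionExpression: 'GSI1PK = :author',
  ExpressionAttributeValues: { ':author': { S: 'AUTHOR#alice' } },
  ScanIndexForward: false,   // descending sort-key order = newest first
  Limit: 20,
}));

// Convert DynamoDB attribute values back into plain objects
const posts = (result.Items ?? []).map((item) => unmarshall(item));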
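
One way to keep the entity prefixes consistent is to build keys through small helper functions rather than string literals scattered across the codebase. The sketch below assumes the PK/SK and GSI1PK/GSI1SK layout used above; the helper names are hypothetical.

```typescript
// Key builders for the entity prefixes used above; helper names are hypothetical.
interface PrimaryKey {
  PK: string;
  SK: string;
}
interface Gsi1Key {
  GSI1PK: string;
  GSI1SK: string;
}

const userKey = (userId: string): PrimaryKey => ({ PK: `USER#${userId}`, SK: 'PROFILE' });
const postKey = (postId: string): PrimaryKey => ({ PK: `POST#${postId}`, SK: 'METADATA' });

// ISO 8601 UTC timestamps sort lexicographically in chronological order,
// so they work as a sort key for "newest first" queries.
const postByAuthorKey = (authorId: string, createdAt: Date): Gsi1Key => ({
  GSI1PK: `AUTHOR#${authorId}`,
  GSI1SK: createdAt.toISOString(),
});

// Assemble a full post item from its table key, index key, and attributes.
const examplePost = {
  ...postKey('post-1'),
  ...postByAuthorKey('alice', new Date('2025-01-15T10:00:00Z')),
  title: 'Hello',
  authorId: 'alice',
};
```

Centralising key construction like this makes it harder for a writer and a reader of the same entity to disagree on its prefix.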

Caching Strategies

Caching is often the highest-impact performance optimisation. Layer your caches from the network edge to the database to minimise latency and cost.

CloudFront (CDN): Cache static assets and API responses at hundreds of edge locations globally. Use Cache-Control headers to control TTLs. Invalidate on deploy.
API Gateway Cache: Cache Lambda responses at the gateway for GET endpoints (available for REST API stages, not HTTP APIs). Cache keys are built from the request path plus any query-string or header parameters you mark as cache keys. Reduces Lambda invocations, and therefore cost, for read-heavy APIs.
ElastiCache (Redis): In-memory cache for computed results, session data, and rate-limiting counters. Use a replication group with Multi-AZ failover for high availability; cluster mode adds sharding for scale.
DynamoDB Accelerator (DAX): In-memory cache layer in front of DynamoDB. API-compatible with DynamoDB through the DAX client; reduces read latency from single-digit ms to microseconds for hot data.
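
For the CloudFront layer, TTLs are usually driven by the Cache-Control header that the origin returns. The sketch below builds such a header for a hypothetical read-only endpoint; the function names and the 60-second and 300-second values are assumptions chosen for illustration, and the effective TTL at the edge also depends on the distribution's cache policy.

```typescript
interface CachePolicy {
  browserSeconds: number; // max-age: how long browsers may reuse the response
  edgeSeconds: number;    // s-maxage: how long shared caches such as a CDN may reuse it
}

// Build a Cache-Control value with separate browser and shared-cache lifetimes.
function cacheControl(policy: CachePolicy): string {
  return `public, max-age=${policy.browserSeconds}, s-maxage=${policy.edgeSeconds}`;
}

// Hypothetical response for a read-only endpoint that lists posts.
function listPostsResponse(posts: unknown[]) {
  return {
    statusCode: 200,
    headers: {
      'Content-Type': 'application/json',
      'Cache-Control': cacheControl({ browserSeconds: 60, edgeSeconds: 300 }),
    },
    body: JSON.stringify(posts),
  };
}
```

Keeping the browser lifetime shorter than the edge lifetime means an invalidation at the CDN reaches users quickly, while most requests are still served without invoking the function.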

TypeScript: Cache-aside pattern with ElastiCache

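import { createClient } from 'redis';
// Hypothetical data-access module: these helpers and the User type are assumed to exist elsewhere
import { fetchUserFromDynamo, updateUserInDynamo, type User } from './userRepository';

const redis = createClient({ url: process.env.REDIS_URL });
// Log connection errors instead of letting them crash the process
redis.on('error', (err) => console.error('Redis error:', err));
await redis.connect();

async function getUserWithCache(userId: string): Promise<User | null> {
  const cacheKey = `user:${userId}`;

  // 1. Check cache
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached) as User;

  // 2. Cache miss: fetch from DynamoDB
  const user = await fetchUserFromDynamo(userId);
  if (!user) return null;

  // 3. Store in cache with 5-minute TTL
  await redis.setEx(cacheKey, 300, JSON.stringify(user));
  return user;
}

// On user update, write to the database first, then invalidate the cache
async function updateUser(userId: string, data: Partial<User>): Promise<void> {
  await updateUserInDynamo(userId, data);
  await redis.del(`user:${userId}`);   // force refresh on next read
}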

Security Best Practices

Cloud-native security follows the principle of least privilege: every resource has only the permissions it needs, no more.

HCL: Least-privilege Lambda IAM role

resource "aws_iam_role" "lambda_exec" {
  name = "${var.project}-lambda-exec"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy" "lambda_policy" {
  role = aws_iam_role.lambda_exec.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      # CloudWatch Logs
      {
        Effect   = "Allow"
        Action   = ["logs:CreateLogGroup","logs:CreateLogStream","logs:PutLogEvents"]
        Resource = "arn:aws:logs:*:*:log-group:/aws/lambda/${var.project}-*"
      },
      # DynamoDB - specific table only
      {
        Effect   = "Allow"
        Action   = ["dynamodb:GetItem","dynamodb:PutItem","dynamodb:UpdateItem","dynamodb:Query"]
        Resource = [aws_dynamodb_table.users.arn, "${aws_dynamodb_table.users.arn}/index/*"]
      },
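      # Secrets Manager - specific secret only
      {
        Effect   = "Allow"
        Action   = "secretsmanager:GetSecretValue"
        Resource = aws_secretsmanager_secret.app_secret.arn
      },
      # X-Ray - needed when the function uses active tracing;
      # these actions do not support resource-level scoping
      {
        Effect   = "Allow"
        Action   = ["xray:PutTraceSegments","xray:PutTelemetryRecords"]
        Resource = "*"
      }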
    ]
  })
}

Never Use AdministratorAccess on Lambda

Grant only the specific DynamoDB actions, S3 prefixes, and Secrets Manager ARNs the function actually needs. Overly broad permissions turn a compromised Lambda into a full account takeover. Use IAM Access Analyzer to validate policies.
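
As a complement to IAM Access Analyzer, a coarse check for obviously broad statements can run in CI before a policy is ever deployed. The sketch below is an illustration only: the findBroadStatements helper is hypothetical, it inspects a plain policy document object, and it does not understand conditions or the actions that legitimately require a wildcard resource.

```typescript
interface Statement {
  Effect: 'Allow' | 'Deny';
  Action: string | string[];
  Resource: string | string[];
}
interface PolicyDocument {
  Version: string;
  Statement: Statement[];
}

const toArray = (value: string | string[]): string[] =>
  Array.isArray(value) ? value : [value];

// Return Allow statements that grant every action, every action in a
// service (e.g. "s3:*"), or access to every resource.
function findBroadStatements(policy: PolicyDocument): Statement[] {
  return policy.Statement.filter((s) => {
    if (s.Effect !== 'Allow') return false;
    const broadAction = toArray(s.Action).some((a) => a === '*' || /:\*$/.test(a));
    const broadResource = toArray(s.Resource).some((r) => r === '*');
    return broadAction || broadResource;
  });
}
```

A non-empty result is a prompt for review rather than proof of a problem, since a few actions can only be granted on all resources.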

Cost Optimisation

Cloud costs can spiral without guardrails. Embed cost awareness into your architecture decisions from day one.

Lambda: Billed per request plus duration, metered in 1 ms increments and multiplied by configured memory (GB-seconds). CPU scales with memory, so a larger memory setting often runs faster and can cost the same or less. Use AWS Lambda Power Tuning to find the optimal memory setting.
DynamoDB: Use on-demand billing for unpredictable traffic; provision capacity with auto-scaling for steady workloads. Archive old data to S3 and query it with Athena.
S3: Set lifecycle policies to move infrequently accessed objects to S3 Standard-IA after, for example, 30 days and to Glacier after 90 days.
Reserved / Savings Plans: Commit to 1- or 3-year usage for steady workloads (EC2, RDS, Fargate) for savings of up to 72%.
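
The Lambda memory trade-off is easier to see with the arithmetic written out. The sketch below estimates cost as a request charge plus a GB-second charge; the default prices are assumptions (illustrative x86 figures for us-east-1) that should be checked against current AWS pricing, and the free tier is ignored.

```typescript
// Prices are assumptions passed as parameters; verify against current pricing.
interface LambdaPricing {
  perMillionRequests: number; // USD per 1M requests
  perGbSecond: number;        // USD per GB-second of duration
}

const assumedPricing: LambdaPricing = {
  perMillionRequests: 0.2,
  perGbSecond: 0.0000166667,
};

function estimateLambdaCost(
  requests: number,
  avgDurationMs: number,
  memoryMb: number,
  pricing: LambdaPricing = assumedPricing,
): number {
  // Duration is billed in GB-seconds: seconds of runtime times memory in GB
  const gbSeconds = requests * (avgDurationMs / 1000) * (memoryMb / 1024);
  const requestCost = (requests / 1_000_000) * pricing.perMillionRequests;
  return requestCost + gbSeconds * pricing.perGbSecond;
}

// Doubling memory costs the same when it halves the duration
const costAt512 = estimateLambdaCost(1_000_000, 200, 512);
const costAt1024 = estimateLambdaCost(1_000_000, 100, 1024);
```

Under these assumed prices both configurations come to the same total, so the higher memory setting delivers the lower latency at no extra cost; if the duration drops by more than half, the larger setting is cheaper.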

Bash: AWS CLI cost monitoring

# Get monthly cost breakdown by service
aws ce get-cost-and-usage \
  --time-period Start=2025-01-01,End=2025-02-01 \
  --granularity MONTHLY \
  --metrics "BlendedCost" \
  --group-by Type=DIMENSION,Key=SERVICE

# Set a billing alarm (alert at $100)
# Billing metrics are published only in us-east-1 and require billing alerts to be enabled
aws cloudwatch put-metric-alarm \
  --region us-east-1 \
  --alarm-name "monthly-cost-100" \
  --alarm-description 'Alert when monthly bill exceeds $100' \
  --metric-name EstimatedCharges \
  --namespace AWS/Billing \
  --dimensions Name=Currency,Value=USD \
  --statistic Maximum \
  --period 86400 \
  --evaluation-periods 1 \
  --threshold 100 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:billing-alerts