Terraform (reinforce)

Remote state, modules, workspaces, import, idempotency, and Pulumi as an alternative: everything that comes up in interviews

Theory

⏱ ~50 minutes

Terraform in the Israeli job market, 2026

According to an April 2026 market scan, Terraform appears in 3.5% of DevOps job postings in Israel, usually alongside AWS (9.4%) and Kubernetes (12.9%). In practice demand is higher: almost every company that runs on AWS expects a DevOps engineer to understand IaC. The figure is low in job listings because companies assume Terraform as a baseline and do not mention it explicitly.

What interviewers ask in 2026:
- The difference between terraform plan and terraform apply in the context of a CI/CD pipeline
- What happens when you manually remove a resource from state (terraform state rm)
- How to secure the state file (S3 + encryption + IAM)
- What Pulumi is and when to choose it over Terraform
- Terragrunt: why use it, and what does it solve?

Worth knowing for 2026: OpenTofu (the open-source fork of Terraform created after HashiCorp changed the license to BSL in 2023) is gaining traction at companies that reject a commercial license. The HCL syntax and the core CLI workflow remain largely compatible, so nearly everything you learn here applies to both.

Terraform state and remote backend: S3 + DynamoDB

The state file (terraform.tfstate) is the record that maps your configuration to what actually exists in the cloud. Without state, Terraform cannot tell whether a resource already exists.

**The problem with local state in a team**

When state is stored locally:
- Two developers who run terraform apply at the same time can corrupt the state or end up with two diverging copies of it
- The state is not accessible to the CI/CD pipeline
- There is no audit trail of who changed what and when

**S3 backend + DynamoDB locking**

# backend.tf
terraform {
  backend "s3" {
    bucket         = "my-company-tfstate-prod"
    key            = "services/api/terraform.tfstate"
    region         = "eu-west-1"
    encrypt        = true
    kms_key_id     = "arn:aws:kms:eu-west-1:123456789:key/abc-def"
    dynamodb_table = "terraform-state-lock"
  }
}

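The DynamoDB table used for locking (its partition key must be a string attribute named LockID). Note that since Terraform 1.10 the S3 backend can also lock natively with use_lockfile = true, and newer releases deprecate dynamodb_table; the DynamoDB setup shown here is still what many existing projects use.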

aws dynamodb create-table \
  --table-name terraform-state-lock \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region eu-west-1

When terraform apply runs, Terraform:
1. Writes a record to DynamoDB with LockID = <bucket>/<key> and an Info field recording who took the lock and when
2. Runs the apply
3. Deletes the record (unlock)

If the process crashes midway, the lock stays in place. The fix:

# Release a stale lock (first make sure no other run is still in progress)
terraform force-unlock <LOCK_ID>
# The lock ID appears in the error message
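
The acquire, release, and stale-lock sequence above can be modelled in a few lines. This is only an in-memory sketch, not the backend's real implementation: `LockTable`, `acquire`, `release`, and `force_unlock` are hypothetical names, and a plain dict stands in for DynamoDB's conditional write (put the item only if no item with that LockID exists).

```python
import uuid


class LockHeldError(Exception):
    """Raised when another run already holds the state lock."""


class LockTable:
    """In-memory stand-in for the DynamoDB lock table (illustration only)."""

    def __init__(self):
        self._items = {}  # LockID -> {"ID": ..., "Who": ...}

    def acquire(self, lock_id, who):
        # Mirrors a conditional put: succeed only if no item with this LockID exists.
        if lock_id in self._items:
            holder = self._items[lock_id]
            raise LockHeldError(f"lock {holder['ID']} held by {holder['Who']}")
        info = {"ID": str(uuid.uuid4()), "Who": who}
        self._items[lock_id] = info
        return info["ID"]

    def release(self, lock_id, token):
        # Normal unlock: only the matching token removes the record.
        if self._items.get(lock_id, {}).get("ID") == token:
            del self._items[lock_id]

    def force_unlock(self, lock_id, token):
        # Manual recovery after a crash, analogous to `terraform force-unlock <LOCK_ID>`.
        self.release(lock_id, token)


table = LockTable()
state_path = "my-company-tfstate-prod/services/api/terraform.tfstate"

token = table.acquire(state_path, who="alice@laptop")
try:
    table.acquire(state_path, who="ci-runner")
    second_run_blocked = False
except LockHeldError:
    second_run_blocked = True

# Simulate a crash: the first run never calls release(), so the lock is stale.
table.force_unlock(state_path, token)
new_token = table.acquire(state_path, who="ci-runner")
```

The point of the model is that the second run fails fast instead of writing state concurrently, and that a crashed run leaves the record behind until someone removes it by its lock ID.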

**Minimal IAM permissions for the S3 backend**

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-company-tfstate-prod/services/api/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-company-tfstate-prod",
      "Condition": {"StringLike": {"s3:prefix": ["services/api/*"]}}
    },
    {
      "Effect": "Allow",
      "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:DeleteItem"],
      "Resource": "arn:aws:dynamodb:eu-west-1:123456789:table/terraform-state-lock"
    }
  ]
}

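Because the backend above sets kms_key_id, the role also needs KMS permissions on that key, such as kms:Decrypt and kms:GenerateDataKey.

Important: the state bucket itself is not managed by the Terraform configuration that stores its state there (a chicken-and-egg problem). Create it manually once, or through a separate script.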

Terraform modules: reusable code

A module is a directory of .tf files that represents an abstraction you can call from different configurations. Think of it as a function: it takes variables, creates resources, and returns outputs.

**Standard module layout**

modules/
  vpc/
    main.tf       # resource definitions
    variables.tf  # input variables
    outputs.tf    # output values
    versions.tf   # required_providers + terraform version constraint
    README.md

**variables.tf with types and validation**

variable "vpc_cidr" {
  type        = string
  description = "CIDR block for the VPC"
  default     = "10.0.0.0/16"

  validation {
    condition     = can(cidrnetmask(var.vpc_cidr))
    error_message = "vpc_cidr must be a valid CIDR block."
  }
}

variable "environment" {
  type        = string
  description = "Environment name (staging, production)"

  validation {
    condition     = contains(["staging", "production"], var.environment)
    error_message = "environment must be staging or production."
  }
}

variable "subnet_count" {
  type    = number
  default = 2
}

**outputs.tf with descriptions**

output "vpc_id" {
  description = "The ID of the created VPC"
  value       = aws_vpc.main.id
}

output "private_subnet_ids" {
  description = "List of private subnet IDs"
  value       = aws_subnet.private[*].id
}

**Calling a module**

module "production_vpc" {
  source       = "./modules/vpc"
  vpc_cidr     = "10.0.0.0/16"
  environment  = "production"
  subnet_count = 3
}

# Using a module output
resource "aws_instance" "app" {
  # ami and instance_type omitted for brevity
  subnet_id = module.production_vpc.private_subnet_ids[0]
}

**Terraform Registry modules**

Modules from registry.terraform.io (for example terraform-aws-modules/vpc/aws) are production-grade, reviewed by a large community, and follow standard versioning:

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"  # ~> 5.0 allows any 5.x release (minor and patch), but not 6.0

  name = "my-vpc"
  cidr = "10.0.0.0/16"
  azs  = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
}

Rule: always pin to a specific version or a narrow range. version = "latest" does not exist in Terraform, but without a proper pin a module upgrade can break your infrastructure.
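
To make the `~>` rule concrete, here is a small sketch of how a pessimistic constraint could be evaluated. It is an approximation written for this lesson, not Terraform's own resolver: `parse` and `satisfies_pessimistic` are hypothetical helpers, and pre-release suffixes such as `-beta1` are not handled.

```python
def parse(version):
    """Turn '5.1.2' into (5, 1, 2)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pessimistic(version, constraint):
    """Check `version` against a '~> X.Y' or '~> X.Y.Z' constraint.

    Only the rightmost component written in the constraint may increase.
    """
    base = parse(constraint.replace("~>", "").strip())
    if len(base) < 2:
        raise ValueError("this sketch expects at least two components")
    lower = base
    # Upper bound: drop the last component and bump the one before it.
    upper = base[:-2] + (base[-2] + 1,)
    candidate = parse(version)
    return lower <= candidate and candidate[: len(upper)] < upper


checks = {
    ("5.8.1", "~> 5.0"): satisfies_pessimistic("5.8.1", "~> 5.0"),
    ("6.0.0", "~> 5.0"): satisfies_pessimistic("6.0.0", "~> 5.0"),
    ("5.0.9", "~> 5.0.1"): satisfies_pessimistic("5.0.9", "~> 5.0.1"),
    ("5.1.0", "~> 5.0.1"): satisfies_pessimistic("5.1.0", "~> 5.0.1"),
}
```

Writing the constraint with three components (`~> 5.0.1`) narrows the range to patch releases only, which is the tighter choice when a module has a history of breaking changes in minor versions.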

Terraform workspaces: managing multiple environments

Workspaces let you keep several separate state files for the same configuration directory, a common way to manage staging and production from the same code.

**Basic commands**

terraform workspace new staging
terraform workspace new production
terraform workspace list      # lists all workspaces, * marks the current one
terraform workspace select staging
terraform workspace show      # prints the current workspace

**Using the workspace name in configuration**

locals {
  env = terraform.workspace  # "staging" or "production"

  instance_type = {
    staging    = "t3.micro"
    production = "t3.large"
  }

  instance_count = {
    staging    = 1
    production = 3
  }
}

resource "aws_instance" "app" {
  count         = local.instance_count[local.env]
  instance_type = local.instance_type[local.env]

  tags = {
    Environment = local.env
    ManagedBy   = "Terraform"
  }
}

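**Remote state with workspaces**

With the S3 backend and workspaces, Terraform stores the state of each non-default workspace at:
<bucket>/<workspace_key_prefix>/<workspace_name>/<key>

The prefix defaults to env:, while the default workspace stays at <bucket>/<key>. For example: my-company-tfstate-prod/env:/staging/services/api/terraform.tfstate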
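
The path rule can be written as a tiny function, which is handy when you need to locate a workspace's state object by hand. This is a sketch of the naming scheme only; `state_object_key` is a hypothetical helper, and the `env:` default mirrors the backend's workspace_key_prefix setting.

```python
def state_object_key(key, workspace="default", workspace_key_prefix="env:"):
    """Return the S3 object key that holds the state for a given workspace."""
    if workspace == "default":
        # The default workspace is stored directly under the configured key.
        return key
    return f"{workspace_key_prefix}/{workspace}/{key}"


staging_key = state_object_key("services/api/terraform.tfstate", "staging")
default_key = state_object_key("services/api/terraform.tfstate")
```

Knowing the resulting key matters for IAM: a policy that only allows `services/api/*` does not cover objects under `env:/staging/...`.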

**Workspaces vs. separate directories**

Workspaces fit when the environments are almost identical (staging/production). When environments differ substantially (dev close to a laptop setup, production multi-region), separate directories are the better choice (the Terragrunt pattern, covered in the Terragrunt section below).

Warning: a backend block cannot reference terraform.workspace, variables, or locals. Do not try to use ${terraform.workspace} inside the backend definition itself.

terraform plan + apply workflow and -target for emergency changes

The standard workflow in CI/CD:

# Step 1: format check (required on every PR)
terraform fmt -check -recursive

# Step 2: validation
terraform validate

# Step 3: plan (saved as an artifact)
terraform plan -out=tfplan.binary

# Step 4: render the plan in a readable form
terraform show -json tfplan.binary | jq .

# Step 5: apply the saved plan (no deviation possible)
terraform apply tfplan.binary

Saving the plan as a binary and then applying from it guarantees that what was approved in code review is exactly what gets deployed. If the state changes between plan and apply, Terraform rejects the saved plan as stale instead of applying something different.
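
A reviewer rarely reads the full JSON from step 4. As a sketch of what a pipeline gate could do with it, the snippet below counts the planned actions and lists every resource that would be deleted or replaced. It relies on the `resource_changes[].change.actions` field of the plan JSON; `summarize_plan` and the sample plan are hypothetical.

```python
from collections import Counter


def summarize_plan(plan):
    """Count planned actions and collect addresses that involve a delete."""
    counts = Counter()
    destructive = []
    for change in plan.get("resource_changes", []):
        actions = change["change"]["actions"]
        counts.update(actions)
        if "delete" in actions:  # plain delete or a replace (delete + create)
            destructive.append(change["address"])
    return dict(counts), destructive


# Hand-written sample in the shape of `terraform show -json tfplan.binary`.
sample_plan = {
    "resource_changes": [
        {"address": "aws_instance.app[0]", "change": {"actions": ["no-op"]}},
        {"address": "aws_security_group_rule.allow_https", "change": {"actions": ["create"]}},
        {"address": "aws_db_instance.main", "change": {"actions": ["delete", "create"]}},
    ]
}

counts, destructive = summarize_plan(sample_plan)
```

In a real pipeline the dict would come from `json.load` on the file written by `terraform show -json`, and a non-empty `destructive` list could require an extra approval before step 5.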

**-target: an emergency tool, not a regular workflow**

-target runs plan/apply only on a specific resource:

# Emergency: add only the security group rule, leave everything else untouched
terraform plan -target=aws_security_group_rule.allow_https
terraform apply -target=aws_security_group_rule.allow_https

Why it is dangerous for routine use:
- Terraform processes only the target and what it depends on, so resources that depend on the changed target are not updated
- The state drifts out of sync with the rest of the configuration
- CI/CD pipelines that use -target as a permanent workflow are an anti-pattern

Production rule: use terraform apply -target only in an emergency, document it in a ticket, and run a full terraform plan right afterwards to verify that the state is consistent.

**Refresh and state drift**

# Check drift: what differs between state and reality
terraform plan -refresh-only

# Update state from reality without changing infrastructure
terraform apply -refresh-only

Drift happens when someone changes a resource manually in the console. A regular terraform apply will revert the resource to what the configuration declares, which can come as a surprise in production.

terraform import: bringing existing resources under Terraform management

A common scenario: existing infrastructure was created manually in the console and now needs to move to Terraform management.

**import in Terraform 1.5+ (the modern approach)**

Starting with Terraform 1.5 there is an import block that makes import declarative:

# main.tf
import {
  to = aws_s3_bucket.my_existing_bucket
  id = "my-actual-bucket-name-in-aws"
}

resource "aws_s3_bucket" "my_existing_bucket" {
  bucket = "my-actual-bucket-name-in-aws"
}

terraform plan  # shows the import action
terraform apply # imports into state

**import via the CLI (older method, still common)**

# Syntax: terraform import <resource_type>.<name> <cloud_resource_id>
terraform import aws_security_group.web_sg sg-0a1b2c3d4e5f
terraform import aws_iam_role.app_role my-app-role-name
terraform import aws_rds_cluster.main db-cluster-identifier

**Full manual import process**

1. Write a skeleton in a .tf file with an empty resource block (just resource "type" "name" {})
2. Run terraform import
3. Run terraform plan (or terraform state show <address>) to see which attributes are missing from the configuration
4. Copy the values into the resource block
5. Run terraform plan again; it should report No changes

# -generate-config-out (Terraform 1.5+, used together with import blocks) writes the .tf file for you
terraform plan -generate-config-out=generated.tf

Warning: attributes that Terraform computes (such as arn, id) should not appear in the configuration. Attributes that Terraform cannot read back from the API (such as passwords) have to be set manually, together with lifecycle { ignore_changes = [password] }.
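
When dozens of resources have to be imported, writing the import blocks by hand gets tedious. The sketch below renders them from a mapping of resource address to cloud ID. `render_import_blocks` is a hypothetical helper, and it assumes the IDs contain no `${` sequences, which HCL would treat as interpolation.

```python
import json


def render_import_blocks(resources):
    """Render one HCL import block per (address, cloud ID) pair."""
    blocks = []
    for address, cloud_id in resources.items():
        blocks.append(
            "import {\n"
            f"  to = {address}\n"
            f"  id = {json.dumps(cloud_id)}\n"  # json.dumps adds the double quotes
            "}\n"
        )
    return "\n".join(blocks)


hcl = render_import_blocks(
    {
        "aws_s3_bucket.my_existing_bucket": "my-actual-bucket-name-in-aws",
        "aws_security_group.web_sg": "sg-0a1b2c3d4e5f",
    }
)
```

The generated text can be written to a file such as imports.tf and combined with terraform plan -generate-config-out=generated.tf to produce matching resource blocks.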

Idempotent Bash provisioning: null_resource + local-exec

**production gap: idempotent Bash scripting for provisioning**

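Terraform manages infrastructure declaratively, but sometimes you need an action that does not exist as a Terraform resource: installing a package on a newly created EC2 instance, initializing a DB schema, registering an IP with an external load balancer.

null_resource with a local-exec provisioner runs a Bash script from Terraform. The script runs on the machine that executes Terraform (for example the CI runner), not on the created resource: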

resource "null_resource" "db_init" {
  # triggers defines when to run again
  # here: re-run only when db_endpoint changes
  triggers = {
    db_endpoint = aws_db_instance.main.endpoint
  }

  provisioner "local-exec" {
    # address is the hostname only; endpoint is "host:port", which psql -h does not accept
    command = <<-EOF
      bash ./scripts/init-db.sh "${aws_db_instance.main.address}"
    EOF
    environment = {
      DB_PASSWORD = var.db_password
    }
  }
}

**The problem: a provisioner that is not idempotent**

A script that Terraform runs may run more than once: if terraform apply fails midway, the null_resource is marked as tainted and the script runs again on the next apply. A non-idempotent script will cause problems:

#!/usr/bin/env bash
# BAD: not idempotent, fails on the second run
apt-get install -y nginx
useradd appuser
mkdir /var/app

**Idempotency patterns in Bash**

#!/usr/bin/env bash
# GOOD: every command checks before it acts
set -euo pipefail

DB_ENDPOINT="$1"

# 1. Check before installing
if ! command -v psql &>/dev/null; then
  apt-get install -y postgresql-client
fi

# 2. Check that the user exists before creating it
if ! id -u appuser &>/dev/null; then
  useradd --system --shell /usr/sbin/nologin appuser
fi

# 3. Check that the directory exists
if [ ! -d /var/app ]; then
  mkdir -p /var/app
  chown appuser:appuser /var/app
fi

# 4. Lockfile to prevent concurrent runs; taken before the check below
#    so that "check, then init" cannot interleave with another run
LOCK=/var/run/db-init.lock
exec 9>"$LOCK"
if ! flock -n 9; then
  echo "ERROR: another instance is running" >&2
  exit 1
fi
trap 'rm -f "$LOCK"' EXIT

# 5. Check that the DB schema does not exist yet before init
# curl -f returns a non-zero exit code on an HTTP error (4xx/5xx)
# /healthz is assumed to answer 2xx only once the schema has been applied
if curl -f -s -o /dev/null "http://${DB_ENDPOINT}/healthz"; then
  echo "INFO: DB already initialized, skipping"
  exit 0
fi

echo "INFO: Initializing DB schema..."
# DB_PASSWORD comes from the provisioner's environment block
export PGPASSWORD="$DB_PASSWORD"
# ON_ERROR_STOP makes psql exit non-zero when a statement in the file fails
psql -h "$DB_ENDPOINT" -U admin -v ON_ERROR_STOP=1 -f /app/schema.sql

**Checklist of patterns for an idempotent production script**

| Check | Pattern |
|--------|---------|
| set -euo pipefail | Always the first line after the shebang |
| Check for the binary before install | command -v X or which X |
| Check for the user before useradd | id -u NAME |
| Check for the directory/file | [ -d PATH ] / [ -f FILE ] |
| curl with fail-on-error | curl -f ... |
| Lockfile | flock with trap cleanup |
| Clear exit codes | exit 0 (success), exit 1 (error) |
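
Idempotency is a property you can test: run the provisioning twice and confirm that the second run changes nothing. The sketch below shows the same check-before-act idea in Python against a temporary directory. `ensure_directory`, `ensure_line`, and `provision` are hypothetical helpers, not part of any Terraform tooling.

```python
import os
import tempfile


def ensure_directory(path):
    """Create the directory if missing; return True only when something changed."""
    if os.path.isdir(path):
        return False
    os.makedirs(path)
    return True


def ensure_line(path, line):
    """Append `line` to the file unless it is already present."""
    existing = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            existing = handle.read().splitlines()
    if line in existing:
        return False
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return True


def provision(root):
    """Run every step; the returned list records which steps changed something."""
    app_dir = os.path.join(root, "app")
    return [
        ensure_directory(app_dir),
        ensure_line(os.path.join(app_dir, "app.conf"), "listen_port=8080"),
    ]


with tempfile.TemporaryDirectory() as root:
    first_run = provision(root)
    second_run = provision(root)
    with open(os.path.join(root, "app", "app.conf"), encoding="utf-8") as handle:
        final_config = handle.read()
```

The same two-run test works for a Bash provisioner: run the script twice in a throwaway container and compare the resulting state and exit codes.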

Pulumi: IaC in real programming languages

Pulumi is an IaC tool that lets you write infrastructure in TypeScript, Python, Go, or C# rather than in HCL. That is the basic difference between it and Terraform.

**Terraform vs Pulumi: a practical comparison**

| Aspect | Terraform / OpenTofu | Pulumi |
|------|----------------------|--------|
| Language | HCL (declarative DSL) | TypeScript, Python, Go, C# |
| Conditional logic | count, for_each, dynamic | Full if/else, loops, functions |
| State | Local / S3+DynamoDB | Pulumi Cloud (managed) / S3 |
| Providers | HashiCorp Registry (2,000+) | Bridges Terraform providers |
| Secrets | External (Vault, SSM) | Built-in encryption |
| Learning curve | Relatively low (HCL is simple) | Requires a programming language |

**What Pulumi makes easy that is hard in Terraform**

// Pulumi TypeScript: complex logic that needs a workaround in Terraform
import * as aws from "@pulumi/aws";

const environments = ["staging", "production"];

const buckets = environments.map(env => {
  return new aws.s3.Bucket(`app-${env}`, {
    bucket: `my-company-${env}-assets`,
    tags: {
      Environment: env,
      ManagedBy: "Pulumi"
    }
  });
});

// Using computed results at apply time
const largestBucket = buckets.reduce((acc, b) =>
  /* complex runtime logic here */
  acc
);

# Pulumi Python: importing existing Python modules
import pulumi
import pulumi_aws as aws
import json
from mycompany.naming import generate_resource_name  # an internal library!

for service in ["api", "worker", "scheduler"]:
    role = aws.iam.Role(
        f"{service}-role",
        assume_role_policy=json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole"
            }]
        })
    )

**When to choose Pulumi over Terraform**

- The team consists of Python/TypeScript engineers who refuse to learn HCL
- The infrastructure logic is very complex (loops, conditions, runtime computations)
- There is an internal library you want to share between application code and infra code
- The organization requires Pulumi Cloud (managed state + secrets)

**When to stay with Terraform**

- Most of the infrastructure is simple and declarative
- The team already knows Terraform and there is existing state
- There is no capacity to learn a new programming language just for infra
- You need the huge ecosystem of community modules

Terragrunt: DRY configurations across multiple environments

Terragrunt is a thin wrapper around Terraform that solves the repetition problem across many configurations.

**The problem Terragrunt solves**

Without Terragrunt, 3 environments (staging, production, dr) with 5 modules each means 15 directories that contain almost the same backend.tf, the same provider.tf, and the same variables with slight variations. Every change requires updating 15 places.

**Typical Terragrunt layout**

infrastructure/
  terragrunt.hcl              # root config (shared)
  staging/
    env.hcl                   # env-level values
    vpc/
      terragrunt.hcl
    eks/
      terragrunt.hcl
  production/
    env.hcl
    vpc/
      terragrunt.hcl
    eks/
      terragrunt.hcl

**root terragrunt.hcl**

# infrastructure/terragrunt.hcl
locals {
  # Read the env.hcl closest to the module that includes this file
  env_vars    = read_terragrunt_config(find_in_parent_folders("env.hcl"))
  environment = local.env_vars.locals.environment
}

# Terragrunt creates the bucket and the DynamoDB table automatically if they do not exist
remote_state {
  backend = "s3"
  config = {
    bucket         = "my-company-tfstate"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "eu-west-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
}

generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "eu-west-1"
  default_tags {
    tags = {
      ManagedBy   = "Terraform"
      Environment = "${local.environment}"
    }
  }
}
EOF
}

**env-level env.hcl**

Terragrunt allows only one level of include, and find_in_parent_folders() stops at the first terragrunt.hcl it finds, so env-level values live in a separate env.hcl that the root config reads.

# infrastructure/production/env.hcl
locals {
  environment = "production"
}

**service-level terragrunt.hcl**

# infrastructure/production/vpc/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "../../../modules//vpc"
}

inputs = {
  vpc_cidr     = "10.0.0.0/16"
  environment  = "production"
  subnet_count = 3
}
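
The root config derives each module's state key from `path_relative_to_include()`, which gives every module its own state object without any per-module backend code. The sketch below models that path calculation only; `state_key` is a hypothetical helper and the paths are the ones from the layout in this section.

```python
from pathlib import PurePosixPath


def state_key(root_config, child_config):
    """Model of "${path_relative_to_include()}/terraform.tfstate"."""
    root_dir = PurePosixPath(root_config).parent
    child_dir = PurePosixPath(child_config).parent
    # Path of the including module's directory, relative to the root config's directory.
    relative = child_dir.relative_to(root_dir)
    return f"{relative}/terraform.tfstate"


vpc_key = state_key(
    "infrastructure/terragrunt.hcl",
    "infrastructure/production/vpc/terragrunt.hcl",
)
eks_key = state_key(
    "infrastructure/terragrunt.hcl",
    "infrastructure/staging/eks/terragrunt.hcl",
)
```

Because the key follows the directory layout, moving or renaming a module directory also changes where Terragrunt looks for its state, so such moves need a state migration.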

**Running**

# run-all: runs across all modules in the environment, respecting dependencies
terragrunt run-all plan --terragrunt-working-dir infrastructure/production
terragrunt run-all apply --terragrunt-working-dir infrastructure/production

# a single module only
cd infrastructure/production/vpc && terragrunt apply

Terraform security: tfsec and Checkov as policy-as-code

Terraform configurations often contain common security problems: S3 buckets without encryption, security groups open to 0.0.0.0/0, IAM roles with * permissions. Static analysis tools scan the .tf files before apply.

**tfsec**

# Installation
brew install tfsec  # macOS
curl -s https://raw.githubusercontent.com/aquasecurity/tfsec/master/scripts/install_linux.sh | bash

# Scan
tfsec ./infrastructure/

# Example output:
# HIGH - AWS CloudTrail should use a customer managed key
# CRITICAL - Security group allows ingress from 0.0.0.0/0 on port 22

# Scan with JSON output for CI/CD
tfsec --format json --out tfsec-results.json ./

# Ignore a specific check (add a justification comment)
# tfsec:ignore:aws-s3-enable-bucket-logging
resource "aws_s3_bucket" "logs" {
  # ...
}

**Checkov**

# Installation
pip install checkov

# Scan Terraform
checkov -d ./infrastructure/ --framework terraform

# Compact output for CI (exit code 1 if any check fails)
checkov -d . --framework terraform --quiet --compact

# soft-fail: report but do not block the pipeline
checkov -d . --framework terraform --soft-fail

# Skip specific checks
checkov -d . --skip-check CKV_AWS_18,CKV_AWS_57

**Integration with GitHub Actions**

- name: Run tfsec
  uses: aquasecurity/tfsec-action@v1.0.0
  with:
    soft_fail: false  # the PR is blocked if there are findings

- name: Run Checkov
  uses: bridgecrewio/checkov-action@v12
  with:
    directory: infrastructure/
    framework: terraform
    output_format: sarif
    output_file_path: checkov-results.sarif

- name: Upload SARIF
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: checkov-results.sarif

**policy-as-code with OPA + Terraform**

For organizations that need custom policies (such as "every resource must have a cost-center tag"):

# Produce the plan in JSON format
terraform plan -out=tfplan.binary
terraform show -json tfplan.binary > tfplan.json

# Run the OPA policy
opa eval \
  --data policies/required_tags.rego \
  --input tfplan.json \
  --format pretty \
  'data.terraform.required_tags.deny'
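
The Rego file referenced above is not shown in this lesson. To illustrate what such a rule checks, here is a Python stand-in (not OPA) that applies the "every resource must have a cost-center tag" policy to the same plan JSON. `missing_tag_violations` and the sample plan are hypothetical; the code reads the `resource_changes[].change.after.tags` field.

```python
def missing_tag_violations(plan, required_tag="cost-center"):
    """Return one message per planned resource that lacks the required tag."""
    violations = []
    for change in plan.get("resource_changes", []):
        after = change.get("change", {}).get("after")
        if after is None or "tags" not in after:
            # Resource is being deleted, or its type has no tags attribute.
            continue
        tags = after.get("tags") or {}
        if required_tag not in tags:
            violations.append(f"{change['address']} is missing tag {required_tag}")
    return violations


# Hand-written sample in the shape of tfplan.json.
sample_plan = {
    "resource_changes": [
        {"address": "aws_s3_bucket.logs",
         "change": {"actions": ["create"], "after": {"tags": {"cost-center": "platform"}}}},
        {"address": "aws_instance.app[0]",
         "change": {"actions": ["create"], "after": {"tags": {"Environment": "staging"}}}},
        {"address": "aws_db_instance.old",
         "change": {"actions": ["delete"], "after": None}},
    ]
}

violations = missing_tag_violations(sample_plan)
```

A pipeline step would fail the build when the list is non-empty, which is the same outcome as a non-empty deny set from the OPA query.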

Connection questions

Connect what you learned in this topic to earlier topics. There is no single right answer; critical thinking is the goal.

Connection question 1

Terraform's null_resource provisioner runs Bash scripts, exactly the topic covered in Linux Advanced (Bash strict mode, flock, idempotency). Explain: (1) how set -euo pipefail from Linux Advanced protects provisioner scripts from silent failure, (2) how flock prevents two simultaneous terraform apply runs from executing the same DB init script in parallel, and (3) how logrotate (Linux Advanced) relates to managing terraform apply logs in CI/CD.

Connects to: Linux Advanced

Connection question 2

Terraform's S3 backend stores state in AWS S3 with DynamoDB locking, two AWS services covered in AWS (reinforce). Explain: (1) which minimal IAM permissions a CI/CD role that runs terraform apply needs (S3 + DynamoDB), (2) why the state bucket itself is not managed in Terraform (the chicken-and-egg problem), and (3) how AWS Secrets Manager fits in with Terraform: how the Terraform provider can read a secret and inject it into an RDS instance.

Connects to: AWS (reinforce)
