Python for DevOps

Automation, infrastructure, and scripting in production environments: Python the way DevOps engineers actually use it

Theory

⏱ ~45 minutes

Why Python became the standard DevOps language in Israel

According to the latest job-market scan (April 2026), Python appears in 7.1% of DevOps job postings in Israel, mainly in the context of automation scripting, AWS/cloud tooling, and MLOps infrastructure. Many core DevOps tools are either written in Python (Ansible, SaltStack, the AWS CLI) or expose a Python SDK (Pulumi). Terraform and its providers, by contrast, are written in Go.

Python in DevOps looks quite different from Python for data science or web development: the emphasis is on short, precise scripts that run system commands, manage environment variables, parse config files, and talk to cloud APIs. A single mistake in a production script can cause downtime, leak credentials, or delete resources. That is why the following sections stress correctness and safety, not just syntax.

subprocess: running system commands safely

The subprocess module is the recommended way in Python 3 to run system commands (it replaces older interfaces such as os.system). The first rule: never use shell=True with user-supplied input. That is a classic command injection hole.

import subprocess
import sys

# Safe approach: a list of arguments, no shell=True
result = subprocess.run(
    ["git", "rev-parse", "--short", "HEAD"],
    capture_output=True,
    text=True,
    timeout=10  # always set a timeout! (raises subprocess.TimeoutExpired when exceeded)
)

if result.returncode == 0:
    commit_hash = result.stdout.strip()
    print(f"Current commit: {commit_hash}")
else:
    print(f"ERROR: {result.stderr.strip()}", file=sys.stderr)
    sys.exit(1)

Why shell=True is dangerous:

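# Dangerous! If user_input = 'main; rm -rf /' the shell runs BOTH commands
branch = user_input  # e.g. a CLI argument or a value from a web form
subprocess.run(f"git checkout {branch}", shell=True)  # command injection!

# Safe:
subprocess.run(["git", "checkout", branch])  # branch is a single argument, not a command line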
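To see why the list form is safe, the sketch below hands a string containing `;` to a child Python process as a single argument and reads it back: no shell parses it, so the `;` stays plain text. `echo_arg` is an illustrative name, and `sys.executable` stands in for a real tool so the example runs anywhere. If a shell is genuinely unavoidable (pipes, globbing), `shlex.quote` escapes one value for a POSIX shell, but the list form remains the safer default.

```python
import shlex
import subprocess
import sys


def echo_arg(value: str) -> str:
    """Run a child Python process that prints its first argument back unchanged."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True,
        text=True,
        timeout=10,
        check=True,
    )
    return result.stdout.rstrip("\n")


malicious = "main; echo INJECTED"
# The whole string arrives as one argv entry; nothing after ';' is executed.
print(echo_arg(malicious))  # main; echo INJECTED

# If you must build a shell string, quote every untrusted value.
print(f"git checkout {shlex.quote(malicious)}")  # git checkout 'main; echo INJECTED'
```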

A common DevOps pattern: run a command and raise an exception automatically if it fails:

import subprocess
import sys

# check=True raises CalledProcessError if returncode != 0
try:
    result = subprocess.run(
        ["aws", "s3", "cp", "backup.tar.gz", "s3://my-bucket/"],
        capture_output=True,
        text=True,
        timeout=120,
        check=True
    )
except subprocess.CalledProcessError as e:
    print(f"Upload failed (exit {e.returncode}): {e.stderr}", file=sys.stderr)
    sys.exit(1)
except subprocess.TimeoutExpired:
    print("Upload timed out after 120s", file=sys.stderr)
    sys.exit(1)

For commands that produce large output you don't need to collect, call subprocess.run without capture_output: the child inherits the script's stdout/stderr and writes to them directly.
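When you want output to appear live and also be processed (for example, logging each line of a long-running deploy command as it arrives), `subprocess.Popen` lets you read stdout line by line. A minimal sketch, assuming a hypothetical `stream_command` helper that merges stderr into stdout and passes each line to a callback. Because this loop has no built-in `timeout=`, a `threading.Timer` acts as a watchdog that kills the child after the deadline:

```python
import subprocess
import sys
import threading
from typing import Callable, Sequence


def stream_command(cmd: Sequence[str], on_line: Callable[[str], None], timeout: float = 600.0) -> int:
    """Run cmd, hand each output line to on_line as it arrives, and return the exit code."""
    with subprocess.Popen(
        list(cmd),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr with stdout
        text=True,
    ) as proc:
        watchdog = threading.Timer(timeout, proc.kill)  # hard deadline for the whole run
        watchdog.start()
        try:
            assert proc.stdout is not None
            for line in proc.stdout:  # ends when the child closes its output
                on_line(line.rstrip("\n"))
            return proc.wait()
        finally:
            watchdog.cancel()


lines: list[str] = []
code = stream_command([sys.executable, "-c", "print('step 1'); print('step 2')"], lines.append)
print(code, lines)  # 0 ['step 1', 'step 2']
```

If the watchdog fires, the child is killed and on POSIX the returned exit code is negative (minus the signal number), so callers should treat any non-zero value as failure.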

os.environ: managing environment variables correctly

In DevOps, credentials and configuration are supplied through environment variables (or a secrets manager), never hardcoded in the source. The os module gives full access to them.

import os
import sys

# Direct read: raises KeyError if the variable is not set
api_key = os.environ["API_KEY"]

# Read with a default value: safer for optional settings
log_level = os.environ.get("LOG_LEVEL", "INFO")

# Validate critical variables at the very start of the script
REQUIRED_VARS = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "S3_BUCKET"]
missing = [v for v in REQUIRED_VARS if not os.environ.get(v)]
if missing:
    print(f"FATAL: missing required env vars: {', '.join(missing)}", file=sys.stderr)
    sys.exit(1)
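Environment variables are always strings, so `os.environ.get("DRY_RUN")` returns the truthy string `"false"` when the variable is set to false. A small sketch of typed readers, where `env_bool` and `env_int` are hypothetical helper names and the accepted spellings are an assumption you should adapt to your team's conventions:

```python
import os

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off", ""}


def env_bool(name: str, default: bool = False) -> bool:
    """Parse a boolean env var; raise ValueError on unknown values instead of guessing."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name}={raw!r} is not a valid boolean")


def env_int(name: str, default: int) -> int:
    """Parse an integer env var, with an error message that names the variable."""
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{name}={raw!r} is not an integer") from None
```

With `DRY_RUN=false python deploy.py`, `env_bool("DRY_RUN")` returns `False`, whereas a plain `if os.environ.get("DRY_RUN"):` would treat it as enabled.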

An important production pattern: passing environment variables to a subprocess:

import os
import subprocess

# Build an environment dict with overrides
custom_env = os.environ.copy()  # copy of the current environment (keeps PATH, HOME, ...)
custom_env["KUBECONFIG"] = "/home/deploy/.kube/prod-config"
custom_env["LOG_LEVEL"] = "DEBUG"

result = subprocess.run(
    ["kubectl", "rollout", "status", "deploy/api"],
    env=custom_env,  # the child process gets the customized environment
    capture_output=True,
    text=True,
    timeout=300  # rollout status can block; don't let it hang forever
)
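A quick way to confirm what the child actually receives: the sketch below starts a child Python process with an overridden `LOG_LEVEL` and reports the value it sees, while the parent's own environment stays untouched. It uses `sys.executable` instead of `kubectl` so it runs anywhere, and `child_sees` is a hypothetical helper name. Copying `os.environ` first matters: a dict holding only your overrides would drop `PATH`, `HOME`, and everything else the child needs.

```python
import os
import subprocess
import sys


def child_sees(var: str, overrides: dict[str, str]) -> str:
    """Return the value of `var` as seen by a child process started with `overrides`."""
    env = os.environ.copy()  # start from the current environment
    env.update(overrides)    # apply overrides for this one command only
    result = subprocess.run(
        [sys.executable, "-c", f"import os; print(os.environ.get({var!r}, '<unset>'))"],
        env=env,
        capture_output=True,
        text=True,
        timeout=10,
        check=True,
    )
    return result.stdout.strip()


print(child_sees("LOG_LEVEL", {"LOG_LEVEL": "DEBUG"}))  # DEBUG
```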

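A classic production hazard: reading a variable only at the moment it is needed. If it is missing, the script dies with a raw KeyError traceback, possibly after earlier steps have already changed something: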

# Bad: fails with a bare KeyError traceback at runtime
bucket = os.environ["S3_BUCKET"]
delete_all_files(bucket)  # hypothetical destructive step; never reached if S3_BUCKET is unset

# Good: explicit check with a clear error message, before any destructive step
bucket = os.environ.get("S3_BUCKET")
if not bucket:
    sys.exit("ERROR: S3_BUCKET env var is required")  # prints to stderr, exits with status 1
delete_all_files(bucket)
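The check can be wrapped in a helper. A sketch, assuming a hypothetical `get_required_env` (the pytest examples later in this topic import a function with this name from a `deploy_utils` module; this is one possible shape for it, not a standard API) plus a multi-variable variant that reports every missing name in one message:

```python
import os
import sys


def get_required_env(name: str) -> str:
    """Return a required env var; exit with a clear message if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        # sys.exit(str) prints the message to stderr and exits with status 1
        sys.exit(f"ERROR: {name} env var is required")
    return value


def get_required_envs(*names: str) -> dict[str, str]:
    """Validate several variables at once and report every missing one together."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        sys.exit(f"FATAL: missing required env vars: {', '.join(missing)}")
    return {n: os.environ[n] for n in names}
```

Calling `bucket = get_required_env("S3_BUCKET")` at the top of `main()`, before any destructive step, turns a mid-run crash into an immediate, readable failure.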

pathlib: cross-platform path handling

The pathlib module (Python 3.4+) replaces os.path with a more readable, object-oriented interface that works unchanged on Windows, macOS, and Linux. For DevOps scripts that run in containers and on VPS hosts, treat pathlib as the standard.

import sys
from datetime import datetime
from pathlib import Path

# Building paths: the / operator is syntactic sugar for joinpath
base = Path("/etc/myapp")
config_file = base / "config" / "settings.yaml"
logs_dir = base / "logs"

# Existence checks
if not config_file.exists():
    sys.exit(f"Config not found: {config_file}")

if not config_file.is_file():
    sys.exit(f"Path exists but is not a file: {config_file}")

# Create directories (like mkdir -p)
logs_dir.mkdir(parents=True, exist_ok=True)

# Read a file
content = config_file.read_text(encoding="utf-8")

# Write a file (write_text overwrites any existing content)
(logs_dir / "deploy.log").write_text(f"Deployed at {datetime.now()}\n", encoding="utf-8")

# Find all YAML files in the tree (recursive; equivalent to base.rglob("*.yaml"))
for yaml_file in base.glob("**/*.yaml"):
    print(yaml_file)  # a Path object with rich metadata (name, suffix, parent, ...)

# Current user's home directory
home = Path.home()
kubeconfig = home / ".kube" / "config"
print(f"Kubeconfig: {kubeconfig}")
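A typical pathlib job in backup scripts is retention: keep the newest N archives and delete the rest. A minimal sketch, assuming backups match `*.tar.gz` and that modification time is an acceptable proxy for age; `prune_old_backups` is a hypothetical helper that defaults to a dry run so nothing is deleted by accident:

```python
from pathlib import Path


def prune_old_backups(backup_dir: Path, keep: int = 3, pattern: str = "*.tar.gz",
                      dry_run: bool = True) -> list[Path]:
    """Delete all but the `keep` newest files matching `pattern`; return the files selected."""
    if keep < 0:
        raise ValueError("keep must be >= 0")
    files = sorted(
        (p for p in backup_dir.glob(pattern) if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    to_delete = files[keep:]
    for path in to_delete:
        if dry_run:
            print(f"[DRY RUN] would delete {path}")
        else:
            path.unlink()
    return to_delete
```

Running it first with the default `dry_run=True`, then with `dry_run=False` once the printed list looks right, is the same safety habit as a `--dry-run` CLI flag.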

A common DevOps use: locating the directory of the running script (independent of the current working directory):

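from pathlib import Path

# Location of the script itself: safer than os.getcwd()
# (resolve() first, so a symlinked script still finds files next to its real location)
SCRIPT_DIR = Path(__file__).resolve().parent
config_path = SCRIPT_DIR / "config.yaml"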

argparse: building CLI tools for automation

When a DevOps script grows from a one-liner into a tool the whole team uses, it needs a proper CLI. argparse provides argument parsing with an automatic --help, type conversion and validation (type=, choices=), and subcommands.

import argparse
import sys

def parse_args():
    parser = argparse.ArgumentParser(
        description="Deploy application to environment",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python deploy.py --env staging --version 1.4.2
  python deploy.py --env production --version 1.4.2 --dry-run
"""
    )

    parser.add_argument(
        "--env",
        required=True,
        choices=["staging", "production"],
        help="Target environment"
    )
    parser.add_argument(
        "--version",
        required=True,
        help="Docker image tag to deploy (e.g. 1.4.2)"
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        default=False,
        help="Print actions without executing"
    )
    parser.add_argument(
        "--timeout",
        type=int,
        default=120,
        help="Deployment timeout in seconds (default: 120)"
    )

    return parser.parse_args()

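def deploy(env, version, timeout):
    # Placeholder: the real deployment logic (kubectl, helm, ...) goes here
    print(f"Deploying v{version} to {env} (timeout {timeout}s)")

def main():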
    args = parse_args()
    if args.dry_run:
        print(f"[DRY RUN] Would deploy v{args.version} to {args.env}")
    else:
        deploy(args.env, args.version, args.timeout)

if __name__ == "__main__":
    main()
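Two small changes make a CLI like this easier to test and harder to misuse: accept an optional `argv` list (so tests can call `parse_args([...])` without touching `sys.argv`), and validate `--version` with a custom `type=` function. In this sketch `validate_version` is a hypothetical helper, and the `MAJOR.MINOR.PATCH` format is an assumption about how images are tagged:

```python
import argparse
import re
from typing import Optional, Sequence

_VERSION_RE = re.compile(r"\d+\.\d+\.\d+")  # assumption: tags look like 1.4.2


def validate_version(value: str) -> str:
    """argparse `type=` callback: reject tags that are not MAJOR.MINOR.PATCH."""
    if not _VERSION_RE.fullmatch(value):
        raise argparse.ArgumentTypeError(f"invalid version {value!r}, expected e.g. 1.4.2")
    return value


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Deploy application to environment")
    parser.add_argument("--env", required=True, choices=["staging", "production"])
    parser.add_argument("--version", required=True, type=validate_version)
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--timeout", type=int, default=120)
    return parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]


args = parse_args(["--env", "staging", "--version", "1.4.2"])
print(args.env, args.version, args.dry_run)  # staging 1.4.2 False
```

An invalid value such as `--version latest` makes argparse print the error and exit with status 2, before any deployment code runs.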

The subcommand pattern (for tools like git or kubectl):

import argparse

parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(dest="command", required=True)  # required= needs Python 3.7+

up_cmd = subparsers.add_parser("up", help="Start services")
up_cmd.add_argument("--service", help="Specific service name")

down_cmd = subparsers.add_parser("down", help="Stop services")
down_cmd.add_argument("--volumes", action="store_true")

args = parser.parse_args()
# start_services / stop_services stand for your own implementation
if args.command == "up":
    start_services(args.service)
elif args.command == "down":
    stop_services(args.volumes)
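As the number of subcommands grows, the `if/elif` chain can be replaced with `set_defaults(func=...)`, a standard argparse idiom where each subparser stores its handler on the parsed namespace. The handler bodies here are hypothetical placeholders that only describe the action:

```python
import argparse
from typing import Optional, Sequence


def start_services(args: argparse.Namespace) -> str:
    # Placeholder: a real tool would call docker compose / kubectl here.
    return f"starting {args.service or 'all services'}"


def stop_services(args: argparse.Namespace) -> str:
    return f"stopping services (remove volumes: {args.volumes})"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="svc")
    subparsers = parser.add_subparsers(dest="command", required=True)

    up_cmd = subparsers.add_parser("up", help="Start services")
    up_cmd.add_argument("--service", help="Specific service name")
    up_cmd.set_defaults(func=start_services)  # the handler travels with the namespace

    down_cmd = subparsers.add_parser("down", help="Stop services")
    down_cmd.add_argument("--volumes", action="store_true")
    down_cmd.set_defaults(func=stop_services)
    return parser


def main(argv: Optional[Sequence[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    return args.func(args)


print(main(["up", "--service", "api"]))  # starting api
```

Adding a subcommand now means one `add_parser` plus one `set_defaults`, with no dispatch code to keep in sync.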

logging: configuring logs properly in scripts

Using print() for logging is an anti-pattern in production. The logging module provides severity levels (DEBUG, INFO, WARNING, ERROR, CRITICAL), automatic timestamps, the logger (module) name, output to files, and rotation, all of which can be changed through configuration without touching the code that emits the messages.

import logging
import os
import sys

def setup_logging(level: str = "INFO") -> None:
    """Configure logging for production scripts."""
    log_level = getattr(logging, level.upper(), logging.INFO)

    # Format with timestamp + level + logger name
    formatter = logging.Formatter(
        fmt="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S"
    )

    # Handler writing to stderr (so logs don't mix with the script's stdout output)
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(formatter)

    # Root logger (note: every call adds another handler)
    root_logger = logging.getLogger()
    root_logger.setLevel(log_level)
    root_logger.addHandler(handler)

# Usage:
setup_logging(os.environ.get("LOG_LEVEL", "INFO"))
logger = logging.getLogger(__name__)

# args, config, attempt and error_msg come from the surrounding script
logger.info("Starting deployment for env=%s version=%s", args.env, args.version)
logger.debug("Full config: %s", config)  # only shown when LOG_LEVEL=DEBUG
logger.warning("Retrying S3 upload (attempt %d/3)", attempt)
logger.error("Database migration failed: %s", error_msg)

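The difference between the levels matters:
- DEBUG: details for debugging (values, state, API calls). Hidden when the level is INFO or higher, which is the usual production setting.
- INFO: normal events such as "Deployment started" or "Service healthy".
- WARNING: an unusual situation that does not stop the script: a retry, a fallback, a deprecation.
- ERROR: an error that made a specific operation fail.
- CRITICAL: a failure severe enough that the script cannot continue (logging it does not stop the script; you still have to exit).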
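One pitfall in `setup_logging` above: every call adds another handler to the root logger, so calling it twice (for example, once from a shared module and once from `main`) prints every message twice. Since Python 3.8, `logging.basicConfig(force=True)` removes and closes existing root handlers before installing the new one. A sketch that logs into an in-memory buffer so the effect is visible; `configure` is a hypothetical name:

```python
import io
import logging
from typing import Optional, TextIO


def configure(level: str = "INFO", stream: Optional[TextIO] = None) -> None:
    """Idempotent setup: force=True (Python 3.8+) replaces any existing root handlers."""
    logging.basicConfig(
        level=getattr(logging, level.upper(), logging.INFO),
        format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S",
        stream=stream,  # None means sys.stderr
        force=True,
    )


buffer = io.StringIO()
configure("INFO", buffer)
configure("INFO", buffer)  # the second call does not add a second handler
log = logging.getLogger("deploy")
log.info("Deployment started")
log.debug("hidden at INFO level")
print(buffer.getvalue().count("Deployment started"))  # 1
```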

Logging to a file with rotation:

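import logging
from logging.handlers import RotatingFileHandler

# formatter/root_logger inside setup_logging() are local, so create them here
file_handler = RotatingFileHandler(
    "/var/log/myapp/deploy.log",  # the directory must already exist and be writable
    maxBytes=10 * 1024 * 1024,  # 10 MB
    backupCount=3  # keeps deploy.log.1 .. deploy.log.3
)
file_handler.setFormatter(logging.Formatter(
    fmt="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
    datefmt="%Y-%m-%dT%H:%M:%S"
))
logging.getLogger().addHandler(file_handler)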
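To see rotation happen without waiting for 10 MB of logs, the sketch below points a `RotatingFileHandler` at a temporary directory with a tiny `maxBytes`. It uses a dedicated named logger with `propagate = False` so the demo leaves the root configuration alone; `demo_rotation` and the sizes are arbitrary demo choices:

```python
import logging
import tempfile
from logging.handlers import RotatingFileHandler
from pathlib import Path


def demo_rotation(log_dir: Path, lines: int = 50) -> list[str]:
    """Write enough lines to force rotation; return the log file names left afterwards."""
    log_dir.mkdir(parents=True, exist_ok=True)  # the handler does not create directories
    handler = RotatingFileHandler(log_dir / "deploy.log", maxBytes=200, backupCount=3,
                                  encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger = logging.getLogger("rotation-demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep demo output away from the root logger
    logger.addHandler(handler)
    try:
        for i in range(lines):
            logger.info("deploy step %03d completed", i)
    finally:
        logger.removeHandler(handler)
        handler.close()
    return sorted(p.name for p in log_dir.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    # With backupCount=3, only deploy.log plus deploy.log.1 .. deploy.log.3 survive.
    print(demo_rotation(Path(tmp)))
```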

json and yaml: parsing config files in production

Almost every infrastructure tool uses JSON or YAML: Kubernetes manifests, Terraform state, CloudFormation templates, GitHub Actions workflows, docker-compose.yml. Python reads JSON with the built-in json module and YAML with the PyYAML package.

**json (standard library, nothing to install):**

import json
import sys
from datetime import datetime, timezone
from pathlib import Path

# Read a JSON config
config_path = Path("/etc/myapp/config.json")
try:
    config = json.loads(config_path.read_text(encoding="utf-8"))
except FileNotFoundError:
    sys.exit(f"Config not found: {config_path}")
except json.JSONDecodeError as e:
    sys.exit(f"Invalid JSON in {config_path}: {e}")

db_host = config["database"]["host"]
db_port = config["database"].get("port", 5432)  # default value

# Write JSON output (for pipelines)
version = "1.4.2"  # placeholder: normally from argparse or an env var
result = {"status": "deployed", "version": version, "timestamp": datetime.now(timezone.utc).isoformat()}
print(json.dumps(result, indent=2))
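`config["database"]["host"]` fails with a bare `KeyError: 'host'` that does not say which section was wrong. A sketch of hypothetical `get_path` / `require_keys` helpers that check dotted paths up front and report every missing one at once (the dotted-path convention is an assumption of this sketch, not a json module feature):

```python
import json
from typing import Any


def get_path(config: dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path like 'database.host' through nested dicts; KeyError if absent."""
    node: Any = config
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(dotted)
        node = node[part]
    return node


def require_keys(config: dict[str, Any], *paths: str) -> None:
    """Raise ValueError listing every dotted path missing from config."""
    missing = []
    for path in paths:
        try:
            get_path(config, path)
        except KeyError:
            missing.append(path)
    if missing:
        raise ValueError(f"config is missing required keys: {', '.join(missing)}")


config = json.loads('{"database": {"host": "db.internal"}, "environment": "staging"}')
require_keys(config, "database.host", "environment")  # passes silently
print(get_path(config, "database.host"))  # db.internal
```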

**yaml (requires PyYAML: pip install pyyaml):**

import yaml

# Safe read: yaml.safe_load refuses to build arbitrary Python objects
with open("docker-compose.yml", encoding="utf-8") as f:
    compose = yaml.safe_load(f)

services = compose.get("services", {})
for service_name, service_config in services.items():
    image = service_config.get("image", "no image specified")
    print(f"{service_name}: {image}")

# Modify and dump ("api" is an example service name).
# yaml.dump drops comments; sort_keys=False keeps the original key order
compose["services"]["api"]["image"] = "myapp:1.4.2"
with open("docker-compose.yml", "w", encoding="utf-8") as f:
    yaml.dump(compose, f, default_flow_style=False, sort_keys=False)

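Important: always use yaml.safe_load, never yaml.load with an unsafe loader. With yaml.Loader or yaml.UnsafeLoader (and, in old PyYAML versions, plain yaml.load), Python-specific tags in the file can construct arbitrary objects and execute code, a vulnerability that does turn up in production code. yaml.safe_load only builds plain types: dicts, lists, strings, numbers, booleans, dates.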

Parsing JSON output from CLI tools:

import json
import subprocess

# Many tools emit JSON with --output json
result = subprocess.run(
    ["aws", "ec2", "describe-instances", "--output", "json"],
    capture_output=True, text=True, check=True
)
instances_data = json.loads(result.stdout)
for reservation in instances_data["Reservations"]:
    for instance in reservation["Instances"]:
        print(f"{instance['InstanceId']}: {instance['State']['Name']}")
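Parsing logic like this is easiest to test against a saved sample rather than a live AWS call. The sketch below runs the same loop over a small hand-written JSON string shaped like a tiny subset of `describe-instances` output (the instance IDs and addresses are made up), using `.get()` for fields that may be absent; `summarize_instances` is a hypothetical helper:

```python
import json

SAMPLE = """
{"Reservations": [
  {"Instances": [
    {"InstanceId": "i-0aaa", "State": {"Name": "running"}, "PrivateIpAddress": "10.0.1.5"},
    {"InstanceId": "i-0bbb", "State": {"Name": "stopped"}}
  ]}
]}
"""


def summarize_instances(data: dict) -> list[tuple[str, str, str]]:
    """Return (instance_id, state, private_ip) for every instance in every reservation."""
    rows = []
    for reservation in data.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            rows.append((
                instance["InstanceId"],
                instance.get("State", {}).get("Name", "unknown"),
                instance.get("PrivateIpAddress", "-"),  # not every instance reports this field
            ))
    return rows


for row in summarize_instances(json.loads(SAMPLE)):
    print(*row)
```

In the real script, `json.loads(result.stdout)` replaces `json.loads(SAMPLE)`; the parsing function stays the same and keeps its tests.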

boto3: AWS automation from Python

boto3 is the official AWS SDK for Python. In DevOps it is used for automation beyond what the CLI comfortably covers: copying files to S3, starting and stopping EC2 instances, managing Secrets Manager, invoking Lambda, all from a Python script.

**S3 common patterns:**

import sys
from datetime import date

import boto3
from boto3.exceptions import S3UploadFailedError
from botocore.exceptions import ClientError

# boto3 reads credentials from env vars automatically:
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION
# (never pass credentials as arguments in your code!)
s3 = boto3.client("s3")

# Upload a file
try:
    s3.upload_file(
        Filename="backup.tar.gz",
        Bucket="my-backups-bucket",
        Key=f"backups/{date.today().isoformat()}/backup.tar.gz"
    )
    print("Upload successful")
except S3UploadFailedError as e:
    # upload_file wraps the underlying ClientError in S3UploadFailedError
    print(f"S3 upload failed: {e}", file=sys.stderr)
    sys.exit(1)
except ClientError as e:
    error_code = e.response["Error"]["Code"]
    print(f"S3 request failed ({error_code}): {e}", file=sys.stderr)
    sys.exit(1)

# List objects under a prefix (the paginator handles the 1000-keys-per-call limit)
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-backups-bucket", Prefix="backups/"):
    for obj in page.get("Contents", []):
        print(f"{obj['Key']} ({obj['Size']} bytes)")
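Scripts often receive S3 locations as a single `s3://bucket/key` string (the form the AWS CLI uses) and build date-partitioned keys like the one above. A small standard-library sketch; `parse_s3_uri` and `dated_backup_key` are hypothetical helpers, not boto3 functions. It splits on the first `/` instead of using `urllib.parse`, because S3 keys may legally contain `?` and `#`:

```python
from datetime import date
from typing import Optional


def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/path/to/key' into ('bucket', 'path/to/key')."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {uri!r}")
    return bucket, key


def dated_backup_key(filename: str, day: Optional[date] = None, prefix: str = "backups") -> str:
    """Build a key like 'backups/2026-04-17/backup.tar.gz' (defaults to today's date)."""
    day = day or date.today()
    return f"{prefix}/{day.isoformat()}/{filename}"


print(parse_s3_uri("s3://my-backups-bucket/" + dated_backup_key("backup.tar.gz", date(2026, 4, 17))))
# ('my-backups-bucket', 'backups/2026-04-17/backup.tar.gz')
```

The resulting pair maps directly onto the `Bucket=` and `Key=` arguments of `upload_file`.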

**EC2 instance queries:**

ec2 = boto3.client("ec2", region_name="eu-west-1")

# List instances with filters; the paginator follows NextToken for large fleets
paginator = ec2.get_paginator("describe_instances")
pages = paginator.paginate(
    Filters=[
        {"Name": "tag:Environment", "Values": ["production"]},
        {"Name": "instance-state-name", "Values": ["running"]}
    ]
)

for page in pages:
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            name = next(
                (t["Value"] for t in instance.get("Tags", []) if t["Key"] == "Name"),
                "unnamed"
            )
            # Instances without a public IP (e.g. in private subnets) have no PublicIpAddress key
            public_ip = instance.get("PublicIpAddress", "no public IP")
            print(f"{instance['InstanceId']} ({name}): {public_ip}")
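EC2 (like many AWS APIs) returns tags as a list of `{"Key": ..., "Value": ...}` pairs, which is why the loop above needs `next(...)` to find `Name`. Converting the list to a dict once makes every later lookup a plain `.get()`. A sketch with a hand-written sample instance (made-up values, not real API output); `tags_to_dict` is a hypothetical helper:

```python
from typing import Any, Optional


def tags_to_dict(tags: Optional[list[dict[str, str]]]) -> dict[str, str]:
    """Turn AWS-style [{'Key': k, 'Value': v}, ...] into {k: v}; None or [] gives {}."""
    return {t["Key"]: t["Value"] for t in (tags or [])}


instance: dict[str, Any] = {
    "InstanceId": "i-0abc",
    "Tags": [{"Key": "Name", "Value": "api-1"}, {"Key": "Environment", "Value": "production"}],
}
tags = tags_to_dict(instance.get("Tags"))
print(tags.get("Name", "unnamed"), tags.get("Environment"))  # api-1 production
```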

Important: in production, never put credentials directly in code. boto3 looks for credentials in a fixed order (simplified): env vars → ~/.aws/credentials → IAM role credentials from the instance or container. On EC2 and Lambda, an IAM role is almost always the recommended choice over access keys.

pytest: testing DevOps scripts

DevOps scripts that are never tested end up breaking production. pytest is the most widely used testing framework in Python and fits the logic inside automation scripts just as well.

# tests/test_deploy_utils.py
import pytest
from unittest.mock import patch, MagicMock
from deploy_utils import get_required_env, build_image_tag, upload_to_s3  # your own module

# Testing a simple function
def test_build_image_tag():
    tag = build_image_tag("myapp", "1.4.2", "production")
    assert tag == "myapp:1.4.2-production"

def test_build_image_tag_staging():
    tag = build_image_tag("myapp", "1.4.2", "staging")
    assert tag == "myapp:1.4.2-staging"

# Testing env var handling: catches production failures early
def test_get_required_env_missing(monkeypatch):
    monkeypatch.delenv("S3_BUCKET", raising=False)
    with pytest.raises(SystemExit):
        get_required_env("S3_BUCKET")

def test_get_required_env_present(monkeypatch):
    monkeypatch.setenv("S3_BUCKET", "my-test-bucket")
    assert get_required_env("S3_BUCKET") == "my-test-bucket"

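# Testing subprocess calls with a mock. patch("subprocess.run") only takes effect
# if deploy_utils calls subprocess.run through the module (import subprocess)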
def test_aws_upload_called_with_correct_args():
    with patch("subprocess.run") as mock_run:
        mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
        upload_to_s3("backup.tar.gz", "my-bucket")
        call_args = mock_run.call_args[0][0]
        assert "aws" in call_args
        assert "s3" in call_args
        assert "cp" in call_args
        assert "my-bucket" in " ".join(call_args)
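The tests above import from a `deploy_utils` module that this topic does not show. A sketch of two of its functions, `build_image_tag` and `upload_to_s3`, written so the tests' expectations hold; the tag format and the `aws s3 cp` call are inferred from the assertions, not taken from a real project:

```python
import subprocess


def build_image_tag(image: str, version: str, env: str) -> str:
    """Build an image reference like 'myapp:1.4.2-production'."""
    return f"{image}:{version}-{env}"


def upload_to_s3(local_path: str, bucket: str, prefix: str = "") -> None:
    """Upload a file with the AWS CLI; raises CalledProcessError or TimeoutExpired on failure."""
    destination = f"s3://{bucket}/{prefix}"
    # Calling subprocess.run through the module (not `from subprocess import run`)
    # is what lets patch("subprocess.run") in the test intercept the call.
    subprocess.run(
        ["aws", "s3", "cp", local_path, destination],
        capture_output=True,
        text=True,
        timeout=300,
        check=True,
    )
```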

Running:

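# Basic
pytest tests/

# Verbose output with short tracebacks
pytest tests/ -v --tb=short

# With coverage (requires pytest-cov)
pytest tests/ --cov=utils --cov-report=term-missing

# Only tests without a given marker
pytest tests/ -m "not integration"  # skip integration tests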
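`-m "not integration"` works cleanly only if the `integration` marker is registered; otherwise pytest emits `PytestUnknownMarkWarning` for every use (and errors under `--strict-markers`). One way to register it is a `pytest_configure` hook in `tests/conftest.py`; the marker description text is an assumption:

```python
# tests/conftest.py
def pytest_configure(config):
    """Register custom markers so `pytest -m "not integration"` runs without warnings."""
    config.addinivalue_line(
        "markers",
        "integration: tests that talk to real AWS/Kubernetes (deselect with -m 'not integration')",
    )

# In a test module (shown as a comment so this file needs no pytest import):
# @pytest.mark.integration
# def test_real_s3_upload(): ...
```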

CI pattern: pytest in GitHub Actions:

- name: Run Python tests
  run: |
    pip install pytest pytest-cov
    pytest tests/ --tb=short --junit-xml=test-results.xml

Virtual environments and requirements.txt: managing dependencies

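Every production Python script should run in an isolated virtual environment, not on the system Python. The reason: different projects need different package versions, and installing them all into one interpreter causes version collisions (and can break OS tools that depend on the system Python).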

# Create the venv
python3 -m venv .venv

# Activate (Linux/macOS)
source .venv/bin/activate

# In scripts and CI (no activate needed)
.venv/bin/python script.py
.venv/bin/pytest tests/

# Deactivate
deactivate

Managing requirements.txt:

# Freeze dependencies after installing them
pip install boto3 pyyaml requests
pip freeze > requirements.txt  # writes every installed package with its exact version

# Install in a new environment (CI, server)
pip install -r requirements.txt
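Before a script does anything destructive, it can confirm that the running environment actually matches the pinned `requirements.txt` (useful on servers shared by several jobs). A sketch using `importlib.metadata` (Python 3.8+); `check_pins` is a hypothetical helper that only understands simple `name==version` lines and skips everything else, an assumption that fits `pip freeze` output but not every requirements file:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Iterable, Optional


def check_pins(lines: Iterable[str]) -> list[tuple[str, str, Optional[str]]]:
    """Return (name, pinned, installed) for every 'name==version' line that does not match."""
    mismatches = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "==" not in line:
            continue  # skip blanks, -r includes, version ranges, URLs, ...
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None  # pinned but not installed at all
        if installed != pinned:
            mismatches.append((name, pinned, installed))
    return mismatches
```

A script could call `check_pins(Path("requirements.txt").read_text().splitlines())` at startup and exit with a clear message if the list is not empty.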

Recommended layout for a DevOps scripting project:

scripts/
  .venv/                    # not in git (.gitignore)
  requirements.txt          # boto3==1.34.69, pyyaml==6.0.1, ...
  requirements-dev.txt      # pytest==8.1.1, pytest-cov==5.0.0, ...
  deploy.py
  utils/
    aws.py
    config.py
  tests/
    test_deploy.py
    test_config.py

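Important: add .venv/ to .gitignore. Committing a venv bloats the repository and breaks across machines: a venv contains absolute paths and platform-specific binaries, so it is not portable between operating systems.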

CI/CD pattern: caching the venv:

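# GitHub Actions
- uses: actions/cache@v4
  with:
    path: .venv
    # include the OS (and ideally the Python version) in the key: a venv built elsewhere won't work
    key: venv-${{ runner.os }}-${{ hashFiles('requirements.txt') }}

- name: Install dependencies
  run: |
    python3 -m venv .venv
    .venv/bin/pip install -r requirements.txt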

In Docker: don't copy a venv into the image. Instead, COPY requirements.txt and RUN pip install during the build (and list .venv/ in .dockerignore, otherwise `COPY . .` copies the local venv in anyway):

FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "deploy.py"]

Practical exercises

1. Running a system command safely with subprocess

Use subprocess.run to run `ls /tmp`, capture its stdout, and print PASS if the returncode is 0. Demonstrate a safe invocation with list arguments (no shell=True), capture_output=True, text=True, and timeout=5.

2. Reading an environment variable with os.environ

Run a Python command that reads the environment variable APP_ENV and prints its value, setting the variable on the same command line. If the variable is not set, print ERROR. This mirrors the validation pattern used in production scripts.

3. Parsing a JSON config with json.loads

Parse a JSON string representing a service config, extract the value of the `environment` key, and print PASS if it equals `production`. This mirrors reading config returned by an AWS API or loaded from a settings file.

Connection questions

Connect what you learned in this topic to earlier topics. There is no single right answer; the goal is critical thinking.

Connection question 1

In the Docker topic you learned that the key to production Docker Compose is never putting secrets in compose.yml, and instead using env_file pointing to .env.production. How does what you now know about `os.environ` and `subprocess` in Python for DevOps explain why this pattern works, and what specific danger do you see when someone passes an API key as an argument to subprocess.run instead of as an environment variable?

Connects to:

Docker

Connection question 2

In Linux Advanced you learned about Bash scripting: strict mode (`set -euo pipefail`), trap handlers for cleanup, and flock to prevent concurrent runs. When is it worth replacing a Bash script with a Python script, and what do the Python DevOps tools covered here (subprocess, argparse, logging, pytest) provide that is hard to achieve in Bash in a production environment?

Connects to:

Linux Advanced
