Building a Deployment Gateway with FastAPI: State Machines and Distributed Locks

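## Why a deployment gateway?

Deploying many services safely in a microservice environment is tricky. Concurrent deployments can collide, the same port can be allocated twice, and rollbacks can fail. In this post I share what I learned building a safe deployment gateway on top of FastAPI.

## Architecture overview

The deployment gateway consists of six core components:

### 1. Deployment State Machine

The deployment process is modeled as a set of explicit states:

```python
from enum import Enum


class DeploymentStatus(str, Enum):
    PENDING = "pending"
    VALIDATING = "validating"
    IN_PROGRESS = "in_progress"
    SUCCESS = "success"
    FAILED = "failed"
    ROLLED_BACK = "rolled_back"
```

Every state transition is defined explicitly, so a deployment can be traced step by step and, when something goes wrong, the exact point of failure can be identified.

### 2. Distributed Lock with Fencing Token

To prevent concurrent deployments of the same resource, I implemented a distributed lock based on fencing tokens:

```python
from typing import Optional


class DistributedLock:
    def __init__(self, redis_client):
        self.redis = redis_client  # async client, e.g. redis.asyncio.Redis

    async def acquire(self, resource_id: str, timeout: int = 300) -> Optional[int]:
        # A fencing token must be orderable, so take it from a counter.
        # (A random UUID identifies the holder but cannot be compared.)
        # The ":fence" key name is an assumption made for this example.
        token = await self.redis.incr(f"lock:{resource_id}:fence")
        acquired = await self.redis.set(
            f"lock:{resource_id}",
            token,
            nx=True,     # only set if the key does not exist
            ex=timeout,  # expire so a crashed holder cannot block forever
        )
        return token if acquired else None
```

The fencing token keeps a stale lock holder from acting by mistake: if a holder's lock has expired and a newer holder has taken over, the protected resource can reject any operation that carries a token lower than the highest one it has already seen.

### 3. Port Allocation

An available port is allocated automatically from a configured range:

```python
class PortExhaustionError(Exception):
    """Raised when every port in the configured range is in use."""


class PortAllocationService:
    def __init__(self, port_range: tuple[int, int]):
        self.min_port, self.max_port = port_range

    async def allocate_port(self, service_name: str) -> int:
        # Look up the ports that are already in use
        # (get_used_ports / reserve_port are defined elsewhere in the service)
        used_ports = await self.get_used_ports()

        # Find the first free port in the range (max_port is inclusive)
        for port in range(self.min_port, self.max_port + 1):
            if port not in used_ports:
                await self.reserve_port(port, service_name)
                return port

        raise PortExhaustionError("No available ports")
```

### 4. Pre-deployment Validation Layer

Mandatory checks run before every deployment:

- **Server accessibility**: test the SSH connection
- **Port availability**: check the port against the allowed range
- **Forbidden server blocking**: protect the production environment
- **Docker image existence**: verify the image to be deployed

```python
import asyncio


class ValidationService:
    async def validate_deployment(self, request: DeploymentRequest):
        # The four checks are independent, so run them concurrently
        validations = [
            self.check_server_accessible(request.server),
            self.check_port_available(request.port),
            self.check_not_forbidden(request.server),
            self.check_docker_image_exists(request.image),
        ]

        # return_exceptions=True collects failures instead of raising on the first one
        results = await asyncio.gather(*validations, return_exceptions=True)

        # Surface the first failed check as a ValidationError
        for result in results:
            if isinstance(result, Exception):
                raise ValidationError(str(result))
```

### 5. Automatic Rollback

When a deployment fails, the service is restored to its previous state automatically:

```python
class RollbackService:
    async def rollback(self, deployment_id: str):
        # Look up the deployment record
        deployment = await self.get_deployment(deployment_id)
        previous_version = deployment.previous_version

        # Redeploy the previous version on the same port
        await self.deploy_version(
            service=deployment.service,
            version=previous_version,
            port=deployment.port,
        )

        # Write an audit log entry
        await self.audit_log(f"Rolled back {deployment_id}")
```

### 6. Observability

**Prometheus metrics** monitor the state of the system:

```python
from prometheus_client import Counter, Histogram

deployment_counter = Counter(
    'deployments_total',
    'Total deployments',
    ['status', 'service']
)

deployment_duration = Histogram(
    'deployment_duration_seconds',
    'Deployment duration'
)
```

**Telegram notifications** report deployment status in real time:

```python
class TelegramNotifier:
    async def notify_deployment(self, status: str, service: str):
        message = f"🚀 Deployment {status}\nService: {service}"
        await self.send_message(message)
```

## Project structure

```
src/
├── main.py              # FastAPI application
├── config.py            # Configuration management
├── models/              # Data models
│   ├── server.py
│   ├── port.py
│   ├── service.py
│   ├── deployment.py
│   └── audit.py
├── services/            # Business logic
│   ├── validation.py
│   ├── port_allocation.py
│   ├── deployment.py
│   ├── rollback.py
│   ├── lock.py
│   └── audit.py
├── routers/             # API endpoints
│   ├── deploy.py
│   ├── services.py
│   ├── ports.py
│   └── servers.py
└── utils/               # Utilities
    ├── docker.py
    ├── ssh.py
    ├── telegram.py
    └── metrics.py
```

## Testing strategy

- **Unit tests**: verify the core logic of each service
- **Integration tests**: confirm that the API endpoints behave correctly
- **Scenario tests**: simulate real deployment scenarios

```python
# Scenario test example
async def test_deployment_with_rollback():
    # Start a deployment
    deployment = await deploy_service("api-v2")

    # Simulate a failing health check
    await simulate_health_check_failure(deployment.id)

    # Confirm that the automatic rollback happened
    # (compare against the enum; its value is "rolled_back", not "ROLLED_BACK")
    status = await get_deployment_status(deployment.id)
    assert status == DeploymentStatus.ROLLED_BACK
```

## Key design points

1. **Idempotency**: running the same deployment request several times yields the same result
2. **Audit trail**: every deployment operation is logged
3. **Fail-safe**: on failure, the system recovers safely to the previous state
4. **Asynchronous processing**: FastAPI's async/await handles I/O efficiently

## Wrapping up

When building a deployment automation system, the two things that matter most are **safety** and **traceability**. Define the process clearly with a state machine, solve concurrency problems with a distributed lock, and add thorough validation and a rollback mechanism, and you get a stable deployment pipeline. As a next step, I recommend implementing more advanced deployment strategies such as Blue-Green and Canary deployments.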
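The state machine is described as having explicitly defined transitions, but the post only shows the enum itself. As a minimal sketch of how such rules could be written down, here is one possible transition table. The table is an assumption, not the rules the original system uses, and `ALLOWED_TRANSITIONS`, `InvalidTransitionError`, and `transition` are hypothetical names introduced for illustration.

```python
from enum import Enum


class DeploymentStatus(str, Enum):
    PENDING = "pending"
    VALIDATING = "validating"
    IN_PROGRESS = "in_progress"
    SUCCESS = "success"
    FAILED = "failed"
    ROLLED_BACK = "rolled_back"


# Assumed transition table: each state maps to the states it may move to.
# The real system's rules may differ.
ALLOWED_TRANSITIONS: dict[DeploymentStatus, frozenset[DeploymentStatus]] = {
    DeploymentStatus.PENDING: frozenset({DeploymentStatus.VALIDATING}),
    DeploymentStatus.VALIDATING: frozenset(
        {DeploymentStatus.IN_PROGRESS, DeploymentStatus.FAILED}
    ),
    DeploymentStatus.IN_PROGRESS: frozenset(
        {DeploymentStatus.SUCCESS, DeploymentStatus.FAILED}
    ),
    DeploymentStatus.FAILED: frozenset({DeploymentStatus.ROLLED_BACK}),
    DeploymentStatus.SUCCESS: frozenset(),      # terminal state
    DeploymentStatus.ROLLED_BACK: frozenset(),  # terminal state
}


class InvalidTransitionError(Exception):
    """Raised when a transition is not listed in ALLOWED_TRANSITIONS."""


def transition(current: DeploymentStatus, target: DeploymentStatus) -> DeploymentStatus:
    """Return the target state if the move is allowed, otherwise raise."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransitionError(
            f"{current.value} -> {target.value} is not allowed"
        )
    return target
```

Keeping the rules in a single table means an illegal jump, such as going from `PENDING` straight to `SUCCESS`, fails loudly at the point where it is attempted, which is what makes the failure point easy to locate afterwards.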

