2025-10-28 08:50:13 +01:00

# Project Context: PetClinic Microservices -> AWS
## Context and Scope
This project migrates the Spring PetClinic Microservices demo from its local/on-premises setup to AWS. The focus is infrastructure modernization, CI/CD automation, observability, and resilience, not application feature development.
[**PDF Download**](assets/week1/week1.pdf)
### Stakeholders
| Role | Responsibility |
|----|----|
| Project Sponsor | Funding, final approval |
| Project Manager | Scheduling, stakeholder coordination |
| Cloud Architect | Architecture, service selection |
| Dev Lead | App changes for cloud readiness |
| DevOps Engineer | CI/CD, IaC, deployments, monitoring |
| Security Team | IAM, encryption |
| End Users / Demo Audience | Acceptance and usability feedback |
### Expectations
- No app feature development unless necessary for cloud deployment.
- AWS is the only target cloud.
### Objectives
- Run full PetClinic microservices on AWS with CI/CD.
- Observability: logs, metrics, traces for 100% of services.
- Cost target: keep monthly infra cost under a defined limit.
- Security: secrets encrypted, least-privilege IAM, HTTPS for all endpoints.
### Deadlines
| Milestone | Date |
|----|----|
| Project approval | Oct 27, 2025 |
| CI/CD & Automation | Nov 3, 2025 |
| Infrastructure | Nov 10, 2025 |
| Data | Nov 17, 2025 |
| Observability | Nov 24, 2025 |
| Prep: Presentation, Demo, and Pre-defense | Dec 3, 2025 |
## In Scope
| Included items | Objective |
|----|----|
| Application | Only necessary changes (if applicable) to facilitate cloud integration |
| Infrastructure | Design and deploy a reproducible, cloud-native architecture |
| CI/CD Automation | Implement automated build, test, and deployment pipelines |
| Containerization | Package existing microservices as container images that run on AWS. |
| Monitoring & Logging | Centralized logs, metrics, and traces |
| Security & IAM | Least-privilege IAM roles, encryption, and subnet segmentation. |
| Backup & Recovery | Redundancy, failover, backup, BCP/DRP |
| Documentation | Architecture diagrams, specifications, and operational runbooks. |
## Out of Scope
| Excluded items | Reason |
|----|----|
| Application feature or UI changes | Functionality remains unchanged. |
| Multi-cloud or hybrid deployment | Focus solely on AWS environment. |
| Cost-optimization | Addressed in a later project if necessary |
## Requirements
### Functional requirements
| Stakeholder / Role | Requirement | Description |
|----|----|----|
| **Developers** | Continuous Integration | Each merge must trigger automated build, test, and image creation. |
| | Local to Cloud Parity | Development environment must mirror AWS setup. |
| **DevOps Engineers** | Automated Deployment | CI/CD pipeline must deploy microservices to Staging and Prod environments automatically. |
| | Test Automation | Integration tests must run automatically in CI/CD pipeline. |
| | Infrastructure as Code | All AWS resources defined through configuration files. |
| | Monitoring & Alerts | Centralized logging, metrics, and tracing for all microservices. Automated alerting for service downtime or threshold breaches. |
| | Scalability | Services must be scalable |
| **Security Team** | Access Control | Roles per service with least-privilege permissions.|
| | Secrets Management | All secrets stored securely. |
| **Product / Management** | Availability & Demo Readiness | System must be reliable and presentable for client or internal demos.|
| **End Users (Demo Audience)** | Stable Access | Web UI and APIs must remain responsive under typical load. |
### Non-Functional Requirements
| Category | Requirement | Standard |
|----|----|----|
| **Development** | Local to Cloud parity | Docker Compose or local ECS simulation |
| **Maintainability** | IaC | Code stored in version control |
| **Performance** | Pipeline execution time | < 10 minutes per merge |
| | Scaling | Services can be scaled horizontally |
| | API / UI response | < 200 ms under normal demo load |
| **Reliability** | Deployment success rate | 99% successful deployments |
| | Alert response | Alerts trigger within < 5 minutes of failure detection |
| | Error tolerance | < 0.1% failed requests |
| **Availability** | System uptime | 99.9% uptime |
| **Observability** | Logs, Metrics, Traces | Centralized in monitoring solution |
| **Security** | Least-privileged Roles | Roles restricted per service; no default full-access policies |
| | Secret encryption | Secrets stored encrypted in AWS Secrets Manager |
| **Cost** | Budget target | Monthly AWS cost stays under a defined cap |
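
The percentage targets in the table above are easier to reason about as concrete budgets. The following is a minimal sketch, assuming a 30-day month and a simple failed/total request count; the function names are hypothetical and not part of the project. With these assumptions, 99.9% uptime leaves about 43.2 minutes of downtime per month.

```python
def downtime_budget_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Minutes of downtime allowed by an availability target over a period.

    Assumption: the period is measured in whole days (30 by default).
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - availability_pct) / 100


def error_rate_ok(failed: int, total: int, limit_pct: float = 0.1) -> bool:
    """True if the share of failed requests is strictly below the limit."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Compare without dividing first to keep the check easy to follow.
    return failed * 100 < limit_pct * total


if __name__ == "__main__":
    print(round(downtime_budget_minutes(99.9), 1), "minutes of downtime per 30 days")
    print("50 failures in 100,000 requests within limit:", error_rate_ok(50, 100_000))
```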
## System Components — Spring PetClinic Microservices
| Component | Role / Function | Dependencies |
|----|----|----|
| `spring-petclinic-admin-server` | Provides admin UI and dashboards | Microservices, Config Server |
| `spring-petclinic-api-gateway` | Routes external requests to microservices | Customers, Vets, Visits, GenAI services |
| `spring-petclinic-config-server` | Centralized configuration | Git repo |
| `spring-petclinic-customers-service` | Manages customer data | RDBMS, Config Server |
| `spring-petclinic-vets-service` | Manages veterinary staff | RDBMS, Config Server |
| `spring-petclinic-visits-service` | Manages pet visit records | RDBMS, Customers Service |
| `spring-petclinic-genai-service` | Optional AI chat-bot | Microservices, RDBMS |
| `spring-petclinic-discovery-server` | Service registry / discovery | All microservices |
| RDBMS | Persistent storage | Customers, Vets, Visits |
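
The dependency column above implies a start (and restore) order. The sketch below derives one with Python's standard `graphlib` module. It is an illustration under assumptions: the short node names are hypothetical, only the unambiguous dependencies from the table are encoded, and the admin server and GenAI service are left out because their "Microservices" dependency is not specific enough to model.

```python
from graphlib import TopologicalSorter

# Each key lists what must be available before it starts.
# Assumption: config server, discovery server, and the database have no
# prerequisites inside this system.
DEPENDS_ON = {
    "config-server": set(),
    "discovery-server": set(),
    "rdbms": set(),
    "customers-service": {"rdbms", "config-server"},
    "vets-service": {"rdbms", "config-server"},
    "visits-service": {"rdbms", "customers-service"},
    "api-gateway": {"customers-service", "vets-service", "visits-service"},
}


def start_order(graph: dict) -> list:
    """Return one valid order in which every dependency comes first."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for position, name in enumerate(start_order(DEPENDS_ON), start=1):
        print(position, name)
```

Several valid orders exist; the only guarantee is that a component never appears before something it depends on.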
## Architecture and Specifications
### Project
- Kanban as agile methodology
- Breakdown of work and phases:
  - Infrastructure Setup
  - Service Orchestration
  - Configuration Management
  - CI/CD Automation
  - Security
  - Resilience
  - Observability
#### Assignments:
| Role | Responsibilities |
|----|----|
| **Cloud Architect** | Design AWS target architecture, network, and IAM structure |
| **DevOps Engineer** | Build CI/CD pipelines, container orchestration, monitoring setup |
| **Dev Lead** | Containerize services, modify configs for cloud compatibility |
| **Database Engineer** | Migrate data from local RDBMS to AWS RDS, manage schema updates |
| **Security Team** | Set up access and roles for services |
| **Everyone** | Validate deployments, pipeline runs, rollback testing |
| **Project Lead** | Manage Asana Kanban board, ensure alignment and progress tracking |
### Source Code
- Architecture Type: Microservices deployed via containers, managed by ECS, behind an AWS Application Load Balancer (ALB).
- Review via pull request process:
  - All commits merged via PRs.
  - Peer review required before merging.
- Validation: Run tests during the pipeline build.
## CI/CD
- Goal: Automate the entire software delivery process for all PetClinic microservices via Jenkins.
### Development Cycle Stages
| Stage | Description |
|----|----|
| **01. Code & Merge** | Developer writes code and merges it into the staging branch |
| **02. Build** | Compile, resolve dependencies |
| **03. Unit Test** | Run service-level tests |
| **04. Containerize** | Build Docker image for service |
| **05. Security Scan** | Security validation via Trivy |
| **06. Push to Registry** | Push validated image |
| **07. Deploy to Staging** | Deploy for validation |
| **08. Integration tests** | Validate service communication|
| **09. Deploy to Production** | Promote validated build |
| **10. Observability Check** | Validate monitoring and alerts |
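
The stages above act as gates: a failure in any stage should stop everything after it, so an image that fails the security scan is never pushed or deployed. The real pipeline runs in Jenkins; the sketch below only models that fail-fast ordering in Python. The `run_pipeline` function and the callables passed to it are hypothetical stand-ins for the actual build steps.

```python
from typing import Callable, Dict, List

# Automated stages in the order given by the table (the manual
# "Code & Merge" step is what triggers the run).
STAGES = [
    "Build",
    "Unit Test",
    "Containerize",
    "Security Scan",
    "Push to Registry",
    "Deploy to Staging",
    "Integration tests",
    "Deploy to Production",
    "Observability Check",
]


def run_pipeline(steps: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that passed.
    """
    passed = []
    for name in STAGES:
        if not steps[name]():
            break  # later stages are skipped, nothing is promoted
        passed.append(name)
    return passed


if __name__ == "__main__":
    steps = {name: (lambda: True) for name in STAGES}
    steps["Security Scan"] = lambda: False  # simulate a failed scan
    print(run_pipeline(steps))
```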
### Jobs and environments
- Each microservice has its own Jenkins pipeline per environment.
| Environment | Purpose | Infrastructure |
|----|----|----|
| **Development (Local)** | Local testing, feature validation | Docker Compose |
| **Staging (AWS)** | Integration and pre-prod testing | ECS/EKS (staging cluster), RDS (test DB) |
| **Production (AWS)** | Live system | ECS/EKS (prod cluster), RDS (prod DB) |
## Storage
| **Type** | **Service** | **Use / Description** | **IOPS / Performance** | **Volume / Size** | **Backup Strategy** |
|-----|-----|-----|-----|-----|-----|
| **1. Database (RDBMS)** | Amazon RDS (MySQL) | Structured data for each microservice schema | 3,000-6,000 (gp3 default) or provisioned as needed | 20 GB per schema | Automated daily snapshots (14-day retention) |
| **2. Block Storage** | Amazon EBS (gp3) | EC2-hosted Jenkins & ECS servers| 3,000 baseline | / | Not necessary |
| **3. Object Storage** | Amazon S3 | Logs, backups, images | Standard or Infrequent Access tiers | / | Cross-region replication or versioning enabled |
## Data
### 1. Location
- `eu-central-1` region
- Place database (RDS) and services in the same region and AZs.
### 2. Replication / Distribution
| Data Type | Replication / Distribution Strategy |
|----|----|
| **RDS (MySQL)** | Multi-AZ synchronous replication |
| **S3 (images, artifacts)** | Automatic cross-AZ durability |
### 3. Links / Access
| Access type | Route|
|----|----|
| **Internal** | Microservices access RDS via private VPC links. |
| | Images in S3 accessed via IAM roles or pre-signed URLs. |
| **External** | ALB routes external requests. |
| | HTTPS enforced for secure data transfer. |
## Network
### Location
- `eu-central-1` region
- Deploy services across multiple AZs for high availability.
- All microservices, databases, and supporting infrastructure live inside a single VPC.
- Isolate the network from the public internet by default.
### Network Segmentation & Filtering
- Public subnets: ALB, NAT gateway.
- Private subnets: ECS, RDS.
- Security groups: Service-specific firewall rules.
- Adjust the default network ACLs if necessary.
### Addressing
- VPC: `10.0.0.0/16`
- Public subnet: `10.0.1.0/24`
- Private subnet: `10.0.2.0/24`
- Every service/database gets an internal IP in the private subnet.
- Only the load balancer and the NAT gateway have public IPs.
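
An AWS subnet belongs to exactly one Availability Zone, so the multi-AZ goal stated under Location implies one public/private subnet pair per AZ rather than a single pair. The sketch below shows one way the `/24` blocks could be carved out of the VPC range with Python's standard `ipaddress` module. The choice of two AZs and the `plan_subnets` helper are assumptions for illustration; the first pair matches the addresses listed above.

```python
import ipaddress

VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")
# Assumption: two Availability Zones in eu-central-1.
AZS = ["eu-central-1a", "eu-central-1b"]


def plan_subnets(vpc, azs):
    """Assign consecutive /24 blocks as public/private pairs, one pair per AZ."""
    blocks = vpc.subnets(new_prefix=24)  # yields 10.0.0.0/24, 10.0.1.0/24, ...
    next(blocks)  # skip 10.0.0.0/24 so the plan starts at 10.0.1.0/24
    plan = {}
    for az in azs:
        plan[az] = {"public": next(blocks), "private": next(blocks)}
    return plan


if __name__ == "__main__":
    for az, nets in plan_subnets(VPC_CIDR, AZS).items():
        print(az, "public", nets["public"], "private", nets["private"])
    # eu-central-1a public 10.0.1.0/24 private 10.0.2.0/24
    # eu-central-1b public 10.0.3.0/24 private 10.0.4.0/24
```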
## Compute
### Nodes
| Environment | Nodes | Notes |
|----|----|----|
| **Staging** | 3 ECS container instances (EC2) | Handles staging microservices, mirrors production setup |
| **Production / Live** | 3 ECS container instances (EC2) | Fixed-size cluster, no autoscaling to reduce costs |
| **Scalability** | No autoscaling | Node count is fixed to reduce cost; horizontal scaling remains possible via the ECS task count or by adding nodes manually. |
### Container Management
#### Container Registry:
- Amazon ECR for all microservice Docker images.
- Each microservice image tagged by Git commit SHA.
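
Tagging by commit SHA means every image reference can be derived from the repository name and the commit that produced it. A minimal sketch of building such a reference follows, using the standard ECR registry hostname format. The account ID `123456789012` is a placeholder, the helper name is hypothetical, and using the service name as the repository name is an assumption.

```python
import re


def ecr_image_uri(account_id: str, region: str, repository: str, commit_sha: str) -> str:
    """Build an ECR image reference tagged with a Git commit SHA."""
    # Accept abbreviated (7+) or full (40) lowercase hex SHAs only.
    if not re.fullmatch(r"[0-9a-f]{7,40}", commit_sha):
        raise ValueError(f"not a Git commit SHA: {commit_sha!r}")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{commit_sha}"


if __name__ == "__main__":
    print(
        ecr_image_uri(
            "123456789012",  # placeholder account ID
            "eu-central-1",
            "spring-petclinic-vets-service",
            "0a1b2c3",
        )
    )
```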
#### Deployment Strategy:
- ECS tasks run one or more containers per node.
- Service definitions ensure each microservice has the desired number of tasks.
- Jenkins updates ECS service definition after build.
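
The update step can be thought of as: take the current task definition, swap in the newly built image, register the result as a new revision, and point the service at it. The sketch below covers only the first part as a pure function over a task-definition dictionary (using the `containerDefinitions`, `name`, and `image` fields of ECS task definitions). The function name and sample values are hypothetical, and the actual registration and service update calls are deliberately left out.

```python
import copy


def with_new_image(task_definition: dict, container_name: str, image_uri: str) -> dict:
    """Return a copy of the task definition with one container's image replaced."""
    updated = copy.deepcopy(task_definition)  # leave the input untouched
    for container in updated["containerDefinitions"]:
        if container["name"] == container_name:
            container["image"] = image_uri
            return updated
    raise KeyError(f"no container named {container_name!r} in task definition")


if __name__ == "__main__":
    current = {
        "family": "vets-service",  # hypothetical family name
        "containerDefinitions": [
            {"name": "vets-service", "image": "example/vets-service:old"},
        ],
    }
    new = with_new_image(current, "vets-service", "example/vets-service:0a1b2c3")
    print(new["containerDefinitions"][0]["image"])
```

Keeping this step free of side effects makes it easy to unit test in the pipeline before anything touches the cluster.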
### ECS Orchestration
#### Cluster Setup:
- One ECS cluster per environment (staging and production).
- EC2 launch type for fixed nodes.
#### Service Definitions:
- Each microservice has an ECS service with a desired task count.
- Each service is linked to the ALB.
## Security
| **Area** | **Implementation / Notes** |
|----|----|
| **1. Authentication, Authorization, Auditing (AAA)** | Spring Security |
| | IAM roles restrict AWS access per service |
| | Auditing: Not relevant since we don't handle sensitive data|
| | CloudWatch for app/service logs |
| **2. Code Security** | Static analysis via SonarQube|
| | No hardcoded credentials |
| | Secrets in AWS Secrets Manager |
| | Dependency scanning via Dependabot |
| **3. Traffic Security** | HTTPS enforced via ALB |
| | Internal TLS optional for microservices |
| | Security groups restrict inbound/outbound ports |
| | Private subnets for internal services and databases |
| **4. Instance / Container Security** | Use minimal and updated AMIs |
| | Regular patching, no direct SSH (bastion-only) |
| | Containers run as non-root users |
| | Vulnerability scanning before deploy |
| | Secrets passed via IAM roles or ECS environment vars |
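
To make "least privilege" concrete, a per-service role could be limited to reading only that service's own secret. The sketch below builds such an IAM policy document as a Python dictionary. The secret name, the placeholder account ID, and the helper name are assumptions; the trailing `-??????` matches the random suffix Secrets Manager appends to secret ARNs.

```python
import json


def secret_read_policy(region: str, account_id: str, secret_name: str) -> dict:
    """IAM policy allowing read access to exactly one Secrets Manager secret."""
    resource = (
        f"arn:aws:secretsmanager:{region}:{account_id}:secret:{secret_name}-??????"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": [resource],  # one secret, no account-wide wildcard
            }
        ],
    }


if __name__ == "__main__":
    policy = secret_read_policy(
        "eu-central-1",
        "123456789012",  # placeholder account ID
        "petclinic/customers-service/db",  # hypothetical secret name
    )
    print(json.dumps(policy, indent=2))
```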
## Observability
| **Aspect** | **Tools** | **Notes** |
|----|----|----|
| **Metrics** | **Prometheus** | Collect CPU, memory, and ECS task metrics from node exporters |
| | | If microservices expose `prometheus-metrics`, integrate directly. |
| | **Grafana** | Dashboards for system and service health |
| **Logs** | **AWS CloudWatch Logs** | ECS task logs streamed to CloudWatch|
| | | Structured JSON logging for easy filtering and search. |
| **Traces** | **AWS X-Ray** | Trace API calls across microservices. |
| **Alerts** | **CloudWatch Alarms** | CloudWatch for infrastructure-level alerts (CPU, memory, ECS health). |
| | **Grafana Alerts** | Grafana alert rules for application metrics from Prometheus. |
| | | Alerts via email or Slack webhook.|
## Continuity & Recovery
| **Aspect** | **Approach / Tooling** | **Notes** |
|----|----|----|
| **Redundancy** | Multi-AZ deployment | RDS and ECS nodes deployed across multiple Availability Zones for high availability.|
| | | Load balancer automatically routes traffic to healthy tasks. |
| **Failover** | AWS-managed failover | RDS Multi-AZ provides automatic database failover. |
| | | ECS services automatically restart failed tasks on healthy nodes. |
| | | Manual intervention only needed for regional failures. |
| **Backup** | AWS Backup / RDS Snapshots | Automated RDS daily backups with retention policy. |
| | S3 Versioning | S3 bucket versioning for uploaded images and configs.|
| **Business Continuity Plan** | Operate from secondary region if needed | Documented procedure to restore environment in another AWS region using IaC templates (Terraform). |
| | | Prioritize restoring RDS, Config Server, and API Gateway. |
| **Disaster Recovery Plan** | Cold standby in alternate region | No live duplication to save cost.|
| | | Periodic replication of backups and images to secondary region. |
## Architecture Diagram
<p align="center">
<img src="assets/week1/aws-architecture-diagram.png" alt="AWS architecture diagram"/>
</p>
## Solutions stack
| **Layer** | **Technologies / Services** |
|----|----|
| **Application Layer** | Spring Boot microservices |
| **Runtime / Platform Layer** | Docker, Amazon ECS, Amazon ECR |
| **CI/CD Layer** | Jenkins, Gitea |
| **Infrastructure Layer** | Terraform, Ansible, Amazon EC2, VPC, subnets, security groups |
| **Database / Storage Layer** | Amazon RDS (MySQL), Amazon S3, Amazon EBS |
| **Observability Layer** | Prometheus, Grafana, CloudWatch |
| **Security Layer** | AWS IAM, Security Groups, HTTPS via ALB, Secrets Manager |
| **Continuity & Recovery Layer** | RDS automated snapshots, S3 versioning/replication, multi-AZ RDS, Terraform for redeploy |
| **Network & Delivery Layer** | Application Load Balancer (ALB), Route 53, NAT Gateway, Internet Gateway |