We are looking for a Mid-level Site Reliability Engineer (SRE) to join our team and help maintain, scale, and improve the reliability of modern cloud-native platforms. This role is ideal for an engineer with solid production experience in Kubernetes and distributed systems who enjoys solving complex reliability, performance, and operational challenges.
As an SRE, your primary focus will be ensuring platform stability, availability, and operational excellence across customer environments. You will work closely with a Tech Lead and cross-functional teams to improve system resilience, strengthen incident response processes, and continuously enhance infrastructure reliability, security, and performance.
A key part of this role is active participation in incident management, including on-call duty, investigation of P1/P2 incidents, root cause analysis, and implementation of long-term preventive measures.
Location: Ukraine/Europe, remote.
Be responsible for platform stability and ad hoc implementations on the projects you are assigned to
Continuously evaluate infrastructure and propose improvements in security, performance, and cost optimization (with support from a Tech Lead)
Participate in the on-call rotation for critical P1/P2 incidents
Prepare runbooks and a knowledge base for L2 engineers
Handle day-to-day operational and infrastructure tasks raised by customers
Cloud & Infrastructure:
Strong hands-on experience with AWS (minimum 3 years)
Solid production experience with Kubernetes (minimum 2 years)
Experience with Amazon EKS (preferred) or other managed Kubernetes platforms:
Creating and managing clusters
Performing cluster upgrades
Troubleshooting production workloads
Infrastructure as Code using Terraform
Strong Linux knowledge
Kubernetes & Container Ecosystem:
Deep understanding of Kubernetes architecture (API Server, Scheduler, Controller Manager, etcd, CNI, CSI)
Strong hands-on experience troubleshooting Kubernetes workloads (pods, nodes, networking, DNS, storage, scheduling)
Proven ability to debug production issues (CrashLoopBackOff, OOMKilled, Pending pods, failing probes, resource pressure)
Troubleshooting service-to-service communication issues (ClusterIP, NodePort, Ingress, LoadBalancer)
Experience operating workloads on managed Kubernetes platforms (e.g., EKS)
Experience managing and deploying workloads using Helm (chart development, templating, dependency management, values structuring)
Hands-on experience with: HPA (Horizontal Pod Autoscaler), Karpenter, KEDA (would be a plus)
Understanding of scaling strategies, limits, and cost implications
Experience working with GitOps workflows
Familiarity with ArgoCD would be a plus
Understanding of deployment strategies (rolling, blue/green, canary)
Experience operating stateful workloads inside Kubernetes
Understanding of Persistent Volumes, StorageClasses, and CSI drivers
Experience working with databases deployed in Kubernetes (PostgreSQL, MySQL, Redis, or other stateful services)
Databases & Data Layer:
Experience managing AWS databases: RDS (MySQL, PostgreSQL, etc.), DynamoDB.
Understanding of:
Backup and restore strategies
Scaling (read replicas, autoscaling, provisioned vs on-demand capacity)
Monitoring database health and performance
Solid knowledge of SQL queries, indexing, query optimization, and execution plans
Database configuration and tuning (connections, memory, storage engines, replication settings, parameter groups, etc.)
Experience managing databases running inside Kubernetes (stateful workloads)
Understanding of networking and security between applications and databases
AWS Services & Networking:
Managing applications on EC2, ECS, EKS, and serverless architectures (Lambda).
Experience with S3, SNS, SQS.
Networking & Edge: CloudFront distributions, ALB / NLB, AWS WAF.
Observability & Monitoring:
Setting up and maintaining an observability stack:
Prometheus
Grafana
Alloy (or similar collectors)
VictoriaMetrics would be a plus
Configuring alerting
Working with CloudWatch metrics and logs
Using SNS + Lambda for alert automation and routing
Monitoring Kubernetes clusters and databases