Senior Site Reliability Engineer - AI Platform
Department: Platform Engineering
Location: Barcelona
About the opportunity
We are seeking a Senior Site Reliability Engineer to join the AI Platform team within the Platform Engineering domain.
The mission of Platform Engineering is to provide trusted, performant, self-service platforms that empower product teams to build "the bank the world loves to use." The AI Platform team contributes to this mission by creating scalable, secure, and compliant infrastructure solutions that support MLOps and GenAI capabilities.
The ideal candidate is not only a seasoned SRE expert ready to apply their skills to the challenges of AI infrastructure but also an enthusiastic learner excited to grow alongside a team pioneering cutting-edge platform solutions. If you thrive in an environment where expertise meets curiosity, and where mentorship and innovation go hand in hand, we’d love to hear from you.
In this role, you will:
- Design, develop, and implement platform solutions that enhance the reliability, security, and scalability of the AI Platform infrastructure.
- Provide technical leadership in cloud infrastructure, networking, CI/CD, and security for AI and MLOps workloads.
- Collaborate closely with Data Scientists, ML Engineers, and Product Teams to ensure seamless model deployment and operational efficiency.
- Mentor and coach team members, fostering a culture of knowledge sharing, technical excellence, and continuous improvement.
- Take an active role in shaping the team's strategy, roadmap, and architecture.
- Drive incident management and troubleshooting efforts, ensuring a stable and predictable AI development and deployment environment.
- Improve observability and monitoring, ensuring the AI Platform meets performance and compliance requirements.
What you need to be successful
Background and skills:
- Strong hands-on experience in designing, implementing, and maintaining cloud-based infrastructure, particularly in AWS.
- Expertise in orchestration for AI/ML workloads.
- Strong experience in infrastructure as code (Terraform, CloudFormation, or similar).
- Proficiency in at least one programming language (Python preferred).
- Experience with networking and security best practices in cloud environments.
- Familiarity with MLOps tools (e.g., AWS SageMaker, Bedrock, Kubeflow, MLflow).
- Hands-on experience with CI/CD pipelines (GitHub Actions, ArgoCD, Jenkins, or similar).
- Experience in AI/ML production systems and the unique challenges of scaling AI workloads.
- Strong understanding of compliance and governance in AI/ML platforms.
- Familiarity with observability tools (DataDog, Prometheus, Grafana, OpenTelemetry).
- Excellent collaboration and communication skills, with the ability to work across teams and mentor engineers.
- Strong sense of ownership, with a proactive approach to problem-solving and process improvements.
- Passion for building high-quality, scalable, and secure AI infrastructure.
- Eagerness to learn and contribute to the evolution of AI platforms.
What’s in it for you
- Accelerate your career growth by joining one of Europe’s most talked-about disruptors 🚀
- Employee benefits including a competitive personal development budget, a work-from-home budget, and discounts on fitness & wellness memberships, language apps, and public transportation
- As an N26 employee, you will have access to a Premium subscription on your personal N26 bank account, as well as subscriptions for friends and family members
- Additional day of annual leave for each year of service
- A high degree of autonomy and access to cutting-edge technologies, all while working with a friendly team of peers of diverse nationalities, life experiences, and backgrounds
- A relocation package with visa support for those who need it