Ref. No.: LP/MLOPS/ZD/11
Development of the MLOps platform
- Design and implement automation that minimizes the operational footprint of our MLOps platform when operationalizing ML models
- Improve the reliability of our MLOps platform by following engineering best practices and supporting troubleshooting efforts
- Plan infrastructure resources and strategies that best support the scalability requirements of different types of ML models
- Enable observability with state-of-the-art tools and solutions, both for our MLOps platform users and for our own operations
Must-haves:
- Extensive hands-on DevOps experience with Docker, Kubernetes, CI/CD tools (e.g. GitLab CI, Argo CD, Kustomize, Helm), and shell scripting
- Experience with service meshes (e.g. Istio), cloud infrastructure (preferably AWS), and monitoring and logging stacks (e.g. Grafana, Loki, Prometheus, ELK)
- Good knowledge of modern tools and solutions for authentication/authorization, secret management, and IT security
- Good communication skills and a team player
Nice-to-haves:
- Experience with Kubeflow
- Experience with MLOps activities
- Good Python coding skills