Ref. no.: LP/DEP/ZD/10
Key skills:
- Strong Python and Pandas knowledge
- Good SQL knowledge
- Knowledge of AWS, Docker and GitLab CI/CD
Duties:
- data flow design
- data pipeline optimization
- social media data preprocessing and cleaning for data modelling purposes
The project involves building a system that collects and analyzes social media data and generates insights from it, to improve market research and better support our patients.
Requirements:
- 2+ years of experience with a programming language used for data pipelines, e.g. Python or R
- 1+ years of experience working with a cloud platform such as AWS, Azure or GCP (optional)
- 1+ years of experience maintaining data pipelines
- 1+ years of experience with different types of storage (filesystem, relational, MPP, NoSQL) and working with various kinds of data (structured, unstructured, metrics, logs, etc.)
- 1+ years of experience with data architecture concepts in any of the following areas: data modeling, metadata management, workflow management, ETL/ELT, real-time streaming, data quality, distributed systems
- 2+ years of experience working with SQL
- Exposure to open-source and proprietary cloud data pipeline tools such as Airflow, Glue and Dataflow (optional)
- Very good knowledge of relational databases (optional)
- Very good knowledge of Git, Gitflow and DevOps tools (e.g. Docker, Bamboo, Jenkins, Terraform)
- Very good knowledge of Unix
- Good knowledge of Java and/or Scala
- Knowledge of pharma data formats (e.g. SDTM) is a big plus