Responsibilities
- Design, develop, and maintain scalable Python-based data pipelines for processing large volumes of structured and unstructured data.
- Build automation scripts to streamline recurring data workflows, monitoring tasks, and data quality validation processes.
- Collaborate with data scientists to support AI and machine learning model deployment, feature engineering, and the integration of production data flows.
- Write efficient and maintainable SQL queries to support reporting, analytics, and data exploration needs.
- Participate in the development of internal tools to improve data access and usability across the organization.
- Monitor and improve data pipeline reliability, including logging, alerting, and performance tuning.
Knowledge and Experience
- 3+ years of experience as a Data Engineer or in a similar role.
- Strong programming skills in Python, with experience writing reusable libraries and working with standard data manipulation packages (e.g., pandas, NumPy).
- Advanced proficiency in SQL and experience working with large-scale databases (e.g., PostgreSQL, MSSQL, Oracle).
- Experience with AI/ML workflows, supporting model training and inference pipelines in production environments.
- Solid understanding of data modeling, warehousing, and distributed data processing concepts.
Advantage
- Exposure to real-time data processing technologies (e.g., Kafka, Spark Streaming).
- Background in finance, trading systems, or financial data pipelines.