Job Description:
This is a senior, hands-on technical delivery role requiring knowledge of data engineering, cloud infrastructure and platform engineering, platform operations, and production support, using ground-breaking cloud and big data technologies.
In this role you will:
- Develop a thorough understanding of the data science lifecycle, including data exploration, preprocessing, modelling, validation, and deployment.
- Design, build, and maintain tree-based predictive models, such as decision trees, random forests, and gradient-boosted trees, with a deep understanding of their underlying algorithms (see the sketch after this list).
- Ingest and provision raw datasets, enriched tables, and curated, re-usable data assets to enable a variety of use cases.
- Evaluate modern technologies, frameworks, and tools in the data engineering space to drive innovation and improve data processing capabilities.
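As a flavour of the tree-based modelling work described above, here is a minimal sketch using scikit-learn's GradientBoostingClassifier on synthetic data. The dataset, hyperparameters, and metric are illustrative assumptions, not requirements of this posting.

```python
# Minimal sketch: train and validate a gradient-boosted tree model.
# All data and hyperparameters here are illustrative placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for a real use case.
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Gradient boosting fits an ensemble of shallow trees, each one
# correcting the residual errors of the trees before it.
model = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.05, max_depth=3, random_state=42
)
model.fit(X_train, y_train)

# Validate on the held-out split, per the lifecycle bullet above.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Held-out AUC: {auc:.3f}")
```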
Core/Must-have skills:
- Significant data analysis experience using Python, SQL, and Spark, including writing Python or Spark scripts for data transformation, integration, and automation tasks (see the PySpark sketch after this list).
- 3-6 years’ experience with cloud ML (AWS) or similar tools.
- Strong experience with statistical analysis techniques, data mining, and predictive modelling.
- Conduct A/B testing and other model validation techniques to ensure the accuracy and reliability of models.
- Experience with optimization modelling, machine learning, forecasting, and/or natural language processing.
- Hands-on experience with Amazon S3 for storing raw data and managing large datasets, including data lifecycle policies and integration with other AWS services to deliver secure, scalable, and cost-effective data solutions (see the S3 sketch after this list).
- Maintain, optimize, and scale AWS Redshift clusters to ensure efficient data storage, retrieval, and query performance.
- Experience implementing CI/CD pipelines in AWS.
- 4+ years of experience in database design and dimensional modelling using SQL.
- Advanced working knowledge of SQL and experience with relational and NoSQL databases, including familiarity with a variety of systems (SQL Server, Neo4j).
- Strong analytical and critical thinking skills, with the ability to identify and resolve issues in data pipelines and systems.
- Strong communication skills to effectively collaborate with team members and present findings to stakeholders.
- Collaborate with cross-functional teams to ensure successful implementation of solutions.
- Experience with OLAP and OLTP databases and data structuring/modelling, with an understanding of key data points.
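As referenced in the Python/Spark bullet above, here is a minimal PySpark sketch of a transformation script that ingests a raw dataset and provisions a curated, re-usable table. The bucket, paths, and column names are hypothetical placeholders.

```python
# Minimal PySpark sketch: ingest raw data, cleanse it, and provision
# a curated table. Paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("curate-transactions").getOrCreate()

# Ingest the raw dataset (hypothetical S3 location).
raw = spark.read.parquet("s3://example-bucket/raw/transactions/")

# Basic cleansing and typing for a re-usable curated asset.
curated = (
    raw.dropDuplicates(["transaction_id"])
       .withColumn("amount", F.col("amount").cast("double"))
       .withColumn("txn_date", F.to_date("txn_timestamp"))
       .filter(F.col("amount") > 0)
)

# Write back as a partitioned, curated data asset.
curated.write.mode("overwrite").partitionBy("txn_date").parquet(
    "s3://example-bucket/curated/transactions/"
)
```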
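And for the Amazon S3 bullet, a minimal boto3 sketch showing object storage plus a lifecycle rule that transitions raw data to a cheaper storage class. The bucket name, prefix, and 90-day window are assumptions for illustration.

```python
# Minimal boto3 sketch: store raw data in S3 and apply a lifecycle
# policy. Bucket, key, and retention window are hypothetical.
import boto3

s3 = boto3.client("s3")

# Store a raw data file in the (hypothetical) raw/ prefix.
s3.upload_file("transactions.csv", "example-bucket", "raw/transactions.csv")

# Transition objects under raw/ to Glacier after 90 days to control cost.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-raw",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            }
        ]
    },
)
```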
Good to have:
- Apply domain knowledge (if applicable) in financial fraud to enhance predictive modelling and anomaly detection capabilities.
- Knowledge of AWS IAM for managing secure access to data resources.
- Familiarity with DevOps practices and automation tools like Terraform or CloudFormation.
- Experience with data visualization tools like Amazon QuickSight, or integrating Redshift data with BI tools (Tableau, Power BI, etc.).
- AWS certifications such as AWS Certified Data Analytics – Specialty or AWS Certified Solutions Architect are a plus.
EY exists to build a better working world, helping to create long-term value for clients, people and society and build trust in the capital markets.