As a Data Engineer on the IPC team, you will partner with Software Engineers, Applied Scientists, and Data Scientists. You will turn the data requirements of machine learning (ML) and reinforcement learning (RL) models into products that can be used for training and production. In close collaboration with Software Engineers and Senior Data Engineers across teams, you will provide technical expertise and build end-to-end data solutions that are highly available, scalable, stable, secure, and cost-effective. You are passionate about working with huge unstructured, semi-structured, and structured datasets, and you have experience organizing and curating data for analytics and model training. You take a strategic, long-term view on architecting advanced data ecosystems. You will analyze, clean, and transform data from various data sets into usable data for ML/RL models. You are experienced in building efficient and scalable data services, and you are able to integrate data systems with AWS tools and services to support a variety of customer use cases and applications.
Key job responsibilities
• Design, implement, and maintain data infrastructure to support a wide variety of large and complex data sets, ensuring high performance, availability, and integrity for RL/ML models.
• Analyze data from various sources, and develop and execute Python notebooks to validate the data consumption needs of the models.
• Implement real-time and batch data ingestion routines using best practices in data modeling and ETL/ELT processes, leveraging AWS technologies and big data tools (a minimal sketch of such a batch job follows this list).
• Gather business and functional requirements and translate these requirements into robust, scalable, operable solutions with a flexible and adaptable data architecture.
• Collaborate with engineers to help adopt best practices in data system creation, data integrity, test design, analysis, validation, and documentation.
• Collaborate with applied and data scientists to create fast and efficient algorithms that exploit our rich data sets for optimization, statistical analysis, prediction, clustering, and machine learning.
• Help continually improve ongoing reporting and analysis processes, automating or simplifying self-service modeling and production support for customers.
• Develop comprehensive monitoring, alarming, and data quality controls for all of the above.
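To make the scope of the ingestion and data-quality work concrete, here is a minimal, hypothetical sketch of the kind of batch job described above, written in PySpark (the engine behind AWS Glue and EMR jobs). The bucket paths, column names, and threshold are invented for illustration; this is not the team's actual pipeline.

```python
# Hypothetical batch ETL sketch: raw JSON events in S3 -> cleaned,
# date-partitioned Parquet for model training. All paths and column
# names are assumptions made for this example.
from pyspark.sql import SparkSession, functions as F

RAW_PATH = "s3://example-raw-bucket/events/"          # assumed input location
CURATED_PATH = "s3://example-curated-bucket/events/"  # assumed output location

spark = SparkSession.builder.appName("events-batch-ingest").getOrCreate()

# Read the raw, semi-structured events.
raw = spark.read.json(RAW_PATH)

# Basic cleaning: drop rows missing required keys, normalize the
# timestamp, derive a partition column, and deduplicate on event id.
cleaned = (
    raw.dropna(subset=["event_id", "event_ts"])
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("event_date", F.to_date("event_ts"))
       .dropDuplicates(["event_id"])
)

# Lightweight data-quality gate: fail the job rather than publish an
# empty or badly parsed partition (1% threshold is an arbitrary example).
total = cleaned.count()
bad_ts = cleaned.filter(F.col("event_ts").isNull()).count()
if total == 0 or bad_ts / total > 0.01:
    raise ValueError(f"DQ check failed: rows={total}, unparsable timestamps={bad_ts}")

# Publish date-partitioned Parquet that downstream training jobs can read.
cleaned.write.mode("overwrite").partitionBy("event_date").parquet(CURATED_PATH)
spark.stop()
```

Note the ordering: the quality gate runs before the curated partition is published, so a bad upstream drop cannot silently contaminate training data.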
Basic qualifications
- 3+ years of data engineering experience
- 2+ years of experience analyzing and interpreting data with Redshift, Oracle, NoSQL, etc.
- Knowledge of distributed systems as they pertain to data storage and computing
- Experience with data modeling, warehousing, and building ETL pipelines
- Experience working on and delivering end-to-end projects independently
- Experience programming with at least one modern language such as C++, C#, Java, Python, Golang, PowerShell, Ruby
Preferred qualifications
- Experience with Redshift, Oracle, NoSQL, etc.
- Experience with AWS technologies like Redshift, S3, AWS Glue, EMR, Kinesis, Firehose, Lambda, and IAM roles and permissions
- Experience with non-relational databases / data stores (object storage, document or key-value stores, graph databases, column-family databases)
- Master's degree in computer science, engineering, analytics, mathematics, statistics, IT or equivalent
- Familiarity with various AI/ML modeling techniques and the ability to adapt to different project requirements.