Expoint - all jobs in one place


Amazon: Data Engineer, Sales Recommendations & Insights Science
United States, Washington, Seattle 
356732323

16.03.2025
DESCRIPTION

• Administer a small Redshift cluster
• Create and manage basic Glue jobs that make structured data in S3 accessible via Athena and the Redshift cluster
• Leverage Glue (or other appropriate tooling) to develop better training data pipelines
• Handle security and admin work for the account, particularly interfacing with internal corporate tools in a compliant manner
• Improve our AWS Batch setup. We use Batch for running model jobs, but the current setup is likely not ideal
• Work with scientists to improve training infrastructure. This overlaps with the previous bullet; we don’t leverage SageMaker to the full extent we could, and we would be interested in improving on that front
• Work with scientists to deploy models, potentially. We don’t know if we’ll be doing our own deployments, but if we do, collaborating with scientists on setting up API endpoints for external model access would be a value-add

Key job responsibilities
• Comfort, or at least familiarity, with S3, Glue, Athena, Redshift, IAM/Secrets Manager, EC2 + security configs, etc.
• Some familiarity with QuickSight
• Basics of database (Redshift) management and best practices
• Comfort, or at least familiarity, with PySpark
o Optimization of highly distributed Spark SQL jobs may well come up
o Some experience running Spark jobs on distributed clusters might be helpful. We have internal tools that do this, but understanding how to leverage them better would be a value-add.
• SQL
• Python (basics)
• Data pipeline management
• Ideally comfortable with Amazon internal tooling (internal candidates only, obviously)
o Cradle
o QuickSight
o Has handled internal AWS tooling before
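Much of the Glue/Athena work described above hinges on laying S3 data out in Hive-style partitions so Athena can prune partitions at query time. A minimal sketch of that key layout (bucket and table names here are hypothetical, not from the posting):

```python
from datetime import date

def partitioned_key(table: str, dt: date, part: int,
                    bucket: str = "example-bucket") -> str:
    """Build a Hive-style partitioned S3 key (dt=YYYY-MM-DD) -- the layout
    Glue crawlers register and Athena uses to skip irrelevant partitions."""
    return f"s3://{bucket}/{table}/dt={dt.isoformat()}/part-{part:05d}.parquet"

key = partitioned_key("sales_events", date(2025, 3, 16), 0)
print(key)
# s3://example-bucket/sales_events/dt=2025-03-16/part-00000.parquet
```

Queries that filter on the partition column (e.g. `WHERE dt = '2025-03-16'`) then read only the matching prefixes instead of scanning the whole table.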


BASIC QUALIFICATIONS

- 3+ years of data engineering experience
- Experience in at least one modern scripting or programming language, such as Python, Java, Scala, or NodeJS
- Knowledge of batch and streaming data architectures like Kafka, Kinesis, Flink, Storm, Beam
- Experience with AWS technologies like Redshift, S3, AWS Glue, EMR, Kinesis, FireHose, Lambda, and IAM roles and permissions
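The streaming tools listed above (Kafka, Kinesis, Flink, Storm, Beam) all revolve around windowed aggregation over event streams. As a rough illustration of the idea only, here is a pure-Python sketch of tumbling-window counts; in a real system this logic runs inside the stream processor, not in application code:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Group (timestamp_seconds, key) events into fixed, non-overlapping
    windows and count occurrences per key -- the basic aggregation pattern
    behind windowed operators in Flink, Kinesis Analytics, and Beam."""
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        window_start = (ts // window_seconds) * window_seconds
        windows[window_start][key] += 1
    # Return plain dicts, windows in chronological order
    return {w: dict(counts) for w, counts in sorted(windows.items())}

events = [(5, "click"), (30, "view"), (65, "click"), (70, "click")]
print(tumbling_window_counts(events))
# {0: {'click': 1, 'view': 1}, 60: {'click': 2}}
```

Tumbling windows never overlap; sliding windows would instead assign each event to every window covering its timestamp.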