You should be an expert at implementing and operating stable, scalable data flow solutions from production systems into end-user-facing applications and reports. These solutions will be fault tolerant, self-healing, and adaptive. You will develop solutions that address unique challenges of space, size, and speed, and you will implement data analytics using cutting-edge analytics patterns and technologies, including but not limited to AWS offerings such as EMR, Lambda, Kinesis, and Spectrum. You will extract huge volumes of structured and unstructured data from various sources (relational, non-relational, and NoSQL databases) and message streams, and construct complex analyses. You will write scalable code and tune performance over billions of rows of data. You will implement data flow solutions that process data on Spark and Redshift and store it in both Redshift and file-based storage (S3) for reporting and ad hoc analysis.

You should be detail-oriented and have an aptitude for solving unstructured problems. You should work in a self-directed environment, own tasks, and drive them to completion.
- 2+ years of data engineering experience
- Experience with data modeling, warehousing and building ETL pipelines
- Experience with one or more query languages (e.g., SQL, PL/SQL, DDL, MDX, HiveQL, SparkSQL, Scala)
- Experience with one or more scripting languages (e.g., Python, KornShell)
- Experience with big data technologies such as Hadoop, Hive, Spark, and EMR
- Experience with an ETL tool such as Informatica, ODI, SSIS, BODI, or DataStage