Design, build, and maintain robust and efficient data pipelines that collect, process, and store data from various sources, including user interactions, listing details, and external data feeds.
Develop data models that enable the efficient analysis and manipulation of data for merchandising optimization. Ensure data quality, consistency, and accuracy.
Build scalable data pipelines (SparkSQL & Scala) leveraging the Airflow scheduler/executor framework
Collaborate with cross-functional teams, including Data Scientists, Product Managers, and Software Engineers, to define data requirements and deliver data solutions that drive merchandising and sales improvements.
Contribute to the broader Data Engineering community at Airbnb to influence tooling and standards to improve culture and productivity
Improve code and data quality by leveraging and contributing to internal tools to automatically detect and mitigate issues
Your Expertise:
5-9+ years of relevant industry experience with a BS/Master's degree, or 2+ years with a PhD
Extensive experience designing, building, and operating robust distributed data platforms (e.g., Spark, Kafka, Flink, HBase) and handling data at the petabyte scale.
Strong knowledge of Java, Scala, or Python, and expertise with data processing technologies and query authoring (SQL).
Demonstrated ability to analyze large data sets to identify gaps and inconsistencies, deliver data insights, and advance effective product solutions
Expertise with ETL schedulers such as Apache Airflow, Luigi, Oozie, AWS Glue, or similar frameworks
Solid understanding of data warehousing concepts and hands-on experience with relational databases (e.g., PostgreSQL, MySQL) and columnar databases (e.g., Redshift, BigQuery, HBase, ClickHouse)