Design, build, and maintain robust and efficient data pipelines and APIs that collect, process, and serve data from various sources, including backend events logged as part of LLM flows, customer interactions across multiple channels, CS agent activity, and LLM evaluations.
Work closely with the Machine Learning, Data Science, and cross-functional engineering teams in the Community Support Platform to understand their productivity and feature pain points, and build scalable, flexible solutions to resolve them.
Develop, automate, and standardize how data is logged, enriched, and served for ML training, inference, benchmarking, and monitoring (anomaly detection, safe deploys) to build the next generation of Generative AI products
Advance the state of our third-party data integrations by building and extending frameworks for data exchange, governance, and lineage
Lead Technological Advancement: Drive the evolution of CS data architecture toward modern technologies, and collaborate with infrastructure engineering teams to evolve how we integrate data across the batch and serving layers, ML and non-ML, enabling systems to handle data more effectively
Participate in all phases of software development, including architecture design, implementation, and testing.
Work collaboratively with cross-functional partners, including product managers, operations, and data scientists, to identify opportunities for business impact, understand and prioritize requirements for data pipelines, drive engineering decisions, and quantify impact.
Support teammates by fostering code quality, operational excellence, and shared learning.
Your Expertise:
9+ years of industry experience in data engineering, or in backend engineering with a data background
Proven background in developing distributed batch/streaming data pipelines (e.g., Spark, Kafka/Flink) on distributed storage systems (e.g., HDFS, S3)
Good knowledge of query authoring (SQL) and data processing (batch and streaming)
Demonstrated ability to analyze large data sets to identify gaps and inconsistencies, provide data insights, and advance effective product solutions
Expertise with ETL schedulers such as Apache Airflow, Luigi, Oozie, or AWS Glue
Solid understanding of data warehousing concepts and hands-on experience with relational databases (e.g., PostgreSQL, MySQL) and columnar or wide-column stores (e.g., Redshift, BigQuery, HBase, ClickHouse)
Experience working on/with end-to-end Machine Learning products is a significant plus.
Experience developing and maintaining large-scale backend distributed systems using Java or Kotlin is a significant plus.
Excellent collaboration and communication skills, with the ability to work effectively with cross-functional teams.
Strong architectural knowledge and comfort working across multiple repositories, services, and environments.
Comfortable navigating ambiguity and taking ownership of problem definitions.