About the Role
As a Staff Engineer on the Atlas Data Federation and Archiving team, you will lead the design, optimization, and scaling of our storage and federated query systems. This role focuses on high-performance distributed storage, data lifecycle management, and efficient data retrieval at scale.
This role can be based in New York City, Austin, San Francisco, Seattle, or remotely in the United States.
What You’ll Do
Storage & Data Processing Performance
- Architect and optimize large-scale storage solutions for federated data access, ensuring efficient retrieval, indexing, and query performance
- Optimize data archival pipelines for high-throughput ingestion, durability, and cost-efficiency
- Improve data tiering and lifecycle policies for moving and querying data efficiently across hot, warm, and cold storage tiers
- Reduce operational costs through intelligent storage layout, compaction strategies, and query execution optimizations
Distributed Query & Execution Engine
- Improve and scale our distributed query execution engine, optimizing it for multi-source federated queries and data lake processing
- Enhance query performance across object storage (e.g., S3, GCS, Azure Blob) by optimizing indexing, partitioning, and compaction techniques
- Implement workload-aware autoscaling for query execution and data processing
- Reduce incident rates by improving system resilience, failover mechanisms, and observability
Technical Leadership & Mentorship
- Guide architectural decisions and lead design reviews across engineering teams
- Mentor engineers in distributed systems, data storage optimization, and operational excellence
- Partner with Product Management to define the technical roadmap for storage and data federation solutions
- Participate in the on-call rotation, providing senior oversight for incident response and postmortems
What We Look For
- 10+ years of experience in software engineering, with a focus on backend and distributed storage systems
- Expertise in large-scale storage systems, such as distributed databases, cloud object storage (S3, Azure Blob, GCS), or data lake technologies (Iceberg, Delta Lake, Hudi, etc.)
- Strong background in designing and optimizing storage layers, indexing, and data lifecycle management
- Experience optimizing query engines for high-volume, low-latency federated data access
- Track record of improving system reliability, observability, and cost-efficiency
- Experience with Kubernetes-based deployment of distributed storage or query systems
- Proficiency in Go or Java (preferred, but not required)
- Deep understanding of query optimizers, storage formats (Parquet, ORC), and indexing strategies
- Experience with disaggregated storage and cloud-native data lake solutions
- Proven ability to lead technical initiatives as an individual contributor while mentoring senior engineers and driving technical excellence within a team