Expoint - all jobs in one place


3M Principal Data Engineer 
United States, Minnesota, Maplewood 
930068238

17.12.2024

The Impact You’ll Make in this Role

As a Principal Data Engineer, you will have the opportunity to design and support an Enterprise Data Mesh to empower informatics and digital technologies for users across the globe:

  • Architect, design, and build scalable, efficient, and fault-tolerant data operations.
  • Collaborate with senior leadership, analysts, engineers, and scientists to implement new mesh domain nodes and data initiatives.
  • Drive technical architecture for accelerated solution designs, including data integration, modeling, governance, and applications.
  • Explore and recommend new tools and technologies to optimize the data platform.
  • Improve and implement data engineering and analytics engineering best practices.
  • Collaborate with data engineering and domain nodes teams to design physical data models and mappings.
  • Work with scientists and informaticians to develop advanced digital solutions and promote digital transformation and technologies.
  • Perform code reviews, manage code performance improvements, and enforce code maintainability standards.
  • Develop and maintain scalable data pipelines for ingesting, transforming, and distributing data streams.
  • Advise and mentor 3M businesses, data scientists, and data consumers on data standards, pipeline development, and data consumption.
  • Provide technical guidance and mentorship, ensure adherence to best practices, and maintain high software quality through rigorous testing and code reviews.
  • Guide project planning and execution, manage timelines and resources, and facilitate effective communication between team members and stakeholders.
  • Foster a positive team environment, assist in recruitment, and provide training opportunities to address skill gaps.

Your Skills and Expertise

To set you up for success in this role from day one, 3M requires (at a minimum) the following qualifications:

  • Bachelor’s degree or higher in Computer Science from an accredited university.
  • Ten (10) years of professional experience in data management, data engineering, data governance, and data warehouse/lakehouse design and development, with proficiency across SQL and NoSQL data management systems and comfort working with structured and unstructured data and analyses.
  • Five (5) years of extensive experience and proficiency with Python, Apache Spark, PySpark, and Databricks.
  • Three (3) years of hands-on experience using Python to extract data from APIs and build data pipelines.

Additional qualifications that could help you succeed even further in this role include:

  • Exposure to data and data types in the materials science, chemistry, computational chemistry, and physics space.
  • Proficiency in developing or architecting modern distributed cloud architectures and workloads (AWS and Databricks preferred), and familiarity with data mesh architecture design principles.
  • Proficiency in building data pipelines to integrate business applications and procedures.
  • Solid understanding of advanced Databricks concepts such as Delta Lake, MLflow, advanced notebook features, custom libraries and workflows, Unity Catalog, etc.
  • Experience with AWS cloud computing services and infrastructure, developing data lakes and data pipelines that leverage technologies such as AWS S3, AWS Glue, and Elastic MapReduce, and awareness of the considerations for building scalable, distributed computational systems on Spark.
  • Experience with stream-processing systems: Amazon Kinesis, Spark, Storm, Kafka, etc.
  • Experience with data quality and validation principles, and with security principles such as data encryption, access control, and authentication and authorization.
  • Deep experience in definition and implementation of feature engineering.
  • Experience with Docker containers and Kubernetes, experience developing or interacting with APIs.
  • Experience building data orchestration workflows with open-source tools such as Temporal.io or Apache Airflow is a plus.
  • Knowledge of data visualization tools like Dash Apps, Tableau, Power BI, etc.
  • Experience with agile development processes and concepts, leveraging project management tools such as JIRA and Confluence.
  • Ability to devise and implement data engineering best practices across teams, optimize and redesign existing data engineering solutions to improve efficiency or stability, and monitor and consult with domain node teams.
  • Excellent interpersonal, collaborative, team building, and communication skills to ensure effective collaborations with matrixed teams.


Please access the linked document by selecting the country where you are applying for employment, and review it. Before submitting your application, you will be asked to confirm your agreement with the terms.