Expoint - all jobs in one place


3M Principal Data Engineer 
United States, Minnesota, Maplewood 
132045101

05.05.2024

The Impact You’ll Make in this Role

The Principal Data Engineer will join the Corporate Research Systems Lab (CRSL) to develop scalable data systems. As part of an agile team, you will enable applications in diverse markets, including energy, manufacturing, personal safety, transportation, electronics, and consumer. You will have the opportunity to design and support an Enterprise Data Mesh to empower informatics and digital technologies for users across the globe:

  • Architect, design, and build scalable, efficient, and fault-tolerant data operations.

  • Collaborate with senior leadership, analysts, engineers, and scientists to implement new mesh domain nodes and data initiatives.

  • Drive technical architecture for accelerated solution designs, including data integration, modeling, governance, and applications.

  • Explore and recommend new tools and technologies to optimize the data platform.

  • Improve and implement data engineering and analytics engineering best practices.

  • Collaborate with data engineering and domain node teams to design physical data models and mappings.

  • Work with scientists and informaticians to develop advanced digital solutions and promote digital transformation and technologies.

  • Perform code reviews, manage code performance improvements, and enforce code maintainability standards.

  • Develop and maintain scalable data pipelines for ingesting, transforming, and distributing data streams.

  • Advise and mentor 3M businesses, data scientists, and data consumers on data standards, pipeline development, and data consumption.

Your Skills and Expertise

To set you up for success in this role from day one, 3M requires (at a minimum) the following qualifications:

  • Bachelor’s degree or higher (completed and verified prior to start) from an accredited university.

  • Twelve (12) years of professional experience in data warehouse/lakehouse design and development in a private, public, government, or military environment.

  • Complete proficiency in advanced SQL; Python, PySpark, or Scala (including object-oriented language concepts); and ML libraries.

  • Hands-on experience using Python to extract data from APIs and build data pipelines.
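For context, the API-to-pipeline task named above can be sketched in a few lines of Python. This is an illustrative example only; the endpoint, field names, and helper functions are hypothetical, not part of 3M's actual stack:

```python
import json
from urllib.request import urlopen


def fetch_json(url: str) -> list[dict]:
    """Pull raw records from a (hypothetical) REST endpoint returning JSON."""
    with urlopen(url) as resp:
        return json.load(resp)


def transform(records: list[dict]) -> list[dict]:
    """Keep well-formed rows and normalize field names/types."""
    out = []
    for rec in records:
        if "id" not in rec:
            continue  # drop malformed rows rather than failing the pipeline
        out.append({"record_id": rec["id"], "value": float(rec.get("value", 0.0))})
    return out


def run_pipeline(url: str) -> list[dict]:
    """Extract from the API, then transform; loading is left to the caller."""
    return transform(fetch_json(url))
```

In practice the extract and transform steps would be separate, independently testable stages of a scheduled pipeline, as the responsibilities above describe.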

Additional qualifications that could help you succeed even further in this role include:

  • Exceptional background in data engineering, data systems, and data governance, with comfort working with both structured and unstructured data and analyses. Exposure to data and data types from materials science, chemistry, computational chemistry, or physics is a definite plus, but not required.

  • Proficiency in developing or architecting modern distributed cloud architecture and workloads (AWS, Databricks preferred). Familiarity with data mesh style architecture design principles.

  • Proficiency in building data pipelines to integrate business applications and procedures.

  • Solid understanding (preferred) of advanced Databricks concepts such as Delta Lake, MLflow, advanced notebook features, custom libraries and workflows, Unity Catalog, etc.

  • Experience with AWS cloud computing services and infrastructure, developing data lakes and data pipelines that leverage technologies such as Amazon S3, AWS Glue, and Elastic MapReduce, plus awareness of the considerations involved in building scalable, distributed computational systems on Spark.

  • Experience with stream-processing systems: Amazon Kinesis, Spark, Storm, Kafka, etc.

  • Hands-on experience with relational SQL and NoSQL databases.

  • Experience with data quality and validation principles, and with security principles such as data encryption, access control, and authentication/authorization.

  • Deep experience in defining and implementing feature engineering.

  • Experience with Docker containers and Kubernetes, and with developing or consuming APIs.

  • Experience orchestrating data workflows with open-source tools such as Temporal.io or Apache Airflow is a plus.

  • Knowledge of data visualization tools like Dash Apps, Tableau, Power BI, etc.

  • Solid experience with agile development processes and concepts, leveraging project management tools such as Jira and Confluence.

  • Ability to devise and implement data engineering best practices across teams, optimize and redesign existing data engineering solutions to improve efficiency and stability, and monitor and consult with domain node teams.

  • Excellent interpersonal, collaborative, team-building, and communication skills to ensure effective collaboration with matrixed teams.


Please access the linked document by clicking it, select the country where you are applying for employment, and review it. Before submitting your application, you will be asked to confirm your agreement with the terms.