In this role, you will:
- Lead complex, broad-impact initiatives, including provision of high-level systems consultation for the technology teams
- Work as a key participant in large-scale planning of computer systems and network infrastructure for the Systems Operations functional area
- Review and analyze complex technical challenges, as well as escalated support issues related to core business solutions, that require in-depth evaluation of multiple factors, such as alternatives, enhancements, periodic systems reviews, or improvements to existing systems
- Make decisions on technical changes and enhancements
- Consult with engineering team on change design requiring solid understanding of technical process controls or standards that influence and drive new initiatives
- Collaborate and consult with technical peers, colleagues, and mid-level to more experienced managers to resolve systems support issues and achieve goals
Required Qualifications:
- 5+ years of Systems Engineering, Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
Desired Qualifications:
- Must have 5+ years of experience as a Site Reliability Engineer
- Knowledge/experience of Python/Shell scripting.
- Knowledge/experience of Puppet/Ansible.
- 5+ years of big data experience (BigQuery, Hadoop)
- 5+ years of experience with Linux OS
- 3 years of experience in AI/ML (MLOps)
- 3+ years of PySpark experience
- 3-5+ years of experience with Tableau, MicroStrategy, or similar BI tools
- Strong experience with monitoring systems such as Splunk and AppDynamics.
- Working knowledge of AutoML technologies such as H2O Driverless AI, DataRobot, and Vertex AI, as well as Elastic and vector databases
- Good understanding of and hands-on experience with GCP
Job Expectations:
- Participate in and lead development of Generative AI platform capabilities
- Responsible for AI model delivery to on-prem infrastructure and cloud platforms (GCP, Azure ML)
- Participate in day-to-day scrum calls for platform capability build
- Research industry best practices, evaluate new technologies, develop standards and engineering best practices and recommend innovative solutions that support automation and improve platform resiliency and fault tolerance of critical applications
- Lead and execute on roadmaps that align with technology and business strategy. Perform hardware and capacity planning, analysis and forecasts for your portfolio of applications with focus on highest availability, scalability, performance, and timely delivery
- Act as an expert resource for other technical teams within DTI
- Lead and deliver day-to-day Application/Platform support services for Digital, AI/ML Platforms
- Responsible for support functions and driving the execution of multiple Application/Platform support services, including incident triage, root cause analysis, change evaluation-execution-validation, deployment management, and risk & vulnerability management.
- Provide on-call production support for mission-critical applications and resolve issues within RTO.
- Ensure effective production systems monitoring, alarming, and notification response/maintenance.
- Leverage diagnostic tools to maintain, troubleshoot, and restore service or data to systems
- Structure operational data and create data visualization solutions (build dashboards)
- Maintain and update support documentation (e.g. game plans, run books, procedures, and processes).
- Communicate, coordinate, and collaborate with multiple support teams and stakeholders
- 1+ year of experience in LLM and Generative AI (dev/ops)
- 1+ year of experience in Elasticsearch, vector databases, or model development would be an added benefit.
- Experience with data processing technology (Ab Initio, Informatica, IBM DataStage)
- Experience with large data technology (Hadoop, Teradata, Elasticsearch, etc.)
- Understanding of Agile practices and ability to work with Agile teams to define and track user stories
- Experience with implementing complex F5 or other Load Balancer Technologies
- Working knowledge of building high-resiliency grid/cloud computing infrastructure supporting AI/ML and NLP workloads
- Knowledge and understanding of cloud computing, PaaS design principles, microservices, and containers
- Working knowledge/experience with Azure and/or GCP
- Working knowledge/experience with on-premises and public cloud technologies, such as Cloud Foundry, Kubernetes, Docker
- Experience in leading / facilitating analysis of current systems and problem identification and resolution
- Ability to lead / facilitate technically complex discussions and working sessions in person or via teleconference
- Excellent verbal, written, and interpersonal communication skills. Ability to articulate technical solutions to both technical and business audiences
- Recent and demonstrated ability to influence management on technical or business solutions
- Working knowledge of designing and building grid computing with CPUs and GPUs supporting AI/ML and NLP
- Working knowledge of high-performance storage technologies along with Object Storage
- Knowledge and understanding of network infrastructure to support high throughput and low latency grid computing
- Willing to work in shifts
22 Jun 2025
Wells Fargo Recruitment and Hiring Requirements:
b. Wells Fargo requires you to directly represent your own experiences during the recruiting and hiring process.