Intercontinental Exchange (ICE) - Lead Site Reliability Engineering (SRE)
United States, Florida, Jacksonville 
664278960

29.06.2025

Responsibilities

The Lead SRE assists with day-to-day activities supporting the Mortgage Servicing Application services, including production support, releases, and incident management. The role builds actionable alerts and automation to prevent incidents, detect performance bottlenecks, and identify maintenance activities.

  • Build and maintain tools and solutions for our operations platform, ensuring that we meet our customer service standards and reduce errors
  • Lead complex projects such as data center migrations, major system upgrades, and tech stack changes
  • Update existing processes and design new processes as needed to optimize performance
  • Actively participate in or own continuous improvement projects driven by automation
  • Employ deep troubleshooting skills to improve the availability, performance, and security of IMT Services
  • Implement automated tests, automated deployments, and operational tools
  • Collaborate with Product and Support teams to plan and deploy product releases
  • Conduct root cause analysis and post-mortems for production incidents
  • Participate in on-call rotations and lead incident response efforts
  • Work with Engineering leadership to build shared services that meet the requirements and needs of the platform and application teams
  • Ensure services are designed for 24/7 availability, operational readiness, and rigor
  • Implement proactive monitoring, alerting, trend analysis, and self-healing systems
  • Define non-functional requirements as part of the product lifecycle to influence the new designs, standards, and methods for scalable, highly available distributed systems
  • Identify, evaluate, and execute preventive measures to minimize or avoid impact on the customer experience, favoring proactive detection over customer-escalated issues
  • Resolve product/service defects through design changes, infrastructure changes, or operational changes
  • Partner with other SREs and lead by example, acting more as a contributor than a delegator

Knowledge and Experience

  • 7+ years of experience in DevOps, SRE, or infrastructure engineering roles in 24x7 production support environments
  • BS in Computer Science, Computer Engineering, Math, or equivalent professional experience
  • Fluency in one or more current-generation scripting languages (Python, Shell, Perl, PHP, Ruby) and/or Java and .NET development
  • Excellent troubleshooting skills, utilizing a systematic problem-solving approach
  • Demonstrated experience designing, analyzing, and diagnosing large-scale distributed systems, plus knowledge of Windows Server and Linux systems internals (system libraries, file systems, client-server protocols)
  • Experience in Windows, Linux, OCP, and AWS
  • Experience with Continuous Integration and Continuous Delivery concepts
  • Hands-on experience with infrastructure-as-code tools such as Terraform and Spacelift and/or Chef, SaltStack, Ansible, and Puppet
  • Experience with containerization technologies such as Kubernetes and Docker is a plus
  • Proven strength in SaaS services and experience in massive-scale web operations
  • Experience with monitoring and alerting tools (Splunk, BigPanda, PagerDuty)
  • Experience with automation of business continuity/disaster recovery/application resiliency
  • Process-oriented with great documentation skills (Confluence)
  • Experience with data structures/formats such as XML, JSON, YAML, and HCL
  • Must be able to multitask in a fast-paced environment with focus on timeliness, documentation, and communications with peers and business users alike
  • Experience with deployment automation tools like UCD and Azure DevOps (ADO)