Job responsibilities
- Develop automation scripts and tools using Python, Bash, or other scripting languages.
- Streamline operational tasks through automation to reduce manual intervention.
- Set up and manage monitoring tools like Dynatrace, Splunk, Prometheus, and Grafana.
- Create alerting mechanisms to identify issues before they affect performance.
- Participate in incident management, troubleshooting, and root cause analysis.
- Conduct post-incident reviews to identify improvements and prevent recurrences.
- Optimize systems for performance, latency, and capacity management.
- Monitor and maintain databases for data integrity and availability.
- Collaborate with the AD team on database performance tuning and scaling.
- Architect, deploy, and manage AWS services (EC2, S3, Lambda, RDS) for scalable infrastructure.
- Implement and manage CI/CD pipelines using Jenkins, GitLab, or similar tools, and support Docker and Kubernetes environments for automated deployment and scaling.
Required qualifications, capabilities, and skills
- Formal training or certification in software engineering concepts and 3+ years of applied experience.
- Proficiency in Python, Bash, or other relevant scripting languages.
- Experience with relational and NoSQL databases.
- Familiarity with monitoring and observability tools like Dynatrace, Splunk, Prometheus, Grafana, CloudWatch, and AppDynamics.
- Strong debugging experience in Core Java, Spring Boot, Spring modules, Oracle, or PostgreSQL.
- Experience with messaging and integration frameworks such as Kafka.
- Proficiency with AWS or other cloud platforms (GCP, Azure), including services like EC2, S3, EKS, Lambda, SQS, and RDS.
- Familiarity with CI/CD pipelines using tools like Jenkins, Jules, GitHub, and Spinnaker.
- Strong problem-solving skills with the ability to troubleshoot complex systems.
Preferred qualifications, capabilities, and skills
- Ability to code in at least one programming language, such as Java or Python.