Expoint - all jobs in one place

IBM Senior Development Architect 
India, Karnataka, Bengaluru 
948120455

03.07.2024

Your Role and Responsibilities
How can we effectively design and deliver Observability platforms and services for IBM Cloud IaaS offerings, a large-scale, highly distributed cloud infrastructure? We are looking for an individual who will work on a team to design, develop and implement automated platforms and services, and who will lead the AI mission for IBM Cloud Observability, enabling SREs and developers to manage their services, reduce operational costs, identify anomalies and reduce MTTR. The job provides the opportunity to collaborate with experienced and knowledgeable technical leaders, grow your career, and build expertise as a platform engineer for the as-a-Service (aaS) model.
  • Experienced in conceptualization, analysis, architecture, solution design and development of software products and services. This involves engaging in and improving the lifecycle of an observability service built for monitoring, eventing, logging, dashboarding, etc., from inception and design through deployment, operation and refinement.
  • Experienced in using ML or Generative AI to build analytics and AI use cases and to demonstrate value from inception to implementation, including collaboration with the appropriate stakeholders.
  • Experience in building, training & deploying ML models and automation of these activities
  • Adopt and build on automation solutions governed by SRE principles, including CI/CD pipelines, configuration management, immutable infrastructure deployment, auto-healing systems, etc.
  • Ensure compliance and security integrity of the environment and build secure practices. Have a deep understanding of how security impacts each stage of the development pipeline and the final product or service. Identify gaps and embed secure practices into our processes.
  • Support services before they go live through activities such as system design consulting, developing, testing and identifying software platforms and frameworks, capacity planning and launch reviews.
  • Work with and adopt open source technologies as well as participate in new IBM innovations across IaaS
  • A self-driven attitude to propose, test and implement solutions and improvements for review and consideration with your peers
  • Practice sustainable incident response and blameless postmortems.
  • In addition, you will mentor, share expertise and help build a self-sustaining team.
  • You will also work with wider teams to enable consumption of the service, identify gaps and provide thought leadership.


Required Technical and Professional Expertise

  • 5+ years of experience in building, deploying and managing large-scale services / platforms for cloud platforms like AWS, Azure, IBM Cloud or Google Cloud.
  • Delivering microservices at scale; designing microservices solutions
  • Good understanding of real time streaming analytics and batch data processing architectures.
  • Familiarity with distributed architecture, event-driven architecture.
  • Python and Go experience (Java, Scala or Rust experience is also welcome)
  • Container orchestration, performance and security (Kubernetes / Docker)
  • DevSecOps practices
  • Experience with one or more of the following: NoSQL databases (e.g. Cassandra, MongoDB), columnar databases (e.g. Redshift, Snowflake), search engines (e.g. Elasticsearch), Spark, Hadoop.
  • Sound understanding of data science concepts, model development & performance tuning processes as well as coding, version control and CI/CD best practices
  • 2+ years of UI development experience using React (Angular experience is also welcome)
  • Scripting languages like Python, JavaScript, shell
  • Release engineering (Git branching, versioning, tagging) and experience with Agile software development
  • CI/CD tooling (ArgoCD and Tekton preferred)
  • 1+ years of experience with one or more of these monitoring tools: Sysdig, Zabbix, Grafana, Prometheus, ELK, etc.
  • Experience with one or more automation and configuration management tools/solutions: Ansible, Terraform, Salt, Chef, Python, Bash, Puppet, Rundeck, etc.
  • Working knowledge of Compute, Network, Storage and Virtualization technologies
  • Excellent verbal and written communication skills
  • Highly responsible and motivated; able to follow and establish development practices and to work with little direction

Preferred Technical and Professional Expertise

  • Experience with design and development of complex systems
  • Ability to troubleshoot complex problems and customer issues
  • 5+ years of experience in AI/ML model development and/or deployment of AI/ML workloads
  • Experience designing & developing automated pipelines for data extraction, ML model training, ML model production deployments and production monitoring
  • Working knowledge of Linux clustering, HA, and fault-tolerant system implementations: active/active, active/passive, Pacemaker, Keepalived, HAProxy, LVM
  • Familiarity with the implementation of one or more telemetry systems like Grafana Loki, Mimir, OTel, and their architectures
  • 2+ years of experience with complex systems and layered architecture models: OSI, Kubernetes, virtualization, TCP/IP, etc.
  • Data modeling and experience with data engineering tooling and platforms such as Kafka, Spark.
  • Working knowledge of TCP/IP, BGP, sockets and routing protocols, and of the part they play in debugging and in highly available systems at scale.
  • Ability to debug an issue across the entire OSI stack of a typical Linux environment across storage, network, compute, OS, system tuning, orchestration.
  • Ability to follow stack traces to particular libraries in code and identify root causes.
  • Extensive experience with databases and debugging their usage with application stacks
  • Experience with and understanding of the interaction and dependencies of a typical three tier model of application stacks, as well as cloud