Day-to-day work involves understanding near real-time (NRT) and batch data pipeline systems developed by engineering teams.
Carry out data profiling and understand schemas, data interrelationships, and data flows using SparkSQL and Jupyter.
Document test plans, write test case automation, report and isolate bugs, and work closely with other teams (engineering, project management, etc.).
This position demands a self-motivated individual with strong technical and communication skills who can contribute in a team environment.
Be dynamic and prepared to test/regress a high volume of changes on a day-to-day basis.
The candidate should be able to implement automated tests for NRT and batch data pipelines using QA automation tools and Java, Python, and/or Scala.
Demonstrate excellent bug reporting skills and the ability to communicate clearly with third parties
Preferred Qualifications
Experience with Big Data technologies (e.g. HDFS, AWS, Spark, Kafka, Cassandra)
Good knowledge of Python, Java, and/or Scala. Interest and experience in coding are a must for this position.
Experience with Big Data query tools
Experience with near real-time (NRT) and batch data pipelines
Experience with black box testing
Experience with client-server products
Knowledge of data quality, data profiling, and data integration tools