Locates and extracts data from various information systems, such as Hadoop and BPC, for use in analytics solutions.
Provides documentation of Hadoop data taxonomy for key datasets including, but not limited to, schema and attribute descriptions, leveraging existing meta-data on the bigdata.
Performs data de-bugging, validation and testing and makes recommendations on data cleaning and/or transformations to enhance data fidelity.
Prepare clean, transformed data from multiple data sources ready for analysis
Identify opportunities for enriching and integrating data from multiple, diverse sources.
Identify and design new model features and implement them in our product.
Establish standardized data, implement code version control, schedule workflows, and develop business and schema tests.
Able to work both independently and in a team environment
Experience with SQL, Python, or other scripting languages.
Good document tracking strategies
Work with the development team to identify, document, and resolve software issues
Work with a cross-functional agile team that is responsible for multiple projects
Excellent document maintenance procedures
Documenting and communicating results
Knowledge of statistical concepts
Proficiency in conducting exploratory data analysis (EDA) and applying various analytical techniques to uncover insights from data
Effectively visualize and communicate findings using data visualization tools (e.g., Tableau, Power BI)