Use available libraries to implement algorithms
Evaluate and refine algorithms using relevant data
Perform exploratory and targeted data analysis using descriptive statistics and other methods
Communicate the business value of technical solutions
Perform ad-hoc analysis and assist in developing reproducible analytical approaches to meet business requirements
Experiment with and evaluate the output of Large Language Models for various tasks
Utilize the following technologies: Python, Jupyter, Spark, and AWS SageMaker
Data Engineering / Data Ops tasks:
Catalog research datasets from various groups, identifying use cases, tagging metadata, etc.
Perform data normalization
Write ETL scripts to convert data to a standardized form
Perform data exploration on various datasets
Build data pipelines and transform data sets to support the application of a variety of machine learning techniques
Participate in data storage architecture design discussions
Minimum Requirements:
Pursuing a BS in Mathematics, Computer Science, Information Management, or Statistics
Knowledge of machine learning algorithms
Strong analytical skills with the ability to collect, organize, analyze, and disseminate significant amounts of information with attention to detail and accuracy
Adept at documenting experiments, writing reports, and presenting findings
Knowledge of cloud-based data processing platforms such as AWS SageMaker, Google Vertex AI, Microsoft Azure ML, etc.
Knowledge of SQL and statistics, and experience using statistical packages for analyzing datasets (Excel, statsmodels, R, etc.)
Successful completion of a background screening process including, but not limited to, employment verifications, criminal search, OFAC, SS Verification, as well as credit and drug screening, where applicable and in accordance with federal and local regulations.