Expoint - all jobs in one place

Finding a high-tech job at the best companies has never been easier

Amazon Language Data Scientist 
United States, California 
778483716

10.06.2024
DESCRIPTION

As a Language Data Scientist, you will start by diving deep into a couple of critical projects for Bedrock services to drive these projects forward. You will collaborate with fellow language data scientists, program managers, as well as stakeholders in science, engineering, and product teams to understand the role data plays in developing models that meet customer needs. You will analyze, follow, and improve processes for collecting and annotating LLM inputs and outputs, assessing data quality, and automating where appropriate.

You will then expand your scope by using the principles of data-centric AI to understand the role our data plays with regard to model performance specifically, as well as the larger ML pipeline. You will apply state-of-the-art Generative AI techniques to analyze how well our data represents human language and run experiments to gauge downstream interactions. You will work collaboratively with other language data scientists and scientists to design and implement principled strategies for data optimization.
Key job responsibilities
- Source, validate, and deliver high-quality language artifacts and linguistic data
- Oversee the progress and quality of several data collection and annotation projects at a time
- Advocate for strict adherence to data collection guidelines and quality thresholds
- Extend existing data collection, annotation, and quality assurance efforts to support feature and language expansion
- Innovate on data collection methodologies, guidelines, and quality metrics to support new requests
- Automate repetitive workflows and improve existing processes


BASIC QUALIFICATIONS

- 1+ years of experience with data querying languages (e.g. SQL), scripting languages (e.g. Python), or statistical/mathematical software (e.g. R, SAS, Matlab)
- 2+ years of experience as a data/research scientist, statistician, or quantitative analyst at an internet-based company with complex and big data sources
- MA in Computational Linguistics, Linguistics with a computational component, or an equivalent field
- Excellent knowledge of semantics, pragmatics, conversation analysis, and/or discourse analysis
- Experience designing and executing data collection projects, including developing guidelines, label sets, and annotation workflows
- Experience developing and evaluating data annotation and data quality metrics