As a Data Scientist at IBM, you will help transform our clients’ data into tangible business value by analyzing information, communicating outcomes, and collaborating on product development. You will work with best-in-class open source and visual tools, along with the most flexible and scalable deployment options. Whether it’s investigating patient trends or weather patterns, you will work to solve real-world problems for the industries transforming how we live.
Your Role and Responsibilities
- Machine Learning Model Development: Design, develop, and implement machine learning models using Python libraries such as TensorFlow, PyTorch, or scikit-learn.
- Large Language Model Integration: Collaborate with architects to leverage Large Language Models (LLMs) and platforms such as watsonx.ai, GPT-4, or Amazon Bedrock (if applicable) for project success.
- GenAI Project Experience: Bring at least 2 years of experience in GenAI projects, either for clients or IBM assets/products.
- GenAI Task Expertise: Demonstrate expertise in at least one GenAI task, such as:
  - RAG content generation
  - Code generation
  - Code conversion
  - AI Agent development
  - Data mapping
  - Graph RAG
- Solution Design and Requirements: Work closely with business SMEs to define project requirements, select appropriate AI/ML/GenAI models with architects, and set realistic model benchmarks to meet business needs.
- AI Solution Design with Ethics: Collaborate with architects to design AI solutions that incorporate knowledge of AI ethics.
Required Technical and Professional Expertise
- Machine Learning and GenAI: Strong understanding of machine learning and GenAI concepts. Awareness of ethical implications in GenAI, including bias, fairness, and privacy.
- Python and Data Science Libraries: Proficiency in Python and relevant data science libraries (e.g., NumPy, Pandas, scikit-learn). Hands-on practice with graph databases and vector databases.
- Large Language Models: Familiarity with LLMs, including their architectures, capabilities, and limitations. Skill in crafting effective prompts to guide LLMs and obtain desired outputs. Ability to handle and prepare data for analysis, including data cleaning, normalization, and feature engineering.
- GenAI Frameworks: Deep understanding of GenAI frameworks like TensorFlow, PyTorch, Hugging Face Transformers, LangChain, and LlamaIndex.
- Latest GenAI Practices: Knowledge of the latest GenAI practices on at least two prominent LLMs (e.g., GPT-4, Llama 3).
- Fluent in both English and Romanian
Preferred Technical and Professional Expertise
- Knowledge of ModelOps practices in cloud and containerized environments
- Familiarity with design-led development methodologies in complex data platforms and analytics
- Ability to work with data from the data platform and communicate insights effectively