Expoint - all jobs in one place

The place where the best experts and companies meet


Microsoft Senior Data Scientist 
India, Telangana, Hyderabad 
366598395

04.02.2025

The Web Data Platform team at Microsoft is looking for a highly skilled and motivated **Senior Data Scientist** to help improve web data quality by identifying and mitigating junk URLs at scale. In this role, you will work on some of the largest and most complex datasets in the world, applying state-of-the-art machine learning techniques to enhance the quality of data ingested for search and other web-based services. Your work will directly impact millions of users by improving search relevance, content quality, and user experience.

**Basic Qualifications**:

  • Bachelor’s or Master’s degree in Computer Science, Data Science, Statistics, Mathematics, or a related field.
  • 7+ years of experience in data science, machine learning, or a related field.

**Preferred Qualifications**:

  • Expertise in **Python**, **C#**, or another programming language for building scalable solutions.
  • Strong hands-on experience with **big data technologies** such as Apache Spark, Databricks, or Azure Data Lake.
  • Proficiency in **machine learning frameworks** like PyTorch, TensorFlow, or scikit-learn.
  • In-depth knowledge of algorithms for classification, clustering, and anomaly detection.
  • Experience working with web data, including techniques for crawling, parsing, and feature engineering on unstructured data.
  • Familiarity with techniques for handling noisy or imbalanced datasets.
  • Knowledge of **search engines**, **URL patterns**, or web-related NLP is a strong plus.

**Soft Skills**:

  • Strong problem-solving and analytical skills.
  • Excellent communication and stakeholder management abilities.
  • A growth mindset with a passion for learning and innovation.

**Responsibilities**:

  • Design and implement robust feature extraction pipelines to characterize web content quality.
  • Build and deploy scalable machine learning models to classify URLs as junk, spam, or relevant.
  • Develop ensemble methods combining rule-based systems with supervised and unsupervised models.
  • Leverage distributed computing frameworks like **Apache Spark** or **Azure Synapse** to process and analyze large-scale datasets efficiently.
  • Optimize data pipelines for performance, scalability, and maintainability.
  • Work closely with product managers and stakeholders to define success metrics and iterate on solutions.
  • Stay updated with the latest developments in machine learning, NLP, and web data analytics.