Overview
Microsoft Teams is the hub for teamwork that integrates all the people, content, and tools your team needs to be more engaged and effective. It is core to Microsoft’s modern work, modern life, and modern education value proposition. We are reinventing the way people communicate and work together across the globe.
We are looking to hire a PhD (or published MSc) candidate for a 12-week internship to join CMD Labs – an applied science team within Microsoft Teams – to work on improving transcription accuracy by applying existing or novel research and leveraging training, fine-tuning, and prompt engineering of speech transformer models, as well as LLMs and audio-enabled foundation models as post-processing and re-scoring modules.
Our flagship AI applications for Teams Meetings, such as Meeting Agent, Meeting Copilot and Intelligent Recap, all depend fully on accurate meeting transcription as their primary grounding data. When used as grounding data for AI, transcription quality can significantly affect AI reliability. For instance, named entities (names of people, projects, products, companies and places) are often the most important part of a transcript, and yet the most challenging for the transcription engine, since the names might not be part of the model's training data. An important part of the challenge is to unravel the aspects of accuracy that affect AI reliability the most, and thus to set relevant metrics and objectives beyond word error rate (WER).
The intern will be onboarded to our evaluation pipeline code, which processes real, internally donated meetings, and will work on improving existing algorithms as well as proposing novel solutions to the problem based on recent academic literature. The work done in the internship will contribute towards the algorithm that the engineering team will implement in production. Given substantial scientific novelty of the approach and results, collaboration on a mutual publication is encouraged.
Responsibilities
- Conduct experiments, create and validate metrics, and develop candidate algorithms to improve the accuracy of transcription and reduce chances of error in downstream LLM-based applications.
- Collaborate closely with CMD Labs researchers and engineers to leverage existing assets and datasets and to ensure results can contribute to the product.
- Embody our culture and values.
Qualifications
Required
- Currently enrolled in a PhD program (or a published candidate in an MSc program) in Computer Science, Electrical or Computer Engineering, Statistics, or a related field.
- Practical experience in training, fine-tuning, and prompt engineering of transformer models or LLMs.
- Practical Python coding experience leveraging PyTorch, TensorFlow, or a similar framework.
Preferred
- Field of research and publications directly related to transcription or audio LLMs.
Please note that this is a 12-week internship with a start date between April and June (flexible within this range).
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.