Anirudh Khatry

Computer Science PhD Student

University of Texas at Austin

Biography

Hi, I am a first-year PhD student in the Department of Computer Science at the University of Texas at Austin, advised by Prof. Işıl Dillig and Prof. Greg Durrett. I work on secure code transpilation using neurosymbolic techniques.

Previously, I was a Pre-doctoral Research Fellow at Microsoft, where I worked with the PROSE (Program Synthesis) team on the Copilot experience for data wrangling in Fabric. I received my bachelor’s degree in Information Technology from Veermata Jijabai Technological Institute (V.J.T.I.), Mumbai, India.

I have had the good fortune to work with Dr. Ashish Tiwari, Dr. Sumit Gulwani, Dr. Vu Le, Dr. Gust Verbruggen, Dr. Sandeep Udmale, and Dr. Vijay Sambhe.

Outside of work, you can find me running, playing the guitar, and listening to music.

Download my CV.

Interests
  • AI4Code
  • Artificial Intelligence
  • Information Retrieval
  • Program Synthesis
Education
  • Ph.D. in Computer Science, started 2024.

    University of Texas at Austin.

  • B.Tech. in Information Technology, 2017–2021

    Veermata Jijabai Technological Institute, India.

Recent News

[10/10/2024] I am on the PC for ICLR 2025.

[01/10/2024] I am on the PC for the Table Representation Learning Workshop at NeurIPS ’24.

[22/05/2024] I am serving on the PC for the ASE 2024 Industry Track.

[05/05/2024] I will be on the PC for OOPSLA ’24.

[15/04/2024] Our paper “Semantically Aligned Question and Code Generation for Automated Insight Generation” was selected as the Best Paper at LLM4Code at ICSE ’24.

Recent Publications

(2024). An Empirical Study of Validating Synthetic Data for Formula Generation.

(2024). Semantically Aligned Question and Code Generation for Automated Insight Generation (Best Paper). In LLM4Code, ICSE ’24.

(2023). Alternate Task Technique for Natural Language to Code in Low-Resource Languages.

(2023). COOPER: Learning what to teach models for code generation.

(2023). TSTR: Target Similarity Tuning Meets the Real World. In EMNLP-Findings'23.

(2023). Augmented Embeddings for Custom Retrievals.

(2023). From Words to Code: Harnessing Data for Program Synthesis from Natural Language. In MLAIDS'23.

(2022). Landmarks and Regions: A Robust Approach to Data Extraction. In PLDI'22.

Experience

Microsoft
Pre-doctoral Research Fellow
Aug 2022 – Jun 2024 Redmond
  • Conceptualized and built the natural-language-to-code feature for the Power Query M language, which is used for wrangling tables in Excel, Fabric, and Power BI.
  • Contributed to building the Copilot experience for Power Query in Fabric and Excel.
  • Devised two state-of-the-art strategies, TSTR (EMNLP Findings 2023) and COOPER (under submission to ASE 2024), for optimal dynamic prompt construction in support of in-context learning for natural-language-to-code tasks.
  • Developed the Alternate Task Technique (ATT) (under submission), a general framework for post-processing LLM outputs using alternate tasks, which improved performance on low-resource languages such as Power Query M by 13%.
  • Developed the Adapted Dense Retrieval (ADDER) framework (under submission), which uses dense embeddings for efficient code retrieval in low-resource settings.

Microsoft Research
Research Intern
Jul 2021 – Aug 2022 India
  • Collaborated with the Microsoft Edge team on web-based data extraction to improve the product purchasing experience.
  • Automated invoice data extraction for the Microsoft IDC Finance team, improving the team's productivity.
  • Tackled low-resource named entity recognition tasks using a combination of ML and program synthesis techniques.
  • Devised Landmark-based Robust Synthesis (LRSyn), a state-of-the-art, interpretable data extraction framework that is robust to version changes in the data.
  • Spearheaded the clustering and landmark detection tasks during the development of LRSyn and developed a novel fingerprinting technique for images.
  • Published our paper “Landmarks and Regions: A Robust Approach to Data Extraction” at the Conference on Programming Language Design and Implementation (PLDI) 2022 in San Diego.

Human Rights First
Machine Learning Intern
May 2021 – Jul 2021 Remote
  • Collaborated with 30 changemakers to develop a tool that detects war crimes from social media channels.
  • Fine-tuned a DistilRoBERTa model for binary classification of war crimes.
  • Spearheaded the development of a novel two-stage prediction pipeline for multi-label classification of war crimes.

Samsung Research and Development Institute, India
Research Intern
May 2020 – Jul 2020 Remote
  • Worked with the On-Device AI team to improve system performance using reinforcement learning.
  • Built a state-of-the-art multi-agent Deep Q-Network (DQN) leveraging prioritized experience replay (PER) and time-bound dynamic reward functions.
  • Designed a landmark agent simulation environment to demonstrate a proof of concept.

Pexabyte Technology Solutions Pvt. Ltd.
Programming Analyst Intern
May 2019 – Jul 2019 Remote
  • Coordinated with the product development team to build an ERP application for manufacturing and service-based industries.
  • Developed the application in JavaFX, with MySQL for database management.
  • Followed an agile product development life cycle with regular input from key product owners.

Accomplishments

Coursera
Neural Networks and Deep Learning
Coursera
Blockchain Fundamentals
Coursera
Fundamentals of Reinforcement Learning
Introduction to Quantum Computing

Contact