Parallel Scaling
Interdependent generations for LLMs
Bridge treats batched hidden states holistically to share signal across parallel generations, boosting accuracy with minimal params.
Read the paper →
Researcher working on LLMs and large-scale online learning. Currently at Meta Superintelligence Labs.
Researcher building personal superintelligence—LLMs that reason, collaborate, and act in human-centered environments. Led Meta AI post-training for the 2023 debut of Meta AI and launches of Llama 2/3/4 across the Family of Apps, improving model quality and scaling Meta AI to over 1 billion MAU.
Key areas of focus: LLM · Online Continual Learning · Market Design · Optimization
Earlier work: novel RL algorithms with formal guarantees, shipped across Ads, Recommendations, and Safety & Integrity to power large-scale decision systems at Meta.
Prior to Meta, Karthik completed his PhD and MS at the University of Maryland, focusing on the mathematical foundations of algorithms and machine learning, and held intern positions at Adobe, IBM Almaden, IISc, and Microsoft Research NYC. He earned a B.Tech (Hons) in Computer Science with a minor in Operations Research from IIT Madras.
Spotlight
Parallel Scaling
Bridge treats batched hidden states holistically to share signal across parallel generations, boosting accuracy with minimal params.
Read the paper →
RLHF
CGPO balances multi-objective RLHF with mixture-of-judges to curb reward hacking and align models across tasks.
Read the paper →
Inference Optimization
ICML 2025: IBPO learns to allocate reasoning depth based on problem difficulty, improving MATH500 with 2–4x inference budgets.
Read paper →
Adversarial Bandits
Beyond worst-case analysis for adversarial bandits with knapsacks; tighter instance-dependent insights.
Read paper →
Teaching
Video series introducing adversarial and stochastic online learning foundations for upper-level undergrads.
Watch playlist →
Post-training
Practical overview of post-training fundamentals, techniques, and workflows to align and improve LLMs.
Read post →
Publications
Work spans RLHF, Inference-time Optimization, Parallel Scaling, Hallucination detection for LLMs, Bandits, Online Matching and Advertising systems. The complete conference, journal, and workshop list lives below.
Community
Teaching and mentoring across online learning, algorithms, and programming. Instructor roles at UMD and IITM.
Online learning playlist →
Service
Program leadership and reviewing across top AI/ML conferences and journals, plus mentorship for new researchers.
Abstract:
Hallucination, the generation of factually incorrect information, remains a significant challenge for large language models (LLMs), especially in open-domain long-form generation. Existing approaches for detecting hallucination in long-form tasks either focus on limited domains or rely heavily on external fact-checking tools, which may not always be available. In this work, we systematically investigate reference-free hallucination detection in open-domain long-form responses. Our findings reveal that internal states (e.g., the model's output probability and entropy) alone are insufficient for reliably (i.e., better than random guessing) distinguishing between factual and hallucinated content. To enhance detection, we explore various existing approaches, including prompting-based methods, probing, and fine-tuning, with fine-tuning proving the most effective. To further improve accuracy, we introduce a new paradigm, named RATE-FT, that augments fine-tuning with an auxiliary task for the model to jointly learn with the main task of hallucination detection. With extensive experiments and analysis using a variety of model families and datasets, we demonstrate the effectiveness and generalizability of our method, e.g., +3% over general fine-tuning methods on LongFact.
Keywords:
Large Language Model;
Hallucination detection;
Factuality
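For intuition, a minimal sketch of the joint-training idea behind RATE-FT as described above: fine-tune on hallucination detection together with an auxiliary task. The loss weighting, function names, and the example auxiliary task are illustrative assumptions, not details from the paper.

import torch.nn.functional as F

def rate_ft_loss(main_logits, main_labels, aux_logits, aux_labels, aux_weight=0.5):
    # main task: classify a claim as factual vs. hallucinated
    main_loss = F.cross_entropy(main_logits, main_labels)
    # auxiliary task learned jointly with the main task (hypothetical
    # example: answering questions about the same claim)
    aux_loss = F.cross_entropy(aux_logits, aux_labels)
    return main_loss + aux_weight * aux_loss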
Abstract:
Solving mathematics problems has been an intriguing capability of large language models, and many efforts have been made to improve reasoning by extending reasoning length, such as through self-correction and extensive long chains of thought. While promising in problem-solving, advanced long-reasoning-chain models exhibit an undesired single-modal behavior, where trivial questions require unnecessarily tedious long chains of thought. In this work, we propose a way to allow models to be aware of inference budgets by formulating it as utility maximization with respect to an inference budget constraint, hence naming our algorithm Inference Budget-Constrained Policy Optimization (IBPO). In a nutshell, models fine-tuned through IBPO learn to "understand" the difficulty of queries and allocate inference budgets to harder ones. With different inference budgets, our best models achieve a 4.14% and 5.74% absolute improvement (8.08% and 11.2% relative improvement) on MATH500 using 2.16x and 4.32x inference budgets respectively, relative to LLaMA3.1 8B Instruct. These improvements are approximately 2x those of self-consistency under the same budgets.
Keywords:
Large Language Model;
Inference Time Scaling;
Reasoning
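In schematic form (our notation; the abstract states the formulation only verbally), the budget-constrained objective reads

\max_{\pi} \; \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi(\cdot \mid x)}\big[U(x, y)\big] \quad \text{s.t.} \quad \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi(\cdot \mid x)}\big[\mathrm{len}(y)\big] \le B,

i.e., maximize expected utility (e.g., answer correctness) while keeping the expected generation length within budget B, so the fine-tuned policy spends long chains of thought only on hard queries.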
Abstract:
This paper proposes two different models for equitable resource allocation in online settings. The first one is called external equity promotion, where sequentially arriving agents are heterogeneous in their external attributes, namely how many resources they demand, which is drawn from a probability distribution (accessible to the algorithm). The focus is then to devise an allocation policy such that every requester can get a fair share of resources proportional to their demands, regardless of their arrival time. Motivating examples include allocating food donations to different agencies. The second is called internal equity promotion, where arriving requesters can be treated homogeneously in external attributes (demands) but are heterogeneous in internal traits such as demographics. In particular, each requester can be identified as belonging to one or several groups, and an allocation of resources is regarded as equitable when every group of requesters can receive a fair share of resources proportional to the percentage of that group in the whole population. An inspiring instance is rationing (limited) COVID-19 vaccines during the early stage of the pandemic.
For both models above, we consider as the benchmark a clairvoyant optimal solution that has the privilege to access all random demand realizations in advance. We aim to design efficiently computable policies whose performance is as close as possible to that of the clairvoyant optimal; the largest ratio of the former to the latter that can be guaranteed over all possible instances is measured as the competitive ratio (CR). We consider two equity metrics, namely ex-post and ex-ante, and discuss the challenges under the two metrics in detail. Specifically, we present two linear program (LP)-based policies for external equity promotion under ex-ante with independent demands, each achieving an optimal CR with respect to the benchmark LP. For internal equity promotion, we present optimal policies under both ex-ante and ex-post metrics.
Keywords:
Machine Learning;
Online Algorithms;
Market Design
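In symbols (our notation), the competitive ratio of a policy ALG against the clairvoyant benchmark OPT is

\mathrm{CR}(\mathsf{ALG}) = \min_{I} \frac{\mathbb{E}[\mathsf{ALG}(I)]}{\mathbb{E}[\mathsf{OPT}(I)]},

the largest fraction of the clairvoyant optimum guaranteed on every instance I.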
Abstract:
We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens. Our model series are built through continual pretraining from Llama 2 with longer training sequences and on a dataset where long texts are upsampled. We perform extensive evaluation on language modeling, synthetic context probing tasks, and a wide range of research benchmarks. On research benchmarks, our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2. Notably, with a cost-effective instruction tuning procedure that does not require human-annotated long instruction data, the 70B variant can already surpass gpt-3.5-turbo-16k's overall performance on a suite of long-context tasks. Alongside these results, we provide an in-depth analysis of the individual components of our method. We delve into Llama's position encoding and discuss its limitation in modeling long dependencies. We also examine the impact of various design choices in the pretraining process, including the data mix and the training curriculum of sequence lengths -- our ablation experiments suggest that having abundant long texts in the pretraining dataset is not the key to achieving strong performance, and we empirically verify that long-context continual pretraining is more efficient and similarly effective compared to pretraining from scratch with long sequences.
Keywords:
Machine Learning;
LLM;
Generative AI
Abstract:
The past decade has witnessed the flourishing of a new profession, media content creators, who rely on revenue streams from online content recommendation platforms. The reward mechanism employed by these platforms creates a competitive environment among creators which affects their production choices and, consequently, content distribution and system welfare. It is thus crucial to design the platform's reward mechanism so as to steer the creators' competition towards a desirable welfare outcome in the long run. This work makes two major contributions in this regard: first, we uncover a fundamental limit of a class of widely adopted mechanisms, coined Merit-based Monotone Mechanisms, by showing that they inevitably lead to a constant-fraction loss of welfare. To circumvent this limitation, we introduce Backward Rewarding Mechanisms (BRMs) and show that the competition games resulting from BRMs possess a potential game structure, which naturally induces the strategic creators' behavior dynamics to optimize any given welfare metric. In addition, the class of BRMs can be parameterized so that it allows the platform to directly optimize welfare within the feasible mechanism space even when the welfare metric is not explicitly defined.
Keywords:
Machine Learning;
Recommender Systems;
Creator Incentives;
Abstract:
We consider a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption. This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption. We present a new algorithm that is simple, computationally efficient, and admits vanishing regret. It is statistically optimal for CBwK when an algorithm must stop once some constraint is violated. Our algorithm builds on LagrangeBwK (Immorlica et al., 2019, 2022), a Lagrangian-based technique for CBwK, and SquareCB (Foster and Rakhlin, 2020), a regression-based technique for contextual bandits. Our analysis leverages the inherent modularity of both techniques.
Keywords:
Machine Learning;
Contextual Bandit;
Mixed Packing/Covering Constraints;
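For intuition, a minimal sketch of the Lagrangian-game idea behind LagrangeBwK-style methods: a primal learner picks arms against a dual learner that prices resources. This is a simplification for illustration; the names, the multiplicative-weights dual update, and the normalization are assumptions, not the paper's exact algorithm.

import math

def lagrangian_step(duals, reward, consumption, budget_rate, lr=0.1):
    """One simplified primal-dual round. duals: normalized per-resource
    prices; reward in [0, 1]; consumption: per-resource usage of the
    chosen arm in [0, 1]; budget_rate: allowed average use per round."""
    # Lagrangian payoff fed back to the arm-choosing (primal) learner
    lagrangian = reward - sum(d * (c - b) for d, c, b in
                              zip(duals, consumption, budget_rate))
    # the dual learner multiplicatively raises prices of over-used resources
    new_duals = [d * math.exp(lr * (c - b)) for d, c, b in
                 zip(duals, consumption, budget_rate)]
    z = sum(new_duals)
    return lagrangian, [d / z for d in new_duals]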
Abstract:
Many applications where tasks should be assigned to agents can be modeled as matching in bipartite graphs. In this paper, we consider applications where tasks arrive dynamically and rejection of a task may have significant adverse effects on the requester; therefore, performing the task with some delay is preferred over complete rejection. The performance time of a task depends on the task, the agent, and the assignment, and only its distribution is known in advance. The actual time is known only after the task performance, when the agent is available for a new assignment. We consider such applications to be one of two arrival types. With the first type, the arrival distribution is known in advance, while there is no assumption about the arrival times and order with the second type. For the first type, we present an LP-based online algorithm with a competitive ratio of 0.5. For the second type, we show there is no online algorithm with a constant competitive ratio. We run extensive experiments to evaluate our algorithm on a real-world dataset, demonstrating the advantages of the LP approach.
Keywords:
Machine Learning; Matching Theory; Autonomous Vehicles
Abstract:
In this work, we consider the problem of robust parameter estimation from observational data in the context of linear structural equation models (LSEMs). LSEMs are a popular and well-studied class of models for inferring causality in the natural and social sciences. One of the main problems related to LSEMs is to recover the model parameters from the observational data. Under various conditions on LSEMs and the model parameters, prior work provides efficient algorithms to recover the parameters. However, these results are often about generic identifiability. In practice, generic identifiability is not sufficient and we need robust identifiability: small changes in the observational data should not affect the parameters by a huge amount. Robust identifiability has received far less attention and remains poorly understood. Sankararaman et al. (2019) recently provided a set of sufficient conditions on parameters under which robust identifiability is feasible. However, a limitation of their work is that their results only apply to a small sub-class of LSEMs, called "bow-free paths." In this work, we significantly extend their work along multiple dimensions. First, for a large and well-studied class of LSEMs, namely "bow-free" models, we provide a sufficient condition on model parameters under which robust identifiability holds, thereby removing the restriction to paths required by prior work. We then show that this sufficient condition holds with high probability, which implies that robust identifiability holds for a large set of parameters and that, for such parameters, existing algorithms already achieve robust identifiability. Finally, we validate our results on both simulated and real-world datasets.
Keywords:
Machine Learning; Causal Inference; Theory of Computation; Robust Algorithms
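For reference, one standard parameterization of an LSEM over n observed variables (a common convention; the abstract itself fixes no notation) is

X = \Lambda^{\top} X + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \Omega), \qquad \mathrm{Cov}(X) = (I - \Lambda)^{-\top}\, \Omega\, (I - \Lambda)^{-1},

where \Lambda collects the directed edge weights of the causal graph and \Omega the (possibly correlated) noise covariance. Parameter recovery means recovering \Lambda (and \Omega) from \mathrm{Cov}(X); robust identifiability asks that small perturbations of \mathrm{Cov}(X) perturb (\Lambda, \Omega) only slightly.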
Abstract:
"Bandits with Knapsacks" (BwK) is a general model for multi-armed bandits under supply/budget constraints. While worst-case regret bounds for BwK are well-understood, we present three results that go beyond the worst-case perspective. First, we provide upper and lower bounds which amount to a full characterization of logarithmic, instance-dependent regret rates. Second, we consider "simple regret" in BwK, which tracks the algorithm's performance in a given round, and prove that it is small in all but a few rounds. Third, we provide a general template for extensions from bandits to BwK which takes advantage of some known helpful structure, and apply this template to combinatorial semi-bandits and linear contextual bandits. Our results build on the BwK algorithm from Agrawal & Devanur (EC '14), providing new analyses thereof.
Keywords:
Machine Learning; Reinforcement Learning; Theory of Computation
Abstract:
We design decentralized algorithms for regret minimization in two-sided matching markets with one-sided bandit feedback that significantly improve upon prior work. First, for general markets, for any ε > 0, we design an algorithm that achieves O(log^{1+ε}(T)) regret to the agent-optimal stable matching, with unknown time horizon T, improving upon the O(log^2(T)) regret achieved in prior work. Second, we provide the optimal Θ(log T) regret for markets satisfying uniqueness consistency -- markets where leaving participants don't alter the original stable matching. Previously, Θ(log T) regret was achievable only in the much more restricted serial dictatorship setting, where all arms have the same preference over the agents. We propose a phase-based algorithm, where in each phase, besides deleting the globally communicated dominated arms, the agents locally delete arms with which they collide often. This local deletion is pivotal in breaking deadlocks arising from rank heterogeneity of agents across arms. We further demonstrate the superiority of our algorithm over existing works through simulations.
Keywords:
Machine Learning; Reinforcement Learning; Theory of Computation; Economics and Computation; Matching Markets
Abstract:
In this paper, we consider a novel variant of the multi-armed bandit (MAB) problem, MAB with cost subsidy, which models many real-life applications where the learning agent has to pay to select an arm and is concerned about optimizing cumulative costs and rewards. We present two applications, the intelligent SMS routing problem and the ad audience optimization problem faced by a number of businesses (especially online platforms), and show how our problem uniquely captures key features of these applications. We show that naive generalizations of existing MAB algorithms like Upper Confidence Bound and Thompson Sampling do not perform well for this problem. We then establish a fundamental lower bound on the performance of any online learning algorithm for this problem, highlighting the hardness of our problem in comparison to the classical MAB problem (where T is the time horizon and K is the number of arms). We also present a simple variant of explore-then-commit and establish near-optimal regret bounds for this algorithm. Lastly, we perform extensive numerical simulations to understand the behavior of a suite of algorithms for various instances and recommend a practical guide to employ different algorithms.
Keywords:
Multi-armed Bandits; Machine Learning
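A minimal sketch of an explore-then-commit variant for this setting. The formalization assumed here (not spelled out in the abstract) is: commit to the cheapest arm whose estimated reward is within a subsidy factor (1 - alpha) of the best estimate; names and the parameter alpha are illustrative.

def etc_cost_subsidy(arms, pulls_per_arm, horizon, alpha=0.1):
    """arms: list of (sample_reward, cost) pairs, sample_reward() -> [0, 1]."""
    estimates = []
    for sample_reward, cost in arms:  # uniform exploration phase
        mean = sum(sample_reward() for _ in range(pulls_per_arm)) / pulls_per_arm
        estimates.append((mean, cost))
    best = max(m for m, _ in estimates)
    feasible = [i for i, (m, _) in enumerate(estimates)
                if m >= (1 - alpha) * best]        # "good enough" arms
    commit = min(feasible, key=lambda i: estimates[i][1])  # cheapest of them
    rest = horizon - pulls_per_arm * len(arms)
    rewards = [arms[commit][0]() for _ in range(rest)]     # commit phase
    return commit, rewards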
Abstract:
We study regret minimization problems in a two-sided matching market where uniformly valued demand-side agents (a.k.a. agents) continuously compete for getting matched with supply-side agents (a.k.a. arms) with unknown and heterogeneous valuations. Such markets abstract online matching platforms (e.g., UpWork, TaskRabbit) and fall within the purview of matching bandit models introduced in Liu et al. [24]. The uniform valuation on the demand side admits a unique stable matching equilibrium in the system. We design the first decentralized algorithm -- UCB with Decentralized Dominant-arm Deletion (UCB-D3) -- for matching bandits under uniform valuation that does not require any knowledge of reward gaps or time horizon, and thus partially resolves an open question in [24]. UCB-D3 works in phases of exponentially increasing length. In each phase, an agent first deletes dominated arms -- the arms preferred by agents ranked higher than itself. Deletion is followed by dynamic explore-exploit using the UCB algorithm on the remaining arms. Finally, the preferred arm is broadcast in a decentralized fashion to other agents through pure exploitation. Comparing the obtained reward with respect to the unique stable matching, we show that UCB-D3 achieves O(log(T)) regret in T rounds. We provide an (orderwise) matching regret lower bound.
Keywords:
Multi-armed Bandits; Machine Learning; Matching Markets; Theory of Computation
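A minimal sketch of one agent's phase in a UCB-D3-style scheme, following the high-level description above; the indexing, collision signaling, and broadcast details are illustrative assumptions, not the paper's exact protocol.

import math

def ucbd3_phase(dominated, means, counts, phase_len, propose):
    """dominated: arms claimed by higher-ranked agents (deleted);
    means/counts: running per-arm statistics; propose(arm) returns a
    reward on a successful match, or None on collision."""
    active = [a for a in means if a not in dominated]  # dominated-arm deletion
    t = sum(counts[a] for a in active) + 1
    for _ in range(phase_len):
        # optimistic (UCB) choice among the remaining arms
        arm = max(active, key=lambda a: means[a] +
                  math.sqrt(2 * math.log(t) / max(counts[a], 1)))
        r = propose(arm)
        if r is not None:  # matched without collision: update statistics
            means[arm] = (means[arm] * counts[arm] + r) / (counts[arm] + 1)
            counts[arm] += 1
        t += 1
    # the most-played arm is this agent's preferred arm, to be broadcast
    # via pure exploitation in the next phase (per the abstract)
    return max(active, key=lambda a: counts[a])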
Abstract:
The goal of this paper is to study why stochastic gradient descent (SGD) is efficient for neural networks, and how neural net design affects SGD. In particular, we investigate how overparameterization -- an increase in the number of parameters beyond the number of training data -- affects the dynamics of SGD. We introduce a simple concept called gradient confusion. When confusion is high, stochastic gradients produced by different data samples may be negatively correlated, slowing down convergence. But when gradient confusion is low, we show that SGD has better convergence properties than predicted by classical theory. Using theoretical and empirical results, we study how overparameterization affects gradient confusion, and thus the convergence of SGD, on linear models and neural networks. We show that increasing the number of parameters of linear models or increasing the width of neural networks leads to lower gradient confusion, and thus faster and easier model training. We further show how overparameterization by increasing the depth of neural networks results in higher gradient confusion, making deeper models harder to train. Finally, we observe empirically that batch normalization and skip connections reduce gradient confusion, which helps reduce the training burden of deep networks.
Keywords:
Deep Learning; Machine Learning; Optimization; Theory of Computation
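A minimal empirical sketch of the quantity just described: estimate gradient confusion on a mini-batch as the most negative pairwise inner product of per-sample gradients. The model and loss here are placeholders, and the paper's formal definition may differ in normalization.

import torch

def per_sample_grads(model, loss_fn, xs, ys):
    grads = []
    for x, y in zip(xs, ys):
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()
                       if p.grad is not None])
        grads.append(g.clone())
    return grads

def gradient_confusion(grads):
    # most negative pairwise inner product; larger value => more confusion
    worst = 0.0
    for i in range(len(grads)):
        for j in range(i + 1, len(grads)):
            worst = max(worst, float(-(grads[i] @ grads[j])))
    return worst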
Abstract:
Managing perishable inventory, such as blood stock awaiting use by patients in need, has been a topic of research for decades. This has been investigated across several disciplines: medical and social scientists have investigated who donates blood, how frequently, and why; management science researchers have long studied the blood supply chain from a logistical perspective. Yet global demand for blood still far exceeds supply, and unmet need is greatest in low- and middle-income countries. Both academics and policy experts suggest that large-scale coordination is necessary to alleviate demand for donor blood. Using the recently-deployed Facebook Blood Donation tool, we conduct the first large-scale algorithmic matching of blood donors with donation opportunities, using matching and machine learning. In both simulations and real experiments, we match potential donors with opportunities, guided by a machine learning model trained on prior observations of donor behavior. While measuring actual donation rates remains a challenge, we measure donor action (i.e., calling a blood bank or making an appointment) as a proxy for actual donation. Simulations suggest that even a simple matching strategy can increase the donor action rate by 10-15%; a pilot experiment with real donors finds a slightly smaller increase of roughly 5%. While overall action rates remain low, even this modest increase among donors in a global network corresponds to hundreds of thousands more potential donors taking action toward donation. Further, observing donor action on a social network can shed light on donor behavior and response to incentives. Our initial findings align with several observations made in the medical and social science literature regarding donor behavior.
Keywords:
Matching Algorithms; Machine Learning; Applications of Economics and Computation
Abstract:
Rideshare platforms, when assigning requests to drivers, tend to maximize profit for the system and/or minimize waiting time for riders. Such platforms can exacerbate biases that drivers may have over certain types of requests. We consider the case of peak hours when the demand for rides is more than the supply of drivers. Drivers are well aware of their advantage during the peak hours and can choose to be selective about which rides to accept. Moreover, if in such a scenario the assignment of requests to drivers (by the platform) is made only to maximize profit and/or minimize wait time for riders, requests of a certain type (e.g., from a non-popular pickup location, or to a non-popular drop-off location) might never be assigned to a driver. Such a system can be highly unfair to riders. However, increasing fairness might come at a cost of the overall profit made by the rideshare platform. To balance these conflicting goals, we present a flexible, non-adaptive algorithm that allows the platform designer to control the profit and fairness of the system via parameters α and β respectively. We model the matching problem as an online bipartite matching where the set of drivers is offline and requests arrive online. Upon the arrival of a request, we use the algorithm to assign it to a driver (the driver might then choose to accept or reject it) or reject the request. We formalize the measures of profit and fairness in our setting and show that by using our algorithm, the competitive ratios for profit and fairness measures would be no worse than α/e and β/e respectively. Extensive experimental results on both real-world and synthetic datasets confirm the validity of our theoretical lower bounds. Additionally, they show that our algorithm, under some choice of (α, β), can beat two natural heuristics, Greedy and Uniform, on both fairness and profit.
Keywords:
Artificial Intelligence; Fairness; Rideshare; Randomized Algorithms
Abstract:
Rideshare platforms such as Uber and Lyft dynamically dispatch drivers to match riders' requests. We model the dispatching process in rideshare as a Markov chain that takes into account the geographic mobility of both drivers and riders over time. Prior work explores dispatch policies in the limit of such Markov chains; we characterize when this limit assumption is valid, under a variety of natural dispatch policies. We give explicit bounds on convergence in general, and exact (including constants) convergence rates for special cases. Then, on simulated and real transit data, we show that our bounds characterize convergence rates -- even when the necessary theoretical assumptions are relaxed. Additionally, these policies compare well against a standard reinforcement learning algorithm which optimizes for profit without any convergence properties.
Keywords:
Machine Learning; Randomized Algorithms; Rideshare; Markov Chains
@InProceedings{CDSSWXWINE19,
author="Curry, Michael
and Dickerson, John P.
and Sankararaman, Karthik Abinav
and Srinivasan, Aravind
and Wan, Yuhao
and Xu, Pan",
title="Mix and Match: Markov Chains and Mixing Times for Matching in Rideshare",
booktitle="Web and Internet Economics",
year="2019",
pages="129--141",
isbn="978-3-030-35389-6"
}
Abstract:
We consider Bandits with Knapsacks (henceforth, BwK), a general model for multi-armed bandits under supply/budget constraints. In particular, a bandit algorithm needs to solve a well-known knapsack problem: find an optimal packing of items into a limited-size knapsack. The BwK problem is a common generalization of numerous motivating examples, which range from dynamic pricing to repeated auctions to dynamic ad allocation to network routing and scheduling. While the prior work on BwK focused on the stochastic version, we pioneer the other extreme in which the outcomes can be chosen adversarially. This is a considerably harder problem, compared to both the stochastic version and the "classic" adversarial bandits, in that regret minimization is no longer feasible. Instead, the objective is to minimize the competitive ratio: the ratio of the benchmark reward to the algorithm's reward. We design an algorithm with competitive ratio O(log T) relative to the best fixed distribution over actions, where T is the time horizon; we also prove a matching lower bound. The key conceptual contribution is a new perspective on the stochastic version of the problem. We suggest a new algorithm for the stochastic version, which builds on the framework of regret minimization in repeated games and admits a substantially simpler analysis compared to prior work. We then analyze this algorithm for the adversarial version, and use it as a subroutine to solve the latter.
Keywords:
Online Learning; Machine Learning; Randomized Algorithms; Theory of Computation
@inproceedings{ISSS19,
title={Adversarial Bandits with Knapsacks},
author={Immorlica, Nicole and Sankararaman, Karthik Abinav and Schapire, Robert and Slivkins, Aleksandrs},
booktitle={IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS)},
year={2019},
organization={IEEE}
}
Abstract:
We consider the numerical stability of the parameter recovery problem in the Linear Structural Equation Model (LSEM) of causal inference. A long line of work starting from Wright (1920) has focused on understanding which sub-classes of LSEM allow for efficient parameter recovery. Despite decades of study, this question is not yet fully resolved. The goal of this paper is complementary to this line of work; we want to understand the stability of the recovery problem in the cases when efficient recovery is possible. Numerical stability of Pearl's notion of causality was first studied in Schulman and Srivastava (2016) using the concept of condition number, where they provide ill-conditioned examples. In this work, we provide a condition number analysis for the LSEM. First we prove that under a sufficient condition, for a certain sub-class of LSEMs that are bow-free (Brito and Pearl (2002)), the parameter recovery is stable. We further prove that randomly chosen input parameters for this family satisfy the condition with a substantial probability. Hence for this family, on a large subset of parameter space, recovery is numerically stable. Next we construct an example of an LSEM on four vertices with unbounded condition number. We then corroborate our theoretical findings via simulations as well as real-world experiments for a sociology application. Finally, we provide a general heuristic for estimating the condition number of any LSEM instance.
Keywords:
Causal Inference; Machine Learning; Robust Algorithms
@inproceedings{SLG19,
author = {Sankararaman, Karthik Abinav and Louis, Anand and Goyal, Navin},
title = {Stability Of Linear Structural Equation Models Of Causal Inference},
booktitle = {Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence},
series = {UAI '19},
year = {2019},
}
Abstract:
Matching markets with historical data are abundant in many applications, e.g., matching candidates to jobs in hiring, workers to tasks in crowdsourcing markets, and jobs to servers in cloud services. In all these applications, a match consumes one or more shared and limited resources and the goal is to best utilize these to maximize a global objective. Additionally, one often has historical data and hence some statistics (usually first-order moments) of the arriving agents (e.g., candidates, workers, and jobs) can be learnt. To model these scenarios, we propose a unifying framework, called Multi-Budgeted Online Assignment with Known Adversarial Distributions. In this model, we have a set of offline servers with different deadlines and a set of online job types. At each time, a job of type j arrives. Assigning this job to a server i yields a profit w_{i,j} while consuming a_e ∈ [0,1]^K quantities of distinct resources. The goal is to design an (online) assignment policy that maximizes the total expected profit without violating the (hard) budget constraint. We propose and theoretically analyze two linear programming (LP) based algorithms which are almost optimal among all LP-based approaches. We also propose several heuristics adapted from our algorithms and compare them to other LP-agnostic algorithms using both synthetic as well as real-time cloud scheduling and public safety datasets. Experimental results show that our proposed algorithms are effective and significantly out-perform the baselines. Moreover, we empirically show the trade-off between fairness and efficiency of our algorithms, which do well even on fairness metrics without explicitly optimizing for them.
Keywords:
Randomized Algorithms; Online Algorithms; Artificial Intelligence; Applications
@inproceedings{DSSSWX19,
author = {Dickerson, John P. and Sankararaman, Karthik Abinav and Sarpatwar, Kanthi Kiran and Srinivasan, Aravind and Wu, Kun-Lung and Xu, Pan},
title = {Online Resource Allocation with Matching Constraints},
booktitle = {Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems},
series = {AAMAS '19},
year = {2019},
isbn = {978-1-4503-6309-9},
pages = {1681--1689},
numpages = {9}
}
Abstract:
Online bipartite matching and allocation models are widely used to analyze and design markets such as Internet advertising, online labor, and crowdsourcing. Traditionally, vertices on one side of the market are fixed and known a priori, while vertices on the other side arrive online and are matched by a central agent to the offline side. The issue of possible conflicts among offline agents emerges in various real scenarios when we need to match each online agent with a set of offline agents. For example, in event-based social networks (e.g., Meetup), offline events conflict for some users since they will be unable to attend mutually-distant events at proximate times; in advertising markets, two competing firms may prefer not to be shown to one user simultaneously; and in online recommendation systems (e.g., Amazon Books), books of the same type conflict with each other in some sense due to the diversity requirement for each online buyer. The conflict nature inherent among certain offline agents raises significant challenges in both modeling and online algorithm design. In this paper, we propose a unifying model, generalizing the conflict models proposed in (She et al., TKDE 2016) and (Chen et al., TKDE 2016). Our model can capture not only a broad class of conflict constraints on the offline side (which is even allowed to be sensitive to each online agent), but also allows a general arrival pattern for the online side (which is allowed to change over the online phase). We propose an efficient linear programming (LP) based online algorithm and prove theoretically that it has nearly-optimal online performance. Additionally, we propose two LP-based heuristics and test them against two natural baselines on both real and synthetic datasets. Our LP-based heuristics experimentally dominate the baseline algorithms, aligning with our theoretical predictions and supporting our unified approach.
Keywords:
Randomized Algorithms; Online Algorithms; Artificial Intelligence; Applications
@inproceedings{XSCDSSTT19,
author = {Xu, Pan and Shi, Yexuan and Cheng, Hao and Dickerson, John P. and Sankararaman, Karthik Abinav and Srinivasan, Aravind
and Tong, Yongxin and Tsepenekas, Leonidas},
title = {A Unified Approach to Online Matching and Conflict-Aware Constraints},
booktitle = {Proceedings of the 33rd AAAI Conference on Artificial Intelligence},
series = {AAAI '19},
year = {2019}
}
Abstract:
In bipartite matching problems, vertices on one side of a bipartite graph are paired with those on the other. In its online variant, one side of the graph is available offline, while vertices on the other arrive online and are irrevocably and immediately matched (or ignored) by an algorithm. Examples of such problems include matching workers to firms, advertisers to keywords, organs to patients, and so on. Much of the literature focuses on maximizing the total relevance, modeled via the total weight of the matching. However, in many real-world problems, it is also important to consider contributions of diversity: hiring a diverse pool of candidates, displaying a relevant but diverse set of ads, and so on. In this paper, we propose the Online Submodular Bipartite Matching (OSBM) problem, where the goal is to maximize a submodular function f over the set of matched edges. This objective is general enough to capture the notion of both diversity (e.g., a weighted coverage function) and relevance (e.g., the traditional linear function) as well as many other natural objective functions occurring in practice (e.g., limited total budget in advertising settings). We propose novel algorithms that have provable guarantees and are essentially optimal when restricted to various special cases. We also run experiments on real-world and synthetic datasets to validate our algorithms.
Keywords:
Randomized Algorithms; Online Algorithms; Artificial Intelligence; Applications
@inproceedings{DSSX19,
title={Balancing Relevance and Diversity in Online Bipartite Matching via Submodularity},
author={Dickerson, John P and Sankararaman, Karthik A and Srinivasan, Aravind and Xu, Pan},
booktitle = {Proceedings of the 33rd AAAI Conference on Artificial Intelligence},
series = {AAAI '19},
year = {2019}
}
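As a concrete baseline for the OSBM setting just described, here is a natural marginal-gain greedy: match each arriving vertex to the free offline neighbor with the largest marginal gain of f. This is an illustration only, not necessarily the paper's algorithm, whose guarantees are more refined.

def greedy_osbm(offline, arrivals, neighbors, f):
    """offline: iterable of offline vertices; arrivals: online vertices in
    arrival order; neighbors(v): offline neighbors of v; f: maps a set of
    (u, v) edges to a value and is assumed submodular."""
    matched = set()      # edges chosen so far
    free = set(offline)  # still-unmatched offline vertices
    for v in arrivals:
        candidates = [u for u in neighbors(v) if u in free]
        if not candidates:
            continue  # ignore v; decisions are irrevocable
        best = max(candidates, key=lambda u: f(matched | {(u, v)}) - f(matched))
        if f(matched | {(best, v)}) - f(matched) > 0:
            matched.add((best, v))
            free.remove(best)
    return matched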
Abstract:
Efficient allocation of tasks to workers is a central problem in crowdsourcing. In a typical setting, different workers arrive at different times. Similarly, different tasks arrive at different times. The goal is to assign the available workers to tasks, such that we are able to complete as many tasks as possible. Usually, the tasks are heterogeneous and have different rewards. Hence, we would like to maximize the total reward (as opposed to just the number of completed tasks). Often, the arrival times of worker "types" and task "types" are not erratic and can be predicted from historical data. In this paper, we model the problem to account for this history and propose a new mathematical model, called Online Matching with Two-Sided Arrival (OM-TSA). We have a known bipartite graph G = (U, V, E) where U is the set of worker types and V is the set of task types. An edge e = (u, v) exists iff worker u is compatible with task v. The setting proceeds in rounds for T time-steps. At each time-step, a worker u from U and a task v from V are sampled independently from two respective (known) distributions. Further, we assume the sampling processes are independent and identical across the T rounds. We assume that (1) when a worker u arrives, they stay in the system until matched, and (2) when a task v arrives, it needs to be assigned to an available worker u immediately (obtaining a reward w(u, v)); if not, we do not get any reward for this task (and the action is irrevocable). Our goal is to maximize the expected reward. Our model is a significant generalization of the classical online matching model, where one of the sets U and V, say U, is assumed to be available offline. We relax this assumption to capture the transient nature of worker arrivals. For the general version of OM-TSA, we present an optimal non-adaptive algorithm which achieves an online competitive ratio of 0.295. For the special case of OM-TSA where the reward is a function of just the worker type, we present an improved algorithm (which is adaptive) that achieves a competitive ratio of at least 0.345. On the hardness side, along with showing that the ratio obtained by our non-adaptive algorithm is the best possible among all non-adaptive algorithms, we further show that no (adaptive) algorithm can achieve a ratio better than 0.581 (unconditionally), even for the special case of OM-TSA with homogeneous tasks (i.e., all rewards are the same). At the heart of our analysis lies a new technical tool (which is a refined notion of the birth-death process) called the two-stage birth-death process, which may be of independent interest. Finally, we perform numerical experiments on both simulated as well as two real-world datasets obtained from crowdsourcing platforms to complement our theoretical results.
Keywords:
Randomized Algorithms; Online Algorithms; Artificial Intelligence; Applications
@inproceedings{DSSX18b,
author = {Dickerson, John P. and Sankararaman, Karthik Abinav and Srinivasan, Aravind and Xu, Pan},
title = {Assigning Tasks to Workers based on Historical Data: Online Matching with Two-sided Arrivals},
booktitle = {Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems},
series = {AAMAS '18},
year = {2018},
pages={318--326},
}
Abstract:
This paper unifies two lines of work on multi-armed bandits, Bandits with Knapsacks (BwK) and semi-bandits. The former concerns scenarios with limited "resources" consumed by the algorithm, e.g., limited inventory in a dynamic pricing problem. The latter has a huge number of actions, but there is combinatorial structure and additional feedback which makes the problem tractable. Both lines of work have received considerable recent attention, and are supported by numerous application examples. We define a common generalization, and design a general algorithm for this model. Our regret rates are comparable with those for BwK and semi-bandits in general, and essentially optimal for important special cases.
Keywords:
Randomized Algorithms; Online Learning; Theory of Computation; Machine Learning
@inproceedings{SS18,
title={Combinatorial semi-bandits with knapsacks},
author={Sankararaman, Karthik Abinav and Slivkins, Aleksandrs},
booktitle={International Conference on Artificial Intelligence and Statistics},
series = {AIStats '18},
pages={1760--1770},
year={2018}
}
Abstract:
Bipartite matching markets pair agents on one side of a market with agents, items, or contracts on the opposing side. Prior work addresses online bipartite matching markets, where agents arrive over time and are dynamically matched to a known set of disposable resources. In this paper, we propose a new model, Online Matching with (offline) Reusable Resources under Known Adversarial Distributions (OM-RR-KAD), in which resources on the offline side are reusable instead of disposable; that is, once matched, resources become available again at some point in the future. We show that our model is tractable by presenting an LP-based adaptive algorithm that achieves an online competitive ratio of 1/2 − ε for any given ε > 0. We also show that no non-adaptive algorithm can achieve a ratio of 1/2 + o(1) based on the same benchmark LP. Through a data-driven analysis on a massive openly-available dataset, we show our model is robust enough to capture the application of taxi dispatching services. We also present heuristics that perform well in practice.
Keywords:
Randomized Algorithms; Online Algorithms; Artificial Intelligence; Applications
@inproceedings{DSSX18a,
title={Allocation problems in ride-sharing platforms: Online matching with offline reusable resources},
author={Dickerson, John P and Sankararaman, Karthik A and Srinivasan, Aravind and Xu, Pan},
booktitle={Proceedings of the 32nd AAAI Conference on Artificial Intelligence},
year={2018}
}
Abstract:
Column-sparse packing problems arise in several contexts in both deterministic and stochastic discrete optimization. We present two unifying ideas, (non-uniform) attenuation and multiple-chance algorithms, to obtain improved approximation algorithms for some well-known families of such problems. As three main examples, we attain the integrality gap, up to lower-order terms, for known LP relaxations for k-column sparse packing integer programs (Bansal et al., Theory of Computing, 2012) and stochastic k-set packing (Bansal et al., Algorithmica, 2012), and go "half the remaining distance" to optimal for a major integrality-gap conjecture of Furedi, Kahn and Seymour on hypergraph matching (Combinatorica, 1993).
Keywords:
Randomized Algorithms; Theory of Computation
@inproceedings{BSSX18,
title={Algorithms to approximate column-sparse packing problems},
author={Brubach, Brian and Sankararaman, Karthik A and Srinivasan, Aravind and Xu, Pan},
booktitle={Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms},
pages={311--330},
year={2018},
organization={Society for Industrial and Applied Mathematics}
}
Abstract:
The Online Stochastic Matching with Timeouts problem introduced by Bansal, Gupta, Li, Mestre, Nagarajan, and Rudra (Algorithmica, 2012) models matching markets (e.g., eBay, Amazon). Buyers arrive from an independent and identically distributed (i.i.d.) known distribution on buyer profiles and can be shown a list of items one at a time. Each buyer has some probability of purchasing each item and a limit (timeout) on the number of items they can be shown. Bansal et al. (Algorithmica, 2012) gave a 0.12-competitive algorithm which was improved by Adamczyk, Grandoni, and Mukherjee (ESA, 2015) to 0.24. We present an online attenuation framework that uses an algorithm for offline stochastic matching as a black box. Our main contributions are as follows. On the upper bound side, we show that this framework, combined with a black box adapted from Bansal et al. (Algorithmica, 2012), yields an online algorithm which nearly doubles the ratio to 0.46. On the lower bound side, we show that no algorithm can achieve a ratio better than 0.632 using the common LP for this problem. We then introduce a natural generalization, Online Stochastic Matching with Two-sided Timeouts, in which both online and offline vertices have timeouts. Our framework provides the first algorithm for this problem, achieving a ratio of 0.31. We accomplish this by proposing a new black box algorithm for offline stochastic matching on star graphs, which may be of independent interest. This new black box improves the approximation ratio for the offline stochastic matching problem on star graphs from 0.5 by Adamczyk et al. (ESA 2015) to 0.56.
Keywords:
Randomized Algorithms; Online Algorithms; Theory of Computation
@inproceedings{BSSX17,
title={Attenuate locally, win globally: An attenuation-based framework for online stochastic matching with timeouts},
author={Brubach, Brian and Sankararaman, Karthik Abinav and Srinivasan, Aravind and Xu, Pan},
booktitle={Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems},
pages={1223--1231},
year={2017},
organization={International Foundation for Autonomous Agents and Multiagent Systems}
}
Abstract:
We develop algorithms with improved competitive ratios for some basic variants of the known i.i.d. arrival model with integral arrival rates, including (a) the case of general weighted edges, where we improve the best-known ratio of 0.667 due to Haeupler, Mirrokni and Zadimoghaddam to 0.705, and (b) the vertex-weighted case, where we improve the 0.7250 ratio of Jaillet and Lu to 0.7299. We also consider two extensions: one is known i.i.d. with non-integral arrival rates and stochastic rewards; the other is known i.i.d. b-matching with non-integral arrival rates and stochastic rewards. We present a simple non-adaptive algorithm which works well simultaneously on the two extensions.
Keywords:
Randomized Algorithms; Online Algorithms; Theory of Computation
@inproceedings{BSSX16,
title={New algorithms, better bounds, and a novel model for online stochastic matching},
author={Brubach, Brian and Sankararaman, Karthik Abinav and Srinivasan, Aravind and Xu, Pan},
booktitle={24th Annual European Symposium on Algorithms (ESA 2016)},
year={2016},
organization={Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik}
}
Abstract:
With the proliferation of mobile devices, Location-Based Services (LBS) that provide networked services based on users' locations have become increasingly popular. Such services, providing personalized and timely information, have raised privacy concerns such as unwanted revelation of users' current locations to potential stalkers. In this work, we show that this problem can be formally (for the first time) addressed using the notion of opacity in discrete event systems. We use non-deterministic finite-state automata to capture the mobility patterns of users and label the transitions by the location information in the queries. Using opacity verification techniques, we show that the technique of sending cloaking queries to the server can still reveal the exact location of the user. To enforce location privacy, we apply the opacity enforcement technique of event insertion. Specifically, we synthesize suitable insertion functions that insert fake queries into the cloaking query sequences. The generated fake queries are always consistent with the mobility model of the user and provably ensure privacy of the user's current location. Finally, to minimize the overhead from fake queries, we design an optimal insertion function that introduces the minimum average number of fake queries.
Keywords:
Discrete Event Systems; Formal Verification; Applications
@article{WSL14,
title={Ensuring privacy in location-based services: An approach based on opacity enforcement},
author={Wu, Yi-Chin and Sankararaman, Karthik Abinav and Lafortune, St{\'e}phane},
journal={IFAC Proceedings Volumes},
volume={47},
number={2},
pages={33--38},
year={2014},
publisher={Elsevier}
}
Abstract:
We consider the Stochastic Matching problem, which is motivated by applications in kidney exchange and online dating. In this problem, we are given an undirected graph. Each edge is assigned a known, independent probability of existence and a positive weight (or profit). We must probe an edge to discover whether or not it exists. Each node is assigned a positive integer called a timeout (or a patience). On this random graph we are executing a process, which probes the edges one-by-one and gradually constructs a matching. The process is constrained in two ways. First, if a probed edge exists, it must be added irrevocably to the matching (the query-commit model). Second, the timeout of a node v upper-bounds the number of edges incident to v that can be probed. The goal is to maximize the expected weight of the constructed matching. For this problem, Bansal et al. provided a 0.33-approximation algorithm for bipartite graphs and a 0.25-approximation for general graphs. We improve the approximation factors to 0.39 and 0.269, respectively. The main technical ingredient in our result is a novel way of probing edges according to a not-uniformly-random permutation. Patching this method with an algorithm that works best for large-probability edges (plus additional ideas) leads to our improved approximation factors.
Keywords:
Theory of Computation; Economics and Computation; Matching Markets
Abstract:
Transformers have become ubiquitous due to their dominant performance in various NLP and image processing tasks. However, we lack an understanding of how to generate mathematically grounded uncertainty estimates for transformer architectures. Models equipped with such uncertainty estimates can typically improve predictive performance, make networks robust, avoid over-fitting, and serve as acquisition functions in active learning. In this paper, we introduce BayesFormer, a Transformer model with dropouts designed by Bayesian theory. We propose a new theoretical framework to extend approximate variational inference-based dropout to Transformer-based architectures. Through extensive experiments, we validate the proposed architecture in four paradigms and show improvements across the board: language modeling and classification, long-sequence understanding, machine translation, and acquisition functions for active learning.
Keywords:
Machine Learning;
Large Language Models;
Sequential Decision Making
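For context, a minimal illustration of the generic Monte-Carlo-dropout recipe that variational-inference-based dropout enables: keep dropout active at inference, sample several stochastic forward passes, and use the spread as an uncertainty estimate. This is the standard recipe (Gal & Ghahramani style), not BayesFormer itself; the paper's contribution is deriving where such dropout belongs in the Transformer.

import torch

def mc_dropout_predict(model, x, n_samples=32):
    model.train()  # keep dropout layers active at inference time
    with torch.no_grad():
        preds = torch.stack([model(x).softmax(-1) for _ in range(n_samples)])
    mean = preds.mean(0)  # averaged predictive distribution
    # predictive entropy as a simple uncertainty / acquisition score
    entropy = -(mean * mean.clamp_min(1e-12).log()).sum(-1)
    return mean, entropy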
Abstract:
Active learning with strong and weak labelers considers a practical setting where we have access to both costly but accurate strong labelers and inaccurate but cheap predictions provided by weak labelers. We study this problem in the streaming setting, where decisions must be taken online. We design a novel algorithmic template, Weak Labeler Active Cover (WL-AC), that is able to robustly leverage the lower quality weak labelers to reduce the query complexity while retaining the desired level of accuracy. Prior active learning algorithms with access to weak labelers learn a difference classifier which predicts where the weak labels differ from strong labelers; this requires the strong assumption of realizability of the difference classifier (Zhang and Chaudhuri, 2015). WL-AC bypasses this realizability assumption and thus is applicable to many real-world scenarios such as randomly corrupted weak labels and high-dimensional families of difference classifiers (e.g., deep neural nets). Moreover, WL-AC cleverly trades off evaluating the quality with full exploitation of weak labelers, which allows converting any active learning strategy to one that can leverage weak labelers. We provide an instantiation of this template that achieves the optimal query complexity for any given weak labeler, without knowing its accuracy a priori. Empirically, we propose an instantiation of the WL-AC template that can be efficiently implemented for large-scale models (e.g., deep neural nets) and show its effectiveness on the corrupted-MNIST dataset by significantly reducing the number of labels while keeping the same accuracy as in passive learning.
Keywords:
Machine Learning;
Active Learning;
Zero/Few Shot Learning;
Abstract:
Reinforcement learning from human feedback (RLHF) has become the leading approach for fine-tuning large language models (LLMs). However, RLHF has limitations in multi-task learning (MTL) due to challenges of reward hacking and extreme multi-objective optimization (i.e., trade-offs among multiple and/or sometimes conflicting objectives). Applying RLHF for MTL currently requires careful tuning of the weights for reward models and data combinations. This is often done via human intuition and does not generalize. In this work, we introduce a novel post-training paradigm which we call Constrained Generative Policy Optimization (CGPO). The core of CGPO is a Mixture of Judges (MoJ) with cost-efficient constrained policy optimization with stratification, which can identify the perfect blend in RLHF in a principled manner. It shows strong empirical results with theoretical guarantees, does not require extensive hyper-parameter tuning, and is plug-and-play in common post-training pipelines. Together, this can detect and mitigate reward hacking behaviors while reaching a Pareto-optimal point across an extremely large number of objectives. Our empirical evaluations demonstrate that CGPO significantly outperforms standard RLHF algorithms like PPO and DPO across various tasks including general chat, STEM questions, instruction following, and coding. Specifically, CGPO shows improvements of 7.4% in AlpacaEval-2 (general chat), 12.5% in Arena-Hard (STEM & reasoning), and consistent gains in other domains like math and coding. Notably, PPO, while commonly used, is prone to severe reward hacking in popular coding benchmarks, which CGPO successfully addresses. This breakthrough in RLHF not only tackles reward hacking and extreme multi-objective optimization challenges but also advances the state-of-the-art in aligning general-purpose LLMs for diverse applications.
Keywords:
Large Language Models;
RLHF
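Schematically (our notation, not the paper's), the constrained policy optimization at the core of CGPO can be read as

\max_{\pi}\; \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi(\cdot \mid x)}\big[r(x, y)\big] \quad \text{s.t.} \quad \mathbb{E}\big[c_j(x, y)\big] \le \tau_j \;\; \text{for every judge } j,

where each judge j in the mixture flags violations (e.g., reward hacking) through a cost c_j with tolerance \tau_j.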
Abstract:
Recent advancements in generative models, particularly large language models (LLMs) and diffusion models, have been driven by extensive pretraining on large datasets followed by post-training. However, current post-training methods such as reinforcement learning from human feedback (RLHF) and direct alignment from preference methods (DAP) primarily utilize single-sample comparisons. These approaches often fail to capture critical characteristics such as generative diversity and bias, which are more accurately assessed through multiple samples. To address these limitations, we introduce a novel approach that extends post-training to include multi-sample comparisons. To achieve this, we propose Multi-sample Direct Preference Optimization (mDPO) and Multi-sample Identity Preference Optimization (mIPO). These methods improve traditional DAP methods by focusing on group-wise characteristics. Empirically, we demonstrate that multi-sample comparison is more effective in optimizing collective characteristics (e.g., diversity and bias) for generative models than single-sample comparison. Additionally, our findings suggest that multi-sample comparisons provide a more robust optimization framework, particularly for datasets with label noise.
Keywords:
Large Language Models;
RLHF
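One natural way to write a multi-sample DPO-style objective consistent with the description above (our notation and a plausible form; not necessarily the paper's exact loss):

\mathcal{L}_{\mathrm{mDPO}} = -\,\mathbb{E}\left[\log \sigma\!\left(\frac{\beta}{m}\sum_{i=1}^{m}\log\frac{\pi_\theta(y_i^{w}\mid x)}{\pi_{\mathrm{ref}}(y_i^{w}\mid x)} - \frac{\beta}{m}\sum_{i=1}^{m}\log\frac{\pi_\theta(y_i^{l}\mid x)}{\pi_{\mathrm{ref}}(y_i^{l}\mid x)}\right)\right],

where \{y_i^{w}\} and \{y_i^{l}\} are the preferred and dispreferred groups of m samples, so the comparison is between group-wise rather than single-sample statistics.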
Abstract:
Large Language Models (LLMs) have demonstrated impressive capabilities in various tasks, including instruction following, which is crucial for aligning model outputs with user expectations. However, evaluating LLMs' ability to follow instructions remains challenging due to the complexity and subjectivity of human language. Current benchmarks primarily focus on single-turn, monolingual instructions, which do not adequately reflect the complexities of real-world applications that require handling multi-turn and multilingual interactions. To address this gap, we introduce Multi-IF, a new benchmark designed to assess LLMs' proficiency in following multi-turn and multilingual instructions. Multi-IF, which utilizes a hybrid framework combining LLM and human annotators, expands upon IFEval by incorporating multi-turn sequences and translating the English prompts into 7 other languages, resulting in a dataset of 4,501 multilingual conversations, each with three turns. Our evaluation of 14 state-of-the-art LLMs on Multi-IF reveals that it presents a significantly more challenging task than existing benchmarks. All the models tested showed a higher rate of failure in executing instructions correctly with each additional turn. For example, o1-preview drops from 0.877 at the first turn to 0.707 at the third turn in terms of average accuracy over all languages. Moreover, languages with non-Latin scripts (Hindi, Russian, and Chinese) generally exhibit higher error rates, suggesting potential limitations in the models' multilingual capabilities. We release the Multi-IF prompts and the evaluation code base to encourage further research in this critical area.
Keywords:
Large Language Models;
Evals
Abstract:
As large language models (LLMs) are increasingly deployed in diverse user-facing applications, aligning them with real user preferences becomes essential. Existing methods like Reinforcement Learning from Human Feedback (RLHF) rely on expert annotators trained on manually defined guidelines, whose judgments may not reflect the priorities of everyday users. We introduce Reinforcement Learning from User Feedback (RLUF), a framework for aligning LLMs directly to implicit signals from users in production. RLUF addresses key challenges of user feedback: user feedback is often binary (e.g., emoji reactions), sparse, and occasionally adversarial. We train a reward model, P[Love], to predict the likelihood that an LLM response will receive a Love Reaction, a lightweight form of positive user feedback, and integrate P[Love] into a multi-objective policy optimization framework alongside helpfulness and safety objectives. In large-scale experiments, we show that P[Love] is predictive of increased positive feedback and serves as a reliable offline evaluator of future user behavior. Policy optimization using P[Love] significantly raises observed positive-feedback rates, including a 28% increase in Love Reactions during live A/B tests. However, optimizing for positive reactions introduces reward hacking challenges, requiring careful balancing of objectives. By directly leveraging implicit signals from users, RLUF offers a path to aligning LLMs with real-world user preferences at scale.
Keywords:
Large Language Models;
Reinforcement Learning;
Human Feedback Optimization
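A minimal sketch of the multi-objective combination described above. The weights and names are illustrative assumptions, not values from the paper, and the actual framework may combine objectives differently (e.g., via constraints rather than a weighted sum).

def combined_reward(p_love, helpfulness, safety,
                    w_love=0.2, w_help=0.6, w_safe=0.2):
    """Scalarize the objectives for policy optimization. p_love is the
    P[Love] reward-model score; helpfulness and safety come from the
    other reward models. The weights are hypothetical placeholders --
    over-weighting w_love invites the reward hacking noted above."""
    return w_love * p_love + w_help * helpfulness + w_safe * safety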
Best Paper Award nomination at The First Workshop on Efficient Reasoning, NeurIPS 2025.
Abstract: Parallel LLM inference scaling involves sampling a set of responses in parallel. Bridge generates interdependent responses by treating batched hidden states holistically, sharing information across generations with minimal new parameters (2.8%-5.1%), improving mean accuracy of RL with verifiable rewards and consistency of correct responses.
Abstract: Dual-Weighted Reinforcement Learning integrates CoT reasoning with Bradley-Terry preference modeling via dual weights (instance-wise misalignment and group-wise conditional preference). It trains generative preference models to first generate thoughts and then score preferences, improving over GPM baselines and scalar models across preference benchmarks.
Abstract:
One of the goals of Artificial Intelligence (AI) is to enable multiple agents to interact, co-ordinate and compete with each other to realize various goals. Typically, this is achieved via a system which acts as a mediator to control the agents' behavior via incentives. Such systems are ubiquitous and include online systems for shopping (e.g., Amazon), ride-sharing (e.g., Uber, Lyft) and Internet labor markets (e.g., Mechanical Turk). The main algorithmic challenge in such systems is to ensure that they can operate under a variety of informational constraints such as uncertainty in the input, committing to actions based on partial information, or being unaffected by noisy input. The mathematical framework used to study such systems is broadly called sequential decision making problems, where the algorithm does not receive the entire input at once; it obtains parts of the input by interacting (also called "actions") with the environment. In this thesis, we answer the question: under what informational constraints can we design efficient algorithms for sequential decision making problems?
The first part of the thesis deals with the Online Matching problem. Here, the algorithm deals with two prominent constraints: uncertainty in the input and choice of actions being restricted by a combinatorial constraint. We design several new algorithms for many variants of this problem and provide provable guarantees. We also show their efficacy on the ride-share application using a real-world dataset.
In the second part of the thesis, we consider the Multi-armed bandit problem with additional informational constraints. In this setting, the algorithm does not receive the entire input and needs to make decisions based on partial observations. Additionally, the set of possible actions is controlled by global resource constraints that bind across time. We design new algorithms for multiple variants of this problem that are worst-case optimal. We provide a general reduction framework to the classic multi-armed bandits problem without any constraints. We complement some of the results with preliminary numerical experiments.
Keywords:
Online Matching; Multi-armed Bandits; Machine Learning; Randomized Algorithms; Theory of Computing