LLM · Online Continual Learning · Market Design · Optimization

Karthik Abinav Sankararaman

Researcher working on LLMs and large-scale online learning. Currently at Meta Superintelligence Labs.

Meta Superintelligence Labs
PhD & MS, University of Maryland
B.Tech (Hons.), IIT Madras

About

Researcher building personal superintelligence: LLMs that reason, collaborate, and act in human-centered environments. Led Meta AI post-training for the assistant's 2023 debut and for the launches of Llama 2, 3, and 4 across Meta's Family of Apps, improving model quality and scaling Meta AI to over 1 billion monthly active users (MAU).

Key areas of focus

  • Frontier post-training: RL, model behavior alignment, EQ, and adaptive reasoning
  • Agents that plan, use tools, and interact safely with people
  • Data flywheels and online RL for continuously improving models

Earlier work: novel RL algorithms with formal guarantees, shipped across Ads, Recommendations, and Safety & Integrity to power large-scale decision systems at Meta.

Prior to Meta, Karthik completed his PhD and MS at the University of Maryland, where he worked with collaborators on the mathematical foundations of algorithms and machine learning. He also held internships at Adobe, IBM Almaden, IISc, and Microsoft Research NYC. He earned a B.Tech (Hons.) in Computer Science, with a minor in Operations Research, from IIT Madras.

Research Interests

  • Large language models: RL, agents, adaptive reasoning, and human interaction
  • Online learning, bandits and RL
  • Online algorithms and market design
  • Applied ML for advertising, recommender systems, and safety

Spotlight

Selected highlights

Parallel Scaling

Interdependent generations for LLMs

Bridge treats batched hidden states holistically to share signal across parallel generations, boosting accuracy with only a small number of additional parameters.


RLHF

The perfect blend: Mixture of Judges

CGPO (Constrained Generative Policy Optimization) uses a mixture of judges to balance multi-objective RLHF, curbing reward hacking and aligning models across tasks.


Inference Optimization

Adaptive reasoning under budgets

ICML 2025: IBPO learns to allocate reasoning depth according to problem difficulty, improving MATH500 accuracy with 2–4x inference budgets.


Adversarial Bandits

Bandits with Knapsacks

Goes beyond worst-case analysis for adversarial bandits with knapsacks, yielding tighter instance-dependent insights.


Teaching

Online learning mini-series

A video series introducing the foundations of adversarial and stochastic online learning, aimed at upper-level undergraduates.


Post-training

LLM Post-training 101

Practical overview of post-training fundamentals, techniques, and workflows to align and improve LLMs.


Publications


Work spans RLHF, inference-time optimization, parallel scaling, hallucination detection for LLMs, bandits, online matching, and advertising systems. The complete conference, journal, and workshop list lives below.

Latest

  • ACL 2025 – Reference-free hallucination detection with RATE-FT
  • ICML 2025 – Inference Budget-Constrained Policy Optimization (IBPO)
  • Bridge – Generalized Parallel Scaling with Interdependent Generations
  • CGPO – The perfect blend: RLHF with Mixture of Judges

Full list of research and publications



Community

Teaching

Teaching and mentoring across online learning, algorithms, and programming, including instructor and teaching assistant roles at UMD and IITM.


Teaching Experience

I served as an instructor for the following class, co-teaching with Bill Gasarch:
  • Honors class on Discrete Structures (Spring 2019)

I served as a Teaching Assistant for the following classes during my time at UMD:
  • Graduate Algorithms (Spring 2018)
    Instructor: Aravind Srinivasan

  • Design and Analysis of Computer Algorithms (Fall 2016, Spring 2017, Fall 2017, Fall 2018)
    Instructors: Samir Khuller, Jessica Chang, Aravind Srinivasan

  • Discrete Structures (Fall 2014 and Spring 2015)
    Instructors: Clyde Kruskal, Fawzi Emad

  • Introduction to Programming (Fall 2015 and Spring 2016)
    Instructors: Nelson Padua-Perez, Fawzi Emad

As an undergraduate, I served as a Teaching Assistant at IITM for the following class:
  • Paradigms of Programming (IITM)
    Instructor: Narayanaswamy NS

Service

Community Service

Program leadership and reviewing across top AI/ML conferences and journals, plus mentorship for new researchers.

Connect

Contact

Social

My brother works in AI, and my wife works on decarbonization, sustainability, and energy.