Karthik Abinav Sankararaman Homepage

Conference Publications

(C29) Learning Auxiliary Tasks Improves Reference-Free Hallucination Detection in Open-Domain Long-Form Generation
Joint Work with Chengwei Qin, Wenxuan Zhou, Nanshu Wang, Tengyu Xu, Alexander Radovic, Eryk Helenowski, Arya Talebzadeh, Aditya Tayade, Sinong Wang, Shafiq Joty, Han Fang, Hao Ma
The 63rd Annual Meeting of the Association for Computational Linguistics (ACL), 2025 [Main Conference]

Paper

Abstract: Hallucination, the generation of factually incorrect information, remains a significant challenge for large language models (LLMs), especially in open-domain long-form generation. Existing approaches for detecting hallucination in long-form tasks either focus on limited domains or rely heavily on external fact-checking tools, which may not always be available. In this work, we systematically investigate reference-free hallucination detection in open-domain long-form responses. Our findings reveal that internal states (e.g., model's output probability and entropy) alone are insufficient for reliably (i.e., better than random guessing) distinguishing between factual and hallucinated content. To enhance detection, we explore various existing approaches, including prompting-based methods, probing, and fine-tuning, with fine-tuning proving the most effective. To further improve the accuracy, we introduce a new paradigm, named RATE-FT, that augments fine-tuning with an auxiliary task for the model to jointly learn with the main task of hallucination detection. With extensive experiments and analysis using a variety of model families & datasets, we demonstrate the effectiveness and generalizability of our method, e.g., +3% over general fine-tuning methods on LongFact.

Keywords: Large Language Model; Hallucination detection; Factuality

(C28) Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization
Joint Work with Zishun Yu, Tengyu Xu, Di Jin, Yun He, Wenxuan Zhou, Zhouhao Zeng, Eryk Helenowski, Chen Zhu, Sinong Wang, Hao Ma, Han Fang
The 42st International Conference on Machine Learning (ICML 2025)

Paper

Abstract: Solving mathematics problems has been an intriguing capability of large language models, and many efforts have been made to improve reasoning by extending reasoning length, such as through self-correction and extensive long chain-of-thoughts. While promising in problem-solving, advanced long reasoning chain models exhibit an undesired single-modal behavior, where trivial questions require unnecessarily tedious long chains of thought. In this work, we propose a way to allow models to be aware of inference budgets by formulating it as utility maximization with respect to an inference budget constraint, hence naming our algorithm Inference Budget-Constrained Policy Optimization (IBPO). In a nutshell, models fine-tuned through IBPO learn to ``understand'' the difficulty of queries and allocate inference budgets to harder ones. With different inference budgets, our best models are able to have a 4.14\% and 5.74\% absolute improvement (8.08\% and 11.2\% relative improvement) on MATH500 using 2.16x and 4.32x inference budgets respectively, relative to LLaMA3.1 8B Instruct. These improvements are approximately 2x those of self-consistency under the same budgets.

Keywords: Large Language Model; Inference Time Scaling; Reasoning

(C27) Promoting External and Internal Equities Under Ex-Ante/Ex-Post Metrics in Online Resource Allocation
Joint Work with Aravind Srinivasan, Pan Xu
The 41st International Conference on Machine Learning (ICML 2024)
Spotlight (Top 3.5% of ICML 2024 accepted papers)

Paper

Abstract: This paper proposes two different models for equitable resource allocation in online settings. The first one is called external equity promotion, where sequentially arriving agents are heterogeneous in their external attributes, namely how many resources they demand, which is drawn from a probability distribution (accessible to the algorithm). The focus is then to devise an allocation policy such that every requester can get a fair share of resources proportional to their demands, regardless of their arrival time. Motivating examples include allocating food donations to different agencies. The second is called internal equity promotion, where arriving requesters can be treated homogeneously in external attributes (demands) but are heterogeneous in internal traits such as demographics. In particular, each requester can be identified as belonging to one or several groups, and an allocation of resources is regarded as equitable when every group of requesters can receive a fair share of resources proportional to the percentage of that group in the whole population. An inspiring instance is rationing (limited) COVID-19 vaccines during the early stage of the pandemic. For both models above, we consider as the benchmark a clairvoyant optimal solution that has the privilege to access all random demand realizations in advance. We aim to design efficiently-computable policies whose performance is as close as possible to that of a clairvoyant optimal, where the largest possible ratio of the former to the latter over all possible instances is measured as competitive ratio (CR). We consider two equity metrics, namely ex-post and ex-ante, and discuss the challenges under the two metrics in detail. Specifically, we present two linear program (LP)-based policies for external equity promotion under ex-ante with independent demands, each achieving an optimal CR of with respect to the benchmark LP. For internal equity promotion, we present optimal policies under both ex-ante and ex-post metrics.

Keywords: Machine Learning; Online Algorithms; Market Design

(C26) Effective Long-Context Scaling of Foundation Models
Joint Work with Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Barlas Oguz, Madian Khabsa, Han Fang, Yashar Mehdad, Sharan Narang, Kshitiz Malik, Angela Fan, Shruti Bhosale, Sergey Edunov, Mike Lewis, Sinong Wang, Hao Ma
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Paper

Abstract: We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens. Our model series are built through continual pretraining from Llama 2 with longer training sequences and on a dataset where long texts are upsampled. We perform extensive evaluation on language modeling, synthetic context probing tasks, and a wide range of research benchmarks. On research benchmarks, our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2. Notably, with a cost-effective instruction tuning procedure that does not require human-annotated long instruction data, the 70B variant can already surpass gpt-3.5-turbo-16k's overall performance on a suite of long-context tasks. Alongside these results, we provide an in-depth analysis on the individual components of our method. We delve into Llama's position encodings and discuss its limitation in modeling long dependencies. We also examine the impact of various design choices in the pretraining process, including the data mix and the training curriculum of sequence lengths -- our ablation experiments suggest that having abundant long texts in the pretrain dataset is not the key to achieving strong performance, and we empirically verify that long context continual pretraining is more efficient and similarly effective compared to pretraining from scratch with long sequences.

Keywords: Machine Learning; LLM; Generative AI

(C25) Rethinking Incentives in Recommender Systems: Are Monotone Rewards Always Beneficial?
Joint Work with Fan Yao, Chuanhao Li, Yiming Liao, Yan Zhu, Qifan Wang, Hongning Wang, Haifeng Xu
The 37th Conference on Neural Information Processing Systems (NeurIPS), 2023

Paper

Abstract: The past decade has witnessed the flourishing of a new profession as media content creators, who rely on revenue streams from online content recommendation platforms. The reward mechanism employed by these platforms creates a competitive environment among creators which affect their production choices and, consequently, content distribution and system welfare. It is thus crucial to design the platform's reward mechanism in order to steer the creators' competition towards a desirable welfare outcome in the long run. This work makes two major contributions in this regard: first, we uncover a fundamental limit about a class of widely adopted mechanisms, coined Merit-based Monotone Mechanisms, by showing that they inevitably lead to a constant fraction loss of the welfare. To circumvent this limitation, we introduce Backward Rewarding Mechanisms (BRMs) and show that the competition games resulting from BRM possess a potential game structure, which naturally induces the strategic creators' behavior dynamics to optimize any given welfare metric. In addition, the class of BRM can be parameterized so that it allows the platform to directly optimize welfare within the feasible mechanism space even when the welfare metric is not explicitly defined.

Keywords: Machine Learning; Recommender Systems; Creator Incentives;

(C24) Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression
Joint Work with Aleksandrs Slivkins and Dylan Foster
The 36th Annual Conference on Learning Theory (COLT), 2023

Paper

Abstract: We consider a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption. This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption. We present a new algorithm that is simple, computationally efficient, and admits vanishing regret. It is statistically optimal for CBwK when an algorithm must stop once some constraint is violated. Our algorithm builds on LagrangeBwK (Immorlica et al., 2019, 2022), a Lagrangian-based technique for CBwK, and SquareCB (Foster and Rakhlin, 2020), a regression-based technique for contextual bandits. Our analysis leverages the inherent modularity of both techniques.

Keywords: Machine Learning; Contextual Bandit; Mixed Packing/Covering Constraints;

(C23) "Allocation Problem in Remote Teleoperation: Online Matching with Offline Reusable Resources and Delayed Assignments"
Joint Work with Osnat Ackerman Viden, Yohai Trabelsi, Pan Xu, Oleg Maksimov and Sarit Kraus
The 22nd International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2023

Paper

Abstract: Many applications where tasks should be assigned to agents can be modeled as matching in bipartite graphs. In this paper, we consider applications where tasks arrive dynamically and rejection of a task may have significant adverse effects on the requester, therefore performing the task with some delay is preferred over complete rejection. The performance time of a task depends on the task, the agent, and the assignment, and only its distribution is known in advance. The actual time is known only after the task performance whentheagent is available for a new assignment. We consider such applications to be one of two arrival types. With the first type, the arrival distribution is known in advance, while there is no assumption about the arrival times and order with the second type. For the f irst type, we present an LP-based online algorithm with a competitive ratio of 0.5. For the second type, we show no online algorithm with a constant competitive ratio. We run extensive experiments to evaluate our algorithm in a real-world dataset, demonstrating the advantages of the LP approach.

Keywords: Machine Learning; Matching Theory; Autonomous Vehicles

(C22) "Robust Identifiability in Linear Structural Equation Models of Causal Inference"
Joint Work with Anand Louis and Navin Goyal
The 38th Conference on Uncertainty in Artificial Intelligence (UAI), 2022

Paper Poster

Abstract: In this work, we consider the problem of robust parameter estimation from observational data in the context of linear structural equation models (LSEMs). LSEMs are a popular and well-studied class of models for inferring causality in the natural and social sciences. One of the main problems related to LSEMs is to recover the model parameters from the observational data. Under various conditions on LSEMs and the model parameters the prior work provides efficient algorithms to recover the parameters. However, these results are often about generic identifiability. In practice, generic identifiability is not sufficient and we need robust identifiability: small changes in the observational data should not affect the parameters by a huge amount. Robust identifiability has received far less attention and remains poorly understood. Sankararaman et al. (2019) recently provided a set of sufficient conditions on parameters under which robust identifiability is feasible. However, a limitation of their work is that their results only apply to a small sub-class of LSEMs, called “bow-free paths.” In this work, we significantly extend their work along multiple dimensions. First, for a large and well-studied class of LSEMs, namely “bow free” models, we provide a sufficient condition on model parameters under which robust identifiability holds, thereby removing the restriction of paths required by prior work. We then show that this sufficient condition holds with high probability which implies that for a large set of parameters robust identifiability holds and that for such parameters, existing algorithms already achieve robust identifiability. Finally, we validate our results on both simulated and real-world datasets.

Keywords: Machine Learning; Causal Inference; Theory of Computation; Robust Algorithms

(C21) "Bandits with Knapsacks beyond the Worst-Case Analysis"
Joint Work with Alex Slivkins
The 35th Conference on Neural Information Processing (NeurIPS), 2021

Paper Poster

Abstract: ``Bandits with Knapsacks" (BwK) is a general model for multi-armed bandits under supply/budget constraints. While worst-case regret bounds for BwK are well-understood, we present three results that go beyond the worst-case perspective. First, we provide upper and lower bounds which amount to a full characterization for logarithmic, instance-dependent regret rates. Second, we consider ``simple regret" in BwK, which tracks algorithm's performance in a given round, and prove that it is small in all but a few rounds. Third, we provide a general template for extensions from bandits to BwK which takes advantage of some known helpful structure, and apply this template to combinatorial semi-bandits and linear contextual bandits. Our results build on the BwK algorithm from Agrawal & Devanur EC '14, providing new analyses thereof.

Keywords: Machine Learning; Reinforcement Learning; Theory of Computation

(C20) Beyond log^2(T) Regret for Decentralized Bandits in Matching Markets
Joint Work with Abishek Sankararaman and Soumya Basu
The 38th International Conference on Machine Learning (ICML), 2021

Paper

Abstract: We design decentralized algorithms for regret minimization in two sided matching markets with one-sided bandit feedback that significantly improves upon the prior works. First, for general markets, for any epsilon > 0, we design an algorithm that achieves a O(log^{1+epsilon}(T)) regret to the agent-optimal stable matching, with unknown time horizon T, improving upon the O(log^{2}(T)) regret achieved in prior work. Second, we provide the optimal $\Theta(\log(T))$ regret for markets satisfying uniqueness consistency -- markets where leaving participants don't alter the original stable matching. Previously, $\Theta(\log(T))$ regret was achievable in the much restricted serial dictatorship setting, when all arms have the same preference over the agents. We propose a phase based algorithm, where in each phase, besides deleting the globally communicated dominated arms, the agents locally delete arms with which they collide often. This local deletion is pivotal in breaking deadlocks arising from rank heterogeneity of agents across arms. We further demonstrate superiority of our algorithm over existing works through simulations.

Keywords: Machine Learning; Reinforcement Learning; Theory of Computation; Economics and Computation; Matching Markets

(C19) "Multi-armed Bandits with Cost Subsidy"
Joint Work with Deeksha Sinha, Abbas Kazerouni and Vashist Avadhanula
The 24th International Conference on Artificial Intelligence and Statistics (AIStats), 2021

Paper

Abstract: In this paper, we consider a novel variant of the multi-armed bandit (MAB) problem, MAB with cost subsidy, which models many real-life applications where the learning agent has to pay to select an arm and is concerned about optimizing cumulative costs and rewards. We present two applications, intelligent SMS routing problem and ad audience optimization problem faced by a number of businesses (especially online platforms) and show how our problem uniquely captures key features of these applications. We show that naive generalizations of existing MAB algorithms like Upper Confidence Bound and Thompson Sampling do not perform well for this problem. We then establish fundamental lower bound on the performance of any online learning algorithm for this problem, highlighting the hardness of our problem in comparison to the classical MAB problem (where T is the time horizon and K is the number of arms). We also present a simple variant of explore-then-commit and establish near-optimal regret bounds for this algorithm. Lastly, we perform extensive numerical simulations to understand the behavior of a suite of algorithms for various instances and recommend a practical guide to employ different algorithms.

Keywords: Multi-armend bandits; Machine Learning

(C18) "Dominate or Delete: Decentralized Competing Bandits with Uniform Valuation"
Joint Work with Abishek Sankararaman and Soumya Basu
The 24th International Conference on Artificial Intelligence and Statistics (AIStats), 2021

Paper

Abstract: We study regret minimization problems in a two-sided matching market where uniformly valued demand side agents (a.k.a. agents) continuously compete for getting matched with supply side agents (a.k.a. arms) with unknown and heterogeneous valuations. Such markets abstract online matching platforms (for e.g. UpWork, TaskRabbit) and falls within the purview of matching bandit models introduced in Liu et al. [24]. The uniform valuation in the demand side admits a unique stable matching equilibrium in the system. We design the first decentralized algorithm - UCB with Decentralized Dominant-arm Deletion (UCB-D3), for matching bandits under uniform valuation that does not require any knowledge of reward gaps or time horizon, and thus partially resolves an open question in [24]. UCB-D3 works in phases of exponentially increasing length. In each phase, an agent first deletes dominated arms – the arms preferred by agents ranked higher than itself. Deletion follows dynamic explore-exploit using UCB algorithm on the remaining arms. Finally, the preferred arm is broadcast in a decentralized fashion to other agents through pure exploitation. Comparing the obtained reward with respect to the unique stable matching, we show that UCB-D3 achieves O(log(T)) regret in T rounds. We provide a (orderwise) matching regret lower-bound

Keywords: Multi-armend bandits; Machine Learning; Matching Markets; Theory of Computation

(C17) "Stochastic bandits for multi-platform budget optimization in online advertising"
Joint Work with Vashist Avadhanula, Riccardo Colini-Baldeschi, Stefano Leonardi and Okke Schrijvers
The 30th Web Conference 2021 (formerly known as WWW)
Paper

(C16) "Analyzing the effect of neural network architecture on training performance"
Joint Work with Soham De, Zheng Xu, Ronny Huang, Tom Goldstein
The 37th International Conference on Machine Learning (ICML), 2020
NeurIPS Workshop on Integration of Deep Learning Theories, 2018

Paper Poster

Abstract: The goal of this paper is to study why stochastic gradient descent (SGD) is efficient for neural networks, and how neural net design affects SGD. In particular, we investigate how overparameterization -- an increase in the number of parameters beyond the number of training data -- affects the dynamics of SGD. We introduce a simple concept called gradient confusion. When confusion is high, stochastic gradients produced by different data samples may be negatively correlated, slowing down convergence. But when gradient confusion is low, we show that SGD has better convergence properties than predicted by classical theory. Using theoretical and empirical results, we study how overparameterization affects gradient confusion, and thus the convergence of SGD, on linear models and neural networks. We show that increasing the number of parameters of linear models or increasing the width of neural networks leads to lower gradient confusion, and thus faster and easier model training. We further show how overparameterization by increasing the depth of neural networks results in higher gradient confusion, making deeper models harder to train. Finally, we observe empirically that batch normalization and skip connections reduce gradient confusion, which helps reduce the training burden of deep networks.

Keywords: Deep Learning; Machine Learning; Optimization; Theory of Computation

(C15) "Matching Algorithms for Blood Donation"
Joint Work with Duncan McElfresh (primary author), Sergey Pupyrev, Christian Kroer, John Dickerson, Eric Sodomka, Zack Chauvin and Neil Dexter
The 21st ACM Conference on Economics and Computation (EC) 2020

Paper

Abstract: Managing perishable inventory, such as blood stock awaiting use by patients in need, has been a topic of research for decades. This has been investigated across several disciplines: medical and social scientists have investigated \emph{who} donates blood, \emph{how frequently}, and \emph{why}; management science researchers have long studied the blood supply chain from a logistical perspective. Yet global demand for blood still far exceeds supply, and unmet need is greatest in low- and middle-income countries. Both academics and policy experts suggest that large-scale coordination is necessary to alleviate demand for donor blood. We conduct the first large-scale algorithmic matching of blood donors with donation opportunities, using matching and machine learning. Using the recently-deployed Facebook Blood Donation tool, we conduct the first large-scale algorithmic matching of blood donors with donation opportunities. In both simulations and real experiments we match potential donors with opportunities, guided by a machine learning model trained on prior observations of donor behavior. While measuring actual donation rates remains a challenge, we measure \emph{donor action} (i.e., calling a blood bank or making an appointment) as a proxy for actual donation. Simulations suggest that even a simple matching strategy can increase donor action rate by $10$-$15\%$; a pilot experiment with real donors finds a slightly smaller increase of roughly $5\%$. While overall action rates remain low, even this modest increase among donors in a global network corresponds to \emph{hundreds of thousands} of more potential donors taking action toward donation. Further, observing donor action on a social network can shed light onto donor behavior and response to incentives. Our initial findings align with several observations made in the medical and social science literature regarding donor behavior.

Keywords: Matching Algorithms; Machine Learning; Applications of Economics and Computation

(C14) "Balancing the Tradeoff between Profit and Fairness in Rideshare Platforms during High-Demand Hours"
Joint Work with Vedant Nanda (primary author), Pan Xu, John Dickerson, Aravind Srinivasan
The 34th AAAI Conference on Artificial Intelligence (AAAI), 2020
Extended Abstract in The 3rd AAAI/ACM Conference on Artificial Intelligence, Ethics and Society (AIES), 2020 (Oral Presentation)

Paper

Abstract: Rideshare platforms, when assigning requests to drivers, tend to maximize profit for the system and/or minimize waiting time for riders. Such platforms can exacerbate biases that drivers may have over certain types of requests. We consider the case of peak hours when the demand for rides is more than the supply of drivers. Drivers are well aware of their advantage during the peak hours and can choose to be selective about which rides to accept. Moreover, if in such a scenario, the assignment of requests to drivers (by the platform) is made only to maximize profit and/or minimize wait time for riders, requests of a certain type (e.g., from a non-popular pickup location, or to a non-popular drop-off location) might never be assigned to a driver. Such a system can be highly unfair to riders. However, increasing fairness might come at a cost of the overall profit made by the rideshare platform. To balance these conflicting goals, we present a flexible, non-adaptive algorithm that allows the platform designer to control the profit and fairness of the system via parameters \alpha and \beta respectively. We model the matching problem as an online bipartite matching where the set of drivers is offline and requests arrive online. Upon the arrival of a request, we use the algorithm to assign it to a driver (the driver might then choose to accept or reject it) or reject the request. We formalize the measures of profit and fairness in our setting and show that by using our algorithm, the competitive ratios for profit and fairness measures would be no worse than \alpha/e and \beta/e respectively. Extensive experimental results on both real-world and synthetic datasets confirm the validity of our theoretical lower bounds. Additionally, they show that our algorithm, under some choice of (\alpha, \beta), can beat two natural heuristics, Greedy and Uniform, on both fairness and profit.

Keywords: Artificial Intelligence; Fairness; Rideshare; Randomized Algorithms

(C13) "Mix and Match: Markov Chains and Mixing Times in Matching for Rideshare"
Joint Work with Mike Curry, John Dickerson, Aravind Srinivasan, Yuhao Wan, Pan Xu
The 15th Conference on Web and Internet Economics (WINE), 2019

Paper

Abstract: Rideshare platforms such as Uber and Lyft dynamically dispatch drivers to match riders' requests. We model the dispatching process in rideshare as a Markov chain that takes into account the geographic mobility of both drivers and riders over time. Prior work explores dispatch policies in the limit of such Markov chains; we characterize when this limit assumption is valid, under a variety of natural dispatch policies. We give explicit bounds on convergence in general, and exact (including constants) convergence rates for special cases. Then, on simulated and real transit data, we show that our bounds characterize convergence rates---even when the necessary theoretical assumptions are relaxed. Additionally these policies compare well against a standard reinforcement learning algorithm which optimizes for profit without any convergence properties.

Keywords: Machine Learning; Randomized Algorithms; Rideshare; Markov Chains

                 
                 @InProceedings{CDSSWXWINE19,
                    author="Curry, Michael
                    and Dickerson, John P.
                    and Sankararaman, Karthik Abinav
                    and Srinivasan, Aravind
                    and Wan, Yuhao
                    and Xu, Pan",
                    title="Mix and Match: Markov Chains and Mixing Times for Matching in Rideshare",
                    booktitle="Web and Internet Economics",
                    year="2019",
                    pages="129--141",
                    isbn="978-3-030-35389-6"
                  }

(C12) "Adversarial Bandits with Knapsacks"
Joint Work with Nicole Immorlica, Robert Schapire, Aleksandrs Slivkins
The 60th IEEE Symposium on Foundations of Computer Science (FOCS), 2019
INFORMS Workshop on Market Design, 2019

Paper Poster

Abstract: We consider Bandits with Knapsacks (henceforth, BwK), a general model for multi-armed bandits under supply/budget constraints. In particular, a bandit algorithm needs to solve a well-known knapsack problem: find an optimal packing of items into a limited-size knapsack. The BwK problem is a common generalization of numerous motivating examples, which range from dynamic pricing to repeated auctions to dynamic ad allocation to network routing and scheduling. While the prior work on BwK focused on the stochastic version, we pioneer the other extreme in which the outcomes can be chosen adversarially. This is a considerably harder problem, compared to both the stochastic version and the “classic” adversarial bandits, in that regret minimization is no longer feasible. Instead, the objective is to minimize the competitive ratio: the ratio of the benchmark reward to algorithm’s reward. We design an algorithm with competitive ratio O(log T) relative to the best fixed distribution over actions, where T is the time horizon; we also prove a matching lower bound. The key conceptual contribution is a new perspective on the stochastic version of the problem. We suggest a new algorithm for the stochastic version, which builds on the framework of regret minimization in repeated games and admits a substantially simpler analysis compared to prior work. We then analyze this algorithm for the adversarial version, and use it as a subroutine to solve the latter.

Keywords: Online Learning; Machine Learning; Randomized Algorithms; Theory of Computation

                 
                @inproceedings{ISSS19,
                  title={Adversarial Bandits with Knapsacks},
                  author={Immorlica, Nicole and Sankararaman, Karthik Abinav and Schapire, Robert and Slivkins, Aleksandrs},
                  booktitle={IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS)},
                  year={2019},
                  organization={IEEE}
                }

(C11) "Stability of Linear Structural Equation Model of Causal Inference"
Joint Work with Navin Goyal and Anand Louis
The 35th Conference on Uncertainty in Artificial Intelligence (UAI), 2019
NeurIPS Workshop on Causality, 2018

Paper Poster

Abstract: We consider the numerical stability of the parameter recovery problem in Linear Structural Equation Model (LSEM) of causal inference. A long line of work starting from Wright (1920) has focused on understanding which sub-classes of LSEM allow for efficient parameter recovery. Despite decades of study, this question is not yet fully resolved. The goal of this paper is complementary to this line of work; we want to understand the stability of the recovery problem in the cases when efficient recovery is possible. Numerical stability of Pearl’s notion of causality was first studied in Schulman and Srivastava (2016) using the concept of condition number where they provide ill-conditioned examples. In this work, we provide a condition number analysis for the LSEM. First we prove that under a sufficient condition, for a certain sub-class of LSEM that are bow-free (Brito and Pearl (2002)), the parameter recovery is stable. We further prove that randomly chosen input parameters for this family satisfy the condition with a substantial probability. Hence for this family, on a large subset of parameter space, recovery is numerically stable. Next we construct an example of LSEM on four vertices with unbounded condition number. We then corroborate our theoretical findings via simulations as well as real-world experiments for a sociology application. Finally, we provide a general heuristic for estimating the condition number of any LSEM instance.

Keywords: Causal Inference; Machine Learning; Robust Algorithms

                 
                 @inproceedings{SLG19,
                  author = {Sankararaman, Karthik Abinav and Louis, Anand and Goyal, Navin},
                  title = {Stability Of Linear Structural Equation Models Of Causal Inference},
                  booktitle = {Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence},
                  series = {UAI '19},
                  year = {2019},
                  }

(C10) "Online Resource Allocation with Matching Constraints"
Joint Work with John Dickerson, Kanthi Sarpatwar, Aravind Srinivasan, Kun-Lung Wu, Pan Xu
The 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2019

Paper Code+Dataset

Abstract: Matching markets with historical data are abundant in many applications, e.g., matching candidates to jobs in hiring, workers to tasks in crowdsourcing markets, and jobs to servers in cloud services. In all these applications, a match consumes one or more shared and limited resources and the goal is to best utilize these to maximize a global objective. Additionally, one often has historical data and hence some statistics (usually first-order moments) of the arriving agents (e.g., candidates, workers, and jobs) can be learnt. To model these scenarios, we propose a unifying framework, called Multi-Budgeted Online Assignment with Known Adversarial Distributions. In this model, we have a set of offline servers with different deadlines and a set of online job types. At each time, a job of type $j$ arrives. Assigning this job to a server $i$ yields a profit w_{i, j} while consuming a_e \in [0, 1]^K quantities of distinct resources. The goal is to design an (online) assignment policy that maximizes the total expected profit without violating the (hard) budget constraint. We propose and theoretically analyze two linear programming (LP) based algorithms which are almost optimal among all LP-based approaches. We also propose several heuristics adapted from our algorithms and compare them to other LP-agnostic algorithms using both synthetic as well as real-time cloud scheduling and public safety datasets. Experimental results show that our proposed algorithms are effective and significantly out-perform the baselines. Moreover, we show empirically the trade-off between fairness and efficiency of our algorithms which does well even on fairness metrics without explicitly optimizing for it.

Keywords: Randomized Algorithms; Online Algorithms; Artificial Intelligence; Applications

                 
                  @inproceedings{DSSSWX19,
                      author = {Dickerson, John P. and Sankararaman, Karthik Abinav and Sarpatwar, Kanthi Kiran and Srinivasan, Aravind and Wu, Kun-Lung and Xu, Pan},
                      title = {Online Resource Allocation with Matching Constraints},
                      booktitle = {Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems},
                      series = {AAMAS '19},
                      year = {2019},
                      isbn = {978-1-4503-6309-9},
                      pages = {1681--1689},
                      numpages = {9}
                  }

(C9) "A Unified Approach to Online Matching with Conflict-Aware Constraints"
Joint Work with Pan Xu, Yexuan Shi, Hao Cheng, John Dickerson, Aravind Srinivasan, Yongxin Tong, Leonidas Tsepenekas
The 33rd AAAI Conference on Artificial Intelligence (AAAI), 2019

Paper

Abstract: Online bipartite matching and allocation models are widely used to analyze and design markets such as Internet advertising, online labor, and crowdsourcing. Traditionally, vertices on one side of the market are fixed and known a priori, while vertices on the other side arrive online and are matched by a central agent to the offline side. The issue of possible conflicts among offline agents emerges in various real scenarios when we need to match each online agent with a set of offline agents. For example, in event-based social networks (e.g., Meetup), offline events conflict for some users since they will be unable to attend mutually-distant events at proximate times; in advertising markets, two competing firms may prefer not to be shown to one user simultaneously; and in online recommendation systems (e.g., Amazon Books), books of the same type conflict with each other in some sense due to the diversity requirement for each online buyer. The conflict nature inherent among certain offline agents raises significant challenges in both modeling and online algorithm design. In this paper, we propose a unifying model, generalizing the conflict models proposed in (She et al., TKDE 2016) and (Chen et al., TKDE 16). Our model can capture not only a broad class of conflict constraints on the offline side (which is even allowed to be sensitive to each online agent), but also allows a general arrival pattern for the online side (which is allowed to change over the online phase). We propose an efficient linear programming (LP) based online algorithm and prove theoretically that it has nearly-optimal online performance. Additionally, we propose two LP-based heuristics and test them against two natural baselines on both real and synthetic datasets. Our LP-based heuristics experimentally dominate the baseline algorithms, aligning with our theoretical predictions and supporting our unified approach.

Keywords: Randomized Algorithms; Online Algorithms; Artificial Intelligence; Applications

                 
                  @inproceedings{XSCDSSTT19,
                      author = {Xu, Pan and Shi, Yexuan and Cheng, Hao and Dickerson, John P. and Sankararaman, Karthik Abinav and Srinivasan, Aravind
                      and Tong, Yongxin and Tsepenekas, Leonidas},
                      title = {A Unified Approach to Online Matching and Conflict-Aware Constraints},
                      booktitle = {Proceedings of the 33rd AAAI Conference on Artificial Intelligence},
                      series = {AAAI '19},
                      year = {2019}
                  }

(C8) "Balancing Relevance and Diversity in Online Matching via Submodularity"
Joint Work with John Dickerson, Aravind Srinivasan, Pan Xu
The 33rd AAAI Conference on Artificial Intelligence (AAAI), 2019

Paper Code+Dataset

Abstract: In bipartite matching problems, vertices on one side of a bipartite graph are paired with those on the other. In its online variant, one side of the graph is available offline, while vertices on the other arrive online and are irrevocably and immediately matched (or ignored) by an algorithm. Examples of such problems include matching workers to firms, advertisers to keywords, organs to patients, and so on. Much of the literature focuses on maximizing the total relevance modeled via total weight of the matching. However, in many real-world problems, it is also important to consider contributions of diversity: hiring a diverse pool of candidates, displaying a relevant but diverse set of ads, and so on. In this paper, we propose the Online Submodular Bipartite Matching (OSBM) problem, where the goal is to maximize a submodular function f over the set of matched edges. This objective is general enough to capture the notion of both diversity (e.g., a weighted coverage function) and relevance (e.g., the traditional linear function) as well as many other natural objective functions occurring in practice (e.g., limited total budget in advertising settings). We propose novel algorithms that have provable guarantees and are essentially optimal when restricted to various special cases. We also run experiments on real-world and synthetic datasets to validate our algorithms.

Keywords: Randomized Algorithms; Online Algorithms; Artificial Intelligence; Applications

                 
                 @inproceedings{DSSX19,
                    title={Balancing Relevance and Diversity in Online Bipartite Matching via Submodularity},
                    author={Dickerson, John P and Sankararaman, Karthik A and Srinivasan, Aravind and Xu, Pan},
                    booktitle = {Proceedings of the 33rd AAAI Conference on Artificial Intelligence},
                    series = {AAAI '19},
                    year = {2019}
                  }

(C7) "Assigning Tasks to Workers based on Historical Data: Online Matching with Two-sided Arrivals"
Joint Work with John Dickerson, Aravind Srinivasan, Pan Xu
The 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2018

Paper Code+Dataset

Abstract: Efficient allocation of tasks to workers is a central problem in crowdsourcing. In a typical setting, different workers arrive at different times. Similarly, different tasks arrive at different times. The goal is to assign the available workers to tasks, such that we are able to complete as many tasks as possible. Usually, the tasks are heterogeneous and have different rewards. Hence, we would like to maximize the total reward (as opposed to just the number of completed tasks). Often, the arrival times of worker "types" and task "types" are not erratic and can be predicted from historical data. In this paper, we model the problem to account for this history and propose a new mathematical model, called Online Matching with Two-Sided Arrival (OM-TSA). We have a known bipartite graph G = (U,V,E) where U is the set of worker types and V is the set of task types. An edge e = (u, v) exists iff worker u is compatible with task v. The setting proceeds in rounds for T time-steps. At each time-step, a worker u from U and a task v from V are sampled independently from two respective (known) distributions. Further, we assume the sampling processes are independent and identical across the T rounds. We assume that (1) when a worker u arrives, they stay in the sys- tem until matched and (2) when a task v arrives, it needs to be assigned to an available worker u immediately (and obtaining a reward w(u, v)); if not, we do not get any reward for this task (and the action is irrevocable). Our goal is to maximize the expected reward. Our model is a significant generalization of the classical online matching model, where one of the sets U and V , say U, is assumed to be available offline. We relax this assumption to capture the transient nature of worker arrivals. For the general version of OM-TSA, we present an optimal non-adaptive algorithm which achieves an online competitive ratio of 0.295. For the special case of OM-TSA where the reward is a function of just the worker type, we present an improved algorithm (which is adaptive) and achieves a competitive ratio of at least 0.345. On the hardness side, along with showing that the ratio obtained by our non-adaptive algorithm is the best possible among all non-adaptive algorithms, we further show that no (adaptive) algorithm can achieve a ratio better than 0.581 (unconditionally), even for the special case of OM-TSA with homogenous tasks (i.e., all rewards are same). At the heart of our analysis lies a new technical tool (which is a refined notion of the birth-death process) called the two-stage birth-death process, which may be of independent interest. Finally, we perform numerical experiments on both simulated as well as two real-world datasets obtained from crowdsourcing platforms to complement our theoretical results.

Keywords: Randomized Algorithms; Online Algorithms; Artificial Intelligence; Applications

                 
                 @inproceedings{DSSX18b,
                  author = {Dickerson, John P. and Sankararaman, Karthik Abinav and Srinivasan, Aravind and Xu, Pan},
                  title = {Assigning Tasks to Workers based on Historical Data: Online Matching with Two-sided Arrivals},
                  booktitle = {Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems},
                  series = {AAMAS '18},
                  year = {2018},
                  pages={318--326},
                  }

(C6) "Combinatorial Semi-Bandits with Knapsacks"
Joint Work with Aleksandrs Slivkins
The 21st International Conference on Artificial Intelligence and Statistics (AIStats), 2018
(Invited for Oral Presentation)
(Top 5% of submissions)

Paper Simulations Poster

Abstract: This paper unifies two lines of work on multi-armed bandits, Bandits with Knapsacks (BwK) and semi-bandits. The former concerns scenarios with limited "resources" consumed by the algorithm, e.g., limited inventory in a dynamic pricing problem. The latter has a huge number of actions, but there is combinatorial structure and additional feedback which makes the problem tractable. Both lines of work has received considerable recent attention, and are supported by numerous application examples. We define a common generalization, and design a general algorithm for this model. Our regret rates are comparable with those for BwK and semi-bandits in general, and essentially optimal for important special cases.

Keywords: Randomized Algorithms; Online Learning; Theory of Computation; Machine Learning

                 
                  @inproceedings{SS18,
                    title={Combinatorial semi-bandits with knapsacks},
                    author={Sankararaman, Karthik Abinav and Slivkins, Aleksandrs},
                    booktitle={International Conference on Artificial Intelligence and Statistics},
                    series = {AIStats '18},
                    pages={1760--1770},
                    year={2018}
                  }

(C5) "Allocation Problems in Ride-Sharing Platforms: Online Matching with Offline Reusable Resources"
Joint Work with John Dickerson, Aravind Srinivasan, Pan Xu
The 32nd AAAI Conference on Artificial Intelligence (AAAI), 2018
(Oral Presentation)

Paper Code

Abstract: Bipartite matching markets pair agents on one side of a market with agents, items, or contracts on the opposing side. Prior work addresses online bipartite matching markets, where agents arrive over time and are dynamically matched to a known set of disposable resources. In this paper, we propose a new model, Online Matching with (offline) Reusable Resources under Known Adversarial Distributions (OM-RR-KAD), in which resources on the offline side are reusable instead of disposable; that is, once matched, resources become available again at some point in the future. We show that our model is tractable by presenting an LP-based adaptive algorithm that achieves an online competitive ratio of 1/2 − ε for any given ε > 0. We also show that no non-adaptive algorithm can achieve a ratio of 1/2 + o(1) based on the same benchmark LP. Through a data-driven analysis on a massive openly-available dataset, we show our model is robust enough to capture the application of taxi dispatching service. We also present heuristics that perform well in practice.

Keywords: Randomized Algorithms; Online Algorithms; Artificial Intelligence; Applications

                 
                  @inproceedings{DSSX18a,
                      title={Allocation problems in ride-sharing platforms: Online matching with offline reusable resources},
                      author={Dickerson, John P and Sankararaman, Karthik A and Srinivasan, Aravind and Xu, Pan},
                      booktitle={Proceedings of the 32nd AAAI Conference on Artificial Intelligence},
                      year={2018}  
                  }

(C4) "Algorithms to Approximate Column-Sparse Packing Programs"
Joint Work with Brian Brubach, Aravind Srinivasan, Pan Xu
The 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2018

Paper

Abstract: Column-sparse packing problems arise in several contexts in both deterministic and stochastic discrete optimization. We present two unifying ideas, (non-uniform) attenuation and multiple-chance algorithms, to obtain improved approximation algorithms for some well-known families of such problems. As three main examples, we attain the integrality gap, up to lower-order terms, for known LP relaxations for k-column sparse packing integer programs (Bansal et al., Theory of Computing, 2012) and stochastic k-set packing (Bansal et al., Algorithmica, 2012), and go “half the remaining distance” to optimal for a major integrality-gap conjecture of Furedi, Kahn and Seymour on hypergraph matching (Combinatorica, 1993).

Keywords: Randomized Algorithms; Theory of Computation

                 
                 @inproceedings{BSSX18,
                   title={Algorithms to approximate column-sparse packing problems},
                   author={Brubach, Brian and Sankararaman, Karthik A and Srinivasan, Aravind and Xu, Pan},
                   booktitle={Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms},
                   pages={311--330},
                   year={2018},
                   organization={Society for Industrial and Applied Mathematics}
                }

(C3) "Attenuate Locally, Win Globally: Attenuation-based Frameworks for Online Stochastic Matching with Timeouts"
Joint Work with Brian Brubach, Aravind Srinivasan, Pan Xu
The 16th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2017

Paper

Abstract: The Online Stochastic Matching with Timeouts problem introduced by Bansal, Gupta, Li, Mestre, Nagarajan, and Rudra (Algorithmica, 2012) models matching markets (e.g. E-Bay, Amazon). Buyers arrive from an independent and identically distributed (i.i.d.) known distribution on buyer profiles and can be shown a list of items one at a time. Each buyer has some probability of purchasing each item and a limit (timeout) on the number of items they can be shown. Bansal et al. (Algorithmica, 2012) gave a 0.12-competitive algorithm which was improved by Adamczyk, Grandoni, and Mukherjee (ESA, 2015) to 0.24. We present an online attenuation framework that uses an algorithm for offline stochastic matching as a black box. Our main contributions are as follows. On the upper bound side, we show that this framework combined with a black box adapted from Bansal et al. (Algorithmica, 2012) yields an online algorithm which nearly doubles the ratio to 0.46. On the lower bound side, we show that no algorithm can achieve a ratio better than 0.632 using the common LP for this problem. We then introduce a natural generalization: Online Stochastic Matching with Two-sided Timeouts in which both online and offline vertices have timeouts. Our framework provides the first algorithm for this problem achieving a ratio of 0.31. We accomplish this by proposing a new black box algorithm for offline stochastic matching on star graphs, which may be of independent interest. This new black box improves the approximation ratio for the offline stochastic matching problem on star graphs from 0.5 by Adamczyk et al. (ESA 2015) to 0.56.

Keywords: Randomized Algorithms; Online Algorithms; Theory of Computation

                 
                  @inproceedings{BSSX17,
                    title={Attenuate locally, win globally: An attenuation-based framework for online stochastic matching with timeouts},
                    author={Brubach, Brian and Sankararaman, Karthik Abinav and Srinivasan, Aravind and Xu, Pan},
                    booktitle={Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems},
                    pages={1223--1231},
                    year={2017},
                    organization={International Foundation for Autonomous Agents and Multiagent Systems}
                  }

(C2) "New Algorithms, Better Bounds, and a Novel Model for Online Stochastic Matching"
Joint Work with Brian Brubach, Aravind Srinivasan, Pan Xu
The 24th Annual European Symposium on Algorithms (ESA), 2016

Paper

Abstract: We develop algorithms with improved competitive ratios for some basic variants of the known i.i.d. arrival model with integral arrival rates, including (a) the case of general weighted edges, where we improve the best-known ratio of 0.667 due to Haeupler, Mirrokni and Zadimoghaddam to 0.705 and (b) the vertex-weighted case, where we improve the 0.7250 ratio of Jaillet and Lu to 0.7299. We also consider two extensions, one is known I.I.D. with non-integral arrival rate and stochastic rewards. The other is known I.I.D. b-matching with non-integral arrival rate and stochastic rewards. We present a simple non-adaptive algorithm which works well simultaneously on the two extensions.

Keywords: Randomized Algorithms; Online Algorithms; Theory of Computation

                 
                  @inproceedings{BSSX16,
                    title={New algorithms, better bounds, and a novel model for online stochastic matching},
                    author={Brubach, Brian and Sankararaman, Karthik Abinav and Srinivasan, Aravind and Xu, Pan},
                    booktitle={24th Annual European Symposium on Algorithms (ESA 2016)},
                    year={2016},
                    organization={Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik}
                  }

(C1) "Ensuring Privacy in Location Based Services: An Approach Based on Opacity Enforcement"
Joint work with Yi-Chin Wu, St`ephane Lafortune.
The 14th International Workshop of Discrete Event Systems (WODES), 2014

Paper

Abstract: With the proliferation of mobile devices, Location-Based Services (LBS) that provide networked services based on users’ locations have become increasingly popular. Such services, providing personalized and timely information, have raised privacy concerns such as unwanted revelation of users’ current locations to potential stalkers. In this work, we show that this problem can be formally (for first time) addressed using the notion of opacity in discrete event systems. We use non-deterministic finite-state automata to capture the mobility patterns of users and label the transitions by the location information in the queries. Using opacity verification techniques, we show that the technique of sending cloaking queries to the server can still reveal the exact location of the user. To enforce location privacy, we apply the opacity enforcement technique by event insertion. Specifically, we synthesize suitable insertion functions that insert fake queries into the cloaking query sequences. The generated fake queries are always consistent with the mobility model of the user and provably ensure privacy of the user’s current location. Finally, to minimize the overhead from fake queries, we design an optimal insertion function that introduces minimum average number of fake queries.

Keywords: Discrete Event Systems; Formal Verification; Applications

                 
                  @article{WSL14,
                    title={Ensuring privacy in location-based services: An approach based on opacity enforcement},
                    author={Wu, Yi-Chin and Sankararaman, Karthik Abinav and Lafortune, St{\'e}phane},
                    journal={IFAC Proceedings Volumes},
                    volume={47},
                    number={2},
                    pages={33--38},
                    year={2014},
                    publisher={Elsevier}
                  }

Journal Publications

(J1) "Attenuate Locally, Win Globally: Attenuation-based Frameworks for Online Stochastic Matching with Timeouts"
Joint Work with Brian Brubach, Aravind Srinivasan, Pan Xu
Algorithmica, 2019
Paper

(J2) "Algorithms to approximate column-sparse packing problems"
Joint Work with Brian Brubach, Aravind Srinivasan, Pan Xu
ACM Transactions on Algorithms (TALG), 2019
Paper

(J3) "Online Stochastic Matching: New Algorithms and Bounds"
Joint Work with Brian Brubach, Aravind Srinivasan, Pan Xu
Algorithmica, 2020
Paper

(J4) "Allocation Problems in Ride-Sharing Platforms: Online Matching with Offline Reusable Resources"
Joint Work with John Dickerson, Aravind Srinivasan, Pan Xu
Transactions of Economics and Computation (TEAC), 2021
[Top-5 highest downloaded articles for TEAC papers published in 2021 and 2022.]
[Top-20 highest downloaded articles for TEAC among all papers published.]
Paper

(J5) "Online minimum matching with uniform metric and random arrivals"
Joint Work with Sharmila Duppala, Pan Xu
Operations Research Letters, 2022
Paper

(J6) "Adversarial Bandits with Knapsacks (Authoritative version)"
Joint Work with Nicole Immorlica, Robert Schapire, Aleksandrs Slivkins
Journal of the ACM, 2022
Paper

(J7) "Matching Algorithms for Blood Donation"
Joint Work with Duncan McElfresh, Sergey Pupyrev, Christian Kroer, John Dickerson, Eric Sodomka, Zack Chauvin and Neil Dexter
Nature Machine Intelligence, 2023

(J8) "Assigning Tasks to Workers based on Historical Data: Online Matching with Two-sided Arrivals"
Joint Work with John Dickerson, Aravind Srinivasan, Pan Xu, Yifan Xu
Transactions of Economics and Computation (TEAC), 2024

(J9) "On the Equivalence of Graph Convolution and Mixup"
Joint Work with Xiaotian Han, Hanqing Zeng, Yu Chen, Shaoliang Nie, Jingzhou Liu, Kanika Narang, Zahra Shakeri, Song Jiang, Madian Khabsa, Qifan Wang, Xia Hu
Transactions of Machine Learning Research, 2024
Learning on Graphs, 2024 (Oral Presentation)

(J10) "Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression"
Joint Work with Aleksandrs Slivkins, Dylan Foster and Xingyu Zhou
Journal of Machine Learning Research (JMLR), 2024

Manuscripts, Theses and Surveys

(W1) Improved Approximation Algorithms for Stochastic-Matching Problems
Joint Work with Marek Adamczyk, Brian Brubach, Fabrizio Grandoni, Aravind Srinivasan and Pan Xu

Paper

Abstract: We consider the Stochastic Matching problem, which is motivated by applications in kidney exchange and online dating. In this problem, we are given an undirected graph. Each edge is assigned a known, independent probability of existence and a positive weight (or profit). We must probe an edge to discover whether or not it exists. Each node is assigned a positive integer called a timeout (or a patience). On this random graph we are executing a process, which probes the edges one-by-one and gradually constructs a matching. The process is constrained in two ways. First, if a probed edge exists, it must be added irrevocably to the matching (the querycommit model). Second, the timeout of a node v upper-bounds the number of edges incident to v that can be probed. The goal is to maximize the expected weight of the constructed matching. For this problem, Bansal et al. provided a 0.33-approximation algorithm for bipartite graphs and a 0.25-approximation for general graphs. We improve the approximation factors to 0.39 and 0.269, respectively. The main technical ingredient in our result is a novel way of probing edges according to a notuniformly-random permutation. Patching this method with an algorithm that works best for large-probability edges (plus additional ideas) leads to our improved approximation factors.

Keywords: Theory of Computation; Economics and Computation; Matching Markets

(W2) BayesFormer: Transformer with Uncertainty Estimation
Joint Work with Sinong Wang, Han Fang

Paper

Abstract: Transformer has become ubiquitous due to its dominant performance in various NLP and image processing tasks. However, it lacks understanding of how to generate mathematically grounded uncertainty estimates for transformer architectures. Models equipped with such uncertainty estimates can typically improve predictive performance, make networks robust, avoid over-fitting and used as acquisition function in active learning. In this paper, we introduce BayesFormer, a Transformer model with dropouts designed by Bayesian theory. We proposed a new theoretical framework to extend the approximate variational inference-based dropout to Transformer-based architectures. Through extensive experiments, we validate the proposed architecture in four paradigms and show improvements across the board: language modeling and classification, long-sequence understanding, machine translation and acquisition function for active learning.

Keywords: Machine Learning; Large Language Models; Sequential Decision Making

(W3) Improved Adaptive Algorithm for Scalable Active Learning with Weak Labeler
Joint Work with Yifang Chen, Alessandro Lazaric, Matteo Pirotta, Dmytro Karamshuk, Qifan Wang, Karishma Mandyam, Sinong Wang, Han Fang

Paper

Abstract: Active learning with strong and weak labelers considers a practical setting where we have access to both costly but accurate strong labelers and inaccurate but cheap predictions provided by weak labelers. We study this problem in the streaming setting, where decisions must be taken \textit{online}. We design a novel algorithmic template, Weak Labeler Active Cover (WL-AC), that is able to robustly leverage the lower quality weak labelers to reduce the query complexity while retaining the desired level of accuracy. Prior active learning algorithms with access to weak labelers learn a difference classifier which predicts where the weak labels differ from strong labelers; this requires the strong assumption of realizability of the difference classifier (Zhang and Chaudhuri,2015). WL-AC bypasses this \textit{realizability} assumption and thus is applicable to many real-world scenarios such as random corrupted weak labels and high dimensional family of difference classifiers (\textit{e.g.,} deep neural nets). Moreover, WL-AC cleverly trades off evaluating the quality with full exploitation of weak labelers, which allows to convert any active learning strategy to one that can leverage weak labelers. We provide an instantiation of this template that achieves the optimal query complexity for any given weak labeler, without knowing its accuracy a-priori. Empirically, we propose an instantiation of the WL-AC template that can be efficiently implemented for large-scale models (\textit{e.g}., deep neural nets) and show its effectiveness on the corrupted-MNIST dataset by significantly reducing the number of labels while keeping the same accuracy as in passive learning.

Keywords: Machine Learning; Active Learning; Zero/Few Shot Learning;

(W4) The Perfect Blend: Redefining RLHF with Mixture of Judges
Joint Work with Tengyu Xu, Eryk Helenowski, Di Jin, Kaiyan Peng, Eric Han, Shaoliang Nie, Chen Zhu, Hejia Zhang, Wenxuan Zhou, Zhouhao Zeng, Yun He, Karishma Mandyam, Arya Talabzadeh, Madian Khabsa, Gabriel Cohen, Yuandong Tian, Hao Ma, Sinong Wang, Han Fang

Paper

Abstract: Reinforcement learning from human feedback (RLHF) has become the leading approach for fine-tuning large language models (LLM). However, RLHF has limitations in multi-task learning (MTL) due to challenges of reward hacking and extreme multi-objective optimization (i.e., trade-off of multiple and/or sometimes conflicting objectives). Applying RLHF for MTL currently requires careful tuning of the weights for reward model and data combinations. This is often done via human intuition and does not generalize. In this work, we introduce a novel post-training paradigm which we called Constrained Generative Policy Optimization (CGPO). The core of CGPO is Mixture of Judges (MoJ) with cost-efficient constrained policy optimization with stratification, which can identify the perfect blend in RLHF in a principled manner. It shows strong empirical results with theoretical guarantees, does not require extensive hyper-parameter tuning, and is plug-and-play in common post-training pipelines. Together, this can detect and mitigate reward hacking behaviors while reaching a pareto-optimal point across an extremely large number of objectives. Our empirical evaluations demonstrate that CGPO significantly outperforms standard RLHF algorithms like PPO and DPO across various tasks including general chat, STEM questions, instruction following, and coding. Specifically, CGPO shows improvements of 7.4% in AlpacaEval-2 (general chat), 12.5% in Arena-Hard (STEM & reasoning), and consistent gains in other domains like math and coding. Notably, PPO, while commonly used, is prone to severe reward hacking in popular coding benchmarks, which CGPO successfully addresses. This breakthrough in RLHF not only tackles reward hacking and extreme multi-objective optimization challenges but also advances the state-of-the-art in aligning general-purpose LLMs for diverse applications.

Keywords: Large Language Models; RLHF

(W5) Preference Optimization with Multi-Sample Comparisons
Joint Work with Chaoqi Wang, Zhuokai Zhao, Chen Zhu, Michal Valko, Xuefei Cao, Zhaorun Chen, Madian Khabsa, Yuxin Chen, Hao Ma, Sinong Wang

Paper

Abstract: Reinforcement learning from human feedback (RLHF) has become the leading approach for fine-tuning large Recent advancements in generative models, particularly large language models (LLMs) and diffusion models, have been driven by extensive pretraining on large datasets followed by post-training. However, current post-training methods such as reinforcement learning from human feedback (RLHF) and direct alignment from preference methods (DAP) primarily utilize single-sample comparisons. These approaches often fail to capture critical characteristics such as generative diversity and bias, which are more accurately assessed through multiple samples. To address these limitations, we introduce a novel approach that extends post-training to include multi-sample comparisons. To achieve this, we propose Multi-sample Direct Preference Optimization (mDPO) and Multi-sample Identity Preference Optimization (mIPO). These methods improve traditional DAP methods by focusing on group-wise characteristics. Empirically, we demonstrate that multi-sample comparison is more effective in optimizing collective characteristics~(e.g., diversity and bias) for generative models than single-sample comparison. Additionally, our findings suggest that multi-sample comparisons provide a more robust optimization framework, particularly for dataset with label noise.

Keywords: Large Language Models; RLHF

(W6) Multi-IF: Benchmarking LLMs on Multi-Turn and Multilingual Instructions Following
Joint Work with Yun He, Di Jin, Chaoqi Wang, Chloe Bi, Karishma Mandyam, Hejia Zhang, Chen Zhu, Ning Li, Tengyu Xu, Hongjiang Lv, Shruti Bhosale, Chenguang Zhu, Eryk Helenowski, Melanie Kambadur, Aditya Tayade, Hao Ma, Han Fang, Sinong Wang

Paper

Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in various tasks, including instruction following, which is crucial for aligning model outputs with user expectations. However, evaluating LLMs' ability to follow instructions remains challenging due to the complexity and subjectivity of human language. Current benchmarks primarily focus on single-turn, monolingual instructions, which do not adequately reflect the complexities of real-world applications that require handling multi-turn and multilingual interactions. To address this gap, we introduce Multi-IF, a new benchmark designed to assess LLMs' proficiency in following multi-turn and multilingual instructions. Multi-IF, which utilizes a hybrid framework combining LLM and human annotators, expands upon the IFEval by incorporating multi-turn sequences and translating the English prompts into another 7 languages, resulting in a dataset of 4,501 multilingual conversations, where each has three turns. Our evaluation of 14 state-of-the-art LLMs on Multi-IF reveals that it presents a significantly more challenging task than existing benchmarks. All the models tested showed a higher rate of failure in executing instructions correctly with each additional turn. For example, o1-preview drops from 0.877 at the first turn to 0.707 at the third turn in terms of average accuracy over all languages. Moreover, languages with non-Latin scripts (Hindi, Russian, and Chinese) generally exhibit higher error rates, suggesting potential limitations in the models' multilingual capabilities. We release Multi-IF prompts and the evaluation code base to encourage further research in this critical area.

Keywords: Large Language Models; Evals

(W7) Reinforcement Learning from User Feedback
Joint Work with Eric Han, Jun Chen, Xiaoliang Peng, Tengyu Xu, Eryk Helenowski, Kaiyan Peng, Mrinal Kumar, Sinong Wang, Han Fang, Arya Talebzadeh

Paper

Abstract: As large language models (LLMs) are increasingly deployed in diverse user facing applications, aligning them with real user preferences becomes essential. Existing methods like Reinforcement Learning from Human Feedback (RLHF) rely on expert annotators trained on manually defined guidelines, whose judgments may not reflect the priorities of everyday users. We introduce Reinforcement Learning from User Feedback (RLUF), a framework for aligning LLMs directly to implicit signals from users in production. RLUF addresses key challenges of user feedback: user feedback is often binary (e.g., emoji reactions), sparse, and occasionally adversarial. We train a reward model, P[Love], to predict the likelihood that an LLM response will receive a Love Reaction, a lightweight form of positive user feedback, and integrate P[Love] into a multi-objective policy optimization framework alongside helpfulness and safety objectives. In large-scale experiments, we show that P[Love] is predictive of increased positive feedback and serves as a reliable offline evaluator of future user behavior. Policy optimization using P[Love] significantly raises observed positive-feedback rates, including a 28% increase in Love Reactions during live A/B tests. However, optimizing for positive reactions introduces reward hacking challenges, requiring careful balancing of objectives. By directly leveraging implicit signals from users, RLUF offers a path to aligning LLMs with real-world user preferences at scale.

Keywords: Large Language Models; Reinforcement Learning; Human Feedback Optimization

(Thesis 2) "Sequential Decision Making with Limited Resources"
PhD Thesis 2019, University of Maryland, College Park

Thesis

Abstract: One of the goals of Artificial Intelligence (AI) is to enable multiple agents to interact, co-ordinate and compete with each other to realize various goals. Typically, this is achieved via a system which acts as a mediator to control the agents' behavior via incentives. Such systems are ubiquitous and include online systems for shopping (e.g., Amazon), ride-sharing (e.g., Uber, Lyft) and Internet labor markets (e.g., Mechanical Turk). The main algorithmic challenge in such systems is to ensure that they can operate under a variety of informational constraints such as uncertainty in the input, committing to actions based on partial information or being unaffected by noisy input. The mathematical framework used to study such systems are broadly called sequential decision makin} problems where the algorithm does not receive the entire input at once; it obtains parts of the input by interacting (also called "actions") with the environment. In this thesis, we answer the question, under what informational constraints can we design efficient algorithms for sequential decision making problems. The first part of the thesis deals with the Online Matching problem. Here, the algorithm deals with two prominent constraints: uncertainty in the input and choice of actions being restricted by a combinatorial constraint. We design several new algorithms for many variants of this problem and provide provable guarantees. We also show their efficacy on the ride-share application using a real-world dataset. In the second part of the thesis, we consider the Multi-armed bandit problem with additional informational constraints. In this setting, the algorithm does not receive the entire input and needs to make decisions based on partial observations. Additionally, the set of possible actions is controlled by global resource constraints that bind across time. We design new algorithms for multiple variants of this problem that are worst-case optimal. We provide a general reduction framework to the classic multi-armed bandits problem without any constraints. We complement some of the results with preliminary numerical experiments.

Keywords: Online Matching; Multi-armed Bandits; Machine Learning; Randomized Algorithms; Theory of Computing

(Thesis 1) "Maximum-Flow problem in undirected graphs"
Karthik Abinav S
B.Tech Thesis, 2014, Indian Institute Technology Madras
(Thesis available on request)

(Survey 1) "Semi-Bandit Feedback: A survey of results (2015)"
PDF

Language	Karthik Abinav	Sankararaman
Tamil	கார்த்திக் அபினவ்	சங்கரராமன்
Kannada	ಕಾರ್ತಿಕ್ ಅಬಿನವ್	ಸಂಕರರಾಮನ್
Hindi	कार्तिक अबिनव्	सन्कररामन