Workshop on Aligning Robot Representations with Humans

Schedule

Time (NZDT)
08:30 am - 08:45 am		Organizers Introductory Remarks
08:45 am - 09:15 am		Mark Ho Artificial intelligence, natural stupidity, and resource rational cognition Abstract There is a fundamental tension in AI and cognitive science between human intelligence (we want to build systems with human-like intelligence) and human stupidity (we know that humans are cognitively limited and can be irrational). As the psychologist Amos Tversky, whose work on people's cognitive biases won the Nobel Prize in Economics, put it: "My colleagues, they study artificial intelligence; me, I study natural stupidity." How can these two views on human cognition be reconciled and inform the design of AI systems? My talk will discuss recent advances in resource rationality, a general theoretical framework that seeks to explain humans' puzzling combination of intelligence and stupidity as a consequence of our condition as boundedly rational decision makers. I will focus on my own work on resource rational representations, the challenges and promise of this approach, and how this perspective can help guide the development of AI systems that effectively and safely help us overcome our cognitive limitations.
09:15 am - 09:45 am		Jacob Andreas Toward natural language supervision Abstract In the age of deep networks, "learning" almost invariably means "learning from examples". Image classifiers are trained with large datasets of images, machine translation systems with corpora of translated sentences, and robot policies with rollouts or demonstrations. When human learners acquire new concepts and skills, we often do so with richer supervision, especially in the form of language---we learn new concepts from exemplars accompanied by descriptions or definitions, and new skills from demonstrations accompanied by instructions. In natural language processing, recent years have seen a number of successful approaches to learning from task definitions and other forms of auxiliary language-based supervision. But these successes have been largely confined to tasks that also involve language as an input and an output---what will it take to make language-based training useful for the rest of the machine learning ecosystem? In this talk, I'll present two recent applications of natural language supervision to tasks outside the traditional domain of NLP: using language to guide visuomotor policy learning and inductive program synthesis. In these applications, natural language annotations reveal latent compositional structure in the space of programs and plans, helping models discover reusable abstractions for perception and interaction. This kind of compositional structure is present in many tasks beyond policy learning and program synthesis, and I'll conclude with a brief discussion of how these techniques can be applied even more generally.
09:45 am - 10:00 am		Coffee Break
10:00 am - 10:30 am		Lerrel Pinto Teaching Robots to Manipulate in an Hour Abstract I want to teach robots complex and dexterous behaviors in diverse real-world environments. But what is the fastest way to teach robots in the real world? Among the prominent options in our robot learning toolbox, Sim2real requires careful modeling of the world, while real-world self-supervised learning or RL is far too slow. Currently, the only reasonably efficient approach that I know of is imitating humans. But making imitation learning feasible on real robots is not ‘easy’. Current methods often require complicated demonstration collection setups, rely on expert roboticists to train them, and even then need a significant number of demonstrations to learn effectively. In this talk, I will present two ideas that can make robot learning far easier than it currently is. First, to collect demonstrations more easily, we will use vision-based demonstration collection devices. This allows untrained humans to easily collect demonstrations with consumer-grade products. Second, to learn from these visual demonstrations, I will propose a new imitation learning algorithm that puts data efficiency at the forefront. Together, these ideas allow for significantly faster and easier imitation on a variety of real-world manipulation tasks.
10:30 am - 11:00 am		Matthew Gombolay Confronting the Correspondence Problem with Self-supervised and Interactive Machine Learning Abstract New advances in robotics offer a promise of revitalizing final assembly manufacturing, assisting in personalized at-home healthcare, and even scaling the power of earth-bound scientists for robotic space exploration. Yet, manually programming robots for each end user's ad hoc needs is intractable. Interactive Machine Learning techniques seek to enable end users to intuitively program robots such as through skill demonstration, natural language instruction, and feedback. Yet, humans and robots alike struggle in situated learning interactions because of the correspondence problem: humans and robots perceive, think, and physically act differently. In this talk, I will present our latest work in developing interactive machine learning methods that seek to (1) enable users to program robots intuitively, (2) enable robots to characterize misspecified input and feedback from human end-users, and (3) close the loop on situated learning interactions through explainable Artificial Intelligence (XAI) techniques. The outcome of our research is a set of design principles that go towards addressing fundamental issues of the correspondence problem for democratizing robotics.
11:00 am - 12:00 pm		Contributed Talks
12:00 pm - 12:30 pm		Daniel Brown Latent Spaces and Learned Representation for Better Human Preference Learning Abstract In this talk I will discuss some of our recent work that uses latent spaces and representation learning to enable better human-robot interaction. I will discuss the importance of having the “right” latent space to better teach robots to act in ways that are aligned with human preferences, approaches for learning latent space embeddings for efficient Bayesian reward learning and generalizable robot assistance, and the use of task-agnostic similarity queries as a step towards the goal of enabling efficient learning of multiple downstream tasks using a single shared representation.
12:30 pm - 01:30 pm		Lunch Break
01:30 pm - 02:00 pm		Coffee Break
02:00 pm - 02:30 pm		Conference Opening Session
02:30 pm - 03:00 pm		Amy Zhang Attending to What Matters with Representation Learning Abstract In this talk, we focus on three different ways to extract additional signal from various, easily available data sources to improve human-robot alignment. We first present how state abstractions can accelerate reinforcement learning from rich observations, such as images, by disentangling task-relevant from irrelevant details using the reward signal. However, while reward is the canonical way to specify a task in reinforcement learning, it is often difficult to specify a well-shaped reward function in robotics. We then focus on goal-conditioned tasks and ways to extract and generalize functional equivariance. Finally, we explore how human demonstrations can be used to learn a representation that captures dense reward signal for robotics tasks.
03:00 pm - 03:30 pm		Dorsa Sadigh Aligning Humans and Robots: Active Elicitation of Informative and Compatible Queries Abstract Aligning robot objectives with human preferences is a key challenge in robot learning. In this talk, I will start by discussing how active learning of human preferences can effectively query humans with the most informative questions to learn their preference reward functions. I will discuss some of the limitations of prior work, and how approaches such as few-shot learning can be integrated with active preference-based learning to reduce the number of queries to a human expert and truly bring humans into the loop of learning neural reward functions. I will then talk about how we could go beyond active learning from a single human, and tap into large language models (LLMs) as another source of information to capture human preferences that are hard to specify. I will discuss how LLMs can be queried within a reinforcement learning loop and help with reward design. Finally, I will discuss how the robot can also provide useful information to the human and be more transparent about its learning process. We demonstrate how the robot’s transparent behavior can guide the human to provide compatible demonstrations that are more useful and informative for learning.
03:30 pm - 04:00 pm		George Konidaris Reintegrating AI: Skills, Symbols, and the Sensorimotor Dilemma Abstract I will address the question of how a robot should learn an abstract, task-specific representation of an environment, which I will argue is the key capability required to achieve generally-intelligent robots. I will present a constructivist approach, where the computation the representation is required to support - here, planning using a given set of motor skills - is precisely defined, and then its properties are used to build the representation so that it is capable of doing so by construction. The result is a formal link between the skills available to a robot and the symbols it should use to plan with them. I will present an example of a robot autonomously learning a (sound and complete) abstract representation directly from sensorimotor data, and then using it to plan. I will also discuss ongoing work on making the resulting abstractions portable across tasks.
04:00 pm - 05:00 pm		Panel Session
05:00 pm - 05:10 pm		Organizers Concluding Remarks
05:10 pm - 06:00 pm		Poster Session (In person: 4th floor of ENG 405; Virtual: on Gather.Town)

Papers

Congratulations to Abhijat Biswas (Mitigating causal confusion in driving agents via gaze supervision) and Ruohan Zhang (A Dual Representation Framework for Robot Learning with Human Guidance) for each winning a Best Paper Award!

Mitigating causal confusion in driving agents via gaze supervision [link] (spotlight)
Abhijat Biswas; Badal Arun Pardhi; Caleb Chuck; Jarrett Holtz; Scott Niekum; Henny Admoni; Alessandro Allievi
Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased [link]
Chao Yu; Jiaxuan Gao; Weilin Liu; Botian Xu; Hao Tang; Jiaqi Yang; Yu Wang; Yi Wu
Spatial Generalization of Visual Imitation Learning with Position-Invariant Regularization [link]
Zhao-Heng Yin; Yang Gao; Qifeng Chen
Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training [link]
Yecheng Ma; Shagun Sodhani; Dinesh Jayaraman; Osbert Bastani; Vikash Kumar; Amy Zhang
Do you see what I see? Using questions and answers to align representations of robotic actions [link]
Chad DeChant; Iretiayo Akinola; Daniel Bauer
A Sequential Group VAE for Robot Learning of Haptic Representations [link]
Ben Richardson; Katherine J. Kuchenbecker; Georg Martius
A Dual Representation Framework for Robot Learning with Human Guidance [link] (spotlight)
Ruohan Zhang; Dhruva Bansal; Yilun Hao; Ayano Hiranaka; Jialu Gao; Chen Wang; Roberto Martín-Martín; Li Fei-Fei; Jiajun Wu
Learning Abstract Representations of Agent-Environment Interactions [link]
Tanmay Shankar; Jean Oh
Learning Visualization Policies of Augmented Reality for Human-Robot Collaboration [link]
Kishan Chandan; Jack Albertson; Shiqi Zhang
A Graph Neural Network Approach for Choosing Robot Addressees in Group Human-Robot Interactions [link]
Sarah Gillet; Iolanda Leite; Marynel Vázquez
Graph Inverse Reinforcement Learning from Diverse Videos [link]
Sateesh Kumar; Jonathan Zamora; Nicklas A Hansen; Rishabh Jangir; Xiaolong Wang
Watch and Match: Supercharging Imitation with Regularized Optimal Transport [link]
Siddhant Haldar; Vaibhav Mathur; Denis Yarats; Lerrel Pinto

Reviewers

We thank the following people for their assistance in reviewing submitted papers.

Andrea Bajcsy
Arjun Sripathy
Daniel Brown
Eoin Kenny

Erdem Biyik
Felix Wang
Jerry He
Megha Srivastava

Micah Carroll
Minae Kwon
Nick Walker
Rohin Shah

Serena Booth
Xavier Puig
Xuning Yang
Yuchen Cui