Quest | CBMM Seminar Series: Incomplete Objectives and AI Safety: The Theory and Practice of AI Alignment

December 4, 2023 - 4:00 pm to 5:30 pm

Speaker: Dylan Hadfield-Menell (CSAIL)

Abstract: For AI systems to be safe and effective, they need to be aligned with the goals and values of users, designers, and society. In this talk, I will discuss the challenges of AI alignment and go over research directions to develop safe AI systems. I'll begin with theoretical results that motivate the alignment problem broadly. In particular, I will show how optimizing incomplete goal specifications reliably causes systems to select unhelpful or harmful actions. Next, I will discuss mitigation measures that counteract this failure mode. I will focus on approaches for incorporating human feedback into objectives, interpreting and understanding learned policies, and maintaining uncertainty about intended goals.

Details

MIT Building 46

Date: December 4, 2023

Time: 4:00 pm to 5:30 pm

Venue: Singleton Auditorium (46-3002)

The Center for Brains, Minds & Machines
