Alignment Research Team

Our alignment team works to keep AI systems helpful, honest, and aligned with human values as they become more capable. We develop methods that steer AI toward beneficial outcomes while preserving human oversight and control, with a focus on healthcare applications.

Mission

As AI systems grow more autonomous and capable, ensuring their alignment with human values becomes increasingly critical—especially in healthcare. Our mission is to develop practical techniques for value alignment that allow AI systems to remain under human control while maximizing patient benefit and safety.

Research Focus Areas

Preference Learning from Clinical Data

Learning human preferences and values from clinical outcomes, expert decisions, and patient feedback. We develop methods to encode clinician and patient values into AI systems.
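
One common way to learn preferences from expert decisions is to fit a Bradley-Terry model to pairwise comparisons. The sketch below is illustrative only and is not a description of our production methods: the function name, the learning rate, and the toy comparison data are all assumptions made for the example.

```python
import math

def fit_bradley_terry(comparisons, n_items, lr=0.1, epochs=500):
    """Fit one latent score per option from (preferred, rejected) index pairs.

    Hypothetical helper: maximizes the Bradley-Terry log-likelihood
    with plain gradient ascent.
    """
    scores = [0.0] * n_items
    for _ in range(epochs):
        grads = [0.0] * n_items
        for winner, loser in comparisons:
            # Modeled probability that `winner` is preferred over `loser`.
            p = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
            # Gradient of log(p) with respect to each score.
            grads[winner] += 1.0 - p
            grads[loser] -= 1.0 - p
        scores = [s + lr * g / len(comparisons) for s, g in zip(scores, grads)]
    return scores

# Toy data (invented): each pair records which of two care options
# a clinician preferred, as (preferred, rejected).
comparisons = [(0, 1), (0, 1), (1, 0), (0, 2), (1, 2), (1, 2)]
scores = fit_bradley_terry(comparisons, n_items=3)
ranking = sorted(range(3), key=lambda i: scores[i], reverse=True)
print(ranking)
```

The fitted scores only carry meaning relative to one another, and real clinical preference data would need far more care around noise, disagreement between experts, and consent.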

Reward Modeling for Healthcare

Creating reward functions that capture complex clinical objectives—not just accuracy on a single metric, but holistic patient wellbeing, quality of life, and equitable outcomes.
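
As a rough illustration of a reward that looks beyond a single metric, the sketch below combines several normalized objectives into one score. The field names, the weights, and the linear form are assumptions chosen for the example; in practice, weights like these would have to come from clinicians, patients, and ethicists rather than from engineers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Outcome:
    accuracy: float         # predictive accuracy, in [0, 1]
    quality_of_life: float  # normalized patient-reported score, in [0, 1]
    equity_gap: float       # outcome disparity across groups, in [0, 1]; lower is better

# Hypothetical weights, for illustration only.
WEIGHTS = {"accuracy": 0.4, "quality_of_life": 0.4, "equity": 0.2}

def composite_reward(outcome: Outcome, weights=WEIGHTS) -> float:
    """Weighted sum of objectives; the equity term rewards a small gap."""
    return (
        weights["accuracy"] * outcome.accuracy
        + weights["quality_of_life"] * outcome.quality_of_life
        + weights["equity"] * (1.0 - outcome.equity_gap)
    )

# A policy that is slightly less accurate but better on wellbeing and
# equity can score higher than one tuned for accuracy alone.
accuracy_only = Outcome(accuracy=0.95, quality_of_life=0.5, equity_gap=0.6)
balanced = Outcome(accuracy=0.85, quality_of_life=0.8, equity_gap=0.1)
print(composite_reward(balanced) > composite_reward(accuracy_only))
```

A linear weighted sum is the simplest possible choice. It lets a strong result on one objective compensate for a weak result on another, which may not be acceptable for safety-critical objectives.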

Value-Aligned Reinforcement Learning

Applying reinforcement learning so that learned policies stay aligned with human values and remain interpretable to clinicians.

Robustness and Adversarial Evaluation

Testing whether AI systems maintain their intended behavior under adversarial conditions, distribution shifts, and edge cases that might occur in real clinical settings.
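
A very simple form of this kind of testing checks whether a model's output stays the same when its inputs are slightly perturbed. The sketch below assumes a model is any callable that maps a list of numeric features to a decision. The function name, noise level, and toy model are hypothetical, and random noise is a much weaker test than a targeted adversarial attack.

```python
import random

def stability_rate(model, inputs, noise=0.05, trials=20, seed=0):
    """Fraction of inputs whose decision is unchanged under small random noise."""
    rng = random.Random(seed)
    stable = 0
    for x in inputs:
        baseline = model(x)
        # An input counts as stable only if every perturbed copy
        # produces the same decision as the unperturbed input.
        if all(
            model([v + rng.uniform(-noise, noise) for v in x]) == baseline
            for _ in range(trials)
        ):
            stable += 1
    return stable / len(inputs)

# Toy model (invented): flag a case when a single risk score exceeds 0.5.
def toy_model(features):
    return features[0] > 0.5

print(stability_rate(toy_model, [[0.9], [0.1], [0.52]]))
```

Inputs close to a decision boundary are the ones most likely to flip, so a low stability rate indicates where closer clinical review of the model's behavior is needed.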

Current Projects

Patient Preference Learning

Developing methods to incorporate patient preferences and values into AI treatment recommendations, so that each recommendation reflects the individual patient's priorities.

Clinician-in-the-Loop Systems

Designing AI systems that maintain meaningful human oversight, where clinicians can easily correct and guide AI behavior.
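
One way to picture this kind of oversight is a routing rule in which the system never acts on its own: low-confidence recommendations are deferred entirely, and a clinician's correction always takes precedence. The class, function names, and threshold below are assumptions made for illustration and do not describe an existing system.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Recommendation:
    treatment: str
    confidence: float  # model's self-reported confidence, in [0, 1]

def route(rec: Recommendation, threshold: float = 0.9) -> str:
    """Decide how a recommendation is presented; both paths involve a clinician."""
    if rec.confidence >= threshold:
        return "suggest_with_review"
    return "defer_to_clinician"

def final_decision(rec: Recommendation, clinician_override: Optional[str] = None) -> str:
    """The clinician's choice, when given, always replaces the model's."""
    return clinician_override if clinician_override is not None else rec.treatment

rec = Recommendation(treatment="option_a", confidence=0.72)
print(route(rec))
print(final_decision(rec, clinician_override="option_b"))
```

Self-reported confidence is often poorly calibrated, so a threshold like this would need validation against real outcomes before anyone relied on it.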

Fairness and Equity Audits

Evaluating whether AI systems make recommendations fairly across different patient populations and detecting potential biases.
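
A basic audit of this kind compares how often a recommendation is made for each patient group. The sketch below computes per-group rates and the largest gap between them, a check often called demographic parity. The function names, group labels, and records are invented for the example, and a gap on its own does not establish that a system is biased or fair.

```python
from collections import defaultdict

def recommendation_rates(records):
    """Per-group rate of positive recommendations.

    `records` is an iterable of (group, recommended) pairs,
    where `recommended` is a bool.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, recommended in records:
        totals[group] += 1
        positives[group] += int(recommended)
    return {group: positives[group] / totals[group] for group in totals}

def parity_gap(records):
    """Largest difference in recommendation rate between any two groups."""
    rates = recommendation_rates(records)
    return max(rates.values()) - min(rates.values())

# Invented audit data: group label and whether treatment was recommended.
records = [
    ("A", True), ("A", True), ("A", False), ("A", False),
    ("B", True), ("B", False), ("B", False), ("B", False),
]
print(recommendation_rates(records))
print(parity_gap(records))
```

Differences in rates can reflect real differences in clinical need, so a gap found this way is a prompt for investigation with clinicians, not a verdict.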

Alignment Testing Framework

Creating comprehensive evaluation frameworks to test whether healthcare AI systems behave as intended in diverse scenarios.

Principles

Patient autonomy and informed consent must be preserved
Alignment is not one-time—it requires continuous evaluation
Diverse stakeholder perspectives matter: patients, clinicians, administrators
Robust systems must handle adversarial and edge-case scenarios
Transparency about AI limitations is as important as capability

Key Partnerships

We collaborate with clinical partners, ethics experts, and patient advocacy groups to ensure our alignment research reflects real-world healthcare values and needs. Our work is informed by direct engagement with clinicians and patients who depend on these systems.

Join Our Team

We're seeking researchers in AI safety, ethicists, healthcare professionals, and engineers passionate about building trustworthy AI systems. Help us ensure AI systems in healthcare remain aligned with human values.