AI Safety, Ethics & Society course

Chapter 4: Safety Engineering

This chapter introduces the idea that AI safety should be seen as a specialized part of safety engineering, a discipline drawn from fields like aviation and medicine that focuses on designing systems to manage and reduce risk effectively. It also points out that AI brings unique challenges and risks that set it apart from these other areas.

Finally, the chapter emphasizes that even with strong safety engineering practices, managing AI risks is still inherently complex and difficult.

Understand the Risks

We can analyze and quantify a risk by decomposing it into three factors:

  • probability: How likely is it that the event happens?
  • vulnerability: How susceptible is the system to damage if it does?
  • severity: How bad are the consequences if it happens?

For example:

  • How likely is it that a storm will hit my house? (probability)
  • How strong or resilient is the roof of my house against storm damage? (vulnerability)
  • How severe would the consequences be if the roof were torn off by the storm? (severity)

This decomposition helps identify a) which factors contribute most to risk and b) where interventions can be most effective.
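As a rough illustration, here is a minimal Python sketch (my own toy model, not from the course) that treats risk as the product of the three factors and compares interventions on the storm example. All numbers are made up.

```python
def risk(probability: float, vulnerability: float, severity: float) -> float:
    """Expected loss per year under the simple multiplicative model."""
    return probability * vulnerability * severity

baseline = risk(probability=0.10,    # chance a storm hits this year
                vulnerability=0.40,  # chance the roof fails if a storm hits
                severity=20_000)     # repair cost if the roof is torn off

reinforced_roof = risk(0.10, 0.10, 20_000)  # intervention on vulnerability
insurance = risk(0.10, 0.40, 5_000)         # intervention on severity

print(f"baseline expected loss/year: ${baseline:,.0f}")
print(f"with reinforced roof:        ${reinforced_roof:,.0f}")
print(f"with insurance:              ${insurance:,.0f}")
```

Comparing the outputs makes point b) concrete: the cheapest way to cut the risk is whichever intervention shrinks its factor the most.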

Another concept introduced alongside this is Nines of Reliability. The short version: more nines = better, because each additional nine cuts the allowed failure rate or downtime by a factor of ten. (99.999% uptime » 99.9% uptime)
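As a quick sketch (my own, assuming the usual convention that the number of nines is just the negative log of the failure rate), the nines and the implied yearly downtime can be computed like this:

```python
import math

def nines(reliability: float) -> float:
    """Number of nines, e.g. 0.999 -> 3, 0.99999 -> 5."""
    return -math.log10(1.0 - reliability)

def yearly_downtime_hours(reliability: float) -> float:
    """Expected downtime per year at a given uptime fraction."""
    return (1.0 - reliability) * 365 * 24

for r in (0.999, 0.9999, 0.99999):
    print(f"{r:.5%} uptime = {nines(r):.0f} nines, "
          f"~{yearly_downtime_hours(r):.2f} hours of downtime/year")
```

Going from three nines to five nines takes you from roughly 8.8 hours of downtime a year to about 5 minutes.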

Build Safer Systems

Safe Design Principles are basic guidelines for building systems that are naturally safer and easier to control. These include having backups (redundancy), making sure systems default to safe modes if something goes wrong (fail-safe), keeping things simple to avoid mistakes, and making sure humans can clearly monitor and step in when needed. Using these principles helps stop accidents before they happen and reduces their impact if they do. AI systems should be built following these ideas.
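To make two of these principles concrete, here is a toy sketch (my own illustration; `model_a`, `model_b`, and the `defer_to_human` action are hypothetical placeholders) combining redundancy with a fail-safe default:

```python
SAFE_DEFAULT = "defer_to_human"

def redundant_decision(inputs, models, require_agreement=True):
    """Query every model; act only if they all succeed and agree."""
    answers = []
    for model in models:
        try:
            answers.append(model(inputs))
        except Exception:
            return SAFE_DEFAULT  # fail-safe: any failure -> safe mode
    if require_agreement and len(set(answers)) != 1:
        return SAFE_DEFAULT      # fail-safe: disagreement -> safe mode
    return answers[0]

# Usage with two stand-in "models":
def model_a(x):
    return "approve" if x["score"] > 0.9 else "reject"

def model_b(x):
    return "approve" if x["score"] > 0.8 else "reject"

print(redundant_decision({"score": 0.95}, [model_a, model_b]))  # "approve"
print(redundant_decision({"score": 0.85}, [model_a, model_b]))  # "defer_to_human"
```

The point is not the specific code but the structure: the system needs multiple independent components to agree before acting, and anything unexpected routes to the safe, human-monitored path.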

High-Reliability Organizations (HROs) are organizations that operate in high-risk, complex environments yet manage to maintain an exceptionally high level of safety and reliability over long periods. Examples include air traffic control centers, nuclear power plants, aircraft carriers, and emergency medical services. They differ from other organizations in that they:

  • pay close attention even to small problems
  • prioritize responding quickly and effectively
  • trust people with the most knowledge to make decisions
  • create a culture where close calls are carefully examined instead of ignored

AI organizations should strive to be HROs.

Why It’s Still Hard

Normal Accident Theory argues that in complex and tightly coupled systems, accidents are not just possible but inevitable. Complexity means the system has many interacting parts with unpredictable relationships, which can produce unexpected behaviors. Tight coupling means that processes are directly and rapidly linked, leaving little slack to catch errors or intervene once a problem starts. Because of this, small failures can cascade through the system quickly, causing large-scale accidents.
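A toy simulation (my own illustration, not from the theory itself) of the coupling half of this argument: each component fails on its own with a small probability, and a failed component knocks out each neighbour with probability `coupling`. The topology and numbers are invented for illustration.

```python
import random

def cascade_size(n_components=100, p_initial=0.02, coupling=0.5, seed=None):
    """Return how many components end up failed after the cascade settles."""
    rng = random.Random(seed)
    failed = {i for i in range(n_components) if rng.random() < p_initial}
    frontier = list(failed)
    while frontier:
        component = frontier.pop()
        # neighbours in a simple ring topology
        for neighbour in ((component - 1) % n_components,
                          (component + 1) % n_components):
            if neighbour not in failed and rng.random() < coupling:
                failed.add(neighbour)
                frontier.append(neighbour)
    return len(failed)

for coupling in (0.1, 0.5, 0.9):
    sizes = [cascade_size(coupling=coupling, seed=s) for s in range(200)]
    print(f"coupling={coupling}: average failed components = "
          f"{sum(sizes) / len(sizes):.1f}")
```

The same small seed failures produce much larger outages as the coupling parameter rises, which is the intuition behind the "tightly coupled" half of the theory.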

Many AI risks arise not just from technical errors, but from Systemic Factors, such as how AI technologies integrate with human and organizational systems. For example, an AI used in a hospital might make accurate diagnoses but still cause problems if the staff aren’t properly trained to use it. That kind of failure isn’t about the technology itself.

In addition to this, we have to consider tail events and black swan events. Tail events are rare occurrences that lie at the extreme ends (tails) of probability distributions and have very large impacts when they happen. Black swan events are a subset of tail events that are not only rare and high-impact but also fundamentally unpredictable and unknown before they occur. These kinds of events make traditional risk management hard because they’re so rare and surprising that we can’t easily predict or prepare for them based on past data.
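A toy sketch (my own, with arbitrarily chosen distributions) of why tail events break intuition built on past data: in a heavy-tailed world, a single extreme observation can dominate everything seen before it.

```python
import random

random.seed(0)
n = 100_000

thin_tailed = [abs(random.gauss(mu=1.0, sigma=1.0)) for _ in range(n)]
heavy_tailed = [random.paretovariate(1.1) for _ in range(n)]  # heavy tail

for name, losses in (("thin-tailed (normal)", thin_tailed),
                     ("heavy-tailed (Pareto)", heavy_tailed)):
    share_of_max = max(losses) / sum(losses)
    print(f"{name}: largest single loss = {share_of_max:.1%} of all losses combined")
```

In the thin-tailed case the worst observation is a negligible share of the total, so averaging over history works well; in the heavy-tailed case one draw can account for a large fraction of all losses ever observed, which is why risk estimates built purely on past data can badly understate what a tail or black swan event can do.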