Human Compatible

    • book by Stuart Russell, mostly about AI Risk
    • Realized I read this some time ago, took notes, and remember nothing.
    • Ch 1
      • The problem is right there in the basic definition of AI. We say that machines are intelligent to the extent that their actions can be expected to achieve their objectives, but we have no reliable way to make sure that their objectives are the same as our objectives. What if, instead of allowing machines to pursue their objectives, we insist that they pursue our objectives? Such a machine, if it could be designed, would be not just intelligent but also beneficial to humans. So let’s try this:
        • Machines are beneficial to the extent that their actions can be expected to achieve our objectives. (p11)
      • All emphasis in the original; he's doing some formal term definition here.
      • Something in me really wants to sneer at this; you can tell that while it's trying to think more clearly about the nature of goals and purpose than AI people usually do, it's not going to be enough.
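      • One way to make the contrast he's after concrete (my notation, not Russell's; just a sketch of the "standard model" versus his proposal):
        ```latex
        % Standard model: the machine optimizes a fixed, fully specified
        % objective U_M over outcomes \tau.
        \pi^{*} = \arg\max_{\pi} \; \mathbb{E}\left[ U_M(\tau) \mid \pi \right]

        % Russell's "beneficial" reframing: the machine optimizes the human's
        % utility U_H, which it does not know; it only holds a posterior over
        % U_H given its observations of human behavior.
        \pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{U_H \sim P(U_H \mid \mathrm{obs})}
                  \, \mathbb{E}\left[ U_H(\tau) \mid \pi \right]
        ```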
      • This sounds good:
      • Removing the assumption that machines should have a definite objective means that we will need to tear out and replace part of the foundations of artificial intelligence—the basic definitions of what we are trying to do. That also means rebuilding a great deal of the superstructure—the accumulation of ideas and methods for actually doing AI.
      • Yeah man that sounds just like my groove.
    • Ch 2
      • goes back to evolution and E. coli. OK.
      • Consciousness as a red herring (p16): competence is what matters, not consciousness. I'd say I'm in first-order agreement, but there are some subtleties here.
      • Reward system, and even gets into wireheading. And Baldwin effect.
      • Aristotle on practical reason (p20)
        • We deliberate not about ends, but about means. For a doctor does not deliberate whether he shall heal, nor an orator whether he shall persuade. . . . They assume the end and consider how and by what means it is attained
      • This passage, one might argue, set the tone for the next two-thousand-odd years of Western thought about rationality. It says that the “end”—what the person wants—is fixed and given; and it says that the rational action is one that, according to logical deduction across a sequence of actions, “easily and best” produces the end.
      • Aristotle doesn't account for uncertainty (not sure how that is relevant but OK)
      • Bernoulli's introduction of utility; von Neumann and Morgenstern and expected-utility maximization (p23)
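      • A toy illustration of what the Bernoulli and von Neumann-Morgenstern move buys you (my example, not the book's): an agent ranks lotteries by expected utility rather than expected money, so a concave utility like log wealth makes it risk-averse.
        ```python
        import math

        def expected_utility(lottery, utility):
            """lottery: list of (probability, wealth) pairs."""
            return sum(p * utility(w) for p, w in lottery)

        sure_thing = [(1.0, 100)]             # keep $100 for certain
        coin_flip  = [(0.5, 50), (0.5, 160)]  # fair gamble, E[money] = $105

        for name, lottery in [("sure $100", sure_thing), ("coin flip", coin_flip)]:
            e_money = sum(p * w for p, w in lottery)
            e_util  = expected_utility(lottery, math.log)
            print(f"{name}: E[money] = {e_money:.0f}, E[log utility] = {e_util:.3f}")

        # The gamble has higher expected money (105 > 100) but lower expected
        # log utility (4.494 < 4.605), so the Bernoulli agent declines it.
        ```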
      • Some critiques of utility theory, quickly disposed of:
        • it's all about greed
        • you can't actually produce and calculate utilities and probabilities for everything
        • empirical objections (humans are not actually that rational)
        • This one is more interesting
          • Another critique of the theory of rationality lies in the identification of the locus of decision making. That is, what things count as agents? It might seem obvious that humans are agents, but what about families, tribes, corporations, cultures, and nation-states? If we examine social insects such as ants, does it make sense to consider a single ant as an intelligent agent, or does the intelligence really lie in the colony as a whole, with a kind of composite brain made up of multiple ant brains and bodies that are interconnected by pheromone signaling instead of electrical signaling?
      • Lays out a few different agential structures (basic ones), strong suggestion that they are inadequate (agree).
    • Ch 3
      • A proposal to make intelligent assistants for daily life, reminds me of what I was trying to do at Quixey (p68). Sort of obvious and maybe doable.
      • ref: Nelson Goodman, Fact, Fiction, and Forecast (highly rec'd)
      • Good quote from Whitehead: "Civilization advances by extending the number of important operations we can perform without thinking about them." (p88)
      • (p90) introduces the concept of self-management and reflection, which is good, Minskyish
        • If managing activity in the real world seems complex, spare a thought for your poor brain, managing the activity of the “most complex object in the known universe”—itself. We don’t start out knowing how to think, any more than we start out knowing how to walk or play the piano. We learn how to do it. We can, to some extent, choose what thoughts to have.
    • Ch 4 Mundane Risk
      • Skipping most of this, I know it already
    • Ch 5
      • (p133) didn't know people were worrying about computational x-risk in 1847
      • (p138) starts to talk about goal alignment, the HAL problem, etc
    • Ch 6 AI-risk debate
      • There are, however, some useful clues in what Brooks and Pinker say. It does seem stupid to us for the machine to, say, change the color of the sky as a side effect of pursuing some other goal, while ignoring the obvious signs of human displeasure that result. It seems stupid to us because we are attuned to noticing human displeasure and (usually) we are motivated to avoid causing it—even if we were previously unaware that the humans in question cared about the color of the sky. That is, we humans (1) care about the preferences of other humans and (2) know that we don’t know what all those preferences are. In the next chapter, I argue that these characteristics, when built into a machine, may provide the beginnings of a solution to the King Midas problem.
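      • A toy sketch of those two properties (my illustration; Russell's own version comes in the next chapter): an agent that optimizes the human's utility while uncertain about it prefers asking over silently causing the side effect. All the numbers here are assumptions.
        ```python
        # World: the agent can change the sky's color to finish a task, but
        # it does not know whether the human cares about the sky.
        P_CARES   = 0.5    # assumed prior belief that the human minds
        TASK_GAIN = 1.0    # assumed value of completing the task
        HARM      = -10.0  # assumed loss to the human if they do care
        ASK_COST  = 0.1    # assumed small cost of interrupting to ask

        act_blindly = TASK_GAIN + P_CARES * HARM          # 1 - 5 = -4.0
        do_nothing  = 0.0
        # Ask first, then act only if the human says they don't care:
        ask_first = (1 - P_CARES) * TASK_GAIN - ASK_COST  # 0.5 - 0.1 = 0.4

        options = {"act blindly": act_blindly,
                   "do nothing": do_nothing,
                   "ask first": ask_first}
        print(max(options, key=options.get))  # -> "ask first"
        ```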
    • Ch 7 AI: A Different Approach