from LWMap/*A Map That Reflects the Territory*

- LWMap/The Rocket Alignment Problem (Eliezer Yudkowsky)

from LWMap/Being a Robust Agent

- try to repair the problem with aftermarket alignment techniques (see LWMap/The Rocket Alignment Problem)

from Introduction to *Inventive Minds*

- Marvin’s emphasis on the social nature of learning might come as a surprise to those accustomed to the usual emphasis on the mechanisms of individual minds that is the default methodological stance of AI. Certainly the AI of Marvin’s period of greatest activity did not pay a great deal of attention to the social embeddedness of learning and intelligence. But Marvin was not one to let the current limits of computation interfere with his forward-looking theories of mind. More recently, the social transmission of goals has resurfaced as the focus of attempts to mitigate the supposed existential risks of AIs by achieving “value alignment.”

08 Mar 2022 08:39 - 17 Jun 2023 08:29

- A review of the original essay by Eliezer Yudkowsky about alignment.

- This essay is a cute extended metaphor about the "Mathematics of Intentional Rocketry Institute". If I read him right, he is saying that the problem is not so much *malevolent* AI, or AI in the hands of malevolent people, but that we have no idea how to think about powerful goal-directed systems *at all*. Given how important this is, shouldn't we give it our best shot?

- While I kind of agree with that, I'm less taken with the implication that MIRI folks are the only ones clear-eyed enough to see this problem and its importance, and also the ones who are smart enough to perhaps have solutions. This has an unavoidable air of arrogance and crackpottery.

- The essay takes the form of a dialog between Beth, representing MIRI/Yudkowsky, and the somewhat dim critic Alfonso, who raises various objections to the MIRI program only to have them easily refuted.

- It touches on most of my own quibbles with Rationalism, or at least the AI-alignment aspect, eg the observation that MIRI's approach is grounded almost exclusively in abstract mathematics, whereas real intelligent machines are engineered artifacts that are embedded in the physical world and need to be thought of in an engineering mode. Here's how that is represented:
  > **ALFONSO:** ... This gets me into the main problem I have with your project in general. I just don’t believe that any future rocket design will be the sort of thing that can be analyzed with absolute, perfect precision so that you can get the rocket to the Moon based on an absolutely plotted trajectory with no need to steer. That seems to me like a bunch of mathematicians who have no clue how things work in the real world, wanting everything to be perfectly calculated. Look at the way Venus moves in the sky; usually it travels in one direction, but sometimes it goes retrograde in the other direction. We’ll just have to steer as we go.
  >
  > **BETH:** ... we both agree that rocket positions are hard to predict exactly during the atmospheric part of the trajectory, due to winds and such. And yes, if you can’t exactly predict the initial trajectory, you can’t exactly predict the later trajectory. So, indeed, the proposal is definitely not to have a rocket design so perfect that you can fire it at exactly the right angle and then walk away without the pilot doing any further steering. The point of doing rocket math isn’t that you want to predict the rocket’s exact position at every microsecond, in advance.
  >
  > **ALFONSO:** Then why obsess over pure math that’s too simple to describe the rich, complicated real universe where sometimes it rains?
  >
  > **BETH:** It’s true that a real rocket isn’t a simple equation on a board. It’s true that there are all sorts of aspects of a real rocket’s shape and internal plumbing that aren’t going to have a mathematically compact characterization. What MIRI is doing isn’t the right degree of mathematization for all rocket engineers for all time; it’s the mathematics for us to be using right now (or so we hope).

- This stuff is cute, but it also lets the author easily skirt real objections, and it makes me hungry to see a dialog between Yudkowsky and a non-cartoon opponent. In this case, the real objection is more like this: any real intelligent system, just like any non-intelligent computational system, is not an abstract mathematical construct but a physical embodiment of one. And as such, its failure modes can't be determined by reasoning about the mathematical abstraction.

- This is easy to see in the real-world example of computer security. Mathematics and proof are very important in this area, but insufficient to actually achieve security, since real systems have many vulnerabilities that have nothing to do with their algorithmic specification (generically these are known as side-channel attacks). The best encryption algorithm in the world can't do anything if someone figures out how to read the cleartext from changes in the power consumption.
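  A minimal sketch of the gap between specification and embodiment: a byte comparison that is mathematically correct (it returns the right boolean) but leaks information through its running time, because it exits at the first mismatch. The names here are illustrative, not from any particular library; the constant-time fix shown is Python's standard `hmac.compare_digest`.

  ```python
  import hmac

  def naive_compare(a: bytes, b: bytes) -> bool:
      """Correct as a mathematical function, but it returns as soon as
      a byte differs -- so how long it runs reveals how many leading
      bytes of the guess were right. That timing is the side channel."""
      if len(a) != len(b):
          return False
      for x, y in zip(a, b):
          if x != y:
              return False
      return True

  secret = b"hunter2hunter2h"
  wrong_at_start = b"Xunter2hunter2h"  # fails on the first byte: fast
  wrong_at_end = b"hunter2hunter2X"    # fails on the last byte: slower

  # Both calls return False -- the abstract spec is satisfied either way.
  # Only the physical execution (elapsed time) distinguishes them, which
  # is exactly the kind of failure mode the abstraction can't see.
  naive_compare(secret, wrong_at_start)
  naive_compare(secret, wrong_at_end)

  # The standard-library remedy: a comparison whose running time does not
  # depend on where the inputs differ.
  hmac.compare_digest(secret, wrong_at_end)
  ```
  
  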

- The problem of constraining superintelligent AIs is weirdly similar to the general computer security problem – in essence, you are trying to ensure that a system is supercapable but somehow barred from hacking *itself*. It doesn't seem possible, and if it is, my intuition is that the kind of mathematical thinking that MIRI likes to do won't have a whole lot to do with the solution.

  > One can't proceed from the informal to the formal by formal means. – Alan Perlis

- I could be wrong of course, and far be it from me to tell people they shouldn't do mathematics. My own approach to the problem is to try to think hard about the relationship between agency and computation, which is the subject of the rest of this text, aka Agency Made Me Do It.

- Yudkowsky's long list of objections and refutations makes me realize that my own quarrels with Rationalism probably aren't that interesting; they've already heard my objections a hundred times and have already dealt with and dismissed them. (They remain interesting to me though, if only because they help me clarify my own ideas).

- Further reading:
  - *Autonomous Technology*, Langdon Winner
  - *Computation and Human Experience*, Phil Agre

Copyright © Hyperphor 2020-2023