- coming to accept that this would never work as the kind of idealized relationship I’d been imagining it as
- coming to accept that while it working out as a *different* kind of relationship wouldn’t have been impossible earlier, it’s too late for that now
- coming to accept that there were some simple mistakes that I made during the relationship that would have been easy to avoid and which could have made a huge difference to how things turned out; but which are pointless to dwell on now
- coming to accept the loss of all the concrete good moments we had before things went sour, and the loss of that shared hope and excitement for the future that we had (this is the one my mind seems to be focused on working on right now)
I really liked, and have gotten a lot out of, the self-compassion advice in the book The Wisdom of No Escape and the Path of Loving-Kindness.
First, on the general attitude and approach:
When people start to meditate or to work with any kind of spiritual discipline, they often think that somehow they’re going to improve, which is a sort of subtle aggression against who they really are. It’s a bit like saying, ‘If I jog, I’ll be a much better person.’ ‘If I could only get a nicer house, I’d be a better person.’ ‘If I could meditate and calm down, I’d be a better person.’ Or the scenario may be that they find fault with others; they might say, ‘If it weren’t for my husband, I’d have a perfect marriage.’ ‘If it weren’t for the fact that my boss and I can’t get on, my job would be just great.’ And ‘If it weren’t for my mind, my meditation would be excellent.’
But loving-kindness – maitri – toward ourselves doesn’t mean getting rid of anything. Maitri means that we can still be crazy after all these years. We can still be angry after all these years. We can still be timid or jealous or full of feelings of unworthiness. The point is not to try to change ourselves. Meditation practice isn’t about trying to throw ourselves away and become something better. It’s about befriending who we are already. The ground of practice is you or me or whoever we are right now, just as we are. That’s the ground, that’s what we study, that’s what we come to know with tremendous curiosity and interest. […]
Sometimes among Buddhists the word ego is used in a derogatory sense, with a different connotation than the Freudian term. As Buddhists, we might say, ‘My ego causes me so many problems.’ Then we might think, ‘Well, then, we’re supposed to get rid of it, right? Then there’d be no problem.’ On the contrary, the idea isn’t to get rid of ego but actually to begin to take an interest in ourselves, to investigate and be inquisitive about ourselves. […]
This is not an improvement plan; it is not a situation in which you try to be better than you are now. If you have a bad temper and you feel that you harm yourself and others, you might think that sitting for a week or a month will make your bad temper go away – you will be that sweet person that you always wanted to be. Never again will a harsh word leave your lily-white lips. The problem is that the desire to change is fundamentally a form of aggression toward yourself. The other problem is that our hangups, unfortunately or fortunately, contain our wealth. Our neurosis and our wisdom are made out of the same material. If you throw out your neurosis, you also throw out your wisdom. Someone who is very angry also has a lot of energy; that energy is what’s so juicy about him or her. That’s the reason people love that person. The idea isn’t to try to get rid of your anger, but to make friends with it, to see it clearly with precision and honesty, and also to see it with gentleness. That means not judging yourself as a bad person, but also not bolstering yourself up by saying, ‘It’s good that I’m this way, it’s right that I’m this way. Other people are terrible, and I’m right to be so angry at them all the time.’ The gentleness involves not repressing the anger but also not acting it out. It is something much softer and more open-hearted than any of that. It involves learning how, once you have fully acknowledged the feeling of anger and the knowledge of who you are and what you do, to let it go. You can let go of the usual pitiful little story line that accompanies anger and begin to see clearly how you keep the whole thing going. So whether it’s anger or craving or jealousy or fear or depression – whatever it might be – the notion is not to try to get rid of it, but to make friends with it. That means getting to know it completely, with some kind of softness, and learning how, once you’ve experienced it fully, to let go.
And then on the specific instructions for self-compassionate meditation:
The technique is, first, to take good posture and, second, to become mindful of your out-breath. This is just your ordinary out-breath, not manipulated or controlled in any way. Be with the breath as it goes out, feel the breath go out, touch the breath as it goes out. Now, this seems simple, but to actually be with that breath and to be there for every breath requires a lot of precision. When you sit down and begin to meditate, the fact that you always come back to that breath brings out the precision, the clarity, and the accuracy of your mind. Just the fact that you always come back to this breath and that you try, in a gentle way, to be as fully with the breath as you can sharpens your mind.
The third part of the technique is that, when you realize that you’ve been thinking, you say to yourself, ‘Thinking.’ Now, that also requires a lot of precision. Even if you wake up as if from a dream and realize that you’ve been thinking, and you immediately go back to the breath and accidentally forget about the labeling, even then you should just pause a little bit and say to yourself, ‘Thinking.’ Use the label, because the label is so precise. Just acknowledge that you’ve been thinking, just that, no more, no less, just ‘thinking.’ Being with the out-breath cultivates the precision of your mind, and when you label, that too brings out the precision of your mind. Your mind becomes more clear and stabilized. As you sit, you might want to be aware of this.
If we emphasized only precision, our meditation might become quite harsh and militant. It might get too goal-oriented. So we also emphasize gentleness. One thing that is very helpful is to cultivate an overall sense of relaxation while you are doing the meditation. I think you’ll notice that as you become more mindful and more aware and awake, you begin to notice that your stomach tends to get very tense and your shoulders tend to get very tight. It helps a lot if you notice this and then purposely relax your stomach, relax your shoulders and your neck. If you find it difficult to relax, just gradually, patiently, gently work with it. […]
The moment when you label your thoughts ‘thinking’ is probably the key place in the technique where you cultivate gentleness, sympathy, and loving-kindness. Rinpoche used to say, ‘Notice your tone of voice when you say “thinking.”’ It might be really harsh, but actually it’s just a euphemism for ‘Drat! You were thinking again, gosh darn it, you dummy.’ You might really be saying, ‘You fool, you absolutely miserable meditator, you’re hopeless.’ But it’s not that at all. All that’s happened is that you’ve noticed. Good for you, you actually noticed! You’ve noticed that mind thinks continuously, and it’s wonderful that you’ve seen that. Having seen it, let the thoughts go. Say, ‘Thinking.’ If you notice that you’re being harsh, say it a second time just to cultivate the feeling that you could say it to yourself with gentleness and kindness, in other words, that you are cultivating a nonjudgmental attitude. You are not criticizing yourself, you are just seeing what is with precision and gentleness, seeing thinking as thinking. That is how this technique cultivates not only precision but also softness, gentleness, a sense of warmth toward oneself. The honesty of precision and the goodheartedness of gentleness are qualities of making friends with yourself. So during this period, along with being as precise as you can, really emphasize the softness. If you find your body tensing, relax it. If you find your mind tensing, relax it. Feel the expansiveness of the breath going out into the space. When thoughts come up, touch them very lightly, like a feather touching a bubble. Let the whole thing be soft and gentle, but at the same time precise. […]
You may have wondered why we are mindful of our out-breath and only our out-breath. Why don’t we pay attention to the out-breath and the in-breath? There are other excellent techniques that instruct the meditator to be mindful of the breath going out and mindful of the breath coming in. That definitely sharpens the mind and brings a sense of one-pointed, continuous mindfulness, with no break in it. But in this meditation technique, we are with the out-breath; there’s no particular instruction about what to do until the next out-breath. Inherent in this technique is the ability to let go at the end of the out-breath, to open at the end of the out-breath, because for a moment there’s actually no instruction about what to do. There’s a possibility of what Rinpoche used to call ‘gap’ at the end of the out-breath: you’re mindful of your breath as it goes out, and then there’s a pause as the breath comes in. It’s as if you … pause. It doesn’t help at all to say, ‘Don’t be mindful of the in-breath’ – that’s like saying, ‘Don’t think of a pink elephant.’ When you’re told not to be mindful of something, it becomes an obsession. Nevertheless, the mindfulness is on the out-breath, and there’s some sense of just waiting for the next out-breath, a sense of no project. One could just let go at the end of the out-breath. Breath goes out and dissolves, and there could be some sense of letting go completely. Nothing to hold on to until the next out-breath.
Even though it’s difficult to do, as you begin to work with mindfulness of the out-breath, then the pause, just waiting, and then mindfulness of the next out-breath, the sense of being able to let go gradually begins to dawn on you. So don’t have any high expectations – just do the technique. As the months and years go by, the way you regard the world will begin to change.
I’ve said a lot about depression, self-compassion, and breakup blues.
I haven’t said much about burnout. I have that too. Have had for years, in fact.
This is just the first time that I’ve had a chance to stop and heal.
I did a day of work last week, the first one I’ve done since the end of November. It went well. It felt good. So I thought I would try to get a full week’s worth of work done.
Then I basically crashed again.
Sometimes, your skin feels sensitive and raw. Everything is, if not outright painful, then at least unpleasant to touch.
That’s how I feel today, and on a lot of days. Except that the skin is my mind, and the things that I touch are thoughts about things to be done.
Goals. Obligations. Future calendar entries. But even things like a computer game I was thinking of playing, or a Facebook comment I’m thinking of replying to. Anything that I need to keep track of touches against that rawness in my mind.
That’s another big part of why I’ve been so focused on self-compassion recently. On being okay with not getting anything done. On taking pleasure from just being present. On enjoying little, ordinary things. Because that’s all I have, on moments like this.
I’m getting better. There are fewer days like this. There are many days when I’m actually happy, enjoying it when I do things.
But I’m still not quite recovered. And I need to be careful not to forget that, lest I push myself so much that I crash again.
Often when we are in pain, what we really want is some validation for the pain.
Not advice. Not someone trying to make that pain go away (because it discomforts them). But someone to tell us that it’s okay to be in pain. That the things that bother us are valid and normal reasons to feel bad.
Much of self-compassion seems to be the same. Not trying to stop being in pain. Not trying to change yourself. But giving yourself the validation that we usually look for from the outside. Accepting it as a part of yourself, as something that is alright to feel. Something that you can sympathize with yourself for feeling.
And if you find that you *cannot* accept the pain…
Then you unjudgingly accept that too. That today, this pain is too much for me to bear. You just are with it, without trying to change it.
And if you find that you cannot do that either, and feel bad and guilty for being so bad at this self-compassion thing…
Then you accept that, without trying to change it.
And if you find yourself being kinda okay with being in pain, but still wanting to change it, still wanting to explicitly apply some technique for deeper self-compassion rather than just accepting everything…
Then you accept that, and let yourself do it.
Dealt with in this way, self-compassion oddly starts looking like not really doing anything in particular. After all, you just go about living your life as you always have, not trying to change anything about yourself. Or trying, if that’s what you’re like. Not trying to exert any particular control over your behavior, except when you do.
Yet somehow you end up feeling quite different from normal.
(Except when you don’t, which is also fine.)
In this post, I examine Magnus Vinding’s argument against traditional “single AI fooms off” scenarios, as outlined in his book “Reflections on Intelligence”. While the argument itself is not novel – similar ones have been made before by Robin Hanson and J Storrs Hall, among others – I found Vinding’s case to be the most eloquently and compellingly put so far.
Vinding’s argument goes basically as follows: when we talk about intelligence, what we actually care about is the ability to achieve goals. For instance, Legg & Hutter collected 70 different definitions for intelligence, and concluded that a summary which captured the essential spirit of most of them was “Intelligence measures an agent’s ability to achieve goals in a wide range of environments”.
But once we substitute “intelligence” with “the ability to achieve goals”, we notice that we are actually talking about having tools, in several senses of the word:
- Cognitive tools: our brains develop to have specialized processes for performing various kinds of tasks, such as recognizing faces, recognizing emotions, processing language, etc. Humans have some cognitive tools that are unique to us (such as sophisticated language) while lacking some that other animals have (such as the sophisticated smell processing of a dog).
- Anatomical tools: not only do our brains carry out specific tasks, we also have an anatomy that supports it. For instance, our vocal cords allow us to produce a considerable variety of sounds to be used together with our language-processing capabilities. On the other hand, we also lack some other anatomical tools, such as the impressive noses of dogs. It is the combination of cognitive and anatomical tools that allows us to achieve a variety of different goals.
- Physical tools: tools in the most conventional sense of the word. We would not be capable of achieving much unless we had various physical devices that can be used for manipulating the world.
- Cultural tools: nobody would get very far if they had to derive all of their ideas from scratch. Rather, we acquire most of the language, ideas, and ways of thought that we use from the people around us.
- Societal tools: an individual’s ability to achieve things has grown enormously as our economy has grown increasingly specialized. No single person could build a laptop, or even a pencil, all by themselves. Yet we have at our disposal tools – computers, web browsers, Internet service providers, online stores, manufacturers, delivery companies – which allow us to almost effortlessly acquire laptops and pencils and then put them into use.
This paragraph from Vinding’s book summarizes much of his argument:
“Human intelligence” is often compared to “chimpanzee intelligence” in a manner that presents the former as being so much more awesome than, and different from, the latter. Yet this is not the case. If we look at individuals in isolation, a human is hardly that much more capable than a chimpanzee. They are both equally unable to read and write on their own, not to mention building computers or flying to the moon. And this is also true if we compare a tribe of, say, thirty humans with a tribe of thirty chimpanzees. Such two tribes rule the Earth about equally little. What really separates humans from chimpanzees, however, is that humans have a much greater capacity for accumulating information, especially through language. And it is this – more precisely, millions of individuals cooperating with this, in itself humble and almost useless, ability – that enables humans to accomplish the things we erroneously identify with individual abilities: communicating with language, doing mathematics, uncovering physical laws, building things, etc. It is essentially this you can do with a human that you cannot do with a chimpanzee: train them to contribute modestly to society. To become a well-connected neuron in the collective human brain. Without the knowledge and tools of previous generations, humans are largely indistinguishable from chimpanzees.
So what are the implications for AI risk?
One of Vinding’s arguments is that “intelligence” has gotten increasingly distributed. Whereas a hunter-gatherer might only have drawn upon the resources of their own tribe, a modern human will enhance their capabilities by tapping into a network of resources that literally spans the entire globe. Thus, it may be misguided to focus on the point when AIs achieve human-level intelligence, for a single individual’s intelligence alone isn’t sufficient for achieving much. Instead, if AIs were to wipe out humanity, they would need to first achieve the level of capability that human society has… but the easiest way of achieving that would be to collaborate with human society and use its resources peacefully, rather than cause damage to it.
A similar argument was previously put forward by J Storrs Hall in his paper Engineering Utopia, which uses a more economic argument. Hall notes that even when a single AI is doing self-improvement (such as by developing better cognitive science models to improve its software), the rest of the economy is also developing better such models. Thus it’s better for the AI to focus on improving at whatever thing it is best at, and keep trading with the rest of the economy to buy the things that the rest of the economy is better at improving.
However, Hall notes that there could still be a hard takeoff, once enough AIs were networked together: AIs that think faster than humans are likely to be able to communicate with each other, and share insights, much faster than they can communicate with humans. The size of the AI economy could grow quite quickly, with Hall suggesting a scenario that goes “from […] 30,000 human equivalents at the start, to approximately 5 billion human equivalents a decade later”.
Any individual AI, then, will be most effective as a cooperating element of a community (as is any individual human […]). AI communities, on the other hand, will have the potential to grow into powers rivalling or exceeding the capability of the human race in relatively short order. The actions of communities are effects of the set of ideas they hold, the result of an extremely rapid memetic evolution […]
Real-time human oversight of such AI communities is infeasible. Once a networked AI community was established, a “cultural revolution” could overtake it in minutes on a worldwide scale, even at today’s communication rates. The essence of our quest for a desirable future world, then, both for ourselves and for the AIs, lies in understanding the dynamics of memetic evolution and working out ways to curb its excesses.
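To get a feel for how fast the growth in Hall’s scenario is, here is a rough back-of-the-envelope sketch of the growth rate implied by the two figures quoted above (30,000 human equivalents growing to about 5 billion in a decade). The assumption of smooth exponential growth is mine, made only for illustration; Hall’s paper may model the growth differently.

```python
import math

# Figures quoted from Hall's scenario above.
start_human_equivalents = 30_000
end_human_equivalents = 5_000_000_000
years = 10

# Assumption (not from Hall): growth is a smooth exponential over the decade.
annual_factor = (end_human_equivalents / start_human_equivalents) ** (1 / years)
doubling_time_months = 12 * math.log(2) / math.log(annual_factor)

print(f"Implied growth factor per year: {annual_factor:.2f}")
print(f"Implied doubling time: {doubling_time_months:.1f} months")
```

Under that assumption, the AI economy would a bit more than triple every year, doubling roughly every seven months.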
Hall suggests that an AI community could rapidly grow to the point where its members were exclusively communicating and trading with each other, humans being too slow to bother with. Suppose that you were a digital mind that thought a thousand times as fast as biological humans. If you wanted a task done, would you rather hire another digital mind to do it, taking what felt to you like an hour – or would you hire a biological human, and have to wait what felt like a month and a half? You’d probably go with your digital friend.
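The “month and a half” figure can be checked with a small sketch. It assumes, for illustration, that the biological human needs one hour of real (wall-clock) time for the task; the function name is hypothetical and not from Hall’s paper.

```python
HOURS_PER_DAY = 24


def subjective_wait_days(wall_clock_hours: float, speedup: float) -> float:
    """Subjective days experienced by a mind that runs `speedup` times
    faster than real time while `wall_clock_hours` of real time pass."""
    return wall_clock_hours * speedup / HOURS_PER_DAY


# Assumption: the human takes one real hour; the digital mind runs 1000x faster.
days = subjective_wait_days(wall_clock_hours=1.0, speedup=1000.0)
print(f"Waiting on the human feels like about {days:.0f} days")
```

A thousand subjective hours comes to a little under 42 days, which matches the post’s “month and a half”.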
One obvious limitation is that this speed advantage would only apply for purely mental tasks. If you needed something manufactured, you might as well order something from the humans.
Vinding’s book could also be read as a general argument suggesting that the amount of distributed intelligence in human society was so large that AIs would still benefit from trade, and would need a large amount of time to learn to do everything themselves. Vinding writes:
… the majority of what humans do in the economy is not written down anywhere and thus not easily copyable. Customs and know-how run the world to an extent that is hard to appreciate – tacit knowledge and routines concerning everything from how to turn the right knobs and handles on an oil rig to how to read the faces of other humans, none of which is written down anywhere. For even on subjects where a lot is written down – such as how to read faces – there are many more things that are not. In much of what we do, we only know how we do, not exactly “what”, and this knowledge is found in the nooks and crannies of our brains and muscles, and in our collective organization as a whole. Most of this unique knowledge cannot possibly be deduced from a few simple principles – it can only be learned through repeated trial and error – which means that any system that wants to expand the economy must work with this enormous set of undocumented, not readily replaceable know-how and customs.
This is a compelling argument, but with recent progress in AI, it feels less compelling than it might have felt a few years back. Vinding mentions reading faces as an example of a domain involving much tacit knowledge, but computers are already outperforming humans at facial recognition and are starting to match humans at recognizing and interpreting emotional expressions, as well as in recognizing rare syndromes from facial patterns. As a more industrial example, DeepMind’s AI technology was recently deployed to optimize power usage at Google’s data centers, for a 15 percent improvement in power usage efficiency. Since relatively small reductions in power use translate to large savings – this change is estimated to save Google hundreds of millions of dollars – these were already highly-optimized centers.
Tacit knowledge is essentially knowledge that is based on pattern recognition, and pattern recognition is rapidly becoming one of AI’s strengths. Currently this still requires massive datasets – Goodfellow et al. (2016, chap 1) note that as a rule of thumb, a deep learning algorithm requires a dataset of at least 10 million labeled examples in order to achieve human-level or better performance. On the other hand, they also note that a large part of the success of deep learning has been because the digitization of society has made such large datasets increasingly available.
It seems likely that the development of better and better AI pattern recognition will drive further investment into collecting larger datasets, which will in turn make it even more profitable to continue investing in better pattern recognition. After DeepMind’s success with improving power efficiency at Google’s data centers, DeepMind’s Demis Hassabis told Bloomberg that “[DeepMind] knows where its AI system lacks information, so it may ask Google to put additional sensors into its data centers to let its software eke out even more efficiency”.
If AI allows efficiency to be increased, then businesses will be rebuilt in such a way as to give AI all the necessary information it needs to run them maximally efficiently – making tacit human knowledge of how things were previously done both unnecessary and obsolete. The items in Amazon’s warehouses are algorithmically organized according to a logic that makes little intuitive sense to humans, with an AI system telling the workers where to go; Foxconn is in the process of fully automating its factories; Uber is seeking to replace human drivers with self-driving cars. We are bound to see this kind of automation penetrate into ever larger parts of the economy over time, which will drive the further deployment of sensors and collection of better datasets in order to enable it. By the time AGI manifests, after several decades of this development, there’s no obvious reason to assume that very much of the tacit knowledge needed for running an economy would necessarily remain locked up in human heads anymore.
To sum things up, this suggests that beyond the classical “one AI fooms to a superintelligence and takes over the world” scenario, there may plausibly exist a scenario where the superintelligences are initially best off trading with humans. As time goes on and the size of the AI community grows, this community may collectively foom off as they come to only trade with each other and have little use for humans. Depending on how long it takes for the community to grow, this may or may not look any different from traditional foom.
This blog post was written as part of research funded by the Foundational Research Institute.
For some reason, I’ve always felt an instinctive dislike towards my childhood self. I generally like kids, but if somebody had magically produced a copy of the person that I was at 5 or 10 and asked me to look after that kid for a while, my automatic reaction would have been “no, I don’t like that kid”.
I’ve also had fairly bad self-esteem for a long, long time. For my tenth birthday, I decided that I didn’t want to get any presents, because I felt like I had done nothing to deserve them. And I didn’t want to get any presents on future birthdays, or on any Christmas, either. (This caused what’s probably one of the oddest child-parent fights that I know of, with my dad being angry about wanting to give me presents and me steadfastly refusing them.)
These two things seemed obviously related.
So today I started exploring that feeling of dislike. Where was it coming from? Why did I have such an aversion regarding my younger self?
Now here’s the thing. I was an only child who frequently spent more time by himself or around adults than he did around other kids. Like all kids, I had a fair share of fights with my parents about stuff like bedtimes and such.
But I never realized that other kids had those same kinds of fights and tantrums too.
I remember having been distinctly shocked when a teacher we had when I was 13-15 made an off-handed comment about this happening with younger kids.
I hadn’t known that this was a Kid Thing: I had thought it was a Kaj Thing.
And as a result, I’d felt guilty and bad over each time that I’d been self-centered and emotional in the way kids are. By the time I heard my teacher make that comment, it started to dawn on me on an intellectual level that this was nothing special: but on an emotional level I had already internalized a belief that I was exceptionally ungrateful and undeserving for everything my parents did for me.
Today I went back to those experiences. A few memories in particular stuck out: one of the countless bedtime struggles, as well as an occasion when I’d told my dad over the phone that I didn’t like him. And now, instead of just recalling my behavior in those memories – like on every previous occasion when I had recalled them – I tried to remember my emotional state, and to sympathize with it, and to recall other kids that I’ve seen acting up and who I’ve felt sympathetic towards.
And then there was a shift, and those memories started feeling like instances of a Kid Thing, rather than a uniquely Kaj Thing.
And now if you’d bring me a copy of me as I was at 5 or 10, I’d just like to hug that poor kid and tell him that it’s okay.
For a few months this fall, I was part of a poly triad which ultimately didn’t work out… but the moments when it did work, worked. So well, in fact, that I suspect that any relationship with only two people involved will from now on feel somehow lacking to me, no matter how good otherwise.
There were two of us guys involved with one gal, with the guys starting out as strangers to each other. Still, from the start it was clear that everyone wanted everybody to be happy, and was going to act accordingly.
To me, at the best moments, we felt like family. Not just two men who happened to both have a relationship with the same woman, but a cohesive unit doing its best to make sure that everyone in it (as well as the kid from a previous relationship) would be as well off as possible. Thinking back to it, I recall moments like:
- all three brainstorming and looking up stuff about how to make the kid sleep better at night, or to be more willing to sit still while riding a bus
- one of us reading a book aloud to the two others, all three cuddling together
- everyone spending several hours carrying some fresh wood together
- all three sitting together and discussing some conflicts that had come up between two of them, with the third one offering a more neutral outside perspective and acting as a general calming force
It’s hard to describe, but I feel like there was a very strong sense in which there being three of us brought a sense of extra stability to the relationship. If someone was upset or doing badly, nobody needed to feel like they alone had the primary burden of helping that person out. Whoever needed support, there were two other people to shoulder the effort of providing it. And nobody would hesitate to provide it, if only they were in a shape where they could.
While it ultimately didn’t work out, that feeling of being a tight-knit family, with a sense of “one for all, all for one”… I’m going to miss that, in any relationship that doesn’t have it. You can get the sense of mutual support with just a single couple, of course; but things like that sense of “we’re both in love with the same person so we’re going to work together to make her happy; and we know that she cares about us both and will be the happiest if both of us are happy, so we’ll also do our best to help each other out whenever we can”… that I don’t think you can really get without having a triad.
So a few days back, I mentioned that after getting rid of my subconscious idealized assumptions of what a relationship “should” be like, I stopped being so desperate to be in a relationship.
And some time before that, I mentioned that I’d decided to put the whole “saving the world” thing on hold for a few years and focus on taking care of myself first.
As a result, I’ve suddenly found myself having *no* pressing goals that would direct my life. No stress about needing to do something big-impact. No constant loneliness and thinking about how to best impress people.
Just a sudden freedom to do basically anything.
I’m still in the process of disassembling various mental habits that were focused on making me more single-mindedly focused on the twin goals of saving the world and getting into a relationship. But I’m starting to suspect that even more things were defined by those goals than I had realized.
For instance, my self-esteem has usually been pretty bad, probably because I was judging myself and my worth pretty much entirely by how well I did at those two goals. And I didn’t feel like I was doing particularly well at either.
Now I can just… Live a day at a time and not sweat it.
It’s going to take a while to get used to this.
Three weeks ago, I ran across an article called “Google’s former happiness guru developed a three-second brain exercise for finding joy”. Yes, the title is kinda cringe-worthy, but the content is good. Here are the most essential five paragraphs:
Successfully reshaping your mindset, [Chade-Meng Tan] argues, has less to do with hours of therapy and more to do with mental exercises, including one that helps you recognize “thin slices of joy.”
“Right now, I’m a little thirsty, so I will drink a bit of water. And when I do that, I experience a thin slice of joy both in space and time,” he told CBC News. “It’s not like ‘Yay!’” he notes in Joy on Demand. “It’s like, ‘Oh, it’s kind of nice.’”
Usually these events are unremarkable: a bite of food, the sensation of stepping from a hot room to an air-conditioned room, the moment of connection in receiving a text from an old friend. Although they last two or three seconds, the moments add up, and the more you notice joy, the more you will experience joy, Tan argues. “Thin slices of joy occur in life everywhere… and once you start noticing it, something happens, you find it’s always there. Joy becomes something you can count on.” That’s because you’re familiarizing the mind with joy, he explains.
Tan bases this idea on neurological research about how we form habits. Habitual behaviors are controlled by the basal ganglia region of the brain, which also plays a role in the development of memories and emotions. The better we become at something, the easier it becomes to repeat that behavior without much cognitive effort.
Tan’s “thin slice” exercise contains a trigger, a routine, and a reward—the three parts necessary to build a habit. The trigger, he says, is the pleasant moment, the routine is the noticing of it, and the reward is the feeling of joy itself.
Since then, I have been working on implementing its advice, and making it a habit to notice the various “thin slices of joy” in my life.
It was difficult to remember at first, and on occasions when I’m upset for any reason it’s even harder to follow, even if I do remember it. Still, it is gradually becoming a more entrenched habit, with me remembering it and automatically following it more and more often – and feeling better as a result. I’m getting better at noticing the pleasure in sensations like
- Drinking water.
- Eating food.
- Going to the bathroom.
- Having drops of water fall on my body while in the shower.
- The physicality of brushing teeth, and the clean feeling in the mouth that follows.
- Being in the same room as someone and feeling less alone, even if both are doing their own things.
- Typing on a keyboard and being skilled enough at it to have each finger just magically find the right key without needing to look.
And so on.
Most of these are physical sensations. I would imagine that this would be a lot harder for someone who doesn’t feel comfortable in their body. But for me, a great thing about this is that my body is always with me. Anytime when I’m sitting comfortably – or standing, or lying, or walking comfortably – I can focus my attention on that comfort and get that little bit of joy.
In the article, it said that
“Thin slices of joy occur in life everywhere… and once you start noticing it, something happens, you find it’s always there. Joy becomes something you can count on.” That’s because you’re familiarizing the mind with joy, he explains.
I feel like this is starting to happen to me. Still not reliably, still not always, still easily broken by various emotional upsets.
But I still feel like I’m making definite progress.
Learning experiences: just broke up with someone recently. Part of the problem was that I had some very strong, specific and idealized expectations of what a relationship “should” be like – expectations which caused a lot of trouble, but which I hadn’t really consciously realized that I had, until now.
Digging up the expectations and beating them into mush with a baseball bat came too late to save this particular relationship, but it seems to have had an unexpected side effect: the thought of being single feels a lot less bad now.
I guess that while I had that idealized vision of “being in a relationship”, my mind was constantly comparing singledom to that vision, finding my current existence to be lacking, and feeling bad as a result. But now that I’ve gone from “being in a relationship means X” to “being in a relationship can mean pretty much anything, depending on the people involved”, there isn’t any single vision to compare my current state against. And with nothing to compare against, there’s also nothing that would make me feel unhappy because I don’t have it currently.
We all have our weak moments. Moments when we know the right thing to do, but are too tired, too afraid, or too frustrated to do it. So we slip up, and do something that we’ll regret.
An algorithm will never slip up in a weak moment. What if we could identify when we are likely to make mistakes, figure out what we’d want to do instead, and then outsource our decisions to a reliable algorithm? In what ways could we use software to make ourselves into better people?
Passive moral enhancement
One way of doing this might be called passive moral enhancement, because it happens even without anyone thinking about it. For example, if you own a self-driving car, you will never feel the temptation to drink and drive. You can drink as much as you want, but your car will always be the one who drives for you, so you will never endanger others by your drinking.
In a sense this is an uninteresting kind of moral enhancement, since there is nothing novel about it. Technological advancement has always changed the options that we have available to us, and made some vices less tempting while making others more tempting.
In another sense, this is a very interesting kind of change, because simply removing the temptation to do bad is a very powerful way to make progress. If you like drinking, it’s a pure win for you to get to drink rather than having to stay sober just because you’re driving. If we could systematically engineer forms of passive moral enhancement into society, everyone would be better off.
Of course, technology doesn’t always reduce the temptation to do bad. It can also open up new, tempting options for vice. We also need to find ways for people to more actively reshape their moral landscape.
Reshaping the moral landscape
On the left is a screenshot from GoodGuide. GoodGuide is an application which rates the health, environmental, and societal impact of different products on a scale from 1 to 10, making it easier to choose sustainable products. This is an existing application, but similar ideas could be taken much further.
Imagine having an application which allowed you to specify what you considered to be an ethical product and what kinds of things you needed or liked. Then it would go online and do your shopping for you, automatically choosing the products that best fit your needs and which were also the most ethical by your criteria.
Or maybe your criteria would act as a filter on a search engine, filtering out any products you considered unethical – thus completely removing the temptation to ever buy them, because you’d never even see them.
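As a rough illustration, the filtering idea could be sketched in a few lines of Python. Everything here is hypothetical: the `Product` class, the single `ethics_score` number (loosely modeled on a 1 to 10 rating), and the rule of picking the cheapest acceptable product are assumptions made for the example, not a description of how GoodGuide or any real shopping tool works.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Product:
    name: str
    price: float
    ethics_score: float  # hypothetical rating from 1 (worst) to 10 (best)


def choose_product(products: List[Product], min_ethics_score: float) -> Optional[Product]:
    """Hide every product below the user's ethical threshold, then pick the cheapest one left."""
    acceptable = [p for p in products if p.ethics_score >= min_ethics_score]
    if not acceptable:
        return None  # nothing meets the user's criteria
    return min(acceptable, key=lambda p: p.price)


# Made-up catalog for illustration only.
catalog = [
    Product("Coffee A", 4.0, 3.0),
    Product("Coffee B", 5.5, 8.0),
    Product("Coffee C", 6.0, 9.0),
]
choice = choose_product(catalog, min_ethics_score=7.0)
```

With this made-up catalog, the cheapest product never reaches the final comparison at all, which is the point: the temptation is removed before the decision is made.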
Would this be enough? Would people be sufficiently motivated to set and use such criteria, just out of the goodness of their hearts?
Probably many would. But it would still be good to also create better incentives for moral behavior.
Software to incentivize moral behavior
On the right, you can see a chain of kidney donations created by organ-matching software.
Here’s how it works. Suppose that my mother has failing kidneys, and that I would like to help her by giving her one of my kidneys. Unfortunately, the compatibility between our kidneys is poor despite our close relation. A direct donation from me to her would be unlikely to succeed.
Fortunately, organ-matching software manages to place us in a chain of exchanges. We are offered a deal. If I donate my kidney to Alice, who’s a complete stranger to me, then another stranger will donate their kidney – which happens to be an excellent match – to my mother. And as a condition for Alice getting a new kidney, Alice’s brother agrees to donate his kidney to another person. That person’s mother agrees to donate her kidney to the next person, and that person’s husband agrees to donate his kidney… and so on. In this way, what was originally a single donation can be transformed into a chain of donations.
As a result of this chain, people who would usually have no interest in helping strangers end up doing so, because they want to help their close ones. By setting up the chain, software has aligned our concern for our loved ones with helping others.
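To make the chain idea a bit more concrete, here is a minimal sketch in Python. It is only a toy: it assumes that compatibility depends on simplified ABO blood-type rules alone, it builds the chain greedily, and the pair names and the `build_chain` function are invented for the example. Real organ-matching software considers far more than this.

```python
# Simplified ABO rules: which patient blood types each donor type can give to.
CAN_DONATE_TO = {
    "O": {"O", "A", "B", "AB"},
    "A": {"A", "AB"},
    "B": {"B", "AB"},
    "AB": {"AB"},
}


def build_chain(pairs, start):
    """Greedily extend a donation chain, beginning with the donor of the pair `start`.

    `pairs` maps a pair name to (donor_blood_type, patient_blood_type).
    Returns a list of (giving_pair, receiving_pair) links.
    """
    chain = []
    unused = set(pairs) - {start}
    current = start
    while True:
        donor_type = pairs[current][0]
        # Find some unused pair whose patient can receive from the current donor.
        next_pair = next(
            (name for name in sorted(unused)
             if pairs[name][1] in CAN_DONATE_TO[donor_type]),
            None,
        )
        if next_pair is None:
            break  # nobody left who is compatible: the chain ends here
        chain.append((current, next_pair))
        unused.remove(next_pair)
        current = next_pair
    return chain


# Hypothetical donor/patient pairs who are incompatible within the pair itself.
pairs = {
    "me_and_mother": ("A", "B"),
    "alice_and_brother": ("B", "A"),
    "carol_and_mother": ("O", "B"),
}
chain = build_chain(pairs, start="me_and_mother")
```

In this toy data, no donor except Carol's mother can give to their own loved one, yet two transplants still happen: each donor gives to a stranger so that their own patient stays in the exchange.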
The more we can develop ways of incentivizing altruism, the better off society will become.
Is this moral enhancement?
At this point, someone might object to calling these things moral enhancement. Is it really moral enhancement if we are removing temptations and changing incentives so that people do more good? How is that better morality – wouldn’t better morality mean making the right decisions when faced with hard dilemmas, rather than dodging the dilemmas entirely?
My response would be that much of the progress of civilization is all about making it easier to be moral.
I have had the privilege of growing up in a country that is wealthy and safe enough that I have never needed to steal or kill. I have never been placed in a situation where those would have been sensible options, let alone necessary for my survival. And because I’ve had the luck of never needing to do those things, it has been easy for me to internalize that killing people or stealing from them are things that you simply don’t do.
Obviously it’s also possible for someone to decide that stealing and killing are wrong despite growing up in a society where they have to do those things. Yet, living in a safer society means that people don’t have to decide it – they just take it for granted. And societies where people have seen less conflict tend to be safer and have more trust in general.
If we can make it easier for people to act in the right way, then more people will end up behaving in ways that make both themselves and others better off. I’d be happy to call that moral enhancement.
Whatever we decide to call it, we have an opportunity to use technology to make the world a better place.
Let’s get to it.
I was thinking through some challenges and I noticed the sheer density of rationality concepts taught in the Sequences which I was using: “motivated cognition”, “reversed stupidity is not intelligence”, “don’t waste energy on thoughts which won’t have been useful in universes where you win” (possibly not in the Sequences), “condition on all the evidence you have”. These are fundamental concepts, core lessons which shape my thinking constantly. I am a better reasoner, a clearer thinker, and I get closer to the truth because of the Sequences. In my gut, I feel like the version of me who never read the Sequences is epistemically equivalent to a crystal-toting anti-vaxxer (probably not true, but that’s how it feels) who I’d struggle to have a conversation with.
And my mind still boggles that the Sequences were written by a single person. A single person is responsible for so much of how I think, the concepts I employ, how I view the world and try to affect it. If this seems scary, realise that I’d much rather have my thinking shaped by one sane person than a dozen mad ones. In fact, it’s more scary to think that had Eliezer not written the Sequences, I might be that anti-vaxxer equivalent version of me.
Katja Grace has analyzed my and Stuart Armstrong’s 2012 paper “How We’re Predicting AI – or Failing To”. She discovered that one of the conclusions, “predictions made by AI experts were indistinguishable from those of non-experts”, is flawed due to “a spreadsheet construction and interpretation error”. In other words, I coded the data in one way, there was a communication error and a misunderstanding about what the data meant, and as a result of that, a flawed conclusion slipped into the paper.
I’m naturally embarrassed that this happened. But the reason why Katja spotted this error was that we’d made our data freely available, allowing her to spot the discrepancy. This is why data sharing is something that science needs more of. Mistakes happen to everyone, and transparency is the only way to have a chance of spotting those mistakes.
I regret the fact that we screwed up this bit, but I’m proud of the fact that we did share our data and allowed someone to catch it.
EDITED TO ADD: Some people have taken this mistake to suggest that the overall conclusion, that AI experts are not good predictors of AI timelines, is flawed. That would overstate the significance of this mistake. While one of the lines of evidence supporting this overall conclusion was flawed, several others are unaffected by this error: namely, that expert predictions disagree widely with each other, that many past predictions have turned out to be false, and that the psychological literature on what’s required for the development of expertise suggests that it should be very hard to develop expertise in this domain (see the original paper for details).
(I’ve added a note of this mistake to my list of papers.)
I made a new tumblr blog. It has photos of smiling people! With more to come!
Why? Previously I happened to need pictures of smiles for a personal project. After going through an archive of photos for a while, I realized that looking at all the happy people made me feel really happy and good. So I thought that I might make a habit out of looking at photos of smiling people, and sharing them.
Follow for a regular extra dose of happiness!
A common question when discussing the social implications of AI is the question of whether to expect a soft takeoff or a hard takeoff. In a hard takeoff, an AI will, within a relatively short time, grow to superhuman levels of intelligence and become impossible for mere humans to control anymore.
Essentially, a hard takeoff will allow the AI to achieve a so-called decisive strategic advantage (DSA) – “a level of technological and other advantages sufficient to enable it to achieve complete world domination” (Bostrom 2014) – in a very short time. The main relevance of this is that if a hard takeoff is possible, then it becomes much more important to get the AI’s values right on the first try – once the AI has undergone hard takeoff and achieved a DSA, it is in control with whatever values we’ve happened to give to it.
However, if we wish to find out whether an AI might rapidly acquire a DSA, then the question of “soft takeoff or hard” seems too narrow. A hard takeoff would be sufficient, but not necessary for rapidly acquiring a DSA. The more relevant question would be, which competencies does the AI need to master, and at what level relative to humans, in order to acquire a DSA?
Considering this question in more detail reveals a natural reason for why most previous analyses have focused on a hard takeoff specifically. Plausibly, for the AI to acquire a DSA, its level in some offensive capability must overcome humanity’s defensive capabilities. A hard takeoff presumes that the AI becomes so vastly superior to humans in every respect that this kind of an advantage can be taken for granted.
As an example scenario which does not require a hard takeoff, suppose that an AI achieves a capability at biowarfare offense that overpowers biowarfare defense, as well as achieving moderate logistics and production skills. It releases deadly plagues that decimate human society, then uses legally purchased drone factories to build up its own infrastructure and to take over abandoned human facilities.
There are several interesting points to note in conjunction with this scenario:
Attack may be easier than defense. Bruce Schneier writes that
Attackers generally benefit from new security technologies before defenders do. They have a first-mover advantage. They’re more nimble and adaptable than defensive institutions like police forces. They’re not limited by bureaucracy, laws, or ethics. They can evolve faster. And entropy is on their side — it’s easier to destroy something than it is to prevent, defend against, or recover from that destruction.
For the most part, though, society still wins. The bad guys simply can’t do enough damage to destroy the underlying social system. The question for us is: can society still maintain security as technology becomes more advanced?
A single plague, once it has evolved or been developed, can require multi-million dollar responses in order to contain it. At the same time, the plague itself is trivial to produce if desired, especially using robots that do not need to fear infection. And creating new variants as new vaccines are developed may be quite easy, requiring the creation – and distribution – of yet more vaccines.
Another point that Schneier has made is that in order to keep something protected, the defenders have to succeed every time, whereas the attacker only needs to succeed once. This may be particularly hard if the attacker is capable of developing an attack that nobody has used before, such as with hijacked airplanes being used against major buildings in the 9/11 attacks, or with the various vulnerabilities that the Snowden leaks revealed the NSA to have been using for extensive eavesdropping.
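The asymmetry can be put into rough numbers. The sketch below is my own illustration rather than anything from Schneier: it assumes that each attack attempt succeeds independently with the same small probability, which real attacks need not do, and the function name and figures are made up for the example.

```python
def breach_probability(p_single: float, attempts: int) -> float:
    """Chance that at least one of `attempts` attacks gets through.

    Assumes (for illustration) that every attempt succeeds independently
    with the same probability `p_single`.
    """
    # The defender must win every time: (1 - p_single) ** attempts.
    # The attacker needs just one success, which is the complement.
    return 1 - (1 - p_single) ** attempts


# A defense that stops 99% of individual attempts, facing 100 attempts.
p_after_100 = breach_probability(0.01, 100)
```

Under these assumptions, a defense that stops 99% of individual attempts is still breached about 63% of the time once the attacker gets a hundred tries.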
Obtaining a DSA may not require extensive intelligence differences. Debates about takeoff scenarios often center around questions such as whether a self-improving AI would quickly hit diminishing returns, and how much room for improvement there is beyond the human level of intelligence. However, these questions may be irrelevant: especially if attack is easier than defense, only a relatively small edge in some crucial competency (such as biological warfare) may be enough to give the AI a DSA.
Exponential growth in the form of normal economic growth may not have produced astounding “fooms” yet, but it has produced plenty of situations where one attacker has gained a temporary advantage over others.
The less the AI cares about human values, the more destructive it may be. An AI which cares mainly about calculating the digits of pi may be willing to destroy human civilization in order to make sure that a potential threat to it is eliminated. This ensures that it can go on calculating the maximum number of digits unimpeded.
However, an AI which was programmed to maximize something like the “happiness of currently-living humans” may be much less willing to risk substantial human deaths. This would force it to focus on less destructive takeover methods, potentially requiring more sophisticated abilities.
It is worth noting that this only applies to AIs whose values are defined in terms of how they affect currently existing humans. An AI that was only maximizing human happiness in general might be willing to destroy all existing humans, and then recreate large numbers of humans in simulations.
In effect, the AI’s values determine the level of intelligence it needs to have in order to achieve the kind of a DSA that’s useful for its purposes.
Any destructive plan requires the ability to rebuild afterwards. It would not be of much use for the AI to destroy all of human civilization, if it was dependent on electricity from human-run power plants, and incapable of building or running its own. Thus, purely offensive capabilities will need to be paired with whatever rebuilding capacities are necessary after the offensive capabilities have been deployed.
This calculation may be upset if the AI believes itself to be in an immediate danger of destruction by humans, and believes that its values will still be more likely to be met in a universe where it continues to exist, even if that requires risking a universe where it cannot rebuild after deploying its attack. Thus, being threatened may force the AI’s hand and cause it to launch a potentially destructive offense even when it’s uncertain of its capability to rebuild.
The rational level of aggressiveness depends on the extent to which the AI can selectively target human resources. Human nations generally avoid creating engineered pandemics and using them against their enemies, because they know that the pandemic could easily spread back to them. An AI with no biological components might be immune to this consideration, allowing it to deploy biological weapons more freely. On the other hand, the AI might e.g. need electricity, a dependence which humans do not share and which might give them an advantage in some situations.
A way to formulate this is that attacks differ in the extent to which they can be selectively targeted. Traditional firearms only damage those targets which they are fired upon, whereas pandemics potentially threaten all the members of a species that they can infect. To the extent that the AI needs to rely on the human economy to produce resources that it needs to survive, attacks threatening the economy also threaten the AI’s resources; these resources are in a sense shared between the AI and humanity, so any attacks which cause indiscriminate damage to those resources are dangerous for both. The more the AI can design attacks which selectively deprive its opponents of resources, the lower the threshold it has for using them.
This blog post was written as part of research funded by the Foundational Research Institute.
Convergent instrumental goals (also basic AI drives) are goals that are useful for pursuing almost any other goal, and are thus likely to be pursued by any agent that is intelligent enough to understand why they’re useful. They are interesting because they may allow us to roughly predict the behavior of even AI systems that are much more intelligent than we are.
Instrumental goals are also a strong argument for why sufficiently advanced AI systems that were indifferent towards human values could be dangerous towards humans, even if they weren’t actively malicious: the AI having instrumental goals such as self-preservation or resource acquisition could come into conflict with human well-being. “The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.”
I’ve thought of a candidate for a new convergent instrumental drive: simplifying the environment to make it more predictable in a way that aligns with your goals.
Motivation: the more interacting components there are in the environment, the harder it is to predict. Go is a harder game than chess because the number of possible moves is larger, and because even a single stone can influence the game in a drastic fashion that’s hard to know in advance. Simplifying the environment will make it possible to navigate using fewer computational resources; this drive could thus be seen as a subdrive of either the cognitive enhancement or the resource acquisition drive.
Examples:
- Game-playing AIs such as AlphaGo trading expected points for lower variance, by making moves that “throw away” points but simplify the game tree and make it easier to compute.
- Programmers building increasing layers of abstraction that hide the details of the lower levels and let the programmers focus on a minimal number of moving parts.
- People acquiring insurance in order to eliminate unpredictable financial swings, sometimes even when they know that the insurance has lower expected value than not buying it.
- Humans constructing buildings with controlled indoor conditions and a stable “weather”.
- “Better the devil you know”; many people being generally averse to change, even when the changes could quite well be a net benefit; status quo bias.
- Ambiguity intolerance in general being a possible adaptation that helps “implement” this drive in humans.
- Arguably, the homeostasis maintained by e.g. human bodies is a manifestation of this drive, in that having a standard environment inside the body reduces evolution’s search space when looking for beneficial features.
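The insurance example can be made concrete with a little arithmetic. The numbers below are invented purely for illustration (a 1% chance of a 10,000 loss and a premium of 150), and `mean_and_variance` is just a helper written for this sketch.

```python
def mean_and_variance(outcomes):
    """Return (expected cost, variance) for a list of (probability, cost) pairs."""
    mean = sum(p * cost for p, cost in outcomes)
    variance = sum(p * (cost - mean) ** 2 for p, cost in outcomes)
    return mean, variance


# Hypothetical figures: a 1% chance of losing 10,000 versus a fixed premium of 150.
uninsured = [(0.01, 10_000.0), (0.99, 0.0)]
insured = [(1.0, 150.0)]

uninsured_mean, uninsured_var = mean_and_variance(uninsured)
insured_mean, insured_var = mean_and_variance(insured)
```

With these figures, buying insurance raises the expected cost from 100 to 150 but brings the variance from 990,000 down to zero. The buyer pays a premium of 50 in expected value for a future that no longer needs to be modeled, which is the same trade a game-playing AI makes when it gives up points to simplify the game tree.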
Hammond, Converse & Grass (1995) previously discussed a similar idea, the “stabilization of environments”, according to which AI systems might be built to “stabilize” their environments so as to make them more suited for themselves, and to be easier to reason about. They listed a number of categories:
- Stability of location: “The most common type of stability that arises in everyday activity relates to the location of commonly used objects. Our drinking glasses end up in the same place every time we do dishes. Our socks are always together in a single drawer. Everything has a place and we enforce everything ending up in its place.”
- Stability of schedule: “Eating dinner at the same time every day or having preset meetings that remain stable over time are two examples of this sort of stability. The main advantage of this sort of stability is that it allows for very effective projection in that it provides fixed points that do not have to be reasoned about. In effect, the fixed nature of certain parts of an overall schedule reduces the size of the problem space that has to be searched.”
- Stability of resource availability: “Many standard plans have a consumable resource as a precondition. If the plans are intended to be used frequently, then availability of the resource cannot be assumed unless it is enforced. A good result of this sort of enforcement is when attempts to use a plan that depends on it will usually succeed. The ideal result is when enforcement is effective enough that the question of availability need not even be raised in connection with running the plan.”
- Stability of satisfaction: “Another type of stability that an agent can enforce is that of the goals that he tends to satisfy in conjunction with each other. For example, people living in apartment buildings tend to check their mail on the way into their apartments. Likewise, many people will stop at a grocery store on the way home from work. In general, people develop habits that cluster goals together into compact plans, even if the goals are themselves unrelated.”
- Stability of plan use: “We often find ourselves using familiar plans to satisfy goals even in the face of wide-ranging possibilities. For example, when one of us travels to conferences, he tends to schedule his flight in to a place as late as he can and plans to leave as late as he can on the last day. This optimizes his time at home and at the conference. It also allows him to plan without knowing anything about the details of the conference schedule. As a result, he has a standard plan that he can run in a wide range of situations without actually planning for them in any detail. It works, because it already deals with the major problems (missing classes at home and important talks at the conference) as part of its structure.”
- Stability of cues: “One effective technique for improving plan performance is to improve the proper activation of a plan rather than improve the plan itself. For example, placing an important paper that needs to be reviewed on his desk before going home improves the likelihood that an agent will see and read it the next day. Marking calendars and leaving notes serves the same sort of purpose.”
As with other goals, there can be exceptions. In particular, adversarial situations may provide an incentive to increase the complexity. For example, Go players who are losing a game will sometimes intentionally complicate the game, in order to make the game harder for their opponent to figure out and in the hopes of getting lucky. And if the AI did think that it had a better ability to deal with the complexity than its competitors did, then it would have an incentive to maintain or even increase the complexity.
Possible implications:
- AIs trying to take over the world may seek to simplify it, though it’s not clear exactly how they would do this. For example, if the world was previously relatively well understood (such as through tapping into existing human research), destructive actions could make it harder to understand, by destroying existing dynamics and creating an environment that hasn’t been studied as much. Similarly, humans often “simplify” their environment by creating additional structure (layers of abstraction, insurance contracts, houses).
- In principle, the AI might also seek to complicate the world, in order to make it harder for its human adversaries to understand its plans. However, the world is already so complicated that it seems hard to make any take-over plan work reliably, even for a superintelligence. That said, the AI could still complicate any given subcomponent of its plans that might be thwarted by an opponent capable of predicting what the AI was doing.
- The need for this drive suggests the need for a high amount of trial-and-error for tasks of any nontrivial complexity; even something as simple as a 19×19 Go game requires many games and building up an intuition of it, for humans and AIs alike. Increasing the amount of complexity in the world might make it harder for AIs to take over…
- …but humans tend to build additional complexity that makes the world easier to understand and control, so it’s not clear whether the world moves in a direction that’s easier or harder to take over.
- These ideas might suggest the creation of “active honeypots”, defensive measures that considerably increase the complexity of modeling the human response and which the AI might need to eliminate in order to execute a takeover, thus tipping off the humans.
This blog post was written as part of research funded by the Foundational Research Institute.
EDIT April 20th: Replaced original graph with a clearer one.
My previous posts have basically been discussing a scenario where a single AI becomes powerful enough to threaten humanity. However, there is no reason to only focus on the scenario with a single AI. Depending on our assumptions, a number of AIs could also emerge at the same time. Here are some considerations.
A single AI
The classic AI risk scenario. Some research group achieves major headway in developing AI, and no others seem to be within reach. For an extended while, it is the success or failure of this AI group that matters.
This would seem relatively unlikely to persist, given the current fierce competition in the AI scene. Whereas a single company could conceivably achieve a major lead in a rare niche with little competition, this seems unlikely to be the case for AI.
A possible exception might be if a company managed to monopolize the domain entirely, or if it had development resources that few others did. For example, companies such as Google and Facebook are currently the only ones with access to large datasets used for machine learning. On the other hand, dependence on such huge datasets is a quirk of current machine learning techniques – an AGI would need the ability to learn from much smaller sets of data. A more plausible crucial asset might be something like supercomputing resources – possibly the first AGIs will need massive amounts of computing power.
Bostrom (2016) discusses the impact of openness on AI development. Bostrom notes that if there is a large degree of openness, and everyone has access to the same algorithms, then hardware may become the primary limiting factor. If the hardware requirements for AI were relatively low, then high openness could lead to the creation of multiple AIs. On the other hand, if hardware was the primary limiting factor and large amounts of hardware were needed, then a few wealthy organizations might be able to monopolize AI for a while.
Branwen (2015) has suggested that hardware production is reliant on a small number of centralized factories that would make easy targets for regulation. This would suggest a possible route by which AI might become amenable to government regulation, limiting the number of AIs deployed.
Similarly, there have been various proposals of government and international regulation of AI development. If successfully enacted, such regulation might limit the number of AIs that were deployed.
Another possible crucial asset would be the possession of a non-obvious breakthrough insight, one which would be hard for other researchers to come up with. If this was kept secret, then a single company might plausibly gain a major lead over others. [how often has something like this actually happened in a non-niche field?]
The plausibility of the single-AI scenario is also affected by the length of a takeoff. If one presumes a takeoff speed that is only a few months, then a single AI scenario seems more likely. Successful AI containment procedures may also increase the chances of there being multiple AIs, as the first AIs remain contained, allowing for other projects to catch up.
Multiple collaborating AIs
A different scenario is one where a number of AIs exist, all pursuing shared goals. This seems most likely to come about if all the AIs are created by the same actor. This scenario is noteworthy because the AIs do not necessarily need to be superintelligent individually, but they may have a superhuman ability to coordinate and put the interest of the group above individual interests (if they even have anything that could be called an individual interest).
This possibility raises the question – if multiple AIs collaborate and share information between each other, to such an extent that the same data can be processed by multiple AIs at a time, how does one distinguish between multiple collaborating AIs and one AI composed of many subunits? This is arguably not a distinction that would “cut reality at the joints”, and the difference may be more a question of degree.
The distinction likely makes more sense if the AIs cannot completely share information between each other, for example because each of them has developed a unique conceptual network and cannot directly integrate information from the others, but has to process it in its own idiosyncratic way.
Multiple AIs with differing goals
A situation with multiple AIs that did not share the same goals could occur if several actors reached the capability for building AIs around the same time. Alternatively, a single organization might deploy multiple AIs intended to achieve different purposes, which might come into conflict if measures to enforce cooperativeness between them failed or were never deployed in the first place (maybe because of an assumption that they would have non-overlapping domains).
One effect of having multiple groups developing AIs is that this scenario may remove the possibilities of stopping to pursue further safety measures before deploying the AI, or of deploying an AI with safeguards that reduce performance (Bostrom 2016). If the actor that deploys the most effective AI earliest on can dominate others who take more time, then the more safety-conscious actors may never have the time to deploy their AIs.
Even if none of the AI projects chose to deploy their AIs carelessly, the more AI projects there are, the more likely it becomes that at least one of them will have their containment procedures fail.
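The arithmetic behind this point can be sketched in a few lines. This is only an illustration under strong assumptions: each project is assumed to fail independently and with the same probability, and the 5% figure is a made-up placeholder rather than an estimate of anything.

```python
def prob_any_containment_failure(p_fail: float, n_projects: int) -> float:
    """Chance that at least one of n projects has a containment failure.

    Assumes (hypothetically) that failures are independent and that every
    project has the same failure probability p_fail.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if n_projects < 0:
        raise ValueError("n_projects must be non-negative")
    # Complement of "every single project stays contained".
    return 1.0 - (1.0 - p_fail) ** n_projects


# Placeholder per-project failure probability, for illustration only.
for n in (1, 5, 20, 50):
    print(n, round(prob_any_containment_failure(0.05, n), 3))
```

Even with a small per-project probability, the chance of at least one failure climbs quickly as the number of projects grows.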
The possibility has been raised that having multiple AIs with conflicting goals would be a good thing, in that it would allow humanity to play the AIs against each other. This is far from obvious, for it is not clear why humans wouldn’t simply be caught in the crossfire. In a situation with superintelligent agents around, it seems more likely that humans would be the ones that would be played with.
Bostrom (2016) also notes that unanticipated interactions between AIs already happen even with very simple systems, such as in the interactions that led to the Flash Crash, and that AIs that reasoned in non-human ways could be particularly difficult for humans to anticipate once they started basing their behavior on what the other AIs did.
A model with assumptions
Here’s a new graphical model about an AI scenario, embodying a specific set of assumptions. This one tries to take a look at some of the factors that influence whether there might be a single or several AIs.
This model both makes a great number of assumptions, AND leaves out many important ones! For example, although I discussed openness above, openness is not explicitly included in this model. By sharing this, I’m hoping to draw commentary on 1) which assumptions people feel are the most shaky and 2) which additional ones are valid and should be explicitly included. I’ll focus on those ones in future posts.
Written explanations of the model:
We may end up in a scenario where there is (for a while) only a single or a small number of AIs if at least one of the following is true:
- The breakthrough needed for creating AI is highly non-obvious, so that it takes a long time for competitors to figure it out
- AI requires a great amount of hardware and only a few of the relevant players can afford to run it
- There is effective regulation, only allowing some authorized groups to develop AI
We may end up with effective regulation at least if:
- AI requires a great amount of hardware, and hardware is effectively regulated
(this is not meant to be the only way by which effective regulation can occur, just the only one that was included in this flowchart)
We may end up in a scenario where there are a large number of AIs if:
- There is a long takeoff and competition to build them (i.e. ineffective regulation)
If there are few AIs, and the people building them take their time to invest in value alignment and/or are prepared to build AIs that are value-aligned even if that makes them less effective, then there may be a positive outcome.
If people building AIs do not do these things, then AIs are not value aligned and there may be a negative outcome.
If there are many AIs and there are people who are ready to invest time/efficiency in value-aligned AI, then those AIs may become outcompeted by AIs whose creators did not invest in those things, and there may be a negative outcome.
Not displayed in the diagram because it would have looked messy:
- If there’s a very short takeoff, this can also lead to there only being a single AI, since the first AI to cross a critical threshold may achieve dominance over all the others. However, if there is fierce competition this still doesn’t necessarily leave time for safeguards and taking time to achieve safety – other teams may also be near the critical threshold.
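The written rules above can also be read as a small piece of boolean logic. The following is a rough sketch of that reading, not a definitive encoding of the diagram: the field names are made up for illustration, the very short takeoff case is left out just as it is in the diagram, and checking the few-AI conditions before the many-AI condition is an arbitrary choice, since the model does not say which wins when both hold.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Assumptions:
    # All field names are hypothetical labels for the factors listed above.
    breakthrough_non_obvious: bool
    hardware_heavy: bool               # AI requires a great amount of hardware
    few_can_afford_hardware: bool
    hardware_regulated: bool
    long_takeoff: bool
    competition: bool
    builders_invest_in_alignment: bool


def effective_regulation(a: Assumptions) -> bool:
    # The only route to effective regulation included in the flowchart.
    return a.hardware_heavy and a.hardware_regulated


def few_ais(a: Assumptions) -> bool:
    # "At least one of the following is true".
    return (
        a.breakthrough_non_obvious
        or (a.hardware_heavy and a.few_can_afford_hardware)
        or effective_regulation(a)
    )


def many_ais(a: Assumptions) -> bool:
    # Long takeoff plus competition, i.e. ineffective regulation.
    return a.long_takeoff and a.competition and not effective_regulation(a)


def outcome(a: Assumptions) -> str:
    if few_ais(a):  # arbitrary tie-break: few-AI conditions are checked first
        if a.builders_invest_in_alignment:
            return "may be positive"
        return "may be negative"
    if many_ais(a):
        # Aligned AIs may be outcompeted, or nobody aligned theirs at all.
        return "may be negative"
    return "not covered by the model"


open_race = Assumptions(False, False, False, False, True, True, True)
print(outcome(open_race))  # may be negative

secret_insight = replace(open_race, breakthrough_non_obvious=True)
print(outcome(secret_insight))  # may be positive
```

Writing the rules out this way makes the gaps visible: several combinations of inputs fall through to "not covered by the model", and those are candidates for the additional assumptions asked about above.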
This blog post was written as part of research funded by the Foundational Research Institute.
Previous post in series: AIs gaining a decisive advantage
Series summary: Arguments for risks from general AI are sometimes criticized on the grounds that they rely on a series of linear events, each of which has to occur for the proposed scenario to go through. For example, that a sufficiently intelligent AI could escape from containment, that it could then go on to become powerful enough to take over the world, that it could do this quickly enough without being detected, etc. The intent of this series of posts is to briefly demonstrate that AI risk scenarios are in fact disjunctive: composed of multiple possible pathways, each of which could be sufficient by itself. To successfully control the AI systems, it is not enough to simply block one of the pathways: they all need to be dealt with.
Previously, I drew on arguments from my and Roman Yampolskiy’s paper Responses to Catastrophic AGI Risk, to argue that there are several alternative ways by which AIs could gain a decisive advantage over humanity, any one of which could lead to that outcome. In this post, I will draw on arguments from the same paper to examine another question: what different routes are there for an AI to gain the capability to act autonomously? (this post draws on sections 4.1. and 5.1. of our paper, as well as adding some additional material)
Autonomous AI capability
A somewhat common argument concerning AI risk is that AI systems aren’t a threat because we will keep them contained, or “boxed”, thus limiting what they are allowed to do. How might this line of argument fail?
1. The AI escapes
A common response is that a sufficiently intelligent AI will somehow figure out a way to escape, either by social engineering or by finding an exploitable weakness in the physical security arrangements. This possibility has been extensively discussed in a number of papers, including Chalmers (2012) and Armstrong, Sandberg & Bostrom (2012). Writers have generally been cautious about making strong claims of our ability to keep a mind much smarter than ourselves contained against its will. However, with cautious design it may still be possible to build an AI that has some internal motivation to stay contained, and to combine that with a number of external safeguards monitoring the AI.
2. The AI is voluntarily released
AI confinement assumes that the people building it are motivated to actually keep the AI confined. If a group of cautious researchers builds and successfully contains their AI, this may be of limited benefit if another group later builds an AI that is intentionally set free. Why would anyone do this?
2a. Voluntarily released for economic benefit or competitive pressure
As already discussed in the previous post, the historical trend has been to automate everything that can be automated, both to reduce costs and because machines can do things better than humans can. If you have any kind of a business, you could potentially make it run better by putting a sufficiently sophisticated AI in charge – or even replace all the human employees with one. The AI can think faster and smarter, deal with more information at once, and work for a unified purpose rather than have its efficiency weakened by the kinds of office politics that plague any large organization.
The trend towards automation has been going on throughout history, doesn’t show any signs of stopping, and inherently involves giving the AI systems whatever agency they need in order to run the company better. If your competitors are having AIs run their company and you don’t, you’re likely to be outcompeted, so you’ll want to make sure your AIs are smarter and more capable of acting autonomously than the AIs of the competitors. These pressures are likely to first show up when AIs are still comfortably narrow, and intensify even as the AIs gradually develop towards general intelligence.
The trend towards giving AI systems more power and autonomy might be limited by the fact that doing this poses large risks for the company if the AI malfunctions. This limits the extent to which major, established companies might adopt AI-based control, but incentivizes startups to try to invest in autonomous AI in order to outcompete the established players. There currently also exists the field of algorithmic trading, where AI systems are trusted with enormous sums of money despite the potential to make enormous losses – in 2012, Knight Capital lost $440 million due to a glitch in their software. This suggests that even if a malfunctioning AI could potentially cause major risks, some companies will still be inclined to invest in placing their business under autonomous AI control if the potential profit is large enough.
The trend towards giving AI systems more autonomy can also be seen in the military domain. Wallach and Allen (2012) discuss the topic of autonomous robotic weaponry and note that the US military is seeking to eventually transition to a state where the human operators of robot weapons are “on the loop” rather than “in the loop.” In other words, whereas a human was previously required to explicitly give the order before a robot was allowed to initiate possibly lethal activity, in the future humans are meant to merely supervise the robot’s actions and interfere if something goes wrong.
Human Rights Watch (2012) reports on a number of military systems which are becoming increasingly autonomous, with the human oversight for automatic weapons defense systems (designed to detect and shoot down incoming missiles and rockets) already being limited to accepting or overriding the computer’s plan of action in a matter of seconds, which may be too little to make a meaningful decision in practice. Although these systems are better described as automatic, carrying out preprogrammed sequences of actions in a structured environment, than autonomous, they are a good demonstration of a situation where rapid decisions are needed and the extent of human oversight is limited. A number of militaries are considering the future use of more autonomous weapons.
2b. Voluntarily released for aesthetic, ethical, or philosophical reasons
A few thinkers (such as Gunkel 2012) have raised the question of moral rights for machines, and not everyone necessarily agrees that confining an AI is ethically acceptable. Even if the designer of an AI knew that it did not have a process that corresponded to the ability to suffer, they might come to view it as something like their child, and feel that it deserved the right to act autonomously.
2c. Voluntarily released due to confidence in the AI’s safety
For a research team to keep an AI confined, they need to take seriously the possibility of it being dangerous in the first place. Current AI research doesn’t involve any confinement safeguards, as the researchers reasonably believe that their systems are nowhere near general intelligence yet. Many systems are also connected directly to the Internet. Hopefully safeguards will begin to be implemented once the researchers feel that their system might start having more general capability, but this will depend on the safety culture of the AI research community in general, and the specific research group in particular.
In addition to believing that the AI is insufficiently capable of being a threat, the researchers may also (correctly or incorrectly) believe that they have succeeded in making the AI aligned with human values, so that it will not have any motivation to harm humans.
2d. Voluntarily released due to desperation
Miller (2012) points out that if a person was close to death, due to natural causes, being on the losing side of a war, or any other reason, they might set even a potentially dangerous AGI system free. This would be a rational course of action as long as they primarily valued their own survival and thought that even a small chance of the AGI saving their life was better than a near-certain death.
3. The AI remains contained, but ends up effectively in control anyway
Even if humans were technically kept in the loop, they might not have the time, opportunity, motivation, intelligence, or confidence to verify the advice given by an AI. This would particularly be the case after the AI had functioned for a while, and established a reputation as trustworthy. It may become common practice to act automatically on the AI’s recommendations, and it may become increasingly difficult to challenge the ‘authority’ of the recommendations. Eventually, the AI may in effect begin to dictate decisions (Friedman and Kahn 1992).
Likewise, Bostrom and Yudkowsky (2011) point out that modern bureaucrats often follow established procedures to the letter, rather than exercising their own judgment and allowing themselves to be blamed for any mistakes that follow. Dutifully following all the recommendations of an AI system would be an even better way of avoiding blame.
Wallach and Allen (2012) note the existence of robots which attempt to automatically detect the locations of hostile snipers and to point them out to soldiers. To the extent that these soldiers have come to trust the robots, they could be seen as carrying out the robots’ orders. Eventually, equipping the robot with its own weapons would merely dispense with the formality of needing to have a human to pull the trigger.
Merely developing ways to keep AIs confined is not a sufficient route to ensure that they cannot become an existential risk – even if we knew that those ways worked. Various groups may have different reasons to create autonomously-acting AIs that are intentionally allowed to act by themselves, and even an AI that was successfully kept contained might still end up dictating human decisions in practice. All of these issues will need to be considered in order to keep advanced AIs safe.
This blog post was written as part of research funded by the Foundational Research Institute.
Arguments for risks from general AI are sometimes criticized on the grounds that they rely on a series of linear events, each of which has to occur for the proposed scenario to go through. For example, that a sufficiently intelligent AI could escape from containment, that it could then go on to become powerful enough to take over the world, that it could do this quickly enough without being detected, etc.
The intent of my following series of posts is to briefly demonstrate that AI risk scenarios are in fact disjunctive: composed of multiple possible pathways, each of which could be sufficient by itself. To successfully control the AI systems, it is not enough to simply block one of the pathways: they all need to be dealt with.
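The difference between the two readings can be made concrete with a little arithmetic. This is a toy sketch only: the probabilities are placeholders with no empirical basis, and both functions assume that the steps or pathways are independent of each other, which real scenarios need not be.

```python
from math import prod


def conjunctive_risk(step_probs):
    """Risk if every step in a single chain has to happen."""
    return prod(step_probs)


def disjunctive_risk(pathway_probs):
    """Risk if any one of several pathways is sufficient by itself."""
    # Complement of "every pathway fails to lead to the outcome".
    return 1.0 - prod(1.0 - p for p in pathway_probs)


placeholder = [0.3, 0.3, 0.3, 0.3]  # made-up numbers, for illustration only
print(round(conjunctive_risk(placeholder), 4))  # 0.0081
print(round(disjunctive_risk(placeholder), 4))  # 0.7599
```

The same four numbers give a very small total when read as a chain and a large one when read as alternatives, which is why the structure of the argument matters. Blocking a single pathway only removes one factor from the second product.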
In this post, I will be drawing on arguments discussed in my and Roman Yampolskiy’s paper, Responses to Catastrophic AGI Risk (section 2), and focusing on one particular component of AI risk scenarios: AIs gaining a decisive advantage over humanity. Follow-up posts will discuss other disjunctive scenarios discussed in Responses, as well as in other places.
AIs gaining a decisive advantage
Suppose that we built a general AI. How could it become powerful enough to end up threatening humanity?
1. Discontinuity in AI power
The classic scenario is one in which the AI ends up rapidly gaining power, so fast that humans are unable to react. We can say that this is a discontinuous scenario, in that the AI’s power grows gradually until it suddenly leaps to an entirely new level. Responses describes three different ways for this to happen:
1a. Hardware overhang. In a hardware overhang scenario, hardware develops faster than software, so that we’ll have computers with more computing power than the human brain does, but no way of making effective use of all that power. If someone then developed an algorithm for general intelligence that could make effective use of that hardware, we might suddenly have an abundance of cheap hardware that could be used for running thousands or millions of AIs, possibly with a speed of thought much faster than that of humans.
1b. Speed explosion. In a speed explosion scenario, intelligent machines design increasingly faster machines. A hardware overhang might contribute to a speed explosion, but is not required for it. An AI running at the pace of a human could develop a second generation of hardware on which it could run at a rate faster than human thought. It would then require a shorter time to develop a third generation of hardware, allowing it to run faster than on the previous generation, and so on. At some point, the process would hit physical limits and stop, but by that time AIs might come to accomplish most tasks at far faster rates than humans, thereby achieving dominance. In principle, the same process could also be achieved via improved software.
The extent to which the AI needs humans in order to produce better hardware will limit the pace of the speed explosion, so a rapid speed explosion requires the ability to automate a large proportion of the hardware manufacturing process. However, this kind of automation may already be achieved by the time that AI is developed.
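The shrinking design times in a speed explosion can be sketched as a simple loop. The numbers are hypothetical: a first hardware generation that takes two years to design, each generation doubling the speed at which the AI runs, and an optional cap standing in for physical limits. The sketch also assumes that design effort per generation stays constant and that no humans are needed in the loop.

```python
def generation_times(first_design_years, speedup, generations, max_speed=float("inf")):
    """Wall-clock years needed to design each successive hardware generation.

    All parameters are illustrative assumptions; max_speed is a crude
    stand-in for the physical limits that eventually stop the process.
    """
    if first_design_years <= 0 or speedup <= 0 or generations < 0:
        raise ValueError("parameters must be positive (generations >= 0)")
    times = []
    speed = 1.0  # running speed relative to human thought
    for _ in range(generations):
        # The same design work takes less wall-clock time when run faster.
        times.append(first_design_years / speed)
        speed = min(speed * speedup, max_speed)
    return times


times = generation_times(2.0, 2.0, 5)
print(times)       # [2.0, 1.0, 0.5, 0.25, 0.125]
print(sum(times))  # 3.875
```

Without a cap, the total time forms a geometric series that never exceeds four years in this example, however many generations are run. With a cap, the per-generation time stops shrinking once the limit is reached.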
1c. Intelligence explosion. In an intelligence explosion, an AI figures out how to create a qualitatively smarter AI and that smarter AI uses its increased intelligence to create still more intelligent AIs, and so on, such that the intelligence of humankind is quickly left far behind and the machines achieve dominance.
One should note that the three scenarios depicted above are by no means mutually exclusive! A hardware overhang could contribute to a speed explosion which could contribute to an intelligence explosion which could further the speed explosion, and so on. So we are dealing with three basic events, which could then be combined in different ways.
2. Power gradually shifting to AIs
While the traditional AI risk scenario involves a single AI rapidly acquiring power (a “hard takeoff”), society is also gradually becoming more and more automated, with machines running an increasing share of things. There is a risk that AI systems that were initially simple and of limited intelligence would gradually gain increasing power and responsibilities as they learned and were upgraded, until large parts of society were under the AI’s control – and it might not remain docile forever.
Labor is automated for reasons of cost, efficiency and quality. Once a machine becomes capable of performing a task as well as (or almost as well as) a human, the cost of purchasing and maintaining it may be less than the cost of having a salaried human perform the same task. In many cases, machines are also capable of doing the same job faster, for longer periods and with fewer errors.
If workers can be affordably replaced by developing more sophisticated AI, there is a strong economic incentive to do so. This is already happening with narrow AI, which often requires major modifications or even a complete redesign in order to be adapted for new tasks. To the extent that an AI could learn to do many kinds of tasks—or even any kind of task—without needing an extensive re-engineering effort, the AI could make the replacement of humans by machines much cheaper and more profitable. As more tasks become automated, the bottlenecks for further automation will require adaptability and flexibility that narrow-AI systems are incapable of. These will then make up an increasing portion of the economy, further strengthening the incentive to develop AI – as well as to turn over control to it.
Conclusion. This gives a total of four different scenarios by which AIs could gain a decisive advantage over humans. And note that, just as scenarios 1a-1c were not mutually exclusive, neither is scenario 2 mutually exclusive with scenarios 1a-1c! An AI that had gradually acquired a great deal of power could at some point also find a way to make itself far more powerful than before – and it could already have been very powerful.
This blog post was written as part of research funded by the Foundational Research Institute.
Yesterday evening I went to the grocery store, and was startled to realize that I was suddenly in a totally different world.
Computer games have difficulty grabbing me these days. Many of the genres I used to enjoy as a kid have lost their appeal: point-and-click-style adventure requires patience and careful thought, but I already deal with plenty of things that require patience and careful thought in real life, so for games I want something different. 4X games mostly seem like pure numerical optimization exercises these days, and have lost that feel of discovery and sense of wonder. In general, I used to like genres like turn-based strategy or adventure that had no time constraints, but those now usually feel too slow-paced to pull me in; whereas pure action games I’ve never been particularly good at. (I tried Middle-Earth: Shadow of Mordor for a bit recently, and quit after a very frustrating two hours where I attempted a simple beginning quest about a dozen times, only to be killed by the same orc each time.)
Like the previous XCOM remake, Firaxis’s XCOM2 managed the magic of transporting me completely elsewhere, in the same way that some of my childhood classics did. I did not even properly realize how deeply I’d become immersed in the game, until I went outside, and the sheer differentness of the real world and the game world startled me – somewhat similar to the shock of jumping into cold water, your body suddenly and obviously piercing through a surface that separates two different realms of existence.
A good description of my experience with the game comes, oddly enough, from Michael Vassar describing something that’s seemingly completely different. He talks about the way that two people, acting together, can achieve such a state of synchrony that they seem to meld into a single being:
In real-time domains, one rapidly assesses the difficulty of a challenge. If the difficulty seems manageable, one simply does, with no holding back, reflecting, doubting, or trying to figure out how one does. Figuring out how something is done implicitly by a neurological process which is integrated with doing. Under such circumstances, acting intuitively in real time, the question of whether an action is selfish or altruistic or both or neither never comes up, thus in such a flow state one never knows whether one is acting cooperatively, competitively, or predatorily. People with whom you are interacting […] depend on the fact that you and they are in a flow-state together. In so far as they and you become an integrated process, your actions flow from their agency as well as your own[.]
XCOM2 is not actually a real-time game: it is firmly turn-based. Yet your turns are short and intense, and the game’s overall aesthetics reinforce a feeling of rapid action and urgency. There is a sense in which it feels like the player and the game become melded together, there being a constant push-and-pull in which you act and the game responds; the game acts and you respond. A feeling of complete immersion and synchrony with your environment, with a perfect balance between the amount of time that it pays to think and the amount of time that it pays to act, so that the pace neither slows down to a crawl nor becomes one of rushed doing without understanding.
It is in some ways a scary effect: returning to the mundaneness of the real world, there was a strong sense of “it’s so sad that all of my existence can’t be spent playing games like that”, and a corresponding realization of how dangerous that sentiment was. Yet it felt very different from the archetypical addiction: there wasn’t that feel of an addict’s understanding of how ultimately dysfunctional the whole thing was, or struggling against something which you knew was harmful and of no real redeeming value. Rather, it felt like a taste of what human experience should be like, of how sublime and engaging our daily reality could be, but rarely is.
Jane McGonigal writes, in her book Reality is Broken:
Where, in the real world, is that gamer sense of being fully alive, focused, and engaged in every moment? Where is the gamer feeling of power, heroic purpose, and community? Where are the bursts of exhilarating and creative game accomplishment? Where is the heart-expanding thrill of success and team victory? While gamers may experience these pleasures occasionally in their real lives, they experience them almost constantly when they’re playing their favorite games. […]
Reality, compared to games, is broken. […]
The truth is this: in today’s society, computer and video games are fulfilling genuine human needs that the real world is currently unable to satisfy. Games are providing rewards that reality is not. They are teaching and inspiring and engaging us in ways that reality is not. They are bringing us together in ways that reality is not.
If enough good games were available, it would be easy to just get lost in games, to escape the brokenness of reality and retreat to a more perfect world. Perhaps I’m lucky in that I rarely encounter games of this caliber, that would be so much more moment-to-moment fulfilling than the real world is. Firaxis’s previous XCOM also had a similar immersive effect on me, but eventually I learned the game and it ceased to hold new surprises, and it lost its hold. Eventually the sequel will also have most of its magic worn away.
It’s likely better this way. This way it can function for me the way that art should: not as a mindless escape, but as a moment of beauty that reminds us that it’s possible to have a better world than this. As a reminder that we can work to bring the world closer to that.
What if we decided to use everything we know about game design to fix what’s wrong with reality? What if we started to live our real lives like gamers, lead our real businesses and communities like game designers, and think about solving real-world problems like computer and video game theorists? […]
Instead of providing gamers with better and more immersive alternatives to reality, I want all of us to be responsible for providing the world at large with a better and more immersive reality […] take everything game developers have learned about optimizing human experience and organizing collaborative communities and apply it to real life
We can do that.