
EDIT April 20th: Replaced original graph with a clearer one.

My previous posts have basically been discussing a scenario where a single AI becomes powerful enough to threaten humanity. However, there is no reason to only focus on the scenario with a single AI. Depending on our assumptions, a number of AIs could also emerge at the same time. Here are some considerations.

A single AI

The classic AI risk scenario. Some research group achieves major headway in developing AI, and no others seem to be within reach. For an extended while, it is the success or failure of this AI group that matters.

This would seem relatively unlikely to persist, given the current fierce competition in the AI scene. Whereas a single company could conceivably achieve a major lead in a rare niche with little competition, this seems unlikely to be the case for AI.

A possible exception might be if a company managed to monopolize the domain entirely, or if it had development resources that few others did. For example, companies such as Google and Facebook are currently the only ones with access to large datasets used for machine learning. On the other hand, dependence on such huge datasets is a quirk of current machine learning techniques – an AGI would need the ability to learn from much smaller sets of data. A more plausible crucial asset might be something like supercomputing resources – possibly the first AGIs will need massive amounts of computing power.

Bostrom (2016) discusses the impact of openness on AI development. Bostrom notes that if there is a large degree of openness, and everyone has access to the same algorithms, then hardware may become the primary limiting factor. If the hardware requirements for AI were relatively low, then high openness could lead to the creation of multiple AIs. On the other hand, if hardware was the primary limiting factor and large amounts of hardware were needed, then a few wealthy organizations might be able to monopolize AI for a while.

Branwen (2015) has suggested that hardware production is reliant on a small number of centralized factories that would make easy targets for regulation. This would suggest a possible route by which AI might become amenable to government regulation, limiting the number of AIs deployed.

Similarly, there have been various proposals of government and international regulation of AI development. If successfully enacted, such regulation might limit the number of AIs that were deployed.

Another possible crucial asset would be the possession of a non-obvious breakthrough insight, one which would be hard for other researchers to come up with. If this were kept secret, then a single company might plausibly gain a major head start over its competitors. [how often has something like this actually happened in a non-niche field?]

The plausibility of the single-AI scenario is also affected by the length of the takeoff. If one presumes a takeoff that takes only a few months, then a single-AI scenario seems more likely. Successful AI containment procedures may also increase the chances of there being multiple AIs, as the first AIs remain contained, allowing other projects to catch up.

Multiple collaborating AIs

A different scenario is one where a number of AIs exist, all pursuing shared goals. This seems most likely to come about if all the AIs are created by the same actor. This scenario is noteworthy because the AIs do not necessarily need to be superintelligent individually, but they may have a superhuman ability to coordinate and put the interest of the group above individual interests (if they even have anything that could be called an individual interest).

This possibility raises the question – if multiple AIs collaborate and share information between each other, to such an extent that the same data can be processed by multiple AIs at a time, how does one distinguish between multiple collaborating AIs and one AI composed of many subunits? This is arguably not a distinction that would “cut reality at the joints”, and the difference may be more a question of degree.

The distinction likely makes more sense if the AIs cannot completely share information between each other, such as because each of them has developed a unique conceptual network, and cannot directly integrate information from the others but has to process it in its own idiosyncratic way.

Multiple AIs with differing goals

A situation with multiple AIs that did not share the same goals could occur if several actors reached the capability for building AIs around the same time. Alternatively, a single organization might deploy multiple AIs intended to achieve different purposes, which might come into conflict if measures to enforce cooperativeness between them failed or were never deployed in the first place (maybe because of an assumption that they would have non-overlapping domains).

One effect of having multiple groups developing AIs is that this scenario may remove the possibility of pausing to pursue further safety measures before deploying the AI, or of deploying an AI with safeguards that reduce its performance (Bostrom 2016). If the actor that deploys the most effective AI earliest on can dominate others who take more time, then the more safety-conscious actors may never have the time to deploy their AIs.

Even if none of the AI projects chose to deploy their AIs carelessly, the more AI projects there are, the more likely it becomes that at least one of them will have their containment procedures fail.

The possibility has been raised that having multiple AIs with conflicting goals would be a good thing, in that it would allow humanity to play the AIs against each other. This seems far from obvious, for it is not clear why humans wouldn’t simply be caught in the crossfire. In a situation with superintelligent agents around, it seems more likely that humans would be the ones being played.

Bostrom (2016) also notes that unanticipated interactions between AIs already happen even with very simple systems, such as in the interactions that led to the Flash Crash, and that AIs which reasoned in non-human ways could be especially difficult for humans to anticipate once they started basing their behavior on what the other AIs did.

A model with assumptions

[Figure: GraphViz flowchart of the model described below]

Here’s a new graphical model about an AI scenario, embodying a specific set of assumptions. This one tries to take a look at some of the factors that influence whether there might be a single or several AIs.

This model both makes a great number of assumptions AND leaves out many important ones! For example, although I discussed openness above, openness is not explicitly included in this model. By sharing this, I’m hoping to draw commentary on 1) which assumptions people feel are the shakiest and 2) which additional ones are valid and should be explicitly included. I’ll focus on those in future posts.

Written explanations of the model:

We may end up in a scenario where there is (for a while) only a single or a small number of AIs if at least one of the following is true:

  • The breakthrough needed for creating AI is highly non-obvious, so that it takes a long time for competitors to figure it out
  • AI requires a great amount of hardware and only a few of the relevant players can afford to run it
  • There is effective regulation, only allowing some authorized groups to develop AI

We may end up with effective regulation at least if:

  • AI requires a great amount of hardware, and hardware is effectively regulated

(this is not meant to be the only way by which effective regulation can occur, just the only one that was included in this flowchart)

We may end up in a scenario where there are a large number of AIs if:

  • There is a long takeoff and competition to build them (i.e. ineffective regulation)

If there are few AIs, and the people building them take their time to invest in value alignment and/or are prepared to build AIs that are value-aligned even if that makes them less effective, then there may be a positive outcome.

If the people building AIs do not do these things, then the AIs are not value-aligned and there may be a negative outcome.

If there are many AIs and there are people who are ready to invest time and efficiency in value-aligned AI, then those AIs may be outcompeted by AIs whose creators did not invest in those things, and there may be a negative outcome.

Not displayed in the diagram because it would have looked messy:

  • If there’s a very short takeoff, this can also lead to there only being a single AI, since the first AI to cross a critical threshold may achieve dominance over all the others. However, if there is fierce competition, this still doesn’t necessarily leave time for safeguards or for taking the time to achieve safety – other teams may also be near the critical threshold.
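To make the conditional structure above easier to inspect, here is a minimal Python sketch of the same flowchart logic. It is only an illustrative translation of the written explanations: the function and variable names, the coarse few/many split, and the outcome labels are simplifications of my own choosing, and factors such as openness are again left out.

```python
# Minimal sketch of the flowchart logic described above.
# The names and the coarse outcome labels are illustrative simplifications.

def number_of_ais(breakthrough_nonobvious: bool,
                  needs_lots_of_hardware: bool,
                  few_can_afford_hardware: bool,
                  hardware_regulated: bool,
                  long_takeoff: bool) -> str:
    """Return 'few' or 'many' AIs according to the assumptions listed above."""
    # Effective regulation is modeled here only via the hardware route.
    effective_regulation = needs_lots_of_hardware and hardware_regulated
    if (breakthrough_nonobvious
            or (needs_lots_of_hardware and few_can_afford_hardware)
            or effective_regulation):
        return "few"
    if long_takeoff and not effective_regulation:
        return "many"
    return "unclear"  # e.g. a very short takeoff; see the note above


def outcome(n_ais: str, builders_invest_in_alignment: bool) -> str:
    """Coarse outcome under the assumptions of the model."""
    if n_ais == "few":
        return "positive" if builders_invest_in_alignment else "negative"
    if n_ais == "many":
        # Safety-conscious projects risk being outcompeted.
        return "negative"
    return "unclear"


if __name__ == "__main__":
    n = number_of_ais(breakthrough_nonobvious=False,
                      needs_lots_of_hardware=True,
                      few_can_afford_hardware=True,
                      hardware_regulated=False,
                      long_takeoff=True)
    print(n, outcome(n, builders_invest_in_alignment=True))  # -> few positive
```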

This blog post was written as part of research funded by the Foundational Research Institute.



Previous post in series: AIs gaining a decisive advantage

Series summary: Arguments for risks from general AI are sometimes criticized on the grounds that they rely on a series of linear events, each of which has to occur for the proposed scenario to go through. For example, that a sufficiently intelligent AI could escape from containment, that it could then go on to become powerful enough to take over the world, that it could do this quickly enough without being detected, etc. The intent of this series of posts is to briefly demonstrate that AI risk scenarios are in fact disjunctive: composed of multiple possible pathways, each of which could be sufficient by itself. To successfully control the AI systems, it is not enough to simply block one of the pathways: they all need to be dealt with.

Previously, I drew on arguments from my and Roman Yampolskiy’s paper Responses to Catastrophic AGI Risk, to argue that there are several alternative ways by which AIs could gain a decisive advantage over humanity, any one of which could lead to that outcome. In this post, I will draw on arguments from the same paper to examine another question: what different routes are there for an AI to gain the capability to act autonomously? (This post draws on sections 4.1. and 5.1. of our paper, as well as adding some additional material.)

Autonomous AI capability

A somewhat common argument concerning AI risk is that AI systems aren’t a threat because we will keep them contained, or “boxed”, thus limiting what they are allowed to do. How might this line of argument fail?

1. The AI escapes

[Figure: the AI escapes]

A common response is that a sufficiently intelligent AI will somehow figure out a way to escape, either by social engineering or by finding an exploitable weakness in the physical security arrangements. This possibility has been extensively discussed in a number of papers, including Chalmers (2012) and Armstrong, Sandberg & Bostrom (2012). Writers have generally been cautious about making strong claims about our ability to keep a mind much smarter than ourselves contained against its will. However, with careful design it may still be possible to give an AI some internal motivation to stay contained, and to combine that with a number of external safeguards monitoring the AI.

2. The AI is voluntarily released

[Figure: the AI is voluntarily released]

AI confinement assumes that the people building it are motivated to actually keep the AI confined. If a group of cautious researchers builds and successfully contains their AI, this may be of limited benefit if another group later builds an AI that is intentionally set free. Why would anyone do this?

2a. Voluntarily released for economic benefit or competitive pressure

As already discussed in the previous post, the historical trend has been to automate everything that can be automated, both to reduce costs and because machines can do things better than humans can. If you have any kind of a business, you could potentially make it run better by putting a sufficiently sophisticated AI in charge – or even replace all the human employees with one. The AI can think faster and smarter, deal with more information at once, and work for a unified purpose rather than have its efficiency weakened by the kinds of office politics that plague any large organization.

The trend towards automation has been going on throughout history, doesn’t show any signs of stopping, and inherently involves giving the AI systems whatever agency they need in order to run the company better. If your competitors are having AIs run their company and you don’t, you’re likely to be outcompeted, so you’ll want to make sure your AIs are smarter and more capable of acting autonomously than the AIs of the competitors. These pressures are likely to first show up when AIs are still comfortably narrow, and intensify even as the AIs gradually develop towards general intelligence.

The trend towards giving AI systems more power and autonomy might be limited by the fact that doing this poses large risks for the company if the AI malfunctions. This limits the extent to which major, established companies might adopt AI-based control, but incentivizes startups to try to invest in autonomous AI in order to outcompete the established players. There currently also exists the field of algorithmic trading, where AI systems are trusted with enormous sums of money despite the potential to make enormous losses – in 2012, Knight Capital lost $440 million due to a glitch in their software. This suggests that even if a malfunctioning AI could potentially cause major risks, some companies will still be inclined to invest in placing their business under autonomous AI control if the potential profit is large enough.

The trend towards giving AI systems more autonomy can also be seen in the military domain. Wallach and Allen (2012) discuss the topic of autonomous robotic weaponry and note that the US military is seeking to eventually transition to a state where the human operators of robot weapons are “on the loop” rather than “in the loop.” In other words, whereas a human was previously required to explicitly give the order before a robot was allowed to initiate possibly lethal activity, in the future humans are meant to merely supervise the robot’s actions and interfere if something goes wrong.

Human Rights Watch (2012) reports on a number of military systems which are becoming increasingly autonomous. For automatic weapons defense systems—designed to detect and shoot down incoming missiles and rockets—human oversight is already limited to accepting or overriding the computer’s plan of action within a matter of seconds, which may be too little time to make a meaningful decision in practice. Although these systems are better described as automatic, carrying out preprogrammed sequences of actions in a structured environment, than as autonomous, they are a good demonstration of a situation where rapid decisions are needed and the extent of human oversight is limited. A number of militaries are considering the future use of more autonomous weapons.

2b. Voluntarily released for aesthetic, ethical, or philosophical reasons

A few thinkers (such as Gunkel 2012) have raised the question of moral rights for machines, and not everyone necessarily agrees that confining an AI is ethically acceptable. Even if the designer of an AI knew that it did not have a process that corresponded to the ability to suffer, they might come to view it as something like their child, and feel that it deserved the right to act autonomously.

2c. Voluntarily released due to confidence in the AI’s safety

For a research team to keep an AI confined, they need to take seriously the possibility of it being dangerous in the first place. Current AI research doesn’t involve any confinement safeguards, as the researchers reasonably believe that their systems are nowhere near general intelligence yet. Many systems are also connected directly to the Internet. Hopefully safeguards will begin to be implemented once the researchers feel that their system might start having more general capability, but this will depend on the safety culture of the AI research community in general, and the specific research group in particular.

In addition to believing that the AI is insufficiently capable of being a threat, the researchers may also (correctly or incorrectly) believe that they have succeeded in making the AI aligned with human values, so that it will not have any motivation to harm humans.

2d. Voluntarily released due to desperation

Miller (2012) points out that if a person was close to death, due to natural causes, being on the losing side of a war, or any other reason, they might turn even a potentially dangerous AGI system free. This would be a rational course of action as long as they primarily valued their own survival and thought that even a small chance of the AGI saving their life was better than a near-certain death.

3. The AI remains contained, but ends up effectively in control anyway

Even if humans were technically kept in the loop, they might not have the time, opportunity, motivation, intelligence, or confidence to verify the advice given by an AI. This would particularly be the case after the AI had functioned for a while, and established a reputation as trustworthy. It may become common practice to act automatically on the AI’s recommendations, and it may become increasingly difficult to challenge the ‘authority’ of the recommendations. Eventually, the AI may in effect begin to dictate decisions (Friedman and Kahn 1992).

Likewise, Bostrom and Yudkowsky (2011) point out that modern bureaucrats often follow established procedures to the letter, rather than exercising their own judgment and allowing themselves to be blamed for any mistakes that follow. Dutifully following all the recommendations of an AI system would be an even better way of avoiding blame.

Wallach and Allen (2012) note the existence of robots which attempt to automatically detect the locations of hostile snipers and to point them out to soldiers. To the extent that these soldiers have come to trust the robots, they could be seen as carrying out the robots’ orders. Eventually, equipping the robot with its own weapons would merely dispense with the formality of needing to have a human to pull the trigger.

Conclusion.

[Figure: routes to autonomous AI capability]

Merely developing ways to keep AIs confined is not a sufficient route to ensure that they cannot become an existential risk – even if we knew that those ways worked. Various groups may have different reasons to create autonomously-acting AIs that are intentionally allowed to act by themselves, and even an AI that was successfully kept contained might still end up dictating human decisions in practice. All of these issues will need to be considered in order to keep advanced AIs safe.

This blog post was written as part of research funded by the Foundational Research Institute.



Arguments for risks from general AI are sometimes criticized on the grounds that they rely on a series of linear events, each of which has to occur for the proposed scenario to go through. For example, that a sufficiently intelligent AI could escape from containment, that it could then go on to become powerful enough to take over the world, that it could do this quickly enough without being detected, etc.

The intent of my following series of posts is to briefly demonstrate that AI risk scenarios are in fact disjunctive: composed of multiple possible pathways, each of which could be sufficient by itself. To successfully control the AI systems, it is not enough to simply block one of the pathways: they all need to be dealt with.

In this post, I will be drawing on arguments discussed in my and Roman Yampolskiy’s paper, Responses to Catastrophic AGI Risk (section 2), and focusing on one particular component of AI risk scenarios: AIs gaining a decisive advantage over humanity. Follow-up posts will discuss other disjunctive scenarios discussed in Responses, as well as in other places.

AIs gaining a decisive advantage

Suppose that we built a general AI. How could it become powerful enough to end up threatening humanity?

1. Discontinuity in AI power

[Figure: discontinuity in AI power]

The classic scenario is one in which the AI ends up rapidly gaining power, so fast that humans are unable to react. We can say that this is a discontinuous scenario, in that the AI’s power grows gradually until it suddenly leaps to an entirely new level. Responses describes three different ways for this to happen:

1a. Hardware overhang. In a hardware overhang scenario, hardware develops faster than software, so that we’ll have computers with more computing power than the human brain does, but no way of making effective use of all that power. If someone then developed an algorithm for general intelligence that could make effective use of that hardware, we might suddenly have an abundance of cheap hardware that could be used for running thousands or millions of AIs, possibly with a speed of thought much faster than that of humans.

1b. Speed explosion. In a speed explosion scenario, intelligent machines design increasingly faster machines. A hardware overhang might contribute to a speed explosion, but is not required for it. An AI running at the pace of a human could develop a second generation of hardware on which it could run at a rate faster than human thought. It would then require a shorter time to develop a third generation of hardware, allowing it to run faster than on the previous generation, and so on. At some point, the process would hit physical limits and stop, but by that time AIs might come to accomplish most tasks at far faster rates than humans, thereby achieving dominance. In principle, the same process could also be achieved via improved software.

The extent to which the AI needs humans in order to produce better hardware will limit the pace of the speed explosion, so a rapid speed explosion requires the ability to automate a large proportion of the hardware manufacturing process. However, this kind of automation may already be achieved by the time that AI is developed.
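To illustrate why this recursion can run its course quickly, here is a toy Python sketch: each hardware generation multiplies the AI’s speed, which shrinks the wall-clock time needed to design the next generation, until an assumed physical limit is reached. The speedup factor, design effort, and limit are arbitrary placeholder numbers rather than estimates.

```python
# Toy illustration of the speed explosion dynamic described above.
# All numbers are arbitrary placeholders, not estimates.

def speed_explosion(speedup_per_generation=2.0,
                    design_effort_years=2.0,   # subjective years of work per generation
                    physical_limit=1e6):       # maximum speed relative to a human
    speed = 1.0       # current AI speed, in human-equivalents
    elapsed = 0.0     # wall-clock years so far
    generation = 0
    while speed < physical_limit:
        # Designing the next generation takes a fixed amount of subjective work,
        # so the wall-clock time needed shrinks as the AI gets faster.
        elapsed += design_effort_years / speed
        speed *= speedup_per_generation
        generation += 1
        print(f"generation {generation}: {speed:9.0f}x human speed "
              f"after {elapsed:5.2f} wall-clock years")

speed_explosion()
```

With these placeholder numbers, the total wall-clock time converges to about four years even though twenty generations of hardware are designed along the way.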

1c. Intelligence explosion. In an intelligence explosion, an AI figures out how to create a qualitatively smarter AI, and that smarter AI uses its increased intelligence to create still more intelligent AIs, and so on, such that the intelligence of humankind is quickly left far behind and the machines achieve dominance.

One should note that the three scenarios depicted above are by no means mutually exclusive! A hardware overhang could contribute to a speed explosion which could contribute to an intelligence explosion which could further the speed explosion, and so on. So we are dealing with three basic events, which could then be combined in different ways.

2. Power gradually shifting to AIs

While the traditional AI risk scenario involves a single AI rapidly acquiring power (a “hard takeoff”), society is also gradually becoming more and more automated, with machines running an increasing share of things. There is a risk that AI systems that were initially simple and of limited intelligence would gradually gain increasing power and responsibilities as they learned and were upgraded, until large parts of society were under the AI’s control – and it might not remain docile forever.

Labor is automated for reasons of cost, efficiency and quality. Once a machine becomes capable of performing a task as well as (or almost as well as) a human, the cost of purchasing and maintaining it may be less than the cost of having a salaried human perform the same task. In many cases, machines are also capable of doing the same job faster, for longer periods and with fewer errors.

If workers can be affordably replaced by developing more sophisticated AI, there is a strong economic incentive to do so. This is already happening with narrow AI, which often requires major modifications or even a complete redesign in order to be adapted for new tasks. To the extent that an AI could learn to do many kinds of tasks—or even any kind of task—without needing an extensive re-engineering effort, the AI could make the replacement of humans by machines much cheaper and more profitable. As more tasks become automated, the remaining bottlenecks for further automation will be tasks that require an adaptability and flexibility that narrow-AI systems lack. These tasks will then make up an increasing portion of the economy, further strengthening the incentive to develop AI – as well as to turn over control to it.
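Purely as an illustration of this incentive, the replacement decision reduces to a simple break-even calculation. The figures in the sketch below are made-up placeholders, not estimates of real costs.

```python
# Illustrative-only cost comparison for automating one job.
# All figures are made-up placeholders.

annual_salary_and_overhead = 60_000   # yearly cost of a human employee
machine_purchase_cost = 150_000       # one-off cost of the automated system
machine_annual_upkeep = 10_000        # maintenance, electricity, licences

def payback_years(salary: float, purchase: float, upkeep: float) -> float:
    """Years until the cumulative savings cover the purchase price."""
    annual_saving = salary - upkeep
    return purchase / annual_saving if annual_saving > 0 else float("inf")

years = payback_years(annual_salary_and_overhead,
                      machine_purchase_cost,
                      machine_annual_upkeep)
print(f"Payback period: {years:.1f} years")  # -> 3.0 years with these numbers
```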

[Figure: routes to AIs gaining a decisive advantage]

Conclusion. This gives a total of four different scenarios by which AIs could gain a decisive advantage over humans. And note that, just as scenarios 1a-1c were not mutually exclusive, neither is scenario 2 mutually exclusive with scenarios 1a-1c! An AI that had gradually acquired a great deal of power could at some point also find a way to make itself far more powerful than before – and it could already have been very powerful.

This blog post was written as part of research funded by the Foundational Research Institute.

