xuenay | Travel diary, days 10-11

Yesterday, I:

- Wrote a blog post ( http://lesswrong.com/lw/23r/the_concepts_problem/ )
- Again wrote some personal e-mails
- Got my intro speech from Justin

Today, I:

- Will either make a new LW post or research one

From today onwards, I will attempt to have a new LW post up on an average of once per two days. We'll see for how long it lasts.

Yesterday, I realized a cause for my insecurities here. I hadn't been given any guidelines of how much I should be achieving, so to make up for that, I was imposing myself requirements that were possibly way too strict. So during our daily ten-minute meeting, I asked Anna about it. She was reluctant to give a direct answer ("you should do the best you can - what use would it be to place a minimum requirement on you?"), so I reworded it as "well, the Singularity Institute did have some goal in mind when the Visiting Fellows program was instituted, right?" That got me a longer answer (disclaimer: I've probably forgotten some parts of it already). Part of the purpose is to simply bring together people with potential and interest in SIAI/existential risk and improve their skills so that they can benefit the organization's cause even after having left the program. On the other hand, it would also be good if we "did stuff". She specifically mentioned that some people thought Less Wrong was dying (a claim which surprised me, personally) and that it'd be good to get more quality posts up there, especially ones of a slightly more technical nature. Furthermore, we should try to look productive so as to inspire new visitors to be productive as well, plus to build a growth atmosphere in general.

Justin also explained to me his conception of what SIAI's long-term strategy should look like. Briefly, growth -> (global) intelligence enhancement -> uploads -> Friendly AI. Right now, the organization should concentrate on outreach and teaching activities and seek to grow, then attempting to leverage its size and resources for a general raising of the sanity waterline as well as for global intelligence enhancement. Eventually, we should get the technology for uploads and for uploading FAI programmers, who could then hopefully build FAI. That's a rather ambitious plan, which I found myself mostly agreeing with. I do think that IA methods are sorely needed in general, and that a partially upload-mediated Singularity would be the safest one, if it's possible. Notably, it makes the operational assumption that real AI is still several decades away. That may or may not be true, but if somebody does have an almost finished AI project in their basement, there probably isn't very much we can do in any case. Justin's going to discuss his plans with more of the SIAI leadership.

People are optimistic about SIAI's growth prospects. Michael Vassar was here some days back, and he mentioned 40% yearly growth as a realistic goal. On the downside, the rapid growth SIAI has had so far has also left things in a somewhat chaotic state, without a cohesive large-scale strategy and different members of the organization being somewhat out of touch of what the others are doing. Justin is hoping to get that fixed.

We finally shared our mindmaps. The instructions for those had been rather loose, so everyone had rather different-looking ones. (Alicorn didn't have a mindmap at all, but rather a long outline and a list of questions.) Jasen's was probably the scariest. He discussed the threat of bioterrorism, and thought it possible that in some decades, synthetic biology might allow for the creation of diseases that no human's immune system can defend against. Furthermore, mixing together e.g. a disease that is very good at infecting people and a disease with a long incubation period might become possible and easy even before that. Until now, biowarfare has been relatively hard, but future breakthroughs could make it possible even for your basic graduate student to create such nightmare combinations. Also, there apparently are real death cults (or at least individuals out there), which doesn't exactly help me feel more safe.

I thought the presentations were good, though, and got a bunch of ideas for various things I could write about. For now, I've set myself the goal of just writing a lot here. We'll see how that goes.

Before I came here, I was feeling rather burnt out on my studies. I was figuring that I'd spend several months abroad, concentrate purely on whatever I'd be doing there and not think about my studies. Then I'd come back home, spend one month doing mostly nothing but relaxing, and then return to the studies filled with energy. Unfortunately, as good as that plan sounded, it doesn't seem to be working so far. I'm spending a lot of time worrying about the courses I should be taking after I get home, wondering which ones are the ones I should be taking, whether I should switch majors to CS after my bachelor's or stay in cognitive science, wondering whether it was mistake to come here and forgoing the chance to finish some more courses and to maybe net a summer job... meh.

Previously (like two paragraphs ago - this entry was composed over a period of several hours), I was thinking that I'd been spending most of my time here trying to get some academic writing done, in the hopes I could get enough publications together that I could pass them off as my Master's thesis in a year or two. But now I'm increasingly getting the feeling that I really don't want to do a Master's degree after getting the Bachelor's done. Unfortunately, the Master is the norm in Finland, so trying to get some kind of a job with just a Bachelor is going to be tricky. So maybe I should concentrate more on deepening my programming skills and maybe contributing to some open source project while here, to get something to show on a resume...

Flat | Top-Level Comments Only

From:

vnesov.livejournal.com

There should be a serious debate on the danger of value drift that will be caused by any modification of human mind.

From:

xuenay.livejournal.com

I suspect IA-caused value drift is going to be inevitable regardless of what we do (assuming humanity doesn't get entirely wiped out before that, of course). The technology is going to become available at some point, and huge amounts of people will be wanting it.

From:

vnesov.livejournal.com

Value drift is a form of being wiped out, a slow and non-obvious one. It should be seen as an existential threat, not a technology to be embraced (and I don't need to explain here how some advanced technologies offer similar seduction, wanted and liked, but potentially harmful beyond reason). Which preference do you think the IA-affected people will prefer to instantiate in the FAI? It might be in the interest of present humanity to avoid involving IA-affected people in FAI research, to the extent it will be possible.

That something is inevitable is not a moral argument for it being good, and I'm arguing specifically that it's not a good idea to use mind modification on the path to FAI.

From:

xuenay.livejournal.com

We experience value drift constantly, both as individuals and societies. The values of 2000s society are a lot different than those of 1700s society, and the values I have today are a lot different than the ones I had as a newborn. I fail to see this as a threat. Aside for a few core values (like not wanting to cause unnecessary suffering to anyone) which I find unlikely to change even if we did have IA, I have no particular interest in freezing my current set of values as permanent, any more than I have an interest in permanently freezing my set of memories and skills to their current state. (I would probably have chosen to merge with the baby-eaters and superhappies.)

Furthermore, if I experience value drift as a consequence of IA, that implies that my increased intelligence causes me to see inconsistencies in my previous values that I didn't see before. I would welcome that kind of value revision.

From:

vnesov.livejournal.com

> We experience value drift constantly, both as individuals and societies

Of course, but again, not a moral argument. From the point of view of given preference, any drift away from that preference is a bad thing.

> I fail to see this as a threat.

Humans are still humans, our preference is reset to more or less the same thing with each new generation having the same genetically determined construction. Culture has some influence, but given that preference is what you want in the limit of reflection, you'd probably be able to reinvent all the existing cultures twice over (metaphorically speaking), thus making the distinction between different environments in which different people happen to be brought up insignificant.

Changing the architecture of human mind is a whole new level of modification. Compare this with using an argument about humans of different IQ in a discussion of superintelligent AIs, or using an argument about religious zealots in a discussion about AGI values. It's just not the right order of variation to model the implications of the discussed order of variation on.

> I have no particular interest in freezing my current set of values as permanent, any more than I have an interest in permanently freezing my set of memories and skills to their current state

This statement shows that either you don't understand the idea of fixed preference (more likely), or that you are talking about humans as stuff that gets optimized, rather than agents that do the optimizing. Preference is *defined* as that which you won't want ever changed, because it talks about what the world should actually be, and there is only one world, which can't ever be changed (in the timeless sense). You should read my blog (go through the current sequence, ask questions, discuss with people at SIAI -- I expect agreement on the major issues).

> Furthermore, if I experience value drift as a consequence of IA, that implies that my increased intelligence causes me to see inconsistencies in my previous values that I didn't see before.

You might get better at implementing your values, but the values themselves can also change. You'll be better at implementing the changed values, not the original values. The changed values will be different for reasons other than getting more consistent, as you can't hold a magical property (see "magical categories" of LW) fixed while varying an object having that property, without rigorous idea of what that property is, exactly. You can't change some property of human mind without altering preference without knowing exactly what preference is. And we don't.

From:

xuenay.livejournal.com

Oh, I certainly understand the idea of fixed preference (or at least I think I do, and yes, I've read the current sequence in your blog). What I'm saying is that I don't have fixed preferences outside a very narrow set. I would certainly be very cautious about using any IA that would threaten to change the preferences falling into that set.

Though I feel this discussion is getting rather abstract. Of course we should consider the pros and cons of each individual IA technique as they come along. But I don't think saying "we shouldn't use IA because it might change some of our values" is very useful when we don't know what realistic IA techniques might actually be and how they work. Certainly none of the techniques that are currently available, or of the ones that will be available in say 15 years, will be in the category of being able to radically change our values.

From:

vnesov.livejournal.com

> Though I feel this discussion is getting rather abstract.

It's no more abstract than asserting that all AGIs, except very specific FAIs constructed with rigorous understanding of preference, are fatal.

> What I'm saying is that I don't *have* fixed preferences outside a very narrow set.

Translated to my terminology, this is still an assertion about your fixed preference (even Microsoft Word gets a fixed preference), namely that your preference involves a lot of indifference to detail. But why would it be this way? And how could you possibly know? We don't know our preference, we can only use it (rather inaptly). Even if preference significantly varies during one's life (varying judgment doesn't imply varying preference!), it's a statement independent of how it can be characterized at specific moments.

From:

xuenay.livejournal.com

Okay, I re-read your posts on preference to see if there was something to the definition that I missed, but I still don't see how your questions make sense.

I would understand the "how could you know your preference" question if it was used to in the context of "we're going to alter the world in way X, how can you know that you'll actually like it". In that case, it'd mean that the model I have about my preferences is incorrect, and if I actually experiences that world, I'd find I preferred the unaltered world. But that's not the question you're asking. You're asking "we're going to alter your preferences in way X, how can you know that you won't disapprove of being changed". If I have no reason to assume beforehand that I'd disapprove, then I don't disapprove beforehand, and afterwards I presumably won't disapprove either, because I will like having my new preferences.

From:

vnesov.livejournal.com

Kaj: I would understand the "how could you know your preference" question if it was used to in the context of "we're going to alter the world in way X, how can you know that you'll actually like it". In that case, it'd mean that the model I have about my preferences is incorrect, and if I actually experiences that world, I'd find I preferred the unaltered world.

You may find that you like/dislike the altered/unaltered world upon actually experiencing it, not that you prefer it. You may be unable to say whether you prefer something even when you see it (even if you can say that you like it). Of course, predicting in abstract is even harder, but personal experience doesn't give you a direct line with preference either. The "in the limit of reflection" requirement is a very strong one, one that can't be crossed by human experience, not on the more subtle questions anyway. Preference is about your "extrapolated volition", not your emotions. Only the most clear-cut questions about what you prefer can be answered now (which is one more reason why preference seems like something simple).

Kaj: You're asking "we're going to alter your preferences in way X, how can you know that you won't disapprove of being changed". If I have no reason to assume beforehand that I'd disapprove, then I don't disapprove beforehand, and afterwards I presumably won't disapprove either, because I will like having my new preferences.

That you "have to reason to assume beforehand" that change X is bad, means you are uncertain about what is the correct answer to the factual question of "whether change X is bad?". Confusion exists in the map, not in the territory, and blank moral estimate doesn't correspond to blank moral distinction. If you don't know whether change X is bad, this is a fact about your mind, not a fact about the reality of whether X is bad.

We have independent reasons to suspect that preference is very detailed (the "complexity of value" thesis, plus see what I wrote in "preference is general and precise"). Given that you can't observe your own preference, the hypothesis that it has the precise property of ignoring-details would take some kind of evidence I don't see.

(All of the above is not specific to changing agent's preference, and I think should be dealt with separately.)

When you consider changing one's preference, you should keep in mind that preference determines disambiguation of moral questions, and so when you have a moral question, you should know which preference is meant. The fact that a given agent changes its preference doesn't alter the questions. So, if I ask "Is X bad?", and then my preference changes in a way that X is clearly bad, this doesn't say anything about what the answer to the original question is, since that was the question about valuation of X according to my original preference.

Thus, of course evaluation of changing one's preference is meant from the point of view of the pre-change preference. But this question is rather easy to answer, on theoretical grounds (and such a theoretical argument has to beat any intuitive perception of like/dislike -- bite the bullet!). Changing one's preference means changing the optimization criteria for the world away from your current criteria for what the world should be. This means that changing one's preference results in a world that is worse than if you didn't change your preference. It's that simple (but requires believing the theory over the fuzzy intuitive perception, which has to be understood for what it is -- weak evidence).

From:

xuenay.livejournal.com

I think we're talking past each other somehow. I'll try moving the discussion to a meta-level in an attempt to figure out the reason before we go on.

The current impression I get is that you've a fancy theoretical model of preference, and now you're trying to use those theoretical grounds in an attempt to tell me what I should want. This doesn't seem to me much more plausible than saying "mathematicians have decided to define 1 + 1 as 2, therefore you should like cookies even though you claim you don't". It sounds far more likely that you're simply misapplying the model, and it prevents me from taking seriously anything you say. You can't just claim "this is how I've defined preference, and therefore I know better than you what you prefer".

From:

vnesov.livejournal.com

Preference is not about what you like (which experiences produce what impressions). Preference is about what to do with the world. The choice of what to do with the world may be based on what you like, or what experiences do you have in situations where world would be this or that way, but if the topic of discussion is "what to do with the world", this creates certain theoretical restrictions on what this thing is.

For example, it's impossible to do contradictory things with the world, to do A and also to do B, where A is mutually exclusive with B. Thus, you can't prefer A and also B, no matter what you feel about A and B. What is done with the world doesn't depend on when you consider the question of what to do with it. Thus, it's impossible for A to be preferable at 10AM, but for B to be preferable, according to the same preference, at 3PM. What is done with the world doesn't depend on who considers the question. Thus, it's impossible for A to be preferable when you think about it, but for B to be preferable when I think about it (assuming that we discuss the same preference), or when you think about it after having a brain surgery.

When you change your own preference, what changes is your valuation of things, but not the value of things according to your preference before the change. After the change, you are no longer moved by what you considered valuable before, and as a result you won't be moving the world in that direction.

The definition is not artificial, it just says "what to do with the world" rather than "what feels good". The latter is not what I discuss (and not what's relevant for FAI).

From:

xuenay.livejournal.com

Sorry for the (nearly one-month!) delay.

Let met try to start by summarizing the way I understand your definitions for "preference" and "like", and you can correct me if I got anything wrong.

The difference between a like and a preference is that a like is merely a description of how I react to things, while a preference is an ordering between possible worlds. I may like eating ice cream, but that only means that there are some situations in which doing so feels pleasurable. There is no unambiguous translation of likes to preferences. A liking for eating ice cream might suggest a preference for a world where there is ice cream, or where there exist people (including me) who have a chance to eat ice cream. Or it might not have any direct effect at all on the preference. I may merely prefer a world where there exist sensory experiences that are at least as enjoyable than eating ice cream, even if those experiences are entirely different in kind.

If so, I agree that this definition of preference sounds good and valuable in theory. But I'm confused of how you are managing the translation between preference-in-general and human preference. (Even after your brief discussion of this in "Preference of programs".) You say that since preference is an ordering of worlds, then if A and B are contradictory it's impossible to prefer both A and B. Given your definition, I agree; however, it doesn't seem to be obvious that humans are consistent enough to have just a single set of preferences that fits your criteria. A human may in fact have one preference at 3 PM and another at 10 AM. (As just one datapoint, I notice that my mood seems to have a major impact on whether I'm leaning more towards negative or positive utilitarianism.) Are you using a CEV-style "the preference that you most wanted to be taken into account" criteria, or something similar?

From:

vnesov.livejournal.com

What does it mean for a person to "have" a preference? It's certainly not going to be written explicitly somewhere. Let's just ask: Which of the formal preferences fits you best? You must decide on one of these things, because it's about how the world gets determined, and the world will be determined somehow anyway, so by not deciding to take a particular course of action, you are only implicitly choosing a less optimal path, determined by something other than what you (refused to) decide. Randomness hath no power.

I agree that any definition of preference that takes most of its data from a person snapshot is going to define (slightly) different things based on you at 3PM and at 10AM. But when it decides based on you at 3PM, it's still going to take into account you at 10AM, to the full extent you at 3PM prefer you at 10AM to be taken into account, even if that leads to satisfying some of the likes you at 3PM have to a lesser extent.

There are many ideas that move you, and what you want is only one of them. If something literally can't move you, it's won't succeed in taking over your preference, but among the things that do move you, there is a nontrivial balance. Maybe a moral principle will act directly apposite to an emotional drive in some situation, and win. Preference is the decision of the sum total of the things that move you.

From:

vnesov.livejournal.com

(To avoid ambiguity, the third paragraph should read: "and "what you want" is only one of them.")

From:

vnesov.livejournal.com

Another rendering of this argument I gave over at Less Wrong:
http://lesswrong.com/lw/29c/be_a_visiting_fellow_at_the_singularity_institute/21cy

From:

vnesov.livejournal.com

(Does the previous reply help? I don't want to cause the conversation to end by appearing to not hear about the problem in communication mode.)

From:

vnesov.livejournal.com

Also, as reminded about by Andreas, this thread is highly relevant to the present discussion:

http://lesswrong.com/lw/1s3/hedging_our_bets_the_case_for_pursuing_whole/1ozf

From:

xuenay.livejournal.com

(I think it does, somewhat. I'm still thinking it over; I think I'm getting a better grasp of it, but haven't had the time to compose a proper reply yet.)

From:

andreas-st.livejournal.com

To provide another data point: Given the dangers and uncertainties pointed out in the comments here (http://lesswrong.com/lw/1s3/hedging_our_bets_the_case_for_pursuing_whole/), I was shocked to read that the suggested strategy focus is not on resolving these uncertainties (with preference theory as a central part), but on a way of going ahead that assumes that these uncertainties will be resolved in one particular way.

Flat | Top-Level Comments Only

Profile

xuenay

Website

December 2018

S	M	T	W	T	F	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31

Page Summary

Style Credit

Style: Neutral Good for Practicality by timeasmymeasure

Expand Cut Tags

No cut tags

Page generated Jul. 12th, 2025 05:41 am

A view to the gallery of my mind

Or, an exhibition of specific neuronal pathways

Travel diary, days 10-11

Travel diary, days 10-11

Re: intelligence augmentation

Re: intelligence augmentation

Re: intelligence augmentation

Re: intelligence augmentation

Re: intelligence augmentation

Re: intelligence augmentation

Re: intelligence augmentation

Re: intelligence augmentation

Re (2): intelligence augmentation

Re: Re (2): intelligence augmentation

Re (2): intelligence augmentation

Re: Re (2): intelligence augmentation

Re: Re (2): intelligence augmentation

Re: Re (2): intelligence augmentation

Re: Re (2): intelligence augmentation

Re: Re (2): intelligence augmentation

Re: Re (2): intelligence augmentation

Re: Re (2): intelligence augmentation

no subject

Profile

December 2018

Most Popular Tags

Page Summary

Style Credit

Expand Cut Tags