(i) Design aim: parent needs to *certify* the child is safe before switching on child.

(ii) Design aim: preferably child is smarter.

(iii) Gödel.

But these three are incompatible, given the maths. (iii) is more or less fixed, as it’s a standard result in mathematical logic. So one has to give up, or get round, (i) or (ii). I think your hope is to get round (ii) by making the child in a sense both smarter but weaker, but I think it’s hopeless … I think you should give up design aim (i).
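The clash between (i) and (ii) is usually pinned down via Löb’s theorem rather than the incompleteness theorems directly; a standard statement (my gloss, not anything specific to TilingAgents.pdf):

```latex
% Löb's theorem: for any recursively axiomatized theory T extending PA,
% with standard provability predicate Prov_T, and any sentence phi:
\[
T \vdash \mathrm{Prov}_T(\ulcorner \varphi \urcorner) \rightarrow \varphi
\quad\Longrightarrow\quad
T \vdash \varphi .
\]
```

So a parent that fully trusts its own proof theory, in the sense of adopting the left-hand schema for every sentence φ, thereby proves every sentence φ, i.e. is inconsistent.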

There is a good way to make the child smarter: give the child the truth theory for the parent. (This would be the “reflection approach”, in line with Feferman’s work on transfinite progressions, reflective closure, etc.)
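One way to make “give the child the truth theory for the parent” concrete, in the Feferman tradition alluded to here, is via reflection principles and their transfinite iteration (a sketch of the standard setup, not the only formulation):

```latex
% Uniform reflection over T (one schema instance per formula phi(x)):
\[
\mathrm{RFN}(T):\qquad
\forall n\,\bigl( \mathrm{Prov}_T(\ulcorner \varphi(\dot{n}) \urcorner)
\rightarrow \varphi(n) \bigr).
\]
% Feferman-style transfinite progression of theories:
\[
T_0 = \mathrm{PA}, \qquad
T_{\alpha+1} = T_\alpha + \mathrm{RFN}(T_\alpha), \qquad
T_\lambda = \bigcup_{\alpha<\lambda} T_\alpha .
\]
```

Each step makes the “child” theory strictly stronger than its parent, which is exactly why the parent cannot itself certify the step: it is the outside mathematician who trusts the progression.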

One oddity of all this is that the parent (well, PA) can talk about its own truth theory (which is merely talking about syntax), but the parent can’t *believe* this theory … For example, I (JK) know that the truth theory for my language contains axioms “JK says ‘p’ if and only if p”. I can easily describe this. However, I cannot believe it: adding this axiom schema to my “belief box” makes me inconsistent (indeed I also know this…!).

For example, adding the simplest truth theory to PA gives a theory called DT (disquotational truth), which has axioms “A is true iff A”, for each sentence A. In fact, DT is a conservative extension of PA, and PA itself proves this.
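Spelling out DT (often called TB in the truth-theory literature); the restriction of the schema to sentences of the original arithmetical language, with no truth predicate inside A, is what keeps it both liar-free and conservative:

```latex
% Extend L_PA with a unary predicate T, and add, for each sentence A
% of the *original* language L_PA (no occurrence of T inside A):
\[
\mathrm{DT}:\qquad T(\ulcorner A \urcorner) \leftrightarrow A .
\]
% Conservativity: for arithmetical sentences, DT proves nothing new:
\[
\mathrm{DT} \vdash A \ \Longrightarrow\ \mathrm{PA} \vdash A
\qquad (A \in \mathcal{L}_{\mathrm{PA}}).
\]
```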

Jeff

----

– I meant the parent has to prove that *the act of turning on the child* is safe, not that *the parent is safe*;

– By “child” I didn’t mean “copy of parent” (which some of your comments seem to assume, at least implicitly), but “anything the parent wants to make, and delegate some authority and responsibility to, to help it do something”. So “agent” is probably a better word, and indeed the authors of TilingAgents.pdf used “agent” rather than (or at least more prominently than) “child”. So the thing I was calling “the child” might be arbitrarily different from the “parent”; those terms only make sense because the “child” was made by the “parent”. (Maybe this terminology makes more sense to programmers like me, who use “parent” and “child” in all kinds of ways, e.g. for nodes in directed graphs, or for computer programs when one initiates the running of another, than it does to other people.)

So to clarify again, and answer your points: the parent can’t prove it’s safe, but we hope it *is* safe, and we try to prove that ourselves. But to prove that, we have to prove it only takes safe actions (including when it makes agents or “children”). If we specify a limited set of actions it can take (perhaps including “produce an agent of exact design X”), and prove all these actions are safe, we’re ok, but we don’t want to be so limited. So we want to let the parent take any action *it* can prove is safe, using a theory whose proofs *we* already trust. (Of course there is no ultimate justification for our trust in a specific proof theory — it’s really a matter of faith. But as long as we have that faith, we want to take advantage of it and are willing to trust it.)

And to make sure the parent is not overly limited in what kind of agents it can produce to delegate some powers to, we want to include actions to do that (i.e. to produce agents of any design — or for that matter, recognize already existing ones — and delegate powers to them) in the set of actions it can do, in any form it can prove is safe.

And this is where we run into the logical problem, since at least in the way this has been formalized in TilingAgents.pdf, it doesn’t seem possible to achieve all that unless the parent produces an agent whose own proof theory (for its own internal use in limiting its actions) is weaker than that of the parent.
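A toy version of the “descending proof strength” pattern (my sketch, not the paper’s exact construction): start from a finite tower of consistency extensions,

```latex
\[
S_0 = \mathrm{PA}, \qquad S_{k+1} = S_k + \mathrm{Con}(S_k).
\]
% A parent reasoning in S_n can prove Con(S_{n-1}), so it can certify a
% child that reasons in S_{n-1}; but by Gödel's second theorem it cannot
% prove Con(S_n), so each generation's licensed proof theory drops a step:
\[
S_n \vdash \mathrm{Con}(S_{n-1}), \qquad S_n \nvdash \mathrm{Con}(S_n).
\]
```

So a parent starting at S_n can only license n generations of descendants this way before the tower bottoms out, which is the flavour of limitation the open question is about.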

I suspect that problem can be solved somehow, perhaps only by revising the way it’s formalized, but it’s still an open question.

(It’s also conceivable it can be proved impossible to solve, much like various goals people had for formal systems were proved impossible.)

(By the way, in case there is any danger of some other reader taking some of these comments out of context, let me make it clear that I’m not advocating that *human beings* should have to prove it’s safe to have a child before doing so! (Or prove it’s safe to take any other action.) This is only about a technical model of one hypothetical way to make certain kinds of machines safe for humans to use in general ways.)

----

“If the parent has to prove it’s safe to turn on the child before doing so, I don’t think anyone believes it could be possible for the child to have a more powerful proof theory than the parent. (Only unsafe operations, like evolution, can do that.)”

So my question would be: why does the parent have to *prove* this? Is it not ok for the parent simply to *be* safe?

Maybe I want to have a child. Maybe I am safe, maybe I’m not. God knows! But in order to have a safe child, I merely need to *be* safe. I don’t have to *prove* I’m safe before I have a child.

(It could be irresponsible, I suppose, for a person to reproduce if they already know, or have some evidence, they’ll pass on some horrible condition. But I think that situation is a bit different – this would be an agent that knows that one of its own “modules” is broken; so the agent could request that module to be fixed, before generating a child.)

Also, it’s a bit unclear what use there would be in *me* proving *I* am safe. After all, unsafe theories (e.g., inconsistent ones) *can* prove that they are themselves safe. And there are consistent theories which, wrongly, prove themselves inconsistent! (E.g., PA + ~Con(PA).) So, I’m inclined to say that self-certification of safety seems impossible, given Gödel’s incompleteness results.
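Spelling out the PA + ~Con(PA) example:

```latex
% By Gödel's second incompleteness theorem, if PA is consistent then
% PA does not prove Con(PA), hence the following theory is consistent:
\[
T := \mathrm{PA} + \neg\mathrm{Con}(\mathrm{PA}).
\]
% Yet T proves ¬Con(PA); and since T extends PA, any PA-inconsistency
% is a T-inconsistency, so (provably in T) Con(T) -> Con(PA), giving:
\[
T \vdash \neg\mathrm{Con}(T).
\]
```

So T is a perfectly consistent theory that sincerely asserts its own inconsistency, which is why self-reports about safety carry so little evidential weight.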

Jeff

----

If the parent has to prove it’s safe to turn on the child before doing so, I don’t think anyone believes it could be possible for the child to have a more powerful proof theory than the parent. (Only unsafe operations, like evolution, can do that.)

But what ought to be possible would be for the child to have the *same* proof theory as the parent, and be smarter or better in practical ways (like implementation shortcuts, speed, capacity, etc).

But it’s not yet known how to do that in a general way; the most that’s known (in principle only, not yet done in practice) is how the child can be *slightly weaker* in proof theory (though it could at the same time be smarter or better in other ways).

It is known how to do the equal-proof-theory child in a *non-general* way. For example, the parent could have a special operation “replace me with an identical copy”. It would be easy to prove that’s safe. But what’s wanted is for the same safety rules used for general operations on the physical world to also work for making and activating the child. That way it’s clear that there is no artificial limit on the child’s structure or implementation. (And there are other benefits, described in TilingAgents.pdf.) That’s what is not yet known how to do in principle (to those authors or me, anyway).
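The “same safety rules for general actions and for child-creation” desideratum, roughly as TilingAgents.pdf-style formalizations put it (my paraphrase, with G a hypothetical safety/goal predicate):

```latex
% Parent's action criterion: take action a only if the parent's theory
% T_p proves G(a).  For the action "build and switch on a child that
% uses theory T_c for its own action criterion", the parent must prove
% that whatever the child licenses is in fact safe:
\[
T_p \vdash \forall a\,\bigl( \mathrm{Prov}_{T_c}(\ulcorner G(\dot{a}) \urcorner)
\rightarrow G(a) \bigr).
\]
% If T_c = T_p, this is a reflection schema for T_p over itself, which
% Löb's theorem blocks for nontrivial G; hence the pressure toward a
% strictly weaker T_c.
```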
