%!PS-Adobe-1.0 %%Title: reactive.mss %%DocumentFonts: (atend) %%Creator: Tom Mitchell and Scribe 6(1600) %%CreationDate: 19 September 1990 10:47 %%Pages: (atend) %%EndComments % PostScript Prelude for Scribe. /BS {/SV save def 0.0 792.0 translate .01 -.01 scale} bind def /ES {showpage SV restore} bind def /SC {setrgbcolor} bind def /FMTX matrix def /RDF {WFT SLT 0.0 eq {SSZ 0.0 0.0 SSZ neg 0.0 0.0 FMTX astore} {SSZ 0.0 SLT neg sin SLT cos div SSZ mul SSZ neg 0.0 0.0 FMTX astore} ifelse makefont setfont} bind def /SLT 0.0 def /SI { /SLT exch cvr def RDF} bind def /WFT /Courier findfont def /SF { /WFT exch findfont def RDF} bind def /SSZ 1000.0 def /SS { /SSZ exch 100.0 mul def RDF} bind def /AF { /WFT exch findfont def /SSZ exch 100.0 mul def RDF} bind def /MT /moveto load def /XM {currentpoint exch pop moveto} bind def /UL {gsave newpath moveto dup 2.0 div 0.0 exch rmoveto setlinewidth 0.0 rlineto stroke grestore} bind def /LH {gsave newpath moveto setlinewidth 0.0 rlineto gsave stroke grestore} bind def /LV {gsave newpath moveto setlinewidth 0.0 exch rlineto gsave stroke grestore} bind def /BX {gsave newpath moveto setlinewidth exch dup 0.0 rlineto exch 0.0 exch neg rlineto neg 0.0 rlineto closepath gsave stroke grestore} bind def /BX1 {grestore} bind def /BX2 {setlinewidth 1 setgray stroke grestore} bind def /PB {/PV save def newpath translate 100.0 -100.0 scale pop /showpage {} def} bind def /PE {PV restore} bind def /GB {/PV save def newpath translate rotate div dup scale 100.0 -100.0 scale /showpage {} def} bind def /GE {PV restore} bind def /FB {dict dup /FontMapDict exch def begin} bind def /FM {cvn exch cvn exch def} bind def /FE {end /original-findfont /findfont load def /findfont {dup FontMapDict exch known{FontMapDict exch get} if original-findfont} def} bind def /BC {gsave moveto dup 0 exch rlineto exch 0 rlineto neg 0 exch rlineto closepath clip} bind def /EC /grestore load def /SH /show load def /MX {exch show 0.0 rmoveto} bind def /W {0 32 4 -1 roll 
widthshow} bind def /WX {0 32 5 -1 roll widthshow 0.0 rmoveto} bind def %%EndProlog %%Page: 0 1 BS 0 SI 15 /Times-Bold AF 20350 11405 MT (Becoming Increasingly Reactive)SH 10 SS 26975 13595 MT (Tom M. Mitchell)SH 8 /Times-Roman AF 36135 15460 MT (1)SH 10 SS 24664 15805 MT (School of Computer Science)SH 24989 16910 MT (Carnegie Mellon University)SH 26280 18015 MT (Pittsburgh, PA 15213)SH 25126 19120 MT (Tom.Mitchell@cs.cmu.edu)SH /Times-Bold SF 17419 21819 MT (Abstract)SH /Times-Roman SF 33736 21835 MT (augmenting its reactive) 6 W( component whenever it is forced to)5 W 33736 22940 MT (plan. When) 358 W( used to control a laboratory mobile robot,) 54 W( the)55 W 8600 23429 MT (We describe a robot control) 665 W( architecture which)664 W 33736 24045 MT (Theo-Agent in simple cases learns to reduce its reaction)172 W 7600 24534 MT (combines a stimulus-response) 724 W( subsystem for rapid)725 W 33736 25150 MT (time for new tasks from several minutes to) 226 W( less than a)227 W 7600 25639 MT (reaction, with a search-based planner for handling)623 W 33736 26255 MT (second.)SH 7600 26744 MT (unanticipated situations. The robot) 409 W( agent continually)410 W 34736 27498 MT (The research) 182 W( reported here is part of our larger effort)181 W 7600 27849 MT (chooses which action it is to perform, using the) 97 W( stimulus-)96 W 33736 28603 MT (toward developing a general-purpose learning robot)603 W 7600 28954 MT (response subsystem when possible, and falling) 29 W( back on the)30 W 33736 29708 MT (architecture, and builds on earlier work described) 419 W( in)418 W 7600 30059 MT (planning subsystem when necessary. Whenever it) 387 W( is)386 W 33986 30813 MT ([Blythe and Mitchell 89]. 
We believe that in) 215 W( order to)216 W 7600 31164 MT (forced to plan, it applies an explanation-based learning)260 W 33736 31918 MT (become increasingly successful, a learning robot) 54 W( will have)53 W 7600 32269 MT (mechanism to formulate a new) 161 W( stimulus-response rule to)160 W 33736 33023 MT (to incorporate several types of learning:)SH 7600 33374 MT (cover this new situation and others similar to) 223 W( it. With)224 W /Symbol SF 34526 34476 MT (\267)SH /Times-Roman SF 35236 XM (It must become)164 W /Times-Italic SF 42144 XM (increasingly correct)164 W /Times-Roman SF 50804 XM (at predicting)165 W 7600 34479 MT (experience, the agent becomes increasingly) 102 W( reactive as its)101 W 35236 35581 MT (the effects of its actions in the world.)SH 7600 35584 MT (learning component acquires new stimulus-response rules)125 W 7600 36689 MT (that eliminate the) 96 W( need for planning in similar subsequent)95 W /Symbol SF 34526 36907 MT (\267)SH /Times-Roman SF 35236 XM (It must become)114 W /Times-Italic SF 41993 XM (increasingly reactive)113 W /Times-Roman SF (, by reducing)113 W 7600 37794 MT (situations. This) 364 W( Theo-Agent) 57 W( architecture is described, and)58 W 35236 38012 MT (the time required) 178 W( for it to make rational choices;)179 W 7600 38899 MT (results are presented demonstrating its ability to reduce)240 W 35236 39117 MT (that is, the) 554 W( time required to choose actions)553 W 7600 40004 MT (routine reaction time for a simple mobile) 352 W( robot from)353 W 35236 40222 MT (consistent with the above predictions and its goals.)SH 7600 41109 MT (minutes to under a second.)SH /Symbol SF 34526 41548 MT (\267)SH /Times-Roman SF 35236 XM (It must become)802 W /Times-Italic SF 44059 XM (increasingly perceptive)803 W /Times-Roman SF 55274 XM (at)SH 35236 42653 MT (distinguishing those features of its world) 451 W( that)450 W 35236 43758 MT (impact its success.)SH 12 /Times-Bold AF 7200 44793 MT (1. 
Introduction and Motivation)SH 10 /Times-Roman AF 34736 45097 MT (This paper) 297 W( focuses on the second of these types of)298 W 8600 45898 MT (Much attention has) 591 W( focused recently on)590 W /Times-Italic SF 28139 XM (reactive)SH /Times-Roman SF 33736 46202 MT (learning. We) 464 W( describe how the Theo-Agent increases the)106 W 7600 47003 MT (architectures for robotic) 49 W( agents that continually sense their)50 W 33736 47307 MT (scope of) 87 W( situations for which it can quickly make rational)88 W 7600 48108 MT (environment and compute appropriate reactions to their)257 W 33736 48412 MT (decisions, by adding new stimulus-response) 797 W( rules)796 W 7600 49213 MT (sense stimuli) 89 W( within bounded time \050e.g.,) 90 W( [Brooks) SH( 86,) 90 W( Agre)SH 33736 49517 MT (whenever it is forced to plan for a situation outside the)193 W 7600 50318 MT (and Chapman 87,) 234 W( Rosenschein) SH( 85]\051. Such) 234 W( architectures)233 W 33736 50622 MT (current scope of its reactive component. Its) 127 W( explanation-)126 W 7600 51423 MT (offer advantages) 164 W( over more traditional open-loop search-)165 W 33736 51727 MT (based learning mechanism) 62 W( produces rules that recommend)63 W 7600 52528 MT (based planning systems because they can react more)399 W 33736 52832 MT (precisely the same action) 131 W( as recommended by the slower)130 W 7600 53633 MT (quickly to changes to their environment, and) 64 W( because they)65 W 33736 53937 MT (planner, in exactly those) 47 W( situations in which the same plan)48 W 7600 54738 MT (can operate more robustly in worlds) 181 W( that are difficult to)180 W 33736 55042 MT (rationale would) 173 W( apply. 
However, the learned rules infer)172 W 7600 55843 MT (model in advance.)192 W /Times-Italic SF 16172 XM (Search-based)SH /Times-Roman SF 22058 XM (planning architectures,)193 W 33736 56147 MT (the desired action) 13 W( immediately from the input sense data in)14 W 7600 56948 MT (on the other hand, offer the advantage of more general-)184 W 33736 57252 MT (a single inference step--without considering explicitly the)118 W 7600 58053 MT (purpose \050if slower\051 problem) 247 W( solving mechanisms which)248 W 33736 58357 MT (robot's goals, available actions, or) 664 W( their predicted)665 W 7600 59158 MT (provide the flexibility to deal with) 149 W( a more diverse set of)148 W 33736 59462 MT (consequences.)SH 7600 60263 MT (unanticipated goals and situations.)SH 8600 61506 MT (This paper considers) 13 W( the question of how to combine the)14 W 7600 62611 MT (benefits of reactive and search-based architectures) 369 W( for)368 W 11 /Times-Bold AF 33336 63079 MT (1.1. Related Work)SH 10 /Times-Roman AF 7600 63716 MT (controlling autonomous agents. We describe the) 228 W( Theo-)229 W 34736 64184 MT (There has been a great deal of recent work) 502 W( on)501 W 7600 64821 MT (Agent architecture, which) 346 W( incorporates both a reactive)345 W 33736 65289 MT (architectures for robot control) 65 W( which continually sense the)66 W 7600 65926 MT (component and a search-based planning component.) 
109 W( The)470 W 33736 66394 MT (environment and operate in bounded time) 220 W( \050e.g.,) 219 W( [Brooks)SH 7600 67031 MT (fundamental design) 124 W( principle of the Theo-Agent is that it)123 W 33736 67499 MT (86, Schoppers) SH( 87,) 279 W( Agre) SH( and Chapman 87]\051, though this)280 W 7600 68136 MT (reacts when it) 270 W( can, plans when it must, and learns by)271 W 10800 50 7200 70352 UL 6 SS 8000 71691 MT (1)SH 8 SS 8300 72000 MT (This is a reprint of a paper which appeared in the)SH /Times-Italic SF 24178 XM (Proceedings of the 1990 AAAI Conference)SH /Times-Roman SF (, August 1990, Boston.)SH ES %%Page: 1 2 BS 0 SI 10 /Times-Roman AF 30350 4286 MT (1)SH 7600 7886 MT (type of work) 61 W( has not directly addressed issues of learning.)60 W 33736 XM (reactive component) 74 W( whenever forced to plan. In addition,)75 W 7600 8991 MT (Segre's ARMS) 374 W( system) 375 W( [Segre) SH( 88] applies explanation-)375 W 33736 XM (the architecture makes widespread use of caching) 320 W( and)319 W 7600 10096 MT (based learning to acquire) 117 W( planning tactics for a simulated)116 W 33736 XM (dependency maintenance in) 525 W( order to avoid needless)526 W 7600 11201 MT (hand-eye system, and Laird's) 658 W( RoboSoar) 659 W( [Laird) SH( and)659 W 33736 XM (recomputation of repeatedly) 15 W( accessed beliefs. The primary)14 W 7600 12306 MT (Rosenbloom 90] has been applied to simple problems in a)63 W 33736 XM (characteristics of the Theo-Agent are:)SH 7600 13411 MT (real hand-eye) 43 W( robot system. While these researchers share)44 W /Symbol SF 34526 13759 MT (\267)SH /Times-Roman SF 35236 XM (It continually reassesses what action it should)414 W 7600 14516 MT (our goal of developing systems) 415 W( that are increasingly)414 W 35236 14864 MT (perform. 
The) 338 W( agent runs in a tight loop in which it)44 W 7600 15621 MT (reactive, the underlying architectures vary) 104 W( significantly in)105 W 35236 15969 MT (repeatedly updates its sensor) 372 W( inputs, chooses a)373 W 7600 16726 MT (the form of the knowledge being learned, underlying)379 W 35236 17074 MT (control action, begins executing it,) 26 W( then repeats this)25 W 7600 17831 MT (representations, and real response time. Sutton) 470 W( has)471 W 35236 18179 MT (loop.)SH 7600 18936 MT (proposed an inductive approach) 74 W( to acquiring robot control)73 W /Symbol SF 34526 19505 MT (\267)SH /Times-Roman SF 35236 XM (It reacts) 303 W( when it can, and plans when it must.)304 W 7600 20041 MT (strategies, in his DYNA system) 733 W( [Sutton) SH( 90], and)734 W 35236 20610 MT (Whenever it must choose an) 284 W( action, the system)283 W 7600 21146 MT (Pommerleau has) 128 W( developed a connectionist system which)127 W 35236 21715 MT (consults a set) 330 W( of stimulus-response rules which)331 W 7600 22251 MT (learns to control an outdoor road-following) 553 W( vehicle)554 W 35236 22820 MT (constitute its reactive component. If) 134 W( one of these)133 W 7850 23356 MT ([Pommerleau 89].) 138 W( In) 525 W( addition to work on learning such)137 W 35236 23925 MT (rules applies to) 105 W( the current sensed inputs, then the)106 W 7600 24461 MT (robot control strategies, there) 450 W( has been much recent)451 W 35236 25030 MT (corresponding action is taken. If no) 179 W( rules apply,)178 W 7600 25566 MT (interest in robot learning more generally,) 149 W( including work)148 W 35236 26135 MT (then the) 493 W( planner is invoked to determine an)494 W 7600 26671 MT (on learning increasingly correct models) 627 W( of actions)628 W 35236 27240 MT (appropriate action.)SH 7850 27776 MT ([Christiansen, et al. 
90,) 248 W( Zrimic) SH( and Mowforth 88],) 248 W( and)247 W /Symbol SF 34526 28566 MT (\267)SH /Times-Roman SF 35236 XM (Whenever forced) 515 W( to plan, it acquires a new)514 W 7600 28881 MT (work on becoming increasingly perceptive [Tan 90].)SH 35236 29671 MT (stimulus-response rule.) 57 W( The) 366 W( new rule recommends)58 W 8600 30124 MT (The work reported here is also somewhat related) 283 W( to)284 W 35236 30776 MT (the action which) 94 W( the planner has recommended, in)93 W 7600 31229 MT (recent ideas for compiling low-level reactive systems) 34 W( from)33 W 35236 31881 MT (the same situations \050i.e., those world states) 323 W( for)324 W 7600 32334 MT (high-level specifications \050e.g.,) 508 W( [Rosenschein) SH( 85]\051.) 508 W( In)1268 W 35236 32986 MT (which the same plan) 30 W( justification would apply\051, but)29 W 7600 33439 MT (particular, such) 131 W( compilation transforms input descriptions)130 W 35236 34091 MT (can be invoked much more) 36 W( efficiently. Learning is)37 W 7600 34544 MT (of actions and goals into effective control) 57 W( strategies, using)58 W 35236 35196 MT (accomplished by an explanation-based learning)441 W 7600 35649 MT (transformations similar) 128 W( to those achieved by explanation-)127 W 35236 36301 MT (algorithm \050EBG) 94 W( [Mitchell,) SH( et) 94 W( al 86]\051, and provides)95 W 7600 36754 MT (based learning in the Theo-Agent.) 
189 W( The) 630 W( main difference)190 W 35236 37406 MT (a demand-driven incremental compilation) 292 W( of the)291 W 7600 37859 MT (between such design-time compilation and) 1036 W( the)1035 W 35236 38511 MT (planner's knowledge into an equivalent reactive)314 W 7600 38964 MT (explanation-based learning used in) 53 W( the Theo-Agent, is that)54 W 35236 39616 MT (strategy, guided by the agent's experiences.)SH 7600 40069 MT (for the Theo-Agent) 369 W( learning occurs incrementally and)368 W /Symbol SF 34526 40942 MT (\267)SH /Times-Roman SF 35236 XM (Every belief that depends on) 399 W( sensory input is)398 W 7600 41174 MT (spread across) 442 W( the lifetime of the agent, so that the)443 W 35236 42047 MT (maintained as long as) 13 W( its explanation remains valid.)14 W 7600 42279 MT (compilation transformation is incrementally focused by the)16 W 35236 43152 MT (Many beliefs in) 410 W( the Theo-Agent, including its)409 W 7600 43384 MT (worlds actually encountered by) 279 W( the agent, and may be)280 W 35236 44257 MT (belief of which action) 307 W( to perform next, depend)308 W 7600 44489 MT (interleaved with other learning mechanisms) 714 W( which)713 W 35236 45362 MT (directly or indirectly on observed sense data. The)99 W 7600 45594 MT (improve the agent's models of its actions.)SH 35236 46467 MT (architecture maintains a network) 258 W( of explanations)259 W 35236 47572 MT (for every belief) 218 W( of the agent, and deletes beliefs)217 W 35236 48677 MT (only when their support ceases. This) 182 W( caching of)183 W 8600 49576 MT (The next section of this paper describes the Theo-Agent)47 W 35236 49782 MT (beliefs significantly improves the response time of)90 W 7600 50681 MT (architecture in greater detail.) 
348 W( The) 944 W( subsequent section)347 W 35236 50887 MT (the agent by) 1 W( eliminating recomputation of beliefs in)2 W 7600 51786 MT (presents an example of its use in controlling a) 279 W( simple)280 W 35236 51992 MT (the face of unchanging or irrelevant sensor inputs.)SH 7600 52891 MT (mobile robot, the learning mechanism) 173 W( for acquiring new)172 W /Symbol SF 34526 53318 MT (\267)SH /Times-Roman SF 35236 XM (It determines which) 67 W( goal to attend to, based on the)66 W 7600 53996 MT (stimulus-response rules, and timing) 418 W( data showing the)419 W 35236 54423 MT (perceived world state, a) 331 W( predefined set of goal)332 W 7600 55101 MT (effect of caching) 284 W( and rule learning on system reaction)283 W 35236 55528 MT (activation and satisfaction conditions, and) 297 W( given)296 W 7600 56206 MT (time. The) 318 W( final section) 34 W( summarizes some of the lessons of)35 W 35236 56633 MT (priorities among goals.)SH 7600 57311 MT (this work, including features and bugs) 5 W( in the current design)4 W /Times-Bold SF 34736 57972 MT (Internal structure of) 74 W( agent:)75 W /Times-Roman SF 47301 XM (A Theo-Agent is defined)75 W 7600 58416 MT (of the architecture.)SH 33736 59077 MT (by a frame structure whose slots, subslots, subsubslots,) 12 W( etc.)11 W 8 SS 50702 59845 MT (2)SH 10 SS 33736 60190 MT (define the agent's beliefs, or) 37 W( internal state)36 W 51102 XM (. The) 322 W( two most)36 W 33736 61295 MT (significant slots of) 263 W( the agent are Chosen.Action, which)264 W 12 /Times-Bold AF 7200 62100 MT (2. The Theo-Agent Architecture)SH 10 /Times-Roman AF 33736 62400 MT (describes the action the agent presently chooses) 518 W( to)517 W 8600 63205 MT (The design of the Theo-Agent architecture is) 121 W( primarily)122 W 33736 63505 MT (perform; and) 25 W( Observed.World, which describes the agent's)26 W 7600 64310 MT (driven by the goal) 423 W( of combining the complementary)422 W 33736 64610 MT (current perception of its world. 
As indicated) 35 W( in Figure 2-1)34 W 7600 65415 MT (advantages of reactive and search-based) 6 W( systems. Reactive)7 W 7600 66520 MT (systems offer the advantage of) 190 W( quick response. Search-)189 W 10800 50 33336 67580 UL 7600 67625 MT (based planners offer the advantage of) 290 W( broad scope for)291 W 7600 68730 MT (handling a) 245 W( more diverse range of unanticipated worlds.)244 W 6 SS 34136 68919 MT (2)SH 8 SS 34436 69228 MT (The Theo-Agent is implemented on top of a generic) 255 W( frame-based)256 W 10 SS 7600 69835 MT (The Theo-Agent architecture employs both,) 428 W( and uses)429 W 8 SS 33336 70152 MT (problem solving and learning system called) 200 W( Theo) 199 W( [Mitchell,) SH( et al. 90],)199 W 10 SS 7600 70940 MT (explanation-based learning) 259 W( to incrementally augment its)258 W 8 SS 33336 71076 MT (which provides the inference, representation,) 218 W( dependency maintenance,)219 W 33336 72000 MT (and learning mechanisms.)SH ES %%Page: 2 3 BS 0 SI 10 /Times-Roman AF 30350 4286 MT (2)SH 35236 7886 MT (features of Observed.World, and delete any) 54 W( cached)53 W 24160 50 7200 8086 UL 35236 8991 MT (values for)SH /Times-Italic SF 39457 XM (lazily sensed)SH /Times-Roman SF 44790 XM (features.)SH 8 /Courier-Bold AF 7600 9370 MT (+-------------------------------------------+)SH 7600 10315 MT (| |)20160 W 10 /Times-Roman AF 34236 10317 MT (2.)SH /Times-Bold SF 35236 XM (Decide)SH /Times-Roman SF 38374 XM (upon the current Chosen.Action)SH 8 /Courier-Bold AF 7600 11260 MT (| ATTENDED.TO.GOAL) 960 W( PLAN) 5280 W( |)3360 W 10 /Times-Roman AF 34236 11643 MT (3.)SH /Times-Bold SF 35236 XM (Execute)SH /Times-Roman SF 38874 XM (the Chosen.Action)SH 8 /Courier-Bold AF 7600 12205 MT (| |)20160 W 10 /Times-Roman AF 34736 12982 MT (When the Chosen.Action slot is accessed \050during the)236 W 8 /Courier-Bold AF 7600 13150 MT (| |)20160 W 10 /Times-Roman AF 33736 14087 MT (decision portion of the) 162 W( above cycle\051, the following steps)161 W 8 
/Courier-Bold AF 7600 14095 MT (| ACTIVE.GOALS) 960 W( |)12960 W 7600 15040 MT (| |)20160 W 10 /Times-Roman AF 33736 15192 MT (are attempted in sequence until one succeeds:)SH 8 /Courier-Bold AF 7600 15985 MT (| |)20160 W 10 /Times-Roman AF 34236 16531 MT (1.)SH 35236 XM (Retrieve the cached value of this slot \050if available\051)SH 8 /Courier-Bold AF 7600 16930 MT (| OBSERVED.WORLD) 480 W( CHOSEN.ACTION) 5280 W( |)480 W 10 /Times-Roman AF 34236 17857 MT (2.)SH 35236 XM (Infer a value based on the available stimulus-)375 W 8 /Courier-Bold AF 7600 17875 MT (| |)20160 W 7600 18820 MT (+-------------------------------------------+)SH 10 /Times-Roman AF 35236 18962 MT (response rules)SH 34236 20288 MT (3.)SH 35236 XM (Select the first step of) 70 W( the agent's Plan \050inferring a)69 W 8 /Courier-Bold AF 10000 20710 MT (SENSORS EFFECTORS)9120 W 10 /Times-Roman AF 35236 21393 MT (plan if necessary\051)SH 34236 22719 MT (4.)SH 35236 XM (Select the default action \050e.g., WAIT\051)SH /Times-Bold SF 34736 24058 MT (Sensing policy:)154 W /Times-Roman SF 42116 XM (Each primitive sensed input \050e.g.,) 154 W( an)155 W /Times-Bold SF 11156 24231 MT (Figure 2-1:)SH /Times-Roman SF 16405 XM (Data Flow in a Theo-Agent)SH 33736 25163 MT (array of input sonar data\051) 219 W( is stored in some slot of the)218 W 24160 50 7200 26041 UL 33736 26268 MT (agent's Observed.World. Higher level) 240 W( features such as)241 W 33736 27373 MT (edges in the sonar array, regions, region) 178 W( width, etc., are)177 W 7600 27632 MT (the agent may infer its Chosen.Action either directly) 71 W( from)72 W 33736 28478 MT (represented by) 1009 W( values of other slots of the)1010 W 7600 28737 MT (its Observed.World, or) 86 W( alternatively from its current Plan.)85 W 33736 29583 MT (Observed.World, and are) 138 W( inferred upon demand from the)137 W 7600 29842 MT (Its Plan is in) 189 W( turn derived from its Observed.World and)190 W 33736 30688 MT (lower-level features. 
The) 69 W( decision-making portions of the)70 W 7600 30947 MT (Attended.To.Goal. The) 311 W( Attended.To.Goal defines the goal)30 W 33736 31793 MT (agent draw upon) 255 W( the entire range of low to high level)254 W 7600 32052 MT (the agent is currently attempting to achieve, and is)453 W 33736 32898 MT (sensory features as needed. In order) 44 W( to deal with a variety)45 W 7600 33157 MT (computed as the highest priority of its Active.Goals,) 19 W( which)18 W 33736 34003 MT (of sensing procedures of varying cost, the) 284 W( Theo-Agent)283 W 7600 34262 MT (are themselves inferred from the Observed.World.)SH 33736 35108 MT (distinguishes between two types of primitive sensed)488 W /Times-Bold SF 8600 35505 MT (Agent goals:)421 W /Times-Roman SF 14998 XM (Goals are specified to the agent by)422 W 33736 36213 MT (features: those) 320 W( which it)35 W /Times-Italic SF 43715 XM (eagerly)SH /Times-Roman SF 46999 XM (senses, and) 35 W( those which it)34 W 7600 36610 MT (defining conditions under) 90 W( which they are active, satisfied,)89 W /Times-Italic SF 33736 37318 MT (lazily)SH /Times-Roman SF 36603 XM (senses. Eagerly) 1152 W( sensed features are refreshed)451 W 7600 37715 MT (and attended to.) 152 W( For) 556 W( example, an agent may be given a)153 W 33736 38423 MT (automatically during) 114 W( each cycle through the agent's main)113 W 7600 38820 MT (goal Recharge.Battery) 134 W( which is defined to become active)133 W 33736 39528 MT (loop, so that dependent cached beliefs of the) 233 W( agent are)234 W 7600 39925 MT (when it) 206 W( perceives its battery level to be less than 75%,)207 W 33736 40633 MT (retained when possible. In contrast, lazily sensed) 68 W( features)67 W 7600 41030 MT (becomes satisfied when the battery) 176 W( charge is 100%, and)175 W 33736 41738 MT (are simply deleted during each cycle. 
They) 606 W( are)607 W 7600 42135 MT (which is attended to whenever it is active and the) 93 W( \050higher)94 W 33736 42843 MT (recomputed only) 241 W( if the agent queries the corresponding)240 W 7600 43240 MT (priority\051 goal Avoid.Oncoming.Obstacle is inactive.)SH 33736 43948 MT (slot during some subsequent cycle. This division between)57 W /Times-Bold SF 8600 44483 MT (Caching policy:)42 W /Times-Roman SF 15629 XM (The basic operation) 42 W( of the Theo-Agent)41 W 33736 45053 MT (eagerly and) 281 W( lazily refreshed features provides a simple)280 W 7600 45588 MT (is to repeatedly infer a value) 180 W( for its Chosen.Action slot.)181 W 33736 46158 MT (focus of attention which allows keeping the overhead of)165 W 7600 46693 MT (Each slot of the agent typically has one or) 110 W( more attached)109 W 33736 47263 MT (collecting new sense data during each cycle to a minimum.)SH 7600 47798 MT (procedures for) 251 W( obtaining a value upon demand. These)252 W /Times-Bold SF 34736 48506 MT (Learning policy:)89 W /Times-Roman SF 42248 XM (Whenever the agent) 89 W( is forced to plan)88 W 7600 48903 MT (procedures typically access other) 520 W( slots, backchaining)519 W 33736 49611 MT (in order to obtain a value for) 48 W( its Chosen.Action, it invokes)49 W 7600 50008 MT (eventually to) 374 W( queries to slots of the Observed.World.)375 W 33736 50716 MT (its explanation-based generalization) 230 W( routine to acquire a)229 W 7600 51113 MT (Whenever some slot value is successfully inferred, this)252 W 33736 51821 MT (new stimulus-response) 202 W( rule to cover this situation. The)203 W 7600 52218 MT (value is cached \050stored\051 in the) 175 W( corresponding slot, along)176 W 33736 52926 MT (details of this routine are described in greater detail in) 71 W( the)70 W 7600 53323 MT (with an explanation) 144 W( justifying its value in terms of other)143 W 33736 54031 MT (next section. 
The effect of this) 312 W( learning policy is to)313 W 7600 54428 MT (slot values, which are in turn justified in terms of) 115 W( others,)116 W 33736 55136 MT (incrementally extend the scope of the set) 297 W( of stimulus-)296 W 7600 55533 MT (leading eventually to values of individual features in) 158 W( the)157 W 33736 56241 MT (response rules to fit) 435 W( the types of problem instances)436 W 7600 56638 MT (Observed.World, which are themselves inferred) 7 W( by directly)8 W 33736 57346 MT (encountered by the system in its world.)SH 7600 57743 MT (accessing the robot sensors.) 92 W( Values) 432 W( remain cached for as)91 W 7600 58848 MT (long as their explanations remain valid.) 69 W( Thus,) 389 W( the agent's)70 W 7600 59953 MT (Active.Goals and Chosen.Action) 193 W( may remain cached for)192 W 12 /Times-Bold AF 33336 61030 MT (3. Example and Results)SH 10 /Times-Roman AF 7600 61058 MT (many cycles, despite) 160 W( irrelevant changes in sensor inputs.)161 W 34736 62135 MT (This section describes) 463 W( the use of the Theo-Agent)462 W 7600 62163 MT (This policy of always) 427 W( caching values, deleting them)426 W 33736 63240 MT (architecture to develop a simple) 32 W( program to control a Hero)33 W 7600 63268 MT (immediately when explanations become invalid, and) 39 W( lazily)40 W 33736 64345 MT (2000 mobile) 381 W( robot to search the laboratory to locate)380 W 7600 64373 MT (recomputing upon demand, assures that) 54 W( the agent's beliefs)53 W 8 SS 39046 65113 MT (3)SH 10 SS 33736 65458 MT (garbage cans)118 W 39446 XM (. 
In) 486 W( particular, we illustrate how) 118 W( goals and)117 W 7600 65478 MT (adapt quickly to changes in its) 369 W( input senses, without)370 W 33736 66563 MT (actions are provided to the robot with no initial) 106 W( stimulus-)107 W 7600 66583 MT (needless recomputation.)SH /Times-Bold SF 8600 67826 MT (Control policy:)489 W /Times-Roman SF 16299 XM (The Theo-Agent is controlled by)488 W 7600 68931 MT (executing the following loop:)SH 10800 50 33336 69428 UL 8600 70174 MT (Do Forever:)SH 6 SS 34136 70767 MT (3)SH 8 SS 34436 71076 MT (A detailed description) 127 W( of the modified Hero 2000 robot used here is)128 W 10 SS 8100 71513 MT (1.)SH /Times-Bold SF 9100 XM (Sense)SH /Times-Roman SF 11923 XM (and update readings) 184 W( for all)185 W /Times-Italic SF 23760 XM (eagerly sensed)185 W 8 /Times-Roman AF 33336 72000 MT (available in [Lin, et al. 89].)SH ES %%Page: 3 4 BS 0 SI 10 /Times-Roman AF 30350 4286 MT (3)SH 7600 7886 MT (response rules, how it initially) 633 W( selects actions by)632 W 33736 XM (Chosen.Action slot, which has no cached value, and no)210 W 7600 8991 MT (constructing plans,) 131 W( and how it incrementally accumulates)132 W 33736 XM (associated stimulus-response rules.) 178 W( Thus,) 607 W( it is forced to)179 W 7600 10096 MT (stimulus-response rules that cover its routine actions.)SH 33736 XM (plan in order to determine a value for) 297 W( Chosen.Action.)296 W 33736 11201 MT (When queried, the planner determines) 339 W( which goal the)340 W 8600 11339 MT (The robot sensors) 353 W( used in this example include an)352 W 33736 12306 MT (agent is) 302 W( attending to, then searches for a sequence of)301 W 7600 12444 MT (ultrasonic sonar mounted on its) 115 W( hand, a rotating sonar on)116 W 33736 13411 MT (actions which it projects will satisfy) 120 W( this goal. Thus, the)121 W 7600 13549 MT (its head, and a battery voltage) 32 W( sensor. 
By rotating its hand)31 W 33736 14516 MT (planner queries the Attending.To.Goal slot, which) 95 W( queries)94 W 7600 14654 MT (and head sonars it is able to) 11 W( obtain arrays of sonar readings)12 W 33736 15621 MT (the Active.Goals) 145 W( slots, which query the Observed.World,)146 W 7600 15759 MT (that measure echo distance versus rotation) 162 W( angle. These)161 W 33736 16726 MT (leading eventually to) 1414 W( determining that the)1413 W 7600 16864 MT (raw sonar readings are interpreted) 144 W( \050on demand\051 to locate)145 W 33736 17831 MT (Attending.To.Goal is Goal.Closer. The planner,) 391 W( after)392 W 7600 17969 MT (edges in the sonar array, as well as regions,) 49 W( and properties)48 W 33736 18936 MT (some search,) 350 W( then derives a two-step plan to execute)349 W 7600 19074 MT (of regions such as region width, distance,) 182 W( direction, and)183 W 33736 20041 MT (Forward.10 two times in a row \050this plan leads to) 340 W( a)341 W 7600 20179 MT (identity. The) 809 W( primitive sensing operations used in the)279 W 33736 21146 MT (projected sonar reading of 21.5 inches, which) 399 W( would)398 W 7600 21284 MT (present example include Battery,) 437 W( which indicates the)438 W 33736 22251 MT (satisfy Goal.Closer\051.) 676 W( The) 1603 W( inferred value for the)677 W 7600 22389 MT (battery voltage level, Sonarw, which measures sonar range)34 W 33736 23356 MT (Chosen.Action slot) 33 W( is thus Forward.10 \050the first step of the)32 W 7600 23494 MT (with the wrist sonar pointed) 561 W( directly forward, and)562 W 33736 24461 MT (inferred plan\051.)SH 7600 24599 MT (Sweep.Wrist.Roll, which obtains an array of) 595 W( sonar)594 W 7600 25704 MT (readings by rotating the wrist from) 89 W( left to right. 
Of these)90 W 34736 XM (The agent caches the) 187 W( result of each of the above slot)188 W 7600 26809 MT (sensed features, Sonarw is eagerly) 124 W( sensed, and the others)123 W 33736 XM (queries, along with an explanation) 134 W( that justifies each slot)133 W 7600 27914 MT (are lazily sensed.)SH 33736 XM (value in terms of the) 173 W( values from which it was derived.)174 W 33736 29019 MT (This network) 368 W( of explanations relates each belief \050slot)367 W 8600 29157 MT (The robot) 483 W( actions here include Forward.10 \050move)484 W 33736 30124 MT (value\051 of the agent eventually) 248 W( to sensed features of its)249 W 7600 30262 MT (forward 10 inches\051, Backward.10) 369 W( \050move backward 10)368 W 33736 31229 MT (Observed.World.)SH 7600 31367 MT (inches\051, Face.The.Object \050turn toward the) 280 W( closest sonar)281 W 7600 32472 MT (region in front of) 340 W( the robot\051, and Measure.The.Object)339 W 34736 XM (In the above scenario the) 2 W( agent had to construct a plan in)1 W 7600 33577 MT (\050obtain several additional) 460 W( sonar sweeps to determine)461 W 33736 XM (order to) 10 W( infer its Chosen.Action. Therefore, it formulates a)11 W 7600 34682 MT (whether the closest sonar) 103 W( region in front of the robot is a)102 W 33736 XM (new stimulus-response) 276 W( rule which will recommend this)275 W 7600 35787 MT (garbage can\051. The.Object refers to) 20 W( the closest sonar region)21 W 33736 XM (chosen action in future situations, without planning. 
The)116 W 7600 36892 MT (in front) 158 W( of the robot, as detected by the sense procedure)157 W 33736 XM (agent then executes the action and begins a new) 99 W( cycle by)98 W 7600 37997 MT (Sweep.Wrist.Roll.)SH 33736 XM (eagerly refreshing the Sonarw feature and) 264 W( deleting any)265 W 33736 39102 MT (other sensed features \050in this case) 228 W( the observed Battery)227 W 8600 39240 MT (This Theo-Agent has been tested by giving it) 130 W( different)131 W 33736 40207 MT (level, which was queried by the planner as it checked the)96 W 7600 40345 MT (sets of) 52 W( initial goals, leading it to compile out different sets)51 W 33736 41312 MT (preconditions for various actions\051.) 284 W( During) 817 W( this second)283 W 7600 41450 MT (of stimulus-response) 174 W( rules exhibiting different behaviors.)175 W 33736 42417 MT (cycle, the stimulus-response rule) 180 W( learned during the first)181 W 7600 42555 MT (In the simple example presented here, the) 147 W( agent is given)146 W 33736 43522 MT (cycle applies, and the agent quickly decides) 377 W( that the)376 W 7600 43660 MT (three goals:)SH 33736 44627 MT (appropriate Chosen.Action in the new situation) 418 W( is to)419 W /Symbol SF 8390 45113 MT (\267)SH /Times-Roman SF 9100 XM (Goal.Closer: approach distant objects. This) 49 W( goal is)50 W 33736 45732 MT (execute Forward.10. As it gains experience, the) 259 W( agent)258 W 9100 46218 MT (activated when the Sonarw) 563 W( sense reading is)562 W 33736 46837 MT (acquires additional rules) 168 W( and an increasing proportion of)169 W 9100 47323 MT (between 25 and 100 inches, indicating an object) 71 W( at)72 W 33736 47942 MT (its decisions are made by) 18 W( invoking these stimulus-response)17 W 9100 48428 MT (that distance. 
It is satisfied when Sonarw is) 165 W( less)164 W 33736 49047 MT (rules rather than planning.)SH 9100 49533 MT (than 25) 341 W( inches, and attended to whenever it is)342 W 9100 50638 MT (active.)SH /Symbol SF 8390 51964 MT (\267)SH /Times-Roman SF 9100 XM (Goal.Further: back off from) 96 W( close objects. This is)95 W 11 /Times-Bold AF 33336 52664 MT (3.1. Rule Learning)SH 10 /Times-Roman AF 9100 53069 MT (activated when Sonarw is) 22 W( between 3 and 15 inches,)23 W 34736 53769 MT (The rule acquisition procedure used) 89 W( by the Theo-Agent)90 W 9100 54174 MT (satisfied when Sonarw) 213 W( is greater than 15 inches,)212 W 33736 54874 MT (is an) 304 W( explanation-based learning algorithm called EBG)303 W 9100 55279 MT (and attended to whenever it is active.)SH 33986 55979 MT ([Mitchell, et al 86]. This procedure) 245 W( explains why the)246 W /Symbol SF 8390 56605 MT (\267)SH /Times-Roman SF 9100 XM (Goal.Identify.The.Object: determine whether the)403 W 33736 57084 MT (Chosen.Action of the Theo-Agent is justified,) 248 W( finds the)247 W 9100 57710 MT (nearest sonar region corresponds to a) 86 W( garbage can.)85 W 33736 58189 MT (weakest conditions under which) 270 W( this explanation holds,)271 W 9100 58815 MT (This is activated when there is an object in front of)43 W 33736 59294 MT (and then produces a rule that recommends) 772 W( the)771 W 9100 59920 MT (the robot whose identity is) 391 W( unknown, satisfied)390 W 33736 60399 MT (Chosen.Action under just) 654 W( these conditions. 
More)655 W 9100 61025 MT (when the) 77 W( object identity is known, and attended to)78 W 33736 61504 MT (precisely, given some Chosen.Action,) 137 W( ?Action, the Theo-)136 W 9100 62130 MT (whenever it) 679 W( is active and Goal.Closer and)678 W 33736 62609 MT (Agent explains why) 543 W( ?Action satisfies the following)544 W 9100 63235 MT (Goal.Further are inactive.)SH 33736 63714 MT (property:)SH 8600 64574 MT (In order to illustrate the operation) 175 W( of the Theo-Agent,)176 W 33736 65438 MT (Justified.Action\050?Agent, ?Action\051)SH /Symbol SF 47717 XM (\254)SH /Times-Roman SF 7600 65679 MT (consider the sequence of events that results from) 172 W( setting)171 W 35736 66543 MT (\0501\051 The Attending.To.Goal of the ?Agent is ?G)SH 7600 66784 MT (the robot loose in the lab with the above goals,) 6 W( actions, and)7 W 35736 67648 MT (\0502\051 ?G is Satisfied by result of ?Agent's plan)SH 7600 67889 MT (sensing routines: During the first) 276 W( iteration through its)275 W 35736 68753 MT (\0503\051 The tail of ?Agent's plan will not succeed without)SH 7600 68994 MT (sense-decide-execute loop, it \050eagerly\051 senses a reading of)70 W 37152 69858 MT (first executing ?Action)SH 7600 70099 MT (41.5 from) 94 W( Sonarw, reflecting an object at 41.5 inches. 
In)93 W 35736 70963 MT (\0504\051 ?Action is the first step of the ?Agent's plan)SH 7600 71204 MT (the decide phase of) 544 W( this cycle it then queries its)545 W ES
%%Page: 4 5
BS 0 SI 10 /Times-Roman AF 30350 4286 MT (4)SH 34736 7886 MT (EBG constructs an explanation of) 970 W( why the)969 W 24160 50 7200 8086 UL 33736 8991 MT (Chosen.Action is a Justified.Action as) 53 W( defined above, then)54 W 8 /Courier-Bold AF 7600 9370 MT (\050hero justified.action\051 = face.the.object)SH 10 /Times-Roman AF 33736 10096 MT (determines the) 68 W( weakest conditions on the Observed.World)67 W 8 /Courier-Bold AF 8080 10315 MT (<--prolog--)SH /Times-Roman SF 50349 10864 MT (4)SH 10 SS 33736 11209 MT (under which this explanation) 239 W( will hold)240 W 50749 XM (. Consider,) 730 W( for)240 W 8 /Courier-Bold AF 8080 11260 MT (\050hero attending.to.goals\051 = goal.identify.object)SH 8560 12205 MT (<--prolog--)SH 10 /Times-Roman AF 33736 12314 MT (example, a scenario in which the Hero agent is attending to)7 W 8 /Courier-Bold AF 8560 13150 MT (\050hero monitored.goals\051 = goal.identify.object)SH 10 /Times-Roman AF 33736 13419 MT (the goal Goal.Identify.The.Object, and has) 192 W( constructed a)193 W 8 /Courier-Bold AF 8560 14095 MT (\050hero goal.identify.object attending.to?\051 = t)SH 10 /Times-Roman AF 33736 14524 MT (two-step plan:) 1131 W( Face.The.Object,) 2511 W( followed by)1130 W 8 /Courier-Bold AF 9040 15040 MT (<--prolog--)SH 10 /Times-Roman AF 33736 15629 MT (Measure.The.Object. Figure) 754 W( 3-1 shows) 252 W( the explanation)253 W 8 /Courier-Bold AF 9040 15985 MT (\050hero goal.identify.object active?\051 = t)SH 10 /Times-Roman AF 33736 16734 MT (generated by the system for) 221 W( why Face.The.Object is its)220 W 8 /Courier-Bold AF 9520 16930 MT (<--prolog--)SH 10 /Times-Roman AF 33736 17839 MT (Justified.Action. 
In) 572 W( this figure, each) 161 W( line corresponds to)162 W 8 /Courier-Bold AF 9520 17875 MT (\050hero observed.world\051 = w0)SH 9520 18820 MT (\050w0 the.object identity known?\051 = nil)SH 10 /Times-Roman AF 33736 18944 MT (some belief of the agent, and level) 110 W( of indentation reflects)109 W 8 /Courier-Bold AF 8560 19765 MT (\050hero goal.closer active?\051 = nil)SH 10 /Times-Roman AF 33736 20049 MT (dependency. Each) 330 W( belief is written in) 40 W( the form \050frame slot)41 W 8 /Courier-Bold AF 9040 20710 MT (<--prolog--)SH 10 /Times-Roman AF 33736 21154 MT (subslot subsubslot ...\051=value, and arrows such as) 346 W( "<--)345 W 8 /Courier-Bold AF 9040 21655 MT (\050hero observed.world\051 = w0)SH 10 /Times-Roman AF 33736 22259 MT (observed.value--" indicate how the belief above) 34 W( and left of)35 W 8 /Courier-Bold AF 9040 22600 MT (\050w0 sonarw\051 = 22.5)SH 10 /Times-Roman AF 33736 23364 MT (the arrow was inferred) 155 W( from the beliefs below and to its)154 W 8 /Courier-Bold AF 8560 23545 MT (\050hero goal.further active?\051 = nil)SH 10 /Times-Roman AF 33736 24469 MT (right. 
For) 722 W( example, the leftmost belief that the Hero's)236 W 8 /Courier-Bold AF 9040 24490 MT (<--prolog--)SH 9040 25435 MT (\050hero observed.world\051 = w0)SH 10 /Times-Roman AF 33736 25574 MT (Justified.Action is Face.The.Object, is supported) 257 W( by the)256 W 8 /Courier-Bold AF 9040 26380 MT (\050w0 sonarw\051 = 22.5)SH 10 /Times-Roman AF 33736 26679 MT (three next leftmost beliefs that \0501\051 the \050Hero)875 W 8 /Courier-Bold AF 8080 27325 MT (\050world376 goal.identify.object wsatisfied?\051 = t)SH 10 /Times-Roman AF 33736 27784 MT (Attending.To.Goals\051=Goal.Identify.Object, \0502\051) 1753 W( the)1752 W 8 /Courier-Bold AF 8560 28270 MT (<--prolog--)SH 10 /Times-Roman AF 33736 28889 MT (\050World376 Goal.Identify.Object Satisfied?\051=t, and \0503\051 \050W0)18 W 8 /Courier-Bold AF 8560 29215 MT (\050world376 the.object identity known?\051 = t)SH 10 /Times-Roman AF 33736 29994 MT (Measure.The.Object Prec.Sat?\051=nil. W0 is the) 330 W( current)329 W 8 /Courier-Bold AF 9040 30160 MT (<--expected.value--)SH 10 /Times-Roman AF 33736 31099 MT (Observed.World, World376 is) 271 W( the world state which is)272 W 8 /Courier-Bold AF 9040 31105 MT (\050world376 previous.state\051 = world159)SH 9040 32050 MT (\050world159 measure.the.object prec.sat?\051 = t)SH 10 /Times-Roman AF 33736 32204 MT (predicted to result from) 66 W( the agent's plan, and Prec.Sat? is)65 W 8 /Courier-Bold AF 9520 32995 MT (<--prolog--)SH 10 /Times-Roman AF 33736 33309 MT (the predicate indicating whether the) 208 W( preconditions of an)209 W 8 /Courier-Bold AF 9520 33940 MT (\050world159 battery\051 = 100)SH 10 /Times-Roman AF 33736 34414 MT (action are satisfied in a given) 202 W( world state. 
These three)201 W 8 /Courier-Bold AF 10000 34885 MT (<--expected.value--)SH 10 /Times-Roman AF 33736 35519 MT (supporting beliefs correspond to the first three clauses) 137 W( in)138 W 8 /Courier-Bold AF 10000 35830 MT (\050world159 previous.state\051 = w0)SH /Times-Roman SF 49749 36287 MT (5)SH 10 SS 33736 36632 MT (the above definition of Justified.Action)80 W 50149 XM (. Notice) 410 W( the third)80 W 8 /Courier-Bold AF 10000 36775 MT (\050w0 battery\051 = 100)SH 10480 37720 MT (<--observed.value--)SH 10 /Times-Roman AF 33736 37737 MT (clause indicates that in this case the) 32 W( tail of the agent's plan)31 W 8 /Courier-Bold AF 10480 38665 MT (\050w0 battery observed.value\051 = 100)SH 10 /Times-Roman AF 33736 38842 MT (cannot succeed since the) 95 W( preconditions of the second step)96 W 8 /Courier-Bold AF 9520 39610 MT (\050world159 the.object distance\051 = 22)SH 10 /Times-Roman AF 33736 39947 MT (of the plan are not satisfied in the initial observed world.)SH 8 /Courier-Bold AF 10000 40555 MT (<--expected.value--)SH 10000 41500 MT (\050world159 previous.state\051 = w0)SH 24160 50 33336 41938 UL 10000 42445 MT (\050w0 face.the.object prec.sat?\051 = t)SH 33736 43222 MT (IF)SH 10480 43390 MT (<--prolog--)SH 34216 44167 MT (\0501\051 Identity of The.Object in Observed.World)SH 10480 44335 MT (\050w0 battery\051 = 100)SH 37576 45112 MT (is not Known)SH 10960 45280 MT (<--observed.value--)SH 34216 46057 MT (\0501\051 Sonarw in Observed.World = ?s)SH 10960 46225 MT (\050w0 battery observed.value\051 = 100)SH 34216 47002 MT (\0501\051 Not [3 < ?s < 15])SH 10480 47170 MT (\050w0 the.object direction known?\051 = t)SH 34216 47947 MT (\0501\051 Not [25 < ?s < 100])SH 10000 48115 MT (\050w0 the.object distance\051 = 22)SH 34216 48892 MT (\0502\051 Battery in Observed.World > 70)SH 10480 49060 MT (<--observed.value--)SH 34216 49837 MT (\0502\051 Distance to The.Object in Observed.World)SH 10480 50005 MT (\050w0 the.object distance)SH 37576 50782 MT (= ?dist)SH 17680 
50950 MT (observed.value\051 = 22)SH 34216 51727 MT (\0502\051 15 <= ?dist <= 25)SH 9520 51895 MT (\050world159 the.object direction\051 = 0)SH 34216 52672 MT (\0502,3\051 Direction to The.Object in Observed.World)SH 10000 52840 MT (<--expected.value--)SH 37576 53617 MT (= ?dir)SH 10000 53785 MT (\050world159 previous.state\051 = w0)SH 34216 54562 MT (\0503\051 Not [-5 <= ?dir <= 5])SH 10000 54730 MT (\050w0 face.the.object prec.sat?\051 = t)SH 10480 55675 MT (<--prolog--)SH 33736 56452 MT (THEN)SH 10480 56620 MT (\050w0 battery\051 = 100)SH 36136 57397 MT (Chosen.Action of Hero = Face.The.Object)SH 10960 57565 MT (<--observed.value--)SH 10960 58510 MT (\050w0 battery observed.value\051 = 100)SH 10 /Times-Bold AF 35278 59028 MT (Figure 3-2:)SH /Times-Roman SF 40527 XM (Rule for Explanation from Figure 3-1)SH 8 /Courier-Bold AF 10480 59455 MT (\050w0 the.object direction known?\051 = t)SH 8080 60400 MT (\050w0 measure.the.object prec.sat?\051 = nil)SH 24160 50 33336 60838 UL 8560 61345 MT (<--prolog--)SH 8560 62290 MT (\050w0 the.object direction\051 = 10)SH 9040 63235 MT (<--observed.value--)SH 10800 50 33336 63884 UL 9040 64180 MT (\050w0 the.object direction observed.value\051 = 10)SH 6 /Times-Roman AF 34136 65223 MT (4)SH 8 SS 34436 65532 MT (Notice that) 20 W( the third clause in the definition of Justified.Action requires)21 W 10 /Times-Bold AF 13531 65811 MT (Figure 3-1:)SH /Times-Roman SF 18780 XM (Explanation for)SH 8 SS 33336 66456 MT (that the first step of the plan be) 94 W( essential to the plan's success. 
Without)93 W 10 SS 13430 66916 MT (\050Hero Justified.Action\051 = Face.The.Object)SH 8 SS 33336 67380 MT (this requirement, the definition is too weak, and can produce) 147 W( rules that)148 W 33336 68304 MT (recommend non-essential actions such) 178 W( as WAIT, provided they can be)177 W 24160 50 7200 68726 UL 33336 69228 MT (followed by other actions that eventually achieve the goal.)SH 6 SS 34136 70767 MT (5)SH 8 SS 34436 71076 MT (The fourth clause) 107 W( is not even made explicit, since this is satisfied by)106 W 33336 72000 MT (defining the rule postcondition to recommend the current action.)SH ES
%%Page: 5 6
BS 0 SI 10 /Times-Roman AF 30350 4286 MT (5)SH 8600 7886 MT (Figure 3-2 shows the English description of the) 269 W( rule)268 W 24160 50 33336 8086 UL 7600 8991 MT (produced by the Theo-Agent from the explanation) 367 W( of)368 W 8 /Courier-Bold AF 46696 9370 MT (Decision Reaction)960 W 10 /Times-Roman AF 7600 10096 MT (Figure 3-1. The) 491 W( number to the left of each rule)490 W 8 /Courier-Bold AF 47656 10315 MT (Time Time)2880 W 10 /Times-Roman AF 7600 11201 MT (precondition indicates the corresponding clause of)725 W 8 /Courier-Bold AF 33736 12205 MT (1. Construct simple plan: 34.3 sec) SH( 36.8) 1440 W( sec)SH 10 /Times-Roman AF 7600 12306 MT (Justified.Action which is) 183 W( supported by this precondition.)182 W 7600 13411 MT (For example, the first four lines in) 74 W( the rule assure that the)75 W 8 /Courier-Bold AF 33736 14095 MT (2. Construct similar plan: 5.5 sec) SH( 6.4) 1920 W( sec)SH 10 /Times-Roman AF 7600 14516 MT (robot is in a world state for which it should) 110 W( attend to the)109 W 7600 15621 MT (goal Goal.Identify.Object \050i.e., they assure) 188 W( that this goal)189 W 8 /Courier-Bold AF 33736 15985 MT (3. Apply learned rules:) SH( 0.2) 1920 W( sec) SH( 0.9) 1920 W( sec)SH 10 /Times-Roman AF 7600 16726 MT (will be active,) 202 W( and that all higher priority goals will be)201 W 7600 17831 MT (inactive\051. 
Of) 444 W( course this rule need not explicitly) 97 W( mention)98 W /Times-Bold SF 34098 18561 MT (Table 3-1:)SH /Times-Roman SF 38959 XM (Effect of Learning on Agent Response Time)SH 7600 18936 MT (this goal or any other, since it instead mentions) 381 W( the)380 W 7600 20041 MT (observed sense features which imply) 102 W( the activation of the)103 W 9 SS 35536 20102 MT (\050Timings are in CommonLisp on a Sun3 workstation\051)SH 10 SS 7600 21146 MT (relevant goals. Similarly, the rule need not mention the)168 W 24160 50 33336 21758 UL 7600 22251 MT (plan, since) 11 W( it instead mentions those conditions, labeled \0502\051)12 W 33736 23349 MT (the cost of producing a very similar plan on the next cycle.)24 W 7600 23356 MT (and \0503\051, which) 2 W( assure that the first step of the plan will lead)1 W 33736 24454 MT (The speedup over the first line is due to) 175 W( the use of slot)174 W 7600 24461 MT (eventually to achieving the desired goal.)SH 33736 25559 MT (values which were cached during the) 478 W( first planning)479 W 8600 25704 MT (In all, the agent typically) 251 W( learns from five to fifteen)252 W 33736 26664 MT (episode, and whose) 78 W( explanations remain valid through the)77 W 7600 26809 MT (stimulus-response rules) 176 W( for this set of goals and actions,)175 W 33736 27769 MT (second cycle. The third) 99 W( line shows the timing for a third)100 W 7600 27914 MT (depending on its) 88 W( specific experiences and the sequence in)89 W 33736 28874 MT (cycle in which) 55 W( the agent applied a set of learned stimulus-)54 W 7600 29019 MT (which they are encountered. By adding) 259 W( and removing)258 W 33736 29979 MT (response rules to determine the same action. 
Here,)422 W 7600 30124 MT (other goals and actions, other agents can be specified that)85 W 33736 31084 MT (decision time \050200) 162 W( msec.\051 is comparable to sensing time)161 W 7600 31229 MT (will "compile out" into sets of) 42 W( stimulus-response rules that)41 W 33736 32189 MT (\050500 msec\051 and the time to) 98 W( initiate execution of the robot)99 W 7600 32334 MT (produce different behaviors.)SH 33736 33294 MT (action \050200 msec.\051, so that) 443 W( decision time no longer)442 W 33736 34399 MT (constitutes the) 53 W( bulk of overall reaction time. The decision)54 W 33736 35504 MT (time is found empirically to) 1 W( require 80 + 14r msec. to test a)SH 11 /Times-Bold AF 7200 35951 MT (3.2. Impact of Experience on Agent Reaction Time)SH 8 /Times-Roman AF 46179 36272 MT (6)SH 10 SS 33736 36617 MT (set of r stimulus-response rules)SH 46579 XM (.)SH 8600 37056 MT (With experience, the typical reaction time of the) 99 W( Theo-)100 W 34736 37860 MT (Of course the specific timing) 574 W( figures above are)573 W 7600 38161 MT (Agent in the) 105 W( above scenario drops from a few minutes to)104 W 33736 38965 MT (dependent on) 144 W( the particular agent goals, sensors, training)145 W 7600 39266 MT (under a second, due to its acquisition of) 50 W( stimulus-response)51 W 33736 40070 MT (experience, actions, etc. Scaling to more complex agents)106 W 7600 40371 MT (rules and its caching of beliefs. Let us define)4 W /Times-Italic SF 26052 XM (reaction time)3 W /Times-Roman SF 33736 41175 MT (that require hundreds or thousands of stimulus-response)229 W 7600 41476 MT (as the) 220 W( time required for a single iteration of the sense-)221 W 33736 42280 MT (rules, rather than ten, is likely to) 679 W( require more)678 W 7600 42581 MT (decide-execute loop of) 6 W( the agent. 
Similarly, define)5 W /Times-Italic SF 28360 XM (sensing)SH /Times-Roman SF 33736 43385 MT (sophisticated methods) 441 W( for encoding and indexing the)442 W /Times-Italic SF 7600 43686 MT (time)SH /Times-Roman SF (,)SH /Times-Italic SF 9829 XM (decision time)7 W /Times-Roman SF (, and)8 W /Times-Italic SF 17351 XM (execution time)8 W /Times-Roman SF 23421 XM (as the time required)8 W 33736 44490 MT (learned stimulus-response pairings. Approaches such) 197 W( as)196 W 7600 44791 MT (for the corresponding portions of this cycle. Decision) 17 W( time)16 W 33736 45595 MT (Rete matching, or encoding stimulus-response) 122 W( pairings in)123 W 7600 45896 MT (is reduced by two factors:)SH 33736 46700 MT (some type of network) 6 W( [Rosenschein) SH( 85,) 6 W( Brooks) SH( 86] may) 6 W( be)5 W /Symbol SF 8390 47349 MT (\267)SH /Times-Roman SF 9100 XM (Acquisition of stimulus-response rules. Matching) 11 W( a)12 W 33736 47805 MT (important for scaling) 192 W( to larger systems. At present, the)193 W 9100 48454 MT (stimulus-response rule requires) 90 W( on the order of ten)89 W 33736 48910 MT (significant result reported here) 312 W( is simply the existence)311 W 9100 49559 MT (milliseconds, whereas planning typically) 239 W( requires)240 W 33736 50015 MT (proof that) 39 W( the learning mechanisms employed in the Theo-)40 W 9100 50664 MT (several minutes.)SH 33736 51120 MT (Agent are sufficient) 58 W( to reduce decision time by two orders)57 W /Symbol SF 8390 51990 MT (\267)SH /Times-Roman SF 9100 XM (Caching of beliefs about future world states.) 120 W( The)488 W 33736 52225 MT (of magnitude for a real robot with) 112 W( fairly simple goals, so)113 W 9100 53095 MT (time required by) 81 W( planning is reduced as a result of)82 W 33736 53330 MT (that decision time ceases to dominate overall reaction) 40 W( time)39 W 9100 54200 MT (caching all agent beliefs. 
In particular,) 526 W( the)525 W 33736 54435 MT (of the agent.)SH 9100 55305 MT (descriptions of) 219 W( future world states considered by)220 W 9100 56410 MT (the planner \050e.g., "the wrist sonar) 215 W( reading in the)214 W 9100 57515 MT (world that will result from applying the action)327 W 12 /Times-Bold AF 33336 58119 MT (4. Summary, Limitations and Future Work)SH 10 /Times-Roman AF 9100 58620 MT (Forward.10 to) 289 W( the current Observed.World"\051 are)288 W 34736 59224 MT (The key design features of the Theo-Agent are:)SH 9100 59725 MT (cached, and remain as beliefs of the agent) 238 W( even)239 W /Symbol SF 34526 60677 MT (\267)SH /Times-Roman SF 35236 XM (A stimulus-response system combined) 524 W( with a)525 W 9100 60830 MT (after its sensed world is updated. Some) 231 W( cached)230 W 35236 61782 MT (planning component of) 202 W( broader scope but slower)201 W 9100 61935 MT (features of this) 41 W( imagined future world may become)42 W 35236 62887 MT (response time.) 353 W( This) 957 W( combination allows quick)354 W 9100 63040 MT (uncached each cycle) 392 W( as old sensed values are)391 W 35236 63992 MT (response for routine situations, plus) 215 W( flexibility to)214 W 9100 64145 MT (replaced by newer ones, but others tend to remain.)SH 35236 65097 MT (plan when novel situations are encountered.)SH 8600 65484 MT (The improvement in agent reaction time is) 89 W( summarized)90 W /Symbol SF 34526 66423 MT (\267)SH /Times-Roman SF 35236 XM (Explanation-based learning) 1189 W( mechanism for)1190 W 7600 66589 MT (in the timing data from a typical scenario,) 96 W( shown in Table)95 W 7600 67694 MT (3-1. The) 440 W( first) 95 W( line shows decision time and total reaction)96 W 7600 68799 MT (time for a sense-decide-execute) 21 W( cycle in which a plan must)20 W 10800 50 33336 69428 UL 7600 69904 MT (be created. Notice that here) 103 W( decision time constitutes the)104 W 6 SS 34136 70767 MT (6)SH 10 SS 7600 71009 MT (bulk of reaction) 55 W( time. 
The second line of this table shows)54 W 8 SS 34436 71076 MT (Rules are simply tested in sequence) 80 W( with no sophisticated indexing or)81 W 33336 72000 MT (parallel match algorithms.)SH ES
%%Page: 6 7
BS 0 SI 10 /Times-Roman AF 30350 4286 MT (6)SH 9100 7886 MT (incrementally augmenting) 449 W( the stimulus-response)448 W 35236 XM (are interested in extending the system to) 38 W( allow it to)39 W 9100 8991 MT (component of the system.) 7 W( When) 266 W( forced to plan, the)8 W 35236 XM (inductively learn better models of the) 75 W( effects of its)74 W 9100 10096 MT (agent formulates new stimulus-response rules that)147 W 35236 XM (actions, as a result of its experience.) 137 W( Preliminary)525 W 9100 11201 MT (produce precisely the same decision) 97 W( as the current)98 W 35236 XM (results with this) 96 W( kind of learning using a hand-eye)95 W 9100 12306 MT (plan, in precisely the same situations.)SH 35236 XM (robot are described in) 808 W( [Christiansen,) SH( et) 808 W( al.)809 W 35236 13411 MT (90, Zrimic and Mowforth 88].)SH /Symbol SF 8390 13632 MT (\267)SH /Times-Roman SF 9100 XM (The agent chooses its own) 325 W( goals based on the)324 W 9100 14737 MT (sensed world state, goal activation conditions) 150 W( and)151 W /Symbol SF 34526 XM (\267)SH /Times-Roman SF 35236 XM (The current planner considers the correctness of) 50 W( its)49 W 9100 15842 MT (relative goal priorities. Goals) 558 W( are explicitly)557 W 35236 XM (plans, but not the cost of) 404 W( sensing or effector)405 W 9100 16947 MT (considered by the agent)557 W /Times-Italic SF 21048 XM (only)SH /Times-Roman SF 23577 XM (when it must)558 W 35236 XM (commands. Therefore,) 398 W( the plans and the stimulus-)73 W 9100 18052 MT (construct plans. 
As the) 546 W( number of learned)545 W 35236 XM (response rules derived from) 319 W( them may refer to)320 W 9100 19157 MT (stimulus-response rules grows, the) 130 W( frequency with)131 W 35236 XM (sense features which are quite) 70 W( expensive to obtain,)69 W 9100 20262 MT (which the agent explicitly) 470 W( considers its goals)469 W 35236 XM (and which) 469 W( contribute in only minor ways to)470 W 9100 21367 MT (decreases.)SH 35236 XM (successful behavior. For instance, in order to)367 W 35236 22472 MT (guarantee correctness of a plan to pick up a) 68 W( cup, it)69 W /Symbol SF 8390 22693 MT (\267)SH /Times-Roman SF 9100 XM (Caching and dependency maintenance) 547 W( for all)548 W 35236 23577 MT (might be necessary to verify that the cup) 230 W( is not)229 W 9100 23798 MT (beliefs of the agent. Every belief of the agent is)157 W 35236 24682 MT (glued to) 334 W( the floor. The current system would)335 W 9100 24903 MT (cached along with an explanation that) 303 W( indicates)304 W 35236 25787 MT (include such a test in the stimulus-response) 232 W( rule)231 W 9100 26008 MT (those beliefs on which it depends.) 141 W( Whenever) 530 W( the)140 W 35236 26892 MT (that recommends the grasp operation, provided) 39 W( this)40 W 9100 27113 MT (agent sense inputs change, dependent beliefs) 30 W( which)31 W 35236 27997 MT (feature was considered by the planner.) 162 W( We) 572 W( must)161 W 9100 28218 MT (are affected are deleted, to be recomputed) 183 W( if and)182 W 35236 29102 MT (find a way to allow the agent) 37 W( to choose which tests)38 W 9100 29323 MT (when they are subsequently queried.)SH 35236 30207 MT (are necessary and) 52 W( which can be ignored in order to)51 W /Symbol SF 8390 30649 MT (\267)SH /Times-Roman SF 9100 XM (Distinction between) 280 W( eagerly and lazily refreshed)281 W 35236 31312 MT (construct plausible plans that it can) 172 W( then attempt,)173 W 9100 31754 MT (sense features. 
In order to) 273 W( minimize the lower)272 W 35236 32417 MT (and recover from as needed.)SH 9100 32859 MT (bound on reaction) 69 W( time, selected sense features are)70 W /Symbol SF 34526 33743 MT (\267)SH /Times-Roman SF 35236 XM (Scaling issues. As noted in) 158 W( the previous section,)157 W 9100 33964 MT (eagerly updated during each) 237 W( agent cycle. Other)236 W 35236 34848 MT (the current robot system requires only) 1 W( a small set of)2 W 9100 35069 MT (features are lazily updated by deleting) 224 W( them and)225 W 35236 35953 MT (stimulus-response rules to govern its) 30 W( behavior. We)29 W 9100 36174 MT (recomputing them if and when they) 173 W( are required.)172 W 35236 37058 MT (must consider how the approach can be scaled) 151 W( to)152 W 9100 37279 MT (This provides a simple focus of attention)738 W 35236 38163 MT (more complex) 65 W( situations. Some avenues are to \0501\051)64 W 9100 38384 MT (mechanism that helps minimize response time.) 101 W( In)451 W 35236 39268 MT (explore other strategies for indexing learned)614 W 9100 39489 MT (the future, we hope to allow the agent to)578 W 35236 40373 MT (knowledge \050e.g., index rules by) 75 W( goal, so that many)74 W 9100 40594 MT (dynamically control) 91 W( the assignment of eagerly and)90 W 35236 41478 MT (subsets of rules are stored rather) 66 W( than a single set\051,)67 W 9100 41699 MT (lazily sensed features.)SH 35236 42583 MT (\0502\051 develop) 114 W( a more selective strategy for invoking)113 W 8600 43038 MT (There are several) 224 W( reasonable criticisms of the current)225 W 35236 43688 MT (learning only when the) 29 W( benefits outweigh the costs,)30 W 7600 44143 MT (Theo-Agent architecture, which indicate its) 731 W( current)730 W 35236 44793 MT (and \0503\051 consider representations of) 387 W( the control)386 W 7600 45248 MT (limitations. 
Among) 250 W( these are:)SH 35236 45898 MT (function that sacrifice expressive) 447 W( precision for)448 W /Symbol SF 8390 46701 MT (\267)SH /Times-Roman SF 9100 XM (The kind of planning) 41 W( the Theo-Agent performs may)42 W 35236 47003 MT (fixed computational cost \050e.g., fixed) 497 W( topology)496 W 9100 47806 MT (be unrealistically difficult in many situations, due)150 W 35236 48108 MT (neural networks with constant response time\051.)SH 9100 48911 MT (to lack of knowledge about the world, the likely)196 W 9100 50016 MT (effects of the agent's actions, or other) 148 W( changes in)147 W 9100 51121 MT (the world.) 49 W( One) 350 W( possible response to this limitation)50 W 34736 52164 MT (We believe the) 469 W( notion of incrementally compiling)470 W 9100 52226 MT (is to) 26 W( add new decision-making mechanisms beyond)25 W 33736 53269 MT (reactive systems) 273 W( from more general but slower search-)272 W 9100 53331 MT (the current planner) 152 W( and stimulus-response system.)153 W 33736 54374 MT (based systems is an important approach toward) 97 W( extending)98 W 9100 54436 MT (For example, one could imagine) 123 W( a decision-maker)122 W 33736 55479 MT (the flexibility of robotic systems while) 343 W( still achieving)342 W 9100 55541 MT (with an evaluation function) 409 W( over world states,)410 W 33736 56584 MT (respectable \050asymptotic\051 response times.) 354 W( The) 959 W( specific)355 W 9100 56646 MT (which evaluates actions) 770 W( based on one-step)769 W 33736 57689 MT (design of the) 131 W( Theo-Agent illustrates one way to organize)130 W 9100 57751 MT (lookahead \050similar to that) 330 W( proposed in Sutton's)331 W 33736 58794 MT (such a system.) 395 W( Our) 1041 W( intent is to extend the current)396 W 9100 58856 MT (DYNA [Sutton) SH( 90]\051.) 
181 W( As) 610 W( suggested in) 180 W( [Kaelbling)SH 33736 59899 MT (architecture by adding) 90 W( new learning mechanisms that will)89 W 9100 59961 MT (86], a spectrum of multiple decision-makers could)104 W 33736 61004 MT (allow it to improve the) 9 W( correctness of its action models and)10 W 9100 61066 MT (trade off response) 35 W( speed for correctness. However,)34 W 33736 62109 MT (its abilities to usefully perceive its) 11 W( world. These additional)10 W 9100 62171 MT (learning mechanisms such) 17 W( as those used here might)18 W 33736 63214 MT (learning capabilities are intended to complement the type)121 W 9100 63276 MT (still compile stimulus-response rules from the)491 W 33736 64319 MT (of learning presented here.)SH 9100 64381 MT (decisions produced by this spectrum of) 201 W( decision-)202 W 9100 65486 MT (makers.)SH /Symbol SF 8390 66812 MT (\267)SH /Times-Roman SF 9100 XM (Although the Theo-Agent learns to become)747 W 9100 67917 MT (increasingly reactive, its decisions do) 150 W( not become)151 W /Times-Bold SF 34736 67941 MT (Acknowledgements.)SH /Times-Roman SF 43902 XM (This work is based on extensions)69 W 9100 69022 MT (increasingly correct. The acquired stimulus-)575 W 33736 69046 MT (to earlier) 129 W( joint work with Jim Blythe, reported in) 130 W( [Blythe)SH 9100 70127 MT (response rules) 81 W( are only as good as the planner and)82 W 33736 70151 MT (and Mitchell 89]. I am most grateful for Jim's) 68 W( significant)67 W 9100 71232 MT (action models from which they are compiled. We)92 W 33736 71256 MT (contributions to the) 223 W( design of the Theo-Agent. Thanks)224 W ES
%%Page: 7 8
BS 0 SI 10 /Times-Roman AF 30350 4286 MT (7)SH 7600 7886 MT (also to the entire) 172 W( Theo group, which produced the Theo)171 W 33336 XM ([Laird and Rosenbloom 90])SH 7600 8991 MT (system on which Theo-Agent is built.) 109 W( Theo) 469 W( provides the)110 W 40836 XM (Laird, J.E. 
and Rosenbloom, P.S.)SH 7600 10096 MT (underlying inference,) 956 W( representation, and learning)955 W 40836 XM (Integrating Planning, Execution, and)SH 7600 11201 MT (mechanisms used by) 150 W( the Theo-Agent. Finally, thanks to)151 W 42336 XM (Learning in Soar for External)SH 7600 12306 MT (Long-Ji Lin who developed) 138 W( a number of the routines for)137 W 42336 XM (Environments.)SH 7600 13411 MT (interfacing from) 112 W( workstations to the robot. This research)113 W 40836 XM (In)SH /Times-Italic SF 41919 XM (Proceedings of AAAI '90)SH /Times-Roman SF (. AAAI,)250 W 7600 14516 MT (is supported by DARPA) 739 W( under research contract)738 W 42336 XM (1990.)SH 7600 15621 MT (N00014-85-K-0116 and by) 53 W( NASA under research contract)54 W 33336 16307 MT ([Lin, et al. 89])SH 40836 XM (Lin, L., Philips, A., Mitchell, T., and)SH 7600 16726 MT (NAGW-1175.)SH 40836 17412 MT (Simmons, R.)SH /Times-Italic SF 40836 18517 MT (A Case Study in Robot Exploration)SH /Times-Roman SF (.)SH 40836 19622 MT (Robotics Institute Technical Report)SH 12 /Times-Bold AF 7200 20410 MT (References)SH 10 /Times-Roman AF 42336 20727 MT (CMU-RI-89-001, Carnegie Mellon)SH 42336 21832 MT (University, Robotics Institute,)SH 7200 22201 MT ([Agre and Chapman 87])SH 42336 22937 MT (January, 1989.)SH 14700 23306 MT (Agre, P. and Chapman, D.)SH 14700 24411 MT (Pengi: An Implementation of a Theory of)SH 33336 24728 MT ([Mitchell, et al 86])SH 40836 XM (Mitchell, T.M., Keller, R.K., and Kedar-)SH 16200 25516 MT (Activity.)SH 40836 25833 MT (Cabelli, S.)SH 14700 26621 MT (In)SH /Times-Italic SF 15783 XM (Proceedings of the National)SH /Times-Roman SF 40836 26938 MT (Explanation-Based Generalization: A)SH /Times-Italic SF 16200 27726 MT (Conference on Artificial Intelligence)SH /Times-Roman SF (,)SH 42336 28043 MT (Unifying View.)SH 16200 28831 MT (pages 268-272. 
Morgan Kaufmann,)SH /Times-Italic SF 40836 29148 MT (Machine Learning)SH /Times-Roman SF 48502 XM (1\0501\051, 1986.)SH 16200 29936 MT (July, 1987.)SH 33336 30939 MT ([Mitchell, et al. 90])SH 7200 31727 MT ([Blythe and Mitchell 89])SH 40836 32044 MT (Mitchell, T. M., J. Allen, P. Chalasani,)SH 14700 32832 MT (Blythe, J., and Mitchell, T.)SH 40836 33149 MT (J. Cheng, O. Etzioni, M. Ringuette, and)SH 14700 33937 MT (On Becoming Reactive.)SH 40836 34254 MT (J. Schlimmer.)SH 14700 35042 MT (In)SH /Times-Italic SF 15783 XM (Proceedings of the Sixth International)SH /Times-Roman SF 40836 35359 MT (Theo: A Framework for Self-improving)SH /Times-Italic SF 16200 36147 MT (Machine Learning Workshop)SH /Times-Roman SF (, pages)SH 42336 36464 MT (Systems.)SH 16200 37252 MT (255-259. Morgan) 250 W( Kaufmann, June,)SH 40836 37569 MT (In VanLehn, K. \050editor\051,)SH /Times-Italic SF 50861 XM (Architectures)SH /Times-Roman SF 16200 38357 MT (1989.)SH /Times-Italic SF 42336 38674 MT (for Intelligence)SH /Times-Roman SF (. Erlbaum, 1990.)SH 7200 40148 MT ([Brooks 86])SH 14700 XM (Brooks, R.A.)SH 33336 40465 MT ([Pommerleau 89])SH 40836 XM (Pommerleau, D.A.)SH 14700 41253 MT (A Robust Layered Control System for a)SH 40836 41570 MT (ALVINN: An Autonomous Land)SH 16200 42358 MT (Mobile Robot.)SH 42336 42675 MT (Vehicle in a Neural Network.)SH /Times-Italic SF 14700 43463 MT (IEEE Journal of Robotics and)SH /Times-Roman SF 40836 43780 MT (In Touretzky, D. \050editor\051,)SH /Times-Italic SF 51250 XM (Advances in)SH 16200 44568 MT (Automation)SH /Times-Roman SF 21117 XM (2\0501\051, March, 1986.)SH /Times-Italic SF 42336 44885 MT (Neural Information Processing)SH 42336 45990 MT (Systems, Vol. 1)SH /Times-Roman SF (. Morgan Kaufmann,)SH 7200 46359 MT ([Christiansen, et al. 
90])SH 42336 47095 MT (1989.)SH 14700 47464 MT (Christiansen, A., Mason, M., and)SH 14700 48569 MT (Mitchell, T.)SH 33336 48886 MT ([Rosenschein 85])SH 40836 XM (Rosenschein, S.)SH 14700 49674 MT (Learning Reliable Manipulation)SH 40836 49991 MT (Formal Theories of Knowledge in AI and)SH 16200 50779 MT (Strategies without Initial Physical)SH 42336 51096 MT (Robotics.)SH 16200 51884 MT (Models.)SH /Times-Italic SF 40836 52201 MT (New Generation Computing)SH /Times-Roman SF 52364 XM (3:345-357,)SH 14700 52989 MT (In)SH /Times-Italic SF 15783 XM (Proceedings of the IEEE International)SH /Times-Roman SF 42336 53306 MT (1985.)SH /Times-Italic SF 16200 54094 MT (Conference on Robotics and)SH /Times-Roman SF 33336 55097 MT ([Schoppers 87])SH 40836 XM (Schoppers, M.J.)SH /Times-Italic SF 16200 55199 MT (Automation)SH /Times-Roman SF (. IEEE) 250 W( Press, May, 1990.)SH 40836 56202 MT (Universal Plans for Reactive Robots in)SH 7200 56990 MT ([Kaelbling 86])SH 14700 XM (Kaelbling, L.P.)SH 42336 57307 MT (Unpredictable Environments.)SH 14700 58095 MT (An Architecture for Intelligent Reactive)SH 40836 58412 MT (In)SH /Times-Italic SF 41919 XM (Proceedings of the Tenth International)SH /Times-Roman SF 16200 59200 MT (Systems.)SH /Times-Italic SF 42336 59517 MT (Joint Conference on Artificial)SH /Times-Roman SF 14700 60305 MT (In M.P. Georgeff and A.L. Lansky)SH /Times-Italic SF 42336 60622 MT (Intelligence)SH /Times-Roman SF (, pages 1039-1046.)SH 16200 61410 MT (\050editor\051,)SH /Times-Italic SF 19699 XM (Reasoning about Actions)SH /Times-Roman SF 42336 61727 MT (AAAI, August, 1987.)SH /Times-Italic SF 16200 62515 MT (and Plans: Proceedings of the 1986)SH /Times-Roman SF 33336 63518 MT ([Segre 88])SH 40836 XM (Segre, A.M.)SH /Times-Italic SF 16200 63620 MT (Workshop)SH /Times-Roman SF (. 
Morgan) 250 W( Kaufmann,)SH /Times-Italic SF 40836 64623 MT (Machine Learning of Robot Assembly)SH /Times-Roman SF 16200 64725 MT (1986.)SH /Times-Italic SF 42336 65728 MT (Plans.)SH /Times-Roman SF 40836 66833 MT (Kluwer Academic Press, 1988.)SH ES %%Page: 8 9 BS 0 SI 10 /Times-Roman AF 30350 4286 MT (8)SH 7200 7886 MT ([Sutton 90])SH 14700 XM (Sutton, R.)SH 14700 8991 MT (First Results with DYNA, an Integrated)SH 16200 10096 MT (Architecture for Learning, Planning,)SH 16200 11201 MT (and Reacting.)SH 14700 12306 MT (In)SH /Times-Italic SF 15783 XM (Proceedings of AAAI Spring)SH 16200 13411 MT (Symposium on Planning in)SH 16200 14516 MT (Uncertain, Unpredictable, or)SH 16200 15621 MT (Changing Environments)SH /Times-Roman SF (, pages)SH 16200 16726 MT (136-140. AAAI,) 250 W( March, 1990.)SH 7200 18517 MT ([Tan 90])SH 14700 XM (Tan, M.)SH 14700 19622 MT (CSL: A Cost-Sensitive Learning System)SH 16200 20727 MT (for Sensing and Grasping Objects.)SH 14700 21832 MT (In)SH /Times-Italic SF 15783 XM (Proceedings of the 1990 IEEE)SH 16200 22937 MT (International Conference on Robotics)SH 16200 24042 MT (and Automation)SH /Times-Roman SF (. IEEE,) 250 W( May, 1990.)SH 7200 25833 MT ([Zrimic and Mowforth 88])SH 14700 26938 MT (Zrimic, T., and Mowforth, P.)SH 14700 28043 MT (An Experiment in Generating Deep)SH 16200 29148 MT (Knowledge for Robots.)SH 14700 30253 MT (In)SH /Times-Italic SF 15783 XM (Proceedings of the Conference on)SH 16200 31358 MT (Representation and Reasoning in an)SH 16200 32463 MT (Autonomous Agent)SH /Times-Roman SF (. 1988.)250 W ES %%Page: i 10 BS 0 SI 10 /Times-Roman AF 30461 4286 MT (i)SH 12 /Times-Bold AF 26033 8004 MT (Table of Contents)SH 11 SS 8850 9172 MT (1. Introduction and Motivation)SH 53450 XM (0)SH 10 SS 10700 10252 MT (1.1. Related Work)SH 53500 XM (0)SH 11 SS 8850 11420 MT (2. The Theo-Agent Architecture)SH 53450 XM (1)SH 8850 12588 MT (3. Example and Results)SH 53450 XM (2)SH 10 SS 10700 13668 MT (3.1. 
Rule Learning)SH 53500 XM (3)SH 10700 14748 MT (3.2. Impact of Experience on Agent Reaction Time)SH 53500 XM (5)SH 11 SS 8850 15916 MT (4. Summary, Limitations and Future Work)SH 53450 XM (5)SH 8850 17084 MT (References)SH 53450 XM (7)SH ES %%Page: ii 11 BS 0 SI 10 /Times-Roman AF 30322 4286 MT (ii)SH 12 /Times-Bold AF 26866 8004 MT (List of Figures)SH 11 SS 8850 9172 MT (Figure 2-1:) SH( Data) 550 W( Flow in a Theo-Agent)SH 53450 XM (2)SH 8850 10340 MT (Figure 3-1:) SH( Explanation) 550 W( for \050Hero Justified.Action\051 = Face.The.Object)SH 53450 XM (4)SH 8850 11508 MT (Figure 3-2:) SH( Rule) 550 W( for Explanation from Figure 3-1)SH 53450 XM (4)SH ES %%Page: iii 12 BS 0 SI 10 /Times-Roman AF 30183 4286 MT (iii)SH 12 /Times-Bold AF 27099 8004 MT (List of Tables)SH 11 SS 8850 9172 MT (Table 3-1:) SH( Effect) 550 W( of Learning on Agent Response Time)SH 53450 XM (5)SH ES %%Trailer %%Pages: 12 %%DocumentFonts: Times-Roman Times-Bold Times-Italic Symbol Courier-Bold