{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,9]],"date-time":"2024-09-09T06:46:00Z","timestamp":1725864360727},"publisher-location":"Cham","reference-count":33,"publisher":"Springer International Publishing","isbn-type":[{"type":"print","value":"9783319462264"},{"type":"electronic","value":"9783319462271"}],"license":[{"start":{"date-parts":[[2016,1,1]],"date-time":"2016-01-01T00:00:00Z","timestamp":1451606400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springer.com\/tdm"},{"start":{"date-parts":[[2016,1,1]],"date-time":"2016-01-01T00:00:00Z","timestamp":1451606400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016]]},"DOI":"10.1007\/978-3-319-46227-1_30","type":"book-chapter","created":{"date-parts":[[2016,9,3]],"date-time":"2016-09-03T01:34:10Z","timestamp":1472866450000},"page":"475-491","update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes"],"prefix":"10.1007","author":[{"given":"Jordi","family":"Grau-Moya","sequence":"first","affiliation":[]},{"given":"Felix","family":"Leibfried","sequence":"additional","affiliation":[]},{"given":"Tim","family":"Genewein","sequence":"additional","affiliation":[]},{"given":"Daniel A.","family":"Braun","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2016,9,4]]},"reference":[{"volume-title":"Adaptive Control","year":"2013","author":"KJ \u00c5str\u00f6m","key":"30_CR1","unstructured":"\u00c5str\u00f6m, K.J., Wittenmark, B.: Adaptive Control. Courier Corporation, Mineola (2013)"},{"key":"30_CR2","unstructured":"Bellman, R.: Dynamic Programming, 1st edn. Princeton University Press, Princeton (1957). http:\/\/books.google.com\/books?id=fyVtp3EMxasC&pg=PR5&dq=dynamic+programming+richard+e+bellman&client=firefox-a#v=onepage&q=dynamic%20programming%20richard%20e%20bellman&f=false"},{"volume-title":"Neuro-Dynamic Programming","year":"1996","author":"D Bertsekas","key":"30_CR3","unstructured":"Bertsekas, D., Tsitsiklis, J.: Neuro-Dynamic Programming. Athena 
Scientific, Belmont (1996)"},{"key":"30_CR4","doi-asserted-by":"crossref","unstructured":"Braun, D.A., Ortega, P.A., Theodorou, E., Schaal, S.: Path integral control and bounded rationality. In: 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pp. 202\u2013209. IEEE (2011)","DOI":"10.1109\/ADPRL.2011.5967366"},{"key":"30_CR5","unstructured":"van den Broek, B., Wiegerinck, W., Kappen, H.J.: Risk sensitive path integral control. In: UAI (2010)"},{"key":"30_CR6","unstructured":"Chow, Y., Tamar, A., Mannor, S., Pavone, M.: Risk-sensitive and robust decision-making: a CVaR optimization approach. In: Advances in Neural Information Processing Systems, pp. 1522\u20131530 (2015)"},{"key":"30_CR7","unstructured":"Duff, M.O.: Optimal learning: computational procedures for Bayes-adaptive Markov decision processes. Ph.D. thesis, University of Massachusetts Amherst (2002)"},{"key":"30_CR8","unstructured":"Fox, R., Pakman, A., Tishby, N.: G-learning: taming the noise in reinforcement learning via soft updates. arXiv preprint (2015). arXiv:1512.08562"},{"key":"30_CR9","first-page":"1573","volume":"16","author":"A Geramifard","year":"2015","unstructured":"Geramifard, A., Dann, C., Klein, R.H., Dabney, W., How, J.P.: RLPy: a value-function-based reinforcement learning framework for education and research. J. Mach. Learn. Res. 16, 1573\u20131578 (2015)","journal-title":"J. Mach. Learn. Res."},{"key":"30_CR10","unstructured":"Guez, A., Silver, D., Dayan, P.: Efficient Bayes-adaptive reinforcement learning using sample-based search. In: Advances in Neural Information Processing Systems, pp. 1025\u20131033 (2012)"},{"key":"30_CR11","doi-asserted-by":"crossref","first-page":"841","DOI":"10.1613\/jair.4117","volume":"48","author":"A Guez","year":"2013","unstructured":"Guez, A., Silver, D., Dayan, P.: Scalable and efficient Bayes-adaptive reinforcement learning based on Monte-Carlo tree search. J. Artif. Intell. Res. 48, 841\u2013883 (2013)","journal-title":"J. Artif. Intell. Res."},{"key":"30_CR12","doi-asserted-by":"publisher","DOI":"10.1515\/9781400829385","volume-title":"Robustness","author":"LP Hansen","year":"2008","unstructured":"Hansen, L.P., Sargent, T.J.: Robustness. Princeton University Press, Princeton (2008)"},{"issue":"2","key":"30_CR13","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1287\/moor.1040.0129","volume":"30","author":"GN Iyengar","year":"2005","unstructured":"Iyengar, G.N.: Robust dynamic programming. Math. Oper. Res. 30(2), 257\u2013280 (2005)","journal-title":"Math. Oper. Res."},{"issue":"20","key":"30_CR14","doi-asserted-by":"publisher","first-page":"200201","DOI":"10.1103\/PhysRevLett.95.200201","volume":"95","author":"HJ 
Kappen","year":"2005","unstructured":"Kappen, H.J.: Linear theory for control of nonlinear stochastic systems. Phys. Rev. Lett. 95(20), 200201 (2005)","journal-title":"Phys. Rev. Lett."},{"issue":"2","key":"30_CR15","doi-asserted-by":"publisher","first-page":"308","DOI":"10.1287\/mnsc.1060.0614","volume":"53","author":"S Mannor","year":"2007","unstructured":"Mannor, S., Simester, D., Sun, P., Tsitsiklis, J.N.: Bias and variance approximation in value function estimates. Manag. Sci. 53(2), 308\u2013322 (2007)","journal-title":"Manag. Sci."},{"issue":"5","key":"30_CR16","doi-asserted-by":"publisher","first-page":"780","DOI":"10.1287\/opre.1050.0216","volume":"53","author":"A Nilim","year":"2005","unstructured":"Nilim, A., El Ghaoui, L.: Robust control of Markov decision processes with uncertain transition matrices. Oper. Res. 53(5), 780\u2013798 (2005)","journal-title":"Oper. Res."},{"key":"30_CR17","doi-asserted-by":"crossref","unstructured":"Ortega, P.A., Braun, D.A.: A Bayesian rule for adaptive control based on causal interventions. In: Third Conference on Artificial General Intelligence (AGI-2010). Atlantis Press (2010)","DOI":"10.2991\/agi.2010.39"},{"key":"30_CR18","first-page":"475","DOI":"10.1613\/jair.3062","volume":"38","author":"PA Ortega","year":"2010","unstructured":"Ortega, P.A., Braun, D.A.: A minimum relative entropy principle for learning and acting. J. Artif. Intell. Res. 38(11), 475\u2013511 (2010)"},{"key":"30_CR19","first-page":"20120683","DOI":"10.1098\/rspa.2012.0683","volume":"469","author":"PA Ortega","year":"2013","unstructured":"Ortega, P.A., Braun, D.A.: Thermodynamics as a theory of decision-making with information-processing costs. Proc. R. Soc. A 469, 20120683 (2013). The Royal Society","journal-title":"Proc. R. Soc. A"},{"issue":"1","key":"30_CR20","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1186\/2194-3206-2-2","volume":"2","author":"PA Ortega","year":"2014","unstructured":"Ortega, P.A., Braun, D.A.: Generalized Thompson sampling for sequential decision-making and causal inference. Complex Adapt. Syst. Model. 2(1), 2 (2014)","journal-title":"Complex Adapt. Syst. Model."},{"key":"30_CR21","doi-asserted-by":"crossref","unstructured":"Ortega, P.A., Braun, D.A., Tishby, N.: Monte Carlo methods for exact and efficient solution of the generalized optimality equations. In: 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 4322\u20134327. IEEE (2014)","DOI":"10.1109\/ICRA.2014.6907488"},{"key":"30_CR22","unstructured":"Osogami, T.: Robustness and risk-sensitivity in Markov decision processes. In: Advances in Neural Information Processing 
Systems, pp. 233\u2013241 (2012)"},{"key":"30_CR23","doi-asserted-by":"crossref","unstructured":"Peters, J., M\u00fclling, K., Altun, Y., Poole, F.D., et al.: Relative entropy policy search. In: Twenty-Fourth National Conference on Artificial Intelligence (AAAI-10), pp. 1607\u20131612. AAAI Press (2010)","DOI":"10.1609\/aaai.v24i1.7727"},{"key":"30_CR24","first-page":"1729","volume":"12","author":"S Ross","year":"2011","unstructured":"Ross, S., Pineau, J., Chaib-draa, B., Kreitmann, P.: A Bayesian approach for learning and planning in partially observable Markov decision processes. J. Mach. Learn. Res. 12, 1729\u20131770 (2011)","journal-title":"J. Mach. Learn. Res."},{"key":"30_CR25","series-title":"Intelligent Systems Reference Library","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1007\/978-3-642-24647-0_3","volume-title":"Decision Making with Imperfect Decision Makers","author":"J Rubin","year":"2012","unstructured":"Rubin, J., Shamir, O., Tishby, N.: Trading value and information in MDPs. In: Guy, T.V., K\u00e1rn\u00fd, M., Wolpert, D.H. (eds.) Decision Making with Imperfect Decision Makers. Intelligent Systems Reference Library, vol. 28, pp. 57\u201374. Springer, Heidelberg (2012)"},{"issue":"7","key":"30_CR26","doi-asserted-by":"publisher","first-page":"1298","DOI":"10.1162\/NECO_a_00600","volume":"26","author":"Y Shen","year":"2014","unstructured":"Shen, Y., Tobia, M.J., Sommer, T., Obermayer, K.: Risk-sensitive reinforcement learning. Neural Comput. 26(7), 1298\u20131328 (2014)","journal-title":"Neural Comput."},{"key":"30_CR27","first-page":"2413","volume":"10","author":"AL Strehl","year":"2009","unstructured":"Strehl, A.L., Li, L., Littman, M.L.: Reinforcement learning in finite MDPs: PAC analysis. J. Mach. Learn. Res. 10, 2413\u20132444 (2009)"},{"key":"30_CR28","doi-asserted-by":"crossref","unstructured":"Szita, I., L\u0151rincz, A.: The many faces of optimism: a unified approach. In: Proceedings of the 25th International Conference on Machine Learning, pp. 1048\u20131055. ACM (2008)","DOI":"10.1145\/1390156.1390288"},{"key":"30_CR29","unstructured":"Szita, I., Szepesv\u00e1ri, C.: Model-based reinforcement learning with nearly tight exploration complexity bounds. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 1031\u20131038 (2010)"},{"key":"30_CR30","series-title":"Springer Series in Cognitive and Neural Systems","doi-asserted-by":"publisher","first-page":"601","DOI":"10.1007\/978-1-4419-1452-1_19","volume-title":"Perception-Action Cycle","author":"N Tishby","year":"2011","unstructured":"Tishby, N., Polani, D.: Information theory of decisions and actions. In: Cutsuridis, V., Hussain, A., Taylor, J.G. (eds.) Perception-Action Cycle. Springer Series in Cognitive and Neural Systems, pp. 601\u2013636. Springer, New York (2011)"},{"key":"30_CR31",
"doi-asserted-by":"crossref","unstructured":"Todorov, E.: Linearly-solvable Markov decision problems. In: Advances in Neural Information Processing Systems, pp. 1369\u20131376 (2006)","DOI":"10.7551\/mitpress\/7503.003.0176"},{"issue":"28","key":"30_CR32","doi-asserted-by":"publisher","first-page":"11478","DOI":"10.1073\/pnas.0710743106","volume":"106","author":"E Todorov","year":"2009","unstructured":"Todorov, E.: Efficient computation of optimal actions. Proc. Natl. Acad. Sci. 106(28), 11478\u201311483 (2009)","journal-title":"Proc. Natl. Acad. Sci."},{"issue":"1","key":"30_CR33","doi-asserted-by":"publisher","first-page":"153","DOI":"10.1287\/moor.1120.0566","volume":"38","author":"W Wiesemann","year":"2013","unstructured":"Wiesemann, W., Kuhn, D., Rustem, B.: Robust Markov decision processes. Math. Oper. Res. 38(1), 153\u2013183 (2013)","journal-title":"Math. Oper. Res."}],"container-title":["Lecture Notes in Computer Science","Machine Learning and Knowledge Discovery in Databases"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-319-46227-1_30","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,19]],"date-time":"2023-08-19T19:24:14Z","timestamp":1692473054000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-319-46227-1_30"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016]]},"ISBN":["9783319462264","9783319462271"],"references-count":33,"URL":"http:\/\/dx.doi.org\/10.1007\/978-3-319-46227-1_30","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"type":"print","value":"0302-9743"},{"type":"electronic","value":"1611-3349"}],"subject":[],"published":{"date-parts":[[2016]]},"assertion":[{"value":"4 September 2016","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"ECML PKDD","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Riva del 
Garda","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Italy","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2016","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"19 September 2016","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"23 September 2016","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"ecml2016","order":10,"name":"conference_id","label":"Conference ID","group":