Academic Seminars, School of Computer Science and Technology

Title: Reinforcement Learning for Text Understanding

Speaker: Dr. Tao Qin, Lead Researcher, Microsoft Research Asia

Time: March 17, 2017, 8:40-9:20

Venue: Conference Room 504, Science and Engineering Building, Main Campus

Abstract:

In recent years, reinforcement learning (RL) has achieved great success in video and board games. Such success has motivated researchers and industry practitioners to apply RL techniques to real-world applications. In this talk I will present two of our recent research projects. In the first, we introduce value networks into neural machine translation (NMT) to improve its decoding procedure. We propose a recurrent structure for the value network and train its parameters on bilingual data. At test time, when choosing a word during decoding, we consider both its conditional probability given by the NMT model and its long-term value predicted by the value network. In the second project, we extend existing RL methods for sequence prediction to exploit unlabeled data. To leverage unlabeled data, we propose to learn the reward function from labeled data and use the predicted reward as a pseudo reward for unlabeled data. We propose an RNN-based reward network with an attention mechanism, trained on a purposely biased data distribution. We show that the pseudo reward can provide fairly good supervision and guide the learning process on unlabeled data.
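
As a concrete illustration of the decoding rule in the first project, the minimal sketch below blends an NMT model's conditional log-probability with a value network's long-term value prediction when scoring candidate words. The interpolation weight alpha, the toy inputs, and the function names are illustrative assumptions, not the authors' actual interface.

import numpy as np

# Hypothetical decoding step: the NMT model supplies a conditional
# distribution over candidate next words, and a separately trained
# value network predicts the long-term quality of continuing the
# translation with each candidate. Both inputs are toy stand-ins.

def score_candidates(nmt_log_probs, value_estimates, alpha=0.5):
    """Blend the NMT conditional log-probability with the value
    network's long-term value prediction for each candidate word;
    alpha is a hypothetical interpolation weight."""
    return alpha * nmt_log_probs + (1.0 - alpha) * value_estimates

# Toy example with a 5-word candidate list.
rng = np.random.default_rng(0)
nmt_log_probs = np.log(rng.dirichlet(np.ones(5)))  # log P(word | context) from the NMT model
value_estimates = rng.normal(size=5)               # value network's long-term predictions
best = int(np.argmax(score_candidates(nmt_log_probs, value_estimates)))
print("chosen candidate index:", best)

With alpha = 1.0 the rule reduces to ordinary likelihood-based decoding; smaller values give the value network's long-term prediction more influence, which is the trade-off the abstract describes.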

Speaker Bio:

Dr. Tao Qin is a Lead Researcher at Microsoft Research Asia and an adjunct PhD supervisor at the University of Science and Technology of China. He has published over 100 papers in international conferences and journals. He serves or has served as an area chair for SIGIR, ACML, and AAMAS, as a program committee member for many international conferences including ICML, NIPS, KDD, IJCAI, AAAI, WSDM, EC, SIGIR, AAMAS, and WINE, and as co-chair of several international workshops. His team's main research directions are algorithms and theory for deep learning and reinforcement learning and their applications to real-world problems.


Title: Policy Search by Direct Action Optimization

Speaker: Dr. Yang Yu, Associate Professor, Nanjing University

Time: March 17, 2017, 9:20-10:10

Venue: Conference Room 504, Science and Engineering Building, Soochow University Main Campus

Abstract:

Policy search methods have been shown to be effective in complex reinforcement learning tasks. Previous methods mostly focus on optimizing the parameters of the policy model, e.g., a neural network. This, however, can be inefficient due to the sophisticated mapping from the policy parameters to the policy actions. We propose to shift the optimization from the policy parameters to the state actions. The new approach not only enjoys more efficient optimization, but also decouples the optimization from policy model learning, so that any supervised learning approach can be directly employed to learn the policy model. Empirical studies on several control tasks show that the new approach can beat some state-of-the-art policy search methods.
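
The abstract's two-stage recipe can be sketched on a toy problem: first optimize the actions directly, then fit the policy to the optimized state-action pairs with any supervised learner. In the sketch below the environment, the random-search optimizer, and the linear policy are all simplifying assumptions, not the speaker's actual algorithm.

import numpy as np

def rollout_return(actions):
    """Toy deterministic environment: drive the state toward 0."""
    s, total, states = 1.0, 0.0, []
    for a in actions:
        states.append(s)
        s = s + a           # simple additive dynamics
        total += -s * s     # reward: stay near 0
    return total, np.array(states)

# Stage 1: direct optimization over the action sequence (random search).
T = 10
best_actions, best_ret = np.zeros(T), -np.inf
for _ in range(2000):
    cand = best_actions + 0.1 * np.random.randn(T)
    ret, _ = rollout_return(cand)
    if ret > best_ret:
        best_actions, best_ret = cand, ret

# Stage 2: supervised policy learning, fitting a(s) ~ w*s + b by
# least squares on the optimized state-action pairs.
_, states = rollout_return(best_actions)
X = np.stack([states, np.ones_like(states)], axis=1)
w, b = np.linalg.lstsq(X, best_actions, rcond=None)[0]
print(f"learned linear policy: a = {w:.3f} * s + {b:.3f}")

The separation the abstract emphasizes is visible in stage 2: once good state-action pairs exist, learning the policy is plain regression, so any supervised method (here least squares) can be swapped in.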

Speaker Bio:

Dr. Yang Yu is an Associate Professor and PhD supervisor at Nanjing University. His main research areas are artificial intelligence, machine learning, and evolutionary computation. He received his BSc and PhD degrees from the Department of Computer Science and Technology at Nanjing University in 2004 and 2011, respectively, and joined the department and its Institute of Machine Learning and Data Mining (LAMDA) in August 2011 as a member of the teaching and research faculty. He received the 2013 China National Excellent Doctoral Dissertation Award and the 2011 China Computer Federation Outstanding Doctoral Dissertation Award. He has published over 40 papers, including papers in top international journals and conferences in artificial intelligence, machine learning, and data mining such as Artificial Intelligence, IJCAI, AAAI, NIPS, and KDD. His work has won several paper and competition awards, including the IDEAL'16 Best Paper award, the KDD'12 Best Poster award, the GECCO'11 Best Theory Paper award, the PAKDD'08 Best Paper award, and first place in the PAKDD'06 data mining competition. He has served as a senior program committee member of IJCAI-15/17, publicity co-chair of IJCAI-16/17 and IEEE ICDM-16, and workshop co-chair of ACML-16.


 

Title: Policy Gradient in Multi-agent Learning

Speaker: Dr. Jianye Hao, Associate Professor, Tianjin University

Time: March 17, 2017, 10:30-11:10

Venue: Conference Room 504, Science and Engineering Building, Main Campus

Abstract:

Recent years have witnessed various applications of policy-gradient learning to multi-agent settings to facilitate coordination among agents. In this talk, I will first review the history of policy-gradient-based multi-agent learning approaches. Most existing approaches aim at learning towards Nash equilibrium solutions, which may be far from optimal in terms of the agents' received payoffs. I will then show how social awareness can be incorporated into an agent's policy update to improve the agents' utilities.
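
To make the last point concrete, the toy sketch below runs gradient ascent on softmax policies in a prisoner's dilemma, with each agent maximizing a socially aware objective that mixes its own expected payoff with the other agent's. The game, the mixing weight w_social, and the update rule are illustrative assumptions; with w_social = 0 the dynamics head toward the mutual-defection Nash equilibrium, while a sufficiently large social weight steers both agents toward the higher-payoff cooperative outcome.

import numpy as np

# Prisoner's dilemma payoffs for the row player: action 0 = cooperate, 1 = defect.
R1 = np.array([[3.0, 0.0],
               [5.0, 1.0]])
R2 = R1.T  # symmetric game: column player's payoffs

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def softmax_jac(p):
    # Jacobian of the softmax output with respect to its logits.
    return np.diag(p) - np.outer(p, p)

theta1, theta2 = np.zeros(2), np.zeros(2)
lr, w_social = 0.1, 0.6  # w_social: weight on the other agent's payoff (assumption)

for _ in range(500):
    p1, p2 = softmax(theta1), softmax(theta2)
    # Each agent's socially aware payoff matrix mixes its own payoffs
    # with the other agent's.
    M1 = (1 - w_social) * R1 + w_social * R2
    M2 = (1 - w_social) * R2 + w_social * R1
    # Gradient ascent on the expected social payoff p1' M p2.
    theta1 += lr * (softmax_jac(p1) @ (M1 @ p2))
    theta2 += lr * (softmax_jac(p2) @ (M2.T @ p1))

print("P(cooperate):", softmax(theta1)[0], softmax(theta2)[0])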

Speaker Bio:

Dr. Jianye Hao is an Associate Professor in the School of Software at Tianjin University and an expert of the Tianjin Young Thousand Talents Program. He received his PhD in Computer Science and Engineering from The Chinese University of Hong Kong and subsequently worked as a postdoctoral researcher at the Massachusetts Institute of Technology and the Singapore University of Technology and Design. His main research interests include multi-agent systems, machine learning, and software engineering. He has published over 40 papers in international journals and conferences in multi-agent systems, artificial intelligence, and software engineering, as well as one monograph. He has received several Hong Kong and international academic awards, including first place in the 2012 ANAC international automated negotiation competition and second place in 2015, an Endeavour Fellowship from the Australian Department of Education, and The Chinese University of Hong Kong's global research excellence scholarship. He also serves as a reviewer for several top journals (including JAAMAS, TAAS, and TOSEM) and as a review expert for Internet of Things projects of the US National Science Foundation.


 

(School of Computer Science and Technology)