An algorithm for personalized query recommendation using LDA modeling of client-side browsing history

Fund projects: Natural Science Foundation of Zhejiang Province (LY12F02010); Science and Technology Support Program of Sichuan Province (2014GZ0063)


    Abstract:

    Modern search engines generally let users express queries with just a few keywords. This is convenient for users, but it makes accurate retrieval harder, because a search engine can hardly capture a user's information need from only a few keywords. Query recommendation, a key technique for alleviating this problem, has therefore begun to be deployed in today's mainstream search engines. However, almost all existing query recommendation approaches rely on the wisdom of crowds: they use search engine logs as the data source and mine the query-construction habits of the user population and the semantic correlations between queries. Such approaches ignore the personalized information preferences of individual users, and performing the recommendation computation on the search engine's server side also reduces its response efficiency and query throughput. This paper proposes a personalized query recommendation strategy that runs on the client side. The strategy uses a user's browsing history as the data source and learns the user's information preferences from it with LDA (Latent Dirichlet Allocation) modeling. When the user submits an original query, the user's search intention is determined from the probability distribution of generating the original query from the learned LDA model, and the correlation between a term and this search intention is taken as the strength with which the term is recommended as an expanded query. The Top N terms with the highest strength are returned as the recommended expanded queries. The proposed algorithm was validated experimentally on a manually annotated test data set. The results show that it clearly outperforms approaches based solely on the semantic correlation between terms and the original query in the accuracy of recommending expanded queries.
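The pipeline described in the abstract (infer the query intent as a distribution over LDA topics, score candidate terms against that intent, keep the Top N) can be illustrated with a minimal sketch. It rests on assumptions the abstract does not spell out: the LDA model is taken as already trained and is represented by a hypothetical topic-word table `PHI` plus a per-user topic prior P(z); the intent is assumed to be P(z|q) ∝ P(z)·∏ P(w|z) over the query words; and the recommendation strength of a term t is assumed to be Σ_z P(t|z)·P(z|q). The names `PHI`, `query_intent` and `recommend` and all numbers are illustrative only, not the authors' implementation or data.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical topic-word table standing in for an LDA model already trained on
# one user's browsing history: PHI[k][w] = P(w | z = k). Toy numbers only.
PHI: List[Dict[str, float]] = [
    # topic 0: programming
    {"python": 0.30, "pandas": 0.25, "numpy": 0.20, "tutorial": 0.15, "install": 0.10},
    # topic 1: reptiles
    {"python": 0.30, "snake": 0.30, "zoo": 0.20, "venom": 0.10, "tutorial": 0.10},
]


def query_intent(query: Sequence[str], phi: List[Dict[str, float]],
                 prior: Sequence[float], eps: float = 1e-9) -> List[float]:
    """Return P(z | q), assuming P(z | q) is proportional to P(z) * prod_w P(w | z).

    `prior` is the user's topic distribution P(z) (assumed strictly positive);
    `eps` is a small smoothing term so an unseen query word does not zero a topic.
    """
    log_scores = []
    for p_z, topic in zip(prior, phi):
        lp = math.log(p_z)
        for w in query:
            lp += math.log(topic.get(w, 0.0) + eps)
        log_scores.append(lp)
    # Subtract the max log score before exponentiating (log-sum-exp trick)
    # so long queries do not underflow to zero.
    m = max(log_scores)
    weights = [math.exp(lp - m) for lp in log_scores]
    total = sum(weights)
    return [x / total for x in weights]


def recommend(query: Sequence[str], phi: List[Dict[str, float]],
              prior: Sequence[float], top_n: int = 3) -> List[str]:
    """Rank candidate terms by the assumed strength sum_z P(t | z) * P(z | q)."""
    intent = query_intent(query, phi, prior)
    # Candidates: every term known to the model except those already in the query.
    vocab = {w for topic in phi for w in topic} - set(query)
    strength = {
        t: sum(p * topic.get(t, 0.0) for p, topic in zip(intent, phi))
        for t in vocab
    }
    # Highest strength first; ties broken alphabetically for a stable result.
    ranked = sorted(strength, key=lambda t: (-strength[t], t))
    return ranked[:top_n]


if __name__ == "__main__":
    # Same ambiguous query, two users with different (hypothetical) topic priors.
    print(recommend(["python"], PHI, prior=[0.9, 0.1]))  # ['pandas', 'numpy', 'tutorial']
    print(recommend(["python"], PHI, prior=[0.1, 0.9]))  # ['snake', 'zoo', 'tutorial']
```

The usage lines show the personalization effect the abstract argues for: the same one-word query yields different expansions because each user's browsing history shifts the topic prior. Computing everything from a locally stored model also reflects the client-side design, since no server-side log mining is involved.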

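Cite this article: Wang Guihua, Chen Li, Yu Zhonghua, Ding Gejian, Luo Qian. An algorithm for personalized query recommendation using LDA modeling of client side browsing history [J]. Journal of Sichuan University (Natural Science Edition), 2015, 52: 755-763.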

History
  • Received: 2014-09-02
  • Published online: 2015-07-27