4.8 Article

Anchor-Free Correlated Topic Modeling

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPAMI.2018.2827377

Keywords

Topic modeling; identifiability; anchor free; sufficiently scattered; non-convex optimization; non negative matrix factorization

Funding

  1. US National Science Foundation (NSF) [ECCS-1608961, IIS-1247632]
  2. NSFC [61671411, U1709219, 61374020]
  3. Fundamental Research Funds for the Central Universities
  4. Zhejiang Provincial NSF of China [LR15F010002]
  5. US National Science Foundation [CCF-1526078, CMMI-1727757]
  6. AFOSR [15RT0767]

In topic modeling, identifiability of the topics is an essential issue. Many topic modeling approaches have been developed under the premise that each topic has a characteristic anchor word that only appears in that topic. The anchor-word assumption is fragile in practice, because words and terms have multiple uses; yet it is commonly adopted because it enables identifiability guarantees. Remedies in the literature include using three- or higher-order word co-occurrence statistics to come up with tensor factorization models, but such statistics need many more samples to obtain reliable estimates, and identifiability still hinges on additional assumptions, such as consecutive words being persistently drawn from the same topic. In this work, we propose a new topic identification criterion using second-order statistics of the words. The criterion is theoretically guaranteed to identify the underlying topics even when the anchor-word assumption is grossly violated. An algorithm based on alternating optimization and an efficient primal-dual algorithm are proposed to handle the resulting identification problem. The former exhibits high performance and is completely parameter-free; the latter affords up to 200 times speedup relative to the former, but requires step-size tuning and a slight sacrifice in accuracy. A variety of real text corpora are employed to showcase the effectiveness of the approach, where the proposed anchor-free method demonstrates substantial improvements compared to a number of anchor-word based approaches under various evaluation metrics.
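
To make "second-order statistics of the words" concrete, the following is a minimal sketch of one plain empirical word co-occurrence estimate computed from a word-by-document count matrix. It is an illustration only and not necessarily the estimator used in the paper; the function name `cooccurrence`, the variable names, and the toy counts are all assumptions introduced here.

```python
import numpy as np

def cooccurrence(counts):
    """Empirical second-order word statistics (illustrative assumption).

    counts: (n_words, n_docs) array of nonnegative word counts.
    Returns an (n_words, n_words) matrix: the average over documents of the
    outer product of each document's word-frequency vector with itself.
    """
    counts = np.asarray(counts, dtype=float)
    doc_lengths = counts.sum(axis=0)
    keep = doc_lengths > 0                      # skip empty documents
    freqs = counts[:, keep] / doc_lengths[keep] # each column sums to 1
    return freqs @ freqs.T / freqs.shape[1]

# Toy corpus (hypothetical): 3 words, 3 documents.
toy_counts = np.array([[2, 0, 1],
                       [1, 3, 0],
                       [0, 1, 1]])
P = cooccurrence(toy_counts)
```

By construction the resulting matrix is symmetric and nonnegative, and its entries sum to one. It needs only pairwise word statistics, which is the sense in which second-order methods demand fewer samples than the third- or higher-order tensor statistics mentioned above.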
