【NLP.TM】 Opinion Analysis

Latest 02-01

【NLP.TM】

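This column grew out of the courses I have been taking recently and my current research area, natural language processing and text mining. It covers topics related to both fields. Follow along and join the discussion!

Last time I discussed opinion analysis from the perspective of my own research. This post is a write-up of my notes from a natural language processing course. Fair warning: it is long.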

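What Is Opinion Analysis

An opinion is a person's view of something. Opinions are clearly subjective: different people view the same thing differently. An opinion can generally be described as "what opinion" is held by "whom" about "which attribute" of "which thing". Sometimes a time dimension is added, because one person's opinion of the same thing may change over time. Opinion mining and sentiment orientation analysis refer to extracting opinion information from massive amounts of data and analyzing the orientation (polarity) of that information.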
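As a rough illustration, the description above maps naturally onto a small record type. The sketch below is only one possible encoding: the class name `Opinion`, its field names, the helper `polarity_history`, and the sample data are all assumptions made for this example, not something taken from the course.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Opinion:
    """One opinion record: who thinks what about which attribute of which thing."""
    holder: str                 # who holds the opinion
    target: str                 # the thing being evaluated
    aspect: str                 # the attribute of the target
    polarity: str               # the opinion itself, e.g. "positive" / "negative"
    time: Optional[str] = None  # optional: when the opinion was expressed


# Hypothetical sample data: the same holder changes opinion over time.
opinions = [
    Opinion("Alice", "Phone X", "battery", "negative", "2017-01"),
    Opinion("Alice", "Phone X", "battery", "positive", "2016-01"),
    Opinion("Bob", "Phone X", "screen", "positive", "2016-06"),
]


def polarity_history(records, holder, target, aspect):
    """Return (time, polarity) pairs for one holder/target/aspect, sorted by time."""
    hits = [r for r in records
            if (r.holder, r.target, r.aspect) == (holder, target, aspect)]
    return [(r.time, r.polarity) for r in sorted(hits, key=lambda r: r.time or "")]


print(polarity_history(opinions, "Alice", "Phone X", "battery"))
```

Keeping the time field optional reflects the point that time is only sometimes part of the description.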

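Why Do Opinion Analysis

For companies, it supports public-opinion analysis of products and corporate image: understanding overall opinion trends, quickly locating what the public is focusing on, spotting business opportunities, managing corporate image, and targeting marketing precisely. For ordinary users, it helps them follow how an event is developing, get an advance assessment of product quality, and see what people think of a policy. For governments, it makes it possible to identify public sentiment and then steer the direction of public opinion and monitor people's activities.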

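Main Tasks and Content of Opinion Analysis

The main tasks of opinion analysis are:

Opinion identification mainly covers:

Opinion attribute extraction mainly covers the opinion holder and the opinion target.

Sentiment identification is classified mainly by the granularity at which sentiment is identified: word level, sentence level, document level, and others.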

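Word-Level Opinion Analysis

Word-based sentiment identification targets the sentiment orientation of individual words. Its advantages are an intuitive model and easy computation. However, computing word similarity from a dictionary or a large corpus tends to introduce noise; the orientation of some words depends on context, so representing sentiment by words alone is one-sided; and much word-level analysis is restricted to adjectives, even though nouns and verbs also carry orientation.

The main idea is to match and compare words by similarity, using two families of methods: dictionary-based and corpus-based. Hu (KDD 2004) compares words through their synonym and antonym relations in WordNet. Hassan (ACL 2010) and Kamps (LREC 2004) also use WordNet for analysis and matching: they build a semantic graph and then compute orientation with a random walk model and a shortest-distance model, respectively. Among corpus-based methods, Turney (ACL 2002) uses a NEAR operator over web resources to measure how strongly two terms are associated; Du (WSDM 2010) argues that different domains need different sentiment lexicons, and treats the reuse of sentiment corpora across domains as a transfer problem.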
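To make the shortest-distance idea concrete, here is a minimal sketch in the spirit of Kamps's measure, which compares a word's graph distance to the seed words "good" and "bad". The synonym graph below is a hypothetical toy graph, not real WordNet data, and the names `SYNONYMS`, `distance`, and `orientation` are made up for this illustration.

```python
from collections import deque

# Hypothetical toy synonym graph; a real system would derive edges from WordNet.
SYNONYMS = {
    "good": ["fine", "decent"],
    "fine": ["good", "okay"],
    "decent": ["good", "okay"],
    "okay": ["fine", "decent", "mediocre"],
    "mediocre": ["okay", "poor"],
    "poor": ["mediocre", "bad"],
    "bad": ["poor"],
}


def distance(graph, start, goal):
    """Length of the shortest synonym path between two words (BFS), or None."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        word, dist = queue.popleft()
        if word == goal:
            return dist
        for nxt in graph.get(word, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two words are not connected


def orientation(graph, word, pos="good", neg="bad"):
    """(d(word, neg) - d(word, pos)) / d(pos, neg); positive means closer to pos."""
    d_pos = distance(graph, word, pos)
    d_neg = distance(graph, word, neg)
    d_seed = distance(graph, pos, neg)
    if None in (d_pos, d_neg, d_seed):
        return None  # orientation is undefined for disconnected words
    return (d_neg - d_pos) / d_seed


for w in ("fine", "okay", "poor"):
    print(w, orientation(SYNONYMS, w))
```

The score lies in [-1, 1]. Words that the graph does not connect to both seeds get no score, which is one way the noise and coverage problems mentioned above show up in practice.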

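Sentence-Level Opinion Analysis

Sentence-level sentiment identification is more complex; the main difficulty lies in identifying the sentiment itself. The main approaches are corpus-based and word-based. Open problems remain: supervised learning needs labeled data, and unsupervised methods are hard to transfer.

Applying traditional text classification is currently the mainstream approach: text is represented with features such as unigrams, bigrams, POS tags, adjectives, and word position, and classified with common machine learning methods such as support vector machines, naive Bayes, maximum entropy, and decision trees. A separate difficulty is polarity shifting, as in "the food at this restaurant is not very tasty": such shifts are hard to detect and are mainly handled with dictionary information (Ikeda IJCNLP 2008) and feature selection (Li COLING 2010).

Word-based methods infer the meaning of a sentence from the orientation of the words in it. Turney (ACL 2002) uses POS patterns for text representation and PMI to analyze word orientation, and finally computes the sentiment orientation of the whole sentence; Zagibalov (COLING 2008) uses a hybrid of sentence-level and word-level methods for joint identification; Qiu (CIKM 2009) proposes a self-learning method that uses dictionary information to produce initial labels, takes high-confidence samples as the training set, trains classifiers, and combines multiple classifiers with heuristic rules; there is also a semi-supervised method (Li ACL 2009) that builds a document-word co-occurrence matrix and trains a matrix factorization model, using a small amount of labeled data together with prior knowledge from a dictionary to label the unlabeled samples.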
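Turney's PMI-based score can be sketched in a few lines. It estimates a phrase's orientation as PMI(phrase, "excellent") minus PMI(phrase, "poor"), which reduces to a log ratio of hit counts, and then averages over the phrases in a text. The hit counts and the names `HITS`, `NEAR_HITS`, `so_pmi`, and `text_orientation` are hypothetical stand-ins chosen for this example; the small smoothing constant avoids division by zero.

```python
import math

# Hypothetical hit counts standing in for search-engine results.
HITS = {"excellent": 1000, "poor": 1000}
NEAR_HITS = {
    ("low fees", "excellent"): 40,
    ("low fees", "poor"): 5,
    ("unethical practices", "excellent"): 2,
    ("unethical practices", "poor"): 30,
}


def so_pmi(phrase, smoothing=0.01):
    """Semantic orientation: PMI(phrase, 'excellent') - PMI(phrase, 'poor')."""
    near_pos = NEAR_HITS.get((phrase, "excellent"), 0) + smoothing
    near_neg = NEAR_HITS.get((phrase, "poor"), 0) + smoothing
    return math.log2((near_pos * HITS["poor"]) / (near_neg * HITS["excellent"]))


def text_orientation(phrases):
    """Average orientation of the extracted phrases; 0.0 if there are none."""
    scores = [so_pmi(p) for p in phrases]
    return sum(scores) / len(scores) if scores else 0.0


print(so_pmi("low fees"))
print(so_pmi("unethical practices"))
print(text_orientation(["low fees", "unethical practices"]))
```

A positive average suggests a favorable text and a negative one an unfavorable text. Phrase extraction by POS patterns is omitted here, so the phrases are passed in directly.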

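Document-Level Opinion Analysis

Document-level sentiment analysis identifies the overall opinion and overall orientation of a document. The methods and ideas closely resemble those at the sentence level, but the difficulty is that a single document may contain more pronounced shifts of opinion and several opinion orientations at once.

Pang (ACL 2004) argues that objective sentences contribute nothing to a document's overall orientation, so a graph algorithm is used to pick out the opinion sentences, discard the objective ones, and determine the document's opinion from the opinion sentences alone. McDonald (ACL 2007) instead holds that every sentence can contribute to the document's opinion, and therefore proposes a structured CRF model that unifies sentence-level and document-level orientation identification and takes each sentence's context features into account. Lin (CIKM 2009) and Mei (WWW 2007) regard a document's overall orientation as the combination of its orientations toward each subtopic, and propose mining topic information and opinion information jointly.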
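The two-stage idea attributed to Pang (first drop objective sentences, then judge the document from what remains) can be sketched as below. This is a deliberately simplified stand-in: the subjectivity check here is a plain lexicon lookup rather than the graph-based method in the paper, and the lexicons and function names are hypothetical.

```python
import re

# Hypothetical tiny lexicons, for illustration only.
POSITIVE = {"great", "moving", "brilliant"}
NEGATIVE = {"dull", "boring", "weak"}


def tokens(sentence):
    """Lowercase word tokens of a sentence."""
    return re.findall(r"[a-z']+", sentence.lower())


def is_subjective(sentence):
    """Stage 1 (assumed heuristic): a sentence is subjective if it has an opinion word."""
    return any(t in POSITIVE or t in NEGATIVE for t in tokens(sentence))


def document_polarity(document):
    """Stage 2: score only the subjective sentences and return (label, kept sentences)."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", document) if s.strip()]
    subjective = [s for s in sentences if is_subjective(s)]
    score = 0
    for s in subjective:
        for t in tokens(s):
            score += (t in POSITIVE) - (t in NEGATIVE)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return label, subjective


review = ("The film runs for two hours. The acting is brilliant. "
          "The ending is moving but the middle is dull.")
print(document_polarity(review))
```

With a lexicon scorer the filter changes nothing, since objective sentences score zero anyway. It matters once stage 2 is a trained classifier, where objective sentences would otherwise add noisy features.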

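Cross-Lingual and Cross-Domain

Each language has its own characteristics. Research on cross-lingual sentiment analysis focuses on analysis when training data is scarce, on exploiting other resources, and on borrowing sentiment classification from other languages. The main solutions are translation and matching, ensemble strategies, and multi-view strategies.

Cross-domain analysis is currently a hot research topic. Sentiment orientation differs across domains, especially for comparative opinions; the same word can have different orientations in different domains; different domains use different opinion words, which causes problems for feature extraction; and training data is limited. There are two main lines of solution. One assumes that domains share the same features but differ in data distribution, that is, in feature weights (Jiang ACL 2007; Dai AAAI 2007). The other assumes that domains have different features, so a unified feature representation has to be built (Blitzer ACL 2007; Liu CIKM 2009; Pan WWW 2010).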
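The first line of solution, same features but different weights, can be illustrated with a toy weighted naive Bayes classifier in which source-domain examples count for less than target-domain ones. This is only a sketch of the general idea of instance weighting, not the method of any cited paper: the fixed source weight, the tiny datasets, and all names are assumptions made for the example.

```python
import math
from collections import defaultdict


def train_weighted_nb(examples):
    """examples: iterable of (tokens, label, weight). Returns a classify function."""
    word_w = {"pos": defaultdict(float), "neg": defaultdict(float)}
    label_w = {"pos": 0.0, "neg": 0.0}
    vocab = set()
    for toks, label, weight in examples:
        label_w[label] += weight          # weighted class count
        for t in toks:
            word_w[label][t] += weight    # weighted word count
            vocab.add(t)

    def classify(toks):
        best, best_score = None, -math.inf
        for label in ("pos", "neg"):
            total = sum(word_w[label].values())
            score = math.log((label_w[label] + 1) / (sum(label_w.values()) + 2))
            for t in toks:
                # add-one smoothing over the shared vocabulary
                score += math.log((word_w[label].get(t, 0.0) + 1) / (total + len(vocab)))
            if score > best_score:
                best, best_score = label, score
        return best

    return classify


SOURCE = [  # hypothetical movie reviews (source domain)
    (["unpredictable", "plot"], "pos"),
    (["unpredictable", "ending"], "pos"),
    (["boring", "plot"], "neg"),
]
TARGET = [  # hypothetical car reviews (target domain)
    (["unpredictable", "steering"], "neg"),
    (["smooth", "steering"], "pos"),
]


def build(source_weight):
    """Train on both domains, scaling every source example by source_weight."""
    data = [(t, y, source_weight) for t, y in SOURCE]
    data += [(t, y, 1.0) for t, y in TARGET]
    return train_weighted_nb(data)


print(build(1.0)(["unpredictable"]))  # source data dominates
print(build(0.1)(["unpredictable"]))  # target data dominates
```

The word "unpredictable" is praise for a movie plot but a complaint about steering, so the prediction flips once the source examples are down-weighted. That flip is the "same word, different orientation" problem described above.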

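Summary

Opinion analysis has already produced a wealth of results, and many methods are in practical use. Still, a number of problems limit the effectiveness of sentiment analysis, such as quantitative description of sentiment, identification and analysis of syntactic phenomena (contrast, negation, irony, comparison, and so on), sparse data, transferability across domains, and mixtures of multiple opinions. These call for further in-depth research.

After this class I feel my picture of the field is more complete, which is something reading papers alone cannot give. Even so, reading the literature is still how you enrich your understanding and fill out that picture. The references follow; keep scrolling, there are a lot of them!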

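References

These are some of the references mentioned above; read further if you are interested. In an age this restless, those who calm down to read and understand will step further ahead than others.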

【1】J. Blitzer, M. Dredze and F. Pereira. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics (ACL), pages 440-447. 2007.

【2】Wenyuan Dai, Gui-Rong Xue, Qiang Yang and Yong Yu. Transferring Naive Bayes Classifiers for Text Classification. In Proceedings of AAAI. 2007.

【3】Weifu Du, Songbo Tan, Xueqi Cheng and Xiaochun Yun. Adapting information bottleneck method for automatic construction of domain-oriented sentiment lexicon. In Proceedings of WSDM, pages 111-120. 2010.

【4】Ahmed Hassan and Dragomir Radev. Identifying Text Polarity Using Random Walks. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL). 2010.

【5】M. Hu and B. Liu. Mining Opinion Features in Customer Reviews. In Proceedings of AAAI. 2004.

【6】Xuanjing Huang and W. Bruce Croft. A Unified Relevance Model for Opinion Retrieval. In Proceedings of CIKM. 2009.

【7】Jaap Kamps, Maarten Marx, Robert J. Mokken and Maarten de Rijke. Using WordNet to measure semantic orientation of adjectives. In Proceedings of LREC'04, pages 1115-1118. 2004.

【8】Jing Jiang and ChengXiang Zhai. Instance Weighting for Domain Adaptation in NLP. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics (ACL), pages 264-271. 2007.

【9】Soo-Min Kim and Eduard Hovy. Identifying Opinion Holders for Question Answering in Opinion Texts. In Proceedings of the AAAI-05 Workshop on Question Answering in Restricted Domains. 2005.

【10】Binyang Li, Lanjun Zhou, Shi Feng and Kam-Fai Wong. A Unified Graph Model for Sentence-based Opinion Retrieval. In Proceedings of ACL. 2010.

【11】Tao Li, Yi Zhang and Vikas Sindhwani. A Non-negative Matrix Tri-factorization Approach to Sentiment Classification with Lexical Prior Knowledge. In Proceedings of ACL. 2009.

【12】Shoushan Li, Rui Xia, Chengqing Zong and Chu-Ren Huang. A Framework of Feature Selection Methods for Text Categorization. In Proceedings of ACL/AFNLP, pages 692-700. 2009.

【13】Shoushan Li, Sophia Yat Mei Lee, Ying Chen, Chu-Ren Huang and Guodong Zhou. Sentiment Classification and Polarity Shifting. In Proceedings of COLING, pages 635-643. 2010.

【14】Fangtao Li, Chao Han, Minlie Huang and Xiaoyan Zhu. Structure-Aware Review Mining and Summarization. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING). 2010.

【15】Bing Liu, Minqing Hu and Junsheng Cheng. Opinion Observer: Analyzing and Comparing Opinions on the Web. In Proceedings of the 14th International World Wide Web Conference (WWW-2005), May 10-14, 2005, Chiba, Japan.

【16】Kang Liu and Jun Zhao. Cross-Domain Sentiment Classification using a Two-Stage Method. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM), November 2-6, 2009, Hong Kong.

【17】Chenghua Lin and Yulan He. Joint Sentiment/Topic Model for Sentiment Analysis. In Proceedings of CIKM'09. 2009.

【18】Y. Mao and G. Lebanon. Isotonic Conditional Random Fields and Local Sentiment Flow. Advances in Neural Information Processing Systems 19. 2007.

【19】Ryan McDonald, Kerry Hannan, Tyler Neylon et al. Structured Models for Fine-to-Coarse Sentiment Analysis. In Proceedings of ACL, pages 432-439. 2007.

【20】Qiaozhu Mei, Xu Ling, et al. Topic Sentiment Mixture: Modeling Facets and Opinions in Weblogs. In Proceedings of WWW. 2007.

【21】Prem Melville, Wojciech Gryc and Richard D. Lawrence. Sentiment Analysis of Blogs by Combining Lexical Knowledge with Text Classification. In Proceedings of KDD. 2009.

【22】Bo Pang and Lillian Lee. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. In Proceedings of the Association of Computational Linguistics (ACL). 2004.

【23】Bo Pang, Lillian Lee and Shivakumar Vaithyanathan. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of EMNLP, pages 79-86. 2002.

【24】Sinno Jialin Pan, Xiaochuan Ni, Jian-Tao Sun, Qiang Yang and Zheng Chen. Cross-Domain Sentiment Classification via Spectral Feature Alignment. In Proceedings of the 19th International World Wide Web Conference (WWW-10), Raleigh, NC, USA, April 26-30, 2010, pages 751-760.

【25】A. M. Popescu and O. Etzioni. Extracting Product Features and Opinions from Reviews. In Proceedings of EMNLP'05. 2005.

【26】L. Qiu, Weishi Zhang, Changjian Hu and Kai Zhao. SELC: A Self-Supervised Model for Sentiment Classification. In Proceedings of CIKM. 2009.

【27】Guang Qiu, Bing Liu, Jiajun Bu and Chun Chen. Expanding Domain Sentiment Lexicon through Double Propagation. In Proceedings of IJCAI, pages 1199-1204. 2009.

【28】Peter Turney. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of ACL. 2002.

【29】Xiaojun Wan. Co-Training for Cross-Lingual Sentiment Classification. In Proceedings of ACL-IJCNLP, pages 235-243. 2009.

【30】Xiaojun Wan. Using Bilingual Knowledge and Ensemble Techniques for Unsupervised Chinese Sentiment Analysis. In Proceedings of EMNLP, pages 553-561. 2008.

【31】Bo Wang and Houfeng Wang. A Cross-Inducing Method for Bootstrapping Product Features and Opinion Words. In Proceedings of 2008 International Conference on Natural Language Processing (IJCNLP 2008), India.

【32】Janyce Wiebe, Theresa Wilson and Claire Cardie. Annotating expressions of opinions and emotions in language. Language Resources and Evaluation. 2005.

【33】Taras Zagibalov and John Carroll. Automatic seed word selection for unsupervised sentiment classification of Chinese text. In Proceedings of the 22nd International Conference on Computational Linguistics (COLING), Manchester, UK. 2008.

【34】Min Zhang and Xingyao Ye. A Generative Model to Unify Topic Relevance and Lexicon-based Sentiment for Opinion Retrieval. In Proceedings of SIGIR, pages 411-418. 2008.

【35】Jun Zhao, Kang Liu and Gen Wang. Adding Redundant Features for CRFs-based Sentence Sentiment Classification. In Proceedings of the Conference on Empirical Methods on Natural Language Processing (EMNLP), October 25-27, 2008, Hawaii.

【36】Jingbo Zhu, Huizhen Wang, Benjamin Tsou and Muhua Zhu. Multi-aspect opinion polling from textual reviews. In Proceedings of CIKM'09, short session, pages 1799-1802. 2009.

