Improving Sentiment Analysis by Text Pre-Processing

Executive Summary

While sentiment analysis tools are among the most powerful ways to analyze a given subject in depth, they still cannot reliably analyze abbreviations, emoticons and sentences containing spelling mistakes. The main aims of this research are therefore to build and serve a high-quality data dictionary for the machine, to implement proper pre-processing to achieve better data integrity, to use data transformation and filtering to achieve better output classification, and to analyze and evaluate classifier performance. The research methods are to carry out a computational framework and to insert a new set of words into the sentiment analysis tool. Several outcomes are expected from this research. First, the sentiment analysis tool is expected to work with better accuracy than usual. Second, the tool is expected to better identify spelling mistakes and emoticons, so that it produces results with as few errors as possible.


In this era of growing technology, sentiment analysis has come to play a large role in the business world. Facts are of course vital in this area, but opinions count too. Imagine a computer manufacturer that sells products such as laptops and wonders why its products are not selling. Facts about the laptop itself, such as its design, colour selection, weight and material, are important, but consumers' opinions should also be taken seriously. This is where sentiment analysis comes in. Extracting opinions from human-written documents, which are likely to be unstructured as well, is very useful in handling many business-intelligence tasks. The computer manufacturer can use a system that finds reviews, opinions or even complaints on the Web, in newsgroups, weblogs or forums, and creates a summarized version of the reviews on the overall topic. This could save the manufacturer a great amount of time, since it no longer has to read hundreds or even thousands of complaints that do not differ greatly from one another.

However, there is a problem. Current sentiment analysis tools are unable to interpret sentences written with abbreviations, emoticons or spelling mistakes. This can be a serious problem for those who depend on these tools to improve their business. This proposal sets out methods to overcome the problem.

Justification of Research

It is difficult and challenging to capture and understand the latest trends in opinion mining about products, because the data are highly diverse and drawn largely from social media, which calls for automated and reliable sentiment analysis and mining. Sentiment analysis has been a vital tool, especially for classifying reviews that contain both positive and negative feedback, and it has become a challenging area with many obstacles because it requires natural language processing. At this juncture, we know that developing a machine's ability to understand texts as human readers do will improve the accuracy of the output of sentiment-based analysis. This applies whether the subject is product feedback, public mood or investors' opinions. Properly implementing and carrying out specific methods for this problem would greatly improve the way sentiment analysis research is done. Specifically, providing a data dictionary to machine learning classifiers, which are trained by an algorithm so that text can be assigned to a given polarity, and applying data transformation and filtering should greatly reduce the common problems faced in sentiment analysis. Likewise, applying pre-processing is expected to produce a data set with good accuracy and to directly help reduce redundancy in sentiment analysis.

Aims of Research

The aims of the research are as follows:

  1. To build and serve a high-quality data dictionary for the machine, improving its ability to produce the most accurate data corpus.
  2. To implement proper pre-processing to achieve better data integrity.
  3. To use data transformation and filtering to achieve better output classification.
  4. To analyze and evaluate classifier performance.


Lillian Lee, Cornell University (2004), Sentiment Analysis and Business Intelligence

http://

Lillian Lee of Cornell University writes that identifying abbreviations, emoticons and spelling mistakes has long been a concern among those who use sentiment analysis tools to improve their work. The paper discusses the challenges in sentiment classification. The same word or phrase may be used in two different sentences yet mean different things. In addition, abbreviations commonly used on social networks, mostly by teenagers, such as “np” for “no problem” and “tq” for “thank you”, are not the kind of words that a sentiment analysis tool can detect.

Phil Mennie, Can Sentiment Analysis Spot Sarcasm?

http://

According to Phil Mennie, who leads a Social Media Governance service, abbreviations and sarcasm have always been among the most challenging elements of sentiment analysis. When analyzing tweets, the use of hashtags can also be a problem for sentiment analysis tools.

Nathan Hall (2014), The Future of Sentiment Analysis

http://

Nathan Hall states in his article that if there is one field where humans have dominated machines more than any other, it is the ability to analyze sentiment. Sentiment analysis programs used by data miners achieve, at most, an accuracy level between 30% and 40%, while humans are capable of a striking 96% accuracy.

Research Methodology

The methodology of this proposed research is divided into two parts:

  1. Perform a computational framework consisting of three stages, namely:
     1. Extract the most relevant features by applying data transformation and filtering.
     2. Develop a suitable classifier to determine the accuracy of the data from the prediction.
     3. Evaluate classifier performance using a proper approach.
  2. Insert a new set of words into the sentiment analysis tool.
     1. A whole set of known abbreviations is compiled into a file and inserted.
     2. Unknown words are ignored for the moment and then taught to the sentiment analysis tool later on.
  1. Perform the computational framework
     1. Abbreviations are expanded using pattern recognition and regular expression techniques, and the text is cleaned of non-alphabetic symbols. Stopword lists are constructed from available standard stoplists, with modifications for the specific category of the data. For example, the words “laptop”, “specification” and “performance” are treated as stop words because they are specific to the computer domain. For negation, the method of [1] is followed: negation is tagged from the negation word up to the first punctuation mark that follows it. Stemming is performed on the data to reduce redundancy. The consistency of the findings will be extracted as in [2].
     2. After the above steps, an SVM classifier is applied at each stage. A Gaussian radial basis kernel is recommended, given how sensitive SVM performance is to its parameter values. For classification, each data set is divided into two parts, one for testing and the other for training.
     3. The performance metrics used are precision, recall and F-measure. These metrics are used because each prediction falls into one of four categories relative to its assigned class: true positive, false positive, true negative or false negative. Precision is the share of predicted positives that are correct, recall is the share of actual positives that are found, and the F-measure is the harmonic mean of the two.
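The pre-processing steps described in this part (abbreviation expansion, cleaning, stoplist filtering and negation tagging) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the proposal's actual implementation: the abbreviation table, the stoplist, the negation cues and the "NOT_" prefix are all hypothetical choices made for the example, and stemming is left out because it would normally come from an external library.

```python
import re

# Hypothetical abbreviation table; the proposal itself only names "np" and "tq".
ABBREVIATIONS = {"np": "no problem", "tq": "thank you"}

# Assumed stoplist: a few generic English stopwords plus the domain-specific
# words the proposal mentions (laptop, specification, performance).
STOPWORDS = {"the", "a", "an", "is", "this", "it",
             "laptop", "specification", "performance"}

# Assumed negation cues and punctuation marks that end a negation scope.
NEGATIONS = {"not", "never", "isn't", "don't", "didn't"}
PUNCTUATION = {".", ",", "!", "?", ";", ":"}

ABBREVIATION_RE = re.compile(
    r"\b(" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")\b",
    flags=re.IGNORECASE,
)


def expand_abbreviations(text):
    """Replace each known abbreviation (whole words only) with its expansion."""
    return ABBREVIATION_RE.sub(lambda m: ABBREVIATIONS[m.group(0).lower()], text)


def tokenize(text):
    """Lowercase the text and keep only alphabetic words and punctuation.

    Digits and other symbols are dropped; punctuation is kept for now
    because negation tagging needs it to find the end of a scope.
    """
    return re.findall(r"[a-z']+|[.,!?;:]", text.lower())


def tag_negation(tokens):
    """Prefix words after a negation cue with NOT_ until the next punctuation."""
    tagged, negated = [], False
    for token in tokens:
        if token in PUNCTUATION:
            negated = False          # punctuation closes the negation scope
        elif token in NEGATIONS:
            negated = True
            tagged.append(token)
        else:
            tagged.append("NOT_" + token if negated else token)
    return tagged


def remove_stopwords(tokens):
    """Drop tokens whose base word (without the NOT_ prefix) is a stopword."""
    def base(token):
        return token[4:] if token.startswith("NOT_") else token
    return [t for t in tokens if base(t) not in STOPWORDS]


def preprocess(text):
    """Run the full sketch: expand, tokenize, tag negation, filter stopwords."""
    return remove_stopwords(tag_negation(tokenize(expand_abbreviations(text))))


if __name__ == "__main__":
    print(preprocess("This laptop is not good, tq!"))
```

For the sample sentence, "tq" is expanded to "thank you", "good" becomes "NOT_good" because it follows "not" before the comma, and "this", "laptop" and "is" are removed by the stoplist. The resulting token list would then be passed on to feature extraction and the classifier.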
  2. Insert a new set of words
     1. When a brute-force attack is used, for example to crack a .rar file, hackers commonly use a dictionary attack. The program performing the attack uses a file that already contains a set of words commonly used as passwords. The same approach can be implemented in sentiment analysis tools.
     2. A set of known abbreviations and common spelling mistakes that people make when writing messages can be inserted into the sentiment analysis tool.
     3. In addition, emoticons can be included in the file, for example the most commonly used ones such as “:)”, “;)”, “:D” and so on.
     4. For words that the sentiment analysis tool has never come across before, their meaning can then be added to its database. In this way, the tool keeps learning and is therefore able to analyze better.
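The dictionary-file idea in this part could look roughly like the sketch below. It is only an illustration under assumptions: the tab-separated file layout, the meanings given to the emoticons, the small `vocabulary` of ordinary words and the function names `load_lexicon`, `normalize` and `teach` are all hypothetical and do not come from any existing sentiment analysis tool.

```python
# Hypothetical lexicon file content: one "token<TAB>meaning" entry per line.
LEXICON_TEXT = (
    "np\tno problem\n"
    "tq\tthank you\n"
    ":)\thappy\n"
    ";)\twink\n"
    ":D\tvery happy\n"
)


def load_lexicon(text):
    """Parse 'token<TAB>meaning' lines into a dictionary."""
    lexicon = {}
    for line in text.splitlines():
        if line.strip():
            token, meaning = line.split("\t", 1)
            lexicon[token] = meaning
    return lexicon


def normalize(message, lexicon, vocabulary, unknown):
    """Rewrite a message using the lexicon.

    Tokens found in the lexicon are replaced by their meaning, ordinary
    words from the vocabulary are kept, and anything else is recorded in
    `unknown` and skipped for now so it can be taught later.
    Splitting on whitespace is a simplification for this sketch.
    """
    output = []
    for token in message.split():
        if token in lexicon:                 # exact match, e.g. ":D"
            output.append(lexicon[token])
        elif token.lower() in lexicon:       # case-insensitive match, e.g. "TQ"
            output.append(lexicon[token.lower()])
        elif token.lower() in vocabulary:
            output.append(token.lower())
        else:
            unknown.add(token)
    return " ".join(output)


def teach(lexicon, token, meaning):
    """Add a previously unknown token and its meaning to the lexicon."""
    lexicon[token] = meaning


if __name__ == "__main__":
    lexicon = load_lexicon(LEXICON_TEXT)
    vocabulary = {"the", "screen", "is"}     # assumed list of ordinary words
    unknown = set()
    print(normalize("TQ the screen is gr8 :D", lexicon, vocabulary, unknown))
    print(unknown)
    teach(lexicon, "gr8", "great")
    print(normalize("gr8 screen", lexicon, vocabulary, set()))
```

In this sketch the first call skips "gr8" and records it as unknown; once it has been taught, the second call rewrites it as "great". A real tool would also need a way to decide the meaning of each unknown word, which the proposal leaves open.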


  1. B. Pang, L. Lee, S. Vaithyanathan, Thumbs up? Sentiment classification using machine learning techniques, in: Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2002.
  2. K. Dave, S. Lawrence, D. M. Pennock, Mining the peanut gallery: Opinion extraction and semantic classification of product reviews, in: Proceedings of WWW, 2003, pp. 519-528.
  3. Lillian Lee, Cornell University (2004), Sentiment Analysis and Business Intelligence, retrieved from: http://
  4. Phil Mennie, Can Sentiment Analysis Spot Sarcasm? Retrieved from: http://
  5. Nathan Hall (2014), The Future of Sentiment Analysis, retrieved from: http://


Improving Sentiment Analysis by Text Pre-Processing. (2016, Nov 25). Retrieved from