Data

For this study, we collected e-liquid reviews posted on JuiceDB between June 26, 2013 and November 12, 2015. JuiceDB is one of the world’s largest independent review websites for e-liquids and vape juices, claiming more than 17,000 reviews and 14,000 registered users. Each review includes the author’s account, the e-liquid name, the brand, a rating, and detailed comments. At the time of data collection, ratings were integers ranging from one to nine. In total, we collected 14,433 e-liquid reviews.

Data analysis

To gain a systematic understanding of which e-liquid features e-cigarette users care about and how they feel about those features, we first extracted feature-related text from the e-juice reviews by keyword search, and then conducted sentiment analysis on the extracted texts to reveal opinion polarities.

Feature text extraction

First, we identified three aspects of e-liquid features: flavors [7, 12, 13], common ingredients [7, 21] and smoking feelings [7].

Previous studies listed and categorized e-liquid flavors [12, 13]. We followed this categorization and manually identified additional flavors mentioned in the reviews, including pear, plum, grape and lime in the fruit category; cheese and butter in the cream category; and caramel in the sweet category.

The basic e-liquid ingredients are water, nicotine, flavorings, vegetable glycerin (VG) and propylene glycol (PG). Of these, nicotine, VG and PG are frequently discussed in posts. Nicotine, which is widely contained in tobacco products, affects users physically and is highly addictive. VG enhances the flavor and produces large amounts of vapor, while PG produces a strong throat hit. Typical PG/VG ratios are 50/50, 60/40 and 70/30.

Finally, we manually identified two smoking feelings: cloud production and throat hit. Both features are discussed online [21] and have also been surfaced by topic analysis [7]. Cloud production refers to how much vapor cloud the e-liquid produces, and throat hit refers to the sensation at the throat when using e-cigarettes. These two features are manipulated to imitate the traditional cigarette or cigar experience, and many e-cigarette users enjoy the cloud and the throat hit. All the features are listed in Table 1.
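The three feature aspects above can be encoded as a simple keyword structure. The following is an illustrative, partial sketch (the full feature lists appear in Table 1; the function name `aspect_of` is ours, not the paper's):

```python
# Partial, illustrative encoding of the three feature aspects; the full
# lists of features appear in Table 1 of the paper.
FEATURES = {
    "flavor": {
        "fruit": ["fruit", "pear", "plum", "grape", "lime"],
        "cream": ["cream", "cheese", "butter"],
        "sweet": ["sweet", "caramel"],
    },
    "ingredient": ["nicotine", "vg", "pg"],
    "smoking_feeling": ["cloud", "throat hit"],
}

def aspect_of(keyword):
    """Return which aspect a keyword belongs to, or None if unknown."""
    for aspect, members in FEATURES.items():
        # Flavor keywords are nested one level deeper (by subcategory).
        groups = members.values() if isinstance(members, dict) else [members]
        for group in groups:
            if keyword in group:
                return aspect
    return None

print(aspect_of("caramel"))  # flavor
```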

Table 1 Sentiment analysis of single flavors, ingredients and smoking feelings Full size table

Second, we used feature keywords to extract sentences about the features of interest. As shown in Table 1, the features in the flavor category are specific flavors grouped into eight subcategories. For the cream, tobacco, menthol, sweet and nuts subcategories, the keywords are the corresponding features alone, because these subcategory names are themselves specific flavors. Fruit, beverages and seasonings, in contrast, are category names rather than specific flavors; when calculating the popularity and preference of these subcategories, the keywords therefore include the subcategory name itself in addition to its member flavors. For example, the keywords of the beverages subcategory are beverages, coffee, tea and wine. The keywords for ingredients and smoking feelings are likewise the corresponding features. After extracting sentences by keyword search, the feature sentences from each review form the feature texts for sentiment analysis.
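The extraction step can be sketched as follows. This is a minimal illustration, assuming simple punctuation-based sentence splitting and substring matching; the keyword set here is a small example, not the full lexicon of Table 1:

```python
import re

# Hypothetical sketch of keyword-based feature-text extraction: split a
# review into sentences, keep those mentioning a feature keyword, and join
# the hits per keyword. Substring matching is used for simplicity; a real
# pipeline might add word-boundary checks.
KEYWORDS = {"nicotine", "vg", "pg", "cloud", "throat hit", "menthol"}

def extract_feature_text(review):
    """Return {keyword: joined feature sentences} for one review."""
    sentences = re.split(r"(?<=[.!?])\s+", review.lower())
    texts = {}
    for kw in KEYWORDS:
        hits = [s for s in sentences if kw in s]
        if hits:
            texts[kw] = " ".join(hits)
    return texts

review = "Great throat hit! The nicotine level is too high for me."
print(extract_feature_text(review))
```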

Sentiment analysis

We applied sentiment analysis to classify the feature texts into two categories: positive and negative. A text in the positive category indicates that the reviewer likes the feature; a text in the negative category indicates that the reviewer dislikes it. Because the dataset consists of product reviews, nearly all posts express clear emotion, so we did not include a neutral category. Many posts contain mixed sentiment, but users usually give an overall evaluation of the e-liquid; we therefore did not include a mixed category either, and instead considered whether the overall sentiment is positive or negative.

We manually labeled 500 randomly selected posts. The sentiment labels are consistent with the review ratings (correlation = 0.72). Treating reviews with ratings higher than 7 as positive and reviews with ratings of 7 or lower as negative maximizes both the agreement and Krippendorff’s alpha (agreement = 91.2%, Krippendorff’s alpha = 0.71). Because the review ratings objectively reflect users’ likes and dislikes, we chose them as the ground truth and regarded reviews rated above 7 as positive and reviews rated 7 or lower as negative. About two-thirds of the reviews were thereby deemed positive.
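The threshold choice above amounts to binarizing ratings at 7 and checking agreement with the manual labels. A minimal sketch with toy values (not the study's annotations):

```python
# Binarize ratings at >7 = positive and compute simple percent agreement
# with manual labels. The labels and ratings below are toy values.
def rating_label(rating):
    return "positive" if rating > 7 else "negative"

manual = ["positive", "negative", "positive", "positive", "negative"]
ratings = [9, 3, 8, 6, 2]
auto = [rating_label(r) for r in ratings]
agreement = sum(m == a for m, a in zip(manual, auto)) / len(manual)
print(f"{agreement:.0%}")  # 4 of 5 agree -> 80%
```

In the study, this agreement (and Krippendorff’s alpha) would be computed for each candidate threshold, and the threshold of 7 maximized both.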

We then trained an NBSVM sentiment analysis model, which integrates Naive Bayes and a Support Vector Machine and achieves good performance on texts of different lengths [22], on a training set of 3000 randomly selected reviews (2097 positive, 903 negative). We used the remaining 11,712 reviews as the test set and achieved an accuracy of 82.04%. To further test the classifier’s effectiveness on short texts, we manually labeled 150 sentences from the reviews; the test accuracy on these sentences was 72.67%. The classifier is therefore reliable for sentiment analysis whether the feature texts are long posts containing multiple sentences or single sentences.
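The NBSVM idea can be sketched in a few lines: binarized word counts are scaled by Naive Bayes log-count ratios and then fed to a linear SVM. This is a toy illustration of the general technique (in the spirit of Wang and Manning's NBSVM), not the study's actual implementation or data:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Toy corpus with labels: 1 = positive review, 0 = negative review.
docs = ["great flavor love it", "smooth throat hit",
        "harsh awful taste", "too harsh no flavor"]
y = np.array([1, 1, 0, 0])

# Binarized bag-of-words features.
vec = CountVectorizer(binary=True)
X = vec.fit_transform(docs).toarray().astype(float)

# Naive Bayes log-count ratios with add-one smoothing:
# r = log( (p / |p|_1) / (q / |q|_1) ), p/q = smoothed class counts.
alpha = 1.0
p = X[y == 1].sum(axis=0) + alpha
q = X[y == 0].sum(axis=0) + alpha
r = np.log((p / p.sum()) / (q / q.sum()))

# Linear SVM on the NB-scaled features.
clf = LinearSVC(C=1.0).fit(X * r, y)
print(clf.predict(X * r))  # recovers the training labels on this toy set
```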