How to Design a Popular Video Game: Rating Prediction Using NLP and Random Forest
- Jin Pu
- May 25, 2020
The video game industry is booming: its revenue is 4 times that of movies and 7 times that of music. Especially now, with Covid-19, more and more people are seeking social connection and entertainment in virtual game worlds. There is no doubt that the video game industry is promising.
However, is it easy to succeed in the video game industry? The answer is no! Although the industry is promising, it is still a very risky business. First, there are over a million video games, which drives fierce competition. Second, profit margins are very thin. The PlayStation 4, for example, has a margin of only about 5%. Guess what: most of the cost goes to R&D. Accurate investment in game design can therefore make a huge difference. This article uses random forest and NLP techniques to identify the game features that most influence a game's rating.

Dataset
Existing Amazon datasets contain metadata (basic product information for each video game) and review data (the reviews and ratings of each video game). The two tables are linked by a unique product ID (asin). Snapshots of both datasets are shown below. Although many features are listed, none of them relate directly to game design. So where can we get game design features? From the reviews! Next, we dive into the reviews to extract features.
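As a minimal sketch of how the two tables fit together, the pandas snippet below joins reviews to metadata on asin and then concatenates each product's reviews into one text. The toy rows are hypothetical, and the column names (overall, reviewText, title, price) are assumptions based on the public Amazon review data format.

```python
import pandas as pd

# Hypothetical toy rows. Column names follow the public Amazon review data
# (asin, overall, reviewText; title and price in the metadata), which is an assumption.
metadata = pd.DataFrame({
    "asin": ["B001", "B002"],
    "title": ["Game A", "Game B"],
    "price": [19.99, 39.99],
})
reviews = pd.DataFrame({
    "asin": ["B001", "B001", "B002"],
    "overall": [5, 4, 3],
    "reviewText": [
        "Great story and characters.",
        "Fun, but the controls are clunky.",
        "The graphics look dated.",
    ],
})

# One row per review, enriched with its product's metadata;
# validate= raises an error if an asin appears more than once in the metadata.
merged = reviews.merge(metadata, on="asin", how="left", validate="many_to_one")

# All review text for each product, joined into one string for keyword extraction.
text_per_product = merged.groupby("asin")["reviewText"].apply(" ".join)
```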

Snapshot of metadata, Source by Author

Snapshot of review data, Source by Author
Feature Engineering
In this project, we use the IBM Natural Language Understanding package for feature engineering. One of its built-in functions, “keywords”, returns the keywords of a text and can also return the sentiment or emotion associated with each keyword. We started by extracting keywords from the entire review dataset (all reviews combined). However, many of the returned keywords were phrases rather than single words. And, as is often the case in text mining, the keywords were very messy, with the same word appearing in different forms.
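With the ibm-watson Python SDK, such a request can be sent roughly as nlu.analyze(text=review_text, features=Features(keywords=KeywordsOptions(sentiment=True))).get_result(). The sketch below skips the network call and only shows how a keywords response might be turned into a keyword-to-sentiment mapping. The sample response is hypothetical and its JSON shape is our assumption, so check it against the current NLU API reference. Note how several "keywords" are really phrases.

```python
import json

# Hypothetical response, shaped like the JSON returned by the NLU "keywords"
# feature with sentiment enabled (assumed shape; verify against the API reference).
sample_response = json.loads("""
{
  "keywords": [
    {"text": "great story", "relevance": 0.93, "count": 3,
     "sentiment": {"score": 0.87, "label": "positive"}},
    {"text": "characters", "relevance": 0.71, "count": 2,
     "sentiment": {"score": 0.42, "label": "positive"}},
    {"text": "clunky controls", "relevance": 0.55, "count": 1,
     "sentiment": {"score": -0.64, "label": "negative"}}
  ]
}
""")


def keyword_sentiments(response):
    """Map each returned keyword (often a multi-word phrase) to its sentiment score."""
    result = {}
    for kw in response.get("keywords", []):
        sentiment = kw.get("sentiment")
        if sentiment is not None:  # sentiment is only present when requested
            result[kw["text"]] = sentiment["score"]
    return result


scores = keyword_sentiments(sample_response)
# Keywords containing a space are phrases, not single words.
phrases = [k for k in scores if " " in k]
```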

Examples of keywords, Source by Author
We therefore applied NLTK's PorterStemmer, word_tokenize, and stopwords to condense the keyword list. The resulting keyword list is shown below. The most mentioned keyword is "charact", which is the stem of "character". As you might have noticed, this creates a problem: if we pass "charact" to the sentiment extraction function, it cannot find the corresponding word in the text. So, as the next step, we checked the most frequently used forms of each keyword and restored them. For example, "fight" can appear as "fighting", "fighter", and "fights".
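Here is a sketch of the stemming step and of "adding flesh" back to the stems, assuming NLTK is installed. It uses NLTK's PorterStemmer but, to avoid data downloads, swaps word_tokenize for a simple regex tokenizer and the stopwords corpus for a small hand-written set; the keyword phrases are made up. Each stem remembers which original words produced it, so the most common surface form, rather than the bare stem, can be passed to the sentiment function.

```python
import re
from collections import Counter, defaultdict

from nltk.stem import PorterStemmer

# Small stand-in stopword set; NLTK's stopwords corpus needs nltk.download("stopwords").
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "was", "it", "this"}
stemmer = PorterStemmer()


def tokenize(phrase):
    """Lowercase and split into alphabetic tokens (stand-in for nltk.word_tokenize)."""
    return re.findall(r"[a-z]+", phrase.lower())


def stem_keywords(phrases):
    """Count stems and record which surface forms produced each stem."""
    stem_counts = Counter()
    forms = defaultdict(Counter)
    for phrase in phrases:
        for token in tokenize(phrase):
            if token in STOPWORDS:
                continue
            stem = stemmer.stem(token)
            stem_counts[stem] += 1
            forms[stem][token] += 1
    return stem_counts, forms


def most_common_form(stem, forms):
    """Restore a stem to the original word that produced it most often."""
    return forms[stem].most_common(1)[0][0]


# Hypothetical keyword phrases, as returned by the keyword extraction step.
keywords = ["the main characters", "character design", "characters and story",
            "the fighting", "fights"]
stem_counts, forms = stem_keywords(keywords)
```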

Snapshot 1 of the keyword list, Source by Author

Snapshot 2 of the keyword list, Source by Author
Now we have the finalized keyword list. Putting all the features into one basket would be messy and unintuitive, so let's sort them into baskets to take a closer look. The keywords fall into 4 baskets: game type, series, game elements, and characteristics.

Wordcloud of different groups of keywords, Source by Author
Finally, we extracted the sentiment score of each keyword for each product. The resulting feature matrix is sparse, with many missing values. Because many keywords refer to the same game design feature (for example, story, plot, and storyline), we merged similar keywords into one feature by averaging their sentiment scores.
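The aggregation step can be sketched in pandas as follows. The long-format score table and the keyword-to-feature map (FEATURE_OF) are hypothetical. pivot_table averages all synonyms of a feature within each product and leaves NaN where a product's reviews never mention that feature, which is the sparsity described above.

```python
import pandas as pd

# Hypothetical long-format sentiment table: one row per (product, keyword).
long_scores = pd.DataFrame({
    "asin":    ["B001", "B001", "B001", "B002", "B002"],
    "keyword": ["story", "plot", "graphics", "storyline", "controls"],
    "score":   [0.8, 0.6, -0.2, 0.4, -0.5],
})

# Hypothetical map from keyword to the game design feature it describes.
FEATURE_OF = {
    "story": "story", "plot": "story", "storyline": "story",
    "graphics": "graphics", "graphic": "graphics",
    "controls": "control", "control": "control",
}
long_scores["feature"] = long_scores["keyword"].map(FEATURE_OF)

# Product-by-feature matrix: mean sentiment over synonyms, NaN where unmentioned.
features = long_scores.pivot_table(
    index="asin", columns="feature", values="score", aggfunc="mean"
)
```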
Descriptive Analysis
Ratings: The average rating is about 4.15, and the top 25% of products have ratings above 4.46.

Rating distribution, Source by Author
Game types: RPGs (role-playing games) stand out among all game types in both average rating and volatility, followed by adventure games and multiplayer games. Sports games and action games receive poor ratings.

Rating for different game types, Source by Author
Game series: Digimon, Pokemon, and Golden Sun have the highest average ratings, above 4.25, while Madden averages only 3.88.

Rating for different game series, Source by Author
Game elements & Characteristics: Here the scale is the sentiment score. The average sentiment scores of the different elements and characteristics form a boring, nearly flat line, but their standard deviations are more revealing. Sentiment varies a lot for graphics, "new", "easy", etc., compared with features like monster.
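A sketch of this summary on a hypothetical product-by-feature sentiment matrix: pandas computes each feature's mean and standard deviation while skipping missing values, and sorting by standard deviation surfaces the features that reviewers disagree about most.

```python
import numpy as np
import pandas as pd

# Hypothetical product-by-feature sentiment matrix (NaN = feature not mentioned).
features = pd.DataFrame(
    {
        "graphics": [0.9, -0.7, 0.5, -0.4],
        "monster":  [0.3, 0.2, np.nan, 0.25],
        "new":      [0.8, -0.5, np.nan, 0.6],
    },
    index=["B001", "B002", "B003", "B004"],
)

# One row per feature with its mean and (sample) standard deviation,
# ordered from most to least variable; NaN values are skipped.
summary = features.agg(["mean", "std"]).T.sort_values("std", ascending=False)
```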

The standard deviation of sentiment score on game elements, Source by Author
Machine Learning and Interpretation
Now we are ready for machine learning. Based on the descriptive analysis, we split products at a rating of 4.4 into a high-performance group (about 30% of products) and a poor-performance group. The features are those we extracted from reviews, plus price. Game types and game series are coded as binary variables, while the other features are continuous.
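The labeling and coding step might look like this. The product table is hypothetical, and counting a rating of exactly 4.4 as high performance is our assumption, since the article only states the 4.4 cut.

```python
import pandas as pd

# Hypothetical product table: average rating, price, game type, one sentiment feature.
products = pd.DataFrame({
    "asin": ["B001", "B002", "B003", "B004"],
    "avg_rating": [4.6, 4.1, 4.4, 3.9],
    "price": [29.99, 19.99, 49.99, 9.99],
    "game_type": ["rpg", "sports", "adventure", "sports"],
    "graphics": [0.7, -0.2, 0.4, 0.1],
}).set_index("asin")

# Label: 1 = high performance (rating >= 4.4, an assumed boundary), 0 = poor performance.
y = (products["avg_rating"] >= 4.4).astype(int)

# Game type becomes 0/1 indicator columns; price and sentiment stay continuous.
X = pd.get_dummies(products.drop(columns="avg_rating"), columns=["game_type"], dtype=int)
```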
In my experience, logistic regression and random forest stand out for both interpretability and prediction accuracy. We applied both and chose random forest as the main interpretation model because it performed slightly better. The performance metrics we used are AUC and accuracy.
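A sketch of this comparison with scikit-learn, using synthetic data from make_classification in place of the real feature matrix. The roughly 70/30 class weights mirror the split described above, and the model settings are our assumptions. AUC is computed from predicted probabilities, accuracy from hard class predictions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix (about 30% high-performance products).
X, y = make_classification(n_samples=600, n_features=12, n_informative=5,
                           weights=[0.7, 0.3], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_val)[:, 1]  # probability of the high-performance class
    results[name] = {
        "accuracy": accuracy_score(y_val, model.predict(X_val)),
        "auc": roc_auc_score(y_val, proba),
    }
```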
Resampling: Because of the way we split products into groups, the classes are somewhat imbalanced (roughly 30% vs. 70%). We tried undersampling, oversampling, and SMOTE, and finally chose undersampling for better interpretation and accuracy.
Feature selection: We tried feature-importance-based selection and recursive feature elimination, but neither improved performance much.
Hyperparameter tuning: We tuned n_estimators, max_depth, and min_samples_split using grid search.
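The resampling and tuning steps above can be combined as in the following sketch. The data are synthetic, undersample is a hypothetical helper doing plain random undersampling (imbalanced-learn's RandomUnderSampler and SMOTE are library alternatives), and the grid values are assumptions, not the ones used in the project. In this sketch only the training split is undersampled, so validation scores still reflect the real class ratio.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split


def undersample(X, y, random_state=0):
    """Randomly drop rows of larger classes until every class matches the smallest one."""
    counts = y.value_counts()
    n_min = counts.min()
    rng = np.random.default_rng(random_state)
    keep = []
    for label in counts.index:
        idx = y.index[y == label].to_numpy()
        keep.extend(rng.choice(idx, size=n_min, replace=False).tolist())
    keep.sort()
    return X.loc[keep], y.loc[keep]


# Synthetic stand-in for the product feature matrix (about 30% positives).
X_arr, y_arr = make_classification(n_samples=600, n_features=10, n_informative=4,
                                   weights=[0.7, 0.3], random_state=0)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(10)])
y = pd.Series(y_arr, name="high_performance")

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Balance the training split only.
X_bal, y_bal = undersample(X_train, y_train)

# Hypothetical grid over the three hyperparameters named in the article.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 10],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      scoring="roc_auc", cv=3)
search.fit(X_bal, y_bal)
best_model = search.best_estimator_
```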
In the end, we achieved 70% accuracy and an AUC of 0.72 on the validation set. For interpretation, we use permutation importance because we have some binary variables: a random forest's built-in impurity-based importance tends to favor continuous features, which offer many more possible split points. We also left the game series features out of the interpretation plot because they offer little insight into game design.
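A sketch of permutation importance with scikit-learn's permutation_importance, on synthetic data with hypothetical feature names and one binary column standing in for a game-type flag. Each feature is shuffled on the validation set, and its importance is the mean drop in AUC over the repeats.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical feature names standing in for the review-derived features.
names = ["graphics", "level", "control", "sound", "price", "is_sports"]
X_arr, y = make_classification(n_samples=600, n_features=6, n_informative=3,
                               n_redundant=0, random_state=0)
X = pd.DataFrame(X_arr, columns=names)
X["is_sports"] = (X["is_sports"] > 0).astype(int)  # binary column, like a game-type flag

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Shuffle each column of the validation set n_repeats times and record the AUC drop.
result = permutation_importance(model, X_val, y_val, scoring="roc_auc",
                                n_repeats=10, random_state=0)
importance = pd.Series(result.importances_mean, index=names).sort_values(ascending=False)
```

Unlike model.feature_importances_, which comes from impurity decreases during training, this measures how much held-out performance depends on each feature, so binary and continuous features are compared on the same footing.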

Permutation importance score, Source by Author
Game type (dark blue): The sports game type is very influential in distinguishing high-performance products. The descriptive analysis showed that sports games tend to receive lower ratings, and we further backed up this insight with a logistic regression, which also indicates that sports games have a significant negative effect on ratings.
Game elements (orange): Graphics, level, and control contribute the most to the ratings. The descriptive analysis also showed that sentiment on graphics varies the most across the industry. Other areas to focus on in product design are look, sound, and character.

“Graphic” in reviews, Source by Author

“Level” in reviews, Source by Author
Characteristics (light blue): Innovation should certainly be encouraged, since “new” ranks 3rd in importance score. The most frequently mentioned innovations are “new features”, “new weapon”, and “new style”. “Difficulty” can be treated together with the “level” game element, since the two tend to come up together in reviews.

“New” in reviews, Source by Author
Further work could explore questions such as:
For different types of video games, do importance scores of different features vary a lot?
Graphics is a large part of the cost structure. What kind/style of graphics would be more appealing to the market?
What innovations would bring higher returns?
How should difficulty levels be designed?
Code source: https://github.com/JinPu-dududu/How-to-Design-a-Popular-Video-Game-Rating-Prediction-Using-NLP-and-Random-Forest


