The Goal My goal is to find the probability of wining in tic-tac-toe game given that you make the first move. To obtain hypothesis bases on my goal I have to state some conditions and facts on the game. They are: 1) There are 362, 880 ways of placing O’s and X’s. 2) When X make first move, possibility of X winning is 131,184, O winning is 77, 904, and 46, 080 tied games (Source: http://en. wikipedia. org/wiki/Tic-tac-toe). After eliminating rotations and/or reflections of other outcomes, there are only 138 unique outcomes. X won 91 times, O won 44 times and 3 ties (Source: http://en. ikipedia. org/wiki/Tic-tac-toe). Basically, the win of X is the concept. There are 8 possible ways of creating three X in row. Based on this my hypothesis states: Hypothesis “If X makes the first move then the probability of the player with X will win is 60% and above. ” Null Hypothesis “If X makes the first move then the probability of the player with X will win is less than 60%. ” Data Collection and Preparation To prove or refute the hypothesis, data has to be collected. As we all know this step requires a great amount of time and effort.
Also in order to build an effective model a data mining algorithm must be presented with a few hundred or few thousands relevant/applicable records. As mentioned above there are thousands of winning combinations, I have collected datasets with 958 instances which encodes set of possible board configurations at the end of tic-tac-toe game given X makes the first move. The data set for tic-tac- toe board end game was taken from UCI machine learning repository website (Source: http://archive. ics. uci. edu/ml/datasets/Tic-Tac-Toe+Endgame). The data was in a command delimited text file without attributes name on the top.
After downloading the data in CSV format, I converted into excel spreadsheet using the options available from MS Excel. Since the attributes name was missing I have to read the data description file which was provided at the same website to relate the values to its attributes. Before converting into excel spreadsheet I added the attributes name on top of the values in same format as the values. Since abbreviation for attributes has been used, I am providing detailed description of attribute in the below table: Serial NoAttribute AbbreviationAttribute DescriptionValue T-LTop-Left-Square(x, o, b) 2T-MTop-Middle-Square(x, o, b) 3T-RTop-Right-Square(x, o, b) 4M-LMiddle-Left-Square(x, o, b) 5M-MMiddle-Middle-Square(x, o, b) 6M-RMiddle-Right-Square(x, o, b) 7B-LBottom-Left-Square(x, o, b) 8B-MBottom-Middle-Square(x, o, b) 9B-RBottom-Right-Square(x, o, b) 10ClassClass (Output)Positive, Negative (x=Player x has taken, o=Player o has taken, b=blank) I have scanned data for missing values but could not find any. Looking at the data I can say that we have solid set of data to do mining. This dataset will help us to prove or refute the hypothesis.
Data Mining To do data mining there are many technique and tools available in the market. Choosing a data mining technique requires lot of factors into consideration. A formal statement as per our text book reads: Given a set of data containing attributes and values to be mined together with information about the nature of the data and the problem to be solved, determine an appropriate data mining technique. So to choose the data mining technique I looked at the attributes which is all had categorical data and we already have a predefined output.
So the appropriate technique for this kind of data set is “Supervised Learning”. I have used the ESX model provided by the iDA for data mining. I choose ESX over neural network because of the categorical nature of the data. I am just going to walk through the process of supervised learning using iDA. I opened the excel spreadsheet which contained data, then I added two rows below the attribute name representing the nature of the data and the other one representing whether it is an input or output attribute.
In our case, first nine attributes are input attributes and the last one is the output attribute. After that I performed ESX supervised learning with all data as training data as shown below: Then I performed another mining session with only first 658 instances as training data and other 300 as testing data in order to evaluate the model. The parameter window is shown below. Apart from supervised learning, several other mining techniques are used to evaluate the data. They are: Mangrove, a freeware which generates decision tree and classification tree using excel file.
All of them will be described more in detail under the evaluating the results division. Interpretation of the results (from Data Mining) When talking about interpreting the results, first we have to check whether the formed classes are solid one. To do this we have to check whether the class resemblance score which should be higher than the domain resemblance score. In this case both Positive and Negative class has score higher than the domain resemblance score. They are only slightly higher because of the nature of the values of the attribute.
The scores are shown below. When we look at the domain statistics for the categorical attributes we can see the predictability score for all the attribute value X is more than the predictability score of other two possible values of O (other player move) and b (blank). The scores are shown below. When we examine the class individually I found that the attribute value highly sufficient for being class membership is M-M = X, and the attribute value highly sufficient and necessary for class memberships is Class = positive.
Also you can see the predictability and predictiveness score for ‘Positive’ is 1. The data is shown below. All the data till now proved the solid formation of class, also it say when X makes the first move it has higher predictability value for all the attributes. Also the individual class assures that when M-M = X then it sufficient for the class to be fall under positive category, which make sense because when X makes the first move in the middle of the grid he has all eight possible options open then placing anywhere else.
Data from the most common occurring attribute also prove above point also adds that probability of wining increases when placing in T-R, T-L, M-M, B-R, and B-L. Also when we look at the data how it is classified when X make the first move, around 65% was classified into positive class and only 35% was classified into negative class. This means that there is a probability of X wining out those possible 8 winning combination when making a first move is 65%. The data is shown below. Evaluating the result By interpreting the mining results I was able to prove the hypothesis.
Just results from one model can be considered as solid evidence; I used several models to evaluate the result. A freeware called mangrove which produced decision tree by taking in tab delimited data file as input was used. By feeding my data into the mangrove, I end up in a decision tree which showed majority of classification to be positive starting from the root node M-M = X. This proved the hypothesis and validated the results obtained from the supervised ESX model. Part of the decision tree is shown below.
In the figure green color shows the positive class meaning wining percent of X and blue shows the contradictory. I have used another model to further validate the model. The other model was MS Excel based classification tree. When I plug in the data and followed the direction the model provided with a classification tree, confusion matrix, and a chart displaying the class distribution. From the above class distribution chart we can say that probability of X wining when making a first move is 64% which once again proves and validate both supervised ESX model and the hypothesis.
Also from below confusion matrix we are able to see the predictive class which is Positive. It also gives us the percentage misclassified both in training and test data which on the lower side which means classes are correctly classified. The confusion matrix gives us another option to further validate the data because of the high number in the positive-positive column and in negative-negative column which means the more number of instance fall under correctly classified class. All these interpretation and evaluation basically proves our hypothesis and refute our null hypothesis.
Hypothesis Established Hypothesis: “If X makes the first move then the probability of the player with X will win is 60% and above. ” Yes, the hypothesis has been proven true using the supervised ESX model, mangrove decision tree and classification tree in excel. The classification of the classes proves it where 65% of data fall under the winning category of X (‘Positive’ class), given X makes the first move. When the hypothesis is proven true, it is obvious that null hypothesis “If X makes the first move then the probability of the player with X will win is less than 60%”does not hold.
Conclusion Choosing a data mining technique plays a vital role. To add on, it is not only choosing one model to run and interpret results but it is equally important to evaluate the result using several other techniques. We used several data mining technique to prove our hypothesis statement. Also you can use results to refute the hypothesis. This paper proves that in a tic-tac-toe game probability of winning when you make a first move is 60% or above. Bibliography ?Asuncion, A. & Newman, D. J. (2007). UCI Machine Learning Repository [http://archive. cs. uci. edu/ml/datasets/Tic-Tac-Toe+Endgame]. Irvine, CA: University of California, School of Information and Computer Science. ?http://en. wikipedia. org/wiki/Tic-tac-toe ?http://www. infoacumen. com/ ?http://www. kdnuggets. com/software/classification-decision-tree. html#tree-free ohttp://www. geocities. com/adotsaha/CTree/CtreeinExcel. html ohttp://www. tetris1d. org/zigah/mangrove/ ?Data Mining – A tutorial based primer, Richard J. Riger and Michael W. Geatz, Second impression 2008, Pearson Education Inc.