Mathematical programming in data mining: Models for binary classification with application to collusion detection in online gambling
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract: Data mining is a semi-automated technique for discovering patterns and trends in large amounts of data, and it can be used to build statistical models that predict those patterns and trends. One type of prediction model is a classifier, which attempts to predict the group to which a particular item belongs. An important binary classifier, the Support Vector Machine classifier, uses non-linear optimization to find a hyperplane separating the two classes of data. This classifier has been reformulated as a linear program and as a pure quadratic program. We propose two modeling extensions to the Support Vector Machine classifier. The first, the Linearized Proximal Support Vector Machine classifier, linearizes the objective function of the pure quadratic version, which reduces the importance the classifier places on outlying data points. The second, the Integer Support Vector Machine classifier, improves the conceptual accuracy of the model: it uses binary indicator variables to flag potential misclassification errors and minimizes these errors directly. The performance of both new classifiers was evaluated on a simple two-dimensional data set and on several data sets commonly used in the literature, and was compared with that of the original classifiers. The classifiers were then used to build a model to detect collusion in online gambling. Collusion occurs when two or more players play differently against each other than against the rest of the players. Collusion is easier for online gamblers because their communication cannot be intercepted. However, it can still be identified by examining the playing style of the colluding players. By analyzing the record of play from online poker, a model can be built to predict whether or not a hand contains colluding players. We found that the new classifiers performed about as well as previous classifiers overall, sometimes better and sometimes worse.
We also found that one form of online collusion could be detected, but not perfectly.
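For readers unfamiliar with the formulations the abstract refers to, the following is a rough sketch in textbook notation, not the dissertation's own statement of its models. The symbols are assumptions introduced here for illustration: training points $x_i$ with labels $y_i \in \{-1, +1\}$, hyperplane $(w, b)$, slack variables $\xi_i$, penalty weight $C$, binary indicators $z_i$, and a large constant $M$. The big-M constraint in particular is one common way to model binary error indicators and may differ from the formulation actually used.

```latex
% Standard soft-margin SVM (quadratic objective, linear constraints)
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w \rVert_2^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n

% Linear-programming variant: the 2-norm is replaced by the 1-norm,
% which can be linearized with auxiliary variables
\min_{w,\,b,\,\xi}\ \lVert w \rVert_1 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0

% Illustrative integer variant (assumed big-M form): z_i = 1 flags a
% potential misclassification, and the flags are counted directly
\min_{w,\,b,\,z}\ \sum_{i=1}^{n} z_i
\quad \text{s.t.} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1 - M z_i, \qquad z_i \in \{0, 1\}
```

In the first two models a point far on the wrong side of the hyperplane contributes a large $\xi_i$, whereas in the integer sketch every flagged point costs exactly one unit regardless of distance, which is the sense in which errors are minimized directly.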
Degree Program: Graduate College
Systems and Industrial Engineering