000214549 001__ 214549
000214549 005__ 20180128055040.0
000214549 0247_ $$2doi$$a10.5075/epfl-thesis-6831
000214549 02470 $$2urn$$aurn:nbn:ch:bel-epfl-thesis6831-9
000214549 02471 $$2nebis$$a10568444
000214549 037__ $$aTHESIS_LIB
000214549 041__ $$aeng
000214549 088__ $$a6831
000214549 245__ $$aCombine and Conquer$$bMining Social Systems for Prediction
000214549 269__ $$a2015
000214549 260__ $$aLausanne$$bEPFL$$c2015
000214549 300__ $$a136
000214549 336__ $$aTheses
000214549 502__ $$aProf. Sabine Süsstrunk (president); Prof. Matthias Grossglauser, Prof. Patrick Thiran (directors); Prof. Pascal Frossard, Prof. Stratis Ioannidis, Dr Alessandra Sala (examiners)
000214549 520__ $$aIn this thesis, we explore the application of data mining and machine learning techniques to several practical problems. These problems have roots in various fields such as social science, economics, and political science. We show that computer science techniques enable us to make significant contributions to solving them. Moreover, we show that combining several models or datasets related to the problem at hand is key to the quality of the solution.
   The first application we consider is human mobility prediction. We describe our winning contribution to the Nokia Mobile Data Challenge, in which we predict the next location a user will visit based on their history and the current context. We first highlight some data characteristics that contribute to the difficulty of the task, such as sparsity and non-stationarity. Then, we present three families of models and observe that, even though their average accuracies are similar, their performance varies significantly across users. To take advantage of this diversity, we introduce several strategies to combine models, and show that the combinations outperform any individual predictor.
   The second application we examine is predicting the success of crowdfunding campaigns. We collected data from Kickstarter (one of the most popular crowdfunding platforms) in order to predict whether a campaign will reach its funding goal. We show that we obtain good performance by simply using information about money, but that combining this information with social features extracted from Kickstarter's social graph and Twitter improves early predictions. In particular, predictions made a few hours after the beginning of a campaign are improved by 4%, to reach an accuracy of 76%.
   Then, we move to the realm of politics, and first investigate the ideologies of politicians. Using their opinions on several aspects of politics, gathered from a voting advice application (VAA), we show that the themes that divide politicians the most are the ones that we usually associate with the left-wing/right-wing and liberal/conservative divides, thus validating the simplified two-dimensional view of the political system that many people use. We bring attention to the potentially malicious uses of VAAs by creating a fake candidate profile that is able to gather twice as many voting recommendations as any other. To counter this, we demonstrate that we are able to monitor politicians after they are elected, and potentially detect changes of opinion, by combining the data extracted from the VAA with the votes that they cast in Parliament.
   Finally, we study the outcome of issue votes. We first show that simply considering vote results at a fine geographical level is sufficient to highlight characteristic geographical voting patterns across a country, and their evolution over time. It also enables us to find representative regions that are crucial in determining the national outcome of a vote. We then demonstrate that predicting the actual result of a vote in all regions (as opposed to the binary national outcome) is a much harder task that requires combining data about regions and the votes themselves to obtain good performance. We compare the use of Bayesian and non-Bayesian models that combine matrix factorization and regression. We show that, here too, combining appropriate models and datasets improves the quality of the predictions, and that Bayesian methods give better estimates of the model's hyperparameters.
000214549 6531_ $$adata mining
000214549 6531_ $$amachine learning
000214549 6531_ $$acombining models and datasets
000214549 6531_ $$ahuman mobility prediction
000214549 6531_ $$acrowdfunding success prediction
000214549 6531_ $$apolitical data analysis
000214549 6531_ $$avote results prediction
000214549 6531_ $$adimensionality reduction
000214549 6531_ $$aBayesian models
000214549 6531_ $$aGaussian processes
000214549 700__ $$0245633$$aEtter, Vincent$$g161149
000214549 720_2 $$0241029$$aGrossglauser, Matthias$$edir.$$g152655
000214549 720_2 $$0240373$$aThiran, Patrick$$edir.$$g103925
000214549 8564_ $$iINTERNAL$$uhttps://infoscience.epfl.ch/record/214549/files/EPFL_TH6831.pdf$$xPUBLIC$$zn/a
000214549 909C0 $$0252455$$pLCA4
000214549 909CO $$ooai:infoscience.tind.io:214549$$pIC$$pthesis-bn2018$$pthesis
000214549 917Z8 $$x108898
000214549 918__ $$aIC$$cISC$$dEDIC
000214549 919__ $$aLCA4
000214549 920__ $$a2015-12-4$$b2015
000214549 970__ $$a6831/THESES
000214549 973__ $$aEPFL$$sPUBLISHED
000214549 980__ $$aTHESIS