conference paper

Efficient Distributed Decision Trees for Robust Regression

Guo, Tian • Kutzkov, Konstantin • Ahmed, Mohammed • Calbimonte, Jean-Paul • Aberer, Karl
2016
ECML PKDD 2016: Machine Learning and Knowledge Discovery in Databases
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)

The availability of massive volumes of data and recent advances in data collection and processing platforms have motivated the development of distributed machine learning algorithms. In numerous real-world applications, large datasets are inevitably noisy and contain outliers, which can dramatically degrade the performance of standard machine learning approaches such as regression trees. To address this, we present a novel distributed regression tree approach for large and noisy data that uses robust regression statistics, which are less sensitive to outliers. We propose integrating error criteria based on robust statistics into the regression tree. We also develop a data summarization method that improves the efficiency of learning regression trees in the distributed setting. We implemented the proposed approach and baselines on Apache Spark, a popular distributed data processing platform. Extensive experiments on both synthetic and real datasets verify the effectiveness and efficiency of our approach.
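To illustrate why a robust error criterion matters when choosing a regression tree split, here is a minimal single-machine sketch. The abstract does not name the specific criteria used in the paper, so this example assumes least absolute deviation (LAD) around the median as one possible robust criterion and compares it with the usual sum of squared errors. The function names (`sse`, `lad`, `best_split`) and the toy data are hypothetical, and the sketch is neither distributed nor uses the paper's data summarization method.

```python
from statistics import mean, median

def sse(ys):
    """Sum of squared errors around the mean (the standard, non-robust criterion)."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def lad(ys):
    """Sum of absolute deviations around the median (assumed robust criterion)."""
    m = median(ys)
    return sum(abs(y - m) for y in ys)

def best_split(xs, ys, error):
    """Return the threshold on x that minimizes error(left) + error(right)."""
    pairs = sorted(zip(xs, ys))
    best_cost, best_threshold = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no valid threshold between equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = error(left) + error(right)
        if cost < best_cost:
            best_cost = cost
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_threshold

# Toy data: a step from 1 to 5 between x=4 and x=5, with one outlier at x=2.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 100, 1, 1, 5, 5, 5, 5]

print(best_split(xs, ys, lad))  # 4.5: the split follows the underlying step
print(best_split(xs, ys, sse))  # 2.5: the split is pulled toward the outlier
```

On this toy input the squared-error criterion spends its split isolating the outlier, while the median-based criterion recovers the underlying step. The exhaustive scan recomputes each side from scratch, which is only practical for small inputs.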

Type
conference paper
DOI
10.1007/978-3-319-46227-1_6
Author(s)
Guo, Tian  
Kutzkov, Konstantin
Ahmed, Mohammed
Calbimonte, Jean-Paul  
Aberer, Karl  
Date Issued
2016
Published in
ECML PKDD 2016: Machine Learning and Knowledge Discovery in Databases
Start page
79
End page
95

Subjects
Decision Tree • Distributed Machine Learning • Robust Regression • Data Summarization

Editorial or Peer reviewed
REVIEWED
Written at
EPFL

EPFL units
LSIR  
Event name
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)
Event place
Riva del Garda, Italy
Event date
September 19–23, 2016

Available on Infoscience
June 29, 2016
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/126893