Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. Journal articles
  4. Scaling Similarity Joins over Tree-Structured Data
 
research article

Scaling Similarity Joins over Tree-Structured Data

Tang, Yu
•
Cai, Yilun
•
Mamoulis, Nikos
2015
Proceedings Of The Vldb Endowment

Given a large collection of tree-structured objects (e.g., XML documents), the similarity join finds the pairs of objects that are similar to each other, based on a similarity threshold and a tree edit distance measure. The state-of-the-art similarity join methods compare simpler approximations of the objects (e.g., strings), in order to prune pairs that cannot be part of the similarity join result based on distance bounds derived by the approximations. In this paper, we propose a novel similarity join approach, which is based on the dynamic decomposition of the tree objects into subgraphs, according to the similarity threshold. Our technique avoids computing the exact distance between two tree objects, if the objects do not share at least one common subgraph. In order to scale up the join, the computed subgraphs are managed in a two-layer index. Our experimental results on real and synthetic data collections show that our approach outperforms the state-of-the-art methods by up to an order of magnitude.

  • Details
  • Metrics
Type
research article
DOI
10.14778/2809974.2809976
Web of Science ID

WOS:000362283300002

Author(s)
Tang, Yu
Cai, Yilun
Mamoulis, Nikos
Date Issued

2015

Publisher

Assoc Computing Machinery

Published in
Proceedings Of The Vldb Endowment
Volume

8

Issue

11

Start page

1130

End page

1141

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
IINFCOM  
Available on Infoscience
December 2, 2015
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/120972
Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés