Validating functional redundancy with mixed generative adversarial networks

Nguyen, Thanh Tam; Huynh, Thanh Trung; Pham, Minh Tam; Hoang, Thanh Dat; Nguyen, Thanh Thi; Nguyen, Quoc Viet Hung

doi:10.1016/j.knosys.2023.110342

research article

February 4, 2023

Knowledge-Based Systems

Data redundancy has been one of the most important problems in data-intensive applications such as data mining and machine learning. Removing data redundancy brings many benefits in efficient data updating, effective data storage, and error-free query processing. While it has been studied for four decades, existing works on data redundancy mostly focus on syntactic formulations such as normal forms and functional dependencies, which lead to intractable discovery problems. In this work, we propose a new concept, namely functional redundancy, that overcomes the limitations of functional dependencies, especially on continuous data. We design and develop efficient algorithms based on generative adversarial networks to validate any functional redundancy without heavily depending on the number of attributes and the number of tuples like functional dependencies. The core idea is to use the imputation power of generative adversarial networks to model any semantic dependencies between attributes. Extensive experiments on different real-world and synthetic datasets show that our approach outperforms representative baselines, is applicable for first-order and high-order dependencies, and is extensible for different types of data. (c) 2023 Elsevier B.V. All rights reserved.

Type

research article

DOI

10.1016/j.knosys.2023.110342

Web of Science ID

WOS:000929603200001

Authors

Nguyen, Thanh Tam

•

Huynh, Thanh Trung

•

Pham, Minh Tam

•

Hoang, Thanh Dat

•

Nguyen, Thanh Thi

•

Nguyen, Quoc Viet Hung

Publication date

2023-02-04

Publisher

Elsevier

Published in

Knowledge-Based Systems

Volume

264

Article Number

110342

Subjects

Computer Science, Art...

Computer Science

functional redundancy...

data imputation

generative adversaria...

mixed data types

data management

functional dependency...

dependency discovery

efficient discovery

rumor detection

algorithm

Peer reviewed

REVIEWED

EPFL units

LSIR

Available on Infoscience

March 13, 2023

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/195852