REND: A Reinforced Network-Based Model for Clustering Sparse Data with Application to...
Prof. Wei Ding, UMass Boston
We will discuss a new algorithm, the Reinforced Network-Based Model for Clustering Sparse Data (REND), for finding unknown groups of similar data objects in a sparse, largely non-overlapping feature space in which a network structure among the features can be observed. REND is an autoencoder neural network alternative to non-negative matrix factorization (NMF), which has achieved significant advances and great practical success in a wide range of clustering tasks. Using neural networks instead of NMF allows non-negative model variants with multi-layered, arbitrarily non-linear structures, which are much needed to handle the nonlinearity of complex real data. However, standard neural networks cannot reach their full potential when the data are sparse and the sample size is orders of magnitude smaller than the dimension of the feature space. To address these issues, we present a model consisting of integrated layers of reinforced network smoothing and a sparse autoencoder. The architecture of the hidden layers incorporates the existing network dependencies in the feature space, and the reinforced network layers smooth the sparse data over the network structure. Most importantly, through backpropagation, the weights of the reinforced smoothing layers are simultaneously constrained by the remaining sparse autoencoder layers, which set the target values equal to the inputs. Our approach integrates physically meaningful feature dependencies into the model design and efficiently clusters sparse data through integrated smoothing and sparse autoencoder learning. Empirical results demonstrate that REND achieves improved accuracy and produces physically meaningful clustering results.
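The general idea described above (a network smoothing layer feeding a sparse autoencoder, with the reconstruction loss back-propagating into the smoothing weights) can be illustrated with a toy NumPy sketch. Everything here is an assumption made for illustration and is not taken from the talk: the hypothetical function name `rend_like_sketch`, restricting the smoothing weights to the feature network plus self-loops, initializing them as one random-walk step, the L1 penalty on the hidden code, the non-negativity projections, plain gradient descent, and the synthetic data. The published REND model may differ in every detail.

```python
import numpy as np


def rend_like_sketch(X, A, k, lam=0.01, lr=0.02, iters=1500, seed=0):
    """Hypothetical toy model, NOT the published REND implementation.

    A network-masked smoothing layer feeds a sparse autoencoder, and all
    weights are trained end to end by plain gradient descent.
    X: (n, p) non-negative sparse data; A: (p, p) 0/1 feature adjacency.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    mask = ((A + np.eye(p)) > 0).astype(float)  # allowed smoothing edges (network + self-loops)
    W = mask / mask.sum(axis=1, keepdims=True)   # start as one random-walk smoothing step
    E = rng.uniform(0.0, 0.5, size=(p, k))       # encoder weights
    D = rng.uniform(0.0, 0.5, size=(k, p))       # non-negative decoder ("basis") weights
    losses = []
    for _ in range(iters):
        # Forward pass: smooth over the feature network, encode, decode.
        S = X @ W
        Z = S @ E
        H = np.maximum(Z, 0.0)                   # non-negative hidden code
        R = H @ D - X                            # reconstruction error (target = input)
        losses.append(np.sum(R ** 2) / n + lam * np.sum(H) / n)

        # Backward pass: the autoencoder loss also reaches the smoothing weights W.
        dXhat = 2.0 * R / n
        dD = H.T @ dXhat
        dH = dXhat @ D.T + lam / n               # gradient of the L1 sparsity term on H
        dZ = dH * (Z > 0)
        dE = S.T @ dZ
        dW = X.T @ (dZ @ E.T)

        # Projected updates: W stays on the network support; W and D stay non-negative.
        W = np.maximum(W - lr * dW, 0.0) * mask
        E = E - lr * dE
        D = np.maximum(D - lr * dD, 0.0)

    H = np.maximum(X @ W @ E, 0.0)
    labels = H.argmax(axis=1)                    # cluster = strongest hidden unit
    return W, H, labels, losses


# Synthetic example (an assumption): two groups, each using its own block of
# 10 features, with each sample activating only 3 of them.
rng = np.random.default_rng(1)
n, p, k = 40, 20, 2
groups = np.repeat([0, 1], n // 2)
X = np.zeros((n, p))
for i, g in enumerate(groups):
    active = rng.choice(np.arange(10 * g, 10 * g + 10), size=3, replace=False)
    X[i, active] = 1.0

# Feature network: a chain within each block of 10 features.
A = np.zeros((p, p))
for j in range(p - 1):
    if j != 9:
        A[j, j + 1] = A[j + 1, j] = 1.0

W, H, labels, losses = rend_like_sketch(X, A, k)
print("first/last loss:", losses[0], losses[-1])
print("cluster labels:", labels)
```

Reading off each sample's strongest hidden unit as its cluster mirrors a common way of using NMF coefficients for clustering. Keeping the smoothing weights on the network support is what lets the feature graph shape the learned representation in this sketch, while the shared loss lets the autoencoder adjust how much smoothing each edge receives.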
Wei Ding received her Ph.D. degree in Computer Science from the University of Houston in 2008. She is an Associate Professor of Computer Science at the University of Massachusetts Boston. Her research interests include data mining, machine learning, artificial intelligence, and computational semantics, with applications to health sciences, astronomy, geosciences, and environmental sciences. She has published more than 122 refereed research papers and one book, and holds two patents. She is an Associate Editor of ACM Transactions on Knowledge Discovery from Data (TKDD) and Knowledge and Information Systems (KAIS), and an editorial board member of the Journal of Information Systems Education (JISE), the Journal of Big Data, and the Social Network Analysis and Mining Journal. Her research projects are sponsored by NSF, NIH, NASA, and DOE. She is a senior member of both IEEE and ACM.