CANCELLED / Inderjit S. Dhillon: "NOMAD: A Distributed Framework for Latent Variable Models"

Thursday, 9 July 2015, 14:00 to 15:30

IRCICA auditorium, Parc scientifique de la Haute Borne, Villeneuve d’Ascq

Abstract:

Latent variable models are a cornerstone of many machine learning problems. As data grows in size and complexity, developing scalable, distributed algorithms for fitting these models has become a pressing challenge. In this talk, I will focus on two such problems of considerable current interest: matrix completion and topic modeling. We tackle these problems by developing a new framework, which we call NOMAD. In both problems, certain variables behave NOMAD-ically: they migrate from processor to processor after performing their tasks at each processor. As a result of our framework, the corresponding distributed algorithms are decentralized, lock-free, asynchronous and serializable (or almost serializable).
Thanks to these properties, our NOMAD-ic algorithms exhibit good scaling behavior on matrix completion problems with billions of ratings, and topic modeling problems with billions of words. For example, on a distributed machine with 32 processors, each with 4 cores, we can solve a matrix completion problem with 2.7B ratings in 10 minutes, and a topic modeling problem with 1.5B word occurrences and 1024 topics in 16 minutes.

Joint work with Hsiang-Fu Yu, Cho-Jui Hsieh, H. Yun and S.V.N. Vishwanathan

