An Information-Theoretic Approach to Distributed Learning. Distributed Source Coding Under Logarithmic Loss

Thèse Année : 2019

An Information-Theoretic Approach to Distributed Learning. Distributed Source Coding Under Logarithmic Loss Approche théorie de l’information à l’apprentissage statistique. Codage distribué de sources sous mesure de fidélité logarithmique

(1)

1 (Université Gustave Eiffel, Cité Descartes, Bâtiment Copernic, 5 bd Descartes, 77454 Marne-la-Vallée Cedex 2 - France) 1001627

ENPC - École nationale des ponts et chaussées (École nationale des ponts et chaussées, 6-8 avenue Blaise-Pascal, Cité Descartes, Champs-sur-Marne, 77455 Marne-la-Vallée cedex 2 - France) 301545
CNRS - Centre National de la Recherche Scientifique : UMR8049 (France) 441569
Université Gustave Eiffel (Cité Descartes, 5 Boulevard Descartes • Champs-sur-Marne, 77454 Marne-la-Vallée Cedex 2 - France) 580722

"> LIGM - Laboratoire d'Informatique Gaspard-Monge

Yigit Ugur

Fonction : Auteur
PersonId : 182631
IdHAL : yigit-ugur
ORCID : 0000-0003-1835-6964

Laboratoire d'Informatique Gaspard-Monge

Résumé

One substantial question, that is often argumentative in learning theory, is how to choose a `good' loss function that measures the fidelity of the reconstruction to the original. Logarithmic loss is a natural distortion measure in the settings in which the reconstructions are allowed to be `soft', rather than `hard' or deterministic. In other words, rather than just assigning a deterministic value to each sample of the source, the decoder also gives an assessment of the degree of confidence or reliability on each estimate, in the form of weights or probabilities. This measure has appreciable mathematical properties which establish some important connections with lossy universal compression. Logarithmic loss is widely used as a penalty criterion in various contexts, including clustering and classification, pattern recognition, learning and prediction, and image processing. Considering the high amount of research which is done recently in these fields, the logarithmic loss becomes a very important metric and will be the main focus as a distortion metric in this thesis. In this thesis, we investigate a distributed setup, so-called the Chief Executive Officer (CEO) problem under logarithmic loss distortion measure. Specifically, agents observe independently corrupted noisy versions of a remote source, and communicate independently with a decoder or CEO over rate-constrained noise-free links. The CEO also has its own noisy observation of the source and wants to reconstruct the remote source to within some prescribed distortion level where the incurred distortion is measured under the logarithmic loss penalty criterion. One of the main contributions of the thesis is the explicit characterization of the rate-distortion region of the vector Gaussian CEO problem, in which the source, observations and side information are jointly Gaussian. For the proof of this result, we first extend Courtade-Weissman's result on the rate-distortion region of the discrete memoryless (DM) $K$-encoder CEO problem to the case in which the CEO has access to a correlated side information stream which is such that the agents' observations are independent conditionally given the side information and remote source. Next, we obtain an outer bound on the region of the vector Gaussian CEO problem by evaluating the outer bound of the DM model by means of a technique that relies on the de Bruijn identity and the properties of Fisher information. The approach is similar to Ekrem-Ulukus outer bounding technique for the vector Gaussian CEO problem under quadratic distortion measure, for which it was there found generally non-tight; but it is shown here to yield a complete characterization of the region for the case of logarithmic loss measure. Also, we show that Gaussian test channels with time-sharing exhaust the Berger-Tung inner bound, which is optimal. Furthermore, application of our results allows us to find the complete solutions of three related problems: the quadratic vector Gaussian CEO problem with \textit{determinant} constraint, the vector Gaussian distributed hypothesis testing against conditional independence problem and the vector Gaussian distributed Information Bottleneck problem. With the known relevance of the logarithmic loss fidelity measure in the context of learning and prediction, developing algorithms to compute the regions provided in this thesis may find usefulness in a variety of applications where learning is performed distributively. Motivated from this fact, we develop two type algorithms: i) Blahut-Arimoto (BA) type iterative numerical algorithms for both discrete and Gaussian models in which the joint distribution of the sources are known; and ii) a variational inference type algorithm in which the encoding mappings are parameterized by neural networks and the variational bound approximated by Monte Carlo sampling and optimized with stochastic gradient descent for the case in which there is only a set of training data is available. Finally, as an application, we develop an unsupervised generative clustering framework that uses the variational Information Bottleneck (VIB) method and models the latent space as a mixture of Gaussians. This generalizes the VIB which models the latent space as an isotropic Gaussian which is generally not expressive enough for the purpose of unsupervised clustering. We illustrate the efficiency of our algorithms through some numerical examples.

Une problème de fond, qui suscite un intérêt croissant en théorie de l’apprentissage statistique, est de déterminer des limites d'apprentissage d'une architecture donnée. Aussi, comment choisir la structure la mieux adaptée a un problème donné? Quelle fonction 'risque' fait le plus sens? Ces questions sont d'autant plus importantes que les principales critiques faites aux approches actuelles reposent sur le fait que celles-ci sont très largement expérimentales, et donc sans garanties réelles de reproductibilité ('black-box' approaches). Dans cette thèse, nous adoptons une approche théorie de l'information au problème. En réalité, le problème d'inférence statistique est fondamentalement un problème de codage de source sous mesure de distorsion logarithmique ('logarithmic loss'). Cela peut être démontre a partir des principes de base. La mesure de distorsion logarithmique est largement utilisée comme critère de pénalité dans divers contextes, notamment la prédiction linéaire, la classification, reconnaissance de formes, ainsi que le traitement d'images. Motivés par le problème d'apprentissage a partir de données partielles, nous étudions les limites d'apprentissage du problème de 'Multiview learning'. Nous établissons des limites fondamentales sur la capacité d'apprentissage dans ce contexte en étudiant, et résolvent, le problème de codage de source distribue sous contrainte de distorsion logarithmique. Dans ce problème, dit 'Chief Executive Officer (CEO) problem', des agents observent les versions bruitées d'une source distante. Les agents communiquent de façon indépendante avec un décodeur commun via des liens a capacité finie. Le décodeur commun, dit aussi CEO, a également sa propre observation bruite de la source et souhaite reconstruire la source distante avec un niveau de fidélité donne, ou la fidélité est mesurée selon le critère de pénalité de perte logarithmique. Une des contributions principales de cette thèse est la caractérisation explicite de la région taux de compression / distorsion du problème de codage distribue de sources (CEO problem) sous perte logarithmique dans le cas ou les sources sont vectorielles et Gaussiennes. Pour la preuve de ce résultat, nous commençons par étendre un résultat important de Courtade-Weissman pour le cas spécifique de même modèle ou le CEO ne possède pas sa propre observation dans le cas discret sans mémoire. Ensuite, nous établissons une borne externe sur la région taux de compression / distorsion du problème du problème CEO avec sources vectorielles Gaussiennes en évaluant la limite extérieure du modèle discret sans mémoire au moyen d'une technique qui s'appuie sur l'identité de Bruijn et les propriétés des informations de Fisher. L'application de nos résultats au problème d'apprentissage via la technique Information Bottleneck nous permet d'établir le compromis optimal entre niveau de précision et capacité de généralisation. De plus, nous développons des algorithmes d'apprentissage distribue, qui sont applicables aussi bien au cas de données discrètes (exemple, classification) et que de données continues (exemple régression linéaire). En particulier, un de nos algorithme est de type inférence variationnelle dans lequel les mappings sont paramétrés par des réseaux de neurones et la fonction coût approchée par échantillonnage de Monte Carlo et optimisé avec descente de gradient stochastique. Nous montrons que nos algorithmes convergent; et nous proposons des structures d'apprentissage basées sur les réseaux de neurones qui permettent d'implémenter efficacement ces algorithmes. Les algorithmes et architectures développés dans cette these sont valides par des résultats expérimentales. Enfin, en tant qu'application, nous développons un cadre de clustering génératif non supervisé qui utilise le modèle VIB et modélise l'espace latent comme un mélange de gaussiennes.

Mots clés

multiterminal source coding CEO problem rate-distortion region logarithmic loss quadratic loss hypothesis testing Information Bottleneck Blahut-Arimoto algorithm distributed learning distributed classification unsupervised clustering

codage distribue de sources Chief Executive Officer (CEO) perte logarithmique région taux de compression / distorsion Information Bottleneck perte quadratique Apprentissage distribué Classification distribuee clustering non supervisé

Domaines

Théorie de l'information [cs.IT] Machine Learning [stat.ML]

Fichier principal

YigitUGUR_PhD_thesis.pdf (3.61 Mo)

Origine	Fichiers produits par l'(les) auteur(s)

Yigit Ugur : Connectez-vous pour contacter le contributeur

https://theses.hal.science/tel-02489734

Soumis le : lundi 24 février 2020-16:01:18

Dernière modification le : jeudi 19 décembre 2024-16:50:04

Archivage à long terme le : lundi 25 mai 2020-18:48:01

Dates et versions

tel-02489734 , version 1 (24-02-2020)

Identifiants

HAL Id : tel-02489734 , version 1

Citer

Yigit Ugur. An Information-Theoretic Approach to Distributed Learning. Distributed Source Coding Under Logarithmic Loss. Information Theory [cs.IT]. Université Paris-Est, 2019. English. ⟨NNT : 2019PESC2062⟩. ⟨tel-02489734⟩

Exporter

BibTeX XML-TEI Dublin Core DC Terms EndNote DataCite

Collections

ENPC CNRS LIGM_SIGNAL PARISTECH LIGM THESES-UNC UNIV-EIFFEL LIGM_MMSID

162 Consultations

332 Téléchargements