Two-sample goodness-of-fit tests on the flat torus based on Wasserstein distance and their relevance to structural biology - Archive ouverte HAL

Preprint, Working Paper. Year: 2022
Two-sample goodness-of-fit tests on the flat torus based on Wasserstein distance and their relevance to structural biology
1 IMT - Institut de Mathématiques de Toulouse UMR5219 (UPS IMT, F-31062 Toulouse Cedex 9; INSA Toulouse, F-31077 Toulouse; UT1, F-31042 Toulouse; UT2, F-31058 Toulouse; Phone: 05.61.55.67.90; France)
2 LAAS-RIS - Équipe Robotique et InteractionS (France)
3 UVa - Universidad de Valladolid [Valladolid] (C/Plaza de Santa Cruz, 8, 47002 Valladolid, Spain)

Abstract

This work is motivated by the study of local protein structure, which is defined by two variable dihedral angles that take values from probability distributions on the flat torus. Our goal is to provide the space $\mathcal{P}(\mathbb{R}^2/\mathbb{Z}^2)$ with a metric that quantifies local structural modifications due to changes in the protein sequence, and to define associated two-sample goodness-of-fit testing approaches. Because it adapts to the geometry of the underlying space, we focus on the Wasserstein distance as a metric between distributions. We extend existing results of the theory of Optimal Transport to the $d$-dimensional flat torus $\mathbb{T}^d=\mathbb{R}^d/\mathbb{Z}^d$, in particular a Central Limit Theorem. Moreover, we propose different approaches for two-sample goodness-of-fit testing in the one- and two-dimensional cases, based on the Wasserstein distance. We prove their validity and consistency. We provide an implementation of these tests in R. Their performance is assessed by numerical experiments on synthetic data and illustrated by an application to protein structure data.
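To make the objects in the abstract concrete, the following is a minimal Python sketch of an empirical squared 2-Wasserstein distance between two samples on $\mathbb{T}^d=\mathbb{R}^d/\mathbb{Z}^d$. It is not the authors' R implementation and does not reproduce the tests proposed in the paper. It assumes equal sample sizes, so that the optimal transport plan reduces to an assignment problem, and it calibrates the statistic with a generic permutation procedure chosen here purely for illustration. The function names `torus_sq_dist`, `wasserstein2_sq` and `permutation_test` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def torus_sq_dist(x, y):
    """Pairwise squared geodesic distances on the flat torus R^d/Z^d.

    x has shape (n, d) and y has shape (m, d); coordinates are read modulo 1.
    """
    diff = np.abs(x[:, None, :] - y[None, :, :]) % 1.0
    diff = np.minimum(diff, 1.0 - diff)  # shortest way around each coordinate
    return (diff ** 2).sum(axis=-1)


def wasserstein2_sq(x, y):
    """Squared 2-Wasserstein distance between two equal-size empirical samples."""
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    cost = torus_sq_dist(x, y)
    # With uniform weights and equal sizes, an optimal plan is a permutation.
    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].mean())


def permutation_test(x, y, n_perm=200, seed=0):
    """Permutation p-value for H0: both samples share the same distribution.

    Assumption: generic permutation calibration, not the paper's procedure.
    """
    rng = np.random.default_rng(seed)
    observed = wasserstein2_sq(x, y)
    pooled = np.vstack([x, y])
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        stat = wasserstein2_sq(pooled[idx[:n]], pooled[idx[n:]])
        exceed += int(stat >= observed)
    p_value = (1 + exceed) / (1 + n_perm)
    return observed, p_value
```

For protein data, a pair of dihedral angles expressed in degrees would first be rescaled to $[0,1)^2$, for instance with `(angles % 360) / 360`, before being passed to `permutation_test`. The wrap-around in `torus_sq_dist` is what distinguishes this from a Euclidean computation: points at coordinates 0.05 and 0.95 are 0.1 apart on the torus, not 0.9.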
Main file
wgof_torus.pdf (995.39 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03369795, version 1 (07-10-2021)
hal-03369795, version 2 (13-04-2022)
hal-03369795, version 3 (08-06-2023)
Identifiers
  • HAL Id: hal-03369795, version 2

Cite

Javier González-Delgado, Alberto González-Sanz, Juan Cortés, Pierre Neuvial. Two-sample goodness-of-fit tests on the flat torus based on Wasserstein distance and their relevance to structural biology. 2022. ⟨hal-03369795v2⟩
371 Views
208 Downloads
