
Printed from https://ideas.repec.org/p/tse/wpaper/129830.html

ICS for complex data with application to outlier detection for density data objects

Author

Listed:
  • Thomas-Agnan, Christine
  • Mondon, Camille
  • Trinh, Thi-Huong
  • Ruiz-Gazen, Anne
Abstract
ICS (Invariant Coordinate Selection) is a dimension-reduction method used as a preliminary step for clustering and outlier detection. It can be applied to multivariate or functional data. This work introduces a coordinate-free definition of ICS and extends the method to distributional data, since the inherent constraints of density functions require an adaptation of functional ICS. Our first achievement is a coordinate-free version of ICS within the framework of Hilbert spaces, assuming that the data lie almost surely in a finite-dimensional subspace. Using the Bayes space framework tailored to density functions, we express the centred log-ratio of the density curves in a subspace of $L^2_0(a, b)$ consisting of zero-integral spline functions and conduct ICS in this finite-dimensional subspace. We describe the steps of the outlier detection procedure and study the impact of some of its parameters on the results. The methodology is then illustrated on a sample of densities of daily maximum temperatures recorded across northern Vietnamese provinces between 1987 and 2016.
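
The pipeline summarised in the abstract (centred log-ratio, finite-dimensional zero-integral basis, then ICS) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: densities are strictly positive and sampled on a grid over (a, b); an orthonormal Legendre basis of degree 1 to p stands in for the paper's zero-integral splines; and the scatter pair is COV and COV4, a common choice in the ICS literature that the abstract does not specify. The centred log-ratio is taken as log f minus the mean of log f over (a, b). All function names are hypothetical.

```python
import numpy as np


def trapezoid(y, grid):
    """Row-wise trapezoidal integral of y (n_curves x n_points) over grid."""
    return np.sum(0.5 * (y[:, 1:] + y[:, :-1]) * np.diff(grid), axis=1)


def clr_transform(densities, grid):
    """Centred log-ratio: log f minus the mean of log f over (a, b)."""
    log_f = np.log(densities)  # assumes strictly positive densities
    mean_log = trapezoid(log_f, grid) / (grid[-1] - grid[0])
    return log_f - mean_log[:, None]


def legendre_basis(grid, p):
    """Zero-integral orthonormal basis on (a, b): Legendre degrees 1..p.

    Assumption: used here in place of the zero-integral splines of the paper.
    """
    a, b = grid[0], grid[-1]
    u = 2.0 * (grid - a) / (b - a) - 1.0  # map (a, b) to (-1, 1)
    # legvander columns are P_0..P_p; drop the constant P_0.
    basis = np.polynomial.legendre.legvander(u, p)[:, 1:]
    k = np.arange(1, p + 1)
    # The integral of P_k^2 over (a, b) is (b - a) / (2k + 1).
    return basis * np.sqrt((2 * k + 1) / (b - a))


def clr_coordinates(clr, basis):
    """Least-squares coordinates of each clr curve in the given basis."""
    coef, *_ = np.linalg.lstsq(basis, clr.T, rcond=None)
    return coef.T


def ics_cov_cov4(X):
    """ICS with the COV-COV4 scatter pair (an assumed, common choice).

    Returns generalised eigenvalues in decreasing order and the
    corresponding invariant coordinates (n x p).
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (n - 1)
    evals, evecs = np.linalg.eigh(cov)
    Y = Xc @ (evecs / np.sqrt(evals))  # whitened data, identity covariance
    d2 = np.sum(Y**2, axis=1)  # squared Mahalanobis distances
    cov4 = (Y * d2[:, None]).T @ Y / (n * (p + 2))
    rho, U = np.linalg.eigh(cov4)
    order = np.argsort(rho)[::-1]
    return rho[order], Y @ U[:, order]
```

In such a sketch, observations with unusually large squared scores on the first invariant coordinates would be the outlier candidates; the choice of how many coordinates to keep and of the cut-off is left open here.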

Suggested Citation

  • Thomas-Agnan, Christine & Mondon, Camille & Trinh, Thi-Huong & Ruiz-Gazen, Anne, 2024. "ICS for complex data with application to outlier detection for density data objects," TSE Working Papers 24_1585, Toulouse School of Economics (TSE).
  • Handle: RePEc:tse:wpaper:129830

    Download full text from publisher

    File URL: https://www.tse-fr.eu/sites/default/files/TSE/documents/doc/wp/2024/wp_tse_1585.pdf
    File Function: Working Paper Version
    Download Restriction: no

    References listed on IDEAS

    1. Ruiz-Gazen, Anne & Thomas-Agnan, Christine & Laurent, Thibault & Mondon, Camille, 2022. "Detecting outliers in compositional data using Invariant Coordinate Selection," TSE Working Papers 22-1320, Toulouse School of Economics (TSE).
    2. Nordhausen, Klaus & Ruiz-Gazen, Anne, 2022. "On the usage of joint diagonalization in multivariate statistics," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    3. Loperfido, Nicola, 2021. "Some theoretical properties of two kurtosis matrices, with application to invariant coordinate selection," Journal of Multivariate Analysis, Elsevier, vol. 186(C).
    4. David E. Tyler & Frank Critchley & Lutz Dümbgen & Hannu Oja, 2009. "Invariant co‐ordinate selection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(3), pages 549-592, June.
    5. Dai, Wenlin & Mrkvička, Tomáš & Sun, Ying & Genton, Marc G., 2020. "Functional outlier detection and taxonomy by sequential transformations," Computational Statistics & Data Analysis, Elsevier, vol. 149(C).
    6. Archimbaud, Aurore & Nordhausen, Klaus & Ruiz-Gazen, Anne, 2018. "ICS for multivariate outlier detection with application to quality control," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 184-199.
    7. Tyler, David E., 2010. "A note on multivariate location and scatter statistics for sparse data sets," Statistics & Probability Letters, Elsevier, vol. 80(17-18), pages 1409-1413, September.
    8. Virta, Joni & Li, Bing & Nordhausen, Klaus & Oja, Hannu, 2020. "Independent component analysis for multivariate functional data," Journal of Multivariate Analysis, Elsevier, vol. 176(C).
    9. J. Machalová & K. Hron & G.S. Monti, 2016. "Preprocessing of centred logratio transformed density functions using smoothing splines," Journal of Applied Statistics, Taylor & Francis Journals, vol. 43(8), pages 1419-1435, June.
    10. Thomas-Agnan, Christine & Simioni, Michel & Trinh, Thi-Huong, 2023. "Discrete and Smooth Scalar-on-Density Compositional Regression for Assessing the Impact of Climate Change on Rice Yield in Vietnam," TSE Working Papers 23-1410, Toulouse School of Economics (TSE), revised Apr 2024.
    11. Aurore Archimbaud & Zlatko Drmac & Klaus Nordhausen & Una Radojicic & Anne Ruiz-Gazen, 2023. "Numerical Considerations and a New Implementation for Invariant Coordinate Selection," Post-Print hal-04038657, HAL.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Archimbaud, Aurore & Boulfani, Fériel & Gendre, Xavier & Nordhausen, Klaus & Ruiz-Gazen, Anne & Virta, Joni, 2021. "ICS for multivariate functional anomaly detection with applications to predictive maintenance and quality control," TSE Working Papers 21-1182, Toulouse School of Economics (TSE), revised Mar 2022.
    2. Nordhausen, Klaus & Ruiz-Gazen, Anne, 2022. "On the usage of joint diagonalization in multivariate statistics," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    3. Ruiz-Gazen, Anne & Thomas-Agnan, Christine & Laurent, Thibault & Mondon, Camille, 2022. "Detecting outliers in compositional data using Invariant Coordinate Selection," TSE Working Papers 22-1320, Toulouse School of Economics (TSE).
    4. Loperfido, Nicola, 2021. "Some theoretical properties of two kurtosis matrices, with application to invariant coordinate selection," Journal of Multivariate Analysis, Elsevier, vol. 186(C).
    5. Fischer, Daniel & Berro, Alain & Nordhausen, Klaus & Ruiz-Gazen, Anne, 2019. "REPPlab: An R package for detecting clusters and outliers using exploratory projection pursuit," TSE Working Papers 19-1001, Toulouse School of Economics (TSE).
    6. Dominique Guégan & Matteo Iacopini, 2018. "Nonparameteric forecasting of multivariate probability density functions," Documents de travail du Centre d'Economie de la Sorbonne 18012, Université Panthéon-Sorbonne (Paris 1), Centre d'Economie de la Sorbonne.
    7. Cristian Preda & Quentin Grimonprez & Vincent Vandewalle, 2021. "Categorical Functional Data Analysis. The cfda R Package," Mathematics, MDPI, vol. 9(23), pages 1-31, November.
    8. Virta, J., 2016. "One-step M-estimates of scatter and the independence property," Statistics & Probability Letters, Elsevier, vol. 110(C), pages 133-136.
    9. Moritz Herrmann & Fabian Scheipl, 2021. "A Geometric Perspective on Functional Outlier Detection," Stats, MDPI, vol. 4(4), pages 1-41, November.
    10. Pini, Alessia & Stamm, Aymeric & Vantini, Simone, 2018. "Hotelling’s T² in separable Hilbert spaces," Journal of Multivariate Analysis, Elsevier, vol. 167(C), pages 284-305.
    11. Karel Hron & Jitka Machalová & Alessandra Menafoglio, 2023. "Bivariate densities in Bayes spaces: orthogonal decomposition and spline representation," Statistical Papers, Springer, vol. 64(5), pages 1629-1667, October.
    12. Ojo, Oluwasegun Taiwo & Fernández Anta, Antonio & Genton, Marc G., 2022. "Multivariate Functional Outlier Detection using the FastMUOD Indices," DES - Working Papers. Statistics and Econometrics. WS 35665, Universidad Carlos III de Madrid. Departamento de Estadística.
    13. Zhong, Rou & Liu, Shishi & Li, Haocheng & Zhang, Jingxiao, 2022. "Robust functional principal component analysis for non-Gaussian longitudinal data," Journal of Multivariate Analysis, Elsevier, vol. 189(C).
    14. Nordhausen, Klaus & Oja, Hannu & Tyler, David E., 2022. "Asymptotic and bootstrap tests for subspace dimension," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    15. Prieto, Francisco J. & Rendón, Carolina, 2014. "Independent components techniques based on kurtosis for functional data analysis," DES - Working Papers. Statistics and Econometrics. WS ws141006, Universidad Carlos III de Madrid. Departamento de Estadística.
    16. Thomas-Agnan, Christine & Simioni, Michel & Trinh, Thi-Huong, 2023. "Discrete and Smooth Scalar-on-Density Compositional Regression for Assessing the Impact of Climate Change on Rice Yield in Vietnam," TSE Working Papers 23-1410, Toulouse School of Economics (TSE), revised Apr 2024.
    17. Dargel, Lukas & Thomas-Agnan, Christine, 2024. "Pairwise share ratio interpretations of compositional regression models," Computational Statistics & Data Analysis, Elsevier, vol. 195(C).
    18. Taskinen, Sara & Koch, Inge & Oja, Hannu, 2012. "Robustifying principal component analysis with spatial sign vectors," Statistics & Probability Letters, Elsevier, vol. 82(4), pages 765-774.
    19. Talská, R. & Menafoglio, A. & Machalová, J. & Hron, K. & Fišerová, E., 2018. "Compositional regression with functional response," Computational Statistics & Data Analysis, Elsevier, vol. 123(C), pages 66-85.
    20. Hannu Oja & Davy Paindaveine & Sara Taskinen, 2009. "Parametric and nonparametric test for multivariate independence in IC models," Working Papers ECARES 2009_018, ULB -- Universite Libre de Bruxelles.

    More about this item

    Keywords

    Bayes spaces; distributional data; functional data; invariant coordinate selection; outlier detection; Vietnam temperature densities;

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics
