The goal of tidyclust is to provide a tidy, unified interface to clustering models. The package is closely modeled after the parsnip package.
You can install the released version of tidyclust from CRAN with:
install.packages("tidyclust")
and the development version of tidyclust from GitHub with:
# install.packages("pak")
pak::pak("tidymodels/tidyclust")
The first thing you do is create a cluster specification. For this example we are creating a K-means model, using the stats engine.
library(tidyclust)
set.seed(1234)
kmeans_spec <- k_means(num_clusters = 3) %>%
set_engine("stats")
kmeans_spec
#> K Means Cluster Specification (partition)
#>
#> Main Arguments:
#> num_clusters = 3
#>
#> Computational engine: stats
This specification can then be fit using data.
kmeans_spec_fit <- kmeans_spec %>%
fit(~., data = mtcars)
kmeans_spec_fit
#> tidyclust cluster object
#>
#> K-means clustering with 3 clusters of sizes 7, 14, 11
#>
#> Cluster means:
#> mpg cyl disp hp drat wt qsec vs
#> 1 19.74286 6 183.3143 122.28571 3.585714 3.117143 17.97714 0.5714286
#> 2 15.10000 8 353.1000 209.21429 3.229286 3.999214 16.77214 0.0000000
#> 3 26.66364 4 105.1364 82.63636 4.070909 2.285727 19.13727 0.9090909
#> am gear carb
#> 1 0.4285714 3.857143 3.428571
#> 2 0.1428571 3.285714 3.500000
#> 3 0.7272727 4.090909 1.545455
#>
#> Clustering vector:
#> Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive
#> 1 1 3 1
#> Hornet Sportabout Valiant Duster 360 Merc 240D
#> 2 1 2 3
#> Merc 230 Merc 280 Merc 280C Merc 450SE
#> 3 1 1 2
#> Merc 450SL Merc 450SLC Cadillac Fleetwood Lincoln Continental
#> 2 2 2 2
#> Chrysler Imperial Fiat 128 Honda Civic Toyota Corolla
#> 2 3 3 3
#> Toyota Corona Dodge Challenger AMC Javelin Camaro Z28
#> 3 2 2 2
#> Pontiac Firebird Fiat X1-9 Porsche 914-2 Lotus Europa
#> 2 3 3 3
#> Ford Pantera L Ferrari Dino Maserati Bora Volvo 142E
#> 2 1 2 3
#>
#> Within cluster sum of squares by cluster:
#> [1] 13954.34 93643.90 11848.37
#> (between_SS / total_SS = 80.8 %)
#>
#> Available components:
#>
#> [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
#> [6] "betweenss" "size" "iter" "ifault"
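The printed fit above is a standard `kmeans` object, which suggests the stats engine wraps `stats::kmeans()`. The base-R call below is a minimal sketch under that assumption, shown only to make clear which pieces of the output come from the engine. It is not guaranteed to reproduce the exact numbers above, since random number consumption may differ between the two code paths.

```r
# Base-R sketch; assumes the "stats" engine delegates to stats::kmeans().
set.seed(1234)
km <- stats::kmeans(mtcars, centers = 3)

km$size                        # number of cars in each cluster
km$centers[, c("mpg", "cyl")]  # cluster means for two of the columns
head(km$cluster)               # raw integer cluster labels from the engine
```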
Once you have a fitted tidyclust object, you can do a number of things.
predict() returns the cluster a new observation belongs to:
predict(kmeans_spec_fit, mtcars[1:4, ])
#> # A tibble: 4 × 1
#> .pred_cluster
#> <fct>
#> 1 Cluster_1
#> 2 Cluster_1
#> 3 Cluster_2
#> 4 Cluster_1
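For K-means, assigning a new observation to a cluster is usually taken to mean picking the nearest centroid. The base-R sketch below illustrates that idea, assuming Euclidean distance. `nearest_centroid()` is a hypothetical helper written for this example, not a tidyclust function, and it returns the engine's raw integer labels rather than `Cluster_*` factors.

```r
set.seed(1234)
km <- stats::kmeans(mtcars, centers = 3)

# Hypothetical helper: index of the nearest centroid for each row of new_data.
nearest_centroid <- function(new_data, centers) {
  x <- as.matrix(new_data[, colnames(centers), drop = FALSE])
  # Squared Euclidean distance from every row to every centroid,
  # arranged as an nrow(x) by nrow(centers) matrix.
  d2 <- matrix(
    vapply(
      seq_len(nrow(centers)),
      function(k) rowSums(sweep(x, 2, centers[k, ])^2),
      numeric(nrow(x))
    ),
    nrow = nrow(x)
  )
  # Column with the smallest distance in each row.
  max.col(-d2, ties.method = "first")
}

pred <- nearest_centroid(mtcars[1:4, ], km$centers)
pred
```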
extract_cluster_assignment() returns the cluster assignments of the training observations:
extract_cluster_assignment(kmeans_spec_fit)
#> # A tibble: 32 × 1
#> .cluster
#> <fct>
#> 1 Cluster_1
#> 2 Cluster_1
#> 3 Cluster_2
#> 4 Cluster_1
#> 5 Cluster_3
#> 6 Cluster_1
#> 7 Cluster_3
#> 8 Cluster_2
#> 9 Cluster_2
#> 10 Cluster_1
#> # … with 22 more rows
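The labels here differ from the raw clustering vector printed with the fit: Datsun 710 is in cluster 3 there but in `Cluster_2` here. The output is consistent with clusters being renumbered by order of first appearance in the training data. The sketch below reproduces that renumbering in base R from the first ten raw labels shown earlier. It illustrates the pattern and is not tidyclust's actual implementation.

```r
# First ten raw labels from the "Clustering vector" printed above.
raw <- c(1, 1, 3, 1, 2, 1, 2, 3, 3, 1)

# Renumber clusters by the order in which they first appear.
relabeled <- match(raw, unique(raw))
labels <- factor(paste0("Cluster_", relabeled))
labels
```

The resulting labels match the first ten rows of the tibble above.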
and extract_centroids() returns the locations of the clusters:
extract_centroids(kmeans_spec_fit)
#> # A tibble: 3 × 12
#> .cluster mpg cyl disp hp drat wt qsec vs am gear carb
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Cluster_1 19.7 6 183. 122. 3.59 3.12 18.0 0.571 0.429 3.86 3.43
#> 2 Cluster_2 26.7 4 105. 82.6 4.07 2.29 19.1 0.909 0.727 4.09 1.55
#> 3 Cluster_3 15.1 8 353. 209. 3.23 4.00 16.8 0 0.143 3.29 3.5
Below is a visualization of the available models and how they compare using two-dimensional toy data sets.
This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
- For questions and discussions about tidymodels packages, modeling, and machine learning, please post on RStudio Community.
- If you think you have encountered a bug, please submit an issue.
- Either way, learn how to create and share a reprex (a minimal, reproducible example), to clearly communicate about your code.
- Check out further details on contributing guidelines for tidymodels packages and how to get help.