PyCaret
PyCaret is an open-source, low-code machine learning library in Python that simplifies the process of training and deploying machine learning models. It offers a wide range of functions and features that make it easy to go from preparing your data to deploying your model with only a few lines of code.
With DagsHub, you can log the experiments you run with PyCaret to a remote server with minimal changes to your code.
This includes versioning raw and processed data with DVC, as well as logging experiment metrics, parameters, and trained models with MLflow. This integration enables you to continue using the familiar MLflow interface, while also facilitating collaboration with others, comparing results from different runs, and making data-driven decisions with ease.
How does PyCaret work with DagsHub?
When you set DagsHub as the experiment logger, PyCaret authenticates your DagsHub user and uses MLflow and the DagsHub Client to log the experiment's information to your DagsHub repository. We use built-in PyCaret callbacks to log the metrics and parameters of every run using MLflow, and the artifacts, such as the data and the trained model, using either MLflow or DVC. You can find the source code of the logger in the PyCaret repository.
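As a rough illustration of what logging a run's parameters and metrics through MLflow looks like, here is a minimal sketch. It is not the logger's actual source code: the `log_run` helper and its `tracker` argument are hypothetical names introduced for this example, and only the `start_run`, `log_params`, and `log_metrics` calls come from MLflow's public API.

```python
def log_run(params, metrics, tracker=None):
    """Log one run's parameters and metrics through an MLflow-style API.

    `tracker` is a hypothetical injection point for this sketch; when it is
    omitted, the real `mlflow` module is used.
    """
    if tracker is None:
        # Imported lazily so the sketch can be loaded without MLflow installed.
        import mlflow
        tracker = mlflow

    # Everything logged inside the `with` block is attached to the same run.
    with tracker.start_run():
        tracker.log_params(params)    # e.g. {"model": "lr", "fold": 10}
        tracker.log_metrics(metrics)  # e.g. {"Accuracy": 0.91, "AUC": 0.95}
```

With the tracking URI pointing at a DagsHub repository, a call such as `log_run({"model": "lr"}, {"Accuracy": 0.91})` would be expected to show up as a new run in that repository's experiments.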
How to log PyCaret Experiments on DagsHub?
Configurations
- We will start by installing PyCaret, DagsHub, and MLflow by running the following command from the CLI:
  pip install pycaret dagshub mlflow
- Configure DagsHub [optional] - To skip the interactive authentication with DagsHub's servers, we can use one of the following options:
  - Log in using the dagshub client.
    dagshub login
    export MLFLOW_TRACKING_URI="<enter-your-MLflow-remote-DagsHub>"
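If you would rather set the tracking URI from Python than from the shell, a minimal sketch could look like the following. The helper names are hypothetical, and the sketch assumes the MLflow remote follows the pattern `https://dagshub.com/<repo_owner>/<repo_name>.mlflow`; copy the exact value from your repository if it differs.

```python
import os


def dagshub_mlflow_uri(repo_owner, repo_name):
    """Build the MLflow remote URL for a DagsHub repository (assumed pattern)."""
    return f"https://dagshub.com/{repo_owner}/{repo_name}.mlflow"


def configure_tracking(repo_owner, repo_name, environ=None):
    """Set MLFLOW_TRACKING_URI for the current process and return its value."""
    env = os.environ if environ is None else environ
    uri = dagshub_mlflow_uri(repo_owner, repo_name)
    env["MLFLOW_TRACKING_URI"] = uri
    return uri
```

Calling `configure_tracking("<repo_owner>", "<repo_name>")` before `setup(...)` only affects the current Python process, so it needs to run at the start of every script or notebook session.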
Run an Experiment
- Choose one of PyCaret's machine learning modules (classification in this example) and set DagsHub as the logger during initialization.
  from pycaret.classification import *
  s = setup(..., log_experiment="dagshub", ...)
Authentication
If the DagsHub Logger is not already authenticated on your local machine, the terminal will prompt you to enter the repo_owner/repo_name and provide an authentication link. The repository and remote MLflow server will then be automatically initialized in the background.
Congratulations, you’re all set to track your PyCaret experiments using DagsHub!
PyCaret automatically detects that the integration is enabled and available, and adds the DagsHub hook to your pipeline. Now, when you run your code, new runs will appear in the experiment tables along with their status and origin.
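Putting the steps together, a complete run might look like the sketch below. The `run_experiment` wrapper, the default experiment name, and the `session_id` value are illustrative choices for this example, not part of the integration; `setup` and `compare_models` are PyCaret's own functions.

```python
def run_experiment(data, target, experiment_name="pycaret-dagshub-demo"):
    """Train and compare classifiers while logging the runs to DagsHub."""
    # Imported lazily so that defining this function does not require PyCaret.
    from pycaret.classification import compare_models, setup

    setup(
        data=data,                        # a pandas DataFrame
        target=target,                    # name of the label column
        log_experiment="dagshub",         # use DagsHub as the experiment logger
        experiment_name=experiment_name,  # illustrative name, change as needed
        session_id=42,                    # fixed seed so runs are comparable
    )
    # Cross-validates the available models and returns the best one.
    return compare_models()
```

For a quick try-out, you could load one of PyCaret's bundled datasets with `from pycaret.datasets import get_data` and call `run_experiment(get_data("juice"), target="Purchase")`.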
Additional Resources
- DagsHub x PyCaret - a full tutorial that showcases how to use DagsHub with PyCaret.
- Example notebook - create your own transformer model and track your experiments.
Known Issues, Limitations & Restrictions
If you do not set the MLFLOW_TRACKING_URI environment variable, you will be prompted to enter the repo_owner/repo_name every time you run your experiment.
The recently added dagshub.init function, which configures MLflow for your repository, does not set this variable, so using it will still trigger the prompt.
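One way to catch this before starting a long run is to check the environment up front. The sketch below assumes the variable in question is `MLFLOW_TRACKING_URI`, as used in the configuration step; the `missing_tracking_uri` helper is a hypothetical name introduced for this example.

```python
import os


def missing_tracking_uri(environ=None):
    """Return True when MLFLOW_TRACKING_URI is unset or blank."""
    env = os.environ if environ is None else environ
    return not env.get("MLFLOW_TRACKING_URI", "").strip()


if missing_tracking_uri():
    # Without the variable, expect to be asked for repo_owner/repo_name.
    print("MLFLOW_TRACKING_URI is not set; the DagsHub prompt will appear.")
```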