kedro-aim is a Kedro plugin that enables tracking of metrics and parameters with Aim from within Kedro.
Kedro is a great tool for data engineering and data science, but it lacks a clear way to track metrics and parameters.
Aim is a great tool for tracking metrics and parameters, but it lacks a clear way to integrate with Kedro.
This plugin aims to solve both problems.
- Automatic registration of the Aim Run in the Data Catalog
- Tracking of artifacts with the Aim DataSet
- Configuration via aim.yml
Install the package with pip:
pip install kedro-aim
The plugin automatically registers an Aim Run instance in the DataCatalog under the name run, which any node can access to log metrics and parameters.
This run instance can be used to track metrics and parameters in the same way as in any other Python project.
First, you need to initialize the aim.yml config file inside your existing Kedro project.
This can be done by running the following command:
kedro aim init
To use Aim inside a node, pass the run object as an argument to the node function.
Inside the function, you can then use the run object to log metrics and parameters.
# nodes.py
import pandas as pd
from aim import Run


def logging_in_node(run: Run, data: pd.DataFrame) -> None:
    # track a metric named "score" with value 0.5
    run.track(0.5, "score")
    # store a parameter on the run
    run["parameter"] = "abc"
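Because run arrives as an ordinary function argument, the node can be exercised without an Aim repository by passing any object that supports the same two calls. The sketch below assumes a hypothetical RecordingRun test double (not part of kedro-aim or Aim) that only records track() calls and item assignments; pandas is dropped so the snippet runs with the standard library alone.

```python
from typing import Any, Dict, List, Tuple


class RecordingRun:
    """Hypothetical stand-in for aim.Run: records calls instead of writing to an Aim repo."""

    def __init__(self) -> None:
        self.tracked: List[Tuple[Any, str]] = []
        self.params: Dict[str, Any] = {}

    def track(self, value: Any, name: str) -> None:
        # Mirrors the run.track(value, name) call used in the node
        self.tracked.append((value, name))

    def __setitem__(self, key: str, value: Any) -> None:
        # Mirrors the run["parameter"] = ... assignment used in the node
        self.params[key] = value


def logging_in_node(run: Any, data: Any) -> None:
    # Same body as the node above; `run` is duck-typed so the stand-in works
    run.track(0.5, "score")
    run["parameter"] = "abc"


run = RecordingRun()
logging_in_node(run, data=None)
print(run.tracked)  # [(0.5, 'score')]
print(run.params)   # {'parameter': 'abc'}
```

In a real project the same idea lets you unit-test node logic while leaving the actual Aim tracking to the run that kedro-aim injects at pipeline runtime.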
When defining the pipeline, pass the run dataset as an input to the node.
The run dataset is created automatically by kedro-aim and added to the DataCatalog, so Kedro passes it to the node as an argument.
# pipeline.py
from kedro.pipeline import Pipeline, node
from kedro.pipeline.modular_pipeline import pipeline

from .nodes import logging_in_node


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                func=logging_in_node,
                inputs=["run", "input_data"],  # "run" is registered by kedro-aim
                outputs=None,
                name="logging_in_node",
            )
        ]
    )
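When inputs is given as a list, Kedro passes the loaded datasets to the node function positionally, so "run" must sit in the same position as the run parameter. The following is a simplified, hypothetical illustration of that lookup: run_node and RecordingRun are stand-ins (not Kedro or Aim APIs), a plain dict replaces the DataCatalog, and the metric name mean_input is chosen only for the example.

```python
from typing import Any, Callable, Dict, List, Tuple


class RecordingRun:
    """Hypothetical stand-in for the aim.Run that kedro-aim registers as `run`."""

    def __init__(self) -> None:
        self.tracked: List[Tuple[Any, str]] = []

    def track(self, value: Any, name: str) -> None:
        self.tracked.append((value, name))


def logging_in_node(run: Any, data: List[float]) -> None:
    # Track the mean of the input data (hypothetical metric name)
    run.track(sum(data) / len(data), "mean_input")


def run_node(func: Callable[..., Any], inputs: List[str], catalog: Dict[str, Any]) -> Any:
    # Simplified view of input binding: look up each name in the catalog and
    # pass the values positionally, in the order listed. Not Kedro's runner code.
    return func(*(catalog[name] for name in inputs))


catalog = {"run": RecordingRun(), "input_data": [1.0, 2.0, 3.0]}
run_node(logging_in_node, ["run", "input_data"], catalog)
print(catalog["run"].tracked)  # [(2.0, 'mean_input')]
```

Swapping the order to ["input_data", "run"] would hand the list to the run parameter, which is why the input order in the node definition matters.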
The plugin is configured via the aim.yml file, which should be placed inside the conf/base folder.
A default config file can be generated by running kedro aim init from the shell.
You can enable schema validation in VS Code to get real-time validation, autocompletion, and descriptions of the different fields in your aim.yml as you write it. To enable this, make sure the YAML extension is installed.
Then add the following to your settings.json file:
{
    "yaml.schemas": {
        "https://raw.githubusercontent.com/AnH0ang/kedro-aim/master/static/jsonschema/kedro_aim_schema.json": "**/*aim*.yml"
    }
}
This project was inspired by kedro-mlflow, a Kedro plugin that enables tracking of metrics and parameters with MLflow from within Kedro.