Mage is a hybrid framework for transforming and integrating data. It combines the best of both worlds: the flexibility of notebooks with the rigor of modular code.
- Extract and synchronize data from 3rd party sources.
- Transform data with real-time and batch pipelines using Python, SQL, and R.
- Load data into your data warehouse or data lake using our pre-built connectors.
- Run, monitor, and orchestrate thousands of pipelines without losing sleep.
Plus hundreds of enterprise-class features, infrastructure innovations, and magical surprises.
For teams. Fully managed platform for integrating and transforming data. | Self-hosted. System to build, run, and manage data pipelines. |
1️⃣ 🏗️ We designed an easy developer experience that you’ll enjoy.
↓
2️⃣ 🔮 Get instant feedback from your code each time you run it.
↓
3️⃣ 🚀 Mage makes it easy for a solo developer or a large team to scale up and manage thousands of pipelines.
Mage is an open-source data pipeline tool for transforming and integrating data.
The recommended way to install the latest version of Mage is through Docker with the following command:
docker pull mageai/mageai:latest
You can also install Mage using pip or conda. Without an isolated environment (for example, a virtualenv or a dedicated conda environment), this may cause dependency conflicts.
pip install mage-ai
conda install -c conda-forge mage-ai
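If you installed with pip or conda, a quick way to confirm that the package landed in the active environment is to ask Python for its installed version. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the only thing taken from the commands above is the distribution name `mage-ai`.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "mage-ai" is the distribution name used in the pip command above.
    version = installed_version("mage-ai")
    if version is None:
        print("mage-ai is not installed in this environment")
    else:
        print(f"mage-ai {version} is installed")
```

If this reports that the package is missing, check that the interpreter you are running belongs to the environment you installed into.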
Looking for help? The fastest way to get started is to read our documentation.
Looking for quick examples? Open a demo project right in your browser or check out our guides.
Build and run a data pipeline with our demo app.
WARNING
The live demo is public to everyone. Please don’t save anything sensitive (e.g. passwords or secrets).
- Load data from API, transform it, and export it to PostgreSQL
- Integrate Mage into an existing Airflow project
- Train model on Titanic dataset
- Set up dbt models and orchestrate dbt runs
🔮 Features
🎶 | Orchestration | Schedule and manage data pipelines with observability. |
📓 | Notebook | Interactive Python, SQL, & R editor for coding data pipelines. |
🏗️ | Data integrations | Synchronize data from 3rd party sources to your internal destinations. |
🚰 | Streaming pipelines | Ingest and transform real-time data. |
❎ | dbt | Build, run, and manage your dbt models with Mage. |
A sample data pipeline defined across 3 files ➝
- Load data ➝
    import polars as pl

    @data_loader
    def load_csv_from_file() -> pl.DataFrame:
        # Read the Titanic CSV into a Polars DataFrame.
        return pl.read_csv('default_repo/titanic.csv')
- Transform data ➝
    import polars as pl

    @transformer
    def select_columns_from_df(df: pl.DataFrame, *args) -> pl.DataFrame:
        # Keep only the columns needed downstream.
        return df[['Age', 'Fare', 'Survived']]
- Export data ➝
    import polars as pl

    @data_exporter
    def export_titanic_data_to_disk(df: pl.DataFrame) -> None:
        # Polars writes CSV with write_csv (to_csv is the pandas method).
        df.write_csv('default_repo/titanic_transformed.csv')
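To see how the three blocks fit together, here is a minimal stand-alone sketch that chains the same load, transform, and export steps in plain Python. It is not Mage's API: the `data_loader`, `transformer`, and `data_exporter` decorators below are hypothetical stand-ins that only tag each function, rows are plain dicts instead of Polars DataFrames, and a tiny inline CSV with made-up values replaces `titanic.csv`.

```python
import csv
import io
from typing import Callable, Dict, List

Row = Dict[str, str]

# Made-up sample rows standing in for default_repo/titanic.csv.
SAMPLE_CSV = """Name,Age,Fare,Survived
Passenger A,22,7.25,0
Passenger B,38,71.28,1
"""


def _block(kind: str) -> Callable:
    """Hypothetical stand-in decorator: tags the function with its block type."""
    def decorate(func: Callable) -> Callable:
        func.block_type = kind
        return func
    return decorate


data_loader = _block("data_loader")
transformer = _block("transformer")
data_exporter = _block("data_exporter")


@data_loader
def load_csv() -> List[Row]:
    # Parse the inline CSV into a list of dict rows.
    return list(csv.DictReader(io.StringIO(SAMPLE_CSV)))


@transformer
def select_columns(rows: List[Row], *args) -> List[Row]:
    # Keep only the three columns used in the example above.
    keep = ("Age", "Fare", "Survived")
    return [{col: row[col] for col in keep} for row in rows]


@data_exporter
def export_csv(rows: List[Row]) -> str:
    # Write the rows to an in-memory CSV and return its text.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def run_pipeline() -> str:
    # Each block's output is passed as the input of the next block.
    return export_csv(select_columns(load_csv()))


if __name__ == "__main__":
    print(run_pipeline())
```

The point of the sketch is the hand-off: each function stays small and testable on its own, and the pipeline is just the order in which outputs become inputs.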
What the data pipeline looks like in the UI ➝
New? We recommend reading about blocks and learning from a hands-on tutorial.
Every user experience and technical design decision adheres to these principles.
💻 | Easy developer experience | Open-source engine that comes with a custom notebook UI for building data pipelines. |
🚢 | Engineering best practices built-in | Build and deploy data pipelines using modular code. No more writing throwaway code or trying to turn notebooks into scripts. |
💳 | Data is a first-class citizen | Designed from the ground up specifically for running data-intensive workflows. |
🪐 | Scaling is made simple | Analyze and process large data quickly for rapid iteration. |
These are the fundamental concepts that Mage uses to operate.
Project | Like a repository on GitHub; this is where you write all your code. |
Pipeline | Contains references to all the blocks of code you want to run and charts for visualizing data, and organizes the dependencies between blocks. |
Block | A file with code that can be executed independently or within a pipeline. |
Data product | Every block produces data after it's been executed. These are called data products in Mage. |
Trigger | A set of instructions that determines when or how a pipeline should run. |
Run | Stores information about when it was started, its status, when it was completed, any runtime variables used in the execution of the pipeline or block, etc. |
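The concepts above can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not how Mage implements these internally: the `Block`, `Pipeline`, and `Run` classes and all of their fields are hypothetical names chosen to mirror the table. It shows a pipeline ordering blocks by dependency, each block's return value being kept as its data product, and a run recording its status, timestamps, and runtime variables. Triggers are left out. It needs Python 3.9+ for `graphlib`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Block:
    """A named unit of code plus the names of the blocks it depends on."""
    name: str
    func: Callable[..., Any]
    upstream: List[str] = field(default_factory=list)


@dataclass
class Run:
    """Record of one pipeline execution."""
    variables: Dict[str, Any]
    status: str = "running"
    started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    completed_at: Optional[datetime] = None
    data_products: Dict[str, Any] = field(default_factory=dict)


class Pipeline:
    def __init__(self, blocks: List[Block]) -> None:
        self.blocks = {block.name: block for block in blocks}

    def execution_order(self) -> List[str]:
        # Upstream blocks always come before the blocks that depend on them.
        graph = {name: block.upstream for name, block in self.blocks.items()}
        return list(TopologicalSorter(graph).static_order())

    def run(self, **variables: Any) -> Run:
        run = Run(variables=variables)
        try:
            for name in self.execution_order():
                block = self.blocks[name]
                inputs = [run.data_products[up] for up in block.upstream]
                # Each block's return value is stored as its data product.
                run.data_products[name] = block.func(*inputs, **variables)
            run.status = "completed"
        except Exception:
            run.status = "failed"
            raise
        finally:
            run.completed_at = datetime.now(timezone.utc)
        return run


# Blocks are declared out of order on purpose; dependencies decide the order.
pipeline = Pipeline([
    Block("export", lambda rows, **kw: len(rows), upstream=["transform"]),
    Block("load", lambda **kw: [1, 2, 3, 4]),
    Block("transform", lambda rows, **kw: [r * kw["factor"] for r in rows], upstream=["load"]),
])

if __name__ == "__main__":
    result = pipeline.run(factor=10)
    print(result.status, result.data_products)
```

Keeping data products on the run object is one simple design choice for a sketch like this; it makes every intermediate result inspectable after the pipeline finishes.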
Add features and instantly improve the experience for everyone.
Check out the contributing guide to set up your development environment and start building.
Individually, we’re a mage.
🧙 Mage
Magic is indistinguishable from advanced technology. A mage is someone who uses magic (aka advanced technology).
Together, we’re Magers!
🧙‍♂️🧙 Magers (/ˈmājər/)
A group of mages who help each other realize their full potential! Let’s hang out and chat together ➝
For real-time news, fun memes, data engineering topics, and more, join us on ➝
- GitHub
- Slack
Check out our FAQ page to find answers to some of our most asked questions.
See the LICENSE file for licensing information.