pyarrow

Here are 39 public repositories matching this topic...

vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

visualization python data-science machine-learning bigdata tabular-data hdf5 machinelearning dataframe memory-mapped-file pyarrow

Updated Oct 8, 2024
Python

ibis-project / ibis

The portable Python dataframe library

mysql python bigquery sql database clickhouse sqlite impala postgresql snowflake pandas pyspark mssql trino pyarrow datafusion duckdb polars

Updated Nov 27, 2024
Python

uber / petastorm

Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

machine-learning deep-learning tensorflow pytorch pyspark parquet parquet-files sysml pyarrow

Updated Dec 2, 2023
Python

narwhals-dev / narwhals

Lightweight and extensible compatibility layer between dataframe libraries!

pandas dask ibis vaex pyarrow modin cudf duckdb polars

Updated Nov 26, 2024
Python

dacort / faker-cli

Command-line interface to quickly generate fake CSV and JSON data

aws json csv parquet faker-provider pyarrow deltalake

Updated Jul 11, 2024
Python

icaropires / pdf2dataset

Converts a whole subdirectory with a big (or small) volume of PDF documents to a dataset (pandas DataFrame) with error tracking and choice of features

python pdf distributed-systems data-science ocr pandas-dataframe parallel distributed-computing tesseract python3 tesseract-ocr parquet ray pdftotext pytesseract pdf2image pyarrow pytesseract-ocr

Updated Sep 20, 2020
Python

kraina-ai / overturemaestro

An open-source tool for reading OvertureMaps data with multiprocessing and additional Quality-of-Life features

python open-source openstreetmap geo geospatial pyarrow overturemaps overture-maps

Updated Nov 24, 2024
Python

ismailhammounou / db2ixf

db2ixf is a Python package with a CLI that simplifies the parsing and processing of IBM Integration eXchange Format (IXF) files.

Updated Mar 16, 2024
Python

zen-xu / pyarrow-stubs

Type annotations for pyarrow

typing pyarrow

Updated Nov 25, 2024
Python

legout / pydala

Poor man's simple Python API for creating a local or remote data lake based on several (pyarrow) datasets using DuckDB

datalake pyarrow duckdb

Updated Jul 14, 2023
Python

DanielAvdar / pandas-pyarrow

Seamlessly switch Pandas DataFrame backend to PyArrow.

python backend arrow pandas-dataframe pandas pyarrow pandas-tricks-for-data-manipulation dtypes db-dtypes pandas-pyarrow pandas-arrow

Updated Nov 26, 2024
Python

mercator-labs / oakstore

High-speed time-series pandas DataFrame database

python finance data-science machine-learning database big-data timeseries deep-learning pandas dataset parquet deeplearning dask datawarehouse pyarrow

Updated Oct 28, 2024
Python

xbrianh / xdlake

A loose implementation of the Delta Lake protocol, written in Python on top of pyarrow, focused on extensibility, customizability, and distributed data.

python spark hive parquet databricks pyarrow delta-lake deltalake deltatables

Updated Oct 12, 2024
Python

jaysnm / dremio-arrow

Dremio Arrow Flight Client

python r pandas dataframe dremio pyarrow dremio-arrow

Updated Mar 20, 2024
Python

lykmapipo / Python-Spark-Log-Analysis

Python scripts to process and analyze log files using PySpark.

Updated Jul 13, 2024
Python

legout / pydala2

Poor man's data lake: a simple API to efficiently query your Parquet datasets using DuckDB or Polars

python local pandas object-storage pyarrow localcache duckdb fsspec polars

Updated Nov 22, 2024
Python

kiwi0fruit / featherhelper

Concise interface to cache NumPy arrays and pandas DataFrames

python cache numpy pandas pyarrow

Updated Jan 22, 2019
Python

thread53 / pqviewer

View Apache Parquet files in your terminal

python terminal textual parquet pyarrow

Updated Aug 2, 2024
Python

psmyth94 / biosets

A bioinformatics extension of 🤗 Datasets library, built for ML applications on biological and omics data, offering easy integration of metadata and low-code data management tools.

Updated Nov 16, 2024
Python

miraisolutions / apache-arrow-flight-python-example

Code examples / snippets for website news post

python pyarrow arrow-flight

Updated Feb 16, 2022
Python
