Stars
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDFs into Markdown and JSON formats.
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
Laminar - open-source all-in-one platform for engineering AI products. Traces, Evals, Datasets, Labels. YC S24.
1-Click is all you need.
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Wo…
devflowinc / firecrawl-simple
Forked from mendableai/firecrawl
➖ Stripped-down, stable version of firecrawl optimized for self-hosting and ease of contribution. Billing logic and AI features are completely removed. Crawl and convert any website into LLM-ready …
For optimization algorithm research and development.
Your agent in your terminal, equipped with local tools: writes code, uses the terminal, browses the web, and uses vision.
A curated list of Large Language Model resources, covering model training, serving, fine-tuning, and building LLM applications.
Official repo for paper DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning.
Label Studio is a multi-type data labeling and annotation tool with a standardized output format.
A curated reading list of research in Mixture-of-Experts (MoE).
🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
Implementation of "Attention Is Off By One" by Evan Miller
Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
A simple screen-parsing tool toward a pure vision-based GUI agent.
Code repository for the paper "MrT5: Dynamic Token Merging for Efficient Byte-level Language Models."
Recipes for shrinking, optimizing, and customizing cutting-edge vision models. 💜
[EMNLP 2023] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning
Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?"