sunlylorn

孙林 sunlylorn

A typical tech guy focused on NLP, LLMs, search, and related fields.

3 followers · 0 following

Lists (16)

Stars

mlfoundations / dclm

DataComp for Language Models

HTML 1,156 104 Updated Nov 18, 2024

mem0ai / mem0

The Memory layer for your AI apps

Python 22,861 2,103 Updated Nov 16, 2024

hijkzzz / Awesome-LLM-Strawberry

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.

5,196 286 Updated Nov 11, 2024

BurntSushi / suffix

Fast suffix arrays for Rust (with Unicode support).

Rust 263 30 Updated Oct 10, 2023

adbar / trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

Python 3,647 261 Updated Nov 15, 2024

huggingface / datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python 2,045 147 Updated Nov 15, 2024

evalplus / evalplus

Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024

Python 1,249 108 Updated Nov 17, 2024

keirp / OpenWebMath

XSLT 117 8 Updated May 2, 2024

confident-ai / deepeval

The LLM Evaluation Framework

Python 3,694 292 Updated Nov 17, 2024

Qihoo360 / 360zhinao

360zhinao

Python 282 21 Updated Sep 11, 2024

FranxYao / Long-Context-Data-Engineering

Implementation of paper Data Engineering for Scaling Language Models to 128K Context

Python 435 28 Updated Mar 19, 2024

wyu97 / GenRead

Code and Checkpoints for "Generate rather than Retrieve: Large Language Models are Strong Context Generators" in ICLR 2023.

Python 278 20 Updated Jan 29, 2023

GAIR-NLP / ReAlign

Reformatted Alignment

JavaScript 112 6 Updated Sep 23, 2024

Improbable-AI / curiosity_redteam

Official implementation of ICLR'24 paper, "Curiosity-driven Red Teaming for Large Language Models" (https://openreview.net/pdf?id=4KqkizXgXU)

Jupyter Notebook 62 10 Updated Mar 15, 2024

EdinburghNLP / awesome-hallucination-detection

List of papers on hallucination detection in LLMs.

678 55 Updated Nov 1, 2024

ContextualAI / HALOs

A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).

Python 741 45 Updated Nov 2, 2024

NtesEyes / pylane

A Python VM injector with debug tools, based on gdb.

Python 359 34 Updated Nov 6, 2022

PKU-Alignment / safe-rlhf

Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

Python 1,347 120 Updated Jun 13, 2024

arcee-ai / mergekit

Tools for merging pretrained large language models.

Python 4,816 439 Updated Nov 5, 2024

mistralai / mistral-inference

Official inference library for Mistral models

Jupyter Notebook 9,723 863 Updated Nov 12, 2024

pratyushasharma / laser

The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction

Python 370 28 Updated Jul 9, 2024

vectara / hallucination-leaderboard

Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents

Python 1,236 48 Updated Nov 6, 2024

FreedomIntelligence / Medical_NLP

Medical NLP competitions, datasets, large models, and papers

2,134 405 Updated Nov 17, 2024

explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀

Python 7,243 743 Updated Nov 18, 2024

yule-BUAA / MergeLM

Codebase for Merging Language Models (ICML 2024)

Python 774 45 Updated May 5, 2024

SciPhi-AI / library-of-phi

161 18 Updated Oct 13, 2023

amazon-science / RefChecker

RefChecker provides automatic checking pipeline and benchmark dataset for detecting fine-grained hallucinations generated by Large Language Models.

Python 302 31 Updated Nov 7, 2024

facebookresearch / nougat

Implementation of Nougat Neural Optical Understanding for Academic Documents

Python 8,984 568 Updated Apr 16, 2024

princeton-nlp / LLM-Shearing

[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning

Python 558 47 Updated Mar 4, 2024

CLUEbenchmark / SuperCLUE-Safety

SC-Safety: a multi-turn adversarial safety benchmark for Chinese large language models

107 7 Updated Mar 15, 2024

Lists (16)

adapter

chatgpt

data annotation

Data Augmentation

dataset

metric

NLG

NLP

OCR

parallel

prompt learning

PTM

question answering

speech

text2image

word segment
