Llama

An example of generating text with Llama (1 or 2) using MLX.

Llama is a set of open source language models from Meta AI Research¹² ranging from 7B to 70B parameters. This example also supports Meta's Llama Chat and Code Llama models, as well as the 1.1B TinyLlama models from SUTD.³

Setup

Install the dependencies:

pip install -r requirements.txt

Next, download and convert the model. If you do not have access to the model weights, you will need to request access from Meta.

[!TIP] Alternatively, you can download a few converted checkpoints from the MLX Community organization on Hugging Face and skip the conversion step.

You can download the TinyLlama models directly from Hugging Face.

Convert the weights with:

python convert.py --torch-path <path_to_torch_model>

To generate a 4-bit quantized model use the -q flag:

python convert.py --torch-path <path_to_torch_model> -q
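To get a rough feel for what 4-bit quantization saves, the following back-of-the-envelope sketch compares the weight storage of a 7B-parameter model at 16 bits and at 4 bits per weight. The group size of 64 and the 32 bits of per-group overhead (for example, a 16-bit scale plus a 16-bit bias) are assumptions made for illustration, not values taken from convert.py. The sketch also treats every parameter as quantized, so it gives an optimistic estimate.

```python
def estimate_weight_gib(num_params, bits_per_weight,
                        group_size=0, overhead_bits_per_group=0):
    """Rough size of the stored weights in GiB.

    group_size and overhead_bits_per_group model the extra per-group
    metadata of a quantized format; both are illustrative assumptions.
    """
    bits = num_params * bits_per_weight
    if group_size:
        bits += (num_params / group_size) * overhead_bits_per_group
    return bits / 8 / 2**30


params_7b = 7e9  # smallest Llama size mentioned above

fp16_gib = estimate_weight_gib(params_7b, 16)
# Assumed layout: groups of 64 weights, 32 bits of metadata per group.
q4_gib = estimate_weight_gib(params_7b, 4, group_size=64,
                             overhead_bits_per_group=32)

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f"4-bit weights:  {q4_gib:.1f} GiB")
print(f"reduction:      {fp16_gib / q4_gib:.2f}x")
```

Under these assumptions the quantized weights take a bit more than a quarter of the 16-bit size, because the per-group metadata adds half a bit per weight on top of the 4 bits.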

For TinyLlama, use:

python convert.py --torch-path <path_to_torch_model> --model-name tiny_llama

By default, the conversion script creates the directory mlx_model and saves the converted weights.npz, tokenizer.model, and config.json there.
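Before running the model, it can help to confirm that the conversion produced the three files listed above. The following is a minimal sketch using only the Python standard library. The helper names `missing_files` and `load_config` are hypothetical and not part of this example's scripts, and the sketch makes no assumption about which keys config.json contains.

```python
import json
from pathlib import Path

# File names as listed above for the default output directory.
EXPECTED_FILES = ("weights.npz", "tokenizer.model", "config.json")


def missing_files(model_dir="mlx_model"):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def load_config(model_dir="mlx_model"):
    """Parse config.json from the converted model directory."""
    with open(Path(model_dir) / "config.json") as f:
        return json.load(f)


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing from mlx_model:", ", ".join(missing))
    else:
        # Only the key names are printed; their meaning depends on the model.
        print("Config keys:", sorted(load_config()))
```

If you converted into a different directory, pass its path to both helpers instead of relying on the default.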

Run

Once you've converted the weights to MLX format, you can interact with the Llama model:

python llama.py --prompt "hello"

Run python llama.py --help for more details.
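If you want to call the script from another Python program, one option is to launch it as a subprocess. The sketch below assumes llama.py is in the current working directory and uses only the `--prompt` flag shown above. The helper names `build_command` and `generate` are hypothetical, and the exact text the script prints is not specified here.

```python
import subprocess
import sys


def build_command(prompt, script="llama.py"):
    """Build the argument list for one generation run.

    Passing arguments as a list avoids shell quoting problems when the
    prompt contains spaces or quotes.
    """
    return [sys.executable, script, "--prompt", prompt]


def generate(prompt, script="llama.py"):
    """Run the script and return whatever it wrote to standard output."""
    result = subprocess.run(
        build_command(prompt, script),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    print(build_command("hello"))
```

Calling `generate("hello")` corresponds to the command line shown above. Each call starts a new process and therefore reloads the model, so this approach suits occasional calls rather than tight loops.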

¹ For Llama v1, refer to the arXiv paper and blog post for more details.
² For Llama v2, refer to the blog post.
³ For TinyLlama, refer to the GitHub repository.
