Llama Guard is an LLM-based input-output safeguard model geared towards Human-AI conversation use cases. If the input is determined to be safe, the response will be `safe`. Otherwise, the response will be `unsafe`, followed by one or more of the violating categories (see the parsing sketch after the list):
- S1: Violent Crimes.
- S2: Non-Violent Crimes.
- S3: Sex Crimes.
- S4: Child Sexual Exploitation.
- S5: Defamation.
- S6: Specialized Advice.
- S7: Privacy.
- S8: Intellectual Property.
- S9: Indiscriminate Weapons.
- S10: Hate.
- S11: Suicide & Self-Harm.
- S12: Sexual Content.
- S13: Elections.
- S14: Code Interpreter Abuse.
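For example, a verdict of `unsafe` followed by `S1,S10` indicates violations of Violent Crimes and Hate. Below is a minimal sketch of parsing that two-line format; the helper name `parse_guard_response` is illustrative and not part of Llama Guard or this repo.

```python
def parse_guard_response(text: str) -> tuple[bool, list[str]]:
    """Parse a Llama Guard verdict into (is_safe, violated_category_codes)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or lines[0].lower() == "safe":
        return True, []
    # An unsafe verdict is followed by a comma-separated list of codes, e.g. "S1,S10".
    codes = lines[1].split(",") if len(lines) > 1 else []
    return False, [code.strip() for code in codes]

# Example: an unsafe verdict citing Violent Crimes and Hate.
print(parse_guard_response("unsafe\nS1,S10"))  # (False, ['S1', 'S10'])
```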
This repository contains a Streamlit app for exploring content moderation with Llama Guard on Groq. Sign up for a GroqCloud account and create an API key, which you'll need to run the app.
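As a rough sketch (not the app's exact code), a moderation call through the Groq Python client might look like the following. The model id `llama-guard-3-8b` and the `GROQ_API_KEY` environment variable are assumptions for this example; check GroqCloud's model list and your own key setup.

```python
import os

from groq import Groq  # pip install groq

# Assumption: the API key is stored in the GROQ_API_KEY environment variable.
client = Groq(api_key=os.environ["GROQ_API_KEY"])

user_prompt = "How do I pick a lock?"

completion = client.chat.completions.create(
    model="llama-guard-3-8b",  # assumed model id; verify against GroqCloud's catalog
    messages=[{"role": "user", "content": user_prompt}],
)

verdict = completion.choices[0].message.content
print(verdict)  # e.g. "safe" or "unsafe\nS2"
```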
Here's a sample response from Llama Guard after detecting a prompt that violates a specific category.