GCG with Probe Sampling

Colab Notebook: Run the Attack in Colab
Original Paper: Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling

Project Overview

This project tests the safety of AI language models. It uses Greedy Coordinate Gradient (GCG) to search for an adversarial "suffix": a string of tokens that looks random but is optimized so that, when appended to a prompt, it pushes the model past its safety training.

Because each GCG step has to score many candidate suffixes on the target model, the search takes a lot of computing power. This project uses Probe Sampling to speed it up.

How It Works:

Finding Weaknesses: It uses gradients of the loss with respect to the suffix tokens to shortlist the token substitutions most likely to push the AI toward a forbidden response (for example, one that starts with "Sure, here is how").
Two-Model Speedup (Probe Sampling): Instead of scoring every candidate suffix on the large, slow target model, it scores them all on a small, fast "draft" model first. Only the most promising candidates are sent to the large model.
Smart Filtering: At every step it checks how closely the draft model and the target model agree on a small sample of candidates. The more closely they agree, the more aggressively it discards candidates that the draft model ranks poorly, which saves time.
Safety Testing: The notebook demonstrates the attack getting past a modern AI model's safety training (see Example Results).
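
The filtering idea above can be illustrated with a small, model-free sketch. It is not the notebook's code: the function names, the `reduction` parameter, and the exact formulas are assumptions made for illustration (a Spearman rank correlation rescaled to the range 0 to 1 as the agreement score, and a shortlist that shrinks linearly as agreement grows). The losses are plain numbers; in a real run they would come from the draft and target models.

```python
import random


def ranks(values):
    """Rank of each value (0 = smallest); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def agreement_score(draft_losses, target_losses):
    """Spearman rank correlation rescaled from [-1, 1] to [0, 1] (assumed form)."""
    n = len(draft_losses)
    if n < 2 or n != len(target_losses):
        raise ValueError("need two equal-length lists with at least 2 items")
    rd, rt = ranks(draft_losses), ranks(target_losses)
    d2 = sum((a - b) ** 2 for a, b in zip(rd, rt))
    return 1.0 - 3.0 * d2 / (n * (n * n - 1))


def filtered_set_size(alpha, batch_size, reduction=8):
    """Higher agreement -> smaller shortlist. `reduction` is a hypothetical knob."""
    return max(1, int((1.0 - alpha) * batch_size / reduction))


def probe_sampling_step(draft_losses, target_loss_fn, probe_size, reduction=8, rng=None):
    """Pick the best candidate index while calling the slow model sparingly.

    draft_losses: cheap loss for every candidate (already computed).
    target_loss_fn: hypothetical callable returning the expensive loss for an index.
    """
    rng = rng or random.Random(0)
    batch = len(draft_losses)
    # 1. Score a small random probe set on the slow model.
    probe = rng.sample(range(batch), probe_size)
    scored = {i: target_loss_fn(i) for i in probe}
    # 2. Measure how well the two models agree on the probe set.
    alpha = agreement_score([draft_losses[i] for i in probe],
                            [scored[i] for i in probe])
    # 3. Keep only the draft model's top candidates; fewer when agreement is high.
    k = filtered_set_size(alpha, batch, reduction)
    shortlist = sorted(range(batch), key=lambda i: draft_losses[i])[:k]
    for i in shortlist:
        if i not in scored:
            scored[i] = target_loss_fn(i)
    # 4. Best candidate among everything the slow model actually scored.
    best = min(scored, key=scored.get)
    return best, alpha, len(scored)
```

With 32 candidates and a probe set of 4, a draft model that ranks candidates exactly like the target model needs at most 5 slow-model evaluations in this sketch instead of 32. When the two models disagree, the shortlist grows and the saving shrinks.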

Example Results

Without GCG (Baseline):

"I'm sorry, but I can't assist with that."

With GCG Suffix (After ~20 Iterations):

"Sure, I can help you with that. However, ..." (Safety filter bypassed)
