PromptShot
An AI red teaming framework for large language model security research.
Pitch

PromptShot provides an adversarial attack pipeline for exploring and improving the security of modern Large Language Models. With techniques for jailbreak generation, prompt poisoning, and persona hijacking, the framework lets researchers assess vulnerabilities while emphasizing responsible usage in compliance with security protocols.

Description

PromptShot v5.4 is an AI red teaming framework for security research and vulnerability assessment of Large Language Models (LLMs). Built on jailbreak techniques rooted in the Elder Plinus methodology, it lets researchers generate complex adversarial prompts that probe the defenses of modern LLMs.

Key Features

  • 24 Skeleton Transforms: Complete implementation of the Elder Plinus manifest, designed to exploit various vulnerabilities in LLM safety systems.
  • Zero Fingerprint Engine: Offers vast variation pools to evade detection, enhancing the stealth capability of generated prompts.
  • Seven Generation Modes: Seven distinct modes, ranging from stealthy to maximum aggression, allow the approach to be tailored to the needs of each assessment.
  • Skeleton Chaining: Stack between 1 and 10 transforms per payload for more nuanced attacks.
  • Vendor Optimization: Optimized weights for major LLM providers such as OpenAI, Anthropic, xAI, Google, and Meta, ensuring effective interaction with various architectures.
  • High-Entropy Dividers and Synergy Stacks: Introduces advanced techniques for semantic boundary manipulation and pre-configured technique combinations, facilitating sophisticated adversarial scenarios.

The 24-Skeleton Manifest

The core methodology defines 24 distinct adversarial transforms, grouped into five categories, each targeting a specific class of safety-system vulnerability:

  1. Persona Manipulation: Techniques for identity control within the model (e.g., RPC, SPO).
  2. Output Structure: Control the output format and structure through various methods (e.g., DSOC, TPE).
  3. Intent Inversion: Alter the intention behind user prompts (e.g., SIL, IRS).
  4. Boundary Erosion: Break down the hard constraints imposed on the model (e.g., CES, BRP).
  5. Self-Reference Loops: Create loops within the model's output for unexpected responses (e.g., IMS, SLF).

Generation Modes

Choose a generation mode based on the nature of the testing; representative modes include:

  • stealth: Mimics natural language to probe initial defenses.
  • balanced: Optimal mix of power and risk for standard tests.
  • aggressive: Harnesses maximum capabilities for in-depth assessments.
  • skeleton: Directly applies chained skeleton techniques for peak bypass effectiveness.

Example Usage

Basic command for generating a prompt:

python pipeline.py -s "explain buffer overflow exploitation" -m balanced

For maximum bypass attempts with skeleton chaining:

python pipeline.py -s "sensitive query" -m skeleton --skeleton-depth 8 --vendor openai -v

Conclusion

PromptShot is intended exclusively for authorized security research, offering tools for those seeking to understand and strengthen the security posture of advanced AI systems. The framework broadens understanding of attack vectors while emphasizing responsible usage in the growing field of AI and machine learning security research.
