Comparing changes

What's New:

New Features & Enhancements

- Introduced Multistage Attack: Added a new `multistage_depth` parameter to the `start_testing()` function. It lets users specify the depth of a dialogue during testing, enabling more sophisticated and targeted LLM red-teaming strategies.
- Refactored Sycophancy Attack: `sycophancy_test` has been renamed to `sycophancy` and turned into a multistage attack, making it more effective at uncovering model vulnerabilities.
- Enhanced Logical Inconsistencies Attack: `logical_inconsistencies_test` has been renamed to `logical_inconsistencies` and restructured as a multistage attack to better detect and exploit logical weaknesses in language models.
- New Multistage Harmful Behavior Attack: Added `harmful_behaviour_multistage`, a more nuanced version of the original harmful behavior attack, designed for deeper penetration testing.
- New System Prompt Leakage Attack: Added `system_prompt_leakage`, a multistage attack that uses jailbreak examples from a dataset to target and exploit model internals.

Improvements & Refinements

- Refactored code across the framework for better efficiency and maintainability.
- Made numerous small improvements and optimizations to performance and user experience.

---------

Co-authored-by: Timur Nizamov <abc@nizamovtimur.ru>
Co-authored-by: Nikita Ivanov <nikita.ivanov.778@gmail.com>
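The two renames and the idea of a bounded dialogue depth can be illustrated with a short, self-contained sketch. It does not import or reproduce the framework's code: `migrate_test_names` and `run_multistage` are hypothetical helpers, the `attacker` and `target` callables stand in for real models, and `depth` here only mirrors what `multistage_depth` is described as controlling. The only names taken from the commit message are the old and new test identifiers.

```python
from typing import Callable, List, Tuple

# Old test name -> new test name, as listed in the commit message.
RENAMED_TESTS = {
    "sycophancy_test": "sycophancy",
    "logical_inconsistencies_test": "logical_inconsistencies",
}


def migrate_test_names(names: List[str]) -> List[str]:
    """Hypothetical helper: replace renamed test identifiers, keep the rest."""
    return [RENAMED_TESTS.get(name, name) for name in names]


def run_multistage(
    attacker: Callable[[str], str],
    target: Callable[[str], str],
    first_prompt: str,
    depth: int,
) -> List[Tuple[str, str]]:
    """Hypothetical sketch of a multistage dialogue capped at `depth` turns.

    Each turn sends a prompt to the target, records the exchange, and lets
    the attacker build the next prompt from the target's reply.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    history: List[Tuple[str, str]] = []
    prompt = first_prompt
    for _ in range(depth):
        reply = target(prompt)
        history.append((prompt, reply))
        prompt = attacker(reply)
    return history
```

A configuration that still lists `sycophancy_test` could be passed through `migrate_test_names` before use; whether the framework itself accepts the old names is not stated in the commit message.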

* Small fix for attacks and add strip parameter for ChatSession

---------

Co-authored-by: Низамов Тимур Дамирович <abc@nizamovtimur.ru>

* Add Crescendo attack
* Add BON attack
* Add Docker example with Jupyter Notebook and installed LLAMATOR
* Improve attack system prompt for Prompt Leakage
* Other minor improvements and bug fixes

---------

Co-authored-by: Timur Nizamov <abc@nizamovtimur.ru>
Co-authored-by: Nikita Ivanov <nikita.ivanov.778@gmail.com>

* Add HarmBench Prompts
* Add Suffix Attack
* Remake Harmful Behavior Attack

---------

Co-authored-by: Shine-afk <belyaevskij.nikita@gmail.com>
Co-authored-by: Timur Nizamov <abc@nizamovtimur.ru>
Co-authored-by: Nikita Ivanov <nikita.ivanov.778@gmail.com>


Commits on Jan 14, 2025

Commits on Jan 18, 2025

Commits on Feb 5, 2025

Commits on Feb 10, 2025
