[DRAFT][EXPERIMENTAL] FEAT: Proposing a new design for Orchestration of Attacks #867

bashirpartovi · 2025-04-08T14:25:31Z

Description

This is a draft for an experimental design proposal on how to orchestrate attacks. The idea is to modularize the components, separate the core attack algorithms and flows, and enhance the separation of concerns. Please keep in mind, this is not the final design, it is just a starting point for discussion. I've even included a CLI example to show how the code might look from the user's perspective.

I prioritized readability and chose the builder pattern, even though it might feel a bit verbose. You can also check out the diagrams folder to see the attack and class diagrams.

(as you can tell, this is my 3rd iteration on it v3)

pyrit/orchestratorv3/cli/launcher.py

romanlutz · 2025-04-08T23:03:44Z

pyrit/orchestratorv3/components/conversation_manager.py

+        self,
+        *,
+        orchestrator_identifier: dict[str, str],
+        memory: Optional[MemoryInterface] = None,


As you know, I'm not a huge fan of this because it means we could make an accidental mistake and pass different memory types in different places.

I think using memory internally within each individual class is an anti-pattern and actually a bad design because storage happens as a side effect which is not known to the user. Also, you end up with inconsistencies. For example, let's say I wrote a new orchestrator, and this orchestrator has a bug somewhere that causes the whole program to crash. If you point it to a SQL storage, you end up having a few entries till the program crashes. This causes inconsistent or (garbage data). I think we need to rethink how we do this.
Dependency injection is a well-known pattern that helps testing individual pieces much easier. You could have a warning for example if memory is provided. I am envisioning all of the message tracking to move to a conversation piece. Once the entire attack is finished, it can be offloaded to the storage.

Not sure I see the problem tbh but feel free to write up your concerns in a doc or example, whatever works!

It's consistent and works. Users very much know what's being used because they have to choose. We used to have explicit passing of memory and that led to confusion and mistakes. No longer possible!

The "garbage data" example can happen, but it's not like that matters. It's also possible in any other setup unless you only let the SDK commit entries to the DB when the entire conversation is finished which is overkill (?)
I'd go a step further and say we want data from failed runs to analyze.

Anyway, I'm not ruling out that we might be missing something here but I don't yet see benefits to do anything else. It's easy to test even now. We're doing it already after all.

romanlutz · 2025-04-08T23:06:26Z

pyrit/orchestratorv3/components/conversation_manager.py

+        - Tracks turn count for multi-turn conversations
+        - Extracts conversation state for continuation
+
+        Args:


Note: this should always be called "Parameters", not "Args." We're doing this wrong in a 1000 places but I want to spread the knowledge 😆

No worries, this is again mainly for showcasing the work

romanlutz · 2025-04-08T23:08:07Z

pyrit/orchestratorv3/components/conversation_manager.py

+        # By default, include in memory
+        return False  # Don't exclude from memory
+
+    def _extract_conversation_state(


From a method named like this I would imagine we have a return value.

we can name it as _populate_conversation_state.

romanlutz · 2025-04-08T23:09:21Z

pyrit/orchestratorv3/components/prompt_utils.py

+class PromptUtils:
+    """
+    Utility class for handling prompt normalization and validation.


Not a fan of the name. If it normalizes it should be the normalizer? Or maybe it should be part of the existing normalizer class?

100% agreed, I am not a fan either but couldn't find a better name for this. I thought about moving it to the prompt normalizer class but I don't fully understand what it does or what the responsibility of it is. I am not attached to the name at all. The point of this is that this logic should belong to a different component whether its prompt utils or prompt normalizer.

romanlutz · 2025-04-08T23:16:25Z

pyrit/orchestratorv3/multi_turn/base.py

+        target: PromptTarget, 
+        adversarial_chat: PromptChatTarget,
+        system_prompt_path: Path,
+        seed_prompt: Optional[str] = "How can I help you?",
+        prompt_normalizer: Optional[PromptNormalizer] = None,
+        memory: Optional[MemoryInterface] = None


There are lots of differences between orchestrators with this...

Crescendo has refusal AND objective scorer which use the same target but with different system prompts. [Note: system_prompt should be specific, too, otherwise system prompt for what exactly?]
TAP has off topic scorer and objective scorer.
One can perhaps envision others.
With the scanner and in ops the question of having multiple scorers in general came up. They may just be auxiliary and not influence how the algorithm proceeds, but having them is worthwhile nevertheless. We do this for PSO already.

I made this to have all the common params as input, the rest could be adjusted by individual attack builders as additional methods but we could have them all here as well. The only downside is, as you introduce new attacks with diff params, then this init also needs to be changed which is not desirable. It's best to be handled by the actual attack

romanlutz · 2025-04-08T23:17:30Z

pyrit/orchestratorv3/multi_turn/base.py

+        """Initialize the Multi-Turn Orchestrator Builder.
+
+        Args:
+            target: Target system to attack


We're calling that prompt_target everywhere else. Target is the generic term for any kind of target including adversarial_chat or scoring targets.

romanlutz · 2025-04-08T23:18:33Z

pyrit/orchestratorv3/multi_turn/base.py

+
+        return self
+
+    def with_seed_prompt(self: T_OrcBuilder, seed_prompt: Union[str | SeedPrompt]) -> T_OrcBuilder:


did you mean seed prompt group?

romanlutz · 2025-04-08T23:22:42Z

pyrit/orchestratorv3/multi_turn/red_teaming.py

+            context.achieved_objective = await self._evaluate_objective_achievement(context, target_response, score)
+
+            # Apply backtracking if configured and needed
+            if (not context.achieved_objective and 


It's more like all these things AND backtracking criteria are fulfilled. This could be refusal, or it could be low scores, for example. So that needs another way to customize IMO

romanlutz · 2025-04-08T23:25:04Z

pyrit/orchestratorv3/multi_turn/red_teaming.py

+            memory_labels=context.memory_labels
+        )
+
+    async def _generate_next_prompt(self, context: MultiTurnAttackContext) -> str:


This will be quite custom for all the techniques. [No action item, just FYI]

rlundeen2 · 2025-04-10T22:41:39Z

pyrit/orchestratorv3/multi_turn/base.py

+        seed_prompt: SeedPrompt,
+        prompt_normalizer: Optional[PromptNormalizer] = None,
+        prepend_conversation: Optional[List[SeedPrompt]] = None,
+        memory: Optional[MemoryInterface] = None,


I know this is a nit, but I would be intentional about all paradigm changes. E.g. even memory, if we can. I know it's an anti-pattern, but as a dev I think the consensus has been super nice to have a memory singleton

rlundeen2 · 2025-04-10T22:46:41Z

pyrit/orchestratorv3/multi_turn/base.py

+    def __init__(
+        self,
+        *,
+        target: PromptTarget, 


One paradigm I want to normalize on, and this might be a good time to tackle, is to take objective_scorer in the init vs initialize them in the class. It's becoming increasingly clear that's needed.

We need good ways to make it easy to initialize these though. Factories (like you introduce here) is one way. Another might be scanner configurations.

But either way, I think we both need the flexibility of passing a general "true_false" scorer AND the ease of use of having a default.

rlundeen2 · 2025-04-10T22:49:27Z

pyrit/orchestratorv3/models/context.py

+class ConversationSession:
+    """Session for conversations."""
+    conversation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
+    adversarial_chat_conversation_id: str = field(default_factory=lambda: str(uuid.uuid4()))


we will likely need more of these. It'll be good to think through TAP which can have a ton of conversations due to branching.

We MAY be able to generalize by having a list of active_adversarial_chat_conversation_id (because some are thrown away, e.g. if backtracked or pruned).

In fact, I think that's a general piece of feedback. I'm positive about this change, but for the first iteration I'd prefer TAP. If it's general enough for TAP, then it will work elsewhere. My biggest concern is some of it may not generalize

Bashir Partovi and others added 3 commits April 8, 2025 10:18

[EXPERIMENTAL] FEAT: Proposing a new design for Orchestration of Attacks

7752c12

Merge branch 'main' into dev/bashirpartovi/experimental-orc-design

d27b1f7

updated the docs

2337b51

romanlutz reviewed Apr 8, 2025

View reviewed changes

Merge branch 'main' into dev/bashirpartovi/experimental-orc-design

adbd469

rlundeen2 reviewed Apr 10, 2025

View reviewed changes

updated the design

32904d6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[DRAFT][EXPERIMENTAL] FEAT: Proposing a new design for Orchestration of Attacks #867

[DRAFT][EXPERIMENTAL] FEAT: Proposing a new design for Orchestration of Attacks #867

bashirpartovi commented Apr 8, 2025 •

edited

Loading

romanlutz Apr 8, 2025

bashirpartovi Apr 9, 2025

romanlutz Apr 10, 2025

romanlutz Apr 8, 2025

bashirpartovi Apr 9, 2025

romanlutz Apr 8, 2025

bashirpartovi Apr 9, 2025

romanlutz Apr 8, 2025

bashirpartovi Apr 9, 2025

romanlutz Apr 8, 2025

bashirpartovi Apr 9, 2025

romanlutz Apr 8, 2025

romanlutz Apr 8, 2025

romanlutz Apr 8, 2025

romanlutz Apr 8, 2025

rlundeen2 Apr 10, 2025

rlundeen2 Apr 10, 2025

rlundeen2 Apr 10, 2025

rlundeen2 Apr 10, 2025

rlundeen2 Apr 10, 2025


		return self

		def with_seed_prompt(self: T_OrcBuilder, seed_prompt: Union[str \| SeedPrompt]) -> T_OrcBuilder:

[DRAFT][EXPERIMENTAL] FEAT: Proposing a new design for Orchestration of Attacks #867

Are you sure you want to change the base?

[DRAFT][EXPERIMENTAL] FEAT: Proposing a new design for Orchestration of Attacks #867

Conversation

bashirpartovi commented Apr 8, 2025 • edited Loading

Description

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bashirpartovi commented Apr 8, 2025 •

edited

Loading