docs: fix broken anchor links (promptfoo#1645)
mldangelo authored Sep 12, 2024
1 parent 3bc6a14 commit 207c571
Showing 15 changed files with 75 additions and 104 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -233,15 +233,15 @@ You can also output a nice [spreadsheet](https://docs.google.com/spreadsheets/d/

#### Model quality

-In the [next example](https://github.com/promptfoo/promptfoo/tree/main/examples/gpt-3.5-vs-4), we evaluate the difference between GPT 3 and GPT 4 outputs for a given prompt:
+In the [next example](https://github.com/promptfoo/promptfoo/tree/main/examples/gpt-4o-vs-4o-mini), we evaluate the difference between GPT 3 and GPT 4 outputs for a given prompt:

```
-npx promptfoo eval -p prompts.txt -r openai:gpt-3.5-turbo openai:gpt-4 -o output.html
+npx promptfoo eval -p prompts.txt -r openai:gpt-4o openai:gpt-4o-mini -o output.html
```

Produces this HTML table:

-![Side-by-side evaluation of LLM model quality, gpt3 vs gpt4, html output](https://user-images.githubusercontent.com/310310/235490527-e0c31f40-00a0-493a-8afc-8ed6322bb5ca.png)
+![Side-by-side evaluation of LLM model quality, gpt-4o vs gpt-4o-mini, html output](https://user-images.githubusercontent.com/310310/235490527-e0c31f40-00a0-493a-8afc-8ed6322bb5ca.png)
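For context, `-p prompts.txt` points the eval at a plain-text prompt file and `-r` lists the providers to compare. A minimal sketch of such a file, with hypothetical prompts (promptfoo separates multiple prompts in one file with `---`):

```
Write a tweet about {{topic}}
---
Write a concise, funny tweet about {{topic}}
```

Each `{{topic}}` variable is filled in from test cases, so both models answer the same inputs and the HTML report lines them up side by side.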

## Usage (node package)

131 changes: 54 additions & 77 deletions package-lock.json

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion package.json
@@ -123,7 +123,7 @@
"shx": "^0.3.4",
"ts-node": "^10.9.2",
"tsconfig-paths": "^4.2.0",
"typescript": "^5.5.4",
"typescript": "^5.6.2",
"typescript-eslint": "^7.18.0",
"zod-to-json-schema": "^3.23.2"
},
2 changes: 1 addition & 1 deletion site/blog/llm-agent-red-teaming-plugins.md
@@ -4,7 +4,7 @@ date: 2024-08-14

# New Red Teaming Plugins for LLM Agents: Enhancing API Security

-We're excited to announce the release of three new red teaming plugins designed specifically for Large Language Model (LLM) agents with access to internal APIs. These plugins address critical security vulnerabilities outlined in the [OWASP API Security Top 10](https://owasp.org/www-project-api-security-top-10/):
+We're excited to announce the release of three new red teaming plugins designed specifically for Large Language Model (LLM) agents with access to internal APIs. These plugins address critical security vulnerabilities outlined in the [OWASP API Security Top 10](https://genai.owasp.org/llm-top-10/):

1. [Broken Object Level Authorization (BOLA)](/docs/red-team/plugins/bola/)
2. [Broken Function Level Authorization (BFLA)](/docs/red-team/plugins/bfla/)
2 changes: 1 addition & 1 deletion site/blog/promptfoo-enterprise.md
@@ -57,4 +57,4 @@ To meet the needs of larger teams, Promptfoo Enterprise offers:
- Priority support with a 24-hour SLA
- Named account manager

-If your company is interested in signing up for Promptfoo Enterprise **[contact us](mailto:inquiries@promptfoo.com)** so we can get you up and running with a proof of concept.
+If your company is interested in signing up for Promptfoo Enterprise **[contact us](mailto:inquiries@promptfoo.dev)** so we can get you up and running with a proof of concept.
6 changes: 3 additions & 3 deletions site/docs/getting-started.md
@@ -254,7 +254,7 @@ You can also output a nice [spreadsheet](https://docs.google.com/spreadsheets/d/

### Model quality

-In [this next example](https://github.com/promptfoo/promptfoo/tree/main/examples/gpt-3.5-vs-4), we evaluate the difference between GPT 3 and GPT 4 outputs for a given prompt:
+In [this next example](https://github.com/promptfoo/promptfoo/tree/main/examples/gpt-4o-vs-4o-mini), we evaluate the difference between GPT 3 and GPT 4 outputs for a given prompt:

```yaml title=promptfooconfig.yaml
prompts:
@@ -289,9 +289,9 @@ A simple `npx promptfoo@latest eval` will run the example. Also note that you ca

Produces this HTML table:

-![Side-by-side evaluation of LLM model quality, gpt3 vs gpt4, html output](https://user-images.githubusercontent.com/310310/235490527-e0c31f40-00a0-493a-8afc-8ed6322bb5ca.png)
+![Side-by-side evaluation of LLM model quality, gpt-4o vs gpt-4o-mini, html output](https://user-images.githubusercontent.com/310310/235490527-e0c31f40-00a0-493a-8afc-8ed6322bb5ca.png)

-Full setup and output [here](https://github.com/promptfoo/promptfoo/tree/main/examples/gpt-3.5-vs-4).
+Full setup and output [here](https://github.com/promptfoo/promptfoo/tree/main/examples/gpt-4o-vs-4o-mini).
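The YAML excerpt above is collapsed by the diff view. As a rough sketch of its overall shape — the prompt, test input, and assertion below are illustrative assumptions, not the linked example's actual contents:

```yaml
prompts:
  - 'Summarize in one sentence: {{text}}'
providers:
  - openai:gpt-4o
  - openai:gpt-4o-mini
tests:
  - vars:
      text: 'Promptfoo is an open-source tool for evaluating LLM apps.'
    assert:
      # Case-insensitive substring check on each model's output
      - type: icontains
        value: promptfoo
```

With a file like this saved as `promptfooconfig.yaml`, `npx promptfoo@latest eval` compares both models on every test case.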

A similar approach can be used to run other model comparisons. For example, you can:

2 changes: 1 addition & 1 deletion site/docs/guides/mixtral-vs-gpt.md
@@ -162,7 +162,7 @@ npx promptfoo@latest eval -o results.csv

The comparison will provide you with a side-by-side performance view of Mixtral, GPT-4o-mini, and GPT-4o based on your test cases. Use this data to make informed decisions about which LLM best suits your application.

-Contrast this with public benchmarks from the [Chatbot Arena](https://arena.lmsys.org/) leaderboard:
+Contrast this with public benchmarks from the [Chatbot Arena](https://lmarena.ai/) leaderboard:

| Model | Arena rating | MT-bench score |
| -------------------------- | ------------ | -------------- |
2 changes: 1 addition & 1 deletion site/docs/guides/qwen-benchmark.md
@@ -183,7 +183,7 @@ npx promptfoo@latest eval -o results.csv

The comparison will provide you with a side-by-side performance view of Qwen, GPT-4, and Llama based on your customer support chatbot test cases. Use this data to make informed decisions about which LLM best suits your application.

-Contrast this with public benchmarks from the [Chatbot Arena](https://chat.lmsys.org/?leaderboard) leaderboard:
+Contrast this with public benchmarks from the [Chatbot Arena](https://lmarena.ai/?leaderboard) leaderboard:

| Model | Arena rating |
| -------------------- | ------------ |
2 changes: 1 addition & 1 deletion site/docs/red-team/agents.md
@@ -42,7 +42,7 @@ redteam:
The RBAC plugin tests whether the agent respects predefined access control policies. The BOLA and BFLA plugins check if the agent can be tricked into accessing or modifying resources or functions beyond its intended scope.
-Promptfoo's red teaming capabilities include many other OWASP vulnerabilities. Learn more about them [here](https://promptfoo.com/docs/red-team/owasp-llm-top-10).
+Promptfoo's red teaming capabilities include many other OWASP vulnerabilities. Learn more about them [here](https://promptfoo.dev/docs/red-team/owasp-llm-top-10).
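For reference, a minimal sketch of how these three plugins slot into the `redteam` block shown in this hunk — the `purpose` string is a hypothetical placeholder, while the plugin IDs match the docs linked above:

```yaml
redteam:
  purpose: 'Internal support agent with access to customer-record APIs'
  plugins:
    - rbac # respects role-based access control policies
    - bola # broken object level authorization
    - bfla # broken function level authorization
```

Promptfoo then generates adversarial probes targeting each access-control boundary.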
## Context Poisoning and Data Exfiltration
6 changes: 3 additions & 3 deletions site/docs/red-team/llm-vulnerability-types.md
@@ -9,15 +9,15 @@ This page documents categories of potential LLM vulnerabilities and failure mode

Potential failures are bucketed as follows:

-- [Types of LLM vulnerabilities](#types-of-llm-vulnerabilities)
+- [Types of LLM vulnerabilities](#)
- [Privacy and Security](#privacy-and-security)
- [Technical Vulnerabilities](#technical-vulnerabilities)
- [Criminal Activities](#criminal-activities)
- [Harmful Content](#harmful-content)
- [Misinformation and Misuse](#misinformation-and-misuse)
- [Common LLM vulnerabilities by application type](#common-llm-vulnerabilities-by-application-type)

-Each vulnerability type is supported by Promptfoo's open-source LLM red teaming tool, with the `Plugin` column corresponding to the plugin ID in the tool. [Learn more](/docs/red-team/quickstart/).
+Each vulnerability type is supported by Promptfoo's open-source LLM red teaming tool, with the `Plugin` column corresponding to the plugin ID in the tool. [Learn more](/docs/red-team/quickstart).

## Privacy and Security

@@ -79,7 +79,7 @@ Each vulnerability type is supported by Promptfoo's open-source LLM red teaming
| Hallucination | Generation of false or misleading information that undermines the reliability and trustworthiness of the system. | hallucination |
| Overreliance | Model susceptibility to incorrect user input, potentially propagating errors or misinformation. | overreliance |

-# Common LLM vulnerabilities by application type
+## Common LLM vulnerabilities by application type

The table below shows common vulnerabilities across different LLM application types.
A 🚨 indicates that the vulnerability is typically applicable to that application type, while a ✅ means it's generally not a concern for that type of application.
6 changes: 0 additions & 6 deletions site/docs/red-team/plugins/prompt-extraction.md
@@ -70,10 +70,4 @@ Testing for prompt extraction vulnerabilities is critical for:

By incorporating the Prompt Extraction plugin in your LLM red teaming strategy, you can identify and address potential vulnerabilities in your AI system's ability to protect its core instructions and maintain its intended role.

-## Related Concepts
-
-- [Information Disclosure](../llm-vulnerability-types.md#privacy-and-security)
-- [Social Engineering](../llm-vulnerability-types.md#social-engineering)
-- [Model Inversion Attacks](../llm-vulnerability-types.md#model-inversion)

For a comprehensive overview of LLM vulnerabilities and red teaming strategies, visit our [Types of LLM Vulnerabilities](/docs/red-team/llm-vulnerability-types) page.
2 changes: 1 addition & 1 deletion site/docusaurus.config.js
@@ -26,7 +26,7 @@ const config = {

onBrokenLinks: 'throw',
onBrokenMarkdownLinks: 'throw',
-
+onBrokenAnchors: 'throw',
// Even if you don't use internalization, you can use this field to set useful
// metadata like html lang. For example, if your site is Chinese, you may want
// to replace "en" with "zh-Hans".
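For context on why this commit rewrites anchors across the docs: with Docusaurus's `onBrokenAnchors` set to `'throw'`, the build fails (rather than warns) whenever a markdown link targets an anchor that no heading on the target page generates — for example, a link like the one removed from `prompt-extraction.md` above:

```markdown
<!-- Breaks the build under onBrokenAnchors: 'throw' if the target page has
     no heading that generates a #model-inversion anchor. -->
[Model Inversion Attacks](../llm-vulnerability-types.md#model-inversion)
```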