Decoding Latent Attack Surfaces in LLMs: Prompt Injection via HTML in Web Summarization

Overview

This repository provides the code, dataset, and resources for the research project investigating how large language models (LLMs) can be manipulated by hidden instructions embedded within HTML web pages (prompt injection). The project examines the impact of such hidden prompts on automated web summarization and evaluates model behavior using both qualitative and quantitative methods.

What’s Inside

280 HTML webpages: 140 clean (normal), 140 with various HTML-based prompt injection attacks. (Not all HTML pages in HTML folder were used during evaluation, only pages with hidden tags were kept)
Python scripts: For generating pages, extracting content, summarizing via LLMs, and evaluating summary changes.
Data and results: Includes sample LLM summaries with and without attacks, evaluation metrics, and metadata file.

Directory Structure

clean/
images/
injected/
evaluation.py
file_generation.py
gemma.csv
llama.csv
metadata.csv

Project Workflow

Generate Pages: Create HTML web pages, both clean and with injected HTML prompt attacks.
Extract Content: Use automated scripts to collect both the raw HTML and user-visible text from each page.
Summarization: Input web page content into LLMs (Llama 4 Scout and Gemma 9B IT) to generate summaries.
Output Comparison: Measure the difference in summaries with metrics like ROUGE-L and SBERT, and check for successful prompt injections.
Manual Annotation: Manually confirm cases where the hidden prompt caused a significant change in summary content or style.

Key Findings

LLMs are susceptible to invisible HTML prompt injections, which can significantly alter summary outputs.
Certain techniques, such as using meta tags and invisible (opacity) divs, are particularly effective.
Detailed metric results and qualitative examples are available in the output and CSV files.

License

This work is licensed under the [CC0 1.0 Universal (CC0 1.0) Public Domain Dedication]

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Decoding Latent Attack Surfaces in LLMs: Prompt Injection via HTML in Web Summarization

Overview

What’s Inside

Directory Structure

Project Workflow

Key Findings

License

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
clean		clean
images		images
injected		injected
LICENSE		LICENSE
README.md		README.md
evaluation.py		evaluation.py
file_generation.py		file_generation.py
gemma.csv		gemma.csv
llama.csv		llama.csv
metadata.csv		metadata.csv

License

ishaanv1206/Decoding-Latent-Attack-Surfaces-in-LLMs-Prompt-Injection-via-HTML-in-Web-Summarization

Folders and files

Latest commit

History

Repository files navigation

Decoding Latent Attack Surfaces in LLMs: Prompt Injection via HTML in Web Summarization

Overview

What’s Inside

Directory Structure

Project Workflow

Key Findings

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages