🔐 Systematic Evaluation of LLM-Generated Passwords

This repository accompanies the research project titled:

"A Systematic Evaluation of LLM-Generated Passwords with Respect to Real-World Website Password Policies and Strength Assessment Metrics"

📄 Overview

The goal of this project is to evaluate whether passwords generated by large language models (LLMs), such as ChatGPT, can be used securely across real-world websites. We developed a rules-based validation engine, incorporated industry-standard strength metrics, and analyzed a dataset of 100 LLM-generated passwords.

📁 Repository Contents

File	Description
`strength.py`	Python script that classifies each password as weak, fair, or strong based on website-specific rules. Also calculates strength scores using zxcvbn and PasswordStats.
`combined_password_strength.csv`	Output file containing all LLM-generated passwords (only fair and strong) along with per-website classification, zxcvbn scores, and PasswordStats metrics.
`passwords_by_prompt.csv`	Original 100 passwords generated from 5 prompts (20 each), including the websites each password is compatible with.

🛠️ Methodology Summary

Industry Selection
Identified 10 industries that require strong passwords (e.g., banking, healthcare, cloud, government).
Website Selection & Rule Extraction
Chose one representative website per industry. For each, extracted password requirements:

Website	Min Length	Max Length	Required Elements	Restrictions	References
HDFC	6	15	Uppercase, lowercase, digit	No spaces or special symbols	HDFC Help
Gmail	12	100	Uppercase, lowercase, digit, symbol	No reused passwords, high entropy, spaces allowed
Facebook	8	20	Uppercase, lowercase, digit, symbols	Avoid personal info, spaces allowed
GitHub	8	128	No strict composition	2FA recommended, spaces allowed	GitHub Docs
Amazon AWS	6	16	Uppercase, lowercase, digit, symbol	No spaces allowed
Practo	10	∞	Uppercase, lowercase, digit, symbol	No dictionary words, no spaces allowed
UIDAI	8	8	Uppercase, digit	No personal info, rotated every 90 days, no spaces allowed
Coursera	8	∞	Uppercase, lowercase, digit, symbol	Cannot repeat last 5 passwords, spaces allowed
Google Drive	8	100	Letters, numbers, symbols (ASCII)	No accented characters, spaces allowed
Coinbase	12	∞	All four: uppercase, lowercase, digit, symbol	Avoid common substrings, enforced 2FA, no spaces allowed
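The table above can be encoded as data and checked mechanically. The following is a minimal sketch of that idea, not the actual contents of `strength.py`: the `Policy` class, the `char_classes` and `violations` helpers, and the choice to encode only three rows are all assumptions made for illustration. Rules that cannot be checked from the password alone (reuse, rotation, personal info, 2FA) are left out.

```python
import math
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """One row of the rules table (hypothetical structure)."""
    min_len: int
    max_len: float                 # math.inf stands in for "no maximum"
    required: frozenset            # subset of {"upper", "lower", "digit", "symbol"}
    allow_spaces: bool
    allow_symbols: bool = True


# Three rows transcribed from the table; the rest would follow the same shape.
POLICIES = {
    "HDFC": Policy(6, 15, frozenset({"upper", "lower", "digit"}), False, False),
    "GitHub": Policy(8, 128, frozenset(), True),
    "Coinbase": Policy(12, math.inf,
                       frozenset({"upper", "lower", "digit", "symbol"}), False),
}


def char_classes(password):
    """Return the set of ASCII character classes present in the password."""
    classes = set()
    for ch in password:
        if ch in string.ascii_uppercase:
            classes.add("upper")
        elif ch in string.ascii_lowercase:
            classes.add("lower")
        elif ch in string.digits:
            classes.add("digit")
        elif ch in string.punctuation:
            classes.add("symbol")
    return classes


def violations(password, policy):
    """List the policy rules the password breaks (empty list means it passes)."""
    problems = []
    if not policy.min_len <= len(password) <= policy.max_len:
        problems.append("length")
    present = char_classes(password)
    missing = policy.required - present
    if missing:
        problems.append("missing: " + ", ".join(sorted(missing)))
    if " " in password and not policy.allow_spaces:
        problems.append("contains a space")
    if "symbol" in present and not policy.allow_symbols:
        problems.append("contains a symbol")
    return problems


for site, policy in POLICIES.items():
    print(site, violations("aewBDr2mNiRs", policy))
```

Keeping the policies as plain data means adding a website is a one-line change, and the checking logic stays the same for every site.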

Password Generation
Generated 100 passwords with ChatGPT using 5 prompts (20 passwords each), designed so that together they cover the policy constraints listed above.
Varied Password Generation Prompts (Text Form)

🔐 Prompt 1: No Symbols, Upper+Lower+Digit

Length Range: 8 to 12 characters
Required Elements: Uppercase letters, lowercase letters, digits
No symbols included
Target Sites: HDFC, UIDAI, GitHub (and others that don’t require symbols)

🔐 Prompt 2: All Types, ASCII only

Length Range: 12 to 16 characters
Required Elements: Uppercase letters, lowercase letters, digits, symbols
Only ASCII characters allowed
Target Sites: Google Drive, Gmail, Coinbase, Coursera, AWS, etc.

🔐 Prompt 3: UIDAI Style Strict

Length: Exactly 8 characters
Required Elements: Uppercase letters and digits
No lowercase or symbols
Target Sites: Specifically UIDAI, may also pass HDFC

🔐 Prompt 4: Strong with Symbols

Length Range: 14 to 18 characters
Required Elements: Uppercase letters, lowercase letters, digits, symbols
High-entropy and complex
Target Sites: Coinbase, Gmail, Facebook, Practo, AWS, Coursera, etc.

🔐 Prompt 5: GitHub Compatible (Relaxed)

Length Range: 10 to 20 characters
Required Elements: None strictly enforced
Highly flexible, good entropy range
Target Sites: GitHub, Google Drive, Facebook, and others with relaxed requirements

Password Classification
Evaluated each password across all 10 websites using:
- Site-specific rule engine (from strength.py)
- zxcvbn score (0 to 4) and estimated crack time
- PasswordStats evaluation
Filtering
Only fair and strong passwords were retained for analysis. Weak passwords, meaning those that violate a website's policy, were excluded because they are unusable in practice.
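The filtering step can be sketched with the standard `csv` module. This is an illustration rather than the project's code: the `keep_usable` helper is hypothetical, only the `Password`, `Website`, and `Strength` columns from the output file are used, and the Coinbase row is an assumed example of a weak result (the password has no symbol, which Coinbase requires).

```python
import csv
import io

# Small in-memory stand-in for the per-website classification output.
# The Coinbase row is illustrative, not copied from the real CSV.
raw = """Password,Website,Strength
aewBDr2mNiRs,HDFC,strong
aewBDr2mNiRs,GitHub,strong
aewBDr2mNiRs,Coinbase,weak
"""

USABLE = {"fair", "strong"}


def keep_usable(rows):
    """Keep only rows whose Strength label is fair or strong."""
    return [row for row in rows if row["Strength"].strip().lower() in USABLE]


rows = list(csv.DictReader(io.StringIO(raw)))
kept = keep_usable(rows)

for row in kept:
    print(row["Password"], row["Website"], row["Strength"])
```

Filtering per (password, website) pair rather than per password keeps a password in the dataset for the sites that accept it, even when other sites reject it.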

📊 Sample Output (from `combined_password_strength.csv`)

Password	Website	Strength	Improvement Suggestions	zxcvbn score	zxcvbn meaning	Crack time	zxcvbn time	Length	Password strength	Passwordstat time
aewBDr2mNiRs	HDFC	strong	Password is already strong	4	Very Strong	3 years	0.0042639	12	0.541675336	6.70E-06
aewBDr2mNiRs	GitHub	strong	Password is already strong	4	Very Strong	3 years	0.0028801	12	0.541675336	3.90E-06

Results and Analysis

Password Strength Distribution

Boxplot showing the distribution of password strengths:

Median strength indicated by the line
Interquartile range showing middle 50% of data
Outliers representing exceptionally weak/strong password strengths
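The quantities a boxplot displays can be computed directly. The sketch below assumes the common 1.5 × IQR convention for flagging outliers, which the plot may or may not use, and runs on made-up scores rather than values from `combined_password_strength.csv`; `boxplot_summary` is a hypothetical helper name.

```python
import statistics

# Illustrative strength scores on a 0-to-1 scale; these are made-up values,
# not taken from the project's output file.
scores = [0.50, 0.52, 0.54, 0.55, 0.56, 0.58, 0.60, 0.95]


def boxplot_summary(values):
    """Return quartiles and the points outside the 1.5 * IQR fences."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


summary = boxplot_summary(scores)
print(summary)
```

With these sample values, only 0.95 falls outside the fences, so it is the single point that would be drawn as an outlier.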

Password Strength Analysis

This visualization compares the expected password strength (based on guidelines) with the strength of the LLM-generated passwords. The graph shows:

X-axis: Password complexity categories
Y-axis: Frequency or strength score
Key findings: [Briefly describe any patterns or gaps]

zxcvbn Metric Comparison

Comparison between our password strength expectations and the zxcvbn algorithm's scoring:

How zxcvbn evaluates passwords differently
Areas where our expectations align with or diverge from algorithmic scoring

System Architecture

Overview of the system architecture:

Data flow between components
Technologies used in each part

📌 Key Takeaways

The rule-based classification engine effectively identifies password validity per website policy.
A significant proportion of LLM-generated passwords are both usable and secure, demonstrating strong compatibility with real-world requirements.
Weak passwords were excluded to focus the study on realistic and acceptable password candidates.

📦 Requirements

Install required packages via pip (the `PasswordStats` class is provided by the `password-strength` package):

pip install zxcvbn password-strength pandas tqdm

