This repository accompanies the research project titled:
"A Systematic Evaluation of LLM-Generated Passwords with Respect to Real-World Website Password Policies and Strength Assessment Metrics"
The goal of this project is to evaluate whether passwords generated by large language models (LLMs), such as ChatGPT, can be used securely across real-world websites. We developed a rules-based validation engine, incorporated industry-standard strength metrics, and analyzed a dataset of 100 LLM-generated passwords.
| File | Description |
|---|---|
| `strength.py` | Python script that classifies each password as weak, fair, or strong based on website-specific rules. Also calculates strength scores using zxcvbn and PasswordStats. |
| `combined_password_strength.csv` | Output file containing the LLM-generated passwords classified as fair or strong, along with per-website classification, zxcvbn scores, and PasswordStats metrics. |
| `passwords_by_prompts.csv` | Original 100 passwords generated from 5 prompts (20 each), including the websites each password is compatible with. |
- Industry Selection
  Identified 10 industries that require strong passwords (e.g., banking, healthcare, cloud, government).
- Website Selection & Rule Extraction
  Chose one representative website per industry. For each, extracted password requirements:
| Website | Min Length | Max Length | Required Elements | Restrictions | References |
|---|---|---|---|---|---|
| HDFC | 6 | 15 | Uppercase, lowercase, digit | No spaces or special symbols | HDFC Help |
| Gmail | 12 | 100 | Uppercase, lowercase, digit, symbol | No reused passwords, high entropy, spaces allowed | |
| Facebook | 8 | 20 | Uppercase, lowercase, digit, symbols | Avoid personal info, spaces allowed | |
| GitHub | 8 | 128 | No strict composition | 2FA recommended, spaces allowed | GitHub Docs |
| Amazon AWS | 6 | 16 | Uppercase, lowercase, digit, symbol | No spaces allowed | |
| Practo | 10 | ∞ | Uppercase, lowercase, digit, symbol | No dictionary words, no spaces allowed | |
| UIDAI | 8 | 8 | Uppercase, digit | No personal info, rotated every 90 days, no spaces allowed | |
| Coursera | 8 | ∞ | Uppercase, lowercase, digit, symbol | Cannot repeat last 5 passwords, spaces allowed | |
| Google Drive | 8 | 100 | Letters, numbers, symbols (ASCII) | No accented characters, spaces allowed | |
| Coinbase | 12 | ∞ | All four: uppercase, lowercase, digit, symbol | Avoid common substrings, enforced 2FA, no spaces allowed | |
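The policy table above can be expressed as data and checked mechanically. The following is a minimal sketch of that idea, not the code in `strength.py`: the `Policy` class, its field names, and the `violations` helper are hypothetical, and only three rows of the table are transcribed. It assumes "symbol" means any ASCII punctuation character.

```python
import string
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    """One row of the policy table (illustrative field names)."""
    min_len: int
    max_len: Optional[int]        # None stands for "no upper limit"
    required: frozenset           # subset of {"upper", "lower", "digit", "symbol"}
    allow_spaces: bool = True
    allow_symbols: bool = True


# Three rows transcribed from the table; the remaining sites follow the same shape.
POLICIES = {
    "HDFC": Policy(6, 15, frozenset({"upper", "lower", "digit"}),
                   allow_spaces=False, allow_symbols=False),
    "Gmail": Policy(12, 100, frozenset({"upper", "lower", "digit", "symbol"})),
    "GitHub": Policy(8, 128, frozenset()),
}


def char_classes(password: str) -> set:
    """Return the character classes that occur in the password."""
    classes = set()
    for ch in password:
        if ch.isupper():
            classes.add("upper")
        elif ch.islower():
            classes.add("lower")
        elif ch.isdigit():
            classes.add("digit")
        elif ch in string.punctuation:
            classes.add("symbol")
    return classes


def violations(password: str, policy: Policy) -> list:
    """List every rule the password breaks; an empty list means it is accepted."""
    problems = []
    classes = char_classes(password)
    if len(password) < policy.min_len:
        problems.append("too short")
    if policy.max_len is not None and len(password) > policy.max_len:
        problems.append("too long")
    for missing in sorted(policy.required - classes):
        problems.append(f"missing {missing}")
    if not policy.allow_spaces and " " in password:
        problems.append("contains space")
    if not policy.allow_symbols and "symbol" in classes:
        problems.append("contains symbol")
    return problems


if __name__ == "__main__":
    for site, policy in POLICIES.items():
        print(site, violations("aewBDr2mNiRs", policy))
```

Returning the list of broken rules, rather than a single boolean, makes it straightforward to derive improvement suggestions per website.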
- Password Generation
  Generated 100 passwords using ChatGPT with 5 prompts (20 passwords each), each designed to satisfy a different set of policy constraints:
  - Prompt 1
    - Length Range: 8 to 12 characters
    - Required Elements: Uppercase letters, lowercase letters, digits
    - No symbols included
    - Target Sites: HDFC, UIDAI, GitHub (and others that don’t require symbols)
  - Prompt 2
    - Length Range: 12 to 16 characters
    - Required Elements: Uppercase letters, lowercase letters, digits, symbols
    - Only ASCII characters allowed
    - Target Sites: Google Drive, Gmail, Coinbase, Coursera, AWS, etc.
  - Prompt 3
    - Length: Exactly 8 characters
    - Required Elements: Uppercase letters and digits
    - No lowercase or symbols
    - Target Sites: Specifically UIDAI, may also pass HDFC
  - Prompt 4
    - Length Range: 14 to 18 characters
    - Required Elements: Uppercase letters, lowercase letters, digits, symbols
    - High-entropy and complex
    - Target Sites: Coinbase, Gmail, Facebook, Practo, AWS, Coursera, etc.
  - Prompt 5
    - Length Range: 10 to 20 characters
    - Required Elements: None strictly enforced
    - Highly flexible, good entropy range
    - Target Sites: GitHub, Google Drive, Facebook, and others with relaxed requirements
- Password Classification
  Evaluated each password across all 10 websites using:
  - Site-specific rule engine (from `strength.py`)
  - zxcvbn entropy score
  - PasswordStats evaluation
- Filtering
  Only fair and strong passwords were retained for analysis. Weak passwords (those that violate website policies) were excluded as they are unusable in practice.
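The filtering step can be sketched with the standard library alone. This is an illustrative example, not the repository's code: the `keep_usable` helper is hypothetical, the column names follow the sample output table, and the third sample row (a `weak` result for Gmail) is made up to show a row being dropped.

```python
import csv
import io

# Labels retained for analysis; "weak" rows are discarded.
KEEP = {"fair", "strong"}


def keep_usable(rows):
    """Return only the rows whose Strength label is fair or strong."""
    return [row for row in rows if row["Strength"].strip().lower() in KEEP]


# Hypothetical per-website classification output (Password, Website, Strength).
SAMPLE = """Password,Website,Strength
aewBDr2mNiRs,HDFC,strong
aewBDr2mNiRs,GitHub,strong
aewBDr2mNiRs,Gmail,weak
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
usable = keep_usable(rows)

if __name__ == "__main__":
    for row in usable:
        print(row["Password"], row["Website"], row["Strength"])
```

The same filter could be written with pandas, which the project already installs; the `csv` version is used here only to keep the sketch dependency-free.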
| Password | Website | Strength | Improvement Suggestions | zxcvbn score | zxcvbn meaning | Crack time | zxcvbn time | Length | Password strength | Passwordstat time |
|---|---|---|---|---|---|---|---|---|---|---|
| aewBDr2mNiRs | HDFC | strong | Password is already strong | 4 | Very Strong | 3 years | 0.0042639 | 12 | 0.541675336 | 6.70E-06 |
| aewBDr2mNiRs | GitHub | strong | Password is already strong | 4 | Very Strong | 3 years | 0.0028801 | 12 | 0.541675336 | 3.90E-06 |
Boxplot showing the distribution of password strengths:
- Median strength indicated by the line
- Interquartile range showing middle 50% of data
- Outliers representing exceptionally weak/strong password strengths
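The quantities a boxplot displays can be computed directly. The sketch below is an assumption about how such a summary might be derived, not the plotting code used for the figure: `boxplot_summary` is a hypothetical helper, the 1.5 × IQR outlier rule is the common Tukey convention, and the scores are made-up values on the same 0 to 1 scale as the PasswordStats strength column.

```python
import statistics


def boxplot_summary(values):
    """Compute median, quartiles, IQR, and Tukey-style outliers for a sample."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"median": median, "q1": q1, "q3": q3, "iqr": iqr, "outliers": outliers}


# Made-up strength scores for illustration only.
scores = [0.10, 0.50, 0.52, 0.54, 0.55, 0.58, 0.60, 0.95]
summary = boxplot_summary(scores)

if __name__ == "__main__":
    print(summary)
```

Note that plotting libraries may use a different quartile interpolation method than `statistics.quantiles`, so the box edges in a rendered figure can differ slightly from these values.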
This visualization compares the expected password strength (based on guidelines) with the strength of the actual LLM-generated passwords. The graph shows:
- X-axis: Password complexity categories
- Y-axis: Frequency or strength score
Comparison between our expected password strength and the score assigned by zxcvbn:
- How ZXCVBN evaluates passwords differently
- Areas where our expectations align/diverge from algorithmic scoring
Overview of the system architecture:
- Data flow between components
- Technologies used in each part
- The rule-based classification engine effectively identifies password validity per website policy.
- A significant proportion of LLM-generated passwords are both usable and secure, demonstrating strong compatibility with real-world requirements.
- Weak passwords were excluded to focus the study on realistic and acceptable password candidates.
Install required packages via pip:

    pip install zxcvbn PasswordStats pandas tqdm


