Aditi31kapil/systematic_llm

🔐 Systematic Evaluation of LLM-Generated Passwords

This repository accompanies the research project titled:

"A Systematic Evaluation of LLM-Generated Passwords with Respect to Real-World Website Password Policies and Strength Assessment Metrics"

📄 Overview

The goal of this project is to evaluate whether passwords generated by large language models (LLMs), such as ChatGPT, can be used securely across real-world websites. We developed a rule-based validation engine, incorporated established strength metrics (zxcvbn and PasswordStats), and analyzed a dataset of 100 LLM-generated passwords.

📁 Repository Contents

| File | Description |
| --- | --- |
| strength.py | Python script that classifies each password as weak, fair, or strong based on website-specific rules. It also calculates strength scores using zxcvbn and PasswordStats. |
| combined_password_strength.csv | Output file containing the LLM-generated passwords classified as fair or strong (weak ones are excluded), with per-website classification, zxcvbn scores, and PasswordStats metrics. |
| passwords_by_prompts.csv | The original 100 passwords generated from 5 prompts (20 each), including the websites each password is compatible with. |

🛠️ Methodology Summary

  1. Industry Selection
    Identified 10 industries that require strong passwords (e.g., banking, healthcare, cloud, government).

  2. Website Selection & Rule Extraction
    Chose one representative website per industry. For each, extracted password requirements:

| Website | Min length | Max length | Required elements | Restrictions | Reference |
| --- | --- | --- | --- | --- | --- |
| HDFC | 6 | 15 | Uppercase, lowercase, digit | No spaces or special symbols | HDFC Help |
| Gmail | 12 | 100 | Uppercase, lowercase, digit, symbol | No reused passwords, high entropy, spaces allowed | |
| Facebook | 8 | 20 | Uppercase, lowercase, digit, symbol | Avoid personal info, spaces allowed | |
| GitHub | 8 | 128 | No strict composition | 2FA recommended, spaces allowed | GitHub Docs |
| Amazon AWS | 6 | 16 | Uppercase, lowercase, digit, symbol | No spaces allowed | |
| Practo | 10 | Not specified | Uppercase, lowercase, digit, symbol | No dictionary words, no spaces allowed | |
| UIDAI | 8 | 8 | Uppercase, digit | No personal info, rotated every 90 days, no spaces allowed | |
| Coursera | 8 | Not specified | Uppercase, lowercase, digit, symbol | Cannot repeat last 5 passwords, spaces allowed | |
| Google Drive | 8 | 100 | Letters, numbers, symbols (ASCII) | No accented characters, spaces allowed | |
| Coinbase | 12 | Not specified | All four: uppercase, lowercase, digit, symbol | Avoid common substrings, enforced 2FA, no spaces allowed | |
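The rule extraction above can be sketched as a small validation function. This is a minimal sketch, not the logic in strength.py: the Policy dataclass, the char_classes and violations helpers, and the field names are hypothetical, and only the length, character-class, space, and symbol rules of four table rows are encoded. Restrictions such as reuse history, dictionary words, or personal information cannot be checked from the password string alone and are left out.

```python
import string
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


# Hypothetical encoding of one row of the policy table (not taken from strength.py).
@dataclass(frozen=True)
class Policy:
    min_len: int
    max_len: Optional[int]      # None when the table lists no maximum
    required: FrozenSet[str]    # subset of {"upper", "lower", "digit", "symbol"}
    allow_spaces: bool
    allow_symbols: bool = True


def char_classes(password: str) -> set:
    """Return the ASCII character classes present in the password."""
    classes = set()
    for ch in password:
        if ch in string.ascii_uppercase:
            classes.add("upper")
        elif ch in string.ascii_lowercase:
            classes.add("lower")
        elif ch in string.digits:
            classes.add("digit")
        elif ch in string.punctuation:
            classes.add("symbol")
    return classes


def violations(password: str, policy: Policy) -> List[str]:
    """List every rule the password breaks; an empty list means it is accepted."""
    problems = []
    if len(password) < policy.min_len:
        problems.append(f"shorter than {policy.min_len}")
    if policy.max_len is not None and len(password) > policy.max_len:
        problems.append(f"longer than {policy.max_len}")
    present = char_classes(password)
    for cls in sorted(policy.required - present):
        problems.append(f"missing {cls}")
    if " " in password and not policy.allow_spaces:
        problems.append("contains a space")
    if "symbol" in present and not policy.allow_symbols:
        problems.append("contains a symbol")
    return problems


# Four rows of the table, encoded by hand (illustrative subset).
POLICIES: Dict[str, Policy] = {
    "HDFC": Policy(6, 15, frozenset({"upper", "lower", "digit"}), allow_spaces=False, allow_symbols=False),
    "GitHub": Policy(8, 128, frozenset(), allow_spaces=True),
    "UIDAI": Policy(8, 8, frozenset({"upper", "digit"}), allow_spaces=False),
    "Coinbase": Policy(12, None, frozenset({"upper", "lower", "digit", "symbol"}), allow_spaces=False),
}

if __name__ == "__main__":
    pw = "aewBDr2mNiRs"  # sample password from combined_password_strength.csv
    for site, policy in POLICIES.items():
        print(site, violations(pw, policy) or "accepted")
```

For the sample password aewBDr2mNiRs, this sketch accepts it for HDFC and GitHub and rejects it for UIDAI (longer than 8) and Coinbase (missing symbol). Returning a list of reasons rather than a boolean makes it easy to turn failures into improvement suggestions.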
  3. Password Generation
    Generated 100 passwords using ChatGPT with 5 carefully designed prompts, each targeting a different combination of policy constraints.

    Varied Password Generation Prompts (Text Form)

🔐 Prompt 1: No Symbols, Upper+Lower+Digit

  • Length Range: 8 to 12 characters
  • Required Elements: Uppercase letters, lowercase letters, digits
  • No symbols included
  • Target Sites: HDFC, UIDAI, GitHub (and others that don’t require symbols)

🔐 Prompt 2: All Types, ASCII only

  • Length Range: 12 to 16 characters
  • Required Elements: Uppercase letters, lowercase letters, digits, symbols
  • Only ASCII characters allowed
  • Target Sites: Google Drive, Gmail, Coinbase, Coursera, AWS, etc.

🔐 Prompt 3: UIDAI Style Strict

  • Length: Exactly 8 characters
  • Required Elements: Uppercase letters and digits
  • No lowercase or symbols
  • Target Sites: Specifically UIDAI, may also pass HDFC

🔐 Prompt 4: Strong with Symbols

  • Length Range: 14 to 18 characters
  • Required Elements: Uppercase letters, lowercase letters, digits, symbols
  • High-entropy and complex
  • Target Sites: Coinbase, Gmail, Facebook, Practo, AWS, Coursera, etc.

🔐 Prompt 5: GitHub Compatible (Relaxed)

  • Length Range: 10 to 20 characters
  • Required Elements: None strictly enforced
  • Highly flexible, good entropy range
  • Target Sites: GitHub, Google Drive, Facebook, and others with relaxed requirements
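One way to check that a generated password actually satisfies the prompt it came from is to encode each prompt's constraints as a regular expression. The patterns below are a hypothetical sketch based on the prompt descriptions above, not part of the repository; treating "symbol" as any printable ASCII character that is not a letter or digit, restricting Prompt 4 to ASCII, and allowing spaces only for Prompt 5 are assumptions.

```python
import re

# Hypothetical regex encodings of the five prompt specifications.
# [!-~] is printable ASCII without the space; [ -~] also allows the space.
PROMPT_PATTERNS = {
    # Prompt 1: 8-12 chars, upper + lower + digit, letters and digits only
    1: re.compile(r"(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])[A-Za-z0-9]{8,12}"),
    # Prompt 2: 12-16 printable ASCII chars containing all four classes
    2: re.compile(r"(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[^A-Za-z0-9])[!-~]{12,16}"),
    # Prompt 3: exactly 8 chars, uppercase letters and digits only (both present)
    3: re.compile(r"(?=.*[A-Z])(?=.*[0-9])[A-Z0-9]{8}"),
    # Prompt 4: 14-18 chars containing all four classes (ASCII assumed)
    4: re.compile(r"(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[^A-Za-z0-9])[!-~]{14,18}"),
    # Prompt 5: 10-20 chars, no composition requirement (spaces assumed allowed)
    5: re.compile(r"[ -~]{10,20}"),
}


def matches_prompt(password: str, prompt: int) -> bool:
    """True if the whole password satisfies the given prompt's constraints."""
    return PROMPT_PATTERNS[prompt].fullmatch(password) is not None
```

The lookaheads check that each required class appears somewhere, while fullmatch forces the character class and length bounds to cover the entire string.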
  4. Password Classification
    Evaluated each password against all 10 websites using:

    • Site-specific rule engine (from strength.py)
    • zxcvbn score (0-4) and estimated crack time
    • PasswordStats evaluation
  5. Filtering
    Only fair and strong passwords were retained for analysis. Weak passwords (those that violate a website's policy) were excluded because they cannot be used in practice.
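The filtering step amounts to dropping every row whose per-website label is weak. A minimal standard-library sketch, assuming the Password, Website, and Strength column names shown in the sample output; the keep_usable and sites_by_password helpers, the input rows, and the ExamplePass1 password are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List

USABLE = {"fair", "strong"}  # weak = violates the site's policy, so it is dropped


def keep_usable(rows: Iterable[dict]) -> List[dict]:
    """Keep only rows labelled fair or strong for their website."""
    return [row for row in rows if row["Strength"].strip().lower() in USABLE]


def sites_by_password(rows: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each retained password to the websites where it is usable."""
    sites = defaultdict(list)
    for row in keep_usable(rows):
        sites[row["Password"]].append(row["Website"])
    return dict(sites)


# Hypothetical per-website results before filtering.
RAW = """Password,Website,Strength
ExamplePass1,HDFC,strong
ExamplePass1,Coinbase,weak
ExamplePass1,GitHub,fair
"""

if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(RAW)))
    print(sites_by_password(rows))
```

Here the Coinbase row is dropped, so ExamplePass1 maps to HDFC and GitHub only.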

📊 Sample Output (from combined_password_strength.csv)

| Password | Website | Strength | Improvement Suggestions | zxcvbn score | zxcvbn meaning | Crack time | zxcvbn time | Length | Password strength | Passwordstat time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| aewBDr2mNiRs | HDFC | strong | Password is already strong | 4 | Very Strong | 3 years | 0.0042639 | 12 | 0.541675336 | 6.70E-06 |
| aewBDr2mNiRs | GitHub | strong | Password is already strong | 4 | Very Strong | 3 years | 0.0028801 | 12 | 0.541675336 | 3.90E-06 |

Results and Analysis

Password Strength Distribution

[Figure: Password Strength Boxplot]

Boxplot showing the distribution of password strength scores:

  • The line inside each box marks the median strength
  • The box spans the interquartile range (the middle 50% of the data)
  • Outliers mark passwords with exceptionally low or high strength scores
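The quantities a boxplot draws can also be computed directly. A minimal sketch using Python's statistics module, assuming the common 1.5 × IQR outlier rule (matplotlib's default) and linear interpolation between data points (method="inclusive", which matches NumPy's default percentile); the plotting tool the project actually used is not stated, and the box_stats name and the input scores are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def box_stats(values: Sequence[float]) -> Dict[str, object]:
    """Median, quartiles, IQR and 1.5*IQR outliers, as a standard boxplot shows them."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # outlier fences
    outliers: List[float] = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


if __name__ == "__main__":
    # Hypothetical strength scores, one per password.
    scores = [1, 2, 3, 4, 5, 6, 20]
    print(box_stats(scores))
```

For these scores the box runs from 2.5 to 5.5 with a median of 4.0, and 20 falls outside the upper fence of 10.0, so it would be drawn as an outlier point.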

Password Strength Analysis

[Figure: Expected vs Actual Password Strength]

This visualization compares the expected password strength (based on the website guidelines) with the actual strength of the LLM-generated passwords. The graph shows:

  • X-axis: Password complexity categories
  • Y-axis: Frequency or strength score
  • Key findings: [Briefly describe any patterns or gaps]

zxcvbn Metric Comparison

[Figure: zxcvbn vs Expected Strength]

Comparison between our password strength expectations and zxcvbn's scoring:

  • How zxcvbn evaluates passwords differently
  • Areas where our expectations align with or diverge from zxcvbn's scores
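Part of the divergence comes from how zxcvbn scores passwords: instead of counting character classes, it estimates how many guesses an attacker would need (from dictionary words, keyboard patterns, repeats, sequences, dates, and similar matches) and buckets that estimate into a 0-4 score. The sketch below reproduces only the bucketing step, using zxcvbn's documented boundaries of 10^3, 10^6, 10^8, and 10^10 guesses plus the small margin of 5 its reference implementation adds; it does not estimate guesses itself, and the function names here are hypothetical.

```python
# Score boundaries documented by zxcvbn: fewer than 10^3 guesses scores 0,
# fewer than 10^6 scores 1, fewer than 10^8 scores 2, fewer than 10^10 scores 3.
THRESHOLDS = (1e3, 1e6, 1e8, 1e10)
DELTA = 5  # small margin added to each boundary in zxcvbn's guesses_to_score


def guesses_to_score(guesses: float) -> int:
    """Bucket a guess estimate into zxcvbn's 0-4 score."""
    for score, bound in enumerate(THRESHOLDS):
        if guesses < bound + DELTA:
            return score
    return 4


def offline_slow_hash_seconds(guesses: float) -> float:
    """Crack time under zxcvbn's offline slow-hashing scenario (10^4 guesses/s)."""
    return guesses / 1e4


if __name__ == "__main__":
    for g in (500, 2e6, 5e9, 1e12):
        print(f"{g:.0e} guesses -> score {guesses_to_score(g)}, "
              f"{offline_slow_hash_seconds(g):.0f} s offline")
```

With the real library, zxcvbn(password) from the zxcvbn Python package returns a dict whose score, guesses, and crack_times_display entries already hold these values. Because the score depends only on the guess estimate, a password that meets every composition rule can still score low if it is built from a predictable pattern.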

System Architecture

[Figure: Project Diagram]

Overview of the system architecture:

  • Data flow between components
  • Technologies used in each part

📌 Key Takeaways

  • The rule-based classification engine effectively identifies password validity per website policy.
  • A significant proportion of LLM-generated passwords are both usable and secure, demonstrating strong compatibility with real-world requirements.
  • Weak passwords were excluded to focus the study on realistic and acceptable password candidates.

📦 Requirements

Install required packages via pip:

pip install zxcvbn PasswordStats pandas tqdm

About

This research evaluates 100 ChatGPT-generated passwords against the password policies of 10 real-world websites from industries such as banking and healthcare. Using rule-based validation and strength metrics (zxcvbn and PasswordStats), it analyzes only the passwords classified as "fair" or "strong".
