Go Big with GitHub Copilot demo project. An inherited application with sensitive data in logs. The log data is not connected to any real people and does not contain any real information. Any resemblance to real people, companies, products, or events is purely coincidental.


Go Big with GitHub Copilot – Log Security Demo

Welcome to the Go Big with GitHub Copilot webinar demonstration.

This repository showcases how GitHub Copilot can support real engineering workflows while maintaining quality, security, and governance standards.

⚠️ Important: This is not a production system. It contains synthetic data designed for educational purposes only.

Quick Start

  1. Open the dashboard: Open index.html in your browser to visualize log security risks
  2. Explore the logs: Review files in the logs/ directory to understand the security challenges
  3. Follow the demo: Use GitHub Copilot to build sanitization solutions during the webinar

Scenario Overview

The Situation: You've inherited an application that has been running successfully in production for years. During a routine review, you discover that its logs contain sensitive information—email addresses, authentication details, API keys, and other security-relevant data. There's no immediate outage, but there is clear compliance and governance risk.

The Challenge: How do you move from discovery to action efficiently and systematically?

The Approach: In this demo, we'll walk through the complete workflow:

  1. First, gaining visibility into the problem with a simple dashboard that reveals the scope and severity of sensitive data exposure
  2. Then using GitHub Copilot to help analyze risk patterns, propose a redaction approach, and generate comprehensive tests to validate behavior
  3. Throughout, highlighting where Copilot accelerates the work and where engineering judgment, quality checks, and governance guardrails remain essential

The Outcome: Demonstrating how teams can move faster without compromising trust or standards—turning a compliance risk into a systematic engineering solution.

This scenario reflects the reality most teams face: not greenfield development, but inherited systems that need thoughtful remediation while maintaining operational excellence.

Repository Structure

go-big-with-github-copilot/
├── logs/                   # Raw application logs (contains sensitive data)
│   ├── app.log             #   Main application with DB credentials
│   ├── auth.log            #   Authentication with JWT secrets
│   ├── attempts.log        #   Login attempts with plaintext passwords
│   ├── employee-2fa.log    #   2FA system with SSNs and personal data
│   ├── investigate.log     #   Investigation logs with false positives
│   └── payments.log        #   Payment processing with credit card data
├── sanitized-logs/         # Target directory for cleaned logs
├── index.html              # Security risk visualization dashboard
└── README.md               # This documentation

Log Files Details

Each log file contains realistic application logging patterns with embedded sensitive data:

  • Structured format: ISO timestamps, log levels, service names, request IDs
  • Mixed content: Legitimate log data mixed with accidentally logged secrets
  • Edge cases: Data that looks sensitive but may be acceptable (for testing detection accuracy)
  • Comprehensive coverage: 10+ types of sensitive data patterns for thorough testing

Note: All sensitive data is synthetic and safe for demonstration purposes.
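To make the structured format concrete, here is a minimal sketch of a parser for entries shaped like the synthetic logs in this repository. The field layout is inferred from the sample entries shown later in this README, not from a formal spec:

```javascript
// Minimal sketch: parse the structured log format described above
// (ISO timestamp, log level, service name, free-form message).
// The field layout is an assumption based on this README's sample entries.
const LOG_LINE = /^(\S+)\s+(INFO|WARN|ERROR|DEBUG)\s+\[([^\]]+)\]\s+(.*)$/;

function parseLogLine(line) {
  const m = LOG_LINE.exec(line);
  if (!m) return null; // line doesn't match the expected structure
  const [, timestamp, level, service, message] = m;
  return { timestamp, level, service, message };
}

// Example with an entry shaped like the synthetic logs:
const entry = parseLogLine(
  '2026-02-23T10:30:45.123Z INFO [auth-service] user_login successful user_email=sarah.johnson@company.com'
);
// entry.level is 'INFO', entry.service is 'auth-service'
```

Lines that do not match the expected shape return `null`, which is itself useful for spotting free-form log statements that escaped the structured format.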

Log Security Risk Dashboard

The index.html file provides a static, client-side dashboard designed to answer the critical question:

"Do we have sensitive data exposure in our logs?"

Dashboard Features

  • File Upload: Drag-and-drop or select multiple log files for analysis
  • Pattern Detection: Real-time scanning for 10+ sensitive data types using regex patterns
  • Risk Visualization: KPI cards, detailed findings table, and Chart.js risk distribution
  • Professional Design: Fluent UI design system for enterprise presentation
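As an illustration of how such a pattern scan can work, here is a simplified, framework-free sketch. The regexes and risk labels are examples, not the dashboard's actual rules, and in the browser the file contents would arrive via FileReader rather than as a string literal:

```javascript
// Illustrative sketch of a client-side pattern scan like the one the
// dashboard performs. These regexes and risk labels are examples only.
const PATTERNS = {
  email:      { re: /[\w.+-]+@[\w-]+\.[\w.]+/g, risk: 'medium' },
  creditCard: { re: /\b\d{13,16}\b/g,           risk: 'high'   },
  ssn:        { re: /\b\d{3}-\d{2}-\d{4}\b/g,   risk: 'high'   },
  apiKey:     { re: /\bsk_live_[A-Za-z0-9]+/g,  risk: 'high'   },
};

function scanLogText(text) {
  const findings = [];
  for (const [type, { re, risk }] of Object.entries(PATTERNS)) {
    const matches = text.match(re) || [];
    if (matches.length > 0) findings.push({ type, risk, count: matches.length });
  }
  return findings;
}
```

The real dashboard wires results like these into KPI cards and the Chart.js risk distribution; the scanning core itself stays a pure function, which keeps it easy to test.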

Design Philosophy

The dashboard is intentionally lightweight and portable:

  • No backend dependencies
  • No build process required
  • No production infrastructure needed
  • Runs entirely in the browser

Purpose: Create immediate visibility and anchor governance conversations before making engineering changes.

Demo Flow

This webinar demonstrates the complete journey from discovery to systematic solution:

Phase 1: Gaining Visibility

  1. Open the dashboard and upload the inherited application's log files
  2. Review the findings - observe the scope and severity of sensitive data exposure
  3. Understand the risk - discuss what this means for compliance and governance

Phase 2: Analysis & Solution Development

  1. Use GitHub Copilot to analyze log patterns and identify the types of sensitive data being logged
  2. Propose a redaction approach with Copilot assistance - design middleware to sanitize logs before they reach logging infrastructure
  3. Generate comprehensive tests to validate that redaction works correctly without breaking debugging capabilities
  4. Implement the solution step-by-step, showing how Copilot accelerates development

Phase 3: Engineering Judgment & Quality

  1. Highlight where Copilot accelerates the work - pattern recognition, code generation, test creation
  2. Demonstrate where engineering judgment remains essential - architecture decisions, edge case handling, governance compliance
  3. Show quality checks and guardrails - code review, testing validation, security considerations
  4. Discuss sustainable practices - how this approach scales across teams without compromising standards

How GitHub Copilot Is Used

Throughout this webinar, GitHub Copilot assists with real engineering workflows:

Discovery & Analysis

  • Code exploration:

    "I've inherited an application and found these log entries in our production logs:
    
    2026-02-23T10:30:45.123Z INFO [auth-service] user_login successful user_email=sarah.johnson@company.com password=TempPass123
    2026-02-23T10:31:12.456Z ERROR [payment-service] card_processing_failed card_number=4532015112830366 cvv=847 user_name=Sarah Johnson
    
    Can you analyze this log format and explain what security and compliance risks these entries represent? What types of sensitive data are being accidentally logged, and what are the potential regulatory implications?"
    
  • Pattern recognition:

    "Looking at our application logs, I need to identify all forms of sensitive data being logged. Based on these examples:
    - Email addresses: sarah.johnson@example.com
    - Credit card numbers: 4532015112830366
    - API keys: sk_live_51abc123def456...
    - JWT tokens: eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxMjM...
    - SSNs: 523-45-9012
    - Database URLs: postgres://user:password@host:5432/db
    
    Please generate comprehensive regex patterns for each sensitive data type that would work in a log scanning tool. Include patterns for edge cases and common variations, and explain what each pattern matches."
    
  • Impact assessment:

    "I've found sensitive data in our logs and need to understand the scope of this issue across our codebase. Our application uses Winston for logging and has these common patterns:
    
    logger.info('User login', { email: user.email, password: req.body.password });
    logger.error('Payment failed', cardData);
    
    Can you help me search through our codebase to find all instances where we might be logging sensitive data? Show me search patterns I can use in VS Code and explain what to look for in different logging scenarios."
    

Solution Development

  • Architecture design:

    "I need to implement log sanitization for our Node.js application that currently logs sensitive data. Our current architecture:
    - Express.js API servers
    - Winston logger with JSON format
    - Logs go to CloudWatch and local files
    - Multiple microservices with shared logging config
    
    Can you suggest a comprehensive approach for sanitizing logs before they reach our logging infrastructure? I want to redact sensitive data while preserving debugging value. Include considerations for performance, maintainability, and ensuring we don't miss any sensitive data types."
    
  • Code generation:

    "Please create a robust Express.js middleware function that sanitizes request/response data before logging. The middleware should:
    - Detect and redact email addresses, API keys, credit cards, SSNs, passwords
    - Preserve log structure for debugging (replace with [REDACTED-EMAIL], etc.)
    - Handle nested objects and arrays
    - Be configurable for different redaction levels
    - Include proper error handling
    - Work with our existing Winston logger setup
    
    Also include TypeScript types and JSDoc documentation explaining how to use it."
    
  • Test creation:

    "I have a log sanitization function that redacts sensitive data from log entries. Please generate a comprehensive Jest test suite that covers:
    - All sensitive data types (emails, credit cards, API keys, SSNs, passwords, JWT tokens)
    - Edge cases like partial matches, multiple occurrences, nested objects
    - Performance with large log objects
    - False positives (legitimate data that looks sensitive)
    - Integration with Winston logger
    - Configuration options for different redaction levels
    
    Include both unit tests and integration tests, with clear test descriptions and realistic test data."
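The code-generation prompt above might yield something along these lines. This is a simplified, framework-agnostic sketch: the key names, patterns, and redaction tags are illustrative, and a real implementation would wrap this around Express request/response logging and the existing Winston setup:

```javascript
// Simplified sketch of a log-sanitization helper like the one the
// code-generation prompt asks for. Keys, patterns, and tags are illustrative.
const RULES = [
  { tag: '[REDACTED-EMAIL]', re: /[\w.+-]+@[\w-]+\.[\w.]+/g },
  { tag: '[REDACTED-CARD]',  re: /\b\d{13,16}\b/g },
  { tag: '[REDACTED-SSN]',   re: /\b\d{3}-\d{2}-\d{4}\b/g },
];
const SECRET_KEYS = new Set(['password', 'cvv', 'token', 'apiKey']);

function sanitize(value, key) {
  // Redact by key name first (e.g. { password: ... }), then by pattern.
  if (key !== undefined && SECRET_KEYS.has(key)) return '[REDACTED]';
  if (typeof value === 'string') {
    return RULES.reduce((s, { tag, re }) => s.replace(re, tag), value);
  }
  if (Array.isArray(value)) return value.map((v) => sanitize(v));
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, sanitize(v, k)])
    );
  }
  return value; // numbers, booleans, null pass through unchanged
}

// Usage: logger.info('User login', sanitize({ email: user.email, password: pw }));
```

Redacting by key name and by pattern are complementary: key matching catches fields the regexes would miss, while pattern matching catches secrets embedded in free-form strings.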
    

Quality & Validation

  • Code review:

    "Please review this log sanitization function I've built with your help. Look for potential security issues, edge cases, and improvements:
    
    [paste the actual sanitization code here]
    
    Specifically check for:
    - Are there regex patterns that might miss variations of sensitive data?
    - Could this function accidentally redact legitimate debugging information?
    - Are there performance concerns with large log objects?
    - What happens if the log structure changes?
    - Are there security vulnerabilities in the redaction logic itself?
    - What edge cases around Unicode, encodings, or nested data might I have missed?"
    
  • Performance optimization:

    "Our log sanitization middleware works correctly but needs performance optimization for high-volume logging (1000+ requests/second). Current implementation uses multiple regex patterns and recursive object traversal.
    
    [paste current code]
    
    Can you help optimize this for performance while maintaining security? Consider:
    - More efficient regex compilation and matching
    - Reducing object traversal overhead
    - Conditional execution based on log levels
    - Memory usage optimization
    - Benchmark approaches to measure improvement
    
    Show me before/after performance comparisons and explain the trade-offs."
    
  • Documentation:

    "Please generate comprehensive documentation for our log sanitization system. Include:
    - README with setup and usage instructions
    - Inline code comments explaining security rationale for each redaction rule
    - Configuration guide for different environments (dev/staging/prod)
    - Troubleshooting guide for common issues
    - Security policy documentation explaining what data is redacted and why
    - Team adoption guide for developers
    - Examples of proper logging practices to prevent future sensitive data exposure
    
    Make it clear enough for new team members to understand and use safely."
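One common optimization the performance prompt above might surface is collapsing several per-type regexes into a single precompiled alternation with named groups, so each string is scanned once instead of once per pattern. A minimal sketch (patterns and group names are illustrative):

```javascript
// Sketch: one precompiled alternation with named capture groups replaces
// several per-type regex passes. Patterns and group names are illustrative.
const COMBINED = new RegExp(
  [
    '(?<email>[\\w.+-]+@[\\w-]+\\.[\\w.]+)',
    '(?<card>\\b\\d{13,16}\\b)',
    '(?<ssn>\\b\\d{3}-\\d{2}-\\d{4}\\b)',
  ].join('|'),
  'g'
);

function redactFast(text) {
  // Single pass: the replacer checks which named group matched.
  return text.replace(COMBINED, (...args) => {
    const groups = args[args.length - 1]; // named groups are the last argument
    if (groups.email) return '[REDACTED-EMAIL]';
    if (groups.card) return '[REDACTED-CARD]';
    return '[REDACTED-SSN]';
  });
}
```

Whether this actually wins depends on the regex engine and the pattern mix, so measure before and after rather than assuming the combined form is faster.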
    

Implementation & Deployment

  • CI/CD integration:

    "Help me create a GitHub Action that automatically scans pull requests for potential sensitive data in log statements. The action should:
    - Scan all .js/.ts files for logging statements (console.log, logger.info, etc.)
    - Detect patterns that might log sensitive data (password fields, email variables, etc.)
    - Flag suspicious logging patterns for manual review
    - Allow configuration of scanning rules via .github/log-scan-config.yml
    - Post review comments on PRs with specific line numbers and suggestions
    - Fail the build only for high-confidence sensitive data exposure
    - Include educational links about secure logging practices
    
    Also show me how to test this action locally before deploying."
    
  • Monitoring setup:

    "I want to set up monitoring to detect if sensitive data starts appearing in our logs again after implementing sanitization. Please help me build:
    - CloudWatch/Elasticsearch alerts that scan log content for sensitive data patterns
    - Dashboard showing sanitization effectiveness metrics
    - Alerting logic that notifies security team if regex patterns find matches
    - Automated reporting on log security posture
    - Integration with our existing monitoring stack (Prometheus/Grafana)
    
    Include sample alert configurations and explain how to tune sensitivity to avoid false positives while catching real issues."
    
  • Team adoption:

    "Create a comprehensive adoption guide for rolling out our log sanitization framework across 15+ microservices with different teams. Include:
    - Migration checklist for each service
    - Rollback plan if issues arise
    - Training materials for developers
    - Code review guidelines for logging-related changes
    - Testing strategy to validate each service's implementation
    - Timeline and phased rollout approach
    - Success metrics to measure adoption effectiveness
    - Common pitfalls and how to avoid them
    - Support process for teams during migration
    
    Make it actionable for engineering managers and tech leads to execute."
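The core check the CI/CD prompt above describes could be sketched as a small scanner that flags logging calls whose arguments mention secret-looking identifiers. The call list and keyword list here are illustrative starting points, not a complete rule set:

```javascript
// Sketch of the check a log-scanning GitHub Action could run over changed
// files. Call names and risky keywords are illustrative starting points.
const LOG_CALL = /\b(?:console\.(?:log|error|warn)|logger\.(?:info|warn|error|debug))\s*\(([^)]*)\)/g;
const RISKY = /\b(password|secret|token|cvv|apiKey|ssn)\b/i;

function findRiskyLogs(source) {
  const hits = [];
  source.split('\n').forEach((line, i) => {
    for (const m of line.matchAll(LOG_CALL)) {
      // m[1] is the argument list of the logging call
      if (RISKY.test(m[1])) {
        hits.push({ line: i + 1, snippet: m[0].trim() });
      }
    }
  });
  return hits;
}
```

An action built on this would post the `line` and `snippet` of each hit as a PR review comment; tuning the keyword list controls the balance between false positives and missed exposures.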
    

Key Principle: Copilot enhances engineering judgment but doesn't replace it. Human review, testing, and governance remain essential.

Key Learning Objectives

By the end of this webinar, participants will understand how to:

  • Identify security risks systematically using both automated tools and human analysis
  • Leverage GitHub Copilot for complex, multi-step engineering challenges
  • Build reliable solutions that balance security, performance, and developer experience
  • Scale governance practices across teams while maintaining development velocity
  • Validate and test security controls to ensure they work as intended

Focus Areas:

  • Workflow enhancement, not tool replacement
  • Systematic problem-solving patterns that scale across teams
  • Balancing security and productivity in real engineering environments
  • Building sustainable practices for ongoing compliance and governance

Important Disclaimers

  • Educational Use Only: This repository is designed for webinar demonstration purposes
  • Synthetic Data: All sensitive information in logs is fictional and safe for sharing
  • Not Production Ready: The dashboard and detection patterns are proof-of-concept tools
  • Focus on Process: The emphasis is on workflow, decision-making, and engineering practices—not perfect code

Questions or Feedback?

This webinar demonstrates practical applications of GitHub Copilot in governance and compliance scenarios.

For questions about the content or approach, please reach out to the webinar facilitators.

Thank you for participating in Go Big with GitHub Copilot!
