
[New Skill]: Local PII Guardrail (micro-f1-mask) #32

@rosspeili

Skill Name

security/pii_masker

What should this skill do?

Capability

High-precision, local PII (Personally Identifiable Information) detection and redaction using the micro-f1-mask model. This skill acts as a "Privacy Firewall" at the edge, scrubbing sensitive data before it reaches high-latency cloud models.

Problem Solved

Agentic workflows inherently risk leaking sensitive user data (names, physical addresses, emails, crypto wallets) to external LLM providers (Google, OpenAI, Anthropic). Because this skill executes locally, data is scrubbed at the source, before any request leaves the machine.

Technical Approach

  • Inference Engine: Leverages a local Ollama instance.
  • Model: Integrates arpahls/micro-f1-mask, a 270M-parameter model optimized for high-recall PII detection.
  • Efficiency: The model is lightweight enough ("potato-friendly") to run alongside the main agent without adding significant latency.
  • Context Awareness: Unlike regex matching, the model uses sentence structure to tell a generic date apart from a date of birth or a passport number.
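As a rough illustration of the inference path described above, the sketch below builds a non-streaming request to Ollama's /api/generate endpoint using only the standard library. The function names, the timeout, and the choice to send the raw text as the prompt are assumptions for illustration; the prompt format the model actually expects is not specified in this issue.

```python
import json
import urllib.request

DEFAULT_OLLAMA_URL = "http://localhost:11434"
MODEL_NAME = "arpahls/micro-f1-mask"


def build_generate_request(text, ollama_url=DEFAULT_OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    # Assumption: the raw text is sent as the prompt with no extra template.
    payload = {"model": MODEL_NAME, "prompt": text, "stream": False}
    return urllib.request.Request(
        url=ollama_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_local_inference(text, ollama_url=DEFAULT_OLLAMA_URL, timeout=30):
    """Send the text to the local model and return its raw text reply."""
    request = build_generate_request(text, ollama_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, Ollama returns the generated text under "response".
    return body["response"]
```

Keeping request construction separate from the network call makes the payload easy to inspect without a running Ollama instance.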

Documentation & Standards

To ensure full integration into the Skillware ecosystem, the following are required:

  1. Reference Card: Create docs/skills/pii_masker.md detailing supported entity types (Names, Emails, Crypto Wallets, SSN, etc.) and model performance metrics.
  2. Global Registry: Update docs/skills/README.md to include this under the security category.
  3. Integration Example: Provide examples/pii_guardrail_flow.py demonstrating how to use this skill as a decorator or middleware for an agent's output stream.
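One possible shape for the decorator form mentioned in item 3 is sketched below. The names pii_guard, fake_masker, and agent_reply are hypothetical, and fake_masker is only a stand-in for the real skill so the example runs without a model.

```python
import functools


def pii_guard(mask_fn):
    """Return a decorator that passes a function's string output through mask_fn."""
    def decorator(agent_fn):
        @functools.wraps(agent_fn)
        def wrapper(*args, **kwargs):
            # Run the wrapped agent first, then sanitize whatever it returns.
            return mask_fn(agent_fn(*args, **kwargs))
        return wrapper
    return decorator


def fake_masker(text):
    # Stand-in for the real skill: replaces one hard-coded name.
    return text.replace("Alice", "[PERSON]")


@pii_guard(fake_masker)
def agent_reply(user_name):
    return f"Hello {user_name}, your request is done."


print(agent_reply("Alice"))
```

A streaming variant would need extra care, since an entity can be split across chunk boundaries; this sketch only covers complete strings.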

Testing & Quality Assurance

  • Unit Tests: Must include a test suite in tests/skills/test_pii_masker.py covering standard PII patterns and edge cases (e.g., distinguishing between a generic date and a birth date).
  • Mock Inference: Implement a mock for the Ollama API response to allow CI/CD testing without requiring a local GPU runner.
  • Precision Validation: Ensure the micro-f1-mask logic maintains a high F1 score for sensitive ARPA-specific identifiers (e.g., ENS tags, Internal IDs).
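The mock-inference requirement could be met along the following lines, using unittest.mock to patch urllib.request.urlopen so that no Ollama server or GPU is needed. The query_ollama client and the canned reply are hypothetical; the real skill's client code and the model's output format may differ.

```python
import io
import json
import urllib.request
from unittest import mock


def query_ollama(text, ollama_url="http://localhost:11434"):
    """Hypothetical client under test: returns the model's raw reply."""
    payload = {"model": "arpahls/micro-f1-mask", "prompt": text, "stream": False}
    request = urllib.request.Request(
        ollama_url + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def fake_urlopen(request, *args, **kwargs):
    """Return a canned Ollama-style body instead of opening a socket."""
    body = json.dumps({"response": "Hello [PERSON]", "done": True})
    # BytesIO supports both .read() and the context-manager protocol.
    return io.BytesIO(body.encode("utf-8"))


def test_query_ollama_uses_mocked_response():
    with mock.patch("urllib.request.urlopen", side_effect=fake_urlopen) as patched:
        assert query_ollama("Hello Alice") == "Hello [PERSON]"
    patched.assert_called_once()
```

The test function follows the pytest naming convention, so it would be collected automatically if placed in tests/skills/test_pii_masker.py.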

Ideal Inputs & Outputs

Arguments

  • text (string): The raw, sensitive input string.
  • mode (string, optional): One of mask (default; replaces the entity with a label, e.g., [PERSON]), redact (replaces it with a filler, e.g., XXXX), or remove (deletes it).
  • ollama_url (string, optional): Defaults to http://localhost:11434.
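To make the three mode values concrete, here is a minimal sketch of how they might be applied once entity spans are known. The (start, end, label) span format and the apply_mode name are assumptions; the issue does not specify how the model reports entity positions.

```python
def apply_mode(text, spans, mode="mask"):
    """Rewrite text according to mode.

    spans is a list of non-overlapping (start, end, label) tuples
    (an assumed format, not one defined by the skill).
    """
    if mode not in ("mask", "redact", "remove"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Process spans right to left so earlier offsets stay valid.
    for start, end, label in sorted(spans, reverse=True):
        if mode == "mask":
            replacement = f"[{label}]"
        elif mode == "redact":
            replacement = "XXXX"
        else:  # remove
            replacement = ""
        text = text[:start] + replacement + text[end:]
    return text


sample = "Email bob@example.com now"
start = sample.index("bob@example.com")
spans = [(start, start + len("bob@example.com"), "EMAIL")]
print(apply_mode(sample, spans, mode="mask"))
```

Note that remove leaves the surrounding whitespace untouched, so the caller may want to normalize spacing afterwards.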

JSON Return

{
  "sanitized_text": "Hello [PERSON_1], your wallet [CRYPTO_ADDRESS] has been verified.",
  "metadata": {
    "detected_entities": ["PERSON", "CRYPTO_ADDRESS"],
    "entity_count": 2,
    "security_level": "local-only",
    "model": "arpahls/micro-f1-mask"
  }
}
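A return value of this shape could be assembled as in the sketch below. The build_result helper is hypothetical, and it assumes entity_count counts every detected span while detected_entities lists each distinct type once; the issue does not state which interpretation is intended.

```python
import json


def build_result(sanitized_text, labels, model="arpahls/micro-f1-mask"):
    """Assemble the return payload from sanitized text and per-span labels."""
    return {
        "sanitized_text": sanitized_text,
        "metadata": {
            # dict.fromkeys drops duplicates while keeping first-seen order.
            "detected_entities": list(dict.fromkeys(labels)),
            # Assumption: one count per detected span, not per entity type.
            "entity_count": len(labels),
            "security_level": "local-only",
            "model": model,
        },
    }


result = build_result(
    "Hello [PERSON_1], your wallet [CRYPTO_ADDRESS] has been verified.",
    ["PERSON", "CRYPTO_ADDRESS"],
)
print(json.dumps(result, indent=2))
```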

Targeted Models (if applicable)

Model Agnostic (All)


Labels: enhancement (New feature or request), skill request (Request for a new capability to be added.)
