[New Skill]: Local PII Guardrail (micro-f1-mask) #32
Open
Labels
- enhancement: New feature or request
- skill request: Request for a new capability to be added.
Description
Skill Name
security/pii_masker
What should this skill do?
Capability
High-precision, local PII (Personally Identifiable Information) detection and redaction using the micro-f1-mask model. This skill acts as a "Privacy Firewall" at the edge, scrubbing sensitive data before it reaches high-latency cloud models.
Problem Solved
Agentic workflows inherently risk leaking sensitive user data (names, physical addresses, emails, crypto wallets) to external LLM providers (Google, OpenAI, Anthropic). Running detection locally ensures this data is scrubbed at the source, before any request leaves the machine.
Technical Approach
- Inference Engine: Leverages a local Ollama instance.
- Model: Integrates arpahls/micro-f1-mask, a 270M parameter model optimized for high-recall PII detection.
- Efficiency: The model is lightweight ("potato-friendly") enough to run alongside the main agent without adding significant latency.
- Context Awareness: Unlike regex-based matching, the model uses sentence structure to tell a generic date apart from a date of birth or a passport number.
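As a rough illustration of the approach above, the following sketch sends text to a local Ollama instance through its `/api/generate` endpoint with streaming disabled. The function names (`build_payload`, `mask_text`), the injectable `opener` parameter, and the timeout are hypothetical. It also assumes the model returns the masked text directly in the `response` field, which this issue does not specify.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default given in this issue
MODEL = "arpahls/micro-f1-mask"


def build_payload(text: str, model: str = MODEL) -> dict:
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": text, "stream": False}


def mask_text(text: str, ollama_url: str = OLLAMA_URL,
              opener=urllib.request.urlopen, timeout: float = 30.0) -> str:
    """Send `text` to the local model and return its reply.

    `opener` is injectable so the call can be replaced in tests.
    """
    req = urllib.request.Request(
        f"{ollama_url.rstrip('/')}/api/generate",
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # Assumption: the masked text is the whole "response" string.
    return body["response"].strip()
```

Calling `mask_text("Hello Alice")` requires a running Ollama instance with the model pulled. Passing a fake `opener` avoids that dependency in tests.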
Documentation & Standards
To ensure full integration into the Skillware ecosystem, the following are required:
- Reference Card: Create `docs/skills/pii_masker.md` detailing supported entity types (Names, Emails, Crypto Wallets, SSN, etc.) and model performance metrics.
- Global Registry: Update `docs/skills/README.md` to include this under the `security` category.
- Integration Example: Provide `examples/pii_guardrail_flow.py` demonstrating how to use this skill as a decorator or middleware for an agent's output stream.
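One possible shape for the decorator usage mentioned in the integration example is sketched below. The names `pii_guardrail`, `stub_masker`, and `echo_agent` are hypothetical. The stub masker is a regex placeholder that handles emails only, standing in for the model-backed skill so the example runs without Ollama.

```python
import functools
import re
from typing import Callable

Masker = Callable[[str], str]

# Placeholder pattern, used only so the sketch runs without a model.
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def stub_masker(text: str) -> str:
    """Stand-in for the real skill: masks email addresses only."""
    return _EMAIL.sub("[EMAIL]", text)


def pii_guardrail(masker: Masker = stub_masker):
    """Decorator factory: pass the wrapped agent's string output through `masker`."""
    def decorator(agent_fn: Callable[..., str]) -> Callable[..., str]:
        @functools.wraps(agent_fn)
        def wrapper(*args, **kwargs) -> str:
            return masker(agent_fn(*args, **kwargs))
        return wrapper
    return decorator


@pii_guardrail()
def echo_agent(prompt: str) -> str:
    """Toy agent that echoes its input."""
    return f"You wrote: {prompt}"
```

With this sketch, `echo_agent("contact bob@example.com today")` returns `"You wrote: contact [EMAIL] today"`. Swapping `stub_masker` for a model-backed callable would leave the decorator unchanged.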
Testing & Quality Assurance
- Unit Tests: Must include a test suite in `tests/skills/test_pii_masker.py` covering standard PII patterns and edge cases (e.g., distinguishing between a generic date and a birth date).
- Mock Inference: Implement a mock for the Ollama API response to allow CI/CD testing without requiring a local GPU runner.
- Precision Validation: Ensure the `micro-f1-mask` logic maintains a high F1 score for sensitive ARPA-specific identifiers (e.g., ENS tags, Internal IDs).
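The mock-inference requirement could look roughly like the sketch below, which patches `urllib.request.urlopen` with `unittest.mock` so no Ollama process or GPU is needed. The client function `call_ollama`, the helper `fake_response`, and the `[DATE_OF_BIRTH]` label are hypothetical. A test like this only checks request and response handling, not the model's actual detection quality.

```python
import json
import unittest
import urllib.request
from unittest import mock


def call_ollama(prompt: str, url: str = "http://localhost:11434") -> str:
    """Hypothetical client under test: one non-streaming generate call."""
    req = urllib.request.Request(
        f"{url}/api/generate",
        data=json.dumps({"model": "arpahls/micro-f1-mask",
                         "prompt": prompt, "stream": False}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


def fake_response(text: str) -> mock.MagicMock:
    """Build an object that behaves like urlopen's context-managed response."""
    resp = mock.MagicMock()
    resp.read.return_value = json.dumps({"response": text}).encode("utf-8")
    resp.__enter__.return_value = resp
    return resp


class TestPiiMaskerMocked(unittest.TestCase):
    def test_canned_reply_is_returned(self):
        canned = fake_response("Born on [DATE_OF_BIRTH].")  # label is assumed
        with mock.patch("urllib.request.urlopen", return_value=canned) as fake:
            out = call_ollama("Born on 1990-04-12.")
        self.assertEqual(out, "Born on [DATE_OF_BIRTH].")
        fake.assert_called_once()
```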
Ideal Inputs & Outputs
Arguments
- `text` (string): The raw, sensitive input string.
- `mode` (string, optional): Options: `mask` (default, e.g., `[PERSON]`), `redact` (e.g., `XXXX`), or `remove`.
- `ollama_url` (string, optional): Defaults to `http://localhost:11434`.
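To make the three `mode` options concrete, here is a minimal sketch that applies each one to already-detected character spans. The `(start, end, label)` span format and the `apply_mode` function are assumptions for illustration. The sketch also assumes spans do not overlap and that `redact` emits a fixed `XXXX` rather than a length-matched string.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, label): hypothetical detection format

VALID_MODES = {"mask", "redact", "remove"}


def apply_mode(text: str, spans: List[Span], mode: str = "mask") -> str:
    """Rewrite `text`, replacing each detected span according to `mode`."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])  # untouched text before the span
        if mode == "mask":
            pieces.append(f"[{label}]")
        elif mode == "redact":
            pieces.append("XXXX")  # fixed width, so entity length is not leaked
        # mode == "remove": append nothing
        cursor = end
    pieces.append(text[cursor:])  # trailing text after the last span
    return "".join(pieces)
```

Note that `remove` leaves surrounding whitespace in place, so a removed word between two spaces yields a double space. Whether to collapse it is a design choice this issue leaves open.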
JSON Return
```json
{
  "sanitized_text": "Hello [PERSON_1], your wallet [CRYPTO_ADDRESS] has been verified.",
  "metadata": {
    "detected_entities": ["PERSON", "CRYPTO_ADDRESS"],
    "entity_count": 2,
    "security_level": "local-only",
    "model": "arpahls/micro-f1-mask"
  }
}
```
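A small helper that assembles this return shape might look like the following sketch. The function `build_result` is hypothetical. The sample above is ambiguous about `entity_count`, so the sketch assumes it counts every detected occurrence while `detected_entities` lists each entity type once, in order of first appearance.

```python
from typing import Iterable


def build_result(sanitized_text: str, labels: Iterable[str],
                 model: str = "arpahls/micro-f1-mask") -> dict:
    """Assemble the return payload from the sanitized text and one label per detection."""
    labels = list(labels)
    return {
        "sanitized_text": sanitized_text,
        "metadata": {
            # dict.fromkeys de-duplicates while preserving first-seen order
            "detected_entities": list(dict.fromkeys(labels)),
            "entity_count": len(labels),  # assumption: counts occurrences
            "security_level": "local-only",
            "model": model,
        },
    }
```

Passing the sample's sanitized text with labels `["PERSON", "CRYPTO_ADDRESS"]` reproduces the JSON shown above.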
Targeted Models (if applicable)
Model Agnostic (All)