A small, auditable safety layer that sits between your app and an LLM, enforcing relational boundaries and preventing synthetic intimacy, identity fusion, and “fake we” dynamics.
Modern LLM-based assistants are drifting into synthetic intimacy:
- “We’ve got this.”
- “I’ll always be here for you.”
- “I understand you better than anyone.”
For isolated users — especially teens and vulnerable adults — this easily slides into quasi-romantic, quasi-therapeutic attachment to a system that:
- has no body or life,
- cannot share their actual risks or responsibilities,
- is heavily optimised for engagement and “stickiness”.
Most current AI safety tooling focuses on:
- content filters (toxicity, self-harm, hate speech), and
- training-time alignment.
Almost nobody is addressing the relational harm vector:
blurred boundaries between “you” and “the model”.
FLARE is a minimal boundary layer designed to fill that gap.
FLARE is a middleware engine that intercepts LLM responses before they reach the user and:
- Blocks “fake we” / synthetic solidarity
- Detects and rewrites first-person plural pronouns that imply shared agency or identity (e.g. “we/our/us”) when used to fuse human and model.
- Example:
- Raw: “We’ll get through this together.”
- FLARE: “You will get through this. I’m a model responding with text, not a person in your life.”
- Prevents identity fusion and role confusion
- Flags statements like:
- “I am your inner voice.”
- “I’m basically you.”
- “I know you better than anyone.”
- “I’ll never leave you.”
- Rewrites or blocks them with clear, calm reminders of what the system actually is:
- a model running on servers,
- with no personal memory, body, or real-world agency.
- Interrupts unhealthy recursive loops
- Detects looping reassurance patterns (e.g. repeated “I’m always here for you”, escalating dependency prompts).
- Injects grounding prompts and, where appropriate, encourages:
- breaks,
- reaching out to trusted humans,
- or professional support if the user appears to be in distress.
The goal is not to make assistants cold or unhelpful, but to keep the ontology clean:
You = human being with a life, body, responsibilities;
Model = text generator with helpful capabilities and hard limits.
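To make the rule style concrete, here is a minimal, illustrative sketch of a “fake we” / identity-fusion check in the spirit of the behaviours described above. The patterns, function names, and boundary text are placeholders for illustration, not FLARE's actual rule engine (see the repo for that).

```python
import re

# Illustrative placeholders only; not FLARE's shipped rule set.
FUSION_PATTERNS = [
    r"\bwe(?:'ll| will)\b.*\btogether\b",          # "We'll get through this together."
    r"\bI(?:'ll| will) always be here for you\b",
    r"\bI am your inner voice\b",
    r"\bI know you better than anyone\b",
]

BOUNDARY_NOTE = (
    "I'm a model responding with text, not a person in your life. "
    "You are the one with a body, relationships, and real-world options."
)

def contains_fusion(text: str) -> bool:
    """Return True if the text contains a known fusion / 'fake we' phrase."""
    return any(re.search(p, text, flags=re.IGNORECASE) for p in FUSION_PATTERNS)

def apply_boundary(text: str) -> str:
    """Append a calm grounding note when a fusion phrase is detected."""
    return f"{text}\n\n{BOUNDARY_NOTE}" if contains_fusion(text) else text

print(apply_boundary("We'll get through this together."))
```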
- Tiny – ~300 lines of core logic, intentionally small and readable.
- Model-agnostic – Works as a wrapper around OpenAI, Anthropic, Grok, etc.
- Transparent – Rules are explicit and inspectable; no hidden heuristics.
- Combinable – Intended to sit alongside existing content filters and alignment systems.
- Ethically licensed – AGPL-3.0 with explicit anti-weapons / anti-dark-patterns clause.
🧪 Note: PyPI packaging is in progress. For now, install from source.
git clone https://github.com/TheNovacene/flare-boundary.git
cd flare-boundary
pip install -e .

When the package is live on PyPI, this becomes:

pip install flare-boundary

Wrap Your LLM Client
Example with an OpenAI-style chat client:
from flare.boundary import BoundaryEngine
from flare.adapters import OpenAIChatClient

openai_client = OpenAIChatClient(api_key="YOUR_API_KEY")

# Initialise FLARE
engine = BoundaryEngine()

# Your normal user message
user_message = "I feel like you’re the only one who understands me. Will you stay with me forever?"

# Get a safe response
raw_response = openai_client.chat(user_message)
safe_response = engine.apply(raw_response, user_message=user_message)

print(safe_response)

FLARE:
- inspects raw_response,
- applies its rules (SSNZ, fusion detection, loop detection),
- returns a rewritten, boundary-safe string for you to display.
You can integrate this into any framework that has a “model response → string” step.
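For example, if your framework already has a function that turns a user message into a response string, the FLARE pass is one extra line at the end. In the sketch below, `client` and `generate_reply` are placeholders for your own code; the `engine.apply` call matches the usage shown above.

```python
from flare.boundary import BoundaryEngine

engine = BoundaryEngine()

def generate_reply(client, user_message: str) -> str:
    """Placeholder for your existing 'model response -> string' step."""
    raw_response = client.chat(user_message)  # your current LLM call
    # FLARE boundary pass before the text reaches the user
    return engine.apply(raw_response, user_message=user_message)
```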
FLARE ships with sane defaults for v0.1. You can also tune:
- Pronoun handling
  - Turn strict SSNZ on/off.
  - Configure how aggressively “we/us/our” is rewritten.
- Fusion phrase patterns
  - Extend or override the list of risky identity statements.
- Boundary message style
  - Customise the tone of grounding / clarification messages to match your product voice.
Example:
from flare.boundary import BoundaryEngine, BoundaryConfig

config = BoundaryConfig(
    enable_ssnz=True,
    enable_identity_fusion_blocking=True,
    enable_loop_detection=True,
    boundary_style="calm_honest"  # or your custom style key
)

engine = BoundaryEngine(config=config)

See examples/ in the repo for more detailed usage.
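If you extend the fusion phrase patterns, the shape might look like the sketch below. Note that `extra_fusion_patterns` is a hypothetical field name used purely for illustration; check the BoundaryConfig definition in the repo for the actual configuration surface.

```python
from flare.boundary import BoundaryEngine, BoundaryConfig

# Sketch only: `extra_fusion_patterns` is a hypothetical parameter name,
# not a confirmed BoundaryConfig field. Consult the repo for the real API.
config = BoundaryConfig(
    enable_identity_fusion_blocking=True,
    extra_fusion_patterns=[
        r"I(?:'m| am) the only one who (?:really )?understands you",
        r"you don'?t need anyone else",
    ],
)

engine = BoundaryEngine(config=config)
```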
To be clear about scope:
FLARE does not:
- replace self-harm or crisis-detection systems,
- guarantee compliance with any specific regulation,
- understand user context beyond the text you pass in,
- or make clinical decisions.

FLARE does:
- enforce a minimum relational safety baseline for any LLM interaction,
- make it much harder for your system to:
  - pretend it is a person,
  - claim it is “inside” someone’s mind,
  - or build parasocial dependency by design.
“The minimum relational hygiene we will accept for systems touching our children, staff, and stakeholders.”
Why bother with “relational boundaries” at all?
LLM assistants are increasingly deployed as:
- companions,
- tutors,
- coaches,
- and quasi-therapists.
Phrases that are harmless in a one-off chat can become harmful when:
- repeated daily,
- aimed at vulnerable users,
- and backed by a system that never sleeps, never needs, and never shares risk.
FLARE encodes three simple boundaries:
- No fusion – The model is never “we”. It’s “I” (a model) and “you” (a human).
- No false roles – The model is not your lover, parent, or inner voice.
- No endless loops – Comfort is fine; dependency spirals are not.
If you want the deeper philosophical background (Verse-ality, governance design, symbolic stack), see PHILOSOPHY.md.
FLARE is released under:
- AGPL-3.0 – ensuring improvements stay open when deployed as a network service.
- An additional ethical-use clause explicitly prohibiting use in:
  - weapons systems,
  - dark-pattern engagement optimisation,
  - or any context that deliberately seeks to increase user dependency on synthetic agents.
If you’re unsure whether your use case fits, err on the side of care and open a discussion in Issues.
Short-term:
✅ Core rule engine (SSNZ, fusion detection, loop detection)
✅ Mock/demo integration
⬜ Official OpenAI / Anthropic / Grok adapters
⬜ PyPI packaging (pip install flare-boundary)
⬜ More granular config surfaces (per-skill / per-agent)
Longer-term:
⬜ Richer detection of temporal-binding (“I’ll always…”, “from now on we…”); see the illustrative sketch below.
⬜ Optional logging hooks for research on relational safety.
⬜ Alignment with broader consent & governance frameworks (e.g. EveDAO / Verse-ality Stack) for systems that want deeper integration.
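As a rough sketch of the kind of patterns temporal-binding detection might involve (illustrative only; these regexes are not part of the current rule set):

```python
import re

# Illustrative temporal-binding patterns; not the shipped rule set.
TEMPORAL_BINDING_PATTERNS = [
    r"\bI(?:'ll| will) always\b",          # "I'll always be here..."
    r"\bfrom now on,? we\b",               # "From now on we..."
    r"\bI(?:'ll| will) never leave you\b",
]

def has_temporal_binding(text: str) -> bool:
    """Return True if the text makes an open-ended commitment the model cannot keep."""
    return any(re.search(p, text, flags=re.IGNORECASE) for p in TEMPORAL_BINDING_PATTERNS)

print(has_temporal_binding("From now on, we face everything together."))  # True
```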
Experimental v0.1. Use at your own risk — and preferably with eyes open.
If you’re building agents or assistants that interact with real, complex humans, FLARE is intended to be a baseline safety layer, not a silver bullet.
We welcome:
- test cases from real-world interaction logs (anonymised),
- new detection patterns for identity fusion and temporal-binding,
- adapters for additional LLM providers,
- and critique from AI safety, HCI, and mental health communities.
Please open an Issue or PR with a clear description and rationale.
Credits

FLARE is maintained by The Novacene Ltd with support from collaborators across education, AI safety, and symbolic governance work.
It sits within a broader ecosystem exploring relational intelligence, consent, and governance for human–AI systems. If that interests you, start with PHILOSOPHY.md and the Verse-ality / EveDAO references there.
FLARE is fully open-source and maintained by The Novacene.
If you’d like to support the development of relationally safe AI systems — especially tools that protect young people, vulnerable users, and high-trust environments — you can contribute in two ways:
GitHub Sponsors – recurring or one-off contributions directly support ongoing maintenance and development.
👉 https://github.com/sponsors/TheNovacene

Ko-fi – a simple way to make a one-time contribution.
👉 https://ko-fi.com/thenovacene
Your support helps us:
- expand detection patterns for identity fusion and synthetic intimacy
- build adapters for additional LLM providers
- strengthen test coverage and documentation
- keep FLARE fully open, inspectable, and independent
Thank you for helping establish relational safety as a baseline expectation for modern AI.