Synaptic Chain‑of‑Thought (SyCoT) - Preview


A Flask web app showcasing a “self-RAG” memory layer for reasoning language models (RLMs). It archives task-conclusion pairs, retrieves them via semantic similarity and thematic clustering, and improves answer consistency across sessions. This preview demonstrates the core concept through a simple user interface.

Why this exists

Modern reasoning models produce long chains of thought (CoT) to solve problems. These CoT traces generate many intermediate “reasoning tokens” that drive the model toward a final answer. In most setups, those reasoning tokens are ephemeral: once the answer is produced, they are discarded. This creates three issues:

  • Latency and cost: longer CoT → more tokens → higher inference time and compute.
  • Waste: the work done to reach a correct conclusion is thrown away, even if a very similar task appears later.
  • Inconsistency: without persistent memory, the same or related tasks can yield different conclusions across sessions.

SyCoT turns that discarded effort into a reusable asset. Instead of saving the raw CoT tokens, SyCoT persistently archives the conclusion along with a sentence embedding of the task. When a new task arrives, SyCoT:

  • Reuses a high‑similarity conclusion instantly (no model call), or
  • Primes the model with concise, related conclusions retrieved by similarity and by thematic clustering.

This keeps CoT benefits while avoiding repeated long reasoning. Result: faster, cheaper, more consistent answers, with a small, transparent memory that improves over time.
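The archive-and-reuse idea can be sketched as follows, using toy 3-dimensional embeddings and cosine similarity (the real app uses 384-dimensional all-MiniLM-L6-v2 embeddings and a compressed on-disk archive; the function name `reuse_or_context`, the entries, and the thresholds shown inline are illustrative):

```python
import numpy as np

# Each archived entry pairs a task embedding with its final conclusion.
archive = [
    {"task": "What is 2 + 2?", "embedding": np.array([1.0, 0.0, 0.0]), "conclusion": "4"},
    {"task": "Explain recursion", "embedding": np.array([0.0, 1.0, 0.0]),
     "conclusion": "A function that calls itself on smaller inputs."},
]

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def reuse_or_context(query_emb, high=0.8, low=0.5):
    """Return an archived conclusion directly, or related conclusions as context."""
    scored = sorted(((cosine(query_emb, e["embedding"]), e) for e in archive),
                    key=lambda s: s[0], reverse=True)
    best_score, best = scored[0]
    if best_score >= high:                      # near-duplicate task: skip the model call
        return {"reuse": best["conclusion"]}
    context = [e["conclusion"] for s, e in scored if s >= low]
    return {"context": context}                 # prime the model with related conclusions
```

A query embedding very close to an archived one returns its conclusion with no model call; a moderately similar one only collects context.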

How it works

Request lifecycle:

  1. Input and session: The browser UI posts your task to /chat.
  2. Decision gate: A lightweight DecisionEngine checks if memory should be used (e.g., questions, short prompts, or follow‑ups → use memory).
  3. Embedding and retrieval: The task is embedded with all‑MiniLM‑L6‑v2 and compared against the compressed on‑disk archive (archive.json.gz).
    • If a very similar task exists, SyCoT returns its archived conclusion immediately with near‑zero latency.
    • Otherwise, SyCoT collects moderately similar conclusions to use as context.
  4. Thematic clustering: K‑Means groups tasks into themes. SyCoT also adds conclusions from the nearest cluster to broaden context.
  5. Prompt assembly: A concise prompt is built from these prior conclusions plus your current task. Coding mode adjusts thresholds and prompt tone.
  6. Model call: The prompt is sent to the NVIDIA Nemotron model via an OpenAI‑compatible API.
  7. Archive and respond: If a final conclusion is detected and auto‑archive is enabled, it is saved to the archive for future reuse, and a polished answer is returned to the UI.

Conceptually, this mimics synaptic consolidation: conclusions are encoded, strengthened by recurrence (similarity), and organised into themes (clusters) that support transfer and generalisation.
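Step 2’s decision gate can be sketched as a simple heuristic (a minimal illustration; the actual rules in decision_engine.py may differ, and the function name and word-count cutoff are assumptions):

```python
def should_use_memory(task: str, is_follow_up: bool = False) -> bool:
    """Heuristic gate: use memory for questions, short prompts, and follow-ups."""
    task = task.strip()
    if is_follow_up:
        return True                 # follow-ups likely relate to archived conclusions
    if task.endswith("?"):
        return True                 # explicit questions benefit from prior answers
    if len(task.split()) <= 12:
        return True                 # short prompts are often recurring tasks
    return False
```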

Positioning

  • Not a new CoT algorithm: SyCoT does not alter a model’s intrinsic chain‑of‑thought algorithm or decoding. It adds a dynamic memory and retrieval layer that supports the reasoning process.
  • Context awareness: Retrieves similar and thematic conclusions (via clustering) to ground responses.
  • Efficiency: On high‑similarity matches, returns conclusions instantly (no model call); otherwise seeds the prompt with concise prior conclusions.

User experience

  • Instant replies on repeats: On high‑similarity matches, SyCoT returns the archived conclusion immediately (no model call).
  • Lower latency and token use: Related tasks reuse prior conclusions or short context instead of long CoT.
  • More consistent answers: Memory reduces drift across sessions.
  • Coding mode: Simple toggle that tunes thresholds and prompt tone for code tasks.

How SyCoT compares to other approaches

  • Faithful Chain‑of‑Thought Reasoning (arXiv 2301.13379)

    • Emphasises faithfulness via symbolic chains and deterministic solvers.
    • SyCoT is complementary: memory improves consistency/efficiency; symbolic layers can be added for verification when needed.
  • A‑Thought: Efficient Reasoning via Bidirectional Compression for Low‑Resource Settings (arXiv 2505.24550)

    • Compresses/searches reasoning paths for efficiency in constrained budgets.
    • SyCoT reduces repeated search by reusing conclusions and providing thematic retrieval for related intents.
  • Resource‑Budgeted Adaptive CoT (arXiv 2505.11896)

    • Adapts CoT depth/length under token/latency budgets with early‑exit style policies.
    • SyCoT is complementary: its memory layer can return instant, zero‑token answers for recurrent queries and only fall back to adaptive CoT when novel reasoning is required.

SyCoT can be used alongside other approaches to maximise efficiency or accuracy by leveraging its memory layer for instant answers on recurring tasks while integrating with adaptive or symbolic methods for novel problems.

In short, many newer methods optimise whether/how much to think now; SyCoT additionally optimises by remembering and reusing what has already been thought, yielding consistent answers, lower latency, and reduced compute for recurring or thematically related tasks.

What’s in this preview

  • preview.py: Flask server and JSON endpoints
  • SyCoT.py: core archive, retrieval, clustering, and model calls
  • decision_engine.py: heuristics for when to use memory
  • templates/index.html: bare UI
  • static/script.js, static/style.css: simple chat frontend
  • requirements.txt, .gitignore, README.md

Features

  • Persistent memory: Compressed JSON archive on disk with deduplication by semantic similarity
  • Hybrid retrieval: Direct “same task” reuse + related examples via similarity and clustering
  • One‑click reset: Clear archive from the UI (or POST /clear-archive)
  • Session‑aware coding mode: Persisted per browser session
  • Lightweight stack: No database required
  • Well‑commented codebase: Clear comments to facilitate quick understanding and easy modification

Architecture

High‑level data flow:

UI → Flask (/chat) → DecisionEngine → embedding + archive retrieval → KMeans clustering → Nemotron model → archive update → UI

Components:

  • UI (HTML/JS) ↔ Flask endpoints (/, /chat, /clear-archive)
  • Archive store: archive.json.gz (plus archive_usage.log)
  • Embeddings: sentence-transformers (all-MiniLM-L6-v2)
  • Clustering: scikit-learn KMeans
  • Reasoning model: NVIDIA Nemotron via OpenAI‑compatible API (chat.completions)
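The clustering component can be illustrated with scikit-learn’s KMeans on toy 2-dimensional embeddings (the real app clusters all-MiniLM-L6-v2 sentence embeddings; the data and the `thematic_context` helper are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy "embeddings": two math-flavoured tasks and two code-flavoured tasks.
embeddings = np.array([
    [1.0, 0.1],   # "integrate x^2"
    [0.9, 0.2],   # "differentiate sin(x)"
    [0.1, 1.0],   # "write a Flask route"
    [0.2, 0.9],   # "fix this Python bug"
])
conclusions = ["x^3/3 + C", "cos(x)", "use @app.route", "check the indentation"]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(embeddings)

def thematic_context(query_emb):
    """Return conclusions from the cluster nearest to the query embedding."""
    label = int(kmeans.predict(query_emb.reshape(1, -1))[0])
    return [c for c, l in zip(conclusions, kmeans.labels_) if l == label]
```

A math-flavoured query retrieves the math cluster’s conclusions, broadening context beyond direct similarity hits.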

Endpoints

  • GET / - serves the minimal chat UI
  • POST /chat - body:
    {
      "task": "string",
      "is_coding_mode": true,
      "auto_archive": true
    }
    response:
    {
      "thinking": "string",
      "conclusion": "string",
      "is_coding_mode": true
    }
  • POST /clear-archive - clears the on‑disk archive and resets clustering
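Assuming the server is running locally on port 5000, the /chat endpoint can be exercised with a small helper that builds the JSON body shown above (the helper name is illustrative):

```python
import json

def make_chat_body(task: str, is_coding_mode: bool = False, auto_archive: bool = True) -> str:
    """Serialise a request body for POST /chat, matching the schema above."""
    return json.dumps({
        "task": task,
        "is_coding_mode": is_coding_mode,
        "auto_archive": auto_archive,
    })

# POST this to http://127.0.0.1:5000/chat with Content-Type: application/json;
# the JSON response carries "thinking", "conclusion", and "is_coding_mode".
body = make_chat_body("Summarise the benefits of a persistent reasoning memory")
```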

Requirements

  • Python 3.10+
  • Windows (tested); other platforms should work with minor changes
  • Internet access for the NVIDIA model API

Setup (Windows / PowerShell)

  1. Create and activate a virtual environment
python -m venv .venv
.\.venv\Scripts\Activate
  2. Install dependencies
pip install -r requirements.txt
  3. Configure the API
  • NVIDIA API key (Nemotron via OpenAI‑compatible endpoint)
    • Go to https://build.nvidia.com/nvidia/llama-3_1-nemotron-ultra-253b-v1, click "View Code" and generate an API key.
    • Open SyCoT.py and replace INSERT_API_KEY_HERE with your key.
  4. Run the preview app
python preview.py

Open http://127.0.0.1:5000 or http://localhost:5000 in your browser.

Using the UI

  • Type a prompt (or paste some code)
  • Toggle Coding Mode to switch to code‑oriented thresholds and prompts.
  • Use Clear Archive to wipe memory (archive.json.gz) and reset clusters.

Notes

  • Similarity thresholds live near the top of SyCoT.py:
    similarity_high = 0.8       # Threshold for direct conclusion reuse
    similarity_low = 0.5        # Minimum similarity for context inclusion
    similarity_high_code = 0.85 # (coding mode)
    similarity_low_code = 0.55  # (coding mode)
  • Archive file paths and logging are defined in SyCoT.py.
  • The model ID and API base are in SyCoT.py (nvidia/llama-3.1-nemotron-ultra-253b-v1).
  • The app silently sends a handshake (“hey”) on page load to sync session state, skipping memory retrieval for the first prompt sent by the user.
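How these thresholds route a retrieved match can be sketched as follows (the threshold constants mirror the values listed above; the `route` helper itself is illustrative):

```python
def thresholds(is_coding_mode: bool):
    """Select similarity thresholds; coding mode tightens both cutoffs."""
    if is_coding_mode:
        return 0.85, 0.55   # similarity_high_code, similarity_low_code
    return 0.8, 0.5         # similarity_high, similarity_low

def route(similarity: float, is_coding_mode: bool = False) -> str:
    high, low = thresholds(is_coding_mode)
    if similarity >= high:
        return "reuse"      # return the archived conclusion, no model call
    if similarity >= low:
        return "context"    # include as prior-conclusion context in the prompt
    return "ignore"         # unrelated: reason from scratch
```

Note that the same similarity score can route differently per mode: a 0.82 match is reused directly in normal mode but only used as context in coding mode.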

License

  • This project is licensed under the MIT License. See the LICENSE file for details.
