NornNet - Where the Fates Weave Destiny with a Touch of AI Magic. A private AI chatbot that rocks!
- Eliezer Lamien
- Joe Scott
- Owen Osmera
- Shawn Noon
- LM Studio Tutorial: Run Large Language Models (LLM) on Your Laptop
- Network Chuck Private AI
- How to Build a Local AI Agent With Python (Ollama, LangChain & RAG)
- The Ultimate Guide to Local AI and AI Agents (The Future is Here)
- Part 1 - Run Ollama with Python in 5 Minutes (Full Setup Guide)
- Part 2 - Local Ollama Chatbot in Python
- Demo Creating an App Using Ollama OpenAI Python Client
- How to Create Your Own AI Chatbot Like DeepSeek with Ollama in Python Flask
- Learn Ollama in 15 Minutes - Run LLM Models Locally for FREE
- How To Prompt AI in 2026 (according to experts) - NetworkChuck
- Integrate an LLM into flask with python https://github.com/ollama/ollama-python
- Create, read, and write to a PDF file https://github.com/py-pdf/pypdf
- Chat history/Memory/Context https://github.com/digithree/ollama-rag
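The ollama-python library linked above is the glue between Flask and the model. A minimal, hedged sketch of how a call might be wrapped (the `client` parameter is an assumption made so tests can pass a stub; in production it would be `ollama.Client(host="http://localhost:11434")`, and `gemma3:4b` is the model used elsewhere in this README):

```python
def ask_model(client, prompt, model="gemma3:4b"):
    """Send one user message through an ollama-style client and return the reply text.

    `client` is anything with a .chat(model=..., messages=...) method, e.g.
    ollama.Client(...) or a test stub.
    """
    response = client.chat(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]
```

Injecting the client this way keeps the function unit-testable without a running model server.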
- Fork the NornNet Repository.
- Pull local copy to your computer.
- Make code changes.
- Commit --> then Push to your repository
- If your repository is not up to date --> Synchronize Changes.
- Create a Pull Request.
- Please watch: Video about Guild Project (10-16-25)
Python Flask project to create a web chatbot. Private mode: host an open-source model yourself (local server, private cloud, or on-prem) for full control over data and privacy.
Build a reliable web chatbot. The app should let users exchange messages with an AI model and see streamed responses.
- Privacy-first solution running a private model server (containerized) with secure access and logging controls.
- Backend: Python, Flask, Waitress (chat endpoint and API connector)
- Frontend: HTML, CSS, JavaScript
- Private AI: local model server (lightweight runtime like llama.cpp / GGML for CPU quantized models)
High level: Client (browser) -> Flask app -> (Private model server) -> Flask -> browser.
The implementation is split into phases so the team can ship incrementally.
- Download the latest Ollama release (e.g., ollama-windows-amd64.zip).
- Extract the zip file --> place all files in c:\ollama.
- Add c:\ollama to the system PATH environment variable.
- Verify the installation: open a Command Prompt and run `ollama --version`.
- Pull a model to test with: `ollama pull gemma3:4b`.
- Verify that gemma3:4b is installed as the model.
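The last check can also be scripted. Ollama exposes a REST API on port 11434, and `GET /api/tags` lists the pulled models; a stdlib-only sketch (the helper split lets the parsing be tested without a live server):

```python
import json
from urllib.request import urlopen

def installed_models(tags_payload):
    """Extract model names from an /api/tags response payload."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]

def model_installed(name, host="http://localhost:11434"):
    """True if a model whose name starts with `name` is pulled on the server."""
    with urlopen(f"{host}/api/tags") as resp:
        payload = json.load(resp)
    return any(n.startswith(name) for n in installed_models(payload))
```

For example, `model_installed("gemma3:4b")` should return True after the pull above succeeds.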
Running Ollama as a Windows service via NSSM ensures it starts automatically with the server and restarts if it ever crashes, without requiring a user to be logged in.
Step 1: Download NSSM
- Go to the NSSM download page.
- Download the latest release (e.g., nssm-2.24.zip).
- Extract the ZIP file to a permanent location on your server, like C:\nssm
Step 2: Configure the Ollama Service with NSSM
- Open an Administrator Command Prompt or Administrator PowerShell and run `nssm install Ollama`. This will open the NSSM service installer GUI.
- Application tab:
  - Path: click the ... button and navigate to your ollama.exe file.
  - Startup directory: this should automatically populate with the correct folder.
  - Arguments: this is the most important part. Enter `serve`. This tells Ollama to run as a server.
- Log on tab: for a server, it's best to run this as the Local System account. Select the Local System account radio button. This allows the service to run even when no user is logged in.
- Environment tab: this is critical for allowing your Flask server to access Ollama. In the Environment variables box, add the following key-value pair: `OLLAMA_HOST=0.0.0.0`. This tells Ollama to listen for requests on all network interfaces, not just localhost.
- Click the Install service button. You should see a "Service 'Ollama' installed successfully!" message.
Step 3: Start the Service
- Open the Windows Services app (run services.msc).
- Find your new service named Ollama.
- Right-click it and select Start.
- Verify it's working: open your server's web browser and navigate to http://localhost:11434. You should see the message "Ollama is running."

The Ollama server is now running as a persistent service on port 11434, accessible from other machines on your network.
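The browser check can also be done in code. "Ollama is running" is the literal banner the server returns at its root URL, so a stdlib-only probe (once `OLLAMA_HOST=0.0.0.0` is set, another machine can pass `http://SERVER:11434` instead of localhost) is:

```python
from urllib.request import urlopen

def ollama_healthy(base_url="http://localhost:11434"):
    """True if Ollama's root endpoint answers with its banner text."""
    try:
        with urlopen(base_url, timeout=5) as resp:
            return b"Ollama is running" in resp.read()
    except OSError:
        # Connection refused, DNS failure, timeout, etc.
        return False
```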
| Status | Item |
| --- | --- |
| ⬜ Todo | Draft README |
| 🟨 In progress | Add examples and docs |
| ✅ Complete | Initial release |
✅ Complete - Initial release
- Create `main_app.py` with routes for the homepage (`/`) and chat (`/chat`).
- Dynamic base path support: set `BASE_PATH=/nornnet` in `.env` for production; leave unset locally to serve at root.
- Build `index.html` with a chat history, input, and send button.
- Implement minimal CSS to make the UI usable.
- Download the student handbook onto the server for the AI to reference.
- Create a virtual environment.
- Install the Python ollama library: `pip install ollama`.
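One way to honor the `BASE_PATH` setting without touching route code is to register the routes on a blueprint and mount it at the configured prefix. The blueprint approach is a suggestion, not the project's required design; `BASE_PATH` and the homepage route come from the phase list above.

```python
import os
from flask import Blueprint, Flask

bp = Blueprint("nornnet", __name__)

@bp.route("/")
def homepage():
    return "NornNet home"

def create_app():
    """Mount every route under BASE_PATH; unset/empty serves at root."""
    app = Flask(__name__)
    prefix = os.environ.get("BASE_PATH", "") or None  # None == root
    app.register_blueprint(bp, url_prefix=prefix)
    return app
```

With `BASE_PATH=/nornnet` the homepage is served at `/nornnet/`; locally, with the variable unset, it is served at `/`.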
✅ Complete - Initial release
- Use `main.js` to send user messages to `/chat` and append both user and bot messages to the DOM.
✅ Complete - Initial release
- CPU-only machines: prefer quantized models (GGML / llama.cpp) or small transformer models.
- Create a model server and expose an internal HTTP endpoint.
- Secure the private server: mTLS / API key, and run it behind IIS acting as a reverse proxy in front of Waitress.
- Make the AI function a class for expandability.
- DO NOT change the AI function file; only make a new file for the class.
- SetUserQuestion
- GetUserQuestion
- GetAIResponse
- SetAIPrompt
- GetAIPrompt
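A possible shape for that class, kept in its own file as required. The method names come from the list above; the constructor arguments (`client`, `model`, `prompt`) are assumptions chosen so a stub client can be injected for testing.

```python
class AIChat:
    """Wrapper class around an ollama-style chat client (sketch only)."""

    def __init__(self, client, model="gemma3:4b", prompt=""):
        self._client = client      # e.g. ollama.Client(host=...)
        self._model = model
        self._prompt = prompt      # system prompt / context preamble
        self._question = ""

    def SetUserQuestion(self, question):
        self._question = question

    def GetUserQuestion(self):
        return self._question

    def SetAIPrompt(self, prompt):
        self._prompt = prompt

    def GetAIPrompt(self):
        return self._prompt

    def GetAIResponse(self):
        """Send the stored prompt + question and return the reply text."""
        messages = [
            {"role": "system", "content": self._prompt},
            {"role": "user", "content": self._question},
        ]
        reply = self._client.chat(model=self._model, messages=messages)
        return reply["message"]["content"]
```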
- Find a way to read PDFs in Python and integrate that into the code.
- How do we add the student handbook as context for the AI?
- This can be a method on the class; we just need a way to read the file and produce text for prompting the AI.
- Find out if context can be enabled in Ollama.
- Integrate this into the code so the AI can have context.
- Research using CAG or RAG
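One possible answer to the handbook questions above: extract the text with pypdf (already listed under Resources) and split it into chunks small enough to paste into a prompt, whether that ends up as simple context stuffing (CAG-style) or retrieval (RAG). The function names are suggestions, not existing project code; pypdf is imported lazily so the chunker works on its own.

```python
def read_pdf_text(path):
    """Concatenate the extracted text of every page in a PDF."""
    from pypdf import PdfReader  # pip install pypdf
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def chunk_text(text, max_chars=2000):
    """Split text into word-boundary chunks of at most max_chars characters."""
    chunks, current = [], ""
    for word in text.split():
        if current and len(current) + 1 + len(word) > max_chars:
            chunks.append(current)
            current = word
        else:
            current = f"{current} {word}" if current else word
    if current:
        chunks.append(current)
    return chunks
```

Usage sketch: `chunk_text(read_pdf_text("student_handbook.pdf"))` yields pieces that can be fed to the model one at a time or indexed for retrieval.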
- Add github repository address to docs page.
- Add responsive design to the user interface. The interface should resize for different devices; look at ChatGPT for an example.
- Add support for streaming responses if the model server supports it. This may provide a faster start to the response and a better user experience.
⬜ Todo
- Decide what to log (timestamps, anonymized session id, message length). Avoid storing PII by default.
- Add a user-facing privacy notice and an opt-out for logging.
- Implement retention policy and a script to sweep/delete logs older than X days.
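Two of the items above can be sketched in a few lines: a one-way hash so logs never store a raw session id, and a sweep that deletes logs older than the retention window. The salt value, filename pattern, and 30-day default are all assumptions to be replaced by the team's policy.

```python
import hashlib
import os
import time

def anonymize(session_id, salt="change-me"):
    """One-way hash of a session id; logs store this instead of PII."""
    return hashlib.sha256((salt + session_id).encode()).hexdigest()[:16]

def sweep_logs(log_dir, max_age_days=30):
    """Delete .log files older than max_age_days; return the names removed."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if name.endswith(".log") and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed
```

`sweep_logs` could run from Task Scheduler on the same Windows server that hosts Ollama.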
⬜ Todo
- Add health checks, simple metrics (request counts, success/failure), and basic monitoring instructions.
This plan maps team roles to implement the privacy-oriented private mode.
- Team 1 (HTML & Structure): `index.html` - chat layout, message list, input form.
- Team 2 (CSS & Interactivity): `style.css` and `main.js` - chat bubbles, responsive layout, DOM updates, optimistic UI.
Shared goal: Deliver a clean, usable chat UI with graceful error states.
- Team 3 (Server & Routing): `app.py` - create routes, session handling, simple persistence for sessions.
- Team 4 (Request & Response Handling): implement `/chat` logic that validates input, forwards to the connector, and returns JSON or streaming responses.
Shared goal: Reliable, well-documented endpoints and simple persistence for conversation history.
- Team 5 (Private AI & Security): Implement local/private model server wiring and implement logging & retention rules. If hardware allows, tune for streaming and lower-latency inference.
Optional extras for the team:
- Add unit/integration tests for the connector code.
- Add a small evaluation page that replays test prompts and shows model outputs side-by-side.
- Add simple cost/latency telemetry so the team can compare different models.
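For the first extra, connector tests don't need a running model server: `unittest.mock` (stdlib) can stand in for the ollama client. The `connector` function below is a stand-in for the project's real connector, shown only so the testing pattern is concrete.

```python
from unittest.mock import MagicMock

def connector(client, message):
    """Stand-in connector: forwards one message and unwraps the reply text."""
    resp = client.chat(
        model="gemma3:4b",
        messages=[{"role": "user", "content": message}],
    )
    return resp["message"]["content"]

def test_connector_returns_reply_text():
    fake = MagicMock()
    fake.chat.return_value = {"message": {"content": "hello"}}
    assert connector(fake, "hi") == "hello"
    fake.chat.assert_called_once()
```

Run with `pytest`; the same pattern works for testing streaming and error paths.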
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright (c) 2025 WNCC IT Program
