searxng_search

License: MIT

A Python client library by JamePeng (jame_peng@sina.com) for querying a self-hosted SearXNG instance. It lets you run searches (text, images, videos, news, and more) programmatically against a local or LAN-deployed SearXNG-Docker server, so your search data stays under your control and you do not depend on external APIs.


Features

  • Connect to Local/LAN SearXNG: Easily specify the base URL (IP address and port) of your self-hosted SearXNG instance, supporting both local and network-accessible deployments.
  • Text Search: Perform general web searches and retrieve structured results.
  • Structured Output: Receive search results in a clean, parseable JSON format.
  • Error Handling: Dedicated exceptions for network failures, HTTP error responses, and parsing failures.
  • Customizable: Extendable to support more SearXNG categories and search types.

Installation

Prerequisites

  • Python 3.9+: Ensure you have a compatible Python version installed.
  • Docker & Docker Compose: (Recommended for setting up SearXNG) Make sure Docker and Docker Compose are installed on your system.

Install the searxng_search Library

You can install searxng_search by building it from source.

From Source

Clone the repository (or create the project structure):

git clone https://github.com/jamepeng/searxng_search.git
cd searxng_search

Alternatively, manually create the following directory structure and files within your project root:

searxng_search_package/
├── searxng_search/
│   ├── __init__.py
│   ├── searxng_search.py
│   ├── exceptions.py
│   └── utils.py
├── test/
│   └── demo.py
├── .gitignore
├── pyproject.toml
├── CHANGELOG.md
└── README.md

Install build tools:

pip install build wheel setuptools

Build and Install:

pip install .
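
To confirm that the install step worked, a quick check like the one below may help. It is a minimal sketch using only the standard library; the helper name is_installed is an illustrative assumption and is not part of searxng_search.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if the top-level `package` can be located on the Python path."""
    return importlib.util.find_spec(package) is not None


# Expected to report True once `pip install .` has completed successfully.
print("searxng_search installed:", is_installed("searxng_search"))
```

Run it with the same interpreter (or virtual environment) that you used for pip, otherwise the package may not be found.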

Setting up SearXNG with Docker (Recommended)

Before using searxng_search, you'll need a running SearXNG instance. This setup uses searxng-docker's recommended docker-compose.yaml with Caddy as a reverse proxy, suitable for both local and external (LAN/public) access.

Clone the searxng-docker repository:

git clone https://github.com/searxng/searxng-docker.git
cd searxng-docker

Adjust docker-compose.yaml and Caddyfile:

The provided docker-compose.yaml is designed for use with Caddy and Redis. Here's the configuration for your docker-compose.yaml:

version: "3.7"

services:
  caddy:
    container_name: caddy
    image: docker.io/library/caddy:2-alpine
    network_mode: host # Binds to host network, allowing direct port access (e.g., 80/443)
    restart: unless-stopped
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy-data:/data:rw
      - caddy-config:/config:rw
    environment:
      # Define your SearXNG hostname here.
      # For local access, set to "localhost" or "127.0.0.1".
      # For LAN/Internet access, set to your domain or public IP (e.g., "mysearxng.com").
      - SEARXNG_HOSTNAME=${SEARXNG_HOSTNAME:-localhost} # Default to localhost
      - SEARXNG_TLS=${LETSENCRYPT_EMAIL:-internal} # For internal HTTPS or Let's Encrypt email
    logging:
      driver: "json-file"
      options:
        max-size: "1m"
        max-file: "1"

  redis:
    container_name: redis
    image: docker.io/valkey/valkey:8-alpine # Using Valkey as a Redis fork
    command: valkey-server --save 30 1 --loglevel warning
    restart: unless-stopped
    networks:
      - searxng
    volumes:
      - valkey-data2:/data
    logging:
      driver: "json-file"
      options:
        max-size: "1m"
        max-file: "1"

  searxng:
    container_name: searxng
    image: docker.io/searxng/searxng:latest
    restart: unless-stopped
    networks:
      - searxng
    ports:
      - "8080:8080"
      # Here, we change "127.0.0.1:8080:8080" to "8080:8080" so that it can be accessed from any network to which the host is connected (including the LAN and even the Internet if port forwarding and firewall rules are configured).
    volumes:
      - ./searxng:/etc/searxng:rw
      - searxng-data:/var/cache/searxng:rw
    environment:
      - SEARXNG_BASE_URL=https://${SEARXNG_HOSTNAME:-localhost}/
    logging:
      driver: "json-file"
      options:
        max-size: "1m"
        max-file: "1"

networks:
  searxng:

volumes:
  caddy-data:
  caddy-config:
  valkey-data2:
  searxng-data:

Edit the .env file to set the hostname and an email

# By default listen on https://localhost
# To change this:
# * uncomment SEARXNG_HOSTNAME, and replace <host> by the SearXNG hostname
# * uncomment LETSENCRYPT_EMAIL, and replace <email> by your email (required to create a Let's Encrypt certificate)

# SEARXNG_HOSTNAME=<host>
# LETSENCRYPT_EMAIL=<email>
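
The compose file reads these variables with the ${NAME:-default} form, which falls back to the default when the variable is unset or empty. The sketch below mimics that rule in Python to show which values Caddy would receive; the function names are hypothetical and exist only for illustration.

```python
def compose_default(env: dict[str, str], name: str, default: str) -> str:
    """Mimic ${NAME:-default}: use `default` when NAME is unset or empty."""
    value = env.get(name, "")
    return value if value else default


def resolve_settings(env: dict[str, str]) -> dict[str, str]:
    """Resolve the two variables the compose file above passes to Caddy."""
    return {
        "SEARXNG_HOSTNAME": compose_default(env, "SEARXNG_HOSTNAME", "localhost"),
        "SEARXNG_TLS": compose_default(env, "LETSENCRYPT_EMAIL", "internal"),
    }


# With both lines still commented out in .env, the defaults apply.
print(resolve_settings({}))

# With both lines uncommented (example values, not real ones).
print(resolve_settings({
    "SEARXNG_HOSTNAME": "search.example.org",
    "LETSENCRYPT_EMAIL": "admin@example.org",
}))
```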

Modify settings.yml in the searxng-docker/searxng directory and add a search block that enables additional output formats (such as json, which the usage example requests):

# see https://docs.searxng.org/admin/settings/settings.html#settings-use-default-settings
use_default_settings: true
server:
  # base_url is defined in the SEARXNG_BASE_URL environment variable, see .env and docker-compose.yml
  secret_key: "ultrasecretkey"  # change this!
  limiter: false  # enable this when running the instance for a public usage on the internet
  image_proxy: true
search:
  formats:
    - html
    - json
    - csv
    - rss
redis:
  url: redis://redis:6379/0
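
With json listed under formats, SearXNG's /search endpoint can return JSON when called with format=json; if the format is not enabled, the request is typically rejected with HTTP 403. The sketch below only builds such a request URL with the standard library. The helper name build_search_url is an assumption for illustration and is not the library's internal implementation.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, *, category: str = "general",
                     language: str = "en-US", pageno: int = 1) -> str:
    """Build a SearXNG /search URL that asks for JSON output."""
    params = {
        "q": query,
        "format": "json",        # must be enabled under search.formats in settings.yml
        "categories": category,
        "language": language,
        "pageno": pageno,
    }
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"


print(build_search_url("http://localhost:8080", "python asyncio"))
# http://localhost:8080/search?q=python+asyncio&format=json&categories=general&language=en-US&pageno=1
```

Opening the printed URL in a browser or with curl is a quick way to check that the json format is actually enabled on your instance.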

Important Configuration Notes for Local and LAN Access:

  • SEARXNG_HOSTNAME: This environment variable in docker-compose.yaml and Caddyfile dictates how SearXNG is accessed.
    • For local access only: Set SEARXNG_HOSTNAME=localhost in your .env file (or directly in docker-compose.yaml). You'll then access SearXNG via http://localhost or https://localhost (if Caddy's internal TLS is enabled).
    • For LAN/public access: Set SEARXNG_HOSTNAME=your.lan.ip.address (e.g., 192.168.1.100) or SEARXNG_HOSTNAME=your.domain.com. Make sure the host's firewall allows inbound traffic on ports 80 (HTTP) and 443 (HTTPS) so that other devices on your network can reach the SearXNG server. For public access, your router must also forward those ports to the machine running Docker.
  • SEARXNG_TLS:
    • For external domains, provide a valid email (e.g., LETSENCRYPT_EMAIL=your@email.com) to enable Let's Encrypt for automatic HTTPS.
    • For local/LAN IP access or testing, the value internal makes Caddy generate self-signed certificates. Your browser may warn about them, but connections are still encrypted.
  • ports for searxng service: The compose file above publishes 8080:8080, so SearXNG is directly reachable on port 8080 from any network the host is connected to. This is what the http://localhost:8080 base URL in the Usage Examples section relies on. The upstream default, 127.0.0.1:8080:8080, restricts direct access to the Docker host itself. If you only need access through Caddy (which runs in network_mode: host and serves ports 80 and 443), you can restore that default.

Start the SearXNG containers:

Navigate to the searxng-docker directory (where your docker-compose.yaml is) and run:

docker-compose up -d

This will pull the necessary images and start the Caddy, Redis, and SearXNG containers.

Verify SearXNG is running:

  • Local Access: Open your web browser and go to http://localhost or https://localhost (if SEARXNG_TLS is internal).
  • LAN Access: From another device on your local network, open your web browser and go to http://your.lan.ip.address or https://your.lan.ip.address (if SEARXNG_TLS is internal). If you configured a domain, use https://your.domain.com.

You should see the SearXNG search interface.
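
The same check can be scripted. The following is a minimal sketch using only the standard library; is_reachable is a hypothetical helper, not part of searxng_search, and verify=False is intended only for test setups that use Caddy's self-signed internal certificates.

```python
import ssl
import urllib.error
import urllib.request


def is_reachable(base_url: str, timeout: float = 5.0, verify: bool = True) -> bool:
    """Return True if `base_url` answers an HTTP GET with a 2xx status code."""
    context = ssl.create_default_context()
    if not verify:
        # Accept self-signed certificates (e.g. SEARXNG_TLS=internal). Test setups only.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(base_url, timeout=timeout, context=context) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, timeouts, TLS failures, HTTP errors, malformed URLs.
        return False
```

For example, is_reachable("https://localhost", verify=False) checks the Caddy front end, while is_reachable("http://localhost:8080") checks the directly published SearXNG port.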


Usage Examples

Here's how you can use the searxng_search library in your Python code:

from searxng_search.searxng_search import SearXNGSearch
from searxng_search.exceptions import RequestException, ParsingException, SearXNGSearchException

import logging

# Configure logging for better visibility
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

# --- IMPORTANT ---
# Replace this with the actual base URL of your running SearXNG instance.
# If you set SEARXNG_HOSTNAME=localhost, use http://localhost or https://localhost (if internal TLS is enabled)
# If you set SEARXNG_HOSTNAME=your.lan.ip.address, use http://your.lan.ip.address or https://your.lan.ip.address
# If you set SEARXNG_HOSTNAME=your.domain.com, use https://your.domain.com
SEARXNG_BASE_URL = "http://localhost:8080" # <-- ADJUST THIS BASED ON YOUR CADDY/SEARXNG_HOSTNAME SETTING!

def perform_text_search(keywords: str, language: str = "en-US"):
    """Demonstrates performing a text search with error handling."""
    logger.info(f"Performing text search for: '{keywords}' on {SEARXNG_BASE_URL}")

    # Use 'verify=False' if you're using Caddy's 'internal' TLS for localhost or LAN IP
    # and Python can't verify the self-signed certificate. For production with valid certs, keep 'True'.
    with SearXNGSearch(base_url=SEARXNG_BASE_URL, timeout=20, verify=False, retries=3, backoff_factor=0.1) as client:
        try:
            # Perform a general text search, requesting JSON format, limit to 8 results
            results = client.text(keywords, category="general", language=language, format="json", max_results=8)

            if results:
                logger.info(f"Successfully retrieved {len(results)} results for '{keywords}':")
                for i, result in enumerate(results):
                    logger.info(f"  Result {i+1}:")
                    logger.info(f"    Title: {result.get('title', 'N/A')}")
                    logger.info(f"    URL:   {result.get('href', 'N/A')}")
                    logger.info(f"    Body:  {result.get('body', 'N/A')}...")
            else:
                logger.info(f"No results found for '{keywords}'.")

        except RequestException as e:
            logger.error(f"Network or HTTP error during search: {e}")
        except ParsingException as e:
            logger.error(f"Error parsing SearXNG response: {e}")
        except ValueError as e:
            logger.error(f"Invalid input parameter: {e}")
        except SearXNGSearchException as e:
            logger.error(f"A general SearXNG search error occurred: {e}")
        except Exception as e:
            logger.critical(f"An unexpected critical error occurred: {e}", exc_info=True)
    print("-" * 50) # Separator for clarity

if __name__ == "__main__":
    perform_text_search("Python programming best practices", language="en-US")
    perform_text_search("MCP是什么?", language="zh-CN")
    perform_text_search("nonexistent query xyz123") # Example for no results
    perform_text_search("") # Example for ValueError

API Reference (Planned)

SearXNGSearch(base_url: str, headers: dict | None = None, timeout: int | None = 30, verify: bool = True, retries: int = 3, backoff_factor: float = 0.5)

Initializes the client.

  • base_url (str): The full URL to your SearXNG instance (e.g., "http://192.168.1.100:8080/" or "https://your.domain.com/").
  • headers (dict, optional): Custom HTTP headers for requests.
  • timeout (int, optional): Request timeout in seconds. Defaults to 30.
  • verify (bool, optional): Whether to verify SSL certificates. Set to False for self-signed certificates (e.g., Caddy's internal TLS or custom certs) if you encounter SSL errors, but keep True for production with valid certificates. Defaults to True.
  • retries (int, optional): Number of times to retry a failed HTTP request. Defaults to 3.
  • backoff_factor (float, optional): A factor by which to multiply the retry delay. The delay will be backoff_factor * (2 ** (retry_count - 1)). Defaults to 0.5.
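
To make the backoff formula concrete, the sketch below computes the delay before each retry. It assumes retry_count starts at 1 for the first retry; the function name retry_delays is illustrative and not part of the library.

```python
def retry_delays(retries: int = 3, backoff_factor: float = 0.5) -> list[float]:
    """Delay in seconds before each retry: backoff_factor * 2 ** (retry_count - 1)."""
    return [backoff_factor * (2 ** (retry_count - 1)) for retry_count in range(1, retries + 1)]


print(retry_delays())         # [0.5, 1.0, 2.0]
print(retry_delays(3, 0.1))   # [0.1, 0.2, 0.4]
```

Under that assumption, the defaults add up to 3.5 seconds of waiting across three retries, on top of the request timeouts themselves.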

SearXNGSearch.text(keywords: str, category: str = "general", language: str = "en-US", pageno: int = 1, format: Literal["json", "html"] = "json", max_results: int | None = None) -> list[dict[str, str]]

Performs a text search.

  • keywords (str): The search query.
  • category (str, optional): SearXNG category (e.g., "general", "science", "it", "images", "videos", "news"). Defaults to "general".
  • language (str, optional): Language parameter for SearXNG (e.g., "en-US", "zh-CN"). Defaults to "en-US".
  • pageno (int, optional): Page number of results to fetch. Defaults to 1.
  • format (Literal["json", "html"], optional): Desired output format from SearXNG. "json" is highly recommended for structured data. Defaults to "json".
  • max_results (int, optional): Maximum number of results to return from the client side. If None, all results from the requested page are returned.
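
As an illustration of the client-side max_results limit, the sketch below converts a SearXNG JSON payload into the title/href/body dictionaries used in the usage example and then truncates the list. SearXNG's JSON response carries a results list whose entries include url, title, and content; the mapping to href and body shown here is an assumption about how the library names its fields, and normalize_results is a hypothetical helper.

```python
import json
from typing import Optional


def normalize_results(raw_json: str, max_results: Optional[int] = None) -> list[dict[str, str]]:
    """Map SearXNG result entries to title/href/body dicts and apply a client-side limit."""
    payload = json.loads(raw_json)
    results = [
        {
            "title": item.get("title", ""),
            "href": item.get("url", ""),      # assumed mapping: url -> href
            "body": item.get("content", ""),  # assumed mapping: content -> body
        }
        for item in payload.get("results", [])
    ]
    # None means "return everything on the requested page".
    return results if max_results is None else results[:max_results]


sample = json.dumps({
    "results": [
        {"title": "A", "url": "https://a.example", "content": "first"},
        {"title": "B", "url": "https://b.example", "content": "second"},
    ]
})
print(normalize_results(sample, max_results=1))
```

Because the limit is applied after the response arrives, it does not reduce the work done by the SearXNG server; it only trims what the caller receives.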

Exceptions

  • SearXNGSearchException: Base exception for all library errors.
  • RequestException: Raised for HTTP communication issues (network errors, timeouts, 4xx/5xx status codes).
  • ParsingException: Raised when SearXNG's response cannot be decoded or parsed as expected (e.g., invalid JSON, unexpected HTML structure).
  • ValueError: Raised for invalid input parameters provided to library methods.
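
Since SearXNGSearchException is the base class, it has to be caught after the more specific exceptions, as the usage example does. The sketch below uses stand-in class definitions (assumptions that mirror the list above, not the library's actual source) to show why the order of the except clauses matters.

```python
# Stand-in definitions mirroring the documented hierarchy (illustrative only).
class SearXNGSearchException(Exception):
    """Base exception for all library errors."""


class RequestException(SearXNGSearchException):
    """HTTP communication issues."""


class ParsingException(SearXNGSearchException):
    """Response could not be decoded or parsed."""


def describe(error: Exception) -> str:
    """Return which handler catches `error` when specific clauses come first."""
    try:
        raise error
    except RequestException:
        return "request"
    except ParsingException:
        return "parsing"
    except ValueError:
        return "invalid input"
    except SearXNGSearchException:
        # Reached only for library errors not matched by the clauses above.
        return "library"


print(describe(RequestException("timeout")))
print(describe(SearXNGSearchException("other")))
```

If the SearXNGSearchException clause came first, it would also catch RequestException and ParsingException, and the more specific handlers would never run.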

Contributing

Contributions are welcome! If you find a bug, have a feature request, or want to improve the code, please feel free to:

  • Open an Issue: Describe the bug or feature you'd like to see.
  • Submit a Pull Request: Fork the repository, create a new branch, make your changes, and submit a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.


Author

JamePeng (jame_peng@sina.com)
