This repository provides a tool for patent similarity search built on Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG). The application combines Google Patents, DuckDuckGo search, and Google's Gemini model (via the Generative AI API) to generate keywords, build queries, and parse results.
- Keyword Generation: Generate relevant and diverse keywords for patent searches.
- Query Optimization: Create advanced search queries using Boolean and proximity operators for maximum recall and precision.
- Web Search Integration: Search Google Patents data through the DuckDuckGo API.
- Result Parsing: Extract, parse, and display search results in a user-friendly DataFrame.
- Streamlit App: A professional, interactive web application for patent similarity searches, inspired by Google Patents' interface.
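The query optimization step can be illustrated with a minimal sketch. It assumes keywords arrive as groups of synonyms, joins synonyms with OR and groups with AND, and adds a `site:` filter so a general web search returns only Google Patents pages. The function names are hypothetical and are not taken from `app.py`. Proximity operators are left out because their syntax depends on the search backend.

```python
from typing import Sequence


def build_boolean_query(keyword_groups: Sequence[Sequence[str]]) -> str:
    """OR the synonyms inside each group, then AND the groups together."""
    clauses = []
    for group in keyword_groups:
        cleaned = [term.strip() for term in group if term.strip()]
        # Quote multi-word terms so they are searched as exact phrases.
        terms = [f'"{term}"' if " " in term else term for term in cleaned]
        if terms:
            clauses.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(clauses)


def restrict_to_google_patents(query: str) -> str:
    """Prefix a site: filter (an assumption about how results are scoped)."""
    return f"site:patents.google.com {query}"


groups = [["drone", "unmanned aerial vehicle"], ["battery", "power cell"]]
print(restrict_to_google_patents(build_boolean_query(groups)))
# site:patents.google.com (drone OR "unmanned aerial vehicle") AND (battery OR "power cell")
```

Grouping synonyms with OR tends to widen recall, while requiring every group with AND keeps the results on topic.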
Clone the repository:
git clone https://github.com/hissain/llm_patent_search.git
cd llm_patent_search
Install the required dependencies with `pip install -r requirements.txt`. The pinned versions are:
langchain==0.0.213
duckduckgo-search==0.5
streamlit==1.24.0
pandas==2.1.0
openai==0.27.0
google-cloud==3.0.0
langchain-google-genai==0.0.1
requests==2.28.1
Google Gemini API: Add your Google Gemini API key as an environment variable GEMINI_API_KEY.
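As a rough sketch of how the key might be read at startup, the helper below looks up `GEMINI_API_KEY` and fails with a clear message if it is missing. The function name `load_gemini_api_key` is hypothetical. The repository's actual loading code may differ.

```python
import os


def load_gemini_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is not set."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before starting the app."
        )
    return key
```

Calling `load_gemini_api_key()` once at startup surfaces a missing key immediately, rather than at the first model request.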
Run the app:
streamlit run app.py
- Open the Streamlit app.
- Enter your patent description in the provided text area.
- View auto-generated keywords, queries, and search results.
- Analyze the search results in a clean and structured interface.
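The result-parsing step behind this workflow could look roughly like the sketch below. It assumes each search hit is a dict with `title`, `href`, and `body` keys (the shape commonly returned by DuckDuckGo client libraries, but treat it as an assumption here) and keeps only Google Patents URLs. The function name and the sample data are illustrative placeholders, not real search output.

```python
import re
from typing import Dict, List
from urllib.parse import urlparse

# Matches paths such as /patent/US0000000A1/en and captures the publication number.
PATENT_PATH = re.compile(r"^/patent/([A-Za-z0-9]+)")


def parse_patent_results(results: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep Google Patents hits and extract the publication number from each URL."""
    rows = []
    for item in results:
        url = item.get("href", "")
        parsed = urlparse(url)
        if parsed.netloc != "patents.google.com":
            continue  # Skip hits from other sites.
        match = PATENT_PATH.match(parsed.path)
        if not match:
            continue  # Skip non-patent pages on the same host.
        rows.append({
            "patent_number": match.group(1),
            "title": item.get("title", ""),
            "snippet": item.get("body", ""),
            "url": url,
        })
    return rows


# Placeholder input for illustration only.
sample = [
    {"title": "Example patent", "href": "https://patents.google.com/patent/US0000000A1/en", "body": "Abstract text"},
    {"title": "Unrelated page", "href": "https://example.com/article", "body": "Other text"},
]
rows = parse_patent_results(sample)
print(rows[0]["patent_number"])  # US0000000A1
```

The returned list of dicts can be passed directly to `pandas.DataFrame(rows)` for display.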
llm_patent_search
├── app.py # Streamlit app entry point and application logic
├── prompts.py # Prompt templates for keyword and query generation
├── README.md # Project documentation
└── requirements.txt # Python dependencies (install with `pip install -r requirements.txt`)
This tool is designed for informational purposes only. It is not intended to replace professional patent searches or legal consultations. While the system uses advanced models and APIs to ensure the relevance of search results, the outputs may not always be exhaustive or accurate. The system relies on public search APIs like DuckDuckGo and Google Patents. Any restrictions or inaccuracies in these APIs will affect the results. Ensure compliance with local regulations and ethical considerations while using this tool. Do not misuse it for infringing upon intellectual property rights. Using APIs such as DuckDuckGo or Google Gemini may be subject to rate limits or usage restrictions based on their respective policies.
- Integration with Semantic Scholar for academic references.
- Enhanced LLM fine-tuning for industry-specific searches.
- Real-time result clustering and visualization.
This project is licensed under the MIT License. See the LICENSE file for details.
- LangChain for providing robust tools for RAG workflows.
- DuckDuckGo and Google Search API for seamless web search integration.
- Google Gemini API for state-of-the-art LLM capabilities.