🚀 Unlock the Power of AI-Powered Web Scraping!
Welcome to the AI Web Scraper Chatbot! This tool pairs web scraping with a LLaMA-based language model so you can scrape a page, preview its DOM content, and ask questions about it or describe the data you want extracted, all through an interactive chatbot interface.
- 🖥️ Web Scraping: Enter a URL, and the tool scrapes the website content in real time.
- 📄 DOM Content Viewer: Instantly preview the website's DOM content for better understanding.
- 💬 Conversational Parsing: Ask questions about the content, and the chatbot will extract the information you need.
- 🔍 Smart Data Extraction: Describe the data you want in plain language, and the AI model returns the content that matches your description.
- 💻 Frontend: Streamlit for a seamless and interactive user interface.
- ⚙️ Backend: Python-based web scraping using Selenium, BeautifulSoup, and requests.
- 🧠 AI Integration: LLaMA model for smart text parsing and chatbot functionality.
- 📦 Libraries:
  - Selenium for automated web interactions
  - BeautifulSoup for HTML parsing
  - LLaMA for AI-driven response generation
  - Streamlit for building interactive UI
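To give a feel for the HTML parsing stage, here is a minimal sketch of turning scraped page source into readable text for the DOM Content Viewer. It is not the project's code: the project uses BeautifulSoup for this, while the sketch swaps in Python's standard-library `html.parser` so it runs without extra dependencies. The names `VisibleTextExtractor` and `clean_dom` are hypothetical.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes while skipping <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped tag
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        # Put each text fragment on its own line, then drop blank lines.
        lines = (line.strip() for line in "\n".join(self._parts).splitlines())
        return "\n".join(line for line in lines if line)


def clean_dom(html):
    """Return the visible text of an HTML document (hypothetical helper)."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = "<html><body><h1>Title</h1><script>var x = 1;</script><p>Hello</p></body></html>"
    print(clean_dom(page))
```

In the actual app, the page source would come from Selenium rather than a hard-coded string.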
- Step 1: Enter a URL you want to scrape.
- Step 2: Preview the DOM content of the webpage.
- Step 3: Ask questions or describe the specific data you want to extract.
- Step 4: The AI model will parse the content and provide relevant answers based on your query.
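Steps 3 and 4 can be sketched roughly as follows, assuming the DOM text is split into chunks so each one fits in the model's context window. This is an illustration under stated assumptions, not the project's implementation: `split_dom_content`, `parse_chunks`, `PROMPT_TEMPLATE`, and the 6000-character default are hypothetical, and `ask_model` stands in for whatever function sends a prompt to the LLaMA model and returns its reply.

```python
def split_dom_content(dom_content, max_length=6000):
    """Split text into consecutive chunks of at most max_length characters."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    return [
        dom_content[i:i + max_length]
        for i in range(0, len(dom_content), max_length)
    ]


# Hypothetical prompt wording; the real app may phrase this differently.
PROMPT_TEMPLATE = (
    "Extract only the information matching this description: {description}\n"
    "If nothing matches, return an empty string.\n\n"
    "Content:\n{chunk}"
)


def parse_chunks(chunks, description, ask_model):
    """Query the model once per chunk and join the non-empty answers.

    ask_model is any callable that takes a prompt string and returns the
    model's reply as a string (an assumption made for this sketch).
    """
    results = []
    for chunk in chunks:
        prompt = PROMPT_TEMPLATE.format(description=description, chunk=chunk)
        answer = ask_model(prompt).strip()
        if answer:
            results.append(answer)
    return "\n".join(results)
```

Passing the model call in as a parameter keeps the sketch independent of any particular LLaMA client and makes it easy to try with a stub function.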
Below is the flow of how the system works:
1. Scrape Website → 2. Extract DOM → 3. Chat with AI → 4. Get Parsed Results
- Clone the repository: `git clone https://github.com/naveennk045/AI-WebScraper`
- Install the required dependencies: `pip install -r requirements.txt`
- Run the Streamlit app: `streamlit run app.py`
- Usage: Open your browser and navigate to `http://localhost:8501`. Enter a URL in the chatbox to scrape and start interacting!
We welcome contributions! Feel free to fork the repository, raise issues, or submit PRs to help make this tool even better.
For any queries or support, feel free to reach out:
- Email: naveennk045@gmail.com
This project is licensed under the MIT License. See the LICENSE file for more details.
Special thanks to the open-source community and libraries that made this project possible! 🙏
