Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
singhayush7 committed Dec 22, 2024
1 parent 64bd1bb, commit 6f47601
Showing 20 changed files with 687 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
# Logs | ||
logs | ||
*.log | ||
npm-debug.log* | ||
yarn-debug.log* | ||
yarn-error.log* | ||
pnpm-debug.log* | ||
lerna-debug.log* | ||
|
||
node_modules | ||
dist | ||
dist-ssr | ||
*.local | ||
|
||
# Editor directories and files | ||
.vscode/* | ||
!.vscode/extensions.json | ||
.idea | ||
.DS_Store | ||
*.suo | ||
*.ntvs* | ||
*.njsproj | ||
*.sln | ||
*.sw? |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,161 @@ | ||
**AI-Powered Article Recommendation System** | ||
============================================ | ||
|
||
An **AI-driven article recommendation engine** that retrieves **relevant articles** from a dataset of over **2 million articles**. It returns **context-aware article suggestions** in real time using **vector search** and **natural language processing (NLP)**. | ||
|
||
**Demo** | ||
-------- | ||
|
||
![Article Recommendation Engine Demo](https://github.com/lancedb/assets/blob/main/recipes/article_recommendation_engine.gif) | ||
|
||
|
||
* * * * * | ||
|
||
**Features** | ||
------------ | ||
|
||
- 🔍 **Keyword-Based Search**: Enter any keyword or phrase to get the **top 10 relevant articles**. | ||
- 🌐 **Massive Dataset Support**: Efficiently processes and retrieves results from a **dataset of over 2 million articles**. | ||
- 📈 **High Precision Recommendations**: Articles are ranked based on semantic similarity and relevance using state-of-the-art embeddings. | ||
- 🧠 **AI-Powered Relevance**: Built with **LangChain.js** and **LanceDB** for robust NLP and vector search capabilities. | ||
|
||
* * * * * | ||
|
||
**How It Works** | ||
---------------- | ||
|
||
1. **Data Preprocessing**: Articles are divided into smaller, context-preserving chunks using **RecursiveCharacterTextSplitter**.\ | ||
Example configuration: | ||
|
||
`const splitter = new RecursiveCharacterTextSplitter({ | ||
chunkSize: 25000, // Maximum characters per chunk; tune for your dataset | ||
chunkOverlap: 1, // Characters shared between consecutive chunks | ||
});` | ||
|
||
2. **Vector Embedding**: The preprocessed data is embedded using **OpenAIEmbeddings**. | ||
3. **Efficient Storage**: Embedded vectors are stored in **LanceDB**, optimized for high-speed similarity search. | ||
4. **Query and Retrieval**: User input is embedded and matched against the stored vectors to retrieve the **10 most semantically similar articles**. | ||
|
||
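The four steps above can be sketched in plain JavaScript. This is a simplified illustration, not the LangChain.js or LanceDB API: `splitWithOverlap`, `cosineSimilarity`, and `topK` are hypothetical names, the splitter cuts fixed-size character windows rather than splitting recursively on separators, and the 2-dimensional vectors are toy stand-ins for real embeddings.

```javascript
// Illustrative only: simplified stand-ins for the pipeline steps.
// None of these functions come from LangChain.js or LanceDB.

// Step 1 (simplified): fixed-size character chunks that share `overlap` characters.
function splitWithOverlap(text, chunkSize, overlap) {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const chunks = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // Last window reached the end.
  }
  return chunks;
}

// Step 4 (simplified): cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored row against the query and keep the k best matches.
function topK(queryVector, rows, k) {
  return rows
    .map((row) => ({ ...row, score: cosineSimilarity(queryVector, row.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy data: 2-dimensional vectors stand in for real embeddings.
const rows = [
  { title: "A", vector: [1, 0] },
  { title: "B", vector: [0, 1] },
  { title: "C", vector: [0.9, 0.1] },
];
const chunks = splitWithOverlap("abcdefghij", 4, 1);
const best = topK([1, 0], rows, 2);
console.log(chunks, best.map((row) => row.title));
```

A vector database avoids scoring every row on each query by using an index; the brute-force scan here only shows what "semantically similar" means numerically.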
* * * * * | ||
|
||
**Technical Highlights** | ||
------------------------ | ||
|
||
- **Advanced Vector Search**: Uses LanceDB to enable fast and scalable similarity searches across millions of articles. | ||
- **Real-Time Results**: The system retrieves and ranks articles within milliseconds. | ||
- **Customizable Dataset**: Easily replace the default dataset or upload custom datasets in `.csv` or `.txt` formats. | ||
|
||
* * * * * | ||
|
||
**Use Cases** | ||
------------- | ||
|
||
- **Research and Academic Work**: Find articles that are most relevant to your research topic. | ||
- **Content Curation**: Discover the best content for blogs, newsletters, or social media. | ||
- **Media Monitoring**: Track trends and news articles efficiently. | ||
- **Educational Insights**: Access curated learning material on any subject. | ||
|
||
* * * * * | ||
|
||
**Getting Started** | ||
------------------- | ||
|
||
### **1\. Prerequisites** | ||
|
||
- **Node.js** version **20+** | ||
- A valid [OpenAI API Key](https://platform.openai.com/signup) | ||
|
||
### **2\. Installation** | ||
|
||
Clone the repository and install dependencies: | ||
|
||
|
||
`git clone <repository-url> | ||
cd <repository-folder> | ||
npm install` | ||
|
||
### **3\. Configure API Key** | ||
|
||
Add your OpenAI API key in `.env`: | ||
|
||
`OPENAI_API_KEY=your_openai_key` | ||
|
||
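A minimal sketch of a fail-fast check for the key, assuming the variable has already been loaded into `process.env` (for example with Node's `--env-file=.env` flag, available from Node 20.6). `requireEnv` is a hypothetical helper, not part of this repository.

```javascript
// Hypothetical helper: fail fast when a required environment variable is missing.
// The second parameter lets callers pass a plain object instead of process.env.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value.trim();
}

// Example with an explicit object standing in for process.env.
const apiKey = requireEnv("OPENAI_API_KEY", { OPENAI_API_KEY: "your_openai_key" });
console.log(apiKey.length > 0);
```

Checking at startup surfaces a missing key immediately instead of at the first embedding request.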
* * * * * | ||
|
||
|
||
### **4\. Add your data source** | ||
|
||
Place your data file in `src/Backend/dataSourceFiles` and name it `news.csv`. | ||
If you use a different name, update the data source path in `langChainProcessor.mjs`. | ||
|
||
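As a rough sketch of how parsed CSV rows might be turned into documents before embedding: `rowToDocument` is hypothetical, and the column names `title`, `author`, and `content` are assumptions chosen to mirror the metadata fields in the API response. Adjust them to match your file's header.

```javascript
// Hypothetical mapping from one parsed CSV row (a plain object keyed by column
// name) to a document with metadata. Column names are assumptions.
function rowToDocument(row, snippetLength = 200) {
  const content = (row.content ?? "").trim();
  if (content === "") return null; // Skip rows without article text.
  return {
    pageContent: content,
    metadata: {
      title: (row.title ?? "Untitled").trim(),
      author: (row.author ?? "Unknown").trim(),
      content: content.slice(0, snippetLength), // Short snippet for display.
    },
  };
}

// Sample rows shaped like the objects a CSV parser would emit.
const parsedRows = [
  { title: "Sample Title", author: "Author Name", content: "Full text of the article." },
  { title: "Empty", author: "Nobody", content: "   " },
];
const documents = parsedRows
  .map((row) => rowToDocument(row))
  .filter((doc) => doc !== null);
console.log(documents.length);
```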
* * * * * | ||
|
||
### **5\. Running the System** | ||
|
||
Use Node.js version 20 or later. | ||
|
||
`npm install` | ||
|
||
#### Run Backend Server: | ||
|
||
`npm run server` | ||
|
||
#### Run Frontend Dev Server (use `npm start` to run backend and frontend together): | ||
|
||
|
||
`npm run dev` | ||
|
||
Access the app at: | ||
|
||
`http://localhost:5173` | ||
|
||
* * * * * | ||
|
||
**Customizing the Dataset** | ||
--------------------------- | ||
|
||
You can upload or replace the dataset for customized recommendations: | ||
|
||
1. Navigate to `src/Backend/dataSourceFiles`. | ||
2. Replace the existing `.csv` or `.txt` file with your dataset. | ||
3. Restart the backend server to process the new dataset. | ||
|
||
For example, to use the **All the News 2 Dataset**:\ | ||
[A 180 MB dataset, used to build this app](https://components.one/datasets/above-the-fold)\ | ||
[All the News 2 Dataset](https://components.one/datasets/all-the-news-2-news-articles-dataset) | ||
|
||
* * * * * | ||
|
||
**API Overview** | ||
---------------- | ||
|
||
**Endpoint**: `/api/articles`\ | ||
**Method**: `POST`\ | ||
**Request Body**: | ||
|
||
`{ | ||
"text": "Your keyword here" | ||
}` | ||
|
||
**Response**: | ||
|
||
`{ | ||
"result": [ | ||
{ | ||
"metadata": { | ||
"title": "Sample Title", | ||
"author": "Author Name", | ||
"content": "Snippet of the article..." | ||
} | ||
} | ||
] | ||
}` | ||
|
||
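A minimal client sketch for the endpoint described above. `fetchArticles` is a hypothetical helper, and the default base URL is an assumption: this README does not state which port the backend listens on, so replace it with your server's address. The `fetchImpl` option exists only so the function can be exercised with a stub.

```javascript
// Hypothetical client helper for POST /api/articles.
// Assumption: the backend address below is a placeholder, not a documented value.
async function fetchArticles(text, { baseUrl = "http://localhost:3000", fetchImpl = fetch } = {}) {
  const response = await fetchImpl(`${baseUrl}/api/articles`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const { result } = await response.json();
  // Flatten each hit to the metadata fields shown in the response example.
  return result.map(({ metadata }) => ({
    title: metadata.title,
    author: metadata.author,
    content: metadata.content,
  }));
}

// Usage (requires a running backend):
// const articles = await fetchArticles("Your keyword here");
```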
* * * * * | ||
|
||
**Future Enhancements** | ||
----------------------- | ||
|
||
- **Support for Multi-Modal Datasets**: Images, PDFs, and multimedia support. | ||
- **Interactive Filters**: Filter results by date, author, or publication. | ||
- **Deployable Cloud Versions**: Ready-to-deploy solutions for AWS, Vercel, and Netlify. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
import js from '@eslint/js' | ||
import globals from 'globals' | ||
import react from 'eslint-plugin-react' | ||
import reactHooks from 'eslint-plugin-react-hooks' | ||
import reactRefresh from 'eslint-plugin-react-refresh' | ||
|
||
export default [ | ||
{ ignores: ['dist'] }, | ||
{ | ||
files: ['**/*.{js,jsx}'], | ||
languageOptions: { | ||
ecmaVersion: 2020, | ||
globals: globals.browser, | ||
parserOptions: { | ||
ecmaVersion: 'latest', | ||
ecmaFeatures: { jsx: true }, | ||
sourceType: 'module', | ||
}, | ||
}, | ||
settings: { react: { version: '18.3' } }, | ||
plugins: { | ||
react, | ||
'react-hooks': reactHooks, | ||
'react-refresh': reactRefresh, | ||
}, | ||
rules: { | ||
...js.configs.recommended.rules, | ||
...react.configs.recommended.rules, | ||
...react.configs['jsx-runtime'].rules, | ||
...reactHooks.configs.recommended.rules, | ||
'react/jsx-no-target-blank': 'off', | ||
'react-refresh/only-export-components': [ | ||
'warn', | ||
{ allowConstantExport: true }, | ||
], | ||
}, | ||
}, | ||
] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
<!doctype html> | ||
<html lang="en"> | ||
<head> | ||
<meta charset="UTF-8" /> | ||
<meta name="viewport" content="width=device-width, initial-scale=1.0" /> | ||
<title>Article</title> | ||
</head> | ||
<body> | ||
<div id="root"></div> | ||
<script type="module" src="/src/main.jsx"></script> | ||
</body> | ||
</html> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
{ | ||
"name": "article-recommender", | ||
"private": true, | ||
"version": "0.0.0", | ||
"type": "module", | ||
"scripts": { | ||
"dev": "vite", | ||
"build": "vite build", | ||
"lint": "eslint .", | ||
"preview": "vite preview", | ||
"server": "node src/backend/server.mjs", | ||
"start": "npm-run-all --parallel server dev" | ||
}, | ||
"dependencies": { | ||
"@heroicons/react": "^2.2.0", | ||
"@lancedb/lancedb": "^0.12.0", | ||
"@langchain/community": "^0.3.1", | ||
"@langchain/openai": "^0.3.14", | ||
"@phosphor-icons/react": "^2.1.7", | ||
"@testing-library/jest-dom": "^5.17.0", | ||
"@testing-library/react": "^13.4.0", | ||
"@testing-library/user-event": "^13.5.0", | ||
"body-parser": "^1.20.3", | ||
"cors": "^2.8.5", | ||
"csv-parser": "^3.0.0", | ||
"express": "^4.21.2", | ||
"fs": "^0.0.1-security", | ||
"langchain": "^0.3.7", | ||
"multer": "^1.4.5-lts.1", | ||
"phosphor-react": "^1.4.1", | ||
"react": "^18.3.1", | ||
"react-dom": "^18.3.1", | ||
"react-quill": "^2.0.0", | ||
"react-scripts": "5.0.1", | ||
"vectordb": "^0.1.19", | ||
"web-vitals": "^2.1.4" | ||
}, | ||
"devDependencies": { | ||
"@eslint/js": "^9.15.0", | ||
"@types/react": "^18.3.12", | ||
"@types/react-dom": "^18.3.1", | ||
"@vitejs/plugin-react": "^4.3.4", | ||
"autoprefixer": "^10.4.20", | ||
"eslint": "^9.15.0", | ||
"eslint-plugin-react": "^7.37.2", | ||
"eslint-plugin-react-hooks": "^5.0.0", | ||
"eslint-plugin-react-refresh": "^0.4.14", | ||
"globals": "^15.12.0", | ||
"npm-run-all": "^4.1.5", | ||
"postcss": "^8.4.49", | ||
"tailwindcss": "^3.4.16", | ||
"vite": "^6.0.1" | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
export default { | ||
plugins: { | ||
tailwindcss: {}, | ||
autoprefixer: {}, | ||
}, | ||
} |