chore(docs): update blog posts for community bonding and weeks 1-3

smilingprogrammer · smilingprogrammer · commit 02ae5fe793ca · 2025-06-28T13:34:00.000+01:00
diff --git a/docs/2025/data-pipeline/index.md b/docs/2025/data-pipeline/index.md
@@ -30,7 +30,7 @@ Currently, Safaa provides a strong framework designed to deal with copyright not
 4. Preprocessing Tools
 
 However, Currently in the Safaa Project, data is manually curated And we see that most of the things are manual here. 
-This project wil concentrate on creating a pipeline, Utilizing LLMS if required to increase the accuracy, or use deep learning techniques to improve. 
+This project will concentrate on creating a pipeline, utilizing LLMs if required to increase accuracy, or using deep learning techniques to improve it.
 
 Writing scripts to copy copyright data automatically(group's data or some users data) from fossology instance to train the model.
 
diff --git a/docs/2025/data-pipeline/updates/2025-06-04.md b/docs/2025/data-pipeline/updates/2025-06-04.md
@@ -7,17 +7,17 @@ tags: [gsoc25, Safaa Data for Pipeline]
 <!--
 SPDX-License-Identifier: CC-BY-SA-4.0
 
-SPDX-FileCopyrightText: 2024 Shreya Gautam <oyewaleabdulsobur@gmail.com>
+SPDX-FileCopyrightText: 2025 Abdulsobur Oyewale <oyewaleabdulsobur@gmail.com>
 -->
 
 # WEEK 1
-*(May 30, 2024)*
+*(June 4, 2025)*
 
 ## Attendees:
 - [Shaheem Azmal M MD](https://github.com/shaheemazmalmmd)
 - [Ayush Kumar Bhardwaj](https://github.com/hastagAB)
 
-### Enagagements
+### Engagements
 * I engaged in the installation of Fossology locally, and solved the obstacle of working with Windows. Since Fossology installation guide works best with Linux, I was able to achieve this installation with WSL2.
 * I also conducted various examples on the Safaa agent to tests out it features and functionalities which also gives me the insight of how it currently works. You can find this here.
 
@@ -29,4 +29,3 @@ SPDX-FileCopyrightText: 2024 Shreya Gautam <oyewaleabdulsobur@gmail.com>
 
 ## Subsequent Steps
 * I was tasked to begin with the first task in the project list which is about the creation of script to get copyright data from a fossology instance.
-* 
diff --git a/docs/2025/data-pipeline/updates/2025-06-11.md b/docs/2025/data-pipeline/updates/2025-06-11.md
@@ -0,0 +1,37 @@
+---
+title: Week 2
+author: Abdulsobur Oyewale
+tags: [gsoc25, Safaa Data for Pipeline]
+---
+
+<!--
+SPDX-License-Identifier: CC-BY-SA-4.0
+
+SPDX-FileCopyrightText: 2025 Abdulsobur Oyewale <oyewaleabdulsobur@gmail.com>
+-->
+
+# WEEK 2
+*(June 11, 2025)*
+
+## Attendees:
+- [Shaheem Azmal M MD](https://github.com/shaheemazmalmmd)
+- [Ayush Kumar Bhardwaj](https://github.com/hastagAB)
+- [Kaushlendra Pratap](https://github.com/Kaushl2208)
+
+### Engagements
+* This week I started full engagement with this year's project. The first task on the list is the creation of a script to fetch copyright content from the fossology server.
+* I started by writing SQL queries to fetch this content from the fossology server, and after several rounds of tweaking I was able to achieve this goal.
+* After successfully writing the SQL script to fetch the required content from the fossology server, I proceeded to write a Python program that embeds the PostgreSQL script and uses the psycopg library to connect to the Postgres database server.
+* With this, I was able to automate the collection of copyright content data from the fossology server running on localhost.
+
+
+## Meeting Discussion:
+* I discussed the week's progress with the mentors and how the project is going, including whether there were any obstacles.
+* We discussed the current progress, namely the script that fetches content from the fossology localhost server.
+* I also gave them a demo to show them how it works and the expected output from the script.
+
+
+## Subsequent Steps
+* I was tasked with including a timestamp with the generated data, so as to track the sequence of data updates.
+* I was also told to change the script to accommodate various server configurations by placing the server configuration in a `.env` file.
+* I will also continue with the preprocessing script, which will allow us to preprocess the data that the fetch script retrieved from the fossology server.
diff --git a/docs/2025/data-pipeline/updates/2025-06-18.md b/docs/2025/data-pipeline/updates/2025-06-18.md
@@ -0,0 +1,36 @@
+---
+title: Week 3
+author: Abdulsobur Oyewale
+tags: [gsoc25, Safaa Data for Pipeline]
+---
+
+<!--
+SPDX-License-Identifier: CC-BY-SA-4.0
+
+SPDX-FileCopyrightText: 2025 Abdulsobur Oyewale <oyewaleabdulsobur@gmail.com>
+-->
+
+# WEEK 3
+*(June 18, 2025)*
+
+## Attendees:
+- [Ayush Kumar Bhardwaj](https://github.com/hastagAB)
+- [Kaushlendra Pratap](https://github.com/Kaushl2208)
+
+### Engagements
+* This week I began with the second task on the list, which is the creation of a script to preprocess copyright content from the fossology server.
+* I was informed last week of a pre-written script in the Safaa codebase that I can utilize to complete this task faster.
+* I began by writing out this pre-written script, reading the code, and understanding it before modifying it to suit our intent.
+* After completing the above task, I modified the script to match our intent. With this, I was able to preprocess the data we retrieved from the fossology server running on localhost.
+
+
+## Meeting Discussion:
+* I discussed the week's progress with the mentors and how the project is going, including whether there were any obstacles.
+* We discussed the current progress, which is the preprocessing of data fetched from the fossology localhost server using the available pre-written script.
+* I also gave them a demo to show them how it works and the expected output from the script.
+* I was told the task needs to be modified so that it can be triggered using GitHub Actions rather than manually by running the script.
+
+
+## Subsequent Steps
+* Given that we already have a working preprocessing script, I was tasked with modifying it to be triggered with GitHub Actions.
+* I will continue with the task above during the next week.