Commit 02ae5fe

chore(docs): updates blog posts for community bonding,and weeks 1-3
1 parent 27ae68e commit 02ae5fe

4 files changed

+77
-5
lines changed

docs/2025/data-pipeline/index.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ Currently, Safaa provides a strong framework designed to deal with copyright not
3030
4. Preprocessing Tools
3131

3232
However, data in the Safaa project is currently curated manually, and most of the workflow around it is manual as well.
33-
This project wil concentrate on creating a pipeline, Utilizing LLMS if required to increase the accuracy, or use deep learning techniques to improve.
33+
This project will concentrate on creating a pipeline, utilizing LLMs if required to increase the accuracy, or using deep learning techniques to improve it.
3434

3535
Writing scripts to copy copyright data automatically (a group's data or some users' data) from a fossology instance to train the model.
3636

docs/2025/data-pipeline/updates/2025-06-04.md

Lines changed: 3 additions & 4 deletions
@@ -7,17 +7,17 @@ tags: [gsoc25, Safaa Data for Pipeline]
77
<!--
88
SPDX-License-Identifier: CC-BY-SA-4.0
99
10-
SPDX-FileCopyrightText: 2024 Shreya Gautam <oyewaleabdulsobur@gmail.com>
10+
SPDX-FileCopyrightText: 2025 Abdulsobur Oyewale <oyewaleabdulsobur@gmail.com>
1111
-->
1212

1313
# WEEK 1
14-
*(May 30, 2024)*
14+
*(June 4, 2024)*
1515

1616
## Attendees:
1717
- [Shaheem Azmal M MD](https://github.com/shaheemazmalmmd)
1818
- [Ayush Kumar Bhardwaj](https://github.com/hastagAB)
1919

20-
### Enagagements
20+
### Engagements
2121
* I installed Fossology locally and worked around the obstacle of running it on Windows. Since the Fossology installation guide works best with Linux, I completed the installation with WSL2.
2222
* I also ran various examples on the Safaa agent to test out its features and functionality, which gave me insight into how it currently works. You can find this here.
2323

@@ -29,4 +29,3 @@ SPDX-FileCopyrightText: 2024 Shreya Gautam <oyewaleabdulsobur@gmail.com>
2929

3030
## Subsequent Steps
3131
* I was tasked to begin with the first task in the project list, which is the creation of a script to get copyright data from a fossology instance.
32-
*
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
1+
---
2+
title: Week 2
3+
author: Abdulsobur Oyewale
4+
tags: [gsoc25, Safaa Data for Pipeline]
5+
---
6+
7+
<!--
8+
SPDX-License-Identifier: CC-BY-SA-4.0
9+
10+
SPDX-FileCopyrightText: 2025 Abdulsobur Oyewale <oyewaleabdulsobur@gmail.com>
11+
-->
12+
13+
# WEEK 2
14+
*(June 11, 2024)*
15+
16+
## Attendees:
17+
- [Shaheem Azmal M MD](https://github.com/shaheemazmalmmd)
18+
- [Ayush Kumar Bhardwaj](https://github.com/hastagAB)
19+
- [Kaushlendra Pratap](https://github.com/Kaushl2208)
20+
21+
### Engagements
22+
* This week I started full engagement with this year's project. The first task on the list is the creation of a script to fetch copyright content from the fossology server.
23+
* I started by writing SQL queries to fetch this content from the fossology server, and after several rounds of tweaking I was able to achieve this goal.
24+
* After successfully writing the SQL script to fetch the required content from the fossology server, I wrote a Python program that embeds the PostgreSQL query and uses the psycopg library to connect to the Postgres database server.
25+
* With this, I was able to automate the collection of copyright content data from the fossology server running on localhost.
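
The fetch step described above can be sketched roughly as follows. This is a minimal illustration, not the script from the project: the table name `copyright`, the column `content`, and the function name `fetch_copyright_rows` are assumptions and should be checked against the actual fossology database schema. The function accepts any DB-API connection, so it would work with a connection opened through a PostgreSQL driver such as psycopg2.

```python
from typing import Any, List, Tuple

# Hypothetical query: the table and column names are assumptions made for
# illustration and must be verified against the real fossology schema.
COPYRIGHT_QUERY = """
    SELECT content
    FROM copyright
    WHERE content IS NOT NULL
    ORDER BY content
"""


def fetch_copyright_rows(conn: Any, query: str = COPYRIGHT_QUERY) -> List[Tuple]:
    """Run the query on an open DB-API connection and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(query)
        return cur.fetchall()
    finally:
        # Always release the cursor, even if the query fails.
        cur.close()


# Possible usage with psycopg2 (connection values are placeholders):
#   import psycopg2
#   conn = psycopg2.connect(host="localhost", dbname="...", user="...", password="...")
#   rows = fetch_copyright_rows(conn)
#   conn.close()
```

Keeping the query in one constant and passing the connection in makes it easy to swap the query or point the same function at a different server.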
26+
27+
28+
## Meeting Discussion:
29+
* I discussed the week's progress with the mentors and how the project is going, including whether there were any obstacles.
30+
* We discussed the current progress, which is the content-fetching script for the fossology localhost server.
31+
* I also gave them a demo to show them how it works and the expected output from the script.
32+
33+
34+
## Subsequent Steps
35+
* I was tasked to include a timestamp with the generated data, so as to track the sequence of data updates.
36+
* I was also told to change the script to accommodate various server configurations by placing the server configuration in a `.env` file.
37+
* I will also continue with the preprocessing script, which will allow us to preprocess the data fetched from the fossology server.
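
The two requested changes (server settings read from the environment and a timestamp attached to the generated data) could look something like the sketch below. The variable names `DB_HOST`, `DB_PORT`, `DB_NAME`, `DB_USER`, and `DB_PASSWORD` and both function names are hypothetical; the post does not say which keys the real `.env` file uses.

```python
import os
from datetime import datetime, timezone
from typing import Dict, Mapping, Optional

# Hypothetical setting names; the real `.env` keys are not given in the post.
DEFAULTS = {"DB_HOST": "localhost", "DB_PORT": "5432"}
REQUIRED = ("DB_NAME", "DB_USER", "DB_PASSWORD")


def load_db_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect connection settings from the environment, failing early if any are missing."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    config = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    config.update({key: env[key] for key in REQUIRED})
    return config


def timestamped_name(prefix: str, now: Optional[datetime] = None, ext: str = "csv") -> str:
    """Build an output file name that carries a UTC timestamp, so exports sort by time."""
    now = datetime.now(timezone.utc) if now is None else now.astimezone(timezone.utc)
    return f"{prefix}_{now.strftime('%Y%m%dT%H%M%SZ')}.{ext}"
```

A `.env` file can be loaded into the environment before calling `load_db_config()`, for example with `load_dotenv()` from the python-dotenv package. Putting the timestamp in the file name is only one option; a timestamp column in the data itself would serve the same purpose.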
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
1+
---
2+
title: Week 3
3+
author: Abdulsobur Oyewale
4+
tags: [gsoc25, Safaa Data for Pipeline]
5+
---
6+
7+
<!--
8+
SPDX-License-Identifier: CC-BY-SA-4.0
9+
10+
SPDX-FileCopyrightText: 2025 Abdulsobur Oyewale <oyewaleabdulsobur@gmail.com>
11+
-->
12+
13+
# WEEK 3
14+
*(June 18, 2024)*
15+
16+
## Attendees:
17+
- [Ayush Kumar Bhardwaj](https://github.com/hastagAB)
18+
- [Kaushlendra Pratap](https://github.com/Kaushl2208)
19+
20+
### Engagements
21+
* This week I began with the second task on the list, which is the creation of a script to preprocess copyright content from the fossology server.
22+
* I was informed last week of a pre-written script in the Safaa codebase which I can use to complete this task faster.
23+
* I began by working through this pre-written script, reading the code and understanding it before modifying it to suit our intent.
24+
* After completing the above task, I modified the script to match our intent. With this, I was able to preprocess the data we retrieved from the fossology server running on localhost.
25+
26+
27+
## Meeting Discussion:
28+
* I discussed the week's progress with the mentors and how the project is going, including whether there were any obstacles.
29+
* We discussed the current progress, which is the preprocessing of data fetched from the fossology localhost server using the available pre-written script.
30+
* I also gave them a demo to show them how it works and the expected output from the script.
31+
* I was told the task needs to be modified so that it can be triggered using GitHub Actions, and not manually by running the script.
32+
33+
34+
## Subsequent Steps
35+
* Given that we already have a working preprocessing script, I was tasked to modify it to be triggered with GitHub Actions.
36+
* I will continue with the task above next week.
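
For a script to be triggered by a CI system such as GitHub Actions rather than run by hand, it generally needs a non-interactive entry point that takes its inputs as arguments and reports failure through its exit code. The sketch below shows one way to shape that. The `--input` and `--output` flags and the cleanup in `preprocess_lines` are placeholders for illustration and are not Safaa's actual preprocessing.

```python
import argparse
import sys
from pathlib import Path
from typing import List, Optional


def preprocess_lines(lines: List[str]) -> List[str]:
    """Placeholder cleanup: collapse runs of whitespace and drop empty lines."""
    cleaned = (" ".join(line.split()) for line in lines)
    return [line for line in cleaned if line]


def main(argv: Optional[List[str]] = None) -> int:
    """Parse arguments, preprocess the input file, and return a process exit code."""
    parser = argparse.ArgumentParser(description="Preprocess fetched copyright data.")
    parser.add_argument("--input", required=True, type=Path)
    parser.add_argument("--output", required=True, type=Path)
    args = parser.parse_args(argv)

    if not args.input.is_file():
        # A non-zero return value lets the calling job be marked as failed.
        print(f"input file not found: {args.input}", file=sys.stderr)
        return 1

    lines = args.input.read_text(encoding="utf-8").splitlines()
    result = preprocess_lines(lines)
    args.output.write_text("\n".join(result) + "\n", encoding="utf-8")
    print(f"wrote {len(result)} lines to {args.output}")
    return 0


# When saved as a script, finish with:
#   if __name__ == "__main__":
#       sys.exit(main())
```

A workflow step would then only need to invoke the script with the two paths, and a missing input file would fail the job instead of passing silently.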
