Exploring the Landscape of the Egyptian Software Market

A Data-driven Approach

Our story begins with a quest for knowledge. The information we obtained from LinkedIn offers us unprecedented insights into the landscape of the Egyptian Software Industry.

If you love stories, you can check our conversation with Hazem and his friend Wael from here.

The Data

We have collected the data for over 1,000 software companies.

And over 1,000 software professionals.

From more than 30 universities.

Emphasizing the following aspects.

Insights

We have analyzed and tortured the data to extract the following insights.

New rising and available professions in the market.
Effect of student activities on your career.
Effect of college on your career.
Turn over rates in Egypt.
Factors affecting companies' turnover.
ITI effect on the market.
Shifting career in egypt.

Data Collection

Our data contains the scraped LinkedIn profiles of Egyptian Software Engineers and their companies. We collected different sections from each profile such as experience, skills, education, volunteering experience, and licenses and certificates. We also collected data about the companies such as the company name, company size, headquarters, industry, company description, etc…

Challenges

When we started building our scraper that will extract the needed data from the profiles, we were surprised that LinkedIn doesn’t allow web scraping bots to crawl the pages. In addition, the request needs to be authenticated in order to get the profile data. We thought of using Selenium and making it behave like a human to login and search for profiles and collect the data in a slower manner, but we found out that LinkedIn has a limit of 200-500 profile views per day and would put your account under the possibility of getting banned. So, we decided to create fake accounts on LinkedIn and let Selenium use them to collect as many profiles as possible. Unfortunately, after leaving the scrapers for a few hours, we checked the downloaded profiles’ HTML and found out that only 3-10 profiles were downloaded and the rest of the files were Sign Up pages which means that LinkedIn locked the scraper from reaching the profiles. At last, we decided to do it manually and visit people’s profiles and download the HTML. We were capable of downloading about 1000 personal profiles and more than 1000 company profiles. We know that this isn’t enough and the results may not be very accurate but we tried to answer the questions in a way that enables us to get more accurate answers as more data we have.

Data Preprocessing

After downloading the HTML, we created different parsers using JavaScript to extract the needed data from the HTML files. At first, we extracted the section from each personal profile into separate files. Then created a parser for each section, Experience, Education, Volunteering Experience, Licenses and Certifications, Skills, and Languages. Each parser takes the HTML of its section and creates a JSON containing the extracted data. Lastly, we cleaned and normalized the JSON data and created CSV files in order to work with it easily.

Parsing

Experience

Transform each position into a JSON object with the following attributes

Attribute	Description
Company	The name of the company
CompanyUrl	The URL of the company that the person worked in
Duration	The duration of the person in the position (in months)
EmploymentType	Full-time, Parme, or Internship
From	The starting date in the position (MM YYYY)
Location	The location of the person during this job position
Title	The job title of the person in the company
To	The end date in the position (MM YYYY)
User	The username of the person

Education Parser

Transform each education into a JSON object with the following attributes:

Attribute	Description
Date	The start and end dates of the education
Degree	The degree that the person obtained at the university
University	The name of the university attended by the person
User	The username of the person

Licenses and Certifications Parser

Transform each license/certificate into a JSON object with these attributes:

Attribute	Description
Date	The date of getting the license/certificate
Issuer	The issuer of the license/certificate
Title	The name of the license/certificate
User	The username of the person

Volunteering Parser

Transform each volunteering experience into a JSON object with these attributes

Attribute	Description
Date	The start and end dates of the volunteering work
Organization	The name of the organization
Role	The volunteering role that the person did
User	The username of the person

Skills Parser

Transform each skill into a JSON object with the following attributes

Attribute	Description
Skill1	The first featured skill in the person’s profile
Skill2	The second featured skill in the person’s profile
Skill3	The third featured skill in the person’s profile
User	The username of the person

Languages Parser

Transform each language into a JSON object with the following attributes

Attribute	Description
Proficiency	The person’s proficiency in the language
Title	The name of the language
User	The username of the person

Normalization

We had to normalize the data we collected from the parser and create specific tags for each categorical feature to deal with a finite set of variables. Also, we lowered case all the strings and removed any unnecessary characters like hyphens, apostrophes, quotations, commas, dots, etc…

Experience Normalization

We created a new attribute called title_tag which contains a normalized and specific tag for each field in the software industry like Frontend, Backend, AI, DevOps, etc…. For example, each of the “Machine Learning Engineer”, “Computer Vision Engineer”, and “AI Researcher” job titles would have the same title_tag which is “AI”. Note that we made the “software” tag the default title_tag if none of the other rules didn’t match. Also, we have title tags called “Teaching”, “Internship”, and “Student” which may not be looked at while answering the project’s questions.

Education Normalization

We removed the records that have no degree. And removed faulty education universities like IGCSE, Udacity, Udemy, Coursera, HarvardX, FreeCodeCamp, etc… Also, same as in experience normalization, we created a new attribute called university_name which contains normalized and a specific tag for each university. For example, each of “Ain Shams University”, and “Ain-Shams University” will have a university_name of “ain shams”. Lastly, we tried to infer the faculty name from the university name or from the degree string and assign a specific faculty name for each record in the education CSV file. For example, if the university name or the degree contains any words like “Computer”, “Technology”, “Artificial”, etc…, then the tag faculty_name would be “computer” and so on for the rest of the faculties.

Volunteering Normalization

Skills Normalization

Questions

We have asked the following questions and tried to answer them using the data we collected.

Descriptive questions

What is Job Titles Distribution Across The Software Industry in The Last Years? Detailed Answer
What Is The Percentage Of Turnover In The Market? Detailed Answer

Exploratory questions

What is the Relation Between the High Education and Career? Detailed Answer
What is The Relation Between Joining Student Activities And Working After Graduation? Detailed Answer
What is The Relation Between Company Attributes And The Time Software Engineer Works On It? Detailed Answer

Predictive Questions

Can we Predict the size of the company you will join after graduation? Detailed Answer

Casual questions

Do people have to join ITI to be able to work in the software industry? Detailed Answer

Mechanistic questions

People from different backgrounds shift to the software industry, how do they do so and why? Detailed Answer

Results Interpretation for our Questions

Descriptive questions

What is Job Titles Distribution Across The Software Industry in The Last Years?

As we saw in the notebook, the fields which are related to AI & Date (NLP, Data Science, Data Engineer, Data Analyst, and Machine Learning) are increasing, this can be explained by the increase in the demand for these fields in the last years. Also, the increase in the DevOps (Development and Operations) field can be explained as this field is relatively new, and more and more companies are adopting it.

What Is The Percentage Of Turnover In The Market?

People in the software industry often don't stay long in their companies due to factors such as a dynamic job market with numerous opportunities for career growth and new challenges, the rapid pace of technological advancements leading to the need for upskilling, and the high demand for specialized skills creating a competitive environment where employees may seek better compensation or opportunities elsewhere.

Exploratory questions

What is the Relation Between the High Education and Career? From the data we collected we can describe the effect of higher education on your career in main 3 factors, the skills you gain, the field you will work on and your first company
What is The Relation Between Joining Student Activities And Working After Graduation? It doesn't have much effect in your overall career, however, some companies favour people who have joined student activities before graduation.
What is The Relation Between Company Attributes And The Time Software Engineer Works On It? The founded year is a strong metric in affecting the time employee stays in the company which makes sense because an older company means a more stable & established company & that's a good environment for any employee. The Company size also shows a good metric but not stronger than founded metric. Making a combination between an old company & a big company size would ensure more & more that the employee time within this company will be longer.

Predictive Questions

Can we Predict the size of the company you will join after graduation?

Based on the analysis of our previous models, several key observations can be made:

Performance improvement: Our models have demonstrated superior performance compared to the dummy model in terms of both f1-score and accuracy. This signifies that the chosen features, namely "university, and faculty" significantly influence the company size that individuals are likely to join after completing their graduation. The results suggest a real and meaningful relationship between the current university and faculty attended and subsequent career outcomes.

Learning curve analysis: Upon analyzing the learning curves, it becomes evident that our model is exhibiting signs of overfitting. Despite the low complexity of our models-concluded from the bias-variance analysis-, the overfitting issue is likely attributable to the limited size of our dataset. With a small amount of training data, the model struggles to capture the underlying patterns and complexities of the problem at hand. This limitation hampers the model's ability to generalize and perform optimally on unseen data.

Addressing the overfitting concern requires careful consideration. Collecting more data could potentially help mitigate the issue and enable our models to better capture the intricacies of the relationship between the selected features and the company size.

Casual questions

Do people have to join ITI to be able to work in the software industry?

As we have seen in the data and as we have concluded from the hypothesis testing, we found that ITI is not the only way to work in the software industry. There are many people who have shifted their careers to the software industry without joining ITI. However, some professions are more affected by ITI than others. For example, most of the people working in the DevOps field have joined ITI. This comes from the fact that our universities don't teach DevOps and most of the people working in this field had to join ITI to learn it. The same applies to the UI/UX field, game development, and other fields.

Nevertheless, we can't deny the fact that ITI is one of the best ways to work in the software industry.

Mechanistic questions

People from different backgrounds shift to the software industry, how do they do so and why?

In the causal question, it was proven that there is no significant causal relationship between ITI and people shifting their career to the software industry. Shifting careers to the software industry seems to depend on the faculty individuals graduated from, with a majority of those who shifted their careers having graduated from an electronics major. The reasons for this include studying programming courses in their major, being unable to join CS/CE majors due to acceptance criteria, and having a background in the software industry through common courses. For individuals from other majors, the primary reason for shifting to the software industry is the high salaries offered compared to their original professions. Other factors include the potential for high salary growth, daily challenges that prevent monotony, numerous job opportunities, a culture of mutual respect, flexibility in work hours and location, adding value to society, not requiring formal education in a specific field, and providing fulfillment and satisfaction. Some fields, such as Embedded Systems, UI/UX, and QA, are particularly attractive for individuals from other professions to shift to within the software industry. This is due to their combination of hardware and software (embedded systems), design and software (UI/UX), and the accessibility of the field (QA). Additionally, high salaries, high demand, and work flexibility are common reasons across different fields. The causal question also demonstrated that ITI assists people in transitioning to specific jobs, providing an advantage over CS/CE graduates in professions not typically taught in CS/CE courses.

Name		Name	Last commit message	Last commit date
Latest commit History 61 Commits
data		data
images		images
notebooks		notebooks
.gitignore		.gitignore
README.md		README.md
Story.pdf		Story.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Exploring the Landscape of the Egyptian Software Market

A Data-driven Approach

Table of Contents

The Data

Insights

Data Collection

Challenges

Data Preprocessing

Parsing

Experience

Education Parser

Licenses and Certifications Parser

Volunteering Parser

Skills Parser

Languages Parser

Normalization

Experience Normalization

Education Normalization

Volunteering Normalization

Skills Normalization

Questions

Descriptive questions

Exploratory questions

Predictive Questions

Casual questions

Mechanistic questions

Results Interpretation for our Questions

Descriptive questions

Exploratory questions

Predictive Questions

Casual questions

Mechanistic questions

About

Contributors 3

Languages

Mostafa-wael/Exploring-the-Landscape-of-the-Egyptian-Software-Market

Folders and files

Latest commit

History

Repository files navigation

Exploring the Landscape of the Egyptian Software Market

A Data-driven Approach

Table of Contents

The Data

Insights

Data Collection

Challenges

Data Preprocessing

Parsing

Experience

Education Parser

Licenses and Certifications Parser

Volunteering Parser

Skills Parser

Languages Parser

Normalization

Experience Normalization

Education Normalization

Volunteering Normalization

Skills Normalization

Questions

Descriptive questions

Exploratory questions

Predictive Questions

Casual questions

Mechanistic questions

Results Interpretation for our Questions

Descriptive questions

Exploratory questions

Predictive Questions

Casual questions

Mechanistic questions

About

Topics

Resources

Stars

Watchers

Forks

Contributors 3

Languages