This project explores the global usage of programming languages and visualizes their patterns across countries. By normalizing the data to represent usage per 10,000 people, we created a platform for meaningful comparisons. The dataset includes popular languages such as Python, Java, and C++, analyzed using Tableau and R programming.

ayaan9618/AnalysisandVisualizationofGlobalProgrammingLangUsage

Analysis and Visualization of Global Programming Language Usage


What I Built

For this project, I wanted to see how different programming languages are used around the world. I took a dataset showing programming language popularity across various countries and created visualizations using both R and Tableau to explore the patterns.

The cool thing about using two different tools was seeing where each one is strongest: R was great for quickly generating statistical plots, while Tableau made it easy to build interactive maps and dashboards that you can click around and explore.


The Dataset

The data shows programming language usage normalized per 10,000 people in each country. This normalization is important because it lets us compare countries fairly - otherwise, bigger countries would always dominate the numbers just because they have more people.
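To make the normalization concrete, here is a minimal sketch in Python (not the project's R code, and not a step you need to run, since data.csv already holds normalized values). The function name and the figures are hypothetical, chosen only to show how a smaller country can end up with the higher per-10,000 rate.

```python
def usage_per_10k(raw_count: float, population: int) -> float:
    """Scale a raw usage count to a rate per 10,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return raw_count * 10_000 / population


# Hypothetical figures, not taken from data.csv.
large = usage_per_10k(raw_count=500_000, population=100_000_000)
small = usage_per_10k(raw_count=20_000, population=2_000_000)

# The larger country has 25x the raw count but the lower per-capita rate.
print(f"large country: {large:.1f} per 10,000")
print(f"small country: {small:.1f} per 10,000")
```

Comparing raw counts would rank the large country first; comparing rates reverses the order, which is the effect the normalization is meant to capture.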

Each row in the dataset represents a country, and each column represents a different programming language (like Python, Java, C++, etc.). The values tell us how popular each language is in that country, adjusted for population size.

I should note that this data is a snapshot in time and comes from aggregated sources, so it's better suited to exploring general trends than to making precise claims about current usage.


Tools Used: R and Tableau
Project Type: Data Analysis & Visualization

Project Files

Here's what's included:

  • data.csv - The main dataset with all the programming language usage statistics
  • R_visualization.R - My R script that creates histogram visualizations
  • Tableau Workbook - The interactive dashboard (either .twb or .twbx format)

What I Did

Part 1: Visualizing with R

I started by loading the data into R and creating histograms for each programming language. The goal was to understand the distribution - are most countries using these languages at similar rates, or is usage really concentrated in just a few places?
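For readers who want to see what a histogram does under the hood, here is a rough Python sketch of equal-width binning. It is an illustration only: the actual plots come from R_visualization.R, R's own binning rules differ from this simple version, and the values below are made up rather than read from data.csv.

```python
def histogram_counts(values, bins=5):
    """Count values in equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value is clamped into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical per-10,000 values for one language across ten countries.
python_usage = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 6.0, 9.0, 30.0, 51.0]
counts = histogram_counts(python_usage, bins=5)

# Print a simple text histogram, one row per bin.
for i, c in enumerate(counts):
    print(f"bin {i}: {'#' * c}")
```

With made-up values like these, most countries land in the lowest bin and a couple sit far to the right, which is the kind of shape the question above is asking about.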

The R approach was nice because it's completely reproducible - anyone can run my script and get the exact same plots. Plus, it's quick to iterate when you're exploring the data and trying to figure out what's interesting.

What I Found: The histograms showed that programming language adoption is actually pretty uneven across countries. Some languages are heavily concentrated in certain regions, while others are more evenly distributed.

Part 2: Interactive Dashboards in Tableau

After getting a sense of the distributions in R, I moved to Tableau to create interactive visualizations. This is where things got more interesting because I could:

  • Create world maps showing language usage by country (using color gradients)
  • Build bar charts to compare languages side-by-side
  • Add filters so you can focus on specific regions or languages
  • Enable highlighting so clicking on one element updates the whole dashboard

The geographic visualizations were especially helpful for spotting regional patterns that weren't obvious from the histograms alone.


Interesting Patterns I Noticed

While working through the visualizations, a few things stood out:

  • Programming language popularity definitely varies by region - it's not evenly distributed at all
  • Some languages are really dominant in specific geographic areas (which makes sense given different tech industries and educational systems)
  • The distribution shapes told me that most languages have a few "hotspot" countries with high usage and then a long tail of countries with lower adoption
  • Looking at the data both ways (statistical plots + geographic maps) gave a much richer picture than either approach alone
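The "hotspot plus long tail" shape can also be checked with a couple of summary numbers. The sketch below is a Python illustration, not part of the project's R script: the helper name `top_share` and the values are hypothetical. A mean well above the median, and a large share of the total held by the top few countries, both point to a right-skewed distribution.

```python
from statistics import mean, median


def top_share(values, k=3):
    """Fraction of the total accounted for by the k largest values."""
    total = sum(values)
    if not total:
        return 0.0
    return sum(sorted(values, reverse=True)[:k]) / total


# Hypothetical per-10,000 values for one language across ten countries.
usage = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 6.0, 9.0, 30.0, 51.0]

print(f"mean:   {mean(usage):.2f}")
print(f"median: {median(usage):.2f}")
print(f"share held by top 2 countries: {top_share(usage, k=2):.0%}")
```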

Limitations and Caveats

A few things to keep in mind about this analysis:

  • The values are normalized estimates, not exact counts of actual programmers
  • This is just a snapshot in time - it doesn't show how trends are changing
  • I can spot correlations and patterns, but that doesn't tell us why certain languages are popular in certain places
  • The quality of the insights depends on how complete and accurate the underlying data is

Basically, this is exploratory analysis that raises interesting questions, but you'd need more research to really explain the "why" behind the patterns.


Technical Notes

The R script uses base R functions to keep dependencies minimal. For Tableau, I assigned geographic roles to the country field so it could automatically generate maps. The whole analysis is pretty straightforward - no complex statistical modeling, just solid data visualization fundamentals.

The trickiest part was making sure the data was clean and properly formatted for both tools, since R and Tableau have slightly different expectations for data structure.
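One common example of that kind of mismatch is wide versus long layout: a table with one column per language is convenient for per-column histograms, while charting tools often prefer one row per country and language pair. The README does not say which reshaping this project needed, so treat the following Python sketch as an assumption. The column name "Country", the place names, and the values are all hypothetical stand-ins for data.csv.

```python
import csv
import io

# Hypothetical stand-in for data.csv (wide layout, one column per language).
WIDE_CSV = """Country,Python,Java,C++
Atlantis,12.5,8.0,3.1
Erewhon,4.2,,1.0
"""


def wide_to_long(text, id_column="Country"):
    """Turn one row per country into one row per (country, language) pair."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        country = record[id_column].strip()
        for language, raw in record.items():
            if language == id_column:
                continue
            # Empty cells become None instead of raising on float("").
            value = float(raw) if raw and raw.strip() else None
            rows.append({"Country": country, "Language": language, "Usage": value})
    return rows


long_rows = wide_to_long(WIDE_CSV)
for row in long_rows:
    print(row)
```

Keeping missing cells as None rather than 0 avoids treating "no data" as "no usage", which would distort both the histograms and the map colors.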


What I Learned

This project really reinforced for me that the right visualization tool depends on what you're trying to discover. Static plots are perfect when you know what you're looking for, but interactive dashboards are amazing for open-ended exploration.

Also, presenting data both statistically and geographically can reveal patterns that neither view alone would show. Different visualization perspectives lead to different insights.


Files and Setup

To explore this yourself:

  1. Open R_visualization.R in RStudio and run it to see the histogram analysis
  2. Open the Tableau workbook to interact with the dashboards
  3. The data.csv file works with both tools

No special R packages are required (just base R), and Tableau Public works fine for the visualizations.
