Demonstration of using Python to process the Common Crawl dataset with the mrjob framework
Updated Jan 27, 2026 - Python
Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
Efficient and scalable parallelism using the message passing interface (MPI) to handle big data and highly computational problems.
RedisGears Python client
Python 2.7 code and learning notes for Spark 2.1.1
Build scalable data pipelines on YTsaurus with automatic stage management, local development simulation, and more.
Code for paper "Locally Distributed Deep Learning Inference on Edge Device Clusters"
Iterable Java 8-style Streams for Python
A tool that converts long audio files into a thorough, summarized report. Leverages OpenAI and its API (ChatGPT backend), Langchain for text processing, and Pinecone for vector database facilitation.
Source code of the numerical experiments presented in "Energy-Efficient Edge-Facilitated Wireless Collaborative Computing using Map-Reduce" by Antoine Paris, Hamed Mirghasemi, Ivan Stupia and Luc Vandendorpe (presented at SPAWC19).
Distributed encoding, second generation.
🎓Repository for masters labs on FCSN, BSUIR
A package for working with lists distributed over MPI
Implementation of Girvan-Newman Algorithm to detect communities in graphs using Yelp dataset
Scatter gather with AWS lambda
A case study on mining association rules between different factors related to deaths of people in the United States
Parallel implementation of the Breadth-First Search algorithm in Java MapReduce and PySpark. This implementation finds degrees of separation between Twitter users.
Learn Big Data tools and frameworks by working through examples, POCs, and projects.