Analysis of Apache server logs to find the most visited website using Pig, Hadoop, and a Python UDF script.
# Project Details
This repository contains code that analyzes Apache server logs to find the most visited website. The analysis is an Apache Pig script, run on Hadoop and extended with Python user-defined functions (UDFs). It was run on an Ubuntu virtual machine on Oracle VM VirtualBox, provisioned with Vagrant.
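
As a rough illustration of how a Python UDF can parse a log line for Pig, here is a minimal sketch. It assumes the logs follow the Apache Common Log Format; the function name `parse_log_line`, the regular expression, and the output schema are hypothetical and are not taken from this repository. The `pig_util` import applies when Pig runs the script as a CPython UDF; the fallback decorator only exists so the module can be imported and tried outside Pig.

```python
import re

try:
    # Available when Pig runs the script as a CPython (streaming_python) UDF.
    from pig_util import outputSchema
except ImportError:
    # Fallback (assumption for local use) so the module also imports outside Pig.
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap

# Assumed layout (Common Log Format):
# host ident user [time] "method path protocol" status size
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: (?P<protocol>[^"]*))?" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


@outputSchema("entry:tuple(host:chararray, time:chararray, path:chararray, status:int)")
def parse_log_line(line):
    """Return (host, time, path, status), or None if the line does not match."""
    if line is None:
        return None
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    return (
        match.group("host"),
        match.group("time"),
        match.group("path"),
        int(match.group("status")),
    )
```

Returning `None` for lines that do not match lets the calling script filter out malformed records instead of failing on them.
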
# Contents
- *shareFiles/pig_script .py* contains the code that computes the page hits and stores them.
- *shareFiles/scipt .py* contains the Python UDF that parses the sample Apache logs.
- *shareFiles/sample_log* contains the sample logs on which the scripts are run.
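
To try the page-hit computation locally without a Hadoop setup, the counting step can be approximated in plain Python. This is only a sketch under assumptions: the helper `most_visited` and the sample request paths are hypothetical, and the logic mirrors the usual shape of such a job (group by page, count, order by count) rather than this repository's actual script.

```python
from collections import Counter


def most_visited(pages, top_n=1):
    """Count hits per page and return the top_n (page, hits) pairs, most hits first."""
    hits = Counter(pages)
    # Sort by descending hit count, then by page name so ties are deterministic.
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical request paths, as a log-parsing UDF might emit them.
sample_pages = [
    "/index.html",
    "/about.html",
    "/index.html",
    "/contact.html",
    "/index.html",
    "/about.html",
]

print(most_visited(sample_pages, top_n=2))
# [('/index.html', 3), ('/about.html', 2)]
```
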