
Implement your own Mapper Reducer from Scratch



In this blog, I am going to show how you can implement your own Mapper and Reducer from scratch.


Follow the steps below in Python to implement your own Mapper and Reducer:


STEP 1 : Import the libraries needed for the data cleaning activities.
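As a rough sketch of Step 1, the imports could look like the block below. The choice of the standard-library modules `re` and `string` is my assumption; the original post does not list which libraries it imports.

```python
# Assumption: the standard-library modules below are enough for the cleaning steps.
import re      # regular expressions, e.g. for stripping digits
import string  # string.punctuation lists the ASCII punctuation characters

# Quick look at which characters will be treated as punctuation.
print(string.punctuation)
```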

STEP 2 : Next, read the input file from your local directory.

First, open a connection to the file.

The read function then reads the file's data and stores it in a variable.
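Here is a minimal sketch of Step 2. The helper name `read_file` and the sample text are hypothetical; the demo writes a throwaway file first so that the snippet runs anywhere, and you would pass the path of your own input file instead.

```python
import os
import tempfile

def read_file(path):
    """Open a connection to the file and return its contents as one string."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()

# Demo only: create a small temporary file to stand in for your input file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("Hello, World!\nIt's 2020.\n")
    sample_path = tmp.name

raw_text = read_file(sample_path)  # the file data is now stored in a variable
os.remove(sample_path)             # clean up the demo file
```

Using `with` closes the file connection automatically once the data has been read.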


STEP 3 : The Data Cleaning function removes punctuation and apostrophes, converts words to lower case, and removes numbers.
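A cleaning function along these lines might look like the sketch below. The name `clean_text` is hypothetical, and I am assuming that only ASCII punctuation (which includes the plain apostrophe) needs to be removed.

```python
import re
import string

def clean_text(text):
    """Lower-case the text, then strip numbers and ASCII punctuation."""
    text = text.lower()                      # convert words to lower case
    text = re.sub(r"\d+", "", text)          # remove numbers
    # string.punctuation contains the ASCII apostrophe, so one pass covers both
    remove_table = str.maketrans("", "", string.punctuation)
    return text.translate(remove_table)

cleaned = clean_text("It's 2020, Hello World!")
print(cleaned.split())
```

Line breaks are left untouched, so the cleaned text can still be split by lines afterwards.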

STEP 4 : Split the file into 2 parts, with 5000 lines in one file and the rest in another file.
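One way to sketch the split is shown below. The function name `split_file`, the output file names, and the choice to put the first 5000 lines into the first file are all my assumptions; the demo uses generated lines and a temporary directory so it can run on its own.

```python
import os
import tempfile

def split_file(text, first_path, second_path, first_part_lines=5000):
    """Write the first `first_part_lines` lines to one file and the rest to another."""
    lines = text.splitlines(keepends=True)
    head, tail = lines[:first_part_lines], lines[first_part_lines:]
    with open(first_path, "w", encoding="utf-8") as first:
        first.writelines(head)
    with open(second_path, "w", encoding="utf-8") as second:
        second.writelines(tail)
    return len(head), len(tail)

# Demo only: 5003 generated lines stand in for the cleaned input text.
demo_text = "".join(f"line {i}\n" for i in range(5003))
with tempfile.TemporaryDirectory() as tmp_dir:
    counts = split_file(demo_text,
                        os.path.join(tmp_dir, "part1.txt"),
                        os.path.join(tmp_dir, "part2.txt"))
print(counts)
```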


STEP 5 : Tokenize the words in both documents and append all those words to a list.
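Tokenizing can be sketched as below, assuming that words are separated by whitespace and that each document gets its own list. The name `tokenize` and the two sample documents are hypothetical.

```python
def tokenize(document):
    """Split a document into words and append each word to a list."""
    words = []
    for line in document.splitlines():
        for word in line.split():  # split on any run of whitespace
            words.append(word)
    return words

# Demo only: two short strings stand in for the two parts of the file.
part1 = "the quick brown fox\nthe lazy dog\n"
part2 = "the dog barks\n"
words_part1 = tokenize(part1)
words_part2 = tokenize(part2)
print(words_part1, words_part2)
```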

STEP 6 : Pass all the words in each list to a dataframe whose first column holds the words of the list and whose second column contains the value '1'.


The dataframe now looks like the output of a Mapper function.
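Step 6 could be sketched with pandas as follows. The function name `map_words` and the column names `word` and `count` are my assumptions; the post only says that the first column holds the words and the second holds the value 1.

```python
import pandas as pd

def map_words(words):
    """Emit one (word, 1) row per word, like the output of a Mapper."""
    return pd.DataFrame({"word": words, "count": [1] * len(words)})

# Demo only: a tiny word list stands in for one tokenized document.
mapped = map_words(["the", "dog", "the"])
print(mapped)
```

Each word appears once per occurrence, so repeated words show up as repeated rows until the Reducer adds them together.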


STEP 7 : The final piece of code acts as a Reducer by giving the frequency of each word in the document.
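A Reducer in this spirit can be sketched with a pandas group-by. The name `reduce_counts` and the column names `word` and `count` are hypothetical, and combining the Mapper outputs of both parts before summing is my assumption about how the two files come back together.

```python
import pandas as pd

def reduce_counts(mapped_frames):
    """Sum the 1s for each word across all Mapper outputs."""
    combined = pd.concat(mapped_frames, ignore_index=True)
    return combined.groupby("word", as_index=False)["count"].sum()

# Demo only: two small Mapper outputs, one per part of the file.
mapped_part1 = pd.DataFrame({"word": ["the", "dog", "the"], "count": [1, 1, 1]})
mapped_part2 = pd.DataFrame({"word": ["dog", "barks"], "count": [1, 1]})

reduced = reduce_counts([mapped_part1, mapped_part2])
frequencies = dict(zip(reduced["word"], reduced["count"]))
print(frequencies)
```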

That's all!!! You have implemented your very own Mapper Reducer function from scratch.

 
 
 


©2020 by Data Science Innovation.
