Implement your own Mapper Reducer from Scratch
- bvsankitds
- Feb 25, 2020
- 1 min read

In this blog, I am going to show how you can implement your own Mapper and Reducer from scratch.
Follow the steps below in Python to implement your own Mapper and Reducer:
STEP 1 : Import the libraries required for data cleaning.
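As a minimal sketch, the imports could look like the following. The choice of `re` and `string` is my assumption here; any libraries that cover your cleaning needs will do.

```python
# Assumed imports for the cleaning step (standard library only).
import re       # regular expressions, e.g. for stripping digits
import string   # string.punctuation lists the ASCII punctuation characters

# Quick look at the characters that will be treated as punctuation.
print(string.punctuation)
```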
STEP 2 : Read the input file from your local directory.
Open a connection to the file, then use the read function to read its contents and store them in a variable.
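Here is one way this step could look. The function name `read_file` and the file name `input.txt` are hypothetical, and the sketch writes a small throwaway file first so that it runs anywhere; point it at your own file instead.

```python
import tempfile
from pathlib import Path


def read_file(path):
    """Open the file, read its whole content into a string, then close it."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


# Demo: create a small sample file in a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "input.txt"  # hypothetical file name
    sample_path.write_text("Hello World!\nHello again, world.\n", encoding="utf-8")
    raw_text = read_file(sample_path)

print(raw_text)
```

Using `with` closes the connection to the file automatically, even if reading fails.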
STEP 3 : In the data cleaning function, remove punctuation and apostrophes, convert words to lower case, and remove numbers.
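A cleaning function along these lines might look like the sketch below. The name `clean_text` is hypothetical, and I am assuming plain ASCII punctuation (which already includes the apostrophe).

```python
import re
import string


def clean_text(text):
    """Lowercase the text, drop digits, and strip ASCII punctuation."""
    text = text.lower()
    text = re.sub(r"\d+", "", text)  # remove numbers
    # string.punctuation contains the apostrophe, so "it's" becomes "its".
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text


cleaned = clean_text("It's 2020, and Map-Reduce ROCKS!")
print(cleaned.split())
```

Note that punctuation is deleted rather than replaced with a space, so a hyphenated word such as "Map-Reduce" ends up as the single word "mapreduce". Newlines are left alone, so the text can still be split by lines afterwards.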
STEP 4 : Split the file into two parts: 5000 lines go into one file and the rest go into another.
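The split can be sketched as follows. The helper `split_text` is hypothetical, and I am assuming that the first 5000 lines form the first part; the sketch keeps both parts in memory.

```python
def split_text(text, first_part_lines=5000):
    """Return (first part, rest), splitting on line boundaries."""
    lines = text.splitlines(keepends=True)  # keep the newline characters
    part_one = "".join(lines[:first_part_lines])
    part_two = "".join(lines[first_part_lines:])
    return part_one, part_two


# Demo with generated text of 5003 lines.
sample_text = "".join(f"line {i}\n" for i in range(5003))
part_one, part_two = split_text(sample_text)
print(len(part_one.splitlines()), len(part_two.splitlines()))
```

Each part can then be written to its own file with `open(path, "w")` if you want the two parts on disk.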
STEP 5 : Tokenize the words in both documents and append them to a list, one list per document.
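A simple whitespace tokenizer is enough once the text has been cleaned. In this sketch the function name `tokenize` and the two sample documents are made up for illustration.

```python
def tokenize(document):
    """Split a document into words and collect them in a list."""
    words = []
    for line in document.splitlines():
        for word in line.split():  # split on any whitespace
            words.append(word)
    return words


# Two small stand-ins for the two parts of the input file.
part_one = "the quick brown fox\njumps over"
part_two = "the lazy dog"

words_one = tokenize(part_one)
words_two = tokenize(part_two)
print(words_one)
print(words_two)
```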
STEP 6 : Pass the words in each list to a dataframe whose first column holds the words and whose second column holds the value '1' for every row.
The dataframe now looks like the output of a Mapper function.
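With pandas, the Mapper output could be built as in this sketch. The function name `mapper` and the column names `word` and `count` are my own choices, and I store the value as the integer 1 so it can be summed later.

```python
import pandas as pd


def mapper(words):
    """Emit one (word, 1) row per word, as a two-column dataframe."""
    return pd.DataFrame({"word": words, "count": [1] * len(words)})


mapped = mapper(["the", "quick", "the"])
print(mapped)
```

Every word gets its own row, so repeated words appear multiple times at this stage; nothing is aggregated yet.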
STEP 7 : The Reducer sums the values for each word, giving the frequency of each word in the document.
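To show what the Reducer does under the hood, here is a plain Python sketch that works on (word, 1) pairs instead of a dataframe. The function name `reducer` and the sample pairs are hypothetical. It sorts the pairs by word, groups equal words together, and sums the ones in each group.

```python
from itertools import groupby
from operator import itemgetter


def reducer(pairs):
    """Sum the counts of identical words from a list of (word, count) pairs."""
    # groupby only merges neighbouring items, so sort by word first.
    sorted_pairs = sorted(pairs, key=itemgetter(0))
    return {
        word: sum(count for _, count in group)
        for word, group in groupby(sorted_pairs, key=itemgetter(0))
    }


# Mapper output from two documents, combined before reducing.
mapped_one = [("the", 1), ("fox", 1), ("the", 1)]
mapped_two = [("the", 1), ("dog", 1)]

frequencies = reducer(mapped_one + mapped_two)
print(frequencies)
```

If your Mapper output is a pandas dataframe with columns named `word` and `count` (an assumed layout), `df.groupby("word")["count"].sum()` should give the same frequencies.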
That's all! You have implemented your very own Mapper and Reducer from scratch.