I need to solve the Word Count problem using Hadoop Streaming, following the instructions below.

1. In class we wrote a MapReduce program in Java to compute the word counts for any given input. In this assignment, you will solve the same problem again, this time using Hadoop Streaming.

2. Create two Python scripts, wordcount_map.py and wordcount_reduce.py, to be used by the mappers and reducers of the streaming job.

3. Your script files must be executable (consider the chmod command) and must include the necessary shebang line (as in the attached script files).

4. Attached are the script files we used in class to demonstrate Hadoop Streaming: maxtemp_map.py and maxtemp_reduce.py. They can help you get started.

5. Recall the streaming command:

$ mapred streaming \
    -files , \
    -mapper \
    -reducer \
    -input \
    -output

(extra options: -combiner, -numReduceTasks, etc.)
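Step 3 is normally a one-line `chmod +x` in the shell. The hypothetical helper below is a minimal Python sketch of the same pre-flight check. It confirms that each script starts with a `#!` shebang line and adds execute permission for user, group, and others, which is roughly `chmod a+x`. The file name `check_scripts.py` in the usage comment is an assumption.

```python
#!/usr/bin/env python
"""Pre-flight check for streaming scripts: shebang present, execute bits set."""
import os
import stat
import sys


def has_shebang(path):
    """Return True if the file starts with the two bytes '#!'."""
    with open(path, "rb") as f:
        return f.read(2) == b"#!"


def make_executable(path):
    """Add execute permission for user, group and others, like `chmod a+x`."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


if __name__ == "__main__":
    # Hypothetical usage: python check_scripts.py wordcount_map.py wordcount_reduce.py
    for script in sys.argv[1:]:
        if not has_shebang(script):
            print("warning: %s has no shebang line" % script)
        make_executable(script)
        print("%s is now executable" % script)
```

With no arguments the script does nothing. Only the files named on the command line are touched.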

The MaxTemperature example below is the program discussed in class.

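Python mapper for the maximum temperature example (maxtemp_map.py):

#!/usr/bin/env python
# Emit "year<TAB>temperature" for each valid fixed-width weather record.

import re
import sys

for line in sys.stdin:
    val = line.strip()
    # Fixed-width fields: year, signed temperature, and quality code.
    year, temp, q = val[15:19], val[87:92], val[92:93]
    # Skip missing readings (+9999) and records with a suspect quality code.
    if temp != "+9999" and re.match("[01459]", q):
        print("%s\t%s" % (year, temp))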
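The word count mapper can follow the same pattern as the script above: read lines from `sys.stdin` and print tab-separated key/value pairs. This is a minimal sketch of `wordcount_map.py` under a few assumptions:

- Words are whitespace-separated.
- Counting is case-sensitive.
- Punctuation stays attached to words.

That is roughly what the `StringTokenizer`-based WordCount in the Apache Hadoop MapReduce tutorial does. The helper name `map_lines` is hypothetical. It is split out only so the logic can be tested without Hadoop.

```python
#!/usr/bin/env python
"""wordcount_map.py: a minimal Hadoop Streaming mapper sketch for word count."""
import sys


def map_lines(lines):
    """Yield a "word<TAB>1" record for every whitespace-separated token."""
    for line in lines:
        # str.split() with no argument splits on runs of whitespace and drops empties.
        for word in line.split():
            yield "%s\t%d" % (word, 1)


if __name__ == "__main__":
    # Streaming feeds the input split on stdin and reads key<TAB>value lines from stdout.
    for record in map_lines(sys.stdin):
        print(record)
```

By default, Streaming treats the text before the first tab as the key. Each word therefore becomes a key with the value 1.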

Mapper for the maximum temperature example:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final int MISSING = 9999;

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {

    // Fixed-width record: year, signed temperature, and quality code.
    String line = value.toString();
    String year = line.substring(15, 19);
    int airTemperature;
    if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
      airTemperature = Integer.parseInt(line.substring(88, 92));
    } else {
      airTemperature = Integer.parseInt(line.substring(87, 92));
    }
    String quality = line.substring(92, 93);
    // Emit (year, temperature) only for present readings with a good quality code.
    if (airTemperature != MISSING && quality.matches("[01459]")) {
      context.write(new Text(year), new IntWritable(airTemperature));
    }
  }
}

Reducer for the max temperature example:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {

    // Keep the largest temperature seen for this year.
    int maxValue = Integer.MIN_VALUE;
    for (IntWritable value : values) {
      maxValue = Math.max(maxValue, value.get());
    }
    context.write(key, new IntWritable(maxValue));
  }
}
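The Java reducer above receives one key together with an `Iterable` of all its values. A streaming reducer gets no such grouping. It reads the sorted `word<TAB>count` lines on stdin and has to detect where one key ends and the next begins.

The following is a minimal sketch of `wordcount_reduce.py` that rebuilds the per-key groups with `itertools.groupby`. It is not the attached maxtemp_reduce.py. The helper names `parse` and `reduce_lines` are hypothetical, and silently skipping malformed lines is an assumption.

```python
#!/usr/bin/env python
"""wordcount_reduce.py: a minimal Hadoop Streaming reducer sketch for word count."""
import sys
from itertools import groupby


def parse(lines):
    """Turn "word<TAB>count" lines into (word, count) pairs, skipping malformed lines."""
    for line in lines:
        word, sep, count = line.rstrip("\n").partition("\t")
        if not sep:
            continue  # no tab: not a key/value record
        try:
            value = int(count)
        except ValueError:
            continue  # non-numeric count
        yield word, value


def reduce_lines(lines):
    """Yield "word<TAB>total" once per word.

    The framework sorts mapper output by key, so equal words arrive on
    consecutive lines; groupby regroups each run, much as the Java framework
    passes reduce() one key with an Iterable of its values.
    """
    for word, group in groupby(parse(lines), key=lambda pair: pair[0]):
        yield "%s\t%d" % (word, sum(count for _, count in group))


if __name__ == "__main__":
    for record in reduce_lines(sys.stdin):
        print(record)
```

Summation is associative and commutative, so the same script could also be passed as `-combiner`. A local dry run such as `cat input.txt | ./wordcount_map.py | sort | ./wordcount_reduce.py` approximates the job, with `sort` standing in for Hadoop's shuffle.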

Application to find the maximum temperature in the weather dataset:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: MaxTemperature ");
      System.exit(-1);
    }

    Job job = Job.getInstance();
    job.setJarByClass(MaxTemperature.class);
    job.setJobName("Max temperature");

    // args[0] is the input path, args[1] the output path.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setMapperClass(MaxTemperatureMapper.class);
    job.setReducerClass(MaxTemperatureReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
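The driver's settings have direct counterparts in the streaming command:

- `setMapperClass` and `setReducerClass` become `-mapper` and `-reducer`.
- The input and output paths become `-input` and `-output`.

This is a sketch that assembles the word count command as an argument list. It assumes the two executable scripts sit in the current directory and are shipped with `-files`. The function `streaming_args` and the paths `input_dir` and `output_dir` are hypothetical placeholders. Generic options such as `-files` must come before the streaming options, and the list order respects that.

```python
import shlex


def streaming_args(input_path, output_path, combiner=False, num_reduce_tasks=None):
    """Build a `mapred streaming` command line for the word count job."""
    args = [
        "mapred", "streaming",
        # Generic option first: ship both scripts to every task.
        "-files", "wordcount_map.py,wordcount_reduce.py",
        "-mapper", "wordcount_map.py",
        "-reducer", "wordcount_reduce.py",
        "-input", input_path,
        "-output", output_path,
    ]
    if combiner:
        # Summing counts is associative, so the reducer can double as a combiner.
        args += ["-combiner", "wordcount_reduce.py"]
    if num_reduce_tasks is not None:
        args += ["-numReduceTasks", str(num_reduce_tasks)]
    return args


if __name__ == "__main__":
    # "input_dir" and "output_dir" are placeholder HDFS paths; substitute your own.
    print(shlex.join(streaming_args("input_dir", "output_dir", combiner=True)))
```

`shlex.join` needs Python 3.8 or later. The list can also be passed directly to `subprocess.run`.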
