Monday, June 17, 2013

Hadoop: MapReduce: WordCount: Pseudo-Distributed Cluster Environment

To list the files/folders under the HDFS root directory:
hadoop fs -ls /

Create a file with name xyz.txt:
sudo gedit xyz.txt

Check whether it has been created (in /home/user_name/):
ll    (i.e., lowercase LL, an alias for ls -l)

Verify the contents of xyz.txt:
cat xyz.txt

Before writing xyz.txt to HDFS, make sure it doesn't already exist in the HDFS root directory:
hadoop fs -ls /

Write xyz.txt to HDFS:
hadoop fs -put xyz.txt /xyz.txt

Now check whether xyz.txt exists in the HDFS root directory:
hadoop fs -ls /

To view file/block/location info for a file:
hadoop fsck /xyz.txt -files -blocks -locations

Go to the folder where your Java source files exist:
cd training_materials/developer/exercises/wordcount/

Compile all your Java files, with hadoop-core.jar on the classpath:
javac -classpath /usr/lib/hadoop/hadoop-core.jar *.java

Create a jar file out of all the compiled classes (i.e., all .class files):
jar cvf wordcount.jar *.class

Now run the jar by specifying
  1. the name of the class containing main()
  2. the input file/folder path
  3. the output folder path (it must not already exist)
hadoop jar wordcount.jar WordCount /xyz.txt /xyz_output1
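The WordCount source itself isn't listed in this post, so as a rough sketch (the class and method names here, WordCountSketch and countWords, are hypothetical, and the Hadoop API boilerplate is omitted), this is the computation the job performs: the mapper emits (word, 1) for each token, and the reducer sums the counts per word.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Plain-Java sketch of what the WordCount job computes, without the
// Hadoop Mapper/Reducer classes: tokenize on whitespace, then sum
// the occurrences of each word.
public class WordCountSketch {
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        StringTokenizer tok = new StringTokenizer(text);
        while (tok.hasMoreTokens()) {
            // merge() acts like the reducer: add 1 to the running sum
            counts.merge(tok.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("hadoop map reduce map"));
        // prints {hadoop=1, map=2, reduce=1}
    }
}
```

In the real job this logic is split across the cluster: each mapper handles one block of the input, and the reducer receives all values for a given word regardless of which mapper emitted them.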

Now list the contents of /xyz_output1:
hadoop fs -ls /xyz_output1
Note: you will see _SUCCESS, _logs, and part-00000, where part-00000 holds the final reducer output.

To view the output:
hadoop fs -cat /xyz_output1/part-00000
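As a local sanity check (standard Unix tools only, no Hadoop involved), you can compute the same word counts from the local copy of xyz.txt and compare them against part-00000. The formatting differs: uniq -c prints "count word", while the job prints "word<TAB>count".

```shell
# Split on whitespace, drop empty lines, sort so duplicates are
# adjacent, then count them -- roughly the mapper + reducer pipeline.
tr -s '[:space:]' '\n' < xyz.txt | grep -v '^$' | sort | uniq -c
```

Here sort plays the role of Hadoop's shuffle phase: it brings all occurrences of the same word together before they are counted.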

To save all the commands you entered so far:
cd    (returns to /home/user_name/)
history > command_history.txt