# Decompress the previously prepared complete works of Shakespeare
[sujx@elephant ~]$ gzip -d shakespeare.txt.gz

# Upload it to the Hadoop file system (HDFS)
[sujx@elephant ~]$ hdfs dfs -mkdir /user/sujx/input
[sujx@elephant ~]$ hdfs dfs -put shakespeare.txt /user/sujx/input

# List the available example programs
[sujx@elephant ~]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

# Run the MapReduce job; the output directory is created automatically
[sujx@elephant ~]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount /user/sujx/input/shakespeare.txt /user/sujx/output/

# List the output files
[sujx@elephant ~]$ hdfs dfs -ls /user/sujx/output
Found 4 items
-rw-r--r--   3 sujx supergroup          0 2020-03-09 02:47 /user/sujx/output/_SUCCESS
-rw-r--r--   3 sujx supergroup     238211 2020-03-09 02:47 /user/sujx/output/part-r-00000
-rw-r--r--   3 sujx supergroup     236617 2020-03-09 02:47 /user/sujx/output/part-r-00001
-rw-r--r--   3 sujx supergroup     238668 2020-03-09 02:47 /user/sujx/output/part-r-00002

# View the tail of one output file
[sujx@elephant ~]$ hdfs dfs -tail /user/sujx/output/part-r-00000
.	3
writhled	1
writing,	4
writings.	1
writs	1
written,	3
wrong	112
wrong'd-	1
wrong-should	1
wrong.	39
wrong:	1
wronged	11
wronged.	3
wronger,	1
wronger;	1
wrongfully?	1
wrongs	40
wrongs,	9
wrongs;	9
wrote?	1
wrought,	4
…………
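
The output above lists `wrong`, `wrong.` and `wrong:` as separate keys because the stock wordcount example splits its input on whitespace only and leaves punctuation attached to each token. As a rough local cross-check on a small sample, the same counting can be sketched with awk. This is only an approximation under stated assumptions: `wordcount_local` is a hypothetical helper name, not part of Hadoop, and awk's default field splitting is close to, but not identical with, the tokenizer the example job uses.

```shell
#!/bin/sh
# wordcount_local: hypothetical helper (not a Hadoop command).
# Reads text on stdin, counts whitespace-separated tokens, and prints
# "token<TAB>count" lines sorted by byte order, similar in shape to a part-r-* file.
wordcount_local() {
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) print w "\t" count[w] }' | LC_ALL=C sort
}

# Small demonstration: punctuation stays attached, so "wrong" and "wrong." differ.
printf 'wrong wrong. wrong\nwrote? wrong.\n' | wordcount_local
# prints:
# wrong	2
# wrong.	2
# wrote?	1
```

To compare against the job's results, the counts for a given word have to be looked up in whichever of the three `part-r-*` files that key was written to, since each file holds only part of the key space.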