Commit 11a82c2a, authored by esthrus

Initial commit

# Assignment 6
#### 171098064 马小堤
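### 1. Creating the Maven project
Maven was already installed and configured in Assignment 5, so that step is omitted here.
Create a Java application project named SocialNetwork under `~/IdeaProjects` by running `mvn archetype:generate "-DgroupId=sn" "-DartifactId=SocialNetwork" "-DarchetypeArtifactId=maven-archetype-quickstart" "-DinteractiveMode=false"`. The project is created successfully.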
<img src="/Users/sheddy_ma/Library/Application Support/typora-user-images/image-20201101124233818.png" alt="image-20201101124233818" style="zoom:50%;" />
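Open the project in IDEA and edit `pom.xml` to add the required dependencies, which completes the configuration.
### 2. Design approach
The overall structure of the project is: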
```
- Class SocialNetwork
  - Class GetFriendsPairMapper
    - map()
  - Class GetFriendsPairReducer
    - reduce()
  - Class FindCommonFriendsMapper
    - map()
  - Class FindCommonFriendsReducer
    - reduce()
```
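GetFriendsPairMapper reads the input txt files, splits each line into one person and that person's friends, and emits `<friend, person>` pairs to GetFriendsPairReducer. For each friend, GetFriendsPairReducer concatenates all of the persons into one list and outputs it.
FindCommonFriendsMapper reads the output of the first MapReduce job, in which each line holds one friend and a list of persons. It loops over every pair in the person list and emits `<person_i,person_j>: friend` to FindCommonFriendsReducer. For each `<person_i,person_j>` pair, FindCommonFriendsReducer concatenates all of the friends into one list and outputs it, which yields all common friends of that pair of users.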
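The two passes described above can be sketched in plain Java without Hadoop, which may help when checking the logic by hand. This is only an illustration under stated assumptions: the class `CommonFriendsSketch`, its methods `invert` and `commonFriends`, and the three-line input are made up for this example and are not part of the project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the project: simulates both MapReduce passes in memory.
class CommonFriendsSketch {

    // Pass 1: turn "person, f1 f2 ..." lines into friend -> persons who list that friend.
    static Map<String, List<String>> invert(List<String> lines) {
        Map<String, List<String>> personsByFriend = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(", ");
            if (parts.length < 2) {
                continue; // skip lines without a friend list
            }
            for (String friend : parts[1].split(" ")) {
                personsByFriend.computeIfAbsent(friend, k -> new ArrayList<>()).add(parts[0]);
            }
        }
        return personsByFriend;
    }

    // Pass 2: every pair of persons that shares a friend gets that friend added to its list.
    static Map<String, List<String>> commonFriends(Map<String, List<String>> personsByFriend) {
        Map<String, List<String>> friendsByPair = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : personsByFriend.entrySet()) {
            List<String> persons = new ArrayList<>(entry.getValue());
            Collections.sort(persons); // fixed order so "a,b" and "b,a" become the same key
            for (int i = 0; i < persons.size() - 1; i++) {
                for (int j = i + 1; j < persons.size(); j++) {
                    String pair = persons.get(i) + "," + persons.get(j);
                    friendsByPair.computeIfAbsent(pair, k -> new ArrayList<>()).add(entry.getKey());
                }
            }
        }
        return friendsByPair;
    }

    public static void main(String[] args) {
        // Made-up input: three users who are all friends with each other.
        List<String> lines = Arrays.asList("100, 200 300", "200, 100 300", "300, 100 200");
        commonFriends(invert(lines))
                .forEach((pair, friends) -> System.out.println(pair + " -> " + friends));
        // prints:
        // 100,200 -> [300]
        // 100,300 -> [200]
        // 200,300 -> [100]
    }
}
```

Sorting each person list before pairing mirrors the `Arrays.sort(persons)` call in FindCommonFriendsMapper: without it, the same two users could appear under two different keys.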
### 3. Program output
`cd` into the project directory (here `~/IdeaProjects/SocialNetwork`) and run `mvn clean package` to package the project.
![image-20201103100820672](/Users/sheddy_ma/Library/Application Support/typora-user-images/image-20201103100820672.png)
The packaged SocialNetwork-1.0.jar is now in `~/IdeaProjects/SocialNetwork/target`. Start Hadoop, create the `/input` directory, and copy friends_list1.txt and friends_list2.txt into it.
Run `hadoop jar ~/IdeaProjects/SocialNetwork/target/SocialNetwork-1.0.jar /input/friends_list1.txt /input/friends_list2.txt /output/output1 /output/output2`. The final result of the SocialNetwork program is stored in the `/output/output2` directory. Running `hadoop fs -cat /output/output2/part-r-00000` prints the final output:
<img src="/Users/sheddy_ma/Library/Application Support/typora-user-images/image-20201103101518233.png" alt="image-20201103101518233" style="zoom:50%;" />
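The `hadoop jar` command passes four paths, and the program interprets them by position: the last argument is the final output directory, the one before it is the intermediate output directory, and everything before those is an input path. A minimal sketch of that split follows. The class `JobPaths` and its field names are hypothetical and exist only for this illustration, and the minimum of three arguments is an assumption that follows from needing at least one input plus the two output paths.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; it models how the paths are split by position.
class JobPaths {
    final String[] inputs;           // all arguments except the last two
    final String intermediateOutput; // second-to-last argument, output of the first job
    final String finalOutput;        // last argument, output of the second job

    JobPaths(String[] args) {
        if (args.length < 3) {
            throw new IllegalArgumentException("need at least one input path and two output paths");
        }
        inputs = Arrays.copyOfRange(args, 0, args.length - 2);
        intermediateOutput = args[args.length - 2];
        finalOutput = args[args.length - 1];
    }

    public static void main(String[] args) {
        JobPaths paths = new JobPaths(new String[] {
            "/input/friends_list1.txt", "/input/friends_list2.txt", "/output/output1", "/output/output2"
        });
        System.out.println(Arrays.toString(paths.inputs)
                + " -> " + paths.intermediateOutput + " -> " + paths.finalOutput);
    }
}
```

Indexing from the end of the array keeps the split correct for any number of input files.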
The input and output directories can be browsed at `http://localhost:9870`:
<img src="/Users/sheddy_ma/Library/Application Support/typora-user-images/image-20201103101606852.png" alt="image-20201103101606852" style="zoom:50%;" />
### 4. Problems encountered and their solutions
<?xml version="1.0" encoding="UTF-8"?>
<module org.jetbrains.idea.maven.project.MavenProjectsManager.isMavenModule="true" type="JAVA_MODULE" version="4">
<component name="NewModuleRootManager" LANGUAGE_LEVEL="JDK_1_8">
<output url="file://$MODULE_DIR$/target/classes" />
<output-test url="file://$MODULE_DIR$/target/test-classes" />
<content url="file://$MODULE_DIR$">
<sourceFolder url="file://$MODULE_DIR$/src/main/java" isTestSource="false" />
<sourceFolder url="file://$MODULE_DIR$/src/test/java" isTestSource="true" />
<excludeFolder url="file://$MODULE_DIR$/target" />
</content>
<orderEntry type="inheritedJdk" />
<orderEntry type="sourceFolder" forTests="false" />
<orderEntry type="library" scope="TEST" name="Maven: junit:junit:3.8.1" level="project" />
<orderEntry type="library" name="Maven: commons-logging:commons-logging:1.1.3" level="project" />
<orderEntry type="library" name="Maven: org.apache.hadoop:hadoop-common:2.7.4" level="project" />
<orderEntry type="library" name="Maven: org.apache.hadoop:hadoop-annotations:2.7.4" level="project" />
<orderEntry type="library" name="Maven: com.google.guava:guava:11.0.2" level="project" />
<orderEntry type="library" name="Maven: commons-cli:commons-cli:1.2" level="project" />
<orderEntry type="library" name="Maven: org.apache.commons:commons-math3:3.1.1" level="project" />
<orderEntry type="library" name="Maven: xmlenc:xmlenc:0.52" level="project" />
<orderEntry type="library" name="Maven: commons-httpclient:commons-httpclient:3.1" level="project" />
<orderEntry type="library" name="Maven: commons-codec:commons-codec:1.4" level="project" />
<orderEntry type="library" name="Maven: commons-io:commons-io:2.4" level="project" />
<orderEntry type="library" name="Maven: commons-net:commons-net:3.1" level="project" />
<orderEntry type="library" name="Maven: commons-collections:commons-collections:3.2.2" level="project" />
<orderEntry type="library" name="Maven: javax.servlet:servlet-api:2.5" level="project" />
<orderEntry type="library" name="Maven: org.mortbay.jetty:jetty:6.1.26" level="project" />
<orderEntry type="library" name="Maven: org.mortbay.jetty:jetty-util:6.1.26" level="project" />
<orderEntry type="library" name="Maven: org.mortbay.jetty:jetty-sslengine:6.1.26" level="project" />
<orderEntry type="library" scope="RUNTIME" name="Maven: javax.servlet.jsp:jsp-api:2.1" level="project" />
<orderEntry type="library" name="Maven: com.sun.jersey:jersey-core:1.9" level="project" />
<orderEntry type="library" name="Maven: com.sun.jersey:jersey-json:1.9" level="project" />
<orderEntry type="library" name="Maven: org.codehaus.jettison:jettison:1.1" level="project" />
<orderEntry type="library" name="Maven: com.sun.xml.bind:jaxb-impl:2.2.3-1" level="project" />
<orderEntry type="library" name="Maven: javax.xml.bind:jaxb-api:2.2.2" level="project" />
<orderEntry type="library" name="Maven: javax.xml.stream:stax-api:1.0-2" level="project" />
<orderEntry type="library" name="Maven: javax.activation:activation:1.1" level="project" />
<orderEntry type="library" name="Maven: org.codehaus.jackson:jackson-jaxrs:1.8.3" level="project" />
<orderEntry type="library" name="Maven: org.codehaus.jackson:jackson-xc:1.8.3" level="project" />
<orderEntry type="library" name="Maven: com.sun.jersey:jersey-server:1.9" level="project" />
<orderEntry type="library" name="Maven: asm:asm:3.1" level="project" />
<orderEntry type="library" name="Maven: log4j:log4j:1.2.17" level="project" />
<orderEntry type="library" name="Maven: net.java.dev.jets3t:jets3t:0.9.0" level="project" />
<orderEntry type="library" name="Maven: org.apache.httpcomponents:httpclient:4.1.2" level="project" />
<orderEntry type="library" name="Maven: org.apache.httpcomponents:httpcore:4.1.2" level="project" />
<orderEntry type="library" name="Maven: com.jamesmurty.utils:java-xmlbuilder:0.4" level="project" />
<orderEntry type="library" name="Maven: commons-lang:commons-lang:2.6" level="project" />
<orderEntry type="library" name="Maven: commons-configuration:commons-configuration:1.6" level="project" />
<orderEntry type="library" name="Maven: commons-digester:commons-digester:1.8" level="project" />
<orderEntry type="library" name="Maven: commons-beanutils:commons-beanutils:1.7.0" level="project" />
<orderEntry type="library" name="Maven: commons-beanutils:commons-beanutils-core:1.8.0" level="project" />
<orderEntry type="library" name="Maven: org.slf4j:slf4j-api:1.7.10" level="project" />
<orderEntry type="library" name="Maven: org.slf4j:slf4j-log4j12:1.7.10" level="project" />
<orderEntry type="library" name="Maven: org.codehaus.jackson:jackson-core-asl:1.9.13" level="project" />
<orderEntry type="library" name="Maven: org.codehaus.jackson:jackson-mapper-asl:1.9.13" level="project" />
<orderEntry type="library" name="Maven: org.apache.avro:avro:1.7.4" level="project" />
<orderEntry type="library" name="Maven: com.thoughtworks.paranamer:paranamer:2.3" level="project" />
<orderEntry type="library" name="Maven: org.xerial.snappy:snappy-java:1.0.4.1" level="project" />
<orderEntry type="library" name="Maven: com.google.protobuf:protobuf-java:2.5.0" level="project" />
<orderEntry type="library" name="Maven: com.google.code.gson:gson:2.2.4" level="project" />
<orderEntry type="library" name="Maven: org.apache.hadoop:hadoop-auth:2.7.4" level="project" />
<orderEntry type="library" name="Maven: org.apache.directory.server:apacheds-kerberos-codec:2.0.0-M15" level="project" />
<orderEntry type="library" name="Maven: org.apache.directory.server:apacheds-i18n:2.0.0-M15" level="project" />
<orderEntry type="library" name="Maven: org.apache.directory.api:api-asn1-api:1.0.0-M20" level="project" />
<orderEntry type="library" name="Maven: org.apache.directory.api:api-util:1.0.0-M20" level="project" />
<orderEntry type="library" name="Maven: org.apache.curator:curator-framework:2.7.1" level="project" />
<orderEntry type="library" name="Maven: com.jcraft:jsch:0.1.54" level="project" />
<orderEntry type="library" name="Maven: org.apache.curator:curator-client:2.7.1" level="project" />
<orderEntry type="library" name="Maven: org.apache.curator:curator-recipes:2.7.1" level="project" />
<orderEntry type="library" name="Maven: com.google.code.findbugs:jsr305:3.0.0" level="project" />
<orderEntry type="library" name="Maven: org.apache.htrace:htrace-core:3.1.0-incubating" level="project" />
<orderEntry type="library" name="Maven: org.apache.zookeeper:zookeeper:3.4.6" level="project" />
<orderEntry type="library" name="Maven: org.apache.commons:commons-compress:1.4.1" level="project" />
<orderEntry type="library" name="Maven: org.tukaani:xz:1.0" level="project" />
<orderEntry type="library" name="Maven: org.apache.hadoop:hadoop-hdfs:2.7.4" level="project" />
<orderEntry type="library" name="Maven: commons-daemon:commons-daemon:1.0.13" level="project" />
<orderEntry type="library" name="Maven: io.netty:netty:3.6.2.Final" level="project" />
<orderEntry type="library" name="Maven: io.netty:netty-all:4.0.23.Final" level="project" />
<orderEntry type="library" name="Maven: xerces:xercesImpl:2.9.1" level="project" />
<orderEntry type="library" name="Maven: xml-apis:xml-apis:1.3.04" level="project" />
<orderEntry type="library" name="Maven: org.fusesource.leveldbjni:leveldbjni-all:1.8" level="project" />
<orderEntry type="library" name="Maven: org.apache.hadoop:hadoop-client:2.7.4" level="project" />
<orderEntry type="library" name="Maven: org.apache.hadoop:hadoop-mapreduce-client-app:2.7.4" level="project" />
<orderEntry type="library" name="Maven: org.apache.hadoop:hadoop-mapreduce-client-common:2.7.4" level="project" />
<orderEntry type="library" name="Maven: org.apache.hadoop:hadoop-yarn-client:2.7.4" level="project" />
<orderEntry type="library" name="Maven: org.apache.hadoop:hadoop-yarn-server-common:2.7.4" level="project" />
<orderEntry type="library" name="Maven: org.apache.hadoop:hadoop-mapreduce-client-shuffle:2.7.4" level="project" />
<orderEntry type="library" name="Maven: org.apache.hadoop:hadoop-yarn-api:2.7.4" level="project" />
<orderEntry type="library" name="Maven: org.apache.hadoop:hadoop-mapreduce-client-core:2.7.4" level="project" />
<orderEntry type="library" name="Maven: org.apache.hadoop:hadoop-yarn-common:2.7.4" level="project" />
<orderEntry type="library" name="Maven: com.sun.jersey:jersey-client:1.9" level="project" />
<orderEntry type="library" name="Maven: org.apache.hadoop:hadoop-mapreduce-client-jobclient:2.7.4" level="project" />
</component>
</module>
100, 200 300 400 500
200, 100 300 400
300, 100 200 400 500
400, 100 200 300
500, 100 300
600, 100
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>sn</groupId>
  <artifactId>SocialNetwork</artifactId>
  <packaging>jar</packaging>
  <version>1.0</version>
  <name>SocialNetwork</name>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
  </properties>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/commons-logging/commons-logging -->
    <dependency>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
      <version>1.1.3</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.7.4</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>2.7.4</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.7.4</version>
    </dependency>
  </dependencies>
  <build>
    <pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
      <plugins>
        <!-- clean lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#clean_Lifecycle -->
        <plugin>
          <artifactId>maven-clean-plugin</artifactId>
          <version>3.1.0</version>
        </plugin>
        <!-- default lifecycle, jar packaging: see https://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
        <plugin>
          <artifactId>maven-resources-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.8.0</version>
        </plugin>
        <plugin>
          <artifactId>maven-surefire-plugin</artifactId>
          <version>2.22.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-jar-plugin</artifactId>
          <version>3.0.2</version>
          <configuration>
            <classesDirectory>target/classes/</classesDirectory>
            <archive>
              <manifest>
                <mainClass>sn.SocialNetwork</mainClass>
                <useUniqueVersions>false</useUniqueVersions>
                <addClasspath>false</addClasspath>
              </manifest>
            </archive>
          </configuration>
        </plugin>
        <plugin>
          <artifactId>maven-install-plugin</artifactId>
          <version>2.5.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-deploy-plugin</artifactId>
          <version>2.8.2</version>
        </plugin>
        <!-- site lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#site_Lifecycle -->
        <plugin>
          <artifactId>maven-site-plugin</artifactId>
          <version>3.7.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-project-info-reports-plugin</artifactId>
          <version>3.0.0</version>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>
package sn;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

import java.io.IOException;
import java.util.Arrays;

public class SocialNetwork {

    public static class GetFriendsPairMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            String[] personAndFriends = line.split(", ");
            if (personAndFriends.length < 2) {
                return; // skip blank lines and lines without a friend list
            }
            String person = personAndFriends[0]; // the person
            String[] friends = personAndFriends[1].split(" "); // the person's friend list
            for (String friend : friends) {
                context.write(new Text(friend), new Text(person)); // emit the <friend, person> pair
            }
        }
    }

    public static class FindCommonFriendsMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            String[] friendAndPersons = line.split("\t");
            if (friendAndPersons.length < 2) {
                return; // skip malformed lines without a person list
            }
            String friend = friendAndPersons[0];
            String[] persons = friendAndPersons[1].split(",");
            Arrays.sort(persons); // fixed order so that [a,b] and [b,a] produce the same key
            for (int i = 0; i < persons.length - 1; i++) {
                for (int j = i + 1; j < persons.length; j++) {
                    context.write(new Text("([" + persons[i] + "," + persons[j] + "],"), new Text(friend));
                }
            }
        }
    }

    public static class GetFriendsPairReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text friend, Iterable<Text> persons, Context context) throws IOException, InterruptedException {
            StringBuilder pairs = new StringBuilder();
            for (Text person : persons) {
                pairs.append(person).append(","); // concatenate the values of pairs sharing a key, e.g. <A,B> <A,D> <A,F> ...
            }
            // Written to HDFS as the intermediate result in the form <A B,D,F,...>;
            // key and value are separated by a tab (\t).
            context.write(friend, new Text(pairs.toString()));
        }
    }

    public static class FindCommonFriendsReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text person, Iterable<Text> friends, Context context) throws IOException, InterruptedException {
            StringBuilder list = new StringBuilder();
            list.append('[');
            for (Text friend : friends) {
                list.append(friend).append(","); // concatenate the values of pairs sharing a key
            }
            String result = list.toString();
            result = result.substring(0, result.length() - 1); // drop the trailing comma
            result = result + ']' + ')';
            context.write(person, new Text(result));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf1 = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf1, args).getRemainingArgs();
        if (otherArgs.length < 3) {
            System.err.println("Usage: SocialNetwork <input>... <intermediate output> <final output>");
            System.exit(2); // at least one input path and the two output paths are required
        }

        Job job1 = Job.getInstance(conf1, "get friends pairs");
        job1.setJarByClass(SocialNetwork.class);
        job1.setMapperClass(GetFriendsPairMapper.class);
        job1.setReducerClass(GetFriendsPairReducer.class);
        job1.setInputFormatClass(TextInputFormat.class);
        for (int i = 0; i < otherArgs.length - 2; ++i) {
            FileInputFormat.addInputPath(job1, new Path(otherArgs[i]));
        }
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(Text.class);
        // Delete the intermediate output directory if it already exists.
        Path path1 = new Path(otherArgs[otherArgs.length - 2]); // intermediate output directory argument
        FileSystem fileSystem1 = path1.getFileSystem(conf1); // file system that owns this path
        if (fileSystem1.exists(path1)) {
            fileSystem1.delete(path1, true); // true = recursive, so a non-empty directory is removed as well
        }
        FileOutputFormat.setOutputPath(job1, path1);
        ControlledJob ctrlJob1 = new ControlledJob(conf1);
        ctrlJob1.setJob(job1);

        Configuration conf2 = new Configuration(true);
        Job job2 = Job.getInstance(conf2, "find common friends");
        job2.setJarByClass(SocialNetwork.class);
        job2.setMapperClass(FindCommonFriendsMapper.class);
        job2.setReducerClass(FindCommonFriendsReducer.class);
        job2.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job2, path1);
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(Text.class);
        // Delete the final output directory if it already exists.
        // Index from the end (not a fixed otherArgs[3]) so any number of inputs works.
        Path path2 = new Path(otherArgs[otherArgs.length - 1]);
        FileSystem fileSystem2 = path2.getFileSystem(conf2);
        if (fileSystem2.exists(path2)) {
            fileSystem2.delete(path2, true);
        }
        FileOutputFormat.setOutputPath(job2, path2);
        ControlledJob ctrlJob2 = new ControlledJob(conf2);
        ctrlJob2.setJob(job2);
        ctrlJob2.addDependingJob(ctrlJob1); // job2 starts only after job1 has succeeded

        JobControl jobCtrl = new JobControl("Task: Common friends");
        // Add both jobs to the JobControl, which schedules them.
        jobCtrl.addJob(ctrlJob1);
        jobCtrl.addJob(ctrlJob2);
        System.out.println("Job Start!");
        Thread thread = new Thread(jobCtrl);
        thread.start();
        while (!jobCtrl.allFinished()) {
            Thread.sleep(500); // poll periodically instead of spinning at full CPU
        }
        System.out.println(jobCtrl.getSuccessfulJobList());
        boolean failed = !jobCtrl.getFailedJobList().isEmpty();
        jobCtrl.stop();
        System.exit(failed ? 1 : 0); // report failure through the exit code
    }
}
package sn;

import junit.framework.Test;
import junit.framework.TestCase;
import junit.framework.TestSuite;

/**
 * Unit test for simple App.
 */
public class SocialNetworkTest
    extends TestCase
{
    /**
     * Create the test case
     *
     * @param testName name of the test case
     */
    public SocialNetworkTest( String testName )
    {
        super( testName );
    }

    /**
     * @return the suite of tests being tested
     */
    public static Test suite()
    {
        return new TestSuite( SocialNetworkTest.class );
    }

    /**
     * Rigorous Test :-)
     */
    public void testApp()
    {
        assertTrue( true );
    }
}