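Background
IDC hosting in Beijing keeps getting more expensive, so the company decided to move its data center to the area surrounding Beijing. I have recently been working on this migration, mainly responsible for real-time computing and user profiling. There are quite a lot of user-profiling jobs, and during the wrap-up phase I found a well-hidden one: its source code can no longer be found in git, and all that is left of the running job is a single jar. I tried submitting the jar directly to the new cluster, and after running for ten minutes the job threw the exception below. The UnknownHostException for "pink" suggests that the HDFS nameservice is hard-coded in the code, which has to be confirmed by decompiling.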
13-09-2021 09:56:03 CST dumpMongo INFO - Exception in thread "main" java.lang.IllegalArgumentException: java.net.UnknownHostException: pink
13-09-2021 09:56:03 CST dumpMongo INFO - at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
13-09-2021 09:56:03 CST dumpMongo INFO - at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
13-09-2021 09:56:03 CST dumpMongo INFO - at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
13-09-2021 09:56:03 CST dumpMongo INFO - at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:707)
13-09-2021 09:56:03 CST dumpMongo INFO - at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:650)
13-09-2021 09:56:03 CST dumpMongo INFO - at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:148)
13-09-2021 09:56:03 CST dumpMongo INFO - at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2643)
13-09-2021 09:56:03 CST dumpMongo INFO - at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
13-09-2021 09:56:03 CST dumpMongo INFO - at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2680)
13-09-2021 09:56:03 CST dumpMongo INFO - at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2662)
13-09-2021 09:56:03 CST dumpMongo INFO - at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:379)
13-09-2021 09:56:03 CST dumpMongo INFO - at DumpMongoJob$.main(DumpMongoJob.scala:64)
13-09-2021 09:56:03 CST dumpMongo INFO - at DumpMongoJob.main(DumpMongoJob.scala)
13-09-2021 09:56:03 CST dumpMongo INFO - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
13-09-2021 09:56:03 CST dumpMongo INFO - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
13-09-2021 09:56:03 CST dumpMongo INFO - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
13-09-2021 09:56:03 CST dumpMongo INFO - at java.lang.reflect.Method.invoke(Method.java:498)
13-09-2021 09:56:03 CST dumpMongo INFO - at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
13-09-2021 09:56:03 CST dumpMongo INFO - at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
13-09-2021 09:56:03 CST dumpMongo INFO - at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
13-09-2021 09:56:03 CST dumpMongo INFO - at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
13-09-2021 09:56:03 CST dumpMongo INFO - at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
13-09-2021 09:56:03 CST dumpMongo INFO - at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
13-09-2021 09:56:03 CST dumpMongo INFO - at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
13-09-2021 09:56:03 CST dumpMongo INFO - at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
13-09-2021 09:56:03 CST dumpMongo INFO - Caused by: java.net.UnknownHostException: pink
13-09-2021 09:56:03 CST dumpMongo INFO - ... 25 more
13-09-2021 09:56:03 CST dumpMongo INFO - 21/09/13 09:56:03 INFO spark.SparkContext: Invoking stop() from shutdown hook
13-09-2021 09:56:03 CST dumpMongo INFO - 21/09/13 09:56:03 INFO server.AbstractConnector: Stopped Spark@30e92cb9{HTTP/1.1,[http/1.1]}{0.0.0.0:4047}
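Decompiling Java class files
Many tools can decompile ".class" files, such as IDEA, JD-GUI and Recaf. For this problem I chose IDEA; the steps are:
- Open any project (a new one is recommended), create a "lib" folder at the same level as src, and move the jar into it
- Right-click the lib folder and choose "Add as Library" from the context menu
Once the library is added, the jar can be browsed. Find DumpMongoJob (Scala), the class named in the exception, and decompile it to Java code.
The decompiled file shows that the HDFS nameservice is hard-coded to the fixed value pink: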
SparkConf sparkConf = (new SparkConf()).setAppName(this.getClass().getSimpleName());
SparkSession spark = .MODULE$.builder().enableHiveSupport().config(sparkConf).getOrCreate();
OptionParser optionParser = new OptionParser();
optionParser.parse(args);
Config conf = optionParser.getConf();
String mongoDbUri = (String)conf.getOptString("mongo.db.uri").get();
String execSql = (String)conf.getOptString("dim.exec.sql").get();
String output = (String)conf.getOptString("output.path").get();
FileSystem hdfs = FileSystem.get(new URI("hdfs://pink"), new Configuration());
Path p = new Path(output);
if (hdfs.exists(p) && output.contains("/user/somebody/other_path")) {
    BoxesRunTime.boxToBoolean(hdfs.delete(p, true));
} else {
    BoxedUnit var10000 = BoxedUnit.UNIT;
}
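As a quick cross-check that does not need an IDE, the jar can also be scanned for class files whose raw bytes contain the suspected literal. The following is only a minimal sketch under assumptions: JarStringScanner and its methods are hypothetical helpers written for this post, not part of any tool mentioned here. It relies on ASCII string constants being stored byte for byte in the class file's constant pool, and it needs Java 9 or later for readAllBytes. A hit only says that the bytes occur somewhere in the class; the decompiled code is still what shows how the value is used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: lists the .class entries of a jar that contain a given literal.
class JarStringScanner {

    /** Returns the names of all .class entries whose raw bytes contain the literal. */
    static List<String> findClassesContaining(Path jar, String literal) throws IOException {
        // For plain ASCII text, UTF-8 bytes match the encoding used inside class files.
        byte[] needle = literal.getBytes(StandardCharsets.UTF_8);
        List<String> hits = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!entry.getName().endsWith(".class")) {
                    continue; // skip resources, manifests and directories
                }
                try (InputStream in = jarFile.getInputStream(entry)) {
                    if (indexOf(in.readAllBytes(), needle) >= 0) {
                        hits.add(entry.getName());
                    }
                }
            }
        }
        return hits;
    }

    /** Naive byte search: index of the first occurrence of needle, or -1 if absent. */
    static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    // Usage: java JarStringScanner.java <path-to-jar> <literal>
    public static void main(String[] args) throws IOException {
        for (String name : findClassesContaining(Paths.get(args[0]), args[1])) {
            System.out.println(name);
        }
    }
}
```

Run against the job's jar with the literal hdfs://pink, a scan like this would be expected to point at DumpMongoJob$.class, the class that the stack trace already names.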
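Modifying bytecode with jclasslib
Having located the cause, the next step is to edit the corresponding .class file in the jar and repackage it. Unpack the jar and load DumpMongoJob$.class into jclasslib bytecode viewer.
Search for the constant that contains "pink", click edit, change "pink" to the name of the target cluster, and save.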
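Conceptually, this edit rewrites one CONSTANT_Utf8 entry in the class file's constant pool. Everything else in the file refers to that entry by index rather than by byte offset, so the entry can change length without the rest of the file being touched. The sketch below illustrates that idea only and is not how jclasslib itself is implemented; Utf8ConstantPatcher is a hypothetical class, and "hdfs://newcluster" in the usage note is a placeholder for the real target nameservice. It needs Java 9 or later for readAllBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: swaps one string value in a class file's constant pool.
class Utf8ConstantPatcher {

    /** Returns a copy of classBytes in which every CONSTANT_Utf8 equal to oldValue holds newValue. */
    static byte[] replaceUtf8Constant(byte[] classBytes, String oldValue, String newValue) {
        try {
            byte[] oldEntry = encode(oldValue);
            byte[] newEntry = encode(newValue);
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes));
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);

            int magic = in.readInt();
            if (magic != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            out.writeInt(magic);
            out.writeInt(in.readInt());             // minor_version and major_version
            int poolCount = in.readUnsignedShort(); // entries are numbered 1..poolCount-1
            out.writeShort(poolCount);

            for (int index = 1; index < poolCount; index++) {
                int tag = in.readUnsignedByte();
                out.writeByte(tag);
                switch (tag) {
                    case 1: { // CONSTANT_Utf8: u2 length followed by modified UTF-8 bytes
                        int length = in.readUnsignedShort();
                        byte[] entry = new byte[2 + length];
                        entry[0] = (byte) (length >>> 8);
                        entry[1] = (byte) length;
                        in.readFully(entry, 2, length);
                        out.write(Arrays.equals(entry, oldEntry) ? newEntry : entry);
                        break;
                    }
                    case 3: case 4:                   // Integer, Float
                    case 9: case 10: case 11:         // Fieldref, Methodref, InterfaceMethodref
                    case 12: case 17: case 18:        // NameAndType, Dynamic, InvokeDynamic
                        copy(in, out, 4);
                        break;
                    case 5: case 6:                   // Long, Double occupy two pool slots
                        copy(in, out, 8);
                        index++;
                        break;
                    case 7: case 8: case 16: case 19: case 20: // Class, String, MethodType, Module, Package
                        copy(in, out, 2);
                        break;
                    case 15:                          // MethodHandle
                        copy(in, out, 3);
                        break;
                    default:
                        throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
            out.write(in.readAllBytes()); // everything after the pool is copied unchanged
            return buffer.toByteArray();
        } catch (IOException e) {
            // Only in-memory streams are used, so this signals truncated or oversized input.
            throw new UncheckedIOException(e);
        }
    }

    /** Encodes a string as u2 length plus modified UTF-8, the layout of a CONSTANT_Utf8 body. */
    private static byte[] encode(String value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new DataOutputStream(bytes).writeUTF(value);
        return bytes.toByteArray();
    }

    private static void copy(DataInputStream in, DataOutputStream out, int count) throws IOException {
        byte[] chunk = new byte[count];
        in.readFully(chunk);
        out.write(chunk);
    }

    // Usage: java Utf8ConstantPatcher.java <class-file> <old-value> <new-value>
    public static void main(String[] args) throws IOException {
        Path classFile = Paths.get(args[0]);
        Files.write(classFile, replaceUtf8Constant(Files.readAllBytes(classFile), args[1], args[2]));
    }
}
```

In the decompiled code the literal is the whole string "hdfs://pink", so a call such as replaceUtf8Constant(bytes, "hdfs://pink", "hdfs://newcluster") would be the equivalent of the manual edit. The sketch matches whole constants only and does not report whether anything was replaced, so comparing input and output is a sensible sanity check.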
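Replace the original .class file with the edited one and repackage the jar. Submitted to the cluster again, the job ran normally and wrote its data to HDFS: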
15-09-2021 20:02:23 CST dumpMongo INFO - 21/09/15 20:02:23 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
15-09-2021 20:02:23 CST dumpMongo INFO - 21/09/15 20:02:23 INFO scheduler.DAGScheduler: ResultStage 1 (runJob at SparkHadoopWriter.scala:78) finished in 159.193 s
15-09-2021 20:02:23 CST dumpMongo INFO - 21/09/15 20:02:23 INFO scheduler.DAGScheduler: Job 1 finished: runJob at SparkHadoopWriter.scala:78, took 159.205425 s
15-09-2021 20:02:23 CST dumpMongo INFO - 21/09/15 20:02:23 INFO io.SparkHadoopWriter: Job job_20210922195858_0015 committed.
15-09-2021 20:02:23 CST dumpMongo INFO - 21/09/15 20:02:23 INFO server.AbstractConnector: Stopped Spark@790132f7{HTTP/1.1,[http/1.1]}{0.0.0.0:4042}
15-09-2021 20:02:23 CST dumpMongo INFO - 21/09/15 20:02:23 INFO ui.SparkUI: Stopped Spark web UI at http://10.10.41.1:4042
15-09-2021 20:02:23 CST dumpMongo INFO - 21/09/15 20:02:23 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
15-09-2021 20:02:23 CST dumpMongo INFO - 21/09/15 20:02:23 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
15-09-2021 20:02:23 CST dumpMongo INFO - 21/09/15 20:02:23 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
15-09-2021 20:02:23 CST dumpMongo INFO - 21/09/15 20:02:23 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
15-09-2021 20:02:23 CST dumpMongo INFO - (serviceOption=None,
15-09-2021 20:02:23 CST dumpMongo INFO - services=List(),
15-09-2021 20:02:23 CST dumpMongo INFO - started=false)
15-09-2021 20:02:23 CST dumpMongo INFO - 21/09/15 20:02:23 INFO cluster.YarnClientSchedulerBackend: Stopped
15-09-2021 20:02:23 CST dumpMongo INFO - 21/09/15 20:02:23 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15-09-2021 20:02:23 CST dumpMongo INFO - 21/09/15 20:02:23 INFO memory.MemoryStore: MemoryStore cleared
15-09-2021 20:02:23 CST dumpMongo INFO - 21/09/15 20:02:23 INFO storage.BlockManager: BlockManager stopped
15-09-2021 20:02:23 CST dumpMongo INFO - 21/09/15 20:02:23 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
15-09-2021 20:02:23 CST dumpMongo INFO - 21/09/15 20:02:23 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15-09-2021 20:02:23 CST dumpMongo INFO - 21/09/15 20:02:23 INFO spark.SparkContext: Successfully stopped SparkContext
15-09-2021 20:02:23 CST dumpMongo INFO - 21/09/15 20:02:23 INFO util.ShutdownHookManager: Shutdown hook called
15-09-2021 20:02:23 CST dumpMongo INFO - 21/09/15 20:02:23 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-ac0222b4-68a1-4a7d-8ec8-e05c8aee1907
15-09-2021 20:02:23 CST dumpMongo INFO - 21/09/15 20:02:23 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-d6e79675-cad5-400b-985e-72842ed7eb48
15-09-2021 20:02:24 CST dumpMongo INFO - Process completed successfully in 275 seconds.
15-09-2021 20:02:24 CST dumpMongo INFO - Finishing job dumpMongo attempt: 0 at 1632312144837 with status SUCCEEDED
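The unpack, replace and repackage step can also be done in a single pass by copying the jar entry by entry and substituting the edited class on the way. This is a rough sketch under assumptions rather than the procedure used above: JarEntryReplacer is a hypothetical helper, the jar is assumed to be unsigned (a signature under META-INF would no longer match the changed class), and Java 9 or later is needed for transferTo.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: writes a copy of a jar with one entry's content swapped.
class JarEntryReplacer {

    /** Copies sourceJar to targetJar, writing newContent in place of the entry named entryName. */
    static void replaceEntry(Path sourceJar, Path targetJar, String entryName, byte[] newContent)
            throws IOException {
        try (ZipFile in = new ZipFile(sourceJar.toFile());
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(targetJar))) {
            Enumeration<? extends ZipEntry> entries = in.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                // A fresh ZipEntry lets the output stream recompute sizes and checksums.
                out.putNextEntry(new ZipEntry(entry.getName()));
                if (entry.getName().equals(entryName)) {
                    out.write(newContent);
                } else if (!entry.isDirectory()) {
                    try (InputStream data = in.getInputStream(entry)) {
                        data.transferTo(out);
                    }
                }
                out.closeEntry();
            }
        }
    }

    // Usage: java JarEntryReplacer.java <source-jar> <target-jar> <entry-name> <edited-class-file>
    public static void main(String[] args) throws IOException {
        replaceEntry(Paths.get(args[0]), Paths.get(args[1]), args[2],
                Files.readAllBytes(Paths.get(args[3])));
    }
}
```

Writing to a new file instead of modifying the jar in place keeps the original available in case the patched job has to be rolled back.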
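ingokegel also provides jclasslib bytecode viewer as an IDEA plugin. The advantage of the plugin is that the jar does not need to be unpacked and repackaged: after editing, just click the save icon. The steps are:
- Select the ".class" file to edit;
- Open the "View" menu in the menu bar;
- Choose "Show Bytecode With jclasslib";
Remember to save before exiting once the edit is done.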
References
https://github.com/ingokegel/jclasslib
https://plugins.jetbrains.com/plugin/9248-jclasslib-bytecode-viewer
Please credit when reposting: 雪后西塘 » Modifying bytecode with jclasslib and repackaging the jar