Preface
Logging is an indispensable part of application software. Log4j, an Apache open-source project, is a powerful logging component that makes it easy to record logs. The latest Log4j release and detailed project documentation can be downloaded free of charge from the Apache site: https://logging.apache.org/log4j/2.x/.
Spark uses log4j as its logging facility. The default configuration writes all logs to standard error, which works well for batch jobs. For streaming jobs, however, it is better to use a RollingFileAppender that rolls log files over by size and keeps only a few recent files.
Sample Log4j configuration explained
log4j.rootLogger=INFO, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.conversionPattern=[%d] %p %m (%c)%n
log4j.appender.rolling.maxFileSize=50MB
log4j.appender.rolling.maxBackupIndex=5
log4j.appender.rolling.file=/var/log/spark/${vm.logging.name}.log
log4j.appender.rolling.encoding=UTF-8
log4j.logger.org.apache.spark=WARN
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.com.w3sun.core=${vm.logging.level}
As the configuration above shows, log4j rolls the log file over once it reaches 50 MB and keeps only the five most recent files. Log files are written to the /var/log/spark directory, with the file name taken from the vm.logging.name system property. The vm.logging.level property controls the log level for classes under the com.w3sun.core package, and setting the level for the org.apache.spark package to WARN suppresses Spark's own verbose logging.
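For reference, here is a minimal sketch of application code whose output is governed by these rules. The com.w3sun.core.LoggingDemo object and its messages are assumptions; the point is that any logger named under com.w3sun.core picks up the ${vm.logging.level} setting:

package com.w3sun.core

import org.apache.log4j.Logger

// Hypothetical demo class; its logger name (com.w3sun.core.LoggingDemo$)
// falls under the log4j.logger.com.w3sun.core rule in the sample config.
object LoggingDemo {
  private val log: Logger = Logger.getLogger(getClass)

  def main(args: Array[String]): Unit = {
    log.debug("printed only when vm.logging.level=DEBUG")
    log.info("printed when vm.logging.level is INFO or DEBUG")
    log.warn("printed at WARN and below")
  }
}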
Spark log4j configuration in practice
Standalone mode
In standalone mode, the Spark driver runs on the machine where the job is submitted, and each Spark worker node runs an executor for that job. Log4j therefore has to be configured for both the driver and the executors.
spark-submit --master spark://127.0.0.1:7077 \
  --driver-java-options "-Dlog4j.configuration=file:/path/to/log4j-driver.properties -Dvm.logging.level=DEBUG" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j-executor.properties -Dvm.logging.name=myapp -Dvm.logging.level=DEBUG"
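If log4j-driver.properties also uses the ${vm.logging.name} placeholder from the sample configuration above, the name must be supplied to the driver JVM as well. A possible variant (the myapp-driver value is an assumption):

spark-submit --master spark://127.0.0.1:7077 \
  --driver-java-options "-Dlog4j.configuration=file:/path/to/log4j-driver.properties -Dvm.logging.name=myapp-driver -Dvm.logging.level=DEBUG" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j-executor.properties -Dvm.logging.name=myapp -Dvm.logging.level=DEBUG"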
Spark on YARN
spark-submit --master yarn-cluster \
  --files /path/to/log4j-spark.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-spark.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-spark.properties"
As the command shows, the driver and the executors use the same configuration file: in yarn-cluster mode both run inside YARN-provided containers, and the --files option ships log4j-spark.properties into each container's working directory, which is why the bare relative file name in -Dlog4j.configuration resolves on both sides.
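For comparison, in yarn-client mode the driver runs on the submitting machine rather than in a YARN container, so the driver side needs a local file: URL while the executors still read the copy shipped by --files. A sketch under that assumption, reusing the same paths:

spark-submit --master yarn-client \
  --files /path/to/log4j-spark.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j-spark.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-spark.properties"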
Sample log4j-spark.properties
# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs
# in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR

# Any custom class debug
log4j.logger.com.w3sun.core=DEBUG

# Netty classes
log4j.logger.org.apache.spark.rpc.netty.NettyRpcEndpointRef=WARN,RollingAppender
log4j.logger.org.apache.spark.rpc.RpcEndpointRef=WARN,RollingAppender
log4j.logger.org.apache.spark.ExecutorAllocationManager=WARN,RollingAppender
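Independent of the properties file, Spark also lets the driver adjust the root log level at runtime through SparkContext.setLogLevel. A minimal sketch; the application name is an assumption:

import org.apache.spark.sql.SparkSession

object LogLevelDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("log-level-demo").getOrCreate()
    // Overrides the root logger level for this application; valid values
    // include ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE and WARN.
    spark.sparkContext.setLogLevel("WARN")
    spark.stop()
  }
}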
Video
https://www.youtube.com/watch?v=RtNhoEP7Sb8