One-Stop Setup of a Spark-Hadoop Cluster and Zeppelin
When working with Zeppelin, you may want to run it against a real Spark and Hadoop cluster, but standing up either a Spark cluster or a Hadoop cluster is fairly involved.

Zeppelin 0.12.0 requires Spark 3.2 or later. The spark-hadoop image I built in an earlier article, following a tutorial, shipped a Spark version too old for Zeppelin to use. So I built a new image with Spark 3.5.6 and Hadoop 3.4.0 baked in, which makes deploying a Spark/Hadoop cluster straightforward.

For a Zeppelin container to work with this Spark/Hadoop cluster, some extra wiring is still needed: the containers must be able to reach each other on the same network, and the Hadoop configuration and Spark installation must be shared with Zeppelin.
(1) Contents of docker-compose-with-zeppelin.yml:
```yaml
version: '2'
services:
  spark:
    image: kongxr7/spark-hadoop:3.5.6-hadoop3.4.0
    hostname: master
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    volumes:
      - hadoop_conf:/opt/hadoop/etc/hadoop
      - spark_lib:/opt/bitnami/spark
    ports:
      - '8090:8080'
      - '4040:4040'
      - '8088:8088'
      - '8042:8042'
      - '9870:9870'
      - '19888:19888'
      - '7077:7077'
  spark-worker-1:
    image: kongxr7/spark-hadoop:3.5.6-hadoop3.4.0
    hostname: worker1
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://master:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    ports:
      - '8081:8081'
  spark-worker-2:
    image: kongxr7/spark-hadoop:3.5.6-hadoop3.4.0
    hostname: worker2
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://master:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    ports:
      - '8082:8081'
  zeppelin:
    image: apache/zeppelin:0.12.0
    hostname: zeppelin
    depends_on:
      - spark
    environment:
      - SPARK_HOME=/opt/spark
      - HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
      - YARN_CONF_DIR=/opt/hadoop/etc/hadoop
    volumes:
      - spark_lib:/opt/spark
      - hadoop_conf:/opt/hadoop/etc/hadoop:ro
      - zeppelin_notebooks:/opt/zeppelin/notebook
      - zeppelin_conf:/opt/zeppelin/conf
    ports:
      - '8089:8080'
volumes:
  hadoop_conf:
    driver: local
  spark_lib:
    driver: local
  zeppelin_notebooks:
    driver: local
  zeppelin_conf:
    driver: local
```
(2) One-command startup
```shell
docker-compose -f docker-compose-with-zeppelin.yml up -d
```
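Once the cluster is up, it helps to smoke-test it before touching Zeppelin. The sketch below lists the containers and runs the bundled SparkPi example against the standalone master; the path to the examples jar follows the Bitnami image layout and is an assumption, so adjust it to wherever the jar actually lives in your image.

```shell
# List the containers defined in the compose file and their current state
docker-compose -f docker-compose-with-zeppelin.yml ps

# Run the SparkPi example against the standalone master (hostname "master"
# per the compose file). The examples jar path assumes the Bitnami layout;
# change it if the jar lives elsewhere in your image.
docker-compose -f docker-compose-with-zeppelin.yml exec spark \
  spark-submit \
    --master spark://master:7077 \
    --class org.apache.spark.examples.SparkPi \
    /opt/bitnami/spark/examples/jars/spark-examples_*.jar 10
```

If the driver log ends with a line like `Pi is roughly 3.14...`, the master and workers are talking to each other.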

Once the containers are up, each component's UI is reachable:
Spark UI: http://localhost:8090
HDFS UI: http://localhost:9870
YARN UI: http://localhost:8088
Zeppelin UI: http://localhost:8089
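A quick way to confirm the port mappings above are actually live is to curl each UI and look at the HTTP status code (a sketch; any 200 or 302 response means the service answered, while `000` means the port is not reachable):

```shell
# Probe each mapped UI port and print the HTTP status code it returns
for url in http://localhost:8090 http://localhost:9870 \
           http://localhost:8088 http://localhost:8089; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
  echo "$url -> $code"
done
```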