Commit 19df500

Merge pull request #208 from arundasan91/patch-3
Adding Docker support for CaffeOnSpark
2 parents 054aa08 + 7ff5306 commit 19df500

File tree

7 files changed: +422, -0 lines changed

docker/README.md

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
# CaffeOnSpark Standalone Docker

Dockerfiles for both CPU and GPU builds are available in the `standalone` folder. For the CPU-only build, use the commands below. For the GPU build, use the `standalone/gpu` folder and run the image with [`nvidia-docker`](https://github.com/NVIDIA/nvidia-docker) instead of `docker`.
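For example, the GPU image could be built and launched as follows (the `caffeonspark:gpu` tag is illustrative; it is not set anywhere in this patch):

```
docker build -t caffeonspark:gpu standalone/gpu
nvidia-docker run -it caffeonspark:gpu /etc/bootstrap.sh -bash
```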
The Dockerfile for the CPU build is provided in the `standalone/cpu` folder. Build the image by running:

```
docker build -t caffeonspark:cpu standalone/cpu
```

After the image is built, run `docker images` to validate.
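For instance, filtering by repository name should list the freshly built image:

```
docker images caffeonspark
```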
## Launching a CaffeOnSpark container

Hadoop and Spark are essential requirements for CaffeOnSpark. To ensure that both run flawlessly, we have included the `standalone/cpu/config/bootstrap.sh` script, which must be run every time the container is started.

To launch a container running CaffeOnSpark, use:

```
docker run -it caffeonspark:cpu /etc/bootstrap.sh -bash
```

You now have a working environment with CaffeOnSpark.

To verify the installation, please follow the [GetStarted_yarn](https://github.com/yahoo/CaffeOnSpark/wiki/GetStarted_yarn) guide from `Step 7`.
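The image exposes the usual Hadoop web UI ports (see the `EXPOSE` lines in the Dockerfile), so you can optionally publish some of them to the host for monitoring. A sketch, with illustrative port choices:

```
docker run -it -p 50070:50070 -p 8088:8088 caffeonspark:cpu /etc/bootstrap.sh -bash
```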

docker/standalone/cpu/Dockerfile

Lines changed: 159 additions & 0 deletions
@@ -0,0 +1,159 @@
# Copyright 2016 Yahoo Inc.
# Licensed under the terms of the Apache 2.0 license.
# Please see LICENSE file in the project root for terms.
#
# This Dockerfile sets up the CaffeOnSpark CPU standalone version.

FROM ubuntu:14.04

RUN apt-get update && apt-get install -y software-properties-common
RUN add-apt-repository ppa:openjdk-r/ppa
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    vim \
    cmake \
    git \
    wget \
    libatlas-base-dev \
    libboost-all-dev \
    libgflags-dev \
    libgoogle-glog-dev \
    libhdf5-serial-dev \
    libleveldb-dev \
    liblmdb-dev \
    libopencv-dev \
    libprotobuf-dev \
    libsnappy-dev \
    protobuf-compiler \
    python-dev \
    python-numpy \
    python-pip \
    python-scipy \
    maven \
    unzip \
    zip \
    libopenblas-dev \
    openssh-server \
    openssh-client \
    openjdk-8-jdk

RUN rm -rf /var/lib/apt/lists/*


# Passwordless SSH
RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
RUN cp /root/.ssh/id_rsa.pub ~/.ssh/authorized_keys
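# Note: Hadoop's start-dfs.sh/start-yarn.sh launch daemons over SSH (even on
# localhost), so the container needs passwordless SSH access to itself.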

# Apache Hadoop and Spark section
RUN wget http://apache.mirrors.tds.net/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz
RUN wget http://archive.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz

RUN gunzip hadoop-2.6.4.tar.gz
RUN gunzip spark-1.6.0-bin-hadoop2.6.tgz
RUN tar -xf hadoop-2.6.4.tar
RUN tar -xf spark-1.6.0-bin-hadoop2.6.tar

RUN sudo cp -r hadoop-2.6.4 /usr/local/hadoop
RUN sudo cp -r spark-1.6.0-bin-hadoop2.6 /usr/local/spark

RUN rm hadoop-2.6.4.tar spark-1.6.0-bin-hadoop2.6.tar
RUN rm -rf hadoop-2.6.4/ spark-1.6.0-bin-hadoop2.6/

RUN sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
RUN sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode

# Environment variables
ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk-amd64
ENV HADOOP_HOME=/usr/local/hadoop
ENV SPARK_HOME=/usr/local/spark
ENV PATH $PATH:$JAVA_HOME/bin
ENV PATH $PATH:$HADOOP_HOME/bin
ENV PATH $PATH:$HADOOP_HOME/sbin
ENV PATH $PATH:$SPARK_HOME/bin
ENV PATH $PATH:$SPARK_HOME/sbin
ENV HADOOP_MAPRED_HOME /usr/local/hadoop
ENV HADOOP_COMMON_HOME /usr/local/hadoop
ENV HADOOP_HDFS_HOME /usr/local/hadoop
ENV HADOOP_CONF_DIR /usr/local/hadoop/etc/hadoop
ENV YARN_HOME /usr/local/hadoop
ENV HADOOP_COMMON_LIB_NATIVE_DIR /usr/local/hadoop/lib/native
ENV HADOOP_OPTS "-Djava.library.path=$HADOOP_HOME/lib"

# Clone CaffeOnSpark
ENV CAFFE_ON_SPARK=/opt/CaffeOnSpark
WORKDIR $CAFFE_ON_SPARK
RUN git clone https://github.com/yahoo/CaffeOnSpark.git . --recursive

# Some of the Hadoop setup is adapted from "https://hub.docker.com/r/sequenceiq/hadoop-docker/~/dockerfile/"
RUN mkdir $HADOOP_HOME/input
RUN cp $HADOOP_HOME/etc/hadoop/*.xml $HADOOP_HOME/input
RUN cd /usr/local/hadoop/input

# Copy .xml files.
RUN cp ${CAFFE_ON_SPARK}/scripts/*.xml ${HADOOP_HOME}/etc/hadoop

# Format the namenode and finish the Hadoop and Spark installations.
RUN $HADOOP_HOME/bin/hdfs namenode -format

RUN ls /root/.ssh/
ADD config/ssh_config /root/.ssh/config
RUN chmod 600 /root/.ssh/config
RUN chown root:root /root/.ssh/config

ADD config/bootstrap.sh /etc/bootstrap.sh
RUN chown root:root /etc/bootstrap.sh
RUN chmod 700 /etc/bootstrap.sh

ENV BOOTSTRAP /etc/bootstrap.sh

RUN sed -i '/^export JAVA_HOME/ s:.*:export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64\nexport HADOOP_HOME=/usr/local/hadoop\n:' $HADOOP_HOME/etc/hadoop/hadoop-env.sh
RUN sed -i '/^export HADOOP_CONF_DIR/ s:.*:export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop/:' $HADOOP_HOME/etc/hadoop/hadoop-env.sh
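# Note: the sed edits above hardcode JAVA_HOME and HADOOP_HOME into
# hadoop-env.sh, since the Hadoop daemons are launched in fresh SSH sessions
# that do not inherit the ENV settings from this Dockerfile.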

# Working around a docker.io build error.
RUN ls -la /usr/local/hadoop/etc/hadoop/*-env.sh
RUN chmod +x /usr/local/hadoop/etc/hadoop/*-env.sh
RUN ls -la /usr/local/hadoop/etc/hadoop/*-env.sh

# Fix the 254 error code.
RUN sed -i "/^[^#]*UsePAM/ s/.*/#&/" /etc/ssh/sshd_config
RUN echo "UsePAM no" >> /etc/ssh/sshd_config
RUN echo "Port 2122" >> /etc/ssh/sshd_config

RUN service ssh start && $HADOOP_HOME/etc/hadoop/hadoop-env.sh && $HADOOP_HOME/sbin/start-dfs.sh && $HADOOP_HOME/bin/hdfs dfs -mkdir -p /user/root
RUN service ssh start && $HADOOP_HOME/etc/hadoop/hadoop-env.sh && $HADOOP_HOME/sbin/start-dfs.sh && $HADOOP_HOME/bin/hdfs dfs -put $HADOOP_HOME/etc/hadoop/ input
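# Note: each RUN starts from a fresh layer with no daemons running, so sshd and
# HDFS must be restarted inside the same RUN that issues the dfs commands.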

CMD ["/etc/bootstrap.sh", "-bash"]

# HDFS ports
EXPOSE 50010 50020 50070 50075 50090 8020 9000
# MapReduce ports
EXPOSE 10020 19888
# YARN ports
EXPOSE 8030 8031 8032 8033 8040 8042 8088
# Other ports
EXPOSE 49707 2122


# Continue with the CaffeOnSpark build.
# ENV CAFFE_ON_SPARK=/opt/CaffeOnSpark
WORKDIR $CAFFE_ON_SPARK
# RUN git clone https://github.com/yahoo/CaffeOnSpark.git . --recursive
RUN cp caffe-public/Makefile.config.example caffe-public/Makefile.config
RUN echo "INCLUDE_DIRS += ${JAVA_HOME}/include" >> caffe-public/Makefile.config
RUN sed -i "s/# CPU_ONLY := 1/CPU_ONLY := 1/g" caffe-public/Makefile.config
RUN sed -i "s|CUDA_DIR := /usr/local/cuda|# CUDA_DIR := /usr/local/cuda|g" caffe-public/Makefile.config
RUN sed -i "s|CUDA_ARCH :=|# CUDA_ARCH :=|g" caffe-public/Makefile.config
RUN sed -i "s|BLAS := atlas|BLAS := open|g" caffe-public/Makefile.config
RUN sed -i "s|TEST_GPUID := 0|# TEST_GPUID := 0|g" caffe-public/Makefile.config
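# Build CaffeOnSpark: compiles the native Caffe libraries (CPU-only here) and
# packages the Spark components via Maven.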
RUN make build

ENV LD_LIBRARY_PATH $LD_LIBRARY_PATH:$CAFFE_ON_SPARK/caffe-public/distribute/lib:$CAFFE_ON_SPARK/caffe-distri/distribute/lib

WORKDIR /root
docker/standalone/cpu/config/bootstrap.sh

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
#!/bin/bash
# Copyright 2016 Yahoo Inc.
# Licensed under the terms of the Apache 2.0 license.
# Please see LICENSE file in the project root for terms.
#
# This script starts Hadoop DFS and YARN when the Docker container is started.

: ${HADOOP_PREFIX:=/usr/local/hadoop}

$HADOOP_PREFIX/etc/hadoop/hadoop-env.sh

rm -f /tmp/*.pid

# Install extra libraries, if any (resource URLs are passed comma-separated in the ACP environment variable).
cd $HADOOP_PREFIX/share/hadoop/common ; for cp in ${ACP//,/ }; do echo == $cp; curl -LO $cp ; done; cd -

# Add necessary paths to the environment (FIXME: these are already set in the Dockerfile but do not propagate here, so set them explicitly).
export PATH=$PATH:$SPARK_HOME/bin
export PATH=$PATH:$HADOOP_HOME/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CAFFE_ON_SPARK/caffe-public/distribute/lib:$CAFFE_ON_SPARK/caffe-distri/distribute/lib

service ssh start
$HADOOP_PREFIX/sbin/start-dfs.sh
$HADOOP_PREFIX/sbin/start-yarn.sh
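# Mode flags: "-d" keeps the container alive as a daemon; "-bash" opens an
# interactive shell once the services are up.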
if [[ $1 == "-d" ]]; then
  while true; do sleep 1000; done
fi

if [[ $1 == "-bash" ]]; then
  /bin/bash
fi
docker/standalone/cpu/config/ssh_config

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
# Copyright 2016 Yahoo Inc.
# Licensed under the terms of the Apache 2.0 license.
# Please see LICENSE file in the project root for terms.
#
# This file provides user-specific SSH configuration.
#
Host *
  UserKnownHostsFile /dev/null
  StrictHostKeyChecking no
  LogLevel quiet
  Port 2122

docker/standalone/gpu/Dockerfile

Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
# Copyright 2016 Yahoo Inc.
# Licensed under the terms of the Apache 2.0 license.
# Please see LICENSE file in the project root for terms.
#
# This Dockerfile sets up the CaffeOnSpark GPU standalone version.

FROM nvidia/cuda:7.5-cudnn5-devel-ubuntu14.04

RUN apt-get update && apt-get install -y software-properties-common
RUN add-apt-repository ppa:openjdk-r/ppa
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    vim \
    cmake \
    git \
    wget \
    libatlas-base-dev \
    libboost-all-dev \
    libgflags-dev \
    libgoogle-glog-dev \
    libhdf5-serial-dev \
    libleveldb-dev \
    liblmdb-dev \
    libopencv-dev \
    libprotobuf-dev \
    libsnappy-dev \
    protobuf-compiler \
    python-dev \
    python-numpy \
    python-pip \
    python-scipy \
    maven \
    unzip \
    zip \
    libopenblas-dev \
    openssh-server \
    openssh-client \
    openjdk-8-jdk

RUN rm -rf /var/lib/apt/lists/*


# Passwordless SSH
RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
RUN cp /root/.ssh/id_rsa.pub ~/.ssh/authorized_keys


# Apache Hadoop and Spark section
RUN wget http://apache.mirrors.tds.net/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz
RUN wget http://archive.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz

RUN gunzip hadoop-2.6.4.tar.gz
RUN gunzip spark-1.6.0-bin-hadoop2.6.tgz
RUN tar -xf hadoop-2.6.4.tar
RUN tar -xf spark-1.6.0-bin-hadoop2.6.tar

RUN sudo cp -r hadoop-2.6.4 /usr/local/hadoop
RUN sudo cp -r spark-1.6.0-bin-hadoop2.6 /usr/local/spark

RUN rm hadoop-2.6.4.tar spark-1.6.0-bin-hadoop2.6.tar
RUN rm -rf hadoop-2.6.4/ spark-1.6.0-bin-hadoop2.6/

RUN sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
RUN sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode

# Environment variables
ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk-amd64
ENV HADOOP_HOME=/usr/local/hadoop
ENV SPARK_HOME=/usr/local/spark
ENV PATH $PATH:$JAVA_HOME/bin
ENV PATH $PATH:$HADOOP_HOME/bin
ENV PATH $PATH:$HADOOP_HOME/sbin
ENV PATH $PATH:$SPARK_HOME/bin
ENV PATH $PATH:$SPARK_HOME/sbin
ENV HADOOP_MAPRED_HOME /usr/local/hadoop
ENV HADOOP_COMMON_HOME /usr/local/hadoop
ENV HADOOP_HDFS_HOME /usr/local/hadoop
ENV HADOOP_CONF_DIR /usr/local/hadoop/etc/hadoop
ENV YARN_CONF_DIR /usr/local/hadoop/etc/hadoop
ENV YARN_HOME /usr/local/hadoop
ENV HADOOP_COMMON_LIB_NATIVE_DIR /usr/local/hadoop/lib/native
ENV HADOOP_OPTS "-Djava.library.path=$HADOOP_HOME/lib"

# Clone CaffeOnSpark
ENV CAFFE_ON_SPARK=/opt/CaffeOnSpark
WORKDIR $CAFFE_ON_SPARK
RUN git clone https://github.com/yahoo/CaffeOnSpark.git . --recursive

# Some of the Hadoop setup is adapted from "https://hub.docker.com/r/sequenceiq/hadoop-docker/~/dockerfile/"
RUN mkdir $HADOOP_HOME/input
RUN cp $HADOOP_HOME/etc/hadoop/*.xml $HADOOP_HOME/input
RUN cd /usr/local/hadoop/input

# Copy .xml files.
RUN cp ${CAFFE_ON_SPARK}/scripts/*.xml ${HADOOP_HOME}/etc/hadoop

# Format the namenode and finish the Hadoop and Spark installations.
RUN $HADOOP_HOME/bin/hdfs namenode -format

RUN ls /root/.ssh/
ADD config/ssh_config /root/.ssh/config
RUN chmod 600 /root/.ssh/config
RUN chown root:root /root/.ssh/config

ADD config/bootstrap.sh /etc/bootstrap.sh
RUN chown root:root /etc/bootstrap.sh
RUN chmod 700 /etc/bootstrap.sh

ENV BOOTSTRAP /etc/bootstrap.sh

RUN sed -i '/^export JAVA_HOME/ s:.*:export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64\nexport HADOOP_HOME=/usr/local/hadoop\n:' $HADOOP_HOME/etc/hadoop/hadoop-env.sh
RUN sed -i '/^export HADOOP_CONF_DIR/ s:.*:export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop/:' $HADOOP_HOME/etc/hadoop/hadoop-env.sh

# Working around a docker.io build error.
RUN ls -la /usr/local/hadoop/etc/hadoop/*-env.sh
RUN chmod +x /usr/local/hadoop/etc/hadoop/*-env.sh
RUN ls -la /usr/local/hadoop/etc/hadoop/*-env.sh

# Fix the 254 error code.
RUN sed -i "/^[^#]*UsePAM/ s/.*/#&/" /etc/ssh/sshd_config
RUN echo "UsePAM no" >> /etc/ssh/sshd_config
RUN echo "Port 2122" >> /etc/ssh/sshd_config

RUN service ssh start && $HADOOP_HOME/etc/hadoop/hadoop-env.sh && $HADOOP_HOME/sbin/start-dfs.sh && $HADOOP_HOME/bin/hdfs dfs -mkdir -p /user/root
RUN service ssh start && $HADOOP_HOME/etc/hadoop/hadoop-env.sh && $HADOOP_HOME/sbin/start-dfs.sh && $HADOOP_HOME/bin/hdfs dfs -put $HADOOP_HOME/etc/hadoop/ input

CMD ["/etc/bootstrap.sh", "-bash"]

# HDFS ports
EXPOSE 50010 50020 50070 50075 50090 8020 9000
# MapReduce ports
EXPOSE 10020 19888
# YARN ports
EXPOSE 8030 8031 8032 8033 8040 8042 8088
# Other ports
EXPOSE 49707 2122

# Continue with the CaffeOnSpark build.
# ENV CAFFE_ON_SPARK=/opt/CaffeOnSpark
WORKDIR $CAFFE_ON_SPARK
# RUN git clone https://github.com/yahoo/CaffeOnSpark.git . --recursive
RUN cp caffe-public/Makefile.config.example caffe-public/Makefile.config
RUN echo "INCLUDE_DIRS += ${JAVA_HOME}/include" >> caffe-public/Makefile.config
#RUN sed -i "s/# USE_CUDNN := 1/USE_CUDNN := 1/g" caffe-public/Makefile.config
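# Note: the base image ships cuDNN 5 (nvidia/cuda:7.5-cudnn5-devel), so the
# commented-out sed above can be enabled to build Caffe with USE_CUDNN := 1.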
RUN sed -i "s|BLAS := atlas|BLAS := open|g" caffe-public/Makefile.config

RUN make build

ENV LD_LIBRARY_PATH $LD_LIBRARY_PATH:$CAFFE_ON_SPARK/caffe-public/distribute/lib:$CAFFE_ON_SPARK/caffe-distri/distribute/lib

WORKDIR /root
