Set up data science stack on ERISOne
ERISOne is a “computing cluster with a job scheduling system for batch jobs, remote desktops for graphical applications and GPUs running on a Linux OS” maintained by Partners Healthcare.
Ideally, users do not need to purchase and maintain compute workstations. Rather, we use Macbooks to connect to a cluster behind the Partners firewall, This is ideal for working on a shared data repository, especially since we analyze PHI.
Unfortunately, setting up a modern data science stack on ERISOne is challenging. Many software packages are old, and obtaining desired memory, GPU, and CPU resources on a compute node requires additional steps.
This guide walks you through setup of Zsh, git, and more. It assumes you requested and have an ERISOne account, and connect via SSH (see RISC guide here).
Follow the steps in order. For example, you should install
oh-my-zshbecause the commands for
oh-my-zshrequire a recent version of
ERISOne comes with
zsh 4.3.11, but the current version is 5.8. Users lack root access on ERISOne and cannot install
Note: you may need to first install
This means we have to install
zsh from source:
(base) -bash-4.1$ curl -L https://downloads.sourceforge.net/project/zsh/zsh/5.8/zsh-5.8.tar.xz > zsh.tar.xz (base) -bash-4.1$ tar xf zsh.tar.xz (base) -bash-4.1$ cd zsh (base) -bash-4.1$ ./configure --prefix="$HOME/local" \ CPPFLAGS="-I$HOME/local/include" \ LDFLAGS="-L$HOME/local/lib" (base) -bash-4.1$ make -j && make install
zsh is installed, let’s change our default logging shell. However, users on ERISOne do not have root, and thus cannot use
Add the following to
export PATH=$HOME/local/bin:$PATH export SHELL=`which zsh` [ -f "$SHELL" ] && exec "$SHELL" -l
source ~.bash_profile to make the changes take effect. Now
zsh will be activated each time you log in to ERISOne.
The next time you log in, you will be prompted to set up the
zsh shell. The prompt should look different:
Delete source files from your home directory:
[ab123@eris1n3]~% rm -rf ~/zsh*
If you log out and back in, your shell should look much improved:
git 1.7.1 by default, which was released in 2010. Subsequent installation of
oh-my-zsh involves a shell script that invokes
git clone, which was introduced in
git 1.7.7. Thus, we must upgrade
You can install
git from source, but there are many other dependencies, so doing this would take a lot of effort.
Instead we can get a sufficiently recent (albeit not most current) version of
git using the
load command of
module load git/2.17.0
Check if a more recent version of
MODULE_NAME is available via
module avail MODULE_NAME.
.zshrc loads newer versions of
The newest version of Anaconda on ERISOne (via
4.4.0-p3, whereas the current Linux version is
4.8.3. Ideally, one would use the existing Anaconda module on ERISOne and set up a new environment with the desired Python version and packages. Unfortunately, there is an error with older versions of Conda that prevents users from even creating new environments:
RemoveError: 'setuptools' is a dependency of conda and cannot be removed from conda's operating environment.
To set up the desired Python environment, install Miniconda following the instructions in my Linux setup post.
Like most Linux machines, ERISOne uses the bash shell by default. Therefore Conda will modify
/.bashrc with the appropriate path for you to call
conda. However, only
/.bash_profile is called when you log in to ERISOne.
To ensure Conda is on your path so you can call
echo 'export source ~/.bashrc' >> ~/.bash_profile
After you install Anaconda, log out, and log back in ERISOne, your command line prompt should reflect that you are now using the base environment:
Don’t forget to delete the Anaconda installer.
Interactive session on compute node
Request a compute node on
interact-big instead of running jobs on the login nodes (
bsub -q interact-big -Is -XF -n 4 $HOME/local/bin/zsh
Read more about compute nodes on ERISOne here.