Getting Started with mpi4cloud
First, create an account here.
Install
Then, install the CLI tool
- On Linux / macOS:
bash <(curl -s releases.mpi4cloud.com/install.sh)
- On Windows:
todo
Then, log in to your mpi4cloud account with the CLI:
mpi4cloud login
# Username: ...
# Password: ...
Launch an AWS cluster
First, you need to install and configure the AWS CLI so that mpi4cloud can launch, provision, and monitor instances for you. You should follow the installation instructions and then the configuration instructions.
Then, choose an instance type.
You can use this nice tool to
compare the different offerings.
For this tutorial, we will use 4 t2.micro instances.
mpi4cloud create \
--provider aws \
--instance-type t2.micro \
--nodes 4
Note: The default instance name is
mpi4cloud-cluster, but you can change it with the--cluster-nameflag. Similarly, the default AWS AMI used is the latest Ubuntu 20.04 image, but you can use a custom AMI with--image-id. You can runmpi4cloud run --helpfor more details on default options.
Launch an Azure cluster
todo
Launch a GCP cluster
todo
Run your first job
mpi4cloud attempts to create a seamless development
experience by zipping your local directory (ignoring
whatever Git ignores),
uploading it to all VMs in the cluster,
and then running mpirun to launch the job.
For example, to launch a job using all nodes in the cluster we just created (from any of the cloud providers), you'd run:
mpi4cloud run -np 4 python3 main.py
All arguments following run are passed to mpirun.
However, you will sometimes need to provide arguments to
mpi4cloud (i.e., to change the cluster name).
To do this, bash provides a "double dash"
for separating these options.
For example, to specify the name of the cluster to run the job on, you'd run:
mpi4cloud run --cluster-name my-cluster -- -np 4 python3 main.py
All flags before the -- are passed to mpi4cloud and the ones after
are passed to mpirun.
Note: the
-npflag tellsmpirunhow many processes to launch. You do not need to provide a hostfile; it is generated for you. Check out the mpirun documentation for more control over how processes are mapped to nodes (aka "hosts").
Provisioning a cluster
Oftentimes, you will need more dependencies than are installed on
the VM.
To do this, create a setup.sh file in the root directory
of your project and fill it with any installation commands.
For example:
# setup.sh
sudo apt update
sudo apt upgrade -y
Note: an MPI implementation and the mpi4cloud Python library along with some other useful packages for working with MPI in Python will automatically be installed for you.
Listing nodes in a cluster
You can list the nodes in a cluster with the following command:
mpi4cloud list-nodes --cluster-name ...
# ID TYPE STATE PUBLIC IP PRIVATE IP
# i-abcdefgh t2.micro running 10.0.0.45 10.0.3.10
# ...
You can omit the --cluster-name flag to list the nodes for the default cluster.
SSH into a node
First, use the list-nodes command to get the IDs of all nodes.
Then, you can SSH into a particular one with its ID:
mpi4cloud ssh <instance-id>
Working with Docker
In general, most projects can get away with running directly on the host OS,
installing the necessary dependencies with setup.sh.
However, you may need more fine-grain control over the dependencies in your application, especially if you are working with multiple collaborators or switching between on-prem and cloud environments.
mpi4cloud provides the run-docker command to launch a job using Docker
containers.
For example, to run a job using the latest PyTorch container, you would run:
mpi4cloud run-docker --nodes 4 -- pytorch/pytorch
To use your own Dockerfile,
you would replace the image name with ..
However, mpi4cloud will automatically look for a Dockerfile in the
root directory of your project, so you can just run:
mpi4cloud run-docker --nodes 4
Next Steps
To see examples of how to use mpi4cloud with a variety of tools leveraging model/data parallelism, check out the GitHub repo.
If you have issues running something or seek general advice, feel free to leave an issue.