
Getting Started with SambaNova

Accounts:

1. Get an account on the SambaNova (SN) system. Contact Venkat Vishwanath (venkat@anl.gov) for access.

2. Log in to one of the MCS or CELS machines first, then ssh to the SambaNova system, which has two nodes: sm-01 and sm-02. Their hostnames are sm-01.cels.anl.gov and sm-02.cels.anl.gov.
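The two-hop login can be collapsed into a single command with the `-J` (ProxyJump) option of OpenSSH. The following is a minimal sketch, assuming an OpenSSH client of version 7.3 or later and the same username on both machines. `ANL_USER`, `LOGIN_HOST`, and the function name `sn_ssh_cmd` are placeholders introduced for illustration; replace the first two with your own username and MCS/CELS login machine. The function only prints the command so it can be checked before use.

```shell
#!/bin/bash
# Placeholders (assumptions): replace with your own username and login machine.
ANL_USER="${ANL_USER:-your_username}"
LOGIN_HOST="${LOGIN_HOST:-your-login-host}"

# Print the ssh command that reaches a SambaNova node through the login host.
# $1 is the node name from this guide: sm-01 or sm-02.
sn_ssh_cmd() {
  node="$1"
  echo "ssh -J ${ANL_USER}@${LOGIN_HOST} ${ANL_USER}@${node}.cels.anl.gov"
}

sn_ssh_cmd sm-01
```

Copy the printed command into the terminal once the placeholders are filled in.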

Documents:

Read through the SambaNova technical documents. Part 1 covers hardware (H/W) and Part 2 covers software (S/W).

These are under NDA. Please don't share. If you are unable to access these documents, please reach out to Venkat/Murali for access.

There is also a dedicated documentation portal. Please reach out to Murali Emani (memani@anl.gov) for access.

First Steps:

After logging in, activate the SN virtual environment (venv) before running models.

  • Activating venv environment: source /opt/sambaflow/venv/bin/activate

  • Deactivating venv environment: deactivate
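Before running a model, it can help to confirm that the SambaFlow venv is the one currently active. The check below is a small sketch, assuming the activate script exports `VIRTUAL_ENV` the way standard Python virtual environments do. The function name `sn_venv_active` is hypothetical.

```shell
#!/bin/bash
# Return 0 if the active virtual environment is the SambaFlow one.
# Assumption: activation exports VIRTUAL_ENV, as standard Python venvs do.
sn_venv_active() {
  [ "${VIRTUAL_ENV:-}" = "/opt/sambaflow/venv" ]
}

if sn_venv_active; then
  echo "SambaFlow venv is active"
else
  echo "SambaFlow venv is not active; run: source /opt/sambaflow/venv/bin/activate"
fi
```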

How to Run

To run on the SambaNova nodes, an application has to be written in 'SambaFlow', which is similar to PyTorch. The SambaFlow software stack includes the compilers, the runtime, and the SambaFlow Python SDK. Not all operators are supported yet; more are expected in the monthly releases. Support for TensorFlow is a work in progress.

Running a model involves the following four steps. A high-level overview is given here; detailed information is available in the official documentation.

1. Compile: Compiles the model and generates a .pef file. This file contains information on how to reconfigure the hardware, such as how many compute and memory resources are required, and is used in all subsequent steps. The pef files are usually saved in the 'out' directory; it is advisable to save them in a separate directory with the '--output-folder' option.

python myapp.py compile --pef-name="myapp.pef"

2. Test (optional): Runs the model on both the host CPU and the SN node and raises errors if any discrepancies are found. Pass the pef file generated above as the input.

python myapp.py test --pef="out/myapp/myapp.pef"

3. Run: This will run the application on SN nodes.

python myapp.py run --pef="out/myapp/myapp.pef"

4. Measure Performance: This step reports the measured performance. The reported metrics depend on the model and can include latency and samples/sec.

python myapp.py measure-performance --pef="out/myapp/myapp.pef"

Other parameters can be found with the --help option in any step (e.g., python myapp.py compile --help).

There are some sample programs at /opt/sambaflow/apps/ to try out.

These steps can be submitted in a single script via Slurm using the sbatch command.

sbatch --output=<path>/output.log submit-job.sh

The node and the resources required can be passed as arguments to sbatch.

sbatch --output=<path>/output.log -w <node: sm-01/sm-02> --cpus-per-task=4 --gres=rdu:1 submit-job.sh
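The contents of submit-job.sh are not shown in this guide. One possible shape, chaining the four steps above, is sketched below. The helper names `run_step` and `sn_workflow` and the variables `DRY_RUN`, `APP`, `PEF_NAME`, and `PEF_PATH` are hypothetical; the commands and the pef path are the ones used in the steps above. As a safety assumption, `DRY_RUN` defaults to 1 so the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/bash
# submit-job.sh: hypothetical sketch that chains the four workflow steps.
# DRY_RUN defaults to 1, so commands are only printed; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"
APP="${APP:-myapp.py}"                        # application script from this guide
PEF_NAME="${PEF_NAME:-myapp.pef}"             # name passed to the compile step
PEF_PATH="${PEF_PATH:-out/myapp/myapp.pef}"   # pef path used by the later steps

# Echo a command, then run it unless DRY_RUN=1. Returns the command's status.
run_step() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# Run compile, test, run, and measure-performance in order, stopping on failure.
sn_workflow() {
  if [ "$DRY_RUN" != "1" ]; then
    # Activate the SambaFlow venv only for real runs ("." is the same as "source").
    . /opt/sambaflow/venv/bin/activate || return 1
  fi
  run_step python "$APP" compile --pef-name="$PEF_NAME" || return 1
  run_step python "$APP" test --pef="$PEF_PATH" || return 1
  run_step python "$APP" run --pef="$PEF_PATH" || return 1
  run_step python "$APP" measure-performance --pef="$PEF_PATH" || return 1
}

sn_workflow
```

Stopping at the first failing step avoids running or measuring against a stale pef file when compilation fails.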

For any other questions, please reach out to Venkat (venkat@anl.gov) or Murali (memani@anl.gov).