Install Steps

Install

Shell script

You can use this shell script to install everything.

sh -c "$(wget https://pku-ahs.github.io/tutorial/en/master/_downloads/9064601015f9cd5e747a641dbdacf3aa/install_ahs.sh -O -)"
source ~/.bashrc

The shell script has been tested on Ubuntu 20.04 LTS. If you use another OS, or if you manage Python with Anaconda or Virtualenv, you may need to modify the script yourself. Windows users are best served by WSL.
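Before running the script, it can help to confirm that the host matches the tested platform. The following is a minimal sketch, assuming a Linux host that provides /etc/os-release; the function names are illustrative and are not part of the install script.

```python
def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def is_tested_platform(info):
    """True only for Ubuntu 20.04, the platform the script was tested on."""
    return info.get("ID") == "ubuntu" and info.get("VERSION_ID") == "20.04"


if __name__ == "__main__":
    try:
        with open("/etc/os-release") as f:
            info = parse_os_release(f.read())
    except FileNotFoundError:
        info = {}  # not a Linux host, or no os-release file
    if is_tested_platform(info):
        print("Ubuntu 20.04 detected: the install script should apply as-is.")
    else:
        print("Untested platform: review the install script before running it.")
```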

Docker

You can pull our Docker image, which has everything prepared, configured, and installed for you.

docker pull ericlyun/ahsmicro:latest
docker run -it ericlyun/ahsmicro:latest /bin/bash

Requirements

Apt

python3
python3-pip
git
llvm-9
cmake
build-essential
make
autoconf
automake
scons
libboost-all-dev
libgmp10-dev
libtool
default-jdk
csvtool

Pip

numpy
decorator
attrs
tornado
psutil
xgboost
cloudpickle
tensorflow
tqdm
IPython
botorch
jinja2
pandas
scipy
scikit-learn
plotly
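To see which of the pip packages above are still missing, a short check along these lines may be useful. It is a sketch only: the mapping from pip names to import names is written out by hand (for example, scikit-learn imports as sklearn and attrs as attr), and the function name is illustrative.

```python
import importlib.util

# pip name -> importable module name for the packages listed above.
PIP_TO_MODULE = {
    "numpy": "numpy",
    "decorator": "decorator",
    "attrs": "attr",
    "tornado": "tornado",
    "psutil": "psutil",
    "xgboost": "xgboost",
    "cloudpickle": "cloudpickle",
    "tensorflow": "tensorflow",
    "tqdm": "tqdm",
    "IPython": "IPython",
    "botorch": "botorch",
    "jinja2": "jinja2",
    "pandas": "pandas",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "plotly": "plotly",
}


def missing_packages(pip_to_module=PIP_TO_MODULE):
    """Return pip names whose module cannot be located (nothing is imported)."""
    return [
        pip_name
        for pip_name, module in pip_to_module.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing, install with: pip3 install " + " ".join(missing))
    else:
        print("All listed pip packages were found.")
```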

Sbt

echo "deb https://repo.scala-sbt.org/scalasbt/debian all main" | sudo tee /etc/apt/sources.list.d/sbt.list
echo "deb https://repo.scala-sbt.org/scalasbt/debian /" | sudo tee /etc/apt/sources.list.d/sbt_old.list
curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x2EE0EA64E40A89B84B2DF73499E82A75642AC823" | sudo apt-key add
sudo apt-get update
sudo apt-get install sbt

Git

git clone --recursive -b micro_tutorial https://github.com/pku-liang/HASCO.git
git clone --recursive -b micro_tutorial https://github.com/pku-liang/TENET.git
git clone https://github.com/KnowingNothing/FlexTensor-Micro.git
git clone -b demo https://github.com/pku-liang/TensorLib.git

Configure & Compile

HASCO

cd ./HASCO
bash ./install.sh

# Settings
vim ~/.bashrc
# append:
# export TVM_HOME=<install_dir>/HASCO/src/tvm
# export AX_HOME=<install_dir>/HASCO/src/Ax
# export PYTHONPATH=$TVM_HOME/python:$AX_HOME:${PYTHONPATH}
source ~/.bashrc
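After reloading ~/.bashrc, the settings can be sanity-checked from Python. The sketch below assumes only the three variables shown above (TVM_HOME, AX_HOME, PYTHONPATH); the function name is illustrative, and the comparison is a plain string match, so a trailing slash in one place but not the other is reported as a mismatch.

```python
import os


def check_hasco_env(env=None):
    """Return a list of problems with the HASCO environment variables."""
    env = os.environ if env is None else env
    problems = []
    pythonpath = env.get("PYTHONPATH", "").split(os.pathsep)
    # Variable -> subdirectory expected on PYTHONPATH, per the export lines above.
    expected = {"TVM_HOME": "python", "AX_HOME": ""}
    for var, subdir in expected.items():
        home = env.get(var)
        if not home:
            problems.append(f"{var} is not set")
            continue
        if not os.path.isdir(home):
            problems.append(f"{var} points to a missing directory: {home}")
        entry = os.path.join(home, subdir) if subdir else home
        if entry not in pythonpath:
            problems.append(f"PYTHONPATH does not contain {entry}")
    return problems


if __name__ == "__main__":
    issues = check_hasco_env()
    print("\n".join(issues) if issues else "HASCO environment looks consistent.")
```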

TENET

cd ./TENET
bash ./init.sh
vim ~/.bashrc
# append:
# export LD_LIBRARY_PATH=<install_dir>/TENET/external/lib:$LD_LIBRARY_PATH
source ~/.bashrc

cd TENET
make cli
make hasco

Dockerfile

The Docker image is about 7 GB. If it is too large to pull, you can build the image yourself from the following Dockerfile.

# syntax=docker/dockerfile:1
FROM ubuntu:20.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update \
    && apt-get -y -q install git sudo vim python3 python3-pip llvm-9 cmake build-essential make autoconf automake scons libboost-all-dev libgmp10-dev libtool curl default-jdk csvtool \
    && pip3 install tensorflow decorator attrs tornado psutil xgboost cloudpickle tqdm IPython botorch jinja2 pandas scipy scikit-learn plotly \
    && echo "deb https://repo.scala-sbt.org/scalasbt/debian all main" | sudo tee /etc/apt/sources.list.d/sbt.list \
    && echo "deb https://repo.scala-sbt.org/scalasbt/debian /" | sudo tee /etc/apt/sources.list.d/sbt_old.list \
    && curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x2EE0EA64E40A89B84B2DF73499E82A75642AC823" | sudo apt-key add \
    && sudo apt-get update \
    && sudo apt-get -y -q install sbt \
    && mkdir AHS \
    && cd AHS \
    && git clone --recursive -b micro_tutorial https://github.com/pku-liang/HASCO.git \
    && git clone --recursive -b micro_tutorial https://github.com/pku-liang/TENET.git \
    && git clone -b demo https://github.com/pku-liang/TensorLib.git \
    && git clone https://github.com/KnowingNothing/FlexTensor-Micro.git \
    && cd HASCO \
    && bash ./install.sh \
    && cd ../TENET \
    && bash ./init.sh

Run

HASCO

Config

vim src/codesign/config.py

mastro_home = "<install_dir>/HASCO/src/maestro"
tenet_path = "<install_dir>/TENET/bin/HASCO_Interface"

tenet_params = {
    "avg_latency": 16,  # average latency of each computation
    "f_trans": 12,  # energy consumed per element transferred
    "f_work": 16,  # energy consumed per element in the workload
}

tensorlib_home = "<install_dir>/TensorLib"
tensorlib_main = "tensorlib.ParseJson"
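All three path settings above hang off the same <install_dir>, so they can be derived from one value and checked before a run. This is a sketch under that assumption; the helper names and the /opt/AHS example directory are hypothetical, and only the key names and path suffixes come from the config shown above.

```python
import os


def hasco_paths(install_dir):
    """Build the path settings shown above from a single install directory."""
    return {
        "mastro_home": os.path.join(install_dir, "HASCO", "src", "maestro"),
        "tenet_path": os.path.join(install_dir, "TENET", "bin", "HASCO_Interface"),
        "tensorlib_home": os.path.join(install_dir, "TensorLib"),
    }


def missing_paths(paths):
    """Return the names of settings whose path does not exist on disk."""
    return sorted(name for name, path in paths.items() if not os.path.exists(path))


if __name__ == "__main__":
    # "/opt/AHS" is a placeholder; substitute your own install directory.
    paths = hasco_paths("/opt/AHS")
    for name, path in paths.items():
        print(f'{name} = "{path}"')
    print("Missing:", missing_paths(paths))
```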

Python API

python3 testbench/co_mobile_conv.py
python3 testbench/co_resnet_gemm.py
...

CLI

cd HASCO
./hasco.py -h
# Run a GEMM intrinsic with MobileNetV2 benchmark
./hasco.py -i GEMM -b MobileNetv2 -f gemm_example.json -l 1000 -p 20 -a 0

Results:

rst/MobileNetV2_CONV.csv: configuration of the best design for each constraint (view with: column -s, -t < MobileNetV2_CONV.csv)
rst/software/MobileNetV2_CONV_*: TVM IR for each design
rst/hardware/CONV_*.json: TensorLib configuration for each design
rst/hardware/CONV_*.v: Verilog generated by TensorLib
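Where the column utility is not available, the CSV can be aligned with a few lines of Python. This is a minimal sketch that makes no assumption about the column names in the file; the function name is illustrative.

```python
import csv
import io
import os


def format_csv_table(text):
    """Align comma-separated text into padded columns, like `column -s, -t`."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return ""
    ncols = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells.
    rows = [row + [""] * (ncols - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in rows) for i in range(ncols)]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    path = "rst/MobileNetV2_CONV.csv"  # result file named in the list above
    if os.path.isfile(path):
        with open(path, newline="") as f:
            print(format_csv_table(f.read()))
    else:
        print(f"{path} not found; run hasco.py first.")
```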

TENET

cd TENET

# Help Text
./bin/tenet -h

# Run a KC-systolic dataflow
./bin/tenet -p ./dataflow_example/pe_array.p -s ./dataflow_example/conv.s -m ./dataflow_example/KC_systolic_dataflow.m -o output.csv --all

# Run an OxOy dataflow
./bin/tenet -p ./dataflow_example/pe_array.p -s ./dataflow_example/conv.s -m ./dataflow_example/OxOy_dataflow.m -o output.csv --all

# Run all layers in MobileNet
./bin/tenet -e ./network_example/MobileNet/config -d ./network_example -o output.csv --all

Result: output.csv
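Both dataflow commands above write to output.csv, so the second run overwrites the first. To compare dataflows, each run can be given its own output file. The sketch below only assembles the command lines already shown (flags -p, -s, -m, -o, --all); the function and parameter names and the output_<name>.csv naming are assumptions made for illustration.

```python
import subprocess

# Mapping files from the example commands above.
MAPPINGS = {
    "KC_systolic": "./dataflow_example/KC_systolic_dataflow.m",
    "OxOy": "./dataflow_example/OxOy_dataflow.m",
}


def tenet_command(mapping, output, binary="./bin/tenet",
                  pe_array="./dataflow_example/pe_array.p",
                  statement="./dataflow_example/conv.s"):
    """Return the argument list for one tenet run, mirroring the commands above."""
    return [binary, "-p", pe_array, "-s", statement,
            "-m", mapping, "-o", output, "--all"]


def run_all(dry_run=True):
    """Print (or, with dry_run=False, execute) one tenet run per mapping file."""
    for name, mapping in MAPPINGS.items():
        cmd = tenet_command(mapping, f"output_{name}.csv")
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # run from the TENET directory


if __name__ == "__main__":
    run_all(dry_run=True)
```

Call run_all(dry_run=False) from inside the TENET directory to execute the runs instead of printing them.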

TensorLib

cd TensorLib

# Optional: download the dependencies from Maven so that the remaining commands run faster
sbt compile

# Examples of Scala APIs
sbt "runMain tensorlib.Example_GenConv2D"

sbt "runMain tensorlib.Example_GenGEMM"

# Examples of JSON interface
sbt "runMain tensorlib.ParseJson ./examples/conv2d.json ./output/conv2d.v"

sbt "runMain tensorlib.ParseJson ./examples/gemm.json ./output/gemm.v"

# Testing the result
sbt "runMain tensorlib.Test_Runner_Gemm"

Result:

Scala Interface: PEArray.v

ParseJson: the second argument you specified.

FlexTensor

cd FlexTensor-Micro
export PYTHONPATH=$PYTHONPATH:/path/to/FlexTensor-Micro
cd flextensor/tutorial

# First, CPU experiments
cd conv2d_llvm

# run flextensor
python optimize_conv2d.py --shapes res --target llvm --parallel 8 --timeout 20 --log resnet_config.log

# run test
python optimize_conv2d.py --test resnet_optimize_log.txt

# run baseline
python conv2d_baseline.py --type tvm_generic --shapes res --number 100

# run plot
python plot.py

# Next, GPU experiments
cd ../conv2d_cuda

# run flextensor
python optimize_conv2d.py --shapes res --target cuda --parallel 4 --timeout 20 --log resnet_config.log

# run test
python optimize_conv2d.py --test resnet_optimize_log.txt

# run baseline
python conv2d_baseline.py --type pytorch --shapes res --number 100

# run plot
python plot.py

# Finally, VNNI experiments
cd ../gemm_vnni

# run flextensor (cascadelake)
python optimize_gemm.py --target "llvm -mcpu=cascadelake" --target_host "llvm -mcpu=cascadelake" --parallel 8 --timeout 20 --log gemm_config.log --dtype int32

# run flextensor (skylake)
python optimize_gemm.py --target "llvm -mcpu=skylake-avx512" --target_host "llvm -mcpu=skylake-avx512" --parallel 8 --timeout 20 --log gemm_config.log

# run test
python optimize_gemm.py --test gemm_optimize_log.txt

# run baseline
python gemm_baseline.py --type numpy --number 100

# run plot
python plot.py