Installation

Prerequisites

  • Python 3.10 - 3.12

  • NVIDIA GPU with CUDA support (recommended)

  • TensorFlow 2.x
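
To check these prerequisites from a shell (nvidia-smi only reports the GPU once the NVIDIA driver is installed):

# Confirm the Python version is 3.10-3.12
python --version

# Confirm the NVIDIA driver and GPU are visible (GPU setups only)
nvidia-smi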

Install via pip (PyPI)

Install the latest release from PyPI:

pip install nextgen-wireless-dl-demos

Install via pip (GitHub)

Alternatively, install the latest development version directly from the repository:

pip install git+https://github.com/SrikanthPagadarai/nextgen-wireless-dl-demos.git

Install via Poetry

For development, clone the repository and install dependencies using Poetry:

git clone --recurse-submodules https://github.com/SrikanthPagadarai/nextgen-wireless-dl-demos.git
cd nextgen-wireless-dl-demos
poetry install

If you’ve already cloned without --recurse-submodules, initialize the submodule separately:

git submodule update --init --recursive
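
With dependencies installed, run project commands inside the Poetry-managed environment, for example (the module path is one of the demos listed at the end of this page):

poetry run python -m demos.dpd.inference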

Dependencies

The project depends on:

  • Sionna (≥0.19.0) - NVIDIA’s library for link-level simulations

  • TensorFlow 2.x - Deep learning framework

  • Matplotlib - Visualization
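
To confirm that the core dependencies are importable after installation (both packages expose a standard __version__ attribute):

python -c "import sionna, tensorflow as tf; print(sionna.__version__, tf.__version__)"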

GPU Setup

For optimal performance, ensure you have:

  1. NVIDIA drivers installed

  2. CUDA toolkit compatible with your TensorFlow version

  3. cuDNN library

Refer to the TensorFlow GPU guide for detailed setup instructions.
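
Once the driver, CUDA toolkit, and cuDNN are in place, you can confirm that TensorFlow detects the GPU:

python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"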

Cloud and Container Setup

For running GPU-accelerated workloads in the cloud using Docker containers, follow these three steps in order:

  1. Provision cloud infrastructure (Step 1)

  2. Configure the host NVIDIA runtime (Step 2)

  3. Build and run the Docker container (Step 3)

Step 1: Cloud Platform Setup (GCP)

This project provides helper scripts for provisioning GPU-enabled virtual machines on Google Cloud Platform (GCP). These scripts are included as a git submodule in the gcp-management/ directory (source repository: gcp-management).

Note

While GCP-specific tooling is provided, you can use any cloud provider that offers GPU instances (AWS, Azure, etc.) or your own hardware. The key requirements are an NVIDIA GPU with CUDA support and Docker with the NVIDIA Container Toolkit.

GCP Management Files Overview

The gcp-management/ submodule contains five files:

  • .env.example - Template for environment variables (copy to .env and fill in your values)

  • .gitignore - Ensures sensitive .env files are not committed

  • README.md - Quick reference for common commands

  • gcloud-setup.sh - Main script for creating CPU or GPU VMs

  • gcloud-reset.sh - Script to reset gcloud configuration and credentials

Environment Configuration

Navigate to the submodule directory and create a .env file from the template:

cd gcp-management
cp .env.example .env

Edit .env with your GCP project details:

PROJECT_ID=your-gcp-project-id
INSTANCE_NAME=your-vm-name
BUCKET_NAME=your-bucket-name
ZONE=us-central1-a
SSH_USER=$(whoami)

Authentication

Before running the setup script, authenticate with GCP:

gcloud auth login
gcloud config set project YOUR_PROJECT_ID

Creating a GPU VM

Make the setup script executable and run it:

chmod +x gcloud-setup.sh

# Create a GPU VM with NVIDIA L4
./gcloud-setup.sh \
  --mode gpu \
  --project YOUR_PROJECT_ID \
  --name YOUR_VM_NAME \
  --zone us-central1-a \
  --gpu-type nvidia-l4 \
  --gpu-count 1 \
  --gpu-machine g2-standard-4

The script supports multiple GPU types and configurations:

# T4 GPU on n1 machine
./gcloud-setup.sh --mode gpu --gpu-type nvidia-tesla-t4 --gpu-machine n1-standard-8

# A100 GPU with spot pricing (cost savings)
./gcloud-setup.sh --mode gpu --gpu-type nvidia-a100 --gpu-machine a2-highgpu-2g --provisioning SPOT

# CPU-only VM for development
./gcloud-setup.sh --mode cpu --cpu-type e2-standard-2

Connecting to the VM

SSH into your newly created VM:

gcloud compute ssh YOUR_USERNAME@YOUR_VM_NAME --zone=us-central1-a

Resetting GCP Configuration

To reset your gcloud credentials and configurations:

# Soft reset (revoke credentials, clear configs)
bash gcloud-reset.sh

# Full reset (removes ~/.config/gcloud entirely)
bash gcloud-reset.sh --nuke

Step 2: Host NVIDIA Runtime Setup

Before running GPU-accelerated Docker containers, you must configure the NVIDIA Container Toolkit on your host machine. This step is required whether you’re using a cloud VM or local hardware.

The host_nvidia_runtime_setup.sh script in the repository root automates this process:

chmod +x host_nvidia_runtime_setup.sh
./host_nvidia_runtime_setup.sh

What the Script Does

The script performs four operations:

  1. Installs the NVIDIA Container Toolkit - Allows Docker containers to access the host GPU.

    sudo apt-get update
    sudo apt-get install -y nvidia-container-toolkit
    
  2. Configures Docker runtime - Registers the NVIDIA runtime with Docker.

    sudo nvidia-ctk runtime configure --runtime=docker
    
  3. Enables graphics capabilities - Configures the runtime to expose compute, utility, and graphics capabilities to containers (required for Sionna’s ray tracing features).

    sudo sed -i \
      's/^#\?\s*env = .*NVIDIA_DRIVER_CAPABILITIES.*/env = ["NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics"]/g' \
      /etc/nvidia-container-runtime/config.toml
    
  4. Restarts Docker - Applies the configuration changes.

    sudo systemctl restart docker
    

Verification

After running the setup script, verify the installation:

# Check NVIDIA driver is accessible
nvidia-smi

# Verify Docker can access the GPU
docker run --rm --gpus all nvidia/cuda:12.2.2-base-ubuntu22.04 nvidia-smi

Step 3: Docker Setup

The project includes a multi-stage Dockerfile supporting both CPU and GPU builds. The Docker configuration files are located in the repository root (Dockerfile) and the docker/ directory.

Docker Files Overview

  • Dockerfile - Multi-stage build supporting CPU and GPU configurations

  • docker/docker-instructions.md - Quick reference for build and run commands

  • docker/entrypoint.sh - Container entrypoint script that auto-detects OptiX for GPU acceleration

Building Docker Images

GPU Build (recommended for training and inference):

docker build \
  --build-arg BASE_IMAGE=nvidia/cuda:12.2.2-cudnn8-runtime-ubuntu22.04 \
  --build-arg ENABLE_GPU=1 \
  --build-arg TF_PACKAGE=tensorflow \
  --build-arg TF_VERSION=2.15.1 \
  -t nextgen-wireless-dl-demos:gpu-12.2 .

CPU Build (for development or systems without GPU):

docker build \
  --build-arg BASE_IMAGE=ubuntu:24.04 \
  --build-arg ENABLE_GPU=0 \
  --build-arg ENABLE_CUDA_CHECK=0 \
  --build-arg TF_PACKAGE=tensorflow-cpu \
  --build-arg TF_VERSION=2.15.1 \
  -t nextgen-wireless-dl-demos:cpu .

Build Arguments Reference

The Dockerfile accepts several build-time arguments:

  • BASE_IMAGE - Base Docker image (NVIDIA CUDA image for GPU, Ubuntu for CPU)

  • ENABLE_GPU - Set to 1 for GPU support, 0 for CPU-only

  • ENABLE_CUDA_CHECK - Set to 1 to verify that the TensorFlow CUDA version matches the base image

  • TF_PACKAGE - TensorFlow package name (tensorflow for GPU, tensorflow-cpu for CPU)

  • TF_VERSION - TensorFlow version to install
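
For example, to build a GPU image with the CUDA-version check enabled (a variation of the GPU build above; the image tag is arbitrary):

docker build \
  --build-arg BASE_IMAGE=nvidia/cuda:12.2.2-cudnn8-runtime-ubuntu22.04 \
  --build-arg ENABLE_GPU=1 \
  --build-arg ENABLE_CUDA_CHECK=1 \
  --build-arg TF_PACKAGE=tensorflow \
  --build-arg TF_VERSION=2.15.1 \
  -t nextgen-wireless-dl-demos:gpu-12.2-checked .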

Running Containers

Run with GPU access:

docker run -it --gpus all \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics \
  -e DRJIT_LIBOPTIX_PATH=/usr/lib/x86_64-linux-gnu/libnvoptix.so.1 \
  -v "$PWD":/app -w /app \
  nextgen-wireless-dl-demos:gpu-12.2

Run CPU-only:

docker run -it \
  -v "$PWD":/app -w /app \
  nextgen-wireless-dl-demos:cpu bash

Entrypoint Script Behavior

The docker/entrypoint.sh script automatically detects OptiX availability at container startup. It searches for libnvoptix.so.1 in common locations and configures the Sionna/Mitsuba backend accordingly:

  • GPU mode (cuda_ad_rgb): Used when OptiX is found, enabling GPU-accelerated ray tracing

  • CPU mode (llvm_ad_rgb): Fallback when OptiX is not available

The script outputs which mode is selected at container startup, for example:

[entrypoint] OptiX found at: /usr/lib/x86_64-linux-gnu/libnvoptix.so.1 -> using CUDA (cuda_ad_rgb)
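
The detection logic is roughly equivalent to the sketch below; the actual docker/entrypoint.sh may search additional paths and set further backend-specific variables:

# Sketch of the OptiX auto-detection (illustrative, not the verbatim entrypoint)
OPTIX_LIB=""
for candidate in /usr/lib/x86_64-linux-gnu/libnvoptix.so.1 /usr/lib64/libnvoptix.so.1; do
  if [ -e "$candidate" ]; then
    OPTIX_LIB="$candidate"
    break
  fi
done

if [ -n "$OPTIX_LIB" ]; then
  echo "[entrypoint] OptiX found at: $OPTIX_LIB -> using CUDA (cuda_ad_rgb)"
  export DRJIT_LIBOPTIX_PATH="$OPTIX_LIB"  # same variable shown in the GPU run example
else
  echo "[entrypoint] OptiX not found -> using CPU fallback (llvm_ad_rgb)"
fi

exec "$@"  # hand off to the container command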

Running Demo Scripts in Container

Once inside the container, run demos using Python module syntax:

# Training
python -m demos.dpd.training_nn
python -m demos.mimo_ofdm_neural_receiver.training
python -m demos.pusch_autoencoder.training

# Inference
python -m demos.dpd.inference
python -m demos.mimo_ofdm_neural_receiver.inference
python -m demos.pusch_autoencoder.inference