
📦 Introduction to NVIDIA Jetson

Author: Dr. Kaikai Liu, Ph.D.
Position: Associate Professor, Computer Engineering
Institution: San Jose State University
Contact: kaikai.liu@sjsu.edu

🔍 What is NVIDIA Jetson?

🧭 Overview of NVIDIA Jetson

NVIDIA Jetson is a series of small, powerful computers (system-on-modules and developer kits) designed for edge AI, robotics, and embedded systems. It brings the power of NVIDIA's GPU architecture to low-power, compact platforms, enabling real-time computer vision, deep learning inference, and autonomous decision-making on devices deployed outside the data center.

  • Launched: 2014, starting with the Jetson TK1.
  • Key Purpose: To bring GPU-accelerated computing to embedded and edge devices for applications such as robotics, drones, autonomous vehicles, and AI at the edge.

📜 Historical Evolution of NVIDIA Jetson

| Release Date | Product | Highlights |
|---|---|---|
| Mar 2014 | Jetson TK1 | First Jetson board; Tegra K1 chip with Kepler GPU. |
| Nov 2015 | Jetson TX1 | Introduced Maxwell GPU and enhanced CUDA support. |
| Mar 2017 | Jetson TX2 | Pascal GPU; improved efficiency and performance. |
| Sep 2018 | Jetson AGX Xavier | Volta GPU with Tensor Cores (~32 TOPS); industrial-grade AI. |
| Mar 2019 | Jetson Nano Developer Kit | Affordable AI platform (~0.5 TOPS), ideal for education and prototyping. |
| May 2020 | Jetson Xavier NX | Volta GPU (~21 TOPS); compact and powerful for edge applications. |
| Apr 2022 | Jetson AGX Orin Developer Kit | Ampere GPU (~275 TOPS); debuted in MLPerf benchmarks. |
| Sep 2022 | Jetson Orin Nano Developer Kit | Entry-level Orin (~40 TOPS) announced at GTC. |
| Dec 2022 | Jetson AGX Orin Production Module (32 GB) | Production version began shipping. |
| Dec 17, 2024 | Jetson Orin Nano Super Dev Kit | Enhanced Orin Nano with 67 INT8 TOPS (1.7× generative AI boost). |

NVIDIA Unveils Its Most Affordable Generative AI Supercomputer: the NVIDIA Jetson Orin Nano Super Developer Kit, which fits in the palm of a hand, gives everyone from commercial AI developers to hobbyists and students a boost in generative AI capability and performance. The price is now $249, down from $499. Compared with its predecessor, it delivers up to a 1.7x leap in generative AI inference performance, a 70% increase in compute to 67 INT8 TOPS, and a 50% increase in memory bandwidth to 102GB/s. Introduction Video

🔗 Relation to Other NVIDIA Platforms

| Platform | Chipset | Use Case |
|---|---|---|
| Jetson Nano | Maxwell | Entry-level AI and CV |
| Jetson Xavier NX | Volta | Intermediate robotics, drones |
| Jetson Orin Nano | Ampere | Education, AI, edge compute |
| NVIDIA DRIVE AGX | Orin | Autonomous driving compute |
| Switch 2 (rumor) | Custom Orin | High-performance gaming console |
  • The Orin family spans from Jetson to automotive and even rumored consumer devices like the Nintendo Switch 2.
  • Jetson Orin Nano shares architectural DNA with NVIDIA DRIVE Orin, used in autonomous vehicles.

⚖️ Performance Comparison with Other Platforms

| Device/Chipset | AI Throughput (TOPS) | Power (W) | Notes |
|---|---|---|---|
| Jetson Orin Nano (8GB) | ~40 TOPS | 15W | Optimized for edge AI and inference |
| Apple M2 | ~15 TOPS (NPU est.) | 20W-30W | General-purpose SoC with ML acceleration |
| Intel Core i7 (12th Gen) | ~1-2 TOPS (CPU only) | 45W+ | High compute, poor AI power efficiency |
| Raspberry Pi 5 | <0.5 TOPS | 5-7W | General ARM SBC, no dedicated AI engine |
  • Jetson Orin Nano provides a highly efficient balance of AI compute and power usage, ideal for on-device inference.
  • It outperforms embedded CPUs and SBCs while being more power-efficient than traditional desktops.
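Using the table's own approximate figures (and the upper end of each power range), the efficiency gap can be quantified in TOPS per watt:

```latex
\frac{40\,\text{TOPS}}{15\,\text{W}} \approx 2.7\ \text{TOPS/W (Orin Nano)} \qquad
\frac{15\,\text{TOPS}}{30\,\text{W}} = 0.5\ \text{TOPS/W (Apple M2)} \qquad
\frac{2\,\text{TOPS}}{45\,\text{W}} \approx 0.04\ \text{TOPS/W (Core i7)}
```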

📦 Jetson Modules and Their GPU Architecture

Jetson modules use cut-down versions of NVIDIA's main GPU architectures (Kepler, Maxwell, Pascal, Volta, Ampere, and now Blackwell) that are optimized for power efficiency, thermal limits, and edge deployment.

| Jetson Module | GPU Architecture | Related Desktop GPU Series |
|---|---|---|
| Jetson TK1 | Kepler | GTX 600 / 700 Series |
| Jetson TX1 | Maxwell | GTX 900 Series |
| Jetson TX2 | Pascal | GTX 10 Series |
| Jetson AGX Xavier | Volta | Tesla V100-class (with Tensor Cores) |
| Jetson Orin Series | Ampere | RTX 30 Series / A100-class |
| (Future) Jetson Blackwell | Blackwell (Expected) | RTX 50 / B100-class GPUs |

Jetson shares the CUDA, cuDNN, TensorRT, and DeepStream SDK software stacks with desktop and server-class GPUs, allowing AI/vision models developed in the cloud or lab to scale down for embedded inference.
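Because the stack is shared, the same CUDA runtime calls work unchanged on a Jetson module and on a desktop GPU. A minimal sketch (standard CUDA runtime API; compile with nvcc):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int runtimeVer = 0, driverVer = 0, devCount = 0;
    cudaRuntimeGetVersion(&runtimeVer);  // version of the CUDA runtime library
    cudaDriverGetVersion(&driverVer);    // version supported by the installed driver
    cudaGetDeviceCount(&devCount);       // number of CUDA-capable GPUs
    printf("CUDA runtime %d, driver %d, %d device(s)\n",
           runtimeVer, driverVer, devCount);
    return 0;
}
```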

The Jetson Orin Nano brings the powerful Ampere architecture to embedded AI platforms. With Ampere's Tensor Cores and optimized power/performance, Jetson Orin Nano can run modern transformers, YOLOv8, and vision-language models right on the edge.

  • GPU: 512-core Ampere GPU with 16 Tensor Cores (4GB variant); 1024 cores and 32 Tensor Cores (8GB variant)
  • AI Performance: Up to 40 TOPS (INT8), or 67 TOPS on the Orin Nano Super
  • CPU: 6-core ARM Cortex-A78AE
  • Memory: 4GB or 8GB LPDDR5
  • Target Use Cases: Robotics, smart cameras, low-power edge AI

⚙️ Architecture Comparison: Desktop / Data Center

| Architecture | GPUs / Chips | Precision Support | Tensor Core Gen | Memory Bandwidth | Notes |
|---|---|---|---|---|---|
| Kepler | GTX 600 / Jetson TK1 | FP32 | N/A | ~192 GB/s | First unified memory |
| Maxwell | GTX 900 / Jetson TX1 | FP32 | N/A | ~200 GB/s | Energy efficiency focus |
| Pascal | GTX 10 / Jetson TX2 | FP32, FP16 | None | ~300 GB/s | Deep learning training begins |
| Volta | Tesla V100 / Xavier | FP32, FP16, INT8 | 1st Gen | ~900 GB/s (HBM2) | Introduced Tensor Cores |
| Ampere | RTX 30xx / A100 / Orin | FP32, FP16, TF32, INT8 | 3rd Gen | 1.5 TB/s (A100), 204 GB/s (Orin) | TF32 and structured sparsity |
| Ada Lovelace | RTX 40xx | FP8, FP16, INT8 | 4th Gen | ~1 TB/s | Optimized for raster + transformer |
| Blackwell | RTX 50xx, B100, GB200 | FP8, TF32, INT4 | 5th Gen | 1.8-3.0 TB/s (HBM3E) | AI fusion, FP8/INT4 LLM inference |

🧠 Introduction to GPU Architecture

A Graphics Processing Unit (GPU) is a parallel processor optimized for data-parallel throughput computing. Unlike CPUs, which have a handful of powerful cores optimized for control flow and single-threaded performance, GPUs feature many simpler cores that execute instructions on SIMD or SIMT (Single Instruction, Multiple Threads) principles, ideal for vectorizable and matrix-heavy workloads like:

  • Deep neural network inference
  • Image and signal processing
  • Linear algebra (matrix multiplication, convolutions)
  • Physics and fluid dynamics simulations

A GPU is composed of several Streaming Multiprocessors (SMs), each containing:

  • CUDA Cores: Scalar ALUs for FP32/INT32 operations
  • Tensor Cores: Fused multiply-accumulate (FMA) engines for low-precision matrix ops (e.g., FP16/INT8/INT4)
  • Warp Scheduler: Dispatches 32-thread warps to available execution units
  • Register Files & Shared Memory: On-chip fast memory for intra-thread block communication
  • Special Function Units: For transcendental math like sin, cos, exp, rsqrt

GPUs are designed with a non-uniform memory hierarchy to balance throughput and latency:

  • Global Memory (DRAM): High-latency, high-bandwidth (e.g., LPDDR5 on Jetson, HBM on data center GPUs)
  • Shared Memory / L1 Cache: Low-latency memory within SMs for intra-thread block comms
  • L2 Cache: Shared across SMs; allows memory coalescing
  • Texture/Constant Memory: Specialized caches for spatial or read-only access

Bandwidth is often the bottleneck in GPU computing, not ALU count. Efficient memory coalescing and reuse (e.g., tiling, blocking) are key to performance.
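To illustrate tiling and reuse, here is a minimal shared-memory matrix-multiply kernel sketch: each block stages TILE×TILE sub-matrices of A and B in shared memory so that every element fetched from global memory is reused TILE times (for brevity it assumes n is a multiple of TILE):

```cuda
#include <cuda_runtime.h>

#define TILE 16  // tile edge; each block runs TILE x TILE threads

// C = A * B for n x n row-major matrices, staged through shared memory.
__global__ void matmul_tiled(const float *A, const float *B, float *C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < n / TILE; ++t) {
        // Coalesced loads: consecutive threads read consecutive addresses.
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();                 // wait until the tile is fully loaded

        for (int k = 0; k < TILE; ++k)   // each loaded element is reused TILE times
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();                 // wait before overwriting the tile
    }
    C[row * n + col] = acc;
}
```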

NVIDIA GPUs follow a SIMT (Single Instruction, Multiple Threads) model:

  • Threads are grouped into warps (32 threads)
  • Each warp executes the same instruction path; divergence (e.g., if branches) leads to warp serialization
  • Multiple warps and thread blocks are scheduled per SM

Execution granularity is fine-tuned through occupancy: the ratio of active warps to maximum supported warps on an SM.

The Streaming Multiprocessor (SM) is the fundamental hardware unit in NVIDIA GPUs responsible for executing parallel instructions. It encapsulates the resources necessary to support thousands of concurrent threads, and its microarchitecture directly determines latency hiding, throughput, and occupancy. Each GPU consists of multiple SMs (e.g., 16-128+), and each SM contains:

| Component | Description |
|---|---|
| CUDA Cores (ALUs) | Scalar processors for FP32, INT32, and logical ops |
| Tensor Cores | Matrix multiply-accumulate units for FP16, BF16, INT8, and sparsity-optimized operations |
| Warp Scheduler | Dispatches one or more warps per cycle to execution pipelines |
| Instruction Dispatch Units | Decode and route instructions to functional units |
| Shared Memory / L1 Cache | Programmable, low-latency memory for inter-thread communication |
| Register File | Stores private per-thread variables (e.g., 64K registers per SM) |
| Special Function Units (SFUs) | Handle transcendental math like exp, sin, rsqrt |
| Load/Store Units | Handle memory transactions to/from global/local memory |

Each SM contains Tensor Cores: specialized FMA (fused multiply-accumulate) units capable of processing small matrices at very high throughput.

  • Operate on 4×4 or 8×8 matrices internally.
  • Support mixed-precision input/output (FP16, INT8, FP8, TF32).
  • Enable high-throughput operations for convolutions, transformers, and matrix multiplications.
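Tensor Cores are driven warp-wide rather than per-thread, for example through CUDA's wmma API. A minimal sketch of one 16×16×16 FP16 multiply-accumulate, executed by a single warp (launch as <<<1, 32>>>; compile with -arch=sm_87 for Orin):

```cuda
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One warp computes a 16x16 tile: D = A x B + C (FP16 inputs, FP32 accumulate).
__global__ void wmma_tile(const half *a, const half *b, const float *c, float *d) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;

    wmma::load_matrix_sync(fa, a, 16);                      // leading dimension 16
    wmma::load_matrix_sync(fb, b, 16);
    wmma::load_matrix_sync(fc, c, 16, wmma::mem_row_major);
    wmma::mma_sync(fc, fa, fb, fc);                         // runs on the Tensor Cores
    wmma::store_matrix_sync(d, fc, 16, wmma::mem_row_major);
}
```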

Example: On Jetson Orin Nano (Ampere):

  • Each SM has 4 Tensor Cores (4 SMs × 4 = 16 Tensor Cores on the 512-core GPU)
  • Each Tensor Core processes 64 FP16 or 128 INT8 FMA ops per cycle
  • 16 Tensor Cores × 128 INT8 FMAs gives a theoretical peak of ~2K FMA ops/cycle
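Spelling that arithmetic out (using the per-Tensor-Core figure quoted above; exact per-core throughput varies by generation and precision):

```latex
\underbrace{4}_{\text{SMs}} \times \underbrace{4}_{\text{TC/SM}} \times \underbrace{128}_{\text{INT8 FMA/cycle}}
  = 2048\ \text{FMA/cycle} \;\approx\; 4096\ \text{INT8 ops/cycle}
```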

Each SM has a large register file (64K 32-bit registers, i.e., 256 KB per SM) and shared memory / L1 cache (up to 128 KB depending on configuration).

  • Registers are used for fast local thread variables.
  • Shared memory is explicitly managed by the programmer and ideal for:
    • Tiled matrix multiplication
    • Reductions
    • Communication across threads in a block

Proper register allocation and shared memory usage are critical for occupancy: too many registers per thread can limit the number of resident warps.

Example: SM Configuration on Jetson Orin Nano

| Feature | Value |
|---|---|
| SMs | 4 |
| CUDA Cores per SM | 128 |
| Tensor Cores per SM | 4 |
| Total CUDA Cores | 512 |
| Warp Schedulers per SM | 4 |
| Registers per SM | 64K |
| Shared Memory per SM | 64-128 KB |
| FP16/INT8 Tensor Ops | Accelerated by dedicated tensor units |
| Max Warps per SM | 48 |
| Max Threads per SM | 1536 |
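These figures can be checked directly on the device; every field below is a standard cudaDeviceProp member:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp p;
    cudaGetDeviceProperties(&p, 0);      // properties of GPU 0
    printf("%s (compute capability %d.%d)\n", p.name, p.major, p.minor);
    printf("SMs:                %d\n", p.multiProcessorCount);
    printf("Warp size:          %d\n", p.warpSize);
    printf("Max threads per SM: %d\n", p.maxThreadsPerMultiProcessor);
    printf("32-bit regs per SM: %d\n", p.regsPerMultiprocessor);
    printf("Shared mem per SM:  %zu KB\n", p.sharedMemPerMultiprocessor / 1024);
    printf("Memory bus width:   %d-bit\n", p.memoryBusWidth);
    return 0;
}
```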

📦 Execution Granularity: Threads, Warps, and Thread Blocks

A single SM can hold multiple warps from multiple thread blocks. The GPU scheduler dynamically switches between warps to hide memory and instruction latency.

  • Thread: Basic unit of execution; executes the kernel's code independently.
  • Warp: Group of 32 threads executed in SIMT fashion (Single Instruction, Multiple Threads).
  • Thread Block: Group of warps scheduled together and sharing resources like shared memory.

Example: When one warp stalls on a memory load, another ready warp is dispatched without pipeline stalls.
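A minimal kernel sketch showing how the thread/block hierarchy maps onto a computation (warp membership is implicit: threads 0-31 of a block form warp 0, threads 32-63 form warp 1, and so on):

```cuda
// y = a*x + y. Each thread handles one element. A whole warp whose
// threads all fail the `i < n` guard simply retires, so the bounds
// check causes no divergence penalty for full warps.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        y[i] = a * x[i] + y[i];
}

// Launch with 256-thread blocks (8 warps each), enough blocks to cover n:
// saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
```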

Each SM contains multiple warp schedulers (e.g., 4 in Ampere SMs), which issue instructions per cycle from active warps to the relevant execution pipelines. Warp scheduling is round-robin or greedy-then-oldest, depending on architecture. Execution Pipelines (Ampere example):

| Pipeline | Operations Handled |
|---|---|
| FP32 Units | Scalar arithmetic (add, mul) |
| INT Units | Integer math, bitwise logic |
| Tensor Cores | Fused matrix ops (e.g., D = A×B + C) |
| SFUs | sin, exp, log, sqrt, etc. |
| LD/ST Units | Memory read/write transactions |
| Branch Units | Handle divergence and predication |

Up to 4 instructions from different warps can be issued per cycle per SM, depending on available resources.

GPUs do not use traditional out-of-order execution. Instead, they rely on:

  • Thread-level parallelism (TLP): Multiple warps in-flight per SM
  • Warp-level parallelism (WLP): Warp interleaving masks memory/instruction latency

Occupancy = (Active warps per SM) / (Maximum warps per SM)

  • Higher occupancy helps hide memory latency
  • Too high occupancy can lead to register pressure or shared memory contention
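The CUDA runtime can report this ratio for a concrete kernel and block size. A minimal sketch using the standard occupancy API:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    cudaDeviceProp p;
    cudaGetDeviceProperties(&p, 0);

    int blockSize = 256;       // 8 warps per block
    int blocksPerSM = 0;       // how many such blocks fit on one SM
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, saxpy,
                                                  blockSize, 0 /* dyn. smem */);

    int activeWarps = blocksPerSM * blockSize / p.warpSize;
    int maxWarps    = p.maxThreadsPerMultiProcessor / p.warpSize;
    printf("Occupancy: %d/%d warps = %.0f%%\n",
           activeWarps, maxWarps, 100.0 * activeWarps / maxWarps);
    return 0;
}
```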

🔹 NVIDIA's official software stack for Jetson

NVIDIA's official software stack for Jetson, known as JetPack, includes:

  • An Ubuntu-based Linux root filesystem (Ubuntu 22.04 for JetPack 6.x)
  • CUDA Toolkit
  • cuDNN (Deep Neural Network library)
  • TensorRT (optimized inference engine)
  • OpenCV and multimedia APIs

🚀 CUDA, cuDNN, TensorRT Comparison and Modern GPU Architectures:

  • 🔹 CUDA (Compute Unified Device Architecture): Parallel computing platform and programming model that allows developers to harness the power of NVIDIA GPUs for general-purpose computing.
  • 🔹 cuDNN (CUDA Deep Neural Network library): Provides optimized implementations of operations such as convolution, pooling, and activation for deep learning.
  • 🔹 TensorRT: High-performance deep learning inference optimizer and runtime engine. Used to accelerate models exported from PyTorch or ONNX.

📦 Layered Abstraction for GPU AI Inference

| Layer | Tool | Purpose |
|---|---|---|
| High-Level | TensorRT | Optimized deployment, quantization, engine runtime |
| Mid-Level | cuDNN | Primitives for DL ops (Conv, Pool, RNN, etc.) |
| Low-Level | CUDA | General GPU programming with warp/thread/memory control |
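To ground the low-level layer, here is a minimal, self-contained CUDA program sketch. It uses cudaMallocManaged (unified memory), a natural fit on Jetson, where CPU and GPU share the same physical LPDDR5:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);   // one pointer valid on both CPU and GPU
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    vec_add<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();        // wait for the GPU before reading results

    printf("c[0] = %.1f (expected 3.0)\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```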

⚙️ Jetson Orin Nano Super Developer Kit

Module Specifications

The Jetson Orin Nano 8GB Module features:

  • Architecture: NVIDIA Ampere with 1024 CUDA cores and 32 Tensor Cores
  • AI Performance: Up to 67 INT8 TOPS
  • Memory: 8GB 128-bit LPDDR5 (102GB/s memory bandwidth)
  • CPU: 6-core Arm® Cortex®-A78AE v8.2 64-bit (1.7GHz) with 1.5MB L2 + 4MB L3
  • Power Range: 7W-25W

You can flash the base L4T BSP using SDK Manager onto any of these storage media:

  • SD card slot (connector 1)
  • External NVMe (2280-size on slot 10, 2230-size on slot 11)
  • USB drive on any USB port (connector 4 or 6)

| Feature | Value |
|---|---|
| Model | Jetson Orin Nano |
| Form Factor | 69.6mm × 45mm |
| JetPack SDK | Ubuntu 22.04 (JetPack 6.x) + CUDA, cuDNN, TensorRT |
| IO Support | GPIO, I2C, SPI, UART, MIPI CSI |
| Variants | 4GB RAM (5W) and 8GB RAM (7-15W) |
| Storage | microSD / M.2 NVMe SSD support |

Key Components of the Carrier Board

  • Camera Connectors:
    • 2× MIPI CSI-2 camera connectors (0.5mm pitch 22-pin flex connectors)
    • Compatible with 15-pin connectors (like Raspberry Pi Camera Module v2) using a 15-pin to 22-pin conversion cable
    • Supports: CAM0: CSI 1 ×2 lane; CAM1: CSI 1 ×2 lane or 1 ×4 lane

  • Storage Expansion:
    • M.2 Key M slot with ×4 PCIe Gen3
    • M.2 Key M slot with ×2 PCIe Gen3
    • M.2 Key E slot

  • USB Connectivity:
    • 4× USB 3.2 Gen2 Type-A ports
    • USB Type-C port for UFP (cannot output display signal)
      • Host mode: Functions as downstream-facing port (DFP), like the Type-A ports
      • Device mode: Connects to a PC as USB Mass Storage, Serial, and Ethernet (RNDIS) device (Jetson IP: 192.168.55.1)
      • Recovery mode: Used for flashing Jetson from a PC

  • Other Interfaces:
    • Gigabit Ethernet port
    • DisplayPort: 1× DP 1.2 (+MST) connector
    • 40-pin expansion header (UART, SPI, I2S, I2C, GPIO)
    • 12-pin button header and 4-pin fan header
    • DC power jack for 19V power input

  • Dimensions: 103mm × 90.5mm × 34.77mm

Carrier Board Connectors

| Mark | Name | Note |
|---|---|---|
| 1 | microSD card slot | |
| 2 | 40-pin Expansion Header | |
| 3 | Power Indicator LED | |
| 4 | USB-C port | For data only |
| 5 | Gigabit Ethernet Port | |
| 6 | USB 3.2 Gen2 Type-A ports (×4) | 10Gbps |
| 7 | DisplayPort Output Connector | |
| 8 | DC Power Jack | 5.5mm × 2.5mm |
| 9 | MIPI CSI Camera Connectors (×2) | 22-pin, 0.5mm pitch |
| 10 | M.2 Slot (Key-M, Type 2280) | PCIe 3.0 ×4 |
| 11 | M.2 Slot (Key-M, Type 2230) | PCIe 3.0 ×2 |
| 12 | M.2 Slot (Key-E, Type 2230) | (populated) |

40-pin Expansion Header

40-pin Expansion Header Pinout (see the diagram in the GPIO section below)


🧪 First Boot on SSD

  1. Connect Jetson to the monitor: if a jumper is still installed on the header (used to put the board into Forced Recovery mode), remove it. Then connect the DisplayPort cable or adapter and a USB keyboard and mouse to the Jetson Orin Nano Developer Kit, or hook up a USB-to-TTL serial cable.

  2. Unplug the power supply and plug it back in to power cycle. Jetson will boot from the SSD automatically, into the Jetson Linux (BSP) of your selected JetPack version.

  3. Complete the initial Ubuntu setup wizard (username, password, time zone).

  4. Optional: Verify the SSD is the root filesystem:

    df -h /
    # Output should show something like: /dev/nvme0n1p1
    # Identify your NVMe SSD
    sjsujetson@sjsujetson-01:~$ lsblk

  5. Optional: Check the JetPack version:

    sjsujetson@sjsujetson-01:~$ dpkg-query --show nvidia-l4t-core
    nvidia-l4t-core  36.4.3-20250107174145
    sjsujetson@sjsujetson-01:~$ dpkg -l | grep nvidia*

    This shows L4T 36.4.3, which corresponds to JetPack 6.2 (per NVIDIA's official L4T-to-JetPack mapping reference). JetPack 6.2 is the latest production release of JetPack 6. It includes Jetson Linux 36.4.3, featuring Linux kernel 5.15 and an Ubuntu 22.04-based root file system. The Jetson AI stack packaged with JetPack 6.2 includes CUDA 12.6, TensorRT 10.3, cuDNN 9.3, VPI 3.2, DLA 3.1, and DLFW 24.0.

🧪 Jetson Development Workflow at SJSU

We have prepared a master Jetson image preloaded with the latest JetPack 6.2, NVIDIA Container Toolkit (Docker support), and all essential runtime and development components that typically require elevated privileges. This includes CUDA, cuDNN, TensorRT, DeepStream, and necessary drivers.

Students can simply SSH into their assigned Jetson device and begin testing functionality, running containerized applications, or developing their own AI/robotics projects, without needing to configure the system themselves or worry about low-level device setup. This streamlined environment is ideal for focusing on learning and experimentation rather than system administration.

✅ No sudo access required. ✅ Pre-installed JetPack, Docker, and AI libraries. ✅ Access Jetson remotely via .local hostname or static IP. ✅ Custom-designed sjsujetsontool to update, launch a shell/JupyterLab, run Python scripts and LLM models, and monitor the system.

🔌 Jetson Orin Nano Hardware Deep Dive

๐Ÿ—๏ธ System-on-Module (SOM) Architecture

The Jetson Orin Nano consists of two main components:

  1. Jetson Orin Nano Module: the compute module containing CPU, GPU, and memory
  2. Developer Kit Carrier Board: provides I/O, power, and expansion interfaces

Module Specifications:

┌─────────────────────────────────────────────────────────┐
│                 Jetson Orin Nano Module                 │
│  ┌─────────────┐  ┌──────────────┐  ┌─────────────────┐ │
│  │   6-core    │  │  1024-core   │  │    8GB LPDDR5   │ │
│  │ Cortex-A78AE│  │  Ampere GPU  │  │  102 GB/s BW    │ │
│  │  @ 1.7GHz   │  │  32 Tensor   │  │                 │ │
│  │             │  │    Cores     │  │                 │ │
│  └─────────────┘  └──────────────┘  └─────────────────┘ │
│                                                         │
│  ┌────────────────────────────────────────────────────┐ │
│  │                   Tegra Orin SoC                   │ │
│  │  • Video Encoders: 2x 4K30 H.264/H.265             │ │
│  │  • Video Decoders: 2x 4K60 H.264/H.265             │ │
│  │  • ISP: 2x 12MP cameras                            │ │
│  │  • PCIe: 3.0 x8 + 3.0 x4                           │ │
│  │  • USB: 4x USB 3.2 Gen2                            │ │
│  └────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘

🔌 Comprehensive Connector Analysis

Power System

  • DC Jack (19V): Primary power input, 5.5mm × 2.5mm barrel connector
  • Power Modes (see the nvpmodel commands below):
    • 5W Mode: CPU @ 1.2GHz, GPU @ 510MHz (fanless operation)
    • 15W Mode: CPU @ 1.7GHz, GPU @ 918MHz (active cooling)
    • 25W Mode: Maximum performance (requires adequate cooling)
# Check current power mode
sudo nvpmodel -q

# Set to maximum performance
sudo nvpmodel -m 0

# Set to power-efficient mode
sudo nvpmodel -m 1

Display and Video

  • DisplayPort 1.2: Supports up to 4K@60Hz with Multi-Stream Transport (MST)
  • Video Encoding: 2x 4K30 H.264/H.265 hardware encoders
  • Video Decoding: 2x 4K60 H.264/H.265 hardware decoders

Camera Interfaces (MIPI CSI-2)

Camera Connector Pinout (22-pin, 0.5mm pitch):
┌─────┬───────────┬─────┬─────────────┐
│ Pin │ Signal    │ Pin │ Signal      │
├─────┼───────────┼─────┼─────────────┤
│  1  │ GND       │ 12  │ CSI_D1_N    │
│  2  │ CSI_CLK_P │ 13  │ GND         │
│  3  │ CSI_CLK_N │ 14  │ CSI_D0_P    │
│  4  │ GND       │ 15  │ CSI_D0_N    │
│  5  │ CSI_D3_P  │ 16  │ GND         │
│  6  │ CSI_D3_N  │ 17  │ CAM_I2C_SCL │
│  7  │ GND       │ 18  │ CAM_I2C_SDA │
│  8  │ CSI_D2_P  │ 19  │ GND         │
│  9  │ CSI_D2_N  │ 20  │ CAM_PWDN    │
│ 10  │ GND       │ 21  │ CAM_RST_N   │
│ 11  │ CSI_D1_P  │ 22  │ +3.3V       │
└─────┴───────────┴─────┴─────────────┘

Supported Camera Configurations:

  • CAM0: 1×2 lane MIPI CSI-2
  • CAM1: 1×2 lane or 1×4 lane MIPI CSI-2
  • Maximum Resolution: 12MP per camera
  • Compatible Cameras: IMX219, IMX477, OV5693, and many others

Storage Expansion (M.2 Slots)

M.2 Key-M Slot (2280 size), PCIe 3.0 ×4:

  • Use Cases: High-performance NVMe SSDs
  • Max Speed: ~3.5 GB/s sequential read
  • Recommended: Samsung 980, WD SN570, Crucial P3

M.2 Key-M Slot (2230 size), PCIe 3.0 ×2:

  • Use Cases: Compact SSDs, additional storage
  • Max Speed: ~1.7 GB/s sequential read

M.2 Key-E Slot (2230 size), PCIe 3.0 ×1 + USB 2.0:

  • Use Cases: WiFi/Bluetooth modules, cellular modems
  • Pre-populated: Intel AX201 WiFi 6 + Bluetooth 5.2
  • Alternatives: Quectel EM05-G (4G LTE), Sierra Wireless modules

🔧 40-Pin GPIO Expansion Header

The 40-pin header provides extensive I/O capabilities compatible with Raspberry Pi HATs:

        3.3V  (1) (2)  5V
   GPIO2/SDA  (3) (4)  5V
   GPIO3/SCL  (5) (6)  GND
       GPIO4  (7) (8)  GPIO14/TXD
         GND  (9) (10) GPIO15/RXD
      GPIO17 (11) (12) GPIO18/PWM
      GPIO27 (13) (14) GND
      GPIO22 (15) (16) GPIO23
        3.3V (17) (18) GPIO24
 GPIO10/MOSI (19) (20) GND
  GPIO9/MISO (21) (22) GPIO25
 GPIO11/SCLK (23) (24) GPIO8/CE0
         GND (25) (26) GPIO7/CE1
       ID_SD (27) (28) ID_SC
       GPIO5 (29) (30) GND
       GPIO6 (31) (32) GPIO12/PWM
      GPIO13 (33) (34) GND
      GPIO19 (35) (36) GPIO16
      GPIO26 (37) (38) GPIO20
         GND (39) (40) GPIO21

Available Interfaces:

  • I2C: 2 channels (I2C-1: pins 3,5; I2C-0: pins 27,28)
  • SPI: 2 channels (SPI0: pins 19,21,23,24,26; SPI1: pins 12,35,38,40)
  • UART: 1 channel (pins 8,10)
  • PWM: 4 channels (pins 12,32,33,35)
  • GPIO: 26 digital I/O pins
  • Power: 3.3V, 5V, and multiple GND pins
