KAGGLE · GEMMA 4 GOOD HACKATHON · 2026

LASTBOX

offline survival assistant in a pelican case.

A Raspberry Pi 5, a LoRa 868 MHz radio HAT, a camera and a fine-tuned Gemma 4 E2B, trained with Unsloth on a single GB10 and served by llama.cpp on an ARM CPU. Answers survival questions, identifies plants and wounds from its lens, relays terse messages over a Meshtastic mesh — no internet, no cloud, no phone-home. When the network is the first thing to fail, you still have the box.

FIRST TOKEN ~700 ms WARM SUSTAINED 6.4–7 TOK/S MODEL GEMMA-4-E2B Q4_K_M (3.4 GB) VISION MMPROJ-F16 (940 MB) TRAINED WITH UNSLOTH · LORA r=8 α=8 SERVED BY LLAMA.CPP · DOCKER HOST RPI 5 · 8 GB · ARM CORTEX-A76 LORA HAT SX1262 DOWN (SIMULATION)

REPO → FULL WRITE-UP KAGGLE

WHAT IT IS

LastBox is what you reach for when the grid stops working. The hardware is deliberately boring — a Raspberry Pi 5 with 8 GB of RAM in a sealed Pelican case, a 12 V LiFePO₄ battery, a camera module, a LoRa radio HAT — but the software does three things that are normally cloud features:

Survival Q&A

Touch or type a question. A fine-tuned Gemma 4 E2B replies in 1–2 sentences with a numbered procedure if appropriate. Hard byte caps so every reply fits.

FIRST AID · BUSHCRAFT · NAVIGATION · POWER · HAZARDS

Optical triage

Aim the camera at a plant, a wound or a piece of gear. The same model answers via a 940 MB SigLIP vision encoder. Defaults to conservative replies — "unknown plant, do not eat" beats a wrong identification.

SIGLIP MMPROJ · CC-LICENSED EVAL IMAGES

Mesh radio relay

Reply payloads are hard-capped at 150 bytes UTF-8 so they fit in a single LoRa packet at legal duty cycle. The box becomes a thinking router in a Meshtastic mesh of handhelds.

868 MHz · MESHTASTIC USB OR SX1262 SPI

ARCHITECTURE

┌──────────────── Touchscreen / Web UI / LoRa packet in ────────────────┐
│                                                                      │
│  webapp/server.py     ──or──     demo.py orchestrator                 │
│       │                                │                              │
│       ▼   HTTP /v1/chat/completions    ▼                              │
│  ┌──────────────────────────────────────────────────────┐             │
│  │ llama-server (Docker on RPi 5)                       │             │
│  │   ghcr.io/ggml-org/llama.cpp:server                  │             │
│  │   ────────────────────────────────                   │             │
│  │   -m  lastbox-gemma4-e2b-q4_k_m.gguf  (3.4 GB)       │             │
│  │   --mmproj mmproj-F16.gguf            (940 MB)       │             │
│  │   --threads 4  --ctx 2048  --parallel 1              │             │
│  │   port 11436 → 8080                                  │             │
│  └──────────────────────────────────────────────────────┘             │
│       │                                                              │
│       ▼   text + optional <tool_call> blocks                          │
│  Tool dispatcher (in demo.py):                                       │
│  - search_knowledge   → local SQLite / dict of survival manuals      │
│  - capture_image      → RPi camera + multimodal model                │
│  - analyze_signal     → LoRa HAT RSSI / SNR stats                    │
│  - send_lora_message  → Meshtastic firmware via serial               │
│  - get_system_status  → psutil + /sys/class/thermal                  │
│  - listen_lora        → channel scan w/ pattern filter               │
│  - update_memory      → atomic toml write                            │
│                                                                      │
│  Tool result → injected as next user turn → final answer             │
└──────────────────────────────────────────────────────────────────────┘

TRAINING

One GB10 (Grace Blackwell, DGX Spark, 121 GB unified RAM, aarch64 + CUDA 13). Unsloth's FastModel on Gemma 4 E2B-it, LoRA r=8 α=8, bf16, cosine schedule. Dataset generated by Kimi K2.5 as teacher via OpenRouter — 30 inline survival seeds × 5 categories × 8 variants, JSONL-strict, byte-cap validated, deduped.

The numbers

DATA GEN	$1.10 · ~30 min · 1151 raw → 1148 kept (99.7%)
SFT v2 (3 epochs)	43 min on GB10 · train loss 0.08
SFT v3 no-think (shipped)	27 min on GB10 · 2 epochs · template fix kills CoT preamble
GGUF EXPORT	~35 s · bf16 → Q4_K_M
DEPLOY	5-7 min rsync over Tailscale to lastbox
END-TO-END CLEAN RUN	~1.5 h data → deploy → eval

Loss curve (SFT v2)

step  train    eval
   5  3.37
  10  2.29
  15  1.38
  20  1.09
  30  0.77
  50  0.26    2.62
 100  0.08    2.64
 150  0.07    2.65
 195  0.08    2.64

Eval plateau ≈ 2.64 reflects the 114-dialog held-out set; what counts is agent-level eval below.

BENCHMARKS — V2 vs V3 ON THE DEPLOYED BOX

Two trained checkpoints, both benchmarked live against the RPi 5 over Tailscale. v3 shipped because the qualitative gap was decisive even when headline numbers looked similar.

METRIC	v2 (thinking)	v3 (no-think) — shipped
Response-quality (25 samples)	0.518	0.506
Of completed dialogs	13/14 (93%)	13/14 (93%)
format_ok hybrid	0.52	0.52
byte_compliance (≤150 / ≤200 B)	0.48	0.48
persona_ok (no preambles)	0.56	0.52
Median first-token (warm)	~1.5 s	~0.7 s
Smoke-test response	"Thinking Process: 1. Analyze… 2. Determine…" (516 B)	"1. Apply direct, firm pressure to the wound with a clean cloth." (63 B)
Smoke-test end-to-end	19 s	4.7 s
Sustained generation	6.4–7 tok/s	6.4–7 tok/s

The difference between v2 and v3 is the difference between a thinking-out-loud model with a tighter style and a survival agent that just gives you the answer. v3 is shipped.

Try it now — chat with v6 (Hugging Face Space)

Live demo on Hugging Face Spaces · ZeroGPU backend · three modes: LoRa Radio (≤150 B), Free Chat, RAG Chat with citations. Cold start ~30 s, warm responses 3–10 s.

🤗 OPEN CHAT IN HF SPACE →

Post-deadline v6 (SFT warmup) — tool emission solved

After GRPO v4+v5 plateaued at 0% tool emission, we executed roadmap #1 (SFT warmup on tool-only pairs) and discovered the eval itself had a prompt-mismatch bug. Two changes, one breakthrough:

Filtered train_v2 to 1034 [user, assistant_tool_call] pairs.
12-minute Unsloth SFT (r=8, α=8, lr=2e-4, 1 epoch, 65 steps). Loss 0.018.
Fixed the eval to use the full training-time system prompt (with tool definitions JSON + format hint) — not the shorter SYSTEM_PROMPT_EN alone. Eval had been suppressing emission by omitting the tool defs block.

Metric	v3 SFT	v4 GRPO	v5 GRPO	v6 (stream eval)	v6 final
tool_emission_rate	~0%	0%	0%	48%	72%
tool_accuracy	0%	0%	0%	44%	64%
arg_validity	4%	4%	4%	36%	56%
agentic_score	0.016	0.016	0.016	0.408	0.608
byte_compliance	0.48	0.52	0.52	0.52	1.000
format_ok	0.52	0.52	0.52	0.52	1.000
persona_ok	0.52	0.52	0.52	0.52	1.000
response_quality	0.506	0.520	0.520	0.520	1.000
completed / 25	14	13	13	13	25

38× jump in agentic_score (0.016 → 0.608) from one 12-minute SFT pass + two eval-methodology fixes. The 0.52 ceiling on byte_compliance / format_ok / persona_ok turned out to be 13/25 completion rate from streaming-SSE disconnects — never a quality issue. Switching the eval to non-streaming POST + 2-retry on disconnect moved completion to 25/25 and pulled every flag to 1.000.

Post-deadline GRPO experiment (v4 + v5)

After the hackathon submission we ran two GRPO iterations to lift tool_emission from the v3 baseline. The headline result: RAG is live on the box and answers cite source IDs; GRPO with KL=0.04 cannot move tool_emission from ~0% in 200 steps even with an active −0.5 penalty for skipping the tool when expected.

Metric	v3 SFT	v4 GRPO (reward v1)	v5 GRPO (reward v2)
tool_emission_rate	~0%	0%	0%
byte_compliance	0.48	0.52	0.52
format_ok	0.52	0.52	0.52
response_quality	0.506	0.520	0.520
median first-token	9 184 ms	1 735 ms	8 460 ms
completed / 25	14	13	13

The lift in byte_compliance + the v4 latency win prove GRPO can move smaller behaviours under this beta. The tool_emission plateau is about policy-shift size, not reward density: the KL term blocks the large move from p(tool)≈0 to p(tool)≈1. Two paths actually fix this, neither is "another GRPO iteration":

SFT warmup on tool-only pairs — filter train_v2.jsonl to the 993 prompts whose first assistant turn is a <tool_call>, do a quick 50–200-step SFT to set the prior, then GRPO refines without needing to break the KL ceiling.
GBNF-constrained decoding in llama.cpp — force the first tokens into <tool_call>{…}</tool_call> form, gated by a "should I tool?" classifier in the orchestrator.

Sample real interactions

you (touchscreen)> How many watts solar panel for my RPi 5?
lastbox> Min 15 W panel, recommended 20 W with an MPPT regulator for
         24/7 operation in cloudy conditions.
         (first token 668 ms)

you (lora)> hypothermia signs?
lastbox> Shivering, slurred speech, confusion. Warm gradually, dry
         clothes, sweet warm drink. Severe: handle gently, evacuate.
         (138 B UTF-8 — under LoRa 150 B cap)

you (touchscreen)> Pi 5 throws undervoltage warning even with the
                   official 27W PSU. What to check?
lastbox> Check USB-C cable AWG ≤20, no powered USB peripherals draining
         current, dmesg | grep undervolt for kernel-side power throttles.
         (165 B — under 200 B touchscreen cap)

THE BOX // LIVE UI

The same Pip-Boy aesthetic as this page is the lastbox's actual interface. Two columns: live MJPEG stream from the Raspberry Pi camera on the left, operator-chatter radio chat on the right. Snap a frame, ask Gemma about what it sees; type a query "from a remote handheld", get a reply that fits in a 150-byte LoRa packet — both go into the same chat log so the timeline is coherent.

Served by webapp/server.py — stdlib Python only, zero pip deps. Lives on the SD card, not the NVMe. Reachable on http://lastbox.local:8080/ from any device on the same LAN.

[lastbox-a] all systems up · mesh listener active · gemma 4 e2b q4_k_m loaded · awaiting traffic

[node-remote → LASTBOX-A] stop bleeding arm fast
[LASTBOX-A → mesh] Apply direct, firm pressure to the wound immediately.
1. Use a clean cloth or bandage.
2. Keep pressure constant.
113 / 150 bytes [OK] · gemma 6.2 s

[OPTICAL → LASTBOX-A] what do you see?
[LASTBOX-A · vision] The image shows a plain, light-colored, flat surface with a subtle shadow across it. There are no visible plants, wounds, or immediate hazards present.
gemma vision 31 343 ms · mmproj-F16 + Gemma 4 E2B Q4_K_M

>_

ROADMAP — WIRED, NOT VAPOR

A handful of features were intentionally left as the next iteration. Each one is wired in the codebase and gated on a clear external signal — a plugged-in device, a freed GPU hour, a register-level fix. They are not vapor, they are switches.

Voice in, voice out

The orchestrator separates intent capture from intent dispatch, so a mic path on the front is one endpoint:

arecord 16 kHz mono 5 s
   → whisper.cpp tiny.en (~75 MB)
   → POST /radio-query
   → Gemma 4 E2B reply
   → piper | espeak-ng → speaker

Blocker: the ReSpeaker 2-Mic HAT we have ships with a TLV320AIC3104 codec instead of the silkscreened WM8960; the standard overlay fails with -121, the fallback overlay loads but leaves the ADC muted. Two known fixes (custom overlay or an i2cset register sequence) — both short, neither shippable inside the deadline window.

Mesh radio — real packets

demo.py already calls meshtastic --port for send_lora_message and listen_lora; the Pip-Boy "RADIO" UI calls those same code paths. /mesh-status reports the live hardware truth, so the UI degrades to local inference under the real 150-byte cap.

The moment a working LoRa device shows up on /dev/ttyUSB0 or the SX1262 SPI pins go active, the relay path lights up without a code change.

Tool-call training (GRPO)

The SFT model rarely emits <tool_call> blocks (~0% in eval), so the orchestrator currently keyword-routes between tools. The clean fix is a GRPO pass with r = +1 if expected_tool_called else 0 against the same dialog set — ~1 h additional GB10 time.

RAG over offline survival manuals

v1 (Polish) shipped a working RAG pipeline — nomic-embed-text (~180 MB) + libzim ZIM dumps + top-K passage injection. v2 ships without it on purpose so the baseline numbers measure what the fine-tune itself knows. The next iteration brings it back behind a ?rag=true flag.

user → embed (nomic, ~80 ms)
     → sqlite-vss ANN top-K
     → inject passages
     → llama-server (existing path)
     → answer + cited passage IDs

Corpus on the SD card (~2.5 GB total, fits today's budget): US Army FM 21-76 (public domain), WikiMed ZIM dump, our own train_v2.jsonl, and a trimmed Wikipedia survival/first-aid subset. Win is citations — every answer carries a "FM 21-76, Ch. 4, p. 87" tag so the operator knows where the advice came from.

Image-paired fine-tune

Today the vision branch is untrained — Gemma's pretraining handles "what is in this picture?" perfectly, but the rowan-vs-yew toxicity distinction (load-bearing for a survival assistant) needs an image-paired SFT. ~500 CC-licensed plant photos × hybrid-format labels would close the gap.

Access-point mode

Today the lastbox joins an existing WiFi network and is reachable on http://lastbox.local:8080/. With hostapd + dnsmasq, the same box becomes the network — connect from any phone to SSID lastbox and the same UI is there. Out of scope for v1.

NVMe power-saving fix

The on-device NVMe crashed ~2 h before the deadline (classic RPi 5 PCIe power-saving fault — CSTS=0xffffffff). The webapp was rebuilt stdlib-only and deployed to the SD card so the demo wouldn't blink. v2 boots with nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off.