🧠 Local LLM Inference (done)
Runs Qwen2.5 7B Instruct (Q5_K_M quantized) entirely on-device via a llama.cpp server. No internet required after setup.
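A minimal sketch of talking to the local server. This assumes the llama.cpp server is running on its default port 8080 and uses its OpenAI-compatible `/v1/chat/completions` endpoint; the host, port, and sampling parameters are assumptions, not the project's actual config.

```python
import json
import urllib.request

# Assumed default address of the local llama.cpp server.
LLAMA_SERVER = "http://127.0.0.1:8080"

def build_chat_request(prompt: str, system: str = "You are a helpful assistant.") -> dict:
    """Build an OpenAI-style chat payload for the llama.cpp server."""
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.7,   # illustrative sampling settings
        "max_tokens": 256,
    }

def ask(prompt: str) -> str:
    """POST the payload to the local server and return the reply text."""
    req = urllib.request.Request(
        f"{LLAMA_SERVER}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the endpoint is OpenAI-compatible, the same payload shape works with any client library that speaks that API.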
💾 Persistent Memory (done)
Conversations and context are stored in a local SQLite database, letting the assistant remember past interactions.
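A minimal sketch of such a memory store using only the standard-library `sqlite3` module. The table name and column layout here are illustrative assumptions, not the project's actual schema.

```python
import sqlite3
import time

def open_memory(path: str = "memory.db") -> sqlite3.Connection:
    """Open (or create) the conversation store."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               role TEXT NOT NULL,        -- 'user' or 'assistant'
               content TEXT NOT NULL,
               created_at REAL NOT NULL   -- unix timestamp
           )"""
    )
    return conn

def remember(conn: sqlite3.Connection, role: str, content: str) -> None:
    """Append one conversation turn to the store."""
    conn.execute(
        "INSERT INTO messages (role, content, created_at) VALUES (?, ?, ?)",
        (role, content, time.time()),
    )
    conn.commit()

def recent_context(conn: sqlite3.Connection, limit: int = 10) -> list[tuple[str, str]]:
    """Return the last `limit` turns, oldest first, ready to prepend to a prompt."""
    rows = conn.execute(
        "SELECT role, content FROM messages ORDER BY id DESC LIMIT ?", (limit,)
    ).fetchall()
    return [(role, content) for role, content in reversed(rows)]
```

On each user message, the assistant can call `recent_context` and feed those turns back into the chat payload, which is what gives it memory across sessions.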
👁️ Vision Capabilities (in progress)
Image understanding through a dedicated vision service, allowing the assistant to process and describe visual input.
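One common way to pass an image to a local multimodal endpoint is to inline it as base64 in an OpenAI-style content-parts message. The exact schema the vision service expects is an assumption here; this only sketches the payload shape.

```python
import base64

def build_vision_request(image_bytes: bytes, question: str) -> dict:
    """Build a multimodal chat payload with an inline base64 image
    (OpenAI-style content parts; the vision service's real schema may differ)."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }]
    }
```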
🌐 Web Search (planned)
Planned integration giving the assistant access to live web results, so it can answer questions beyond its training data.
📚 RAG (Retrieval-Augmented Generation) (planned)
Planned support for loading personal documents and files so the assistant can answer questions grounded in your own data.
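The core RAG loop is: score document chunks against the question, keep the top matches, and prepend them to the prompt. A toy sketch, using word overlap as a stand-in for the embedding similarity a real build would use:

```python
def score(query: str, chunk: str) -> int:
    """Toy relevance score: number of shared lowercase words.
    A real implementation would compare embedding vectors instead."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most relevant to the query."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

def build_rag_prompt(query: str, chunks: list[str]) -> str:
    """Ground the question in retrieved context before sending it to the model."""
    context = "\n---\n".join(retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The grounded prompt then goes through the same local chat endpoint as any other message; only the retrieval step is new.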
🛠️ Tool Use (planned)
Exploring function calling and tool-use patterns so the assistant can take actions, not just answer questions.
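The usual function-calling pattern is a registry of Python functions plus a dispatcher that executes the JSON tool call the model emits. A sketch under those assumptions; the tool names here are illustrative, not ones the project defines:

```python
import json

# Registry of callables the model is allowed to invoke.
TOOLS: dict = {}

def tool(fn):
    """Decorator: register a function as a tool the assistant can call."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def add(a: float, b: float) -> float:
    """Example tool: add two numbers."""
    return a + b

def dispatch(call_json: str) -> str:
    """Execute a model-emitted tool call like {"name": "add", "args": {...}}
    and return the result as text to feed back into the conversation."""
    call = json.loads(call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return f"unknown tool: {call['name']}"
    return str(fn(**call.get("args", {})))
```

The result string is appended to the chat history as a tool message, so the model can use it when composing its final answer.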
🐍 Python: core language
🦙 llama.cpp: LLM inference server
🤖 Qwen2.5 7B: language model
🗄️ SQLite: memory & storage
CPU:       AMD Ryzen 5 5500U @ 2.1 GHz
RAM:       16 GB
GPU:       NVIDIA RTX 3050 Laptop (4 GB VRAM)
Model:     qwen2.5-7b-instruct-q5_k_m.gguf
Inference: llama.cpp local server (fully offline)