AI Infrastructure
Inference on your hardware. Designed, deployed, monitored.
A single GPU server or a multi-node cluster. Model routing, auto-scaling, latency and token-throughput monitoring.
- vLLM
- SGLang
- LiteLLM
- Multi-GPU
Independent Consultant
I help Austrian companies run AI and critical services on their own hardware — no cloud lock-in, no data leaks, full control.
01 / Services
I run everything listed below in production myself. No PowerPoint, no reseller markup.
Inference on your hardware. Designed, deployed, monitored.
A single GPU server or a multi-node cluster. Model routing, auto-scaling, latency and token-throughput monitoring.
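A rough sketch of what the routing layer looks like in practice: LiteLLM's Python router in front of two vLLM backends that share one alias, so requests are spread across GPU nodes and retried on failure. Host names, the model and the "chat-default" alias are placeholders, not a reference deployment.

```python
# Sketch: LiteLLM routing over on-prem vLLM backends.
# Endpoints, model names and the alias are placeholders.
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "chat-default",  # alias clients call; LiteLLM picks a backend
            "litellm_params": {
                "model": "openai/meta-llama/Llama-3.1-8B-Instruct",  # OpenAI-compatible vLLM server
                "api_base": "http://gpu-node-1:8000/v1",             # placeholder host
                "api_key": "unused",                                 # vLLM ignores the key by default
            },
        },
        {
            "model_name": "chat-default",
            "litellm_params": {
                "model": "openai/meta-llama/Llama-3.1-8B-Instruct",
                "api_base": "http://gpu-node-2:8000/v1",             # second replica for failover
                "api_key": "unused",
            },
        },
    ],
    num_retries=2,  # retry transient backend errors before surfacing them
)

response = router.completion(
    model="chat-default",
    messages=[{"role": "user", "content": "Summarize this ticket in two sentences."}],
)
print(response.choices[0].message.content)
```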
Open-source models on-prem. Audit-ready, optionally air-gapped.
Llama, Mistral, Qwen, DeepSeek — hosted, routed, documented. GDPR- and NISG-2026-compliant.
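Sovereign doesn't mean rewriting your tooling: anything that speaks the OpenAI API keeps working, just pointed at your own endpoint. A minimal sketch; the base_url and model name below are placeholders for a self-hosted server.

```python
# Sketch: existing OpenAI-style client code against an on-prem endpoint.
# base_url and model are placeholders for a self-hosted vLLM/SGLang server.
from openai import OpenAI

client = OpenAI(
    base_url="http://llm.internal.example:8000/v1",  # stays inside your network
    api_key="unused",                                # no cloud credentials involved
)

reply = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",      # placeholder open-weight model
    messages=[{"role": "user", "content": "Classify this email as internal or external."}],
)
print(reply.choices[0].message.content)
```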
Knowledge retrieval over your own data. Provisioned and operated.
Embedding pipelines, vector stores, re-ranking, eval loops. For internal docs, codebases, support tickets — running on your infrastructure.
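The retrieval core itself is small; the real work is in the pipelines and eval loops around it. A compact sketch of embed, search, re-rank with sentence-transformers, using placeholder models and an in-memory store instead of a real vector database.

```python
# Sketch of the retrieval core: embed, search, re-rank.
# Models and the in-memory "store" are placeholders; a real setup uses a
# vector database and your own documents.
import numpy as np
from sentence_transformers import CrossEncoder, SentenceTransformer

docs = [
    "VPN access is requested through the IT service portal.",
    "Backups run nightly at 02:00 and are kept for 30 days.",
    "Expense reports are due by the 5th of each month.",
]

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # placeholder embedding model
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")           # placeholder re-ranker
doc_vecs = embedder.encode(docs, normalize_embeddings=True)               # (n_docs, dim), unit-length rows

def retrieve(query: str, top_k: int = 2) -> list[str]:
    q_vec = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec                                   # cosine similarity via dot product
    candidates = [docs[i] for i in np.argsort(scores)[::-1][:top_k]]
    # Cross-encoder re-ranking: scores each (query, passage) pair directly.
    rerank_scores = reranker.predict([(query, d) for d in candidates])
    ranked = sorted(zip(rerank_scores, candidates), reverse=True)
    return [d for _, d in ranked]

print(retrieve("How do I get VPN access?"))
```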
Claude Code & co. — introduced, guard-railed, measured.
Training, workflow integration, review pipelines that keep AI output honest. Realistic expectations, measurable results.
SaaS out, your own servers in. Reproducible and documented.
Mail, identity, monitoring, backups, CI — one stack, no vendor lock-in. Proxmox as the foundation under all of the above.
02 / About
Independent, technical, hands-on.
I run the same stack I recommend: a 4-node Proxmox cluster, a GPU server with two RTX PRO 6000s, sovereign LLMs behind LiteLLM, my own CI runners, monitoring, mail and Vault.
Based in Austria. Working languages: German and English.
03 / Contact
I respond within one business day.