YOUR PRIVATE
AI INFRASTRUCTURE
Run 70B models on GPU hardware at a fraction of cloud GPU pricing. Your data never leaves the EU.

AI Infrastructure Suite: Sovereign AI on Dedicated Hetzner GPUs

The AI Infrastructure Suite provides private, sovereign AI infrastructure on dedicated GPUs in European data centers (Germany and Finland). Run 70B-parameter models on dedicated hardware at a fraction of cloud GPU pricing — with fixed monthly costs, GDPR-compliant data residency, and a fully managed stack (Kubernetes, GPU drivers, vLLM). Two tiers, from $1,000 to $4,000/mo, depending on your AI workload complexity.

Why Should You Run AI on Dedicated Hardware?

  • Fixed Monthly Cost: Stop paying per token. Rent the hardware, own the margins.
  • EU Data Sovereignty: Your data stays in Hetzner’s ISO 27001 certified datacenters in the EU.
  • Managed Stack: We handle Kubernetes, GPU drivers, and vLLM. You just call the API.

Which AI Product Fits Your Needs?

AI Inference

For AI Agencies & SaaS

Production-grade model serving for 7B–13B class models on dedicated GPU servers.

  • N+1 high availability
  • Prometheus metrics
  • vLLM inference server
  • K8s GPU scheduling
$1,000 / mo

AI Full Stack

For AI Startups, Enterprises & GDPR

Complete AI infrastructure — inference, RAG, fine-tuning, and multi-model serving — on dedicated GPUs with 96 GB VRAM.

  • Train-by-Night, Serve-by-Day
  • LoRA fine-tuning pipeline (up to 120B)
  • RAG (Qdrant) included (see the retrieval sketch below)
  • 70B+ model serving (96 GB VRAM)
  • Multi-model management
  • Add-on GPU nodes available
$4,000 / mo
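
The included RAG component is served from a managed Qdrant instance. As a minimal sketch, assuming the qdrant-client Python library — the endpoint, collection name, and vector size below are illustrative placeholders, not fixed parts of the service:

```python
# A minimal retrieval sketch against a managed Qdrant instance,
# assuming the qdrant-client library. Endpoint, collection name,
# and vector dimensionality are hypothetical placeholders.
from qdrant_client import QdrantClient

client = QdrantClient(
    url="https://qdrant.your-endpoint.example.com",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

# In practice the query vector comes from your embedding model;
# a zero vector of the collection's dimensionality stands in here.
query_vector = [0.0] * 768

hits = client.search(
    collection_name="docs",    # hypothetical collection
    query_vector=query_vector,
    limit=5,                   # top-5 nearest chunks for the RAG context
)
for hit in hits:
    print(hit.score, hit.payload)
```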

How Do the AI Products Compare?

Service         Setup Fee   Monthly   Best For
AI Inference    $3,000      $1,000    AI Agencies & SaaS with AI features
AI Full Stack   $2,000      $4,000    AI startups, enterprises & GDPR-sensitive companies

What Are the Service Boundaries?

We manage the GPU infrastructure (hardware, drivers, Kubernetes). You manage the model (weights, prompts, application logic).

Have questions about our AI Infrastructure Suite?

Can I run Llama 3, Mistral, or Gemma?

Yes. Our stack is optimized for all models supported by vLLM, including Llama 3, Mistral, Gemma, Qwen, and custom fine-tuned models.

What about data privacy and GDPR?

We sign a data processing agreement (DPA). Your data stays on your dedicated servers in European datacenters (Germany and Finland). We do not use your data to train models. Full GDPR compliance by default.

Is it OpenAI API compatible?

Yes. You can swap your OpenAI base URL and API key, and your app will work without code changes. We serve an OpenAI-compatible REST API.
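
As a minimal sketch with the official openai Python client — the base URL, API key, and model name below are placeholders for your dedicated endpoint and deployed model:

```python
# A minimal sketch: point the official openai client at your
# dedicated vLLM endpoint. Base URL, key, and model name are
# placeholders, not the service's actual values.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-endpoint.example.com/v1",  # your dedicated endpoint
    api_key="YOUR_API_KEY",
)

resp = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # any model served by vLLM
    messages=[{"role": "user", "content": "Summarize GDPR in one sentence."}],
)
print(resp.choices[0].message.content)
```

Because the request path is unchanged, existing SDKs, frameworks, and agents built against the OpenAI API keep working as-is.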

How does pricing compare to AWS SageMaker?

Our dedicated GPU hardware costs a fraction of equivalent cloud GPU instance pricing. A single GEX44 GPU server costs ~$400/mo vs $2,000+/mo on AWS for equivalent compute.

Can I fine-tune models on your infrastructure?

Yes, with the AI Full Stack tier. We provide LoRA fine-tuning pipelines and “Train-by-Night, Serve-by-Day” GPU scheduling to maximize utilization.
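
For illustration, a typical LoRA setup with the Hugging Face peft library might look like the sketch below; the base model and hyperparameters are illustrative examples, not our pipeline's fixed defaults:

```python
# A minimal LoRA fine-tuning setup, assuming the Hugging Face
# transformers and peft libraries. Model name and hyperparameters
# are illustrative placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = "meta-llama/Meta-Llama-3-8B"  # hypothetical base model
model = AutoModelForCausalLM.from_pretrained(base)

lora = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling factor for the adapter
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the small adapter weights train
```

Training only the adapter weights is what makes the Train-by-Night window practical: the fine-tuning job fits on the same GPUs that serve inference during the day.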

Do you offer trials or demos?

We offer paid pilots. Since we provision physical hardware, we cannot offer free tiers. Pilots start at $1,000 for 30 days.

What GPU hardware do you use?

Hetzner GEX44 (RTX 4000 Ada) for inference and GEX131 (RTX PRO 6000 Blackwell, 96 GB VRAM) for training and large models. All dedicated — no shared tenancy.

Can I scale to multiple GPU nodes?

Yes. Both tiers support add-on GPU nodes. AI Inference: +$400/mo per GEX44 node. AI Full Stack: +$1,000/mo per GEX131 node.

What monitoring do I get?

Full Prometheus + Grafana dashboards with DCGM GPU metrics, vLLM throughput tracking, and Loki logging. You see token/s, queue depth, and GPU utilization in real time.
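
As a rough sketch of pulling those numbers programmatically over the Prometheus HTTP API — the endpoint is a placeholder, and the metric names are the standard dcgm-exporter and vLLM exporter names, which we assume your deployment exposes unchanged:

```python
# A minimal sketch querying the Prometheus HTTP API with requests.
# The endpoint is hypothetical; metric names are the standard
# DCGM (GPU) and vLLM exporter names.
import requests

PROM = "https://prometheus.your-endpoint.example.com"  # placeholder endpoint

def query(promql: str) -> list:
    """Run an instant PromQL query and return the result vector."""
    r = requests.get(f"{PROM}/api/v1/query", params={"query": promql})
    r.raise_for_status()
    return r.json()["data"]["result"]

# GPU utilization per device (dcgm-exporter metric)
gpu_util = query("DCGM_FI_DEV_GPU_UTIL")

# Generation throughput in tokens/s over the last 5 minutes (vLLM metric)
tok_per_s = query("rate(vllm:generation_tokens_total[5m])")

# Requests waiting in the scheduler queue (vLLM metric)
queue_depth = query("vllm:num_requests_waiting")

for series in gpu_util:
    print("GPU", series["metric"].get("gpu", "?"), "->", series["value"][1], "% util")
```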

What is the difference between the two tiers?

AI Inference is for serving 7B–13B models on GEX44 hardware. AI Full Stack adds RAG, fine-tuning pipelines, multi-model management, and 96 GB VRAM GPUs for 70B+ models. Need more capacity? Add GPU nodes to either tier.

Where are the GPU servers located?

All GPU infrastructure runs on dedicated Hetzner servers in European datacenters (Germany and Finland). GPU servers are not available in US regions. This ensures full EU data sovereignty and compliance with the EU AI Act. For applications requiring US-based inference endpoints, we can architect hybrid solutions with EU-hosted models and edge caching.

Curious about your potential savings?

Most teams save 40–60% on cloud compute. Use our free calculator to see exactly how much you could save.

Not sure if a Cloud Exit makes sense for you? Book a free 30-minute discovery call on Zoom. We'll review your current cloud spend, identify what's safe to move, and give you an honest Go / No-Go recommendation — no commitment, no sales pitch. If the numbers work, we'll show you how. If they don't, we'll tell you that too.

Interested? Contact us.

DevOps Squad OG, FN 539629y

Check out our RSS Feed to keep up with cloud repatriation news.