Skip to main content
Possible SetupAI & Knowledge

Internal Knowledge AI: GDPR-Compliant RAG Assistant (On-Premise)

Build an internal knowledge AI without data leakage — a RAG chatbot answers team questions from Confluence, Jira and Git repos, fully on-premise, with source citations and permission checks instead of hallucinations.

AIRAGOn-PremiseDSGVOKnowledge ManagementLLMEntwicklerteams
Industry
Engineering / Knowledge Management
Implementation
8 weeks
Search Time per Question
<30 sec

"Ask Markus, he knows." — Markus is on vacation. Or left the company three months ago.

The knowledge exists: in Confluence pages, Jira tickets, README files, old architecture decisions. But nobody finds it. So the same question gets asked in the team chat for the fourth time, the senior developer gets interrupted for the fourth time — and the documentation that would have answered it stays unread.

The obvious shortcut — pasting internal docs into a public AI tool — is off the table for many companies. Source code, customer data, trade secrets in someone else's cloud? Exactly.

This showcase demonstrates the other way: a knowledge AI that runs inside your own network, only states what it can back up with sources, and only shows what the person asking is allowed to see.

Automation Workflow

How the internal knowledge AI answers a question — step by step, with permission checks and mandatory citations

BPMN Elements
Trigger
Start Event
Processing
Task
Integration
Service Task
Output
End Event
Gateway
XOR (exclusive)

Before vs. After

Knowledge Search
Before
20–30 min across Confluence, Jira, chats
After
One question, answer in seconds
Data Flow
Before
Copy-paste into external AI tools
After
100% on-premise, nothing leaves the network
Reliability
Before
Outdated docs, hearsay
After
Every answer with source link and date
Access Control
Before
Knowledge shared ad hoc, uncontrolled
After
Answers respect permissions

The Challenge

Development and engineering teams lose time every day searching for knowledge: the answer exists somewhere in Confluence, Jira, GitLab or on network drives — but the search takes 20 to 30 minutes or ends at the most experienced colleague, who then can't get their own work done. For new hires, onboarding drags on for months because knowledge is only passed on by word of mouth.

At the same time, data protection and IP protection rule out the easy solution: internal documentation, source code or customer data must not be copied into public AI services. In regulated or safety-critical environments there's an additional constraint: not everyone is allowed to see everything. A knowledge chatbot that ignores permissions would be a bigger risk than no chatbot at all. And a chatbot that invents convincing-sounding answers when it lacks knowledge destroys trust faster than it creates value.

Our Solution

The reference architecture consists of two parts: an indexing pipeline and an answering workflow — both running entirely on your own infrastructure.

The indexing pipeline uses n8n to synchronize the knowledge sources every night: Confluence pages, Jira tickets, GitLab READMEs and architecture documents. Before embedding into the vector database (Qdrant), every document passes through a redaction filter that detects and removes API keys, tokens, credentials and personal data. For each text chunk, the original permissions (ACLs) of the source system are stored alongside it.

In the answering workflow, an employee asks a question via web interface or Slack. After SSO login, the system retrieves the most relevant text chunks — filtered to sources the person asking is allowed to see in the source system. A locally hosted LLM (e.g. Llama via Ollama, running in Docker on your own GPU hardware) formulates the answer — with one hard rule: every statement needs a source citation with link and date. If the search finds no reliable source, the system answers honestly with "I don't have a documented answer for that" and names the responsible expert instead of hallucinating. Every question and answer is recorded in an audit log with defined retention periods.

Key Features

100% On-Premise LLM

The language model runs via Ollama and Docker on your own hardware. No outbound API calls, no training on your data, full control over the model and its updates.

Permission-Aware Retrieval (RAG)

The vector search filters to documents the person asking is allowed to see in the source system. Permissions from Confluence, Jira and GitLab are enforced at retrieval time — not after the fact.

Mandatory Citations, No Hallucinations

Every answer links the underlying documents with their last-updated date. Without a reliable source, the system honestly answers "I don't know" and names the responsible expert.

Secrets & PII Redaction

Before indexing, a filter detects and removes API keys, tokens, credentials and personal data — sensitive content never reaches the search index in the first place.

Results

Possible setup, not a packaged product

The figures shown are target values and expected magnitudes for a possible setup – based on industry benchmarks, public studies of comparable setups, and our own tests on a real stack. They are not measured outcomes from a specific customer project; actual results depend on company size, process maturity, and integration depth. We do not offer this setup as a packaged product. We help teams design, automate, and run such processes themselves – through architecture consulting, workshops, and implementation support with n8n. For regulated third-party systems with certification or license requirements (e.g. HIS, gematik, DATEV-certified), we partner with specialized providers.

<30 sec
Search Time per Question
100%
Answers with Sources
0
Data Sent to External Clouds
-40%
Onboarding Time

Knowledge questions answered in seconds instead of half-hour searches — fully on-premise, every answer backed by sources, zero data leaving for external clouds

Integrations

Seamless connection to your existing infrastructure

Ollama (On-Premise LLM)

AI Engine

Locally hosted open language model on your own GPU hardware — answer generation without external APIs

Qdrant

Vector Database

Semantic search across all indexed knowledge sources with per-request ACL filtering

Confluence & Jira

Knowledge Sources

Nightly synchronization of pages and tickets, including their original permissions

GitLab

Code & Docs

READMEs, architecture decisions and wikis from the repositories — with secrets filtering

Keycloak

SSO & Permissions

Single sign-on and group resolution — determines which sources are searched per user

Security & Compliance

Enterprise-ready with highest security standards

100% On-Premise

LLM, vector database and workflows run entirely on your own infrastructure. No external AI APIs; deployable even in air-gapped networks.

Permission Inheritance

Access rights from Confluence, Jira and GitLab are enforced on every search. Nobody gets answers from documents they couldn't open themselves.

Secrets Redaction

API keys, tokens and credentials are detected and removed before indexing — they never reach the search index at all.

Audit Log & GDPR

Every question and answer is logged in an audit-proof way, with defined retention periods and GDPR information-request capability.

Technology Stack

n8nOllama (On-Premise LLM)Qdrant Vector DBConfluence APIJira APIGitLabKeycloak SSODocker

Frequently Asked Questions

No. The language model runs via Ollama on your own hardware, as do the vector database and all workflows. There are no API calls to external AI services — neither during indexing nor when answering. The architecture also works in networks without internet access.
Through three mechanisms: the LLM may only answer from the retrieved document chunks (retrieval-augmented generation), every statement requires a source citation with a link, and if the search yields no reliable source, the system honestly answers "I don't know" instead of guessing. Answer quality is additionally monitored via spot-check reviews and user feedback.
Before indexing, every document passes through a redaction pipeline that identifies and removes API keys, tokens, credentials and personal data using pattern- and AI-based detection. Sensitive content never reaches the search index in the first place — which is more robust than filtering it at output time.
In this reference architecture: Confluence, Jira and GitLab. In principle, any source with an API or file access — SharePoint, network drives, internal databases or requirements management systems. The n8n indexing pipeline is modular; new sources are added as an additional sync branch.
Less than commonly assumed. For teams of up to around 200 users, a single server with a current GPU is usually sufficient, since requests are short and queue well. The model size is matched to the hardware — for knowledge questions with source context, mid-sized open models already deliver very good results.

Would this automation pay off in your case?

You've just seen one possible setup. The 5-minute bottleneck diagnosis shows you — for your own process: maturity level, ROI estimate and whether this path is worth it. Free, instant result.