## The Problem Nobody Wants to Think About
You build a beautiful RAG system. It indexes your entire corporate knowledge base. Policies, financial reports, HR documents, engineering specs, board meeting notes, salary bands, M&A plans, client contracts, disciplinary records.
Then an intern asks it a question and gets back the CEO's compensation package.
This isn't hypothetical. I've seen it happen. More than once.
The default state of a RAG system is "everyone can see everything." And in an enterprise, that's not just inconvenient. It's a compliance violation, a legal liability, and a trust-destroying event.
Access control in RAG isn't a nice-to-have. It's table stakes. Build it in from day one or don't build the system at all.
## Why It's Harder Than You Think
Traditional access control works at the document level. This file is readable by these groups. Check permissions, grant or deny. Done.
RAG breaks this model in three ways.
**Chunking fragments permissions.** A document might have public sections and confidential sections. When you chunk it, those sections become independent vectors. The chunk doesn't inherently know it came from a restricted part of a document. It is worth reading about [least-privilege principles for agents](/blog/ai-agent-permissions-least-privilege) alongside this.
**Retrieval happens before display.** In traditional systems, the user clicks on a file, permissions are checked, access is granted or denied. In RAG, retrieval happens server-side, and the retrieved chunks influence the LLM's response. Even if you don't show the chunk directly, its content leaks through the generated answer.
**Aggregation creates new sensitivities.** Individual chunks might be fine independently. But combining salary data from HR, project assignments from engineering, and performance reviews creates a composite picture that should be restricted. The whole is more sensitive than the parts.
## Architecture: Pre-Retrieval Filtering
The most reliable approach. Filter chunks BEFORE retrieval based on the user's permissions.
```python
class SecureRetriever:
    def __init__(self, vector_store, permission_service):
        self.vectors = vector_store
        self.permissions = permission_service

    async def search(self, query: str, user: User, top_k: int = 10):
        # Step 1: Get user's accessible document IDs
        accessible_docs = await self.permissions.get_accessible_documents(
            user_id=user.id,
            groups=user.groups,
            roles=user.roles,
        )

        # Step 2: Search with metadata filter
        results = await self.vectors.search(
            query=query,
            top_k=top_k,
            filter={
                "document_id": {"$in": accessible_docs}
            },
        )
        return results
```
This is clean and secure. The vector store never returns chunks the user can't see, so restricted content never reaches the LLM, provided the filter and the permission data behind it are correct.
The downside: maintaining the accessible document list per user can be expensive. If you have 100,000 documents and complex group-based permissions, that filter can get large.
### Optimization: Permission Tags
Instead of filtering by individual document IDs, tag chunks with permission groups at ingestion time.
```python
async def ingest_document(document, chunker, embedder, vector_store):
    # Get document permissions from source system
    permissions = await get_document_permissions(document.id)
    permission_tags = permissions.to_tags()
    # e.g., ["group:engineering", "group:senior-engineering", "role:admin"]

    chunks = chunker.chunk(document)
    for chunk in chunks:
        embedding = await embedder.embed(chunk.text)
        await vector_store.upsert(
            id=chunk.id,
            embedding=embedding,
            metadata={
                "document_id": document.id,
                "permission_tags": permission_tags,
                "source": document.source,
                "updated_at": document.updated_at,
            },
        )
```
At query time, filter on permission tags instead of document IDs:
```python
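# Replaces SecureRetriever.search from the previous section
async def search(self, query: str, user: User, top_k: int = 10):
    # get_user_tags is async (it calls the identity provider), so await it
    user_tags = await self.permissions.get_user_tags(user)
    # e.g., ["group:engineering", "role:team-lead", "department:platform"]

    results = await self.vectors.search(
        query=query,
        top_k=top_k,
        filter={
            # The "contains any" operator name varies by vector store
            "permission_tags": {"$containsAny": user_tags}
        },
    )
    return results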
```
This is faster because permission tags are small, indexed metadata fields rather than lists of 100K document IDs.
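To make the filter semantics concrete, here is a minimal in-memory sketch of the "contains any" check: a chunk is visible when its tag set shares at least one tag with the user's. The `Chunk` class and `visible_chunks` helper are hypothetical names for illustration, not any vector store's API. A real store evaluates this inside its index.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    id: str
    text: str
    permission_tags: set[str] = field(default_factory=set)


def visible_chunks(chunks: list[Chunk], user_tags: set[str]) -> list[Chunk]:
    """Keep chunks that share at least one permission tag with the user."""
    # A chunk with no tags intersects with nothing, so it is hidden from
    # everyone: untagged content fails closed rather than open.
    return [c for c in chunks if c.permission_tags & user_tags]


chunks = [
    Chunk("c1", "Deploy runbook", {"group:engineering"}),
    Chunk("c2", "Salary bands", {"group:hr", "role:admin"}),
    Chunk("c3", "Untagged draft"),
]
intern_tags = {"group:engineering", "department:platform"}
visible_ids = [c.id for c in visible_chunks(chunks, intern_tags)]
```

One design choice worth copying: an untagged chunk matches nobody. If ingestion ever fails to attach tags, the chunk disappears instead of becoming public.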
## The Ingestion Challenge: Keeping Permissions in Sync
Permissions change. People join teams. People leave. Documents get reclassified. Projects go from confidential to public.
Your RAG index needs to reflect these changes, and it needs to reflect them promptly. A document that was restricted yesterday and unrestricted today should be retrievable. A document that was public yesterday and restricted today must NOT be retrievable by unauthorized users.
```python
from datetime import datetime, timezone


class PermissionSyncPipeline:
    async def sync(self):
        """Run periodically to keep permissions in sync."""
        # Capture the start time first, so changes that land while this
        # run is in progress are picked up by the next run
        sync_started = datetime.now(timezone.utc)

        # Get all permission changes since last sync
        changes = await self.permission_source.get_changes(
            since=self.last_sync_timestamp
        )
        for change in changes:
            if change.type == "document_permission_changed":
                # Update all chunks for this document
                new_tags = change.new_permissions.to_tags()
                await self.vector_store.update_metadata(
                    filter={"document_id": change.document_id},
                    metadata={"permission_tags": new_tags},
                )
            elif change.type == "document_deleted":
                await self.vector_store.delete(
                    filter={"document_id": change.document_id}
                )

        self.last_sync_timestamp = sync_started
```
How often to sync depends on your risk tolerance. For most enterprises, hourly is fine. For sensitive environments (healthcare, finance, legal), you might need real-time sync via webhooks. For a deeper look, see [data leakage risks](/blog/data-leakage-ai-agents).
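For the real-time case, a webhook handler might look roughly like this. Treat it as a sketch: the event shape (`type`, `document_id`, `permission_tags`) and the in-memory `index` dict are assumptions for illustration. Your source system defines its own payload, and the update would go to your vector store rather than a dict.

```python
def apply_permission_event(index: dict[str, dict], event: dict) -> int:
    """Apply one permission-change event to chunk metadata.

    `index` maps chunk ID -> metadata dict. Returns the number of chunks touched.
    """
    doc_id = event["document_id"]
    matching = [cid for cid, meta in index.items() if meta["document_id"] == doc_id]

    if event["type"] == "document_permission_changed":
        for cid in matching:
            # Replace, never merge: old tags must not survive a restriction
            index[cid]["permission_tags"] = list(event["permission_tags"])
    elif event["type"] == "document_deleted":
        for cid in matching:
            del index[cid]
    else:
        # Unknown event types are ignored rather than guessed at
        return 0
    return len(matching)


index = {
    "c1": {"document_id": "doc-1", "permission_tags": ["group:all-staff"]},
    "c2": {"document_id": "doc-1", "permission_tags": ["group:all-staff"]},
    "c3": {"document_id": "doc-2", "permission_tags": ["group:hr"]},
}
touched = apply_permission_event(index, {
    "type": "document_permission_changed",
    "document_id": "doc-1",
    "permission_tags": ["group:finance"],
})
```

Note that the new tags overwrite the old ones. Appending would leave the previous audience with access after a reclassification.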
## Sub-Document Permissions
Here's where it gets properly complicated. Some documents have mixed sensitivity.
A board meeting transcript might have public announcements, confidential financials, and restricted personnel discussions. One document, three permission levels.
```python
class SubDocumentPermissionChunker:
    def chunk_with_permissions(self, document, sections):
        """
        Chunk document respecting section-level permissions.
        Never merge chunks across permission boundaries.
        """
        chunks = []
        for section in sections:
            section_chunks = self.chunker.chunk(section.text)
            for chunk in section_chunks:
                chunk.metadata["permission_tags"] = section.permission_tags
                chunk.metadata["sensitivity_level"] = section.sensitivity
                chunks.append(chunk)
        return chunks
```
The key rule: never merge content across permission boundaries during chunking. If section A is public and section B is confidential, they must be separate chunks even if they're small enough to combine.
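If your chunker merges small pieces to hit a target size, the rule has to be enforced there too. Here is a minimal sketch of a merge step that only combines adjacent pieces with identical tags. `Piece` and `merge_small_pieces` are hypothetical names, and the character budget stands in for whatever size limit your chunker uses.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    text: str
    permission_tags: frozenset[str]


def merge_small_pieces(pieces: list[Piece], max_chars: int) -> list[Piece]:
    """Merge adjacent pieces only when their permission tags are identical."""
    merged: list[Piece] = []
    for piece in pieces:
        last = merged[-1] if merged else None
        same_boundary = last is not None and last.permission_tags == piece.permission_tags
        fits = last is not None and len(last.text) + 1 + len(piece.text) <= max_chars
        if same_boundary and fits:
            merged[-1] = Piece(last.text + " " + piece.text, last.permission_tags)
        else:
            # Different tags (or too large): start a new chunk
            merged.append(piece)
    return merged


public = frozenset({"group:all-staff"})
board = frozenset({"group:board"})
pieces = [
    Piece("Q3 launch announced.", public),
    Piece("Office move confirmed.", public),
    Piece("Revenue missed forecast.", board),
    Piece("Town hall on Friday.", public),
]
result = merge_small_pieces(pieces, max_chars=200)
```

The two public pieces at the start are combined. The board-only piece stays on its own, and the public piece after it is not pulled backwards across the boundary.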
## Post-Retrieval Verification
Belt and suspenders. Even with pre-retrieval filtering, add a post-retrieval permission check.
```python
async def search_with_verification(self, query, user, top_k=10):
    # Pre-filtered retrieval
    candidates = await self.secure_search(query, user, top_k=top_k * 2)

    # Post-retrieval verification
    verified = []
    for chunk in candidates:
        if await self.permissions.verify_access(
            user_id=user.id,
            document_id=chunk.metadata["document_id"],
            chunk_id=chunk.id,
        ):
            verified.append(chunk)
        if len(verified) >= top_k:
            break
    return verified
```
Why double-check? Because metadata filters can have bugs. Permission tags can be stale. The verification step catches any leakage that slipped through pre-filtering.
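Here is a small, synchronous sketch of the stale-tag case, assuming a hypothetical live ACL lookup (`acl`, a dict from document ID to allowed user IDs). The names and data are made up for illustration. The point is only that the second check reads current permissions rather than index metadata, and denies anything it can't confirm.

```python
def verify_chunks(candidates, user_id, acl, top_k):
    """Re-check each candidate against the live ACL; drop anything unconfirmed.

    `acl` maps document ID -> set of user IDs currently allowed to read it.
    """
    verified = []
    for chunk in candidates:
        # A document missing from the ACL is denied, not allowed
        allowed_users = acl.get(chunk["document_id"], set())
        if user_id in allowed_users:
            verified.append(chunk)
        if len(verified) >= top_k:
            break
    return verified


# c2 passed the pre-filter on a stale tag: doc-2 has since been restricted
candidates = [
    {"id": "c1", "document_id": "doc-1"},
    {"id": "c2", "document_id": "doc-2"},
    {"id": "c3", "document_id": "doc-3"},
]
live_acl = {"doc-1": {"u-intern"}, "doc-2": {"u-ceo"}}
safe = verify_chunks(candidates, "u-intern", live_acl, top_k=2)
```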
## Audit Trail: Non-Negotiable
Every retrieval should be logged with the user identity, what was retrieved, and what permissions were checked.
```python
from datetime import datetime, timezone


class AuditedRetriever:
    async def search(self, query, user, top_k=10):
        results = await self.secure_retriever.search(query, user, top_k)

        # Record who asked, what came back, and which checks were applied
        await self.audit_log.record({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user_id": user.id,
            "user_groups": user.groups,
            "query": query,
            "retrieved_chunk_ids": [r.id for r in results],
            "retrieved_document_ids": list(set(
                r.metadata["document_id"] for r in results
            )),
            "permission_check": "pre_filter + post_verify",
        })
        return results
```
This isn't optional for regulated industries. And even for unregulated ones, when someone asks "did anyone access X?", you need to be able to answer.
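Answering that question is a simple scan if the log has the shape recorded above. A minimal sketch, assuming the records are available as a list of dicts with ISO 8601 timestamps; `who_accessed` is a hypothetical helper and the sample records are made up for illustration.

```python
def who_accessed(records: list[dict], document_id: str) -> list[tuple[str, str]]:
    """Return (timestamp, user_id) pairs for retrievals that included the document."""
    hits = [
        (r["timestamp"], r["user_id"])
        for r in records
        if document_id in r["retrieved_document_ids"]
    ]
    # ISO 8601 timestamps in the same timezone sort chronologically as strings
    return sorted(hits)


records = [
    {"timestamp": "2024-05-02T09:15:00+00:00", "user_id": "u-17",
     "retrieved_document_ids": ["doc-9", "doc-4"]},
    {"timestamp": "2024-05-01T14:02:00+00:00", "user_id": "u-03",
     "retrieved_document_ids": ["doc-9"]},
    {"timestamp": "2024-05-03T11:40:00+00:00", "user_id": "u-17",
     "retrieved_document_ids": ["doc-2"]},
]
accesses = who_accessed(records, "doc-9")
```

In production this would be a query against your log store, but the record needs the same fields: who, when, and which documents.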
## Integration with Identity Providers
Don't build your own permission system. Integrate with what the organization already uses.
```python
class EnterprisePermissionService:
    def __init__(self, idp_client):
        self.idp = idp_client  # Okta, Azure AD, Google Workspace, etc.

    async def get_user_tags(self, user: User) -> list[str]:
        """Resolve user's effective permissions from IdP."""
        # Direct groups
        groups = await self.idp.get_user_groups(user.id)
        # Transitive groups (nested group membership)
        all_groups = await self.idp.resolve_transitive_groups(groups)
        # Roles
        roles = await self.idp.get_user_roles(user.id)
        # Department
        dept = await self.idp.get_user_department(user.id)

        tags = []
        tags.extend(f"group:{g}" for g in all_groups)
        tags.extend(f"role:{r}" for r in roles)
        tags.append(f"department:{dept}")
        return tags
```
Because tags are resolved from the identity provider, someone removed from a group there loses access in your RAG system the next time their tags are resolved (or at the next sync, if you cache them). No separate permission management to maintain. The related post on [keeping the index current](/blog/rag-document-sync-incremental) goes further on this point.
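Nested group membership is the part people tend to get wrong, so here is a minimal sketch of what transitive resolution does. It assumes you can get a mapping from each group to the groups it belongs to. The `parents` dict and this standalone `resolve_transitive_groups` function are illustrative, not any identity provider's API.

```python
from collections import deque


def resolve_transitive_groups(direct: list[str], parents: dict[str, list[str]]) -> set[str]:
    """Expand direct groups to every group reachable through nesting.

    `parents` maps a group to the groups it is a member of.
    """
    seen = set(direct)
    queue = deque(direct)
    while queue:
        group = queue.popleft()
        for parent in parents.get(group, ()):
            if parent not in seen:  # also guards against membership cycles
                seen.add(parent)
                queue.append(parent)
    return seen


parents = {
    "platform-team": ["engineering"],
    "engineering": ["all-staff"],
    "all-staff": ["engineering"],  # deliberate cycle to show it terminates
}
groups = resolve_transitive_groups(["platform-team"], parents)
```

A member of `platform-team` ends up with `engineering` and `all-staff` as well, which is what lets a chunk tagged `group:engineering` match them at query time.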
## The Minimum Viable Approach
If this all feels overwhelming, here's the simplest approach that's still secure:
1. Tag every chunk with its source document's permission group at ingestion
2. Filter on permission group at query time
3. Sync permissions at least daily
4. Log all retrievals
That's four things. Not twenty. Start there. Add sub-document permissions and real-time sync when you need them.
The worst approach is no access control at all, followed closely by "we'll add it later." Later never comes. Or it comes after the incident. Build it in from the start.