# File Storage Kit

## Overview

Purpose: Provide secure file upload, storage, and retrieval with metadata tracking, access control, and support for multiple storage backends (local filesystem, Google Cloud Storage, or in-memory).
Key Features:
- Multiple storage backends: local, Google Cloud Storage (GCS), or in-memory
- Database metadata tracking for all files
- Owner-based access control (user or group ownership)
- File size limits and content type restrictions
- Public and private file support
- Search and filtering capabilities
- Automatic MIME type detection
- Tagging and categorization
Dependencies:
- Injected services: None
- Port dependencies: FileRepository (metadata storage), FileStorageAdapter (file content storage)
- Note: Kits cannot directly import from other kits (enforced by import-linter contract #6). Dependencies are injected via constructor in `compose.py`.
## Quick Start

```python
from portico import compose
from portico.ports.file_storage import FileUploadRequest, FileOwnerType

# Basic configuration with local storage
app = compose.webapp(
    database_url="postgresql://localhost/myapp",
    kits=[
        compose.user(),
        compose.file(
            storage_backend="local",
            storage_path="./uploads",
            max_file_size_mb=50,
        ),
    ],
)

# Access the file service
file_service = app.kits["file"].service

# Upload a file
with open("document.pdf", "rb") as f:
    metadata = await file_service.upload_file(
        file_content=f,
        upload_request=FileUploadRequest(
            filename="document.pdf",
            content_type="application/pdf",
            owner_type=FileOwnerType.USER,
            owner_id=user_id,
            is_public=False,
        ),
    )

# Retrieve the file
file_content = await file_service.get_file(
    file_id=metadata.id,
    requesting_user_id=user_id,
)
```
## Core Concepts

### Storage Backends

The File Storage Kit supports three storage backends:
```python
# Local filesystem - files stored in a directory
compose.file(
    storage_backend="local",
    storage_path="./uploads",  # Required for local
    max_file_size_mb=100,
)

# Google Cloud Storage - files stored in a GCS bucket
compose.file(
    storage_backend="gcs",
    gcs_bucket="my-app-files",  # Required for GCS
    gcs_project="my-gcp-project",  # Required for GCS
    gcs_credentials_path="/path/to/credentials.json",  # Optional
    max_file_size_mb=500,
)

# In-memory - files stored in RAM (development/testing only)
compose.file(
    storage_backend="memory",
    max_file_size_mb=10,
)
```
Local Storage:
- Files stored in configured directory
- Fast access, no external dependencies
- Not suitable for multi-server deployments
- Best for: Single-server apps, development
GCS Storage:
- Files stored in Google Cloud Storage bucket
- Scalable, durable, distributed
- Requires GCP credentials
- Best for: Production multi-server apps, high availability
Memory Storage:
- Files stored in RAM
- Very fast but volatile (lost on restart)
- Limited by available memory
- Best for: Testing, temporary files
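The memory backend's behavior can be pictured as a dict keyed by storage path. This standalone sketch (not the kit's actual adapter class) illustrates the store/retrieve/delete contract any backend must satisfy:

```python
# Illustrative in-memory blob store; the real kit wires a FileStorageAdapter
# implementation behind the same kinds of operations.
class MemoryStore:
    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}  # storage_path -> content

    def save(self, storage_path: str, content: bytes) -> None:
        self._blobs[storage_path] = content

    def load(self, storage_path: str) -> bytes:
        return self._blobs[storage_path]  # KeyError if missing

    def delete(self, storage_path: str) -> bool:
        return self._blobs.pop(storage_path, None) is not None

store = MemoryStore()
store.save("uploads/abc123", b"hello")
assert store.load("uploads/abc123") == b"hello"
assert store.delete("uploads/abc123") is True
assert store.delete("uploads/abc123") is False  # already gone
```

Everything lives in process memory, which is exactly why this backend is fast, volatile, and unsuitable beyond testing.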
### File Ownership and Access Control
Files are owned by either a user or a group:
```python
from portico.ports.file_storage import FileOwnerType

# User-owned file (private by default)
upload_request = FileUploadRequest(
    filename="resume.pdf",
    owner_type=FileOwnerType.USER,
    owner_id=user_id,
    is_public=False,  # Only owner can access
)

# Group-owned file
upload_request = FileUploadRequest(
    filename="team-doc.pdf",
    owner_type=FileOwnerType.GROUP,
    owner_id=group_id,
    is_public=False,  # Only group members can access (TODO)
)

# Public file (anyone can access)
upload_request = FileUploadRequest(
    filename="public-doc.pdf",
    owner_type=FileOwnerType.USER,
    owner_id=user_id,
    is_public=True,  # Anyone can read
)
```
Access rules:
- Owner can always read, update, and delete their files
- Public files can be read by anyone (but only owner can modify/delete)
- Private files can only be accessed by owner
- Group files require group membership check (TODO: requires Group Kit integration)
### File Metadata
All files have metadata stored in the database:
```python
# Upload returns metadata
metadata = await file_service.upload_file(file_content, upload_request)

# Metadata fields
print(metadata.id)                 # UUID
print(metadata.filename)           # "document.pdf"
print(metadata.original_filename)  # "document.pdf"
print(metadata.content_type)       # "application/pdf"
print(metadata.size_bytes)         # 1024000
print(metadata.owner_type)         # FileOwnerType.USER
print(metadata.owner_id)           # UUID
print(metadata.storage_path)       # Internal path (don't use directly)
print(metadata.is_public)          # False
print(metadata.description)        # Optional description
print(metadata.tags)               # ["invoice", "2024"]
print(metadata.created_at)         # datetime
print(metadata.updated_at)         # datetime
```
### File Operations
Upload, retrieve, update, delete files with built-in access control:
```python
# Upload file
metadata = await file_service.upload_file(
    file_content=file_bytes,
    upload_request=FileUploadRequest(...),
)

# Get file with content
file_content = await file_service.get_file(
    file_id=metadata.id,
    requesting_user_id=user_id,
)
# file_content.metadata, file_content.content

# Get metadata only (no file content download)
metadata = await file_service.get_file_metadata(
    file_id=file_id,
    requesting_user_id=user_id,
)

# Update metadata
updated = await file_service.update_file_metadata(
    file_id=file_id,
    update_request=FileUpdateRequest(
        description="Updated description",
        tags=["updated", "2024"],
    ),
    requesting_user_id=user_id,
)

# Delete file
success = await file_service.delete_file(
    file_id=file_id,
    requesting_user_id=user_id,
)
```
### File Search and Listing
Find files by owner, search query, or public access:
```python
# List user's files
user_files = await file_service.list_files_by_owner(
    owner_type=FileOwnerType.USER,
    owner_id=user_id,
    requesting_user_id=user_id,
    limit=50,
    offset=0,
)

# List group's files
group_files = await file_service.list_files_by_owner(
    owner_type=FileOwnerType.GROUP,
    owner_id=group_id,
    requesting_user_id=user_id,
    limit=50,
)

# List all public files
public_files = await file_service.list_public_files(
    limit=100,
    offset=0,
)

# Search files (by filename, description, tags)
results = await file_service.search_files(
    query="invoice",
    owner_type=FileOwnerType.USER,
    owner_id=user_id,
    requesting_user_id=user_id,
    limit=20,
)
```
## Configuration

### Required Settings
Storage backend must be configured with backend-specific settings.
For local storage:
| Setting | Type | Required | Description |
|---|---|---|---|
| `storage_backend` | `"local"` | Yes | Use local filesystem |
| `storage_path` | `str` | Yes | Directory path for file storage |
For GCS storage:
| Setting | Type | Required | Description |
|---|---|---|---|
| `storage_backend` | `"gcs"` | Yes | Use Google Cloud Storage |
| `gcs_bucket` | `str` | Yes | GCS bucket name |
| `gcs_project` | `str` | Yes | GCP project ID |
| `gcs_credentials_path` | `str` | No | Path to service account JSON |
For memory storage:
| Setting | Type | Required | Description |
|---|---|---|---|
| `storage_backend` | `"memory"` | Yes | Use in-memory storage |
### Optional Settings
| Setting | Type | Default | Description |
|---|---|---|---|
| `max_file_size_mb` | `int` | `100` | Maximum file size in megabytes |
| `allowed_content_types` | `set[str] \| None` | `None` | Set of allowed MIME types (`None` = all allowed) |
Example Configurations:
```python
from portico import compose

# Local storage for development
compose.file(
    storage_backend="local",
    storage_path="./uploads",
    max_file_size_mb=50,
)

# GCS for production
compose.file(
    storage_backend="gcs",
    gcs_bucket="myapp-production-files",
    gcs_project="myapp-prod",
    gcs_credentials_path="/secrets/gcs-key.json",
    max_file_size_mb=500,
)

# Restrict content types (images only)
compose.file(
    storage_backend="local",
    storage_path="./images",
    max_file_size_mb=10,
    allowed_content_types={"image/jpeg", "image/png", "image/gif", "image/webp"},
)

# Memory for testing
compose.file(
    storage_backend="memory",
    max_file_size_mb=5,
)
```
## Usage Examples

### Example 1: File Upload Endpoint
```python
from fastapi import FastAPI, UploadFile, File
from portico.ports.file_storage import FileUploadRequest, FileOwnerType
from portico.kits.fastapi import Dependencies

deps = Dependencies(app)

@app.post("/files/upload")
async def upload_file(
    file: UploadFile = File(...),
    user = deps.current_user,
):
    file_service = deps.webapp.kits["file"].service

    # Read file content
    content = await file.read()

    # Upload with metadata
    metadata = await file_service.upload_file(
        file_content=content,
        upload_request=FileUploadRequest(
            filename=file.filename,
            content_type=file.content_type,
            owner_type=FileOwnerType.USER,
            owner_id=user.id,
            is_public=False,
        ),
    )

    return {
        "file_id": str(metadata.id),
        "filename": metadata.filename,
        "size_bytes": metadata.size_bytes,
        "url": f"/files/{metadata.id}",
    }
```
### Example 2: File Download Endpoint
```python
from uuid import UUID

from fastapi import HTTPException, Response
from portico.exceptions import FileNotFoundError, FileAccessError

@app.get("/files/{file_id}")
async def download_file(
    file_id: UUID,
    user = deps.optional_user,
):
    file_service = deps.webapp.kits["file"].service
    try:
        # Get file with access control
        file_content = await file_service.get_file(
            file_id=file_id,
            requesting_user_id=user.id if user else None,
        )
        # Return file as response
        return Response(
            content=file_content.content,
            media_type=file_content.metadata.content_type,
            headers={
                "Content-Disposition": f'attachment; filename="{file_content.metadata.filename}"'
            },
        )
    except FileNotFoundError:
        raise HTTPException(404, "File not found")
    except FileAccessError:
        raise HTTPException(403, "Access denied")
```
### Example 3: User File Gallery
```python
@app.get("/users/{user_id}/files")
async def list_user_files(
    user_id: UUID,
    current_user = deps.current_user,
    limit: int = 50,
    offset: int = 0,
):
    file_service = deps.webapp.kits["file"].service

    # List user's files (with access control)
    files = await file_service.list_files_by_owner(
        owner_type=FileOwnerType.USER,
        owner_id=user_id,
        requesting_user_id=current_user.id,
        limit=limit,
        offset=offset,
    )

    return {
        "files": [
            {
                "id": str(f.id),
                "filename": f.filename,
                "size_bytes": f.size_bytes,
                "content_type": f.content_type,
                "created_at": f.created_at.isoformat(),
                "is_public": f.is_public,
            }
            for f in files
        ],
        "count": len(files),
    }
```
### Example 4: File Search
```python
@app.get("/files/search")
async def search_files(
    q: str,
    current_user = deps.current_user,
    limit: int = 20,
):
    file_service = deps.webapp.kits["file"].service

    # Search across user's files
    results = await file_service.search_files(
        query=q,
        owner_type=FileOwnerType.USER,
        owner_id=current_user.id,
        requesting_user_id=current_user.id,
        limit=limit,
    )

    return {
        "query": q,
        "results": [
            {
                "id": str(f.id),
                "filename": f.filename,
                "description": f.description,
                "tags": f.tags,
            }
            for f in results
        ],
    }
```
### Example 5: File Metadata Update
```python
from pydantic import BaseModel
# FileUpdateRequest is assumed to live alongside FileUploadRequest
from portico.ports.file_storage import FileUpdateRequest

class UpdateFileRequest(BaseModel):
    description: str | None = None
    tags: list[str] | None = None
    is_public: bool | None = None

@app.patch("/files/{file_id}")
async def update_file(
    file_id: UUID,
    update_data: UpdateFileRequest,
    user = deps.current_user,
):
    file_service = deps.webapp.kits["file"].service

    # Update metadata (with access control)
    updated = await file_service.update_file_metadata(
        file_id=file_id,
        update_request=FileUpdateRequest(
            description=update_data.description,
            tags=update_data.tags,
            is_public=update_data.is_public,
        ),
        requesting_user_id=user.id,
    )

    return {
        "file_id": str(updated.id),
        "description": updated.description,
        "tags": updated.tags,
        "is_public": updated.is_public,
    }
```
## Domain Models

### FileMetadata

Represents file metadata stored in the database.
| Field | Type | Default | Description |
|---|---|---|---|
| `id` | `UUID` | Auto | Unique file identifier |
| `filename` | `str` | - | Current filename |
| `original_filename` | `str` | - | Original filename at upload |
| `content_type` | `str` | - | MIME type (e.g., `"image/png"`) |
| `size_bytes` | `int` | - | File size in bytes |
| `owner_type` | `FileOwnerType` | - | `"user"` or `"group"` |
| `owner_id` | `UUID` | - | UUID of owner |
| `storage_path` | `str \| None` | - | Internal storage path (don't use directly) |
| `is_public` | `bool` | `False` | Whether file is publicly accessible |
| `description` | `str \| None` | `None` | Optional file description |
| `tags` | `List[str]` | `[]` | Tags for categorization |
| `created_at` | `datetime` | Auto | When file was uploaded (UTC) |
| `updated_at` | `datetime` | Auto | When metadata was last updated (UTC) |
### FileUploadRequest

Request model for uploading a file.
| Field | Type | Default | Description |
|---|---|---|---|
| `filename` | `str` | - | Filename to store |
| `content_type` | `str \| None` | Auto-detect | MIME type |
| `owner_type` | `FileOwnerType` | - | `"user"` or `"group"` |
| `owner_id` | `UUID` | - | UUID of owner |
| `is_public` | `bool` | `False` | Make file publicly accessible |
| `description` | `str \| None` | `None` | File description |
| `tags` | `List[str]` | `[]` | Tags for categorization |
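When `content_type` is omitted, the kit auto-detects the MIME type from the filename. The standard-library `mimetypes` module illustrates how such detection typically works; the kit's exact mechanism may differ, and `guess_content_type` here is a hypothetical helper:

```python
import mimetypes

def guess_content_type(filename: str) -> str:
    # Fall back to a generic binary type when the extension is unknown.
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"

assert guess_content_type("report.pdf") == "application/pdf"
assert guess_content_type("photo.png") == "image/png"
assert guess_content_type("mystery.zzz") == "application/octet-stream"
```

Note that extension-based detection trusts the client's filename; it is a convenience, not a security control (see Security Considerations below on restricting content types).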
### FileUpdateRequest

Request model for updating file metadata.
| Field | Type | Description |
|---|---|---|
| `filename` | `str \| None` | New filename |
| `is_public` | `bool \| None` | Change public access |
| `description` | `str \| None` | New description |
| `tags` | `List[str] \| None` | New tags (replaces existing) |
### FileContent

File content with metadata (returned by `get_file`).
| Field | Type | Description |
|---|---|---|
| `metadata` | `FileMetadata` | File metadata |
| `content` | `bytes` | File binary content |
### FileOwnerType

Enum for file owner types.
| Value | Description |
|---|---|
| `USER` | File owned by an individual user |
| `GROUP` | File owned by a group |
## Database Models

### FileMetadataModel

Table: `file_metadata`
Columns:
- `id`: UUID, primary key
- `filename`: String(255)
- `original_filename`: String(255)
- `content_type`: String(100)
- `size_bytes`: BigInteger
- `owner_type`: String(20) - `"user"` or `"group"`
- `owner_id`: UUID
- `storage_path`: String(500), nullable
- `is_public`: Boolean, default False
- `description`: Text, nullable
- `tags`: JSON (array of strings)
- `created_at`: DateTime with timezone
- `updated_at`: DateTime with timezone
Indexes:
- `idx_file_metadata_owner`: on `owner_type`, `owner_id` columns
- `idx_file_metadata_public`: on `is_public` column
## Events

This kit does not currently publish events, but could be extended to publish:
- `FileUploadedEvent` - when a file is uploaded
- `FileDeletedEvent` - when a file is deleted
- `FileAccessedEvent` - when a file is accessed (for analytics)
## Best Practices

### 1. Always Use Access Control
Pass `requesting_user_id` to enforce access control:

```python
# ✅ GOOD - Access control enforced
file_content = await file_service.get_file(
    file_id=file_id,
    requesting_user_id=current_user.id,
)
# Raises FileAccessError if user doesn't have permission

# ❌ BAD - No access control
file_content = await file_service.get_file(
    file_id=file_id,
    requesting_user_id=None,  # Bypasses access control!
)
# Only works for public files
```
### 2. Set Appropriate File Size Limits

Configure limits based on your application's needs:

```python
# ✅ GOOD - Reasonable limits per use case
# Image uploads
compose.file(storage_backend="local", max_file_size_mb=10)

# Document uploads
compose.file(storage_backend="local", max_file_size_mb=50)

# Video uploads
compose.file(storage_backend="gcs", max_file_size_mb=500)

# ❌ BAD - No limit or too large
compose.file(storage_backend="local", max_file_size_mb=10000)
# Can exhaust disk space
```
### 3. Restrict Content Types for Security

Limit allowed file types to prevent malicious uploads:

```python
# ✅ GOOD - Whitelist allowed types
compose.file(
    storage_backend="local",
    storage_path="./uploads",
    allowed_content_types={
        "image/jpeg", "image/png", "image/gif",
        "application/pdf",
        "text/plain",
    },
)
# Rejects executable files, scripts, etc.

# ❌ BAD - No restrictions
compose.file(storage_backend="local", storage_path="./uploads")
# Allows any file type, including malware
```
### 4. Use GCS for Production

Local storage doesn't scale across multiple servers:

```python
import os

# ✅ GOOD - GCS for production
if os.getenv("ENVIRONMENT") == "production":
    compose.file(
        storage_backend="gcs",
        gcs_bucket="myapp-prod-files",
        gcs_project="myapp-prod",
    )
else:
    compose.file(
        storage_backend="local",
        storage_path="./uploads",
    )

# ❌ BAD - Local storage in multi-server production
compose.file(storage_backend="local", storage_path="./uploads")
# Files not shared across servers
```
### 5. Use Tags for Organization

Tag files for easier search and filtering:

```python
# ✅ GOOD - Descriptive tags
upload_request = FileUploadRequest(
    filename="invoice.pdf",
    owner_type=FileOwnerType.USER,
    owner_id=user_id,
    tags=["invoice", "2024", "Q1", "client-ABC"],
)
# Easy to find: "Find all invoices from Q1 2024"

# ❌ BAD - No tags
upload_request = FileUploadRequest(
    filename="invoice.pdf",
    owner_type=FileOwnerType.USER,
    owner_id=user_id,
    tags=[],  # Hard to organize and search
)
```
### 6. Handle Upload Errors Gracefully

Provide clear error messages to users:

```python
# ✅ GOOD - Specific error handling
from portico.exceptions import FileSizeExceededError, FileUploadError

try:
    metadata = await file_service.upload_file(content, upload_request)
    return {"success": True, "file_id": str(metadata.id)}
except FileSizeExceededError as e:
    return {"error": f"File too large (max {e.max_size_bytes / 1024 / 1024}MB)"}
except FileUploadError as e:
    return {"error": f"Upload failed: {e.message}"}

# ❌ BAD - Generic error
try:
    metadata = await file_service.upload_file(content, upload_request)
except Exception:
    return {"error": "Upload failed"}  # No details for user
```
### 7. Clean Up Deleted Files

Ensure both metadata and content are deleted:

```python
# ✅ GOOD - Service handles both
success = await file_service.delete_file(
    file_id=file_id,
    requesting_user_id=user_id,
)
# Deletes both the database record and the stored file

# ❌ BAD - Manual deletion (error-prone)
await file_repository.delete_metadata(file_id)
# Forgot to delete from storage! Orphaned file
```
## Security Considerations

### File Upload Security

Validate and sanitize uploaded files:
- Limit file sizes to prevent DoS attacks
- Restrict content types to prevent malicious file uploads
- Scan files for malware (consider integrating antivirus)
- Generate unique filenames to prevent path traversal attacks
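The last point can be sketched concretely: derive the stored name from a fresh UUID and keep only the base name of the client-supplied filename, so `../` sequences never reach the filesystem. `safe_storage_name` is a hypothetical helper, not part of the kit's API:

```python
import os
import uuid

def safe_storage_name(client_filename: str) -> str:
    # Strip any directory components the client smuggled in,
    # then prefix with a random UUID so names never collide.
    base = os.path.basename(client_filename.replace("\\", "/"))
    return f"{uuid.uuid4().hex}_{base}"

name = safe_storage_name("../../etc/passwd")
assert "/" not in name
assert name.endswith("_passwd")  # traversal components stripped
```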
### Access Control

Always enforce ownership checks:

```python
# Check ownership before sensitive operations
metadata = await file_service.get_file_metadata(file_id, user.id)
if metadata.owner_type == FileOwnerType.USER and metadata.owner_id != user.id:
    raise HTTPException(403, "Not your file")
```
### Storage Path Security

Never expose internal storage paths to users:

```python
# Use file IDs, not paths
url = f"/files/{metadata.id}"  # ✅ Safe

# Never do this:
url = f"/files/{metadata.storage_path}"  # ❌ Exposes internal structure
```
### Public File Considerations
Be careful with public files:
- Don't store sensitive data in public files
- Consider adding watermarks or DRM for copyrighted content
- Monitor public file access for abuse
- Implement rate limiting on public file downloads
## FAQs
Q: How do I migrate from local to GCS storage?
A: Files must be copied to GCS manually. Steps:
1. Upload all local files to the GCS bucket
2. Update `storage_path` in the database to match the GCS paths
3. Change the configuration to `storage_backend="gcs"`
4. Test file retrieval
5. Remove the local files after verification
Q: Can I use S3 instead of GCS?
A: Not currently. The kit supports local, GCS, and memory backends. To add S3:
1. Implement an `S3FileStorageAdapter` conforming to the `FileStorageAdapter` port
2. Add an S3 backend option to `compose.file()`
3. Contribute it back to Portico!
Q: How are file permissions checked for group files?
A: Currently, group file permissions require Group Kit integration (marked as TODO in code). Basic check: owner can access. Full implementation requires checking group membership.
Q: What happens if I upload a file with the same name?
A: Each upload creates a new file with unique ID. Filename is just metadata. Multiple files can have the same filename but different IDs.
Q: How do I serve files through a CDN?
A: Extend `get_file_url()` to generate CDN URLs:

```python
# Override in a custom service
async def get_file_url(self, file_id: UUID) -> str:
    metadata = await self.file_repository.get_metadata_by_id(file_id)
    return f"https://cdn.example.com/files/{metadata.storage_path}"
```
Q: Can I store file thumbnails or previews?
A: Not directly. Store thumbnails as separate files, with tags linking them to the original:

```python
# Upload original
original = await file_service.upload_file(...)

# Upload thumbnail
thumbnail = await file_service.upload_file(
    thumbnail_bytes,
    FileUploadRequest(
        filename=f"thumb_{original.filename}",
        tags=["thumbnail", f"original:{original.id}"],
    ),
)
```
Q: How do I handle large file uploads?
A: Consider streaming uploads and chunking:
- Use FastAPI's streaming request body
- Stream directly to storage adapter
- Validate size incrementally
- For very large files, consider presigned upload URLs (GCS/S3)
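Incremental size validation for streamed uploads can be sketched as reading fixed-size chunks and aborting as soon as the running total passes the limit, instead of buffering the whole upload first. `read_with_limit` and its parameters are illustrative, not kit API:

```python
import io

def read_with_limit(stream: io.BufferedIOBase, max_bytes: int,
                    chunk_size: int = 64 * 1024) -> bytes:
    # Accumulate chunks; fail fast the moment the limit is exceeded.
    chunks, total = [], 0
    while chunk := stream.read(chunk_size):
        total += len(chunk)
        if total > max_bytes:
            raise ValueError(f"upload exceeds {max_bytes} byte limit")
        chunks.append(chunk)
    return b"".join(chunks)

data = read_with_limit(io.BytesIO(b"x" * 1000), max_bytes=2048)
assert len(data) == 1000

try:
    read_with_limit(io.BytesIO(b"x" * 5000), max_bytes=2048)
except ValueError:
    pass  # rejected before the whole body was read
```

The same loop shape works when forwarding chunks directly to a storage adapter instead of collecting them in memory.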
Q: What's the performance impact of metadata tracking?
A: Minimal. Each upload requires one database INSERT. Retrieval requires one SELECT. Database lookups are fast compared to file I/O.
Q: Can I use the File Kit without a database?
A: No, the kit requires database for metadata tracking. For pure storage without metadata, use the storage adapters directly (not recommended - loses access control and search).
Q: How do I implement file versioning?
A: Versioning is not built-in. To implement it:
1. Add a `version` field to the metadata
2. Don't delete old versions; mark them as archived
3. Store multiple files with the same `original_filename` but different versions
4. Add a `parent_file_id` to link versions