Railway Deployment Guide - Cidadão.AI Backend¶
Author: Anderson Henrique da Silva
Location: Minas Gerais, Brasil
Last Updated: 2025-10-13 15:15:18 -0300
Status: Production Deployment Analysis
Overview¶
This guide documents the Railway deployment process, common issues, and solutions for the Cidadão.AI backend application.
Table of Contents¶
- Railway CLI Setup
- Project Configuration
- Deployment Analysis
- Common Issues & Solutions
- Environment Variables
- Health Check Configuration
Railway CLI Setup¶
Installation¶
Railway CLI is already installed and configured with project token authentication.
Authentication¶
The project authenticates with a Project Token, exported from ~/.bashrc. The token is a secret and should not be committed to documentation or source control:
# Railway CLI - Project Token (cidadao.ai)
export RAILWAY_TOKEN=<your-project-token>
Essential Commands¶
Project Information¶
# Check project status
railway status
# Open Railway dashboard in browser
railway open
# Check current authentication
railway whoami
Deployment¶
# Deploy local code
railway up
# Redeploy last version
railway redeploy
# Remove last deployment
railway down
Logs & Monitoring¶
# View logs (requires service selection)
railway logs --tail 100
# View logs from specific service
railway logs --environment production --service <service-name>
# Follow logs in real-time
railway logs --tail 100 --follow
Variables & Configuration¶
# View environment variables
railway variables
# Run command with Railway environment
railway run <command>
# Open shell with Railway environment
railway shell
Service Management¶
Domain Management¶
Important Notes¶
- Some commands require an explicit --service <service-name> flag when using a Project Token
- A Project Token provides more limited access than user authentication
- Always confirm the service name before running service-specific commands
Project Configuration¶
railway.json¶
The project uses the Nixpacks builder with a custom startup script:
{
  "$schema": "https://railway.app/railway.schema.json",
  "build": {
    "builder": "NIXPACKS"
  },
  "deploy": {
    "startCommand": "bash start.sh",
    "restartPolicyType": "ON_FAILURE",
    "restartPolicyMaxRetries": 10
  }
}
Configuration Details:
- Builder: Nixpacks (automatic Python environment detection)
- Start Command: bash start.sh (includes database migrations)
- Restart Policy: Restart on failure, max 10 retries
- Port: Detected from $PORT environment variable (default 8080)
start.sh¶
The startup script handles database migrations before starting the server:
#!/bin/bash
set -e  # Exit on error

echo "🔄 Running database migrations..."
python -m alembic upgrade head
echo "✅ Migrations completed successfully"

echo "🚀 Starting Uvicorn server..."
# Quote the expansion so an unexpected value cannot split into extra arguments
exec uvicorn src.api.app:app --host 0.0.0.0 --port "${PORT:-8080}"
Startup Flow:
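1. Run Alembic migrations. The deployment logs show this step skipping gracefully when no valid DATABASE_URL is available, although that check is not part of the excerpt above.
2. Start Uvicorn on $PORT, falling back to 8080 when the variable is unset.
3. Use exec to replace the shell process with Uvicorn, so the server receives signals such as SIGTERM directly.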
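The graceful skip in step 1 can be sketched in Python. This is a minimal sketch, not the project's actual logic: the function name should_run_migrations and the set of accepted URL schemes are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Assumption: only PostgreSQL-style URLs count as "valid" for migrations.
VALID_SCHEMES = {"postgres", "postgresql", "postgresql+asyncpg", "postgresql+psycopg2"}


def should_run_migrations(env: dict) -> bool:
    """Return True only when DATABASE_URL looks like a usable PostgreSQL URL."""
    url = env.get("DATABASE_URL", "").strip()
    if not url:
        return False
    parsed = urlparse(url)
    # A usable URL needs a known scheme and a host to connect to.
    return parsed.scheme in VALID_SCHEMES and bool(parsed.hostname)
```

A startup wrapper could call should_run_migrations(dict(os.environ)) and only invoke Alembic when it returns True, printing the warning seen in the deployment logs otherwise.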
Deployment Analysis¶
Current Deployment Logs (2025-10-13)¶
Startup Sequence¶
12:31:07 [inf] Starting Container
12:31:08 [inf] 🔄 Running database migrations...
12:31:08 [inf] ⚠️ WARNING: No valid DATABASE_URL found. Skipping migrations.
12:31:08 [inf] To enable migrations:
1. Add PostgreSQL database in Railway dashboard
2. DATABASE_URL will be automatically provided by Railway
3. Redeploy the application
12:31:08 [inf] ✅ Migrations completed successfully
12:31:08 [inf] 🚀 Starting Uvicorn server...
Status: ✅ Normal - Application gracefully handles missing DATABASE_URL
Application Startup¶
12:31:13 [inf] 🚀 Using Supabase REST service for investigations (Railway/VPS)
12:31:13 [inf] === CHAT.PY LOADING - VERSION 13:45:00 ===
12:31:14 [err] INFO: Started server process [1]
12:31:14 [err] INFO: Waiting for application startup.
12:31:14 [inf] Cidadão.AI API started (env: production)
12:31:14 [err] INFO: Application startup complete.
12:31:14 [err] INFO: Uvicorn running on http://0.0.0.0:8080
Status: ✅ Healthy - Server started successfully on port 8080
Post-Startup (5 Minutes Later)¶
Status: ⚠️ Issue Detected - Potential health check failures
Problem Diagnosis¶
Symptoms¶
- Application starts successfully
- After ~5 minutes, warning/error logs appear
- Pattern suggests health check failures or timeouts
Root Causes¶
1. Health Check Timeout
Railway performs health checks on deployed services. The /ready route in src/api/routes/health.py (served as /health/ready) makes a real HTTP request to Portal da Transparência:
@router.get("/ready")
async def readiness_probe():
    # Readiness depends on a live call to the external API
    transparency_status = await _check_transparency_api()
    if transparency_status["status"] == "healthy":
        return {"status": "ready"}
    else:
        raise HTTPException(status_code=503, detail="Service not ready")
Issues:
- External API calls in health checks can be slow or fail
- Portal da Transparência returns 403 on 78% of its endpoints
- Health checks should be fast (<1s) and should not depend on external services
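If an external check has to stay in a probe, one option is to cap how long the probe waits for it. The sketch below uses asyncio.wait_for from the standard library. The helper check_with_timeout and the stand-in slow_check are hypothetical names, and slow_check only simulates a slow upstream rather than calling the real _check_transparency_api().

```python
import asyncio


async def check_with_timeout(check, timeout_s: float = 1.0) -> dict:
    """Run an async dependency check, but never wait longer than timeout_s."""
    try:
        return await asyncio.wait_for(check(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return {"status": "timeout"}
    except Exception as exc:  # a failing dependency should not crash the probe
        return {"status": "error", "detail": str(exc)}


async def slow_check() -> dict:
    # Hypothetical stand-in for an external API call that takes 5 seconds.
    await asyncio.sleep(5)
    return {"status": "healthy"}


if __name__ == "__main__":
    # The probe gives up after 0.1 s instead of waiting for the slow upstream.
    print(asyncio.run(check_with_timeout(slow_check, timeout_s=0.1)))
```

With this shape the probe's worst-case latency is bounded by timeout_s regardless of how the upstream behaves.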
2. Database Connection Check
The health check also attempts to verify database connectivity, which may fail if DATABASE_URL is not configured properly.
3. Railway Health Check Configuration
Railway default health check settings:
- Initial Delay: 0 seconds (starts immediately)
- Timeout: 10 seconds per check
- Failure Threshold: 3 consecutive failures
- Interval: Every 30 seconds
If health checks keep failing for 5 minutes (10 consecutive failures at the 30-second interval), Railway may:
- Mark the service as unhealthy
- Restart the container
- Return 503 errors to clients
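The relationship between the check interval and the 5-minute window is simple arithmetic. The sketch below is an approximation that assumes one failed check per interval and ignores the per-check timeout; the function name is hypothetical and the inputs are the settings quoted above.

```python
def seconds_until_unhealthy(interval_s: int, consecutive_failures: int) -> int:
    """Approximate time for N back-to-back failed checks, one per interval."""
    return interval_s * consecutive_failures


print(seconds_until_unhealthy(30, 3))   # 90  (failure threshold of 3)
print(seconds_until_unhealthy(30, 10))  # 300 (the 5-minute window)
```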
Common Issues & Solutions¶
Issue 1: Database Migration Warnings¶
Symptom:
⚠️ WARNING: No valid DATABASE_URL found. Skipping migrations.
Solution:
This is expected behavior when PostgreSQL is not configured. The application uses the Supabase REST API for investigations.
To Enable PostgreSQL (Optional):
1. Add a PostgreSQL database in the Railway dashboard.
2. Railway automatically provides the DATABASE_URL environment variable.
3. Redeploy the application.
4. Verify that the migrations ran.
Issue 2: Health Check Failures¶
Symptom:
Solution:
Create a lightweight health check endpoint that doesn't depend on external services.
Recommended Implementation:
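from datetime import datetime, timezone

@router.get("/health")
async def simple_health():
    """Ultra-fast health check for Railway."""
    # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
    return {"status": "ok", "timestamp": datetime.now(timezone.utc).isoformat()}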
Railway Configuration:
Configure Railway to use the simple endpoint:
1. Go to Railway dashboard → Service Settings
2. Set Health Check Path: /health
3. Set Initial Delay: 10 seconds (allow startup time)
4. Set Timeout: 5 seconds
Issue 3: Port Binding Issues¶
Symptom:
Solution:
Ensure the application binds to the $PORT environment variable, as start.sh does with --port ${PORT:-8080}.
Railway automatically sets $PORT, so never hardcode port numbers.
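When the server is launched from Python rather than from start.sh, the same fallback can be expressed as a small helper. This is a sketch under assumptions: resolve_port is a hypothetical name, and the 8080 default mirrors the value used in start.sh.

```python
import os


def resolve_port(env=None, default: int = 8080) -> int:
    """Return the port from $PORT, falling back to `default` when unset."""
    env = os.environ if env is None else env
    raw = env.get("PORT", "").strip()
    if not raw:
        return default
    port = int(raw)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port
```

The result could then be passed along, for example as uvicorn.run("src.api.app:app", host="0.0.0.0", port=resolve_port()). Failing loudly on a malformed value is a deliberate choice here, since silently binding to the wrong port is harder to diagnose.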
Issue 4: CLI Authentication Errors¶
Symptom:
Solution:
1. Verify the token is set: echo $RAILWAY_TOKEN
2. If it is not set, load it from ~/.bashrc: source ~/.bashrc
3. Verify authentication: railway whoami
4. If authentication still fails, the token may have expired. Generate a new one:
   - Go to the Railway dashboard
   - Project Settings → Tokens
   - Generate a new Project Token
   - Update ~/.bashrc
Environment Variables¶
Required Variables¶
# LLM Provider
GROQ_API_KEY=your-groq-api-key
# Security
JWT_SECRET_KEY=your-jwt-secret-key
SECRET_KEY=your-app-secret-key
API_SECRET_KEY=your-api-secret-key
# Portal da Transparência (Optional)
TRANSPARENCY_API_KEY=your-transparency-api-key
# Supabase (For investigations)
SUPABASE_URL=your-supabase-url
SUPABASE_KEY=your-supabase-key
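A quick pre-deploy check can catch missing variables before the container starts. The sketch below is illustrative only: missing_required is a hypothetical helper, and the name list is copied from the block above with TRANSPARENCY_API_KEY left out because it is marked optional.

```python
import os

# Names taken from the "Required Variables" block; the optional
# TRANSPARENCY_API_KEY is intentionally not enforced.
REQUIRED_VARS = (
    "GROQ_API_KEY",
    "JWT_SECRET_KEY",
    "SECRET_KEY",
    "API_SECRET_KEY",
    "SUPABASE_URL",
    "SUPABASE_KEY",
)


def missing_required(env=None) -> list:
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Saved in a script (the file name is up to you) that prints missing_required(), this could be run through railway run so that it sees the same variables the service will receive.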
Optional Variables¶
# Database (PostgreSQL)
DATABASE_URL=postgresql://user:pass@host:port/db
# Cache (Redis)
REDIS_URL=redis://host:port
# Application
APP_ENV=production
DEBUG=false
LOG_LEVEL=INFO
Managing Variables in Railway¶
# View all variables
railway variables
# Set a variable (interactive)
railway variables set
# Set via command line (Railway CLI v3 and later)
railway variables --set "VAR_NAME=value"
# View specific variable
railway variables get VAR_NAME
Health Check Configuration¶
Current Implementation¶
The application has three health check endpoints:
- /health - Basic health check (recommended for Railway)
- /health/detailed - Comprehensive system check
- /health/ready - Kubernetes-style readiness probe (slow)
Recommended Railway Configuration¶
Railway Dashboard Settings¶
- Health Check Path: /health/live
- Initial Delay: 15 seconds
- Timeout: 5 seconds
- Interval: 30 seconds
- Failure Threshold: 3
Application Code¶
Ensure /health/live is ultra-fast and doesn't check external dependencies:
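from datetime import datetime, timezone

@router.get("/live")
async def liveness_probe():
    """Simple liveness check - no external dependencies."""
    # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
    return {"status": "alive", "timestamp": datetime.now(timezone.utc).isoformat()}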
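For the slower endpoints, one way to avoid blocking on external services is to let a background task record the result of the external check and have handlers read only the cached value. The sketch below shows that idea with the standard library alone. CachedStatus and its ttl_s parameter are hypothetical and are not part of the project.

```python
import time


class CachedStatus:
    """Keep the last result of a slow check and report whether it is fresh."""

    def __init__(self, ttl_s: float = 60.0, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock  # injectable so tests can control time
        self._value = None
        self._stored_at = None

    def store(self, value: dict) -> None:
        """Record a fresh result (called by whatever runs the slow check)."""
        self._value = value
        self._stored_at = self._clock()

    def read(self) -> dict:
        """Return the cached result, or a 'stale' marker; never does I/O."""
        if self._stored_at is None or self._clock() - self._stored_at > self._ttl_s:
            return {"status": "stale"}
        return self._value
```

A handler built on read() returns in constant time, and a result older than ttl_s is reported as stale rather than silently trusted.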
Testing Health Checks Locally¶
# Test basic health
curl http://localhost:8080/health/live
# Test detailed health
curl http://localhost:8080/health/detailed
# Test readiness (may be slow)
curl http://localhost:8080/health/ready
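The same checks can be scripted to record status codes and latency, which helps confirm that the liveness endpoint stays well under the configured timeout. This is a minimal sketch using only the standard library. The probe helper is a hypothetical name, and the base URL assumes the server is running locally on port 8080 as in the curl commands above.

```python
import time
import urllib.error
import urllib.request


def probe(url: str, timeout_s: float = 5.0):
    """GET `url` and return (HTTP status or None, elapsed seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # e.g. 503 from a failing readiness probe
    except (urllib.error.URLError, OSError):
        status = None  # connection refused, DNS failure, or timeout
    return status, time.monotonic() - start


if __name__ == "__main__":
    for path in ("/health/live", "/health/detailed", "/health/ready"):
        status, elapsed = probe(f"http://localhost:8080{path}")
        print(f"{path}: status={status} elapsed={elapsed:.2f}s")
```

HTTPError is caught before URLError because it is a subclass of it, so a 503 response is reported as a status code rather than as a connection failure.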
Next Steps¶
Immediate Actions¶
1. Fix Health Check ✅
   - Implement lightweight /health/live endpoint
   - Update Railway health check configuration
   - Remove external API calls from critical health checks
2. Configure Database (Optional)
   - Add PostgreSQL service in Railway
   - Verify migrations run successfully
   - Test application with PostgreSQL
3. Monitor Deployment
   - Watch logs after redeploy
   - Verify no health check failures
   - Confirm uptime > 5 minutes without errors
Long-term Improvements¶
1. Implement Proper Database
   - PostgreSQL for persistent storage
   - Connection pooling
   - Backup/recovery strategy
2. Add Redis Cache
   - Reduce API calls to Portal da Transparência
   - Improve response times
   - Implement distributed caching
3. Set Up Monitoring
   - Grafana dashboards for Railway metrics
   - Alert on health check failures
   - Track API response times
4. CI/CD Pipeline
   - Automated testing before deployment
   - Staging environment for testing
   - Automated rollback on failures
Useful Resources¶
- Railway Documentation: https://docs.railway.app
- Railway CLI GitHub: https://github.com/railwayapp/cli
- Nixpacks Documentation: https://nixpacks.com/docs
- FastAPI Deployment: https://fastapi.tiangolo.com/deployment/
Troubleshooting Checklist¶
When deployment fails, check in order:
- Verify RAILWAY_TOKEN is set: echo $RAILWAY_TOKEN
- Check Railway service status: railway status
- View recent logs: railway logs --tail 100
- Test health endpoint locally: curl localhost:8080/health/live
- Verify environment variables: railway variables
- Check build logs in Railway dashboard
- Confirm port binding uses $PORT variable
- Review Alembic migration logs
- Test external API connectivity
- Check Railway resource limits (CPU/Memory)
Contact & Support¶
For issues specific to this deployment:
- Repository: https://github.com/anderson-ufrj/cidadao.ai-backend
- Author: Anderson Henrique da Silva
- Location: Minas Gerais, Brasil
For Railway platform issues:
- Railway Support: https://railway.app/help
- Community Discord: https://discord.gg/railway