Latest News Feed
Real-time AI and ML news from Google News, Reddit, and Twitter/X
Fetching news feed...
Architecture & Workflow
How the AI News Tracker processes and generates content
Data Pipeline Overview
The AI News Tracker implements a comprehensive data acquisition and processing pipeline with two-tier security guardrails:
Three Feed Sources:
- Feed A (Real-Time): Google News API for AI/tech articles
- Feed B (Community): Reddit (r/MachineLearning, r/artificial)
- Feed C (Expert): Twitter/X expert accounts
Strategy 1 - Content Filtering:
- Remove HTML/Markdown formatting
- Strip boilerplate text (copyright, footers)
- Truncate to max tokens (Google News: 2000, Reddit: 2000, Twitter: 1000)
- Validate against domain allowlist
- Detect injection patterns
Strategy 2 - Prompt-Level Guardrails:
- System prompts with clear persona and constraints
- Topic filtering (allowlist ML/AI topics, blocklist finance/politics)
- Input validation for escape sequences and JSON manipulation
- Output validation against JSON schema
Parallel Processing:
- Article summarization (150-300 words)
- Video idea generation
- Thumbnail generation via Leonardo API
Final Output:
- Unified feed.json with all metadata
- Web UI display with filtering/sorting
- API endpoints for external consumption
Automation & Scheduling
n8n Workflow: Webhook-triggered orchestration
- POST /webhook/run-pipeline triggers full pipeline
- Cron scheduling every 6 hours via bash script
- Execution logs and error handling
Security Features
- ✓ Pre-LLM content sanitization prevents injection attacks
- ✓ Prompt-level guardrails enforce topic focus
- ✓ Domain whitelist prevents untrusted sources
- ✓ Output validation ensures data integrity
- ✓ Rate limiting and API key isolation (via .env)
Raw Output Feed
Complete feed.json data structure for API consumption
{
"loading": true,
"message": "Fetching feed.json..."
}
Get in Touch
Have a question or feedback? Send us a message!