# Article Processing Pipeline

This system implements a database-driven pipeline that fetches articles from PMC (PubMed Central) and then summarizes, validates, and activates them automatically. Each article's progress is tracked in its `state` field.

## Pipeline Flow

```
Fetch → Summarize → Validate → Activate
  ↓         ↓         ↓         ↓
fetched → summarized → validated → active
```

## Database Changes Required

Run the migration to add the new state tracking fields:

```bash
php artisan migrate
```

The migration adds the following columns to the `articles` table:
- `state` (enum): Tracks the current processing stage
- `last_processed_at` (timestamp): When the article was last processed
- `processing_notes` (text): Notes about processing status
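
For orientation, a migration adding these columns could look roughly like the sketch below. It is not the project's actual migration: the enum values are copied from the "Article States" table in this document, while the `fetched` default and the nullable columns are assumptions.

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::table('articles', function (Blueprint $table) {
            // Value list taken from the "Article States" table; default is an assumption.
            $table->enum('state', [
                'fetched', 'pending_summary', 'summarized',
                'pending_validation', 'validated', 'active', 'failed',
            ])->default('fetched');

            // Nullable (assumed) so existing rows do not need a backfill.
            $table->timestamp('last_processed_at')->nullable();
            $table->text('processing_notes')->nullable();
        });
    }

    public function down(): void
    {
        Schema::table('articles', function (Blueprint $table) {
            $table->dropColumn(['state', 'last_processed_at', 'processing_notes']);
        });
    }
};
```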

## Commands

### 1. Weekly Fetch (Wednesday 2pm)
```bash
php artisan articles:fetch-weekly
```
- Fetches articles from the past week (Monday to Sunday)
- Uses all keywords from the database
- Sets articles to `fetched` state
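
The "Monday to Sunday" window can be read as the most recent complete calendar week before the run date. Under that interpretation, a minimal standalone sketch of the date calculation is shown below; the function name `previousWeekRange` is hypothetical and not part of the command.

```php
<?php

declare(strict_types=1);

/**
 * Return the start and end of the most recent complete Monday-to-Sunday
 * week before the given run date.
 *
 * @return array{0: DateTimeImmutable, 1: DateTimeImmutable}
 */
function previousWeekRange(DateTimeImmutable $runDate): array
{
    // ISO-8601 day of week: 1 = Monday ... 7 = Sunday.
    $dayOfWeek = (int) $runDate->format('N');

    // Midnight on the Monday of the week that contains $runDate.
    $thisMonday = $runDate->setTime(0, 0, 0)->modify('-' . ($dayOfWeek - 1) . ' days');

    $start = $thisMonday->modify('-7 days');                     // previous Monday 00:00:00
    $end   = $thisMonday->modify('-1 day')->setTime(23, 59, 59); // previous Sunday 23:59:59

    return [$start, $end];
}

// A Wednesday 2pm run, matching the schedule above.
[$from, $to] = previousWeekRange(new DateTimeImmutable('2024-01-10 14:00:00'));
echo $from->format('Y-m-d'), ' to ', $to->format('Y-m-d'), PHP_EOL; // 2024-01-01 to 2024-01-07
```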

### 2. Process Summarization (Every 5 minutes)
```bash
php artisan articles:summarize
```
- Picks up articles in `fetched` state
- Dispatches a summarization job for each article (articles are `pending_summary` while queued)
- Updates state to `summarized` when the job completes

### 3. Process Validation (Every 5 minutes)
```bash
php artisan articles:validate
```
- Picks up articles in `summarized` state
- Dispatches a validation job for each article (articles are `pending_validation` while queued)
- Updates state to `validated` when the job completes

### 4. Activate Articles (Every 5 minutes)
```bash
php artisan articles:activate
```
- Activates `validated` articles whose `overall_score` is at least 85%
- Sets state to `active`
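
The activation rule can be expressed as a small predicate. This is only a sketch: it assumes `overall_score` is stored as a percentage between 0 and 100 and that only `validated` articles are considered; the constant and function names are hypothetical.

```php
<?php

declare(strict_types=1);

// Assumption: overall_score is a percentage in the range 0-100.
const ACTIVATION_THRESHOLD = 85.0;

/** Decide whether an article qualifies for activation. */
function qualifiesForActivation(string $state, ?float $overallScore): bool
{
    return $state === 'validated'
        && $overallScore !== null
        && $overallScore >= ACTIVATION_THRESHOLD;
}

var_dump(qualifiesForActivation('validated', 85.0));  // bool(true)
var_dump(qualifiesForActivation('validated', 84.9));  // bool(false)
var_dump(qualifiesForActivation('summarized', 95.0)); // bool(false)
```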

### 5. Monitor Pipeline
```bash
php artisan articles:monitor
```
- Shows pipeline status and statistics
- Displays recent activity and failed articles

## Scheduling

The scheduler is configured in `app/Console/Kernel.php`:

- **Weekly Fetch**: Every Wednesday at 2pm
- **Summarization**: Every 5 minutes
- **Validation**: Every 5 minutes
- **Activation**: Every 5 minutes
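
A schedule matching the list above might be registered as in the sketch below. The command names come from this document; the use of `withoutOverlapping()` is an assumption added here to keep a slow run from stacking up behind the next one, and the real `Kernel.php` may differ.

```php
<?php

namespace App\Console;

use Illuminate\Console\Scheduling\Schedule;
use Illuminate\Foundation\Console\Kernel as ConsoleKernel;

class Kernel extends ConsoleKernel
{
    protected function schedule(Schedule $schedule): void
    {
        // Day 3 is Wednesday (0 = Sunday); the time is in the application timezone.
        $schedule->command('articles:fetch-weekly')->weeklyOn(3, '14:00');

        // Assumption: skip a run while the previous one is still active.
        $schedule->command('articles:summarize')->everyFiveMinutes()->withoutOverlapping();
        $schedule->command('articles:validate')->everyFiveMinutes()->withoutOverlapping();
        $schedule->command('articles:activate')->everyFiveMinutes()->withoutOverlapping();
    }
}
```

The scheduler only fires if the server's cron invokes `php artisan schedule:run` every minute.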

## Article States

| State | Description |
|-------|-------------|
| `fetched` | Article fetched from PMC |
| `pending_summary` | Queued for summarization |
| `summarized` | Summarization completed |
| `pending_validation` | Queued for validation |
| `validated` | Validation completed (score available) |
| `active` | Article activated (score >= 85%) |
| `failed` | Processing failed |
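
One way to reason about these states is as a transition map. The map below is a hypothetical reading of the table, not code taken from the project; in particular, the assumption that any stage can move to `failed` and that `failed` can only go back to `fetched` is inferred from the reset query in this document.

```php
<?php

declare(strict_types=1);

// Hypothetical transition map derived from the states table; the real rules may differ.
const ALLOWED_TRANSITIONS = [
    'fetched'            => ['pending_summary', 'failed'],
    'pending_summary'    => ['summarized', 'failed'],
    'summarized'         => ['pending_validation', 'failed'],
    'pending_validation' => ['validated', 'failed'],
    'validated'          => ['active', 'failed'],
    'active'             => [],
    'failed'             => ['fetched'], // manual reset
];

/** Check whether moving an article from one state to another is allowed. */
function canTransition(string $from, string $to): bool
{
    return in_array($to, ALLOWED_TRANSITIONS[$from] ?? [], true);
}

var_dump(canTransition('fetched', 'pending_summary')); // bool(true)
var_dump(canTransition('fetched', 'active'));          // bool(false)
var_dump(canTransition('failed', 'fetched'));          // bool(true)
```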

## Configuration

### Queue Configuration
Summarization and validation run as queued jobs, so make sure a queue worker is running:
```bash
php artisan queue:work
```

### Environment Variables
Ensure these are set in your `.env`:
```
QUEUE_CONNECTION=database
APP_URL=your-app-url
```

## Monitoring

### Check Pipeline Status
```bash
php artisan articles:monitor
```

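### Check Queue Status
Process a single queued job and exit, which confirms that the worker can connect to the queue and pick up jobs:
```bash
php artisan queue:work --once
```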

### View Failed Jobs
```bash
php artisan queue:failed
```

## Troubleshooting

### Articles Stuck in Pipeline
1. Check the `processing_notes` field for error details
2. Run `php artisan articles:monitor` to see how articles are distributed across states
3. Retry failed articles by resetting their state (see "Reset Failed Articles" under Manual Commands)
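
To find stuck articles directly, a query along the following lines can be run from `php artisan tinker`. It is a sketch under assumptions: the one-hour cutoff is arbitrary, and the `id` primary key is assumed rather than documented here.

```php
<?php

use Illuminate\Support\Facades\DB;

// Articles that have sat in a queued state for more than an hour.
// The cutoff is an arbitrary assumption; tune it to your typical job duration.
$stuck = DB::table('articles')
    ->whereIn('state', ['pending_summary', 'pending_validation'])
    ->where('last_processed_at', '<', now()->subHour())
    ->orderBy('last_processed_at')
    ->get(['id', 'state', 'last_processed_at', 'processing_notes']);

foreach ($stuck as $article) {
    echo "{$article->id}\t{$article->state}\t{$article->processing_notes}" . PHP_EOL;
}
```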

### Failed Jobs
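1. List failed jobs with `php artisan queue:failed`
2. Review logs in `storage/logs/laravel.log`
3. Retry the jobs with `php artisan queue:retry all`, and restart queue workers with `php artisan queue:restart` if needed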

### Database Issues
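1. Ensure the migration has been run (`php artisan migrate:status` lists applied migrations)
2. Check the database connection
3. Verify that the `articles` table has the `state`, `last_processed_at`, and `processing_notes` columns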

## Manual Commands

### Process a Single Article
`--limit=1` restricts each command to one article, the next one eligible for that stage. It does not select a particular article.
```bash
# Summarize one article
php artisan articles:summarize --limit=1

# Validate one article
php artisan articles:validate --limit=1

# Activate one article
php artisan articles:activate --limit=1
```

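### Reset Failed Articles
This sends every failed article back to the start of the pipeline, so articles that failed during validation are summarized again:
```sql
UPDATE articles SET state = 'fetched' WHERE state = 'failed';
```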

## Performance Tips

1. **Queue Workers**: Run multiple queue workers for better throughput
2. **Batch Processing**: Use `--limit` option to control batch sizes
3. **Monitoring**: Use the monitor command regularly to track progress
4. **Logs**: Monitor logs for errors and performance issues

## API Integration

The system integrates with external APIs for:
- **Summarization**: Via webhook to n8n workflow
- **Validation**: Via webhook to validation service

Ensure webhook URLs are correctly configured in your jobs. 
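
As an illustration of what such a webhook call might look like inside a job, here is a sketch using Laravel's HTTP client. The config key `services.n8n.summarize_webhook`, the function name, and the payload shape are all hypothetical; none of them are defined in this document.

```php
<?php

use Illuminate\Support\Facades\Http;

/**
 * Send an article to the summarization webhook.
 * Hypothetical helper: the config key and payload are assumptions.
 */
function sendToSummarizer(int $articleId): array
{
    $webhookUrl = config('services.n8n.summarize_webhook');

    $response = Http::timeout(30)->post($webhookUrl, [
        'article_id' => $articleId,
    ]);

    // Throws a RequestException on a 4xx or 5xx response,
    // so the surrounding job fails visibly instead of silently.
    $response->throw();

    return $response->json() ?? [];
}
```

Letting the exception propagate means the job shows up in `php artisan queue:failed`, which fits the troubleshooting steps above.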
