
N8N Workflow Documentation - Troubleshooting Guide

Overview

This document details the challenges encountered during the workflow documentation process and provides solutions for common issues. It serves as a guide for future documentation efforts and troubleshooting similar problems.

Approaches That Failed

1. Browser Automation with Playwright

What We Tried

// Attempted approach: select a category and wait for client-side filtering to settle
await page.goto('https://localhost:8000');
await page.selectOption('#categoryFilter', 'Business Process Automation');
await page.waitForLoadState('networkidle');  // hangs while all 2,055 workflows load

Why It Failed

  • Dynamic Loading Bottleneck: The web application loads all 2,055 workflows before applying client-side filtering
  • Timeout Issues: Browser automation timed out waiting for the filtering process to complete
  • Memory Constraints: Loading all workflows simultaneously exceeded browser memory limits
  • JavaScript Complexity: The client-side filtering logic was too complex for reliable automation

Symptoms

  • Page loads but workflows never finish loading
  • Browser automation hangs on category selection
  • "Waiting for page to load" messages that never complete
  • Network timeouts after 2+ minutes

Error Messages

TimeoutError: page.waitForLoadState: Timeout 30000ms exceeded
Waiting for load state to be NetworkIdle

2. Firecrawl with Dynamic Filtering

What We Tried

// Attempted approach
firecrawl_scrape({
  url: "https://localhost:8000",
  actions: [
    {type: "wait", milliseconds: 5000},
    {type: "executeJavascript", script: "document.getElementById('categoryFilter').value = 'Business Process Automation'; document.getElementById('categoryFilter').dispatchEvent(new Event('change'));"},
    {type: "wait", milliseconds: 30000}
  ]
})

Why It Failed

  • 60-Second Timeout Limit: Firecrawl's maximum wait time was insufficient for complete data loading
  • JavaScript Execution Timing: The filtering process required waiting for all workflows to load first
  • Response Size Limits: Filtered results still exceeded token limits for processing
  • Inconsistent State: Scraping occurred before filtering was complete

Symptoms

  • Firecrawl returns incomplete data (1 workflow instead of 77)
  • Timeout errors after 60 seconds
  • "Request timed out" or "Internal server error" responses
  • Inconsistent results between scraping attempts

Error Messages

Failed to scrape URL. Status code: 408. Error: Request timed out
Failed to scrape URL. Status code: 500. Error: (Internal server error) - timeout
Total wait time (waitFor + wait actions) cannot exceed 60 seconds

3. Single Large Web Scraping

What We Tried

Direct scraping of the entire page without category filtering:

curl -s "https://localhost:8000" | html2text

Why It Failed

  • Data Overload: 2,055 workflows generated responses that exceeded the 25,000-token limit
  • No Organization: Results were unstructured and difficult to categorize
  • Missing Metadata: HTML scraping didn't provide structured workflow details
  • Pagination Issues: Workflows are loaded progressively, not all at once

Symptoms

  • "Response exceeds maximum allowed tokens" errors
  • Truncated or incomplete data
  • Missing workflow details and metadata
  • Unstructured output difficult to process

What Worked: Direct API Strategy

Why This Approach Succeeded

1. Avoided JavaScript Complexity

  • Direct Data Access: API endpoints provided structured data without client-side processing
  • No Dynamic Loading: Each API call returned complete data immediately
  • Reliable State: No dependency on browser state or JavaScript execution

2. Manageable Response Sizes

  • Individual Requests: Single workflow details fit within token limits
  • Structured Data: JSON responses were predictable and parseable
  • Metadata Separation: Workflow details were properly structured in API responses

3. Rate Limiting Control

  • Controlled Pacing: Small delays between requests prevented server overload
  • Batch Processing: Category-based organization enabled logical processing
  • Error Recovery: Individual failures didn't stop the entire process

Technical Implementation That Worked

# Step 1: Get category mappings (single fast call)
curl -s "${API_BASE}/category-mappings" | jq '.mappings'

# Step 2: Group the mappings by category
curl -s "${API_BASE}/category-mappings" | \
    jq '.mappings | to_entries | group_by(.value) | map({category: .[0].value, count: length, files: map(.key)})'

# Step 3: For each workflow, get details
for file in $workflow_files; do
    curl -s "${API_BASE}/workflows/${file}" | jq '.metadata'
    sleep 0.05  # Small delay for rate limiting
done
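
Putting the steps together, a minimal end-to-end sketch looks like the following. The endpoints match the calls above; the API_BASE value and the one-file-per-category output naming are assumptions, and filenames may still need URL encoding (see Issue 2 below):

#!/usr/bin/env bash
# End-to-end sketch: fetch the category mappings once, then fetch each
# workflow's metadata and append it to a per-category markdown file.
API_BASE="https://localhost:8000/api"   # assumed; set to the real API base

mappings=$(curl -s "${API_BASE}/category-mappings" | jq '.mappings')

# Iterate over distinct category names (read line-by-line: names contain spaces)
echo "$mappings" | jq -r '[.[]] | unique | .[]' |
while read -r category; do
    outfile="$(echo "$category" | tr ' ' '_').md"
    echo "Processing category: $category"

    # Filenames mapped to this category
    echo "$mappings" | jq -r --arg c "$category" 'to_entries[] | select(.value == $c) | .key' |
    while read -r file; do
        response=$(curl -s "${API_BASE}/workflows/${file}")
        if echo "$response" | jq -e '.metadata' > /dev/null 2>&1; then
            echo "$response" | jq '.metadata' >> "$outfile"
        else
            echo "Skipping ${file}: invalid response" >&2
        fi
        sleep 0.05  # Small delay for rate limiting
    done
done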

Common Issues and Solutions

Issue 1: JSON Parsing Errors

Symptoms

jq: parse error: Invalid numeric literal at line 1, column 11

Cause

API returned non-JSON responses (HTML error pages, empty responses)

Solution

# Validate JSON before processing
response=$(curl -s "${API_BASE}/workflows/${filename}")
if echo "$response" | jq -e '.metadata' > /dev/null 2>&1; then
    echo "$response" | jq '.metadata'
else
    echo "{\"error\": \"Failed to fetch $filename\", \"filename\": \"$filename\"}"
fi

Issue 2: URL Encoding Problems

Symptoms

  • 404 errors for workflows with special characters in filenames
  • API calls failing for certain workflow files

Cause

Workflow filenames contain special characters that need URL encoding

Solution

# Proper URL encoding (pass the filename as an argument to avoid shell-quoting issues)
encoded_filename=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1]))' "$filename")
curl -s "${API_BASE}/workflows/${encoded_filename}"
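
If Python is unavailable, jq can produce the same encoding with its built-in @uri filter (an equivalent alternative, not what the original script used):

# Alternative: URL-encode with jq's @uri filter
encoded_filename=$(jq -rn --arg f "$filename" '$f | @uri')
curl -s "${API_BASE}/workflows/${encoded_filename}"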

Issue 3: Missing Workflow Data

Symptoms

  • Empty fields in generated documentation
  • "Unknown" values for workflow properties

Cause

The API nests workflow metadata under a .metadata key, so top-level jq lookups fell through to the "Unknown" fallback

Solution

# Original (wrong): reads from the top level of the response
workflow_name=$(echo "$workflow_json" | jq -r '.name // "Unknown"')
# Fixed: reads from the nested .metadata object
workflow_name=$(echo "$response" | jq -r '.metadata.name // "Unknown"')
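
For reference, the response shape is roughly the following (the exact fields are illustrative; only the .metadata nesting and the name, active, and integrations fields are relied on in this guide):

{
  "metadata": {
    "name": "Example Workflow",
    "active": true,
    "integrations": ["Slack", "Google Sheets"]
  }
}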

Issue 4: Script Timeouts During Bulk Processing

Symptoms

  • Scripts timing out after 10 minutes
  • Incomplete documentation generation
  • Process stops mid-category

Cause

Processing 2,055 API calls with per-request delays takes significant time: the 0.05-second sleeps alone add roughly 100 seconds, before any request latency

Solution

# Process categories individually (read line-by-line: category names contain spaces)
printf '%s\n' "$categories" | while read -r category; do
    generate_single_category "$category"
done

# Or use timeout command
timeout 600 ./generate_all_categories.sh
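
To make re-runs cheap after a timeout, the loop can also skip categories whose output file already exists (a sketch assuming one markdown file per category, as in the implementation above, and one category name per line in $categories):

# Resume-friendly variant: skip already-generated categories
printf '%s\n' "$categories" | while read -r category; do
    outfile="$(echo "$category" | tr ' ' '_').md"
    if [ -f "$outfile" ]; then
        echo "Skipping $category (already generated)"
        continue
    fi
    generate_single_category "$category"
done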

Issue 5: Inconsistent Markdown Formatting

Symptoms

  • Trailing commas in integration lists
  • Missing or malformed data fields
  • Inconsistent status display

Cause

Variable data quality and missing fallback handling

Solution

# Clean integration lists (join in jq; the old tr/sed pipeline left a trailing separator)
workflow_integrations=$(echo "$workflow_json" | jq -r '[.integrations[]?] | join(", ")' 2>/dev/null)

# Handle boolean fields properly (jq prints "true"/"false", so compare against "true")
workflow_active=$(echo "$workflow_json" | jq -r '.active // false')
status=$([ "$workflow_active" = "true" ] && echo "Active" || echo "Inactive")

Prevention Strategies

1. API Response Validation

Always validate API responses before processing:

if ! echo "$response" | jq -e . >/dev/null 2>&1; then
    echo "Invalid JSON response"
    continue
fi

2. Graceful Error Handling

Don't let individual failures stop the entire process:

workflow_data=$(fetch_workflow_details "$filename" || echo '{"error": "fetch_failed"}')
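
The fetch_workflow_details helper is assumed here; a minimal definition consistent with the validation pattern from Issue 1 might be:

# Hypothetical helper: fetch one workflow, fail (non-zero) on an invalid response
fetch_workflow_details() {
    local response
    response=$(curl -s "${API_BASE}/workflows/${1}") || return 1
    echo "$response" | jq -e '.metadata' > /dev/null 2>&1 || return 1
    echo "$response"
}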

3. Progress Tracking

Include progress indicators for long-running processes:

echo "[$processed/$total] Processing $filename"

4. Rate Limiting

Always include delays to be respectful to APIs:

sleep 0.05  # Small delay between requests

5. Data Quality Checks

Verify counts and data integrity:

expected_count=77
actual_count=$(grep "^###" output.md | wc -l)
if [ "$actual_count" -ne "$expected_count" ]; then
    echo "Warning: Count mismatch"
fi

Future Recommendations

For Similar Projects

  1. Start with API exploration before attempting web scraping
  2. Test with small datasets before processing large volumes
  3. Implement resume capability for long-running processes
  4. Use structured logging for better debugging
  5. Build in validation at every step

For API Improvements

  1. Category filtering endpoints would eliminate need for client-side filtering
  2. Batch endpoints could reduce the number of individual requests
  3. Response pagination for large category results
  4. Rate limiting headers to guide appropriate delays
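
If the API exposed such headers, a client could honor them instead of a fixed sleep. A hypothetical sketch using the standard Retry-After header (which this API does not currently send):

# Hypothetical: respect a Retry-After header if the server sends one
retry_after=$(curl -sI "${API_BASE}/workflows/${file}" |
    awk 'tolower($1) == "retry-after:" {print $2}' | tr -d '\r')
sleep "${retry_after:-0.05}"   # fall back to the fixed delay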

For Documentation Process

  1. Automated validation against source API counts
  2. Incremental updates rather than full regeneration
  3. Parallel processing where appropriate
  4. Better error reporting and recovery mechanisms

Emergency Recovery Procedures

If Process Fails Mid-Execution

  1. Identify completed categories: Check which markdown files exist
  2. Resume from failure point: Process only missing categories
  3. Validate existing files: Ensure completed files have correct counts (see the sketch after this list)
  4. Manual intervention: Handle problematic workflows individually
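
For step 3, completed files can be checked against the API's per-category counts (assuming $mappings from the implementation above and one ### heading per workflow, as in the data quality check):

# Validate each completed category file against the expected workflow count
echo "$mappings" | jq -r '[.[]] | unique | .[]' |
while read -r category; do
    outfile="$(echo "$category" | tr ' ' '_').md"
    [ -f "$outfile" ] || continue
    expected=$(echo "$mappings" | jq -r --arg c "$category" '[to_entries[] | select(.value == $c)] | length')
    actual=$(grep -c "^###" "$outfile")
    [ "$actual" -eq "$expected" ] || echo "Count mismatch in $outfile: $actual vs $expected"
done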

If API Access Is Lost

  1. Verify connectivity: Check tunnel/proxy status
  2. Test API endpoints: Confirm they're still accessible (a quick check is sketched below)
  3. Switch to backup: Use alternative access methods if available
  4. Document outage: Note any missing data for later completion
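
A quick health check for steps 1-2 (endpoint path taken from the working calls above):

# Expect HTTP 200 from a cheap endpoint before resuming bulk processing
code=$(curl -s -o /dev/null -w "%{http_code}" "${API_BASE}/category-mappings")
if [ "$code" = "200" ]; then
    echo "API reachable"
else
    echo "API returned HTTP ${code}; check tunnel/proxy status"
fi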

This troubleshooting guide ensures that future documentation efforts can avoid the pitfalls encountered and build upon the successful strategies identified.