# N8N Workflow Documentation - Troubleshooting Guide

## Overview

This document details the challenges encountered during the workflow documentation process and provides solutions for common issues. It serves as a guide for future documentation efforts and troubleshooting similar problems.

## Approaches That Failed

### 1. Browser Automation with Playwright

#### What We Tried

```javascript
// Attempted approach
await page.goto('https://localhost:8000');
await page.selectOption('#categoryFilter', 'Business Process Automation');
await page.waitForLoadState('networkidle');
```

#### Why It Failed

- **Dynamic Loading Bottleneck**: The web application loads all 2,055 workflows before applying client-side filtering
- **Timeout Issues**: Browser automation timed out waiting for the filtering process to complete
- **Memory Constraints**: Loading all workflows simultaneously exceeded browser memory limits
- **JavaScript Complexity**: The client-side filtering logic was too complex for reliable automation

#### Symptoms

- Page loads but workflows never finish loading
- Browser automation hangs on category selection
- "Waiting for page to load" messages that never complete
- Network timeouts after 2+ minutes

#### Error Messages

```
TimeoutError: page.waitForLoadState: Timeout 30000ms exceeded
Waiting for load state to be NetworkIdle
```

### 2. Firecrawl with Dynamic Filtering

#### What We Tried

```javascript
// Attempted approach
firecrawl_scrape({
  url: "https://localhost:8000",
  actions: [
    {type: "wait", milliseconds: 5000},
    {type: "executeJavascript", script: "document.getElementById('categoryFilter').value = 'Business Process Automation'; document.getElementById('categoryFilter').dispatchEvent(new Event('change'));"},
    {type: "wait", milliseconds: 30000}
  ]
})
```

#### Why It Failed

- **60-Second Timeout Limit**: Firecrawl's maximum wait time was insufficient for complete data loading
- **JavaScript Execution Timing**: The filtering process required waiting for all workflows to load first
- **Response Size Limits**: Filtered results still exceeded token limits for processing
- **Inconsistent State**: Scraping occurred before filtering was complete

#### Symptoms

- Firecrawl returns incomplete data (1 workflow instead of 77)
- Timeout errors after 60 seconds
- "Request timed out" or "Internal server error" responses
- Inconsistent results between scraping attempts

#### Error Messages

```
Failed to scrape URL. Status code: 408. Error: Request timed out
Failed to scrape URL. Status code: 500. Error: (Internal server error) - timeout
Total wait time (waitFor + wait actions) cannot exceed 60 seconds
```

### 3. Single Large Web Scraping

#### What We Tried

Direct scraping of the entire page without category filtering:

```bash
curl -s "https://localhost:8000" | html2text
```

#### Why It Failed

- **Data Overload**: 2,055 workflows generated responses exceeding 25,000 token limits
- **No Organization**: Results were unstructured and difficult to categorize
- **Missing Metadata**: HTML scraping didn't provide structured workflow details
- **Pagination Issues**: Workflows are loaded progressively, not all at once

#### Symptoms

- "Response exceeds maximum allowed tokens" errors
- Truncated or incomplete data
- Missing workflow details and metadata
- Unstructured output difficult to process

## What Worked: Direct API Strategy

### Why This Approach Succeeded

#### 1. Avoided JavaScript Complexity

- **Direct Data Access**: API endpoints provided structured data without client-side processing
- **No Dynamic Loading**: Each API call returned complete data immediately
- **Reliable State**: No dependency on browser state or JavaScript execution

#### 2. Manageable Response Sizes

- **Individual Requests**: Single workflow details fit within token limits
- **Structured Data**: JSON responses were predictable and parseable
- **Metadata Separation**: Workflow details were properly structured in API responses

#### 3. Rate Limiting Control

- **Controlled Pacing**: Small delays between requests prevented server overload
- **Batch Processing**: Category-based organization enabled logical processing
- **Error Recovery**: Individual failures didn't stop the entire process

### Technical Implementation That Worked

```bash
# Step 1: Get category mappings (single fast call)
curl -s "${API_BASE}/category-mappings" | jq '.mappings'

# Step 2: Group by category
jq 'to_entries | group_by(.value) | map({category: .[0].value, count: length, files: map(.key)})'

# Step 3: For each workflow, get details
for file in $workflow_files; do
  curl -s "${API_BASE}/workflows/${file}" | jq '.metadata'
  sleep 0.05  # Small delay for rate limiting
done
```
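Putting the three steps together: the sketch below is one way the full loop could be assembled, not the exact production script. The `API_BASE` default and the single-category selection are assumptions layered on the endpoints shown above, and the per-response validation anticipates Issue 1 below.

```bash
#!/usr/bin/env bash
# Minimal end-to-end sketch of the direct API strategy.
# Assumption: API_BASE points at the same API as the snippets above;
# the default below is hypothetical.
API_BASE="${API_BASE:-https://localhost:8000/api}"

# Step 1: fetch the filename -> category mappings once.
mappings=$(curl -s "${API_BASE}/category-mappings" | jq '.mappings')

# Step 2: select the files belonging to a single category.
category="Business Process Automation"
workflow_files=$(echo "$mappings" | jq -r --arg cat "$category" \
  'to_entries | map(select(.value == $cat)) | .[].key')

# Step 3: fetch each workflow's metadata, validating before use.
# (URL-encode $file first if names contain special characters; see Issue 2.)
for file in $workflow_files; do
  response=$(curl -s "${API_BASE}/workflows/${file}")
  if echo "$response" | jq -e '.metadata' > /dev/null 2>&1; then
    echo "$response" | jq '.metadata'
  else
    echo "Failed to fetch $file" >&2  # log and keep going
  fi
  sleep 0.05  # small delay for rate limiting
done
```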
## Common Issues and Solutions

### Issue 1: JSON Parsing Errors

#### Symptoms

```
jq: parse error: Invalid numeric literal at line 1, column 11
```

#### Cause

The API returned non-JSON responses (HTML error pages, empty responses).

#### Solution

```bash
# Validate JSON before processing
response=$(curl -s "${API_BASE}/workflows/${filename}")
if echo "$response" | jq -e '.metadata' > /dev/null 2>&1; then
  echo "$response" | jq '.metadata'
else
  echo "{\"error\": \"Failed to fetch $filename\", \"filename\": \"$filename\"}"
fi
```

### Issue 2: URL Encoding Problems

#### Symptoms

- 404 errors for workflows with special characters in filenames
- API calls failing for certain workflow files

#### Cause

Workflow filenames contain special characters that need URL encoding.

#### Solution

```bash
# Proper URL encoding; pass the filename as an argument so quotes
# and other shell-sensitive characters survive intact
encoded_filename=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1]))' "$filename")
curl -s "${API_BASE}/workflows/${encoded_filename}"
```

### Issue 3: Missing Workflow Data

#### Symptoms

- Empty fields in generated documentation
- "Unknown" values for workflow properties

#### Cause

The API response nests workflow details under the `.metadata` key, not at the top level.

#### Solution

```bash
# Wrong: reads from the top level of the response
workflow_name=$(echo "$response" | jq -r '.name // "Unknown"')

# Right: extract from the .metadata path
workflow_name=$(echo "$response" | jq -r '.metadata.name // "Unknown"')
```

### Issue 4: Script Timeouts During Bulk Processing

#### Symptoms

- Scripts timing out after 10 minutes
- Incomplete documentation generation
- Process stops mid-category

#### Cause

Processing 2,055 API calls with delays takes significant time.

#### Solution

```bash
# Process categories individually
for category in $categories; do
  generate_single_category "$category"
done

# Or use the timeout command
timeout 600 ./generate_all_categories.sh
```
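If a long run dies partway through, rerunning everything repeats the calls that already succeeded. A resume-capable variant of the per-category loop is sketched below; the `<category>.md` output naming and the `generate_single_category` interface are assumptions carried over from the snippet above.

```bash
# Resume sketch: skip categories whose output file already exists.
for category in $categories; do
  outfile="${category// /_}.md"  # hypothetical naming convention
  if [ -s "$outfile" ]; then
    echo "Skipping $category (output already exists)"
    continue
  fi
  generate_single_category "$category"
done
```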
### Issue 5: Inconsistent Markdown Formatting

#### Symptoms

- Trailing commas in integration lists
- Missing or malformed data fields
- Inconsistent status display

#### Cause

Variable data quality and missing fallback handling.

#### Solution

```bash
# Build the integration list with jq's join so there is no trailing comma
# (a tr/sed pipeline with mismatched separators left one behind)
workflow_integrations=$(echo "$workflow_json" | jq -r '[.integrations[]?] | join(", ")' 2>/dev/null)

# Handle boolean fields properly: jq emits "true"/"false", so compare
# against "true", not "1"
workflow_active=$(echo "$workflow_json" | jq -r '.active // false')
status=$([ "$workflow_active" = "true" ] && echo "Active" || echo "Inactive")
```

## Prevention Strategies

### 1. API Response Validation

Always validate API responses before processing:

```bash
if ! echo "$response" | jq -e . > /dev/null 2>&1; then
  echo "Invalid JSON response"
  continue  # skip this workflow (inside the fetch loop)
fi
```

### 2. Graceful Error Handling

Don't let individual failures stop the entire process:

```bash
workflow_data=$(fetch_workflow_details "$filename" || echo '{"error": "fetch_failed"}')
```

### 3. Progress Tracking

Include progress indicators for long-running processes:

```bash
echo "[$processed/$total] Processing $filename"
```

### 4. Rate Limiting

Always include delays to be respectful to APIs:

```bash
sleep 0.05  # Small delay between requests
```

### 5. Data Quality Checks

Verify counts and data integrity:

```bash
expected_count=77
actual_count=$(grep "^###" output.md | wc -l)
if [ "$actual_count" -ne "$expected_count" ]; then
  echo "Warning: Count mismatch"
fi
```

## Future Recommendations

### For Similar Projects

1. **Start with API exploration** before attempting web scraping
2. **Test with small datasets** before processing large volumes
3. **Implement resume capability** for long-running processes (see the sketch after Issue 4)
4. **Use structured logging** for better debugging
5. **Build in validation** at every step

### For API Improvements

1. **Category filtering endpoints** would eliminate the need for client-side filtering
2. **Batch endpoints** could reduce the number of individual requests
3. **Response pagination** for large category results
4. **Rate limiting headers** to guide appropriate delays

### For Documentation Process

1. **Automated validation** against source API counts
2. **Incremental updates** rather than full regeneration
3. **Parallel processing** where appropriate
4. **Better error reporting** and recovery mechanisms

## Emergency Recovery Procedures

### If Process Fails Mid-Execution

1. **Identify completed categories**: Check which markdown files exist
2. **Resume from failure point**: Process only missing categories
3. **Validate existing files**: Ensure completed files have correct counts
4. **Manual intervention**: Handle problematic workflows individually

### If API Access Is Lost

1. **Verify connectivity**: Check tunnel/proxy status
2. **Test API endpoints**: Confirm they're still accessible
3. **Switch to backup**: Use alternative access methods if available
4. **Document outage**: Note any missing data for later completion

This troubleshooting guide ensures that future documentation efforts can avoid the pitfalls encountered and build upon the successful strategies identified.
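As a final worked example, recovery step 3 ("Validate existing files") can be combined with Prevention Strategy 5 to check surviving files before resuming. The per-category file naming and the `expected_counts.tsv` input (tab-separated `category<TAB>count` lines) are hypothetical.

```bash
# Validate per-category files against expected workflow counts
# before resuming a failed run (expected_counts.tsv is hypothetical).
while IFS=$'\t' read -r category expected; do
  outfile="${category// /_}.md"
  [ -f "$outfile" ] || continue          # not generated yet; will be resumed
  actual=$(grep -c "^###" "$outfile")    # one "###" heading per workflow
  if [ "$actual" -ne "$expected" ]; then
    echo "Warning: $outfile has $actual workflows, expected $expected"
  fi
done < expected_counts.tsv
```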