
Broken Link Monitoring for SEO

Broken links hurt your SEO and frustrate visitors. Google's crawlers notice dead links, and pages full of them signal low quality. Users who hit a 404 tend to leave and not come back. The fix is checking your links regularly, and the Apixies Link Checker API makes that automatic.

The Approach

  1. Get a list of URLs from your sitemap (or a text file)
  2. Check each one via the API
  3. Flag anything that isn't a 200
  4. Run this weekly on a schedule

You'll catch dead links before Google does and before users report them.

Python Script: Sitemap Checker

This script fetches your sitemap XML, extracts all <loc> URLs, checks each one, and reports the results.

import os
import requests
import sys
import time
import xml.etree.ElementTree as ET

# Read the key from the environment (matches the CI setup below);
# falls back to a placeholder so the script still runs locally.
API_KEY = os.environ.get("APIXIES_API_KEY", "YOUR_API_KEY")
SITEMAP_URL = sys.argv[1] if len(sys.argv) > 1 else "https://apixies.io/sitemap.xml"

def check_link(url):
    try:
        response = requests.get(
            "https://apixies.io/api/v1/check-link",
            headers={"X-API-Key": API_KEY},
            params={"url": url},
            timeout=30,
        )
    except requests.RequestException as exc:
        # Network failure talking to the API itself, not the target URL
        return {"url": url, "reachable": False, "status_code": None, "error": str(exc)}
    if response.status_code != 200:
        return {"url": url, "reachable": False, "status_code": None, "error": "API error"}
    return response.json().get("data", {})

# Fetch and parse sitemap
print(f"Fetching sitemap: {SITEMAP_URL}")
sitemap_response = requests.get(SITEMAP_URL, timeout=30)
sitemap_response.raise_for_status()
root = ET.fromstring(sitemap_response.content)

# Handle namespace
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
urls = [loc.text for loc in root.findall(".//sm:loc", ns)]

print(f"Found {len(urls)} URLs\n")

working = []
redirects = []
broken = []
unreachable = []

for i, url in enumerate(urls):
    result = check_link(url)
    status = result.get("status_code")
    reachable = result.get("reachable", False)

    if not reachable:
        unreachable.append(url)
        print(f"  [{i+1}/{len(urls)}] UNREACHABLE: {url}")
    elif status and status >= 400:
        broken.append((url, status))
        print(f"  [{i+1}/{len(urls)}] BROKEN ({status}): {url}")
    elif status and status >= 300:
        final = result.get("final_url", "?")
        redirects.append((url, status, final))
        print(f"  [{i+1}/{len(urls)}] REDIRECT ({status}): {url} -> {final}")
    else:
        working.append(url)
        print(f"  [{i+1}/{len(urls)}] OK ({status}): {url}")

    time.sleep(1)

# Summary
print(f"\n--- Results ---")
print(f"Working: {len(working)}")
print(f"Redirects: {len(redirects)}")
print(f"Broken: {len(broken)}")
print(f"Unreachable: {len(unreachable)}")

if broken:
    print(f"\nBroken links:")
    for url, status in broken:
        print(f"  {status} {url}")

if unreachable:
    print(f"\nUnreachable:")
    for url in unreachable:
        print(f"  {url}")

Run it:

python check-sitemap.py https://yoursite.com/sitemap.xml
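For reference, this is the response shape the script relies on, inferred from the fields it reads (`reachable`, `status_code`, and `final_url` under `data`); the real payload may carry additional fields:

```python
# Minimal sketch of the JSON the script expects back from the API
# (only the fields the script actually reads; inferred, not exhaustive).
example_response = {
    "data": {
        "reachable": True,
        "status_code": 301,  # HTTP status of the checked URL
        "final_url": "https://example.com/new-home",  # where a redirect landed
    }
}

# The script's classification logic would file this result under "redirects"
status = example_response["data"]["status_code"]
kind = "redirect" if 300 <= status < 400 else "other"
```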

Bash Version: Check URLs from a File

If you don't have a sitemap, put your URLs in a text file and check them:

#!/bin/bash
# check-links.sh - Check URLs from a text file
API_KEY="${APIXIES_API_KEY:-YOUR_API_KEY}"
INPUT="${1:-urls.txt}"
BROKEN=0

while IFS= read -r url; do
  [ -z "$url" ] && continue
  # -G turns --data-urlencode into a query parameter, so curl handles the
  # URL encoding itself (no shell-quoting hazards from embedding $url)
  RESPONSE=$(curl -s -G -H "X-API-Key: $API_KEY" \
    --data-urlencode "url=$url" \
    "https://apixies.io/api/v1/check-link")

  STATUS=$(echo "$RESPONSE" | jq -r '.data.status_code // "unreachable"')
  REACHABLE=$(echo "$RESPONSE" | jq -r '.data.reachable')

  if [ "$REACHABLE" != "true" ] || [ "$STATUS" -ge 400 ] 2>/dev/null; then
    echo "BROKEN ($STATUS): $url"
    BROKEN=$((BROKEN + 1))
  else
    echo "OK ($STATUS): $url"
  fi

  sleep 1
done < "$INPUT"

echo ""
echo "$BROKEN broken link(s) found"
exit $BROKEN

Checking Outbound Links

Your site links to external pages. Those break too. The concept: parse the HTML of a page, extract all <a href> values, and check each one.

Here's the idea in Python:

from html.parser import HTMLParser
import requests

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.startswith("http"):
                    self.links.append(value)

# Fetch your page
page = requests.get("https://yoursite.com/blog/some-post")
parser = LinkExtractor()
parser.feed(page.text)

print(f"Found {len(parser.links)} outbound links")

# Check each one (same check_link function from above)
for link in parser.links:
    result = check_link(link)
    status = result.get("status_code", "?")
    if not result.get("reachable") or (isinstance(status, int) and status >= 400):
        print(f"  BROKEN: {link} ({status})")

This won't catch every link (JavaScript-rendered content, for example), but it handles most static HTML pages.
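The extractor above also skips relative links (`/about`, `../pricing`). To catch those, resolve each `href` against the page URL before checking; a small extension of the same idea (a sketch, with the skip-list of non-checkable schemes as an assumption):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class AllLinkExtractor(HTMLParser):
    """Collects absolute AND relative hrefs, resolved against a base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()  # deduplicate repeated links automatically

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                # Skip fragments and non-HTTP schemes that can't be checked
                if name == "href" and value and not value.startswith(
                    ("#", "mailto:", "javascript:")
                ):
                    self.links.add(urljoin(self.base_url, value))
```

Feed it a page the same way as before; `parser.links` then contains fully qualified URLs ready to pass to `check_link`.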

Running on a Schedule

Cron (weekly, Sunday at 2 AM)

0 2 * * 0 /path/to/check-links.sh /path/to/urls.txt >> /var/log/link-check.log 2>&1

GitHub Actions (weekly)

name: Link Check
on:
  schedule:
    - cron: '0 2 * * 0'

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4  # the script lives in the repo
      - name: Check sitemap links
        run: |
          pip install requests
          python check-sitemap.py https://yoursite.com/sitemap.xml
        env:
          APIXIES_API_KEY: ${{ secrets.APIXIES_API_KEY }}

Tracking Changes Over Time

Run your checks weekly and save the results. Diff against the previous run to see what changed:

# This week's check
./check-links.sh urls.txt > results-$(date +%Y%m%d).txt

# Compare with last week
diff results-20260219.txt results-20260226.txt

New broken links show up as additions in the diff. Fixed links show as removals. This gives you a changelog of your site's link health.

Rate Limits

75 requests per day on the free tier. That's 75 URLs you can check daily. For most personal sites and small blogs, this is plenty. If you have more URLs, split them across days or prioritize the most important pages.
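One way to spread a larger site across that daily budget is a rotating batch keyed to the day of year, so consecutive runs cycle through every URL. A sketch (`DAILY_BUDGET` mirrors the free-tier cap; the rotation scheme is an assumption, not part of the API):

```python
import datetime

DAILY_BUDGET = 75  # free-tier cap

def todays_batch(urls, budget=DAILY_BUDGET, today=None):
    """Slice of urls to check today; consecutive days cycle through all batches."""
    if not urls:
        return []
    today = today or datetime.date.today()
    batches = (len(urls) + budget - 1) // budget  # ceil division
    index = today.toordinal() % batches           # rotate one batch per day
    return urls[index * budget:(index + 1) * budget]
```

With 200 URLs, for example, every URL gets checked at least once every three days.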

Next Steps

Try the Link Checker API

Free tier is for development & small projects. 75 requests/day with a registered account.
