Broken Link Monitoring for SEO
Broken links hurt your SEO and frustrate visitors. Google's crawlers notice dead links, and a site full of them signals neglect. Users who hit a 404 often leave and don't come back. The fix is checking your links regularly, and the Apixies Link Checker API makes that automatic.
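A single check is one GET request against the API. Here's a minimal sketch; it reads the same fields (reachable, status_code, final_url) that the scripts below rely on:

import os
import requests

API_KEY = os.environ.get("APIXIES_API_KEY", "YOUR_API_KEY")

# Check one URL and print the fields we care about
response = requests.get(
    "https://apixies.io/api/v1/check-link",
    headers={"X-API-Key": API_KEY},
    params={"url": "https://example.com/some-page"},  # any URL you want to test
)
data = response.json().get("data", {})
print(data.get("reachable"), data.get("status_code"), data.get("final_url"))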
The Approach
- Get a list of URLs from your sitemap (or a text file)
- Check each one via the API
- Flag anything that isn't a 200
- Run this weekly on a schedule
You'll catch dead links before Google does and before users report them.
Python Script: Sitemap Checker
This script fetches your sitemap XML, extracts all <loc> URLs, checks each one, and reports the results.
import os
import requests
import xml.etree.ElementTree as ET
import sys
import time

# Reads the key from APIXIES_API_KEY (matches the GitHub Actions example below), or paste it here
API_KEY = os.environ.get("APIXIES_API_KEY", "YOUR_API_KEY")
SITEMAP_URL = sys.argv[1] if len(sys.argv) > 1 else "https://apixies.io/sitemap.xml"
def check_link(url):
    response = requests.get(
        "https://apixies.io/api/v1/check-link",
        headers={"X-API-Key": API_KEY},
        params={"url": url},
    )
    if response.status_code != 200:
        return {"url": url, "reachable": False, "status_code": None, "error": "API error"}
    return response.json().get("data", {})
# Fetch and parse sitemap
print(f"Fetching sitemap: {SITEMAP_URL}")
sitemap_response = requests.get(SITEMAP_URL)
root = ET.fromstring(sitemap_response.content)
# Handle namespace
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
urls = [loc.text for loc in root.findall(".//sm:loc", ns)]
print(f"Found {len(urls)} URLs\n")
working = []
redirects = []
broken = []
unreachable = []
for i, url in enumerate(urls):
    result = check_link(url)
    status = result.get("status_code")
    reachable = result.get("reachable", False)

    if not reachable:
        unreachable.append(url)
        print(f" [{i+1}/{len(urls)}] UNREACHABLE: {url}")
    elif status and status >= 400:
        broken.append((url, status))
        print(f" [{i+1}/{len(urls)}] BROKEN ({status}): {url}")
    elif status and status >= 300:
        final = result.get("final_url", "?")
        redirects.append((url, status, final))
        print(f" [{i+1}/{len(urls)}] REDIRECT ({status}): {url} -> {final}")
    else:
        working.append(url)
        print(f" [{i+1}/{len(urls)}] OK ({status}): {url}")

    time.sleep(1)
# Summary
print(f"\n--- Results ---")
print(f"Working: {len(working)}")
print(f"Redirects: {len(redirects)}")
print(f"Broken: {len(broken)}")
print(f"Unreachable: {len(unreachable)}")
if broken:
    print("\nBroken links:")
    for url, status in broken:
        print(f" {status} {url}")

if unreachable:
    print("\nUnreachable:")
    for url in unreachable:
        print(f" {url}")
Run it:
python check-sitemap.py https://yoursite.com/sitemap.xml
Bash Version: Check URLs from a File
If you don't have a sitemap, put your URLs in a text file and check them:
#!/bin/bash
# check-links.sh - Check URLs from a text file
API_KEY="${APIXIES_API_KEY:-YOUR_API_KEY}"
INPUT="${1:-urls.txt}"
BROKEN=0
while IFS= read -r url; do
    [ -z "$url" ] && continue

    # Let curl URL-encode the query parameter (-G sends it as a GET query string)
    RESPONSE=$(curl -s -G -H "X-API-Key: $API_KEY" \
        --data-urlencode "url=$url" \
        "https://apixies.io/api/v1/check-link")

    STATUS=$(echo "$RESPONSE" | jq -r '.data.status_code // "unreachable"')
    REACHABLE=$(echo "$RESPONSE" | jq -r '.data.reachable')

    if [ "$REACHABLE" != "true" ] || [ "$STATUS" -ge 400 ] 2>/dev/null; then
        echo "BROKEN ($STATUS): $url"
        BROKEN=$((BROKEN + 1))
    else
        echo "OK ($STATUS): $url"
    fi

    sleep 1
done < "$INPUT"
echo ""
echo "$BROKEN broken link(s) found"
exit $BROKEN
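Run it:

./check-links.sh urls.txt

The exit code is the number of broken links, so anything non-zero lets cron or CI treat the run as a failure.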
Checking Outbound Links
Your site links to external pages. Those break too. The concept: parse the HTML of a page, extract all <a href> values, and check each one.
Here's the idea in Python:
from html.parser import HTMLParser
import requests
class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.startswith("http"):
                    self.links.append(value)

# Fetch your page
page = requests.get("https://yoursite.com/blog/some-post")
parser = LinkExtractor()
parser.feed(page.text)
print(f"Found {len(parser.links)} outbound links")

# Check each one (same check_link function from above)
for link in parser.links:
    result = check_link(link)
    status = result.get("status_code", "?")
    if not result.get("reachable") or (isinstance(status, int) and status >= 400):
        print(f" BROKEN: {link} ({status})")
This won't catch every link (JavaScript-rendered content, for example), but it handles most static HTML pages.
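It also only keeps absolute http(s) links, so relative hrefs (internal links like /pricing) are skipped. If you want those too, resolve them against the page URL with urljoin; a sketch (the subclass name is arbitrary):

from urllib.parse import urljoin

PAGE_URL = "https://yoursite.com/blog/some-post"

class AllLinkExtractor(LinkExtractor):
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and not value.startswith(("#", "mailto:")):
                    # urljoin leaves absolute URLs alone and resolves relative ones
                    self.links.append(urljoin(PAGE_URL, value))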
Running on a Schedule
Cron (weekly, Sunday at 2 AM)
0 2 * * 0 /path/to/check-links.sh /path/to/urls.txt >> /var/log/link-check.log 2>&1
GitHub Actions (weekly)
name: Link Check

on:
  schedule:
    - cron: '0 2 * * 0'

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check sitemap links
        run: |
          pip install requests
          python check-sitemap.py https://yoursite.com/sitemap.xml
        env:
          APIXIES_API_KEY: ${{ secrets.APIXIES_API_KEY }}
Tracking Changes Over Time
Run your checks weekly and save the results. Diff against the previous run to see what changed:
# This week's check
./check-links.sh urls.txt > results-$(date +%Y%m%d).txt
# Compare with last week
diff results-20260219.txt results-20260226.txt
New broken links show up as additions in the diff. Fixed links show as removals. This gives you a changelog of your site's link health.
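To pull out just the newly broken links instead of reading the raw diff, a small helper does it; the script name is arbitrary, and the BROKEN prefix matches what check-links.sh prints:

# new-breaks.py - links broken this week that weren't broken last week
import sys

def broken_lines(path):
    with open(path) as f:
        return {line.strip() for line in f if line.startswith("BROKEN")}

last_week, this_week = sys.argv[1], sys.argv[2]
for line in sorted(broken_lines(this_week) - broken_lines(last_week)):
    print(line)

Run it as: python new-breaks.py results-20260219.txt results-20260226.txt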
Rate Limits
75 requests per day on the free tier. That's 75 URLs you can check daily. For most personal sites and small blogs, this is plenty. If you have more URLs, split them across days or prioritize the most important pages.
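If your sitemap is bigger than that, you can rotate through it one slice per day. A sketch (the todays_slice helper is illustrative; urls is the list pulled from the sitemap script above):

import datetime

DAILY_BUDGET = 75  # free-tier request limit

def todays_slice(urls):
    # Rotate through the URL list, checking a different 75-URL slice each day
    slices = max(1, (len(urls) + DAILY_BUDGET - 1) // DAILY_BUDGET)
    start = (datetime.date.today().toordinal() % slices) * DAILY_BUDGET
    return urls[start:start + DAILY_BUDGET]

# In the sitemap script, swap in: urls = todays_slice(urls) before the check loop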
Next Steps
- Link Checker API Tutorial -- learn the API basics
- Link Checker tool -- check a URL right now in the browser
- Redirect Tracer -- see the full redirect chain for any URL
- URL Validator -- deeper URL analysis
- All guides