Guides & Tutorials

Step-by-step tutorials for getting the most out of crawler.sh.

Guide · Mar 14, 2026

How to Preprocess Web Content for RLHF Training Pairs

A step-by-step guide to crawling web content, cleaning it, and structuring it into preference pairs for RLHF reward model training.

Mehmet Kose

Guide · Mar 11, 2026

How to Fetch a Single Page with CLI

Learn how to fetch a single URL with the crawler.sh CLI without crawling the entire site. Get clean output with smart, path-based filenames.

Mehmet Kose

Guide · Mar 8, 2026

How to Integrate crawler.sh into MLOps Pipelines

Learn how to use the crawler.sh CLI in MLOps workflows to collect training data, validate documentation sites, and automate web crawling in CI/CD pipelines.

Mehmet Kose

Guide · Mar 7, 2026

How to Find Orphan Pages on a Website with CLI

Learn how to detect orphan pages (pages with zero incoming internal links) using the crawler.sh CLI. Identify isolated pages and fix your internal linking.

Mehmet Kose

Guide · Mar 6, 2026

How to Crawl Data to Train AI Models with CLI

Learn how to crawl website content and extract clean Markdown for AI training datasets using the crawler.sh CLI. Export structured data for LLM fine-tuning.

Mehmet Kose

Guide · Mar 6, 2026

How to Find Long Content with CLI

Learn how to detect pages with over 5,000 words using the crawler.sh CLI. Find excessively long pages that may need to be split for better user experience and SEO.

Mehmet Kose

Guide · Mar 6, 2026

How to Find Empty H1 Tags with CLI

Learn how to detect pages with empty H1 tags using the crawler.sh CLI. Find headings that contain no text and fix them to improve SEO and page structure.

Mehmet Kose

Guide · Mar 6, 2026

How to Find Broken Links on a Website with CLI

Learn how to detect broken links and dead pages on any website using the crawler.sh CLI. Crawl your site, identify 4xx/5xx errors, and export a report.

Mehmet Kose

