How to use Inspyder OrFind for fast website crawling
1) Install & open
- Download OrFind from Inspyder and install it on Windows.
- Launch the app and choose “New Project.”
2) Set target and scope
- Enter the starting URL.
- Limit scope: set include/exclude patterns (domains, subfolders) and max crawl depth to avoid over-crawling.
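To make the scope rules concrete, here is a minimal Python sketch of how include/exclude patterns and a depth limit combine. It is not OrFind's own logic or API; the host list, patterns, depth value, and the `in_scope` helper are all hypothetical and only illustrate the idea.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Hypothetical scope settings; names and values are illustrative only.
ALLOWED_HOSTS = {"example.com", "www.example.com"}
INCLUDE_PATTERNS = ["/docs/*", "/blog/*"]
EXCLUDE_PATTERNS = ["/blog/tag/*"]
MAX_DEPTH = 3


def in_scope(url: str, depth: int) -> bool:
    """Return True if the URL passes the host, depth, and path-pattern rules."""
    parts = urlparse(url)
    if parts.hostname not in ALLOWED_HOSTS:
        return False
    if depth > MAX_DEPTH:
        return False
    path = parts.path or "/"
    # Exclusions win over inclusions.
    if any(fnmatchcase(path, pattern) for pattern in EXCLUDE_PATTERNS):
        return False
    return any(fnmatchcase(path, pattern) for pattern in INCLUDE_PATTERNS)


print(in_scope("https://example.com/docs/intro", depth=1))
print(in_scope("https://example.com/blog/tag/python", depth=1))
```

Checking exclusions before inclusions means a narrow exclude rule can carve a hole out of a broad include rule, which is usually what you want when trimming a crawl.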
3) Configure crawl options for speed
- Threads/concurrency: increase the thread count for faster crawling (start with a moderate value, e.g., 10–25).
- Request delay: set a small delay (100–500 ms) to balance speed and server load.
- Timeouts & retries: use a fairly short per-request timeout (e.g., 5–10 s) and allow only 1–2 retries to avoid long stalls.
- Respect robots.txt: keep this enabled for polite crawling; disable it only when you have the site owner's permission.
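The trade-off between threads and delay can be estimated with simple arithmetic, and a robots.txt rule can be checked offline. The sketch below is plain Python, not part of OrFind. It assumes each thread waits the full delay after every request, which may not match how the tool applies its delay. The function names, the sample robots.txt, and the "MyCrawler" user agent are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def estimated_crawl_seconds(pages: int, threads: int,
                            delay_s: float, avg_response_s: float) -> float:
    """Rough crawl time, assuming every thread pauses delay_s after each request."""
    per_request = delay_s + avg_response_s
    return pages * per_request / threads


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules supplied as text (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content.
ROBOTS = """\
User-agent: *
Disallow: /private/
"""

# 10,000 pages, 10 threads, 250 ms delay, 250 ms average response time.
print(estimated_crawl_seconds(10_000, threads=10, delay_s=0.25, avg_response_s=0.25))  # 500.0
print(allowed_by_robots(ROBOTS, "MyCrawler", "https://example.com/private/a"))  # False
```

Under these assumptions, doubling the thread count halves the estimate. In practice the target server's capacity usually becomes the limit well before that.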
4) Authentication & rendering
- Logins: add HTTP Basic or form credentials if pages sit behind authentication.
- JavaScript pages: if your version offers JavaScript rendering, enable it when the site builds its links client-side (note: rendering makes the crawl slower).
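For background on what HTTP Basic credentials look like on the wire, here is a small Python sketch that builds the `Authorization` header. It shows the standard scheme only, not how OrFind stores or sends credentials. The `basic_auth_header` helper is hypothetical, and the sample user name and password are the well-known example pair from the HTTP Basic specification.

```python
import base64


def basic_auth_header(user: str, password: str) -> dict[str, str]:
    """Build the Authorization header for HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(basic_auth_header("Aladdin", "open sesame"))
# {'Authorization': 'Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=='}
```

Because the credentials are only Base64-encoded, not encrypted, Basic authentication should be used over HTTPS.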
5) Extraction & filters
- Link types: include internal links, images, scripts, and assets you need.
- File extensions: exclude large binaries (e.g., .zip, .mp4) to speed up the crawl.
- Custom extraction: add XPath/CSS patterns if you need specific data.
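The link-type and extension filters above can be pictured with the following Python sketch. It uses only the standard library and is not OrFind's extraction engine. The `LinkCollector` class, the `extract_links` helper, and the sample HTML are hypothetical. XPath/CSS extraction is not shown, since it would need a third-party parser.

```python
from html.parser import HTMLParser
from posixpath import splitext
from urllib.parse import urljoin, urlparse

SKIP_EXTENSIONS = {".zip", ".mp4"}  # the large binaries named in the list above
LINK_ATTRS = {"a": "href", "img": "src", "script": "src"}


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a>, <img>, and <script> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        wanted = LINK_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    """Return links found in html, minus those with a skipped file extension."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [
        url for url in collector.links
        if splitext(urlparse(url).path)[1].lower() not in SKIP_EXTENSIONS
    ]


PAGE = '<a href="/a.html">A</a><a href="/big.ZIP">Z</a><img src="img/logo.png">'
print(extract_links(PAGE, "https://example.com/dir/"))
```

Filtering on the URL path alone lets the crawler skip large files without requesting them, which is where the speed gain comes from.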
6) Start crawl and monitor
- Click “Start.” Monitor progress, queue, active threads, and error counts. Pause if errors spike and adjust settings.
7) Post-crawl actions
- Export results (CSV/Excel, XML) for URLs, status codes, anchors, and link sources.
- Use built-in reports to find broken links, redirects, duplicate titles, or orphan pages.
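Once results are exported to CSV, they can also be post-processed outside the tool. The Python sketch below groups broken URLs by the pages that link to them. The column names (`URL`, `Status`, `Source`) and the sample rows are assumptions for illustration. Check the header row of your actual export and adjust accordingly.

```python
import csv
import io
from collections import defaultdict


def broken_links(csv_text: str) -> dict[str, list[str]]:
    """Map each URL with a 4xx/5xx status to the pages that link to it."""
    broken = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if int(row["Status"]) >= 400:
            broken[row["URL"]].append(row["Source"])
    return dict(broken)


# Hypothetical export; real column names may differ.
SAMPLE = """\
URL,Status,Source
https://example.com/,200,
https://example.com/old,404,https://example.com/
https://example.com/old,404,https://example.com/blog
https://example.com/moved,301,https://example.com/
"""

for url, sources in broken_links(SAMPLE).items():
    print(url, "<-", ", ".join(sources))
```

For a real export, read the file with `open(path, newline="")` and pass its contents in the same way.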
8) Practical tips
- Run a short, shallow crawl first (depth 1–2) to validate the configuration before the full run.
- Increase concurrency on local fast networks; reduce it for shared/slow targets.
- Schedule off-peak runs for large sites and always test politeness settings to avoid triggering defenses.