This little utility, given a starting URL, will crawl a website and find all the static assets and links on that site.
- Crawl an entire site and report on its structure
- Flexible output formats (json, tab, digraph)
- Customize performance characteristics
- The utility will stay within the same domain
- When the utility finds a duplicate URL, it will not traverse into that URL's links again, but it will still report the links found.
- Ability to save results to a file
- Set number of worker threads/goroutines to crawl a site
- Set rate limiter, if desired
- Set inactivity timeout
- Read in saved results and redisplay in different formats
docker run mkboudreau/sitemap ....
go get github.com/mkboudreau/sitemap
make install
sitemap www.microsoft.com
sitemap -w 50 www.microsoft.com
sitemap -r 0s www.microsoft.com
sitemap -f json www.microsoft.com
sitemap -f tab www.microsoft.com
sitemap -f digraph www.microsoft.com
sitemap -o saved.json www.microsoft.com
sitemap -i saved.json -f digraph