
Sitemap Generator

Summary

Given a starting URL, this little utility crawls a website and reports every static asset and link it finds on that site.

Design Goals

  • Crawl an entire site and report on its structure
  • Flexible output formats (JSON, tab-delimited, digraph)
  • Customizable performance characteristics

Design Decisions

  • The utility stays within the starting URL's domain
  • When the utility finds a URL it has already visited, it does not traverse that URL's links again, but it still reports the links found there (see the sketch after this list)
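
A minimal sketch of both decisions in Go, assuming a mutex-guarded set of visited URLs; the visited type and shouldCrawl helper are illustrative names, not this project's actual API:

    package main

    import (
        "fmt"
        "net/url"
        "sync"
    )

    // visited records every URL the crawler has already seen; the mutex
    // lets multiple crawler goroutines share the map safely.
    type visited struct {
        mu   sync.Mutex
        seen map[string]bool
    }

    // shouldCrawl reports whether link should be traversed: it must be on
    // the same host as the starting URL and must not have been seen
    // before. The caller can still report duplicate and off-domain links;
    // they are just never traversed.
    func (v *visited) shouldCrawl(start, link *url.URL) bool {
        if link.Host != start.Host {
            return false // stay within the same domain
        }
        v.mu.Lock()
        defer v.mu.Unlock()
        if v.seen[link.String()] {
            return false // already crawled: report, but do not re-traverse
        }
        v.seen[link.String()] = true
        return true
    }

    func main() {
        start, _ := url.Parse("https://www.example.com/")
        v := &visited{seen: map[string]bool{}}
        for _, raw := range []string{
            "https://www.example.com/about",
            "https://www.example.com/about",  // duplicate: not re-traversed
            "https://elsewhere.example.org/", // off-domain: skipped
        } {
            link, _ := url.Parse(raw)
            fmt.Println(raw, "traverse:", v.shouldCrawl(start, link))
        }
    }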

Features

  • Save results to a file
  • Set the number of worker goroutines used to crawl a site (see the worker-pool sketch after this list)
  • Set a rate limiter, if desired
  • Set an inactivity timeout
  • Read saved results back in and redisplay them in different formats
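
A minimal sketch of how these knobs can fit together in Go, assuming a shared time.Ticker for the rate limit and a per-worker inactivity timeout; crawl and its parameters are illustrative, not the utility's real internals:

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    // crawl drains the jobs channel with n worker goroutines. A shared
    // ticker spaces fetches to at most one per rate interval, and a worker
    // exits when no URL arrives within the inactivity window.
    func crawl(n int, rate, inactivity time.Duration, jobs chan string) {
        // A real implementation would branch on rate == 0 to disable
        // limiting, since NewTicker panics on non-positive intervals.
        limiter := time.NewTicker(rate)
        defer limiter.Stop()

        var wg sync.WaitGroup
        for i := 0; i < n; i++ {
            wg.Add(1)
            go func(id int) {
                defer wg.Done()
                for {
                    select {
                    case u, ok := <-jobs:
                        if !ok {
                            return // channel closed: no more work
                        }
                        <-limiter.C // honor the rate limit before fetching
                        fmt.Printf("worker %d fetching %s\n", id, u)
                    case <-time.After(inactivity):
                        return // inactivity timeout reached
                    }
                }
            }(i)
        }
        wg.Wait()
    }

    func main() {
        jobs := make(chan string, 3)
        jobs <- "https://www.example.com/"
        jobs <- "https://www.example.com/a"
        jobs <- "https://www.example.com/b"
        crawl(2, 200*time.Millisecond, 2*time.Second, jobs)
    }

Sharing one ticker across all workers keeps the request rate global rather than per-worker, which matches having independent -w and -r flags.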

How to get it

(1) If you have Docker installed

docker run mkboudreau/sitemap ....
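
Assuming the image's entrypoint is the sitemap binary itself, the flags described under Example Usage should pass straight through, e.g.:

docker run mkboudreau/sitemap -f json www.microsoft.com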

(2) If you have Go installed

go get github.com/mkboudreau/sitemap
make install

Example Usage

Crawl site with sensible defaults

sitemap www.microsoft.com

Crawl site with 50 workers

sitemap -w 50 www.microsoft.com

Crawl site with rate limiting turned off

sitemap -r 0s www.microsoft.com
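
The value is a Go-style duration string, so, assuming the flag sets the pause between requests, a gentler crawl might look like:

sitemap -r 500ms www.microsoft.com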

Crawl site and output JSON

sitemap -f json www.microsoft.com

Crawl site and output tabular format (default)

sitemap -f tab www.microsoft.com

Crawl site and output digraph (dot)

sitemap -f digraph www.microsoft.com
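
Since the digraph output is Graphviz dot source, it can presumably be piped straight into dot to render an image:

sitemap -f digraph www.microsoft.com | dot -Tpng -o sitemap.png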

Crawl site and save results to file

sitemap -o saved.json www.microsoft.com

Use saved results and output as a digraph

sitemap -i saved.json -f digraph
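
Taken together, -o and -i mean a site only needs to be crawled once; the saved JSON can then, presumably without re-crawling, be re-rendered in any supported format:

sitemap -o saved.json www.microsoft.com
sitemap -i saved.json -f tab
sitemap -i saved.json -f digraph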
