Heritrix3
@ https://github.com/internetarchive/heritrix3
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
A suite of open-source tools and packages to capture interactive websites and replay them at a later time as accurately as possible.
Allows you to download a World Wide Web site from the Internet to a local directory.
Retrieves files using HTTP, follows links, and maps links to resources in the filesystem; useful for archiving and mirroring web resources.