The Spider is a tool used to automatically discover new resources (URLs) on a particular Site. It begins with a list of URLs to visit, called the seeds, which depends on how the Spider is started. The Spider then visits these URLs, identifies all the hyperlinks in each page, and adds them to the list of URLs to visit. The process continues recursively as long as new resources are found.
The Spider can be configured and started using the Spider dialogue.
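The crawl loop described above can be illustrated with a minimal sketch. It is not the Spider's actual implementation: the in-memory `SITE` dictionary and the `fetch_links` helper are hypothetical stand-ins for making a request and parsing the response.

```python
from collections import deque

# Hypothetical in-memory "site": each URL maps to the links found on that page.
SITE = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}


def fetch_links(url):
    """Stand-in for 'fetch the resource and parse the response for links'."""
    return SITE.get(url, [])


def crawl(seeds):
    """Visit the seeds, then every newly discovered URL, until none are left."""
    to_visit = deque(seeds)
    seen = set(seeds)          # URLs already queued, so nothing is visited twice
    visited = []
    while to_visit:
        url = to_visit.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                to_visit.append(link)
    return visited


print(crawl(["http://example.com/"]))
```

The `seen` set is what makes the process stop: once no page yields an unseen URL, the queue drains and the crawl ends.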
While processing a URL, the Spider makes a request to fetch the resource and then parses the response, identifying hyperlinks. It currently behaves as follows, depending on the type of response:
HTML responses: the Spider processes specific tags, identifying links to new resources.
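As a rough illustration of tag-based link discovery, the sketch below uses Python's standard `html.parser`. The tag and attribute pairs in `LINK_ATTRS` are an assumed subset chosen for the example; the exact list the Spider handles is not given here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed subset of tag/attribute pairs; the Spider's real list may differ.
LINK_ATTRS = {
    "a": "href",
    "link": "href",
    "img": "src",
    "script": "src",
    "iframe": "src",
}


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the attributes listed in LINK_ATTRS."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = LINK_ATTRS.get(tag)  # None for tags we do not track
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

For example, `extract_links('<a href="/about">About</a>', "http://example.com/dir/page.html")` resolves the relative link against the page URL.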
If set in the Options Spider screen, the Spider also analyzes the ‘Robots.txt’ file and tries to identify new resources using the specified rules. Note that the Spider does not obey the rules specified in the ‘Robots.txt’ file; it only uses them to find resources.
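One way to picture this is to treat each `Allow` and `Disallow` path as a candidate URL rather than as a restriction. The sketch below is an assumption about how such discovery could work, not the Spider's code; the function name `robots_candidates` is hypothetical, and skipping wildcard patterns is a simplification made for the example.

```python
from urllib.parse import urljoin


def robots_candidates(robots_txt, base_url):
    """Turn Allow/Disallow paths into candidate URLs (rules are not enforced)."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()       # drop trailing comments
        field, sep, value = line.partition(":")
        if not sep:
            continue                                # not a 'field: value' line
        if field.strip().lower() in ("allow", "disallow"):
            path = value.strip()
            # Assumption: ignore empty rules and wildcard patterns.
            if path and "*" not in path and "$" not in path:
                urls.append(urljoin(base_url, path))
    return urls
```

A `Disallow: /admin/` rule on `http://example.com/` would therefore yield `http://example.com/admin/` as a resource to visit.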
If set in the Options Spider screen, the Spider also analyzes the ‘sitemap.xml’ file and tries to identify new resources.
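A sitemap lists page addresses in `loc` elements, so discovering resources from it amounts to collecting those values. The following is a minimal sketch using Python's standard XML parser; the function name `sitemap_urls` is hypothetical and the Spider's own parsing may differ.

```python
import xml.etree.ElementTree as ET


def sitemap_urls(xml_text):
    """Return the text of every <loc> element, whatever its namespace."""
    root = ET.fromstring(xml_text)
    urls = []
    for elem in root.iter():
        # Namespaced tags look like '{namespace}loc'; compare the local name.
        if elem.tag.rsplit("}", 1)[-1] == "loc" and elem.text:
            urls.append(elem.text.strip())
    return urls
```

Matching on the local name means the same function also picks up the `loc` entries of a sitemap index file.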
If set in the Options Spider screen, the Spider also parses SVN metadata files and tries to identify new resources.
If set in the Options Spider screen, the Spider also parses Git metadata files and tries to identify new resources.
If set in the Options Spider screen, the Spider also parses .DS_Store files and tries to identify new resources.
OData content using the Atom format is currently supported. All included links (relative or absolute) are processed.
SVG image files are parsed to identify HREF attributes and extract/resolve any contained links.
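Because SVG is XML, extracting and resolving its links can be sketched with the standard XML parser. This is an illustrative assumption rather than the Spider's implementation: the function name `svg_links` is hypothetical, and ignoring fragment-only references such as `#icon` is a choice made for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# SVG links may use a plain 'href' or the older namespaced 'xlink:href'.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def svg_links(svg_text, base_url):
    """Collect href values from an SVG document and resolve them."""
    links = []
    for elem in ET.fromstring(svg_text).iter():
        for attr in ("href", XLINK_HREF):
            value = elem.get(attr)
            # Assumption: fragment-only references point inside the same file.
            if value and not value.startswith("#"):
                links.append(urljoin(base_url, value))
    return links
```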
Text responses are parsed by scanning for the URL pattern.
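Scanning text for URLs is typically done with a regular expression. The pattern below is an assumption made for illustration; the actual pattern used by the Spider is not stated here.

```python
import re

# Assumed pattern: an http(s) scheme followed by characters that are not
# whitespace, quotes, or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def urls_in_text(text):
    """Return URLs found in plain text, trimming trailing punctuation."""
    return [match.rstrip(".,;)") for match in URL_PATTERN.findall(text)]
```

Trimming trailing punctuation keeps a URL at the end of a sentence from picking up the full stop that follows it.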
Currently, the Spider does not process non-text responses.
The Spider is configured using the Spider Options screen.
Spider Options screen | for an overview of the Spider Options
ZAP In Ten: Explore Your Applications (10:36)
ZAP Deep Dive: Exploring Applications: Standard Spider (34:35)