Package org.jweaver.crawler
Interface JWeaverCrawler.Builder
- All Known Implementing Classes:
JWeaverBuilderImpl
- Enclosing interface:
JWeaverCrawler
public static interface JWeaverCrawler.Builder
The Builder interface provides methods for building and customizing an instance of
JWeaverCrawler.
-
Method Summary
Method and Description
- build(uriList): Builds and returns a new instance of JWeaverCrawler with the configured parameters.
- exportConfiguration(ExportConfig configuration): Sets the export configuration for configuring data export options.
- httpClient(HttpClient httpClient): Sets the HTTP client to be used by the crawler for making HTTP requests.
- maxDepth(int maxDepth): Sets the maximum crawl depth.
- parser(DocumentParser documentParser): Sets the document parser to be used by the crawler for parsing relevant information from the HTML body.
- politenessDelay(Duration duration): Sets the politeness delay between consecutive requests made by the crawler to the same host.
- writer(JWeaverWriter writer): Sets the writer for exporting the crawled data.
-
Method Details
-
httpClient
Sets the HTTP client to be used by the crawler for making HTTP requests.
Default: if no client is provided, a default HttpClient is used with a connection timeout of 5 seconds and a follow-redirects policy of ALWAYS.
- Parameters:
httpClient
- The HTTP client to be used. (Optional)
- Returns:
- This builder instance.
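The defaults described above can be reproduced when supplying a custom client. The following is a minimal sketch, assuming the HttpClient parameter is the JDK's java.net.http.HttpClient (the page does not state the package); the class name CrawlerHttpClientSketch is hypothetical and exists only for illustration.

```java
import java.net.http.HttpClient;
import java.time.Duration;

// Sketch only. Assumption: the builder accepts a java.net.http.HttpClient.
final class CrawlerHttpClientSketch {

    // Builds a client that mirrors the documented defaults:
    // a 5-second connection timeout and redirects always followed.
    static HttpClient documentedDefaults() {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .followRedirects(HttpClient.Redirect.ALWAYS)
                .build();
    }

    public static void main(String[] args) {
        HttpClient client = documentedDefaults();
        System.out.println(client.connectTimeout());
        System.out.println(client.followRedirects());
        // The client would then be passed to the builder, for example
        // builder.httpClient(client), where `builder` is a JWeaverCrawler.Builder.
    }
}
```

Starting from these values makes it easier to change a single setting, such as a longer timeout, without silently dropping the redirect behaviour the default client provides.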
-
parser
Sets the document parser to be used by the crawler for parsing relevant information from the HTML body.
Default: JWeaverDocumentParser is used if no parser is provided.
- Parameters:
documentParser
- The document parser to be used. (Optional)
- Returns:
- This builder instance.
-
writer
Sets the writer for exporting the crawled data.
Default: a JWeaverFileWriter is used if no writer is provided.
- Parameters:
writer
- The writer for exporting data. (Optional)
- Returns:
- This builder instance for method chaining.
-
exportConfiguration
Sets the export configuration for configuring data export options.
Default: ExportConfig.exportDefault() is used if no configuration is provided; it uses the Markdown format and the 'output/' path for files.
- Parameters:
configuration
- The export configuration. (Optional)
- Returns:
- This builder instance for method chaining.
-
politenessDelay
Sets the politeness delay between consecutive requests made by the crawler to the same host.
Default: 3 seconds.
- Parameters:
duration
- The politeness delay duration. (Optional)
- Returns:
- This builder instance for method chaining.
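To illustrate what a per-host politeness delay means in practice, here is a small conceptual sketch. It is not JWeaver's implementation; the class PolitenessSketch and its reserve method are hypothetical names, and the sketch only computes how long a caller would have to wait before the next request to a given host.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch only; not how JWeaver is known to implement the delay.
final class PolitenessSketch {
    private final Duration delay;
    // Time at which the most recent request to each host was allowed to go out.
    private final Map<String, Instant> lastRequest = new HashMap<>();

    PolitenessSketch(Duration delay) {
        this.delay = delay;
    }

    // Returns how long to wait before contacting `host` at time `now`,
    // and records the time at which that request may be sent.
    Duration reserve(String host, Instant now) {
        Instant last = lastRequest.get(host);
        Instant allowed = now;
        if (last != null && last.plus(delay).isAfter(now)) {
            allowed = last.plus(delay);
        }
        lastRequest.put(host, allowed);
        return Duration.between(now, allowed);
    }

    public static void main(String[] args) {
        PolitenessSketch gate = new PolitenessSketch(Duration.ofSeconds(3)); // documented default
        Instant start = Instant.now();
        System.out.println(gate.reserve("example.com", start));
        System.out.println(gate.reserve("example.com", start.plusSeconds(1)));
        System.out.println(gate.reserve("example.org", start.plusSeconds(1)));
    }
}
```

The delay is tracked per host, so a request to a second host is not held back by a recent request to the first. A value such as Duration.ofSeconds(3) is what would be passed to politenessDelay.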
-
maxDepth
Sets the maximum crawl depth.
Default: 3.
- Parameters:
maxDepth
- The maximum depth of crawling. (Optional)
- Returns:
- This builder instance for method chaining.
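The effect of a depth limit can be pictured with a breadth-first walk over a small in-memory link graph. This is a sketch under assumptions: it counts the seed page as depth 0, which this page does not confirm for JWeaver, and the class MaxDepthSketch and its crawl method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Conceptual sketch only. Assumption: the seed page is at depth 0.
final class MaxDepthSketch {

    // Visits pages level by level and stops expanding once maxDepth is reached.
    static Set<String> crawl(Map<String, List<String>> links, String seed, int maxDepth) {
        Set<String> visited = new LinkedHashSet<>();
        List<String> frontier = List.of(seed);
        for (int depth = 0; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String page : frontier) {
                // Only expand pages that have not been seen before.
                if (visited.add(page)) {
                    next.addAll(links.getOrDefault(page, List.of()));
                }
            }
            frontier = next;
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
                "a", List.of("b"),
                "b", List.of("c"),
                "c", List.of("d"),
                "d", List.of("e"));
        // With the documented default of 3, page "e" (depth 4) is not visited.
        System.out.println(crawl(links, "a", 3));
    }
}
```

A lower limit keeps the crawl close to the seed pages; a higher one can grow the number of visited pages quickly on densely linked sites.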
-
build
Builds and returns a new instance of JWeaverCrawler with the configured parameters.
- Parameters:
uriList
- The required initial set of URIs to start crawling from. Each URI should be from a different host. For each URI, a new JWeaverTask (execution) will be created.
- Returns:
- A new instance of JWeaverCrawler.
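Since each seed URI should come from a different host, it can help to check the seeds before calling build. The helper below is a hypothetical pre-check written against the JDK only; it is not part of JWeaver, and whether build itself validates hosts is not stated here. The collection type expected by build is also not stated on this page, so the chained call appears only as a comment.

```java
import java.net.URI;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; not part of the JWeaver API.
final class SeedUriSketch {

    // Returns true when every URI has a host and no two URIs share one.
    // Hosts are compared case-insensitively.
    static boolean hostsAreDistinct(Collection<URI> uris) {
        Set<String> hosts = new HashSet<>();
        for (URI uri : uris) {
            String host = uri.getHost();
            if (host == null || !hosts.add(host.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<URI> seeds = List.of(
                URI.create("https://example.com/docs"),
                URI.create("https://example.org/"));
        System.out.println(hostsAreDistinct(seeds));
        // With a JWeaverCrawler.Builder named `builder`, the documented methods
        // could then be chained, for example:
        // builder.politenessDelay(Duration.ofSeconds(3)).maxDepth(3).build(seeds)
    }
}
```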
-