Class JWeaverBuilderImpl
java.lang.Object
org.jweaver.crawler.internal.runner.JWeaverBuilderImpl
- All Implemented Interfaces:
JWeaverCrawler.Builder
A concrete implementation of the JWeaverCrawler.Builder interface, used to configure and build instances of JWeaverCrawler.
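The builder follows an optional-setter-with-default pattern: every setter is optional, and only the URI set passed to build is required. The class below, CrawlerSettingsSketch, is a hypothetical plain-Java sketch of that pattern and is not part of JWeaver. Only the default values come from this page (a maximum depth of 3 and a 3-second politeness delay). The validation rules are assumptions added for illustration.

```java
import java.net.URI;
import java.time.Duration;
import java.util.Set;

// Hypothetical sketch, NOT the JWeaver API: optional settings fall back to
// defaults, and the required seed set is supplied to build().
final class CrawlerSettingsSketch {
    final int maxDepth;
    final Duration politenessDelay;
    final Set<URI> seeds;

    private CrawlerSettingsSketch(int maxDepth, Duration politenessDelay, Set<URI> seeds) {
        this.maxDepth = maxDepth;
        this.politenessDelay = politenessDelay;
        this.seeds = Set.copyOf(seeds); // defensive, immutable copy
    }

    static final class Builder {
        private int maxDepth = 3;                                 // default documented for maxDepth
        private Duration politenessDelay = Duration.ofSeconds(3); // default documented for politenessDelay

        // Each setter returns this builder so calls can be chained.
        // The non-negative checks below are assumptions, not documented behavior.
        Builder maxDepth(int maxDepth) {
            if (maxDepth < 0) throw new IllegalArgumentException("maxDepth must be >= 0");
            this.maxDepth = maxDepth;
            return this;
        }

        Builder politenessDelay(Duration politenessDelay) {
            if (politenessDelay == null || politenessDelay.isNegative()) {
                throw new IllegalArgumentException("politenessDelay must be a non-negative duration");
            }
            this.politenessDelay = politenessDelay;
            return this;
        }

        // The seed set is the only required input.
        CrawlerSettingsSketch build(Set<URI> uriSet) {
            if (uriSet == null || uriSet.isEmpty()) {
                throw new IllegalArgumentException("uriSet is required and must not be empty");
            }
            return new CrawlerSettingsSketch(maxDepth, politenessDelay, uriSet);
        }
    }
}
```

Because the seed set is the only required input, it is passed to build rather than to a setter. A caller who sets nothing still gets a usable configuration.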
Constructor Summary
- JWeaverBuilderImpl(): Constructs a new JWeaverBuilderImpl instance.
Method Summary
- build(uriSet): Builds and returns a new instance of JWeaverCrawler with the configured parameters.
- exportConfiguration(ExportConfig exportConfiguration): Sets the export configuration, which controls the data export options.
- httpClient(HttpClient httpClient): Sets the HTTP client the crawler uses to make HTTP requests.
- maxDepth(int maxDepth): Sets the maximum crawl depth.
- parser(DocumentParser documentParser): Sets the document parser the crawler uses to extract relevant information from the HTML body.
- politenessDelay(Duration politenessDelay): Sets the politeness delay between consecutive requests the crawler makes to the same host.
- writer(JWeaverWriter writer): Sets the writer used to export the crawled data.
Constructor Details
-
JWeaverBuilderImpl
public JWeaverBuilderImpl()
Constructs a new JWeaverBuilderImpl instance.
Method Details
-
httpClient
Description copied from interface: JWeaverCrawler.Builder
Sets the HTTP client the crawler uses to make HTTP requests.
Default: if no client is provided, a default HttpClient is used with a 5-second connection timeout and the redirect policy set to ALWAYS.
- Specified by:
httpClient in interface JWeaverCrawler.Builder
- Parameters:
httpClient - The HTTP client to use. (Optional)
- Returns:
This builder instance.
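The standard java.net.http API can build a client matching the documented default. The sketch below uses only JDK calls. The class and method names, DefaultClientSketch and defaultClient, are hypothetical, and this page does not show how JWeaver constructs its own default internally.

```java
import java.net.http.HttpClient;
import java.time.Duration;

// Hypothetical helper: builds a client matching the default described for
// httpClient() (5-second connection timeout, Redirect.ALWAYS).
final class DefaultClientSketch {
    static HttpClient defaultClient() {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))       // limits connection setup only, not the whole request
                .followRedirects(HttpClient.Redirect.ALWAYS) // follows every redirect, including HTTPS -> HTTP
                .build();
    }
}
```

Redirect.ALWAYS differs from Redirect.NORMAL in that it also follows redirects from HTTPS to HTTP URLs. Supplying your own client through httpClient replaces this default entirely, so any timeout or redirect settings you need must be set on that client.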
-
parser
Description copied from interface: JWeaverCrawler.Builder
Sets the document parser the crawler uses to extract relevant information from the HTML body.
Default: JWeaverDocumentParser is used if no parser is provided.
- Specified by:
parser in interface JWeaverCrawler.Builder
- Parameters:
documentParser - The document parser to use. (Optional)
- Returns:
This builder instance.
-
writer
Description copied from interface: JWeaverCrawler.Builder
Sets the writer used to export the crawled data.
Default: a JWeaverFileWriter is used if no writer is provided.
- Specified by:
writer in interface JWeaverCrawler.Builder
- Parameters:
writer - The writer used to export data. (Optional)
- Returns:
This builder instance for method chaining.
-
exportConfiguration
Description copied from interface: JWeaverCrawler.Builder
Sets the export configuration, which controls the data export options.
Default: ExportConfig.exportDefault(), which uses the Markdown format and writes files under the 'output/' path.
- Specified by:
exportConfiguration in interface JWeaverCrawler.Builder
- Parameters:
exportConfiguration - The export configuration. (Optional)
- Returns:
This builder instance for method chaining.
-
politenessDelay
Description copied from interface: JWeaverCrawler.Builder
Sets the politeness delay between consecutive requests the crawler makes to the same host.
Default: 3 seconds.
- Specified by:
politenessDelay in interface JWeaverCrawler.Builder
- Parameters:
politenessDelay - The politeness delay duration. (Optional)
- Returns:
This builder instance for method chaining.
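A per-host politeness delay is commonly enforced by recording when each host was last scheduled and holding back the next request to that host until the delay has elapsed. PolitenessTracker below is a hypothetical sketch of that idea, not JWeaver's implementation. It takes the current time as a parameter (epoch milliseconds) so that its behavior is deterministic.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-host politeness tracker (not JWeaver internals).
// Enforces a minimum gap between requests to the same host.
final class PolitenessTracker {
    private final long delayMillis;
    private final Map<String, Long> lastScheduled = new HashMap<>();

    PolitenessTracker(Duration delay) {
        this.delayMillis = delay.toMillis();
    }

    // Returns how many milliseconds the caller must wait before requesting `host`
    // at time `nowMillis`, and records the request at the time it will actually run.
    synchronized long reserve(String host, long nowMillis) {
        Long last = lastScheduled.get(host);
        long earliest = (last == null) ? nowMillis : Math.max(nowMillis, last + delayMillis);
        lastScheduled.put(host, earliest);
        return earliest - nowMillis;
    }
}
```

With a 3-second delay, a second request to the same host 1 second after the first must wait 2 seconds. A request to a different host proceeds immediately, because the delay is tracked per host.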
-
maxDepth
Description copied from interface: JWeaverCrawler.Builder
Sets the maximum crawl depth.
Default: 3.
- Specified by:
maxDepth in interface JWeaverCrawler.Builder
- Parameters:
maxDepth - The maximum crawl depth. (Optional)
- Returns:
This builder instance for method chaining.
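A depth limit is usually applied as a breadth-first traversal that stops expanding links once a page reaches the maximum depth. The sketch below runs over an in-memory link map instead of fetching pages. DepthLimitedCrawl is a hypothetical name. The sketch assumes the seed is at depth 0 and that pages at depth maxDepth are visited but not expanded; this page does not say how JWeaver counts depth.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical depth-limited breadth-first traversal over an in-memory link graph.
// Assumption: the seed is depth 0; pages at depth maxDepth are visited but not expanded.
final class DepthLimitedCrawl {
    static List<String> visit(Map<String, List<String>> links, String seed, int maxDepth) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>(List.of(seed)); // avoids revisiting pages and looping on cycles
        Deque<Map.Entry<String, Integer>> queue = new ArrayDeque<>();
        queue.add(Map.entry(seed, 0));
        while (!queue.isEmpty()) {
            Map.Entry<String, Integer> current = queue.poll();
            order.add(current.getKey());
            if (current.getValue() >= maxDepth) continue; // do not follow links past the limit
            for (String next : links.getOrDefault(current.getKey(), List.of())) {
                if (seen.add(next)) queue.add(Map.entry(next, current.getValue() + 1));
            }
        }
        return order;
    }
}
```

With links a → b, c and b → d → e, a maximum depth of 2 visits a, b, c and d and never reaches e.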
-
build
Description copied from interface: JWeaverCrawler.Builder
Builds and returns a new instance of JWeaverCrawler with the configured parameters.
- Specified by:
build in interface JWeaverCrawler.Builder
- Parameters:
uriSet - The required initial set of URIs to start crawling from. Each URI should belong to a different host; one JWeaverTask (execution) is created per URI.
- Returns:
A new instance of JWeaverCrawler.
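The build(uriSet) contract calls for one seed per host and creates one task per seed, so a caller may want to check a seed set before building. SeedCheck.tasksByHost below is a hypothetical helper. It groups seeds by lower-cased host and rejects a set in which two seeds share a host. This page does not say whether JWeaver rejects such a set or only recommends against it, so the rejection is an assumed policy.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical pre-check for the build(uriSet) contract (not part of JWeaver).
// Maps each lower-cased host to its single seed URI; one entry stands for one task.
final class SeedCheck {
    static Map<String, URI> tasksByHost(Set<URI> uriSet) {
        if (uriSet == null || uriSet.isEmpty()) {
            throw new IllegalArgumentException("uriSet is required and must not be empty");
        }
        Map<String, URI> byHost = new TreeMap<>();
        for (URI uri : uriSet) {
            String host = uri.getHost(); // null for relative or host-less URIs
            if (host == null) throw new IllegalArgumentException("URI has no host: " + uri);
            URI previous = byHost.putIfAbsent(host.toLowerCase(Locale.ROOT), uri);
            if (previous != null) { // assumed policy: reject duplicate hosts
                throw new IllegalArgumentException("Seeds share host " + host + ": " + previous + ", " + uri);
            }
        }
        return byHost;
    }
}
```

Host names are compared case-insensitively, matching how java.net.URI treats hosts in equals(), so https://A.example/ and https://a.example/x count as the same host.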