Class JWeaverBuilderImpl
java.lang.Object
org.jweaver.crawler.internal.runner.JWeaverBuilderImpl
- All Implemented Interfaces:
JWeaverCrawler.Builder
A concrete implementation of the JWeaverCrawler.Builder interface, used to configure and build instances of JWeaverCrawler.
Constructor Summary
Constructors
JWeaverBuilderImpl() - Constructs a new JWeaverBuilderImpl instance.
Method Summary
Method - Description
build - Builds and returns a new instance of JWeaverCrawler with the configured parameters.
exportConfiguration(ExportConfig exportConfiguration) - Sets the export configuration for configuring data export options.
httpClient(HttpClient httpClient) - Sets the HTTP client to be used by the crawler for making HTTP requests.
maxDepth(int maxDepth) - Sets the maximum depth of crawling.
parser(DocumentParser documentParser) - Sets the document parser to be used by the crawler for parsing relevant information from HTML body.
politenessDelay(Duration politenessDelay) - Sets the politeness delay between consecutive requests made by the crawler to the same host.
writer(JWeaverWriter writer) - Sets the writer for exporting the crawled data.
-
Constructor Details
-
JWeaverBuilderImpl
public JWeaverBuilderImpl()
Constructs a new JWeaverBuilderImpl instance.
-
-
Method Details
-
httpClient
Description copied from interface: JWeaverCrawler.Builder
Sets the HTTP client to be used by the crawler for making HTTP requests.
Default: if no client is provided, a default HttpClient is used, with a connection timeout of 5 seconds and a follow-redirects policy of ALWAYS.
- Specified by:
httpClient in interface JWeaverCrawler.Builder
- Parameters:
httpClient - The HTTP client to be used. (Optional)
- Returns:
This builder instance.
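As an illustration of the documented defaults, the sketch below builds a client with a 5-second connection timeout and an ALWAYS redirect policy. It assumes that HttpClient here refers to the JDK's java.net.http.HttpClient; the class name DefaultClientSketch and the method defaultLikeClient are hypothetical names introduced for this example, and the code is not taken from the library itself.

```java
import java.net.http.HttpClient;
import java.time.Duration;

// Hypothetical helper, not part of JWeaver: builds a JDK HttpClient that
// mirrors the defaults described above.
class DefaultClientSketch {

    // Assumption: the crawler accepts a java.net.http.HttpClient.
    static HttpClient defaultLikeClient() {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))       // 5-second connection timeout
                .followRedirects(HttpClient.Redirect.ALWAYS) // always follow redirects
                .build();
    }

    public static void main(String[] args) {
        HttpClient client = defaultLikeClient();
        // connectTimeout() returns an Optional<Duration>.
        System.out.println(client.connectTimeout());
        System.out.println(client.followRedirects());
    }
}
```

A client built this way could be passed to httpClient(...) when different timeouts or redirect handling are needed; adjusting the two builder calls is enough to deviate from the defaults.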
-
parser
Description copied from interface: JWeaverCrawler.Builder
Sets the document parser to be used by the crawler for parsing relevant information from HTML body.
Default: if no parser is provided, the JWeaverDocumentParser is used.
- Specified by:
parser in interface JWeaverCrawler.Builder
- Parameters:
documentParser - The document parser to be used. (Optional)
- Returns:
This builder instance.
-
writer
Description copied from interface: JWeaverCrawler.Builder
Sets the writer for exporting the crawled data.
Default: if no writer is provided, a JWeaverFileWriter is used.
- Specified by:
writer in interface JWeaverCrawler.Builder
- Parameters:
writer - The writer for exporting data. (Optional)
- Returns:
This builder instance for method chaining.
-
exportConfiguration
Description copied from interface: JWeaverCrawler.Builder
Sets the export configuration for configuring data export options.
Default: if no configuration is provided, ExportConfig.exportDefault() is used, which exports in Markdown format to the 'output/' path.
- Specified by:
exportConfiguration in interface JWeaverCrawler.Builder
- Parameters:
exportConfiguration - The export configuration. (Optional)
- Returns:
This builder instance for method chaining.
-
politenessDelay
Description copied from interface: JWeaverCrawler.Builder
Sets the politeness delay between consecutive requests made by the crawler to the same host.
Default: 3 seconds.
- Specified by:
politenessDelay in interface JWeaverCrawler.Builder
- Parameters:
politenessDelay - The politeness delay duration. (Optional)
- Returns:
This builder instance for method chaining.
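To make the idea of a per-host politeness delay concrete, here is a minimal sketch of one way such a delay could be enforced. It is not the library's implementation: the class PolitenessSketch and its reserve method are hypothetical, and the sketch only computes how long a caller should wait rather than sleeping itself.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not JWeaver code: tracks the last scheduled
// request per host and reports how long the next request must wait.
class PolitenessSketch {
    private final Duration delay;
    private final Map<String, Instant> lastRequest = new HashMap<>();

    PolitenessSketch(Duration delay) {
        this.delay = delay;
    }

    // Returns the wait time before a request to `host` may be sent at `now`,
    // and records the time at which that request is scheduled.
    Duration reserve(String host, Instant now) {
        Instant last = lastRequest.get(host);
        Instant earliest = (last == null) ? now : last.plus(delay);
        Instant scheduled = earliest.isAfter(now) ? earliest : now;
        lastRequest.put(host, scheduled);
        return Duration.between(now, scheduled);
    }

    public static void main(String[] args) {
        PolitenessSketch sketch = new PolitenessSketch(Duration.ofSeconds(3));
        Instant start = Instant.now();
        System.out.println(sketch.reserve("example.com", start));                // first request: no wait
        System.out.println(sketch.reserve("example.com", start.plusSeconds(1))); // same host: must wait
        System.out.println(sketch.reserve("example.org", start.plusSeconds(1))); // other host: no wait
    }
}
```

Requests to different hosts do not delay each other in this sketch, which matches the "same host" wording of the description; whether the library schedules requests exactly this way is not stated in this page.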
-
maxDepth
Description copied from interface: JWeaverCrawler.Builder
Sets the maximum depth of crawling.
Default: 3.
- Specified by:
maxDepth in interface JWeaverCrawler.Builder
- Parameters:
maxDepth - The maximum depth of crawling. (Optional)
- Returns:
This builder instance for method chaining.
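The effect of a depth limit can be sketched with a breadth-first traversal over an in-memory link graph. This is an illustration only: the class DepthLimitSketch is hypothetical, and it assumes that the seed page counts as depth 0 and that pages at exactly maxDepth are visited but not expanded. How the library itself counts depth is not specified in this page.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, not JWeaver code: depth-limited breadth-first
// traversal over a map from page name to the pages it links to.
class DepthLimitSketch {

    // Assumption: the seed is depth 0; links are followed only from pages
    // whose depth is strictly below maxDepth.
    static Set<String> crawl(Map<String, List<String>> links, String seed, int maxDepth) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<Map.Entry<String, Integer>> queue = new ArrayDeque<>();
        visited.add(seed);
        queue.add(Map.entry(seed, 0));
        while (!queue.isEmpty()) {
            Map.Entry<String, Integer> current = queue.poll();
            if (current.getValue() >= maxDepth) {
                continue; // depth limit reached: do not follow this page's links
            }
            for (String next : links.getOrDefault(current.getKey(), List.of())) {
                if (visited.add(next)) {
                    queue.add(Map.entry(next, current.getValue() + 1));
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
                "a", List.of("b"),
                "b", List.of("c"),
                "c", List.of("d"),
                "d", List.of("e"));
        System.out.println(crawl(links, "a", 3));
    }
}
```

Under these assumptions, a chain of pages a, b, c, d, e crawled with a limit of 3 reaches d but never e.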
-
build
Description copied from interface: JWeaverCrawler.Builder
Builds and returns a new instance of JWeaverCrawler with the configured parameters.
- Specified by:
build in interface JWeaverCrawler.Builder
- Parameters:
uriSet - The required initial set of URIs to start crawling from. Each URI should be from a different host. For each URI, a new JWeaverTask (execution) will be created.
- Returns:
A new instance of JWeaverCrawler.
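Because each seed URI should come from a different host, it can be useful to check a seed set before passing it to build. The sketch below is one possible pre-check, assuming the seeds are held in a Set of java.net.URI (the exact parameter type is not shown in this page). The class SeedCheckSketch and the method hostsAreDistinct are hypothetical and not part of the library.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not JWeaver code: verifies that every seed URI has
// a host and that no two seeds share the same host.
class SeedCheckSketch {

    static boolean hostsAreDistinct(Set<URI> seeds) {
        Set<String> hosts = new HashSet<>();
        for (URI uri : seeds) {
            String host = uri.getHost();
            // Host names are case-insensitive, so compare them in lower case.
            if (host == null || !hosts.add(host.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<URI> seeds = Set.of(
                URI.create("https://example.com/docs"),
                URI.create("https://example.org/"));
        System.out.println(hostsAreDistinct(seeds));
    }
}
```

Whether the library rejects, merges, or silently accepts seeds that share a host is not stated here, so a check like this is only a precaution on the caller's side.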
-