Interface JWeaverCrawler.Builder

All Known Implementing Classes:
JWeaverBuilderImpl
Enclosing interface:
JWeaverCrawler

public static interface JWeaverCrawler.Builder
The Builder interface provides methods for building and customizing an instance of JWeaverCrawler.
  • Method Details

    • httpClient

      JWeaverCrawler.Builder httpClient(HttpClient httpClient)
      Sets the HTTP client to be used by the crawler for making HTTP requests.

      Default: if no client is provided, a default HttpClient is used, with a connection timeout of 5 seconds and a redirect policy of ALWAYS.

      Parameters:
      httpClient - The HTTP client to be used. (Optional)
      Returns:
      This builder instance.
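
The documented defaults can be reproduced explicitly. The sketch below assumes that the HttpClient parameter is the JDK's java.net.http.HttpClient, which this page does not state; the class name DefaultLikeHttpClient is hypothetical and exists only for illustration.

```java
import java.net.http.HttpClient;
import java.time.Duration;

// Hypothetical helper, not part of JWeaver.
class DefaultLikeHttpClient {

    // Builds a client that mirrors the documented defaults:
    // a 5-second connection timeout and a redirect policy of ALWAYS.
    // Assumption: the crawler accepts a java.net.http.HttpClient.
    static HttpClient create() {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .followRedirects(HttpClient.Redirect.ALWAYS)
                .build();
    }
}
```

A client built this way could be adjusted (for example, with a longer timeout) and then passed to httpClient(...).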
    • parser

      JWeaverCrawler.Builder parser(DocumentParser documentParser)
      Sets the document parser to be used by the crawler for parsing relevant information from the HTML body.

      Default: uses the JWeaverDocumentParser if not provided.

      Parameters:
      documentParser - The document parser to be used. (Optional)
      Returns:
      This builder instance.
    • writer

      Sets the writer for exporting the crawled data.

      Default: uses a JWeaverFileWriter if not provided.

      Parameters:
      writer - The writer for exporting data. (Optional)
      Returns:
      This builder instance for method chaining.
    • exportConfiguration

      JWeaverCrawler.Builder exportConfiguration(ExportConfig configuration)
      Sets the export configuration for configuring data export options.

      Default: uses ExportConfig.exportDefault(), which exports in Markdown format and writes files to the 'output/' path.

      Parameters:
      configuration - The export configuration. (Optional)
      Returns:
      This builder instance for method chaining.
    • politenessDelay

      JWeaverCrawler.Builder politenessDelay(Duration duration)
      Sets the politeness delay between consecutive requests made by the crawler to the same host.

      Default: 3 seconds.

      Parameters:
      duration - The politeness delay duration. (Optional)
      Returns:
      This builder instance for method chaining.
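
To illustrate what a per-host politeness delay means, here is a minimal sketch of the idea. It is not JWeaver's implementation, which this page does not describe; the PolitenessTracker class and its methods are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a per-host politeness delay; not JWeaver code.
class PolitenessTracker {
    private final Duration delay;
    private final Map<String, Instant> lastRequestByHost = new HashMap<>();

    PolitenessTracker(Duration delay) {
        this.delay = delay;
    }

    // Returns how long to wait before contacting the host at time 'now'.
    // Hosts that have not been contacted yet need no wait.
    Duration waitBefore(String host, Instant now) {
        Instant last = lastRequestByHost.get(host);
        if (last == null) {
            return Duration.ZERO;
        }
        Instant earliest = last.plus(delay);
        return earliest.isAfter(now) ? Duration.between(now, earliest) : Duration.ZERO;
    }

    // Records that a request was sent to the host at the given time.
    void recordRequest(String host, Instant when) {
        lastRequestByHost.put(host, when);
    }
}
```

Because the delay is tracked per host, requests to different hosts do not hold each other up in this sketch.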
    • maxDepth

      JWeaverCrawler.Builder maxDepth(int maxDepth)
      Sets the maximum depth of crawling.

      Default: 3

      Parameters:
      maxDepth - The maximum depth of crawling. (Optional)
      Returns:
      This builder instance for method chaining.
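
A depth limit can be pictured as a breadth-first traversal that stops expanding links past a given level. The sketch below runs over a hypothetical in-memory link graph and assumes that the seed page counts as depth 0; this page does not say how JWeaver counts depth, so treat that as an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of depth-limited crawling; not JWeaver code.
class DepthLimitedCrawl {

    // 'links' maps each page to the pages it links to (a stand-in for fetching).
    // Assumption: the seed is at depth 0, its links at depth 1, and so on.
    static Set<String> reachable(String seed, Map<String, List<String>> links, int maxDepth) {
        Set<String> visited = new LinkedHashSet<>();
        List<String> level = List.of(seed);
        for (int depth = 0; depth <= maxDepth && !level.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String page : level) {
                // Expand each page once, even if several pages link to it.
                if (visited.add(page)) {
                    next.addAll(links.getOrDefault(page, List.of()));
                }
            }
            level = next;
        }
        return visited;
    }
}
```

Under this counting, a chain a -> b -> c -> d with maxDepth 2 visits a, b, and c but not d.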
    • build

      JWeaverCrawler build(Set<String> uriList)
      Builds and returns a new instance of JWeaverCrawler with the configured parameters.
      Parameters:
      uriList - The required initial set of URIs to start crawling from. Each URI should be from a different host. For each URI, a new JWeaverTask (execution) will be created.
      Returns:
      A new instance of JWeaverCrawler.
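
Since each seed URI should come from a different host, it may help to check the set before calling build. The helper below is a hypothetical pre-check built on the JDK's java.net.URI; it is not part of JWeaver, and this page does not say whether build performs a similar validation.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical pre-check for the seed set; not part of JWeaver.
class SeedHosts {

    // Returns true when every URI has a host and no two URIs share one.
    // Host names are compared case-insensitively.
    // URI.create throws IllegalArgumentException for malformed input.
    static boolean allHostsDistinct(Set<String> uriList) {
        Set<String> hosts = new HashSet<>();
        for (String uri : uriList) {
            String host = URI.create(uri).getHost();
            if (host == null || !hosts.add(host.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }
}
```

A set that passes this check matches the documented expectation that each URI starts its own JWeaverTask on a separate host.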