Index
All Classes and Interfaces|All Packages|Constant Field Values|Serialized Form
B
- body() - Method in record class org.jweaver.crawler.internal.result.ResponseData
-
Returns the value of the body record component.
- build(Set<String>) - Method in class org.jweaver.crawler.internal.runner.JWeaverBuilderImpl
- build(Set<String>) - Method in interface org.jweaver.crawler.JWeaverCrawler.Builder
-
Builds and returns a new instance of JWeaverCrawler with the configured parameters.
- builder() - Static method in interface org.jweaver.crawler.JWeaverCrawler
-
Returns a new instance of the builder for configuring and creating a JWeaverCrawler.
- BuilderValidator - Class in org.jweaver.crawler.internal.util
-
The BuilderValidator class provides utility methods for validating builder parameters.
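The builder() and build(Set<String>) entries above follow the familiar fluent-builder pattern, with validation before the instance is created. The following is a minimal, self-contained sketch of that pattern, not the library's implementation. CrawlSettings, its default values, and the exception messages are hypothetical stand-ins; only the method names builder, maxDepth, politenessDelay, and build(Set<String>) are taken from this index.

```java
import java.time.Duration;
import java.util.Set;

// Hypothetical stand-in for a built crawler configuration; not a JWeaver type.
record CrawlSettings(Set<String> baseUris, int maxDepth, Duration politenessDelay) {

    // Entry point mirroring the static builder() method listed in this index.
    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        // Assumed defaults, chosen only for illustration.
        private int maxDepth = 1;
        private Duration politenessDelay = Duration.ofSeconds(1);

        Builder maxDepth(int maxDepth) {
            if (maxDepth < 0) {
                throw new IllegalArgumentException("maxDepth must not be negative");
            }
            this.maxDepth = maxDepth;
            return this; // returning this allows calls to be chained
        }

        Builder politenessDelay(Duration politenessDelay) {
            if (politenessDelay == null) {
                throw new IllegalArgumentException("politenessDelay must not be null");
            }
            this.politenessDelay = politenessDelay;
            return this;
        }

        // Validates the base URIs (non-null, non-empty) and creates the immutable result.
        CrawlSettings build(Set<String> baseUris) {
            if (baseUris == null || baseUris.isEmpty()) {
                throw new IllegalArgumentException("at least one base URI is required");
            }
            return new CrawlSettings(Set.copyOf(baseUris), maxDepth, politenessDelay);
        }
    }
}
```

A call chain under these assumptions would look like CrawlSettings.builder().maxDepth(3).build(Set.of("https://example.com")).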
C
- characters() - Method in record class org.jweaver.crawler.internal.result.Metadata
-
Returns the value of the characters record component.
- child() - Method in record class org.jweaver.crawler.internal.result.Connection
-
Returns the value of the child record component.
- Connection - Record Class in org.jweaver.crawler.internal.result
-
The Connection record represents a connection between a parent URI and a child URI, along with the depth of the connection.
- Connection(String, String, int) - Constructor for record class org.jweaver.crawler.internal.result.Connection
-
Creates an instance of a Connection record class.
- CONNECTIONS_PREFIX - Static variable in class org.jweaver.crawler.internal.util.Constants
-
The prefix for connections.
- Constants - Class in org.jweaver.crawler.internal.util
-
This class contains constants used throughout the crawling process.
- content() - Method in record class org.jweaver.crawler.internal.result.ErrorResultPage
-
Returns the value of the content record component.
- content() - Method in interface org.jweaver.crawler.internal.result.ResultPage
-
Returns the content of the result page.
- content() - Method in record class org.jweaver.crawler.internal.result.SuccessResultPage
-
Returns the value of the content record component.
- CONTENT_TYPE_STR - Static variable in class org.jweaver.crawler.internal.util.Constants
-
The header key for specifying the content type.
- create() - Static method in class org.jweaver.crawler.internal.runner.TaskExecutorImpl
-
Creates a new instance of TaskExecutorImpl.
- create() - Static method in class org.jweaver.crawler.internal.write.JWeaverFileWriter
-
Constructs a new JWeaverFileWriter instance.
- create(PageLink, String) - Static method in record class org.jweaver.crawler.internal.result.ErrorResultPage
-
Creates an ErrorResultPage instance based on the provided PageLink and error content.
- create(PageLink, String, String, Set<PageLink>) - Static method in record class org.jweaver.crawler.internal.result.SuccessResultPage
-
Creates a SuccessResultPage instance based on the provided PageLink, title, content, and link set.
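Many entries in this index (component accessors such as child(), plus equals, hashCode, and toString) are generated by the compiler for record classes. As a small illustration of what those generated members do, here is a stand-in record with the same component names as Connection. The component order (parent, child, depth) is an assumption inferred from the constructor signature Connection(String, String, int), and the URLs are placeholders.

```java
// Local stand-in that reuses the component names listed in this index.
// The component order (parent, child, depth) is an assumption.
record Connection(String parent, String child, int depth) {}

class ConnectionDemo {
    public static void main(String[] args) {
        Connection a = new Connection("https://example.com", "https://example.com/about", 1);
        Connection b = new Connection("https://example.com", "https://example.com/about", 1);

        // Accessors are named after the record components.
        System.out.println(a.parent() + " -> " + a.child() + " at depth " + a.depth());

        // equals and hashCode compare component values, not object identity.
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true

        // toString lists the record name and every component.
        System.out.println(a);
    }
}
```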
D
- DEFAULT_OUTPUT_PATH - Static variable in class org.jweaver.crawler.internal.util.Constants
-
The default output path for file export.
- depth() - Method in record class org.jweaver.crawler.internal.result.Connection
-
Returns the value of the depth record component.
- depth() - Method in record class org.jweaver.crawler.internal.result.ErrorResultPage
-
Returns the value of the depth record component.
- depth() - Method in record class org.jweaver.crawler.internal.result.Metadata
-
Returns the value of the depth record component.
- depth() - Method in record class org.jweaver.crawler.internal.result.NodeError
-
Returns the value of the depth record component.
- depth() - Method in record class org.jweaver.crawler.internal.result.PageLink
-
Returns the value of the depth record component.
- depth() - Method in record class org.jweaver.crawler.internal.result.SuccessResultPage
-
Returns the value of the depth record component.
- DocumentParser - Interface in org.jweaver.crawler.internal.parse
-
The DocumentParser interface defines methods for extracting relevant information from HTML documents.
E
- equals(Object) - Method in record class org.jweaver.crawler.internal.result.Connection
-
Indicates whether some other object is "equal to" this one.
- equals(Object) - Method in record class org.jweaver.crawler.internal.result.ErrorResultPage
-
Indicates whether some other object is "equal to" this one.
- equals(Object) - Method in record class org.jweaver.crawler.internal.result.Metadata
-
Indicates whether some other object is "equal to" this one.
- equals(Object) - Method in record class org.jweaver.crawler.internal.result.NodeError
-
Indicates whether some other object is "equal to" this one.
- equals(Object) - Method in record class org.jweaver.crawler.internal.result.PageLink
-
Indicates whether some other object is "equal to" this one.
- equals(Object) - Method in record class org.jweaver.crawler.internal.result.ResponseData
-
Indicates whether some other object is "equal to" this one.
- equals(Object) - Method in record class org.jweaver.crawler.internal.result.SuccessResultPage
-
Indicates whether some other object is "equal to" this one.
- equals(Object) - Method in record class org.jweaver.crawler.internal.write.JsonExportConfig
-
Indicates whether some other object is "equal to" this one.
- equals(Object) - Method in record class org.jweaver.crawler.internal.write.MarkdownExportConfig
-
Indicates whether some other object is "equal to" this one.
- error() - Method in record class org.jweaver.crawler.internal.result.NodeError
-
Returns the value of the error record component.
- ErrorResultPage - Record Class in org.jweaver.crawler.internal.result
-
The ErrorResultPage record represents a result page containing an error encountered during web crawling.
- ErrorResultPage(String, int, String) - Constructor for record class org.jweaver.crawler.internal.result.ErrorResultPage
-
Creates an instance of an ErrorResultPage record class.
- ERRORS_PREFIX - Static variable in class org.jweaver.crawler.internal.util.Constants
-
The prefix for errors.
- ExportConfig - Interface in org.jweaver.crawler.internal.write
-
The ExportConfig interface defines methods for configuring data export options.
- exportConfiguration(ExportConfig) - Method in class org.jweaver.crawler.internal.runner.JWeaverBuilderImpl
- exportConfiguration(ExportConfig) - Method in interface org.jweaver.crawler.JWeaverCrawler.Builder
-
Sets the export configuration for configuring data export options.
- exportDefault() - Static method in interface org.jweaver.crawler.internal.write.ExportConfig
-
Creates and returns a new ExportConfig instance with default settings for exporting data in Markdown format.
- ExportFileFormat - Enum Class in org.jweaver.crawler.internal.write
-
The ExportFileFormat enum represents the file formats supported for data export.
- exportJson(String, boolean) - Static method in interface org.jweaver.crawler.internal.write.ExportConfig
-
Creates and returns a new ExportConfig instance configured for exporting data in JSON format.
- exportMarkdown(String) - Static method in interface org.jweaver.crawler.internal.write.ExportConfig
-
Creates and returns a new ExportConfig instance configured for exporting data in Markdown format.
- extension() - Method in enum class org.jweaver.crawler.internal.write.ExportFileFormat
-
Returns the file extension associated with the file format.
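The ExportConfig, ExportFileFormat, JsonExportConfig, and MarkdownExportConfig entries above describe an interface with static factory methods backed by two records. The sketch below re-creates that shape in a self-contained form to show how the pieces could fit together. It is not the library's source: the extension strings, the default path "output", and the Markdown metadata value are assumptions, and the types are declared locally.

```java
// Local re-creation of the shapes listed in this index; not the library's code.
enum ExportFileFormat {
    JSON(".json"),     // assumed extension string
    MARKDOWN(".md");   // assumed extension string

    private final String extension;

    ExportFileFormat(String extension) {
        this.extension = extension;
    }

    String extension() {
        return extension;
    }
}

interface ExportConfig {
    String path();
    boolean metadata();
    ExportFileFormat format();

    // Default settings export Markdown, as the index states; the path is assumed.
    static ExportConfig exportDefault() {
        return new MarkdownExportConfig("output");
    }

    static ExportConfig exportJson(String path, boolean metadata) {
        return new JsonExportConfig(path, metadata);
    }

    static ExportConfig exportMarkdown(String path) {
        return new MarkdownExportConfig(path);
    }
}

// path() and metadata() are satisfied by the generated record accessors.
record JsonExportConfig(String path, boolean metadata) implements ExportConfig {
    public ExportFileFormat format() {
        return ExportFileFormat.JSON;
    }
}

record MarkdownExportConfig(String path) implements ExportConfig {
    public boolean metadata() {
        return false; // assumed value; the index does not state it
    }

    public ExportFileFormat format() {
        return ExportFileFormat.MARKDOWN;
    }
}
```

Keeping the factories on the interface lets callers choose a format without naming the concrete record types.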
F
- FILE_EXPORT_DT_FORMAT - Static variable in class org.jweaver.crawler.internal.util.Constants
-
The date-time format for file export.
- FileUtils - Class in org.jweaver.crawler.internal.util
-
This utility class provides file-related operations.
- format() - Method in interface org.jweaver.crawler.internal.write.ExportConfig
-
Returns the file format for exported data.
- format() - Method in record class org.jweaver.crawler.internal.write.JsonExportConfig
-
Retrieves the export file format, which is JSON for this configuration.
- format() - Method in record class org.jweaver.crawler.internal.write.MarkdownExportConfig
-
Retrieves the export file format, which is Markdown for this configuration.
H
- hashCode() - Method in record class org.jweaver.crawler.internal.result.Connection
-
Returns a hash code value for this object.
- hashCode() - Method in record class org.jweaver.crawler.internal.result.ErrorResultPage
-
Returns a hash code value for this object.
- hashCode() - Method in record class org.jweaver.crawler.internal.result.Metadata
-
Returns a hash code value for this object.
- hashCode() - Method in record class org.jweaver.crawler.internal.result.NodeError
-
Returns a hash code value for this object.
- hashCode() - Method in record class org.jweaver.crawler.internal.result.PageLink
-
Returns a hash code value for this object.
- hashCode() - Method in record class org.jweaver.crawler.internal.result.ResponseData
-
Returns a hash code value for this object.
- hashCode() - Method in record class org.jweaver.crawler.internal.result.SuccessResultPage
-
Returns a hash code value for this object.
- hashCode() - Method in record class org.jweaver.crawler.internal.write.JsonExportConfig
-
Returns a hash code value for this object.
- hashCode() - Method in record class org.jweaver.crawler.internal.write.MarkdownExportConfig
-
Returns a hash code value for this object.
- httpClient(HttpClient) - Method in class org.jweaver.crawler.internal.runner.JWeaverBuilderImpl
- httpClient(HttpClient) - Method in interface org.jweaver.crawler.JWeaverCrawler.Builder
-
Sets the HTTP client to be used by the crawler for making HTTP requests.
I
- isAllowedContentType(String) - Static method in class org.jweaver.crawler.internal.util.URIHelper
-
Checks if a content type is allowed.
- isAllowedUrl(String) - Static method in class org.jweaver.crawler.internal.util.URIHelper
-
Checks if a URL's extension is allowed.
- isExternalUri(String, String) - Static method in class org.jweaver.crawler.internal.util.URIHelper
-
Checks if the child URI is external to the base URI.
- isSuccess() - Method in record class org.jweaver.crawler.internal.result.ResponseData
-
Checks if the response indicates a successful request.
- isValidUri(String) - Static method in class org.jweaver.crawler.internal.util.URIHelper
-
Checks if the provided URI is valid.
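The URIHelper checks listed above (isValidUri, isExternalUri) can be approximated with java.net.URI. The sketch below is a hypothetical stand-in, not the library's logic: the class name UriChecks, the rule that a valid URI must be absolute and have a host, and the stripping of a leading "www." before comparing hosts are all assumptions made for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical stand-in; the real URIHelper may apply different rules.
final class UriChecks {
    private UriChecks() {}

    // Assumed rule: valid means parseable, absolute, and carrying a host.
    static boolean isValidUri(String candidate) {
        return hostOf(candidate) != null;
    }

    // Assumed rule: external means the normalized hosts differ.
    static boolean isExternalUri(String baseUri, String childUri) {
        String baseHost = hostOf(baseUri);
        String childHost = hostOf(childUri);
        if (baseHost == null || childHost == null) {
            return true; // input without a usable host is conservatively treated as external
        }
        return !baseHost.equals(childHost);
    }

    // Returns the lower-cased host without a leading "www.", or null if unavailable.
    private static String hostOf(String candidate) {
        if (candidate == null || candidate.isBlank()) {
            return null;
        }
        try {
            URI uri = new URI(candidate.trim());
            if (!uri.isAbsolute() || uri.getHost() == null) {
                return null;
            }
            String host = uri.getHost().toLowerCase(Locale.ROOT);
            return host.startsWith("www.") ? host.substring(4) : host;
        } catch (URISyntaxException e) {
            return null;
        }
    }
}
```

Under these assumptions, https://www.example.com and https://example.com/about count as the same site, while a link to another host counts as external.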
J
- JSON - Enum constant in enum class org.jweaver.crawler.internal.write.ExportFileFormat
-
JSON file format.
- JsonExportConfig - Record Class in org.jweaver.crawler.internal.write
-
The JsonExportConfig record represents the configuration for exporting data in JSON format.
- JsonExportConfig(String, boolean) - Constructor for record class org.jweaver.crawler.internal.write.JsonExportConfig
-
Creates an instance of a JsonExportConfig record class.
- JWeaverBuilderImpl - Class in org.jweaver.crawler.internal.runner
-
A concrete implementation of the JWeaverCrawler.Builder interface used to configure and build instances of JWeaverCrawler.
- JWeaverBuilderImpl() - Constructor for class org.jweaver.crawler.internal.runner.JWeaverBuilderImpl
-
Constructs a new JWeaverBuilderImpl instance.
- JWeaverCrawler - Interface in org.jweaver.crawler
-
Represents the JWeaverCrawler interface, which facilitates web crawling operations.
- JWeaverCrawler.Builder - Interface in org.jweaver.crawler
-
The Builder interface provides methods for building and customizing an instance of JWeaverCrawler.
- JWeaverCrawlerImpl - Class in org.jweaver.crawler.internal.runner
-
A concrete implementation of JWeaverCrawler providing web crawling functionality.
- JWeaverCrawlerImpl(JWeaverBuilderImpl) - Constructor for class org.jweaver.crawler.internal.runner.JWeaverCrawlerImpl
-
Constructs a new JWeaverCrawlerImpl instance.
- JWeaverDocumentParser - Class in org.jweaver.crawler.internal.parse
-
The JWeaverDocumentParser class is responsible for parsing HTML documents to extract relevant information.
- JWeaverDocumentParser() - Constructor for class org.jweaver.crawler.internal.parse.JWeaverDocumentParser
-
Constructs a new JWeaverDocumentParser instance.
- JWeaverExecutionException - Exception Class in org.jweaver.crawler.internal.exception
-
The JWeaverExecutionException class represents an unchecked exception that occurs during the execution of a JWeaver task.
- JWeaverExecutionException(String) - Constructor for exception class org.jweaver.crawler.internal.exception.JWeaverExecutionException
-
Constructs a new JWeaverExecutionException (RuntimeException) with the specified detail message.
- JWeaverFileWriter - Class in org.jweaver.crawler.internal.write
-
A concrete implementation of the JWeaverWriter interface for writing data to files.
- JWeaverTask - Class in org.jweaver.crawler.internal.runner
-
Handles the crawling process for a base URI.
- JWeaverWriter - Interface in org.jweaver.crawler.internal.write
-
The JWeaverWriter interface defines methods for processing and writing the results of the web crawling process.
L
- linkSet() - Method in record class org.jweaver.crawler.internal.result.SuccessResultPage
-
Returns the value of the linkSet record component.
M
- MARKDOWN - Enum constant in enum class org.jweaver.crawler.internal.write.ExportFileFormat
-
Markdown file format.
- MarkdownExportConfig - Record Class in org.jweaver.crawler.internal.write
-
The MarkdownExportConfig record represents the configuration for exporting data in Markdown format.
- MarkdownExportConfig(String) - Constructor for record class org.jweaver.crawler.internal.write.MarkdownExportConfig
-
Creates an instance of a MarkdownExportConfig record class.
- maxDepth(int) - Method in class org.jweaver.crawler.internal.runner.JWeaverBuilderImpl
- maxDepth(int) - Method in interface org.jweaver.crawler.JWeaverCrawler.Builder
-
Sets the maximum depth of crawling.
- metadata() - Method in record class org.jweaver.crawler.internal.result.SuccessResultPage
-
Returns the value of the metadata record component.
- metadata() - Method in interface org.jweaver.crawler.internal.write.ExportConfig
-
Returns a boolean indicating whether metadata should be included in the export.
- metadata() - Method in record class org.jweaver.crawler.internal.write.JsonExportConfig
-
Returns the value of the metadata record component.
- metadata() - Method in record class org.jweaver.crawler.internal.write.MarkdownExportConfig
-
Specifies whether metadata should be included in the export.
- Metadata - Record Class in org.jweaver.crawler.internal.result
-
The Metadata record represents metadata associated with a web page.
- Metadata(String, int, String, int) - Constructor for record class org.jweaver.crawler.internal.result.Metadata
-
Creates an instance of a Metadata record class.
- mkdir(File, boolean) - Static method in class org.jweaver.crawler.internal.util.FileUtils
-
Creates a directory if it does not exist.
N
- NodeError - Record Class in org.jweaver.crawler.internal.result
-
The NodeError record represents an error associated with a specific node during web crawling.
- NodeError(String, int, String) - Constructor for record class org.jweaver.crawler.internal.result.NodeError
-
Creates an instance of a NodeError record class.
O
- org.jweaver.crawler - package org.jweaver.crawler
- org.jweaver.crawler.internal.exception - package org.jweaver.crawler.internal.exception
- org.jweaver.crawler.internal.parse - package org.jweaver.crawler.internal.parse
- org.jweaver.crawler.internal.result - package org.jweaver.crawler.internal.result
- org.jweaver.crawler.internal.runner - package org.jweaver.crawler.internal.runner
- org.jweaver.crawler.internal.util - package org.jweaver.crawler.internal.util
- org.jweaver.crawler.internal.write - package org.jweaver.crawler.internal.write
- OutputFileException - Exception Class in org.jweaver.crawler.internal.exception
-
The OutputFileException class represents an unchecked exception that occurs when there is an issue with an output file or directory.
- OutputFileException(Exception) - Constructor for exception class org.jweaver.crawler.internal.exception.OutputFileException
-
Constructs a new OutputFileException (RuntimeException) with the specified cause.
P
- PageLink - Record Class in org.jweaver.crawler.internal.result
-
The PageLink record represents a link to a web page along with its depth in the crawling hierarchy.
- PageLink(String, int) - Constructor for record class org.jweaver.crawler.internal.result.PageLink
-
Creates an instance of a PageLink record class.
- parent() - Method in record class org.jweaver.crawler.internal.result.Connection
-
Returns the value of the parent record component.
- parseBody(String, String) - Method in interface org.jweaver.crawler.internal.parse.DocumentParser
-
Parses the HTML body of a web page and extracts the main content body.
- parseBody(String, String) - Method in class org.jweaver.crawler.internal.parse.JWeaverDocumentParser
- parseLinks(String, String) - Method in interface org.jweaver.crawler.internal.parse.DocumentParser
-
Parses the HTML body of a web page and extracts the links contained within it.
- parseLinks(String, String) - Method in class org.jweaver.crawler.internal.parse.JWeaverDocumentParser
- parser(DocumentParser) - Method in class org.jweaver.crawler.internal.runner.JWeaverBuilderImpl
- parser(DocumentParser) - Method in interface org.jweaver.crawler.JWeaverCrawler.Builder
-
Sets the document parser to be used by the crawler for extracting relevant information from the HTML body.
- parseTitle(String, String) - Method in interface org.jweaver.crawler.internal.parse.DocumentParser
-
Parses the HTML body of a web page and extracts the title.
- parseTitle(String, String) - Method in class org.jweaver.crawler.internal.parse.JWeaverDocumentParser
- path() - Method in interface org.jweaver.crawler.internal.write.ExportConfig
-
Returns the path where exported data will be saved.
- path() - Method in record class org.jweaver.crawler.internal.write.JsonExportConfig
-
Returns the value of the path record component.
- path() - Method in record class org.jweaver.crawler.internal.write.MarkdownExportConfig
-
Returns the value of the path record component.
- politenessDelay(Duration) - Method in class org.jweaver.crawler.internal.runner.JWeaverBuilderImpl
- politenessDelay(Duration) - Method in interface org.jweaver.crawler.JWeaverCrawler.Builder
-
Sets the politeness delay between consecutive requests made by the crawler to the same host.
- processConnectionMap(String, List<Connection>, ExportConfig) - Method in class org.jweaver.crawler.internal.write.JWeaverFileWriter
- processConnectionMap(String, List<Connection>, ExportConfig) - Method in interface org.jweaver.crawler.internal.write.JWeaverWriter
-
Processes connection map information generated during crawling and writes it using the provided export configuration.
- processErrors(String, List<NodeError>, ExportConfig) - Method in class org.jweaver.crawler.internal.write.JWeaverFileWriter
- processErrors(String, List<NodeError>, ExportConfig) - Method in interface org.jweaver.crawler.internal.write.JWeaverWriter
-
Processes errors encountered during crawling and writes error information using the provided export configuration.
- processSuccess(SuccessResultPage, ExportConfig) - Method in class org.jweaver.crawler.internal.write.JWeaverFileWriter
- processSuccess(SuccessResultPage, ExportConfig) - Method in interface org.jweaver.crawler.internal.write.JWeaverWriter
-
Processes a successfully crawled page and writes the result using the provided export configuration.
R
- requireNonEmpty(Collection<T>) - Static method in class org.jweaver.crawler.internal.util.BuilderValidator
-
Validates that the specified collection is not null.
- requireNonEmpty(Collection<T>, String) - Static method in class org.jweaver.crawler.internal.util.BuilderValidator
-
Validates that the specified collection is not null or empty.
- requireNonEmpty(T) - Static method in class org.jweaver.crawler.internal.util.BuilderValidator
-
Validates that the specified object is not null.
- requireNonEmpty(T, String) - Static method in class org.jweaver.crawler.internal.util.BuilderValidator
-
Validates that the specified object is not null or empty.
- ResponseData<T> - Record Class in org.jweaver.crawler.internal.result
-
The ResponseData record represents the response data received from a web request.
- ResponseData(int, T) - Constructor for record class org.jweaver.crawler.internal.result.ResponseData
-
Creates an instance of a ResponseData record class.
- ResultPage - Interface in org.jweaver.crawler.internal.result
-
The ResultPage interface represents a result page obtained during web crawling.
- retrievedOn() - Method in record class org.jweaver.crawler.internal.result.Metadata
-
Returns the value of the retrievedOn record component.
- run() - Method in class org.jweaver.crawler.internal.runner.JWeaverCrawlerImpl
- run() - Method in interface org.jweaver.crawler.JWeaverCrawler
-
Runs the executions sequentially.
- run(List<JWeaverTask>) - Method in interface org.jweaver.crawler.internal.runner.TaskExecutor
-
Executes the specified list of tasks sequentially.
- run(List<JWeaverTask>) - Method in class org.jweaver.crawler.internal.runner.TaskExecutorImpl
- RUNNER_THREAD_NAME - Static variable in class org.jweaver.crawler.internal.util.Constants
-
The prefix for the runner thread name.
- runParallel() - Method in class org.jweaver.crawler.internal.runner.JWeaverCrawlerImpl
- runParallel() - Method in interface org.jweaver.crawler.JWeaverCrawler
-
Runs the executions in parallel. This should be the preferred way to run the crawler.
- runParallel(List<JWeaverTask>) - Method in interface org.jweaver.crawler.internal.runner.TaskExecutor
-
Executes the specified list of tasks in parallel.
- runParallel(List<JWeaverTask>) - Method in class org.jweaver.crawler.internal.runner.TaskExecutorImpl
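The run and runParallel entries above distinguish sequential from parallel execution of a task list. The sketch below illustrates that difference with plain Runnable tasks and a fixed thread pool. It is a hypothetical stand-in: TaskRunnerSketch is not a library class, JWeaverTask is replaced by Runnable, and the one-thread-per-task pool size is an assumption chosen for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a task executor; not the library's TaskExecutorImpl.
final class TaskRunnerSketch {
    private TaskRunnerSketch() {}

    // Sequential: each task finishes before the next one starts.
    static void run(List<Runnable> tasks) {
        for (Runnable task : tasks) {
            task.run();
        }
    }

    // Parallel: tasks are submitted together and the call waits for all of them.
    static void runParallel(List<Runnable> tasks) {
        if (tasks.isEmpty()) {
            return;
        }
        // Assumed sizing: one thread per task.
        ExecutorService pool = Executors.newFixedThreadPool(tasks.size());
        try {
            List<Callable<Object>> calls = new ArrayList<>();
            for (Runnable task : tasks) {
                calls.add(Executors.callable(task));
            }
            pool.invokeAll(calls); // blocks until every task has completed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

When each task crawls a different base URI and spends most of its time waiting on the network, the parallel variant would typically finish sooner, which is consistent with the index recommending runParallel.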
S
- source() - Method in record class org.jweaver.crawler.internal.result.Metadata
-
Returns the value of the source record component.
- statusCode() - Method in record class org.jweaver.crawler.internal.result.ResponseData
-
Returns the value of the statusCode record component.
- SuccessResultPage - Record Class in org.jweaver.crawler.internal.result
-
The SuccessResultPage record represents a successful result page obtained during web crawling.
- SuccessResultPage(String, String, String, Set<PageLink>, Metadata, int) - Constructor for record class org.jweaver.crawler.internal.result.SuccessResultPage
-
Creates an instance of a SuccessResultPage record class.
T
- TaskExecutor - Interface in org.jweaver.crawler.internal.runner
-
The TaskExecutor interface defines methods for executing tasks either in parallel or sequentially.
- TaskExecutorImpl - Class in org.jweaver.crawler.internal.runner
-
A concrete implementation of the TaskExecutor interface responsible for executing tasks.
- title() - Method in record class org.jweaver.crawler.internal.result.SuccessResultPage
-
Returns the value of the title record component.
- toString() - Method in record class org.jweaver.crawler.internal.result.Connection
-
Returns a string representation of this record class.
- toString() - Method in record class org.jweaver.crawler.internal.result.ErrorResultPage
-
Returns a string representation of this record class.
- toString() - Method in record class org.jweaver.crawler.internal.result.Metadata
-
Returns a string representation of this record class.
- toString() - Method in record class org.jweaver.crawler.internal.result.NodeError
-
Returns a string representation of this record class.
- toString() - Method in record class org.jweaver.crawler.internal.result.PageLink
-
Returns a string representation of this record class.
- toString() - Method in record class org.jweaver.crawler.internal.result.ResponseData
-
Returns a string representation of this record class.
- toString() - Method in record class org.jweaver.crawler.internal.result.SuccessResultPage
-
Returns a string representation of this record class.
- toString() - Method in record class org.jweaver.crawler.internal.write.JsonExportConfig
-
Returns a string representation of this record class.
- toString() - Method in record class org.jweaver.crawler.internal.write.MarkdownExportConfig
-
Returns a string representation of this record class.
U
- uri() - Method in record class org.jweaver.crawler.internal.result.ErrorResultPage
-
Returns the value of the uri record component.
- uri() - Method in record class org.jweaver.crawler.internal.result.NodeError
-
Returns the value of the uri record component.
- uri() - Method in interface org.jweaver.crawler.internal.result.ResultPage
-
Returns the URI of the result page.
- uri() - Method in record class org.jweaver.crawler.internal.result.SuccessResultPage
-
Returns the value of the uri record component.
- URIHelper - Class in org.jweaver.crawler.internal.util
-
The URIHelper class provides utility methods for handling and validating URIs.
- url() - Method in record class org.jweaver.crawler.internal.result.PageLink
-
Returns the value of the url record component.
V
- valueOf(String) - Static method in enum class org.jweaver.crawler.internal.write.ExportFileFormat
-
Returns the enum constant of this class with the specified name.
- values() - Static method in enum class org.jweaver.crawler.internal.write.ExportFileFormat
-
Returns an array containing the constants of this enum class, in the order they are declared.
W
- writer(JWeaverWriter) - Method in class org.jweaver.crawler.internal.runner.JWeaverBuilderImpl
- writer(JWeaverWriter) - Method in interface org.jweaver.crawler.JWeaverCrawler.Builder
-
Sets the writer for exporting the crawled data.
- WRITER_THREAD_NAME - Static variable in class org.jweaver.crawler.internal.util.Constants
-
The prefix for the writer thread name.
- WWW_STR - Static variable in class org.jweaver.crawler.internal.util.Constants
-
The string representation for 'www'.