All Known Implementing Classes: JWeaverDocumentParser

public interface DocumentParser

The DocumentParser interface defines methods for extracting information from HTML documents. Implementations parse the HTML content of a web page and extract three things from it: the page title, the main content body, and the links the page contains.
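The sketch below shows how calling code might depend only on this interface, so that any implementation (such as JWeaverDocumentParser) can be swapped in. The interface declaration is reconstructed from the method signatures documented on this page. `ParsedPage`, `FixedDocumentParser`, and `CrawlerStep` are hypothetical names introduced for illustration; they are not part of the documented API. The stub returns fixed values so the example runs without a real parser. The record requires Java 16 or later.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// The interface as documented on this page (method signatures only).
interface DocumentParser {
    String parseTitle(String htmlBody, String pageUri);

    String parseBody(String htmlBody, String pageUri);

    Set<String> parseLinks(String htmlBody, String pageUri);
}

// Hypothetical value type bundling the three pieces a parser extracts.
record ParsedPage(String uri, String title, String body, Set<String> links) {}

// Hypothetical stub that ignores its input and returns fixed values;
// it exists only so the example runs without a real implementation.
class FixedDocumentParser implements DocumentParser {
    @Override
    public String parseTitle(String htmlBody, String pageUri) {
        return "Example";
    }

    @Override
    public String parseBody(String htmlBody, String pageUri) {
        return "Hello, world.";
    }

    @Override
    public Set<String> parseLinks(String htmlBody, String pageUri) {
        return new LinkedHashSet<>(Set.of("https://example.com/next"));
    }
}

public class CrawlerStep {
    // Hypothetical helper: it depends only on the DocumentParser interface,
    // so any implementation can be passed in.
    static ParsedPage parse(DocumentParser parser, String html, String uri) {
        return new ParsedPage(
                uri,
                parser.parseTitle(html, uri),
                parser.parseBody(html, uri),
                parser.parseLinks(html, uri));
    }

    public static void main(String[] args) {
        ParsedPage page = parse(new FixedDocumentParser(),
                "<html><head><title>Example</title></head></html>",
                "https://example.com/");
        // Prints: Example -> [https://example.com/next]
        System.out.println(page.title() + " -> " + page.links());
    }
}
```

Because `parse` accepts the interface type rather than a concrete class, a crawler built this way could switch parsers without changing its own code.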

Method Summary

Modifier and Type | Method | Description
--- | --- | ---
String | parseBody(String htmlBody, String pageUri) | Parses the HTML body of a web page and extracts the main content body.
Set<String> | parseLinks(String htmlBody, String pageUri) | Parses the HTML body of a web page and extracts the links contained within it.
String | parseTitle(String htmlBody, String pageUri) | Parses the HTML body of a web page and extracts the title.

Method Details
- parseTitle
  
  String parseTitle(String htmlBody, String pageUri)
  
  Parses the HTML body of a web page and extracts the title.
  
  Parameters:
  
  htmlBody - The HTML body of the web page.
  
  pageUri - The URI of the web page.
  
  Returns:
  
  The title of the web page.
- parseBody
  
  String parseBody(String htmlBody, String pageUri)
  
  Parses the HTML body of a web page and extracts the main content body.
  
  Parameters:
  
  htmlBody - The HTML body of the web page.
  
  pageUri - The URI of the web page.
  
  Returns:
  
  The main content body of the web page.
- parseLinks
  
  Set<String> parseLinks(String htmlBody, String pageUri)
  
  Parses the HTML body of a web page and extracts the links contained within it.
  
  Parameters:
  
  htmlBody - The HTML body of the web page.
  
  pageUri - The URI of the web page.
  
  Returns:
  
  A set of URIs representing the links found in the web page.
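As a rough illustration of what the three methods might do, here is a minimal regex-based sketch. It is not JWeaverDocumentParser, and it rests on assumptions this page does not confirm: that `htmlBody` is the full HTML source of the page, that a missing title yields an empty string, and that `pageUri` is used to resolve relative `href` values into absolute URIs. `RegexDocumentParser` and `RegexParserDemo` are hypothetical names. Regular expressions cannot handle real-world HTML reliably; a production implementation would use a proper HTML parser.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The interface as documented on this page (method signatures only).
interface DocumentParser {
    String parseTitle(String htmlBody, String pageUri);
    String parseBody(String htmlBody, String pageUri);
    Set<String> parseLinks(String htmlBody, String pageUri);
}

// Hypothetical, naive implementation for illustration only.
class RegexDocumentParser implements DocumentParser {
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern BODY = Pattern.compile(
            "<body[^>]*>(.*?)</body>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    @Override
    public String parseTitle(String htmlBody, String pageUri) {
        Matcher m = TITLE.matcher(htmlBody);
        // Assumption: return an empty string when no <title> element is found.
        return m.find() ? m.group(1).trim() : "";
    }

    @Override
    public String parseBody(String htmlBody, String pageUri) {
        Matcher m = BODY.matcher(htmlBody);
        String body = m.find() ? m.group(1) : htmlBody;
        // Replace tags with spaces, then collapse runs of whitespace.
        return TAG.matcher(body).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }

    @Override
    public Set<String> parseLinks(String htmlBody, String pageUri) {
        Set<String> links = new LinkedHashSet<>();
        // Assumption: pageUri is the base for resolving relative links.
        URI base = URI.create(pageUri);
        Matcher m = HREF.matcher(htmlBody);
        while (m.find()) {
            try {
                links.add(base.resolve(new URI(m.group(1).trim())).toString());
            } catch (URISyntaxException e) {
                // Skip href values that are not syntactically valid URIs.
            }
        }
        return links;
    }
}

public class RegexParserDemo {
    public static void main(String[] args) {
        DocumentParser parser = new RegexDocumentParser();
        String html = "<html><head><title> Docs </title></head><body>"
                + "<h1>Hi</h1><p>See <a href=\"/about\">about</a> and "
                + "<a href='guide.html'>guide</a>.</p></body></html>";
        String uri = "https://example.com/docs/index.html";
        System.out.println(parser.parseTitle(html, uri)); // Docs
        System.out.println(parser.parseBody(html, uri));  // Hi See about and guide .
        // [https://example.com/about, https://example.com/docs/guide.html]
        System.out.println(parser.parseLinks(html, uri));
    }
}
```

Under these assumptions, a page at https://example.com/docs/index.html linking to "/about" and "guide.html" yields the absolute URIs https://example.com/about and https://example.com/docs/guide.html. Returning a Set means a link that appears several times on the page is reported once.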
