GitHub - KevenGustavo/HTML-Analyzer: Pure Java solution (zero dependencies) for DOM depth analysis and HTML validation. Optimized O(N) stack-based algorithm.

HtmlAnalyzer - Software Development Intern Challenge

This project is a pure Java solution (no external libraries) designed to analyze the HTML content of a URL and extract the text contained at the deepest level of the structure.
The solution was developed strictly adhering to the functional and technical requirements, including malformed HTML validation (bonus feature) and robust connection error handling.

Prerequisites

Java JDK 17 installed and configured in the PATH.
Internet connection to access the provided URL.

How to Compile and Run

The project is designed to be compiled and executed via the command line, without the need for IDEs or external build tools (Maven/Gradle).

1. Compilation

Navigate to the directory where the HtmlAnalyzer.java file is located and run the following command:

javac HtmlAnalyzer.java

2. Execution

After compilation, run the program passing the desired URL as an argument:

java HtmlAnalyzer http://hiring.axreng.com/internship/example1.html

Implemented Features

Deepest Level Extraction: The algorithm traverses the DOM structure and returns the text located at the deepest nesting level.
Tie-Breaking Rule: If multiple text snippets exist at the same maximum depth, the program returns the first one found, as per the specification.
Malformed HTML Detection (Bonus): The solution identifies structural inconsistencies in the HTML. The program outputs "malformed HTML" if it encounters:
- Closing tags without a corresponding opening tag (e.g., </div> without an open <div>).
- Incorrectly crossed tags (e.g., <div><span></div>).
- Tags that remain open at the end of the file.
Error Handling: Outputs "URL connection error" in cases of network failures, invalid URLs, or timeouts.
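
The extraction, tie-breaking, and validation rules above can be illustrated with a minimal sketch. It is not the code in HtmlAnalyzer.java. It assumes that every line holds exactly one opening tag, one closing tag, or one piece of text, and that tags carry no attributes. The class name DeepestTextSketch, the method analyze, and the empty-string result for input without text are hypothetical choices made for this illustration. It uses java.util.ArrayDeque as the stack rather than java.util.Stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Illustrative sketch only: names and input assumptions are hypothetical.
class DeepestTextSketch {

    // Returns the first text line found at the greatest nesting depth,
    // or "malformed HTML" if the tags are not properly nested.
    static String analyze(List<String> lines) {
        Deque<String> open = new ArrayDeque<>(); // names of currently open tags
        String deepest = null;
        int maxDepth = -1;

        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // blank lines carry no structure
            }
            if (line.startsWith("</") && line.endsWith(">")) {
                String name = line.substring(2, line.length() - 1);
                // A closing tag must match the most recently opened tag.
                if (open.isEmpty() || !open.pop().equals(name)) {
                    return "malformed HTML";
                }
            } else if (line.startsWith("<") && line.endsWith(">")) {
                open.push(line.substring(1, line.length() - 1));
            } else if (open.size() > maxDepth) {
                // Strictly greater, so the first text at a given depth wins ties.
                maxDepth = open.size();
                deepest = line;
            }
        }
        // Tags still open at the end of the input are also an error.
        if (!open.isEmpty()) {
            return "malformed HTML";
        }
        return deepest == null ? "" : deepest; // assumption: empty result when no text exists
    }

    public static void main(String[] args) {
        List<String> page = List.of(
                "<html>", "<body>",
                "<div>", "First", "</div>",
                "<p>", "Second", "</p>",
                "</body>", "</html>");
        System.out.println(analyze(page)); // prints "First"
    }
}
```

Both text lines in the example sit at depth 3, so the strict comparison keeps the earlier one. Each line is pushed or popped at most once, which is where the linear running time comes from.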

Design and Architecture Decisions

To ensure performance and compliance with the "Zero Dependencies" constraint, the following decisions were made:

Stack-Based Algorithm: A java.util.Stack data structure was used. This allows tracking the current depth and validating correct tag nesting at runtime with linear complexity O(N), ideal for parsing hierarchical structures like HTML.
Stream Processing: The use of BufferedReader enables line-by-line processing. This optimizes memory usage by avoiding loading the entire page content into memory before processing it.
Standard JDK API: The solution uses only native libraries (java.net, java.io, java.util), ensuring total portability and complying with the prohibition of third-party libraries or XML/DOM parsing classes.
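
As a rough illustration of the stream-processing decision, the sketch below reads a URL line by line with BufferedReader and treats any I/O failure as a connection error. It is not taken from HtmlAnalyzer.java. The class name StreamingFetchSketch, the methods forEachLine and process, the 5-second timeout, and the UTF-8 encoding are all assumptions made for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.util.function.Consumer;

// Illustrative sketch only: names, timeout, and charset are assumptions.
class StreamingFetchSketch {

    static final int TIMEOUT_MS = 5_000; // assumed value, not from the project

    // Hands each line to the handler as soon as it is read, so the full
    // page body is never held in memory. Returns the number of lines read.
    static int forEachLine(Reader source, Consumer<String> handler) throws IOException {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(line);
                count++;
            }
        }
        return count;
    }

    // Returns true if the page was read completely, false on an invalid URL,
    // a network failure, or a timeout (all of these surface as IOException).
    static boolean process(String address, Consumer<String> handler) {
        try {
            URLConnection conn = new URL(address).openConnection();
            conn.setConnectTimeout(TIMEOUT_MS);
            conn.setReadTimeout(TIMEOUT_MS);
            forEachLine(new InputStreamReader(conn.getInputStream(), "UTF-8"), handler);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String address = args.length > 0 ? args[0] : "";
        boolean ok = process(address, line -> { /* parse the line here */ });
        System.out.println(ok ? "done" : "URL connection error");
    }
}
```

Because forEachLine accepts any Reader, the same loop can be exercised without network access, for example with a java.io.StringReader.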

Author: Keven Gustavo Dos Santos Gomes
