reliq is an HTML parsing and searching tool built around its own matching language, designed for flexibility and efficiency. It can be used from the command line or as a library.
Features
- Custom Matching Language: Effective querying of HTML documents to extract data based on specific criteria.
- Benchmarking: Performance metrics are available in the project's benchmark results.
- Syntax Highlighting: Highlighting for reliq expressions is available for vim, with examples.
Usage Examples
The manual documents the tool and its expression language in full. A few examples:
# Get all `div` tags with class `tile`
reliq 'div class="tile"'
# Fetch `div` tags with class `tile` and id `current`
reliq 'div .tile #current' index.html
# Retrieve tags that contain no other tags
reliq '* c@[0]' index.html
# Print the `href` of every hyperlink at level 6 or greater
reliq 'a href @l[6:] | "%@(href)v\n"' index.html
# Select any tag with class `cont` whose id does not start with `img-` (the leading `-` negates the match)
reliq '* .cont -#b>img-' index.html
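To try the expressions above without a real page, a small test file helps. The following is a minimal sketch: the file name `sample.html` and its contents are made up for illustration, and the helper `run_reliq` is hypothetical. It only calls `reliq` when the binary is found on `PATH`, and it assumes the `reliq EXPRESSION FILE` invocation shown in the examples.

```shell
# Write a tiny, made-up HTML document to experiment with.
cat > sample.html <<'EOF'
<html>
  <body>
    <div class="tile" id="current"><a href="https://example.org/a">A</a></div>
    <div class="tile"><a href="https://example.org/b">B</a></div>
    <div class="cont" id="img-1"><img src="one.png"></div>
  </body>
</html>
EOF

# Hypothetical helper: run one expression against sample.html,
# or report that nothing ran if reliq is not installed.
run_reliq() {
    if command -v reliq >/dev/null 2>&1; then
        reliq "$1" sample.html
    else
        echo "reliq not found on PATH; skipped: $1" >&2
    fi
}

# Expressions taken verbatim from the examples above.
run_reliq 'div .tile #current'
run_reliq '* c@[0]'
```

Keeping the expression in single quotes matters: it stops the shell from interpreting characters such as `*`, `#`, and `|` before reliq sees them.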
JSON-Like Output
The tool allows for structured output in a JSON-like format:
reliq '.links.a a href | "%@(href)v\n", img src | "%@(src)v\n"'
This expression extracts the `href` of each hyperlink and the `src` of each image and returns them in a structured, JSON-like form.
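Structured output is convenient to hand to a standard JSON tool. A minimal sketch, assuming the output is valid JSON (the text above only calls it JSON-like) and that `python3` is available for pretty-printing; `index.html` is a placeholder file name.

```shell
# Expression copied from the example above.
expr='.links.a a href | "%@(href)v\n", img src | "%@(src)v\n"'

if command -v reliq >/dev/null 2>&1 && [ -f index.html ]; then
    # json.tool exits non-zero when its input is not valid JSON,
    # so this doubles as a quick sanity check and a pretty-printer.
    reliq "$expr" index.html | python3 -m json.tool
else
    echo "reliq or index.html missing; nothing to do" >&2
fi
```

If the validator rejects the output, the JSON assumption does not hold for that expression and the raw text should be processed instead.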
Integration
A Python interface to reliq is available as reliq-python. Several notable projects have used reliq for their scraping needs.
For the full expression syntax, see the examples in the usage manual. reliq adapts to a wide range of scraping tasks, which makes it a practical choice for developers extracting data from HTML.