Notifier is a versatile web scraper that monitors websites for changes and sends notifications. Easily define scrape rules with CSS selectors in a JSON config file and format alerts with Liquid templates. Designed for cron job integration, it allows for flexible scheduling to suit various monitoring needs.
Notifier: Config-Driven Web Scraper and Monitoring Tool
Notifier is a config-driven web scraper that monitors websites for changes and sends email notifications. Scraping parameters are defined with CSS selectors in a JSON configuration file, and notifications are formatted with Liquid templates, so the presentation of each update can be customized.
Key Features
- Custom Rules: Define each scraping task as its own rule and set how often it runs with a cron expression.
- Scheduled Monitoring: Each rule operates on its own schedule, so one rule can run every hour while another runs less often.
- Email Notifications: Stay informed with configurable email alerts when changes are detected.
- Error Handling: The system is designed to handle errors gracefully, with alerts for problems such as invalid configurations or unexpected changes in HTML structure.
- Dynamic Content Extraction: Utilize CSS selectors to extract relevant information from web pages, supporting various data types and formats.
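To illustrate how a per-rule cron expression could decide whether a rule is due on a given run, here is a minimal sketch. It is not Notifier's own scheduler: the function names `field_matches` and `is_due` are hypothetical, only the standard library is used, and the matcher is simplified. It handles `*`, `*/n`, single numbers, ranges, and comma lists, requires all five fields to match, and does not accept `7` for Sunday or month and weekday names.

```python
from datetime import datetime


def field_matches(field: str, value: int, minimum: int) -> bool:
    """Check one cron field (for example '*/15', '1,5' or '9-17/4') against a value."""
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            # '*' or '*/n': count steps from the smallest legal value of the field.
            if (value - minimum) % step == 0:
                return True
        elif "-" in base:
            low, high = (int(x) for x in base.split("-"))
            if low <= value <= high and (value - low) % step == 0:
                return True
        elif int(base) == value:
            return True
    return False


def is_due(expression: str, moment: datetime) -> bool:
    """Return True if a five-field cron expression matches the given minute.

    Simplification (assumption): all five fields must match, whereas real cron
    treats day-of-month and day-of-week as alternatives when both are restricted.
    """
    minute, hour, day, month, weekday = expression.split()
    # Python counts Monday as 0; cron counts Sunday as 0.
    cron_weekday = (moment.weekday() + 1) % 7
    return (
        field_matches(minute, moment.minute, 0)
        and field_matches(hour, moment.hour, 0)
        and field_matches(day, moment.day, 1)
        and field_matches(month, moment.month, 1)
        and field_matches(weekday, cron_weekday, 0)
    )


if __name__ == "__main__":
    now = datetime.now()
    print("hourly rule due now:", is_due("0 * * * *", now))
```

With a check like this, a system cron job can invoke the tool every minute and only the rules whose expressions match the current minute would be processed.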
Usage Overview
Notifier is intended to run periodically via system cron jobs, processing rules based on defined schedules. Below are the primary commands:
python3 index.py # Process rules scheduled to run
python3 index.py --force # Execute all rules immediately
python3 index.py --dry-run # Fetch and display data without making changes
python3 index.py --save-email # Store emails to a file instead of sending
python3 index.py --validate # Validate configuration against schema
python3 index.py --verbose # Display detailed output during execution
python3 index.py -q # Suppress output, including errors
Configurable Structure
The Notifier configuration is divided into three primary sections within the config.json file:
- Email Settings: Define SMTP server details to enable email notifications.
- Definitions: Create reusable scraping definitions detailing URL, data extraction methods, and pagination setups.
- Rules: Specify when and how each definition should be executed, including custom email subjects and Liquid template paths.
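The three-section layout can be pictured with a small sketch. Every key name below (`email`, `definitions`, `rules`, `host`, `port`, `definition`, `schedule`, `to`, `subject`, `template`) is an assumption made for illustration, as is the choice to store rules as a list; the real names are fixed by Notifier's schema, which `--validate` checks. The hypothetical `check_config` function only confirms that the three sections exist and that each rule points at a known definition.

```python
import json

# Assumed top-level key names; the actual schema may use different ones.
REQUIRED_SECTIONS = ("email", "definitions", "rules")

# Hypothetical config.json content, for illustration only.
SAMPLE_CONFIG = """
{
  "email": {"host": "smtp.example.com", "port": 587},
  "definitions": {
    "hackernews": {"url": "https://news.ycombinator.com"}
  },
  "rules": [
    {
      "definition": "hackernews",
      "schedule": "0 * * * *",
      "to": "me@example.com",
      "subject": "Hacker News update",
      "template": "templates/hackernews.liquid"
    }
  ]
}
"""


def check_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for section in REQUIRED_SECTIONS:
        if section not in config:
            problems.append(f"missing section: {section}")
    definitions = config.get("definitions", {})
    for index, rule in enumerate(config.get("rules", [])):
        name = rule.get("definition")
        if name not in definitions:
            problems.append(f"rule {index} refers to unknown definition: {name}")
    return problems


if __name__ == "__main__":
    print(check_config(json.loads(SAMPLE_CONFIG)))
```

Keeping definitions separate from rules means one definition can be reused by several rules with different schedules or recipients.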
Example of a Scraping Definition
Here is a sample of how to define a scraping task:
"hackernews": {
  "url": "https://news.ycombinator.com",
  "query": {
    "type": "list",
    "selector": "tr.athing.submission",
    "variables": {
      "title": { "selector": ".titleline > a", "value": { "type": "text" } }
    }
  }
}
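One way to read the definition above is as a nested plan: fetch the URL, select every element matching the outer selector, then resolve each variable's selector inside each matched element. The sketch below only walks the structure and prints that plan; it does not fetch or parse any HTML. The `describe` function is hypothetical and the interpretation of the fields is an assumption based on the sample.

```python
# The sample definition from above, expressed as a Python dict.
DEFINITION = {
    "url": "https://news.ycombinator.com",
    "query": {
        "type": "list",
        "selector": "tr.athing.submission",
        "variables": {
            "title": {"selector": ".titleline > a", "value": {"type": "text"}},
        },
    },
}


def describe(name: str, definition: dict) -> list:
    """Summarize a definition as indented lines, outermost step first."""
    query = definition["query"]
    lines = [f"{name}: fetch {definition['url']}"]
    lines.append(f"  {query['type']} of elements matching {query['selector']!r}")
    for variable, spec in query.get("variables", {}).items():
        value_type = spec["value"]["type"]
        lines.append(f"    {variable} = {value_type} of {spec['selector']!r}")
    return lines


if __name__ == "__main__":
    print("\n".join(describe("hackernews", DEFINITION)))
```

Under this reading, the sample would yield one record per submission row, each with a `title` field holding the link text.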
Error Reporting
The tool includes robust error reporting, notifying users of configuration issues, failure to fetch data, or changes in HTML structure. Notifications are dispatched to all unique email addresses set across rules, ensuring that users remain informed about the scraper's status and any action required.
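Collecting the unique addresses across rules could look like the following sketch. The `to` key, the option of giving either a single string or a list, and the case-insensitive comparison are all assumptions for illustration; `error_recipients` is a hypothetical name, not a function from Notifier.

```python
def error_recipients(rules: list) -> list:
    """Return each address once, in first-seen order, across all rules."""
    seen = set()
    recipients = []
    for rule in rules:
        addresses = rule.get("to", [])
        if isinstance(addresses, str):
            # Accept a single address as well as a list of addresses.
            addresses = [addresses]
        for address in addresses:
            cleaned = address.strip()
            key = cleaned.lower()  # compare case-insensitively (assumption)
            if key and key not in seen:
                seen.add(key)
                recipients.append(cleaned)
    return recipients


if __name__ == "__main__":
    rules = [
        {"to": "ops@example.com"},
        {"to": ["dev@example.com", "OPS@example.com"]},
    ]
    print(error_recipients(rules))
```

Deduplicating in this way means a person who is subscribed to several rules would receive a single error report per run rather than one per rule.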
Real-World Applications
Notifier can be utilized for various monitoring tasks:
- Track updates for news articles on websites like Hacker News or Reddit.
- Monitor prices of cryptocurrencies or stocks, sending alerts on price thresholds.
- Watch web pages for the availability of specific features or products, so you can respond quickly when something changes.
Conclusion
Notifier automates web scraping and monitoring, combining ease of use with detailed customization. Liquid templates allow tailored notifications, keeping users informed of the changes they care about.