Data Flow
This document details the lifecycle of data within the Ultimate Web Novel & Manga Scraper, from configuration to persistent storage.
1. Rule Creation (Input)
Actor: Administrator
Interface: WordPress Admin (ums_rules_list.php, ums_novel_list.php)
1. The administrator enters the target URL (e.g., a FanFox manga URL).
2. The administrator configures the rule parameters:
* Schedule (e.g., "Every hour").
* Chapter limits.
* Translation settings.
* Status mapping.
3. Storage: The rule is saved as an entry in a serialized array in wp_options.
* Option keys: ums_rules_list (manga rules), ums_novel_list (novel rules).
* Format: array( id => array( url, schedule, active, last_run, ... ) ).
2. Cron Execution (Trigger)
Trigger: WP-Cron (umsaction hook).
1. Loader: ums_cron() fetches rule arrays from wp_options.
2. Evaluator: Iterates through each rule.
* Calculates next_run = last_run + schedule, where schedule is the rule's configured interval.
* If now >= next_run, the rule is due and is passed to the dispatcher.
3. Dispatcher: Calls ums_run_rule($id, $type).
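The evaluator step above can be sketched in a few lines. This is a minimal illustration in Python rather than the plugin's PHP, and it rests on assumptions not stated in this document: that schedule is an interval in seconds, that last_run is a Unix timestamp, and that inactive rules are skipped. The function name due_rule_ids is hypothetical.

```python
import time


def due_rule_ids(rules, now=None):
    """Return the ids of rules whose next run time has arrived.

    `rules` mirrors the documented shape: {id: {"url", "schedule", "active",
    "last_run", ...}}. Assumption: `schedule` is an interval in seconds and
    `last_run` is a Unix timestamp; the plugin may store them differently.
    """
    now = time.time() if now is None else now
    due = []
    for rule_id, rule in rules.items():
        if not rule.get("active", True):
            continue  # assumption: inactive rules are never dispatched
        next_run = rule["last_run"] + rule["schedule"]
        if now >= next_run:
            due.append(rule_id)
    return due


rules = {
    1: {"url": "https://example.com/manga/a", "schedule": 3600, "active": True, "last_run": 1000},
    2: {"url": "https://example.com/manga/b", "schedule": 3600, "active": True, "last_run": 4000},
}
# Rule 1 is due (1000 + 3600 <= 5000); rule 2 is not (4000 + 3600 > 5000).
print(due_rule_ids(rules, now=5000))
```

Passing now explicitly keeps the check deterministic, which makes the scheduling logic easy to test apart from WP-Cron.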
3. Scraping Process (Ingestion)
Function: ums_run_rule()
1. Locking: Checks ums_running_list to ensure the rule isn't already running.
2. Fetching (Phase 1: Listing):
* Fetches the main Manga/Novel TOC page.
* Strategy: ums_get_web_page() attempts cURL first, then falls back to PhantomJS/Puppeteer if configured or if Cloudflare is detected.
3. Parsing:
* DOM Parser extracts: Title, Cover Image, Author, Genre, Status, Chapter List.
* Checks whether the manga already exists in the database (matched by title or by _manga_import_slug).
4. Creation (Parent Post):
* If new, calls wp_insert_post() (post_type = 'wp-manga').
* Downloads Cover Image -> wp_insert_attachment() -> Sets as Featured Image.
* Sets Taxonomies (wp-manga-genre, etc.).
5. Fetching (Phase 2: Chapters):
* Iterates through detected chapters.
* Checks against existing chapters in DB (wp_manga_chapter->get_chapter_by_slug).
* If new:
  * Fetches the chapter page.
  * Manga: Extracts image URLs (ums_extractMangaImages).
  * Novel: Extracts text content (ums_repairHTML, ums_strip_links).
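Two pieces of the process above lend themselves to a short sketch: the fetch fallback in step 2 and the new-chapter check in step 5. The Python below is an illustration only, not the plugin's PHP code. All three callables passed to fetch_page are hypothetical stand-ins (fetch_plain for the cURL path, fetch_headless for PhantomJS/Puppeteer, looks_blocked for whatever Cloudflare check the plugin performs), and the fallback condition shown is one plausible reading of "if configured or if Cloudflare is detected".

```python
def fetch_page(url, fetch_plain, fetch_headless=None, looks_blocked=None):
    """Try a plain HTTP fetch first, then fall back to a headless browser.

    Assumption: the fallback is used when the plain fetch fails or the
    response looks like a Cloudflare challenge, and only if a headless
    fetcher has been configured.
    """
    try:
        html = fetch_plain(url)
    except Exception:
        html = None  # treat a failed plain fetch like an empty response
    blocked = html is None or (looks_blocked is not None and looks_blocked(html))
    if blocked and fetch_headless is not None:
        return fetch_headless(url)
    return html


def select_new_chapters(scraped_slugs, existing_slugs):
    """Keep only chapters not already stored, preserving scraped order."""
    existing = set(existing_slugs)
    return [slug for slug in scraped_slugs if slug not in existing]


html = fetch_page(
    "https://example.com/manga/a",
    fetch_plain=lambda url: "<title>Just a moment...</title>",
    fetch_headless=lambda url: "<html>real page</html>",
    looks_blocked=lambda body: "Just a moment" in body,
)
print(html)  # the headless result, because the plain response looked blocked
print(select_new_chapters(["chapter-1", "chapter-2", "chapter-3"], ["chapter-1"]))
```

Injecting the fetchers as callables keeps the fallback decision separate from any particular HTTP client or browser driver.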
4. Processing & Transformation
1. Translation (Optional):
* If enabled, content (Novel text or Manga Title) is sent to ums_translate().
* Calls external API (Google/DeepL/Bing).
2. Text Spinning (Optional):
* If enabled, replaces words with synonyms from synonyms.dat.
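Synonym-based spinning can be illustrated with a small word-replacement routine. This Python sketch is not the plugin's implementation: the layout of synonyms.dat is not described here, so the synonym mapping is passed in directly, and the function name spin_text and its choose parameter are hypothetical.

```python
import re


def spin_text(text, synonyms, choose=lambda options: options[0]):
    """Replace each word that has a synonym entry with one of its synonyms.

    `synonyms` maps a lowercase word to a list of alternatives. `choose`
    picks the replacement; it defaults to the first option so the result is
    deterministic, whereas a real spinner would likely pick at random.
    """
    def replace(match):
        word = match.group(0)
        options = synonyms.get(word.lower())
        if not options:
            return word  # no entry: leave the word unchanged
        replacement = choose(options)
        # Preserve a leading capital so sentence starts still look right.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", replace, text)


table = {"big": ["large", "huge"], "ran": ["sprinted"]}
print(spin_text("The big dog ran. Big news!", table))
# prints: The large dog sprinted. Large news!
```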
5. Storage (Persistence)
Manga Images:
1. Downloads image binary.
2. Path: wp-content/uploads/manga/{manga_id}/{chapter_slug}/.
3. File System: Uses WP_Filesystem to write files.
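The directory layout above can be expressed as a small path helper. This is a sketch in Python under stated assumptions: the uploads root is taken to be wp-content/uploads as written in the path, and the file naming scheme (zero-padded page index plus extension) is purely hypothetical, since this document does not say how image files are named.

```python
import posixpath


def chapter_image_dir(manga_id, chapter_slug, uploads_dir="wp-content/uploads"):
    """Build the chapter image directory following the documented layout."""
    return posixpath.join(uploads_dir, "manga", str(manga_id), chapter_slug) + "/"


def chapter_image_path(manga_id, chapter_slug, index, extension="jpg"):
    """Hypothetical file naming: zero-padded page index plus extension."""
    filename = f"{index:03d}.{extension}"
    return chapter_image_dir(manga_id, chapter_slug) + filename


print(chapter_image_dir(42, "chapter-1"))      # wp-content/uploads/manga/42/chapter-1/
print(chapter_image_path(42, "chapter-1", 7))  # wp-content/uploads/manga/42/chapter-1/007.jpg
```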
Database:
1. Madara Tables:
* Madara typically stores chapter metadata in custom tables or in custom post types, depending on the Madara version.
* The plugin uses the global WP_MANGA_STORAGE class to abstract this difference.
* Calls $wp_manga_storage->wp_manga_upload_single_chapter().
6. Cleanup
1. Logging: The execution result is logged to ums_info.log (if logging is enabled).
2. State Update: Updates last_run timestamp in the rule array in wp_options.
3. Unlock: Removes ID from ums_running_list.
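Taken together, the locking check at the start of ums_run_rule() and the cleanup steps above form a lock, run, log, update, unlock sequence. The Python sketch below illustrates that shape only. It models ums_running_list as an in-memory set and the rule array as a dict, uses a hypothetical run callable for the scraping work, and assumes last_run is updated only after a successful run, which this document does not confirm.

```python
import time


def run_rule_locked(rule_id, rules, running, run, log=print, now=time.time):
    """Run one rule under a simple lock, then clean up.

    `rules` stands in for the rule array in wp_options and `running` for
    ums_running_list (modelled as a set). `run` is a hypothetical callable
    that performs the scraping for one rule.
    """
    if rule_id in running:
        log(f"rule {rule_id}: already running, skipped")
        return False
    running.add(rule_id)
    try:
        run(rules[rule_id])
        log(f"rule {rule_id}: finished")
        rules[rule_id]["last_run"] = now()  # assumption: only on success
        return True
    finally:
        running.discard(rule_id)  # unlock even if run() raised


rules = {1: {"url": "https://example.com/manga/a", "last_run": 0}}
running = set()
messages = []
ok = run_rule_locked(1, rules, running, run=lambda rule: None,
                     log=messages.append, now=lambda: 1234)
print(ok, rules[1]["last_run"], messages)
```

The try/finally block is the important part: without it, an exception during scraping would leave the rule marked as running and block every later execution. An in-memory set is not a cross-process lock, so a real implementation would need the check-and-set against persistent storage to be atomic.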