Getting Started Guide

Welcome to Ultimate Web Novel & Manga Scraper! This guide will help you set up and start scraping manga and web novels in minutes.

Quick Start (5 Minutes)

Prerequisites

Before you begin, ensure you have:
✅ WordPress 5.0 or higher
✅ Madara theme installed and activated
✅ Madara Core plugin installed and activated
✅ PHP 7.2 or higher with the cURL, DOM, and mbstring extensions
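
If you have shell access, the PHP requirements above can be checked from the command line. The following is a minimal sketch, not part of the plugin: the helper names `version_ge` and `missing_extensions` are made up for this example, and it assumes a Linux server with GNU `sort` and a `php` CLI binary on the PATH.

```shell
# Sketch: check the PHP prerequisites listed above from the command line.

# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Reads a module list (one name per line, as printed by `php -m`) on stdin
# and prints each required extension that is absent from it.
missing_extensions() {
    local loaded
    local ext
    loaded="$(cat)"
    for ext in curl dom mbstring; do
        printf '%s\n' "$loaded" | grep -qix "$ext" || printf '%s\n' "$ext"
    done
}

# Only run the live check when a PHP CLI binary is available.
if command -v php >/dev/null 2>&1; then
    php_version="$(php -r 'echo PHP_VERSION;')"
    if version_ge "$php_version" "7.2"; then
        echo "PHP $php_version: OK"
    else
        echo "PHP $php_version: too old (need 7.2 or higher)"
    fi
    missing="$(php -m | missing_extensions)"
    if [ -n "$missing" ]; then
        echo "Missing extensions: $missing"
    fi
fi
```

Keep in mind that the PHP CLI can load a different php.ini than your web server, so treat the result as a first indication rather than proof.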

Step 1: Install the Plugin

1. Download the plugin ZIP file
2. Go to WordPress Admin → Plugins → Add New → Upload Plugin
3. Choose the ZIP file and click Install Now
4. Click Activate Plugin

Step 2: Enable the Plugin

1. Navigate to Ultimate Web Novel & Manga Scraper → Main Settings
2. Check the Main Switch to enable the plugin
3. Click Save Settings

Step 3: Create Your First Scraping Rule

Let's scrape a manga from a Madara-based site:
1. Go to Ultimate Web Novel & Manga Scraper → Manga Scraper (Madara Theme Sites)
2. Click Add New Rule
3. Fill in the form:
     • Manga URL: https://example-madara-site.com/manga/manga-title/
     • Schedule: 24 (check every 24 hours)
     • Max # Chapters: 10
     • Status: Publish
     • Active: ✅ Checked
4. Click Save Rule

Step 4: Test the Rule

1. Find your newly created rule in the list
2. Click Run This Rule Now
3. Wait for execution (may take 30-60 seconds)
4. Go to Activity & Logging to see progress

Step 5: Verify Success

1. Navigate to Manga → All Manga in WordPress admin
2. You should see the newly scraped manga
3. Click on the manga to view chapters
4. View the manga on your site's frontend to see images

🎉 Congratulations! You've successfully scraped your first manga.

---

Detailed Setup

Understanding the Interface

The plugin's admin interface is organized into several tabs:
Ultimate Web Novel & Manga Scraper
├── Main Settings                   (Global configuration)
├── Manga Scraper (Madara)          (Generic Madara sites)
├── Web Novel Scraper (Madara)      (Generic Madara novel sites)
├── Manga Scraper (FanFox)          (FanFox-specific)
├── Web Novel Scraper (NovLove)     (NovLove-specific)
├── Web Novel Scraper (WuxiaWorld)  (WuxiaWorld-specific)
├── Madara Enhancements             (Search & clone from external sites)
└── Activity & Logging              (Logs, system info, cleanup)

Main Settings Explained

General Options

| Setting | Description | Recommended |
|---|---|---|
| Main Switch | Master enable/disable | On |
| Logging | Enable basic logging | On (for debugging) |
| Detailed Logging | Verbose debug logs | Off (enable only for troubleshooting) |
| Auto Clear Logs | Automatic log rotation | Weekly |

Scraper Settings

| Setting | Description | Recommended |
|---|---|---|
| CloudFlare Caching | Handle Cloudflare timeouts | On (if your server uses Cloudflare) |
| Disable Rerun | Prevent immediate retry of failed rules | Off |
| Manga Storage | Where to store images | Local (or S3 if configured) |
| Request Timeout | Delay between requests (seconds) | 2-5 |
| Rule Timeout | Max execution time per rule (seconds) | 300 |
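
The two timing settings interact: the longer the delay between requests, the fewer requests fit into one rule run. The sketch below works out a rough upper bound. It assumes the delay is applied before every request and ignores the time each download takes, neither of which this guide confirms, so real runs will fit fewer requests. `max_requests` is a name made up for this example.

```shell
# Upper bound on the number of requests that fit into one rule run.
# $1 = Rule Timeout in seconds, $2 = delay between requests in seconds.
max_requests() {
    local rule_timeout="$1"
    local request_delay="$2"
    if [ "$request_delay" -le 0 ]; then
        echo "request delay must be positive" >&2
        return 1
    fi
    echo $(( rule_timeout / request_delay ))
}

max_requests 300 2   # prints 150
max_requests 300 5   # prints 60
```

If a rule regularly hits the timeout, this kind of estimate can help you decide whether to raise Rule Timeout or lower Max # Chapters.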

Headless Browser Settings

Only needed if scraping JavaScript-heavy sites.

| Setting | Description | Example |
|---|---|---|
| PhantomJS Path | Absolute path to PhantomJS binary | /usr/bin/phantomjs |
| PhantomJS Timeout | Max wait time for rendering | 30 |
| HeadlessBrowserAPI Key | Third-party service API key | (optional) |

Proxy Settings

Only needed if your IP is blocked or rate-limited.

| Setting | Description | Example |
|---|---|---|
| Proxy URL | Proxy server address | 123.45.67.89:8080 |
| Proxy Auth | Username and password | user:password |
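
Before entering proxy details in the plugin, you may want to confirm that the proxy works from your server. The sketch below uses curl's `--proxy` and `--proxy-user` options to fetch a URL through the proxy and print the HTTP status code. `proxy_status` is a helper name made up for this example, and the address and credentials are the placeholder values from the table above.

```shell
# Prints the HTTP status code returned when fetching a URL through a proxy.
# $1 = proxy address (host:port)
# $2 = proxy credentials (user:password, may be empty)
# $3 = URL to fetch
proxy_status() {
    local proxy="$1"
    local auth="$2"
    local url="$3"
    # Build the curl argument list in the positional parameters.
    set -- --silent --output /dev/null --max-time 15 \
           --write-out '%{http_code}' --proxy "$proxy"
    if [ -n "$auth" ]; then
        set -- "$@" --proxy-user "$auth"
    fi
    curl "$@" "$url"
}

# Usage (placeholder values):
#   proxy_status 123.45.67.89:8080 user:password https://example.com/
```

A status of 200 suggests the proxy is reachable and the credentials are accepted. A 407 status typically points to a proxy authentication problem.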

Translation API Keys

Only needed if you want automatic translation.

| Setting | Description | Where to Get |
|---|---|---|
| Google Trans Auth | Google Translate API key | Google Cloud Console |
| Bing Auth | Microsoft Translator key | Azure Portal |
| DeepL Auth | DeepL API key | DeepL API |

---

Creating Scraping Rules

Rule Types

Choose the appropriate scraper based on your target site:

| Scraper | Use For |
|---|---|
| Manga Scraper (Madara) | Any Madara-based manga site |
| Web Novel Scraper (Madara) | Any Madara-based novel site |
| Manga Scraper (FanFox) | FanFox.net, MangaFox sites |
| Web Novel Scraper (NovLove) | NovLove.com |
| Web Novel Scraper (WuxiaWorld) | WuxiaWorld.site |

Rule Configuration Fields

Basic Settings

| Field | Description | Example |
|---|---|---|
| Manga URL | Table of Contents (TOC) page | https://site.com/manga/title/ |
| Schedule | Check interval in hours | 24 |
| Active | Enable/disable this rule | |

Content Settings

| Field | Description | Example |
|---|---|---|
| Max # Chapters | Limit chapters per run (0 = all) | 10 |
| Status | Post status for created manga | Publish |
| Author | WordPress user to assign | Select from dropdown |
| Create Tags | Import original tags | |
| Default Category | Fallback category | Select from dropdown |
| Auto Categories | Create categories from genres | |

Advanced Settings

| Field | Description | When to Use |
|---|---|---|
| Use PhantomJS | Force headless browser | JavaScript-heavy sites, Cloudflare |
| Reverse Chapters | Scrape oldest first | Backfilling old chapters |
| Translation | Target language | Disabled (or select language) |
| Translation Source | API to use | Google (Free), Google (API), Bing, DeepL |

---

Common Use Cases

Use Case 1: Daily Manga Updates

Goal: Automatically check for new chapters every day
Configuration:
  • Schedule: 24 hours
  • Max # Chapters: 5
  • Status: Publish
  • Active: ✅ Checked
Workflow:
1. Plugin checks the site every 24 hours
2. Finds new chapters (up to 5)
3. Downloads and publishes them automatically

Use Case 2: Backfilling Old Chapters

Goal: Scrape all historical chapters of a manga
Configuration:
  • Schedule: 1 hour (or run manually with Run This Rule Now)
  • Max # Chapters: 0 (all chapters)
  • Reverse Chapters: ✅ Checked
  • Active: ✅ Checked
Workflow:
1. Run once to grab all chapters
2. After completion, change the schedule to 24 hours for updates
3. Uncheck "Reverse Chapters"

Use Case 3: Scraping with Translation

Goal: Translate Chinese manga to English
Prerequisites:
  • Google Translate API key configured in Main Settings
Configuration:
  • Translation: English (Google Translate)
  • Translation Source: Google (API) or Google (Free)
  • All other settings: Normal
Workflow:
1. Plugin scrapes content
2. Translates titles, descriptions, and chapter titles
3. Saves translated content to WordPress

Note: Only text is translated. Translating the text inside manga images would require OCR, which is not supported.

Use Case 4: Using Madara Enhancements

Goal: Search and add manga from multiple external Madara sites
Setup:
1. Go to the Madara Enhancements tab
2. Set Manga Fetch URL: https://external-site.com/wp-admin/admin-ajax.php
3. Choose search type: Latest, Trending, or Search by keyword
4. Click Load More to fetch results
5. Click Add This Manga to create a scraping rule
6. The manga is now added to your regular scraping queue
Benefits:
  • No need to manually find manga URLs
  • Search external Madara libraries
  • One-click addition to your site

---

Automation with Cron

Why Use System Cron?

WordPress Cron (WP-Cron) only runs when someone loads a page, so on a low-traffic site scheduled rules can fire late or not at all. System cron provides:
✅ Consistent execution
✅ No dependency on site traffic
✅ A better fit for high-volume scraping

Set Up System Cron

Step 1: Disable WP-Cron
Edit wp-config.php and add the following line above the "That's all, stop editing!" comment:

    define('DISABLE_WP_CRON', true);

Step 2: Add Crontab Entry
SSH into your server and edit crontab:

    crontab -e

Add this line (run every minute):

    * * * * * wget -q -O - "https://yoursite.com/wp-cron.php?doing_wp_cron" >/dev/null 2>&1

Or using curl:

    * * * * * curl -s "https://yoursite.com/wp-cron.php?doing_wp_cron" >/dev/null 2>&1
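
Before waiting for the first scheduled run, you can confirm that both changes are in place. This is a minimal sketch with made-up helper names (`wp_cron_disabled`, `has_wp_cron_entry`). It only pattern-matches text, so a `define` inside a comment block would still count as a match, and the wp-config.php path in the usage comment is a placeholder.

```shell
# Succeeds when the given wp-config.php defines DISABLE_WP_CRON as true.
wp_cron_disabled() {
    grep -Eqi "define\([[:space:]]*['\"]DISABLE_WP_CRON['\"][[:space:]]*,[[:space:]]*true[[:space:]]*\)" "$1"
}

# Reads crontab text on stdin and succeeds when an uncommented entry
# calls wp-cron.php.
has_wp_cron_entry() {
    grep -Ev '^[[:space:]]*#' | grep -q 'wp-cron\.php'
}

# Usage (adjust the path to your WordPress root):
#   wp_cron_disabled /var/www/html/wp-config.php && echo "WP-Cron disabled"
#   crontab -l | has_wp_cron_entry && echo "crontab entry present"
```

Run the crontab check as the same user whose crontab you edited, since `crontab -l` only lists entries for the current user.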

Step 3: Verify
Wait a few minutes, then check Activity & Logging to see if rules are executing.

---

Monitoring & Debugging

Viewing Logs

1. Go to the Activity & Logging tab
2. Click View Logs
3. Logs show:
     • Rule execution start/end
     • Chapters found
     • Download progress
     • Errors

Log Levels

| Level | Description |
|---|---|
| INFO | Normal operation (rule started, chapters found) |
| WARNING | Non-critical issues (missing metadata) |
| ERROR | Critical failures (download failed, timeout) |

Common Log Messages

| Message | Meaning |
|---|---|
| Rule #5 started | Scraping rule #5 began execution |
| Found 10 chapters | Detected 10 chapters on TOC page |
| Chapter already exists: ch-1 | Chapter 1 already in database, skipped |
| Downloaded image: 001.jpg | Image download succeeded |
| Failed to fetch URL | Network error or invalid URL |
| Cloudflare detected | Target site has anti-bot protection |
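
If you copy the log out to a text file, a quick count per level shows whether a run was mostly clean. The sketch below is only an illustration: it assumes each log line carries its level as an uppercase word (for example `[ERROR] Failed to fetch URL`), which this guide does not specify, and `count_log_levels` is a made-up helper name.

```shell
# Reads log text on stdin and prints how many lines mention each level.
# Assumed (hypothetical) line format: "[LEVEL] message".
count_log_levels() {
    awk '
        /INFO/    { info++ }
        /WARNING/ { warning++ }
        /ERROR/   { error++ }
        END { printf "INFO=%d WARNING=%d ERROR=%d\n", info, warning, error }
    '
}

# Usage (scraper.log is a placeholder file name):
#   count_log_levels < scraper.log
```

A rising ERROR count across runs is usually the signal to turn on Detailed Logging for a closer look.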

System Info

Click System Info to view:
  • Server environment
  • PHP version
  • User agent
  • Plugin version
  • Active rules count

---

Troubleshooting Quick Fixes

Problem: Rule not running

Solutions:
1. Check that the Main Switch is ON
2. Check that the rule's Active checkbox is checked
3. Verify that the schedule time has passed
4. Check logs for errors
5. Set up system cron (see Automation with Cron above)

Problem: No chapters found

Solutions:
1. Verify the URL is the Table of Contents page (not a chapter)
2. Check if the site structure changed
3. Enable Use PhantomJS for the rule
4. Check logs for parsing errors
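
For the first check, a rough heuristic can catch a chapter URL pasted by mistake. The sketch below assumes the layout used in this guide's example, where the TOC page sits two path segments deep (such as /manga/manga-title/) and chapter pages sit one level deeper. Sites with a different URL structure will not fit this pattern, and `looks_like_toc_url` is a made-up helper name.

```shell
# Succeeds when the URL path has exactly two segments, e.g. /manga/<slug>/.
# Assumption: chapter pages live one level deeper, e.g. /manga/<slug>/chapter-1/.
looks_like_toc_url() {
    printf '%s\n' "$1" | grep -Eq '^https?://[^/]+/[^/]+/[^/]+/?$'
}

# Usage:
#   looks_like_toc_url "https://example-madara-site.com/manga/manga-title/" \
#       && echo "looks like a TOC page"
```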

Problem: Images not loading

Solutions:
1. Check file permissions: chmod 755 wp-content/uploads/manga/
2. Enable proxy if source has hotlink protection
3. Check available disk space
4. Verify images exist in wp-content/uploads/manga/{id}/
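
The permission and disk-space checks can be done from a shell. This is a minimal sketch with made-up helper names (`dir_mode`, `free_mb`). It assumes a Linux server, since `stat -c` is GNU syntax, and that you run it from the WordPress root.

```shell
# Prints the octal permission bits of a file or directory (GNU stat).
dir_mode() {
    stat -c '%a' "$1"
}

# Prints the free space, in whole megabytes, on the filesystem holding a path.
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Usage (run from the WordPress root):
#   dir_mode wp-content/uploads/manga   # 755 is the mode suggested above
#   free_mb wp-content/uploads/manga
```

Also check that the directory is owned by the user your web server runs as, because correct permission bits alone do not help if the owner is wrong.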

Problem: Timeout errors

Solutions:
1. Increase Rule Timeout in Main Settings
2. Reduce Max # Chapters per run
3. Increase PHP max_execution_time in php.ini
4. Split large manga into multiple rules

---

Best Practices

DO:

✅ Start with small tests (1-2 chapters)
✅ Use reasonable schedules (12-24 hours)
✅ Enable logging for debugging
✅ Back up your database before large operations
✅ Monitor server resources (CPU, disk space)
✅ Respect target sites (don't overload them)

DON'T:

❌ Set schedule too frequently (<6 hours)
❌ Scrape all chapters of 100 manga at once
❌ Run multiple rules simultaneously without testing
❌ Ignore legal/ethical considerations
❌ Use on production without testing on staging
❌ Forget to rotate logs regularly

---

Next Steps

Now that you have the basics:
1. Read the full documentation:
     • CONFIGURATION.md - Detailed settings reference
     • TROUBLESHOOTING.md - Common issues
     • SECURITY.md - Security considerations
2. Explore advanced features:
     • API_REFERENCE.md - Hooks and filters for developers
     • ARCHITECTURE.md - System design
     • DATA_FLOW.md - How data moves through the system
3. Join the community:
     • GitHub Issues
     • FAQ

---

Need help? Check the FAQ or open an issue on GitHub.

Ready to contribute? See CONTRIBUTING.md (if available) or submit a pull request.

Happy scraping! 🚀