phivolcs-earthquake-data-scraper

🌏 PHIVOLCS Earthquake Data Scraper

Automated daily scraping of earthquake data from the Philippine Institute of Volcanology and Seismology (PHIVOLCS).

Obtain the scraped data by using the following links:

📊 About This Project

This repository automatically collects and archives earthquake data from PHIVOLCS, providing:

🤖 Automated Updates

πŸ“ Data Files

All earthquake data is stored in the data/ folder:

| File | Description |
|------|-------------|
| phivolcs_earthquake_2023.csv | All earthquakes from 2023 |
| phivolcs_earthquake_2024.csv | All earthquakes from 2024 |
| phivolcs_earthquake_2025.csv | All earthquakes from 2025 (current year) |
| phivolcs_earthquake_all_years.csv | Combined data from all years |

Data Columns

Each CSV file contains the following columns:

🚀 Quick Start

View the Data

Browse the data/ folder to see the latest earthquake data.

Download the Data

Click any CSV file in the data/ folder, then click “Download” or “Raw” to get the data.

Use in Your Project

You can directly link to the raw CSV files in your applications:

https://raw.githubusercontent.com/zekejulia/phivolcs-earthquake-scraper/main/data/phivolcs_earthquake_all_years.csv
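
As a minimal sketch of consuming that link from Python using only the standard library: `raw_csv_url`, `parse_csv`, and `fetch_rows` are hypothetical helper names, not part of this repository, and UTF-8 encoding of the CSV files is an assumption. The per-year file names follow the data/ listing above.

```python
import csv
import io
import urllib.request

# Base of the raw-file URL shown above; file names follow the data/ listing.
RAW_BASE = (
    "https://raw.githubusercontent.com/zekejulia/"
    "phivolcs-earthquake-scraper/main/data"
)


def raw_csv_url(year=None):
    """Return the raw URL for one year's file, or the combined file if year is None."""
    if year is None:
        name = "phivolcs_earthquake_all_years.csv"
    else:
        name = f"phivolcs_earthquake_{year}.csv"
    return f"{RAW_BASE}/{name}"


def parse_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_rows(url, timeout=30):
    """Download a CSV over HTTPS and parse it (assumes UTF-8, tolerates a BOM)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_csv(resp.read().decode("utf-8-sig"))
```

With network access, `rows = fetch_rows(raw_csv_url(2025))` would return one dict per earthquake record, keyed by whatever column names the file actually contains.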

💻 Running Locally

Prerequisites

pip install requests pandas lxml html5lib

Run the Scraper

python scrape_phivolcs.py

The script will automatically:

  1. Detect the current year
  2. Scrape data for the last 3 years by default (the current year and the two before it)
  3. Save separate CSV files for each year
  4. Create a combined CSV with all data
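
The year selection and file naming in the steps above can be sketched as follows. This is an illustration, not the code in scrape_phivolcs.py: `years_to_scrape` and `output_filenames` are hypothetical helpers, and only the `YEARS_TO_SCRAPE` name and the file-name pattern come from this README.

```python
from datetime import date

YEARS_TO_SCRAPE = 3  # same name as the setting described under Configuration


def years_to_scrape(today=None, count=YEARS_TO_SCRAPE):
    """Return the last `count` calendar years, ending with the current one."""
    current_year = (today or date.today()).year
    return list(range(current_year - count + 1, current_year + 1))


def output_filenames(years):
    """Return one CSV name per year, followed by the combined file name."""
    per_year = [f"phivolcs_earthquake_{year}.csv" for year in years]
    return per_year + ["phivolcs_earthquake_all_years.csv"]


# Example with a fixed date so the result does not depend on when it runs.
print(output_filenames(years_to_scrape(date(2025, 10, 1))))
```

Passing the date in explicitly keeps the helper easy to test; the real script presumably reads the system clock.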

📈 Data Statistics

The scraper provides automatic statistics including:

Example output:

📊 Summary by Year:
  • 2023: 15,234 earthquakes
  • 2024: 16,789 earthquakes
  • 2025: 8,456 earthquakes

📈 Total Records: 40,479
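
A summary in this shape could be produced from per-year record counts roughly as below. `format_summary` is a hypothetical helper, not the scraper's own function; the counts are the illustrative figures from the example output.

```python
def format_summary(counts):
    """Format a {year: record_count} mapping like the example output."""
    lines = ["Summary by Year:"]
    for year in sorted(counts):
        # The ',' format spec adds thousands separators, e.g. 15234 -> 15,234.
        lines.append(f"  • {year}: {counts[year]:,} earthquakes")
    lines.append("")
    lines.append(f"Total Records: {sum(counts.values()):,}")
    return "\n".join(lines)


print(format_summary({2023: 15234, 2024: 16789, 2025: 8456}))
```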

🔧 Configuration

Change the Scraping Period

Edit scrape_phivolcs.py and modify:

YEARS_TO_SCRAPE = 3  # Change to scrape more/fewer years

Change the Schedule

Edit .github/workflows/scrape-earthquake-data.yml:

on:
  schedule:
    - cron: '0 2 * * *'  # Daily at 2 AM UTC (10 AM PHT)
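
GitHub Actions evaluates cron schedules in UTC, so it helps to check the Philippine-time equivalent when picking a new hour. A small sketch of that conversion, using a fixed UTC+8 offset (`utc_hour_to_pht` is a hypothetical helper, and the date used inside it is arbitrary):

```python
from datetime import datetime, timedelta, timezone

# Philippine time is UTC+8 year-round (no daylight saving).
PHT = timezone(timedelta(hours=8), "PHT")


def utc_hour_to_pht(hour, minute=0):
    """Convert a UTC time of day to HH:MM in Philippine time."""
    utc_time = datetime(2025, 1, 1, hour, minute, tzinfo=timezone.utc)
    return utc_time.astimezone(PHT).strftime("%H:%M")


print(utc_hour_to_pht(2))  # the '0 2 * * *' schedule above
```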

Common schedules:

Learn more about cron syntax

📂 Repository Structure

phivolcs-earthquake-scraper/
├── .github/
│   └── workflows/
│       └── scrape-earthquake-data.yml    # GitHub Actions workflow
├── data/
│   ├── phivolcs_earthquake_2023.csv      # 2023 data
│   ├── phivolcs_earthquake_2024.csv      # 2024 data
│   ├── phivolcs_earthquake_2025.csv      # 2025 data
│   └── phivolcs_earthquake_all_years.csv # Combined data
├── scrape_phivolcs.py                    # Main scraper script
└── README.md                             # This file

🎯 Use Cases

This dataset can be used for:

🛠️ Manual Trigger

You can manually trigger the scraper from GitHub:

  1. Go to the Actions tab
  2. Click “Scrape PHIVOLCS Earthquake Data”
  3. Click “Run workflow”
  4. Click the green “Run workflow” button
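
The same run can, in principle, be started without the web UI through GitHub's REST endpoint for workflow dispatch events. The sketch below only builds the request. It assumes the workflow declares a `workflow_dispatch` trigger (which the “Run workflow” button implies), that the owner, repository, and branch match the raw URL shown earlier, and that you supply a personal access token allowed to run Actions workflows. `build_dispatch_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumed from the raw URL and workflow path given in this README.
OWNER = "zekejulia"
REPO = "phivolcs-earthquake-scraper"
WORKFLOW_FILE = "scrape-earthquake-data.yml"


def build_dispatch_request(token, ref="main"):
    """Build (but do not send) a POST request that dispatches the workflow."""
    url = (
        f"https://api.github.com/repos/{OWNER}/{REPO}"
        f"/actions/workflows/{WORKFLOW_FILE}/dispatches"
    )
    return urllib.request.Request(
        url,
        data=json.dumps({"ref": ref}).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Sending it with `urllib.request.urlopen(build_dispatch_request(token))` should return HTTP 204 on success; keep the token out of source control.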

📊 Example: Loading Data in Python

import pandas as pd

# Load the latest year's data
df = pd.read_csv('data/phivolcs_earthquake_2025.csv')

# Display basic info
print(f"Total earthquakes: {len(df)}")
print(f"Average magnitude: {df['Magnitude'].mean():.2f}")

# Filter for strong earthquakes (magnitude >= 5.0)
strong_quakes = df[df['Magnitude'] >= 5.0]
print(f"Strong earthquakes: {len(strong_quakes)}")
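
If pandas is not available, or if some rows turn out to have blank or non-numeric magnitudes (an assumption; this README does not say whether they do), a tolerant filter with the standard library might look like this. `strong_quakes` is a hypothetical helper and the sample data is made up for illustration.

```python
import csv
import io


def strong_quakes(rows, threshold=5.0, column="Magnitude"):
    """Return rows whose magnitude parses as a number >= threshold."""
    selected = []
    for row in rows:
        try:
            magnitude = float(row.get(column) or "")
        except ValueError:
            continue  # skip blank or non-numeric magnitudes
        if magnitude >= threshold:
            selected.append(row)
    return selected


# Made-up sample standing in for one of the CSV files.
sample = "Magnitude\n5.2\n3.1\nn/a\n6.0\n"
rows = list(csv.DictReader(io.StringIO(sample)))
print(len(strong_quakes(rows)))  # prints 2
```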

📊 Example: Loading Data in R

library(tidyverse)

# Load the data
df <- read_csv('data/phivolcs_earthquake_2025.csv')

# Summary statistics
summary(df$Magnitude)

# Plot magnitude distribution
ggplot(df, aes(x = Magnitude)) +
  geom_histogram(binwidth = 0.5, fill = "steelblue") +
  theme_minimal() +
  labs(title = "Earthquake Magnitude Distribution")

⚠️ Important Notes

🤝 Contributing

Contributions are welcome! Feel free to:

πŸ“ Data Source & Attribution

All earthquake data is sourced from:

This project is for educational and research purposes. Please cite PHIVOLCS as the original data source when using this data.

📜 License

This project is open source and available under the MIT License.

The earthquake data itself belongs to PHIVOLCS and is subject to their terms of use.

📧 Contact

For questions or suggestions, please open an issue.

🔄 Last Updated

This README was last updated: October 2025

Check the commit history or GitHub Actions runs for the latest data update timestamp.


Made with ❤️ for the Philippine data science community

If you find this project useful, please ⭐ star the repository!