KMueller-Lab
diff --git a/.gitignore b/.gitignore
diff --git a/README.md b/README.md
diff --git a/global_macro_data/__init__.py b/global_macro_data/__init__.py
diff --git a/global_macro_data/gmd.py b/global_macro_data/gmd.py
@@ -0,0 +1,26 @@
+# Python bytecode
+__pycache__/
+*.py[cod]
+*$py.class
+
+# Distribution / packaging
+*.egg-info/
+dist/
+build/
+*.egg
+
+# Virtual environments
+venv/
+env/
+.env
+.venv
+ENV/
+
+# IDE related
+.idea/
+.vscode/
+*.swp
+*.swo
+
+# OS related
+.DS_Store
@@ -0,0 +1,88 @@
+# The Global Macro Database (Python Package)
+<a href="https://www.globalmacrodata.com" target="_blank" rel="noopener noreferrer">
+    <img src="https://img.shields.io/badge/Website-Visit-blue?style=flat&logo=google-chrome" alt="Website Badge">
+</a>
+
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
+
+[Link to paper 📄](https://www.globalmacrodata.com/research-paper.html)
+
+This repository complements the paper by **Müller, Xu, Lehbib, and Chen (2025)**, which introduces a panel dataset of **46 macroeconomic variables across 243 countries**, drawn from historical records beginning in **1086** and running through **2024**, with projections through **2030**.
+
+## Features
+
+- **Unparalleled Coverage**: Combines data from **32 contemporary sources** (e.g., IMF, World Bank, OECD) with **78 historical datasets**.
+- **Extensive Variables**: GDP, inflation, government finance, trade, employment, interest rates, and more.
+- **Harmonized Data**: Resolves inconsistencies and splices all available data together.
+- **Scheduled Updates**: Regular releases ensure data reliability.
+- **Full Transparency**: All code is open source and available in this repository.
+- **Accessible Formats**: Provided in `.dta`, `.csv` and as **<a href="https://github.com/KMueller-Lab/Global-Macro-Database" target="_blank" rel="noopener noreferrer">Stata</a>
+/<a href="https://github.com/Yangbo-Wang/global_macro_data_python" target="_blank" rel="noopener noreferrer">Python</a>/<a href="https://github.com/Yangbo-Wang/global_macro_data_R" target="_blank" rel="noopener noreferrer">R</a> package**.
+
+## Data access
+
+<a href="https://www.globalmacrodata.com/data.html" target="_blank" rel="noopener noreferrer">Download via website</a>
+
+**Python package:**
+```bash
+pip install global_macro_data
+```
+
+**How to use (examples)**
+```python
+from global_macro_data import gmd
+
+# Print a preview (Singapore, 2000-2020); the full latest dataset is still returned
+df = gmd()
+
+# Get data from latest available version
+df = gmd(show_preview=False)
+
+# Get data from a specific version
+df = gmd(version="2025_01")
+
+# Get data for a specific country
+df = gmd(country="USA")
+
+# Get data for multiple countries
+df = gmd(country=["USA", "CHN", "DEU"])
+
+# Get specific variables
+df = gmd(variables=["rGDP", "infl", "unemp"])
+
+# Combine parameters
+df = gmd(version="2025_01", country=["USA", "CHN"], variables=["rGDP", "unemp", "CPI"])
+```
+
+## Parameters
+- **version (str)**: Dataset version in format 'YYYY_MM' (e.g., '2025_01'). If None, the latest dataset is used.
+- **country (str or list)**: ISO3 country code(s) (e.g., "SGP" or ["MRT", "SGP"]). If None, returns all countries.
+- **variables (list)**: List of variable codes to include (e.g., ["rGDP", "unemp"]). If None, all variables are included.
+- **show_preview (bool)**: If True and no other parameters are provided, prints a preview (Singapore, 2000-2020). The full dataset is returned either way.
+
+## Release schedule 
+
+| Release Date | Details          |
+|--------------|------------------|
+| 2025-01-30   | Initial release: v2025-01 |
+| 2025-04-01   | v2025-04         |
+| 2025-07-01   | v2025-09         |
+| 2025-10-01   | v2025-12         |
+| 2026-01-01   | v2026-03         |
+
+## Citation
+
+To cite this dataset, please use the following reference:
+
+```bibtex
+@techreport{mueller2025global, 
+    title = {The Global Macro Database: A New International Macroeconomic Dataset}, 
+    author = {Müller, Karsten and Xu, Chenzi and Lehbib, Mohamed and Chen, Ziliang}, 
+    year = {2025}, 
+    type = {Working Paper}
+}
+```
+
+## Acknowledgments
+
+The development of the Global Macro Database would not have been possible without the generous funding provided by the Singapore Ministry of Education (MOE) through the PYP grants (WBS A-0003319-01-00 and A-0003319-02-00), a Tier 1 grant (A-8001749-00-00), and the NUS Risk Management Institute (A-8002360-00-00). This financial support laid the foundation for the successful completion of this extensive project.
@@ -0,0 +1,10 @@
+import os
+import requests
+import pandas as pd
+
+# Allowed quarters
+VALID_QUARTERS = ["01", "03", "06", "09", "12"]
+
+from .gmd import gmd, find_latest_data
+
+__all__ = ["gmd", "find_latest_data", "VALID_QUARTERS"]
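Review note: `VALID_QUARTERS` is defined here and again in `gmd.py`, while the validation regex in `gmd()` spells out the same five values by hand. One way to keep them in sync is to build the pattern from the constant. The sketch below is only an illustration under that assumption: `parse_version` and `_VERSION_RE` are hypothetical names that do not exist in this PR, and it uses `fullmatch`, which is slightly stricter than the `re.match` call in the diff.

```python
import re

# Same values as VALID_QUARTERS in the diff; defined once and reused below.
VALID_QUARTERS = ["01", "03", "06", "09", "12"]

# Pattern derived from the constant, so the list and the regex cannot drift apart.
_VERSION_RE = re.compile(r"(\d{4})_(" + "|".join(VALID_QUARTERS) + r")")


def parse_version(version):
    """Hypothetical helper: validate a 'YYYY_MM' string and return (year, quarter)."""
    match = _VERSION_RE.fullmatch(version)
    if match is None:
        allowed = ", ".join(VALID_QUARTERS)
        raise ValueError(
            f"Version must be in format 'YYYY_MM' where MM is one of: {allowed}"
        )
    year, quarter = int(match.group(1)), match.group(2)
    # Same special case as in gmd(): the '01' release exists only for 2025.
    if quarter == "01" and year != 2025:
        raise ValueError("Quarter '01' is only valid for year 2025")
    return year, quarter
```

With this helper, `parse_version("2025_01")` returns `(2025, "01")`, while `"2024_01"` and `"2025_04"` both raise `ValueError`.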
@@ -0,0 +1,164 @@
+import os
+import requests
+import pandas as pd
+import io
+import re
+
+# Allowed quarters
+VALID_QUARTERS = ["01", "03", "06", "09", "12"]
+
+def gmd(version=None, country=None, variables=None, show_preview=True):
+    """
+    Download and filter Global Macro Data.
+    
+    Parameters:
+    - version (str): Dataset version in format 'YYYY_MM' (e.g., '2025_01').
+                   If None, the latest available version is used.
+                   Note: '01' quarter is only valid for year 2025.
+    - country (str or list): ISO3 country code(s) (e.g., "SGP" or ["MRT", "SGP"]).
+                          If None, returns all countries.
+    - variables (list): List of variable codes to include (e.g., ["rGDP", "unemp"]).
+                      If None, all variables are included.
+    - show_preview (bool): If True and no other parameters are provided, prints a preview (the full data is still returned).
+    
+    Returns:
+    - pd.DataFrame: The requested data.
+    """
+    # Check if this is a default call (no specific parameters)
+    default_call = (version is None and country is None and variables is None and show_preview)
+    
+    base_url = "https://www.globalmacrodata.com"
+
+    # Process version parameter or find latest
+    if version is None:
+        # Automatically select the latest available dataset
+        year, quarter = find_latest_data(base_url)
+        version = f"{year}_{quarter:02d}"
+    else:
+        # Validate the version format
+        if not re.match(r'^\d{4}_(01|03|06|09|12)$', version):
+            raise ValueError("Version must be in format 'YYYY_MM' where MM is one of: 01, 03, 06, 09, 12")
+        
+        # Parse the version
+        year_str, quarter = version.split('_')
+        year = int(year_str)
+        
+        # Special validation for quarter 01
+        if quarter == "01" and year != 2025:
+            raise ValueError("Quarter '01' is only valid for year 2025")
+
+    # Construct URL
+    data_url = f"{base_url}/GMD_{version}.csv"
+    print(f"Downloading: {data_url}")
+
+    # Download data
+    response = requests.get(data_url, timeout=30)  # avoid hanging indefinitely on a stalled connection
+    if response.status_code != 200:
+        raise FileNotFoundError(f"Error: Data file not found at {data_url}")
+
+    # Read the data
+    df = pd.read_csv(io.StringIO(response.text))
+
+    # Filter by country if specified
+    if country:
+        # Convert single country to list for consistent handling
+        if isinstance(country, str):
+            country = [country]
+        
+        # Convert all country codes to uppercase
+        country = [c.upper() for c in country]
+        
+        # Check if all specified countries exist in the dataset
+        invalid_countries = [c for c in country if c not in df["ISO3"].unique()]
+        if invalid_countries:
+            # Load isomapping for better error handling
+            try:
+                # Try to load isomapping from the expected location
+                script_dir = os.path.dirname(os.path.abspath(__file__))
+                isomapping_path = os.path.join(script_dir, 'isomapping.csv')
+                isomapping = pd.read_csv(isomapping_path)
+                
+                # Display helpful error message with available countries
+                print(f"Error: Invalid country code(s): {', '.join(invalid_countries)}. Available country codes are:")
+                for i, row in isomapping.iterrows():
+                    print(f"{row['ISO3']}: {row['countryname']}")
+            except Exception:
+                # If isomapping.csv can't be loaded, use the country list from the dataset
+                print(f"Error: Invalid country code(s): {', '.join(invalid_countries)}. Available country codes are:")
+                country_list = sorted(set(zip(df["ISO3"], df["countryname"])))
+                for iso3, name in country_list:
+                    if pd.notna(iso3) and pd.notna(name):
+                        print(f"{iso3}: {name}")
+            
+            raise ValueError(f"Invalid country code(s): {', '.join(invalid_countries)}")
+        
+        # Filter for multiple countries
+        df = df[df["ISO3"].isin(country)]
+        print(f"Filtered data for countries: {', '.join(country)}")
+    
+    # Filter by variables if specified
+    if variables:
+        # Always include identifier columns
+        required_cols = ["ISO3", "countryname", "year"]
+        all_cols = required_cols + [var for var in variables if var not in required_cols]
+        
+        # Check if all requested variables exist in the dataset
+        missing_vars = [var for var in variables if var not in df.columns]
+        if missing_vars:
+            print(f"Warning: The following requested variables are not in the dataset: {missing_vars}")
+            print("Available variables are:")
+            for i, col in enumerate(sorted(df.columns)):
+                if i > 0 and i % 4 == 0:
+                    print("")  # Line break every 4 columns
+                print(f"- {col}", end="  ")
+            print("\n")
+        
+        # Filter to only include requested variables (plus identifiers)
+        existing_vars = [var for var in all_cols if var in df.columns]
+        df = df[existing_vars]
+        print(f"Selected {len(existing_vars)} variables")
+    
+    # Only show the preview for default calls (no specific parameters)
+    if default_call:
+        # Get Singapore data from 2000-2020
+        sample_df = df[(df["ISO3"] == "SGP") & (df["year"] >= 2000) & (df["year"] <= 2020)]
+        
+        if len(sample_df) > 0:
+            print("Singapore (SGP) data, 2000-2020")
+            print(f"{len(sample_df)} rows out of {len(df)} total rows in the dataset")
+            
+            # Display the data with specified columns, sorted by year
+            pd.set_option('display.max_columns', None)
+            pd.set_option('display.width', 1000)
+            
+            # Define the preview columns in the exact order requested
+            preview_cols = ["year", "ISO3", "countryname", "nGDP", "rGDP", "pop", "unemp", "infl", 
+                            "exports", "imports", "govdebt_GDP", "ltrate"]
+            
+            # Check which columns exist in the dataset
+            available_cols = [col for col in preview_cols if col in sample_df.columns]
+            
+            # Sort by year and display with available columns
+            print(sample_df[available_cols].sort_values(by="year"))
+        else:
+            print("No data available for Singapore (SGP) between 2000-2020")
+
+    print(f"Final dataset: {len(df)} observations of {len(df.columns)} variables")
+    return df
+
+def find_latest_data(base_url):
+    """ Attempt to find the most recent available dataset """
+    import datetime
+
+    current_year = datetime.datetime.now().year
+    for year in range(current_year, 2019, -1):  # Iterate backward by year
+        for quarter in ["12", "09", "06", "03", "01"]:
+            url = f"{base_url}/GMD_{year}_{quarter}.csv"
+            try:
+                response = requests.head(url, timeout=5)
+                if response.status_code == 200:
+                    return year, int(quarter)
+            except requests.RequestException:
+                continue
+    
+    raise FileNotFoundError("No available dataset found on the server.")
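Review note: `find_latest_data` mixes the search order with the network calls, which makes it hard to exercise without a server. A minimal sketch of one way to separate the two follows. The names `candidate_urls`, `find_latest`, and the `exists` callback are hypothetical and not part of this PR; the search order and the 2020 lower bound are copied from the loop above.

```python
import datetime

# Month codes tried within each year, newest first (same order as find_latest_data).
QUARTERS_NEWEST_FIRST = ["12", "09", "06", "03", "01"]


def candidate_urls(base_url, current_year=None, first_year=2020):
    """Hypothetical helper: yield (year, quarter, url) from newest to oldest."""
    if current_year is None:
        current_year = datetime.datetime.now().year
    for year in range(current_year, first_year - 1, -1):
        for quarter in QUARTERS_NEWEST_FIRST:
            yield year, int(quarter), f"{base_url}/GMD_{year}_{quarter}.csv"


def find_latest(base_url, exists, current_year=None):
    """Return (year, quarter) for the first candidate URL that exists(url) accepts."""
    for year, quarter, url in candidate_urls(base_url, current_year):
        if exists(url):
            return year, quarter
    raise FileNotFoundError("No available dataset found on the server.")
```

In the package, `exists` could wrap the `requests.head` call from the diff; in a test it can be a plain set lookup, so no network access is needed.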