Commit 887b3f8

Author: Yangbo-Wang
Commit message: Commit package
0 parents, commit 887b3f8

7 files changed (+582, -0 lines)

.gitignore

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
# Python bytecode
__pycache__/
*.py[cod]
*$py.class

# Distribution / packaging
*.egg-info/
dist/
build/
*.egg

# Virtual environments
venv/
env/
.env
.venv
ENV/

# IDE related
.idea/
.vscode/
*.swp
*.swo

# OS related
.DS_Store

README.md

Lines changed: 88 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,88 @@
# The Global Macro Database (Python Package)
<a href="https://www.globalmacrodata.com" target="_blank" rel="noopener noreferrer">
<img src="https://img.shields.io/badge/Website-Visit-blue?style=flat&logo=google-chrome" alt="Website Badge">
</a>

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

[Link to paper 📄](https://www.globalmacrodata.com/research-paper.html)

This repository complements the paper by **Müller, Xu, Lehbib, and Chen (2025)**, which introduces a panel dataset of **46 macroeconomic variables across 243 countries**. The data draw on historical records beginning in the year **1086** and run through **2024**, with projections through the year **2030**.

## Features

- **Unparalleled Coverage**: Combines data from **32 contemporary sources** (e.g., IMF, World Bank, OECD) with **78 historical datasets**.
- **Extensive Variables**: GDP, inflation, government finance, trade, employment, interest rates, and more.
- **Harmonized Data**: Resolves inconsistencies and splices all available data together.
- **Scheduled Updates**: Regular releases ensure data reliability.
- **Full Transparency**: All code is open source and available in this repository.
- **Accessible Formats**: Provided in `.dta` and `.csv`, and as a **<a href="https://github.com/KMueller-Lab/Global-Macro-Database" target="_blank" rel="noopener noreferrer">Stata</a>/<a href="https://github.com/Yangbo-Wang/global_macro_data_python" target="_blank" rel="noopener noreferrer">Python</a>/<a href="https://github.com/Yangbo-Wang/global_macro_data_R" target="_blank" rel="noopener noreferrer">R</a> package**.

## Data access

<a href="https://www.globalmacrodata.com/data.html" target="_blank" rel="noopener noreferrer">Download via website</a>

**Python package:**
```
pip install global_macro_data
```

**How to use (examples)**
```python
from global_macro_data import gmd

# Get preview data (Singapore 2000-2020)
df = gmd()

# Get data from latest available version
df = gmd(show_preview=False)

# Get data from a specific version
df = gmd(version="2025_01")

# Get data for a specific country
df = gmd(country="USA")

# Get data for multiple countries
df = gmd(country=["USA", "CHN", "DEU"])

# Get specific variables
df = gmd(variables=["rGDP", "infl", "unemp"])

# Combine parameters
df = gmd(version="2025_01", country=["USA", "CHN"], variables=["rGDP", "unemp", "CPI"])
```
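
The country and variable filters used in the examples above can be tried offline. The following is a minimal sketch, not the package's own code: `filter_panel` and the `toy` frame are hypothetical names introduced for illustration, the numbers are placeholders rather than real GMD values, and the identifier columns (`ISO3`, `countryname`, `year`) are assumed to match the downloaded file.

```python
import pandas as pd

# Toy stand-in for the downloaded panel. Values are placeholders, not real data.
toy = pd.DataFrame({
    "ISO3": ["USA", "USA", "CHN", "DEU"],
    "countryname": ["United States", "United States", "China", "Germany"],
    "year": [2000, 2001, 2000, 2000],
    "rGDP": [1.0, 2.0, 3.0, 4.0],
    "unemp": [0.1, 0.2, 0.3, 0.4],
    "infl": [0.5, 0.6, 0.7, 0.8],
})

# Identifier columns that are always kept alongside the requested variables.
ID_COLS = ["ISO3", "countryname", "year"]


def filter_panel(df, country=None, variables=None):
    """Hypothetical helper: filter rows by ISO3 code and columns by variable name."""
    if country:
        # Accept a single code or a list, and compare case-insensitively.
        codes = [country] if isinstance(country, str) else list(country)
        codes = [c.upper() for c in codes]
        known = set(df["ISO3"])
        unknown = [c for c in codes if c not in known]
        if unknown:
            raise ValueError(f"Invalid country code(s): {', '.join(unknown)}")
        df = df[df["ISO3"].isin(codes)]
    if variables:
        # Keep identifiers first, then any requested variables that exist.
        wanted = ID_COLS + [v for v in variables if v not in ID_COLS]
        df = df[[c for c in wanted if c in df.columns]]
    return df


subset = filter_panel(toy, country=["usa", "CHN"], variables=["rGDP", "not_a_var"])
print(subset)
```

In this sketch an unknown variable name is silently dropped from the result, while an unknown country code raises `ValueError`.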

## Parameters
- **version (str)**: Dataset version in the format 'YYYY_MM' (e.g., '2025_01'). If None, the latest available dataset is used.
- **country (str or list)**: ISO3 country code(s) (e.g., "SGP" or ["MRT", "SGP"]). If None, all countries are returned.
- **variables (list)**: List of variable codes to include (e.g., ["rGDP", "unemp"]). If None, all variables are included.
- **show_preview (bool)**: If True and no other parameters are provided, prints a preview (Singapore, 2000-2020). The full dataset is still returned.
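
The `version` format can be checked before any download is attempted. The sketch below is an illustration under assumptions: `parse_version` is a hypothetical helper, not part of the package API, and it assumes the rules stay as they are in this release (months limited to 01, 03, 06, 09, 12, with 01 accepted only for 2025). It uses `re.fullmatch`, which is slightly stricter than an anchored `re.match`.

```python
import re

# Four-digit year, underscore, then one of the allowed month codes.
VERSION_PATTERN = re.compile(r"\d{4}_(01|03|06|09|12)")


def parse_version(version):
    """Hypothetical helper: return (year, month) or raise ValueError."""
    if not VERSION_PATTERN.fullmatch(version):
        raise ValueError(
            "Version must be in format 'YYYY_MM' where MM is one of: 01, 03, 06, 09, 12"
        )
    year_str, month = version.split("_")
    year = int(year_str)
    # The '01' code is only accepted for the initial 2025 release.
    if month == "01" and year != 2025:
        raise ValueError("Quarter '01' is only valid for year 2025")
    return year, month


print(parse_version("2025_01"))  # (2025, '01')
```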

## Release schedule

| Release Date | Details |
|--------------|------------------|
| 2025-01-30 | Initial release: v2025-01 |
| 2025-04-01 | v2025-04 |
| 2025-07-01 | v2025-09 |
| 2025-10-01 | v2025-12 |
| 2026-01-01 | v2026-03 |

## Citation

To cite this dataset, please use the following reference:

```bibtex
@techreport{mueller2025global,
    title = {The Global Macro Database: A New International Macroeconomic Dataset},
    author = {Müller, Karsten and Xu, Chenzi and Lehbib, Mohamed and Chen, Ziliang},
    year = {2025},
    type = {Working Paper}
}
```

## Acknowledgments

The development of the Global Macro Database would not have been possible without the generous funding provided by the Singapore Ministry of Education (MOE) through the PYP grants (WBS A-0003319-01-00 and A-0003319-02-00), a Tier 1 grant (A-8001749-00-00), and the NUS Risk Management Institute (A-8002360-00-00). This financial support laid the foundation for the successful completion of this extensive project.

global_macro_data/__init__.py

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
import os
import requests
import pandas as pd

# Allowed quarters
VALID_QUARTERS = ["01", "03", "06", "09", "12"]

from .gmd import gmd, find_latest_data

__all__ = ["gmd", "find_latest_data", "VALID_QUARTERS"]

global_macro_data/gmd.py

Lines changed: 164 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,164 @@
import datetime
import os
import requests
import pandas as pd
import io
import re

# Allowed quarters
VALID_QUARTERS = ["01", "03", "06", "09", "12"]


def gmd(version=None, country=None, variables=None, show_preview=True):
    """
    Download and filter Global Macro Data.

    Parameters:
    - version (str): Dataset version in format 'YYYY_MM' (e.g., '2025_01').
                     If None, the latest available version is used.
                     Note: '01' quarter is only valid for year 2025.
    - country (str or list): ISO3 country code(s) (e.g., "SGP" or ["MRT", "SGP"]).
                     If None, returns all countries.
    - variables (list): List of variable codes to include (e.g., ["rGDP", "unemp"]).
                     If None, all variables are included.
    - show_preview (bool): If True and no other parameters are provided, shows a preview.

    Returns:
    - pd.DataFrame: The requested data.
    """
    # Check if this is a default call (no specific parameters)
    default_call = (version is None and country is None and variables is None and show_preview)

    base_url = "https://www.globalmacrodata.com"

    # Process version parameter or find latest
    if version is None:
        # Automatically select the latest available dataset
        year, quarter = find_latest_data(base_url)
        version = f"{year}_{quarter:02d}"
    else:
        # Validate the version format
        if not re.match(r'^\d{4}_(01|03|06|09|12)$', version):
            raise ValueError("Version must be in format 'YYYY_MM' where MM is one of: 01, 03, 06, 09, 12")

        # Parse the version
        year_str, quarter = version.split('_')
        year = int(year_str)

        # Special validation for quarter 01
        if quarter == "01" and year != 2025:
            raise ValueError("Quarter '01' is only valid for year 2025")

    # Construct URL
    data_url = f"{base_url}/GMD_{version}.csv"
    print(f"Downloading: {data_url}")

    # Download data
    response = requests.get(data_url)
    if response.status_code != 200:
        raise FileNotFoundError(f"Error: Data file not found at {data_url}")

    # Read the data
    df = pd.read_csv(io.StringIO(response.text))

    # Filter by country if specified
    if country:
        # Convert single country to list for consistent handling
        if isinstance(country, str):
            country = [country]

        # Convert all country codes to uppercase
        country = [c.upper() for c in country]

        # Check if all specified countries exist in the dataset
        known_codes = set(df["ISO3"].unique())
        invalid_countries = [c for c in country if c not in known_codes]
        if invalid_countries:
            # Load isomapping for better error handling
            try:
                # Try to load isomapping from the expected location
                script_dir = os.path.dirname(os.path.abspath(__file__))
                isomapping_path = os.path.join(script_dir, 'isomapping.csv')
                isomapping = pd.read_csv(isomapping_path)

                # Display helpful error message with available countries
                print(f"Error: Invalid country code(s): {', '.join(invalid_countries)}. Available country codes are:")
                for _, row in isomapping.iterrows():
                    print(f"{row['ISO3']}: {row['countryname']}")
            except Exception:
                # If isomapping.csv can't be loaded, use the country list from the dataset
                print(f"Error: Invalid country code(s): {', '.join(invalid_countries)}. Available country codes are:")
                country_list = sorted(set(zip(df["ISO3"], df["countryname"])))
                for iso3, name in country_list:
                    if pd.notna(iso3) and pd.notna(name):
                        print(f"{iso3}: {name}")

            raise ValueError(f"Invalid country code(s): {', '.join(invalid_countries)}")

        # Filter for multiple countries
        df = df[df["ISO3"].isin(country)]
        print(f"Filtered data for countries: {', '.join(country)}")

    # Filter by variables if specified
    if variables:
        # Always include identifier columns
        required_cols = ["ISO3", "countryname", "year"]
        all_cols = required_cols + [var for var in variables if var not in required_cols]

        # Check if all requested variables exist in the dataset
        missing_vars = [var for var in variables if var not in df.columns]
        if missing_vars:
            print(f"Warning: The following requested variables are not in the dataset: {missing_vars}")
            print("Available variables are:")
            for i, col in enumerate(sorted(df.columns)):
                if i > 0 and i % 4 == 0:
                    print("")  # Line break every 4 columns
                print(f"- {col}", end=" ")
            print("\n")

        # Filter to only include requested variables (plus identifiers)
        existing_vars = [var for var in all_cols if var in df.columns]
        df = df[existing_vars]
        print(f"Selected {len(existing_vars)} variables")

    # Only show the preview for default calls (no specific parameters)
    if default_call:
        # Get Singapore data from 2000-2020
        sample_df = df[(df["ISO3"] == "SGP") & (df["year"] >= 2000) & (df["year"] <= 2020)]

        if len(sample_df) > 0:
            print("Singapore (SGP) data, 2000-2020")
            print(f"{len(sample_df)} rows out of {len(df)} total rows in the dataset")

            # Display the data with specified columns, sorted by year
            pd.set_option('display.max_columns', None)
            pd.set_option('display.width', 1000)

            # Define the preview columns in the exact order requested
            preview_cols = ["year", "ISO3", "countryname", "nGDP", "rGDP", "pop", "unemp", "infl",
                            "exports", "imports", "govdebt_GDP", "ltrate"]

            # Check which columns exist in the dataset
            available_cols = [col for col in preview_cols if col in sample_df.columns]

            # Sort by year and display with available columns
            print(sample_df[available_cols].sort_values(by="year"))
        else:
            print("No data available for Singapore (SGP) between 2000-2020")

    print(f"Final dataset: {len(df)} observations of {len(df.columns)} variables")
    return df


def find_latest_data(base_url):
    """Attempt to find the most recent available dataset."""
    current_year = datetime.datetime.now().year
    for year in range(current_year, 2019, -1):  # Iterate backward by year
        for quarter in ["12", "09", "06", "03", "01"]:
            url = f"{base_url}/GMD_{year}_{quarter}.csv"
            try:
                response = requests.head(url, timeout=5)
                if response.status_code == 200:
                    return year, int(quarter)
            except requests.RequestException:
                # Network problem for this URL: try the next candidate
                continue

    raise FileNotFoundError("No available dataset found on the server.")
