Data Pipeline, Pharma & Healthcare, Web Scraping

PMD

Automated pharma data unification for drug pricing and reimbursement policies across global markets.

About the project

PMD automates one of the most fragmented data-gathering problems in healthcare: collecting maximum drug prices and reimbursement policies from regulators across dozens of countries. Each source is structured differently — PDF tables, paginated HTML, downloadable spreadsheets with shifting layouts — and yet the resulting dataset has to be uniform, queryable, and audit-ready.

The pipeline

We built a fleet of country-specific scrapers running on a scheduled orchestrator. Each scraper extracts the source data, normalizes it against a shared schema, validates it, and publishes to a versioned warehouse. Anomalies (sudden price jumps, missing entries, regulatory format changes) are flagged for human review rather than silently dropped.
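As a rough illustration of the review step, the sketch below compares two snapshots of one country's prices and returns the entries a human should look at. It is not PMD's actual code: the `PriceRecord` fields, the `flag_anomalies` function, and the 50% jump threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceRecord:
    """One normalized row; the field names are hypothetical."""
    country: str
    drug_id: str
    max_price: float


def flag_anomalies(previous, current, jump_threshold=0.5):
    """Return (drug_id, reason) pairs that need human review.

    jump_threshold is an assumed relative change (0.5 means 50%).
    Nothing is dropped here; suspicious rows are only reported.
    """
    current_by_id = {record.drug_id: record for record in current}
    flags = []
    for old in previous:
        new = current_by_id.get(old.drug_id)
        if new is None:
            # The drug was listed last time and is absent now.
            flags.append((old.drug_id, "missing entry"))
            continue
        if old.max_price > 0:
            change = abs(new.max_price - old.max_price) / old.max_price
            if change > jump_threshold:
                flags.append((old.drug_id, "price jump"))
    return flags


# Made-up data for illustration only.
previous = [
    PriceRecord("XX", "A", 10.0),
    PriceRecord("XX", "B", 20.0),
    PriceRecord("XX", "C", 5.0),
]
current = [
    PriceRecord("XX", "A", 10.5),
    PriceRecord("XX", "B", 40.0),
]
review_queue = flag_anomalies(previous, current)
```

In this sketch a flagged record stays in the snapshot, so a reviewer can either confirm a genuine regulatory change or fix the scraper.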

The harder problem was reconciling drug identities across regions. The same molecule may have different brand names, dosage formulations, and reimbursement classes in different markets. We modeled this as a graph linking active ingredients to local listings, which lets downstream consumers query at any level — molecule, brand, or regional SKU.
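One way to picture the identity graph is a small adjacency map with three node levels, where a query from any level walks down to the regional SKUs beneath it. This is a minimal sketch under assumptions: the `DrugGraph` class, its method names, and the SKU identifiers are invented for the example, and dosage forms and reimbursement classes are left out.

```python
from collections import defaultdict


class DrugGraph:
    """Directed graph: molecule -> brand -> regional SKU (hypothetical model)."""

    def __init__(self):
        self.children = defaultdict(set)
        self.nodes = set()

    def add_listing(self, molecule, brand, sku):
        m = ("molecule", molecule.lower())
        b = ("brand", brand.lower())
        s = ("sku", sku)
        self.nodes.update((m, b, s))
        self.children[m].add(b)
        self.children[b].add(s)

    def skus(self, level, name):
        """Return the sorted SKUs reachable from a molecule, brand, or SKU node."""
        start = (level, name if level == "sku" else name.lower())
        if start not in self.nodes:
            return []
        found, seen, stack = set(), set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node[0] == "sku":
                found.add(node[1])
            stack.extend(self.children.get(node, ()))
        return sorted(found)


# The SKU identifiers below are made up for illustration.
graph = DrugGraph()
graph.add_listing("paracetamol", "Tylenol", "US-TYL-500")
graph.add_listing("paracetamol", "Panadol", "GB-PAN-500")
graph.add_listing("ibuprofen", "Advil", "US-ADV-200")

by_molecule = graph.skus("molecule", "Paracetamol")
by_brand = graph.skus("brand", "Panadol")
```

Keeping molecules, brands, and SKUs as separate node types means the same traversal serves all three query levels, so consumers do not need a separate lookup table per level.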

Outcome

What used to be a quarterly manual research job is now a continuously refreshed dataset. PMD has become operationally critical for analytics teams that track market access, pricing strategy, and competitive intelligence at a global scale.
