Kathmandu2014 - Data Liberation Scrapathon

We plan to scrape data from different sources (HTML or PDF) into a usable format. We are seeking requests from stakeholders about the kind of data they are looking for. If there is anything you think we should be scraping, consider joining the Google group https://groups.google.com/forum/#!forum/opendatanepal and writing to the group. Alternatively, you may update this page with the source of the dataset.

Learn more about data-scraping here http://en.wikipedia.org/wiki/Data_scraping

Tools
Various tools are available to make scraping quick and easy. Different resources (PDFs or websites) may require different techniques, and some may even require manual work to get the data.
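
For HTML sources, the basic idea can be sketched in a few lines of Python using only the standard library. This is a minimal sketch, not a finished scraper: the names TableParser and html_table_to_csv are hypothetical, and the sample table is made up for illustration. It assumes a simple table without nested tables or merged cells.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <tr> found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample data, only to show the shape of the input.
SAMPLE = """
<table>
  <tr><th>District</th><th>Amount</th></tr>
  <tr><td>Example A</td><td>100</td></tr>
  <tr><td>Example B</td><td>250</td></tr>
</table>
"""

print(html_table_to_csv(SAMPLE))
```

Real pages are usually messier than this, so expect to adjust the parser for each source or to fall back to one of the tools below.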

Google Chrome Scraper

pdftohtml

http://schoolofdata.org/2013/06/18/get-started-with-scraping-extracting-simple-tables-from-pdf-documents/
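
For PDF sources, one common approach is to convert the PDF to XML first (for example with something like pdftohtml -xml input.pdf output, where the file names are placeholders) and then rebuild table rows from the position of each text fragment. The sketch below assumes the XML contains page elements holding text elements with top and left attributes. The function name rows_from_pdf2xml, the tolerance value and the sample XML are all hypothetical, so check them against the actual output for your document.

```python
import xml.etree.ElementTree as ET


def rows_from_pdf2xml(xml_text, tolerance=3):
    """Group text fragments into rows by vertical position.

    Fragments whose `top` value is within `tolerance` units of the
    first fragment in a row are treated as the same row. Cells are
    ordered left to right by their `left` value.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for page in root.iter("page"):
        cells = []
        for el in page.iter("text"):
            # itertext() also picks up text inside <b> or <i> children.
            text = "".join(el.itertext()).strip()
            if text:
                cells.append((int(el.get("top")), int(el.get("left")), text))
        cells.sort()  # top to bottom, then left to right

        current, current_top = [], None
        for top, left, text in cells:
            if not current or top - current_top > tolerance:
                if current:
                    rows.append([t for _, t in sorted(current)])
                current, current_top = [], top
            current.append((left, text))
        if current:
            rows.append([t for _, t in sorted(current)])
    return rows


# Made-up sample in the assumed layout, not taken from a real document.
SAMPLE_XML = """<pdf2xml>
<page number="1" height="1188" width="918">
<text top="100" left="50" width="80" height="12" font="0">District</text>
<text top="101" left="300" width="60" height="12" font="0">Amount</text>
<text top="120" left="300" width="30" height="12" font="0">100</text>
<text top="120" left="50" width="70" height="12" font="0"><b>Example A</b></text>
</page>
</pdf2xml>"""

for row in rows_from_pdf2xml(SAMPLE_XML):
    print(row)
```

Grouping by vertical position works for simple tables. Multi-line cells or rotated pages will likely need manual cleanup afterwards.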

Datasets
Health budget data are in PDF and need to be liberated. http://www.mohp.gov.np/english/budget/budget.php

All the budget Red Books are in PDF. http://mof.gov.np/en/archive-documents/budget-details--red-book-28.html

Foreign employment data http://www.dofe.gov.np/np/reports/archive.php?ci=15

District expenditure data http://www.fcgo.gov.np/report-publications/