HunterThinks.com

Posted: 29 December 2014
Updated: 07 July 2018

Tool to generate CSV lists from Wikidata

Question from Software Recommendations on Stack Exchange

Wikidata is an online database which contains many details about many countries, politicians, paintings, etc. For instance, for each country you have name, flag, map, which are all strings or URLs to online images.

How can I simply generate a CSV file containing the item type and properties I want?

For instance, if I say I want all countries with their names and capitals, then it would generate a CSV file like this:

country;capital
India;New Delhi
Brazil;Brasília
...

Any OS/webapp/app is OK. Preferably open source.

I don’t want to download the whole Wikidata database locally, so the tool would have to make requests to the official live server.

My answer

[Read, comment, and vote on my answer at Stack Exchange]

There are many tools that can accomplish your goals, and their advantages and disadvantages largely depend on your current skill set. Therefore, I will simply list the tools that I know about; you will have to examine which of them match the languages you know and the platforms you have access to. Furthermore, my experience is that all of the tools are imperfect and that you will have to improve them to get exactly what you need.

Official Wikimedia information

  1. Manual:Using content from Wikipedia
  2. Alternative parsers (an excellent list of many different types of parsers, but many of them are out of date)
  3. Manual:Pywikibot/Scripts: official Wikimedia Python-based scripts for accomplishing tasks

More tools

  1. Ways to process and use Wikipedia dumps: a very old blog post listing some tools
  2. DBpedia is a community focused on extracting structured data from Wikipedia
  3. Scrapy was already mentioned
  4. import.io is not specific to Wikipedia, but it has the power to accomplish your goals
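
For a list as small as the one in the question, a short script may be enough. The following is only a minimal sketch, not one of the tools listed above. It assumes the public Wikidata Query Service SPARQL endpoint at query.wikidata.org, which this answer does not otherwise cover, and it assumes the Wikidata identifiers Q6256 (country), P31 (instance of), and P36 (capital). The function names, the COLUMNS list, and the User-Agent string are hypothetical and chosen only for illustration. It uses nothing beyond the Python standard library and asks the live server for data, so nothing is downloaded locally.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

# Assumption: the public Wikidata Query Service endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

# Assumption: Q6256 = country, P31 = instance of, P36 = capital.
QUERY = """
SELECT ?countryLabel ?capitalLabel WHERE {
  ?country wdt:P31 wd:Q6256 ;
           wdt:P36 ?capital .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?countryLabel
"""

# (CSV header, SPARQL variable) pairs; hypothetical names for this sketch.
COLUMNS = [("country", "countryLabel"), ("capital", "capitalLabel")]


def fetch_bindings(query, endpoint=ENDPOINT):
    """Send the query to the live server and return the result rows."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        headers={
            # Placeholder value; replace with something that identifies you.
            "User-Agent": "wikidata-csv-sketch/0.1 (example script)",
            "Accept": "application/sparql-results+json",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        data = json.load(response)
    return data["results"]["bindings"]


def bindings_to_csv(bindings, columns, delimiter=";"):
    """Turn SPARQL JSON bindings into CSV text; missing values become empty."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow([header for header, _ in columns])
    for row in bindings:
        writer.writerow([row.get(var, {}).get("value", "") for _, var in columns])
    return buffer.getvalue()


if __name__ == "__main__":
    # Network access is needed only here.
    print(bindings_to_csv(fetch_bindings(QUERY), COLUMNS), end="")
```

Redirecting the script's output to a file gives a semicolon-separated list in the same shape as the example in the question. Other item types or properties would need a different query and a matching COLUMNS list.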

Another thought

The information you are looking for almost certainly already exists somewhere. The CIA World Factbook, UN databases, and other open data sources are all likely to have it.

Good luck!