Question from Software Recommendations on Stack Exchange
Wikidata is an online database which contains many details about countries, politicians, paintings, etc. For instance, for each country you have properties such as its capital and a map, which are strings or URLs to online images.
How can I simply generate a CSV file containing the item type and properties I want?
For instance, if I say I want all countries and their capitals, then it would generate a CSV file like this:
country;capital
India;New Delhi
Brazil;Brasília
...
Any OS/webapp/app is OK. Preferably open source.
I don’t want to download the whole Wikidata database locally, so the tool would have to make requests to the official live server.
[Read, comment, and vote on my answer at Stack Exchange]
There are many tools that can accomplish your goal, and their advantages and disadvantages largely depend on your current skill set. Therefore, I will simply list the tools that I know about; you will have to examine which ones match the languages you know and the platforms you have access to. Furthermore, my experience is that all of these tools are imperfect and that you will have to improve them to get exactly what you need.
Official Wikimedia information
- Manual:Using content from Wikipedia
- Alternative parsers (an excellent list of many different types of parsers, but many of them are out of date)
- Manual:Pywikibot/Scripts: official Wikimedia Python-based scripts for automating common tasks
- Ways to process and use Wikipedia dumps: a very old blog post listing some tools
- DBpedia is a community focused on extracting structured data from Wikipedia
- Scrapy was already mentioned
- import.io is not specific to Wikipedia, but it has the power to accomplish your goals
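To make the goal concrete, here is a minimal sketch of querying the official Wikidata Query Service (query.wikidata.org), which accepts SPARQL queries over the live data and can return CSV directly. The item and property IDs used (Q6256 = country, P31 = instance of, P36 = capital) are Wikidata's own; the script itself is an illustration under those assumptions, not a polished tool.

```python
# Sketch: ask the Wikidata Query Service for countries and their
# capitals as CSV, using only the Python standard library.
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# Q6256 = country, P31 = instance of, P36 = capital.
QUERY = """
SELECT ?countryLabel ?capitalLabel WHERE {
  ?country wdt:P31 wd:Q6256 ;
           wdt:P36 ?capital .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

def build_request(query: str) -> urllib.request.Request:
    """Build a GET request asking the endpoint for CSV output."""
    url = SPARQL_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "text/csv"})

if __name__ == "__main__":
    # Each row of the response is "countryLabel,capitalLabel".
    with urllib.request.urlopen(build_request(QUERY)) as resp:
        print(resp.read().decode("utf-8"))
```

Swapping the query lets you pull any item type and properties you want; the endpoint negotiates the output format via the Accept header, so the same request can return JSON or XML instead of CSV.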
The information you are looking for almost certainly already exists somewhere. The CIA World Factbook, UN databases, and other open data sources certainly have this information.