Dear Wiki Research Community,
I am a data science student at Wesleyan University. This spring, my professor asked me to choose a data set to visualize. She suggested one that contains a combination of dates, geographic information, and quantitative and qualitative variables. My mind went to Wikipedia.
I've spent a long time searching, but the volume of available data is overwhelming. Can you recommend a specific Wiki dataset?
Thank you,
Jamie Willoughby
Hi Jamie,
You can find an overview of the different Wikimedia datasets here: https://meta.wikimedia.org/wiki/Research:Data
You could also check out this tutorial, which has hands-on instructions for working with (some of the) Wikimedia data for research: https://meta.wikimedia.org/wiki/Wikimedia_Data_Tutorial_ICWSM_2024
I hope this helps.
Best,
Martin
On Tue, Feb 18, 2025 at 9:00 AM Jamie Willoughby jamiekw0302@gmail.com wrote:
Dear Wiki Research Community,
I am a data science student at Wesleyan University. This spring, my professor asked me to choose a data set to visualize. She suggested one that contains a combination of dates, geographic information, and quantitative and qualitative variables. My mind went to Wikipedia.
I've spent a long time searching, but the volume of available data is overwhelming. Can you recommend a specific Wiki dataset?
Thank you,
Jamie Willoughby
_______________________________________________
Wiki-research-l mailing list -- wiki-research-l@lists.wikimedia.org
To unsubscribe send an email to wiki-research-l-leave@lists.wikimedia.org
Hi Jamie,
I can recommend using the paintings dataset on Wikidata. It’s fun to think up visualizations for paintings.
Here’s one for you that includes the Davis Art Center at Wesleyan (one of our many under-covered art institutions on Wikidata). This visualisation maps the Connecticut birth places of portrait sitters for portraits in (Connecticut) collections. You could check this against neighbouring states, but you might get timeouts for NYC and Boston. Most portraits in this query are probably in US collections, but they could be anywhere. To get only Connecticut collections, remove the comment hashtags from the last two query statements. The nice thing about working on such datasets is that you can help improve the data as you go (e.g. by adding the statements that the query will pick up, such as birth places for person items missing them, but also genre for portrait paintings missing it, etc.).
https://w.wiki/D7as
Jane
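For readers who want to try the comment toggle outside the query service UI, here is a minimal Python sketch of the idea. It is not the query behind the short link above, which is not reproduced in this thread. The query shape, the function names build_query and run_query, the User-Agent string, and the choice of Wikidata properties (P31, P136, P180, P19, P131, P625, P195) and items (Q3305213 painting, Q134307 portrait, Q779 Connecticut) are all assumptions made for illustration. The last two statements are emitted with a leading "#" unless the Connecticut-only option is switched on.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed filter: keep only portraits held in collections located in Connecticut.
COLLECTION_FILTER = [
    "?painting wdt:P195 ?collection .",
    "?collection wdt:P131* wd:Q779 .",
]


def build_query(connecticut_collections_only=False, limit=500):
    """Build an illustrative query; the filter lines stay commented out by default."""
    prefix = "" if connecticut_collections_only else "# "
    lines = [
        "#defaultView:Map",
        "SELECT ?sitter ?sitterLabel ?place ?placeLabel ?coords WHERE {",
        "  ?painting wdt:P31 wd:Q3305213 ;   # instance of: painting",
        "            wdt:P136 wd:Q134307 ;   # genre: portrait",
        "            wdt:P180 ?sitter .      # depicts: the sitter",
        "  ?sitter wdt:P19 ?place .          # place of birth",
        "  ?place wdt:P131* wd:Q779 ;        # located in Connecticut",
        "         wdt:P625 ?coords .         # coordinate location",
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }',
    ]
    # The last two statements are the ones to uncomment.
    lines += [f"  {prefix}{statement}" for statement in COLLECTION_FILTER]
    lines += ["}", f"LIMIT {limit}"]
    return "\n".join(lines)


def run_query(query):
    """Send the query to the endpoint and return the JSON result bindings."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url, headers={"User-Agent": "viz-class-example/0.1 (student project)"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    print(build_query(connecticut_collections_only=True))
```

Calling run_query(build_query()) would send the unrestricted version to the endpoint. As noted above, broader variants of such a query may time out.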
Hi Jamie,
Finding the perfect dataset for viz exercises is tough!
Tableau ships with a perfect exercise dataset called Superstore. This is a fictitious office supply store dataset with all the variable types you are after.
As a bonus, you can find tens of thousands of tutorials and blogs for Tableau using this Superstore dataset.
Here is a link to lots of localized versions of the basic dataset. https://datawonders.atlassian.net/wiki/spaces/TABLEAU/blog/2022/10/26/195343...
Good luck!
Regards,
Paul Albert
Wow, thanks a lot for the help! I really appreciate your suggestions.