cryptics.georgeho.org — A Dataset of Cryptic Crossword Clues
cryptics.georgeho.org is a dataset of cryptic crossword clues, collected
from various blogs and publicly available digital archives. I originally
started this project to practice my web scraping and data engineering skills,
but as it’s evolved I hope it can be a resource for solvers and constructors of cryptic crosswords.
The project scrapes several blogs and digital archives for cryptic crosswords. From the collected web pages, it parses the clues, answers, clue numbers, bloggers’ explanations and commentary, puzzle titles, and publication dates into a tabular dataset. The result (as of September 2021) is a little over half a million clues from cryptic crosswords published over the past twelve years, which makes for a rich and peculiar dataset.
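To give a rough feel for the parsing step, here is a minimal Python sketch that turns a few lines of blog-style text into tabular rows. It is not the project's actual code: the one-line-per-clue format, the `ClueRow` column names, and the `parse_post` and `to_csv` helpers are all assumptions made up for this illustration, and real blogs each need their own handling of their HTML layout.

```python
import csv
import io
import re
from dataclasses import asdict, dataclass, fields
from datetime import date
from typing import List


@dataclass
class ClueRow:
    """One row of the tabular dataset (column names are hypothetical)."""
    clue: str
    answer: str
    clue_number: str
    annotation: str
    puzzle_name: str
    puzzle_date: date


# Assumed line format, invented for this sketch:
#   "<number><a|d> <clue text> (<enumeration>) <ANSWER> - <explanation>"
LINE_RE = re.compile(
    r"^(?P<number>\d+[ad])\s+"
    r"(?P<clue>.+?\(\d+(?:[,-]\d+)*\))\s+"
    r"(?P<answer>[A-Z][A-Z -]*?)\s+-\s+"
    r"(?P<annotation>.+)$"
)


def parse_post(text: str, puzzle_name: str, puzzle_date: date) -> List[ClueRow]:
    """Extract one ClueRow per line that matches the assumed format."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip headings, commentary, and blank lines
        rows.append(
            ClueRow(
                clue=match["clue"],
                answer=match["answer"],
                clue_number=match["number"],
                annotation=match["annotation"],
                puzzle_name=puzzle_name,
                puzzle_date=puzzle_date,
            )
        )
    return rows


def to_csv(rows: List[ClueRow]) -> str:
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ClueRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Made-up sample post, not taken from the dataset.
SAMPLE_POST = """Across
1a Flower of London? (6) THAMES - cryptic definition: something that flows
Down
12d Cheese made backwards (4) EDAM - reversal of MADE
"""

if __name__ == "__main__":
    rows = parse_post(SAMPLE_POST, "Example Puzzle 1", date(2021, 9, 1))
    print(to_csv(rows))
```

Lines that do not match the pattern are skipped rather than raising an error, which seems a reasonable default when the source pages mix clues with headings and free-form commentary.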
Without further ado, please check out cryptics.georgeho.org!
#crossword #dataset #open-source