⁂ George Ho

cryptics.georgeho.org — A Dataset of Cryptic Crossword Clues

cryptics.georgeho.org is a dataset of cryptic crossword clues [1], collected from various blogs and publicly available digital archives. I originally started this project to practice my web scraping and data engineering skills, but as it has evolved, I hope it can become a resource for solvers and constructors of cryptic crosswords.

The project scrapes several blogs and digital archives for cryptic crosswords. From the collected web pages, the clues, answers, clue numbers, blogger’s explanation and commentary, puzzle title and publication date are parsed and extracted into a tabular dataset. The result (as of September 2021) is a little over half a million clues from cryptic crosswords published over the past twelve years, which makes for a rich and peculiar dataset.
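
To give a feel for the "parse and extract into a table" step, here is a minimal sketch in Python. It is not the project's actual code: the clue line format, the `ClueRow` fields, and the helper names `parse_clue_line` and `rows_to_csv` are all assumptions made for illustration, and real blog pages are far messier than a single regular expression can handle.

```python
import csv
import io
import re
from dataclasses import asdict, dataclass, fields
from typing import List, Optional

# ASSUMPTION: a clue line looks like "<number> <clue text> (<enumeration>)".
# Real blogs each use their own markup, so this pattern is only illustrative.
CLUE_PATTERN = re.compile(
    r"^\s*(?P<number>\d+)[.\s]+(?P<clue>.+?)\s*\((?P<enumeration>[\d,\-\s]+)\)\s*$"
)


@dataclass
class ClueRow:
    """One row of a hypothetical tabular clue dataset."""
    clue_number: str
    clue: str
    enumeration: str
    answer: str
    puzzle_title: str
    publication_date: str


def parse_clue_line(
    line: str, answer: str, puzzle_title: str, publication_date: str
) -> Optional[ClueRow]:
    """Split a raw clue line into fields; return None if it does not match."""
    match = CLUE_PATTERN.match(line)
    if match is None:
        return None
    return ClueRow(
        clue_number=match.group("number"),
        clue=match.group("clue"),
        enumeration=match.group("enumeration").strip(),
        answer=answer,
        puzzle_title=puzzle_title,
        publication_date=publication_date,
    )


def rows_to_csv(rows: List[ClueRow]) -> str:
    """Serialize parsed rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ClueRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Made-up example input; the puzzle title and date are placeholders.
row = parse_clue_line("7 Flower of London? (6)", "THAMES", "Example Puzzle", "2021-09-01")
print(rows_to_csv([row]))
```

Returning `None` for lines that do not match, rather than raising, is one way to let a scraper skip commentary and other non-clue text while still collecting whatever rows it can.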

Without further ado, please check out cryptics.georgeho.org!

  1. If you’re new to cryptic crosswords, rejoice! A whole new world awaits you! The New Yorker has an excellent introduction to cryptic crosswords, and Matt Gritzmacher has a daily newsletter with links to crosswords.

#Crossword #Dataset #Open-Source