Word Scoring project for Outreachy / GSoC

Status: Final | March 2025

The Word Score project has been proposed for GSoC and Outreachy in 2025. Several people have asked what they should do in order to write a proposal.

The general recommendation for most GSoC projects is to build the project and try to fix some bugs. However, this project is different enough from the rest of the code base that the normal set of newcomer bugs isn’t useful. Instead, we need an alternate approach. This doc is an attempt to provide guidance and answer some questions.

Project

Prospective interns should attempt to calculate the bigraph and trigraph scores as listed in the design doc.
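As a starting point, here is a minimal sketch of what counting letter bigraphs and trigraphs over a word list could look like. It is not the method from the design doc: the function names, the tiny sample list, and the scoring formula (mean log relative frequency, with unseen sequences floored at a count of 1) are all assumptions made for illustration. Treat the design doc as the real definition.

```python
from collections import Counter
import math


def ngrams(word, n):
    """Return the overlapping letter sequences of length n in word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def count_ngrams(words, n):
    """Count every length-n letter sequence across an iterable of words."""
    counts = Counter()
    for word in words:
        counts.update(ngrams(word.upper(), n))
    return counts


def ngram_score(word, counts, n):
    """Hypothetical score: mean log relative frequency of the word's n-grams.

    Sequences never seen in the counts are floored at a count of 1 so the
    result stays finite. This formula is an assumption, not the design doc's.
    """
    total = sum(counts.values())
    grams = ngrams(word.upper(), n)
    if not grams or total == 0:
        return float("-inf")
    return sum(math.log(max(counts[g], 1) / total) for g in grams) / len(grams)


# Tiny made-up sample; a real run would read the word list from the fork.
sample_words = ["CROSS", "WORD", "SCORE", "CROSSWORD"]
bigraph_counts = count_ngrams(sample_words, 2)
trigraph_counts = count_ngrams(sample_words, 3)
```

With counts like these in hand, a word made of common sequences scores higher than one made of rare ones, which is a useful sanity check on whatever scoring approach you end up describing in your doc.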

They should start by making a fork of the crosswords repo and then work in their fork within the word-list/ and tools/ directories. Their proposal should include a link to what they did and explain how they went about it. I’d also like to see a doc describing their scoring approach.

Proposals will be assessed on some of the following characteristics:

  • Demonstrating independence / problem solving ability

  • Knowledge of, and ability to work with, Git

  • Quality of the proposal for the subproject

  • Ability to write clean Python code

  • Ability to generate data and understand what has been generated

  • Plan for storage of intermediate data

  • Actual score value!

  • etc.

And as a bonus:

  • Understanding how to pipe the data through the structure to the C code.

If they have relevant experience / code / contributions outside of this project, they should include those as well.

NOTE: I don’t expect everyone to be able to check off all these boxes in the proposal phase. Work done here will carry over directly to the actual project, so making progress on it is a good beginning.

Requirements Reminder

Just a reminder that this project will require a reasonably stable internet connection and a machine with sufficient storage to download a 20 GB data set. I also want people to be able to run a Linux desktop so that they can develop in the same environment that Crosswords is developed in.