Word Scoring project for Outreachy / GSoC
Status: Final | March 2025
The Word Score project has been proposed for 2025 GSoC and Outreachy. Several people have asked what they should do to prepare a proposal.
The general recommendation for most GSoC projects is to build the project and try to fix some bugs. However, this project is different enough from the rest of the code base that the normal set of newcomer bugs isn’t useful. Instead, we need an alternate approach. This doc is an attempt to provide guidance and answer some questions.
Project
Prospective interns should attempt to calculate the bigraph and trigraph scores as listed in the design doc.
They should start by making a fork of the crosswords repo and then work in their fork within the word-list/ and tools/ directories. Their proposal should include a link to what they did and explain how they went about it. I’d also like to see a doc describing their scoring approach.
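As a starting point, here is a minimal sketch of counting letter sequences in Python. It assumes that a bigraph is any two adjacent letters in a word and a trigraph any three, and that raw counts are the input to scoring; the design doc's definitions take precedence wherever they differ. The names ngrams, count_ngrams, and relative_frequencies, and the sample word list, are illustrative only and are not part of the crosswords repo.

```python
from collections import Counter


def ngrams(word, n):
    """Return every run of n adjacent characters in word (assumed definition)."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def count_ngrams(words, n):
    """Count how often each n-letter sequence appears across all words."""
    counts = Counter()
    for word in words:
        # Upper-case so that "Cat" and "CAT" contribute to the same sequences.
        counts.update(ngrams(word.upper(), n))
    return counts


def relative_frequencies(counts):
    """Convert raw counts to fractions of the total; empty input gives {}."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical sample data; a real run would read from a word list or corpus.
sample_words = ["cat", "cart", "tart"]
bigraphs = count_ngrams(sample_words, 2)
trigraphs = count_ngrams(sample_words, 3)

for gram, freq in sorted(relative_frequencies(bigraphs).items()):
    print(gram, round(freq, 3))
```

How these counts turn into an actual score, and where the intermediate counts are stored, is the part the proposal should explain.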
Proposals will be assessed on some of the following characteristics:
Demonstrating independence / problem solving ability
Knowledge of and ability to work with Git
Quality of the proposal for the subproject
Ability to write clean Python code
Ability to generate data and understand what has been generated
Plan for storage of intermediate data
Actual score value!
etc.
And as a bonus:
Understanding how to pipe the data through the structure to the C code.
If they have relevant experience / code / contributions outside of this project, they should include those as well.
NOTE: I don’t expect everyone to be able to check off all these boxes in the proposal phase. Work here will seamlessly translate to the actual project. Making progress on it is a good beginning.
Background links
Requirements Reminder
Just a reminder that this project will require a reasonably stable internet connection and a machine with sufficient storage to download a 20 GB data set. I also want people to be able to run a Linux desktop so that they can develop in the same environment that Crosswords is developed in.