You probably know very well that internationalization is hard. The mere act of translating the UI texts is actually one of the easiest parts, even though it’s not a pushover either. As one example: if your messages include quantities, you need to have some logic in place to choose different forms of nouns to go with your numbers. Fortunately, most frameworks already have that, as it’s a standard i18n feature.
Not every string message in your code is something to localize, of course. Log messages that are not visible to the user can be left alone in English – they should be, in fact. Coincidentally, though, those messages are also very likely to contain many numbers, often used as numerical quantities: things to do, things done, error count, and so on:
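As an illustration, a log call of this kind might look like the sketch below. The logger name, the `format_summary` helper and the message wording are all hypothetical, chosen only to show a count embedded in an English sentence.

```python
import logging

logger = logging.getLogger("batch")  # hypothetical logger name


def format_summary(done: int) -> str:
    """Format a summary line; the noun is hard-coded in its plural form."""
    return "Processed %d items" % done


# Typical call site: report how many things were handled in this run.
logger.info(format_summary(5))
```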
What if that number is 1?…
Oh well. That’s hardly the end of the world, is it? Anyway, let’s just make the message slightly more universal:
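One possible rewording, as a sketch: put the count after a label, so that the noun never has to agree with the number. The `format_summary` name and the exact wording are assumptions made for illustration.

```python
def format_summary(done: int) -> str:
    """Put the count after a label so the noun never has to agree with it."""
    return "Items processed: %d" % done


# Reads acceptably for any count, including exactly one.
print(format_summary(1))
print(format_summary(5))
```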
There, problem solved!
No worries, I haven’t gone insane. I know that no real-world software would put such gold plating on something as irrelevant as the grammar of its log messages. But it’s spring break, and we can be silly, so let’s have some fun with the idea.
Here I pose the question:
How hard would it be to construct the plural form of an English noun from its singular form?
Consulting the largest repository of human knowledge (well, second largest) reveals that the rules of building English plurals are not exactly trivial – but not very complex either. There are exceptions to almost every rule, though, and a large body of exceptions in general. Still, you could expect to achieve at least some success by just disregarding them completely, and following the simple rules to the letter.
How high would that success ratio be, though?
To estimate this, I crafted a totally insolent function that attempts to capture the various intricacies of the English language with half a screen’s worth of very simple code:
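A minimal sketch of what such a function might look like is given below. The name `pluralize` and the particular selection of suffix rules are assumptions made for illustration. It expects a lowercase noun and deliberately ignores every exception, so it gets words like "child", "roof" or "photo" wrong.

```python
def pluralize(singular: str) -> str:
    """Naively pluralize a lowercase English noun using suffix rules only."""
    vowels = "aeiou"
    if not singular:
        return singular
    # Sibilant endings take -es: bus -> buses, box -> boxes, dish -> dishes.
    if singular.endswith(("s", "x", "z", "ch", "sh")):
        return singular + "es"
    # Consonant + y turns into -ies: city -> cities (but day -> days).
    if singular.endswith("y") and len(singular) > 1 and singular[-2] not in vowels:
        return singular[:-1] + "ies"
    # -fe and -f turn into -ves: knife -> knives, wolf -> wolves.
    # This is wrong for words such as roof or chief.
    if singular.endswith("fe"):
        return singular[:-2] + "ves"
    if singular.endswith("f") and not singular.endswith("ff"):
        return singular[:-1] + "ves"
    # Consonant + o takes -es: hero -> heroes (wrong for photo, piano).
    if singular.endswith("o") and len(singular) > 1 and singular[-2] not in vowels:
        return singular + "es"
    # Everything else just gets -s.
    return singular + "s"
```

With these rules, `pluralize("item")` gives `"items"` and `pluralize("city")` gives `"cities"`, while `pluralize("child")` gives the incorrect `"childs"`.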
… they constitute at least a few percent. With some 30 or 40 such words hard-coded into our simple function, I suspect that going above a 90% success ratio in practice is firmly within our grasp.
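Hard-coding the exceptions could be as simple as a dictionary lookup in front of the rules. The sketch below assumes a hypothetical `IRREGULAR_PLURALS` table with a handful of sample entries and a deliberately tiny fallback rule. It is an illustration of the idea, not a vetted word list.

```python
# Hypothetical table of irregular nouns; a real one would hold a few dozen entries.
IRREGULAR_PLURALS = {
    "child": "children",
    "man": "men",
    "woman": "women",
    "person": "people",
    "mouse": "mice",
    "foot": "feet",
    "tooth": "teeth",
    "goose": "geese",
    "sheep": "sheep",
    "series": "series",
}


def pluralize_with_exceptions(singular: str) -> str:
    """Look the noun up in the exception table, else fall back to simple rules."""
    irregular = IRREGULAR_PLURALS.get(singular)
    if irregular is not None:
        return irregular
    # Minimal fallback for the sake of the example: -es after sibilants, -s otherwise.
    if singular.endswith(("s", "x", "z", "ch", "sh")):
        return singular + "es"
    return singular + "s"
```

A dictionary lookup costs the same regardless of how many exceptions are listed, so the table does not slow down the common case.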
All in O(1) time and without any noticeable memory footprint. Who said that natural language processing must be hard? :)
Doing that with English is playing on Bring It On.
Try with Polish :)
It’s not internationalization, it’s using proper English. Try doing it in English and Polish at the same time :D.
ReSharper uses a similar operation to suggest variable names for collection types. Also, Entity Framework generates plural table names from the singular object name. It’s fast, and it has about 90% correctness; I assume the method is similar or even identical.
Still, what about other languages? ;-)