Posts tagged ‘NLP’

The Great Plural Experiment

2013-03-31 22:24

You probably know very well that internationalization is hard. The mere act of translating the UI texts is actually one of the easiest parts, even though it’s not a pushover either. As one example: if your messages include quantities, you need to have some logic in place to choose different forms of nouns to go with your numbers. Fortunately, most frameworks already have that, as it’s a standard i18n feature.

Not every string message in your code is something to localize, of course. Log messages that are not visible to the user can be left alone in English – they should be, in fact. Coincidentally, though, those messages are also very likely to contain many numbers, often used as numerical quantities: things to do, things done, error count, and so on:

  1. INFO: database.py:442/init_database - 236 rows created

What if that number is 1?…

  1. INFO: files.py:132/process_files - 1 files processed

Oh well. That’s hardly the end of the world, isn’t it? Anyway, let’s just make the message slightly more universal:

  1. INFO: files.py:132/process_files - 1 file(s) processed

There, problem solved!

No worries, I haven’t gone insane. I know that no real-world software would put such a gold plating on something as irrelevant as grammar of its log messages. But it’s spring break, and we can be silly, so let’s have some fun with the idea.
Here I pose the question:

How hard would it be to construct a plural form of English noun from the singular one?

Consulting the largest repository of human knowledge (well, second largest) reveals that the rules of building English plurals are not exactly trivial – but not very complex either. There are exceptions to almost every rule, though, and a large body of exceptions in general. Still, you could expect to achieve at least some success by just disregarding them completely, and following the simple rules to the letter.

How high that success ratio would be, though?

Tags: , , , ,
Author: Xion, posted under Computer Science & IT » 4 comments
 


© 2017 Karol Kuczmarski "Xion". Layout by Urszulka. Powered by WordPress with QuickLaTeX.com.