Most common words in English

Once there lived a poor woodcutter. He used to cut trees in the woods. One day he was cutting wood on the bank of a river. His axe fell down into the river. The river was deep. He could not take his axe out. He sat on the bank and began to weep.

Mercury, the god of water appeared. He asked the reason of his weeping. The woodcutter told the whole story. Mercury dived into the water and brought a golden axe. The woodcutter refused to take it. Mercury again dived and brought a silver axe. The woodcutter did not take it either. Then he brought an iron axe. The woodcutter took it gladly. Mercury was much pleased. He rewarded the woodcutter with the other two axes.

Let us find some keywords for the story. One simple approach is to choose the most frequent words in the document as keywords.
The frequency list of the words in the document is given below. It was prepared without considering the meaning of the words, so axe and axes were counted as different words. Capitalization was ignored, so The and the count as the same word.

Word Frequency
the 15
he 7
woodcutter 6
axe 5
a 4
mercury 4
of 3
to 3
... ...
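The counting step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: a word is a run of letters, capitalization is ignored, and ties are broken alphabetically. The helper names `tokenize` and `frequency_list` are hypothetical, and this is not necessarily how the list above was produced.

```python
import re
from collections import Counter

STORY = (
    "Once there lived a poor woodcutter. He used to cut trees in the woods. "
    "One day he was cutting wood on the bank of a river. His axe fell down "
    "into the river. The river was deep. He could not take his axe out. "
    "He sat on the bank and began to weep.\n\n"
    "Mercury, the god of water appeared. He asked the reason of his weeping. "
    "The woodcutter told the whole story. Mercury dived into the water and "
    "brought a golden axe. The woodcutter refused to take it. Mercury again "
    "dived and brought a silver axe. The woodcutter did not take it either. "
    "Then he brought an iron axe. The woodcutter took it gladly. Mercury was "
    "much pleased. He rewarded the woodcutter with the other two axes."
)


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of letters (assumed word definition)."""
    return re.findall(r"[a-z]+", text.lower())


def frequency_list(tokens: list[str]) -> list[tuple[str, int]]:
    """Count each exact word form ('axe' and 'axes' stay separate).
    Sort by descending count, breaking ties alphabetically (an assumption)."""
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Prints the first six rows of the table above: the 15 ... mercury 4
    for word, count in frequency_list(tokenize(STORY))[:6]:
        print(word, count)
```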

The words the, of, to, etc. say nothing characteristic about the document. They are closer to grammar than to meaning: they connect other words to form sentences. Because such words occur frequently in all English text, they should be excluded from the frequency list.

How do we come up with the common words of the English language?

Oxford has done a good job of identifying the common words of English, and their research has some remarkable findings. The ten most common lemmas make up 25% of the corpus they built, and the 100 most common lemmas make up 50% of it. (A lemma is the base form of a word: climbs, climbing, and climbed are all forms of the single lemma climb.) The list of the 100 lemmas is given below.

1    the
2    be
3    to
4    of
5    and
6    a
7    in
8    that
9    have
10   I
11   it
12   for
13   not
14   on
15   with
16   he
17   as
18   you
19   do
20   at
21   this
22   but
23   his
24   by
25   from
26   they
27   we
28   say
29   her
30   she
31   or
32   an
33   will
34   my
35   one
36   all
37   would
38   there
39   their
40   what
41   so
42   up
43   out
44   if
45   about
46   who
47   get
48   which
49   go
50   me
51   when
52   make
53   can
54   like
55   time
56   no
57   just
58   him
59   know
60   take
61   people
62   into
63   year
64   your
65   good
66   some
67   could
68   them
69   see
70   other
71   than
72   then
73   now
74   look
75   only
76   come
77   its
78   over
79   think
80   also
81   back
82   after
83   use
84   two
85   how
86   our
87   work
88   first
89   well
90   way
91   even
92   new
93   want
94   because
95   any
96   these
97   give
98   day
99   most
100   us
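To make the coverage figures concrete, here is a minimal Python sketch that measures what share of a text's tokens belong to a given set of words, using the first ten lemmas of the list above and the woodcutter story. It assumes that words are runs of letters and that matching is exact and case-insensitive, with no lemmatization, so was is not credited to the lemma be. The function name `coverage` is hypothetical.

```python
import re

STORY = (
    "Once there lived a poor woodcutter. He used to cut trees in the woods. "
    "One day he was cutting wood on the bank of a river. His axe fell down "
    "into the river. The river was deep. He could not take his axe out. "
    "He sat on the bank and began to weep.\n\n"
    "Mercury, the god of water appeared. He asked the reason of his weeping. "
    "The woodcutter told the whole story. Mercury dived into the water and "
    "brought a golden axe. The woodcutter refused to take it. Mercury again "
    "dived and brought a silver axe. The woodcutter did not take it either. "
    "Then he brought an iron axe. The woodcutter took it gladly. Mercury was "
    "much pleased. He rewarded the woodcutter with the other two axes."
)

# The first ten lemmas from the list above, lowercased for matching.
TOP_TEN = {"the", "be", "to", "of", "and", "a", "in", "that", "have", "i"}


def coverage(tokens: list[str], vocabulary: set[str]) -> tuple[int, int]:
    """Return (number of tokens found in vocabulary, total number of tokens).
    Exact string match only: no lemmatization, so 'was' does not count as 'be'."""
    hits = sum(1 for token in tokens if token in vocabulary)
    return hits, len(tokens)


tokens = re.findall(r"[a-z]+", STORY.lower())
hits, total = coverage(tokens, TOP_TEN)
print(f"{hits} of {total} tokens ({hits / total:.1%})")  # 29 of 127 tokens (22.8%)
```

On this short story the ten lemmas cover about 23% of the tokens; crediting the three occurrences of was to be would raise that to 32 of 127, about 25%. A single short text is only an illustration, not a check of the corpus figures.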

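Here are the most frequent words in the story after excluding those hundred words. Because the list contains lemmas, inflected forms of them were excluded too; for example, the three occurrences of was were dropped as forms of be.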

Word Frequency
woodcutter 6
axe 5
mercury 4
brought 3
river 3
bank 2
dived 2
water 2
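The filtering step can be sketched in Python as well. This is a minimal sketch, assuming the same word definition as before (lowercased runs of letters) and alphabetical tie-breaking. Since the list holds lemmas while the story contains inflected forms, the sketch adds four forms by hand (was, did, took, used); that small set, named `STORY_INFLECTIONS` here, is hypothetical and chosen for this story only, where a real pipeline would use a lemmatizer. The function name `keywords` is also hypothetical.

```python
import re
from collections import Counter

STORY = (
    "Once there lived a poor woodcutter. He used to cut trees in the woods. "
    "One day he was cutting wood on the bank of a river. His axe fell down "
    "into the river. The river was deep. He could not take his axe out. "
    "He sat on the bank and began to weep.\n\n"
    "Mercury, the god of water appeared. He asked the reason of his weeping. "
    "The woodcutter told the whole story. Mercury dived into the water and "
    "brought a golden axe. The woodcutter refused to take it. Mercury again "
    "dived and brought a silver axe. The woodcutter did not take it either. "
    "Then he brought an iron axe. The woodcutter took it gladly. Mercury was "
    "much pleased. He rewarded the woodcutter with the other two axes."
)

# The 100 most common lemmas from the list above, lowercased.
COMMON_LEMMAS = set("""
the be to of and a in that have i it for not on with he as you do at
this but his by from they we say her she or an will my one all would there
their what so up out if about who get which go me when make can like time
no just him know take people into year your good some could them see other
than then now look only come its over think also back after use two how our
work first well way even new want because any these give day most us
""".split())

# Hypothetical, hand-picked inflected forms of listed lemmas that occur in this
# story (was -> be, did -> do, took -> take, used -> use). Not a lemmatizer.
STORY_INFLECTIONS = {"was", "did", "took", "used"}

STOP_WORDS = COMMON_LEMMAS | STORY_INFLECTIONS


def keywords(text: str, stop_words: set[str], n: int) -> list[tuple[str, int]]:
    """Return the n most frequent words that are not in stop_words,
    sorted by descending count with alphabetical tie-breaking (an assumption)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(token for token in tokens if token not in stop_words)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


if __name__ == "__main__":
    # Prints the eight rows of the table above, woodcutter 6 ... water 2
    for word, count in keywords(STORY, STOP_WORDS, 8):
        print(word, count)
```

Passing `COMMON_LEMMAS` alone instead of `STOP_WORDS` lets was (3 occurrences) into the top eight.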

Say these eight words (woodcutter, axe, mercury, river, brought, dived, water, bank) to a friend. If she has ever read the story, they will remind her of it. Otherwise, just type them into any search engine!

This simple method produced a good set of keywords, even though it missed significant ones such as golden and silver, which appear only once each in the story.