Claim 9 of the word2vec patent
The word2vec patent has been a hot topic, so I laid out claim 9 in a form that is easier to read.
9.
(C1) A method for assigning a respective point in a high-dimensional space to each word in a vocabulary of words,
the method comprising: {
(C2)obtaining a set of training data,
wherein {
(C3)the set of training data comprises sequences of words
}
;
(C4)training a plurality of classifiers and an embedding function on the set of training data,
wherein {
(C5)the embedding function
receives an input word
and
(C6)maps the input word to a numeric representation in the high-dimensional space
in accordance with a set of embedding function parameters,
}
wherein {
(C7)each of the classifiers corresponds to a respective position surrounding the input word in a sequence of words,
}
and wherein {
(C8)each of the classifiers processes the numeric representation of the input word
to generate a respective word score for each word in a pre-determined set of words,
wherein {
(C9)each of the respective word scores represents a predicted likelihood
that the corresponding word will be found in the corresponding position relative to the input word,
}
and wherein {
(C10)training the embedding function comprises obtaining trained values of the embedding function parameters
}
}
;
(C11)processing each word in the vocabulary
using the embedding function in accordance with the trained values of the embedding function parameters
to generate a respective numerical representation of each word in the vocabulary
; and
(C12)associating each word in the vocabulary
with the respective numeric representation of the word
in the high-dimensional space.
}
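
To make the steps C2 through C12 more concrete, here is a minimal sketch in plain Python. It is only one possible reading of the claim, not the patent's implementation: the claim does not say what form the classifiers take or how training is done, so the linear softmax classifiers, the plain SGD updates, the toy data, and all names (`train_embeddings`, `classify`, `dim`, `offsets`) are my own assumptions for illustration.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(weights, vector):
    """One classifier: a word score for every word in the vocabulary (C8, C9)."""
    return softmax([sum(w * x for w, x in zip(row, vector)) for row in weights])


def train_embeddings(sequences, dim=8, offsets=(-1, 1), epochs=50, lr=0.1, seed=0):
    """Jointly train an embedding table and one classifier per offset (C4)."""
    rng = random.Random(seed)
    vocab = sorted({word for seq in sequences for word in seq})
    index = {word: i for i, word in enumerate(vocab)}
    # Embedding function parameters: one dim-sized vector per word (C5, C6).
    emb = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]
    # One classifier per position around the input word (C7).
    # Assumption: each classifier is a linear layer followed by softmax.
    clf = {o: [[0.0] * dim for _ in vocab] for o in offsets}

    for _ in range(epochs):
        for seq in sequences:
            for t, word in enumerate(seq):
                for o in offsets:
                    if not 0 <= t + o < len(seq):
                        continue  # no word at this position
                    target = index[seq[t + o]]
                    x = list(emb[index[word]])
                    probs = classify(clf[o], x)
                    grad_x = [0.0] * dim
                    for j, row in enumerate(clf[o]):
                        # Gradient of the cross-entropy loss w.r.t. score j.
                        err = probs[j] - (1.0 if j == target else 0.0)
                        for k in range(dim):
                            grad_x[k] += err * row[k]
                            row[k] -= lr * err * x[k]
                    for k in range(dim):
                        emb[index[word]][k] -= lr * grad_x[k]

    # C11, C12: run every vocabulary word through the trained embedding
    # function and associate it with the resulting point.
    vectors = {word: list(emb[index[word]]) for word in vocab}
    return vectors, clf, vocab


if __name__ == "__main__":
    data = [["the", "cat", "sat"], ["the", "dog", "sat"]]  # C2, C3 (toy data)
    vectors, clf, vocab = train_embeddings(data)
    scores = classify(clf[1], vectors["cat"])  # position +1 after "cat"
    for word, score in zip(vocab, scores):
        print(f"{word}: {score:.3f}")
```

In this sketch the embedding function is just a table lookup, so "obtaining trained values of the embedding function parameters" (C10) amounts to keeping the rows of `emb` after the training loop. The classifiers in `clf` are only needed during training; the output of the method is the word-to-vector mapping in `vectors`.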