R/topic_modeling_core.R
predict.lda_topic_model.Rd
Obtains predictions of topics for new documents from a fitted LDA model
a fitted object of class lda_topic_model
a DTM or TCM of class dgCMatrix or a numeric vector
one of "gibbs" or "dot". If "gibbs", Gibbs sampling is used and iterations must be specified.
If method = "gibbs", an integer number of iterations for the Gibbs sampler to run. A future version may include automatic stopping criteria.
If method = "gibbs", an integer number of burnin iterations. If burnin is greater than -1, the entries of the resulting "theta" matrix are an average over all iterations greater than burnin.
Other arguments to be passed to TmParallelApply
a "theta" matrix with one row per document and one column per topic
if (FALSE) {
# load some data
data(nih_sample_dtm)
# fit a model
set.seed(12345)
m <- FitLdaModel(dtm = nih_sample_dtm[1:20,], k = 5,
                 iterations = 200, burnin = 175)
str(m)
# predict on held-out documents using gibbs sampling "fold in"
p1 <- predict(m, nih_sample_dtm[21:100,], method = "gibbs",
              iterations = 200, burnin = 175)
# predict on held-out documents using the dot product method
p2 <- predict(m, nih_sample_dtm[21:100,], method = "dot")
# compare the methods
barplot(rbind(p1[1,],p2[1,]), beside = TRUE, col = c("red", "blue"))
}
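The burnin argument describes "theta" as an average over all iterations greater than burnin. The base R sketch below illustrates only that averaging step on made-up numbers. The helper average_after_burnin and the draws list are hypothetical names introduced for illustration; they are not part of the package, and the package's internal sampler may store and combine its iterations differently.

```r
# Hypothetical helper (not part of the package): average per-iteration
# "theta" matrices over all iterations whose index is greater than burnin.
average_after_burnin <- function(draws, burnin) {
  keep <- seq_along(draws) > burnin
  stopifnot(any(keep))               # at least one iteration must remain
  Reduce(`+`, draws[keep]) / sum(keep)
}

# Made-up draws for one document and two topics, one matrix per iteration.
draws <- list(
  matrix(c(0.9, 0.1), nrow = 1),
  matrix(c(0.5, 0.5), nrow = 1),
  matrix(c(0.2, 0.8), nrow = 1),
  matrix(c(0.4, 0.6), nrow = 1)
)

# With burnin = 2, only iterations 3 and 4 contribute to the average.
theta <- average_after_burnin(draws, burnin = 2)
theta
```

Because every per-iteration row sums to 1, the averaged row does as well, so the result can still be read as a distribution over topics.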