“Based on their training data, they just model the probability that a given token, or word, will follow a set of tokens that ...