I wrote an extensive series of blog posts in 2009 under Ἡλληνιστεύκοντος (read them backwards, oldest first), trying to answer this question for a fixed(ish) corpus that I was responsible for lemmatising: the TLG. The series has a lot to say about the distinction between word tokens (individual instances of words), wordforms (distinct spellings), and lemmata (dictionary words).
The series starts with several posts about how pointless this question is, which no one seems to pay attention to.
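To make the token / wordform / lemma distinction concrete, here is a minimal Python sketch. The seven-token sample is made up, and `LEMMA_OF` and `count_levels` are hypothetical names for illustration only; this is not how the TLG lemmatiser works. It also simplifies by mapping each wordform to exactly one lemma, whereas real Greek has plenty of ambiguous forms.

```python
from collections import Counter

# Hypothetical toy sample of Greek word tokens (not real TLG data).
tokens = ["λόγος", "λόγου", "λόγος", "λέγει", "ἔλεγε", "λόγον", "λέγει"]

# Hypothetical lemma table: each wordform mapped to its dictionary headword.
# Simplification: a real lemmatiser must handle forms that map to several lemmata.
LEMMA_OF = {
    "λόγος": "λόγος",
    "λόγου": "λόγος",
    "λόγον": "λόγος",
    "λέγει": "λέγω",
    "ἔλεγε": "λέγω",
}

def count_levels(tokens, lemma_of):
    """Return (token count, distinct wordform count, distinct lemma count)."""
    wordforms = Counter(tokens)                  # each distinct spelling with its frequency
    lemmata = {lemma_of[w] for w in wordforms}   # collapse inflected forms into headwords
    return len(tokens), len(wordforms), len(lemmata)

print(count_levels(tokens, LEMMA_OF))
```

On this sample that gives 7 tokens, 5 wordforms, and 2 lemmata (λόγος and λέγω). The lemma figure is the smallest of the three, and it is the one the counts below refer to.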
The count of lemmata for the TLG corpus (ancient and mediaeval literature) plus PHI (inscriptions) was 214,000 in 2009. By the time I was terminated from the TLG in 2016, I had brought recognition up to 240,000 lemmata.
For the strictly classical corpus, up to the 4th century BC, the count was 66,000.
If we add Modern Greek and the Modern Greek dialects, the count will be higher. I've seen a guess of 600,000 by Christophoros Charalambakis, director of the Historical Dictionary of Modern Greek (the dialect dictionary) at the Academy of Athens. I think that's implausible. Given Zipf's law, I think 350,000 to 400,000 for all periods of Greek is plausible.
For comparison, the OED has something like 600,000 for English.