You need to download the MivCorpus metadata (mivcorpus.tar.gz or mivcorpus.zip) first.
The 'webpages' folder (a 1.6GB web corpus containing English and Japanese pages): webpages.tar.gz (354MB)
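
Once the archives are downloaded, they can be unpacked with standard tools. The following is a minimal Python sketch, not an official script: the `unpack` helper and the `MivCorpus` target directory are hypothetical, and whether the 'webpages' folder belongs next to the metadata is an assumption to check against the corpus documentation.

```python
import shutil
from pathlib import Path


def unpack(archive, dest):
    """Unpack a .tar.gz or .zip archive into dest; return the sorted top-level entry names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # shutil chooses the archive format from the file extension (.tar.gz, .zip, ...)
    shutil.unpack_archive(str(archive), str(dest))
    return sorted(p.name for p in dest.iterdir())


if __name__ == "__main__":
    # Assumption: the archives were already downloaded into the current directory.
    # "MivCorpus" is a hypothetical target directory, not a name required by the corpus.
    for name in ("mivcorpus.tar.gz", "webpages.tar.gz"):
        if Path(name).exists():
            print(name, "->", unpack(name, "MivCorpus"))
```

The same helper works for mivcorpus.zip, since the format is inferred from the extension.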