I've spent more time than I care to admit loading, dumping,
reloading, transforming, testing, reloading... various Wikipedia
databases before settling on what I think the new format will
be, but I made a discovery along the way that might be useful:
The 05/20 database dump from Wikipedia weighs in at close to
600 MB. It turns out that almost 200 MB of that is cache. In
the new system, I'll write a function specifically for doing
database dumps, but in the meantime I'd suggest that the next
time you dump a tarball, clear the cache first (and be careful
with the timestamps when you do).