Previous Step: Despam

Now that you have all your revisions exported, exploded, and despammed, it's time to combine them back into one XML file and import them into Git.

Combine Edits into One XML File

Now, we need to gather the revision files back into a single XML file.

$ cd nodes
$ ruby ../iki-gather-revs.rb > ../combined.xml
$ cd ..

Right now the revisions are sorted by file. Sort them by date:

$ xsltproc iki-sort.xsl combined.xml > combined-sorted.xml

Sanity Check

Just for a sanity check, re-scatter the file into a new directory and verify that it's identical to the original.

$ mkdir nodes-new
$ ruby iki-scatter-revs.rb combined-sorted.xml nodes-new
$ diff -rub nodes nodes-new
$ rm -rf nodes-new

The diff command should produce no output. If you renamed any nodes, however, the diff will show the new names in their <title> elements, which is expected.
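
If you run this check more than once, it may help to wrap the comparison in a small function that reports pass or fail. This is only a sketch: dirs_match is a name made up here, not part of the iki scripts. Because it keeps the -b flag from the command above, runs of whitespace that differ only in length still count as a match.

```shell
# dirs_match is a hypothetical helper, not part of the iki scripts.
# It succeeds only when diff -rub finds no differences between two trees.
dirs_match() {
    diff -rub "$1" "$2" > /dev/null
}
```

For example: dirs_match nodes nodes-new && echo "round trip OK"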

Sort by Date

Each revision must be fed to Git in the order it was made. If you skipped the sort above, run it now:

$ xsltproc iki-sort.xsl combined.xml > combined-sorted.xml
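
A quick way to confirm that the sort worked is to pull out every timestamp and ask sort -c whether they are already in order. This sketch assumes the combined file keeps MediaWiki's <timestamp> elements with ISO 8601 values, so that string order equals date order. timestamps_sorted is a made-up name.

```shell
# timestamps_sorted is a hypothetical helper. It assumes revisions carry
# <timestamp>YYYY-MM-DDTHH:MM:SSZ</timestamp> elements, as MediaWiki exports do.
# Exit status is 0 if the timestamps appear in non-decreasing order.
# A file with no <timestamp> elements at all also passes, so check that too.
timestamps_sorted() {
    grep -o '<timestamp>[^<]*</timestamp>' "$1" | LC_ALL=C sort -c
}
```

For example: timestamps_sorted combined-sorted.xml && echo "in date order"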

Fast-Load Into Git

iki-fast-load creates a new Git repository if one doesn't already exist. It doesn't check for duplicates, so make sure not to load the same revisions more than once!

$ mkdir newrepo
$ ruby iki-fast-load.rb combined-sorted.xml newrepo

Now newrepo should contain your mediawiki history.
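
Because the loader does no duplicate checking itself, one option is to record what has been loaded and refuse to repeat it. The wrapper below is a sketch under assumptions: load_once, the LOAD_CMD variable, and the <repo>.loaded log file are all invented for this example. It only catches a byte-identical file being loaded twice, not the same revisions arriving in a differently split or edited file.

```shell
# LOAD_CMD and load_once are hypothetical names for this sketch.
LOAD_CMD=${LOAD_CMD:-"ruby iki-fast-load.rb"}

# Refuse to fast-load an XML file whose checksum is already recorded
# in a log file kept next to the repo (for example newrepo.loaded).
load_once() {
    xml=$1
    repo=$2
    log="$repo.loaded"
    sum=$(cksum < "$xml") || return 1       # checksum of the file contents
    if [ -f "$log" ] && grep -qxF "$sum" "$log"; then
        echo "already loaded: $xml" >&2
        return 1
    fi
    $LOAD_CMD "$xml" "$repo" || return 1    # unquoted on purpose: command plus arguments
    echo "$sum" >> "$log"
}
```

For example: load_once combined-sorted.xml newrepo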

You probably will not actually want to create a new repo. If you've already set up ikiwiki, you should just clone the repo that ikiwiki created, fast-load into that, check to make sure everything looks OK, repack it, and push that up. Easy.

git clone ssh://webadmin@iki.u32.net/var/u32.net-iki/git-repo
ruby iki-fast-load.rb Spec5-sorted.xml git-repo
cd git-repo
git push

The push uploads and renders the site. Couldn't be easier!
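
What counts as "looks OK" is up to you, but a few cheap checks before pushing could be that the object database passes git fsck and that the commit and file counts are in the range you expect. A sketch, with check_repo as a made-up name:

```shell
# check_repo is a hypothetical helper: print a short summary of a repo,
# failing early if git fsck reports a problem.
check_repo() {
    repo=$1
    git -C "$repo" fsck > /dev/null || return 1
    echo "commits: $(git -C "$repo" rev-list --count HEAD)"
    echo "files: $(git -C "$repo" ls-tree -r --name-only HEAD | wc -l | tr -d ' ')"
}
```

For example: check_repo git-repo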

Namespace Warnings

The import script prints each page as it's processed. This is handy for initial runs to make sure that what's happening is exactly what you would expect, but the huge amount of output tends to mask warnings. I suggest doing the final import with stdout sent to /dev/null:

ruby iki-fast-load.rb Spec5-sorted.xml newrepo > /dev/null

Now you'll be able to see the namespace warnings. For instance, some of my pages had links like:

[[http://url]]

That was obviously meant to be this:

[http://url]

This will be flagged as an unknown "http:" namespace. You can either go back and fix this, or ignore it and fix it in ikiwiki.
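
If you would rather fix these at the source, a search of the exploded revision files can list the offenders before you import. A sketch, assuming the revision files under nodes/ hold the raw wikitext. find_bracketed_urls is a made-up name, and the list of URL schemes is a guess at what your pages contain.

```shell
# find_bracketed_urls is a hypothetical helper: list the files under a
# directory that contain double-bracketed external links like [[http://...]].
find_bracketed_urls() {
    grep -rlE '\[\[(https?|ftp)://' "$1"
}
```

For example: find_bracketed_urls nodes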

Filename Conversions

Unfortunately, ikiwiki does not handle filenames with spaces in them (a strange limitation in this modern age). Therefore, the fast load script changes all the spaces into underscores.

If your commits aren't destined for Ikiwiki, you probably want to turn this misfeature off. Just comment out this line from the script:

title.gsub!(" ", "_")

so that it reads:

# title.gsub!(" ", "_")
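
For reference, the effect of that line can be reproduced in the shell with tr. This is only an illustration of the substitution, not the script's own code, and title_to_filename is a made-up name.

```shell
# Shell illustration of what title.gsub!(" ", "_") does to a page title.
title_to_filename() {
    printf '%s\n' "$1" | tr ' ' '_'
}

title_to_filename "Installation Guide for Mac OS X"
# prints: Installation_Guide_for_Mac_OS_X
```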

Partial Loads

iki-fast-load works just fine incrementally. If you split your revisions into separate files by year (a good way to make the filesizes more manageable), you could do this:

ruby iki-fast-load.rb 2006.xml newrepo
ruby iki-fast-load.rb 2007.xml newrepo

Just make sure to load all revisions chronologically, from oldest to newest.
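
With one file per year, a loop keeps the order right without typing each command. The sketch below relies on the shell expanding the glob in sorted order, which for four-digit year names is also chronological order. load_years and LOAD_CMD are invented for this example, and the loop stops at the first failure so that later years never land on top of a failed load.

```shell
# LOAD_CMD and load_years are hypothetical names for this sketch.
LOAD_CMD=${LOAD_CMD:-"ruby iki-fast-load.rb"}

# Load every YYYY.xml file in the current directory into the given repo,
# oldest year first.
load_years() {
    repo=$1
    for f in [0-9][0-9][0-9][0-9].xml; do
        [ -e "$f" ] || continue             # the glob matched nothing
        $LOAD_CMD "$f" "$repo" || return 1  # unquoted on purpose: command plus arguments
    done
}
```

For example: load_years newrepo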

Repack the Repo

Finally, repack your repo. TODO: you can probably skip this step.

NOTE: Despite the recommendation in git-fast-import's help, repacking doesn't seem to make any difference. Maybe that's because the entire import happens in a single git fast-import run? No idea. Run 'git help fast-import' and search for "repack" to read for yourself.

git repack -a -d
git gc --prune  # make absolutely sure nobody else is using this repo!

Adding --window=50 to the git repack command might take longer, but it could also produce a smaller repo. If you can spare the CPU time, it's probably worth it. See the discussion on repacking in the help for 'git fast-import'.
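
To see whether the larger window pays off for your history, you can measure the object store before and after. A sketch, assuming a non-bare repo (objects under .git/objects). repack_report is a made-up name.

```shell
# repack_report is a hypothetical helper: repack with a larger window and
# print the size of .git/objects (in kilobytes) before and after.
repack_report() {
    repo=$1
    before=$(du -sk "$repo/.git/objects" | cut -f1)
    git -C "$repo" repack -a -d -q --window=50 || return 1
    after=$(du -sk "$repo/.git/objects" | cut -f1)
    echo "objects: ${before}K -> ${after}K"
}
```

For example: repack_report newrepo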

Rebuild Ikiwiki

Correct Last Edited time

Notice that each file's Last Edited time has today's date, not the date that it was last edited in Mediawiki. This is bad because it makes the site look a whole lot fresher than it really is.

No problem, just pass --getctime the first time you rebuild your wiki. It takes longer but all your files then have the correct Last Edited date.

$ ikiwiki --verbose --rebuild --setup iki.setup --getctime

This doesn't set the correct creation date however. The new files will appear to be created on the same day they were last edited.

Correct Creation and Last Edited time

We need to get both the page creation AND the last modified date right. To do that, we need to build ikiwiki in two steps. First get rid of your Ikiwiki's .index.

mv .index /tmp

Set the mtime of every file in the repo to the file's creation date. Note: this takes a long time to run, probably due to all the forking. If your site has tens of thousands of pages, you might want to convert it into a Perl script or C program.

$ git ls-tree -r -z --name-only HEAD | xargs -0 -n 1 sh -c \
      'touch --date="$(git log --reverse --pretty="format:%aD" HEAD -- "$1" | head -1)" "$1"' sh

That command just asks Git for a list of files in the repo and passes them to xargs. xargs then touches each file with the time returned by git log.

Now rebuild your wiki (without --getctime because getctime will set both the file's creation date and its modification date).

$ ikiwiki --verbose --rebuild --setup iki.setup

Now the creation date of each file in the wiki is correct. You can verify this by creating an "All Pages" page with the following content:

All Pages on this wiki in order from newest to oldest:
  [[!inline  pages="* and !*/Discussion" archive="yes"]]

But the modification time for each file is wrong. No problem. We just update the mtime on each file to the date of the last checkin.

$ git ls-tree -r -z --name-only HEAD | xargs -0 -n 1 sh -c \
      'touch --date="$(git log -1 --pretty="format:%aD" HEAD -- "$1")" "$1"' sh

And rebuild again.

$ ikiwiki --verbose --rebuild --setup iki.setup

This updates the Last Edited time but leaves the creation time alone. And now both dates in the index are correct!
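
If the per-file git log calls are too slow for your site, most of the cost can probably be avoided by walking the history once and picking out each file's first (or last) commit time. The following is a sketch under assumptions, not a tested replacement: commit_times and apply_times are made-up names, it needs GNU touch just like the commands above, it does not follow renames, and it will misread any path that starts with @ or that Git has to quote (tabs, newlines, double quotes).

```shell
# Print "<epoch seconds><TAB><path>" once per path, using the first commit
# in git log's output order that touched it. Pass --reverse to get each
# file's creation time; pass nothing to get its last-modified time.
commit_times() {
    git -c core.quotePath=false log "$@" --pretty='format:@%at' --name-only HEAD |
    awk '
        /^@/ { ts = substr($0, 2); next }   # commit line: remember its author time
        $0 != "" && !($0 in seen) {         # first time this path appears
            seen[$0] = 1
            print ts "\t" $0
        }
    '
}

# Read commit_times output and set the mtime of each file that still exists.
apply_times() {
    tab=$(printf '\t')
    while IFS=$tab read -r ts path; do
        if [ -e "$path" ]; then
            touch --date="@$ts" "$path"
        fi
    done
}
```

Run from the top of the working tree: commit_times --reverse | apply_times, rebuild, then commit_times | apply_times and rebuild again.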

All Done!

All your MediaWiki content is now stored in ikiwiki. If you haven't yet edited these pages to explain an easier way of doing things or to fix bugs, please go back and help make it easier for others!