rmed

Migrate Isso comments to Remark42

2021-01-12 13:53

I recently changed the comment system of the website from Isso to Remark42. Although Remark42 can directly import comments from Disqus and Wordpress, there is no direct approach to migrating comments from an Isso instance, and it would be a shame to lose all of the existing comments. That is why I decided to dig into the documentation of Remark42 and its source code to figure out a way to perform the migration myself.

Before starting, a couple of clarifications:

Firstly, there is nothing wrong with Isso. It is a very nice piece of software and I have been using it for a long time without any problems. Lately, however, I had noticed that some comments were not being rendered correctly (or at least not as I expected them to render), so I thought it might be worth trying a different comment engine. After looking at several options, I opted for Remark42 because it offers several features that I was already using in Isso:

  • Self-hosted
  • Notifications on new comments
  • Anonymous comments
  • Markdown support
  • Code highlighting
  • Reply subscription

Secondly, I'm not using Remark42 inside a Docker container. Instead, I downloaded the binary and run it as a service, loading the configuration from a file. For simplicity, any remark42.linux-amd64 command in this post assumes that the configuration has already been loaded.

Native migration options

As I mentioned previously, Remark42 natively supports migrating from Disqus and Wordpress. Taking this into account, my first thought was to generate an XML document with the comment data as Disqus would generate it by following the schema. I wrote a small script that extracted the comments from the Isso SQLite database and rendered the XML document using Jinja2.

I then tried to import the XML with the following command:

$ remark42.linux-amd64 import -f isso-import.xml -s mysite

Although the import completed without errors and Remark42 was populated with the comments, I noticed a couple of issues:

  1. After importing the data, Remark42 did not retain the relations between comments (original comment, reply, etc.)
  2. Remark42 considered all users to be Disqus users, even though I had identified them as anonymous users

After playing around with Remark42 in my testing environment, I found out that users are identified using the following pattern: <PREFIX>_<HASH>, where <PREFIX> corresponds to the authentication method and <HASH> is a SHA-1 of the user identifier (which varies depending on the authentication method). In this case, users were identified as disqus_<HASH>, which was far from ideal.
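
As a minimal sketch of that pattern, the helper below builds an identifier from a prefix and a raw user identifier. The name remark42_user_id is hypothetical, and hashing the UTF-8 bytes of the identifier directly is an assumption: Remark42 may combine additional data before hashing, depending on the authentication method.

```python
import hashlib


def remark42_user_id(prefix: str, identifier: str) -> str:
    """Build an ID shaped like <PREFIX>_<HASH> (hypothetical helper)."""
    # Assumption: the hash is a plain SHA-1 over the UTF-8 bytes of the identifier.
    digest = hashlib.sha1(identifier.encode("utf-8")).hexdigest()
    return "{}_{}".format(prefix, digest)


user_id = remark42_user_id("anonymous", "some visitor")
print(user_id)
```

The result is the prefix, an underscore, and a 40-character hexadecimal digest.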

Browsing the source code, I found the module in charge of importing Disqus comments, and the particular line that handled creation of users:

User: store.User{
    ID:   "disqus_" + store.EncodeID(comment.AuthorUserName),
    Name: comment.AuthorName,
    IP:   comment.IP,
},

Apparently, the same thing happens when importing comments from Wordpress, although with the wordpress_ prefix, so it wouldn't be possible for me to import comments using the Wordpress format either.

Time for plan B.

Solution: fake backup

One nice feature of Remark42 is that it performs periodic backups of the comments. According to the README:

Backup file is a text file with all exported comments separated by EOL. Each backup record is a valid json with all key/value unmarshaled from Comment struct

If that is the case, it should be possible to convert the data from Isso to this backup format and import it normally using the restore command. I tried to perform a manual backup of the testing site and the resulting file (after decompressing) was as follows:

{"version":1,"users":[],"posts":[]}
{"id":"1","pid":"","text":"TEST","user":{"name":"test","id":"anonymous_4e1243bd22c66e76c2ba9eddc1f91394e57f9f83","picture":"","ip":"","admin":false},"locator":{"site":"mysite","url":"http://127.0.0.1"},"score":0,"vote":0,"time":"2021-01-12T12:00:35Z","title":"Test"}
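
To inspect a backup like this from Python, something along the following lines should work. It assumes the file is still gzipped, that the first line is the header record, and that every following line is one comment. read_backup is a hypothetical helper name, not part of Remark42.

```python
import gzip
import json


def read_backup(path):
    """Return (header, comments) from a gzipped, line-delimited JSON backup."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        # Ignore blank lines so a trailing newline does not break parsing
        lines = [line for line in f if line.strip()]
    header = json.loads(lines[0])
    comments = [json.loads(line) for line in lines[1:]]
    return header, comments
```

Calling read_backup with the path of a backup file returns the header dictionary and a list with one dictionary per comment.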

This format should be very easy to generate with a script! I just need to understand the Comment struct definition:

type Comment struct {
    ID          string    `json:"id"`                                   // comment ID, read only
    ParentID    string    `json:"pid"`                                  // parent ID
    Text        string    `json:"text"`                                 // comment text, after md processing
    Orig        string    `json:"orig"`                                 // original comment text
    User        User      `json:"user"`                                 // user info, read only
    Locator     Locator   `json:"locator"`                              // post locator
    Score       int       `json:"score"`                                // comment score, read only
    Vote        int       `json:"vote"`                                 // vote for the current user, -1/1/0.
    Controversy float64   `json:"controversy,omitempty"`                // comment controversy, read only
    Timestamp   time.Time `json:"time"`                                 // time stamp, read only
    Edit        *Edit     `json:"edit,omitempty" bson:"edit,omitempty"` // pointer to have empty default in json response
    Pin         bool      `json:"pin"`                                  // pinned status, read only
    Delete      bool      `json:"delete"`                               // delete status, read only
    PostTitle   string    `json:"title"`                                // post title
}

type Locator struct {
    SiteID string `json:"site"`     // site id
    URL    string `json:"url"`      // post url
}

type Edit struct {
    Timestamp time.Time `json:"time" bson:"time"`
    Summary   string    `json:"summary"`
}
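
The JSON keys in those struct tags are the keys used in the backup file. As a rough sketch, a record can be checked against the keys seen in the sample backup line before it is written out. Which keys Remark42 strictly requires on restore is an assumption here, and missing_keys is a hypothetical helper.

```python
# Keys taken from the sample backup record (assumed sufficient for a restore)
COMMENT_KEYS = {"id", "pid", "text", "user", "locator", "score", "vote", "time", "title"}
LOCATOR_KEYS = {"site", "url"}


def missing_keys(record):
    """Return the set of expected keys that are absent from a comment record."""
    missing = COMMENT_KEYS - set(record)
    locator = record.get("locator", {})
    # Report nested locator keys with a "locator." prefix
    missing |= {"locator." + key for key in LOCATOR_KEYS - set(locator)}
    return missing
```

An empty set means the record has every key that appears in the sample.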

The data structures for Isso, on the other hand, are as follows in the SQLite database:

  • threads table:
    • id
    • uri
    • title
  • comments table:
    • tid: Thread ID
    • id
    • parent: Comment ID when replying
    • created
    • modified
    • mode
    • remote_addr
    • text: Plaintext (without markdown rendering)
    • author
    • email
    • website
    • likes
    • dislikes
    • voters
    • notification
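
Before relying on this layout, it may be worth confirming that the columns match your own Isso version. The following is a minimal sketch using SQLite's PRAGMA table_info. table_columns is a hypothetical helper, and the in-memory table is only a stand-in for the real isso_comments.db.

```python
import sqlite3


def table_columns(conn, table):
    """Return the column names of a table, in declaration order."""
    # PRAGMA statements cannot take bound parameters, so only pass trusted table names
    rows = conn.execute("PRAGMA table_info({})".format(table)).fetchall()
    return [row[1] for row in rows]  # row[1] holds the column name


# Stand-in database; use sqlite3.connect('isso_comments.db') for the real one
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE threads (id INTEGER PRIMARY KEY, uri TEXT, title TEXT)")
print(table_columns(conn, "threads"))
```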

The script

Now, I'm not going to attempt a 1-to-1 migration of the data. Instead, I'm only going to export the data I consider most relevant. For comments, that means:

  • tid
  • id
  • parent
  • created
  • text
  • author
  • email

The following script reads the data from the isso_comments.db file in the current directory and writes the backup for Remark42 to a file named output. As I prefer pre-rendering the Markdown here, the following package must be installed before executing the script (Python 3):

$ pip3 install markdown

Below you can find the complete script:

# -*- coding: utf-8 -*-

import hashlib
import json
import sqlite3

from datetime import datetime
from markdown import markdown

# Change to admin email
ADMIN_USER = 'set_to_email'


def main():
    conn = sqlite3.connect('isso_comments.db')

    threads = {}
    comments = []

    cursor = conn.cursor()

    # Threads
    cursor.execute('SELECT id, uri, title FROM threads')

    for tid, uri, title in cursor:
        threads[tid] = {
            'uri': uri,
            'title': title
        }

    # Extract comments
    cursor.execute('SELECT tid, id, parent, created, text, author, email FROM comments')

    for tid, id, parent, created, text, author, email in cursor:
        comments.append({
            'thread': tid,
            'id': id,
            'parent': parent,
            # Isso stores Unix timestamps; emit them in UTC to match the 'Z' suffix
            'created': datetime.utcfromtimestamp(created).strftime('%Y-%m-%dT%H:%M:%SZ'),
            'text': text,
            'author': author,
            'email': email
        })

    # Write the comments in the Remark42 backup format
    with open('output', 'w') as f:
        f.write('{"version":1,"users":[],"posts":[]}\n')

        for comment in comments:
            # Prepare user
            author = comment['author'] or 'Anonymous'

            user = {
                'name': author,
                'picture': '',
                'ip': '',
                'admin': ADMIN_USER == comment['email']
            }

            # Users with an email and anonymous users get different ID prefixes
            if comment['email']:
                user['id'] = 'email_{}'.format(hashlib.sha1(comment['email'].encode()).hexdigest())
            else:
                user['id'] = 'anonymous_{}'.format(hashlib.sha1(author.encode()).hexdigest())

            item = {
                'id': str(comment['id']),
                'pid': str(comment['parent']) if comment['parent'] else '',
                'text': markdown(comment['text'], extensions=['nl2br', 'extra', 'codehilite']),
                'user': user,
                'locator': {
                    'site': 'mysite',
                    'url': threads[comment['thread']]['uri'],
                },
                'score': 0,
                'vote': 0,
                'time': comment['created'],
                'title': threads[comment['thread']]['title']
            }

            f.write(json.dumps(item)+'\n')


if __name__ == '__main__':
    main()
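
Since the Disqus import lost the relations between comments, it may be worth checking the generated file before restoring it. The sketch below, with a hypothetical orphan_replies helper, reports comments whose pid does not match any id in the file. It assumes the uncompressed layout produced by the script: one header line followed by one JSON record per line.

```python
import json


def orphan_replies(lines):
    """Return the IDs of comments whose parent ID is not among the records."""
    # The first line is the backup header, not a comment
    records = [json.loads(line) for line in list(lines)[1:] if line.strip()]
    known_ids = {record["id"] for record in records}
    return [r["id"] for r in records if r["pid"] and r["pid"] not in known_ids]
```

For example, passing it the result of open('output').readlines() should return an empty list if every reply points to a comment that exists in the file.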

Note that the ADMIN_USER variable can be set to the original admin email in Isso to also set the admin flag in Remark42.

After execution, the output file should have been created. Given that Remark42 expects the backup to be gzipped, simply run:

$ gzip output
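
If the gzip tool is not available, the same compression can be done from Python with the standard library. This is only an alternative sketch: gzip_file is a hypothetical helper, and unlike the gzip command it leaves the uncompressed file in place.

```python
import gzip
import shutil


def gzip_file(src, dst=None):
    """Compress src into dst (default: src + '.gz') and return the new path."""
    dst = dst or src + ".gz"
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```

Calling gzip_file('output') would produce output.gz next to the original file.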

And then restore the backup (note that this will overwrite any existing data):

$ remark42.linux-amd64 restore -s mysite -p BACKUP_DIR -f output.gz