Hack of the day: downloading VOICEROID実況 from Nicovideo

実況? Is it something edible?

Lately I’ve been watching a lot of VOICEROID実況 (じっきょう, jikkyou, literally “commentary”) videos from the rather famous (in Japan) video service ニコニコ動画, better known as “Nicovideo”. In this case, the commentary refers to games: they’re basically a Japanese version of the Let’s Play videos found all over places like YouTube.

The difference from “regular” videos lies in the “VOICEROID” term: this is the name of a text-to-speech (TTS) program developed by AH Software using an engine devised by a company called AI Inc. The name is derived from the very famous VOCALOID singing software. As with VOCALOID, many different voices have been created, each associated with a specific character.
This software is used to have these characters talk and provide commentary to the game being shown. Depending on the video and the uploader, these comments may range from comedy to more serious themes, and some authors even created stories featuring them in the game they are playing.

This in turn shapes the characters beyond the original designs by AHS into the realm of “secondary creations”, to use a term borrowed from Re:CREATORS. That’s what makes these videos interesting for me (and in addition, it’s still a good way to keep my Japanese up to speed).

The problem

The video interface of Nicovideo sucks. Seriously. Until recently, it didn’t even offer 1080p, and most of the features (including advanced seeking, etc.) are locked behind their premium account (which, however, also grants access to other bits like live events). In addition, when the website is under heavy traffic, watching videos can be a true pain. Luckily, youtube-dl supports downloading from Nicovideo, barring some bugs.

The situation got more complicated recently, because Nicovideo became the target of a DDoS attack from outside Japan. Their response was to shut off access from outside Japan for a number of days. I could’ve just waited it out, but I wanted to work around the problem. So I started to think about it.

The implementation

The first ingredient in the recipe was getting a cheap VPS located in Japan. Linode offered one in their Tokyo 2 datacenter, so I signed up for their $5 offering. I didn’t need either processing power or storage: it would just exist as a “hop” to Nico. For the image, I chose openSUSE Leap 42.3, as it’s the distribution I’m most familiar with. I installed a stock minimal install, but I used the distro-supplied kernel instead of Linode’s (there’s a reason, which I’ll show afterwards).

Then, I needed some form of VPN to allow access from my home network. I thought about OpenVPN, but since I’ve been testing and using WireGuard with great satisfaction, I settled for that. WireGuard is much simpler to configure than OpenVPN, doesn’t require daemons, and routing uses the stock Linux tools like iproute2. It also has support for LEDE and OpenWRT, which meant I could hook it up to my Turris Omnia.

First of all, I added the relevant repositories:

# zypper ar -f obs://network:vpn vpn
# zypper in wireguard wireguard-tools

This installed both the tools (wg and wg-quick) and the kernel module required by WireGuard (that’s why I needed a stock distro kernel).

Then, I needed a firewall:

# zypper in firewalld
# systemctl start firewalld
# firewall-cmd --add-service=ssh
# firewall-cmd --zone=public --change-interface=eth0
# firewall-cmd --zone=public --change-interface=eth0 --permanent
# firewall-cmd --add-service=ssh --permanent
# firewall-cmd --zone=internal --add-masquerade
# firewall-cmd --zone=internal --add-masquerade --permanent

Afterwards, I had to configure WireGuard:

# mkdir /etc/wireguard
# chmod 0700 /etc/wireguard
# umask 077 # Don't make files group- or world-accessible
# wg genkey > /etc/wireguard/wg0.key # this generates a private key
# cat /etc/wireguard/wg0.key | wg pubkey > /etc/wireguard/wg0.pub

Then I edited /etc/wireguard/wg0.conf with the details of the interface:

[Interface]
PreUp = firewall-cmd --add-port=51820/udp
PostDown = firewall-cmd --remove-port=51820/udp
ListenPort = 51820
PrivateKey = <my private key>
Address = 10.67.53.10/32
MTU = 1500 # Different from default, see below

[Peer]
PublicKey = <my public key>
AllowedIPs = 10.67.53.0/24,192.168.35.0/24
Endpoint = <home address IP>:51820

“AllowedIPs” in WireGuard lists, per peer, the destination IPs that are sent through the tunnel to that peer, as well as the source IPs that are accepted from it (note that routing must be set separately, although wg-quick handles that for you).
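
To make the idea a bit more concrete, here is a minimal sketch of the check in Python, using the two networks from the [Peer] section above. It is only a simplified model with a single peer (the function name is made up for illustration); the real thing lives in the kernel module and also has to pick between several peers.

```python
import ipaddress

# Networks copied from the AllowedIPs line of the [Peer] section above.
ALLOWED_IPS = ["10.67.53.0/24", "192.168.35.0/24"]


def is_allowed(address, allowed_ips=ALLOWED_IPS):
    """Return True if `address` falls inside any of the AllowedIPs networks.

    Hypothetical helper: it models a single peer only.
    """
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(net) for net in allowed_ips)


if __name__ == "__main__":
    for candidate in ("10.67.53.10", "192.168.35.20", "8.8.8.8"):
        print(candidate, is_allowed(candidate))
```

With this configuration, an address on the tunnel network or on my home LAN passes the check, while anything else would be dropped.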

Afterwards I had to tweak the firewall to ensure that:

  1. The wg0 interface was masqueraded (for packets coming from my own LAN)
  2. Packets could go from wg0 to eth0 and vice versa
  3. MSS clamping was applied

Some of the commands below may be redundant, but firewalld wasn’t really meant to be used like this (I removed the --permanent lines for brevity).

# firewall-cmd --zone=internal --change-interface=wg0
# firewall-cmd --direct --passthrough ipv4 -t nat -A POSTROUTING -s 10.67.53.0/24 -o eth0 -j MASQUERADE
# firewall-cmd --direct --add-rule ipv4 filter FORWARD 0 -i wg0 -o eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
# firewall-cmd --direct --add-rule ipv4 filter FORWARD 0 -i eth0 -o wg0 -m state --state RELATED,ESTABLISHED -j ACCEPT
# firewall-cmd --direct --passthrough ipv4 -I FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu

Then, I brought the interface up:

# systemctl start wg-quick@wg0
# systemctl enable wg-quick@wg0

I set the MTU specifically to 1500, because lower values set by wg-quick would cause packet fragmentation and packets would go nowhere (I spent a lot of time with tcpdump before figuring it out).
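
For reference, the usual arithmetic behind these numbers can be sketched as follows. The header sizes are the standard ones; the 1500-byte link MTU is an assumption (plain Ethernet), and the function names are made up for illustration. As far as I understand, wg-quick normally picks the tunnel MTU by subtracting 80 bytes from the MTU of the underlying interface, which is where the common default of 1420 comes from. The sketch only shows the typical values, not what actually happened on this particular link.

```python
# Header sizes in bytes.
IPV4_HEADER = 20
IPV6_HEADER = 40
UDP_HEADER = 8
TCP_HEADER = 20  # without TCP options
# WireGuard data packet: 4 (type) + 4 (receiver index) + 8 (counter) + 16 (auth tag)
WIREGUARD_OVERHEAD = 32


def tunnel_mtu(link_mtu, outer_ipv6=True):
    """Largest inner packet that fits in one outer packet of size `link_mtu`."""
    outer_ip = IPV6_HEADER if outer_ipv6 else IPV4_HEADER
    return link_mtu - outer_ip - UDP_HEADER - WIREGUARD_OVERHEAD


def tcp_mss(mtu):
    """TCP payload per segment for IPv4 on an interface with the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    link = 1500  # assumption: a plain Ethernet link
    print("tunnel MTU (worst case, IPv6 outer):", tunnel_mtu(link))
    print("tunnel MTU (IPv4 outer):", tunnel_mtu(link, outer_ipv6=False))
    print("MSS at MTU 1500:", tcp_mss(1500))
    print("MSS at MTU 1420:", tcp_mss(1420))
```

MSS clamping rewrites the MSS advertised in TCP SYN packets so that segments are sized to fit the path, which is why it matters when the tunnel MTU and the link MTU do not agree.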

On the Turris Omnia side, I already had WireGuard configured. It was just a matter of adding a few lines in /etc/config/network and restarting the network itself:

config wireguard_wg0
        option public_key '<public-key>'
        list allowed_ips '10.67.53.10/32'
        list allowed_ips '<nicovideo IP block>'
        option endpoint_host '<linode public IP>'
        option endpoint_port '51820'
        option persistent_keepalive '60'
        option route_allowed_ips '1'

While the DDoS was in effect, I routed data for Nicovideo through the VPN, thus bypassing the block. Now that it works so well, I might consider expanding it to work around some programs (games) that rely on Japanese IPs, like Girls Trinary.

Admittedly, it wasn’t enough, even after the DDoS was over. Given that the VPS has a faster link to Japan than my own connection, why not leverage that?

To do so, I installed a couple more packages:

# zypper in rsync python3 python3-pip youtube-dl

The last package required enabling the Packman repository through YaST beforehand.

Then, I installed sarge, which wasn’t available in the distro, through pip:

# pip3 install sarge --prefix /usr/local

And then it was a matter of hacking around a “simple” script. This would fetch one or multiple video URLs (including Nicovideo’s “mylist”, similar to YT’s playlists), pass them through youtube-dl, then rsync them to the NAS I have at home (and delete them afterwards). It makes use of youtube-dl’s “hooks”, which are called as a download progresses and let the script react once a video has finished downloading.
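
The hook idea can be sketched on its own, without youtube-dl or sarge. youtube-dl calls each progress hook with a dictionary; the sketch below only relies on its "status" and "filename" keys. The host name and the function names are placeholders I made up for illustration, and the command is only built, not executed.

```python
import shlex

NAS_HOST = "nas.example"  # hypothetical host name, replace with your own
NAS_PATH = "/home/storage/video/nico/"  # same destination as in the script


def rsync_command(filename, host=NAS_HOST, path=NAS_PATH):
    """Build the rsync argument list for one finished file."""
    return ["/usr/bin/rsync", "-aP", filename, "{0}:{1}".format(host, path)]


def on_progress(status):
    """Progress hook: return an rsync command once a download has finished.

    Returns None for every other status (e.g. "downloading").
    """
    if status.get("status") != "finished":
        return None
    return rsync_command(status["filename"])


if __name__ == "__main__":
    cmd = on_progress({"status": "finished", "filename": "my video.mp4"})
    # Quote each argument so the printed line is safe to paste into a shell.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing the command around as a list of arguments avoids quoting problems with file names that contain spaces, which are common for Nicovideo titles.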

The script is provided at the bottom of the post (BSD licensed). Note the total absence of error checking: it was a “hack” as the title of the post implies. It worked for me: it may or may not work for you. It might even kill every kitten in the world or bring the Great Old Ones to this planet. Exercise caution.

Afterwards, there was just the matter of filling in the Nicovideo download credentials, as login is required to view videos. To do so, I created a .netrc in the home directory of the download user:

machine niconico login <my login> password <my password>

Set permissions to 0600, and it’s done (or できた! if I were to use Japanese).
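
A quick way to check that the file parses as expected is Python’s standard netrc module. The sketch below is only an illustration: it writes placeholder credentials to a throwaway file instead of touching the real ~/.netrc, and the function names are made up.

```python
import netrc
import os
import tempfile


def read_credentials(path, machine="niconico"):
    """Return (login, password) for `machine` from a netrc-style file."""
    entry = netrc.netrc(path).authenticators(machine)
    if entry is None:
        raise KeyError("no entry for machine {!r}".format(machine))
    login, _account, password = entry
    return login, password


def demo():
    """Write placeholder credentials to a temporary file and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "netrc")
        with open(path, "w") as handle:
            handle.write("machine niconico login demouser password secret\n")
        os.chmod(path, 0o600)  # same permissions as recommended above
        return read_credentials(path)


if __name__ == "__main__":
    print(demo())
```

To test the real file, call read_credentials() with the path to the .netrc of the download user.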

Then I just need to invoke the script with one or more URLs and it will download and transfer things to my NAS. Magic!


#!/usr/bin/python3
# Copyright 2018 Luca Beltrame
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from this
# software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
# OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
# WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
# OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
# EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import argparse
import os
from pathlib import Path
import sys

import sarge
import youtube_dl


MY_NAS_IP = "127.0.0.1"

def download_hook(params):
    if params["status"] == "finished":
        destination = params["filename"]
        cmd = ("/usr/bin/rsync -aP "
            "{0} {1}:/home/storage/video/nico/")
        cmd = sarge.shell_format(cmd, destination, MY_NAS_IP)
        print(cmd)
        output = sarge.run(cmd)


def manual_rsync(filename):
    cmd = ("/usr/bin/rsync -aP --remove-source-files "
           "{0} {1}:/home/storage/video/nico/")
    cmd = sarge.shell_format(cmd, filename, MY_NAS_IP)
    print(cmd)
    output = sarge.run(cmd)


def file_downloaded(ydl, params):
    # params is the info dict returned by ydl.extract_info()
    filename = ydl.prepare_filename(params)
    return Path(filename).exists()


def check_mylist(ydl, params):

    """Get all video files from a Nico's mylist, skipping
    already downloaded ones.
    """

    playlist_start = 1
    filenames = list()
    for entry in params["entries"]:
        filename = ydl.prepare_filename(entry)
        if Path(filename).exists():
            playlist_start += 1
            manual_rsync(filename)
            continue

        filenames.append(filename)

    return playlist_start, filenames


def main():
    # Only download_hook is defined in this script
    youtube_params = {"usenetrc": True,
                      "progress_hooks": [download_hook]}
    check_params = {"simulate": True,
                    "usenetrc": True, "quiet": True}

    parser = argparse.ArgumentParser()
    parser.add_argument("url", nargs="+")
    options = parser.parse_args()

    urls = options.url

    # Check filenames

    to_download = list()
    # Iterate over a copy, because entries may be removed from urls below
    for url in list(urls):
        # Simulate download once to get metadata
        with youtube_dl.YoutubeDL(check_params) as ydl:
            result = ydl.extract_info(url)

            if "mylist" in url:
                playlist_start, new_files = check_mylist(ydl, result)

                if not new_files:
                    urls.remove(url)
                    continue
                # Extend rather than overwrite, to keep files from other URLs
                to_download.extend(new_files)
                # FIXME: Alters this globally for all downloads
                youtube_params["playliststart"] = playlist_start

            else:

                filename = ydl.prepare_filename(result)
                if Path(filename).exists():
                    urls.remove(url)
                    manual_rsync(filename)
                    continue

                to_download.append(filename)

    if not urls:
        return

    # Keep on retrying to work around youtube-dl's behavior with nico
    while True:
        try:
            with youtube_dl.YoutubeDL(youtube_params) as ydl:
                res = ydl.download(urls)
        except youtube_dl.utils.DownloadError:
            res = -1
            pass # DANGEROUS
        if res == 0:
            break

    for item in to_download:
        if Path(item).exists():
            Path(item).unlink()


if __name__ == "__main__":
    main()
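
The retry loop in the script above never gives up, which is part of why it is marked as dangerous. A bounded variant could look like the following sketch. It is not what the script does: the function name, the number of attempts and the delay are all assumptions, and it catches every exception for simplicity, whereas the script only catches youtube-dl’s DownloadError.

```python
import time


def retry(action, attempts=5, delay=10.0, sleep=time.sleep):
    """Call `action` until it returns 0 or `attempts` calls have been made.

    Returns True on success and False if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            if action() == 0:
                return True
        except Exception as exc:  # the script would catch DownloadError here
            print("attempt {0} failed: {1}".format(attempt, exc))
        if attempt < attempts:
            sleep(delay)  # wait a bit before trying again
    return False


if __name__ == "__main__":
    # Stand-in for a download that fails twice and then succeeds.
    outcomes = iter([1, 1, 0])
    print(retry(lambda: next(outcomes), delay=0))
```

In the script, the action would be something like lambda: ydl.download(urls), since download() returns 0 on success.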
