Opinions expressed here belong to your mom


I Have A Website

Oh, hello. THIS IS A WEBSITE.

Platform

I built out this website so that I could have a platform to express myself which does not fall under the purview of any of the public-square arbiters that have come to dominate our society. If I were to use Twitter, people without Twitter accounts would not be able to see what I post. If I were to use Facebook, people without Facebook accounts would not be able to see what I post. If I were to use LinkedIn, people without LinkedIn accounts would not be able to see what I post. LinkedIn and Facebook have been walled gardens for as long as I can remember, but Twitter's move to requiring a login to browse is recent. This presents a problem for me. I do technically have login credentials for each of these sites, but my web browser doesn't store any session data between launches, so simply reading a post on one of these sites requires that I:

  1. Wait for KeePass to decrypt the database. This takes some time since I have the decryption time cranked up to the max
  2. Log into my email (also in KeePass) and get a 2FA code
    • Alternatively go find my phone, wait for it to turn on, decrypt it, and then wait for SMS to refresh
  3. Finally log into the website to see the information that I wanted

This whole ordeal takes a bit of time and normally the information on social media isn't of high quality. Furthermore, social media is a psychic poison which rots your brain and your soul. If I were to use one of these platforms, I would be enticing anyone who wants to see my rambles (maybe like 5 dudes and 1 special agent) to subject themselves to the spiritual torture of social media and any security measures that they have to go through. It would be unethical for me to require of others what I am not willing to do myself.

Why Even Bother

"Why bother putting your thoughts out there" is not a terrible question. The vast majority of people are not going to care about anything that I have to write. Those who do care about something that I write won't care about the vast majority of stuff that I do write.

Maybe someone finds something useful. Maybe I get some catharsis from writing.

Technology

Static Site Generators

I made this website all on my own. In the distant past, this same URL reached a blog that I ran as a Jekyll site. This worked fine for me, but I had to manage a Jekyll installation, which required gem and other related nonsense. I liked that it was a static site generator and didn't rely on a beefy webserver, since it was running on the same cheap VPS that it is running on now. However, I have grown to dislike Ruby (my main gripe is that the syntax is visually unappealing). When exploring the idea of recreating a blog, I looked at a few different options for static site generation.

However, all four of the options I considered presented two main problems:

  1. Reliance on someone else's toolchain for creating and theming my site.
  2. Having to learn another markup language.

The learning of another markup language is really the biggest problem here. I am already intimately familiar with DokuWiki's markup language since I use it every single day, both at home and at work. I use a self-hosted DokuWiki instance to keep track of everything from recipes to server deployment procedures. DokuWiki has its own markup language and I don't want to go learn Markdown or whatever other option these static site generators require. I just want to use DokuWiki markup, since most of what will become my early blog posts is already in that format.
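For anyone who has never seen it, DokuWiki markup looks roughly like this (a tiny illustrative sample, not an exhaustive reference):

====== A Heading ======

Some **bold** text, some //italic// text, and a [[https://punkto.org|link]].

  * A list item
  * Another list item

That is the exact dialect the conversion below will be chewing on.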

Rolling My Own

DW -> HTML

Pandoc can convert from DokuWiki to HTML. This means that all I have to do is figure out how to make HTML look pretty. I don't need a flashy, shiny site; I want it as plain as possible. The site should have no JS, no external sources embedded into the page, and it should look good on every screen. This is actually really easy to do in plain HTML with a little bit of CSS. I wrote a template file by hand and split it into a top.html and a bottom.html, then I wrote a Bash script to handle the conversion and everything. Here's the whole script:

#!/bin/bash

# This script generates my website based off of files in this directory
DW_FILES_ROOT="txt_root/"
HTML_FILES_ROOT="html_root/"
TEMPLATE_DIR="template/"

# Sanity check cwd
if [ ! -d "${DW_FILES_ROOT}" ]; then
        echo "you're in the wrong directory bro"
        exit 1
fi

# generate arrays (word splitting here means filenames must not contain spaces)
dw_files=( $(find "${DW_FILES_ROOT}" -type f -iname '*.txt') )
html_files=( $(find "${HTML_FILES_ROOT}" -type f -iname '*.html') )
dirs=( $(find "${DW_FILES_ROOT}" -mindepth 1 -type d) )

# make necessary directories
for dir in "${dirs[@]}"; do
        html_dir="${HTML_FILES_ROOT}$(echo "${dir}" | cut -d '/' -f 2-)"
        if [ ! -d "${html_dir}" ]; then
                mkdir -p "${html_dir}"
                echo "Making directory ${html_dir}"
        fi
done

# generate html files
for file in "${dw_files[@]}"; do
        html_dest="${HTML_FILES_ROOT}$(echo "${file}" | cut -d '/' -f 2- | sed 's/\.txt$/\.html/')"
        # Regenerate if any argument was passed, the destination is missing, or the source is newer
        if [ -n "${1}" ] || [ ! -f "${html_dest}" ] || [ "${file}" -nt "${html_dest}" ]; then
                echo "Generating ${html_dest} from ${file} and templates"
                cat "${TEMPLATE_DIR}top.html" > "${html_dest}"
                pandoc -f dokuwiki -t html "${file}" >> "${html_dest}"
                cat "${TEMPLATE_DIR}bottom.html" >> "${html_dest}"
        fi
done

# delete html files that no longer have a txt file as their parent
for file in "${html_files[@]}"; do
        dw_source="${DW_FILES_ROOT}$(echo "${file}" | cut -d '/' -f 2- | sed 's/\.html$/\.txt/')"
        if [ ! -f "${dw_source}" ]; then
                rm -v "${file}"
        fi
done

# push up to webserver
rsync --delete -zav "${HTML_FILES_ROOT}" username@server_address:/var/www
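
One detail worth calling out: that test on ${1} means any argument at all forces a full rebuild. Assuming the script is saved as generate.sh (a name I'm inventing here for illustration):

./generate.sh          # regenerate only missing or out-of-date pages, then rsync
./generate.sh force    # any non-empty first argument rebuilds every page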

This script requires that I be in the directory where the website articles are written. That is fine, because I always am. From this point, the entire rest of the server setup was just Nginx, which I already had running on a VPS and which needed only a little bit of configuration for the particular file structure that I've built out here.
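For the curious, the Nginx side of a site like this really is just a plain static server block. Something along these lines, with assumed paths and TLS details omitted, and no claim that it matches my exact config:

server {
    listen 80;
    server_name punkto.org;

    # Serve the rsync'd html_root as plain static files; no app server involved
    root /var/www;
    index index.html;

    location / {
        try_files $uri $uri/ =404;
    }
}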

Images

You see those little badges on the bottom of this page? I love them. I love websites that have them. I made all of the ones on my site by hand. Figuring out how to do that in Gimp took longer than it took to write the page-generation script. Since they are so small, I just host them directly here on the webserver. However, if I want to host larger images or video files or any other kind of sizeable download, I will need to use something else. This VPS has a small disk and it is slow and far away. I'd want heavy content to load faster. I'm not sure what the best solution to that problem is yet, but I'm sure I'll figure it out. I don't really want to rely on a fancy CDN, because that goes against the "no external requests" part of the goal of this website.

RSS

At the time of writing this post, I have no RSS feed to speak of. I would like one, though; I feel like feeds are an essential part of a blog. I'll probably have to hand-write the actual tool that will generate this feed.
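For reference, the skeleton of an RSS 2.0 feed is small enough to template by hand. A generic sketch (not my eventual feed):

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Ivory and Teal</title>
    <link>https://punkto.org/</link>
    <description>Blog posts</description>
    <item>
      <title>Example post</title>
      <link>https://punkto.org/blog/example.html</link>
      <pubDate>Tue, 05 Sep 2023 00:00:00 PST</pubDate>
      <description>Post content goes here</description>
    </item>
  </channel>
</rss>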

Content

I have some goals for what this site will actually host: mostly, I want one central place to put my stuff.

9/5/23 Update

I have replaced the original Bash script with a longer and more complicated Python script. The new script is more extensible for future modifications (I think) and it generates a valid RSS feed. This was also the first time I used Python classes while writing a script; I found them fun to work with, and I think they were a good choice of tool for webpages, since each webpage gets its own object. The whole new script can be found here:

#!/usr/bin/env python3

import sys
import os
import re
import subprocess
import datetime

class Webpage:
  site_name = "Ivory and Teal"
  page_txt_extension = ".txt"
  page_html_extension = ".html"
  webpagedict = {}
  templatefiles = {}

  def __init__(self, page_filename):
    self.txt_content = None
    self.html_content = None
    self.local_txt_path = None
    self.local_html_path = None
    self.title = None
    Webpage.webpagedict[page_filename] = self

  def generate(self):
    print("  GENERATING PAGE")
    with open(self.templatefiles['html_top']) as f: html_top = f.read().replace('_PAGETITLE', self.title)
    with open(self.templatefiles['html_bottom']) as f: html_bottom = f.read()
    with open(self.local_html_path, 'w') as dest_file:
      dest_file.write(html_top)
      dest_file.write('\n')
      dest_file.write(self.html_content)
      dest_file.write('\n')
      dest_file.write(html_bottom)

def find_files(path, regex):
  '''Takes in a path and a regex. Searches the path for files matching the regex, then returns a list of those files. Does not match directories'''
  matching_files = []
  # compile the regex once instead of on every directory visited
  file_regex = re.compile(regex)
  for root, dirs, files in os.walk(path):
    for file in files:
      file_path = root + '/' + file
      if re.match(file_regex, file_path):
        matching_files.append(file_path)
  return matching_files

def upload_everything(website_source_root):
  # This puts stdout on my terminal which is what I want
  subprocess.run(['rsync', '--delete', '-zrv', website_source_root + '/html_root/', 'cxe@punkto.org:/var/www'])

def make_xml(website_source_root):
  print("  GENERATING RSS")
  posts = []
  # Actual time of day doesn't really matter. If your RSS reader uses this as critical information to determine if there is new content, then your RSS reader sucks.
  todays_rfc_822 = datetime.date.today().strftime("%a, %d %b %Y") + " 00:00:00 PST"
  blog_post_line_regex = re.compile(r'^  \* ..., [0-9]{2} ... 20[0-9]{2}')
  with open(Webpage.templatefiles['xml_top']) as f: xml_top = f.read().replace('_SITENAME', Webpage.site_name).replace('_RFC822_FORMAT_DATE', todays_rfc_822)
  with open(Webpage.templatefiles['xml_middle']) as f: xml_middle = f.read()
  with open(Webpage.templatefiles['xml_bottom']) as f: xml_bottom = f.read()
  hopefully_blog_index = website_source_root + "/txt_root/blog.txt"
  if os.path.exists(hopefully_blog_index):
    # hand-typed RFC 822 compliant dates so that I can choose what shows up as the publish date in RSS feeds
    with open(hopefully_blog_index) as blog_index:
      for line in blog_index:
        if re.match(blog_post_line_regex, line):
          # Figure out all the variable XML data
          rfc_date = ' '.join(line.split(' ')[3:7]) + " 00:00:00 PST"
          filename = re.split(r':|\|', line)[1]
          link = "https://punkto.org/blog/" + filename
          content = Webpage.webpagedict[filename].html_content.replace('src="/', 'src="https://punkto.org/').replace('href="/', 'href="https://punkto.org/')
          title = Webpage.webpagedict[filename].title
          post_xml = xml_middle.replace('_TITLE', title).replace('_PAGENAME', filename).replace('_LINK', link).replace('_CONTENT', content).replace('_PUBDATE', rfc_date)
          posts.append(post_xml)
  rss_feed_path = website_source_root + "/html_root/blog.xml"
  with open(rss_feed_path, 'w') as dest_file:
    dest_file.write(xml_top)
    dest_file.write('\n')
    for post in posts:
      dest_file.write(post)
    dest_file.write('\n')
    dest_file.write(xml_bottom)

def main():
  # get the directory where the script lives
  website_source_root = os.path.dirname(os.path.abspath(sys.argv[0]))
  Webpage.templatefiles = {
      'html_top': website_source_root + '/template/top.html',
      'html_bottom': website_source_root + '/template/bottom.html',
      'xml_top': website_source_root + '/template/top.xml',
      'xml_middle': website_source_root + '/template/middle.xml',
      'xml_bottom': website_source_root + '/template/bottom.xml'
  }
  # Don't make any non-site files that end in txt. No README.txt that is in Markdown format. It won't come out looking very pretty.
  # If you name a file something.html and it is not an html file then don't get mad when this script messes up.
  # Filenames must be unique. No having multiple index.html pages. This is a limitation that I don't care about. If I eventually care about it then I will change it
  for file in find_files(website_source_root, r'^.*/(txt|html)_root/.*\.(txt|html)$'):
    pagename = file.replace('.','/').split('/')[-2]
    # if this page doesn't already have an object, create it
    if pagename not in Webpage.webpagedict:
      Webpage(pagename)
    if file.endswith('.txt'):
      # File paths
      Webpage.webpagedict[pagename].local_txt_path = file
      Webpage.webpagedict[pagename].local_html_path = '/'.join(file.split('/')[0:-1]).replace('txt_root', 'html_root') + '/' + pagename + Webpage.page_html_extension
      # Page TXT and HTML content
      with open(file) as f: Webpage.webpagedict[pagename].txt_content = f.read()
      # Use pandoc
      Webpage.webpagedict[pagename].html_content = subprocess.run(
          ['/usr/bin/pandoc', '-f', 'dokuwiki', '-t', 'html'],
          input=Webpage.webpagedict[pagename].txt_content,
          capture_output=True, encoding='utf-8').stdout
      self_title = Webpage.webpagedict[pagename].txt_content.split('======')[1].strip()
      if self_title != Webpage.site_name:
        Webpage.webpagedict[pagename].title = Webpage.site_name + " | " + self_title
      else:
        Webpage.webpagedict[pagename].title = Webpage.site_name
    if file.endswith('.html'):
      # File paths
      Webpage.webpagedict[pagename].local_html_path = file
      Webpage.webpagedict[pagename].local_txt_path = '/'.join(file.split('/')[0:-1]).replace('html_root', 'txt_root') + '/' + pagename + Webpage.page_txt_extension
  make_rss = False
  upload = False
  hopefully_blog_index = website_source_root + "/txt_root/blog.txt"
  for name, wp in Webpage.webpagedict.items():
    # generation of html items
    print(name)
    # Delete HTML files whose TXT source no longer exists
    if os.path.isfile(wp.local_html_path) and not os.path.isfile(wp.local_txt_path):
      print("  DELETING")
      upload = True
      os.remove(wp.local_html_path)
    # Generate HTML for a brand-new TXT file
    elif os.path.isfile(wp.local_txt_path) and not os.path.isfile(wp.local_html_path):
      make_rss = True
      upload = True
      wp.generate()
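    # The remaining branches are mtime checks: regenerate any page whose TXT
    # source or HTML template is newer than its HTML, and rebuild the RSS
    # feed whenever an XML template is newer than the blog index.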
    elif os.path.getmtime(wp.local_txt_path) > os.path.getmtime(wp.local_html_path):
      make_rss = True
      upload = True
      wp.generate()
    elif os.path.getmtime(wp.templatefiles['html_top']) > os.path.getmtime(wp.local_html_path):
      upload = True
      wp.generate()
    elif os.path.getmtime(wp.templatefiles['html_bottom']) > os.path.getmtime(wp.local_html_path):
      upload = True
      wp.generate()
    elif os.path.getmtime(wp.templatefiles['xml_bottom']) > os.path.getmtime(hopefully_blog_index):
      upload = True
      make_rss = True
    elif os.path.getmtime(wp.templatefiles['xml_middle']) > os.path.getmtime(hopefully_blog_index):
      upload = True
      make_rss = True
    elif os.path.getmtime(wp.templatefiles['xml_top']) > os.path.getmtime(hopefully_blog_index):
      upload = True
      make_rss = True
  if make_rss:
    make_xml(website_source_root)
  if upload:
    upload_everything(website_source_root)

if __name__ == "__main__":
  main()
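
One non-obvious contract in that script: the RSS generator only picks up lines in blog.txt that match the date regex. Reading the regex and the split logic, an index entry has to look something like this DokuWiki list line (an illustrative entry, not copied from the real index):

  * Tue, 05 Sep 2023 [[blog:i_have_a_website|I Have A Website]]

The date words become the RSS pubDate, the part between the colon and the pipe becomes the filename and link, and the page's converted HTML becomes the item content.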

This page is being served digitally. For a physical copy, print it off.

[Badges: Dancing Baby Seal Of Approval · Webserver Callout · My Text Editor · Bash Scripting 4 Lyfe · yarg · Blog RSS Feed]