Opinions expressed here belong to your mom
Oh, hello. THIS IS A WEBSITE.
I built out this website so that I could have a platform to express myself which does not fall under the purview of any of the public-square arbiters that have come to dominate our society. If I were to use Twitter, people without Twitter accounts would not be able to see what I post. If I were to use Facebook, people without Facebook accounts would not be able to see what I post. If I were to use LinkedIn, people without LinkedIn accounts would not be able to see what I post. LinkedIn and Facebook have been walled gardens for as long as I can remember, but Twitter's move to requiring a login to browse is recent. This presents a problem for me. I do technically have login credentials for each of these sites, but my web browser doesn't store any session data between launches, so simply reading a post on one of these sites requires that I:
This whole ordeal takes a bit of time and normally the information on social media isn't of high quality. Furthermore, social media is a psychic poison which rots your brain and your soul. If I were to use one of these platforms, I would be enticing anyone who wants to see my rambles (maybe like 5 dudes and 1 special agent) to subject themselves to the spiritual torture of social media and any security measures that they have to go through. It would be unethical for me to require of others what I am not willing to do myself.
"Why bother putting your thoughts out there" is not a terrible question. The vast majority of people are not going to care about anything that I have to write. Those who do care about something that I write won't care about the vast majority of stuff that I do write.
Maybe someone finds something useful. Maybe I get some catharsis from writing.
I made this website all on my own. In the distant past, this same URL would reach a blog that I ran which was a Jekyll site. This worked fine for me, but I had to manage a Jekyll installation, which required gem and any other related nonsense. I liked that it was a static site generator and didn't rely on a beefy webserver, since it was running on the same cheap VPS that it is running on now. However, I have grown to dislike Ruby (my main gripe is that the syntax is visually unappealing). When exploring the idea of recreating a blog, I looked at a few different options for static site generation:
However, all four of these options presented two main problems:
Learning another markup language is really the biggest problem here. I am already intimately familiar with DokuWiki's markup language since I use it every single day, both at home and at work. I use a self-hosted DokuWiki instance to keep track of everything from recipes to server deployment procedures. DokuWiki has its own markup language and I don't want to go learn Markdown or whatever other option these static site generators require. I just want to use DokuWiki markup, since most of what will become my early blog posts is already in that format.
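For anyone who hasn't run into it, DokuWiki markup looks roughly like this (a made-up sample, including the page name in the link, not something pulled from this site):

====== Page Title ======

Some text with **bold**, //italics//, and a [[blog:some_post|link to another page]].

  * an unordered list item
  * another one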
Pandoc can convert from DokuWiki to HTML. This means that all I have to do is figure out how to make HTML look pretty. I don't need a flashy, fancy, shiny site; I want it as plain as possible. The site should have no JS and no external sources embedded into the page, and it should look good on every screen. This is actually really easy to do in plain HTML with a little bit of CSS. I wrote a template file by hand and split it into a top.html and a bottom.html (a rough sketch of what those templates look like follows the script), then I wrote a Bash script to handle the conversion and everything. Here's the whole script:
#!/bin/bash
# This script generates my website based off of files in this directory
DW_FILES_ROOT="txt_root/"
HTML_FILES_ROOT="html_root/"
TEMPLATE_DIR="template/"

# Sanity check cwd
if [ ! -d ${DW_FILES_ROOT} ]; then
    echo "you're in the wrong directory bro"
    exit 1
fi

# generate arrays
dw_files=( $(find ${DW_FILES_ROOT} -type f -iname '*.txt') )
html_files=( $(find ${HTML_FILES_ROOT} -type f -iname '*.html') )
dirs=( $(find ${DW_FILES_ROOT} -mindepth 1 -type d) )

# make necessary directories
for dir in ${dirs[@]}; do
    html_dir="${HTML_FILES_ROOT}$(echo ${dir} | cut -d '/' -f 2-)"
    if [ ! -d ${html_dir} ]; then
        mkdir -p ${html_dir}
        echo "Making directory ${html_dir}"
    fi
done

# generate html files
for file in ${dw_files[@]}; do
    html_dest=${HTML_FILES_ROOT}$(echo ${file} | cut -d '/' -f 2- | sed 's/\.txt/\.html/')
    if [ ! -z "${1}" ] || [ ! -f ${html_dest} ] || [ ${file} -nt ${html_dest} ]; then
        echo "Generating ${html_dest} from ${file} and templates"
        cat ${TEMPLATE_DIR}/top.html > ${html_dest}
        pandoc -f dokuwiki -t html ${file} >> ${html_dest}
        cat ${TEMPLATE_DIR}/bottom.html >> ${html_dest}
    fi
done

# delete html files that no longer have a txt file as their parent
for file in ${html_files[@]}; do
    dw_source=${DW_FILES_ROOT}$(echo ${file} | cut -d '/' -f 2- | sed 's/\.html/\.txt/')
    if [ ! -f ${dw_source} ]; then
        rm -v ${file}
    fi
done

# push up to webserver
rsync --delete -zav ${HTML_FILES_ROOT} username@server_address:/var/www
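Speaking of those templates: the actual top.html and bottom.html aren't reproduced here, but a minimal sketch of the idea looks like this (the CSS is a placeholder, not my real stylesheet). top.html opens the document:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Ivory and Teal</title>
  <style>
    /* a few inline rules so there are no external requests */
    body { max-width: 70ch; margin: auto; padding: 1em; }
  </style>
</head>
<body>

and bottom.html just closes everything out, plus whatever footer content (like the badges) belongs on every page:

</body>
</html>

Pandoc's output gets sandwiched between the two.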
The Bash script requires that I be in the directory where the website articles are written. That is fine, because I always am. From this point, the entire rest of the server setup was just Nginx, which I already had running on a VPS and which needed only a little bit of configuration for the particular file structure that I've built out here.
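I'm not reproducing the real config here; a minimal sketch of the relevant part looks something like this (the try_files fallback is just one reasonable way to serve extensionless URLs like /blog/whatever):

server {
    listen 80;
    listen [::]:80;
    server_name punkto.org;

    root /var/www;
    index index.html;

    location / {
        # serve the pre-generated files as-is; no proxying, no scripting
        try_files $uri $uri/ $uri.html =404;
    }
}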
You see those little badges on the bottom of this page? I love them. I love websites that have them. I made all of the ones on my site by hand. Figuring out how to do that in Gimp took longer than it took to write the page-generation script. Since they are so small, I just host them directly here on the webserver. However, if I want to host larger images or video files or any other kind of sizeable download, I will need to use something else. This VPS has a small disk and it is slow and far away. I'd want heavy content to load faster. I'm not sure what the best solution to that problem is yet, but I'm sure I'll figure it out. I don't really want to rely on a fancy CDN, because that goes against the "no external requests" part of the goal of this website.
At the time of writing this post, I have no RSS feed to speak of. I would like one, though; I feel like an RSS feed is an essential part of a blog. I'll probably have to hand-write the actual tool that will generate this feed.
I have some goals for what this site will actually host. I want a central place to put:
I have replaced the original Bash script with a longer and more complicated Python script. The new script is more extensible for future modifications (I think) and it generates a valid RSS feed. This was also the first time that I used Python classes while writing a script. I had never used them before, but I found them fun to work with, and I think they were a good choice of tool for webpages, since each webpage gets its own object. The whole new script can be found here:
#!/usr/bin/env python3
import sys
import os
import re
import subprocess
import datetime


class Webpage:
    site_name = "Ivory and Teal"
    page_txt_extension = ".txt"
    page_html_extension = ".html"
    webpagedict = {}
    templatefiles = {}

    def __init__(self, page_filename):
        self.txt_content = None
        self.html_content = None
        self.local_txt_path = None
        self.local_html_path = None
        self.title = None
        Webpage.webpagedict[page_filename] = self

    def generate(self):
        print(" GENERATING PAGE")
        with open(self.templatefiles['html_top']) as f:
            html_top = f.read().replace('_PAGETITLE', self.title)
        with open(self.templatefiles['html_bottom']) as f:
            html_bottom = f.read()
        with open(self.local_html_path, 'w') as dest_file:
            dest_file.write(html_top)
            dest_file.write('\n')
            dest_file.write(self.html_content)
            dest_file.write('\n')
            dest_file.write(html_bottom)


def find_files(path, regex):
    '''Takes in a path and a regex. Searches the path for files matching the regex,
    then returns a list of those files. Does not match directories.'''
    matching_files = []
    file_regex = re.compile(regex)
    for root, dirs, files in os.walk(path):
        for file in files:
            file_path = root + '/' + file
            if re.match(file_regex, file_path):
                matching_files.append(file_path)
    return matching_files


def upload_everything(website_source_root):
    # This puts stdout on my terminal, which is what I want
    subprocess.run(['rsync', '--delete', '-zrv', website_source_root + '/html_root/', 'cxe@punkto.org:/var/www'])


def make_xml(website_source_root):
    print(" GENERATING RSS")
    posts = []
    # Actual time of day doesn't really matter. If your RSS reader uses this as critical
    # information to determine if there is new content, then your RSS reader sucks.
    todays_rfc_822 = datetime.date.today().strftime("%a, %d %b %Y") + " 00:00:00 PST"
    blog_post_line_regex = re.compile(r'^ \* ..., [0-9]{2} ... 20[0-9]{2}')
    with open(Webpage.templatefiles['xml_top']) as f:
        xml_top = f.read().replace('_SITENAME', Webpage.site_name).replace('_RFC822_FORMAT_DATE', todays_rfc_822)
    with open(Webpage.templatefiles['xml_middle']) as f:
        xml_middle = f.read()
    with open(Webpage.templatefiles['xml_bottom']) as f:
        xml_bottom = f.read()
    hopefully_blog_index = website_source_root + "/txt_root/blog.txt"
    if os.path.exists(hopefully_blog_index):
        # hand-typed RFC 822 compliant dates so that I can choose what shows up
        # as the publish date in RSS feeds
        with open(hopefully_blog_index) as blog_index:
            for line in blog_index:
                if re.match(blog_post_line_regex, line):
                    # Figure out all the variable XML data
                    rfc_date = ' '.join(line.split(' ')[3:7]) + " 00:00:00 PST"
                    filename = re.split(r':|\|', line)[1]
                    link = "https://punkto.org/blog/" + filename
                    content = Webpage.webpagedict[filename].html_content.replace('src="/', 'src="https://punkto.org/').replace('href="/', 'href="https://punkto.org/')
                    title = Webpage.webpagedict[filename].title
                    post_xml = xml_middle.replace('_TITLE', title).replace('_PAGENAME', filename).replace('_LINK', link).replace('_CONTENT', content).replace('_PUBDATE', rfc_date)
                    posts.append(post_xml)
    rss_feed_path = website_source_root + "/html_root/blog.xml"
    with open(rss_feed_path, 'w') as dest_file:
        dest_file.write(xml_top)
        dest_file.write('\n')
        for post in posts:
            dest_file.write(post)
            dest_file.write('\n')
        dest_file.write(xml_bottom)


def main():
    # get where the script lives
    website_source_root = os.path.dirname(os.path.abspath(sys.argv[0]))
    Webpage.templatefiles = {
        'html_top': website_source_root + '/template/top.html',
        'html_bottom': website_source_root + '/template/bottom.html',
        'xml_top': website_source_root + '/template/top.xml',
        'xml_middle': website_source_root + '/template/middle.xml',
        'xml_bottom': website_source_root + '/template/bottom.xml'
    }
    # Don't make any non-site files that end in txt. No README.txt that is in Markdown
    # format. It won't come out looking very pretty.
    # If you name a file something.html and it is not an html file, then don't get mad
    # when this script messes up.
    # Filenames must be unique. No having multiple index.html pages. This is a limitation
    # that I don't care about. If I eventually care about it then I will change it.
    for file in find_files(website_source_root, r'^.*/(txt|html)_root/.*\.(txt|html)$'):
        pagename = file.replace('.', '/').split('/')[-2]
        # if this page doesn't already have an object, create it
        if pagename not in Webpage.webpagedict:
            Webpage(pagename)
        if file.endswith('.txt'):
            # File paths
            Webpage.webpagedict[pagename].local_txt_path = file
            Webpage.webpagedict[pagename].local_html_path = '/'.join(file.split('/')[0:-1]).replace('txt_root', 'html_root') + '/' + pagename + Webpage.page_html_extension
            # Page TXT and HTML content
            with open(file) as f:
                Webpage.webpagedict[pagename].txt_content = f.read()
            # Use pandoc
            Webpage.webpagedict[pagename].html_content = str(subprocess.run(['/usr/bin/pandoc', '-f', 'dokuwiki', '-t', 'html'], input=bytes(Webpage.webpagedict[pagename].txt_content, 'utf-8'), capture_output=True).stdout, encoding='utf-8')
            self_title = Webpage.webpagedict[pagename].txt_content.split('======')[1].strip()
            if self_title != Webpage.site_name:
                Webpage.webpagedict[pagename].title = Webpage.site_name + " | " + self_title
            else:
                Webpage.webpagedict[pagename].title = Webpage.site_name
        if file.endswith('.html'):
            # File paths
            Webpage.webpagedict[pagename].local_html_path = file
            Webpage.webpagedict[pagename].local_txt_path = '/'.join(file.split('/')[0:-1]).replace('html_root', 'txt_root') + '/' + pagename + Webpage.page_txt_extension

    make_rss = False
    upload = False
    hopefully_blog_index = website_source_root + "/txt_root/blog.txt"
    for name, wp in Webpage.webpagedict.items():
        # generation of html items
        print(name)
        # Delete the HTML file whose source TXT file no longer exists
        if os.path.isfile(wp.local_html_path) and not os.path.isfile(wp.local_txt_path):
            print(" DELETING")
            upload = True
            os.remove(wp.local_html_path)
        # Write HTML to the file
        elif os.path.isfile(wp.local_txt_path) and not os.path.isfile(wp.local_html_path):
            make_rss = True
            upload = True
            wp.generate()
        elif os.path.getmtime(wp.local_txt_path) > os.path.getmtime(wp.local_html_path):
            make_rss = True
            upload = True
            wp.generate()
        elif os.path.getmtime(wp.templatefiles['html_top']) > os.path.getmtime(wp.local_html_path):
            upload = True
            wp.generate()
        elif os.path.getmtime(wp.templatefiles['html_bottom']) > os.path.getmtime(wp.local_html_path):
            upload = True
            wp.generate()
        elif os.path.getmtime(wp.templatefiles['xml_bottom']) > os.path.getmtime(hopefully_blog_index):
            upload = True
            make_rss = True
        elif os.path.getmtime(wp.templatefiles['xml_middle']) > os.path.getmtime(hopefully_blog_index):
            upload = True
            make_rss = True
        elif os.path.getmtime(wp.templatefiles['xml_top']) > os.path.getmtime(hopefully_blog_index):
            upload = True
            make_rss = True

    if make_rss:
        make_xml(website_source_root)
    if upload:
        upload_everything(website_source_root)


if __name__ == "__main__":
    main()
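The XML templates themselves aren't shown above, but based on the placeholders the script swaps out (_TITLE, _PAGENAME, _LINK, _CONTENT, _PUBDATE), a middle.xml along these lines would do the job, with _PAGENAME doing duty as the item's GUID:

<item>
  <title>_TITLE</title>
  <link>_LINK</link>
  <guid isPermaLink="false">_PAGENAME</guid>
  <pubDate>_PUBDATE</pubDate>
  <description><![CDATA[_CONTENT]]></description>
</item>

top.xml would hold the opening <rss> and <channel> elements (with _SITENAME and _RFC822_FORMAT_DATE filled in), and bottom.xml would close them.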