Updating the Search Function of my Website.

https://drwho.virtadpt.net/archive/2021-08-26/updating-the-search-function-of-my-website/

Not too long ago I got fed up with how good a job DuckDuckGo's site search feature wasn't doing. No matter what I did I couldn't find dick around here. And, folksonomies being what they are, unless you plan them (and then they won't be folksonomies) you probably won't remember what tags you used. It's frustrating to get lost in what amounts to your own house. So, one night I got well and fed up and decided to put some of my spare computing power to use. I did a walk-around of my exocortex and figured out that Jackpoint* had some RAM and a core free. So... off to one of my favorite pieces of software.

Installing YaCy is pretty easy if you read the directions (and even if you've done it a few times it's still a good idea). So I installed a headless JDK (sudo apt-get install openjdk-8-jdk-headless) on Jackpoint and then cloned the source code for YaCy. I then had to configure nginx to proxy YaCy and cache the static HTML stuff. For the sake of completeness, and because this is a fairly common thing people ask about, here are the relevant parts from the file /etc/nginx/sites-enabled/heterochromia.virtadpt.net on Jackpoint:

        location / {
                proxy_http_version 1.1;
                proxy_buffering off;
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Proxy "";
                proxy_pass http://127.0.0.1:8090/;
                proxy_redirect off;
                proxy_set_header Host $host;
                proxy_set_header X-Real-IP $remote_addr;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                client_max_body_size       10m;
                client_body_buffer_size    128k;
                proxy_connect_timeout      10s;
                proxy_send_timeout         10s;
                proxy_read_timeout         10s;
                proxy_buffer_size          4k;
                proxy_buffers              4 32k;
                proxy_busy_buffers_size    64k;
                proxy_temp_file_write_size 64k;
        }
        location /env {
            proxy_pass http://127.0.0.1:8090/env;
        }

Once that was done it was easy to set up a periodic indexing run of my website:

  • YaCy Administration -> Load Web Pages, Crawler
  • Site: https://drwho.virtadpt.net/
  • Start New Crawl
  • Process Scheduler -> crawl start for https://drwho.virtadpt.net/
  • Event Trigger -> no event
  • Scheduler -> 7 days
  • Execute Selected Actions
  • Index Export/Import -> RSS Feed Importer
  • URL of the RSS Feed: https://drwho.virtadpt.net/rss/feed.xml
  • Show RSS Items
  • Indexing -> scheduled -> repeat the feed loading every 7 days automatically
  • Add All Items to Index (full content of URL)

However, there were two problems to solve. The first: YaCy, being a search engine, spiders not just my website but also every link leading away from it. This meant that it was more likely to return hits for stuff I linked to than hits on my website (which was the whole point). The other problem was modifying my website's theme so that there was a YaCy search box instead of a DDG search box. But let's tackle one problem at a time.

It took some tinkering before I figured out the first problem. The solution that worked was the following process:

  • YaCy Administration -> Ranking and Heuristics
  • Filter Query -> fq= host_s:drwho.virtadpt.net
  • Set Filter Query
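For anyone wondering what that filter query actually does: in Solr, an fq parameter narrows the set of matching documents without changing how the survivors are scored. Here's a minimal sketch of the same idea as a hand-built query URL. The /solr/select path on the YaCy server, the rows value, and the function name are my assumptions for illustration; this is not what YaCy runs internally.

```python
from urllib.parse import urlencode


def build_filtered_query(base_url, terms, host):
    """Build a Solr-style select URL restricted to a single host.

    base_url is assumed to be a Solr-compatible select endpoint
    (for example a YaCy server's /solr/select); that path is an
    assumption here, not something taken from the steps above.
    """
    params = {
        "q": terms,                # what the user typed
        "fq": f"host_s:{host}",   # filter: keep only documents from this host
        "rows": 10,                # illustrative page size
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Prints the URL only; nothing is sent over the network.
    print(build_filtered_query(
        "http://127.0.0.1:8090/solr/select",
        "exocortex",
        "drwho.virtadpt.net",
    ))
```

The point is that the host restriction lives in fq and not in the search terms, so links leading off-site get dropped before ranking happens.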

I also tweaked some of the settings in the Solr Boosts part of the same page to get more specific and accurate search hits (explanations of what these settings mean are right next to them so refer to your own YaCy server for details):

  • sku: 1.25
  • title: 3.0
  • host_s: 6.0
  • dates_in_content_dts: 1.0
  • description_txt: 2.0
  • keywords: 3.0
  • text_t: 8.0
  • synonyms_ext: 1.0
  • url_file_name_s: 1.0
  • url_file_name_tokens_t: 4.0
  • url_paths_sxt: 3.0
  • click Set Field Boosts
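If you'd rather keep those numbers somewhere other than a web form, here's a rough sketch that writes the same boosts in the field^weight notation Solr's edismax parser uses for its qf parameter. Whether YaCy stores or passes its boosts in exactly this shape is an assumption on my part; the helper name is made up for illustration.

```python
# Field boosts copied from the list above.
BOOSTS = {
    "sku": 1.25,
    "title": 3.0,
    "host_s": 6.0,
    "dates_in_content_dts": 1.0,
    "description_txt": 2.0,
    "keywords": 3.0,
    "text_t": 8.0,
    "synonyms_ext": 1.0,
    "url_file_name_s": 1.0,
    "url_file_name_tokens_t": 4.0,
    "url_paths_sxt": 3.0,
}


def to_qf(boosts):
    """Render a {field: weight} mapping as a space-separated
    field^weight string (Solr edismax qf notation)."""
    return " ".join(f"{field}^{weight}" for field, weight in boosts.items())


if __name__ == "__main__":
    print(to_qf(BOOSTS))
```

A higher weight means a match in that field counts for more, which is why text_t (the page body) and host_s sit at the top here.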

After some experimentation I settled on the above settings. That brought me right along to the second problem: Integration. In the YaCy Administration panel scroll down to Search Portal Integration -> Portal Configuration -> Search Box Anywhere and you'll see some boilerplate HTML generated by YaCy:

<form method="get" accept-charset="UTF-8"
    action="http://heterochromia.virtadpt.net/yacysearch.html">
  <div style="text-align:center; padding:5px; background-color:#eeeeee;
    border:1px solid #cccccc; -webkit-border-radius:5px;
    -moz-border-radius:5px; border-radius:5px; display:block; float:left;
    margin-right:5px;">
    <div style="font-family:Arial,Helvetica,sans-serif; font-size:16px;
    display:block; float:left; padding-top:3px; padding-right:5px;">
    MySearch
    </div>
    <input type="text" name="query" value="" maxlength="80" 
           style="width:300px; font-size:16px; float:left;" />
    <input type="hidden" name="verify" value="cacheonly" />
    <input type="hidden" name="maximumRecords" value="10" />
    <input type="hidden" name="meanCount" value="5" />
    <input type="hidden" name="resource" value="local" />
    <input type="hidden" name="urlmaskfilter" value=".*" />
    <input type="hidden" name="prefermaskfilter" value="" />
    <input type="hidden" name="display" value="2" />
    <input type="hidden" name="nav" value="all" />
    <div style="font-size:18px; display:block; float:right; padding-top:1px;">
      <input type="submit" name="Enter" value="Search" />
    </div>
  </div>
  <p style="clear:both;"></p>
</form>

Nice, but not really what I was after. So, I spliced the code into the local copy of my website theme anyway and set about tinkering with the bits and pieces of the HTML form parts, using the local test server (make serve) to troubleshoot. The process required editing the base.html template file because every page on my site is ultimately built on top of that file. After a lot of tinkering, site rebuilding, and page refreshes I finally settled on the following chunk of HTML:

<!-- Search -->
<section class="box search">
<form method="get" accept-charset="UTF-8" target="_blank"
    action="https://heterochromia.virtadpt.net/yacysearch.html">
<div style="padding: 0; -webkit-border-radius: 0.2em;
    -moz-border-radius: 0.2em; border-radius: 0.2em; margin: 0;">
<input type="text" name="query" value="" maxlength="80"
    style="width:180px; " />
    <input type="hidden" name="verify" value="cacheonly" />
    <input type="hidden" name="maximumRecords" value="20" />
    <input type="hidden" name="meanCount" value="5" />
    <input type="hidden" name="resource" value="local" />
    <input type="hidden" name="urlmaskfilter" value=".*" />
    <input type="hidden" name="prefermaskfilter" value="" />
    <input type="hidden" name="display" value="2" />
    <input type="hidden" name="nav" value="all" />
</div>
</form>
</section>

And much to my surprise, it worked. If you look at the search bar on my website you can search my website and the search results will open in a new browser tab. Or you can go right to my search engine directly at https://heterochromia.virtadpt.net/
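Because the form uses method="get", everything it sends ends up in the URL, which means you can build the same search link from a script. Here's a minimal sketch that mirrors the hidden fields from my final form; the function name is hypothetical and it only constructs the URL without fetching anything.

```python
from urllib.parse import urlencode

# Same action URL and hidden field values as the search form above.
SEARCH_ENDPOINT = "https://heterochromia.virtadpt.net/yacysearch.html"
FORM_DEFAULTS = {
    "verify": "cacheonly",
    "maximumRecords": "20",
    "meanCount": "5",
    "resource": "local",
    "urlmaskfilter": ".*",
    "prefermaskfilter": "",
    "display": "2",
    "nav": "all",
}


def search_url(query):
    """Return the URL a browser would request when the form is
    submitted with the given text in the query field."""
    params = {"query": query, **FORM_DEFAULTS}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(search_url("yacy nginx"))
```

Handy if you want to bookmark a search or wire it into a bot.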

Happy hacking!

* Why did I name this server Jackpoint? When I originally set up this virtual machine as a Prosody server, it was the only hostname that came to mind because I was in kind of a hurry.

Pontification on the guy who stole a bag full of stuff.

https://drwho.virtadpt.net/archive/2021-08-19/big-scale-shoplifting-pontification/

You might have seen on the news a couple of weeks ago a video of a guy sweeping a bunch of stuff off of a shelf into a garbage bag (local copy) (video.hackers.town) and exiting the Walgreens with alacrity on a bicycle. Unsurprisingly, there was a brief wave of outrage, jokes in questionable taste, hellthreads on Nextdoor, and a run on strings of pearls to clutch. Rather than join in those particular fun and games, I was reminded of something I saw in the Before Times while out and about.

Please note that the two things may not be connected at all. I tagged this article 'pontification' for a reason.

When it was still possible to do so (it isn't anymore, not least because all the coffee shops I used to visit went out of business during the first quarantine in 2020.ev), I would wander around the Bay Area on the weekends and occasionally stop here or there to pick up some stuff I needed. On one particular afternoon I paid Walgreens a visit. While scanning the shelves for what I needed (because every store is laid out a little differently) I heard a quiet snapping sound next to me. Having survived the USian public school system I froze, tilted my head a bit, and looked out of the corner of my eye at the source of the sound.

The source of the sound was a taller guy that I'd seen at one of the homeless camps in the area before. He had a gym bag slung over his shoulder, partially unzipped, and had a pair of what looked like pruning shears in one hand. The snapping sound was the guy in question cutting notches in the packaging of items so that they could be removed from the security shelf (it is not uncommon in the Bay Area for essentials like toothbrushes, deodorant, and soap to be kept under lock and key in stores) and slipping them into the gym bag. It wasn't the first time I'd seen something like this, but what struck me was the stuff he was shoplifting.

Band-aids. Antifungal cream. Tylenol. Dental floss. Toothbrushes.

I didn't openly stare, nor did I say anything (see also, veteran of the USian public school system) but I did discreetly follow him down the aisle as he picked out what seemed most useful to folks living in an encampment. Basic toiletries and first aid supplies that we take for granted but aren't common if you're homeless. He moved with purpose, knowing what he wanted, where it could be found, and why he wanted it. After he'd acquired what he came to the store to get he walked out the front door like nothing at all had happened and vanished down the street. Judging by what he grabbed I'm fairly sure that it wasn't to resell (a phenomenon the existence of which I am somewhat skeptical). He was looking for things that people needed at that moment for specific purposes.

I don't know what else I can reasonably write about this. I don't have anything else to say. I don't know the guy and haven't seen him in quite some time. I don't know his situation. I don't have any solutions for getting homeless folks permanent homes, none that others haven't already discussed to the ends of the Earth, anyway. I'm just some schmuck that saw something interesting and connected two dots, and maybe expanded the perspective of one or two folks. Just... be good to each other, okay?