Updates and some short-term planning for Pipette

The last week or so has been very busy for me. We got new floors installed in the apartment, and they are lovely!

The dog isn't so sure about the new floor yet

Pipette inside the HyperOS virtual machine on OS X

I managed to get pipette-hugo-worker to work inside HyperOS on OS X! It turns out that dat-container (and pipette-hugo-worker) needs the virtio-rng kernel driver installed. Happily, Max Ogden made a new release of HyperOS that includes it, so pipette-hugo-worker now just works out-of-the-box!

HyperOS works great for demoing the worker for OS X users. However, right now, HyperOS just runs on a ramdisk, and it will lose its storage as soon as it is shut down, so it’s not really what you want to use for publishing. In the future, HyperOS will gain the ability to use permanent disk storage (using HyperDrive) – at that point, it will be very interesting!

Documentation

I created another Pipette-based blog for writing my “Drafts”. Netlify CMS has a feature for doing drafts using Git, but I didn’t bother to implement that on Dat – so for now, I think it’s easiest to just write my drafts on another Pipette blog.

I started to write a “How To Install pipette-hugo-worker on Ubuntu” guide, but I’ve been stumbling a bit trying to figure out how best to describe the various bits and pieces. There is definitely a need for a higher-level introduction.

Pipette is meant to be a collection of small, modular pieces that can be easily configured to build a publishing workflow … but a lot of it isn’t done yet, and the audience I want to reach won’t be familiar with the concepts that Pipette is built upon: peer-to-peer networking, the Dat Project, or Beaker Browser.

I think the documentation is actually the greatest challenge.

Multi-Pipette and API

The next important step in the evolution of Pipette is to enhance the worker so that one virtual machine can act as the worker (the part that runs Hugo) for many, many different blogs.

Right now, it’s wasteful to dedicate a whole virtual machine to running the worker for a single blog. I imagine a typical blog only gets updated once a week, and running Hugo only takes a few seconds … the rest of the time, the virtual machine is idling.

I want to implement an API where somebody can ask my Hugo worker to also watch their Dat archive containing their blog content, and my worker will generate a static blog Dat for them. This will make using Pipette really easy, as it removes the need for prospective Pipette bloggers to install pipette-hugo-worker on their own hardware.
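
To make that concrete, here’s a minimal sketch of what the registration endpoint might look like, written with Express. The endpoint name, the payload shape, and the startWorker helper are all hypothetical placeholders, not a finished design:

    // Hypothetical sketch of a worker-registration API (not the real design).
    const express = require('express')
    const app = express()
    app.use(express.json())

    const subscriptions = new Map() // source dat:// URL -> output dat:// URL

    app.post('/subscribe', (req, res) => {
      const { sourceUrl } = req.body // dat:// URL of the blogger's CMS archive
      if (!/^dat:\/\/[0-9a-f]{64}\/?$/.test(sourceUrl)) {
        return res.status(400).json({ error: 'expected a dat:// URL' })
      }
      // startWorker (imaginary) would attach a Hugo worker to sourceUrl and
      // return the dat:// URL of the generated static site archive.
      const outputUrl = startWorker(sourceUrl)
      subscriptions.set(sourceUrl, outputUrl)
      res.json({ outputUrl })
    })

    app.listen(8080)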

Of course, there are issues to consider when making an API … How do I pay for it if it becomes very popular? What if people try to use it to publish objectionable or illegal content? I think it shouldn’t be too hard to solve those problems given enough time. My guess is that the service won’t be too popular at the start, and that the initial batch of users will actually be nice people.

I want everything to be Open Source. Beyond that, I want everything to be forkable and cloneable. I’ll run the first instance of the API … but it will be built so that anybody can easily run their own service for themselves or for their friends.

Other parts of the puzzle

For the API to really work well, I’d like to wire up some additional machinery:

  • I’d like to package subscribed-hypercored using dat-container so I can easily provide extra peers for people who register with the API. I’ve found that having many peers helps things work a lot better with Dat … often I find that one peer can’t find the other peer, but if there’s an additional peer or two, then everything just works.
  • I have some code that listens to a Dat archive containing a list of Dat archive URLs, which represents a feed for hypercored. It also recognizes sub-feeds, which it expands into a flat list of Dat archive URLs that I can use with a fleet of subscribed-hypercored servers (see the sketch after this list). The code is sort of crappy, but it’s really neat to just write a Dat URL to a file and watch a bunch of peers appear out of the blue!
  • I need to build some monitoring/supervisor/logging infrastructure. I’ve been running hypercored on multiple machines for a long time, and it has bugs that need investigating … memory leaks, etc.
  • Updates to pipette-netlify-cms-beaker: after forking a copy of the CMS and writing some content, there needs to be some code and UI to rendezvous with the API and ask it to start publishing. Even better, it would be nice if the CMS UI also showed the live state of the worker and the final URL of the generated static site Dat archive.
  • Beyond just publishing to the Dat archive, I think the system will really start to appeal to potential bloggers when we can chain some additional publishing steps on the end … e.g. setting up DNS and HTTPS endpoints.
  • I want to switch my own main blog over to use Pipette.
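
Here’s a rough sketch of the feed-expansion idea from the second bullet above. The fetchFeed helper (which would sync a feed archive and return its lines) and the ‘feed:’ prefix marking a sub-feed are both inventions for illustration:

    // Expand sub-feeds into one flat, de-duplicated list of Dat archive URLs.
    async function expandFeed (url, fetchFeed, seen = new Set()) {
      if (seen.has(url)) return [] // guard against feed cycles
      seen.add(url)
      const urls = []
      for (const line of await fetchFeed(url)) { // one entry per line
        if (line.startsWith('feed:')) {
          // A sub-feed: recurse, then flatten the results into this list
          urls.push(...await expandFeed(line.slice('feed:'.length), fetchFeed, seen))
        } else if (line.startsWith('dat://')) {
          urls.push(line)
        }
      }
      return [...new Set(urls)]
    }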

Money: Freelancing vs. Full-time

I’m super enthusiastic about what’s happening in the Dat ecosystem at this point in time. It should be fun to look back a year from now to see the progress that has happened. I’m happy that I decided to scale back on paid work for a while so I could deeply immerse myself in what is going on.

Once Pipette is ready for broader usage, I think I’ll ask for community support via Open Collective, Patreon, or similar. That will be a good way to see if it’s delivering value to people.

Could Pipette be a business? It would be very speculative to start a business in the peer-to-peer web space right now. But it’s clear to me that the peer-to-peer technologies are going to have a dramatic impact on the Internet and cloud business as it exists today. There’s a lot of energy and passion in the space right now.

Thinking of longer-term sustainability, I’d like to support my family by doing as much Dat-related freelance work as possible. If you know anybody that wants to build something nifty with Dat, I’d love to talk! I’m also pretty good with Node.js, client-side JavaScript (React, etc.), Docker and Kubernetes, if you need that sort of thing.

As for full-time positions, I’ve been turning away recruiters. I’ve been encouraged by the initial response to Pipette, and I’d really like to get it off the ground. My hope is that there is enough freelancing work out there to give me that flexibility.

Messing around with HyperOS

I had some success getting the Pipette Hugo worker to run inside HyperOS.

Most of the problems stem from the fact that HyperOS is designed to run dat-container, and only dat-container. My pipette-hugo-worker project started out as a fork of dat-container, so it mostly works.

But I added some dependencies on losetup, GNU tar, and xz in order to speed up the first-time run, which is quite slow with dat-container. Unfortunately, the BusyBox-based equivalents in HyperOS were missing features. I think I’ve figured out a workaround.

Additionally, I’ve been experiencing some random hangs/deadlock-type behaviour when running either dat-container or pipette-hugo-worker. I’ve been trying some things out with Max Ogden in the #dat IRC channel … I believe it has something to do with the random number generator, so I’m trying to figure that out.

Right now, HyperOS runs on a RAM disk, so it’s not really suitable for running pipette-hugo-worker, as there is no persistence. But I think it could make a nice demo on OS X. And as I understand it, the ultimate plan for HyperOS is to have persistence capability via Dat … that’s a pretty wild idea!

Shipped: A README file

Sorry, I wrote the introductory blog post before I had committed a README file to show how to actually run the worker. Oops.

So, at long last, 24 hours later, here’s the missing information:

https://github.com/jimpick/pipette-hugo-worker

I’ll write up some better HOWTO guides, but hopefully this is enough to get a motivated person started.

If anybody gets it working, I’d love to hear about it! You can find me on Twitter or Rotonde.

Repo renaming and HyperOS

Quick update:

Renaming

I renamed the four repos used for Pipette, including pipette-hugo-worker and pipette-netlify-cms-beaker.

I’m still not super happy with the names, but at least you can tell that they belong to a single project now.

READMEs are coming soon!

HyperOS

On the #dat IRC channel, Max Ogden suggested I try out HyperOS - it’s a simple way to run a Linux virtual machine image on an OS X machine. He just added support for dat-container today. I tried it out on my laptop, and it worked great!

Since pipette-hugo-worker is a variant of dat-container, I decided to try that out. It didn’t work immediately.

After investigating, I discovered that HyperOS didn’t have the tools to decompress a .xz file - I’m using that to compress the ‘primer’ file that speeds things up … I am thinking of switching to .gz anyway to see if it would unzip faster.

Also, the HyperOS image uses the version of losetup from BusyBox, which has different command line options than the Ubuntu version. I did some hacking on the file and got it to work!

So after I get some READMEs done, I think I’ll do a version that runs inside HyperOS to make it easy for OS X users to run a worker on their own hardware. To make it really, really user friendly, I’m thinking it would also be nice to wrap it in an Electron app, but that will have to come later…

Update: Switching to .gz from .xz didn’t speed up the unzipping phase, but it did increase the primer tarball size from 31M to 44M. I’d prefer to keep .xz, but I will switch to .gz for now so I can play with HyperOS without having to modify it. But I did discover that I need to create the tarball using the --sparse flag, or it wastes a lot of disk space on zeroed-out files when it unpacks.
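
For reference, here’s roughly how the primer tarball could be produced with the sparse flag, driven from Node; the paths here are made up for illustration:

    // Build the primer tarball with GNU tar's --sparse flag so that
    // zeroed-out regions don't balloon into real disk blocks on unpack.
    const { execFileSync } = require('child_process')

    execFileSync('tar', [
      '--sparse',              // record holes instead of literal zeros
      '-czf', 'primer.tar.gz', // .gz for now so stock HyperOS can unpack it
      '-C', '/path/to/image',  // hypothetical image directory
      '.'
    ])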

Introducing Pipette

Hello from my temporary location in East Vancouver!

My name is Jim Pick. I’m a freelance software developer who lives in Vancouver, Canada.

“Pipette” is the name of the new open source project I’ve been working on for the past few months.

And this blog is where I’m going to maintain a development journal.

So what is Pipette?

Pipette is a blogging system.

So how is this different than other blogging systems that exist today?

Pipette is built using “Peer-to-Peer Web” technologies.

The blogs that you create with Pipette are published as Dat Archives, which can be synced to any computer and even viewed when you are offline.

For example, this blog is published as a Dat archive with the following address:

dat://cf1c6948c28017a94f7ffbbcd836b9cc3b6e15f328cb1eec5d91fd5f7109f969/

Don’t worry … you can make less scary names for sharing. And you can also publish on a traditional web address.

You can create a new blog directly from within Beaker Browser and publish immediately, straight from your browser. Creating a new blog is practically free. And you don’t have to upload your content to a centralized hosting service like Facebook, Medium or Tumblr.

Note: At this point in time, Pipette is only at the proof-of-concept stage, so there are a couple of manual one-time setup steps that are not yet automated.

Is it anything else?

So glad you asked! Pipette is also an experiment in cloneable/forkable service infrastructure.

We can give individuals and communities the ability to easily fork an entire web service. Not just the source code or the final HTML/CSS/JavaScript of a web service … but also all of the intermediate machinery running in the cloud needed to support it.

If we enable ‘forking’ on a deeper level, we open up a whole new way for people to work together to collaboratively run and maintain complicated services.

It’s a deep topic, so I’ll elaborate more in a future post.

Why the name “Pipette”?

Since much of the code runs in Beaker Browser, and a “Pipette” is also a piece of laboratory equipment … it seemed to fit.

A Pipette (image from Wikipedia)

Is this blog created using “Pipette”?

Yes! This is the first blog created with it. I’ll eventually move my personal blog over to use it as well.

Is there a demo?

I’m going to put a full demo together in the next few days.

I can’t wait. Can you give me a quick breakdown of how to make a new blog?

Sure.

First, you need to install Beaker Browser.

Second, you need to use Beaker to “fork” a copy of the “CMS” (content management system) which you will use to write the content of your blog posts. If you are familiar with WordPress, the CMS is basically the dashboard – except in this case, it lives at a separate website.

I’m using a modified version of Netlify CMS which knows how to write back to its own Dat archive (a self-modifying pattern) instead of writing to Git (a version control system for programmers).
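
In Beaker, the self-modifying pattern boils down to a page writing files back into the archive it is served from, via Beaker’s DatArchive web API. A minimal sketch (the file path is illustrative, and this is not the actual Netlify CMS integration):

    // Inside Beaker: open the archive this page is being served from,
    // and write a blog post back into it.
    const archive = new DatArchive(window.location)

    async function savePost (slug, markdown) {
      await archive.writeFile(`/content/post/${slug}.md`, markdown)
    }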

Netlify CMS Screenshot

You can use Beaker to fork the CMS for this blog as a starting point:

dat://d6185c1680001cd2260a0f31bfb209edbf97551774dc6175c110019d9020199d/

After you fork it, you will probably want to delete the existing content and change the blog title in the Settings menu.

Third, once you’ve written your first blog post, you’ll want to turn it into a full static web site suitable for publishing. In order to do that, the initial version of Pipette is using the Hugo static site generator. It’s nice and fast. It’s also popular and has most of the features you’d want in a blog, such as an RSS feed.

I’ve written a “worker” program that can easily be installed onto a Linux virtual machine that will subscribe to changes on the source CMS Dat archive, and generate or update the static website in a separate output Dat archive whenever a blog post is created or edited. Here’s a (slightly scary) look at the worker as it does its thing:

Hugo worker process rebuilding a blog after a revision

Notice that it’s quite fast.
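
Conceptually, the worker’s main loop is simple: sync the source archive, run Hugo, share the output, repeat on changes. Here’s a heavily simplified sketch using the dat-node module; the paths and keys are placeholders, and triggering rebuilds off the metadata ‘sync’ event is my assumption, not necessarily how pipette-hugo-worker really does it:

    const Dat = require('dat-node')
    const { execFileSync } = require('child_process')

    // Subscribe to the source CMS archive (the key is a placeholder)
    Dat('/tmp/source', { key: '<cms-archive-key>', sparse: false }, (err, source) => {
      if (err) throw err
      source.joinNetwork()

      // Create and share the output archive that holds the rendered site
      Dat('/tmp/output', (err, output) => {
        if (err) throw err
        output.joinNetwork()
        output.importFiles({ watch: true }) // share whatever Hugo writes

        const rebuild = () => execFileSync('hugo', [
          '--source', '/tmp/source', '--destination', '/tmp/output'
        ])

        rebuild() // initial build
        // Rebuild when the source archive finishes syncing new changes
        source.archive.metadata.on('sync', rebuild)
      })
    })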

The nifty thing about the “worker” is that it is packaged up as a full Linux virtual machine image (using @mafintosh’s dat-container) - and is itself distributed via Dat. The whole thing fits inside a 31MB file: Linux, Hugo, Node.js and all the custom logic!

Fourth, the Peer-to-Peer web works much like BitTorrent – there’s no single server, just a bunch of individuals sharing copies. You probably want to get some additional “peers” for your new blog so that multiple copies of it are available in case some go offline.

And you probably also want to expose it to regular web browsers via HTTP/HTTPS. Here is where publishing to Dat shines!

You can easily set up another peer and get a free website all at once using Hashbase from the creators of Beaker Browser. This blog is published using that option: https://pipette-dev-blog-jimpick.hashbase.io/

Alternatively, you can install a dathttpd server on your own hardware.

Another approach would be to write some software that subscribes to the Dat archive with the rendered static files and republishes those to an existing static file hosting service or CDN, such as GitHub Pages, Zeit Now, Surge, Netlify, Aerobatic, Firebase Hosting, raw Amazon S3, or Amazon CloudFront. There are a huge number of options.
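
That republishing step could be as small as a script that mirrors the rendered archive to a local directory and then shells out to a deploy tool. A sketch using dat-node and the AWS CLI, where the bucket name and paths are placeholders and the ‘sync’ trigger is again an assumption:

    const Dat = require('dat-node')
    const { execFileSync } = require('child_process')

    Dat('/tmp/site', { key: '<rendered-site-key>', sparse: false }, (err, dat) => {
      if (err) throw err
      dat.joinNetwork()
      dat.archive.metadata.on('sync', () => {
        // Push the mirrored files to S3 whenever the archive catches up
        execFileSync('aws', ['s3', 'sync', '/tmp/site', 's3://my-blog-bucket'])
      })
    })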

Is there source code?

Yes! But not much documentation yet.

I’ve actually been working on parts of this project for the past few months. There are a large number of parts to document, so it will take a while. I’m going to use this blog to write about the bits involved over the next few weeks.

The main project currently lives in my dat-hugo-worker repository.

I’ll probably rename it once I document it some more. I haven’t even updated the README for it. There are several Linux dependencies to install first, and then it can be installed using npm install -g dat-hugo-worker. It needs to be run as root since it starts a new Linux virtual machine using the systemd-nspawn virtualization system. It is invoked as dat-hugo-worker dat://...., which causes it to subscribe to the Dat archive provided. Assuming it’s one of the Netlify CMS-based Dat archives with some blog posts, it will create a new Dat archive with the output and share it.

The rest of the project is split across three more repos:

  • The actual worker that runs inside the dat-container based virtual machine image.
  • The modified version of Netlify CMS.
  • A modified version of the wrapper for Hugo (from Netlify), which also lives inside the virtual machine image.

Next Steps

The big thing now is documentation!

The development and build process is many-layered. There are multiple images, and they are currently hand-built. There needs to be some more automation and testing. If I don’t document the build process, even I might forget all of the manual steps involved.

Most people aren’t going to want to set up a Linux box. Really, most bloggers won’t even know what Linux is! The worker only subscribes to one source Dat at a time, but it would be really nice if it could subscribe to many more than that. It would be easy to build a web service that could provide Hugo workers for hundreds or thousands of blogs from a single Linux instance.

I’d really like to build an Electron app that would allow people to run the worker on their own computers at home or at work.

Supporting Pipette

I’m releasing everything as open source.

If you can help with testing, documentation, or adding features, contributions are encouraged!

Given my background working for large storage and cloud hosting companies in the past, I’ve been thinking about what a business for hosting dat-containers might look like. If you are interested in talking to me about that, I’d love to chat!

I support my family with freelancing work. If you are looking for somebody to do some Node.js development, DevOps, or even Dat ecosystem work, I’m your guy! Right now, I have availability. Try pinging me on Twitter at @jimpick, or via email at jim@jimpick.com.