Light and Air

Entenwerder 1
View from Entenwerder across the Norderelbe
Refueling at Erika's Eck
The sky over Hamburg
Watching rugby at Paddy's Bar: Ireland beat Romania 44:10 at the Rugby World Cup
The Rathaus (city hall) after dark
Water light show at Planten un Blomen


Distributed Matters 2015

Distributed Matters 2015 in Berlin: Friday was workshop day. I chose to attend "Data applications on Hadoop, live from the trenches", held by Arnaud Cogoluègnes, and "A Hands-on Introduction to Ansible" by Victor Volle.
Arnaud's workshop was well prepared: he supplied a VM with everything needed. It was a beginner-level workshop with a good number of exercises in Java. Unfortunately, we ran out of time before finishing all the exercises. Arnaud covered the following topics:
Victor's workshop consisted of an introduction to Ansible accompanied by exercises on installation and configuration tasks. Again, we ran out of time before completing all the exercises. He covered the following topics:
  • Motivation for automation and configuration management
  • Ansible core concepts (incl. the very good documentation)
  • Idempotency and target state detection
  • Playbooks with some modules by example
  • Modules and reuse of code
  • Ansible's still poor multi-stage/multi-environment support when you want to group nodes along several dimensions (e.g. roles such as web servers and development stages such as a test environment)
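To make the idempotency and target state detection point from the list above a bit more concrete, here is a minimal sketch in plain Python. It is not Ansible code and not something Victor showed; `ensure_line` is a hypothetical helper I made up to illustrate the idea: first check whether the target state already holds, and only change something (and report a change) if it does not.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file at `path`.

    Returns True if the file had to be changed and False if it was
    already in the target state. Hypothetical helper, not an Ansible API.
    """
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        # Target state detected: nothing to do, so report "unchanged".
        return False
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config = Path(tmp) / "app.conf"
        print(ensure_line(config, "port=8080"))  # first run changes the file
        print(ensure_line(config, "port=8080"))  # second run is a no-op
```

Running the same step twice leaves the file exactly as it was after the first run, which is the property that makes repeated configuration runs safe.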
Saturday was conference day with a bunch of talks in two parallel tracks. Here are my notes from the talks I've attended:
  • The keynote was given by Kyle Kingsbury about Jepsen, a tool that simulates network partitions for distributed databases, schedulers, etc. A network partition is something of a worst-case scenario in distributed systems. Kyle went through several NoSQL database products, showing how they cope with network partitions and whether they keep their promises of data consistency.
  • Clojure at Braintree: Real-time Data Pipeline with Kafka by Joe Nash: Braintree is a payment processor using real-time data pipelines. Their data warehouse is Amazon Redshift, and Joe described their journey to introduce Kafka and Clojure for feeding data into Elasticsearch for transaction search. Kafka features they found useful:
    - Implements the pub/sub pattern.
    - Topics are spread across partitions.
    - The retention period is configurable.
    - The message stream is strongly ordered (within each partition).
  • Upgrade your database: without losing your data, your perf or your mind by Charity Majors: What can go wrong? A lot! Data loss, slower queries, broken replication, ... Risk assessment is important, e.g. which type of data you risk losing and whether a rollback path exists. Charity explained why MongoDB carries the highest upgrade risk in her environment, mainly because of the type of data they store in it and because MongoDB is still quite young and immature, with a lot of changes between releases. Some lessons she presented:
    - One change at a time.
    - Ideally, test every change with production traffic and production data. They capture and replay each API request against both the production database and a shadow database (which might already be upgraded) and compare the results.
    - Don't trust vendors, run your own benchmarks.
    - Know your workloads.
    - Measure and look for outliers, e.g. by tracking the 99th percentile (p99).
  • Stream based textanalytics with Spark and Elasticsearch by Stefan Siprell and Hendrik Saly: They explained CRISP-DM (Cross Industry Standard Process for Data Mining) using different domains. The tools mentioned were R (with some small code examples for data cleansing and clustering), Spark (e.g. for stream processing of tweets and pushing data to Elasticsearch, again with some code examples), and Elasticsearch (for interactive data exploration using Kibana). Some machine learning fundamentals such as supervised learning, e.g. based on support vectors, were also covered.
  • Running database containers using Marathon and Flocker by Kai Davenport: He talked about scheduling containers across many hosts. Mesos manages resources in a cluster; it uses a master/slave architecture and stores its state in ZooKeeper. Marathon is a framework on top of Mesos that manages containers in a cluster. Flocker orchestrates storage across a cluster of containers and provides a REST API. It requires an agent on each host and relies on a central control service. Kai closed his talk with a live demo of storage failover.
  • Containers! Containers! Containers! And Now? by Michael Hausenblas: No slides! Just a terminal session and a web browser. Michael led us through the motivation for using containers and a cluster management layer for containers (Marathon again). He went through this blog post in a live demo fashion.
  • Microservices with Netflix OSS and Spring Cloud by Arnaud Cogoluègnes: Spring Cloud wraps the tools from Netflix OSS to enable them for Spring projects. Arnaud showed some Java code examples to demonstrate how easily this can be used.
  • Conflict Resolution with Guns by Mark Nadal: This talk was about problems of distributed systems such as split brain, building a quorum, and the CAP theorem. Gun DB, a distributed cache, tries to mitigate those problems with local caching and conflict resolution algorithms. It challenges the need for strongly consistent systems.
  • Disque: a detailed overview of the distributed implementation by Salvatore Sanfilippo, the guy behind Redis: This final talk was about Disque, a distributed message queue. Disque was forked from Redis and uses the same protocol. Salvatore provided some technical insights into the API and into details such as ID generation, delivery guarantees, replication schemes, and safe restarts (persisting the messages before shutdown and loading them again after startup). He concluded with a short live demo.
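Two of the lessons from Charity Majors' talk above, replaying captured requests against a shadow database and watching the p99 for outliers, are easy to sketch in a few lines of Python. This is only an illustration under my own assumptions, not her team's tooling: `replay_and_compare`, `p99`, the dictionary-based "databases", and the latency numbers are all made up for the example.

```python
from typing import Callable, Iterable


def p99(samples: Iterable[float]) -> float:
    """Return the 99th percentile using the nearest-rank method."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("p99 needs at least one sample")
    # 1-based nearest rank: ceil(0.99 * n), done in integer arithmetic.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def replay_and_compare(
    requests: Iterable[str],
    primary: Callable[[str], object],
    shadow: Callable[[str], object],
) -> list[str]:
    """Run every captured request against both backends.

    Returns the requests whose results differ between the two.
    """
    return [req for req in requests if primary(req) != shadow(req)]


if __name__ == "__main__":
    # Stand-ins for the production and the (already upgraded) shadow database.
    production = {"user:1": "alice", "user:2": "bob"}
    upgraded = {"user:1": "alice", "user:2": "BOB"}  # simulated regression

    captured = ["user:1", "user:2", "user:3"]
    print("mismatches:", replay_and_compare(captured, production.get, upgraded.get))

    # 98 fast requests and two slow ones: the mean hides them, p99 does not.
    latencies_ms = [10.0] * 98 + [500.0, 900.0]
    print("mean:", sum(latencies_ms) / len(latencies_ms))
    print("p99:", p99(latencies_ms))
```

The point of the second part is that an average can look healthy while a small share of requests is badly slow, which is why tracking a high percentile is useful when comparing database versions.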




Today I wanted to watch a football match and a rugby match.
But first a quick stop at the Fabrik to pick up a few macarons and some art cards at the market there ... followed by a small breakfast at alpe altona.
Then it was off to the Rudi-Barth-Stadion to watch HFC Falke play Cosmos Wedel 2. HFC Falke was founded, following the example of FC United of Manchester, by HSV fans who had had enough of the commercialization of professional football. The team is currently top of the table in Kreisklasse 5 and plays in front of what I think is quite a large crowd that creates a good atmosphere. I would venture that this makes them the big exception at this level.
The 2:0 from the penalty spot. It was 5:0 at half-time and 9:0 at the final whistle ... er, for Falke, of course
Then lunch at Zum Spätzle.
Afterwards, to round things off, I watched the rugby match between FC St. Pauli and RU Hohen Neuendorf in the Stadtpark, which is practically around the corner from my flat.
So this is where Hamburg plays rugby ...
A family atmosphere amid the greenery and dominant hosts: 24:7 at half-time, 31:12 at full time


Blue Port

Somehow I can't help thinking of my favorite movie dialogue right now:
Hamid: "What is that for?"
Rambo: "It's blue light."
Hamid: "And what does it do?"
Rambo: "It glows blue."


Sunday again, and again nothing really planned ...

... so off I go once more: first breakfast at Frau Larsson, then out on tour.
Former Rieck & Melzian factory at Goldbekplatz
At the Landungsbrücken
At the Jewish Cemetery in Altona
Cruising the Alster: on the Osterbekkanal
On the Stadtparksee
Allotment gardens beside the Goldbekkanal and a swan family on it
On the Rondeelteich
On the Außenalster
Shortly before the jetty: on the Binnenalster
In the Alsterarkaden
To finish: burgers at LOUIS kitchen.bar.


Python Unconference

This was the first time I attended an unconference. In contrast to traditional conferences, no talks are announced upfront: talks are proposed at the beginning of the unconference, anybody can propose one, and the attendees vote for the talks they want to hear.
This weekend the Python Unconference took place at the University of Hamburg. It ran for three days, but I only attended on Saturday.
26 talks were proposed for 12 planned slots (four sessions with three parallel tracks each). The proposed talks covered a broad range in terms of content and quality. After voting I attended the following talks:

  • "Why Twitterbot? Using Python to Twitterbot" by Esther Seyffarth. The slides are available here. She introduced some of the Twitterbots she has already implemented, mainly using Tweepy. Esther went through the code of OMG Wikipedia! in more detail and showed various other Twitterbots (EmojiHaskell is my favorite). Her motivation is that Twitterbots are a nice exercise in text generation with Python and can deliver funny and entertaining results. In the discussion afterwards, NLTK was mentioned as an interesting toolkit for processing natural language in Python.
  • "TDD for APIs" by Michael Kuehne. To be honest I was distracted and didn't follow that much, but the discussion afterwards was focused on how to test the full stack of an API.
  • The Lightning talks in between covered a broad variety of topics, e.g. 3D rendering of OpenStreetMap data and coding katas.
  • "Pandas intro (Apache log analysis)" by Nikolay Koldunov. It was a live session in which he went through his IPython notebooks, including an introduction to pandas and a use case for exploring Apache logs.
  • "Building data products with Flask and AngularJS" by Andy Goldschmidt. He demoed two web applications. One used machine learning for classification and provided a simple interface for playing around with the features of a data set. The other analyzed an image to deliver its dominant colors. For both, the frontend was implemented in AngularJS and the backend was driven by Flask, using scikit-learn for the machine learning algorithms.