"We are not what we know but what we are ready to learn" || Industrial Engineer turned data analyst turning Blockchain Developer
https://github.com/r1oga @r1oga

Yes, large-scale data can be stored on the Ethereum blockchain

WePower/Elering Nationwide Energy Experiment

Insights from the WePower website:
WePower’s mission is to enable everyone to make a change towards a sustainable energy future through their energy purchasing decisions. WePower essentially tackles the limitations of the current instruments that exist in the market, especially Power Purchase Agreements (PPAs), which are too complex and expensive for corporate energy buyers. That's why WePower has built a next-generation renewable energy procurement and trading platform based on smart contracts that uses virtual PPAs.

...so we have #blockchain for #P2P transactions and #transparency in the energy market, #SmartContracts for cheaper, faster transactions... and on top a dedicated #token.
It sounded like nothing new or worth further reading: it is not the first time people have claimed to revolutionize a market thanks to blockchain.
Until I read about the test WePower carried out. They don't just throw out buzzwords. They performed a real test: trying to write energy production and consumption data to the Ethereum blockchain on a national scale.

And they did it successfully.

Scope

One year of actual production and consumption data was provided by the Estonian energy operator Elering.

Volume of data points

households 7x10⁵
timeframe 1 year
readings 1/hour/household

7x10⁵ x 365 x 24 = 6.132x10⁹

Time

WePower assumed they could store 200 data points per block.

Total volume 6.132x10⁹
Assumed volume per block 200
Seconds per block 15

6.132x10⁹ x 15 / 2x10² = 4.599x10⁸ s ≈ 5,323 days ≈ 14.6 years
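To double-check the arithmetic, here is a minimal sketch that recomputes the volume and the time estimate. The constant names are mine; the values are the ones listed above.

```python
# Figures taken from the post; the constant names are illustrative.
HOUSEHOLDS = 700_000
READINGS_PER_YEAR = 365 * 24      # one reading per hour per household
POINTS_PER_BLOCK = 200            # WePower's assumption
SECONDS_PER_BLOCK = 15

total_points = HOUSEHOLDS * READINGS_PER_YEAR
blocks_needed = total_points / POINTS_PER_BLOCK
seconds_needed = blocks_needed * SECONDS_PER_BLOCK
days_needed = seconds_needed / 86_400
years_needed = days_needed / 365

print(f"{total_points:.3e} data points")
print(f"{seconds_needed:.3e} s = {days_needed:,.0f} days = {years_needed:.1f} years")
```

Changing POINTS_PER_BLOCK or SECONDS_PER_BLOCK shows how sensitive the estimate is to those two assumptions.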

Costs

Based on the Ethereum yellowpaper (page 20)

Store operation 2x10⁴ gas cost
Every Transaction 2.1x10⁴ gas cost
Average Gas Price 8.5 Gwei
1 ETH 1x10¹⁸ wei
1 ETH 185.68 € (coinmarketcap, 13th August 2019)

6.132x10⁹ x (2x10⁴ + 2.1x10⁴ / 200) x 8.5x10⁹ = 1.04791281x10²⁴ wei = 1,047,912.81 ETH ≈ 195 M€
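The cost formula is easy to get wrong by a power of ten, so here is a small sketch that redoes it with integers. It assumes, as the formula does, that the 21,000 gas base cost of one transaction is shared by 200 data points; the constant names are mine.

```python
# Values from the table above; names are illustrative.
TOTAL_POINTS = 6_132_000_000
POINTS_PER_TX = 200               # assumption: one transaction carries 200 data points
SSTORE_GAS = 20_000               # gas for one store operation
TX_BASE_GAS = 21_000              # gas paid by every transaction
GAS_PRICE_WEI = 8_500_000_000     # 8.5 Gwei
WEI_PER_ETH = 10**18
EUR_PER_ETH = 185.68

# 21,000 / 200 is exactly 105, so integer division loses nothing here.
gas_per_point = SSTORE_GAS + TX_BASE_GAS // POINTS_PER_TX
total_gas = TOTAL_POINTS * gas_per_point
total_wei = total_gas * GAS_PRICE_WEI
total_eth = total_wei / WEI_PER_ETH
total_eur = total_eth * EUR_PER_ETH

print(f"{total_wei:.8e} wei = {total_eth:,.2f} ETH = {total_eur / 1e6:,.0f} M EUR")
```

Dividing the wei total by 10¹⁸ gives a bit more than one million ETH, which at 185.68 € per ETH is where the roughly 195 M€ comes from.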

--> WePower would need almost 15 years and 195 M€ to run their pilot/test... GAME OVER? Not quite yet.

Optimization

To cut down testing costs and time, they:

  • compressed data
    • The EVM works with 32-byte words, i.e. numbers are represented as 256-bit integers, which is more than one data point requires. They figured they could fit 15 data points into 256 bits, meaning 15 times lower gas costs and 15 times less validation time.
  • reduced the volume of data by:
    • aggregating at zip code level --> 3837 points (also added benefit of anonymizing data and complying with GDPR regulations)
    • summarizing hourly consumption into monthly consumption --> 3837 x 12 = 46,044 data points
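The compression step can be sketched as plain bit packing. This is not WePower's actual encoding, which I don't know: I simply assume each data point is an unsigned integer that fits in 17 bits, since 15 x 17 = 255 bits fit into one 256-bit word. The function names are hypothetical.

```python
import math

BITS_PER_POINT = 17               # assumption: 15 * 17 = 255 bits <= 256
POINTS_PER_WORD = 15
MAX_VALUE = (1 << BITS_PER_POINT) - 1


def pack(points):
    """Pack up to 15 small unsigned integers into one 256-bit integer."""
    if len(points) > POINTS_PER_WORD:
        raise ValueError("too many data points for one word")
    word = 0
    for i, value in enumerate(points):
        if not 0 <= value <= MAX_VALUE:
            raise ValueError(f"value {value} does not fit in {BITS_PER_POINT} bits")
        word |= value << (i * BITS_PER_POINT)
    return word


def unpack(word, count=POINTS_PER_WORD):
    """Recover the data points from a packed word."""
    return [(word >> (i * BITS_PER_POINT)) & MAX_VALUE for i in range(count)]


# Made-up readings, only to exercise the round trip.
readings = [120, 95, 0, 131071, 42, 7, 300, 18, 64, 1024, 2048, 5, 77, 900, 12]
word = pack(readings)
print(word.bit_length(), unpack(word) == readings)

# Storage words needed for the reduced data set (46,044 monthly data points).
words_needed = math.ceil(46_044 / POINTS_PER_WORD)
print(words_needed)
```

With this layout one store operation carries 15 values instead of one, which is where the factor 15 on gas and time comes from.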

Test results

Stats

Total Transactions 434
Gas Used 1,129,171,462
Average gas price (Gwei) 9.65921659
Median gas price (Gwei) 2.5
Cheapest gas price (Gwei) 1.5
Highest gas price (Gwei) 60
Average wait time (s) 1,456.5
Median wait time (s) 203.6
Shortest wait time (s) 0.8
Longest wait time (s) 44,882.1
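For an order of magnitude of what the pilot cost, here is a back-of-the-envelope sketch. It is not a figure from the WePower report: it assumes every unit of gas was paid at the mean gas price, which is a simplification, and it reuses the ETH/EUR rate quoted earlier in this post.

```python
# Figures copied from the stats above; the EUR rate is the one used in the cost estimate.
TOTAL_TRANSACTIONS = 434
GAS_USED = 1_129_171_462
AVERAGE_GAS_PRICE_GWEI = 9.65921659
GWEI_PER_ETH = 10**9
EUR_PER_ETH = 185.68

avg_gas_per_tx = GAS_USED / TOTAL_TRANSACTIONS
# Simplifying assumption: all gas is priced at the mean gas price.
approx_cost_eth = GAS_USED * AVERAGE_GAS_PRICE_GWEI / GWEI_PER_ETH
approx_cost_eur = approx_cost_eth * EUR_PER_ETH

print(f"{avg_gas_per_tx:,.0f} gas per transaction on average")
print(f"about {approx_cost_eth:.1f} ETH, about {approx_cost_eur:,.0f} EUR")
```

The real bill is the sum of gas used times gas price per transaction, so treat the result as a rough indication only.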

Takeaways

Higher gas price doesn't necessarily mean shorter confirmation times

Chart - Median time to transaction confirmation per gas price

With the intention of reducing testing time and costs, they were batching transactions by 15 per block. This actually led to longer waiting times when "gas price increased while transaction were still waiting to be confirmed. Other transactions from the batch had to wait in line until the gas price would meet again the set price level". (see transactions at gas price 4 Gwei).
"Performance depends on other activities being performed at the same time on the blockchain, meaning that increasing gas price does not always guarantee timely confirmation of transactions."
Speed was achieved by being reactive, meaning readjusting the set gas price frequently enough to avoid falling below the market price. In practice, they ended up readjusting the gas price every 3 transactions using the ETH Gas Station API.
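The "readjust every 3 transactions" strategy could look roughly like the sketch below. It is only an illustration: fetch_gas_price and submit are hypothetical callables standing in for an ETH Gas Station lookup and for the code that signs and sends a transaction, and no real API is called.

```python
from typing import Callable, Iterable, List, Tuple


def submit_with_refresh(
    transactions: Iterable[str],
    fetch_gas_price: Callable[[], float],
    submit: Callable[[str, float], None],
    refresh_every: int = 3,
) -> List[Tuple[str, float]]:
    """Send transactions one by one, refreshing the gas price every few sends."""
    sent = []
    gas_price = 0.0
    for index, tx in enumerate(transactions):
        if index % refresh_every == 0:
            # Re-read the market price so the batch does not get stuck below it.
            gas_price = fetch_gas_price()
        submit(tx, gas_price)
        sent.append((tx, gas_price))
    return sent


# Stand-ins for the price oracle and the sender, with made-up prices in Gwei.
prices = iter([2.5, 4.0, 3.0])
log = submit_with_refresh(
    [f"tx{i}" for i in range(7)],
    fetch_gas_price=lambda: next(prices),
    submit=lambda tx, price: None,
)
print(log)
```

A smaller refresh_every reacts faster to price moves at the cost of more lookups.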

A gas price at or below the average still often yields confirmation times under 5 minutes

Chart - Distribution of transactions per gas price strategy
  • Safe low: both cheap and successful. Lowest price where at least 5% of the network hash power will accept it.
  • Average: accepted by top miners who account for at least 50% of the blocks mined: safe and prompt. Usually reflects wallets' defaults.
  • Fastest: lowest gas price accepted by all top miners. Transactions at this price should be accepted by all the top pools. Paying more than this price is unlikely to shorten confirmation time under normal circumstances.
Chart - Transaction confirmation time distribution

Reading both charts, we see that while ~85% of transactions were submitted using an "average" or worse (safe low) price, ~60% of transactions were still confirmed in less than 5 minutes.

For the energy market, my personal guess is that this confirmation time is satisfactory.

Conclusion

This test was successful both for WePower and the Ethereum network.
On one hand the pilot "helped WePower to validate and verify the logic and processes that will be at the core of the WePower platform.": they can go forward developing their platform with confidence.
On the other hand their test confirmed that "Ethereum is mature enough to accommodate contracts with multi-year terms." WePower ends their report acknowledging "that the scalability of Ethereum blockchain currently has limitations" but also reminds that "the problem is being tackled by Ethereum developers with plans to implement sharding, i.e. partitioning data into subsets, and moving from energy-intensive Proof of Work to a more environmentally friendly Proof of Stake consensus model".

Data visualisation principles

The motivation to write these lines came from seeing both good and bad visualizations, at work and in the media. So here are some principles that should help you build better visualizations or detect flaws in poorly designed ones. I'll start by assuming you have some data (symbols, signs, bytes, characters…) that you interpreted to get some information out of it. You now want to communicate it and present it to others. Visual communication is often an effective way to do so. Back to data, you basically have two options: tables or graphs.

Graph or table?

First of all ask yourself if you really need a chart. Unlike data tables, graphs are not meant to provide precise quantitative values. Graphs reveal patterns, trends, relationships and exceptions that would be difficult to discern from a table of values.
Sometimes the best graph is no graph.


Visual attributes

Let's now assume you found out you do need a graph. It's good to know some things about visual perception from a physiological perspective to understand what works and what doesn't. Your eyes are able to detect a limited set of visual attributes (e.g. color, shape, size…). Due to pre-attentive processing, some of these visual attributes are perceived extremely fast without any conscious effort. Why should you care? Because you want to visually encode your information so that it is perceived instantly and easily. Here are some pre-attentive visual attributes, from the most to the least "accurately perceived":
1. Position
2. Length
3. Angle/Slope
4. Area
5. Volume
6. Color Hue/Density
Chart - pre-attentive visual attributes

  • Position and length being better perceived, they are better suited for encoding quantitative data --> how much?
  • Colors or shapes are better suited for encoding categorical data --> what?

Which type of graph?

Ask yourself what you want to show.

Purpose: graph type
Comparison: bar charts (between items), line charts (over time)
Distribution: histograms
Relationship: scatter chart
Composition: stacked bar chart, waterfall chart

This document provides useful help when it comes to choosing the right chart type.


Best practices

Here are some recommendations before finally building your graph:

Save Pies for dessert

Although pies are good at showing part-of-whole relationships, they use area as a visual encoding, which is not perceived very accurately. They also often require redundant colors to distinguish values.
Prefer bar charts over pies!
On which chart is it honestly easier and faster to read, order and get a sense of the data values without having to label them explicitly?
Chart - save pies for dessert

Colors

Use different colors only when they correspond to differences of meaning in the data. In the example above color was actually redundant for the bar charts.

Colors are appropriate to show:

  • categories (different values per item)
  • sequence
  • divergence color palettes

Besides these cases, it is very likely that using color (or more than one) is redundant. Your chart should not look like a carnival. You are doing data visualization, which is about understanding: an effective chart may look "boring". You are not doing data art, which is about entertaining.

Data look better naked

Remove from your graphic all the ink/pixels that are not related to the numbers/values you actually want to represent (the concept of maximizing the data-ink ratio, from Edward Tufte). This includes removing: backgrounds, frames, axes, shadow/3D effects, gridlines...
You should grasp the idea looking at this animation:

animation

Make your vertical axis start at 0

An axis that does not start at 0 is confusing and may convey a wrong message.

On this first chart, it looks like Germany has a big edge over countries like France or Italy.
Chart - misleading vertical axis
While actually...
Chart - vertical axis starting at 0
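To put a number on the distortion, here is a small sketch that compares the ratio of two drawn bar heights with and without a truncated axis. The values are made up for illustration; they are not the figures behind the charts above.

```python
def apparent_ratio(a: float, b: float, axis_start: float = 0.0) -> float:
    """Ratio of the drawn bar heights of a and b when the axis starts at axis_start."""
    if min(a, b) <= axis_start:
        raise ValueError("axis_start must be below both values")
    return (a - axis_start) / (b - axis_start)


# Hypothetical values: country A scores 105, country B scores 100.
country_a, country_b = 105.0, 100.0

honest = apparent_ratio(country_a, country_b)                      # axis starts at 0
truncated = apparent_ratio(country_a, country_b, axis_start=95.0)  # axis starts at 95

print(f"axis at 0: bar A is {honest:.2f}x bar B")
print(f"axis at 95: bar A is {truncated:.2f}x bar B")
```

A 5% difference in the data becomes a bar that looks twice as tall once the axis starts at 95.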

Credit & further reading: most of the concepts I have just summed up come from www.perceptualedge.com.

DietPi Home Cloud Server

Block ads and access your data everywhere: self-hosted DNS+VPN+FTP+CLOUD server

I used to rely on cloud services offered by 'powerful, centralized, privately-owned companies' to store and share data between my personal devices. Not happy with how they value privacy, I decided to host a server myself. It should fulfill the following requirements:

  • [ ] 'network firewall' or 'DNS sinkhole' to block ads and trackers.
  • [ ] file server (ftp)
  • [ ] cloud server (http)
  • [ ] store data on a separate drive
  • [ ] accessible on the go
  • [ ] rely as much as possible on open source products
  • [ ] low cost
  • [ ] headless: no keyboard, mouse or screen, controlled remotely via ssh connection
  • [ ] secure

...a DNS+FTP+CLOUD+VPN server.

1. The Single Board Computer: Raspberry Pi 3B+

The Raspberry Pi is the name of a popular series of single board computers made by the eponymous Foundation. They provide low-cost ($35), high-performance computers, outreach and education to help more people access computing and digital making.

The Raspberry Pi operates in the open source ecosystem: it runs Linux and its schematics are released (the board itself is not open hardware, though).
Costs: 55.39€ (board + case + power supply + SD card)

  • [x] open source
  • [x] low cost

2. The OS: DietPi: Raspberry Pi on diet

DietPi describes itself as lightweight justice for your single board computer. It is an extremely lightweight Debian based OS. Think of a stripped version of 'Raspbian lite'.

It also offers a catalogue of popular 'ready to use' and optimized software (desktop, media, ssh, cloud, web/file servers...).
So it is optimized for minimal CPU and RAM usage and includes tuned versions of the software I plan to use. DietPi sounds like the perfect OS for my Raspberry Pi.

Installation

  1. Flash the SD card with the latest version of DietPi using Etcher.
  2. Optional: preconfigure DietPi for Wi-Fi. Locate and edit dietpi-wifi.txt:

    aWIFI_SSID[0]='MySSID'
    aWIFI_KEY[0]='MyWifiKey'

  3. Check the router interface to find the IP of the Raspberry Pi, or use nmap: e.g. nmap -sP '192.168.178.*'
  4. Connect via SSH to the Raspberry Pi: ssh root@i.p.add.ress
    • Standard password: dietpi
  5. Go through the installation.
  6. Set up a static IP address (required for Pi-hole to work):
    DietPi Config > 7: Network options: adapters: select your adapter > change DHCP setting to static and apply

  • [x] headless


2. The DNS server: Pi-hole

Pi-hole describes itself as a black hole for Internet advertisements.

Pi-hole basically blocks queries using lists of blacklisted hostnames. Acting as a DNS server makes it a much more powerful ad blocker than e.g. browser plugins:

  • All your home devices (including smart TV) benefit from the network-level blocking. Especially in blocks
  • Network-level blocking makes it possible to block ads in non-traditional places, such as in-app ads

Installation

  1. dietpi-software > Pi-hole
    • Select upstream DNS provider > Custom: 46.182.19.48 (digitalcourage.de), 80.241.218.68 (dismail.de)
    • Select default for all other configuration options
  2. Automatic reboot. Relog.
  3. Configure your router: add Raspberry Pi IP as local DNS server
  4. Redefine pihole admin password: pihole -a -p
  5. Last settings:
    • Log in to http://dietpi.ip.address/admin
    • Settings > DNS
      • Interface listening behaviour: should be "interface tun0"
      • Advanced DNS settings:
        • [x] Never forward non-FQDNs
        • [x] Never forward reverse lookups for private IP ranges
        • Conditional Forwarding
          • [x] Use conditional forwarding: provide your router's IP and domain name

Automatic updates

Edit /etc/cron.d/pihole (sudo nano /etc/cron.d/pihole) and add at the end:

# Pi-hole: Auto-Update Pi-hole!
30 2    * * 7    root    PATH="$PATH:/usr/local/bin/" pihole updatePihole

Note: it may be necessary that you reboot your devices before they actually start using the pi-hole DNS server and that their queries get blocked.


3. The storage: mount a usb drive

  1. Plug your usb drive into the raspberry pi
  2. dietpi-software
    • User Data Location >Drive: Launch Dietpi-Drive_Manager
    • Select drive
    • Ensure it is formatted as ext4. If not use the dietpi formatting feature.
    • Mount and rename
    • [x] User data: Select to transfer DietPi user data to this drive
    • Exit

Check in dietpi-software that 'User Data Location' now indicates: /mnt/yourdrive/dietpi_userdata

  • [x] store data on a separate drive

4. The cloud server: Nextcloud

  1. dietpi-software > Software Optimised > 114 Nextcloud
  2. Check access
  3. Add the hostname set for your Raspberry Pi (I personally use dynv6 as a provider) and/or your static IP address to the list of trusted domains:
    Edit /var/www/nextcloud/config/config.php

    'trusted_domains' =>
    array (
      0 => 'rasp.berry.pi.ip',
      1 => 'new.dom.ain.ip',
    ),

  4. Increase the max upload and PHP memory sizes: edit /etc/php/7.3/cli/php.ini and /etc/php/7.3/fpm/php.ini and increase post_max_size, upload_max_filesize and memory_limit.

5. The FTP server: ProFTP

  1. dietpi-software > File Server > ProFTP
  2. go to ftp://username:pwd@your.raspberrypi.ip.address (port 21)

Change the destination directory

Replace /Path/To/Directory with your target directory.

systemctl stop proftpd
sed -i '/DefaultRoot /c\DefaultRoot /Path/To/Directory' /etc/proftpd/proftpd.conf
systemctl start proftpd

Enable "jailing" (lock users to their home folders)

systemctl stop proftpd
sed -i "/DefaultRoot /c\DefaultRoot ~" /etc/proftpd/proftpd.conf
systemctl restart proftpd
  • [x] FTP server
  • [x] open source (GPL licensed)

6. The VPN server: openVPN

After setting up a VPN we will benefit from:

  • access to Pi-hole on any of your connected devices, even outside of your home LAN
  • more security, as your connection will be encrypted ("tunnelled") while on e.g. a public Wi-Fi network

  1. Get a hostname for your dynamic (router) IPv4 address (I personally use dynv6 as a provider).
  2. dietpi-software > PiVPN
  3. Use dietpi user
  4. Local DNS: enter the domain of your dynamic DNS address. This ensures that your client can connect to your PiVPN server even after an IP address change. Your router will have to be configured accordingly too (see further below).
  5. Change the default port for more security, e.g. 3456
  6. DNS Provider for VPN clients: custom > address: 10.8.0.1
  7. No custom search domain
  8. Accept other default options
  9. Reboot and relog

Now we want to define the IP address of the VPN interface (tun0) as the DNS server for the VPN clients. That way we reroute all DNS queries of the clients to our local DNS server, which is pi-hole!

  1. nano /etc/openvpn/server.conf
  2. comment out push "block-outside-dns" (windows specific)
    • Check line push "dhcp-option xxx". Should be: push "dhcp-option DNS 10.8.0.1" If something else is defined, delete/comment out/replace.

Finally the dnsmasq configuration must be extended so that Pi-Hole allows DNS name resolution for the IP address of the VPN interface.

  1. nano /etc/dnsmasq.d/02-pivpn.conf Write line: interface=tun0
  2. nano /etc/pihole/setupVars.conf. Add line: PIHOLE_INTERFACE=tun0
  3. Enable IP forwarding
    • sudo nano /etc/sysctl.d/01-ip_forward.conf: add line net.ipv4.ip_forward=1
  4. Restart services
    • /etc/init.d/openvpn restart
    • /etc/init.d/pihole-FTL restart

Configure router:

  • Set the dynDNS settings
  • Forward the port defined for the VPN server (UDP) to ensure that data packets from outside can reach it

Connect client

  • Add user: pivpn add
  • Copy .ovpn config file to client (e.g using proFTP)
  • Set up client with this config file
    Start a VPN session on Linux:
    sudo openvpn --config path/to/file.ovpn

  • [x] VPN server

  • [x] open source

  • [x] accessible on the go


7. Security

  1. Change ssh port and forbid root login:
    • Edit /etc/default/dropbear (sudo nano /etc/default/dropbear) and set:
      DROPBEAR_EXTRA_ARGS="-w -g"
      DROPBEAR_PORT=2200
    • service dropbear restart
  2. Exit
  3. Copy public key to Raspberry Pi to avoid entering ssh password every time: ssh-copy-id <USERNAME>@<IP-ADDRESS>
  4. Relog with new user: ssh username@i.p.add.ress -p 2200
  5. Install Fail2Ban: dietpi-software > Fail2Ban
  6. Enable HTTPS
    • dietpi-software > CertBot
    • certbot -d your.domain --manual --preferred-challenges dns certonly
    • Follow instructions and deploy the DNS TXT record _acme-challenge.... and its value
    • Renewal: for the moment manual --> to be improved
  • [x] secure

Conclusion

  • [x] 'network firewall' or 'DNS sinkhole' to block ads and trackers.
  • [x] file server (ftp)
  • [x] cloud server (http)
  • [x] store data on a separate drive
  • [x] accessible on the go
  • [x] rely as much as possible on open source products
  • [x] low cost
  • [x] headless: no keyboard, mouse or screen, controlled remotely via ssh connection
  • [x] secure
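
To verify the checklist from another machine, a quick reachability test of the TCP services can help. This is a minimal sketch, not part of the DietPi tooling: the port numbers are the ones used in this post (80 for the web interfaces, 21 for FTP, 2200 for SSH), the function names are mine, and the VPN is left out because it listens on UDP.

```python
import socket
from typing import Callable, Dict


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports as configured in this post; adapt them to your own setup.
SERVICES = {
    "HTTP (Pi-hole admin, Nextcloud)": 80,
    "FTP": 21,
    "SSH": 2200,
}


def check_services(
    host: str,
    services: Dict[str, int] = SERVICES,
    probe: Callable[[str, int], bool] = is_port_open,
) -> Dict[str, bool]:
    """Probe every service and report which ones answer."""
    return {name: probe(host, port) for name, port in services.items()}
```

Calling check_services with the IP address of your Raspberry Pi returns a dictionary with one True or False per service, so a failed step in the walkthrough shows up immediately.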