Transforming and Flattening Data

Transforming and flattening lists

A while back I had a the need to take what were essentially audit logs, where multiple records existed for each id, and flatten then into a single record for each id.

I pulled the data into python from a database as a list of over 100,000 tuples. Here is an example of the data I was looking at.

RECORD,ID,OLD_VALUE,NEW_VALUE,DATE,TABLE,COLUMN
1, id1, value1,  value2,  02/01/2020, table1, column1
2, id1, value2,  value3,  02/02/2020, table1, column1
3, id1, value3,  value1,  02/03/2020, table1, column1
4, id2, value4,  value5,  02/01/2020, table1, column1
5, id2, value5,  ,        02/03/2020, table1, column1
6, id3, value6,  value7,  02/01/2020, table1, column1
7, id3, value8,  value9,  02/02/2020, table1, column1
8, id4, value10, value11, 02/01/2020, table1, column1
9, id4, value11, value12, 02/02/2020, table1, column1
10,id4, value12, value4,  02/03/2020, table1, column1

This was an audit log for a series of undesired changes. Here is an example of the progression of these changes:

  • 02/01/2020: a large number of records were changed and iterated to the next highest available number
  • 02/02/2020: a similar number of records (but not all) were changed again, again iterating to the next available number
  • 02/03/2020: an additional event took place that resulted in many of the records being corrected (returning to their previous numbers), but some ended at yet another iteration higher, while still others ended up missing a value altogether.

In the sample information above there are 4 different unintended transformations of data that occurred:

  • id1: Example series of changes that ended back at the correct value:
  • id2: Example of series of changes that ended with a complete removal of the value:
  • id3: Example of series of changes that ended at a different number:
  • id4: Example of series of changes that ended up a different id's value (data not only incorrect, but conflicting):

In addition to the obvious differences between these 4 different changes, there are some that are harder to see:

  • Some id records went through 3 changes, others went through 2 (this one is not hard to see)
  • The last change of some id records was on 02/02, while others' last change was on 02/03

Altogether, the different factors that need to be understood are the following:

  • Some records changed each of the 3 days, some only on 2
  • Some of the records that only changed 2 times had their final changed on day 2, while others skipped day 2 and had their final change on day 3
  • Some records eventually were self corrected to their original value
  • Some records iterated 2 times and ended at a different number
  • Some records eventually ended up with a final value of null
  • Some records ended up at the original value of a different id

This example, as messy as it is, also is cleaner that the situation itself. The follow challenges existed:

  • There were some that only had 1 audit record where the original value was immediately replaced with a null
  • The list of tuples was not sorted in any fashion
  • The dates were actually spread over a period of 14 or 15 days, with the changes for a single record following anywhere in that time period -- not a clean 3 days like the example here.

The tuples that represent the audits for these changes are all over the place. In the end, what I wanted to see clearly what I was dealing with. In order to do that I needed to have a list of new records that would clearly show me the following for each id:

  • original value, and the date that value was lost
  • final value, and the date that value was added

Step 1: Gather the changes associated with a single id together, identified by that id

import csv
from collections import defaultdict

reader = csv.DictReader(open('audit_record.csv'))

dict_by_user = defaultdict
for i in reader:
    dict_by_user[i['ID']].append(i)

audit_summary_l = []
for i in dict_by_user:
    if dict_by_user:
        temp_dict = {}
        max = '0'
        min = '99/99/9999'
        for record in dict_by_user[i]:
            if record['DATE'] > max:
                max = record['DATE']
                last = record
            if record['DATE'] < min:
                min = record['DATE']
                first = record
        temp_dict = last
        temp_dict['OLD_VALUE'] = first['OLD_VALUE']
        temp_dict['ORIGINAL_DATE'] = first['DATE']
        audit_summary_l.append(temp_dict)

columns = ['ID','OLD_VALUE','ORIGINAL_DATE','NEW_VALUE',
           'DATE','TABLE','COLUMN','RECORD']
with open('audit_summary.csv', 'w') as output_file:
    dict_writer = csv.DictWriter(output_file, fieldnames=columns, lineterminator='\n')
    dict_writer.writeheader()
    for data in audit_summary_l:
        dict_writer.writerow(data)

This results in the following csv data:

ID,OLD_VALUE,ORIGINAL_DATE,NEW_VALUE,DATE,TABLE,COLUMN,RECORD
id1, value1,  02/01/2020, value1, 02/03/2020, table1,column1, 3
id2, value4,  02/01/2020,       , 02/03/2020, table1,column1, 5
id3, value6,  02/01/2020, value9, 02/02/2020, table1,column1, 7
id4, value10, 02/01/2020, value4, 02/03/2020,table 1,column1, 10

Auto-Login to Multiple KeePass Databases

Depending on the need, I use a couple of different password managers. For many things I use 1Password, primarily for it's multi-user support. Some a few things I use a couple of KeePass databases. I log into the two KeePass databases everyday, and the manual work of getting all of it open became tiresome.

The process I followed everyday:

  1. Open KeePass
  2. Login to which ever account I last logged into1
  3. Then File > Open, navigate to and select my second database2
  4. Login to the second database

While doing this once a day wasn't the end of the world, it was time consuming. I decided to automate the launch and login of both KeePass files.

First, a couple of caveats.

  1. This tutorial assumes you use two or more KeePass databases
  2. My scripts contain passwords in plain text3
  3. Depending on your use case you should explore encryption4 or at least some form of obfuscation5

Overview of Solution

  • A batch file that launches KeePass and a VBScript file
  • A VBScript file that automates the selection and login of two databases

Step 0

Take inventory of the following:

  • Name and Directory of KeePass database 1
  • Name and Directory of KeePass database 2
  • Directory of KeePass installation (where KeePass.exe is located)

Step 1

Disable the setting that causes KeePass to remember the last database.

The setting is found in the Advanced settings.

Tools | Options | Advanced
  1. Locate a checkbox labelled: "Remember and automatically open last used database on startup".
  2. Uncheck this setting.

Step 2

Create a batch file to open KeePass and launch the vbscript that will actually do the work.

This is an example of the batch file that includes comments (lines with REM) that explain what each part does. See further below for the script without comments.

@echo off
REM COMMENT 
REM Change Directory to KeePass installation location
REM run KeePass executable

cd "F:\z-Apps\KeePass"
start KeePass.exe

REM COMMENT 
REM Allow 2 seconds for program to launch

timeout /t 2

REM COMMENT
REM Change Directory to vbs file location
REM run ("CALL") vbs script

cd "F:\z-Scripts"
CALL Multi-KeePass.vbs

exit

Here is what your batch file will actually look like:

@echo off
cd "C:\Folder\KeePassFolder"
start KeePass.exe

timeout /t 2

cd "C:\Folder\ScriptsFolder"
CALL Multi-KeePass.vbs

exit

To create your own version:

  1. Create a text file in Notepad or Notepad++ and copy and paste the script without comments above into it.
  2. Save the file as a name of your choosing (I used "Multi-KeePass")
  3. Change the file extension from .txt to .bat
  4. Open the file and Change the C:\Folder\KeePassFolder to the KeePass directory on your own computer (i.e., folder that contains the KeePass.exe file)
  5. The C:\Folder\ScriptsFolder and Multi-KeePass.vbs will changed later after the VBScript file has been created.

Step 3

Create a VBScript file that will do the work of logging into both KeePass databases automatically when the related batch file is launched.

This is an example of the VBS file that includes comments (lines with ' ) that explain what each part does. See further below for the script without comments.

set wshshell = wscript.CreateObject("wscript.shell")

' COMMENT
' Target the open KeePass application
' Open new file window
' Input path into name field
' Input file name into name field to select file

wshshell.AppActivate "KeePass"
wshShell.SendKeys ("^o")
wshShell.SendKeys "C:\Folder\SubFolder1{ENTER}"
wshShell.SendKeys "KeePass_1.kdbx{ENTER}"

' COMMENT
' Target window requesting password
' Input password (Replace PASSWORD with your password)

wshshell.AppActivate "Open Database - KeePass_1.kdbx"
wshShell.SendKeys "PASSWORD{ENTER}"

' COMMENT
' Target KeePass application window with new name
' Open new file window
' Input path into name field
' Wait/sleep for a few moments
' Input file name into name field to select file

wshshell.AppActivate "KeePass_1.kdbx - KeePass"
wshShell.SendKeys ("^o")
wshShell.SendKeys "C:\Folder\SubFolder2{ENTER}"
wscript.sleep 3000
wshShell.SendKeys "KeePass_2.kdbx{ENTER}"

' COMMENT
' Target window requesting password
' Input password (Replace PASSWORD with your password)

wshshell.AppActivate "Open Database - KeePass_2.kdbx"
wshShell.SendKeys "PASSWORD{ENTER}"

Here is what your VBS file will actually look like:

set wshshell = wscript.CreateObject("wscript.shell")
wshshell.AppActivate "KeePass"
wshShell.SendKeys ("^o")
wshShell.SendKeys "C:\Folder\SubFolder1{ENTER}"
wshShell.SendKeys "KeePass_1.kdbx{ENTER}"
wshshell.AppActivate "Open Database - KeePass_1.kdbx"
wshShell.SendKeys "PASSWORD{ENTER}"
wshshell.AppActivate "KeePass_1.kdbx - KeePass"
wshShell.SendKeys ("^o")
wshShell.SendKeys "C:\Folder\SubFolder2{ENTER}"
wscript.sleep 3000
wshShell.SendKeys "KeePass_2.kdbx{ENTER}"
wshshell.AppActivate "Open Database - KeePass_2.kdbx"
wshShell.SendKeys "PASSWORD{ENTER}"

To create your own version:

  1. Create a text file in Notepad or Notepad++ and copy and paste the script without comments above into it.
  2. Save the file as a name of your choosing (I used "Multi-KeePass")
  3. Change the file extension from .txt to .vbs
  4. Open the newly created file and reference the text below to replace the bold sections with your own information:

set wshshell = wscript.CreateObject("wscript.shell")
wshshell.AppActivate "KeePass"
wshShell.SendKeys ("o")
wshShell.SendKeys "C:\Folder\SubFolder1{ENTER}"
wshShell.SendKeys "KeePass_1.kdbx{ENTER}"
wshshell.AppActivate "Open Database - KeePass_1.kdbx"
wshShell.SendKeys "PASSWORD{ENTER}"
wshshell.AppActivate "KeePass_1.kdbx - KeePass"
wshShell.SendKeys ("o")
wshShell.SendKeys "C:\Folder\SubFolder2{ENTER}"
wscript.sleep 3000
wshShell.SendKeys "KeePass_2.kdbx{ENTER}"
wshshell.AppActivate "Open Database - KeePass_2.kdbx"
wshShell.SendKeys "PASSWORD{ENTER}"

Breakdown:

  • C:\Folder\SubFolder1: Replace the single instance with the directory of your first KeePass database
  • KeePass_1: Replace all three instances with the name of your first KeePass database
  • PASSWORD: (first instance) with the password of your first KeePass database
  • C:\Folder\SubFolder2: Replace the single instance with the directory of your second KeePass database
  • KeePass_2: Replace both instances with the name of your second KeePass database
  • PASSWORD: (second instance) with the password of your second KeePass database

Step 4

Run two instances of KeePass -- which means different bat and vbs files. This is in Sourceforge

--edit to create different shortcut keys for each one.


  1. By default KeePass remembers the location of the last database you navigated to. If you regularly login to multiple databases the 'last' database you logged into will often be different. This means that  

  2. While the first database will open and prompt for login upon launch, any additional databases have to be specifically selected and launched. 

  3. These passwords are not super important, and they are on my own hardware that's not connected to the internet. In addition, the scripts that contain the actual passwords are on an encrypted flash drive, which is unplugged after launching KeePass. This is just a personal cost/benefit decision that I made. 

  4. How to safely store a password 

  5. vbs script obfuscate discussion 

iPhone Screen Repair Note - Touch Sensor

I recently replaced the screens of an iPhone 5S and an iPhone 6S and run into one specific issue that wasn't specified in any of the instructions I used for the process.

[Note 1: This issue doesn't apply to the iPhone 7, 7+, 8, or 8+]

When moving the home button to the replacement screen, make sure that the bracket mounting screws are not tightened much at all. If the screws are too tight the home button will not click. To prevent this, only lightly tighten the screws until they hit resistance and stop.

This will likely feel too loose, but it is correct. Any additional tightening will result in the home button click issue. Click the home button a few times to ensure that it feels right before reassembling the phone.