Everything not physics in computational physics

Thursday, March 14

I gave a brief presentation at our group meeting last week. Besides a short introduction to Git, I talked about some of the things that we (or at least I) tend to spend a lot of time on, but that are not directly related to physics. Hence the title.

While the original presentation (pdf) is available, I've also included the presentation's content below, only slightly edited to better fit the blog format.

I wrote the presentation in Markdown, and the slides are courtesy of the amazing reveal.js. In the past I've used LaTeX Beamer for talks. reveal.js is very simple (I got everything working in less than half an hour), and at least for this presentation it worked sufficiently well. Next I need to find out how best to integrate equations; then I could also use reveal.js for my research-focused talks.

Contents

  1. Git in 13½ minutes
    1. Intro to Version Control Systems
    2. Hands-on Git
    3. Hands-on Bitbucket
  2. Everything not physics in computational physics
    1. Common workflow
    2. Provenance
    3. Data management
    4. Thoughts on a Python-centric workflow

Before we start

Version Control Systems

Revision control

Image from: Pro Git book

Hands-on: Git 1

$ mkdir my_cool_project
$ cd my_cool_project/
$ git init
Initialized empty Git repository in /Users/burkhard/my_cool_project/.git/

Hands-on: Git 2

$ vim funky_program.cpp
$ git status 
# On branch master
#
# Initial commit
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       funky_program.cpp
nothing added to commit but untracked files present (use "git add" to track)
$ git add funky_program.cpp
$ git commit -a -m "Initial commit."
[master (root-commit) 1588d78] Initial commit.
 1 file changed, 4 insertions(+)
 create mode 100644 funky_program.cpp

Hands-on: Git 3

$ vim funky_program.cpp
$ git diff
diff --git a/funky_program.cpp b/funky_program.cpp
index c557e87..32e958e 100644
--- a/funky_program.cpp
+++ b/funky_program.cpp
@@ -1,4 +1,5 @@
 int main ()
 {
+    int whizbiz = 0;
     return 0;
 }
$ git commit -a -m "Implemented the new whizbiz feature."
[master 8859ecf] Implemented the new whizbiz feature.
 1 file changed, 1 insertion(+)

Hands-on: Git 4

$ git log
commit 8859ecf6b0cbcd29407ddbfde3bc0f3ae5c953b2
Author: Burkhard Ritter <burkhard@seite9.de>
Date:   Thu Mar 7 18:36:30 2013 -0700

    Implemented the new whizbiz feature.

commit 1588d78bb6ee615499441a76ab3a8fb6a62241c5
Author: Burkhard Ritter <burkhard@seite9.de>
Date:   Thu Mar 7 18:34:29 2013 -0700

    Initial commit.

Hands-on: Git 5

$ git help
[...verbose help message...]
$ git help tag
[...helpful manpage...]
$ git tag -a -m "Version 1." v1
$ git tag
v1
$ git describe 
v1

$ vim funky_program.cpp
$ git commit -a -m "More exciting features."
[master 7684818] More exciting features.
 1 file changed, 1 insertion(+)
$ git describe 
v1-1-g7684818
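
The version string printed by git describe can also be recorded from inside a simulation script, so that every output file states which code produced it. Below is a minimal Python sketch of that idea. The function name git_version and the "unknown" fallback are assumptions made for illustration and are not part of the original talk. The sketch adds the --always and --dirty options of git describe so that untagged or locally modified working trees still yield a usable string.

```python
import subprocess


def git_version(repo_dir="."):
    """Return the output of `git describe --always --dirty`, or "unknown".

    `git_version` is a hypothetical helper name; the fallback value is an
    assumption, chosen so that callers always get a string back.
    """
    try:
        result = subprocess.run(
            ["git", "describe", "--always", "--dirty"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, repo_dir does not exist,
        # or repo_dir is not inside a Git repository.
        return "unknown"
    return result.stdout.strip() or "unknown"


if __name__ == "__main__":
    print(git_version())
```

Run inside my_cool_project, this should print the same kind of string as git describe above, with a -dirty suffix if there are uncommitted changes.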

Hands-on: Git 6

$ git help diff
$ git diff HEAD^
diff --git a/funky_program.cpp b/funky_program.cpp
index 32e958e..0403944 100644
--- a/funky_program.cpp
+++ b/funky_program.cpp
@@ -1,5 +1,6 @@
 int main ()
 {
+    bool evil_bug = true;
     int whizbiz = 0;
     return 0;
 }

Hands-on: Git 7

Collaboration


hagen@worms:~/nibelungenlied$ git pull kriemhild
hagen@worms:~/nibelungenlied$ git add chapter4.tex
hagen@worms:~/nibelungenlied$ edit chapter3.tex
hagen@worms:~/nibelungenlied$ git commit -a
hagen@worms:~/nibelungenlied$ git push kriemhild

kriemhild@worms:~/nibelungenlied$ edit chapter3.tex
kriemhild@worms:~/nibelungenlied$ git commit -a
kriemhild@worms:~/nibelungenlied$ git pull hagen
kriemhild@worms:~/nibelungenlied$ git pull ssh://kriemhild@worms/home/hagen/nibelungen

More complex example: central server / repository

Distributed workflow

Image from: Pro Git book

Bitbucket

Hands-on: Bitbucket 1

Hands-on: Bitbucket 2

Hands-on: Bitbucket 3

Hands-on: Bitbucket 4

Hands-on: Bitbucket 5

burkhard@macheath:~/my_cool_project$ git remote add origin ssh://git@bitbucket.org/meznom/my_cool_project.git

burkhard@macheath:~/my_cool_project$ git push -u origin --all
[...]

burkhard@macheath:~/my_cool_project$ git status 
# On branch master
nothing to commit, working directory clean

burkhard@macheath:~/my_cool_project$ git pull
Already up-to-date.

Hands-on: Bitbucket 6

Hands-on: Bitbucket 7

burkhard@lise:~$ git clone git@bitbucket.org:meznom/my_cool_project.git
[...]
burkhard@lise:~$ cd my_cool_project/
burkhard@lise:~/my_cool_project$ vim README.md
burkhard@lise:~/my_cool_project$ git add README.md
burkhard@lise:~/my_cool_project$ git commit -a -m "Added Readme"
[...]
burkhard@lise:~/my_cool_project$ git push
[...]

burkhard@macheath:~/my_cool_project$ git pull
[...]
From ssh://bitbucket.org/meznom/my_cool_project
   7684818..a4b5daa  master     -> origin/master
Updating 7684818..a4b5daa
[...]
burkhard@macheath:~/my_cool_project$ git log -n 1
commit a4b5daad03272a7598b1d2263f946d7ca34fdfe9
Author: Burkhard Ritter <burkhard@seite9.de>
Date:   Thu Mar 7 22:37:55 2013 -0700

    Added Readme

Hands-on: Bitbucket 8

Hands-on: Bitbucket 9

Everything not physics in computational physics

Common workflow

/----> develop / change code
|
|  /-> input parameter set 1     input parameter set 2     ...
|  | 
|  |          run                         run              ...
|  |   
|  |       output data 1             output data 2         ...
|  |   
|  |   store raw data
|  |   
|  |   postprocess output data
|  |   
|  |   store processed data
|  | 
|  \-- plotting
|
\----- results, final plots
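
As a rough illustration, the inner loop of this workflow (parameter sets, run, store raw data, postprocess, store processed data) could be sketched in Python as follows. Everything here is hypothetical: run is a stand-in for a real simulation, the parameter names h, seed and n_samples are invented, and the raw_/processed_ file names are just one possible convention.

```python
import json
import random
import statistics
import tempfile
from pathlib import Path


def run(params):
    """Hypothetical stand-in for a simulation: returns noisy samples."""
    rng = random.Random(params["seed"])
    return [rng.gauss(params["h"], 1.0) for _ in range(params["n_samples"])]


def postprocess(samples):
    """Reduce raw samples to a mean and a naive standard error."""
    mean = statistics.fmean(samples)
    error = statistics.stdev(samples) / len(samples) ** 0.5
    return {"mean": mean, "error": error}


def workflow(parameter_sets, out_dir):
    """Run every parameter set and store raw and processed data as JSON."""
    out_dir = Path(out_dir)
    results = []
    for i, params in enumerate(parameter_sets):
        samples = run(params)
        # Store raw data together with the parameters that produced it.
        raw = {"params": params, "samples": samples}
        (out_dir / f"raw_{i}.json").write_text(json.dumps(raw))
        # Postprocess, then store the processed data separately.
        processed = {"params": params, **postprocess(samples)}
        (out_dir / f"processed_{i}.json").write_text(json.dumps(processed))
        results.append(processed)
    return results


parameter_sets = [{"h": h, "seed": 42, "n_samples": 100} for h in (0.0, 1.0, 2.0)]
with tempfile.TemporaryDirectory() as tmp:
    results = workflow(parameter_sets, tmp)
    n_files = len(list(Path(tmp).glob("*.json")))
```

Keeping the parameters next to both the raw and the processed data means a plot can later be traced back to the input that produced it.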

Provenance

VisTrails

Image from: VisTrails Website

Data management

In the long run, your data matters more than your code. It's worth investing some effort to keep your data in good shape for years to come.

Konrad Hinsen, "Caring for Your Data," Computing in Science and Engineering, vol. 14, no. 6, pp. 70-74, Nov.-Dec., 2012; available on computer.org

Example: XML

<molecule name="water">
  <atoms>O H1 H2</atoms>
  <bonds>
    <bond atoms="O H1" order="1" />
    <bond atoms="O H2" order="1" />
  </bonds>
</molecule>

Example: JSON

{
  "type": "molecule",
  "name": "water",
  "atoms": ["O", "H1", "H2"],
  "bonds": [{"order": 1, "atoms": ["O", "H1"]},
            {"order": 1, "atoms": ["O", "H2"]}]
}
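
One practical advantage of JSON is that most languages can read it with their standard library alone. Here is a minimal Python sketch, assuming the record above is available as a string. The check_molecule helper is hypothetical; it only verifies that every bond refers to atoms that are actually listed.

```python
import json

MOLECULE_JSON = """
{
  "type": "molecule",
  "name": "water",
  "atoms": ["O", "H1", "H2"],
  "bonds": [{"order": 1, "atoms": ["O", "H1"]},
            {"order": 1, "atoms": ["O", "H2"]}]
}
"""


def check_molecule(record):
    """Raise ValueError if a bond refers to an atom that is not listed."""
    known = set(record["atoms"])
    for bond in record["bonds"]:
        missing = [atom for atom in bond["atoms"] if atom not in known]
        if missing:
            raise ValueError(f"bond refers to unknown atoms: {missing}")
    return record


molecule = check_molecule(json.loads(MOLECULE_JSON))
print(molecule["name"], len(molecule["bonds"]))  # prints: water 2
```

The parsed result is made of ordinary dicts and lists, so no custom parser or schema library is needed for a simple consistency check like this.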

Examples / ideas for Monte Carlo: HDF 5

/experiment
  attributes: id, description
  /measurement
    attributes: count
    /0
      attributes: config
      /observables
        /Magnetization
          /data
          /jack
          /bins
      /run
        attributes: count
        /0
          /observables
            /Magnetization
              /data

Examples / ideas for Monte Carlo: JSON

{
  "info": {
    "program": "SSEMonteCarlo",
    "version": "unknown",
    "state": "done",
    "seedvalue": 42,
    "run": {
      "0": {
        "startdate": "2013-03-08T09:41:43Z",
        "enddate": "2013-03-08T09:41:43Z"
      },
      "1": {
        "startdate": "2013-03-08T09:46:13Z",
        "enddate": "2013-03-08T09:46:13Z"
      } 
    } 
  },
  "type": "ssemontecarlo.montecarlo.MonteCarlo",
  "params": {
    "type": "ssemontecarlo.montecarlo.Struct",
    "N": 10,
    "beta": 1,
    "h": 10,
    "J": {
      "type": "ssemontecarlo.montecarlo.NNAFHeisenbergInteraction",
      "J": 1
    }
  },
  "mcparams": {
    "type": "ssemontecarlo.montecarlo.Struct",
    "t_warmup": 100,
    ...
  },
  "observables": [...],
  "data": {
    "ExpansionOrder": {
      "mean": 246.39998046875,
      "error": 0.9593617279141327,
      "binCount": 200,
      "binSize": 1
    },
    "Magnetization": {
      "mean": -0.060000009536743164,
      "error": 0.18963782783471672,
      "binCount": 200,
      "binSize": 1
    }
  }
}
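
The example does not say how the mean, error, binCount and binSize fields are computed. One common approach for Monte Carlo data is a binning analysis: average consecutive measurements into bins, then estimate the error from the spread of the bin means. The Python sketch below illustrates that technique under this assumption; the function name binned_estimate is hypothetical, and this is not necessarily what SSEMonteCarlo does.

```python
import math


def binned_estimate(samples, bin_size):
    """Estimate mean and standard error from the means of fixed-size bins.

    Trailing samples that do not fill a complete bin are discarded.
    """
    n_bins = len(samples) // bin_size
    if n_bins < 2:
        raise ValueError("need at least two complete bins")
    bins = [
        sum(samples[i * bin_size:(i + 1) * bin_size]) / bin_size
        for i in range(n_bins)
    ]
    mean = sum(bins) / n_bins
    # Unbiased sample variance of the bin means.
    variance = sum((b - mean) ** 2 for b in bins) / (n_bins - 1)
    return {
        "mean": mean,
        "error": math.sqrt(variance / n_bins),
        "binCount": n_bins,
        "binSize": bin_size,
    }


estimate = binned_estimate([1.0, 2.0, 3.0, 4.0], bin_size=2)
print(estimate)
# prints: {'mean': 2.5, 'error': 1.0, 'binCount': 2, 'binSize': 2}
```

The returned dict has the same four keys as the entries under "data" above, so it could be written into such a record with json.dumps.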

A Python-centric workflow
