all

⊃ boundaries

some notes from Gary Bernhardt’s 2012 talk Boundaries.

  • here’s the talk itself
  • make simple values the boundaries between components and subsystems
  • his walrus example at 11min is informative: if you have to write walrus.stomach << Cheese.new then you have to just know that the walrus has a stomach and it can have some things put inside it – this is too much. There’s also mutation which is not ideal..
  • consider instead creating a functional core and imperative shell – the core is dependency-isolated with many paths whereas the shell has many deps but few paths. Unit tests will now be easier to write against the core, integration tests are written against the shell (tiny sketch after this list)
  • “core has the intelligence but the shell has the state” – the core will do some work and return a value..it won’t mutate anything
  • consider every value a message between subsystems – makes it easier to build systems for concurrency
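
a tiny sketch of that core / shell split in Python – my own toy example, not from the talk, and the walrus / feeding names are made up:

# functional core: pure logic – takes values, returns values, mutates nothing
def next_feeding(stomach_contents, capacity):
    remaining = capacity - len(stomach_contents)
    return {"eat": ["cheese"] * max(remaining, 0)}

# imperative shell: owns the state and side effects, but holds little logic
def run():
    stomach = []                                   # mutable state lives out here
    decision = next_feeding(stomach, capacity=3)   # core returns a plain value
    stomach.extend(decision["eat"])                # shell applies the decision
    print(stomach)

if __name__ == "__main__":
    run()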

⊃ hiking pescadero

I had a very nice walk around Pescadero Creek SP after a week of rain. From the Tarwater Trailhead I took the Tarwater Trail Loop west and then south to the camp – it was quite swampy down there. Of six sites I think two were kinda nice. And you could sorta rappel down to the creek which might be nice in the summer. I returned via the Pomponio Ridge and Upper Coyote Ridge trails, about six miles total. Took me about 2hrs 45min, I believe.

There was a curious abandoned cabin..maybe from loggers?

mysterious pescadero cabin

Many trees had come down recently.

fallen redwood

The grasses along the trails were beautiful though.

fallen redwood

And you could look down into the valley to see the fog.

pescadero valley

⊃ more go

more go tidbits as I work through aoc 2016 – the code I wrote for this is here.

  • reflect.DeepEqual will see a difference between [2]int{1,2} and []int{1,2}, hm.. ah, that’s slices vs arrays!
  • I struggled mightily with a problem that had me build a fixed-size array, add items to the array and then return once the array had a value in each slot – it was tough to check for nil in the array (non-pointer array elements can’t actually be nil; they start at their zero value). I ended up checking against the zero value, char{}

⊃ bioreactors for cell therapies

notes from a recorded talk on bioreactors (the image above is a muscle bioreactor that stretches out / exercises the cells..pretty cool!)

Julie G. Allickson, PhD, Director of the Wake Forest Institute for Regenerative Medicine

  • bioreactors are used to make kidneys, ears, bone, muscle, blood vessels, heart valves, etc.. have to be easy to ship because they’ll be sent straight to a clinician, interesting!
  • really interesting design factors for mechanical stimulation of these different types of cells..
  • materials must be USP Class VI, ISO 10993 – sterilization might also affect leachables and extractables (?)

Biren Mistry, MS, Celgene Cellular Therapeutics

  • bioreactors can be used to produce allogeneic cell therapies (donor-derived, as opposed to autologous / patient-derived) like CD34+ hematopoietic stem cells, RBC “farming” or immune cells
  • traditional cell therapy platforms require high OPEX and high CAPEX to enable scale-out (rather than scale-up) – but they’re relatively technically simple
  • a 1000L bioreactor might make 200B cells per batch (vs 200L -> 40B) and you’d expect to run 200 batches per year on these (Schirmaier et al 2014)
  • microcarriers: microscopic beads suspended in a stirred tank – cells grow on these biocompatible beads
  • cannot compare RPM across different vessel volumes, but can compare power input (energy dissipation) or impeller tip speed

Pascal Beauchesne, PhD, Juno Therapeutics

  • there’s been an evolution of sorts from small molecule drugs to biologics to cell therapies – they’re very selective and can also self-distribute. Doses can be auto-regulated in very interesting ways (you can signal that the cells should die, for instance)
  • there are three approved autologous cell therapy-based products: Carticel (cartilage repair), Provenge (prostate cancer), LaViv (scarring, no longer in production?)
  • he works on adoptive T cell therapy like CAR (a single chain with variable fragments that can bind to a cancerous cell and activate T cells) and high-affinity T cell receptors (genetically modified cells with a very high affinity for peptides on cancerous cells)
  • the process: leukapheresis (cell collection), cells are shipped to a facility for washing and activation, gene transfer occurs to genetically engineer the cells, cells are expanded and then infused back to the patient
  • cell expansion must maintain phenotypes and should probably be done in single use culture vessels
  • small scale work will give “some insight” into critical process params: cell health, populations (composition) and functionality (potency) – these are critical quality attributes (CQAs)
  • scale-out has to happen with bioreactors controlled from a server: centralized recipes, alarm monitoring, user permissions, run log archive, CFR 21 Part 11 (FDA regs on electronic systems and signatures)
  • culture systems: bags, expandable bags, G-REX, rocking motion bioreactors, hollow-fiber bioreactors – here is the G-REX system in action:
  • bags may be made from semi-permeable materials: FEP (fluorinated ethylene propylene), polyolefin, EVO copolymers
  • G-REX can supposedly hit 20B+ cells/mL on large surface area systems despite having no online measurements (volume?) versus rocking motion bags: 15M cells/mL
  • some systems need a shear-protecting agent added to the culture
  • hollow fiber can do 100M cells/mL and have online measurements
  • single use consumable designs with open manipulations require an ISO 5 / Class 100 BSC (aka Class II Type A, I think)
    • also need DEHP-free PVC for sterile welding
    • closed sampling systems: bi-directional reusable ports are not considered closed
    • for leak-proof connections bonded > barbed > luer lock, also need to pressure-test
    • single use components need to provide a characterization of their extractables and leachables
  • “more online measurements would be desirable to further automate bioreactor control strategies” – especially wrt feed strategies
  • the Terumo BCT Quantum product is interesting..

⊃ controlling stepper motors with pwm

I wrote a little instructable on stepper motors, check it out here!

⊃ influxdb

notes on influxdb..

  • config files must have the sections uncommented as well – sections like [admin]
  • there are “databases” which contain “measurements” – these are like tables, I think
  • you can further create fields and tags, which subdivide the data another level (tags are indexed, fields are not)
  • if you try to add a duplicate entry it is just ignored
  • data can be injected quickly enough via bulk operations and the python client: 34s for 100k records added to a remote instance, 14s for 100k records added to the local instance (the instance being a 2.20GHz vCPU with 3.75GB RAM) – see the sketch after this list
  • storage efficiency: I think I’m seeing 5MB for 200k simple records (25B / record)
  • auth pretty easy to setup..see the docs
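
a rough sketch of the bulk insert, assuming the influxdb python client – the host, database and measurement names here are made up:

from influxdb import InfluxDBClient

# made-up host / database / measurement – adjust for your instance
client = InfluxDBClient(host="my-influx-host", port=8086, database="sensors")

points = [
    {
        "measurement": "temperature",
        "tags": {"probe": "a1"},
        "fields": {"value": 21.5 + i * 0.01},
    }
    for i in range(100000)
]

# send in chunks rather than one giant request
for i in range(0, len(points), 5000):
    client.write_points(points[i:i + 5000])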

other products

  • telegraf: collects data
  • chronograf: graphs things
  • kapacitor: creates alerts

⊃ raspberry pi

various Π-related snippets..

Set up an SD card with jessie lite

Follow these instructions.

Fix keyboard issues, like " showing up as @

  • start with sudo raspi-config
  • go to internationalization options > change keyboard layout
  • select “generic 101 key PC”
  • go to other then select english
  • it’ll then ask you about special keys – I leave those alone

Set up wifi

Add the following to /etc/wpa_supplicant/wpa_supplicant.conf:

network={
  ssid="wifiwithmylittleeye"
  psk="somethingred"
}

and then reboot.

Live stream with a raspberry pi camera and youtube

Install some prereqs:

$ sudo apt-get install libx264-dev

Get ffmpeg source and build it:

$ git clone git://source.ffmpeg.org/ffmpeg.git --depth=1
$ cd ffmpeg
$ ./configure --enable-gpl --enable-nonfree --enable-libx264
$ make -j$(nproc)
$ sudo make install

Could be more clever with ./configure and disable many unused filters, protocols, etc, as per this answer on SO.

I used the command from this gist to stream (more about the command on this blogpost). More streaming notes on this gist.

⊃ lampiphy

Esmond, Pal and I worked on a conceptual diagnostic device for use in rural settings.

⊃ 2016 election

notes on my 2016 vote:

questions after the election

  • what will Trump’s first hundred days look like?
    • notes from Osnos
    • people with him: Christie, William Palatucci, Roger Stone, Gingrich, Stephen Miller, Stephen Moore, Dan DiMicco, Peter Navarro – though it should be noted Trump has historically stated and shown that he values no one’s advice but his own.. Other transition team members (and their associations) listed here.
    • as Obama relied heavily on executive order, Trump could undo many accords in about a year, like the Paris GHG Agreement, Keystone pipeline exploration, Syrian refugee program, the raising of fuel standards, the banning of energy exploration in AK and the Arctic – Trump could even loosen background checks for gun purchases. He will also have a drone killing program at his disposal, as well as wide leeway to conduct military operations in the MidEast and Africa.
    • he’s promised to begin, in the first hour of his presidency, the deportation of anyone who has entered the country illegally – to me this is the scariest prospect. It would require an estimated police force of 90k+ to go door-to-door, an entity that cannot exist. He wants 11.3 million undocumented people deported in 2yrs – 15k arrests per day, 20x the current deportation pace. It would be modeled on Eisenhower’s “Operation Wetback,” during which many citizens were mistakenly deported, and many immigrants died during the actual deportation. To me this is the line that can’t be crossed.
    • a political scientist is quoted as saying that politics will constrain Trump, because the pres-elect finds everything to be negotiable.. I’m wary of these charitable views
    • analysis of past presidencies shows that candidates accomplish about 70% of their campaign promises
    • Roger Stone, an adviser, claims Trump will pursue his most radical ideas, using the Muslim ban as an example
    • his family business will remain intact and is expected to be controlled by his sons – the president is not required to relinquish control of business ventures despite possible conflicts of interest
    • he will be encouraged to go after federal employee tenure (I didn’t know this existed – seems like it’s achieved after four years) with the hope that public employee unions similarly unravel
    • he would need Congress to repeal Obamacare and cut taxes, and Dems in the Senate can still filibuster with their numbers
    • unilaterally he can renegotiate the Iran nuclear deal, ban Muslims and order the DOJ to pursue certain cases more aggressively
    • but civil servants do not have to follow unlawful orders.. some hope there for some of the egregious torture-related things Trump has said
    • he has said he will not unilaterally back other NATO countries
    • Gingrich believes Trump will have to build some sort of wall or fence.. it’s projected to cost $25B+ over four years and the money would have to come from Congress
    • Gingrich also favors the recreation of the House Un-American Activities Committee..
    • Trump will chase supply-side economics and be able to withdraw from TPP, NAFTA and even the WTO. He could also impose tariffs on certain goods from China, sparking a trade war. But if that comes to pass, it’s fine because “if the economy crashed you could make a deal,” he’s said.
  • what next steps can I personally take to protect vulnerable groups?

notes for much later

  • even until late on Tuesday, it was a foregone conclusion that Clinton would win, at least among the polls and analysts that I read regularly. Only the Trump campaign and his media apparatus claimed the polls were wrong or perhaps rigged. Stories have come out that say even his campaign’s internal polling was off.
  • she is winning the popular vote, even before all of CA is counted
  • she lost 70+ EVs by <200k votes combined
  • turnout seems low – only ~120M votes cast with >220M eligible voters in the country
  • some data suggests over 200 counties that Obama won were lost by Clinton, primarily in PA and WI.
  • Trump received fewer votes than Romney in many states
  • the above two points are being used to argue that it’s not about the racism of the white working class, but instead about the failure of Dems to turn out. And the failure of turnout is due to poor tactical choices and an unlikeable candidate. I don’t see it though – people still voted for a racist and a misogynist, that can’t be overlooked.

pre Nov-8

  • I voted Hillary for her policies, CA AG Kamala Harris for her pro-environment stances (though her opponent dabbed at the debate, haha), the incumbent, Anna Eshoo for her track record and policies, Jerry Hill for his record on the environment, Vicki Veenker for all the endorsements she earned from env groups ( :/ ), Grace Mah for her experience, Lucas Ramirez, Thida Cornes, Chris Clark and Margaret Abe-Koga based on their endorsements from the Santa Clara League of Conservation Voters
  • state propositions.. thank god for ballot.fyi. I voted yes on:
    • 52 (Medi-Cal hospital fee disbursements),
    • 58 (allows local groups to decide how to teach English in their communities)
    • 59 (advise the legislature to attempt to overturn the Citizens United decision)
    • 62 (repeal the death penalty)
    • 63 (stricter regulations around ammunition sales)
    • 64 (legalizing pot)
    • 65 (plastic bag fees go to an env fund)
    • 67 (statewide ban on plastic bags)
  • I voted no on:
    • 51 ($9B bond to upgrade schools)
    • 53 (statewide voters have to approve revenue bonds > $2B)
    • 54 (legislative bills frozen for 72hrs)
    • 55 (extending tax increase on highest earners)
    • 56 (increased cigarette tax)
    • 57 (parole changes for non-violent offenders)
    • 60 (requiring adult film performers to wear condoms)
    • 61 (state agencies will attempt to pay what the VA pays for prescription drugs)
    • 66 (changes death penalty procedures – would’ve preempted my ‘yes’ vote on 62)
  • local stuff:
    • yes on A (affordable housing bond), B (sales tax increase for transit improvements) and W (rent stabilization arbitration), but no on V (rent stabilization amendment – creates a weird, unelected parallel council)

old notes on Bernie vs Hillary:

California votes on June 7 along with a handful of other states – only DC votes later..lame.

on the right side of things..

⊃ Sonoma

Nora and I went to Sonoma! We saw Iron and Wine play at the GunBun winery and had a nice time touring the Peter Cellars and Ravenswood wineries.

Peter, of Peter Cellars, checking out the crush. He had award winning Pinots but we liked his Petit Syrah a lot.

peter and the crush

A view of his vineyards.

peter cellars vineyard peter cellars grapes

We were getting a distinct Pink Panther vibe from this winery (:

peter cellars syrah

Sam Beam took requests, haha, that was fun.

Sam Beam at GunBun

Avocado toast from the tasty Sunflower Cafe in the Sonoma town square:

avocado toast

The view from the Ravenswood winery, a stop for many cyclists!

the view from ravenswood

⊃ AVC 2016

Some friends and I worked on a car for Sparkfun’s Autonomous Vehicle Competition. The race has your car make one (just one!) lap around a haybale-bordered track in Sparkfun HQ’s parking lot. There are dirt sections, hoops to go through, barrels to dodge, jumps, zigzags and the famous “discombobulator” – a spinning disc that most competitors just try to jump if they approach it at all.

Our entry, “Neural Carputer,” used an eight-layer neural network with four convolutional and four fully-connected layers. It was an end-to-end system – it took in camera and odometer data and output steering and throttle commands. We took about an hour of training data – I just manually drove the car through the course during the practice time before race day. In autonomous mode, Carputer would eventually make one perfect run around the course and have many more less-than-perfect runs too (:

carputer at the starting line

the hardware

Carputer has raced in three AVCs now and the hardware’s evolved over the years.. one arduino takes RC input data, another arduino sends servo commands, and in between.. a Macbook Pro retina with an NVIDIA GPU running our control software, haha. The Macbook is definitely overkill but it lets us take data, train, update and test very quickly.

carputer hardware

The chassis is an RC car that we extended to accommodate the laptop. Our camera is mounted into the plastic body of the car. We use a 3S lipo but stay far away from the car’s 60mph+ max speed. There was once a lidar in the mix but we couldn’t quite get that working the way we wanted.

the software

We’ll post it soon – it uses tensorflow and a bunch of pre- and post-processing scripts.

our autonomous runs on the Sparkfun course

Our first test run in the early morning light was amazing – it completed a full lap, only lightly grazing a barrier on the backstretch. Here’s what the car sees, in jaw-dropping 160 x 120 resolution. We’ve overlaid the steering and throttle settings, as well as the odo measurements.

We had trained that particular model during the drive to the track after applying two new ideas the night before. We were really pumped about the car’s morning performance but, sadly, this would be our best run of the day..

We competed in three official heats: the first heat had our car roll forward with great promise for about 3m and then promptly hit the brakes. It sat there stubbornly until the judges made us clear the track.

Our logs showed that the remote override had been activated – something we usually employ to prevent really spectacular crashes. I had been holding the RC transmitter, but I didn’t think I had hit our e-stop button. We later saw this happen during other test runs and attributed it to RF voodoo magic – there were a lot of other transmitters in the vicinity. After commenting out our remote kill switch code, we got ready for the next races..

Our second heat was the most exciting run I saw from any car – Carputer fought through several haybales, hellbent on getting through the dirt, and then it took an unplanned turn towards the discombobulator..

here’s the audience’s perspective:

As usual, the discombobulator won (: We nearly got very lucky and had the car right itself and point in the right direction – it might have done ok if it had tipped a bit more. Or if Otavio had danced a bit harder, haha.

Our final official race was quite anticlimactic – we had added a throttle hack to try and slow down just for the first turn.. that, a new untested model, and some camera white balance issues left us driving uncertainly towards an SFE videographer and then into a haybale.

Hm, it also looks like our steering trim is way off after crashing so much during the day – we’re sending “89” to the steering (just one away from neutral) but we’re drifting way left. Compare that to the 89 we send in the first video..

We also had the car race for fun in the multi-car rumble – it gets a lot of human assistance on those laps (: I’m sure Sparkfun will do an AVC wrapup video sometime soon, hopefully Carputer makes it in there. You can see us at 52:42 and other various points in the SFE livestream

various tricks that helped us this year

We “class balanced” our network by tossing 70% of training data that just had the car going straight-ish – this allowed the network to learn more from turns. It helped make up for the fact that most of the driving is on straightaways.
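
roughly how that looked (a sketch from memory, not the actual code – the 0.1 steering threshold and the sample layout are made up):

import random

def balance(samples, straight_threshold=0.1, keep_fraction=0.3):
    """Drop ~70% of near-straight frames so turns carry more weight in training.

    Assumes each sample is (image, odo, steering, throttle) with steering
    normalized so 0.0 means straight ahead – those details are invented here.
    """
    balanced = []
    for sample in samples:
        steering = sample[2]
        if abs(steering) < straight_threshold and random.random() > keep_fraction:
            continue  # toss this mostly-straight frame
        balanced.append(sample)
    return balanced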

Our odometer wasn’t the fanciest, and we only had one of them, so we were worried about drift. To counteract this we binned our odo data to effectively divide the track into 64 segments. Our hope was that the car would use this info about its rough position on the track to augment its perception of certain images.
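
the binning itself is simple – something like this (the ticks-per-lap number is invented):

TICKS_PER_LAP = 4096   # invented – whatever one lap reads on our odometer
NUM_BINS = 64

def odo_bin(ticks):
    """Map a raw odometer count to one of 64 coarse track segments."""
    lap_position = (ticks % TICKS_PER_LAP) / TICKS_PER_LAP   # 0.0 .. 1.0 around the lap
    return int(lap_position * NUM_BINS)                      # 0 .. 63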

next year

  • learn about how to avoid RC interference
  • figure out white balancing issues
  • trim the steering more often
  • reattach the lidar and get that working (:

other racers

This fellow hand-built his hall-effect odometer system with alternating neodymium magnets on the inside of each wheel.. amazing.

handbuilt hall effect odometer

“Two Potatoe” was cleaning up out there – it completed all three of its runs at a respectable pace with no fuss.

Two Potatoe hoverboard

This guy was up late with us during the practice session, making a map in the dark with his scanning lidar.

lidar mapper

I had a great time, hopefully SFE hosts it again next year! Or maybe we can make a CA series happen..

⊃ soft robots

Inspired by this instructable on a soft universal gripper and some articles sent by my friends Zina and Ben, I’m reading about soft robotics.

⊃ Milk Pail shelf

I’m working on a design for a shelf at my favorite store, Milk Pail.

Some inspiration: one, two, three and four.

Mortise and tenons would be nice – Mattias also has some lightweight shelving ideas.

⊃ trident remote jailbreak

Reading Citizen Lab’s report on the attempted remote jailbreak of the iPhone of UAE dissident Ahmed Mansoor.

  • three zero days were attempted on Mansoor’s stock iPhone 6 – together they seem to be the first remote jailbreak seen in the wild as part of a targeted attack
  • note that Zerodium paid $1M for a similar exploit chain in Nov 2015
  • Mansoor is on the HRW advisory committee and has been previously jailed and harassed for supporting a democracy petition. He was targeted with FinFisher spyware in 2011 and Hacking Team spyware in 2012, and was suspicious of the texts he received in Aug 2016.
  • the “Trident” attack chains a WebKit exploit (which executes the initial shellcode), a Kernel Address Space Layout Randomization bypass to find the kernel’s base address, and a kernel exploit to jailbreak the device – allowing execution of code in the kernel and installation of additional software. The additional software appears to be the spyware package from NSO Group for monitoring the microphone, camera, GPS, messages, calls and other apps.

⊃ OTR and kLa

I’m reading this PDF from UGA on oxygen mass transfer..some notes:

  • like temperature changes, the rate at which a gas dissolves in a liquid is proportional to the difference between the equilibrium concentration and the present concentration – when the liquid has no dissolved gas, the dissolution rate is at a max, and when the liquid is saturated, the rate is zero
  • a liquid is “saturated” when it reaches this equilibrium concentration
  • an equation defining the mass transfer coefficient: F = kL * ([O2]s - [O2]) where F is the molar oxygen flux, [O2]s is the concentration of O2 at saturation, and the present concentration of O2 is given by [O2]
    • mass transfer coefficients are labeled K and kL is the liquid phase mass transfer coefficient
    • the liquid phase is more relevant for O2 transfer as oxygen isn’t very soluble in water
  • we can also describe the rate of change of O2 concentration: d(V*[O2]) / dt = F*A where V is the system volume and A is the surface area available for transfer
    • then, since the system volume doesn’t change, we can call A / V just a, the specific exchange surface, and we get: d[O2] / dt = kLa * ([O2]s - [O2])
    • we can think of d[O2] / dt as the oxygen transfer rate (OTR) – units of mg / L / h (toy numbers after this list)
  • the value of a is hard to nail down in a bioreactor, so the kLa expression is often considered altogether – it’s called the volumetric oxygen transfer coefficient (units of 1 / h).
    • some call this the overall mass transfer coefficient which is ..kinda correct
    • it’s related to fluid properties, the impeller, vessel geometry, agitation and gas flow
    • to study scale up this relationship is used, given a set geometry, impellers and fluid properties: kLa = alpha * (P / V) ^ beta * (U) ^ sigma, where P / V is the power per volume ratio and U is the superficial gas velocity (units of m / h)
    • often a small bioreactor has a higher P / V and lower U compared to large scale systems
  • you can use a similar equation to calculate the critical oxygen concentration for a given OUR (uptake rate)
    • but note that limiting growth by reducing O2 or some other nutrient can be smart – metabolism is more “controlled” (meaning, I think, that you’ll produce your target protein) and also produces less heat
  • good notes here on actually measuring kLa..
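
toy numbers to exercise the two equations above (all values are made up, just to show the units):

# OTR = kLa * ([O2]s - [O2])
kLa = 20.0       # volumetric O2 transfer coefficient, 1/h
O2_sat = 7.5     # saturation concentration, mg/L (roughly air-saturated water)
O2_now = 2.0     # current dissolved O2, mg/L
OTR = kLa * (O2_sat - O2_now)
print(OTR)       # 110 mg O2 / L / h – falls to zero as O2_now approaches O2_sat

# scale-up correlation, kLa = alpha * (P/V)^beta * U^sigma, with invented constants
alpha, beta, sigma = 0.02, 0.6, 0.5
P_over_V = 100.0   # power per volume, W/m^3
U = 10.0           # superficial gas velocity, m/h
print(alpha * P_over_V ** beta * U ** sigma)   # a kLa estimate for this (made-up) setup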

more resources

⊃ aseptic technique

Lots of great videos out there on aseptic handling, woo!

aseptic handling during cell culture

  • things are generally sprayed down with 70% ethanol – I guess this evaporates super fast..seems like it would also contaminate your equipment in some sense
  • but what about the stool in the video??
  • try to place caps inside facing down and try not to pass your hands over top of stuff
  • things are only ever opened under the hood – it’s downdraft actually, never knew that..
  • they’re dropping bio waste into a little container with a disinfectant, virkon

cleanliness via fire

  • everything sterilized on a bunsen burner (reminded me of a lab someone told me about that just kept a bunsen burner running near the labwork to keep things clean..)

notes from thermo

  • recommend leaving the hood on unless you’re not going to use it for an extended time
  • can use UV light to sterilize surfaces and air inside the hood when it’s not in use
  • open and close things quickly – they also recommend you not talk, sing or whistle, even though the hood is presumably down :(

⊃ CNC at pier 9

Going through the CNC classes at Pier 9.

class two - CAM

  • for the full walkthrough see the instructable
  • go to CAM and setup – set WCS in back corner by just clicking
  • also note that the machining happens in inches, not mm
  • set stock size (Al typically comes in as true 2x4 bars) – prolly want to leave 0.1in over after cutting on horizontal bandsaw. We’ll eventually tell the machine the actual dimensions
  • set it 50 thou from top so we can face it – excess on bottom will be wasted too but we can hold it down there

toolpaths

  • start with a facing 2D toolpath
  • generally pick shorter and bigger tools and simulate – you’ll see what detail needs to be redone
  • when doing pockets, it’s better to choose a contour over a face, more precise
  • optimal engagement is basically the stepover – use the Pier9 formula, at 1” tool diam the engagement is only 5% (not the 40% default)
  • we’ll machine at 5% rapid so we have time to stop things – so move passes will be slower
  • then we’ll do a pocket for smoothing to clean up what’s left
  • spot drill before drilling holes – using a 12” chamfer for 6mm holes, can just click the chamfers if your holes have them
  • when using a drill from some non-pier9 lib, you have to recalc the feeds and speeds using a chart – also change the name (starting from 1) and set coolant by editing the tool
  • make sure to watch the hole depth – prolly want to blow through a little bit to remove burrs
  • if depth to diameter ratio is less than 3, you’re clear to do regular drilling with rapid out, otherwise you need deep drilling full retract (peck drilling) to allow chips to clear
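
e.g. a quick rule-of-thumb check (my own helper, not from the class notes):

def drill_cycle(hole_depth_in, drill_diameter_in):
    """Pick a drilling cycle from the depth-to-diameter rule of thumb above."""
    if hole_depth_in / drill_diameter_in < 3:
        return "regular drilling, rapid out"
    return "deep drilling / full retract (peck) so chips can clear"

print(drill_cycle(0.5, 0.25))   # ratio 2 -> regular drilling
print(drill_cycle(1.0, 0.25))   # ratio 4 -> peck drilling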

class three – more CAM

  • remember to use the chamfer mill for spot drilling – 131 or 101
  • drill first, then counterbore to take advantage of the pre-existing hole – make sure you have a helical path when boring too, otherwise you’re just plunging with an endmill
  • simulation isn’t actually relevant for speeds, just motion
  • can change stepover by checking “multiple heights” – useful if you’re not doing adaptive and worry about breaking an endmill
  • nice edit notes feature (right click on the setup) – these end up in the machining notes (right click on setup and then “setup sheet”)
  • right click on house -> set current view as home – nice
  • post simulation, you can right click on the stock and “save stock,” useful if you were going to make some other press fit and you need to know just what you machined
  • remember to check your “minimum part stick out” – this is on the setup sheet as min z
  • and after a few more classes, I made the test part on the Haas:

haas test part

video series

  • the Haas at Pier 9 is 30 HP, 12000 RPM, 1400 IPM and can hold 24 tools

waterjet

  • I also did some waterjet training, we made some little logo cutouts:

waterjet-test-part

misc

⊃ bike tag

I’d like to make a new tag for when my bike’s on the train. Here are some basic renders..

I’m thinking one piece will be mounted to the seatpost and another to-be-rendered part will insert into this and indicate where I’m off to:

signpost b

maybe the insert could have a little magnet at the end to keep things in place..

signpost c

signpost a

I like this concept but the blue / yellow interface isn’t quite right..

morphous mount

leveling that mount and adding a bunch of fillets..

morphous mount combined

⊃ deciding what to prototype

how to decide what to prototype.. hm, there are many approaches:

  • the senic blog suggests you test even basic assumptions, and not make anything too high fidelity – you want to give room for people to discuss their ideas.
    • also use easy to source / easy to use parts and tools.
    • each proto should have a hypothesis / goal
  • GV folks remind you to list all your assumptions, to look for conflicts in the ideas (they suggest choices to test) and to not make the design democratic
  • some YC alums recommend a “looks like” high fidelity long term mockup to go along with a “works like” black box that isn’t perfect but accomplishes something
  • Ben of Bolt reminds you to build..one physical model a week
  • other reflections:
    • seems wise to list the objectives of the prototype
    • once (on Altman’s blog maybe?) I read you should look to test the riskiest bit of your thesis
    • and consider timelines – what will it mean for testing and validation if you go down a certain route

⊃ design of experiments

⊃ a five day fed batch protocol

interesting tidbits from a very detailed fed-batch protocol:

  • the media is autoclaved in the vessel
  • at the end of the batch phase, just before you start feeding for the first time, your cells have used up all the starter glucose – this is about 16hrs after inoculation in this protocol
    • the stirrer speed will decrease and DO might increase
    • they’re guessing that other metabolites (made earlier in the process) are now being consumed so the growth will flatline but why does the solution get more dense..?
    • they also note that you can add several drops of 70% glucose to their 850mL solution and immediately you’ll see the stirrer speed increase and DO decrease.. they’ll just as quickly revert (within a few seconds)
  • they induce protein expression with the addition of IPTG
  • their feed solution is quite similar to the media
  • broth is drained into two 600mL centrifuge bottles via the manual sampler port and gravity
  • they autoclave and reuse the vessel, tubing and filters – though sometimes the tubing is too dirty and has to be tossed
  • seemingly everything is autoclaved and then rinsed / scrubbed, interesting..
  • they use nonlinear feeding profiles

⊃ DO probes

learning next about sensors for dissolved oxygen..

YSI on DO basics

  • developed first portable DO probe in 1963 – a membrane-covered Clark Polarographic sensor
  • optical sensors are also available (may also be called luminescent DO or rugged DO sensors) – still uses Clark’s membrane-covered system

OTTHydromet video on Clark cell calibration

  • probes have two electrodes surrounded by an electrolyte solution and covered by an oxygen-permeable membrane
  • O2 crossing the membrane is consumed in a chemical reaction and causes a change in current flow between the electrodes
  • since O2 is consumed, the water around the sensor must be replaced by natural flow or a circulator

YSI series on DO measurements

  • temp affects O2 membrane diffusion in electrochemical probes and optical sensors
  • temp also affects O2 solubility in water – mg/L (ppm) values will change even if the saturation value is 100%
    • colder water can absorb more O2
    • so it seems probes typically measure O2 saturation and convert to mg/L values (analytically)
    • temp compensation for saturation is empirical
  • salinity also affects the conversion of saturation to mg/L – that is, increasing salinity decreases water’s ability to dissolve oxygen
    • need builtin conductivity sensing or manually-entered data
  • barometric pressure also matters – typically just calibrate at the test pressure or you set the known pressure of oxygen and follow a linear relationship for changes
  • response times improve with increased solution flow – even for optical sensors which don’t consume any O2

YSI on membrane comparisons

  • polarographic and galvanic systems both use gas-permeable membranes – typically they stretch over the probe and are secured by an O-ring or come pre-attached to a ring from the factory
  • nice table comparing material type and thickness to response time and flow dependence
  • some sensors require specific membranes and you have to tell the sensor which membrane is in place, of course

YSI’s DO handbook

  • optical probes work because O2 quenches the lifetime and intensity of the luminescence of certain dyes – this is described by the Stern-Volmer equation, which is linear at low concentrations (quick numeric sketch after this list)
    • intensity-based probes require more frequent calibration
    • YSI dyes are immobilized in polystyrene
    • there’s an oxygen-permeable diffusion layer – these membranes are teflon or polyethylene – the latter is newer and has faster response
    • red light emitted as reference, blue as measurement
    • there is a temp dependence corrected on YSI probes
    • similar accuracy to electrochemical sensors – better in the 0-20mg/L range
  • steady-state polarographic sensors
    • gold cathode, silver anode
    • polarized at 0.8V constantly
    • requires 5-15min warm up before reading or calibration – 1yr warranty is typical
  • steady-state galvanic
    • silver cathode, zinc (or lead) anode
    • difference in electrodes self polarizes (no externally-applied potential needed) and reduces O2 molecules (similar to a battery)
    • instant-on, but constantly consuming the anode (6-mo warranty is typical)
  • response time
    • optical T95 can be 40s (!) but they’ve done internal studies to show that stirring can halve that time – note the T95 test is pretty worst-case: going from 100% to 0% saturation and timing until the sensor reads 5%
    • galvanic or polarographic can be as low as 8s with the right membrane
  • calibration hold
    • optical can hold for “many months”
    • electrochemical sensors used in spot-sampling need daily calibration
  • cleaning
    • some sensors equipped with automatic wipers to prevent bio-fouling – this was called out in a 52 day test
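
the Stern-Volmer sketch mentioned above – back out [O2] from a measured intensity ratio (the quenching constant and intensities are invented):

# Stern-Volmer: I0 / I = 1 + Ksv * [O2]  (linear at low concentrations)
Ksv = 0.25      # invented quenching constant, L/mg
I0 = 1000.0     # luminescence intensity at zero O2 (from calibration)
I = 400.0       # measured intensity

O2 = (I0 / I - 1) / Ksv
print(O2)       # 6.0 mg/L with these made-up numbers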

Atlas Scientific DO App Note

  • note that temp, pressure and salinity changes only affect the saturation percentage, not (to a point) the true mg / L readings
    • cooler water, headspace with higher atmospheric pressure and less saline water can all hold more O2
  • O2 saturation can be above 100% as the heated water tries to reach an equilibrium and lose oxygen

⊃ pH probes

learning about pH probes..

a brief PDF on glass pH probes (2004)

  • the alkali metals in the thin (0.1mm) glass membrane undergo ion exchange with the surrounding solution
  • typically have a combined concentric design – the inner vessel has one electrode surrounded by a constant buffer, the outer ring has a reference electrode exposed to the solution
  • I think you need temp compensation..but maybe having the ref takes care of that?
  • two buffer calibration sets the linear relationship
  • also need high impedance amplifier electronics (as the glass is very high impedance)
  • should only report pH to the hundredths as the calibration solutions don’t get any better than that
  • also notes practical reproducibility is +/- 0.05 units
  • alkali metals (Na+ or K+) can interfere – a special lithium glass can be used to mitigate this
  • proteins can adsorb to the surface and interfere with measurements
  • no longer do probes need to remain wet and hydrated for proper measurement

Hach’s PDF on pH

  • glass electrodes sense ion activity, and a solution’s ionic strength (its presence of other ions) can affect measurements as well – the activity coefficient is (strangely) affected by ionic strength
  • Nernst equation describes the workings of pH-sensitive electrodes – for hydrogen ions you’ll see about 60mV changes per pH unit at 25C (quick check after this list)
  • temp effects: caused by change in glass bulb resistance and dissociation changes in the solution
  • you’ll see about 0.003 pH error / pH unit / deg C in temp uncompensated systems
  • pH probe will pass a small amount of current between the reference and measuring electrode, so the meter must have a high internal impedance. (Quora has good notes on impedance here.)
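
the quick check mentioned above – the ~60mV/pH figure falls out of the Nernst slope, 2.303·R·T/F:

R = 8.314      # gas constant, J / (mol K)
F = 96485.0    # Faraday constant, C / mol
T = 298.15     # 25 C in kelvin

slope_mV_per_pH = 2.303 * R * T / F * 1000
print(slope_mV_per_pH)   # ~59.2 mV per pH unit at 25 C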

review from the All About Circuits site

  • probes allow H+ ions to migrate through a selective barrier, producing an electrical potential
    • the measurement probe is made of a special glass to produce this selective barrier – it’s doped with lithium ions which will react to H+ ions
    • the reference probe has a neutral buffer that exchanges ions with the process solution through a porous separator, making a relatively low resistance connection to the liquid
    • (a plain metal wire cannot be used as it may react to the solution and produce its own voltage)
  • good explanation of why you need a high impedance voltmeter:
    • the reference electrode might have an impedance of a few kilo-ohms, but the measurement electrode (being glass) might be at hundreds of mega-ohms
    • so if the voltmeter’s resistance is low, the current passing through the other components will create a large voltage drop on those components, and not much of one on the voltmeter (quick numeric illustration after this list)
    • so the voltmeter needs an impedance of something like 10^17 ohms
    • or a “null balance” voltage measurement setup can be used where a precision variable voltage source is adjusted until it reads zero – i.e., there is no current flowing in the circuit. A voltmeter can then be connected in parallel
  • notes that glass probes are subject to fouling
  • temp compensation is needed to adjust the pH/mV response of the measurement probe
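
the numeric illustration mentioned above – the electrode and meter form a voltage divider, so a low-impedance meter barely sees the real potential (impedance values here are just illustrative):

def measured_fraction(r_electrode_ohms, r_meter_ohms):
    """Fraction of the true electrode potential that the meter actually sees."""
    return r_meter_ohms / (r_electrode_ohms + r_meter_ohms)

print(measured_fraction(100e6, 10e6))   # ~0.09 – a 10 Mohm meter reads only ~9% of the signal
print(measured_fraction(100e6, 1e12))   # ~0.9999 – why the input impedance must be enormous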

PDF from Emerson

  • double / triple junctions and gelled reference solutions are attempts to prevent poisoning
  • cleaning solutions may enter the reference liquid junction.. they’ll persist until they diffuse out – a new “hydrolysis reference junction” (called TUpH by Emerson) has smaller pores and may prevent fouling
  • some sensors can sound an alarm if the reference impedance grows too large (indicating something has fouled that component and cleaning is needed)
  • online ultrasonic cleaning ineffective, jet sprays found to be better
  • ideal slope is -59.16mV/pH unit, but in practice a new electrode may be -57 to -58mV/pH unit – calibrating and getting this slope shows the age of the probe, the zero value shows the degree of reference poisoning that has occurred

Mettler Toledo on glass junctions

  • ceramic: the typical porous junction that allows the ref electrolyte to diffuse out – fouled by proteins (which precipitate when they contact the KCl ref electrolyte) and other suspended solids
  • PTFE annular diaphragm: increases surface exposed to the medium to prevent fouling
  • open junction: the ref electrolyte is completely open (only possible with a solid polymer ref electrolyte) – cannot clog but has slow reaction time
  • dual-membrane sans junction: for specific “chlor-alkali” processes, has builtin Na ref glass

ISFET probes

  • “hydrogen ion-selective field effect transistor” – some notes from this page
    • a gate connects two electrodes, the source and drain – all three are embedded in silicon, the gate is in direct contact with the solution and is sensitive to H+ ions
    • a ref electrode in the solution is still needed and operates in the same manner as in other glass probes
    • will produce a high current, low impedance output (in contrast to glass)
    • sensing area can be cleaned with a toothbrush (in fact, I saw this recommended by one vendor)
    • can be stored dry – have fast response times and fast measurements
    • light-sensitive
    • prone to drift and may be less accurate than glass (though a commenter disagrees)
  • notes from wikipedia
    • still limited by ref probe so some systems are trying to use a FET for the ref as well (REFET) – needs frequent calibration though
    • ref may leak?? “KCl leak” or “glycerol leak” – hadn’t heard about that!
  • Schmid on ISFET operation and applications (PDF)
    • ISFET tech can be altered to sense other things like enzymes or DNA
    • this PDF also captures some sensor possibilities

YSI on cleaning electrodes

  • single rods and ref electrodes: store in solution that’s the same as the ref
  • glass electrodes: DI water
  • organic adhesions: use an organic solvent
  • inorganic adhesions: use a slightly acidic or basic solution and heat if necessary
  • clean ref electrode with the ref solution – heat to break up KCl crystals

YSI on electrolytes

  • typical for a ref electrode to leak into solution (!?)
  • 3M KCl is a typical choice, may have a gelling agent
  • faster response times with liquid electrolytes
    • refillable electrodes last longer – have a little port with a closable window
    • more resistant to temp and temp changes
    • susceptible to diffusion potential issues in strongly basic or acidic solutions as the outflow rate will be higher compared to gels / polymers
  • gels
    • less likely to see silver precipitation at the junction
    • less likely to have ions from the solution invade the reference

YSI on ref junctions

  • aka the diaphragm
  • need low resistance and an inert material, but some level of ion permeability
  • nice table of junction types, outflow rates (0 - 3mL / day)
  • bizarrely, they’re saying that the reference electrodes that have greater outflow rates are easier to clean – I would’ve thought the opposite, that solution could more easily infiltrate and contaminate the ref
  • double and triple junctions exist to reduce contamination
  • nice product breakdown
  • to clean after protein exposure you can immerse in a pepsin cleaning solution (PEP/pH) for 1hr

1991 intraluminal ISFET study

  • identical curves over 24hrs in vitro between glass and ISFET probes
    • same result over 21hrs in vivo
  • some notes on response times too

Endress Hauser pH basics video

  • remember the Ag / AgCl component is the wire – the electrolyte is typically KCl
  • H+ ions penetrate the gel layer of the glass membrane – the same happens on the inside, on the ref
  • refs are just a stable 0V potential..but they still need electrical connectivity to the measuring probe

MT video

  • nice diagram of the ref system and measurement probe

ph probe diagram

  • good notes on cleaning, calibration, junction types and temp compensation
  • points out that solutions themselves will have temp dependence – 0.001 mol/L NaOH will have a pH of 11.17 at 20C (as opposed to the 11.00 you might expect), and the pH meter cannot compensate for that

Pulse Instruments on ISFET

  • pH electrodes are very high impedance so, to minimize electrical noise, the cable has to be shielded and the input stage of the meter has to be very close to the electrode
  • ISFET combines the pH sensitive membrane and the voltmeter’s FET into one device
  • sensing areas can be quite small with this method: 0.5 square mm for 30uL samples

⊃ sterilizing instruments

reading about various ways to clean probes and instruments – wikipedia has a great review, as does Finesse.

autoclaving

  • 15-20 min of steam at 121 C, or 134 C for 3-4 min
  • performed at elevated pressure – around 100 kPa above atmospheric (roughly 2 atm absolute)
  • longer cycles for prions
  • items have to be physically cleaned beforehand

dry heat

  • 160 C for 2 hrs or 190 C for 6 min

flaming, incineration, tyndalization, heated beads

  • heating, burning or boiling water..

ethylene oxide

  • most common method – used for ~70% of all sterilizations and over 50% of disposable medical devices
  • gas concentrations of 200 - 800 mg/L
  • 30-60 C with RH above 30% (sometimes 100%)
  • process takes 2-3 hours of exposure and then 12hrs more to vent
  • kills bacteria, spores, viruses, yeast and other fungi – also very hazardous to humans
  • subs in for H atoms on molecules, disrupting many chemical processes
  • explosive at concentrations above 3% (so it’s often diluted)
  • very strict OSHA reqs around EtO’s use – if you can smell it, you’ve had a hazardous dose
  • it can permeate some plastics so you can actually bag your equipment before sterilizing it
  • they typically use bio indicators to test the sterilization’s efficacy on weirdly shaped equipment – it can kill to a 10^-6 level

nitrogen dioxide

  • applied at room temp and ambient pressure and “very low concentrations” of the gas
  • inactivates spores, bacteria and viruses
  • boils at 21 C so people often use liquid NO2 – often called dinitrogen tetroxide (the dimer)
  • less corrosive, no condensation (so the aeration phase is limited)
  • still seems like a new technique though, without much commercially-available equipment

ozone

  • for cleaning surfaces, water and air
  • has to be made on-site, typically in the sterilizer itself from medical-grade O2
  • easy to remove waste ozone via a catalyst
  • dangerous at 5ppm ..way lower than EtO

glutaraldehyde and formaldehyde

  • 22 hrs+ of immersion
  • short shelf life once opened (<2 weeks) and expensive
  • volatile and toxic to skin contact and inhalation

hydrogen peroxide

  • sometimes used in vaporized form – liquid purities are at 35 - 90%
  • cycle times as short as 30 min
  • listed as dangerous at 75ppm (a tenth of EtO)

peracetic acid

  • 0.2% purity used as a liquid sterilant

non-ionizing radiation (UV)

  • 250-260 nm light breaks molecular bonds in a microorganism’s DNA
  • useful for surfaces (like glass vessels)
  • may damage plastics

ionizing radiation

  • gamma, x-ray, electron beam – some more penetrating than others
  • trivia: meat and DC mail get irradiated, as do feathers in blankets
  • gamma uses cobalt-60 or iridium-192 – it can penetrate 80cm
  • gamma radiation can initiate cross-linking in some polymers but isn’t strong enough to make a material radioactive
  • probability of gamma absorption is related to a material’s cross section, density and thickness – when a gamma ray strikes an atom, an electron may be ejected from the atom; this is the photoelectric effect and is the typical mode of interaction at energies < 10keV. Gamma facilities actually use systems around 1MeV, where pair production (ionization cascades) effects begin
  • nice info on gamma in this pdf
    • what actually damages cells isn’t firmly known: could be genetic damage, damage to the cell wall, release of free radicals like OH-
    • it’s estimated that one gray of radiation induces 1k single strand breaks, 40 double strand breaks, 150 cross links between DNA and protein and 250 oxidations of thymine
    • (one gray is one joule / kg – equal to 100 rad)
    • nice info on D10 values – dosage required for log reduction, typically around 0.5kGy
    • will typically use around 25kGy though to exceed a 10^-6 sterility assurance level (tiny calc after this list)
    • some spores have a D10 as high as 6.8 (Clostridium botulinum)
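
the tiny calc mentioned above – D10 arithmetic with a made-up starting bioburden:

# survivors = N0 * 10^(-dose / D10)
def survivors(n0, dose_kGy, d10_kGy):
    return n0 * 10 ** (-dose_kGy / d10_kGy)

n0 = 1000.0                       # made-up starting bioburden per device
print(survivors(n0, 25.0, 0.5))   # 25 kGy vs a 0.5 kGy D10: 50 logs of reduction, effectively zero
print(survivors(n0, 25.0, 6.8))   # vs the toughest spores (D10 ~6.8 kGy): only ~3.7 logs, ~0.2 survivors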

clean-in-place and steam-in-place in general

  • lots of validation required to prove that chemicals have been flushed
  • Finesse says the overall sterilization cost is comparable between EtO, gamma and traditional steam-in-place methods, but single use methods require less labor

validation

  • this webinar had a note about validating with sterile soybean casein digest media – incubate for 14 days with the little challenge tabs; if anything grows that’s a failure

summary

  • seems like UV could work well for glass vessels
  • EtO could be used for sensitive probes
  • autoclave for sturdier probes
  • this is a nice review of chemical sterilants

⊃ genedrive

notes from Hammond et al.’s paper in Nature Biotech: A CRISPR-Cas9 gene drive system targeting female reproduction in the malaria mosquito vector Anopheles gambiae and the Wyss lab’s gene drive FAQ.

abstract

  • they found three genes whose disruption confers a recessive female-sterility phenotype in mosquitos, inserted gene drive constructs at each locus and found transmission rates of >90%

background

  • synthetic gene drive systems were proposed over ten years ago but were only recently made possible via CRISPR-Cas9
    • CRISPR draws inspiration from bacteria that protect themselves from viruses by storing snippets of viral DNA and then using those snippets to recognize and cut the virus during later infections

⊃ the introspection engine

Bunnie Huang and Edward Snowden are collaborating on “the introspection engine,” a device that can connect to an iPhone6 and determine if the phone’s radios are active.

Their initial ideas are here.

⊃ Maine

I visited Maine recently with my family – we sought the elusive vacation triathlon: one day of hiking, biking and kayaking..

we hiked the beehive!

hiking the beehive

we biked through Acadia!

biking acadia

and we kayaked to a nearby island (and later to some mysterious structures in the cove..)

mom and kk kayaking

there was also kub

maine kub

and some good local food from the land

maine blueberries

..and the sea

maine lobstah

what a beautiful place.. can’t wait to go back!

⊃ Bolt on hardware startup mistakes and failures

Bolt recently published “Mistakes that Kill Hardware Startups” and “The Failure of Coin”.

mistakes

  • need multi MM of preorders to get institutional investors excited
  • focus the pitch on customer metrics (rather than revenue and margin)
  • CMs begrudgingly work with startups..you have to provide some value for them too, like giving them a chance to work in a new area
  • recommends not crowdfunding until you’ve nailed down product dev.. at least to the point that you understand BOM, COGS, margin, etc
  • don’t waste time and $35k on a patent before you have institutional funding
  • budget a marketer / sales person for your first raise
  • the time between seed and series A is the valley of death for hw startups..
    • hard to show growth and traction
    • it’s fundraising and manufacturing hell when you go from 100 to 1k to 10k units built (you haven’t yet hit economies of scale)
    • quality issues getting fixed, but still not selling that many units
    • forgot to focus on customer acq
    • additional recommendation: have a large lead investor in the seed round that can bridge

coin

  • combined all your credit cards into one custom card
  • assets recently acquired by fitbit.. existing products will work til the battery dies :(
  • tech risk: coin seemingly was able to overcome this, building largely what they said they would
  • product risk: failing to live up to expectations – this’ll be high if product failures are high stakes, low if failures are mere annoyances
  • I guess they’re saying coin’s failures were “high stakes,” because you may have left all your other credit cards at home
  • high risk: 1-2 failures over the lifetime of the product create a negative interaction, related to critical business ops (like a POS system), failure creates safety / security problem
  • your startup should take on high product risk or high tech risk
  • fellow in the comments recommends validating the idea with off the shelf systems as much as possible

⊃ heckbot

I made a bid recommendation script for the card game “Oh Hell.” I think it worked well! Or, I did a lot better with it than without it (:

In Oh Hell you’re given a hand of cards and asked to bid on the number of tricks you’ll win. You have to play the suit that’s led if you have it, and after the first round, trump suits come into play. Each round you’re given one less card in your hand.

So to play with some computational assistance, you tell heckbot the cards in your hand, the number of players, the trump card if there is one, and your position in relation to the dealer. Then heckbot plays your cards against a whole lot of randomly generated opponent hands. It plays very dumbly – essentially randomly except wrt following suit, and it keeps track of the number of tricks you win in each simulated round.

After the average number of tricks stabilizes, it tells you what that average is. This can take 10k+ rounds, or sometimes only a few hundred if your hand is particularly good or bad. Either way the operations are simple so this only takes a few seconds. And that’s it! It won’t tell you how to play, just what it thinks you should bid. Sometimes it recommends a bid like 2.5, hah, which isn’t super helpful. But you should probably round down and play conservatively in that case.
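
the core loop is roughly this (a simplified sketch, not the actual heckbot code – deal_fn and play_round_fn stand in for the dealing and trick-playing logic, and the real convergence check is a bit smarter):

def estimate_bid(deal_fn, play_round_fn, my_hand, n_players, trump, max_rounds=20000):
    """Monte Carlo bid estimate: deal random opponent hands, play each round out
    dumbly (random except for following suit), and return the average number of
    tricks my_hand wins."""
    total = 0.0
    prev_avg = None
    for i in range(1, max_rounds + 1):
        opponents = deal_fn(my_hand, n_players)              # random hands from the rest of the deck
        total += play_round_fn(my_hand, opponents, trump)    # tricks won this simulated round
        avg = total / i
        if i % 500 == 0:
            # stop once the running average has settled down
            if prev_avg is not None and abs(avg - prev_avg) < 0.01:
                break
            prev_avg = avg
    return avg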

The code is on github, check it out!

⊃ cell culture and automation

I’ve been reading some interesting papers on cell culture and automation, like this 2013 thesis on microbioreactors, published by Shireen Goh as her PhD work at MIT.

  • ..

More interesting papers are linked here.

Here’s a whitepaper on the TAP Sonata in the Journal of Lab Automation (2008) – it’s an automated shake flask handler:

  • six processing modules: liquid handling, shaking, incubation, chilled i/o carousel, centrifuge and cell counter
  • six-axis RX60L robotic arm from Stäubli
  • holds 40 microtiter plates, 245 x 250mL flasks
  • built-in cell counting capability
  • all in a HEPA-enclosed space with negative pressure and “laminar down flow”
  • there seem to be no sensors in any of the shake flasks

This is a whitepaper on the TAP Piccolo, also in JALA (2006).

  • seems like a bioprocess screen – they cite the difficulty in “evaluating multiple sets of expression conditions that will yield biologically active protein in sufficient quantity to meet the demands of discovery teams”
    • structural bio groups need 100mg scale quantities
    • typical groups will run ~20 expression experiments – nice examples of iterations: “variations in inducer concentration, cell density at the point of induction, postinduction expression duration, media composition, DNA constructs, culture strains, and temperature regimes”
    • expression and purification are bottlenecks: pharma groups typically express on the order of 100 proteins each year
  • the system performs induction, expression and single-stage purification using E. coli
  • multiple 24 well “cell vessel blocks” are used – they did one run with 48 of these blocks (1152 wells total)
    • each well is 23mL – 10mL working volume for microbial expression
    • they are aerated by a fixed assembly and then lidded and moved to storage
  • there is a fixed OD sensor – I think the CVBs are brought to the sensor
  • there are also stirring rods and gassing ports (the aeration module)
  • protocols generated and exported as XML
  • has four incubators, a centrifuge, liquid handler and temp-controlled reagents
  • built-in cell lysis and affinity chromatography
  • they got about 1.7mg purified protein per well

NextGenSciences also made a protein purification system

  • developed around 2003, the company appears to be defunct
  • they wanted to sub-clone, express and purify “hundreds of proteins in parallel,” while also exploring cell growth conditions and purification strategies
  • LIMS with a web client (!)

Another whitepaper on a bespoke automated shake flask system tested by DSM and Presens (2013)

  • PreSens chemical optical sensors
  • glucose addition to the flasks when DO fell below some value
  • lower yields than in a 20L bioreactor (in terms of dry cell weight and product concentration) by a factor of two – roughly same strain ordering though
    • probably OTR differences, they say
  • ~3-fold improvement in scaling

⊃ what it takes to make protein

Virvio recently published a paper on a Computationally Designed Hemagglutinin Stem-Binding Protein that can protect against a wide variety of flu strains. Their methods were interesting to me – I’d like to categorize all the techniques, reagents, equipment and software that they used.

notes on the work itself

  • they took a “broadly cross-reactive HA binding protein, HB36.5” and optimized it to increase its affinity for multiple HA subtypes
    • HA is an “influenza envelope glycoprotein,” hemagglutinin. It’s a surface protein with 18 known subtypes and many genetic variants (strains) within each subtype
    • (recall that current flu vaccines are strain-specific)
  • in vitro studies have shown that, by targeting the HA stem, “broadly neutralizing monoclonal antibodies” can neutralize strains by blocking “conformational rearrangements” required for the virus to fuse to a cell’s membrane
    • their goal was to design a protein that has the same broad affinity as bnAbs
  • they created a protein that reduced viral replication in mice and could be administered intranasally as a prophylactic or therapeutic
    • they also found, somewhat surprisingly, that they didn’t need to engage Fc-FcγR receptors for this to work in-vivo
    • Fc receptors typically bind to antibodies that are themselves bound to pathogens, the receptors then stimulate an immune response of some sort
    • they expected to need a “fuller” immune response based on past work, but they found their protein worked so well that Fc-FcγR wasn’t as important

aside on yeast display

  • a technique to “engineer the affinity, specificity and stability of antibodies,” according to openwetware.org
  • a protein of interest is displayed on the surface of yeast cells as a fusion of a transmembrane protein, Aga2p
    • the target protein, because of this fusion, is held kind of at “arm’s length” from the cell, minimizing potential interactions with the yeast itself
  • you can then add potential ligands, themselves complexed with fluorescent and / or magnetic substances, and see which yeast bind to the ligands
    • fluorescent additives allow quantification
    • and magnetic beads can be used for separation
  • you’ve presumably done this in a yeast library with ~10^8 variants (clones, each carrying some randomly mutated sequence), so you can extract the yeast that bound with high affinity, sequence them, and make your protein of interest

their methods

⊃ learning KiCad

I’ve made a handful of boards in EAGLE CAD but I’m interested in learning KiCad – it’s totally open source and, from what I’ve read online, it works pretty well now. It’ll be sad to leave EAGLE – the highlight of my work was getting in the adafruit blog with a bitmap / ULP trick..it made 36k errors in the ERC (!?!).

abe schematic

but turned into this – pretty good for homemade etching!

lincoln pcb

and then I put a surface mount component right on top :( anyway, on to KiCad..

basics

  • I’m using 4.02 (stable)
  • they have a nice workflow diagram
    • create a schematic and add components, creating components when necessary
    • annotate components (giving them numbers like R1, C3),
    • ERC..
    • associate components – select footprints for each component (a change from EAGLE that I like), making footprints where necessary
    • make a netlist
    • make a PCB, reading the netlist
    • debug rule check..
    • export gerbers and check them out with gerbv (a separate program that can be brew installed, not KiCad’s gerbview)
  • works pretty well with git..everything’s plaintext

Windsor Schmidt’s 20min walkthrough video

the workflow was..

  • make a project and a new schematic
  • from the schematic, make a lib via tools -> library editor and create a new component (but not yet a footprint), save this into a new lib
  • start a new schematic and draw your circuit, probably using lots of labels
  • annotate everything when you’re done connecting up things
  • make a netlist and save it
  • run CvPcb to associate schematic symbols with footprints (turn on the preview, probably)
  • save and quit your schematic editor and go to Pcbnew
  • import the netlist..I got errors so I went back to the schematic editor, exported the netlist again, and then I could import into the pcb editor :/
  • position everything..
  • hide values (visibles -> render -> values)
  • draw board outline on edge cuts layer
  • check view -> 3d viewer (nice!)
  • select the front copper or bottom copper layer and “add filled zones” – can copy one to the other layer, then right click and fill them all (the ratsnest will update), then you can hide the fill in the left pane
  • start routing – right click and end tracks when you’ve connected a net
  • change width by adding custom net classes and assigning nets into the class – then right click a track to edit and “set all tracks to their netclass values”
  • to fill your copper zones, run the DRC
  • make gerbers with file -> plot – you probably want these layers on: edge cuts, the copper layers, masks and silkscreen layers
  • use that dialog to create a drill file as well
  • consider using gerbv to check these files.. here’s my rather ugly board!

gerbv render

Matthew Venn’s five part series

part one - schematic

  • hit g while hovering over a part and then r to rotate it in place in the schematic
  • w to run wires
  • can add “power flags” to indicate to KiCad that power will come in from elsewhere
  • when doing PCB layout, view -> switch to OpenGL, then under right click -> routing options there are cool walk-around and shove routers – hard to rip up wires in this mode though

part two - pcb layout

  • cmd + shift to move things with finer control
  • draw board border in edge cuts layer

part three - changing the schematic

Contextual Electronics has about 90min of video on KiCad

part one - intro

  • ..

part two - creating schematic symbols

  • ? brings up hotkeys
  • make a new lib..open the lib editor, create a new component and name it something. click the empty book to save the component to a new lib. then preferences -> component libs and add the new lib then click the book with writing on the pages to set the working lib and finally you can save. Note the topbar which has info about which lib is active
  • use ~ in front of the pin name to get the bar over the name (active low)

part three - more schematic

  • run ERC to find unconnected things – note that the part’s pin type might cause issues
  • then generate the netlist

part four - library setup and part association

  • the start of footprints..from the schematic find CvPcb to start associating symbols and footprints
  • use the lib wizard from the schematic editor to get the github libs for the first time.. I had to do something similar in the footprint editor to download footprints (the .pretty files)

Udemy’s KiCad course

PCB design techniques in KiCad

  • he starts using it around 22min
  • the push and shove router is so cool..switch to OpenGL mode and then right click -> checkout the routing options

other

misc

  • measure distance in the footprint editor by hitting ‘space’ to reset the dx / dy numbers in the bottom menu

importing libs

  • I like to git clone a lib somewhere in home and then move the bits I need (.bak, .dcm, .pretty, .lib and .kicad_mod)
  • they go into something like /Library/Application\ Support/kicad/template/pololu-drv8825
  • then there’s a lot of prefs > lib wizard wrangling and you may have to restart kicad
  • same with footprints – have to add the lib with some sort of wizard

⊃ endurance cycling

was recently reading about various endurance cycling feats..

The Transamerica Bike Race – 4400 miles, 10 states, self-supported. Lael Wilcox just won the 2016 race, becoming the first American and first woman to win. She took only 18 days..insane – that’s 244mi per day.

Kurt Searvogel rode 76k miles in 2015.

Craig Cannon climbed 95,623ft in 24hrs.

⊃ the Outer Banks

I visited the Outer Banks in NC to attend my younger cousin’s wedding – had a very nice time in New Bern, Beaufort and Wilmington. There is such cool history there.. Would love to visit my family out there again sometime soon.

Pictures to come!

⊃ Making Biologic Medicines for Patients

I’m going through the edX course, MITx 10.03 Making Biologic Medicines for Patients: The Principles of Biopharmaceutical Manufacturing, led by Drs Chris Love, Stacy Springs and Paul Barone. Here are some notes:

unit one – intro

  • Dec 1983 – genzyme makes alglucerase, the first synthetically engineered beta-glucocerebrosidase, an enzyme that, when lacking, causes Gaucher’s Disease
  • biologic medicines are quite broad according to the FDA – proteins (the focus of this course), vaccines, blood components, cell therapies, natural hormones and plant / animal extract
    • cell therapy: something like injecting live T-cells into a patient – alternative medicine has also co-opted this term to injecting non-human cells
    • animal extract: for example, insulin from pigs
  • these engineered proteins are quite complex – monoclonal antibodies are 1.3k amino acids and 144k Daltons
    • compare that to “small molecule drugs” – <1k daltons
  • first hybridoma created in 1975 by Georges Kohler and Cesar Milstein
    • they had to be further “humanized” (glycosylated?) to make them effective in humans
  • some history:
    • ~7000BC: a pot from China was used to make something like wine..cool
    • the 1850s: Pasteur characterized the activity of yeast and bacteria in distillery equipment in Lille
    • 1894: antibody therapy was inadvertently created with a diphtheria treatment involving a “serum” from guinea pigs that had been infected with a heat-inactivated bacteria – later horses were used to increase production (anti-toxins and anti-venoms are still made in horses and other animals – I wonder how they handle human-compatibility – ah, the comments say that this is still a problem, but you are likely using the treatment for an acute and serious condition and not something chronic)
    • WWI: scientists on both sides used bacteria and yeast to make glycerol (for explosives), butanol (synthetic rubber precursor) and acetone (more explosives) – they pioneered aseptic processing where the reactors, media and equipment were sterilized
    • WWII: Howard Florey and Ernst Boris Chain worked on penicillin production in Peoria with the USDA; they (again serendipitously) found another species of the Penicillium mold on a cantaloupe in a farmer’s market – it made 60ug / mL. With X-rays and UV radiation they mutated the mold and applied some selection pressure, eventually finding a strain that could make 500mg / mL..wow
    • 1945: Fleming, Florey and Chain win a Nobel Prize
    • 1952: there are 58k polio cases in the US – Salk treats Polio with a mammalian-produced product
    • 1953: Watson, Crick and Franklin deduced the double helix structure of DNA
    • 1973: recombinant DNA tech (plasmid injection)
    • the 70s: HGH was isolated from cadavers to treat growth deficiencies
    • the 80s: genzyme makes that Gaucher’s enzyme..but it takes 20k placentas to treat 1 child / yr, wow.. they soon looked into recombinant techniques
  • the steps of biomanufacturing:
    • cell line dev: recombinant DNA techniques + variant selection
    • upstream processing: reactor condition optimization
    • downstream processing: purification and recovery
    • fill and finish: adding other components to the active pure product to make a final drug

unit two – protein structure and function

  • seven of the top-10 best selling drugs of 2013 (including the top-3) were protein drugs as opposed to small molecule drugs (aspirin, penicillin)
    • the top three included humira (an immunosuppressant), enbrel (treats autoimmune diseases) and remicade (another immunosuppressant) ..they’re all for stuff like arthritis
  • small molecules are, compared to biologic drugs..
    • far easier to characterize
    • far less targeted – a small molecule aimed at DNA, RNA or a receptor protein can accidentally affect other pathways, whereas a biologic drug is so complex and specific that those off-target effects are unlikely
    • more stable – less likely to be affected by, say, oxidation or deamidation, processes that can alter a protein’s folded structure and thus its function
    • less likely to aggregate improperly
    • administered orally, as opposed to protein drugs which typically must be given parenterally
  • biotherapeutic categories affect:
    • enzymatic / regulatory activity (like insulin or erythropoietin)
    • special targeting activity (like rituximab – it targets B-cells for destruction, a treatment for non-Hodgkin’s lymphoma)
    • protein vaccines (HPV, HepB and flu vaccines, for instance)
  • amino acids
    • differentiated by side chains
    • our DNA makes 20 different side chains
    • some have charges which can form ionic bonds (lysine and glutamic acid, for instance)
    • some have polar functional groups and can form hydrogen bonds
    • most are neither charged nor polar – they are hydrophobic and thus typically make up the core of a protein’s structure
    • cysteine has a thiol group which can form very strong disulfide bonds
    • proline’s side chain is linked to the amine group in the main part of the amino acid, kinking the structure
  • peptides come about through peptide bonds
    • < 40 amino acids makes a peptide, anything more is a protein
  • in the discussion about structure, there’s a cool example involving insulin:
    • typically produced and stored as a hexamer – three dimers that join together via two zinc atoms
    • but insulin is only active as a monomer, and it naturally forms dimers
    • so insulin analogs have been engineered that reduce the dimerization
  • post-translational mods
    • oxidation, reduction in the side chains
    • proteins can be trimmed
    • disulfide bonds can be formed
    • they can tag and destroy mis-folded proteins or mediate other protein function
  • new molecules can be attached to the protein like in glycosylation – glycans are linear or branched polysaccharides. Half of human proteins are glycosylated and mammalian cell lines most closely reproduce glycosylation patterns found in a human
  • the Gaucher’s Disease example was cool – they created a carbohydrate on an engineered enzyme that would trigger natural uptake by a macrophage
  • deamidation is a PTM to avoid – in this case asparagine loses an amide. One example of deamidation they listed was fixed when the final product was stored in plastic rather than glass
  • acylation can, in insulin, adjust the release profile of the drug
  • PEGylation (polyethylene glycol addition) – PEG polymers are added to biologics as a shield. They can prevent the immune system from clearing the drugs

protein structure and function examples – insulin

  • they discuss engineered changes to insulin:
    • “lispro” was an insulin variant that had Lys and Pro switched at the end of one chain – this resulted in less dimerization and a release profile more similar to natural insulin
    • it was found by looking at another molecule and wasn’t a product of screening
    • glargine developed to be a “long-lasting” insulin

protein structure and function examples – antibodies

  • five of the top ten best selling drugs of 2013 are antibody drugs
    • some target cells for destruction, others make it so small molecules can’t be used by the body, others deliver radioactive or chemotoxic agents to cells, some just block other receptors
  • IgG antibodies are about 150kDa
    • y-shaped, they have two identical heavy chains and two identical light chains
    • each chain has a constant domain and a variable domain – in humans this is literally a random sequence of amino acids
    • with PTMs, there are 100M possibilities for each antibody
  • an antibody’s “glyco profile” is the library of glycosylation changes made to a single dose – it can have a big effect on safety and efficacy

⊃ the DAO

A bug in the DAO’s smart contracts was exploited early this morning, allowing ETH wrapped up in the DAO to be drained – lamentable, but it’s made for some really interesting reading!

The attacker split into a “child DAO” and got about 3.6M ETH total before being stopped. That’s about 4% of the total ETH in circulation, and leaves the DAO with 7.9M ETH. The attack took 258 ETH per sec – 1 ETH is worth about $15 atm, after a 25% drop in value today. The balance will be frozen for 27 days due to, I think, the DAO’s rules. Er, and also because many (all?) of the exchanges paused their trading for a bit according to these logs.

This blog post was crazy to read – the slock.it “organizers” first ask the community to spam the network to stop the drain. They post a code snippet for ethereum clients that is supposed to create a lot of spam, but it’s unsigned code that could do just about anything, yikes.

This post on the ethereum blog then discusses what the “leaders” of the community have agreed to do – a soft fork followed by a hard fork. (See here for more on hardforks vs softforks – it’s basically backwards-incompatible vs compatible changes.) The soft fork will change ethereum’s codebase to prevent withdrawals from the DAO address and its children. This will extend the 27 day window and prevent the attacker from withdrawing funds at any time. The hard fork would be the follow up – it would allow people to only withdraw their invested ETH, and since nothing has been spent, it should all be recoverable. The DAO would essentially be shuttered.

These are forks to ethereum itself, even though it was only a bad contract written into the DAO that caused this issue. Miners would have to come on board, and this has caused some controversy about the allegedly decentralized nature of ETH. Which causes more distrust in the network, forking or the attack itself? Who decides when a bad contract is bad enough to warrant a fork? Well one pretty good answer is that the miners still decide..it’s still a consensus-driven system. And it’s a lot of the global supply of ETH tied up in this bad contract – the price seems like it’ll drop no matter what they do. Ethcore agrees in their response – pointing out that bitcoin worked through similar bugs via consensus-driven hard forks. Bloomberg has an amusing and informative take on how the “hack” is unfolding..

There are fun conspiracy theories about the attacker paying off miners to not accept these proposals.. And other great theories about how the plan was to short ETH all along..

The slock.it team looked at this type of vulnerability only 5 days ago and thought they were safe – here’s another post breaking down the exploit. I think it’s a reentrancy issue where you recursively withdraw funds, calling “withdraw” again before the contract updates your balance, so each check still sees funds available.
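
Here’s a toy Python sketch of that pattern as I understand it – not Solidity, and the names are made up; it just shows the control flow where the payout happens before the balance is updated:

class ToyDAO:
    def __init__(self):
        self.balances = {"attacker": 10}

    def withdraw(self, who, receive):
        if self.balances[who] > 0:        # the check passes every time...
            receive(self.balances[who])   # ...because the payout happens here...
            self.balances[who] = 0        # ...and the balance is only zeroed afterwards

dao = ToyDAO()
stolen = []
depth = 0

def evil_receive(amount):
    global depth
    stolen.append(amount)
    if depth < 3:  # re-enter withdraw before the balance gets updated
        depth += 1
        dao.withdraw("attacker", evil_receive)

dao.withdraw("attacker", evil_receive)
print(sum(stolen))  # 40 drained from a balance of 10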

There is also some discussion about “proof of stake” vs the traditional “proof of work” – I’m still trying to understand this.. Apparently ETH was contemplating forking to PoS but that hadn’t gone through yet.

It was pretty fun reading about this today – it’d be fun to participate in the future. I’d also like to read more on Monero.

⊃ Adimab

I’ve been reading about Adimab, a NH company that makes an antibody discovery platform. This March 2016 Wired article details their work in yeast:

  • they have a “synthetic human immune system” in the yeast via ~30 modifications to the yeast’s genome
  • this allows the yeast (a fungus) to make human proteins
  • Adimab has made 10B antibodies of unknown utility – they allow pharma companies to bring in target antigens and they see which yeast can make antibodies that attack these antigens.
    • for scale, a single human’s immune system can produce about 100M unique antibodies
    • also recall that humans make antibodies and fight disease with a similar method – it’s a bit random, bizarrely, B-cells just crank out antibodies and other processes in the immune system determine what works
  • five of these antibodies have now moved on to clinical therapeutic trials (Adimab was founded in 2007)

A team from Scripps used the platform to screen for Ebola antibodies, their 2016 Science paper is here.

  • 349 mAbs were isolated from a convalescent donor who survived the 2014 EBOV Zaire outbreak
  • they found 77% of the antibodies neutralized live EBOV
  • therapeutic tests:
    • mice were first challenged with EBOV
    • given a 100ug dose of antibodies after two days
    • 60-100% of mice survived
    • only 40% of mice survived in the control group (treated with another antibody)

⊃ glossary

various terms I’ve come across..

  • monoclonal antibodies: antibodies made from clones of a single B-cell, they will have a singular specificity to antigens
  • water polishing: removing microscopic particulate matter or very low concentrations of dissolved material
  • hybridoma: created by injecting an antigen into a mouse, collecting the B-cells that make the target antibodies and then fusing them with a tumor cell to make the resulting cell immortal
  • dalton: the weight of one proton or neutron, essentially – abbreviated Da or u, it is about 1.66 × 10^-27 kg
  • deamidation: the removal of an amide functional group from an organic compound – a form of protein degradation
  • parenteral: bypassing the GI tract – like administering a drug intravenously or via inhalation

⊃ the trashtalk API

Computers continue to dominate humans in the highest forms of board and card games (see set, alphago and deep blue) – they need new ways to express their range of emotions and general superiority..

Introducing the trashtalk API!

Inject some much needed spirit into your 3d printer, RC car, smart watch or IoT fridge. Get started today at http://trashtalk.oakmachine.com.

⊃ outside-o-meter

I want a device that can quantify the proportion of my day spent in the out-of-doors.

options

  • toggle switch (my favorite so far..see the sketch after this list)
    • push a button to activate “I’m inside” state, push again to toggle “I’m outside” state – timestamp each activity and upload to some other device for analysis
    • simple purpose-built device
    • disadvantage: no passive monitoring
    • could be a necklace or a ring
  • GPS
    • just log GPS position and later determine if positions are in a building
    • would struggle with time spent on trains, time on subways, time in cities..
    • actually not really sure what would happen when inside at all – might show an inaccurate fix outside of a building
    • could use other gps-assist technologies available on a phone (wifi, cell towers..)
  • camera
    • periodically take a picture of the surroundings
    • would probably need something purpose built and wearable – phone is too often in my pocket
    • may be tricked by images of windows
  • phone video
    • periodically take a short video (with sound) of the surroundings
    • would also need something purpose built + wearable..
  • microphone
    • record ambient noise and try to discriminate between states
    • sounds tough..
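
Here’s a minimal sketch of the toggle-switch bookkeeping (my own, with the device and upload parts hand-waved):

from datetime import datetime

events = []  # (timestamp, state) pairs; states alternate inside / outside

def toggle():
    state = "inside" if not events or events[-1][1] == "outside" else "outside"
    events.append((datetime.now(), state))

def fraction_outside():
    total = outside = 0.0
    for (start, state), (end, _) in zip(events, events[1:]):
        seconds = (end - start).total_seconds()
        total += seconds
        if state == "outside":
            outside += seconds
    return outside / total if total else 0.0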

⊃ Seth Rolland

a video of the amazing bandsaw work of Seth Rolland.

⊃ the John Muir Trail

My buddy Will and I did a short trip on the John Muir Trail in the Sierras. We camped near Florence Lake – right here actually: 37.2324891N, 118.8780362W. I remember because we camped right near that logjam north of the river.

logjam near campsite

We were going to cross the San Joaquin to look for the hotsprings on the other side, but Will saw a brown bear the first night.. it was eating a deer right at the other end of the logjam bridge. The river was really cranking so we don’t think it heard us, but when I got closer it started sniffing around and loped off into the woods. It came back a few minutes later and dragged off the rest of its deer jerky. We saw him again the next morning and decided the bear could enjoy the hotsprings on his own. I tried getting a picture with my disposable camera but I guess it didn’t come out – they never sent me any bear pics back and so I’ll never have proof, alas.

Will on the bluffs

Had a great time all around – we swam in Florence Lake for all of 30s, it was freezing with the runoff from Memorial Day snow. We saw another fellow hiking with skis on his back – he had just come from some spring skiing at Selden Pass.

Another guy we met was hiking (and promoting) the National Hot Springs Trail. Nice dude..also a total machine.. capable of 30mi days, it was his 33rd day on the trail. His route was so cool, he’d hit a different hot spring almost every single day, from Santa Barbara through Nevada and Idaho to Montana. We gave him some blueberries and carrots, he was too antsy to wait for the ferry with us and he set off to hike around the lake.

⊃ planck length in a simulation

Wikipedia’s excellent notes on the Planck length:

The size of the Planck length can be visualized as follows: if a particle or dot about 0.1 mm in size (which is approximately the smallest the unaided human eye can see) were magnified in size to be as large as the observable universe, then inside that universe-sized “dot”, the Planck length would be roughly the size of an actual 0.1 mm dot.

The value is about 1.616199(97) * 10^-35 m and is regarded as the shortest measurable length.

on binary fractions ..

⊃ Berkeley

Nora and I went to Berkeley!

We saw some amazing succulents at the botanical gardens, some could grow to be 2000 years old!

succulent bonsai succulent y succulent collection

The other areas were quite nice as well.

tropical bouquet nora amidst the flowers

We ate like royalty at Gather, Cheeseboard and many other places.. We were enticed by the Pacific Film Archive’s amazing map and bookstore, but alas, their main galleries were closed.

pacific film archive map

We saw and puzzled through The Lobster..

the lobster

it was a good time all around, I’d say.

⊃ ANNs from scratch

I’m making a video on how to write an ANN in python from scratch.

backprop derivations

and

along with this worked example and some slides
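
In the meantime, here’s a rough sketch of the kind of network the video covers – a tiny two-layer net trained on XOR with plain numpy (my own sketch, not the code from the video):

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(20000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass for squared error, using sigmoid'(z) = s * (1 - s)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient descent step
    W2 -= 0.5 * (h.T @ d_out); b2 -= 0.5 * d_out.sum(axis=0, keepdims=True)
    W1 -= 0.5 * (X.T @ d_h);   b1 -= 0.5 * d_h.sum(axis=0, keepdims=True)

print(out.round(2))  # should end up close to [[0], [1], [1], [0]]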

⊃ riding down US 1 to SLO

My friend Trevor organized a ride from Gilroy to SLO over Memorial day weekend – our campsites: Mt. Madonna SP outside Gilroy, Veteran’s Park in downtown Monterey, Kirk Creek in Big Sur and the San Simeon Creek Campground near Hearst Castle.

I took along a disposable camera ..I guess my memories will have an early nineties vibe. Here’s Trevor chasing his land speed record, going down from Mt Madonna:

Trevor riding down Mt Madonna

Southbound is definitely the way to go – more tailwinds that way. And the days aren’t too taxing – we do roughly 50mi per day. The ride to Kirk Creek is the longest at almost 70mi and also the hardest because of the hills.

Some of the crew paused near Monterey:

photo op near Monterey

I think this is Julia Pfeiffer State Beach:

Big Sur waterfall

Spindrift Cove, a nice detour from the highway near Pt Lobos:

Spindrift Cove

Looking south:

looking south on US 1

I had just one mechanical issue, albeit a big one: my chain snapped as I left the Kirk Creek campground. We had decided to camp at Pacific Valley that night, a campsite about 4-5mi up the road but, after my little snafu, we ended up staying at Kirk Creek. We had stayed there for last year’s July 4th ride and I again swam in the ocean with the sea otters for about 30s – it’s insanely cold.

We saw some slackliners in one of the coves:

slackliners near US 1

The final bit to SLO is really nice and warm, definitely my favorite few miles.

the road to SLO

horses outside of SLO

Look forward to next year’s ride!

⊃ startups

On working for startups in the past and possibly the future:

I’ve loved working on small teams throughout school, in hackathons and in the working world. Little teams let me take on many roles and explore lots of different things, from hardware to the web to product design, ux and even sales I guess! I really like the opportunity to see a product, whatever it may be, from many different angles. You can explore ways to make it more useful, more feature-ful, better positioned, better priced.. and it’s engaging to think of these systems in more nuanced ways.

It’s hard for me to imagine working for a huge team, or for a small team making a tiny part of a larger product. And so this lends itself towards working for startups. But why a startup over just working on a project?

All of my work to date has been on “projects” – small scale, and for small audiences. Often I’ve worked on them to practice some technical thing, or they’ve had such a small number of users that they’ve essentially been contract work. So it’s of course exciting to imagine helping lots of people.. do their jobs more efficiently, create things they otherwise couldn’t, or express themselves in new ways. I love that startups force you to put your creations out in the world.

Many anecdotes cover the stresses of startup work – founders especially face huge demands on their attention and personal relationships often suffer. I would hope to work in an environment that is open and intentional about addressing this. (The Startup Podcast episode about burnout is a really excellent look at this problem, by the way). In general, intentionality is a value I would want to see in many aspects of the work. I suppose introspection comes with that, along with a willingness to measure outcomes objectively. I think openness is an attractive value to aspire towards – at the least, the team should be transparent about their thought processes and rationales. At best, a team’s work could be entirely open all the way to the tech front (though I understand why this can be difficult). I would hope to work in a creative and expressive environment, one in which diverse perspectives are sought out, shared and acted upon. And I’ve been fortunate to work with incredible colleagues that have challenged me to grow as a person and as an engineer – I would of course want that to continue.

I hope that, in my next venture, I’m able to foster an environment like the one described above, and come together with friends to make something inspiring, innovative and useful for the world at large.

other notes

  • nice notes from Mike Maples
    • need to develop the power of proprietary tech, the product, the company and the category
    • create a “WTF moment” for customers – they should never go back to the old way of doing things
    • nice note on empowering teams: they should control the deadline or the features – mgmt should never control both

⊃ Proust queries

Marcel Proust answered a series of questions in a “confession album” around 1885 – at the time, it was a popular way to “record thoughts, feelings, etc.” He wrote his answers as a teenager and you can read them here. Not having done enough personality surveys, here are my own:

your favorite virtue

  • curiosity

your favorite qualities in another

  • independence, self-reliance, patience, candor, creativity, curiosity, mindfulness

your chief characteristic

  • patience

what you appreciate most in your friends

  • their creativity, open-mindedness and their ambition

your main fault

  • torpor, or maybe greed

your favorite occupation

  • exertion to a point – climbing Montebello Rd and then coming down the trails to Stevens Creek Canyon

your idea of happiness

  • riding on a sunny day, with some balance of the known and unknown ahead of me

your idea of misery

  • entrapment, consumption

if not yourself, who would you be?

  • a writer of short stories, someone more devoted to the arts

where would you like to live?

  • Japan, Spain – two places I’ve never been. Berlin. Cities with history, cultures beyond my own.

your favorite color and flower

  • green; the daisy – Proust’s superior answer: “the beauty is not in the colours, but in their harmony”

your favorite bird

  • the eastern bluebird for its beauty and fragility and its need for us

your favorite prose authors

  • DFW, Lahiri, McCarthy, Murakami, Borges, Dostoyevsky, Saunders

your favorite poets

  • Dickinson, cummings, Collins

your favorite heroes in fiction

  • Toru, Llewelyn

your favorite painters and composers

  • not familiar with many painters – musicians I enjoy: John Darnielle, Jonny Greenwood, Joanna Newsom

your heroes in real life

  • Greenwald, Snowden

characters in history you most dislike

  • McCarthy, Kissinger

your heroes in world history

  • Thoreau, Muir

your favorite food and drink

  • homemade things, by my friends or myself, from fresh food that we grew

your favorite names

  • lepidoptera, the scientific

what I hate the most

  • demagoguery

the military event I admire the most

  • the WWI Christmas truce

the reform I admire the most

  • a personal one

the natural talent I’d like to be gifted

  • a knack for oration..but “natural talents?” I object.

how I wish to die

  • with dignity and intentionality or with purpose

your present state of mind

  • introspective, questioning, curious

for what fault have you the most toleration

  • tactlessness

your favorite motto

  • “yes, and..”

⊃ bioreactors

I’ve been learning about bioreactors and fermenters: devices used to grow bacterial and mammalian cells. There are a lot of interesting startups in the synthetic bio space that use bioengineered organisms to produce unique proteins – these companies need bioreactors to run experiments.

basics

  • 2011 article on single use vs stainless steel bioreactors

    • there are even benchtop autoclavable vessels
    • lots of innovation in single use sensors, improving aeration, feeding and waste removal
    • pros of single use: reduced time and labor and energy; cons: capacity, reliability, automatability, security, waste
    • GE analyzed 5 × 2000L steel tanks vs 20 × 500L (!) disposable bioreactors – the latter was only 40% cheaper (surprisingly low), required 18% less staff, and produced 34% more product
    • only really the best if you’re doing 100kg/yr of product in 500L - 1kL reactors and if the facility is at 90% utilization (not sure why that is) and if you need to be very flexible
    • Sartorius says there is rising interest in 50L single use system for microbial applications
    • people want more sensors: biomass and CO2
  • challenges in fermentation scalability

    • a review of eppendorf’s product line – they have autoclavable, sterilize-in-place (SIP) and single-use designs
    • critical to reproduce bioprocess params at each scale to maintain yield
    • they examined vessel and impeller geometry, tip speed, mixing time, oxygen transfer rate, impeller power number
    • with mammalian cells and stem cells especially, you have to be careful with tip speed or you could shear the cells – but then in bigger volumes you may need to mix a lot more
    • good notes on testing mix rates, OTR and impeller power number
  • Apr 2016 article on single use innovation

    • strong research interest in single use and continuous flow
    • people in the industry think a facility that only uses disposables is coming
  • Apr 2015 article on single use processing for microbial cultures

    • single use vessels for mammalian purposes “have gained wide acceptance” but the industry still needs options for “column chromatography media, cryopreservation, and process monitoring”
    • single use vessels in microbial fermentation is rare
    • “Fermentation is principally the domain of nonglycosylated recombinant proteins and peptides such as insulin, erythropoietin, and interferon.” – I-know-some-of-those-words.jpg
      • nonglycosylated: from this pubmed abstract, it seems like proteins expressed in E. coli may be nonglycosylated whereas the same protein from CHO cells may be glycosylated. The effect of the sugar chain may be small in terms of its contribution to molecular weight (4%), but there is some debate as to how the sugar chain can change the protein’s function.
      • insulin: regulates blood sugar levels
      • erythropoietin: secreted by the kidneys, it increases the rate of production of red blood cells in response to falling oxygen levels
      • interferon: a signaling protein released in response to the presence of pathogens; it activates other immune cells
    • new high-value therapeutics produced through fermentation: “antibody fragments, fusion proteins, nonglycosylated antibodies”
      • antibody fragments: ..
      • fusion proteins: aka chimeric proteins, they are the result of translating a gene sequence that is the combination of two parent sequences. For instance, if you develop a monoclonal antibody in a mouse, you may need to reengineer the sequence a bit to make it more like a human antibody
      • monoclonal antibody: antibodies specific to one antigen made by identical immune cells – used in detection and purification of substances, often made by just harvesting cells from live mice
      • nonglycosylated antibodies: presumably an antibody without that sugar chain..
    • risks to biomanufacturers: regulatory (I keep hearing about this vaguely) and a lack of “reliable platform processes that reduce development costs and shorten time to market”
    • conventional single-use bioreactors that are suited for mammalian culture are not well-suited for single-use fermentations – worse by a factor of three in the OD600 dept, according to one app note
    • nice notes on fermenter design.. fermentation is governed by oxygen transfer and heat dissipation rates, OTR can change with the geometry of the vessel, impeller speed, impeller fin number, sparge gas bubble size, and other factors
    • they tested their 30 and 300L single use fermentors against stainless steel bioreactors with E. coli and Pichia pastoris and saw similar OD600 values over time and similar dry cell weights

small scale

medium scale

  • Nov 2014 article about Lygos:
    • they got $1M of equipment for $150k
    • they’ve gotten 32 bioreactors together, according to their own press note
    • Cal Institute for Quantitative Bioscience, QB3, has shared facilities
  • Lygos presentation, Mar 2015
    • barriers: cost effective production, reduced cycle time for dev (design and construction tools)
    • issues with translating small scale experiments to “relevant fermentation process”
    • one goal: expand workflow capacity

industrial scale

  • the GE “FlexFactory” with bioreactors of various sizing and other industrial equipment – more for when you’re scaling up an operation I suppose. This is from their 2011 Xcellerex acquisition, though this vid is from mid-2014. The software could use an upgrade.

companies in this space..there are so many

  • Pierre Guerin Technologies
  • New Brunswick Scientific – BioFlo 610
  • GE Healthcare Life Sciences – makers of the Wave platform (single use)
  • Sartorius
  • Applikon – miniBio
  • Electrolabtech.co.uk
  • Cercell
  • Fisher Scientific
  • Danaher-Pall
  • PBS Biotech
  • Cellexus – here is their single-use reactor at work:

⊃ hardware by the numbers

notes on bolt’s “Hardware by the Numbers” series on hw startups:

part one – team + prototyping

  • in the founding team, need a hacker and a dealmaker / hustler
    • to scale need 8 employees: ME + EE + FW + ID/UX, FE + BE, Marketing/Sales + Ops
    • a 50-50 split of equity is rare – usually someone brings a little more to the table – lots of other equity advice.. option pools too.. tax advice..
  • prototyping: typically takes at least six months in core product dev from ideation through final, functional prototype
    • build something new (something physical) every week..not necessarily functional, but something to test a feature or the look/feel of something
  • get solid relationships with 30 customers at least to get relevant, statistically significant feedback
  • initially, don’t stress about patents

part two – financing and manufacturing

  • a VC leading a financing round will want to own 20%
    • VCs will want you to be an outlier – getting to $100M in revenue in a few years
    • mean pre-money valuation is $4.5M for priced seed rounds (east coast startups, btw) – but the interest / quality of the investor is more important than the valuation
  • average companies do four rounds..so try to sell under 25% of the company in each round (see the quick dilution check after this list)
  • companies get commits from 10% of investors they ask – try to pitch in tranches of five
  • average hardware crowdfunded campaign raises only $92k
  • target 18 months of runway – and know it takes 3-6 months to fundraise, and you want to either be fundraising or focusing on product, not both
  • in first year of production, a CM wants to see $1M of BOM leaving the factory – and they will want that figure to accelerate
    • it can take 3 months to finalize negotiations with a CM
    • typically you’ll be quoted 180days for a CM to do its work, but it often stretches into 1yr territory
    • optimize your specialized parts but let the CM find the cheapest SMD resistors.. they’re better at it than you
    • injection molds are like $6.5k each and you’ll probably have 5-50 different molded parts – it’s 2x more expensive in the US, takes 2x longer, and can’t be later shipped to China because tooling standards are different
  • certifications – you’ll probably need at least one and it’ll cost about $15k
  • think about packaging..it doesn’t have to be expensive, but it’s not free
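
A quick check on that 25% figure (my own arithmetic, not from the post): selling 25% in each of four rounds leaves the original holders with roughly 0.75^4 ≈ 32% of the company before any option pool, so going much above 25% per round erodes founder ownership quickly.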

part three – logistics and marketing

  • recommends buying insurance on your shipping containers – they can fall overboard :o and it’s a 4-5wk journey
    • $2.1k to move a single container across the pacific
    • air freight is 10x the cost
  • 3-5% of the BOM is paid in customs fees
  • expect to pay 2-5% of the BOM on logistics providers
    • moving a pallet from CA to NY is $1k and takes 4-5 days
  • expect 10% return rate – and realize that handling this is expensive
  • recommends just giving 10 units away per week for a $50 COGS product – more effective than a PR firm, he argues
  • a full day photo shoot with a pro photographer is $3-5k but adds a lot
    • most launch videos cost $7k
  • expect only 2 / 100 web visitors to purchase a product online, brick and mortar stores are critical

part four – retail + exits

  • online retailers take 15-20% of the final sale price
    • big box or specialty retailers take far more, 40-70% – the apple store, for instance, takes about 50%
  • licensing will only make you 5% of MSRP – or, more likely, 2%
  • look for recurring revenue opportunities..
  • the average successfully exited, VC-backed company takes 6yrs, raises $42M and sells for $242M
  • 23 founders make under $10M after an exit (and taxes)

⊃ assembly

Reading some things on assembly:

an x86 primer

basics:

  • x86 CPUs have eight 32-bit, eight 16-bit and eight 8-bit registers – the latter two being subparts of the eight 32-bit general purpose registers. The 16-bit registers are {ax, cx, dx, bx, sp, bp, si, di} and are the bottom 16 bits of the corresponding 32bit registers {eax, ecx, ..edi} (where e stands for “extended”).
    • the 8-bit registers are {al, cl, dl, bl, ah, ch, dh, bh} – they are the low and high eight bits of the {ax, cx, dx, bx} registers

instructions:

  • in most operations, two registers are given – the first is a source and the second is both a source and destination.
    • for example, addl %ecx, %eax would be something like eax = eax + ecx; in C notation, with both eax and ecx having type uint32_t
    • some operations take only one operand, like notl %eax which is eax = ~eax; and incl %ecx which would be ecx = ecx + 1;
  • for bit shifting we write shll %cl, %ebx which is ebx = ebx << cl;
  • “immediate values” are just constants and prefixed with $ – so movl $0xFF, %esi means esi = 0xFF;
    • movl copies 32 bits from the first arg to the second arg – not really “moving,” just blowing away the second arg’s contents
    • (see this SO discussion on the suffixes in movl, movw and movb – l indicates you’re moving 32 bits, w is 16 and b is 8)
  • you’ll often need to write sequences of instructions – you can’t just try addl %eax, %ebx, %ecx

registers and memory:

  • there’s a 32-bit register, eflags that some instructions modify – it can largely be ignored though
  • RAM and registers – RAM is just a big array of bytes, it’s system memory
    • data is stored in little endian form – bytes at lower memory addresses are loaded into the lower part of the register (makes things appear to read backwards if you write out the addresses left to right) – see the quick check after this list
    • reading memory: movb (%ecx), %al means (again in C-like notation) al = *ecx;
    • writing: movb %bl, (%edx) means *edx = bl;
  • you can also use memory operands in arithmetic instructions: addl (%ecx), %eax means eax = eax + (*ecx); (reading 32 bits from memory) or addl %ebx, (%edx) which means *edx = (*edx) + ebx; (reading and writing)
  • there is a more ergonomic memory addressing syntax for inspecting arrays and handling indices
  • instructions can be prefixed with labels ala:
entry:  // a label
negl %eax  // instruction with a label
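
As a quick aside on the little-endian point above (my own check, not from the primer), Python’s struct module makes the byte order visible:

import struct

# pack a 32-bit value little-endian: the 0xAA byte lands at the lowest address
struct.pack('<I', 0xDDCCBBAA)  # b'\xaa\xbb\xcc\xdd'
assert struct.unpack('<I', b'\xaa\xbb\xcc\xdd')[0] == 0xDDCCBBAA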

jumps and machine code:

  • and then you can jump to labels ala: jmp entry – and there are lots of conditional jumps like ja: jump if above
  • every assembly instruction is translated into 1-15 bytes of machine code and the eip register tracks the current instruction being executed. The CPU knows how to advance eip after each instruction.

the stack:

  • just a region of memory addressed by the esp register
  • grows downward – larger addresses to smaller ones
  • to push a 32-bit value onto the stack, you first decrement esp by 4, then store your value starting at address esp
  • to pop a value, read memory starting from esp (into some other register or discard) then increment esp by 4 – there’s a little Python model of this after the list
  • call (used in functions) will first push the next instruction address onto the stack, so after retl (which pops the address into eip) you can run the next bit of code – C convention puts some (or all) of the args onto the stack as well
  • on 32bit x86 Linux, the function caller (parent) pushes args from right to left onto the stack, calls the function, receives the return value in eax and pops the args
  • more notes on the stack
    • notice that when you call a function, there is some scoping that occurs so internal vars don’t leak, and successive calls of the same function have private memory, in a sense
    • when a program starts executing, a certain contiguous section of memory is set aside – this is the stack
    • the stack pointer has some lower bound, “the stack limit” – annoyingly, the “stack bottom” is actually the largest valid address
    • when the stack is initialized, the stack pointer points to the stack bottom
    • note that after popping, data is still on the stack (second diagram) but values below the stack pointer are considered invalid, so we’ve basically freed that memory.. this is also how you can accidentally use pointers to local vars and have it work sometimes and fail other times
    • for each function call there’s a reserved section in the stack for the function itself, its args and its return value, this is the “stack frame”
    • a frame pointer records where the stack pointer was before it was modified to make room for a function’s local vars – then you can easily return, moving the stack pointer back and effectively popping the function’s allocated memory off of the stack
    • if that function calls another function, I presume another frame pointer is generated..
    • if too many stack frames are pushed, we enter into invalid memory and the OS will kill the program with a stack overflow error
    • the page notes that with RISC (?), using the stack is discouraged as it’s in RAM and 100x slower than accessing a register
    • Fortran 77 doesn’t use a stack, functions had sections of memory for their data and args – this prevented the use of recursion
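
And a toy model of the push / pop mechanics above – my own sketch, in Python rather than assembly:

import struct

ram = bytearray(64)  # pretend RAM
esp = len(ram)       # the "stack bottom" is the largest valid address

def push(value):
    global esp
    esp -= 4                                     # the stack grows downward
    ram[esp:esp + 4] = struct.pack('<I', value)  # store 32 bits, little-endian

def pop():
    global esp
    value = struct.unpack('<I', ram[esp:esp + 4])[0]
    esp += 4  # the old bytes stay behind but are now considered invalid
    return value

push(0xDEADBEEF)
push(42)
assert pop() == 42 and pop() == 0xDEADBEEF  # LIFO order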

64 bits:

  • the 8 general purpose registers are just extended to be 64 bits long – they’re renamed to {rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi} and, as with the 16-bit and 8-bit variants, the 32bit variants (eax, etc) occupy the lower 32bits of the new larger registers
  • and there are 8 new 64-bit registers: {r8, r9, r10, r11, r12, r13, r14, r15} (and these have 32, 16 and 8 bit sub-registers)
  • all memory pointers must be 64bit

more

⊃ App Engine

some notes on using Google App Engine with python:

app versions

  • see this SO post – can’t use semver with periods, but you can use dashes :/
  • you can see various versions of the app by changing the URL, ala: http://v0-1-0.my-app.appspot.com
  • so you can upload a new version with appcfg.py but how do you make it live??
    • you can do it from the GAE dashboard..
    • and there is an appcfg.py start_module_version command but I’m not sure how to get it working
    • would be nice to have a git tag-based workflow – create a tag, upload it, that version gets sent to GAE and traffic is migrated to it

list and delete versions:

$ appcfg.py list_versions ./
$ appcfg.py delete_version ./ v1

testing

the SDK

  • I have a symlink in /usr/local/google_appengine/

SSL with letsencrypt

I ran these commands in an ubuntu VM:

$ git clone https://github.com/certbot/certbot
$ cd certbot
$ ./certbot-auto certonly --manual --email me@email.com -d superamazing.com -d www.superamazing.com

This displays the response text you have to show at some “challenge” url on your site. To make that work for the naked domain, I set up my app.yaml as described here ..and then did it again for the www domain. Finish the LE setup and you should have certs.

Before uploading to GAE, convert your private key into an RSA private key:

$ cd /etc/letsencrypt/live/superamazing.com
$ openssl rsa -in privkey.pem -out privkey-rsa.pem

And then you’re good to copy the certs over to the app engine dev console. I suppose I’ll do something similar in three months once these certs expire :/

I could take down the challenge / response URLs, but maybe I’ll need them when it comes time to renew?

After this is setup, you can redirect insecure requests to https by setting the secure key to always.

concepts

  • each service has a config file and each service has a set of versions, and versions have scaling types (to determine scaling properties) and instance classes (to determine compute resources)
  • all services share the datastore, memcache and task queue

appengine hierarchy

  • apps are sandboxed – you can read from the filesystem but not write, you have to respond to requests within a few seconds or the process is killed, you can’t make system calls
  • certain modules are replaced or customized (e.g. tempfile, logging)
  • threads can be used and even run in the background on manually scaled instances

subdomains

  • each of my subdomains is a different service in GAE parlance – for each subdomain / service, you’ll need to specify a yaml file and an entrypoint (main.py equivalent)
  • you’ll need a dispatch.yaml, something like this:
application: my-app
dispatch:

  # The default module serves the main app.
  - url: "www.example.com/"
    module: default
  - url: "example.com/"
    module: default

  # The API is served by another service.
  - url: "api.culturerobotics.com/"
    module: api
  • the main service is just known as default whether you label it as such in app.yaml or not
  • all services are public by default – specify login: admin to the service handlers to restrict access
  • you can start multiple modules at once with dev_appserver.py app.yaml demo.yaml api.yaml
  • update the dispatch with appcfg.py -A my-app update_dispatch .

the bookshelf tutorial

  • SO has a great note on flask blueprints, but I’m not convinced my apps really need them.. I suppose it improves one’s organization to some extent (a minimal sketch follows this list)
  • if you’re doing things with the gcloud util, you might want to view and change the active project:
$ gcloud beta projects list
PROJECT_ID                 NAME                PROJECT_NUMBER
bookshelf-tutorial-141521  bookshelf-tutorial  123789
nother-project-678         nother-project      123456

$ gcloud config set project nother-project-678
$ gcloud config list
Your active configuration is: [default]

[core]
account = matt@test.com
disable_usage_reporting = True
project = nother-project-678
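
For reference, a minimal blueprint sketch (my own, not from the tutorial) – routes get grouped on a blueprint object and mounted under a prefix when registered on the app:

from flask import Flask, Blueprint

api = Blueprint('api', __name__)

@api.route('/ping')
def ping():
    return 'pong'

app = Flask(__name__)
app.register_blueprint(api, url_prefix='/api')  # GET /api/ping returns 'pong'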

third party libs

  • have to be all python, so the standard bcrypt lib is out (but apparently passlib works)
  • have to pip install <some_python_library> -t lib or use a yaml file to specify pre-approved libs

⊃ joshua tree

Went to Joshua Tree with some friends this last weekend – scrambling around Skull Rock and Hidden Valley was really fun.

hidden valley

We saw the Queen Valley Mine and hiked around Ryan Mountain and Black Rock Canyon – it’s a really cool place, would love to go back, there’s so much to explore..

On the way home we went by a soaring school near Mojave, CA. Nearby is one of the world’s biggest wind farms, and soaring records have been set in the vicinity as well. We saw a friend’s hand-built glider:

glider

glider cockpit

Would be fun to try sometime, maybe I’ll just get a little RC plane for now (:

⊃ pseudo

I built a barebones webapp + cli to learn more about rust – it’s a “compiler” for pseudocode that runs in the cloud (: Check out the site at pseudo-lang.oakmachine.com and see the source on github.

pseudoc is the local compiler – it sends input files to the server, and then it waits..polling an endpoint for results. pseudo_server is a little postgres-backed webapp that handles input code submissions and provides an interface for updating data.

I’m not sure I’d do another webapp in rust – diesel (the ORM) and nickel (the routing + templating system) are still nascent projects. I’m too much of a rust amateur to understand the myriad of code generation, macro and trait-importing tricks that these projects rely on, so writing the server side wasn’t much fun.

I like rust’s error handling and match patterns, and I’m loving the compiler, so making the CLI, pseudoc, was fun. And rust is supposed to be used for these kind of lower level utilities anyway, so I’ll probably stick with that in the near future.

Try it out! And I apologize if it’s slow (:

⊃ numerical analysis with Herbie

Saw this neat project, Herbie, a tool that uses heuristic searches to find more numerically stable formulae for use in floating point math.

There’s also a rustc plugin lint that would be interesting to try.

⊃ postgres

some postgres notes:

  • connect, list databases, list tables and select:
$ psql diesel_demo
psql (9.5.2)
Type "help" for help.

diesel_demo=# \l
                               List of databases
    Name     | Owner | Encoding |   Collate   |    Ctype    | Access privileges
-------------+-------+----------+-------------+-------------+-------------------
 diesel_demo | matt  | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 new-db      | matt  | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 postgres    | matt  | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 pseudo      | matt  | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0   | matt  | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/matt          +
             |       |          |             |             | matt=CTc/matt
 template1   | matt  | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/matt          +
             |       |          |             |             | matt=CTc/matt
(6 rows)

diesel_demo=# \dt
                  List of relations
 Schema |            Name            | Type  | Owner
--------+----------------------------+-------+-------
 public | __diesel_schema_migrations | table | matt
 public | posts                      | table | matt
(2 rows)

diesel_demo=# select * from posts;
 id | title |   body    | published |         created_at
----+-------+-----------+-----------+----------------------------
  1 | coool | pleasseee+| t         | 2016-05-03 21:00:18.639813
    |       |           |           |
(1 row)
  • update looks like:
UPDATE submissions SET compilation_complete = true WHERE id = 1;

auth

I had this error: fe_sendauth: no password supplied and started reading about host-based auth with pg_hba.conf.

  • the config file is typically at /etc/postgresql/<version>/main.
  • you can also psql <some-db> -c 'SHOW hba_file';
  • the first matching record is used
  • the trust value in the auth-method column (which is the last column) allows connections unconditionally – that’s my setup locally
  • might need to add a new user: sudo -u postgres createuser matt --superuser
  • more good tips on Digital Ocean

postgres with rust

I’m trying diesel, but I might switch to the vanilla, non-ORM rust-postgres package. The ORM functionality is definitely nice, but my projects aren’t that complicated and diesel is still a work-in-progress.

  • for fields that can be null, need to wrap the model types as something like Option<String>
  • rust-dashboard is a good demo project for diesel

On the diesel demo:

  • the full diesel_demo code is here
  • got to play with multirust and rust nightly
  • nice example of a project structure too with multiple binaries and a lib.rs gluing it together
  • datetimes were annoying in diesel and still kind of being worked on, it seems – see my pseudo-lang project for an example. It came down to adding a sql type like created_at TIMESTAMP NOT NULL DEFAULT (now() AT TIME ZONE 'utc') and, in the diesel model definition having a struct field like pub created_at: PgTimestamp where that type comes from diesel::pg::data_types::PgTimestamp

Diesel and Travis-CI:

  • I was getting this error: undefined reference to `sqlite3_errstr' when I tried to cargo install diesel_cli --verbose
  • Travis was building on Ubuntu Precise, I eventually realized, and Precise has a pretty old version of sqlite3 – one that can’t be upgraded cleanly through apt
  • so the fix is just to ask Travis to build on Trusty infrastructure (Ubuntu 14.04). I added this to my .travis.yml (see more in my pseudo project):
sudo: required
dist: trusty

migrations

After I finished the diesel demo I wanted to add a new column, created_at, which would be a Timestamp. I ran diesel migration generate add_post_created_at to make some files, then, in up.sql, I added:

ALTER TABLE posts
  ADD COLUMN created_at TIMESTAMP DEFAULT (now() AT TIME ZONE 'utc');

and in down.sql:

ALTER TABLE posts
  DROP COLUMN created_at;

I guess one day diesel will watch your model changes and actually write this stuff for you.

⊃ national park webcams

I want to start working on a yet-to-be-named project that turns webcam streams into graphs – I’m thinking mostly for industrial spaces, but I could practice on data from public webcams..

The NPS has lots of public webcams – they monitor air quality in Yosemite:

Yosemite webcam

They show some amazing vistas (this is Glacier National Park):

the St Mary visitor center at Glacier NP

and “Electric Peak” at Mammoth:

Electric Peak

And they just have cool tourist shots like at Old Faithful:

Old Faithful

You can also find more mundane streams and do boring stuff like counting cars:

Apgar Village

⊃ visiting biocurious

I went to the biocurious space in Sunnyvale and heard about some cool projects and meetups that they have going on: bioprinting, bioblock and Real Vegan Cheese (ala Muufri).

Also learned about Omnicommons in Oakland – would be cool to check out, and the Pembient project out of Indiebio.

learned about the Biomonstaaar bioreactor:

The Sydney biohackerspace is working on a bioreactor, Bionascent might be doing the same.

⊃ multirust on El Capitan

Here’s how to setup multiple rust toolchains setup on El Capitan.

First uninstall any old toolchains you might have (I just had a vanilla install of stable 1.8):

$ /usr/local/lib/rustlib/uninstall.sh

Then trust brson and grab multirust, this will use the stable toolchain by default:

$ curl -sf https://raw.githubusercontent.com/brson/multirust/master/blastoff.sh | sh

Get GPG so rust binaries can be verified, then add the nightly toolchain:

$ brew install gpg
$ multirust update nightly

Now start a new project (the diesel demo, in my case) and use the nightly-2016-04-09 toolchain:

$ cargo new diesel_demo
$ cd diesel_demo
$ multirust override nightly-2016-04-09

I previously had some path modifications in my ~/.zshrc – I needed to get rid of them and just let multirust manage everything. Then I could install the diesel CLI:

$ cargo install diesel_cli
$ export PATH=$PATH:~/.multirust/toolchains/nightly-2016-04-09/cargo/bin
$ diesel --version
diesel 0.6.1

Read more about using alternative toolchains on the multirust project page.

⊃ postgres on El Capitan with homebrew

This particular mac I’m working with has never had Postgres running locally, so there’s nothing to uninstall and no data to back up, and installing 9.5 is easy (though brew is nice enough to remind you of other upgrade steps if you do need them):

$ brew update
$ brew install postgresql

Then to start postgres with brew’s launchctl manager (more on brew services here):

$ brew services start postgresql

Test it out:

$ createdb new-db
$ psql new-db -c "SELECT version();"
                                                   version
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.5.2 on x86_64-apple-darwin15.4.0, compiled by Apple LLVM version 7.3.0 (clang-703.0.29), 64-bit
(1 row)

⊃ MD5 in rust

Here’s how to compute the MD5 of a string in rust:

extern crate crypto;

use crypto::md5::Md5;
use crypto::digest::Digest;

fn main() {
  let mut hasher = Md5::new();
  let s = "hi there!";
  hasher.input_str(&s);
  let hash = hasher.result_str();

  println!("{}", hash);
}

You’ll also need to add a dependency in Cargo.toml: rust-crypto = "0.2.35" – then you can build and run it:

$ cargo build
$ cargo run
fd33e2e8ad3cb1bdd3ea8f5633fcf5c7

The thing that bothered me with this code was the need for use crypto::digest::Digest; – without that line there is a build error.

$ cargo build
   Compiling hash-test v0.1.0 (file:///Users/matt/hash-test)
src/main.rs:9:12: 9:21 error: no method named `input_str` found for type `crypto::md5::Md5` in the current scope
src/main.rs:9     hasher.input_str(&s);
                         ^~~~~~~~~
src/main.rs:9:12: 9:21 help: items from traits can only be used if the trait is in scope; the following trait is implemented but not in scope, perhaps add a `use` for it:
src/main.rs:9:12: 9:21 help: candidate #1: `use crypto::digest::Digest`
src/main.rs:10:23: 10:33 error: no method named `result_str` found for type `crypto::md5::Md5` in the current scope
src/main.rs:10     let hash = hasher.result_str();
                                     ^~~~~~~~~~
src/main.rs:10:23: 10:33 help: items from traits can only be used if the trait is in scope; the following trait is implemented but not in scope, perhaps add a `use` for it:
src/main.rs:10:23: 10:33 help: candidate #1: `use crypto::digest::Digest`
error: aborting due to 2 previous errors
Could not compile `hash-test`.

To learn more, run the command again with --verbose.

It’s annoying that crypto::digest::Digest is seemingly unused in the code, but the compiler does helpfully indicate that we’re trying to use methods from traits that are not in scope. It even recommends the trait to pull in, so I guess the workflow isn’t too bad.

⊃ neovim

How I got started with neovim – and especially how I set up neomake to work with rust.

basics:

built neovim from source, following the repo’s instructions, then

$ brew install python3
$ mkdir -p ~/.config/nvim
$ touch ~/.config/nvim/init.vim  # the new ~/.vimrc
$ alias nv='nvim'

I used most of my existing .vimrc (my dotfiles are here) with these exceptions:

  • upgraded to vim-plug
  • thought it was a good time to try vim-sensible – subsequently dropped a lot of my old vimrc, that was nice
  • dropped syntastic in favor of neomake – more on this later
  • using deoplete instead of neocomplcache
  • setup a 256 color term and a new theme – OceanicNext and the corresponding airline theme ..and the corresponding iterm2 theme

neomake and rust:

  • you want to install neomake and then call :Neomake! cargo from within vim – despite what the docs say about extra config, this worked for me out of the box
  • and then you probably want to do that on save, so, in init.vim: autocmd BufWritePost *.rs Neomake! cargo
  • see more in my init.vim – you can change the signs and adjust how errors are displayed
  • sadly you have to stick with symbols in the gutter until this PR lands and allows highlighting of columns and rows (well it will with some slight rework)
  • can’t seem to get the gutter setup properly so neomake kind of pops in and out rather annoyingly ..alas
  • maybe consider vim-accio

vim-racer:

  • gd to jump to a definition and then Ctrl + O to go back to where you were

⊃ solar impulse

I saw the Solar Impulse plane land at Moffett Field the other day. It was at the end of a 60+ hour flight from Hawaii. We probably watched it land for over an hour – it’s amazingly slow, only about 30knots!

That rotor blade sound is a helicopter idling behind me..

The pilot, Bertrand Piccard, was in excellent spirits when he emerged from the plane.

bertrand emerges

It was a very cool sight, and when it was over the crew left in a fleet of Teslas, hah! More on their blog here, including some cool shots of the plane circling SF.

⊃ plotting gene expression data

I plotted some gene expression data from the NIH – each plot is the expression of a gene measured with different probes at different time points.

erbb2ip expression

You can see some probes don’t find much activity during gestation, while others see increasing or decreasing levels leading up to a mouse’s birth.

usp7 expression

The code I used to make the plots is here:

"""Plotting gene expression data from a csv"""
import os

import matplotlib
import matplotlib.pyplot as plt
import pylab


# Load data.
path = 'murine-placenta-gene-1-2.csv'
#path = 'human-placenta-gene-1-2.csv'
with open(path) as data_file:
  data = data_file.read()

# Strip header.
line_data, header_data = [], []
for index, line in enumerate(data.split('\r')):
  if index <= 2:
    header_data.append(line)
  if index > 2:
    line_data.append(line)

# Get the embryonic / trimester data (e8.5, e10.5, etc).
embryonic_data = header_data[1].split(',')
embryonic_numeric_data = []
if 'murine' in path:
  for d in embryonic_data:
    if not d:
      continue
    if d == 'P0':
      number = 20
    else:
      number = float(d.split('e')[1])
    embryonic_numeric_data.append(number)
else:
  for d in embryonic_data:
    if not d:
      continue
    if d == 'Pregnancy stage':
      continue
    if d == 'First trimester':
      number = 1
    elif d == 'Third trimester':
      number = 3
    embryonic_numeric_data.append(number)

# Group the probe data by gene -- keyed by gene, contents are an array of rows.
gene_data = {}
for line in line_data:
  tokens = line.split(',')
  if not tokens or not tokens[0]:
    continue
  gene = tokens[1]
  if gene not in gene_data:
    gene_data[gene] = []
  gene_data[gene].append(tokens)

# Find the greatest y value.
max_expression = 0
for gene in gene_data.keys():
  for tokens in gene_data[gene]:
    data = [float(v) for v in tokens[2:]]
    if max(data) > max_expression:
      max_expression = max(data)

# Make a plot for each gene.
if 'murine' in path:
  out_dir = 'mouse-plots'
else:
  out_dir = 'human-plots'

for gene in gene_data.keys():
  figure = plt.figure()
  for tokens in gene_data[gene]:
    probe = tokens[0]
    data = [float(v) for v in tokens[2:]]  # convert the expression values from strings to floats
    plt.plot(embryonic_numeric_data, data, 'o', label=probe)
  plt.title(gene)
  axes = plt.gca()
  if 'murine' in path:
    axes.tick_params(axis='x', pad=15)
    plt.xticks(list(set(embryonic_numeric_data)))
    axes.set_xlim([8, 21])
    plt.xlabel('gestation day')
    plt.ylabel('genechip expression level')
  else:
    axes.set_xlim([0., 4])
    plt.xlabel('trimester')
    plt.ylabel('beadchip expression level')
  box = axes.get_position()
  axes.set_position([box.x0, box.y0, box.width * 0.75, box.height])
  axes.legend(loc='center left', bbox_to_anchor=(1, 0.5))
  axes.set_ylim([0., 1.1 * max_expression])
  matplotlib.rcParams.update({'font.size': 8})
  figure_path = os.path.join(out_dir, '%s.png' % gene)
  pylab.savefig(figure_path)
  plt.close(figure)

⊃ the blockchain

video

⊃ the GIL

⊃ barebones resume

Wrote a tiny utility that makes resumes with markdown – check it out on github!

⊃ projector stand

I was recently given an amazingly tiny 2” cube projector that is surprisingly bright and battery powered (!?) – I made this little couch-mount for it.

The projector screen is made of blackout cloth which seems to work well and was only about $5/yd at Joanne’s. It sits on our bookshelf that’s in front of the couch, and I initially wanted the projected image to be as wide as the shelf. After some fiddling, I found this meant the projector had to be forward of the couch a bit, so I put these measurements in fusion to see how large the arm would have to be.

projector stand

The build came out ok – you have to /really/ crank down on the wingnuts to lock the arm’s position in place. It’s cantilevered out pretty far. The mount slides on to the back of the couch – I made a lot of paper test pieces to figure out the shape and cut out the pieces on a scroll saw.

projector stand irl

The whole thing would be smaller and less prone to wobbling and just all around less finicky if I could bring the projector back towards the couch and away from the screen, just to shorten the mount’s arm. This would make the projected image much larger than our bookshelf – maybe too large – and we’d need to make a bigger screen.

⊃ learning rust

I’d like to do some homegrown machine learning work in rust so I’m going through some tutorials..

rust by example

  • the compiler is great; I have yet to find a way to get compilation results into vim’s quickfix window, and syntastic misses errors – Neomake works ok though (here’s my post on getting that set up)
  • I smiled at the “return on the statement with no semicolon” idea – can also drop the last semicolon if it’s the last statement in a scope, I think
  • underscores for readability in ints is cool – like 1_000_000u32 is one million
  • print formatting:
println!("{:.2}deg C", 23.456);
=> 23.46deg C
  • arrays: known size at compile time and all objects within them have the same type, slices: unknown size at compile time, vectors: growable array type:
// array
let xs: [i32; 5] = [1, 2, 3, 4, 5];
// vector
let mut vec = vec![1, 2, 3];
vec.push(4);
let mut vec2 = Vec::new();
vec2.push(1);
vec2.pop();
// slice (of a vec)
let vec = vec![1, 2, 3];
let int_slice = &vec[..];
// or borrow a whole array as a slice
analyze_slice(&xs);
// or just take a bit
analyze_slice(&ys[1 .. 4]);
  • doing a contains check on an array:
let public_endpoints: [String; 2] = ["/".to_string(), "/login".to_string()];
if public_endpoints.contains(&"hi".to_string()) {
  ..
}
  • discussion on Option (enums with Some and None variants) – you might see these when accessing slices
  • panics are still possible if you index out of bounds on an array :(( – a quick sketch of both points follows
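A quick sketch of those last two points – indexing out of bounds panics, while get hands back an Option you can match on:
let xs = [1, 2, 3, 4, 5];

// xs[10] (with a runtime index) would compile but panic
match xs.get(10) {
    Some(value) => println!("found {}", value),
    None => println!("index out of bounds, but no panic"),
}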

structs, enums and constants

  • unit, tuple and classic C-like structs are all possible – structs can also contain other structs
  • let can destructure a struct
  • use attribute notation to access fields, ala point.x
  • create types with enum – good examples here with variant creation and matching (a tiny sketch follows this list)
  • use with enums brings names in to scope
  • can also use an enum as an integer, just like C
  • the Box is heap-allocated, by the way – recall the heap is memory space shared by all programs, while each thread in an app will have its own stack
  • 'static lifetimes last for the duration of the running program
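A minimal sketch of defining an enum and matching on its variants (the names are made up for illustration):
enum Shape {
    Circle(f64),          // radius
    Rectangle(f64, f64),  // width, height
}

fn area(shape: &Shape) -> f64 {
    match *shape {
        Shape::Circle(r) => 3.14159 * r * r,
        Shape::Rectangle(w, h) => w * h,
    }
}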

casting, literals and inference

  • you use the as keyword..very nice:
let decimal = 123.45678;
let integer = decimal as u8;
  • i32 and f64 are assumed – you can also append the type ala let x = 3u8;
  • with the exception of primitives, types should use CamelCase – relevant when you perform type aliasing: type NanoSecond = u64; (this provides no extra type safety though – anything that’s u64 under the hood can be added to a NanoSecond, as the sketch below shows)
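A tiny sketch of that:
type NanoSecond = u64;

let delay: NanoSecond = 5;
let raw: u64 = 10;
println!("{}", delay + raw);  // fine – NanoSecond is just another name for u64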

match

  • tuples can be destructured in a match – see the example here
  • destructuring (&, ref and ref mut) is different than dereferencing (*)
  • the destructuring in match seems to be done so we can modify the vars in the match block, that is, all of these work (note the second one matches on &x):
let mut x = 5;

match x {
    v => println!("it's just a value, {}", v),
}

match &x {
    &v => println!("now I've destructured a ref, {}", v),
}

match x {
    ref mut v => {
        // deref it here so we can modify it
        *v += 10;
        println!("now I've modified it, {}", v);
    }
}

println!("strangely this is 15, {}", x);
  • for ranges, an interesting @ binding is also possible
  • if let is cleaner than match sometimes
  • while let is a thing too..
  • note that a ref borrow on the left side of an assignment is the same as an & borrow on the right:
let c = 2;
let ref ref_c1 = c;
let ref_c2 = &c;

the rust-lang book

  • rust automatically imports “the prelude” into every program – there is also an io prelude..but I don’t really get what that’s about, just more imports I think
  • because fn main() { doesn’t include a return type, it’s assumed to be an empty tuple
  • String::new() creates a growable, UTF-8 encoded portion of text
  • the :: means new is a static method – a method associated with String itself, rather than with a particular instance of a String
  • you could also write String::from("hi"); to build a String from an &str (string slice)
  • breaking down this line:
io::stdin().read_line(&mut guess).expect("failed to read line");
  • read_line(&mut guess) calls the method on a mutable reference to guess – rust uses these references to reduce copying. References are immutable by default.
  • why expect – it has to do with the return of read_line (an io::Result) – there is an expect method on this type that takes the value it’s called with and panic!s with that message
  • Cargo.toml vs Cargo.lock – commit the lock file for binaries, leave it out for libs. There are ways to update the lock file – sometimes it’s automated and sometimes not, see here
  • hm, not a fan of this: have to call extern crate rand; then use rand::Rng; to get the Rng trait in scope, then we can call rand::thread_rng().gen_range(1, 101); – need to learn more about traits, I think..would be nicer if there was an explicit use of Rng
  • this statement: let guess: u32 = guess.trim().parse().expect("need a number!"); converts guess into an int (and feeds rust the expected type, u32)
  • switch from expect to match if you actually want to handle the error (and not panic) – a quick sketch follows this list
  • _ is a catch all – like if you don’t know what error type you might raise
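A quick sketch of the expect vs match point (roughly what the book’s guessing game does):
let guess = "42";

// panics with the given message if parsing fails
let n: u32 = guess.trim().parse().expect("need a number!");

// handle the error instead of panicking
let m: u32 = match guess.trim().parse() {
    Ok(num) => num,
    Err(_) => 0,
};
println!("{} {}", n, m);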

  • error handling

    • unwrap gets the result of a computation and panics if there was a problem
    • both the Option and Result implement unwrap
    • you’d typically have a function return an Option and then, in the caller, use match to handle the Some and None possibilities
    • in fact, unwrap does case analysis for you, and just panics on the None result
    • map is often used to handle the None => None, Some(value) => Some(f(value)) boilerplate
    • there is also unwrap_or which allows None results to translate into a default value
    • Result is a “richer” version of Option – it expresses the possibility of Error(E) or Ok(T)
    • the docs say you should use Result when you can – when it’s possible to explain why something failed
    • there are “Result idioms” like Result<i32> which fixes the error type to a particular result (like ParseIntError) for convenience
    • unwrap is not conventional unless you’re just writing something quick or when there truly is an error in the code that unwrap would expose – expect is the equivalent on Option types – it just gives you something nicer to print
    • convert an Option to a Result with ok_or
    • and_then chains computations when there could be an error (a small sketch follows this group of notes)
    • you can also return early with the explicit return keyword
    • and the try! macro abstracts away the early return pattern (it returns the value or an error) – more notes on try! here
    • rule of thumb: define your own error enum, but a String is ok too, especially in apps:
#[derive(Debug)]
enum CliError {
    Io(io::Error),
    Parse(num::ParseIntError),
}
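And a small sketch of ok_or plus and_then chaining (this function is made up for illustration):
// parse the first item of a slice and double it, threading errors along the way
fn double_first(items: &[&str]) -> Result<i32, String> {
    items.first()
        .ok_or("slice was empty".to_string())                       // Option -> Result
        .and_then(|s| s.parse::<i32>().map_err(|e| e.to_string()))  // chain a fallible step
        .map(|n| 2 * n)
}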
  • closures
    • syntactic sugar around traits.. a section I have yet to cover
    • pipes denote closures (lambda expressions):
let plus_one = |x: i32| x + 1;
assert_eq!(2, plus_one(1));
  • variables and bindings

    • “shadowing” is just establishing a new variable binding with the same name as another binding that’s currently in scope – just calling let again.. You can also change the mutability of a variable when shadowing (sketched below)
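A tiny sketch of shadowing, including the mutability change:
let x = 5;
let x = x + 1;      // shadows the old binding; x is now 6
let mut x = x * 2;  // shadow again, this time as a mutable binding
x += 1;
println!("{}", x);  // 13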
  • functions

    • rust is an expression-based language and expressions (unlike statements) return values – let, though, is a statement / declaration
    • vars can also bind to functions: let f: fn(i32) -> i32 = plus_one;
  • primitives

    • u8 is an unsigned 8-bit integer (0 to 255) while i8 is a signed integer: -128 to 127
    • slices – to get everything: let complete = &a[..];
    • you can destructure tuples with let, ala: let (a, b, c) = (1, 2, 3);
    • tuples you access with dot notation: tuple.0, whereas arrays use brackets: array[2]
  • comments

    • rustdoc and the doc comments (///) seem cool..
  • if

    • if is an expression so you can do let y = if x == 5 { 10 } else { 3 }; because the value of the expression is the value of the last expression in the chosen branch
  • loops

    • loop just goes infinitely and the compiler handles it better than while true
    • you can enumerate for loops: for (i, v) in (5..10).enumerate() { .. }
    • break and continue work as they do in python
    • cool – “loop labels” let you break or continue out of a specific inner or outer or somewhere-in-between loop: 'outer: for x in 0..10 { .. } – a short sketch follows
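For example, a short sketch of a loop label in action:
'outer: for x in 0..5 {
    for y in 0..5 {
        if x * y > 6 {
            break 'outer;  // jumps all the way out of the outer loop
        }
        println!("{} * {} = {}", x, y, x * y);
    }
}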
  • ownership

    • if you bind a variable, then pass that var to a function, the fn now owns the var and you won’t be able to use it..so you use refs
    • traits are annotations to types that change the behavior of the type
    • all primitives implement the Copy trait so that copies of data are assigned if a variable is used – so really that first bullet I wrote is only applicable to non-primitives, like for Vectors for instance
    • (aside: you can use a leading underscore on unused vars to prevent a compiler warning)
    • references are used in function signatures, fn test(v1: &Vec<i32>) -> i32 { .. }, and at call sites: let answer = test(&vector);
    • plain & references are immutable – so you can’t push new values into a vector through one (see the sketch after this list)
    • you can also do weird stuff with &mut refs – you may see the deref syntax with * for stuff like this
    • there are borrowing rules – one or more &T refs to a resource, but /only/ one mutable reference &mut T to prevent data races
    • scopes matter and the compiler will give you good info about where scopes begin and end
    • even printing borrows: println!("{}", x); – that’s a borrow on x
    • this all prevents stuff like..modifying collections over which you are iterating, and use after free
    • declaration order matters! you need to define let x = 5; before you can: let y: &i32 if your plan is to y = &x;
    • lifetimes: declared in the <> of a function call, ala: fn bar<'a>(x: &'a i32) – other “generic parameters” can be in the <>, by the way
    • could also have &'a mut i32 as “a mutable reference to an i32 with the lifetime 'a”
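Pulling a few of those reference notes into one sketch (function names are just illustrative):
fn sum(v: &Vec<i32>) -> i32 {
    v.iter().sum()  // only needs to read, so an immutable borrow is enough
}

fn add_one(v: &mut Vec<i32>) {
    v.push(1);  // pushing requires a mutable borrow
}

fn main() {
    let mut vector = vec![1, 2, 3];
    let answer = sum(&vector);  // immutable borrow ends with the call
    add_one(&mut vector);       // now a single mutable borrow is fine
    println!("{} {:?}", answer, vector);
}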
  • strings

    • unicode scalar values encoded as a stream of UTF-8 bytes
    • &str: string slices – has a fixed size
    • string literals are statically allocated (saved inside the compiled program) – let greeting = "hi there"; is of type &'static str
    • you can tell if you’ve got a string slice with the 'static lifetime if the data of the string lives in the code itself
    • multiline via \ or just continue writing to preserve the whitespace
    • Strings are heap-allocated and growable – they are often created by converting a string slice via to_string
    • convert the other way (String to &str) via & ..or sometimes &* if a trait is needed (sketch below)
    • use .as_bytes() or .chars() to index a string
    • slicing works, though you get byte offsets, not char offsets
    • concatenate with + or with format – though the latter may not be the most efficient
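A small sketch of going back and forth between &str and String:
let greeting: &'static str = "hi there";       // string literal, baked into the binary
let mut owned: String = greeting.to_string();  // heap-allocated and growable
owned.push_str("!");
let slice: &str = &owned;                      // String -> &str via a borrow
let line = format!("{} ({} bytes)", slice, slice.len());
println!("{}", line);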

piston tutorial

  • impl provides the ability to use the “method call syntax” – circle.area() or something chained like finances.report().send()
  • they also recommend the builder pattern

other notes

  • you can use cargo like pip via cargo install pulldown-cmark – this looks to crates.io for the referenced markdown processor and, if a binary target is specified in the repo’s Cargo.toml, it’ll install to your system. ..Although the two markdown bins I just installed aren’t working for me :| markdown and pulldown-cmark are both having issues.
  • rust-clippy and cargo-clippy are neat linters
  • a set of notes on cross-compilation
    • seems like if you want to distribute a binary for windows or some other platform, the advice is to build it natively on that platform, or take advantage of CI services like Appveyor
  • this repo takes a swing at many of the Rosetta Code tasks
  • the rust-beginners IRC channel is awesome

on building web services

server.get("/", middleware! { |_, mut response|
  response.set(MediaType::Html);
  let path = Path::new("assets/index.html");
  return response.send_file(path);
});
  • the diesel ORM examples look good
  • and maud is a templating engine that might be handy

rustlings exercises

  • hm gotta read more on borrowing..why is the type signature of &array different than array (if let array = [1,2,3];)
  • there is no assert_not_eq! macro but you can just negate stuff with !: assert!(!false)

pnkfelix exercises

Alex Crichton’s talk (slides)

  • there is explicit memory management and reference counting if you want it via std::rc::Rc
  • values can be frozen by borrowing – his example:
let mut a = Vec::new();
{
    let b = &a;
    a.push(1);   // error
}
a.push(2);  // valid -- the b borrow has ended
  • mutability propagates deeply into owned types (think nested structs)
  • spawn(proc() { .. }); is a nice way to achieve parallelism (that’s pre-1.0 syntax – these days it’s thread::spawn(|| { .. }))
  • an enum can have data – and you can do something with that data when you match
  • & pointers are never null – but you can have the Option type be None
  • there can only ever be one mutable pointer to your data
  • most FFI happens in unsafe blocks since you can’t guarantee a foreign function’s signature or how it affects the stack

too many lists tutorial

  • Box<T> provides simple heap allocation – see this answer for more: “One way to regard Box<T> is that it’s a normal T with the guarantee that it has a fixed size.”
  • on non-static methods: & is for methods that only want to observe self – most methods actually don’t want self as taking it would prevent other locations from accessing it
  • I think Option as well as Some and None (Option variants) are part of the prelude import
  • mod creates modules (and adjusts namespaces.. you might need something like use super::List;)
  • and the [cfg(test)] block tells the compiler to only compile a block during test
  • the Drop trait is a destructor – if you contain types that implement Drop, you don’t need to implement it on your own
  • impl Drop for List is the syntax for implementing a trait (an interface)
  • while let will do a thing until a pattern doesn’t match
  • Option has a method, take, that is very similar to the mem::replace(&mut option, None) idiom (taking the value out of the option, leaving a None in its place)
  • match option { None => None, Some(x) => Some(f(x)) } is also a popular idiom – map + a closure (the | syntax) handles that boilerplate for you (quick sketch after this list)
  • generics..just sprinkle in the <T>s – see this example – used it to make a linked list handle datatypes besides i32
  • lifetimes: the name of a scope somewhere in a program – when refs are tagged with a lifetime, we’re saying the ref must be valid for that entire scope
  • you work with lifetimes at the type and API level, telling the compiler about the relationship between different lifetimes – this isn’t necessary within a function
  • examples of lifetime elision:
    • one input reference means the output must be derived from that input
    • many inputs: assume they all have independent lifetimes
    • with methods: assume all output lifetimes are derived from self
  • and there is more with Arc..maybe later
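A quick sketch of take and the map idiom:
let mut maybe = Some(5);

let taken = maybe.take();            // maybe is now None, taken is Some(5)
let doubled = taken.map(|n| n * 2);  // Some(10) – the None => None case is handled for us

println!("{:?} {:?} {:?}", maybe, taken, doubled);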

learn x in y minutes

  • an example of “more advanced matching” where there is an additional if tree – that is apparently a “match guard” – see here
  • terse examples of references and borrowing

my friend Suchin’s post on matrix multiplication

  • nice example of fold and ArrayView

learn you a rust for greater good

  • talk of borrows and mutable borrows in the second part especially
  • by default, variable bindings have “move semantics,” but some types implement Copy (giving them “copy semantics”), meaning we don’t have to pass around refs all the time – this is true for the primitives

24 days of rust

the rust-lang blog

  • on traits
    • the cornerstone of abstraction in rust – “the sole notion of interface”
    • they can add methods to an externally defined type
    • interfaces specify the expectations that one piece of code has on another, allowing each to be switched out independently – for traits, this specification largely revolves around methods
    • (see the simple examples with print_hash in the linked post)
    • like templates in C++, these abstractions will compile down to concrete code, but unlike in C++, code that consumes traits will be type-checked in advance
  • on FFI

the book’s notes on crates and modules

the glium tutorial (OpenGL)

⊃ Matthias Pliessnig and Curtis Buchanan

a short profile on the woodworker, Matthias Pliessnig:

The boat he built at RISD looks amazing – the Montfort Classic 12. I’d love to see more of his boats and more info on the steaming process.

and another profile on the chairmaker Curtis Buchanan:

“Contentment is underrated,” he says. Love the story on how he got started with this – over the last 30 years he’s probably made 1500 chairs.

⊃ git on El Capitan

Saw this article on open vulnerabilities with the version of git on El Capitan. The gist is that Apple makes it hard to even find where the binary is located – the process of actually symlinking in a new one is even more obfuscated.

I installed an upgrade with homebrew, moved the old git binary and linked in the new binary:

$ brew install git
$ sudo mv /Library/Developer/CommandLineTools/usr/bin/git /Library/Developer/CommandLineTools/usr/bin/git-apple
$ sudo ln -s /usr/local/bin/git /Library/Developer/CommandLineTools/usr/bin/git

git claims to still be in /usr/bin, but the version is now correct so it seems fine:

$ which git
/usr/bin/git
$ git --version
git version 2.8.0

⊃ Puerto Rico 2016

Puerto Rico! Got to see San Juan, Vieques and Culebra, very fun.

⊃ servers

various admin notes:

  • I have fail2ban setup on kepler, I used this to open up access to port 8888: sudo /sbin/iptables -I INPUT -p tcp -m tcp --dport 8888 -j ACCEPT
  • sometimes images don’t get the right permissions when I put them in static/img – check permissions with stat -f %A reference-file on a mac, and set them with something like chmod 644 target-file

⊃ setbot

I’ve been working on a computer vision system that can play the card game Set. It recently beat a human! (Though it was a close game..) If you want to read more about the pipeline and see the code, the github repo is here.

setbot posing by the window

  • an overhead webcam watches the game
  • a “white isolation” technique finds all the cards
  • each card is sent to a keras-based convolutional neural network that was trained on about 200k cards – the classifier reports the card’s characteristics
  • once the board is understood, setbot picks three card combos and checks if they are sets
  • all identified sets have an attached probability (based on the classifier’s confidence in card-identification), so, in the typical game play mode, it outputs only the “most likely” set, for example:

gameplay window

image recognition with OpenCV

  • I converted the webcam image to HSV space and set thresholding ranges to find bright white contours – I tried HSL but had no luck for reasons I can’t remember
  • this post on playing 24 was very helpful – I used a lot of similar techniques

the classifier

  • I’m still learning about designing these things so I used a CNN topology from some keras docs – there are four convolution layers and two max pooling layers. The full model image is here
  • there is dropout but no regularization
  • I used the Adam optimizer but didn’t find it to be drastically better than SGD
  • this gets to about 90% accuracy in training and seems about as accurate with real life images
  • I found some weird inconsistencies with keras – increasing the size of the validation set would make my model stop converging during training..which makes no sense. So I think there is some weird instability there – I’ll probably look to vanilla tensorflow from now on. Though keras was rapidly getting updates as I was using it, so it may be fine now.
  • the keras-based code is here
  • I trained on an 8 core digital ocean box (CPU only, unfortunately) – though once I started using a generator to feed in the training batches, I was only using about 4GB of memory so I could’ve downgraded

image generation

  • the synthetic images were fun to make – here is one of them, a particularly obscured variant of 3-red-striped-oval: synthetic-image
  • the code to generate the vanilla SVG cards is here
  • and the code to mutate these cards to create a large number of variants is here
  • I originally used 50 x 78px images but found that I was losing color info and having a hard time discriminating between, like, green and purple empty shapes. So I went up to 100 x 156px images at the expense of training time.

next up

  • demo video or something..
  • actuators to slap the cards? there was talk of pneumatics, robot arms, mousetrap-like things..
  • maker faire?
  • better performance in verbose mode (aka trashtalk mode)
  • an app
  • a website?

⊃ alphago

on the DeepMind Challenge:

⊃ nw earthquake

reading the very popular new yorker article from last year:

⊃ food preservation

I attended a nice talk on food preservation at the Mountain View library:

  • The lecturer (Candace?) is a Master Food Preserver – similar to master gardener designation, they are connected to extension programs which are themselves connected to land grant schools (these would be interesting to read more about)
  • they seek to be research-based and fight “gramma’s way” – she really stressed that you should only use approved recipes
  • preservation is pretty broad – there’s canning, freezing, fermenting, pickling, dehydrating..
  • she likes making..
    • ketchup – very different than store bought, she says. Tomato sauce on the other hand, not that different from a good quality store-bought sauce
    • green chiles, hatch variety from NM – you can make salsas or freeze them and make chile rellenos. First you roast them (or get them roasted from Molly Stone’s at the end of August).
    • brandied cherries
    • pears in rum
    • quince jelly
    • feijoas
  • surprising preserved things..
    • salmon candy
    • diy canned tuna (pretty good apparently) – it’s trendy to age it now, or at least there was a New Yorker article about it
    • pickled watermelon rind and pickled pears
    • persian cucumbers for pickles
    • Injera bread is fermented, as is soy sauce – made with vegetables in a salt brine
    • freezer jams are a thing
  • when making acidic foods use a water bath canner (jelly, preserves); low acid foods need a pressure canner (meat, poultry, fish, corn)
  • can’t use a pressure cooker though.. not necessarily airtight and cools too fast
  • recommended jars: Ball or Kerr, with a two-part lid – but don’t actually store with the ring on! This traps moisture.
  • when you can, the lid should be super tight and won’t have that trampoline feel / sound
  • also need to follow a specific recipe – acidity has to be in the right neighborhood and the bath needs appropriate heat
  • jars will be super hot before they’re filled – they need to be that way so they don’t break under heat shock
  • quart is the biggest you should go for home canning, fermenting can go bigger
  • boiling water better than those steam canners, she thinks
  • siphoning happens – may see some tomato juice in the boiling water
  • useful books:
    • So Easy to Preserve
    • The Ball Complete Book of Home Preserving
    • USDA Guide to Home Canning
  • useful sites:
  • have to use special salt for fermenting – kosher or pickling or noniodized sea salt
  • can microwave herbs for preservation… Cool
  • dipping in ascorbic acid can prevent browning before dehydrating or freezing
  • canned food is safe and good for a year, two years is safe but not as tasty

⊃ nefertiti scan

Lots of interesting articles about the validity of the recent “Nefertiti Hack” – an alleged “heist” of the bust of Nefertiti on display in Neues Museum Berlin.

Two artists recently put a very detailed 3D model of the bust online, and they claim it came from a scan they made with a Kinect hidden beneath a scarf. Others now claim that the scan could not have come from a Kinect, and this may be a coverup for a true digital heist of the 3D model from the museum’s servers, or a sympathetic party within the museum gifting the model to the artists.

Very elaborate! See here for more.

⊃ Stephen Wolfram

The CHM had Stephen Wolfram talk on his biography and, incidentally, a history of Mathematica.

  • ipython took a lot from the wolfram language’s interactive demos.. still cool – the dynamic edge detection demo didn’t draw too much applause though, hah, maybe old hat for this audience..
  • hand drawn lunar orbiter..cool
  • truly a kid inspired by the space program..a living, breathing example
  • he sent in a paper called hadronic electrons? At what age?
  • he just posed scientific questions and found the answers in books or on his own
  • early user of computer algebra, possibly the largest in ‘76 – would devise huge formulae in his papers with MACSYMA
  • known for cellular automata, particle physics
  • Caltech PhD at 20

wolfram

  • started working on his own symbolic manipulation language, SMP
    • told by Rob Pike to use C, “the lang of the future”
    • encrypted the source and lost the password… lol, they recently broke it with crowdsourcing
    • ran it for us live in a VAX emulator
    • it’s very similar to Mathematica – ASCII graphics, random expression generators
  • worked with Tsutomu Shimomura
  • started Mathematica at Urbana Champaign
    • Steve Jobs insisted on the name “Mathematica” over Omega
    • contends that TBL’s group at CERN bought NeXT machines to use Mathematica – and TBL used those machines later for a networking experiment..
    • for 10.5 yrs Wolfram used Mathematica for basic research, working as a self described hermit CEO
    • cool story of an American aboard MIR, Dr. Mike Foale, using Mathematica to analyze the tumbling space station – detailed further here
  • finished the “A New Kind of Science” book and came back to his company to work more on the tech – incidentally many Amazon reviewers find that book insufferable :/
  • the company used Wolfram tones, cellular automata to create ringtones – hah, even internally they thought it was silly
  • Launched WolframAlpha via Justin.TV because they were worried live visitors would crash the site
  • WA powered Siri for a while.. Bing too
  • Sergey interned there..advocated giving software away
  • wants to get Wolfram lang into financial prod systems
  • Mathematica used in LIGO, proof of Kepler conjecture (although this page suggests it’s incomplete), AdSense algo, Skype codec..
  • Emerald Cloud Lab uses Wolfram language behind the scenes
  • rule 30 of a cellular automata – the chaotic triangle, very cool to experiment with
  • A single page of axioms can describe most of math..and 3M theorems – what are standard axioms, Sheffer axioms, basic axioms?
  • 3M person years of work, that’s math.. Lots of duplication too, he says

⊃ homebrewing 101

I went to a talk on homebrewing at the Mountain View library.

There are lots of books and online resources about the subject – here are some small tidbits that caught my attention:

  • extract brewing vs full grain brewing: the former is smaller scale and takes less equipment, the latter is more extensive, offers more control
  • the time at which the hops get added affects the bitterness and floral taste
  • SRM is a color scale – 35+ is a porter, pilsner is like a 2
  • IBU is a bitterness scale – but bitterness can be masked by malt and other flavors, and the human palate can only distinguish up to about 110 IBUs according to this site
  • specialty, no rinse sanitization products exist, and are recommended – makes cleaning way easier

⊃ kitchen stand

Making a kitchen stand to go next to our stove..

top surface

I’d like the top to be a cutting surface – a butcher block design with endgrain facing up is popular. People say this is easier on knives as the grains can split a bit and give way as the blade descends. The fibers can also reseal themselves and the surface won’t show as many cutting marks.

Designs using softwood endgrains can be better than those with hardwood edge grains – more info here. I’m all for using cheaper softwoods.. Though boards made with maple, oak, beech, walnut, cherry or ash will be more ding-resistant and less likely to warp. Bamboo is a popular choice too.

some projects:

Should the top be detachable..? That’d be interesting.. but it doesn’t seem to be a common thing that’s done. Looks like protecting and cleaning these boards isn’t too hard.

pricing the top surface

The top surface will be 25 1/4 x 29” at most -> 732in2.

  • 1.5” thick -> 1098in3
  • 2” thick -> 1464in3
  • 3” thick -> 2196in3
  • Menard’s cherry 1 x 6 x 8’ – $41.25 – $0.104 / in3
  • red oak 2 x 6 x 8’ – $76.69 – $0.096 / in3
  • soft maple 1 x 2 x 10’ – $14.20 – $0.105 / in3
  • Bruce Bauer #2 pine 2 x 2 – $0.69 / lf – $0.026 / in3
  • redwood 2 x 2 – $1.00 / lf – $0.037 / in3
  • maple 1 x 4 – $2.70 / lf – $0.086 / in3
  • maple 2 x 2 – $8.48 / lf (!)
  • OSH oak 12 x 3 x 2’ – $3.49 – $0.097 / in3
  • HD “whitewood” 2 x 4 x 8’ – $2.56 – $0.005 / in3

so a 3” thick surface made from 2x4s should cost around $11 – seems like that’s worth a try.

stand designs

several months later..

I used oak + spruce + pine and apparently got a piece of beech too, according to George:

island tabletop dryfit

Nora and I hand planed the top quite a bit..

island tabletop planing

I stained with danish oil and made a stand with maple, came out pretty good!

kitchen island completed

⊃ Nissan Leaf api

Oof – some folks have found an unprotected API into the Nissan Leaf.

The car has an associated app that can set charging parameters, control the AC, show trip history and even expose a bit of owner data. The authors found that a straightforward, unprotected HTTP API supported the app, allowing curious people to enumerate VINs until they found someone else’s car.

The authors were able to find a valid VIN in a short amount of time by just testing variations on the last five digits of a known good id. They also pointed out that, since VINs are often visibly displayed on the front window of a car, you could just read them off..

Thankfully the Leaf doesn’t provide remote unlock or remote start via the app.

(Reading this guy’s blog led me to a note on a Rolls Royce model with GPS-aided transmission ..insane.)

⊃ silverware tray

I’m making the most over-designed silverware tray, haha. Spent a long time drawing ugly utensils in Fusion to come up with this innovative design :|

silverware tray render

I can’t get the align tool to work so, according to the design, my cutlery will be levitating..

And here’s the final thing! Completed with birch plywood and some hot glue joinery. Now to get some more silverware..

silverware tray finished

⊃ And The Pursuit Of Happiness

Spent this morning reading And The Pursuit of Happiness, Maira Kalman’s wonderful collection of watercolors and meditations on the country, her travels and her life.

Much (all?) of this book comes from her NYT blog of the same name. I love her portraits.

fritz

green-jacket

And her thoughts on invention:

Everything is invented. Language. Childhood. Careers. Relationships. Religion. Philosophy. The Future. They are not there for the plucking. They don’t exist in some natural state. They must be invented by people. And that, of course, is a great thing. Don’t mope in your room. Go invent something. That is the American message.

⊃ shadertoy eyeball

Working on an eyeball in shadertoy after Otavio’s tutorial.

I have a long way to go – it’s such a mind bending new way to look at drawing stuff on a screen.

⊃ uber redesign

“By trying to be everything to everyone, Uber has lost its recognizability as a brand.” From Eli Schiff’s notes on the Uber redesign.

I don’t know much about “brand science” (is that a thing?) but his article points out a lot of bad-sounding practices – micromanagement coupled with prolific and somehow unguided design.. color palettes for everyone.. a radically different logo..

I read elsewhere about their design team – there are 200 people! Maybe the rebranding effort was led by a smaller team..

⊃ Scalia

I was reading Scalia’s obituary in the NYTimes and, of course, getting caught up in his history and what the nomination of his replacement will entail.

The man’s practices are hard to stomach – his following of “originalism” (an attempt at interpreting the constitution in a manner similar to that of the document’s authors) led to very conservative outcomes, but he also famously cast doubt on the theory he championed: “I am a textualist. I am an originalist. I am not a nut.” An author, David A. Strauss, replied, “If following a theory consistently would make you a nut, isn’t that a problem with the theory?”

He disparaged homosexuals and opposed abortions on the grounds that they were not “rooted in the traditions of the American people.” He called for a federalist interpretation of the law, yet sided with the majority to halt Florida’s recount in Bush v. Gore.

This distressing reading was alleviated by the discovery of his close friendship with Ruth Bader Ginsburg.

Scalia and Ginsburg

Which then led me to reading about the Notorious RBG ..might have to pick that one up!

⊃ js dev

Hilarious post from @pistacchio about developing in javascript nowadays –

I’d say I’m in largely the same analysis-paralysis loop when it comes to upping my js skills. I guess I’d still go back to React and maybe check out Clojurescript, but the compiling / testing / packing landscape is quite confusing to me.

⊃ LIGO

The LIGO team today announced they have positively identified a gravitational wave – an amazing discovery. The livestream:

  • 1.3 billion light years away, two black holes coalesced and produced a 10^-21 peak strain in two of the LIGO detectors – the deflection itself was, according to this post, about 10^-18 m, or about the size of a hydrogen nucleus
  • they detected a signal at the Livingston, Louisiana site (?) and then, 7ms later, the same signal at the Hanford, WA detector
  • they know it’s a black hole merger because the signal aligns so well with a numerical model of such an event
  • see Physical Review Letters for the paper
  • whoa, the frequency of these waves is in the human hearing range, cool – they shifted it in frequency and it sounds like a bubble popping underwater
  • these gravitational waves are strains in space and move at the speed of light, as predicted by Einstein. He actually did some back of the envelope math to examine these ideas further, but concluded no detector could be built for empirical validation
  • since Einstein, astronomers have found compact objects to observe (neutron stars and these binary black holes) and the detectors have gotten far more precise
  • the mirrors and lenses in the sensors are suspended to handle Earth’s vibrations
  • Kip Thorne worked on this too – he was the science advisor for Interstellar
  • during the 20ms of the collision the power output was 50 times greater than the power output of all of the suns of the universe..wow. The energy output was what you would get if you annihilated 3 suns and turned them into gravitational waves.
  • Kip points out that past views into the universe were exclusively electromagnetic, so he expects we might find even more interesting things

⊃ podcasts

some podcasts I like:

  • TAL – the classic
  • ReplyAll – various internet-related stories
  • SneakAttack – funny show that’s just a recording of five people playing D&D
  • The Specialist – tales of odd jobs, made by a friend
  • TLDR – an older show from the main ReplyAll guys
  • Mystery Show – Starlee Kine converses with the world
  • Planet Money
  • Freakonomics Radio – really excellent in-depth looks at the minimum wage, deliberate practice, payday loans, guaranteed minimum income and the list goes on..

⊃ Frank Howarth makes sake trays

Great video showing Frank Howarth making sake trays for a Japanese restaurant:

It’s a cool view into a prototype / feedback cycle.

⊃ headspace

The headspace app has been huge for me – I’ve used it since the last few months of 2015, doing 10 and 20 min exercises about every day. I especially appreciate the focus on “the now” that the app helps remind me of. And it has helped me bring a feeling of peace and well-being to my life. The “creativity module” has been challenging but rewarding, some very interesting techniques in this one.

And the “7 Weeks” app has helped me continue the practice of this and other good habits – I can definitely recommend that tool as well.

⊃ the Dean Scream

A short documentary on the “Dean Scream” from 2004.

  • bonus Dave Chappelle clip..
  • probably with the benefit of hindsight, Dean himself says in the piece that he succumbed to his desire to get people fired up
  • before the Iowa caucuses he was around the top of the pack (alongside Kerry, Edwards and Lieberman), but was slipping a bit because of his temperament – or the narrative about it
  • he gave the famous speech on the night of the Iowa vote, a vote in which he came in a disappointing third
  • other footage from the event shows it was really loud – there were 3500 people there
  • he actually cracks a pretty funny joke about it..
  • and most people in the documentary claim that, rather than the scream being his undoing, it was his lack of a result in Iowa that sunk his campaign (he left the race soon after)
  • he was the first candidate to raise millions online, but failed to galvanize more than the youth vote

⊃ ideas for 2016

some things to do this year..

places to see

  • go to the Lost Coast
  • ride around Desolation Wilderness
  • styx pass
  • do a double century..or maybe a double metric (125 miles)..that’d be a good start

things to make

  • two chairs
  • finish quandry
  • take something to Maker Faire (in May) – setbot was too late :((
  • bean bag bot
  • primitive bow / red oak bow on instructables
  • projector + imu – “window” into another space

things to learn

  • work on that linear algebra book in rust – or simple classifiers in rust
  • AVC car dynamics test
  • the “home court advantage” idea – is it just about the time spent traveling?

⊃ udacity deep learning lesson two

notes from lesson two of Udacity’s Deep Learning course

linear models

  • limited in their number of parameters – (N + 1) * k where N is the number of inputs and k is the number of classes
  • The calculation of the number of params is instructive – recall for a 28px x 28px input image, the weight matrix is 28 x 28 x 10. This is because you need to multiply each pixel in the image by some weighting parameter, or specifically one param per class. (And remember that we turn the image into a giant 1 x 784 row vector and the weights into a big 784 x 1 column vector – there are ten sets of these so the biases are just one value for each set. So don’t get too caught up in the dimensionality of results.) This makes for 28 x 28 x 10 + 10 = 7850 total parameters.. which is actually small for our purposes, says the instructor.
  • linear models are efficient (via GPUs), stable (in that small changes to the input result in small changes to the outputs), and their derivatives are constants
  • but we also want to model nonlinear events..enter the ReLU – ReLUs evaluate to zero on x < 0 and y = x on x >= 0
  • we can now inject ReLUs into the network instead of using a single matrix multiply – this creates a nonlinear function and gives us H, the number of ReLUs in use.. we can make this quite large

network of relus

another view of the operation stack:

NN operation stack

  • and differentiating this stack is not that difficult because of the chain rule – we just multiply the derivatives of each element
  • back propagation is this breakdown to determine the gradient of the weights – differentiating the model at some point in time

TensorFlow

Getting the first taste of TF in Assignment Two – with TF, you first set up a computation graph inside a with graph.as_default() block. This includes..

  • bringing in the data
  • initing the weights and biases
  • setting up the logits (train data * weights + biases)
  • calculating loss (via the average of the cross entropy of the softmax)
  • setting up an optimizer with some learning rate
  • and setting up some predictions (for later reporting)

Then, within another block with tf.Session(graph=graph) as session you..

  • run an init to actually do the setup described above
  • run the optimizer, compute the loss and get the trained predictions (all in one line)
  • occasionally print the validation accuracy
  • and finally print the test accuracy

I’m not understanding how the ReLUs are applied by TF here.. ah, we have to add them via the assignment!

We can also use SGD via a slightly different computation graph employing placeholder nodes – an offset is picked randomly to form a randomly selected minibatch. The placeholder nodes are fed this minibatch within the session.

ReLU setup, two-layer network:

  • if you have 1024 ReLUs, you’ll have a weight matrix with dimensions input-data-size x 1024 and a 1024 x 1 bias vector
  • your first set of logits will be a ReLU-wrapped Wx + b computation
  • then you’ll have the “output layer” with its own set of weights that is 1024 x number-of-labels in size plus a bias vector that is number-of-labels x 1 in size
  • the output logits will be Wx + b where x is the hidden layer output
  • then this output will be fed into the softmax function (or something equivalent)..

Another set of TF tutorials. Keras also looks awesome.

⊃ headphone case

Making a case for my audio-technica ATH-M30 headphones.. mediocre headphones at best, but they get banged around in my bag which doesn’t help things.

Here’s a model of the headphones themselves, I wasn’t sure how to best approach this so I started modeling the earpiece, and then worked around from there.

headphones model

The headband didn’t “close” into a proper semicircle when I mirrored everything – you can see the little imperfection on the far right.

Here’s Keqing making an awesome headphone concept with mostly sculpting tools:

Based on his video, maybe I should’ve built everything but the headband, mirrored all that, and then dragged the headband around in a single, unmirrored arc.

⊃ prius key

Working on an enclosure to fit my electronic key fob – it’s needed to start the car, but also has lock, unlock and alarm buttons. It uses a little CR2032 battery.

the gameplan..

  • model the PCB of the key
  • model the battery
  • create an enclosure:
    • sculpt a form slightly small, then pull it out to be larger than the PCB
    • shell it, probably with an inner shell if you’re going to expose button holes
    • split it in half
    • add registration holes on the rim of the shell so you can get a press fit and close the case – here are some press fit tips
    • make some little cutouts in the case so you can separate the shell by twiddling a screwdriver against the edge
    • the plan is to put the PCB in the case button-side down, then the antenna and battery connector will be facing up – set the battery in place (:/) and press down the top enclosure
    • buttons will be accessible via little holes cut in the case – I’ll superglue some fabric over the holes so you can press the button, but the buttons will stay protected
  • send the enclosure for 3D printing

enclosure

What to do about the battery.. I could have it just press into place with the back of the enclosure holding it in. ..But what’s holding the front of the enclosure in place? The PCB has no holes or anything to register on. The original key had a screw-on cover..very annoying.

next rev

  • remember for 3d printing..
    • press fits made in ABS will shrink, so you need to add some room for that (or file)
    • make sure you have flat surfaces
    • consider the support structure that will be needed
    • skip shelling it – just combine/cut with the pcb itself and then expand that cavity – this will make sure your press fit pegs have somewhere to bite

⊃ iowa caucuses

Just saw this amazing video on how the the Iowa caucuses actually work.

It’s a more energetic and entertaining system than I imagined, one that (ideally) allows discourse and neighborly, one-on-one persuasion to have a larger impact on how one chooses a candidate.

⊃ fusion 360

I just learned that Fusion 360 is free for “enthusiasts” – it’s an excellent modeling program. I tried 123D Design and had issues with it on mac. In the past I’ve used Sketchup, OpenSCAD and, to a limited extent, Solidworks and FreeCAD, but Fusion 360 is my favorite of the bunch.

  • components are containers for bodies – they’re bound by their own degrees of freedom and maintain their own origins and reference planes
  • joints are used to govern positioning and motion of assembly components – as-built joints can be used to setup how parts should move in relation to each other once the thing is actually created
  • the timeline is really cool – editing old sketches is super easy
  • sculpting is also a cool mode that I had not seen before – see the bike seat video below. You can create more organic things in a free-form way.
  • the patch workspace seems to be for creating surfaces, as opposed to the model workspace which creates solid geometries
  • there’s some discussion on parametric vs direct modeling – it seems like parametric is what I’m used to from Solidworks: defining variables and basing designs off of other surfaces and dimensioned drawings. When one edge moves, others react accordingly.. Direct seems to be just any nonparametric modeling (like the sculpting mode).
  • “Some components have moved” – what’s up with that message (and with capturing position)? It’s there because some operations are position dependent. Here is a well-explained example.
  • hide previous sketches if you want to sketch something new, but you don’t want it to be part of an old sketch
  • consider leaving bodies visible when sketching new components – you can use that old geometry. I have a tendency to hide them or project parts of them. I don’t think it’s always necessary to project..
  • you can project axes into a sketch (though they may be small / hard to find).. that’s a good trick
  • contact sets are very neat – allow you to easily test motion and interference (see the corkscrew tutorial, part two)
  • if you’re trying to select something underneath a bunch of cruft, just click and hold, then highlight the thing you want
  • when making 3D sketches, you can modify a spline, say, or use “include 3D geometry” to bring in other reference edges – note the difference here between that command and “project” which would make the edges coplanar
  • use construct > plane at angle to make planes perpendicular to previously-drawn 2D sketch planes – then you can draw on those newly-constructed planes, projecting in geometry from the old sketch
  • hold command to toggle between the geometry preview and the old geometry / selecting the edges when you are, say, creating a fillet
  • check for interference via the “inspect” menu

renderings

Here’s a bike seat, loosely following the tutorial below, using saddle images from this page and the cloud rendering feature. Mm.. bamboo.. comfy. I kinda messed it up when I tried to put the gap in the seat, oops.

rendered saddle

the board of my car key’s remote – I’m trying to build a new enclosure for the board:

prius key

a swept frame for a pair of glasses:

glasses frame

a flat pack stool:

flat pack stool

a small linkage – this had issues with creating joints on chamfered edges.. the forums say this is a known thing, either chamfer later or don’t preselect before clicking the joint tool:

linkage

a grip:

grip

and a corkscrew:

corkscrew

tutorials

there’s lots of good instruction within the F360 app itself – here are some other videos that were helpful:

ideas – things to practice on

⊃ brain software

A friend recommended this wait but why article on Elon Musk.

  • The author believes there is a technique in use whereby desires from a “want” pool are combined with possibilities from a “reality” pool and formed into goals via some continually adjusting strategy. He recommends you nimbly adjust each of these pools based on your personal changes, changes to the world and results from your previous actions. And the reasoning behind these decisions should be built from the ground up – Musk calls it building from “first principles.” It seems to mean to think for yourself and be cautious of prejudice.
  • Elon wanted to be an engineer rather than a scientist because the former provides the data that the latter depend on.
  • He wanted to do “wondrous” things, but also things important for humanity’s future.
  • The author argues Musk’s “brain software” is well understood by Musk himself, and that it is constantly optimized for the changing world.
  • I think the author is wrong about his Great Depression point – that was 1930, more than two generations away from now, there aren’t many people raised by parents that grew up in that era.
  • He also conflates dogmatic thinking with trusting the ideas of others – it’s not practical to say that we should reason everything out from first principles on our own.
  • Bonus Feynman quote: “I was born not knowing and have had only a little time to change that here and there.”
  • The author is pushing us to examine our own “brain software,” and to search for unjustified certainty – there should be verified data backing those thoughts, if there’s not, you’ve found dogma..
  • And a nice Jobs quote: “Life can be much broader once you discover one simple fact. And that is: Everything around you that you call life was made up by people that were no smarter than you. And you can change it, you can influence it, you can build your own things that other people can use.”
  • Other points line up well with Otavio’s points: fail with enthusiasm, understand the kinds of risk you’re facing..
  • Don’t get trapped in your own history – more good advice.

⊃ ipython

how I use iPython

  • I have a server image on digital ocean that’s set up with iPython – it can be created from scratch in just a few minutes when I want to create notebooks (remember to copy over your ssh keys though..sometimes my user data isn’t there)
  • start the server with jupyter notebook --no-browser --ip=\* --port=8888 --certfile=jupyter-cert.pem
  • I created an SSL cert and a notebook config with a password, as per the jupyter guidelines
  • I sometimes use this omnibox shortcut trick to link directly to the server.. though I have to frequently update the search engine with the droplet’s IP since I destroy the droplet after using it

⊃ bitcoin

notes on an older video, How Bitcoin Works Under the Hood:

  • just a group of computers maintaining a ledger – emphasis on group, rather than some central banking authority
  • everyone knows about everyone else’s transactions
  • every transaction is signed and verified with public/private keys
  • so you can prove who you are on the ledger by generating a signature with your private key and a transaction message
  • there is no “account balance” per se – just a large transaction chain and an index of “unspent transactions” which may be used for future payments
  • there are 10^48 possible bitcoin addresses – people often create a new public/private keypair for each transaction, or even multiple for a single transaction (though they could still be linked back to the same owner)
  • somehow nodes have to agree on transaction order..tough in decentralized systems – the block chain attempts to solve this (different than the transaction chain)
  • the block chain is a big linked list that grows over time
  • transactions in the same block are considered to have happened at the same time – TXs not in a block are considered unordered / unconfirmed
  • nodes can group a bunch of these unconfirmed transactions into a new block and can broadcast this “block suggestion” to the rest of the network
  • they also have to input some random number into the set of transactions (a nonce) – the SHA256 of the transaction set + the nonce + the previous block’s hash must be below some threshold, this is the “mathematical puzzle” that must be solved for the block suggestion to be valid. A typical computer would take years to solve that puzzle, but the whole network takes only about ten minutes. (In fact, the puzzle is calibrated such that it takes ~10min, despite increases in the network hash rate.) The “puzzle” is in place to make it unlikely that two people will submit block suggestions at the same time. (A toy sketch of this hashing loop follows the list below.)
  • This solving of transaction blocks is mining, and it is rewarded by a small number of BTC. The BTC reward is halved every four years.
  • Only 21M BTC will be produced in total, the last in 2140. There are about 11.4M in circulation as of 2013.
  • Miners also receive transaction fees that can be optionally included in each transaction (an incentive to keep mining when the reward is small). So sending money with BTC in the future will not be free as miners will prioritize the transactions with larger fees.
  • it can happen that blocks are solved at the same time – then the blockchain splits. This split is resolved once the next block is solved as clients will use the longest chain available.
  • blocks can’t be precomputed and injected into the network at a convenient time because the SHA256 includes the previously solved block’s hash
  • cool: the network hash rate is constantly increasing as specialized hardware is developed and more nodes join the network – this results in faster block solutions. But every two weeks the network’s software recalibrates the “puzzle” to keep the block solution time at about ten minutes.
  • Litecoin, by comparison, operates with a 2.5min block time. A smaller block time means waiting a shorter amount of time before a transaction is verified by the network, but it also means it’s easier for the blockchain to fork, and mining guilds could seize control of the chain in an easier fashion.
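
Here’s that toy sketch of the hashing loop in Python – purely illustrative, hashlib only; real blocks are serialized very differently and the leading-zeros check is just a stand-in for “below some threshold”:

import hashlib

def mine(transactions, prev_hash, difficulty=4):
    # search for a nonce such that SHA256(transactions + nonce + previous hash)
    # starts with `difficulty` zero hex digits
    nonce = 0
    while True:
        payload = f"{transactions}{nonce}{prev_hash}".encode()
        digest = hashlib.sha256(payload).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce, digest
        nonce += 1

nonce, block_hash = mine("alice->bob:1BTC", "00abc123")
print(nonce, block_hash)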

⊃ shazam

This is a great video on the data from Shazam – they show how to predict hits based on when a song gets “shazamed,” and also show which parts of a song are the most interesting to people

⊃ rides

memories of past rides, and places I’d like to go someday..

2016

  • Henry Coe State Park – an overnight ride in the rain..not my best idea

henry coe

  • Acadia NP carriage trails and surrounding Mt. Desert Isle roads
  • South to San Luis Obispo
  • Black Mountain camping

2015

  • around Cochamó, Chile

cochamo

  • the century ride with Patrick
  • Drawbridge (a ghost town) – rode with lots of folks
  • an attempt to summit Mt. Umunhum with Patrick (we never found the base, just a residential gate)
  • biking with Eric and Andrew around Paradise Loop
  • July 4th ride from Gilroy to Santa Barbara with lots of folks, fantastic route!
  • Montebello with Patrick, Ryan and David – we all made it back!

2014

  • riding Pt. Reyes with Nora in the fog :|

2013

  • Alpine Dam (my favorite ride)

alpine dam fog

  • Pt. Reyes with Joe – met that Cambodian guy and his kids while picking fruit
  • Mt. Tam / Point Reyes station century with Joe
  • Muir Woods + Beach
  • lost in Novato
  • night wandering across the bridge

mill valley

2012

  • Mt. Diablo loop
  • backside of Mt. Hamilton and down from Lick – had to get water from the firefighters

lick telescope

  • Pacifica solo and again with D. Moss (or were we elsewhere?)
  • hawk hill x 5 (or maybe just four)
  • one way to Sacramento-ish to hang out with D. Moss and eat all the figs and acorns in his yard

near Mt Tam

2011

  • Boulder wandering
  • Santa Cruz + cruel Mountain Charlie
  • San Gregorio
  • commute to Fremont over the Dumbarton..in the rain..sans towel

2010

  • Sunol

some day..

⊃ readout

I’m playing around with techniques for reading old-school displays.

(This is something I messed with long ago – see /mtrrdr.)

I used ssocr to preprocess and read a picture of my oven:

oven

$ ssocr crop 174 129 197 92 oven.jpg --foreground=white --background=black --threshold=200 --number-digits=-1
350

I’m suspecting more advanced OCR techniques will be needed..

⊃ machine learning

Recently, several projects I’ve worked on have made me want to know more about machine learning.

Here are some online resources I’ve found:

Stanford course notes for the CS231n class

Image Classification: Data-driven Approach, k-Nearest Neighbor, train/val/test splits

  • these are data-driven approaches to classification – take the CIFAR-10 dataset, for example. There are 60k images with labels.
  • We might split this set into 50k images for training and 10k for testing.
  • The latter can /only/ be used to evaluate the classifier’s performance, otherwise we risk overfitting.
  • Within the 50k training set, we might create five “folds” – in cross validation we pick four folds for training and one fold for tuning hyperparameters (this is called the validation fold). This is then repeated with other fold combinations.
  • If cross validation is too expensive, we might only make one validation split, typically with 50-90% of the data going towards training.
  • the nearest neighbor approach compares test images to every image in the training set
  • one hyperparameter might be the distance metric to use in this comparison, another would be how many of the neighbors to use in the voting of the classification – this applies some smoothing to the grouping (and is the “k” in k-Nearest neighbor)
  • NN with pixel-by-pixel comparison can achieve about 40% accuracy in the CIFAR-10 dataset. CNNs are 95% accurate (on par with humans).
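
A tiny nearest-neighbor sketch with made-up data (L1 distance; the fold/validation machinery is omitted):

import numpy as np

# toy "images": each row is a flattened training example
X_train = np.random.randint(0, 256, size=(50, 32 * 32 * 3))
y_train = np.random.randint(0, 10, size=50)
x_test = np.random.randint(0, 256, size=32 * 32 * 3)

# L1 distance: sum of absolute pixel differences against every training image
distances = np.sum(np.abs(X_train - x_test), axis=1)
prediction = y_train[np.argmin(distances)]        # k=1: label of the closest image

# k-nearest neighbors: let the k closest images vote (k is a hyperparameter)
k = 5
nearest = np.argsort(distances)[:k]
prediction_k = np.argmax(np.bincount(y_train[nearest]))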

Linear classification: Support Vector Machine, Softmax

  • the linear classifier is effectively a function that maps an input image, xi, and a weight matrix, W, to an array of scores, the highest score being the output “guess” – f(xi, W) = Wxi. The notes have a thorough walkthrough of this, including the use of biases and simplifying tricks.
  • the learned weights end up looking like template images – the article has some really cool examples of this
  • preprocessing: center the data by subtracting the mean from every feature and scale things such that they range on [-1, 1] or are otherwise zero-centered
  • to set these weights and train the model, we need something to optimize – we’ll try to minimize a “loss” function and one common example of such a function is the Support Vector Machine. SVM wants the correct class for each image to have a higher score than the incorrect classes by some margin delta. (Note that our algorithm’s “scores” are still thought to be better if they are larger.) “The loss function quantifies our unhappiness with predictions on the training set.”
  • hinge loss or max-margin loss refers to the use of a zero threshold (the max(0, ...) term) in the SVM loss – sometimes “squared hinge loss” is used: L2-SVM.
  • A regularization penalty is often applied to discourage weights with large values – this has nothing to do with the data, it just removes some ambiguities in finding the weights. This penalization also means that no input dimension can have a very large influence on the scores. The final classifier is thus encouraged to take into account all input dimensions to some degree, rather than strongly considering a small number of dimensions.
  • The softmax classifier generates scores in the same fashion (f(xi, W) = Wxi), but they are interpreted as unnormalized log probabilities for each class. The classifier also uses a cross-entropy loss. Note that the ordering of the scores is interpretable, but the absolute numbers (or their differences) are not.
  • SVM and Softmax are comparable in performance and outcomes – softmax is never really “satisfied” with the scores it achieves and its loss function will always seek greater deltas between correct and incorrect scores, whereas the SVM will not penalize certain scores depending on the margin settings.
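
A numpy sketch of both losses for a single example – the scores are made up and delta is set to 1.0, just to make the two formulations concrete:

import numpy as np

scores = np.array([3.2, 5.1, -1.7])    # f(xi, W) = Wxi for three classes
correct = 0                            # index of the true class
delta = 1.0

# SVM / hinge loss: penalize incorrect classes whose scores come within
# `delta` of the correct class's score
margins = np.maximum(0, scores - scores[correct] + delta)
margins[correct] = 0
svm_loss = np.sum(margins)

# softmax cross-entropy: treat scores as unnormalized log probabilities
shifted = scores - np.max(scores)      # shift for numerical stability
probs = np.exp(shifted) / np.sum(np.exp(shifted))
softmax_loss = -np.log(probs[correct])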

Numpy sidenote

  • the number of dimensions is the rank of an array
  • multidimensional slices: b = a[:2, 1:3] grabs the 0th and 1st rows (going up to but not including the 2nd row) and the 1st and 2nd col (up to but not including the third).
  • slices are views into the same data – modifying the slice modifies the original data
  • to preserve the rank of an array, use this slicing syntax: a[:, 1:2] versus a[:, 1]
  • you can also look for certain elements based on an expression: a[a > 2]
  • numeric operations operate in element-wise fashion
  • np.dot performs matrix multiplication (dot is also an instance method on array-type objects)
  • via broadcasting, numpy will be somewhat flexible when doing math between arrays – that is, there are some situations when different sized arrays can be multiplied or added together

Optimization and Stochastic Gradient Descent

  • With optimization we’ll have the full suite – scoring, measuring the accuracy of the score, and finding the weights that compute the score. After covering this, the material returns to the first component and develops more complex mappings such as NNs and CNNs. The general ideas of the loss fn and the need to optimize will remain.
  • At this stage, the SVM cost function is a convex shape. However, when this work is generalized to NNs, we will no longer see a convex form, so the typical convex optimization strategies will not be employed.

NNs part one

  • several neuron activation functions are possible (sigmoid, tanh) but ReLUs, rectified linear units, are the most prevalent now – y = 0 for x < 0 and y = x otherwise, i.e. y = max(0, x). Compared to the others, it’s computationally easier to handle.

Notes on the Udemy course Data Science and Logistic Regression in Python

  • minimizing cross entropy is mathematically the same as maximizing the log likelihood – recall that the likelihood is the probability of the observed outcomes given the model’s parameters (see lecture ten)
  • gradient descent: finding the set of weights that minimizes the error – take the partial derivative of the cross entropy score with respect to the weights and you find dJ/dw = - sum(tn - yn) * xni (lecture 11). So, by gradient descent, the new weight value at each iteration is old value + learning rate * sum of (true value - guess) * input – the two minus signs cancel. (A small sketch of this update follows this list.)
  • note that when outputs have more than two classes, it’s better to use softmax rather than the sigmoid
  • my ipython notebook that looks at weights, the sigmoid, cross entropy and gradient descent is here
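
A minimal sketch of that update rule on made-up data – not the linked notebook, and I use the mean rather than the raw sum of the gradient to keep the step size tame:

import numpy as np

N, D = 100, 3
X = np.random.randn(N, D)
true_w = np.array([1.5, -2.0, 0.5])
t = (X @ true_w + 0.5 * np.random.randn(N) > 0).astype(float)   # noisy binary targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(D)
learning_rate = 0.5
for step in range(500):
    y = sigmoid(X @ w)                       # current guesses
    # dJ/dw = -sum((t - y) * x), so the descent step adds that term back in
    w += learning_rate * X.T @ (t - y) / N

y = np.clip(sigmoid(X @ w), 1e-9, 1 - 1e-9)
cross_entropy = -np.mean(t * np.log(y) + (1 - t) * np.log(1 - y))
print(w, cross_entropy)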

Notes on the Udacity Deep Learning course

  • “one-hot encoding” is just using vector labels with a lot of zeros and a one in the slot for the correct class
  • cross entropy, D is given by D(S, L) = -sum(Li * log(Si)) where S is the scores and L is the labels – and remember the S scores have had the softmax fn applied (or something equivalent) so they will be on (0, 1).
  • L1 on cross entropy has a nice “pipeline” diagram on this multinomial logistic classification, D(S(Wx+b), L):

multinomial logistic classification

  • validation set size rule of thumb – if a model adjustment improves 30 samples in the validation set, that is usually statistically significant. So people typically hold 30k samples in their validation set (or more), allowing the validation set measurement to be accurate to 0.1%.

SGD

  • stochastic gradient descent is GD but with sampling – instead of computing the loss over all data, you compute the loss over a small, randomly chosen subset and take smaller GD “steps.” This means far more steps, but far easier computations, as computing the loss function over the whole dataset is very expensive.
  • For SGD to work well, inputs should have a mean of zero and equal, small variance – weights should also be initialized randomly, with a mean of zero and small, equal variance
  • we can also use momentum – keeping a running average of previous gradients, and we can use learning rate decay
  • lowering the learning rate is the recommendation for what to do when things don’t work
  • ADAGRAD is an SGD variant with builtin momentum and learning rate decay, removing some of these hyperparameters

Hidden Layers

  • I saw a modest improvement in test accuracy (85% to 88%) when I added a single hidden layer of 1024 ReLUs to my SGD model (lesson two).
  • It’s apparently more efficient to add more layers as opposed to increasing the size of the model – possibly due to the hierarchical structure of natural phenomena, and the use of fewer parameters in deep networks (compared to “wide” networks).

Other improvements

  • train just until the validation set stops improving to prevent overfitting
  • L2 Regularization: added to the model – it’s a hyperparameter * the sum of the squares of the individual elements – the derivatives are just the weight vectors themselves
  • dropout – the data flowing between layers is known as “activations.” With dropout, we randomly select half of those activations and make them zero, squashing that data flow..which seems odd. But this forces the network to learn several redundant “pathways” for achieving the same result from some input. This also allows networks to act as if they are taking the consensus over an ensemble of networks. When evaluating a model that was trained with dropout, you obviously no longer want this random behavior to be present – instead you take the consensus over the random models. This works by averaging activations – if you scale non-dropped activations by two during training, you can just remove the dropout and scaling operations during evaluation to get the average activations.
  • In assignment three, we try these techniques..
    • in the logistic models, I went from 82.2% validation accuracy to 83.3% with L2 regularization and a beta value of 5e-4 – not sure how to set this up as a hyperparameter
    • in the model with one hidden layer of ReLUs, I went from 85.0% validation accuracy to 91.5% when I applied L2 regularization to both sets of weights
    • dropout had little effect – some people say it helps more when the networks are larger, see here for more
    • I got to 93.7% accuracy by using L2 Regularization and training for 12k steps
    • struggling to get better than 90.3% validation accuracy (95.1% test accuracy) with a 3-layer NN – I’m using L2 Regularization, dropout and some weight-initialization tricks proposed on the forums. Things that help: increasing the number of steps and fiddling with keep_prob settings on the dropout. My setup at the end of this assignment:

Convolutional Neural Networks

  • ‘weight sharing’ concept helps create statistical invariance – used in CNNs for images and embeddings in RNNs for text
  • conv nets slide a kernel through an image while keeping the ‘weights’ the same – this is weight sharing – and it generates a new “image” with a new color depth
  • you generally want to form a pyramid of conv layers that squeezes spatial info into the final classifier
  • recall when designing your convnet kernel that a 3x3 kernel will start sweeping with its middle (fifth) pixel, so if your padding mode is “valid” (not adding zeros to preserve the image size), you may lose some data around the edges – see the “feature map sizes” lesson and the little size helper after this list
  • networks are often designed with max pooling layers alternated with convolutional layers – the pooling layers have a kernel that takes the max value in the neighborhood
  • adding in 1x1 “convolutions” can also help – they are cheap and introduce nonlinearities into the typical convolutional patch analysis (which is just a linear classifier)
  • the inception concept is to do all things at each layer – pool, 1x1 convolutions and larger convolutions, all concatenated together
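
A tiny helper for that feature-map-size bookkeeping – this is the textbook formula; TensorFlow’s “SAME” padding handles strides slightly differently:

def conv_output_size(input_size, kernel_size, stride=1, padding='valid'):
    # 'valid' adds no zeros (so edges are lost); 'same' pads so that a
    # stride-1 layer with an odd kernel keeps the input size
    if padding == 'valid':
        pad = 0
    elif padding == 'same':
        pad = (kernel_size - 1) // 2
    else:
        raise ValueError(padding)
    return (input_size - kernel_size + 2 * pad) // stride + 1

print(conv_output_size(28, 3))                    # 26 -- a 3x3 kernel eats the border
print(conv_output_size(28, 3, padding='same'))    # 28
print(conv_output_size(28, 2, stride=2))          # 14 -- e.g. a 2x2 max pool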

some examples from tflearn

  • ..

Andrew Ng’s Coursera videos

linear regression

  • “batch” gradient descent means you’re using all of the training data during the gradient calculation. Compare this to minibatch which might only look at a subsample.
  • normalize your features to speed up gradient descent – see my ng-coursera repo for a good example. You generally want to be in the neighborhood of [-1, 1]. With a toy example, I found that replacing x with ( x - mean(x) ) / stdev(x) sped up training by at least one order of magnitude.
  • most quick fixes to non-convergence just involve making your learning rate smaller
  • suggests considering synthetic features in the multivariate linear regression lecture.. so if you have a model predicting house prices you might try price = b + W1 * sqft + W2 * (sqft) ** 2 to create a polynomial model of higher degree
  • and there is an analytical solution to the minimization problem for multivariate linear regression: inv(XT * X) * XT * y – this inverse can be tough to compute if there are a lot of features
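
A quick numpy sketch of both points – mean/stdev feature scaling plus the normal equation – on toy housing-ish data (numbers are invented):

import numpy as np

sqft = np.random.uniform(500, 3500, size=200)
price = 50000 + 120 * sqft + 0.01 * sqft ** 2 + 10000 * np.random.randn(200)

# synthetic polynomial feature, then normalize each column
X_raw = np.column_stack([sqft, sqft ** 2])
X_norm = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# prepend the bias column and solve the normal equation: inv(X^T X) X^T y
X = np.column_stack([np.ones(len(price)), X_norm])
theta = np.linalg.inv(X.T @ X) @ X.T @ price
print(theta)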

logistic regression

  • remember that logistic regression is for classification work and the output has a probabilistic interpretation – the previous work in this course gives more continuous outputs
  • higher order polynomials create more complex decision boundaries (the shapes / lines dividing up your classes)
  • for binary logistic regression (labels in { 0, 1 }), the cost function must change – it can no longer be just MSE as the sigmoid activation will cause this to be non-convex and have local, non-global optima. The new cost function is -log(h) if y = 1 and -log(1 - h) if y = 0. This fn comes from the principle of maximum likelihood estimation – a way to efficiently find parameters.
  • if you have a cost function and a way to compute the gradient of the cost function, you can use more advanced optimization techniques than gradient descent: the conjugate gradient, BFGS or L-BFGS, for example. These might run more quickly than vanilla gradient descent, or they might automatically pick a good learning rate
  • in multiclass one-vs-all classification, you just train a logistic regression classifier for each class – essentially asking classifiers to identify data as “in” or “out” of the club.. then run each classifier on new data and pick the result that’s most confident (largest)

regularization

  • in Ng’s examples, we have a weight vector where the first element, W0, is a bias term – he points out you shouldn’t penalize W0 during regularization but instead just penalize the large multiplicative weights

neural networks

  • Ng points out that hidden layers are kind of like learned features – like how we previously had to deliberately engineer features by combining preexisting features into new polynomials, the network can now combine the original features in new ways via the weights

⊃ hikes

near Lake Margaret

hikes I’d like to check out

  • Uvas Canyon
  • the Gilroy Hotspring in Henry Coe Ranch SP

⊃ Chile 2015

Chile! Went to Santiago, Puerto Montt, Torres del Paine, La Calera and Talcahuano!

⊃ recipes

things I’ve made in the past, and things I’d like to make someday..

things I’ve made

hall of fame

  • my first attempt at turkey was via this recipe and I usually hate this particular kind of bird, but I loved the result here
  • this bread recipe was very straightforward and quick – tasted good and had a nice crust. The crust will get brown at about minute twenty, but you can press on bake for another ten or twenty minutes. We’ve probably made 20+ loaves of this..it’s a staple!
  • espresso chili – used to make this all the time. The recipe as listed makes a /ton/ of food.
  • I love this chimichurri recipe
  • katherine hepburn’s brownies – the best. Good with walnuts, haven’t tried pecans. Used a few pinches of salt (not just one)
  • from the Moosewood cookbook: the espresso / mascarpone / brandy pudding (mousse?) was very nice

best of the rest

fruit tart

things that look interesting

⊃ red mercury

Entertaining NYT piece on “red mercury”, a fabled substance sought after by many.

Some say it’s cached in sewing machines with butterfly logos.. Others that it’s repelled by garlic and attracted to gold. Other variants (green, silver, etc) are for medicinal purposes or act as sexual stimulants.

But handle the hot, red variety incorrectly, and “a radius of eight kilometers around you will be destroyed!” Bad luck if you’re colorblind I guess.

⊃ concerts

shows that I recall..

2015

  • Here We Go Magic in November at the Independent in SF – they disappointed, honestly, but Big Thief was great.
  • a really nice guitarist at a wedding in Chicago
  • Buddy Guy in downtown SF – that guy’s hilarious
  • Thao Nguyen and the Get Down Stay Downs at the Swedish American Music Hall in SF – was really excited about this but I had to leave to catch the train before Thao came on..
  • Mountain Goats at the Fillmore – one of my favorites
  • Gregory Alan Isakov at the Fillmore – great sound live

2014

  • Worldwide Dance Party – a funky show in RWC

2013-ish

  • Gillian Welch and others at the Warren Hellman memorial show at Ocean Beach

2012-ish

  • Mountain Goats at an East Bay Vineyard

2010-ish

  • Here We Go Magic with Beach House (er, I think?) as an opener, both were great

2008-ish

  • Mountain Goats at NCSSM

shows that I missed..

  • didn’t go see Sufjan because the tickets were too expensive, same with Jose Gonzalez (2015)
  • my friend got Beirut to come to a local coffeeshop right before his first album blew up, but I didn’t go :| (2008-ish)

⊃ beaglebone black wifi

To get my little USB-WiFi adapter working on the beaglebone black, I followed a few tutorials: this one from Make, this article from Adafruit and these notes on the BBB wiki. I’m using this wifi dongle and a 2A supply for the board.

The gist of it is:

  • ssh in over ethernet
  • check the kernel version with uname -a
  • do an upgrade (this brought me to 3.8.13-bone79):
cd /opt/scripts/tools
git pull
sudo ./update_kernel.sh
  • disable HDMI as per the adafruit tutorial
  • install the adafruit wifi-reset script – I found this helped make things work more consistently
  • reboot
  • use lsusb and ifconfig -a to verify the wifi dongle is attached
  • edit /etc/network/interfaces to enable wifi (see the example section that the debian image provides)
  • run ifconfig wlan0 up and ifup wlan0
  • reboot
  • check your router’s client table for the BBB’s wireless connection

⊃ vaporized foil actuator welding

Saw this press release and video on a new kind of welding – VFAW.

Dissimilar metal pairs are joined by being rammed together at very high speeds. This is often done with explosives, but in VFAW they pass something like 8kJ through a piece of aluminum foil, vaporizing the foil and propelling one metal into another. Pretty cool! Jump to 5:10 in that video above to see them press go, as it were. The distinctive, µm-scale wave pattern at the interface of the two metals is also neat to see – it comes about because of the turbulent flow of the two materials.

⊃ legal justification of the bin Laden raid

The NYT has an article about the legal justification for the May 2011 Abbottabad raid that resulted in the death of bin Laden.

Four federal lawyers wrote several memos that explained the legal basis for the raid itself, the killing of bin Laden and his burial at sea. The CIA’s general counsel, Stephen Preston, noted that it was important to have the memos before the raid took place “particularly if the operation goes terribly badly.”

Pakistan had previously assisted on other US govt raids, but this was such an important target, and there were concerns that elements of the Pakistani military were harboring bin Laden, so one memo had to justify violating Pakistani sovereignty and conducting the raid without permission. The legal team also seemed to think the executive branch would violate no domestic law if, during a covert action, international law was breached.

The lawyers determined that killing bin Laden as the default option was also legal. A secondhand source made it clear it would be near-impossible for bin Laden to surrender during the raid. That bin Laden would be killed was such a foregone conclusion that there were no firm plans for dealing with any prisoners that might be taken by the SEAL team.

It was overall a depressing thing to see these cynical conclusions laid out. This was a raid on the government’s biggest single enemy, but it still feels like a bad precedent.

⊃ library of babel

I’m reading about the fascinating Library of Babel: a site created to realize the excellent Borges short story of the same name. The books in the library have 410 pages each, each page containing 3200 chars drawn from a 29-character alphabet – and the library contains every possible such page. So for perspective: the Library of Congress contains 37M (~10^7.57) books. The Library of Babel would have 10^4685 books by my calculation. Meanwhile the poor universe has only 10^80 atoms to put to use.

I understood the idea of using seeds to reliably generate random text since storing pregenerated books would be impossible. But I was perplexed about how the virtual library created for that site could be searchable. To make this possible a reversible pseudorandom number generator was used.

This means that for any block of text, the program can work backwards to calculate its location in the library (the random seed which would produce that output). I couldn’t help but feel that the result was a computer-age form of gematria, converting text to numbers and back again to text.

A Halton Sequence was also considered – this sequence would’ve been handy to know about for my work on uniform sampling.

One of my favorite parts: the creator, Jonathan Basile, taught himself C++ to make the library a reality.

⊃ alexander coward

I’m reading his open letter, Blowing the Whistle on the UCB Mathematics.

  • Coward is widely praised by students for his skill as a lecturer, yet he is being fired
  • others in the dept have asked him to “align more with our standards” wrt teaching – Coward asserts this is the dept’s way of asking him to stop making them look bad
  • he writes about formative assessment which is the practice of using assessments to improve students’ learning and abilities – think of qualitative feedback on an exam rather than merely receiving a score. He provides this example as a before/after in support of formative assessment.
  • and then an element of conspiracy (which I love): math depts don’t bring in much grant money (compared to experimental sciences, for instance), so to justify their existence they teach prereq classes for many other depts. UCB Math, for instance, teaches 15k students per year and thereby justifies its size. Student feedback typically comes back poor for these classes and UCB Math asks for more money to improve outcomes. So when Coward came in as a dedicated lecturer and got rave reviews, he destroyed that older premise that students just weren’t that into math.
  • will be interesting to see a UCB response..

During a strike at Cal, Coward also wrote an email to his students that is worth reading:

  • “You need to optimize your life for learning. You need to live and breathe your education. You need to be obsessed with your education.”

⊃ landmark cases

I’m listening to Stephen Breyer on City Arts and Lectures (what a great show, btw, wish I could get older mp3s..). He’s a great speaker, and I love listening to the “insider baseball” of the SC, as Breyer calls it. His discussion of cases also reveals many interesting historical tidbits.. He has written some books that might be worth looking at.

PBS has a series on landmark cases that looks interesting.

⊃ Endaga

I worked for about six months at Endaga, building low cost cellular infrastructure. Endaga built cell phone tower hardware – entrepreneurs could buy our equipment and use our cloud services to become the “Verizon” of their area. They could buy SIM cards and sell service to people in their community. Our system allowed operators to set their own prices for calls, SMS and (eventually) data – we provided something 2.5G-ish, with really slow GPRS-based data. This paper (pdf) provides a lot of detail about the network and its operation.

Papua tower installation

I loved working on this project and with the team – Shaddi, Kurtis, Lance, Kashif, Omar and Mona. I helped maintain the backend code running on our servers and I worked on the code that ran on the towers themselves. We maintained billing and usage data, hand-built some of the early boxes, worked on interesting networking challenges, built cool react-based frontends, and connected lots of folks to the wider world. I also worked on the Eno project, building programmable cell phones for running end-to-end tests on our system. It was very rewarding and fun work.

The founding team was acquired by Facebook in 2015, but we all parted on good terms. Definitely looking forward to what they do next with the team in Menlo Park! The Endaga work is part of Facebook TIP and the software should be open sourced before too long!

⊃ quandry

I’m messing with scikit-image and trying to solve jigsaw puzzles.

The current pipeline (but see the code on github for a more up-to-date picture):

  1. find an elevation map with filters.sobel
  2. generate thresholded markers
  3. segment the image with morphology.watershed
  4. “close” the image with scipy.ndimage.binary_fill_holes
  5. detect the edges of the segmented image via the “marching squares” algorithm (measure.find_contours)
  6. estimate the piece’s centroid
  7. detect 90 degree angles along the shape’s outline by selecting slices of the outline, composing two 90 deg angles above and below the slice and computing Hausdorff distances between the segment and the angles – slices that actually are corners will have a low Hausdorff score
  8. identify the “true corners” of each piece by composing reasonably shaped rectangles out of four corner candidate points and then computing the area of the resulting rectangle – largest area wins
  9. arrange the corners in clockwise rotation with the first in the upper left
  10. arrange the sides in N-E-S-W fashion
  11. compute the distances between the true corners along the path defined by each side
  12. classify each side as ‘in,’ ‘out’ or ‘flat’
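
A rough sketch of the first few steps with skimage – the thresholds and file name are placeholders, and newer skimage releases moved watershed to skimage.segmentation; see the github repo for the real code:

import numpy as np
from scipy import ndimage as ndi
from skimage import filters, measure, morphology
from skimage.color import rgb2gray
from skimage.io import imread

image = rgb2gray(imread('piece.jpg'))             # placeholder input photo

# 1-2: elevation map plus thresholded markers
elevation = filters.sobel(image)
markers = np.zeros_like(image, dtype=int)
markers[image < 0.3] = 1                          # background (threshold is a guess)
markers[image > 0.6] = 2                          # the piece

# 3-4: segment, then fill holes to "close" the piece
segmentation = morphology.watershed(elevation, markers)
piece = ndi.binary_fill_holes(segmentation - 1)

# 5-6: outline via marching squares, then a centroid estimate
contours = measure.find_contours(piece.astype(float), 0.5)
outline = max(contours, key=len)                  # assume the longest contour is the piece
centroid = outline.mean(axis=0)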

pipeline output

  • On the left is the raw image with the calculated outline superimposed in light green.
  • In the right image, the blue star is an estimate of the piece’s centroid.
  • “Candidate corners” are shown as red crosses.
  • The “true corners” are green circles.
  • In this example, btw, side two in the third image and side zero of the fourth image are known to fit together.

The matching sides of two puzzle pieces should have the same length along the path. Matching sides should also have low Hausdorff and Frechet scores. Color-matching and other more advanced techniques could be applied later for more filtering.. I’m learning a lot just getting the side lengths.

I tried various corner-detecting routines provided by skimage: fast, foerstner, kitchen_rosenfeld, shi_tomasi and subpix, but I found harris worked the best while generating the fewest false positives. I later moved away from the harris technique since I already had the piece’s outline as an ordered sequence. I found “template matching” to be more robust – my templates are two 90deg angles that are composed around a test segment. I then determine the similarity between the segment and each angle. In the example below, the segment (in red) is not very similar to either angle and so its Hausdorff score would be high:

corner template matching

I also found that the marching squares routine was significantly more robust when the sobel/watershed techniques were applied first to generate a segmented image. This region-based segmentation technique came from this skimage article.

I’ve built a few small imaging rigs to take photographs with consistent lighting and camera positioning. I’m using a beaglebone black and an adafruit JPEG TTL camera with my own LED ring light. I need to try taking photos on pieces of paper that have sharper contrast with the pieces – some of the white colors in the pieces get lost in the edge detection step.

⊃ clojurescript

I’m going through the cljs quickstart as I wait for the lispcast Om video to come out..

  • need javac 1.8 (aka JDK 8) and the ClojureScript JAR in the root of the project
  • this is a lein-less tutorial, presumably there are other projects that provide that templating
  • cljs uses the Google Closure Library to provide namespaces, dependency maps and other features
  • a clojure build script sits in the top level of your directory:
(require 'cljs.build.api)

(cljs.build.api/build "src"
  {:main 'hello-world.core
   :output-to "out/main.js"})
  • you can use cljs.build.api/watch to recompile when files are changed
  • you can also make production builds that take advantage of dead code elimination
  • cljs supports an in-browser repl and can also be run with node
  • deps are most easily included as JARs, often from CLJSJS
  • you can also setup deps with lein

⊃ week 37 reading, 2015

Holodomor: a man-made famine in Ukraine that killed 2-7M people in 1932 and 33

  • unbelievable, I’d never heard of this..
  • it seems largely accepted that Soviet Union policies intentionally created famine, possibly to suppress Ukrainian nationalism
  • the food supply was mismanaged in various ways, and the farm output in those years was lower because the First Five Year Plan mandated the growing of sugar beets and cotton, rather than the sowing of grain.

The Intercept on The Red Web, a book on Internet surveillance and censorship in Russia

  • engineering ‘specialists’ in the country need very little persuasion to develop tools to help in surveillance – “a decent salary combined with the opportunity to practice one’s trade was often enough.” Not that that’s different outside of Russia..
  • media groups and other organizations kowtow to government demands, hoping to do enough to keep their company alive, but they often give away too much
  • the author of the article points out that the book suffers from having been written within this system – it’s hard for the authors to develop a complete picture of this system because they themselves exist within it

⊃ think bayes

from Downey’s Think Bayes book:

  • Downey generally recommends starting with simple models and focusing on the clarity and “goodness” of the model
  • then add details to reduce the biggest sources of error in the model
  • then look for performance optimizations and better analytic methods

  • independence: P(B|A) = P(B)

  • conjunctions are commutative: P(A and B) = P(A) * P(B|A) = P(B and A) = P(B) * P(A|B)

  • Bayes Theorem: P(A|B) = P(A) * P(B|A) / P(B)

  • the diachronic interpretation: Bayes gives us a way to update the probability of a hypothesis, H, in light of some data, D: P(H|D) = P(H) * P(D|H) / P(D)

  • here, P(H) is the prior – the probability of the hypothesis before we see the data. P(H|D) is the posterior, P(D|H) is the likelihood and p(D) is the probability of the data under any hypothesis, also called the normalizing constant

  • the prior comes from background info, but can be subjective

  • to make it easier to determine these factors, pick hypotheses that are mutually exclusive and collectively exhaustive

  • a suite is a set of hypotheses that has these properties

  • a distribution is a set of values and their corresponding probability

  • the thinkbayes module provides Pmf which tracks distributions via a dictionary
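
A dictionary-based sketch of a diachronic update in that spirit (the numbers are the classic cookie problem from the book: bowl one is 75% vanilla cookies, bowl two is 50%):

# priors: the two bowls are equally likely
pmf = {'bowl1': 0.5, 'bowl2': 0.5}

# likelihood of the data (drawing a vanilla cookie) under each hypothesis
likelihood = {'bowl1': 0.75, 'bowl2': 0.5}

# multiply priors by likelihoods, then normalize by P(D)
for hypo in pmf:
    pmf[hypo] *= likelihood[hypo]
normalizer = sum(pmf.values())
posterior = {hypo: p / normalizer for hypo, p in pmf.items()}
print(posterior)   # bowl1: 0.6, bowl2: 0.4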

⊃ hashicorp

Just about every day I find myself using vagrant, a tool for managing VMs, and now I’ve discovered a lot of other great-looking new tools also made by hashicorp:

  • terraform is like ansible but for cloud infrastructure – so with just a config file and the CLI, you can stand up an AWS instance, for example.
  • vault manages the access of various secrets. I’ve always had static, locally-stored dotfiles with pw protection, but this can handle dynamic secrets, leases, audits..
  • serf creates fault-tolerant, decentralized clusters via gossip. Reminds me of tinc, to some extent, a P2P VPN I used for a work project recently.

They’ve also summarized their beliefs nicely in the tao of hashicorp:

  • workflow-driven development – envision a streamlined UI to achieve a set goal, and then build tools that simplify that workflow
  • tech-agnostic
  • stick to the unix philosophy of being simple, modular and composable
  • build systems via CSP: Communicating Sequential Processes – individual processes with a clear API
  • utilize immutability and take advantage of all it offers (rollbacks, atomicity, auditing and inspection)
  • all processes should be written as code and then be stored and versioned. Processes should be automated where possible and made resilient – knowledge of a “desired” state should be codified.
  • pragmatism and flexibility (and thus humility) are always valued when approaching any problem, and so the above axioms need not be viewed as some kind of law

⊃ vi

I just read through vimtutor and found some things I’d like to start using:

  • change <motion> with c, instead of, say, dwi
  • replace mode: single chars with r or R to just start clobbering text
  • “find under cursor” with *
  • still getting used to s instead of xi
  • N to go backwards through search results

Some other notes:

  • find and replace within a selection:

    • select a block visually
    • type : to start entering a command
    • s/red/green/g to change all reds to greens
    • this was a bit confusing as non-visual mode selection is the same command but with a percent sign :/
  • find/replace a word under the cursor as per this article

    • type *
    • ciw (change inner word)
    • <esc>
    • then n (next occurrence)
    • and . (repeat change)
  • I’ve been building little go scripts in vim and this pops up the quickfix pane when there’s an error

    • my vimrc has two ways of hiding this: call ToggleErrors() and :ccl
  • opening ctrlp selections in a split:

    • with the file highlighted, type ctrl-x to open in a horizontal split
    • ctrl-v opens in a vertical split
    • I still almost always create the split first
  • I keep forgetting I have easy-motion

    • jump to the letter d, anywhere:
    • <leader>ssd
  • I should learn snippets..

  • better wrapping to 80char cols in python:

    • set textwidth=79
    • gqq to format the current line

⊃ clojure

going through the LispCast Clojure Tutorial videos..

  • methods like contains? that end in ? return a boolean by convention
  • the dotimes [e n] form will execute some block n times, maintaining a counter in e
  • sets are collections that look like #{:milk :egg :flour} – duplicate keys cannot be included
  • multi-arity functions often follows this pattern:
(defn add-ingredient
  ([ingredient]
    (add-ingredient ingredient 1))
  ([ingredient amount]
    ; ...
  ))
  • if/elif/else statements can be constructed with cond. Interestingly, you just use the :else keyword to trigger the last statement as keywords are truthy.
  • or, if the conditional is simple enough you can just use if – the first form will execute if true, the second if false.
  • into will push values into a map:
(def ingredients {:flour 2 :sugar 3})
(defn multiply-ingredients [n ingredients]
  (into {}
    (for [[ingredient amount] ingredients]
      [ingredient (* n amount)])))
(multiply-ingredients 3 ingredients)
; {:flour 6 :sugar 9}
  • also notice (in the above example) the unpacking of the map via for
  • accessing values in a map is about as terse as imaginable:
(def order {:items {:cake 3 :cookies 4}})
(:cake (:items order) 0)
; 3
  • if there was no :cake keyword in the :items map, the default of 0 would be returned
  • function names can be pretty zany: order->ingredients (which I think is awesome)
  • let creates a scoped binding for some variable(s) ala:
(let [items (:items order)
      racks (:racks status)])
  • the (defn -main []) form is special – it is the static function used by lein run
  • (partition n coll) returns a sequence of lists of n items each (note how the “remainder” of 8 and 9 is dropped):
(partition 4 (range 10))
;;=> ((0 1 2 3) (4 5 6 7))

now on to a vim-fireplace tutorial

  • hm, sometimes I have to completely reload the repl, bummer (cpR isn’t enough to get changes to the test file to reload)
  • ah, later in the tutorial they point out you might indeed have to restart the repl or try cq and unmap the test from the namespace: (ns-unmap *ns* 'a-test). Though to be honest I don’t understand the quasi-repl provided by cq..
  • K gives documentation about a given symbol

  • the % motion is helpful to go between matching parens

  • block motions are arguably better though – ab (all block) and ib (inner block)

back to purely functional for their webapp series:

  • ring presents a standard interface to servers like jetty, netty and others. It consists of adapters, middleware and handlers. Adapters convert HTTP reqs into Ring Requests. Recall that middleware can modify requests /and/ responses
  • compojure is the ring lib used for routing and HTTP method switching
  • handle-dump is another nice function within ring, useful for debugging requests
  • you use defroutes to specify paths:
(defn calc [req]
  (let [a (Integer. (:a (:route-params req)))
        ...])
  ...)

(defroutes app
  (GET "/calc/:a/:operator/:b" [] calc))
  • here’s the full example:

and here’s a tutorial on clojure webapps from heroku

  • I had to futz with my version of javac so java, JDBC and the postgres driver all played nice together

and another tutorial on clojure desktop games:

  • the quil package is pretty great – it’s a port of Processing
  • note the atom pattern with swap!:
(def ball (atom [1 2 3 4]))
(swap! ball next-ball @ball-dir)
; where @ball-dir refers to yet another atom
  • the game fragment I ended up making:

what about quoting, that’s kinda weird.. here are notes from 8th light:

  • ' or quote is a very basic special form in Lisp – it will return the form without evaluating it:
user=> (quote (- 4 5 6))
(- 4 5 6)
  • all the symbols shown remain unevaluated symbols
  • syntax quote (backtick: `) is slightly different than quote and ': symbols are resolved (so you may get namespace-qualified symbols returned) and unquoting can happen inside via ~.. from the 8thlight post:
user=> `(this ~(symbol (str "i" "s" \- "cool")))
(user/this is-cool)
  • this stuff is helpful in macros – keeping in mind the idiom that code is data
  • you will often see quoting when importing modules: (require 'cljs.build.api)

now looking a bit at the aphyr tutorial

  • -> vs ->> – both are “threading macros,” the former puts results as the first arg, the latter as the last arg, some good examples are here
  • I don’t like threading..think I need to practice doing things in the non-threading manner first..

now checking out this spell-checker tutorial

  • recall Clojure lists are sequential but not indexed – vectors are the indexed form with efficient random access

todo: split into basics + vim, webapps and quil

⊃ vmstat

I typically use top to look at system performance and per-process metrics, but vmstat gives an alternative look:

mercury  ~  vmstat
procs ------------memory-------------- ---swap-- -----io---- -system-- ------cpu-----
 r  b  swpd     free     buff    cache   si   so    bi    bo   in   cs us sy id wa st
 1  0     0   272256    19132    96332    0    0  1169    18  134  481  6  5 89  1  0

Let’s go through section-by-section:

procs
 r  b
 1  0

The first block shows running (or runnable, i.e. waiting-to-run) and blocked processes.

------------memory--------------
 swpd     free     buff    cache
    0   272256    19132    96332

Next we have various memory usages: virtual, idle, in-buffer and in-cache (kB by default). Buffered memory is that which describes file metadata and tracks in-flight pages. Cached data is the contents of those pages.

---swap--
  si   so
   0    0

Then there’s the amount of memory swapped in from disk (si) and swapped out to disk (so).

----io----
  bi    bo
1169    18

And blocks received and written from and to a block device (e.g. a hard drive) in blocks / second. Blocks in linux are now 1024B.

 -system--
   in   cs
  134  481

There’s a count of interrupts per second and context switches per second – the latter just being the process scheduler giving time to different processes.

 ------cpu-----
 us sy id wa st
  6  5 89  1  0

Finally we have percentages of the total cpu time spent by user code and kernel code. There is also time spent idle, time spent waiting for IO and time stolen from a VM. You might also see nice time which is the time spent running processes with a positive nice value. A 0 nice value is normal, a negative value is higher priority. So when a positive nice valued process is running, it would get deprioritized if any other process started.

⊃ what happens when..

..you run a program from the shell?

background, from IITK:

  • the shell wraps the kernel, and the kernel determines what other processes can run and it also mediates access to the hardware
  • any such hardware access / general io must be made through system calls

the shell:

  • when focused, the kernel echoes keystrokes to the screen
  • when Enter is pressed, the line is passed to the shell, and the shell will attempt to interpret this input as a command
  • the shell figures out you want to run /bin/ls (or whatever) and it makes a system call to start /bin/ls as a child process (forking) and give it access to the screen and keyboard through the kernel
  • this forking results in a copy of the environment from the parent process to the child
  • then the shell sleeps, waiting for that command to finish.. really it’s just that the scheduler has gotten another process to deal with (/bin/ls) and so it will context switch the shell and other processes as usual
  • when /bin/ls finishes, it’ll issue an exit system call
  • then the kernel wakes up the shell and tells it to resume
  • note that some commands like cd are shell builtins and do not require a new process to be spawned – the shell can just take action on its own by calling chdir
  • to see all these system calls being made you use strace <some-command> – files are opened and stated and attributes are read, stdout may get written to..etc
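
A bare-bones Python imitation of what the shell does here (Unix only, and obviously the real shell also handles parsing, PATH lookup, job control and more):

import os

pid = os.fork()                           # copy the environment into a child process
if pid == 0:
    os.execv('/bin/ls', ['ls', '-l'])     # the child becomes /bin/ls
else:
    _, status = os.waitpid(pid, 0)        # the "shell" sleeps until the child exits
    print('child exited with', os.WEXITSTATUS(status))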

..you access google.com

from IITK and the compiled notes here

overview:

  • recall that each layer – layer2 link/MAC, layer3 network/IP, layer 4 transport/TCP and layer 7 application/http – can work independently
  • ultimately we will construct “packets / MAC frames” as follows (and we will need to determine each of these ports and addresses):
Application data (e.g. HTTP)
---
TCP src/dst ports (transport)
---
IP src/dst addresses (network)
---
MAC src/dst addresses (link)

or similarly:

data layers

first we need to establish a connection with the machine where the document lives, this is done via DNS:

  • my laptop will make a request to a domain resolver for the IP address of google.com
  • then the domain resolver (8.8.8.8 or some other ISP-provided machine) will make an NS request to the .com root server
  • the response will be an IP addr for a nameserver, say some cloudflare endpoint
  • the domain resolver will then ask the nameserver for www.google.com and the nameserver might reply with an IP of the requested site or with another nameserver (and that could continue)
  • eventually though an IP will be returned with a TTL, the domain resolver will pass that on to the client
  • the OS manages this hostname/IP translation
  • and a large amount of caching can happen on all these intermediary machines

requesting the page (application data):

  • now that we have an IP address, we can form a packet
  • the application request data may be something like GET / HTTP/1.0 and additional HTTP headers for cache control, cookies, user-agent etc
  • the server’s response body will come in this layer, it may indicate that the response is unmodified and the browser should use its cache. It may also indicate whether a connection should persist or be closed.
  • after parsing the HTML, the process is repeated for every resource referenced by the HTML page
  • SPDY may also be used by some clients – this is like compression of HTTP requests and also performs some other optimizations by caching headers and keeping certain connections alive
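
Here’s the client side of those layers in a few lines of Python – create_connection does the DNS lookup and the TCP handshake for us, and we just write HTTP into the socket:

import socket

# resolves the hostname (the DNS chain above) and performs the
# TCP three-way handshake before returning a connected socket
with socket.create_connection(('google.com', 80)) as sock:
    # application layer: a minimal HTTP/1.0 request riding on top of TCP
    sock.sendall(b'GET / HTTP/1.0\r\nHost: google.com\r\n\r\n')
    response = b''
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        response += chunk

print(response.split(b'\r\n')[0])   # the status line, e.g. HTTP/1.0 200 OK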

on to TCP:

  • application data is wrapped up with the TCP destination port number (typically 80 or 443)
  • TCP is all about streaming program-to-program data over a socket so its headers include congestion window (controlling tx flow), flow control info (how much the receiver can buffer) and sequence and acknowledgment numbers
  • sequence numbers are also attached to each packet so the stream can be reassembled in order, providing some resiliency against weird network performance where packets may not arrive in the order sent – they also underpin slow start / AIMD
  • TCP has a three way handshake when the client first connects to the server: a SYN from the client, SYN-ACK from the server and an ACK from the client with sequence numbers incremented and acknowledged along the way

then IP data is added:

  • recall that IP is all about host-to-host communications..
  • IP source and destination headers are set
  • a TTL field in IP is decremented at each router hop – if it drops to zero, the packet is discarded. This prevents immortal packets from looping through networks forever. Note that this seemingly should belong in the transport layer but is in fact in IP! A typical starting value of TTL is 64.

finally we get to the link layer, MAC:

  • MAC is about connecting NICs and getting data out on the right physical wire by setting src/dst MAC address headers
  • routers do the hard work of finding the fastest functioning routes for a packet
  • as an aside, switches by definition will only look at the link layer (aka MAC) headers – the “outermost layers” of a packet – whereas routers will only consider the IP header
  • MAC layers will be continually stripped and re-added as the inner layers (untouched save for the IP packet’s TTL) are moved around between network devices in transit from server to client or client to server
  • we also use ARP in here to find the MAC address corresponding to an IP – machines on the same subnet use MACs to communicate. ARP involves sending a request asking “who is <some-ip>?” All machines receive the message and the target replies with its MAC.
  • as DNS finds IP/hostname mapping, ARP finds MAC/IP mapping

congestion control aside (notes from this deck):

  • the internet is basically a queue of packets, with some devices adding packets to the queue and some devices removing packets
  • sometimes parts of the network may have an imbalanced flow where more packets are being added than are being removed – this is congestion
  • routers may be forced to drop packets if they run out of memory or can't process packets quickly enough – packets then time out and the resulting retransmissions make things worse
  • a “token” control strategy involves generating tokens at some rate and, when there are tokens available, it’s ok to transmit a packet, but if there are no tokens available, we have to wait to transmit
  • a closed loop approach involves detecting congestion and either shutting off new connections (ala the PSTN) or routing around congestion via new virtual connections
  • choke packets can be sent between routers as a downstream signal that traffic should be reduced between nodes
  • load shedding can be implemented where packets are dropped randomly or based on some assigned priority
  • the TCP spec has “slow start” to allow only a certain amount of data to be sent initially, with that max amount being raised with each packet’s ACK
  • more on slow start in this UCSD lecture
  • but the most important part is slow start + AIMD (additive increase, multiplicative decrease): double the number of packets we send each round trip until a packet is lost (slow start), then increase by one packet at a time, halving the window when packets are lost – see the sketch after this list
  • and this is all managed at the OS layer – the OS will buffer packets until the next ACK comes, and the sender's TCP stack tracks the congestion window (the TCP header's window field carries the receiver's advertised window)
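A toy simulation of that slow start + AIMD behavior (my sketch, not real TCP – losses are just random here):

import random

random.seed(1)
cwnd = 1.0          # congestion window, in segments
ssthresh = 64.0     # slow-start threshold
for rtt in range(40):
    lost = random.random() < 0.05      # pretend 5% of rounds see a loss
    if lost:
        ssthresh = max(cwnd / 2, 1.0)  # multiplicative decrease
        cwnd = ssthresh
    elif cwnd < ssthresh:
        cwnd *= 2                      # slow start: exponential growth
    else:
        cwnd += 1                      # congestion avoidance: +1 per round trip
    print(rtt, round(cwnd, 1), 'loss' if lost else '')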

⊃ os

Some notes on the Think OS book from Green Tea Press.

Ch1 - compilation

compiled vs interpreted languages:

  • compiled langs like C are translated into machine language and then executed by hardware
  • interpreted langs like Python are read and executed by another software program, an interpreter
  • although there are C interpreters and Python compilers..
  • and there are langs like Java which compile into an intermediate language, Java bytecode, which is then executed by an interpreter, the JVM
  • compiled langs often have static types, meaning you can look at the program and determine a variable’s type
  • static types are resolved at compile time whereas dynamic types cannot be known until runtime
  • these type declarations can help errors be found more quickly (during compilation) whereas with a system like python you need to test (execute) the code to be sure nothing raises a RuntimeError
  • and these checks don’t have to run during runtime, so compiled languages get a speed boost from that
  • compiled langs can also be more memory efficient as they do not have to store type info

compilation steps:

  • preprocessing: e.g. address all of the #include statements in a C file
  • parsing: build an “abstract syntax tree” of the code
  • static checking: check types and function calls
  • code generation: generate machine or byte code
  • linking: includes required libraries – in Unix this is the job of ld, the Unix linker, so named because “loading” is another step related to linking
  • optimization: the compiler can perform its own optimizations during the previous steps

gcc:

  • with the -c flag, machine code is generated, but not an executable
  • this .o file is object code – not executable in and of itself, but it can be linked to form an executable
  • the -O, -O1, -O2 (etc) flags progressively increase optimization levels
  • -S will generate assembly
  • -E will run the preprocessor only (bringing in include files)

Ch2 - processes

isolation:

  • the process model provides isolation in an OS
  • processes are “software objects” containing data and methods that operate on that data
  • multitasking: the OS allows a process to be interrupted at almost any time, its hardware state is saved and the process may resume later
  • virtual memory: the OS makes it appear as though each process has its own isolated, dedicated chunk of memory
  • device abstraction: the OS allows processes to access the network interface, graphics cards, the hard drive and other peripherals in an ordered way

unix processes:

  • typing something in the shell, like make, will create a new process to run make (via forking, I believe). This will then create another process to run LaTeX and then another to display the output.
  • ps will show running processes associated with the current terminal
  • adding -e will show all processes, even those belonging to another user (!)
  • “tty” comes from “teletypewriter”
  • and the etymology of “daemon” has to do with helpful spirits (ala His Dark Materials)

Ch3 - virtual memory

basics:

  • running processes put their data in main memory, usually RAM
  • memory is, for historical reasons, measured in binary units (e.g. gibibytes, or 2^30 bytes)
  • each byte in main memory has a physical address – in a 1GiB system the highest valid address is 2^30 - 1: 0x3fff ffff

virtual memory:

  • OSs provide virtual memory, the size of which is determined by the OS and the hardware: in 32 bit systems the virtual address space runs from 0 to 0xffff ffff (2^32 bytes), in 64 bit systems the size of the virtual address space is 2^64 bytes (16 exbibytes, ~ one billion times larger than typical physical address spaces)
  • programs generate virtual addresses when reading and writing values in memory – this is per-process, so even if the same virtual address is generated, they map to different locations in physical memory, providing per-process isolation
  • the memory management unit (MMU) sits between the CPU and main memory – it performs translation between virtual and physical addresses
  • VAs have two parts: the page number and the offset – the page is just a chunk of memory and the size of the page is typically around 1-4 KiB
  • the MMU looks up the page number in the page table and gets the corresponding physical page number; combining that with the offset produces a PA – see the sketch after this list
  • page tables are often implemented as sparse or associative arrays since most processes use only a small fraction of their virtual address space
  • wikipedia has a nice article on this topic but the gist is that VM is used for security, to hide memory fragmentation and, via paging, to allow programs to use more memory than is physically present on the system
  • Unix calls this moving of memory pages between RAM and disk “swapping,” and an entire hard disk partition may be devoted to this (the swap partition)
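A rough sketch of the translation step in Python (the page table is just a dict here – real page tables are hardware-walked, multi-level structures):

PAGE_SIZE = 4096  # 4 KiB pages

# virtual page number -> physical page number, for one process
page_table = {0: 7, 1: 3, 2: 12}

def translate(virtual_address):
    page_number = virtual_address // PAGE_SIZE
    offset = virtual_address % PAGE_SIZE
    if page_number not in page_table:
        raise LookupError('page fault -- the OS would fetch the page from swap')
    return page_table[page_number] * PAGE_SIZE + offset

print(hex(translate(0x1abc)))  # virtual page 1, offset 0xabc -> 0x3abc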

memory segments:

  • data corresponding to a running process has four segments:
  • a text segment consisting of the machine language instructions that constitute the program,
  • a static segment with variables allocated by the compiler (global vars and local vars declared static),
  • the stack segment with the run-time stack, itself consisting of stack frames (stack frames contain the parameters and local variables of a function)
  • the heap segment with chunks of memory allocated at run-time

Ch4 - the file system

  • the file system abstraction is that of a key-value store with filenames as keys and the contents of a file (the sequence of bytes) as the value
  • the OS translates the byte-based operations we carry out in programs into block-based operations on the storage level
  • when a file is opened the initial block is read into memory and a variable is stored to track our position in the file
  • as we read through, we increment that position tracker and when there are no more bytes to read, we try to fetch the next block
  • we also track the fact that this file is currently open for reading
  • data for writing to a file is also stored in memory until a full block is available
  • let’s talk speed: a 2GHz processor completes an instruction every 0.5ns, an SSD will read a 4KiB block (4x2^10B) in 25us and write one in 250us whereas a disk drive will take 2-6ms to read a block. So the CPU will complete ~10M instructions while waiting for data from the hard drive.
  • OSs and hardware will do things like prefetching, buffering and caching to try to reconcile these differences in storage retrieval and processing speeds
  • OSs aren’t required to place data contiguously on a disk – a data structure called an “inode” (index node) tracks where each block is on a disk.
  • inodes track permissions, ownership, modified times and accessed times and block numbers for the first 12 blocks that make up a file
  • inodes use single-, double-, and triple- indirection blocks which point to other blocks, allowing us to reference files up to 8TiB. Other file systems handle things differently – FAT for instance has a big linked list of entries pointing to blocks
  • file systems have to track block allocation – Linux currently uses ext4 but may soon move to btrfs (a B-tree based filesystem)
  • the Unix “everything is a file” and “files are streams of bytes” ideas are useful for piping around data and handling network streams

notes from IITK

  • each partition on a disk is either swap (for virtual memory) or part of a file system
  • the first partition is often the boot partition, where a kernel would be located
  • an inode pool exists in the lowest-level blocks – each inode describes a file (but doesn’t know the filename). The filename is stored in the directory structure which maps names to inodes so multiple true names (hard links) are possible in Unix
  • on boot, other partitions are mounted as directories onto the root partition

Ch8 - multitasking

interrupts:

  • beyond just running several processes on multiple cores, each core can switch between processes quickly
  • the kernel, the lowest-level software, implements multitasking by handling interrupts – an event that stops the normal instruction cycle and causes the flow of execution to jump to an interrupt handler
  • (hah, the “shell” is so named because it surrounds the kernel..)
  • a NIC might create a hardware interrupt when a packet arrives, or a disk drive might raise an interrupt when some data transfer completes
  • software interrupts might be raised when an instruction cannot complete, e.g. division by zero
  • programs needing to access a hardware device will issue a system call which triggers an interrupt and causes the execution flow to jump to the kernel
  • the program state must be saved before executing an interrupt – the hardware state (state of registers) must also be saved and then later restored before the interrupted process resumes (the interrupted process will generally not know there ever was an interruption)
  • note that we only save the state of hw registers that will be used
  • context switches (switch to another process) are time consuming as the MMU might need to be cleared, new data has to be loaded into the new process and more registers have to be saved
  • the kernel’s scheduler deliberately interrupts processes for context switches – processes are allowed to run for a small time slice and then they are stopped. The scheduler may allow the process to resume or it may context switch.

process life cycle:

  • each process, when created, tracks information about itself in the process control block:
  • whether or not it is running, whether it’s ready and waiting for a core to be free, whether it’s blocked, awaiting more events/data, and whether it’s done with exit status info that hasn’t yet been read
  • a process is created when a running program executes something like fork – after this the scheduler may resume the parent process
  • the scheduler can do some prioritization to, for instance, prevent an interactive program from sitting in the ready state for too long
  • these priorities are generally set based on how long a process stays in its time slice, its connections to other processes and its CPU usage
  • nice is the system call a process uses to decrease its own priority
  • real-time scheduling is more useful for robotic systems – there would be better control of priority, deadlines for process completion and pre-emption of low priority tasks by high priority tasks

⊃ adts

some notes on abstract data types compiled from Think CS and this blog

asides on the “why” of certain python practices

  • new style classes should be used in python 2 so that static and class methods can be defined, so that type works correctly and for other reasons.
  • the if __name__ == '__main__' guard works because when the python interpreter runs a module it will set the __name__ variable. If the module is just imported, __name__ will be set to the module’s name.
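  e.g. a tiny module showing the guard in action:

# module.py
def main():
    print('running directly')

if __name__ == '__main__':
    # runs via `python module.py` but not via `import module`
    main()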

linked lists

  • LLs are either the empty list or a node that contains some value and a reference to a linked list
  • you can reference an entire collection just by the root node
  • my python implementation

stacks

  • very simple with python lists since there are already pop and append methods
  • my implementation in python
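For reference, the list-backed version really is just append and pop:

stack = []
stack.append('a')   # push
stack.append('b')
print(stack.pop())  # 'b' -- last in, first out
print(stack.pop())  # 'a'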

fifo queues

  • for add/remove operations to be constant time, you need to track both the head node and the tail node
  • my implementation in python from the Think CS example
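A minimal sketch along those lines (mine, loosely following the Think CS approach):

class Node(object):
    def __init__(self, value):
        self.value = value
        self.next = None

class Queue(object):
    def __init__(self):
        self.head = None  # remove from the head
        self.tail = None  # add at the tail

    def add(self, value):
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node

    def remove(self):
        node = self.head
        self.head = node.next
        if self.head is None:
            self.tail = None
        return node.value

queue = Queue()
queue.add(1)
queue.add(2)
print(queue.remove())  # 1 -- first in, first out
print(queue.remove())  # 2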

binary search trees

  • each node in the tree may have a max of two subtrees
  • all values in the “left” subtree are less than the value of the node itself
  • whereas all values in the “right” subtree are greater than the node’s value
  • each node’s value must be unique
  • search, insertion and removal can all be ~log(n) operations if the tree is balanced (and not just a linked list)
  • much like the other structures above, the implementation is all about recursion, here’s mine in python, based loosely on this post and this diagram:

binary tree

binary heaps

  • these are complete binary trees that satisfy a heap-ordering property, either: the value of each node is greater than or equal to the value of its parent (min-heap) or each node is less than or equal to its parent (max-heap)
  • (“complete” in that every level, except possibly the last, is completely filled and all nodes in the last level are as far to the left as possible)
  • think of it like a priority queue – in a min heap the “highest” priority element is at the root (er, for some def of highest)
  • suppose you had data streaming in, like in a scheduler, and you wanted fast inserts, but you also want to be able to easily (O(1) in fact) retrieve the smallest item
  • heaps are partially ordered (at a given level, there is no particular relationship between sibling nodes)
  • the neat thing about heaps is that they can be represented as an array, they don't need a linked structure like an ordinary binary tree: for the kth element of the array (using 1-based indexing), the left child is at 2k, the right child is at 2k+1 and the parent is at k/2 – see the sketch at the end of this section
  • in the OS world, the heap is just any memory not on the stack (and the stack is the organized chunk of memory where functions, parameters and return values get passed around)

min heap
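And a minimal sketch of an array-backed min heap using that 1-based index math (mine – Python's stdlib heapq does this for real, with 0-based indexing):

class MinHeap(object):
    def __init__(self):
        self.items = [None]  # dummy element so children of k sit at 2k and 2k+1

    def push(self, value):
        self.items.append(value)
        k = len(self.items) - 1
        # sift up: swap with the parent while we're smaller than it
        while k > 1 and self.items[k] < self.items[k // 2]:
            self.items[k], self.items[k // 2] = self.items[k // 2], self.items[k]
            k = k // 2

    def peek(self):
        return self.items[1]  # the smallest item, in O(1)

heap = MinHeap()
for value in [7, 3, 9, 1]:
    heap.push(value)
print(heap.peek())  # 1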

⊃ eno

At Endaga I worked on eno, a “programmable cell phone” testing system.

Endaga built really affordable cell phone towers and we wanted to be able to test our boxes with real phones creating real traffic. Eno used a Beaglebone Black and an Adafruit Fona to send messages and place calls.

The Beaglebone nodes ran a small command and control server, while the testing machine used a simple API to pass messages to the test nodes.

See more on github!

⊃ openbts python

While at Endaga I worked a lot on openbts-python, our python package for interfacing with OpenBTS. OpenBTS provides a lot of pieces of cellular infrastructure like a subscriber registry and SIP functionality. The python package we made tied it all together.

I’m pretty happy with how it turned out – has nice unit and functional tests, and is reasonably consistent with its API and style. We could’ve done better with how it handles different OpenBTS versions – the Endaga fork and the Range-maintained fork became somewhat different, iirc, but mostly due to bug fixes that we published..

See more on github!

⊃ rsa

At work we’ve been setting up tinc networks alongside OpenVPN. I wanted to use the public key embedded in the OpenVPN crt with tinc, so I started trying to parse the crt and use pyasn1 to create a public RSA key in a tinc-friendly format. I was learning about the exponent and modulus in RSA when I eventually realized the public key could be generated from the private key :| but in the meantime here are some notes from reading about RSA.

basic background math

  • if two numbers have a gcd of 1, then the smaller number has a multiplicative inverse in the modulo of the larger number. E.g. gcd(4,9) = 1 so 4 has a multiplicative inverse in mod9, and that happens to be 7 because 4 * 7 = 28 = 1mod9. Whereas 3 is in mod9 (the set of integers from 0 to 8) but gcd(3,9) = 3 (not 1) so 3 doesn't have a multiplicative inverse in modulo 9.
  • another example: 9 has a multiplicative inverse in the modulo of 11 (11 is prime so any number between 1 and 10 can be used here), and that is 5 because 5 * 9 = 45 = 1mod11.
  • and 2 has a multiplicative inverse in 7 (4): 2 * 4 = 8 = 1mod7, anyway..
  • for any prime number p, every number from 1 to p-1 has a gcd of 1 with p, and therefore has a multiplicative inverse in modulo p.
  • Euler’s totient phi is the number of elements that have a multiplicative inverse in a set of modulo integers. For example, phi(7) = 6. For prime numbers, p, phi(p) = p - 1

RSA

  • first generate two large primes, p and q
  • generate the modulus, n: n = p * q
  • the totient phi(n) is calculated pretty easily because the totient is multiplicative: phi(n) = (p - 1)(q - 1)
  • determine the public key e: a number chosen in [3, phi(n)) with gcd(e, phi(n)) = 1 – typically 65537 (2^16 + 1)
  • find the multiplicative inverse of e with respect to phi(n), this is the private key, d: e * d = 1mod(phi(n))
  • then we can encrypt a message m with key k into (or out of) ciphertext c via c = m^k mod(n). Encryption and decryption are mirrored operations.

an example

  • let’s use the message 12 and the primes 17 and 19, so the modulus will be 323.
  • the totient, phi(n) is 16 * 18 = 288
  • and we can use a public key of 11 as it has a gcd of 1 with phi(n)
  • the multiplicative inverse of e = 11 with respect to phi(n) = 288 is 131 based on the Extended Euclidean Algorithm – this will be the private key
  • so we can encrypt our message 12 with the public key: c = 12^11 mod(323) = 312
  • and we can decrypt with the private key via m = 312^131 mod(323) = 12 (woo!)
  • we could also have encrypted with the private key: c = 12^131 mod(323) = 198 and then decrypted with the public key: m = 198^11 mod(323) = 12
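The whole example in a few lines of Python (pow with a modulus does the heavy lifting; the negative exponent for the modular inverse needs Python 3.8+):

p, q = 17, 19
n = p * q                    # 323, the modulus
phi = (p - 1) * (q - 1)      # 288, the totient
e = 11                       # public key, coprime with phi
d = pow(e, -1, phi)          # 131, the multiplicative inverse of e mod phi

m = 12                       # the message
c = pow(m, e, n)             # encrypt with the public key -> 312
assert pow(c, d, n) == m     # decrypt with the private key -> 12

c2 = pow(m, d, n)            # encrypt with the private key -> 198
assert pow(c2, e, n) == m    # decrypt with the public key -> 12
print(c, c2)                 # 312 198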

asides

  • Wikipedia has nice pseudocode for using the Extended Euclidean Algorithm to find multiplicative inverses in some modulus
  • Rabin-Miller primality tests are used (in multiple rounds) to find large numbers that are extremely likely to be prime
  • to get our message into a numeric format, we can convert a string into a bit array, and then convert that into a number

⊃ WNDW book

I’m skimming Wireless Networks in the Developing World, a book covering the basics of setting up wireless networks. Since I started working at Endaga six months ago I’ve learned a lot more about networking and hardware, but I thought it’d be nice to try and round out some of my knowledge of those areas.

Ch1 - Physics

  • wavelengths: 802.11b at 2.4GHz has a wavelength of 12.5cm (and is considered a microwave), and our 900 MHz 2G networks are at 33.3cm
  • polarization can be used to limit interference: combine a horizontally-polarized link with a vertically-polarized signal and you could double the data rate while still using the same frequency
  • an example: 802.11b uses 22MHz-wide channels spaced 5MHz apart

Huygens’ Principle: each point of an advancing wave front is just the center of a fresh disturbance and the source of new waves. The totality of the advancing wave is the sum of the waves created at points in the medium already traversed. This explains how sound can travel around corners, for instance.

diffraction

That effect is more pronounced in longer wavelengths, but these secondary waveforms will be less energetic, as you might expect.

  • beamforming arrays take advantage of constructive interference

Fresnel Zones: first note that beams will “spread” as they travel (think of a laser pointer painting a big circle on a distant mountain). But because of diffraction (and Huygens’) these “spread” signals can still reach a receiver.. however they’ve traveled further and are thus out-of-phase compared to waves that traveled along the LOS. This can lead to constructive and destructive interference. The book describes the first, third, fifth, etc zones as constructive and the rest destructive. So in the design phase you want to keep the first zone free of obstructions. The Fresnel radius is largest at the midpoint along the link:

the Fresnel radius
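The standard formula for the first-zone radius (my addition, not quoted from the chapter) is r = sqrt(lambda * d1 * d2 / (d1 + d2)), where d1 and d2 are the distances to each end of the link:

import math

def fresnel_radius(freq_hz, d1_m, d2_m):
    wavelength = 3e8 / freq_hz
    return math.sqrt(wavelength * d1_m * d2_m / (d1_m + d2_m))

# midpoint of a 10 km link at 2.4 GHz:
print(fresnel_radius(2.4e9, 5000, 5000))  # ~17.7 m of clearance needed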

Signal power is proportional to the square of the electric field.

  • dB = 10 * Log(P1 / P0)
  • +/- 3dB is double/half the power; +/- 10dB is an order of magnitude more or less power
  • dBm uses a base value of power (P-sub-zero) of 1mW whereas dBi is a measurement relative to an ideal isotropic antenna
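A quick numeric check of those rules of thumb:

import math

def db(p1, p0):
    return 10 * math.log10(p1 / p0)

print(db(2, 1))       # ~3.01 dB: doubling the power
print(db(10, 1))      # 10 dB: an order of magnitude
print(db(0.1, 1e-3))  # 20 dBm: 100 mW referenced to 1 mW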

Ch2 - Telecom Basics

  • waveform graph: the plot of a signal’s amplitude over time vs a spectrum graph: the plot of the amplitudes of the composite frequencies
  • Nyquist: we can reconstruct an analog signal if the sampling rate is at least double that of the highest frequency content of the input signal
  • jitter: the variability in the delay of the received signal
  • Marconi quickly conceived of multiple carrier frequencies to make better use of long-distance radio.
  • frequency-, amplitude- and phase-modulation can also help signals share a carrier channel
  • the techniques may optimize for some combo of robustness against noise, link capacity or spectral efficiency (bits / Hz)
  • bit-error-rates, the fraction of erroneously decoded bits, are typically 10^-3 to 10^-9
  • FDMA: different carriers for different users vs TDMA: different time slots for different users vs CDMA: users identified by a specific code vs SDMA: space-division where a received signal is compared between multiple antennas to determine “who” it came from (used in MIMO) – would be interesting to read more about this last one especially..
  • for uplink and downlink support, channels are shared via FDD (frequency division duplexing) or TDD (time)

Ch3 - Licensing and Regs

  • Unlicensed radio spectrum was set around the 2.4GHz band (globally) by the ITU. 5GHz was added in 2003 and 900MHz is unlicensed in the US (though it’s used for GSM phones in Western Europe and developing countries).
  • output power, /antenna/ output power, tower height and of course spectrum..it’s all regulated
  • homologation: formal certification of comms equipment via an independent lab

Ch4 - Spectrum

  • clarifies that SDMA is somewhat less interesting than I previously thought – just using the same spectrum in a different geographic area.
  • CDMA is “spread spectrum” comms that uses a special coding scheme.. which I don’t yet understand
  • 802.11af is about utilizing TV whitespace

Ch5 - Antennas and Tx Lines

  • the skin effect: high frequency electric signals only travel along the outer layer of a conductor – the inside material does not contribute. So with bigger signalling cables, we are really just seeking bigger circumference.
  • keep cabling as short as possible and don’t connectorize it yourself..
  • Voltage Standing Wave Ratio is a measurement of power loss due to signal reflection from the transmitter to the antenna. A theoretical perfect VSWR is 1, but <2 is the goal.
  • Most omnis have vertical polarization, so horizontally-polarized antennas are sometimes made to avoid man-made interference.
  • Uplink and downlink antennas must have matching polarization.. yet there is also “circular polarization” – how do you match that?
  • Antenna icing can lead to impedance mismatches and changes in the radiation pattern.

Ch6 - Networking

OSI standard (Open Systems Interconnection) divides network traffic into seven layers:

  • Layer 7: Application (http, ftp and smtp are application layer protocols). Humans are kind of at the “Layer 8” level, interacting with the application.
  • Layer 6: Presentation – data representation like html, encoding and compression.
  • Layer 5: Session – manages logical comms between apps (RPC, for example)
  • Layer 4: Transport – allows systems to reach a service on a network node. TCP and UDP are examples.
  • Layer 3: Network – where routing occurs. IP is an example. This layer is sometimes called “the internet level.”
  • Layer 2: Data Link – when two nodes are connected via the same physical medium, this layer is in play. Ethernet and 802.11a/b/g are examples. This is also known as the MAC layer. MAC addresses are unique 48 bit numbers assigned to every networking device when it’s manufactured.
  • Layer 1: Physical – copper CAT5 cable, radio waves, fiber bundles, the actual medium transmitting signals.

TCP/IP has a different organization in five layers: Application, Transport, Internet, Data Link, Physical. Layers 5-7 in OSI are rolled into the TCP/IP Application Layer as these layers are “just data.”

IPv6 will be the norm in 2020. Addresses are 128 bits, written as eight 16-bit chunks (four hex digits each).

  • for example: 2001:0db8:1234:babe:0000:0000:0000:0001.
  • but typically leading zeros are removed from the address: 2001:db8:1234:babe:0:0:0:1.
  • or consolidated further: 2001:db8:1234:babe::1.

The loopback form in IPv6 is ::1. The unspecified address is ::/128 (similar to 0.0.0.0/32 in IPv4). IPv6 prefixes are usually the most significant 64 bits, so an address would be written: 2001:db8:1234:babe::1/64. These prefixes are like IPv4 subnet masks. The latter half of the address is the IID, the interface identifier. All nodes on a LAN or WAN will share an address prefix.

IPv4 subnet masks define the size of networks: /24 indicates 8bits are reserved for hosts for 256 total hosts. But .0 is the address of the network itself and .255 is the broadcast address, so really only 254 hosts are available. Subnet masks can be applied to an IP address to find the network address.
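Python's stdlib ipaddress module is handy for checking this arithmetic (and the IPv6 consolidation above) – a quick aside of mine, not from the book:

import ipaddress

net = ipaddress.ip_network('192.168.1.0/24')
print(net.network_address)    # 192.168.1.0 -- the network itself
print(net.broadcast_address)  # 192.168.1.255
print(net.num_addresses - 2)  # 254 usable host addresses

# which network does a given host sit in?
print(ipaddress.ip_interface('192.168.1.37/24').network)  # 192.168.1.0/24

# and the IPv6 consolidation rules:
print(ipaddress.ip_address('2001:0db8:1234:babe:0000:0000:0000:0001').compressed)
# 2001:db8:1234:babe::1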

IANA (Internet Assigned Numbers Authority) administers IPv4 and IPv6 allocation. All addresses are divided into subnets which are delegated to five large regional registries (the US and Canada are in ARIN). These regional registries (RIRs) distribute to ISPs.

  • IPv4 uses DHCP to assign dynamic IPs (typically) but IPv6 uses Stateless Address Auto-Configuration (SLAAC) where the device generates a random address on its own.
  • private addresses: 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16
  • neighbor-discovery: Address Resolution Protocol (ARP) in IPv4 and NDP in IPv6. Someone asks the network (via multicast) “Who is 192.168.1.3?” and that host will reply with its MAC. Then the two can communicate directly.
  • default gateway: when a router receives a packet destined for a network for which it has no explicit route, the packet is forwarded to the default gateway. This is typically the “best route out of your network” and often towards the ISP.

Network Address Translation (NAT) devices manipulate the address of packets instead of just forwarding them. So the private network can use an address from the private range but the NAT router’s internet connection uses a globally-routed IPv4 address. You’re basically sharing a global address among a lot of private addresses. The MAC layer is unused in NAT, but, interestingly, TCP packets can be modified in flight. This is necessary when, say, two private addresses are browsing the same site. The NATting router will modify the TCP port such that there is a unique port/IP combo.

ICMP (internet control message protocol) is part of the IP suite (along with TCP and UDP) but is more used for debug and maintenance. It consists of things like echo and notifying when a packet couldn’t be delivered

The physical layer:

  • MAC addresses reveal in the first 24 bits the assigning entity of the address
  • hubs just connect multiple twisted-pair ethernet devices together – they repeat signals received on one port out to all the other ports. They work on the first layer (phy). A WiFi AP is a hub on the radio side of things.
  • switches are like hubs that can create dedicated connections between ports. They operate on the second level (the data link layer).

⊃ jan music

some recent favorites:

twin shadow

sun airway

willow smith (memorecks) :|

gipsy kings

⊃ beliefs

at the behest of Dobelli and especially his chapter on …, here is a list of beliefs that I hold dear:

education

Should be free through college. Only PhDs get it right – highly selective, largely self-guided, paid a stipend (albeit small). Early education should be more Montessori-style. Kids should be allowed to explore their interests and be creative. Too much testing and rote memorization. The goal should be to develop a kid’s sense of curiosity and self-determinism.

science

Where it’s at. Our best system for understanding our world and experiences. So wide-ranging, it makes me sad to hear kids say they don’t like it.

religion

Don’t believe in a god. Raised a methodist / UU though and I liked that. Worship and fellowship are great – just nice big peaceful group meditations. And the teachings of many religions are excellent. Pretty interested, actually, in religions from a historical perspective.

poverty

Today’s extreme national and global income disparity is just culturally embarrassing. Would like to learn more about the living wage work being proposed in Europe. Love the Buckminster Fuller quote on work:

We should do away with the absolutely specious notion that everybody has to earn a living. It is a fact today that one in ten thousand of us can make a technological breakthrough capable of supporting all the rest. The youth of today are absolutely right in recognizing this nonsense of earning a living. We keep inventing jobs because of this false idea that everybody has to be employed at some kind of drudgery because, according to Malthusian Darwinian theory he must justify his right to exist. So we have inspectors of inspectors and people making instruments for inspectors to inspect inspectors. The true business of people should be to go back to school and think about whatever it was they were thinking about before somebody came along and told them they had to earn a living.

relationships

I like Dan Savage’s advice to be “good, giving and game.”

family

An unflinching support net that I’ve been lucky enough to have.

speech

Would like to read more about the Charlie Hebdo debate. The attack on them was of course ludicrous but did they not push the bounds of satire even?

government

How can we not worry about it? Too much historical precedent to do otherwise. I remember my dad talking about Germany in the early 20th century, it was one of the world’s most culturally advanced nations full of extremely intelligent people. We’ve let some bad things happen as citizens of this country and should take care.

art and expression

Still reading Sontag, I really like it. Absolutely need more of it in my life. Working with computers all day, I begin to feel more like a machine.

drugs

Never tried em, which I find embarrassing to say.. But I’ve got no problem with people voluntarily doing things to their own bodies.. The harder stuff gives me pause because it seems to be a pretty clear path to self-destruction and seems to often harm others. But the world is full of such things :/

⊃ the silk road comedy

Perhaps still in Serial withdrawal, I’ve been reading a little bit about the Silk Road trial against Ulbricht. The guy tried to order six (!) hits on people who scammed SR users, threatened to expose SR users or worked with federal agents. (Well it was DPR ordering hits, presumption of innocence, ok..). When the Hells Angels hitman couldn’t get to Friendly Chemist alone, DPR haggled a bit on the cost of having FC and everybody else in the vicinity killed in one fell swoop. What a lunatic.

I love hubris – it seems like DPR was himself scammed, probably at least twice, in his quest for drug dealer justice. The execution of his former staffer, Flush, was faked by the FBI, and another hit, one that Ulbricht marked as confirmed in his diary, never matched any local deaths and was also likely faked..

More data:

  • Ulbricht’s spreadsheet on the SR business
  • the FBI’s stats on their undercover SR buys: 4850 successful from 40 dealers in 10 countries
  • tons of info compiled on antilop.cc (scrapes, timelines, user histories..geez)
  • and the ‘mastermind’ tab found open on Ulbricht’s laptop (the transaction numbers are maybe monthly? I don’t really have a good sense of how much money SR made or how many buyers there were):

sr mastermind

As for how they found Ulbricht, I had read it was from this SO post. But this ars article says it was through an IRS agent’s googling, which led to another forum and the subpoenaing of Ulbricht’s gmail account.

Also fascinating was the amateur investigation into the FBI’s mole, eventually confirmed by trial testimony. Apparently Ulbricht was IMing a core SR member, cirrus, at the time of his arrest. Cirrus, aka scout, was an account that had been run by the FBI since 2013. In that wired article, the agent claims he got Ulbricht logged into SR as DPR during the chat as part of the setup for the arrest.

Ulbricht’s open laptop led to a lot of other arrests of SR core members and other folks, including this guy trying to buy ricin, jesus.. By the way, these arrests were possible because Ulbricht made his core team dox themselves (amazing..), so after the FBI’s visit to the Glen Park PL, Ulbricht’s “Staff” folder was a few clicks away..

The trainwreck continued when SR2 imploded months after it began.. I think there was a massive “hack” that was probably just an internal theft of many BTC. Then followed more arrests – they had been infiltrated since day one, also by the cirrus fellow, I believe.

A lot of this is sad – especially the alleged murder-for-hire stuff. But it is fun to see an amateur kingpin cum web dev hoisted by his own petard in very public fashion..

Sept 15 2015 update

Just read this Mar 31 article on Forbes about the DEA agent Carl Mark Force and the Secret Service agent Shaun Bridges. They orchestrated the fake hit ordered by DPR. And Force allegedly extorted DPR for bitcoin by selling off information from inside the federal investigation into Silk Road. Force made ~$1M from these efforts. He also used his DEA authority to do background checks on behalf of CoinMKT, another exchange.

Force got into trouble though when he started using his badge to clear up personal issues with Venmo and Coinstamp – they got suspicious and reported him. There are also screenshots of some interesting emails between Karpeles and Force – Force trying to offload 250 BTC, then looking for a job at Mt Gox and then needling Karpeles when his exchange was raided. The information about Force and Bridges’ actions wasn’t raised in the Ulbricht trial. The judge explicitly declared it not exculpatory. The government says that agents Force and Bridges had gained admin access to Silk Road (through Curtis Clark Green), but they used that access in illicit ways about which the government had no knowledge.

⊃ thinking clearly

I bought a paper (!) copy of this book, The Art of Thinking Clearly. It would do well as a set of flashcards.. But it is a good read! Dobelli reviews a compendium of logical fallacies, meditating on each a bit and casting them in various personal terms.

survivorship bias

We overestimate our chance of success because we’re naturally drawn to pay attention to the success of others. There’s no compelling story in comprehensive failure, so we don’t hear it.

the swimmer’s body illusion

Confusing the factor of selection for the results.

clustering illusion

A proclivity towards seeing patterns when there are none.

⊃ gps watch

There’s a new GPS watch out now by garmin, I believe. Quite expensive though.. I was thinking it’d be cool to have a very simple watch that tracked distance traveled. For runners, say, or cyclists.

There are some very small lipos out there..but how small are (cheap) GPS parts now? Ah, the ultimate gps breakout from adafruit looks watch-ish, even in breakout form.. This plus a few LEDs and a button and this lipo, say.

The teensy would work..though this watch is becoming more of an armband :/ Not bad for all off-the-shelf, breakout stuff.

gps front gps quarter teensy lipo

⊃ uniform sampling in a polygon

I was working on uniform sampling in an arbitrary polygon.

An interesting stackexchange thread. A pdf on the topic – these kinds of problems arise in geosciences. For instance when you need to analyze a large area of soil.. We need a low discrepancy sequence.

A thread on such matters – recommends ray casting and a triangulation method, among other things. Triangulation seems straightforward.. The original SE thread suggested hexagonal meshes and spatial simulated annealing. A brief explanation of SA, very cool.. What is the fitness function? Maybe something like spatial variance? So the SA run could be:

  • drop N (x, y) points in the polygon randomly
  • start iteration
  • for each point:
    • find distance to other points
    • find M nearest neighbors
    • calculate average distance to each neighbor
  • find variance of the averages – this is the output of the fitness function (lower is better)
  • is this variance better than the best yet-seen variance value?
    • yes: reduce temp, save new set of points and fitness
    • no: if a random value is less than the temperature, keep the new set of points and fitness anyway (and don’t touch the temp)
  • perturb points and repeat, hm, how to perturb points that remain valid (in the polygon)? scipy might not be able to do this..I guess points outside of the polygon could return a high energy.. simanneal could do this, but I can’t use that :(
  • you could wrap a larger fitness function around that variance calculation – it usually returns the variance, unless points are outside the polygon, in which case it returns the variance * some multiplier (a sketch follows)
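A sketch of that fitness (energy) function – lower is better; in_polygon is a stand-in for whatever point-in-polygon test you have (shapely's contains, say):

import math
import statistics

def fitness(points, in_polygon, m=3, penalty=100.0):
    averages = []
    for i, (x1, y1) in enumerate(points):
        if not in_polygon(x1, y1):
            return penalty  # out-of-polygon points get a high energy
        distances = sorted(
            math.hypot(x1 - x2, y1 - y2)
            for j, (x2, y2) in enumerate(points) if i != j)
        averages.append(statistics.mean(distances[:m]))  # m nearest neighbors
    return statistics.variance(averages)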

Twiddling the SA dials, you can start to appreciate (laugh at?) the recursiveness of the work.. I’m watching a graph output get closer and closer to what I expect, myself optimizing the optimizer (and probably overfitting :|)

The end result:

⊃ learning react

fb’s getting started tutorial

React templates can be ‘compiled’ client- or server-side. When running client-side, you include this JSXTransformer code that converts your JSX (XML syntax inside js) into vanilla js. There is some weirdness – the XML is like your typical html tags but with some adjustments to work around reserved js keywords. Maybe the more appealing thing is to use react server-side:

$ sudo apt-get install npm
$ sudo npm install -g react-tools
$ sudo ln -s /usr/bin/nodejs /usr/bin/node
$ jsx --watch src/ build/

Now files in src are built into vanilla js, which can just be included in your website.

react’s comment box tutorial

Could of course write vanilla js with react, but let’s try JSX.. React has builtin xss protection, it says, so we’re not quite just rendering strings. I added two vim plugins btw: pangloss/vim-javascript and mxw/vim-jsx, with let g:jsx_ext_required = 0 in my vimrc so jsx is highlighted in .js files.

Composition of components is nice:

var CommentBox = React.createClass({
  render: function() {
    return (
      <div className="commentBox">
        <h1>Comments</h1>
        <CommentList />
        <CommentForm />
      </div>
    );
  }
});

You can pass data to child components – this passed data is called ‘props.’ So the comment becomes:

var Comment = React.createClass({
  render: function() {
    return (
      <div className="comment">
        <h2 className="commentAuthor">
          {this.props.author}
        </h2>
        {this.props.children}
      </div>
    );
  }
});

We get the actual text via this.props.children – note that the comment text is “an XML-like child node.”

For client-side markdown rendering, we skip the xss protection with dangerouslySetInnerHTML. Components can define a getInitialState method that will execute once during the component’s lifecycle. You can also specify componentDidMount which is called when the component is rendered. Changing the state with setState will trigger a re-render.

You set up your own system to get new data (websockets, polling, etc). Use the ref attribute to assign a name to a child component, and then access the refs with this.refs. Pass data to a parent node by attaching a callback to a prop and binding to an event (eh, this is kinda convoluted – see the handleCommentSubmit bits of code in the tutorial).

Note the little bind(this) appended to the ajax calls. It’s the new alternative to caching this ala var that = this;. An article on the subject.

the material dropdown tutorial

Note that this sets props name and key, which makes sense..they are HTML / XML properties, nothing special with the naming there:

return (<DropdownItem name={item} key={index} />);

This tutorial notes that state is mutable while properties are not, or that’s how you should treat it. So visibility of the menu is a state as it changes when the dropdown is shown/hidden. But the name of a menu item, that’s a property.

Interesting that we’re using react for an animation state change. And managing svg icons, that’s cool too. The end demo is pretty broken and outdated, but it’s good to just type this stuff, I think.

Also of note:

⊃ website thinkin

..I’d hate for this sidebar site to be my main thing – it’d be fun to be more creative

like what about a generative site.. you scroll down and roots continue being created in the side/background. and you can browse back into old projects. maybe you have to hack them away to actually read the site. generative flowers and animals on the notes site or like animated ascii art.

like this awesome demo from nick kwiatek or something colorful ala substack

or maybe the projects page scrolls horizontally, everybody loves stuff like that :| something a bit less prosaic and predictable, I mean, that’s ok for the blog stuff, but the projects section should be fun..

there are lots of interesting typographical glyphs to play with

on urls, what a wonderful timesink.. this site helped me find some interesting words for mashing together. for instance, “pinewire” and “oakmachine” were available, and they sound nice.

The workflow: I got my theme set up as a subtree:

$ git subtree add --prefix themes/pasture https://github.com/yosemitebandit/pasture.git master --squash

Then I can pull or push:

$ git subtree pull --prefix themes/pasture git@github.com:yosemitebandit/pasture.git master --squash
$ git subtree push --prefix themes/pasture git@github.com:yosemitebandit/pasture.git master --squash

structure

It would be nice to view tags in two ways: the first, typical way would just show articles corresponding to that tag. The second would actually show the full text of every article under the tag to make searching with CTRL+F easier.

I also find myself adding on to notes a lot.. I think I should just write some notes each day and tag sections / paragraphs. When the tag is clicked, things could be aggregated there. Then this could be more of a traditional blog where I’m just writing.. So each article would be a “day” and the pages with meaningful names would be tags.

others

  • 8th light has a nice looking set of articles
  • this is a nice blog

ideas

  • page that summarizes every time I’ve typed “wow”

⊃ up and down US 1

Or really down and up – Nora and I went to Pt Lobos and Big Sur..

US 1 S pasture big sur rocks

..then up to Pt Reyes for MLK weekend.

pfeiffer state beach nora and me

⊃ mexico 2015

Me and Nora went to Tulum and had a great time!

Swam in a cenote. Ate a lot of fish tacos. Went snorkeling a bunch. Saw some amazing ruins. Stayed at a different airbnb place almost every night and met a ton of people. Slept in a bed made of sticks :|

⊃ copying in tmux

copying in tmux

  • ..couldn’t find any good way to do it with the keyboard.
  • lots of tutorials involving copy-pipe and xclip and vi-copy, none of them worked for me
  • best solution I found was to use the mouse :|
    • hold shift and then you can select text with the mouse
    • ctrl-shift-c to copy

⊃ hacking tip.golang.org

hacking tip.golang.org with Brad Fitzpatrick and Andrew Gerrand

  • hah, never even thought of connecting an external keyboard for pair programming
  • the source of what they built (and I guess I wrote this too)
  • nice examples of os/exec
  • I later used cmd.Run() rather than cmd.Start() to make execution wait until the cmd finished
  • this of course is nicely explained in the docs, the lesson being that I should’ve read those more carefully before my other trial-and-error attempts
  • kind of comforting that their hacked proto is just that..hacky and undocumented and hand-tested
  • the whole architecture is kind of cool too: there are two “worlds” and only one of them is presented at any given time. If golang’s master changes, the second (still-hidden) world is rebuilt with those docs’ changes. when this build is done, we switch to showing this new world and destroy the old one.
  • it’s also a nice demo of mutexes
  • and there used to be some content on the flag package, but no more

⊃ getting started with hugo

some notes on the static site builder, hugo..

  • download the hugo binary for 64 bit linux
  $ mkdir -p archive
  $ cd archive
  $ wget https://github.com/spf13/hugo/releases/download/v0.12/hugo_0.12_linux_amd64.tar.gz
  $ tar -xvf hugo_0.12_linux_amd64.tar.gz
  $ sudo ln -s /home/matt/archive/hugo_0.12_linux_amd64/hugo_0.12_linux_amd64 /usr/local/bin/hugo
  • setup a test site and the themes
  $ hugo new test-site
  $ hugo new about.md
  $ hugo new posts/first.md
  $ git clone --recursive https://github.com/spf13/hugoThemes themes
  • start the server, building drafts and selecting a theme
    • the watch flag will automatically reload pages when they change..cool
  $ hugo server --theme=hyde --buildDrafts --port=8080 --watch
  • playing with this more, and getting themes set up has been amusing.. I quite like this theme
    • it has required me to install npm and then bower..
    • and then rvm and then ruby gem and then bundler and then compass (rvm of course also has its own installer for more reqs)
    • of course I previously installed hugo with go and go with apt (well, I could’ve)
    • compass throws a bunch of warnings on scss compilation
    • and after all that, the hugo template fails to compile :(

⊃ javascript and es6

notes from glenmaddern.com

  • ran through the loopgifs demo from the video
  • it was cool to use js in a much more structured way (and in a functional style)
  • modules and some of the tools are both nice, though I still had weird node / npm install issues ..confusion I remember having years ago
  • nice to be exposed to some of the new es6 features like arrow functions and classes
  • also semicolons..I didn’t type a single one, whoa, what happened there?

⊃ optimizing in go

notes on optimization..

on benchmarking / optimization

  • part of their anti-DDOS system
  • they examine User Agents and ‘Referer’ (hah, a misspelled part of the HTTP header that persists)
  • they use kafka extensively, incidentally
  • benchmarking results showed a ton of calls to ‘external code’
  • note the cpu profiler: go test -bench=. -cpuprofile=cpu.out – the ‘external code’ turned out to be a virtualbox artifact
  • switched to vmware
  • string manipulation produced a lot of garbage

on testing

  • covers basic testing
  • the nice little outyet demo app
  • and info on testing http clients and servers: httptest.NewServer

⊃ golang first bits

spurious notes on golang..

how I start - golang

  • I’ve done some go tutorials here and there – this one is pretty well-paced given my present knowledge
  • “In Go, we tend to implement behavior in terms of functions operating on interfaces.”

on interfaces

  • interfaces are a set of methods
  • types satisfy an interface if they implement all of the interface’s methods
  • then you can declare a function arg to be an interface, and any type that satisfies the interface can be used in the function call
  • also applies to empty interfaces - any type can be used as a function arg or returned
  • for instance fmt.Println expects a Stringer interface as an arg. this interface contains a single method: String() string so if you make a struct and an associated String() string method, you can fmt.Println that struct
  • contains a nice sorting example

table-driven testing

on benchmarking

  • nice, it’s part of the testing package
  • cool: benchmarks are run a few times until the runner is satisfied with stability; they run for a minimum of one second each, with increasing b.N until this is satisfied (to build statistical confidence)
  • invoke with go test -bench=.
  • a common mistake: using b.N values in the benchmarked function: these just control the number of loops

  • on privacy: if the first letter of an identifier (a struct, variable or function name) is capitalized, it is exported and visible outside its package

  • and some interesting gotchas with compiler optimization – but I don’t think my vanilla code fell into this trap..

  • then I memoized with var m map[int]int which panicked with “runtime error: assignment to entry in nil map”

  • but it works with var m = make(map[int]int)

  • as explained in the gobook, you should basically just use make to init an empty map (and slices and channels)

  • see the blog or effective go

on the web framework martini

  • too much magic
  • breaks type-safety via dependency injection
  • interestingly, the martini author took a lot of this to heart and later wrote a middleware system, negroni
  • gorilla would be something else to consider, as well as goji
  • consider trying webapps in pure go or here

⊃ microservices and the monolith

part I

  • lots of nice links to fowler
  • started adding new features as microservices
  • left the monolith in place
  • microservices talked to their public API like any other app
  • ..but eventually had to add an internal api

part II

  • broke the monolith by identifying ‘bounded contexts’ – “well-contained feature sets, highly cohesive, and not too coupled with the rest of the domain”

and part III

  • broke their team down to focus on specific areas of the platform (reminiscent of conway)
  • initially had tons of languages and tools, but decided to consolidate
  • then chose a stack for rpc, resiliency and concurrency
  • settled on the finagle framework, an rpc system for the jvm by twitter
  • for later reading: your server as a function

⊃ microservices, first bits

on architecture

“The job of good system architects is to create a structure whereby the components of the system – whether Use-cases, UI components, database components, or what have you – have no idea how they are deployed and how they communicate with the other components in the system.”

on micro-architectures

  • interesting note on synthetic / active monitoring: periodically simulate a customer using your service and measure the simulated actions
  • ..but overall not very substantive

on configuration drift

  • either reapply configurations or burn down the servers (see fowler’s ‘phoenix server’ concept)
  • this is from 2011, and I think it’s a pretty well-accepted idea now

on the size of microservices

  • one brain should be able to comprehend the whole thing
  • may also be useful to divide along read/write boundaries so those can scale separately
  • “The point, though, that each application within a business capability should have a comprehensible single purpose. The Single Source Of Truth for Customers, Processing Queue Entries, Storing Requests, Projecting Events. Writing transactions.”
  • “Many organisations end up arranged into teams according to specific technologies – the BPM team, the DBAs, the ‘services’ people.” (reminiscent of conway) goes on to say that this is undesirable – we should be able to talk with all colleagues about all things

on testing microservices

  • proposes ‘contract testing’ – something less than end-to-end but more than component
  • basically an API test: certain fields must be available in a response, for example

⊃ on apache kafka

Notes from this slideshare.

  • very high throughput messaging
  • producers write to brokers; consumers read from brokers; data stored in ‘topics’, topics are split into partitions which are replicated
  • topics are ordered, immutable sequences of messages that are appended to
  • each message in a partition receives a sequential, unique (per-partition) id called the ‘offset’
  • consumers track their pointers via (topic, partition, offset) tuples
  • partitions are for consumer parallelism
  • there are also replicas for backup; these are never read from or written to directly though
  • for each partition, kafka picks one broker as the ‘leader’; partition replicas are spread over brokers

  • can audit kafka with custom topics and consumers; per-cluster consumers can read all messages out of the cluster and count all messages for a topic

  • tune the OS, the JVM, but not too much to be done on kafka itself, except concurrent processors

  • messages are committed when some number of in-sync replicas (a tunable parameter) for that partition have applied the data to their datalog; so you can trade latency for durability

  • consumers only ever see committed messages

⊃ on how committees invent

from melconway.com

  • “Given any design team organization, there is a class of design alternatives which cannot be effectively pursued by such an organization because the necessary communication paths do not exist.”
  • “there is a very close relationship between the structure of a system and the structure of the organization which designed it” (in fact they’re identical)
  • “It is an article of faith among experienced system designers that given any system design, someone someday will find a better one to do the same job.” (hah, no irony here either, I think)
  • “To the extent that an organization is not completely flexible in its communication structure, that organization will stamp out an image of itself in every design it produces.”
  • and then a perhaps cynical, but certainly interesting take on managing systems which concludes that communication in large teams disintegrates, and so, qed, large systems themselves disintegrate
  • “Because the design which occurs first is almost never the best possible, the prevailing system concept may need to change. Therefore, flexibility of organization is important to effective design.”

⊃ outwards from the middle of the maze

https://www.youtube.com/watch?v=ggCffvKEJmQ

  • developers used to have application-level guarantees via transactions (think action1, action2, commit)
  • but in today’s ecosystem (mostly sans transactions), there are fundamental problems with latency, concurrency and partial failure that are hard to hide
  • when will we get those guarantees back? and how?
  • he mentions brewer’s previous keynote and brewer’s point that we should build simple reusable components that are intended to be combined; we can reason about these components (their latency, their failures) in a direct way; he also mentions people ten years ago thought this would be impossible – building libraries of reusable components for large scale systems

  • so we have composition now, but we have to compose /guarantees/

  • two things make distributed systems hard: asynchrony and partial failure

  • asynchrony could be handled in isolation by timestamping things and interleaving replies

  • partial failures can be handled in isolation by providing replication in time or space (more nodes or replay messages)

  • these fixes do not work with both problems together though..

  • on asynchrony, a mention of CRDTs, leveraging data structure properties to help achieve cheaper replication; basically objects have semantics that we can exploit and replicas won’t diverge

  • or we can find ‘confluent components’ – functions that, for all orderings of inputs, produce the same set of outputs (see the sketch after this list)..but some things unfortunately need to coordinate

  • could enforce some order..or could enforce producer / consumer relationships

  • so ask how components change state and avoid coordination where possible..

  • on fault tolerance, composing fault-tolerant components doesn’t result in fault-tolerant systems because it also depends on the glue that binds them

  • nice example at 31min on kafka (message system), zookeeper (decides on replicas, clients)

  • component specs often do not compose..so we do a top-down testing approach; put the system together then just do integration testing :/ or we run chaos monkey.. but when do we stop running these things, they take forever..

  • so we really just run these things til they seem ok..

  • or, since we have a system, we can run it! and then we can ask why did a good thing happen?

  • ft is just redundancy in time and space (think having nodes rebroadcast messages [time] and then having replica nodes [space])

  • look at the fine-grained data lineage, and try to find a set of failures that breaks the good outcomes

  • more interesting failures at 52min from kafka, two-phase commit, 3pc
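
on the CRDT / confluent-components point above, a grow-only set is the classic toy example – merges are just set union, so replicas can apply updates in any order and still converge. this is my own illustration, not from the talk:

# grow-only set (G-Set): adds commute, so any ordering of inputs
# plus a union-merge lands every replica in the same state
class GSet:
    def __init__(self):
        self.items = set()

    def add(self, x):
        self.items.add(x)

    def merge(self, other):
        self.items |= other.items

a, b = GSet(), GSet()
a.add("x"); a.add("y")       # replica a sees x then y
b.add("y"); b.add("x")       # replica b sees them in the other order
a.merge(b); b.merge(a)
assert a.items == b.items == {"x", "y"}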

⊃ service discovery

progrium.com

  • there is this mesos project
  • says DNS is insufficient due to the impracticalities of managing one’s own DNS and the inability to handle real-time changes in name resolution
  • google used chubby in 06 as distributed lock and kv store, replacing DNS
  • zookeeper is open source chubby; high availability and reliability in exchange for performance
  • both use paxos consensus algorithm (see raft alternative)
  • etcd is the new http-friendly alternative to zookeeper
  • says service discovery should have a consistent, highly available (HA) directory + registration + health monitoring + lookup and connection features
  • in another post, talks about consul.io, a “powerful tool” with monitoring, config store, DNS..maybe our tools should be less powerful and more puny. though I guess consul is just gluing together a lot of other stuff

⊃ shapefiles

some basics on shapefiles:

  • wikipedia has a nice intro
  • .shp, .shx and .dbf files are required – they’re binary files with all of the vector data, indexing info and feature notes
  • other optional files may be present too, like .prj (projection info)
  • everything’s sequential – first record in the .shp corresponds to the first record in the .dbf
  • fiona is a nice python package for reading shapefile info – it returns GeoJSON-like records (see the sketch after this list)

  • and here’s a gist on using 2010 TIGER block-level census data to find the population of a circular area

  • you can also do a lot in ipython as this post highlights

  • my notes on setting up a virtualenv with gdal, basemap, ipython, etc are here
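
reading a shapefile with fiona looks roughly like this – the filename is hypothetical:

# iterate over features in the .shp/.shx/.dbf trio; each record comes
# back as a GeoJSON-like dict with 'geometry' and 'properties'
import fiona

with fiona.open("tracts.shp") as src:
    print(src.crs)       # projection info, e.g. from the optional .prj
    print(src.schema)    # geometry type and property fields
    for feature in src:
        print(feature["geometry"]["type"], feature["properties"])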

⊃ tdd roundtable

I use TDD at work and even on some personal projects so it was interesting to hear perspectives from Beck, Fowler and DHH.

is tdd dead? - hangout with dhh, martin fowler and kent beck

  • betteridge’s law says, of course, no
  • mostly just some interesting thoughts on their experiences
  • “if I can’t write a test first I have no business writing it,” says beck
  • “do programmers deserve to feel confident?” asks beck. (the answer must surely be yes.) but dhh doesn’t get that confidence from tdd, interestingly..ah, he just doesn’t like the red/green/refactor bit
  • for beck, it’s about flow, “from specific to general, this kind of inductive style”
  • “to make a design’s intermediate results testable comes with a cost..it comes with a benefit too,” says beck, and follows with a nice example of orthogonal functionality in a parser – actually parsing vs, given a parse tree, computing the correct result
  • beck mocks almost nothing – if he can’t test everything with the real stuff, he looks for another route; he’s seen mocks returning mocks returning mocks..tests too coupled to the implementation rather than the interface
  • fowler also avoids mocks

robert martin has a wrap up of this series

  • have to consider tdd as a team discipline

when tdd fails

  • claims tdd is ineffective at the physical boundary and at the layer in front of the boundary that requires ‘human fiddling’
  • not testing the human fiddling stuff doesn’t sit well with me..
  • says you have to manually test the physical interaction at some point and then trust a mocked driver, which I guess is true

also skimmed monogamous tdd which touches on similar topics:

  • interesting point about tdd vs integration-only tests; could just go full integration testing with, say, selenium. but it fails the author’s requirements of trustworthiness (essentially coverage) and speed

⊃ think complexity chapter one

from greenteapress.com

  • there was once this phlogiston theory – phlogiston being an element allegedly released during combustion
  • thinking around ‘complex systems’ isn’t a paradigm shift, says downey, these modeling techniques are just becoming more acceptable, they won’t replace classical formulations
  • these results are often simulation-based rather than analysis-based
  • also a trend from continuous math to discrete
  • mentions the four-color theorem was proved in the 1970s using 1,936 computer-checked special cases (a first), cool

⊃ think complexity chapter two

more from greenteapress

  • cool decay example, suppose you have lambda, decay rate per second between two nuclides
  • (half life is just ln2 / lambda)
  • probability that an atom decays in an interval is lambda * dt
  • with multiple decays, probability of a decay path is found by just multiplying all probabilities along the chain
  • decay chain of highest probability? give each edge a ‘length’ of -1*log(lambda) and then find shortest path via astar or dijkstra
  • because adding log probabilities is the same as multiplying probabilities
  • since log is negated, shortest path is most likely decay route
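
a minimal sketch of that trick – weight each decay edge by -log(p) and run dijkstra, so the shortest path is the most probable chain. the toy graph and probabilities here are made up:

# shortest path over -log(probability) weights == most likely decay chain
import heapq, math

def most_likely_chain(graph, start, end):
    # graph: {node: [(neighbor, probability), ...]}
    dist, prev = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == end:
            break
        for nxt, p in graph.get(node, []):
            nd = d + -math.log(p)   # multiplying probs == adding logs
            if nd < dist.get(nxt, float("inf")):
                dist[nxt], prev[nxt] = nd, node
                heapq.heappush(heap, (nd, nxt))
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], math.exp(-dist[end])

chain = {"A": [("B", 0.7), ("C", 0.3)], "B": [("D", 0.9)], "C": [("D", 0.99)]}
print(most_likely_chain(chain, "A", "D"))   # (['A', 'B', 'D'], ~0.63)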

ex 2.1

  • simple graphs are undirected with no loops
  • regular graphs are those in which vertices have the same number of neighbors (vertices are of the same degree or valency)
  • complete graphs: each pair of vertices has an edge connecting them
  • paths are sequences of vertices that connect vertices
  • cycles are sequences of vertices starting and ending at the same vertex
  • forests are graphs with no cycles
  • trees are connected graphs with no cycles
  • connected graphs are those that contain a path from every node to every other node
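
a few of these definitions are easy to check on a dict-of-sets graph – a quick sketch of my own, not the book’s Graph class:

# adjacency as a dict of sets; check 'regular' and 'connected'
from collections import deque

g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}   # complete graph on 3 vertices

degrees = {len(neighbors) for neighbors in g.values()}
print("regular:", len(degrees) == 1)   # every vertex has the same degree

def is_connected(graph):
    start = next(iter(graph))
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(graph)

print("connected:", is_connected(g))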

ex 2.2, 2.3 and 2.4

⊃ storing and deploying keys securely

In several flask apps at aquaya, I used a combo of environment variables and sample config files filled in with real data on the live servers only, not in the source. But how to do this with a lot of servers that may be rebuilt at any time? How do you avoid passing state to these servers when you provision them? And how do you share these credentials among team members?

  • ah, 1password or some other password vault would work – schneier wrote one, even
  • and then you use two-factor on the vault
  • IAM roles may be the AWS way
  • so maybe use config files and ACLs?

  • nice article on token storage in mobile apps (suggests provisioning, splitting, rotating)

  • travis suggests having a type of secure, encrypted key

  • services hashing their API keys, displaying them in plaintext just once and expecting you to treat them like passwords (and use a vault)

  • ah, a provisioning idea (sketched after this list)

    • master service provisions a new instance
    • new instance generates a keypair
    • master encrypts secret with instance’s public key, and sends encrypted text to the instance
    • instance deciphers into an env var
  • a nice ansible example with mention of git-crypt which is kinda similar to the idea above

  • another ansible post using openssl to encrypt certs and vars_prompt to get a user-typed password
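
a sketch of that provisioning handshake using the cryptography package – just to make the keypair idea concrete; the real thing would ship the public key and ciphertext over the network rather than living in one process, and the secret string here is obviously made up:

# instance makes a keypair; master encrypts the secret with the public key;
# only the instance's private key can recover it
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# on the new instance
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_pem = private_key.public_key().public_bytes(
    serialization.Encoding.PEM, serialization.PublicFormat.SubjectPublicKeyInfo)

# on the master, given public_pem
pub = serialization.load_pem_public_key(public_pem)
oaep = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()), algorithm=hashes.SHA256(), label=None)
ciphertext = pub.encrypt(b"DATABASE_PASSWORD=hunter2", oaep)

# back on the instance: decrypt and stick it in an env var
print(private_key.decrypt(ciphertext, oaep))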

⊃ belize 2014

We started inland, heading to San Ignacio.

belize temple

There are some amazing archaeological sites and great food – these are salbutes.

belize salbutes

Then we visited Placencia and Hopkins, snorkeling a bit in Laughing Bird Caye. I got to try hudut – a Garifuna fish stew, prettttty tasty!

belize hudut belize laughing bird caye

⊃ Project Wing

I was part of the team that worked on Google’s ‘self-flying vehicle’ project. We started with the goal of delivering emergency medical supplies to people in need – things like AEDs when someone’s having a heart attack, or antivenom to hikers in the wilderness. (A background video is here.)

I learned a ton on this project.. I designed and built early prototypes of the winching delivery system, our method of choice for getting things to the ground. The idea was to hover about 10m over the delivery area and, via a retractable cable, lower the payload to the ground in a controlled manner. This worked quite well in that it was accurate and kept people away from the propellers. It did make for some awesome crashes though ..we had a great crash reel (: Below is an early delivery prototype.

the egg prototype

I also worked on the electronic speed controllers – components that convert control signals into three phase power to drive a single brushless motor. Each vehicle had four of these components and, for a few months, it felt like we’d lose an ESC on just about every flight. We were using an off-the-shelf component and unfortunately we were uncovering some firmware bugs. I built a test rig to replay control data from our logs onto a benchtop ESC, and eventually I could show that a specific sequence of control commands could put the ESC into a faulty state. We worked with the vendor to fix this, and also explored making our own ESCs in-house. (Turns out one of the first engineers on Google Maps is also an ESC expert!)

GPS was another fun area to troubleshoot – we had bought some pretty high end gear with external antennas on each vehicle, but we faced issues with accuracy, interference and, at one point, even with data transmission rates. I built a standalone system to test differential GPS as a way to improve the precision of our location fix, though we ultimately didn’t deploy it due to its complexity. I used that same test rig to debug our transmission rate issue. And our interference woes were eventually solved via various shielding and antenna mount prototypes.

wing vehicles

I worked on a pair of simulators too: the first emulated the dynamics of our vehicle in XPlane, an off-the-shelf flight simulator. This didn’t work out so well unfortunately – our vehicle took off vertically but primarily flew horizontally, and it was tough to model the aerodynamic forces that occurred in the transitory regime. The “glue” code between our control software and the aero simulator was also written in Python and that interpreted language was just too slow, should’ve used Go! The other simulator took a very broad look at the delivery service as a whole, helping the team understand how many vehicles we would need to cover demand given parameters like vehicle range and speed.

I also got to visit Australia to do some flight tests – Wing was a very fun project all around!

⊃ Uganda 2012

In the middle of my trip to Mozambique I flew briefly to Uganda to help with a conference Aquaya was putting on.

Unfortunately didn’t meet up with Kenneth or Ignitius.

Did see Roey and toured his awesome office. Heard of his amazing plans with husk power. ..Saw James Bond which mentioned Uganda to everyone’s delight.

⊃ arduino command line setup

My friends and I are using the new ARM-based Arduino Due for our sphere project. To use the Due, you have to run v1.5.2 of the Arduino software. This version also comes with some new command line support which seems to work well.. although it opens up a GUI and that’s annoying (asking around for a fix).

In the past I’ve used ino, and that was nice, but the current official release of ino doesn’t support Arduino v1.5.2. This fork almost works for me, just had a small problem using it with the Leonardo. Might also try this idea for something ino-esque.

The very first step to getting Arduino 1.5.2 working on Ubuntu 12.10 was to just test the Arduino IDE. I had the “greyed out serial port” problem. This was solved by adding my username to the dialout group via: sudo adduser matt dialout. Then I had to restart my machine (just logging in to a new terminal session wasn’t working..).

Then I could compile and upload from the command line like so:

$ arduino --verify /path/to/sketch
$ arduino --upload /path/to/sketch

This uses the last-used values for the serial port and board. Check the docs for more switches.

update

Got Arduino 1.5.2 and ino working with the fork from rogue-hack-lab. Well, I had to add one more tweak to make the Leonardo happy.

Then, like it says in the ino quickstart, I created a config file, ~/.inorc, with just one line:

board-model = leonardo

and now I can build and upload from the command line:

$ ino clean && ino build
$ ino upload

hooray!

⊃ OCR with OpenCV and Python

After finally getting my machine setup with OpenCV, NumPy, SciPy and matplotlib I was excited to try this code from Abid Rahman. He built a K-nearest neighbor classifier for images of digits in python. Below I’ve reposted his scripts with my annotations, and I’ve written how the system works in my own words.

The first of the two scripts, train.py, takes an input image of digits, performs some image adjustments and then looks for contours in the image. The image’s contours are the “islands” of black in the sea of white, so if the image is clean, each discovered contour is probably a digit. Here’s an example training image from Rahman’s post:

training image

When you run the first script, the full training image will be shown, and the program will sequentially draw a bounding box around each contour it has found. After drawing a box, the script waits for the user to type in a number that corresponds to the digit that was just boxed. The system captures and stores this training data.

After training a lot of numbers in that first image, we can use a KNN model to classify another image. Again from Rahman’s post on SO, I used this for classification:

mystery image

The classify.py script will preprocess the image using the same methods performed on the training image. It will load up the training data, train a KNN model from OpenCV and then perform a classification for each discovered contour. The found numbers are written to an output image for a side-by-side comparison with the input.
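
for reference, the core of the KNN flow looks roughly like this in the old cv2 API (OpenCV 2.4-era) – the data file names and the k value are placeholders, see Rahman’s post for the real scripts:

# train a KNN model on the saved samples, then label each new contour
import cv2
import numpy as np

samples = np.loadtxt("samples.data", np.float32)      # flattened digit images
responses = np.loadtxt("responses.data", np.float32)  # the keys typed during training
responses = responses.reshape((responses.size, 1))

model = cv2.KNearest()
model.train(samples, responses)

# 'roi' would be a resized, flattened contour from the mystery image;
# here a training sample stands in for it
roi = samples[0].reshape((1, samples.shape[1]))
retval, results, neighbours, dists = model.find_nearest(roi, k=1)
print(int(results[0][0]))   # the predicted digit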

With such clean training and classification images, this process has been 100% accurate for me. Definitely check out Abid Rahman’s OpenCV blog for more interesting scripts.

⊃ getting started with OpenCV in Python

I had a hard time training tesseract 3.02.02 for the mtrrdr OCR project so I’ve decided to try some options with OpenCV and SciPy. I was interested in testing the ideas posted here and here so I wanted to install OpenCV, NumPy, SciPy and matplotlib. I’m running Ubuntu 12.10, Python 2.7 and here’s how I got setup:

For OpenCV, this SO post was very informative. I first downloaded OpenCV 2.4.4 from Sourceforge. I added some needed libraries, and then I followed the build instructions here.

$ sudo apt-get install cmake libgtk2.0-dev
$ mkdir opencv_binary_dir
$ cd opencv_binary_dir
$ cmake ../opencv-2.4.4
$ make
$ sudo make install

For SciPy I needed a few extra libraries, then I could install from pip:

$ sudo apt-get install liblapack-dev libatlas-dev python-dev gfortran
$ pip install scipy

NumPy is mercifully simple:

$ pip install numpy

And matplotlib:

$ sudo apt-get install libfreetype6-dev
$ pip install matplotlib

These all took a while to compile.. At the end you should be able to do this with no complaints:

$ python
>>> import cv2, numpy, scipy, matplotlib

..Unless you put SciPy, NumPy and matplotlib in their own virtualenv like I did. OpenCV is installed system-wide and is not known to your little virtual world unless you set some flags during your build. This SO post suggested the solution:

$ virtualenv /conf/virtualenvs/opencv
$ cd /conf/virtualenvs/opencv/lib/python2.7/site-packages
$ ln -s /usr/local/lib/python2.7/dist-packages/cv.py .
$ ln -s /usr/local/lib/python2.7/dist-packages/cv2.so .

Then you can activate the virtualenv and import at will:

$ . /conf/virtualenvs/opencv/bin/activate
(opencv)$ python
>>> import cv2

⊃ mtrrdr

Trevor and I want to do some depth mapping of the bay in our boat. We looked at using underwater sonar transducers like those from MaxBotix but they’re usually water-resistant and not waterproof. (I guess that one’s IP67 rated – someone in a forum said they’ve still had issues.)

So now we’re looking at fishfinders and pre-built depth gauges. To get the data out we could open them up, listen in on the RF comms or maybe just take periodic images of the screen. It’s sort of a ridiculously roundabout method but I think it could be useful for other folks with pre-built instruments. Here’s a sample panel from Amazon:

hawkeye depth gauge

“mtrrdr” (meter reader) would be one part Android app and one part web service. The app would take periodic, timestamped images of the fishfinder screen (and also probably log position with GPS). Those images would at some point be processed by an OCR library on a server.

For OCR I’ve been testing tesseract and a simple python interface, pytesser. I compiled tesseract for Ubuntu 12.10 (Quantal) following the steps here. Then I added the English training data from here. I could then run the basic “fnord” pytesser example.

Next step I think is to create a training dataset for our dial. The default training data was unable to process the panel pictured above.

⊃ Cambodia

I had an amazing time traveling in Cambodia while working for Aquaya. I spent just a few hours in Phnom Penh – most of my time was with Kounthy and Livy of the Teuk Saat NGO, working in the excellent town of Battambang.

Here’s me giving a certificate to Chhong Bora.

Chhong Bora

Tem, Kounthy and I toured Banteay Chhmar:

Banteay Chhmar

And Livy and I went to Wat Banon – stairway up to the temple on one side of the hill, landmines on the other.

Wat Banon

⊃ Laurel walking tour

I’ve been working on a walking tour project for the Laurel District in Oakland. Kaliyah, a high school student, is the lead and I’m doing some technical assistance.

⊃ Mozambique

While working for Aquaya I spent a busy few weeks in Maputo and Beira.

Technical assistance background and the 1M Initiative.

Work with Matteus, Delfim and Sergio.

Other observations - the lumber camps, the rural areas, the resources, the rent, seeing the presidential results come back

⊃ redream

I helped build a dream restoration machine during a hackathon hosted by GAFFTA and the Tribeca Institute. Our site took written memories of dreams and pieced together video montages that evoke the nocturnal experience.

We used to tweet out all the dreams on @redream_us, but we no longer have the redream.us domain, and the compilations were rendered client side, so the montages are lost to time :(

We applied some really simple natural language processing techniques to find keywords in the dreams people described, and we pulled clips based on those keywords from Vimeo and the Prelinger Archive. The videos were spliced together on the site and kind of time-shifted, so the dream would seem to loop, but they’d be subtly different each time. And we let the audio of all clips play simultaneously – this was my favorite change that made the montages really flow.

Some of the filmmakers on the team put together this great compilation by hand – it gives some feel for what the reconstructed dreams are like.

This team was really, really great – everyone’s listed here. We put our code on github, and the readme goes through more of the technical specifics if you’re curious!

⊃ target accrual

Some friends and I built targetaccrual.com for the StartX-Med hackathon at Stanford.

It’s two frankly separate ideas.. We wanted to build a network graph of researchers for the Rare Cancer Research Foundation (led by friends that work on the Chordoma Foundation). PubMed was scraped to establish co-authorship networks for various diseases. See github and the demo for more.

target accrual

To present a business angle, we got into headhunting for clinical trials. Many trials fail due to a lack of enrollment and these failures are expensive. We planned to use scraping systems to inform a custom ad network. Ads would be targeted to connect doctors, candidates and trials.

⊃ Aquaya

Aquaya is a small nonprofit research and consulting group that works with water service providers around the world. I worked there from February 2012 to February 2013, primarily focusing on their information and communication technology (ICT) initiatives.

When I joined Aquaya, I began working on Aquatest, a program that studied the use of mobile phones for collecting water quality data in rural areas. At the time, many of the organizations we worked with were successfully providing water, but they were struggling to measure water quality, especially in rural areas. The Aquatest pilot program combined a portable microbial testing system with the use of various mobile phone apps to get data from the remote test sites back to HQ. We interviewed teams using these systems in Mozambique, Vietnam and Cambodia. Here’s an operator that worked with Teuk Saat, our Cambodian partner.

Teuk Saat operator

Teuk Saat had an especially interesting dilemma – many of the old school phones we were trying to work with couldn’t support Khmer glyphs, so the text on their data collection apps had to be in English. Many operators just memorized the positions of responses on the screens of their phones, but this was clearly not ideal. I worked on IVRHub, a system that enabled voice-based Q&A for data collection, hoping it would allow managers and operators to use whatever language they preferred. Sadly we never tested it with real customers.

We later worked more with our partners in Vietnam (HueWaco) and Mozambique (UNICEF) to build an open source business intelligence system. Initially, many of their water quality records were kept on paper and their measurements were compiled manually. We wanted to change that so they could gather more data and compile reports more easily. To this end I built Pipeline, a flexible system for data collection and aggregation. We went through a lot of iterations with our clients and it was very gratifying to build something that improved their data collection and analysis capabilities.

⊃ Estes Park 2013

Stayed for a few days in Estes Park with about 20 other folks – this was just a few days after leaving my job at Aquaya. We talked about various puzzles and challenges we’re all facing.

I asked whether I should consciously focus on a sector rather than bouncing from project to project as I do now – got some good feedback too.

I especially liked meeting some of the slightly older folks – lots of great role models, of sorts. People who lead very successful and well-balanced lives while also dealing with a lot of challenges.

⊃ SSL setup on nginx

update: DigitalOcean has a very nice guide on doing this with the LetsEncrypt CA – it worked well for me! Though I am a bit worried about the well-known dir and its persistence with my current build methods.

I keep forgetting how to do this, so here are some notes:

  1. you’re going to buy a ComodoSSL cert from Namecheap or something
  2. your server needs a cert-signing request, but first we need an associated private key: openssl genrsa -aes256 4096 > server.key
  3. move and rename the key based on your site: sudo mv server.key /etc/ssl/private/procession_org.key
  4. ..and protect it: sudo chmod 400 /etc/ssl/private/procession_org.key
  5. now use the key to make a CSR: sudo openssl req -sha256 -new -key /etc/ssl/private/procession_org.key -out server.csr
  6. this opens a rare unix wizard – the trick here is to put “procession.org” or whatever as the Common Name
  7. give the generated CSR to the commercial provider and wait for their reply..
  8. Comodo sent me procession_org.crt, PositiveSSLCA2.crt and AddTrustExternalCARoot.crt – these need to be concatenated in reverse order:

    cat procession_org.crt > procession_org-ssl-bundle.crt
    cat PositiveSSLCA2.crt >> procession_org-ssl-bundle.crt
    cat AddTrustExternalCARoot.crt >> procession_org-ssl-bundle.crt
    
  9. I like to put that bundle in /etc/ssl/certs/

  10. then in an nginx config file (probably sites-available/procession_org.conf):

    server {
        listen 443;

        ssl on;
        ssl_certificate /etc/ssl/certs/procession_org-ssl-bundle.crt;
        ssl_certificate_key /etc/ssl/private/procession_org.key;
    }
    

The linode guide is quite good.

⊃ Inglourious Basterds

It had its moments. I liked the running joke about bad accents. The movie star’s “English with a touch of German” seemed pretty forced, especially in the bar scene when they were fixated on the accents of the imposter officers. And the botched Italian was kinda funny too.

I did enjoy Waltz. Why did his character kill the movie star? And he seemed to recognize Shoshana (he ordered milk for her) so was he planning to defect all along? More wikipedia-ing is in store..

⊃ the prism, ultimater version

My friend Trevor and I built a one-sheet boat using plans from the famous Hannu. All you need is a lone sheet of birch plywood (38” I think..), some 1x2 runners for support and fiberglass for the outer seams where the flat bottom meets the sides. The inner seams were covered with a thickened resin. The outside has a few coats of Tung oil.

Hannu’s ‘ultimater’ version of the Prism has some stability enhancements and can theoretically displace 800 lbs. But we got a little nervous with three people inside – the freeboard gets to be quite low and if you’re not all carefully balanced, the bow or stern can swamp.

boat finish

We have lots more pictures of our various excursions into the Bay, including Trevor’s excellent timelapse. The boat seems to handle South Bay waters just fine, and we’re tentatively discussing a journey to the East Bay, probably following the narrowing along the Dumbarton Bridge..

⊃ tutoring

I’ve been tutoring with CollegeTrack in SF for a few months now. Thought I’d keep a log of some memorable moments.

Me and K got all riled up about the Second Industrial Revolution. The wealth of Carnegie, the injustice of price-fixing trusts, the funny confusion over monopoly and polygamy, more mixups with labor unions and the unions-of-the-north.. She was really quick and curious, comparing Marxism and vertical integration. I was rapidly out of my depth but it was fun.

Worked with A on his geometry problems. He had really good recall for all of the unit circle coordinates, said he memorized it because he was tired of taking home his worksheet, haha.

I was asked to help with bio once – specifically the Krebs Cycle. Krebs was one of those topics that I never really mastered in high school but we gave it a go and I think we made some good progress. What was so infuriating though was hearing that the teacher didn’t make any sort of effort to teach Krebs in class. Students were given a worksheet and told to teach themselves in groups. There wasn’t even a lecture or a chance to ask questions. I mean, it’s the Krebs Cycle..not really self-explanatory stuff. Is this a case of a teacher not really knowing their material? Total speculation, but I’ll bet that happens.

Got an appreciation for helping with kinematics, that was a good feeling. Kolena really cemented that stuff in..

I remember helping at the English table once on a language classification assignment. Well, I guess it was grammar. I can’t quite remember but I think we were mapping direct and indirect objects, something along those lines. What I distinctly remember was my exasperation and my student’s frustration at the assignment. The answers to the problems, knowing the classification of some noun, seemed pointless. Or at least, I had no idea how to connect it to writing. And getting there, doing the actual work of classification, was not built on any clear rules.

⊃ pipeline

A data collection and management system built during my work at Aquaya. It was in production at pipelinehq.org and is open sourced on github.

⊃ time traveling forum

brilliant time travelers forum, created by Desmond Warzel

⊃ illusory perception of space and time

an article about why the second hand seems to freeze for just a bit when you start watching a clock..very cool.

⊃ Philip Treacy's hats

would be fun to make some bike helmets inspired by these creations

⊃ john muir

Fast becoming a personal hero, this video was a nice biography of Mr. Muir.

Muir was a talented machinist and inventor during his early life in Wisconsin. His father was a Calvinist and brought the family over from Scotland. Muir worked in a shop until an accident injured (or cost him?) his right eye. Then he began a walk south, eventually reaching Florida. Somewhere around Georgia he was laid low by malaria. This eventually sent him west to SF.

He corresponded with Emerson.. I wonder if he met Thoreau? Muir apparently was one of the first (or maybe the first) to hypothesize that the Yosemite Valley was carved out by glaciers.

I’d like to know more about this relationship:

Muir disapproved of and even feared native Indians he met in the mountains of Northern California..

From his time in Alaska, part of a journal perhaps:

One learns that the world, though made, is yet being made.

At around 40 he slowed his wanderings to Alaska and the wilder parts of CA. He married and took up his wife’s family’s orchards. It’s not clear - did he ever leave the country (apart from his early boat ride to New York)?

On teaching the names of plants to his daughters,

How would you like it if someone didn’t call you by your name.

⊃ sense of an ending

A fantastic, meditative book from Julian Barnes. This passage is maybe less powerful if you haven’t read it:

“So, for instance, if Tony …” These words had a local, textual meaning, specific to forty years ago; and I might at some point discover that they contained, or led to, a rebuke, a criticism from my old clear-seeing, self-seeing friend. But for the moment I heard them with a wider reference – to the whole of my life. “So, for instance if Tony ..” And in this register the words were practically complete in themselves and didn’t need an explanatory main clause to follow. Yes indeed, if Tony had seen more clearly, acted more decisively, held to truer moral values, settled less easily for a passive peaceableness which he first called happiness and later contentment. If Tony hadn’t been fearful, hadn’t counted on the approval of others for his own self-approval … and so on, through a succession of hypotheticals leading to the final one: so, for instance, if Tony hadn’t been Tony.

One of the book’s many musings:

How often do we tell our own life story? How often do we adjust, embellish, make sly cuts? And the longer life goes on, the fewer are those around to challenge our account, to remind us that our life is not our life, merely the story we have told about our life. Told to others, but – mainly – to ourselves.

The frequent ruminations on memory and on how we construct our history – these are the best parts.

⊃ kavalier and klay

Loved this review by Ilana Teitelbaum

Think I still gave it four stars on goodreads, though.

The theme of art and its relationship with escapism is the one theme that threads consistently throughout the novel. Otherwise, one might say that “Kavalier and Clay,” for all its strong points, is lacking in that after the tight, virtuoso beginning, the story loses focus and eventually all sense of unity. The plot becomes somewhat convoluted in the manner of John Irving, as if Chabon is throwing oddities into the mix just to keep things interesting. Hence we get Antarctica, the oddball marriage, and the threatened jump from the Empire State Building, which feel as if they are taking place in a world apart from the rich world to which we were originally introduced as readers, which was in itself so compelling. The result is that one begins to wonder where the original story went, if this is the same book, and to wish that it had ended before the pure magic of the atmosphere became replaced with coincidence and contrived circumstance.

⊃ music of 2013

some things I was listening to in 2013:

Tim Eriksen

⊃ Synecdoche

A film to get lost in. To reflect on the worlds you’ve built and the lives you’ve led. The people you’ve known and loved and how you think of them, project to them. What would happen if we could progressively distill our lives? Or maybe it’s about our constant, simplifying reflections on our experiences.

I enjoyed the paintings and the recursed maps. The singer in the bar and the box office. The girl in the green jacket who would be missed.

update

Was he dead from a very early point in the film? Hallucinating on his unfinished works in a moment of elongated time, like that Borges story..?

⊃ Sufjan and delayed gratification

“Carol of St Benjamin The Bearded One” makes you wait about a minute. Then there’s “Djohariah.”

not that the prelude (er, most of the song?) is bad..

⊃ noscript's anti-clickjacking system

noscript’s ClearClick system is very neat. it alters the css of screen elements to determine if you’re being clickjacked

⊃ the colors of the bay

Me and Trevor spent a few hours making this page while our boat dried. The idea is to map a location to a color (or two) – it came about when we were thinking on Katie’s app and its geolocation possibilities. Her work provides a cool interface to the xkcd color survey, so we wanted to know if you could devise a scheme for mapping a named color.

..Basically we wanted to make poop jokes about people’s houses. And we succeeded!

This version of the map converts lat/lon data to and from HSV. Well, the “V” is fixed, actually. So not all colors are possible but this made it easier to control the map compared to the original hex-conversion-based scheme.

In HSV mode we take a value of latitude or longitude, and split it into hue and saturation by just dividing by one thousand. There are some conversions done such that the hue is within [0, 360) and the saturation stays on [0, 1]. More conversions keep things limited to the bay area so we have a good amount of granularity and don’t end up in the Pacific.
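
the gist of the scheme, roughly as I remember it – the exact offsets and scaling on the real page differ, so treat this as a sketch:

# map a bay-area-ish lat/lon onto hue [0, 360) and saturation [0, 1]
def latlon_to_hs(lat, lon):
    # use the sub-degree digits so nearby points still get distinct colors
    hue = (abs(lat) * 1000.0) % 360.0
    saturation = (abs(lon) * 1000.0 % 1000.0) / 1000.0
    return hue, saturation

print(latlon_to_hs(37.7749, -122.4194))   # somewhere in SF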

So select a few colors and see where you end up. Or click on the map and that’ll drop a marker with your colors. The original hex-system was more interesting but more random. Any change in the red would change the most significant digit in the lat or lon due to the ordering, RGB in the hex representation. So it jumps quite a bit and there aren’t really regions of similar colors. But the alternative HSV scheme is kinda predictable, so..it’s fun to play with both.

⊃ configuring submodules with nginx and linode

I wanted to setup a custom subdomain to serve static content on nginx and a linode. The nginx config file looked as you might expect.

In the Linode DNS manager, I had to setup the A records. And this last part tripped me up: I also had to delegate nameservers for the subdomain. I made ns5 and ns6.linode.com point to my new subdomain and it all worked out.

update

Ok, I think the nameserver stuff is wrong. I thought that was a fix but in the morning that didn’t work. So I removed the nameservers and it was fine.

That means all that’s necessary is to setup your A records and your nginx config. Assigning nsX.linode.com should not be done..

⊃ doubleyou

After building MEDuele in 2011, I returned to the Cal Health Hackathon to work with Steph, Patrick, Cyrus, Jas and Kristen on another idea.

One of the hackathon’s challenges was to build something that encourages the use of a bodymedia fitness tracker armband. These are wearable sensors that log heartrate and accelerometer readouts (footsteps) throughout a normal day. We also had a sample dataset from the armband but we wanted to do more than just create a typical dashboard of charts.

So we began thinking about how this might be presented to a child. We came up with a Tamagotchi-like system – an online avatar would represent the data coming from the armband and other sources. This ‘DoubleYou’ would be an easy way for a child to understand his or her progress towards fitness and dietary goals.

The site we demoed was at mydoubleyou.org but we no longer maintain that domain. Steph built some great avatars that react to parameters like fitness and mood. I helped build the backend, crunch the sample data and setup an API for the avatars to consume. The code is on github. And we ended up taking second place, woo!

⊃ codifying drone policy

reading this nytimes article

We’re codifying a policy of extrajudicial killing. (Also disturbing that there aren’t even internal protocols for this, despite the fact that drone strikes have killed thousands.)

⊃ clusterbus

During the ReRoute/SF hackathon, I helped build a way to examine bus-bunching with Boaz, Wai Yip, Trucy and Kevin. We were thinking about the really busy bus lines and how they sometimes get ‘clumped’ with two buses on the same line only separated by a few seconds.

This happens when one bus gets behind and the second driver adheres to his schedule. Some transit agencies will tell the second driver to pass or slow down. But SFMTA operates based on on-time performance, so the second driver is incentivized to stay on his schedule, regardless of the first bus’s trouble. Other agencies handle this a little better by optimizing ‘headway,’ or the spacing between the buses. Spacing is what riders really care about – on a crowded line, they just want to minimize their wait. Being precisely on schedule is less important when buses come every five minutes.

So we built a way to look at the headway of historical data on certain lines. We pre-processed lots of old GPS data to calculate the headway of buses, and we then generated some metrics based on certain periods of time. SFMTA operators said they might be interested in using these tools in a live operation so the dispatchers can better control the spacing of their drivers.
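
the headway calculation itself is simple once arrivals at a stop are lined up – something like this sketch, where the timestamps are made up (the real pre-processing worked on raw GPS traces):

# headway = gap between consecutive bus arrivals at the same stop
arrivals = [0, 290, 330, 910, 1210]   # seconds; two buses bunched around the 300s mark

headways = [b - a for a, b in zip(arrivals, arrivals[1:])]
print(headways)   # [290, 40, 580, 300] -- the 40s gap is the bunching
print("worst / average:", max(headways), "/", sum(headways) / len(headways))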

I worked on the frontend, bits of the API and some of the pre-processing. The main site is in a state of flux.. but the code is on github.

⊃ IVRhub

Voice-based data collection – read more on github.

⊃ stravanova

The real action’s at yosemitebandit.github.io/stravanova.

Stravanova plays and overlays gpx files. It can be one rider on multiple rides or multiple riders doing the same ride. The start times are normalized. A python script interpolates points such that the dataset has a position for every second. This gives time-accurate and smooth playback, although it will put your computer through its paces..
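
the interpolation step is basically this – a simplified sketch of the idea, not the actual stravanova script (which has to parse the gpx xml first):

# resample a track to one (lat, lon) per second by linear interpolation
def resample(points):
    # points: list of (t_seconds, lat, lon), sorted by time
    out, i = [], 0
    for t in range(int(points[0][0]), int(points[-1][0]) + 1):
        while points[i + 1][0] < t:
            i += 1
        (t0, la0, lo0), (t1, la1, lo1) = points[i], points[i + 1]
        f = (t - t0) / float(t1 - t0)
        out.append((t, la0 + f * (la1 - la0), lo0 + f * (lo1 - lo0)))
    return out

track = [(0, 37.77, -122.42), (4, 37.78, -122.40), (10, 37.79, -122.41)]
print(len(resample(track)))   # 11 points, one per second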

It’s on github.

⊃ erik the wall-plotter

Inspired by several other vertical wall-plotters we saw online, my buddy Will and I built Erik, a drawing robot.

erik

It uses two stepper motors to pull a pen around a canvas. But the pen cannot be lifted! So there are some interesting path-generation challenges. The software that interfaces with the controller board was written by us and it’s available on github.
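
the basic kinematics are the fun bit: with the two motors at the top corners of the canvas, an (x, y) target just becomes two string lengths. a quick sketch – the geometry conventions and canvas width here are my own, not necessarily what our firmware used:

# convert a canvas coordinate to left/right string lengths for the steppers
import math

CANVAS_WIDTH = 1.0   # meters between the two motor spools (made-up value)

def string_lengths(x, y):
    # origin at the left motor, x to the right, y hanging down
    left = math.hypot(x, y)
    right = math.hypot(CANVAS_WIDTH - x, y)
    return left, right

print(string_lengths(0.5, 0.5))   # pen centered, half a meter down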

⊃ SMS time capsule

This is a time capsule for your text messages. Send a text to (650) 830-0777 and, after some amount of time, that message will be sent back to you.

For example, a message like king of Sunol, the wildcat sauntered towards me %5d 3h 4m 10s would be sent back in five days, three hours, four minutes and ten seconds. Read more about it and see the source on github.
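
one way to parse that trailing duration – this is just an illustration of the format, the actual parser lives in the repo:

# pull the '%5d 3h 4m 10s' style suffix off a message and turn it into seconds
import re

PATTERN = re.compile(r"%\s*(?:(\d+)d)?\s*(?:(\d+)h)?\s*(?:(\d+)m)?\s*(?:(\d+)s)?\s*$")

def delay_seconds(message):
    m = PATTERN.search(message)
    days, hours, mins, secs = (int(g or 0) for g in m.groups())
    return ((days * 24 + hours) * 60 + mins) * 60 + secs

print(delay_seconds("the wildcat sauntered towards me %5d 3h 4m 10s"))   # 443050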

⊃ MEDuele

Me and my friends Steph, Will and Patrick built a multilingual nursing hotline and website, MEDuele, for the Cal Health Data Hackathon. We came up with the idea from Patrick’s experience volunteering in a free South Bay medical clinic. Patrick, being the Renaissance man that he is, worked as a translator for patients speaking Mandarin and Spanish. He noticed the clinic was always really busy but didn’t have enough resources to see that many patients. We thought of MEDuele as a proof-of-concept triaging system.

Patients could call the MEDuele hotline and essentially “leave a message after the beep” describing their ailments. To triage these patients, two teams of people would use the MEDuele webapp – the first would simply translate and transcribe the message when the caller spoke a language other than English. The second group, one with more medical expertise, would then recommend a course of action. And the group of translators could then call the patient back with recommendations. A screenshot of the webapp is below:

meduele screenshot

Of course there would be a lot of valid concerns around a patient’s private information – thankfully this was just a hackathon / proof of concept, and we didn’t have to delve into that. But we did get some good feedback from the experts on the judging panel – they found it interesting and encouraged us to pursue it as a company.

The source for the site is on github and there’s a short demo video of the site in action here on youtube. (Alas, I recently let go of the callmeduele.org domain.)

⊃ vinyl

Redwood is ostensibly a lighting company, but they also have a ton of sensor data. I built some APIs on a MongoDB backend for managing a slice of this data in the Redwood offices.

Vinyl was a javascript visualization of this data, greatly inspired by ‘Icicle,’ one of Nathan’s projects. Each circle is a sensor in the building – they light up when something moves within their detection radius. There was a barebones UI, vaguely inspired by youtube, that let people rewind to any part of the day or change the playback speed.

⊃ redwood cloud

This was one of the last projects I made while working at Redwood Systems. Redwood collects a huge amount of highly-resolved sensor data and this not-so-cleverly-named project aggregated the data on a ‘cloud’ platform.

I built a backend storage system with MongoDB and created an API and some authentication mechanisms for sending and receiving the data. I also created the flask-based frontend for viewing reports – it’s demoed in the video below.

A shorter, more ‘promo-ish’ video might still be up at redwoodcloud.com. But since leaving Redwood in early December 2011, I no longer work on that site.

⊃ Perkinsense

I designed and built room occupancy sensors for the Perkins library at Duke while I was a senior. The study rooms in the library were extremely popular and spread over four floors – this project helped people find ones that were unoccupied. Battery-powered and running on 8-bit AVRs, the sensors used a rather fickle WiFi module called the ‘WiFly.’

This was one of my first web projects – the backend was, uh, interesting. But there was a very basic web interface that showed the status of the ~20 rooms. The early prototype also had a twitter bot, @thesmome (:

perkinsense web demo

Alas, the project is no more and code and schematics are lost to me..perhaps someone else has copies? This video gives a good overview of how things were. (And here’s another video with even more!)

⊃ DuTrack

I leveraged the OpenGTS project to create a low-cost bus-tracking system for Duke University when I was a senior. For about $150 per year, you can buy an unlimited data plan on some really cheap GPS-enabled cell phones. The OpenGTS project uses a small client app running on the phones to transmit position data back to the OpenGTS server, which I initially ran on a small desktop in my dorm.

I installed ~30 trackers to cover the whole fleet. A mechanic was nice enough to connect the trackers to each bus’s electrical system so they could stay on. I then modified the OpenGTS backend source to greatly increase the update rate so it would send coordinates about every 5s.

dutrack screenshot

⊃ bike light

a 555 timer-based bike light, based on a design by dan at instructables

⊃ scene design

I took a great class on set design – we built scene models for The Women of Lockerbie and other plays.

⊃ savonius turbine

I worked for the ‘turbine research group’ in a contest to create a wind-powered, mechanical energy-only, desalination prototype. We built some variants on the savonius vertical-axis design to test the effects of different geometries. Our team eventually took ‘most-innovative’ at the WERC contest.

Some notes about what we were up to are here.

⊃ trebuchet two

With many lessons learned, and only a little pre-planning, I’ve launched into a second trebuchet build. I’m looking at a 10’ arm on an 8’ tall frame. Around 18 pieces will be bolted together to constitute the machine. Custom-poured concrete blocks serve as counterweight.

Here’s an early launch with about 120+ lbs of counterweight and a half liter bottle projectile – more of the same on my vimeo page.

⊃ portable theater

Duke was soliciting “good student ideas,” but this got no reply, hah.

⊃ portable refrigeration

We experimented with portable refrigeration as part of EWH. The goal was to make something small for sample storage. Pictured are some Zeer pots we were testing.

More on the wiki.

⊃ Duke community garden

I loved working at the garden – I got to design and build raised beds, a hoop house, a shed, a bat box, bluebird houses, compost bins and an experimental rainwater harvesting system. I surveyed the site and helped work with volunteers at other area gardens too (like the excellent Durham group SEEDS).

Here’s the garden’s north half:

DCG from above

Made so many friends here – Emily, Margareta, Daniel, Kellyn, Joe.. More photos are on our old flickr page here. Hope to be part of another garden someday!

⊃ scrap scale

When the electricity went off in Nkokonjeru I (finally) had an excuse to stop welding the gates for RASD, the NGO where we worked. Some other projects at the site needed to weigh things, so I tried making a scale.

scrap scale

I used a moment-balance system where a known volume of water (usually 500 mL) would be shifted until it balanced the load on the other side of the scale. It’s somewhat similar to a physician’s scale for weighing people. I used a bike hub and some tough wood to form the scale’s body – scrap metal on the short end helped cancel the weight of the long arm.
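
the balance math is just moments about the pivot – a quick back-of-the-envelope with made-up arm lengths:

# unknown load * load_arm == water mass * water_arm  (moments about the pivot)
water_mass_kg = 0.5    # the 500 mL bottle
load_arm_m = 0.10      # short side, where the load hangs (made-up)
water_arm_m = 0.65     # where the bottle sat when things balanced (made-up)

load_kg = water_mass_kg * water_arm_m / load_arm_m
print(round(load_kg, 2), "kg")   # 3.25 kg for this example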

When it was working, Abhinav Kapur and I estimated a 40g precision. I made the mistake of taking the balance apart before finishing a second, nicer version. Version two, of course, was never completed. Sigh. I’d like to put one together in the States and employ the convenience of so many nice power tools, it would be handy to have a scale around.

⊃ rasd

I spent the better part of the summer in Nkokonjeru working with friends and the awesome people at the Rural Agency for Sustainable Development. Led by Ignitius Bwoogi, RASD provides agricultural assistance, leads local health programs, and offers technical training to the community. RASD and Ignitius are very well-respected within the community and this was our group’s third year of working with him.

My buddy Will worked for many months to plan out some of the infrastructure projects that RASD believed would help the organization make money and better serve the community. We worked with them during the summer to create a welding and carpentry workshop, classrooms, and a fence.

gate welding

Ignitius’s largest goal was to set up an internet cafe at RASD – it would be the first in the community and could provide a steady stream of income to finance RASD’s other programs. I played a small part in helping Will set this up with solar-powered Inveneo computers and Meraki routers. I also worked on the biomass charcoal and jatropha bio-fuel projects which can also be found on this site.

⊃ jatropha

Kerosene for lamps and generators is incredibly expensive in Uganda, with some households spending 30% of their income on the fuel. We wanted to work with RASD and explore the possibilities of small-scale bio-oil production using Piteba presses. The presses used a candle to heat a hand-cranked screw press. We made quite a bit of peanut butter during testing in the States and the presses proved to work decently well with jatropha seeds. The seeds are poisonous if ingested so some safety precautions had to be observed. We worked with RASD to find a small quantity of ripe seeds but, after producing a small amount of oil, we were unable to ignite it in any sort of lamp apparatus.

Further research was needed on precisely when to pick the jatropha seeds and how long to store them prior to expelling. And we unfortunately did not even scratch the surface of building an organizational structure for collecting seeds in appreciable quantities (probably the more interesting problem compared to figuring out how to squeeze more oil). Dan Moss completed a great write-up of the work here.

jatropha press

The backstory for this project is pretty interesting: Uganda once had a booming vanilla industry and, vanilla being a vine, the many plants required a support structure. The jatropha plant was a popular choice for many Ugandan farmers as it grew very rapidly and was native to the area. Vanilla waned after some time (Madagascar’s production recovered, we were told) and people began noticing that the abundant jatropha seeds were an excellent candidate for bio-oil production.

While in Uganda in 2007, my friend Lee Pearson organized a trip to Royal Van Zanten’s enormous flower-growing operation, just a short distance from the capital. Many flower exporters have greenhouses in Uganda because of the excellent climate and the small timezone difference between East Africa and Europe. Thousands of tons of flowers are flown to the Netherlands every week. RVZ was just in the midst of a large-scale jatropha operation intending to power their generators with jatropha-based biodiesel. They had previously employed several truck drivers to collect the seeds in surrounding communities and had recently planted every spare acre of their land with the bushy plants. We were given a tour of their industrial expelling and filtration process (led by Bas van Lankfeld, I believe).

⊃ the altiplano

I traveled to Obrajes, Bolivia in the summer of 2008 as part of an Engineers Without Borders team. (Patrick and Steph went too, woo!) The communities in the area are seasonally impeded by a river that swells with the rains and mountain runoff – crops cannot be taken to the nearby city of Oruro and children are sometimes kept from school by the waters.

obrajes crossing

Our small team took survey data of the river contours, tested the soil’s bearing capacity, and conducted numerous interviews with residents, all in preparation for a potential bridge construction project to take place in 2009. The report I created to summarize the MATLAB analysis of the survey data may be viewed here.

obrajes shepherd

This work generated a lot of reflection on issues of project sustainability and impact. Should US citizens be building infrastructure in Bolivia? I’m not so sure.. But Evo promised a bridge and had not yet delivered. And the communities made it clear that this was important. I’m no longer involved with the work since the summer’s end, but many Duke students traveled to the area in 2009 and completed a crossing.

Up-to-date information about the Bolivia team’s work can be viewed on the team’s wiki. And all of my photos from this amazing trip are online here.

⊃ plywood dug

Me and my friend Jaime built two canoes as gifts using Hannu’s ‘Dug’ design. These were awesome little boats – really stable and could hold about 250lbs before it was time to start bailing.

boats on the water

⊃ biomass charcoal

My friends and I closely followed Amy Smith and the D-Lab’s research on biomass charcoal production. She was operating in Haiti, pyrolyzing corn cobs to create a safe cooking fuel. Smoke inhalation from indoor cooking fires is the number one killer of children in the developing world and smokeless charcoal is an ideal alternative to burning sticks or other matter. Producing traditional wood-based charcoal is becoming more difficult in Haiti, Uganda, and elsewhere due to rampant deforestation.

The D-Lab developed a small-scale pyrolysis process which we expanded upon in experiments in the States as well as in Uganda (see the photo below). Some notes on our work, collected by Dan Moss, are on the Duke Wiki.

biomass charcoal barrel

⊃ rainwater harvesting

Learned a tremendous amount from this one – I lived and worked for three weeks in rural Uganda with friends from the States (including Patrick, woo!) and with our hosts from the Central Buganda University. An assessment team had visited the community in 2006 and, working with CBU, determined that clean water access was an area where we could lend a hand. That group decided rainwater harvesting and a small hand washing campaign was the way to go.

Back stateside, I started conducting some rainwater harvesting simulations based on the size of the communities, some info on the catchment buildings and weather data from the area. This gave us estimates of supply and demand which we used to size several rainwater harvesting tanks (Uganda has one and sometimes two rainy seasons. Our goal was to store enough water to get people through the dry spells).
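
the simulation was essentially a daily mass balance on the tank – something like this sketch (the numbers here are invented, not the Kasaka figures):

# daily tank balance: inflow from the roof, outflow from per-person demand
ROOF_AREA_M2 = 80.0
RUNOFF_COEFF = 0.8            # not all rain makes it into the tank
PEOPLE = 120
LITERS_PER_PERSON_DAY = 3.0   # drinking/cooking only
TANK_CAPACITY_L = 11000.0     # ~2900 gallons

def simulate(daily_rain_mm):
    storage, dry_days = 0.0, 0
    for rain in daily_rain_mm:
        inflow = rain * ROOF_AREA_M2 * RUNOFF_COEFF   # 1 mm over 1 m^2 == 1 L
        storage = min(TANK_CAPACITY_L, storage + inflow)
        demand = PEOPLE * LITERS_PER_PERSON_DAY
        if storage < demand:
            dry_days += 1
        storage = max(0.0, storage - demand)
    return dry_days

print(simulate([10, 0, 0, 25, 0, 0, 0, 5]), "days short")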

I got picked for the travel team and jumped into the literature on the construction of RWH tanks, guttering systems, and other “accessories” that can help keep RWH tanks clean. The Warwick DTU was a tremendous resource – they have very accessible papers on RWH tech and we used a lot of their ideas. I designed some formwork for building a 2900 gallon octagonal tank with 4” concrete walls. We used the forms to pour three sections of two foot high walls on top of one another. Rebar and a wire mesh made up the roof. Here we are, just prior to pouring the second “lift.”

RWH form

So we ended up with one finished tank that continues to function to the present day (woo!). And my friend Patrick and I became pretty good friends with Kenneth, a really nice guy that has gone on to do interesting things all over Uganda. There were some, uh, challenges that I should probably note.. so in no particular order:

The tiny community of Kasaka already had a handful of rainwater harvesting tanks in various states of disrepair, which made me wonder why our team was building another one. For the sake of “experiential learning,” I was told to disregard Warwick’s published methods for building a concrete tank and create my own. It was clear the Ugandans we worked with knew how to build these things and our formwork method was atypical. We also just generally struggled in our communication with the local partner group. That was the biggest lesson: talk to the local group as often as possible, there really can’t be enough need-finding.

RWH tank complete

This was an incredible trip though, I ended up back in Uganda the next year and expect I’ll be there again at some point. More pictures from the people we met and our work are online here.

⊃ soldering station

Made a little soldering iron station – wrote it up on instructables.

⊃ ecg testers

I led a small team in the production of 70 ECG calibration devices. Feedback from Engineering World Health’s partners in the field indicated that these devices were very useful and that more were needed.

The hardware, designed by a student group of yore, created sinusoidal waveforms and emulated heartbeats to assist in the calibration of electrocardiogram machines.

I sourced the parts for these devices and organized weekly builds where the solder flowed freely. After in-house testing, we gave them to the EWH national group, where they were sent along with trained volunteers to Central American hospitals.

⊃ fire extinguisher lamp

Made a lamp from an old fire extinguisher with some friends, wrote it up on instructables.

⊃ weatherwood

My family and I tore down and then rebuilt the walls of this 14x20 cabin. We had a pro do the roof because that’s a bit hairy. It’s right on the Blue Ridge Parkway and the AT.

⊃ ultrasound induced hyperthermia

I worked on ultrasound-induced hyperthermia as a senior at NCSSM – it was a great taste of self-guided research. My project was to build a small apparatus that could heat something on the order of 100mL of water, and to predict the rate of heating based on the physical parameters of the setup.
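
the prediction piece is basically a lumped heat balance – ignoring losses, the temperature rise rate is the absorbed acoustic power over the water’s heat capacity. a rough sketch with invented numbers:

# dT/dt = absorbed power / (mass * specific heat), losses ignored
POWER_W = 5.0     # absorbed acoustic power (made-up)
MASS_KG = 0.1     # ~100 mL of water
C_WATER = 4186.0  # J/(kg K)

rate_k_per_s = POWER_W / (MASS_KG * C_WATER)
print(round(rate_k_per_s * 60, 2), "K per minute")   # ~0.72 K/min here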

Hyperthermia was thought to be an adjuvant treatment for some cancers – the heating you could achieve would aid in blood perfusion for better uptake of chemotherapies or could be used to activate certain chemo drugs in a region of the body.

⊃ trebuchet

My friends Ben and Jen and I made this at NCSSM during miniterm.

trebuchet firing

It’s on instructables! And Ben posted a video: