Thursday, January 17, 2013

Using Go to unmarshal JSON lists with multiple types

Every day I seem to be writing Go code to parse a JSON string, and this problem comes up often enough that I decided to write about it. Thanks to adg and asoko on #go-nuts for their suggestions.

The Problem

Given a list of JSON objects of different types (let's say People and Places), you want to unmarshal them into two lists: one of all the people and one of all the places.

A bit more definition

Let's use this JSON string:
{
    "things": [
        {
            "name": "Alice",
            "age": 37
        },
        {
            "city": "Ipoh",
            "country": "Malaysia"
        },
        {
            "name": "Bob",
            "age": 36
        },
        {
            "city": "Northampton",
            "country": "England"
        }
    ]
}
To help us write some code, let's give ourselves a function signature, which should be self-explanatory:
func solution(jsonString []byte) ([]Person, []Place) {}
And some structs:
type Person struct {
 Name string
 Age  int
}

type Place struct {
 City    string
 Country string
}
I've got two solutions to this problem. I would love to know of better ways.

SolutionA: map and type assert

If we tell the json package to unmarshal into a map, it will handle the parts of the structure we know about, and everything else will go into an interface{}. As we loop over the JSON objects, we use what we do know about them to pass each one to a helper function that creates one of our structs and adds it to the right list. Because each item is a map[string]interface{}, we need to type assert its values:
func solutionA(jsonStr []byte) ([]Person, []Place) {
 persons := []Person{}
 places := []Place{}
 var data map[string][]map[string]interface{}
 err := json.Unmarshal(jsonStr, &data)
 if err != nil {
  fmt.Println(err)
  return persons, places
 }

 for i := range data["things"] {
  item := data["things"][i]
  if item["name"] != nil {
   persons = addPerson(persons, item)
  } else {
   places = addPlace(places, item)
  }

 }
 return persons, places
}

func addPerson(persons []Person, item map[string]interface{}) []Person {
 name, _ := item["name"].(string)
 // encoding/json decodes JSON numbers into float64 when the target is interface{}
 age, _ := item["age"].(float64)
 person := Person{name, int(age)}
 persons = append(persons, person)
 return persons
}

func addPlace(places []Place, item map[string]interface{}) []Place {
 city, _ := item["city"].(string)
 country, _ := item["country"].(string)
 place := Place{city, country}
 places = append(places, place)
 return places
}
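
A note on the type assertions in the helpers: when the target is an interface{}, encoding/json stores every JSON number as a float64, so an assertion to int on "age" does not succeed. The sketch below shows this in isolation; ageOf is a hypothetical helper, not part of the original solution.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ageOf is a hypothetical helper (not from the original post). It reads
// the "age" key from a generically decoded JSON object.
func ageOf(item map[string]interface{}) (int, bool) {
	// JSON numbers arrive as float64 when decoded into interface{},
	// so assert float64 first and convert afterwards.
	f, ok := item["age"].(float64)
	if !ok {
		return 0, false
	}
	return int(f), true
}

func main() {
	var item map[string]interface{}
	if err := json.Unmarshal([]byte(`{"name": "Alice", "age": 37}`), &item); err != nil {
		fmt.Println(err)
		return
	}

	_, isInt := item["age"].(int) // false: the dynamic type is float64
	age, ok := ageOf(item)
	fmt.Println(isInt, age, ok) // prints "false 37 true"
}
```

Discarding the second result of a type assertion, as the helpers do, hides this kind of mismatch, so checking it is probably worthwhile in real code.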

SolutionB: Mixed Type struct

This solution involves creating an interim struct which can be used to represent either a person or a place
type Mixed struct {
 Name    string `json:"name"`
 Age     int    `json:"age"`
 City    string `json:"city"`
 Country string `json:"country"`
}
With this struct we can unmarshal our JSON string into a list of these mixed types. As we loop over the Mixed structs, we just need to examine each one to work out which type it represents, and then build the right struct from it:
func solutionB(jsonStr []byte) ([]Person, []Place) {
 persons := []Person{}
 places := []Place{}
 var data map[string][]Mixed
 err := json.Unmarshal(jsonStr, &data)
 if err != nil {
  fmt.Println(err)
  return persons, places
 }

 for i := range data["things"] {
  item := data["things"][i]
  if item.Name != "" {
   persons = append(persons, Person{item.Name, item.Age})
  } else {
   places = append(places, Place{item.City, item.Country})
  }

 }
 return persons, places
}
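
One caveat with checking for an empty Name is that a person whose name is the empty string would be treated as a place. A possible variation, sketched here under my own assumptions, uses pointer fields so that a missing key (nil) can be told apart from an empty value. MixedPtr, splitByPresence, str and num are hypothetical names introduced only for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Person struct {
	Name string
	Age  int
}

type Place struct {
	City    string
	Country string
}

// MixedPtr is a hypothetical variant of Mixed. A nil pointer means the
// key was absent from the JSON object (or was explicitly null).
type MixedPtr struct {
	Name    *string `json:"name"`
	Age     *int    `json:"age"`
	City    *string `json:"city"`
	Country *string `json:"country"`
}

// str and num dereference optional fields, falling back to zero values.
func str(s *string) string {
	if s == nil {
		return ""
	}
	return *s
}

func num(n *int) int {
	if n == nil {
		return 0
	}
	return *n
}

// splitByPresence classifies each item by which key is present rather
// than by whether its value is non-empty.
func splitByPresence(jsonStr []byte) ([]Person, []Place, error) {
	var data map[string][]MixedPtr
	if err := json.Unmarshal(jsonStr, &data); err != nil {
		return nil, nil, err
	}
	persons := []Person{}
	places := []Place{}
	for _, item := range data["things"] {
		switch {
		case item.Name != nil:
			persons = append(persons, Person{str(item.Name), num(item.Age)})
		case item.City != nil:
			places = append(places, Place{str(item.City), str(item.Country)})
		}
		// Items with neither key are silently skipped in this sketch.
	}
	return persons, places, nil
}

func main() {
	input := []byte(`{"things": [{"name": "", "age": 37}, {"city": "Ipoh", "country": "Malaysia"}]}`)
	persons, places, err := splitByPresence(input)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(len(persons), len(places)) // prints "1 1"
}
```
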
These are just two ways I've used to solve this problem; I'd love to know how others have done it.

SolutionC: json.RawMessage (Updated 18Jan13)

Thanks to Jordan's comment and zemo on reddit, there is another solution. Using the json.RawMessage type in the json package, we can delay unmarshalling the JSON objects in the list. We can then go through the list and unmarshal each of them into the correct type:
func solutionC(jsonStr []byte) ([]Person, []Place) {
 people := []Person{}
 places := []Place{}
 var data map[string][]json.RawMessage
 err := json.Unmarshal(jsonStr, &data)
 if err != nil {
  fmt.Println(err)
  return people, places
 }
 for _, thing := range data["things"] {
  people = addPersonC(thing, people)
  places = addPlaceC(thing, places)
 }
 return people, places
}

func addPersonC(thing json.RawMessage, people []Person) []Person {
    person := Person{}
    if err := json.Unmarshal(thing, &person); err != nil {
        fmt.Println(err)
    } else {
        if person != (Person{}) {
            people = append(people, person)
        }
    }

    return people
}

func addPlaceC(thing json.RawMessage, places []Place) []Place {
    place := Place{}
    if err := json.Unmarshal(thing, &place); err != nil {
        fmt.Println(err)
    } else {
        if place != (Place{}) {
            places = append(places, place)
        }
    }

    return places
}
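
As a reader points out in the comments, comparing against the zero value breaks down once Person and Place share a field such as "notes", because every object then populates both structs. One possible way around this, assuming Go 1.10 or later, is to decode each json.RawMessage with a json.Decoder that has DisallowUnknownFields set, so an object only matches a struct that has a field for every one of its keys. This is a sketch rather than the approach used above; decodeStrict and splitStrict are hypothetical names, and the Notes fields are added only for the example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

type Person struct {
	Name  string `json:"name"`
	Age   int    `json:"age"`
	Notes string `json:"notes"`
}

type Place struct {
	City    string `json:"city"`
	Country string `json:"country"`
	Notes   string `json:"notes"`
}

// decodeStrict is a hypothetical helper. It decodes raw into v and
// returns an error if raw has a key that v has no field for.
func decodeStrict(raw json.RawMessage, v interface{}) error {
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	return dec.Decode(v)
}

// splitStrict tries Person first, then Place, and reports any item
// that fits neither type.
func splitStrict(jsonStr []byte) ([]Person, []Place, error) {
	var data map[string][]json.RawMessage
	if err := json.Unmarshal(jsonStr, &data); err != nil {
		return nil, nil, err
	}
	people := []Person{}
	places := []Place{}
	for _, thing := range data["things"] {
		var person Person
		var place Place
		if err := decodeStrict(thing, &person); err == nil {
			people = append(people, person)
		} else if err := decodeStrict(thing, &place); err == nil {
			places = append(places, place)
		} else {
			return nil, nil, fmt.Errorf("unrecognised item: %s", thing)
		}
	}
	return people, places, nil
}

func main() {
	input := []byte(`{"things": [
		{"name": "Alice", "notes": "likes tea", "age": 37},
		{"city": "Ipoh", "notes": "warm", "country": "Malaysia"}
	]}`)
	people, places, err := splitStrict(input)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(len(people), len(places)) // prints "1 1"
}
```

An object containing only shared keys (for example just "notes") would still be accepted as a Person here, so this narrows the ambiguity without removing it entirely.
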
Here's the full gist:

12 comments:

  1. I generally use the json.RawMessage type here. You can unmarshal the original tying to a slice of json.RawMessage structs, each of which can be unmarshaled conditionally to the corresponding type.
    Now, the article doesn't state the desired ending output. It doesn't, for example, tell us if we're interested in preserving the order of the input, or if we should separate the structs into slices of their associated type. Since the blog post examples performed the latter, I will as well.
    Anyway, I'm deeply suspicious of anything that involves repeating the name of the field and doesn't use the struct tags. That's going to be very prone to regression bugs. I would just unmarshal each item to a json.RawMessage, and then unmarshal each of those with each of the potential types, checking for validity along the way. It's not the most efficient method, but it will detect errors along the way, it's easy to understand, and it's going to be fast enough for most use cases.

    http://play.golang.org/p/IqWLPtqhEf

    Replies
    1. Hi Jordan,

      Thanks for pointing out json.RawMessage I'd missed that. Very useful

  2. I'm curious... what code is outputting a single list of different types? I'd go fix that code so that it doesn't do that before I tried to write code to handle the poorly generated JSON. I know that's not always possible... but, geez. Who does that?

    Replies
    1. This is when you need it

      {
      "type":"Person",
      "objects":[per1, per2],
      }

      and

      {
      "type":"Place",
      "objects":[city1,city2,...],
      }

      yes this does happen. RawMessage is great!

  3. I agree with Nate. If you control the server, then you should return a more sensible JSON response.

    {
    "people": […],
    "places": […],
    }

    Replies
    1. That does seem to be the ideal, however the exercise is more fun if you don't have control of the server

  4. You might also be interested in poking around with my jsonpointer stuff. It's *really* fast and allows you to grab chunks of a json []byte without parsing it into a proper structure. It'll also list all possible jsonpointers within a []byte (assuming it's valid json).

    https://github.com/dustin/go-jsonpointer

  5. I have a fourth solution, here is the source code and explanation: https://github.com/tlehman/json_unmarshall_blogchallenge

    It uses dustin's jsonpointer library to scan the json byte array and search for 'name' or 'city'. If the former is found, it makes a Person, if the latter is found it makes a Place, otherwise it kicks out of the loop.

  6. Solution C will fail if Person and Place share a common field, for instance "notes":

    {
        "things": [
            {
                "name": "Alice",
                "notes": "fat",
                "age": 37
            },
            {
                "city": "Ipoh",
                "notes": "boring place",
                "country": "Malaysia"
            },
            {
                "name": "Bob",
                "notes": "likes icecream",
                "age": 36
            },
            {
                "city": "Northampton",
                "notes": "none",
                "country": "England"
            }
        ]
    }

    go run multiple_types.go
    4 4

  7. There is a Solution D.

    As it turns out, if you know what you're expecting in your array, you can use `interface{}` still - here's an example:


    package main

    import (
        "encoding/json"
        "fmt"
    )

    type testStructA struct {
        A int `json:"a"`
    }

    type testStructB struct {
        B int `json:"b"`
    }

    func main() {
        data := []byte(`[{"a": 1}, {"b": 2}]`)

        s := []interface{}{&testStructA{}, &testStructB{}}

        err := json.Unmarshal(data, &s)

        if err != nil {
            fmt.Println(err)
            return
        }

        fmt.Printf("%#v\n", s)
        fmt.Printf("A: %#v\n", s[0])
        fmt.Printf("B: %#v\n", s[1])
    }
