Category Archives: Fieldschool Student Posts

Of Maps and Rain: Third Week Down and We’ve got a Theme!

This was a tough week at the CHI Fieldschool: one person was out with a communicable illness, and I got sick myself on Monday thanks to a one-hour, four-mile hike to campus in the pouring rain followed by seven hours sitting in wet clothes with a slight chill. But we created a theme! I'm getting ahead of myself, though; here is this week in fast forward:

We started the week by learning about data visualization through mapping tools such as Mapbox and CartoDB. Of the two, I personally found Mapbox easier to use, and I preferred its visuals to CartoDB's. Mapbox's layering function also proved much more useful to my team during this week's mapping challenge. We were supposed to check out GeoCommons as well, but many of us spent too long on (or got stuck in) CartoDB, so we mostly skipped it.

The challenge this week focused on mapping data provided by the University of Pennsylvania Museum of Archaeology and Anthropology, or on continuing with National Park/UNESCO World Heritage Sites and mapping information from those. Since we were feeling ambitious, my team decided to mix the two and build our project from both sources. This is where Mapbox's layering was most useful: we could create one layer of World Heritage Sites in Africa and another layer of artifacts. We couldn't plot exact coordinates for the artifacts because the UPenn Museum archives list them by provenience, and sometimes only by country, so precise coordinates were hard to come by. Instead we tried to map the artifacts by density within a given area and compare that to the UNESCO World Heritage Sites, to see whether artifacts were coming from heritage sites.
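
To make the layering idea concrete: Mapbox's editor handles layers through its interface, but the same pattern can be sketched in code with Leaflet's layer groups (a library we also used this week). This is a minimal illustration with placeholder coordinates and labels, not our actual project code.

```js
// Base map (OpenStreetMap tiles), centred roughly on Africa.
var map = L.map('map').setView([0, 20], 3);
L.tileLayer('https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png', {
  attribution: '&copy; OpenStreetMap contributors'
}).addTo(map);

// Layer 1: World Heritage Sites (coordinates are placeholders).
var heritageSites = L.layerGroup([
  L.marker([-1.95, 30.06]).bindPopup('Example heritage site A'),
  L.marker([9.06, 7.49]).bindPopup('Example heritage site B')
]).addTo(map);

// Layer 2: artifacts, plotted only to country-level points since that is
// all the provenience information gives us.
var artifacts = L.layerGroup([
  L.marker([9.08, 8.68]).bindPopup('Artifact group: Nigeria (country-level only)'),
  L.marker([26.82, 30.80]).bindPopup('Artifact group: Egypt (country-level only)')
]).addTo(map);

// A control to toggle each overlay on and off, mimicking a layer switcher.
L.control.layers(null, {
  'World Heritage Sites': heritageSites,
  'Artifacts': artifacts
}).addTo(map);
```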

The first project of the week took up a lot of our schedule, but thankfully Leaflet proved much easier to use than the other mapping tools. The Leaflet project/challenge was also simpler: we only had to locate a specific city with a specific point and create a small text box describing the location that popped up when you clicked its marker. One thing that disappointed me was that I could not create two text boxes for the two markers I made, because we had neither the time nor the skills for that. I liked the look of the map styles, but I wish there were more bright, fun styles we could access without subscribing to the site.
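
For what it's worth, the two-popup problem turns out to be small in Leaflet itself: each marker can carry its own popup via bindPopup. This is a minimal sketch with placeholder coordinates (roughly East Lansing), not the map we built in class.

```js
var map = L.map('map').setView([42.73, -84.48], 13);

L.tileLayer('https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png', {
  attribution: '&copy; OpenStreetMap contributors'
}).addTo(map);

// Each marker gets its own popup; clicking a marker opens its own text box.
L.marker([42.7251, -84.4791])
  .addTo(map)
  .bindPopup('MSU campus: where the fieldschool meets.');

L.marker([42.7360, -84.4839])
  .addTo(map)
  .bindPopup('Downtown East Lansing: a second, independent popup.');
```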

The last major tutorial of the week was Tableau.  While I think Tableau looks great and is relatively easy to use, I kept running into problems that may or may not have been due to using Windows 8.  I could take the pre-made data sets and play with the functions to alter them, but I could not upload ANY of the example sets.  If anyone randomly comes across this and has dealt with the issue, I would really appreciate some hints.  I think Tableau could be very useful, with its multi-visual presentation and functionality, but I am worried about being able to use 97-2003 Excel files, which may be necessary later on.

Then we came up with a project idea, which I don't want to share!  I can't stop others from sharing, but I would much rather launch our site at the end and have it be surprising and new for everyone to look at!

Getting to know your data

This week I was on a project team that learned an incredibly important lesson about creating data-driven visualisations: get to know your data really well before you get started. Any visualisation you build is considerably sculpted, not only by the meaning implicit in the data, but also by how the data has been captured. The issue we faced was trying to explore a story that depended on finely granulated geocoding, but we realised only too late that our data reflected a 'coarser grain' of location. For another visualisation this would have been perfect – our issue is no reflection on the quality of the (super cool) data we were using – but, in our context, the data clearly didn't work with the narrative we wanted to construct and the meaning we wanted to convey. At the last minute we had to re-think our visualisation and explore a completely different facet of the data. Though we managed to get a new project finished, the process made for some frazzling moments and a late night.

What makes ‘getting to know’ your data difficult?

Time pressure

We were working with 8,000 records under significant time pressure, so we wanted to dive straight into building. But, though we began with a great idea of what we wanted to explore, our hurrying meant we didn't take the time to carefully assess how the location data might function on a map, or how it might relate to the other fields we were trying to represent. It sounds painfully simple, but on the next project I would take the time early on to think these questions through. With a deadline looming, it would be less hectic to drop some final features than to have to re-think the visualisation.

Starting with an undefined idea of what you (ideally) want to explore.

It's a chicken-and-egg problem: the final story and visualisation must appropriately reflect how the underlying data has been captured; but, in order to assess your data, you must have a clear idea of that story and of the visualisation you want to build. We began with only a general idea of what we wanted to convey. This meant that, while we conjectured about how it could look and work, we didn't sharply envision the vital features needed to convey that idea. And you can't always work it out as you go along. Next time I would follow an iterative process: start with a clear idea, assess what foundational features demonstrate it effectively, check that the data can support them, reshape the idea, and so on. Again, taking the time for this at the beginning saves a whole world of coffee and confusion at 11pm.

Applying this lesson outside of the Fieldschool?

The data I work with on a day-to-day basis focuses on Wellington's print history. My biggest dataset is spreadsheet upon spreadsheet of details about late 19th-century printers, publishers, booksellers and engravers. It is meticulously geocoded and can be sliced by year or print service. At this stage, some records are geocoded by numbered street address but some only by street name. From this week's project I can easily tell that there are limits to what this data can tell me, or how it will convey meaning most effectively. It could tell an interesting story about the frequency of available print services in individual streets of the city, but at this stage I would not be able to create a network diagram that links individual addresses.

Digital Technology & The ‘Age of Surveillance’

Many of the news articles in the past week have left me dazed and confused. When Edward Snowden was named as the source of the leaked court order requiring Verizon Business to hand the NSA massive databases of communications 'metadata', the media went into a frenzy. Almost instantaneously, investigators were probing Snowden's personal and professional history, using surveillance of his public and private communications, of course. The irony of this almost makes me laugh, but it also underscores how easily personal intellectual property can be accessed, especially with digital surveillance technology.

I am grateful to be in a field of study [Anthropology] where I was able to recognize this threat before it became public knowledge – but that's exactly what scares me. The United States Department of Homeland Security, Federal Bureau of Investigation, and National Security Agency have been obtaining similar court orders for the past seven years, right under our noses.

I fear our future will not be far from the science-fiction worlds of 'Minority Report', 'Blade Runner', and George Orwell's '1984'. As you read this, these secret surveillance court orders continue to be implemented and Verizon endlessly fills its databases. Our time to question not only the effectiveness, but also the constitutionality, of the Obama administration's actions is now.

Prior to starting the fieldschool this summer, I was aware of the ease of 'hacking' – or at least that's what my computer-savvy friends told me. Now, as I become more knowledgeable about the back end of the Internet, my fear of abuse [by the government] of this sophisticated technology has only grown. The world we live in is a world of technology and technological advancement; there doesn't seem to be an end in sight, nor any slowing of its pace.

Technological literacy not only puts you a step ahead of your peers; it also allows you to fully understand the capabilities of our government's surveillance technologies. This, to me, is a crucial awakening.

I would also like to share the breaking-news article The Guardian published last Wednesday [click on link].

Codecademy and beyond

I agree with my cohort-mate Richelle: Codecademy was finicky and frustrating. However, I am grateful to have had some preparation before the fieldschool to play around with HTML and JS. HTML is fairly straightforward, like learning the basic phrases of a new language: the ones you need to find the bank, the restroom, water, and ice cream. JS, on the other hand, is like learning a new grammar, and it takes time to apply the rules correctly.

Codecademy is always there, available to reference at one's fingertips, and it set an expectation for the CHI experience. During the first week I was intimidated by the technology, despite realizing that the only way to learn is to dive into the alphabet soup of HTML and JS. During that first week of CHI, you had to have faith in the process and trust where our instructor was leading us. While we had opportunities to play with HTML and JS, as well as WordPress, GitHub, Bootstrap, and TextWrangler, just to name a few, we were also told what each was for. At the same time, we made things, which let us embody what we were learning from our instructor and from Codecademy.

Each day was different: frustrating, satisfying, exciting, fatiguing. We as learners were invited to share our experiences in real time, which empowered me to keep trying as I hopefully emerge from the analog mire.

CHI for a Historian-in-Training Part 1: From Primary Document to Data

As a history graduate student, I wanted to blog mainly about how this fieldschool has influenced the way I approach historical research and thinking.

Last summer I had the chance to travel to the colonial archives in Aix-en-Provence, France (Archives nationales d'outre-mer, or ANOM) to get a taste of 'primary document archival research.' Armed with a digital camera, a MacBook, and a French dictionary, I bumbled around the archives, attempting to mirror the sense of confidence and purposefulness that other scholars seemed to have. After a month of 9-to-5s at the archives (and evenings of pastis and concerts in Aix), what did I have to show for my dedicated data-collecting? Over 3,000 poorly labeled digital photos, an incomprehensible Excel sheet of 'important!' records, and the overwhelming sense of gloom that I would never get through the endless number of primary documents needed to do my research.

 

My Excel Notes: I know this made sense at one point…

After a speedy bootcamp introduction to data this week, I now realize the incredible importance of creating a sensible workflow and metadata structure while doing archival research. Historians don't necessarily call this process 'data collecting,' but looking at it that way can be a useful way to save time and avoid feeling overwhelmed. The topics this week didn't directly address organization and workflow, but our discussions about cleaning, scraping, and visualizing data reminded me to think about the basic components needed to produce good data.

1. Organization is Key

For historians, 'data collecting' is akin to semi-purposefully (and semi-randomly) reading old documents with a theme in mind. With the ease of digital photography, OCR, and more and more online databases, many scholars, myself included, fall into the trap of 'over-collecting.' Although over-collecting can help you make more thorough and better-supported arguments, your data won't be of any use if it's not organized. At a minimum you need 1) a place to store the metadata (data that gives information about other data, e.g. title, author, date, publisher) of each record and 2) a way to ensure you can find the original file. Some scholars do this with an Excel sheet and subfolders on their hard drive. I have personally used Zotero to enter my metadata, and Dropbox and Mac Time Machine to continually back up my data. Although detailed record-keeping might seem to take a lot of your time, it will prove useful when you return from field research and begin writing. (A minimal sketch of what one such record might look like follows the links below.)

  • Other useful pre-archive tips to keep in mind : Link
  • A full guide to archival research including a ‘record keeping sheet’ that can be an example for your metadata schema: Link
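
To make the metadata idea concrete, here is one minimal sketch of a single record, written as a plain JavaScript object. The field names and values are purely illustrative (they are not a formal schema and not drawn from my actual ANOM notes); the point is simply that every record carries its descriptive metadata plus a pointer back to the original photo.

```js
// Purely illustrative metadata record: field names and values are made up.
var record = {
  id: "photo-0147",                                // ties the record to the image file
  title: "Rapport sur le tourisme en Indochine",   // what the document is
  author: "Unknown",
  date: "1931",
  archive: "ANOM, Aix-en-Provence",
  series: "Example carton/folder reference",       // where it sits in the archive
  notes: "Mentions rail routes between Hanoi and Hue",
  file: "photos/photo-0147.jpg"                    // where the original photo lives
};

console.log(record.title, "->", record.file);
```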

2. How can a Historical Text translate to Data?

In my own research on Vietnamese travel stories, I deal with a lot of narratives and reports that don't automatically translate into 'hard' data that can be easily visualized or manipulated. Like other disciplines that do close readings of texts and qualitative analyses, history can seem antithetical to large datasets and quantitative analyses. However, data-oriented methods such as text mining are making their way into traditional humanistic inquiry and research. Essential to analyzing data sets large and small is the collection of metadata that describes each object. This is no simple task, though, because it also involves the larger question: what do you as a researcher want these objects to say, show, prove, or demonstrate?

 

Tourism in Indochina Travel Brochure

I have just submitted my thesis, titled "Where People and Places Meet: Travel and the Spatial Identities of Indochina, France, and Hue in 1920s-1940s Vietnamese Print," to my committee. In it I examine tourism advertisements, socio-cultural reports, and travel stories, or du ký, to understand how travelers ideologically 'mapped' places with cultural, colonial, political, and personal significance through the publication of their travel experiences. As you can tell, the study was quite textual and theoretical in nature. Even though I have read and analyzed these texts extensively, I did not extract data from these sources in any consistent way. I have started to consider how I could enter the relevant components into a table (such as traveler name, gender, age, group size, destinations, and transportation), and in doing so I have already begun to look at my research differently. I moved from basic questions such as "How do these texts relate? How are they different?" to "Are there more isolated journeys or group journeys? What are the primary modes of transportation represented?"
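
As a rough illustration of what that table might enable, here is a sketch in JavaScript with entirely made-up records; the field names are hypothetical, but even a tiny structured dataset lets you start answering the questions above.

```js
// Illustrative only: hand-entered du ký records with made-up values.
var stories = [
  { traveler: "Traveler A", gender: "F", groupSize: 1, transport: "train", destination: "Hue" },
  { traveler: "Traveler B", gender: "M", groupSize: 4, transport: "car",   destination: "Angkor" },
  { traveler: "Traveler C", gender: "M", groupSize: 1, transport: "boat",  destination: "Haiphong" }
];

// "Are there more isolated journeys or group journeys?"
var solo = stories.filter(function (s) { return s.groupSize === 1; }).length;
var group = stories.length - solo;

// "What are the primary modes of transportation represented?"
var byTransport = stories.reduce(function (counts, s) {
  counts[s.transport] = (counts[s.transport] || 0) + 1;
  return counts;
}, {});

console.log("solo:", solo, "group:", group, byTransport);
```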

I am currently brainstorming different ways of translating these textual representations of movement into a visualization, such as a map of the popular travel routes with a temporal component to understand global events and transportation developments. Hopefully by next week’s blog post I will know a lot more about visualizing data and space and get a better sense of my project.

 

Creative Commons

I first learned about Creative Commons licenses last semester when I took Anthropology 370 with Ethan. We were encouraged to read up on the licenses and choose the one that suited us best for our class blog posts. I read about the options then, but until now I hadn't actually read up on the ideas behind Creative Commons. The internet has really changed the way we look at creative works. These days works are much more open to collaboration: people can combine, remix, or reuse them. While copyright law is still very important, it's not well adapted to today's technology. I think Creative Commons fills this gap.

Creative Commons allows for custom licensing that is much more flexible than traditional copyright. While traditional copyright still has its place, Creative Commons is great for many works, especially online. It also raises some interesting questions about intellectual property. To what degree is your work your own? How much access should others get to it? How can others use your work? The most open Creative Commons license allows others to do pretty much anything with your work as long as they credit the original creator; it could even be distributed commercially. This is a far cry from traditional licensing. The story Ethan told about his own book, released online under a Creative Commons license, is what really got me thinking about this. The fact that someone distributed his book commercially in a slightly modified form, and that this was fine under the license, seemed so odd to me. While I like the idea of things being open source and open to remixing by others, allowing commercial use is a step too far for me. However, that brings up one of the coolest aspects of Creative Commons: you get to choose exactly how others can use your work. So this blog post can be (and is) available for remixing, but not for commercial use.

Command Line Mastery, Not Yet

Oh, the "risks" of data. Data is always mediated, they say. I want to challenge this assumption. The most efficient way to get at this is to say that there is a reverse "risk", a sort of ideology that believes itself capable of combing through the rough terrain of implicit and methodological biases, political spin, and other data diseases. Certainly, I don't want to hold anyone's cautions against them, but this kind of dubiousness can tend to stymie creativity.

Data is also a process of transformation. It is recursive. And if privileging the measurable defects in data leads to a failure to register the immeasurable, that is recursion playing out at the level of a faltering analysis. There is always the danger, that is, of falling into the trap that any data analyst sets for him- or herself. Recognizing that there is a mountain of bias and spin outside the purview of anyone in the room is the first step in getting beyond this. It may not make people happy to (ex)pose their humility, but I hold it to be a pretty important part of best practice, and one the Fieldschool should adopt.

Of course, interdisciplinarity is a hedge. So is having a few experts around. Navigating networks of authority and expertise is an important skill in this regard. That said, I think that we, as ten slightly aimless-at-times students, have been woefully under-equipped to use technology in a way that is productive to this kind of exchange. I can't stress enough how much this is a byproduct of a sclerotic university system, but lol, I graduated. And granted, a few weeks isn't any kind of time to practice, but experimentation is possible.

I would propose some kind of elaborate lifehack here if it weren't for the above constraints. Dialogue is obviously, obviously a good. And sharing one's links, annotations, and what one cares about is crucial to setting up autonomous (a word that very much describes us) places of hacking and mastery. The title of my post refers to the command line. I see it as a sort of heuristic in which people who admire simplicity compose knowingly under the restrictions of the command line. For the sake of this post's readability, I will just tell you that the command line is anything but simplistic. Sure, there is utility to restrictive environments: the change of pace, the shutting out of distractions. But it is the purity of the command line interface that works to obscure the complexity of its use.

THE 5 W’s AND 1T OF COLLABORATION

Captain Primate (a.k.a. Ethan Watrall), ringleader of the 2013 CHI Fieldschool, emphasized on the first day of the fieldschool that as participants we would be enculturated into the CHI domain through modeling and experimentation with the sector's standards and practices.  Theoretical knowledge and practical application are important components of any discipline.  In transdisciplinary arenas like digital heritage informatics and curation, collaborative processes require soft skills, resources, and networking between institutions and teams of individuals with differing cultures, personalities, styles of communication, and levels of expertise.

Stephen Dale, Collaborative Behavior (2012)1

What does collaboration entail?  The U.S. Forest Service2 identified the following elements as important aspects of working jointly on a project:

  • Leverage differences in strength, knowledge, and power on behalf of the collective to build the capacity to achieve objectives.
  • Support equal participation, even when there are differences in power, authority, and responsibility.
  • Focus on finding common ground and a willingness to live with and learn from decisions.

The Hack Library School website recently featured an interesting post by Paul Lai entitled Praxis and the Perennial Conflict Between Theory and Practice in Library Education.  In this post, Lai touches on the significance of collaboration as a basis of library and information science practice.  I believe the same standards apply, by extension, to cultural heritage and memory institutions.

This week we concentrated on applying the phases of project management and on visualizing time, working through the process of taking a project from prototype to implemented design.  We developed vision documents, wireframes, and work plans; we utilized open-source tools such as the HTML/CSS framework Twitter Bootstrap and the code-hosting site GitHub. We also created timelines using the JavaScript library Timeline.js.  All of our assignments this week were accomplished through collaborative effort. Stephen Dale, founder of Collabor8now Ltd, summed up the collaborative process best, stating that "the most important requirement of collaborative behavior is T-R-U-S-T."1

1Dale, S. (2012). Collaborative Behavior. KIN Summer Workshop: Knowledge and Innovation Network. Retrieved from http://www.slideshare.net/stephendale/collaborative-behaviours

2 U.S. Department of Agriculture. (2013). Partnership Resource Center- The Art of Collaboration.  Retrieved from http://www.fs.usda.gov/wps/portal/fsinternet/!ut/p/c4/04_SB8K8xLLM9MSSzPy8xBz9CP0os3gjAwhwtDDw9_AI8zPwhQoY6BdkOyoCAPkATlA!/?ss=119979&navtype=BROWSEBYSUBJECT&cid=null&navid=121100000000000&pnavid=121000000000000&position=BROWSEBYSUBJECT&ttype=main&pname=Partnership%20Resource%20Center-%20The%20Art%20of%20Collaboration

Shhh don’t tell…We’re pulling ourselves up by our Bootstraps

This week the field school dove into the wonderful world of Bootstrap. I've played around with a few templates before, but at the time I knew much less HTML, CSS, and JavaScript than I do now. It's really great that Twitter and their developers released their code so other people can play with it. I particularly like Bootstrap because I think front-end webpage design is one of the most important pieces of a project. Yes, the data and the database and the underlying programming have to work, and work well, but I believe no one is going to want to use a site that looks terrible. I myself have quit playing games (regardless of the story) and abandoned websites (no matter how useful the content) when I couldn't stand the design. If I don't like the way it looks, I won't use it. Bootstrap helps website and project developers solve at least the basic problems of design when they don't have the more advanced knowledge to do it themselves.

Modifying a template is so much easier than trying to write from the ground up. I tried the latter a few weeks ago, and while I think I can say it wasn't ugly, it definitely wasn't smooth or beautiful, and it didn't look like the work of anyone with real skill. Bootstrap makes it pretty simple to modify the HTML and CSS to alter the appearance of a site, so long as you understand the basics. And honestly, even if I wanted to create a brand-new site without the underlying Bootstrap, I think it would be easier for me to start from Bootstrap and keep altering it until it became completely different and mine, because I like to see how the code I'm writing actually changes things, and that's much harder to see when writing from scratch.
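
For anyone curious what "modifying" looks like in practice, the usual pattern is to keep Bootstrap's stylesheet intact and load a small custom stylesheet after it, so your rules win. This is a generic sketch, not our class template: the grid class names follow recent versions of Bootstrap (the 2013 fieldschool probably used an older version with differently named classes), and the file paths are placeholders.

```html
<!DOCTYPE html>
<html>
<head>
  <!-- Bootstrap first, then your overrides so they take precedence -->
  <link rel="stylesheet" href="css/bootstrap.min.css">
  <link rel="stylesheet" href="css/custom.css">
</head>
<body>
  <div class="container">
    <div class="row">
      <div class="col-md-8">Main content lives here.</div>
      <div class="col-md-4">A sidebar, courtesy of the grid.</div>
    </div>
  </div>
</body>
</html>
```

A custom.css loaded this way can change colors, fonts, and spacing without touching Bootstrap itself, which is also one way to make a Bootstrap site look a little less obviously like Bootstrap.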

The only issue I have with Bootstrap is that it is so popular that I've started noticing it all over the web, particularly in the list of example projects Ethan shared with us. And yes, I know it's kind of snobbish to say I don't want people to know I used Bootstrap when I did use Bootstrap, but isn't it a good goal to make my use of Bootstrap less obvious? That would mean I'm writing more code of my own, right? An important element of design is producing something unique and beautiful. I just wish I were talented enough to do that, but until then I'll keep building off Bootstrap and hope no one notices.

Long time listener, first time hacker

Last weekend I spent a fun few hours following Australia's 2013 #GovHack on Twitter.  As the name suggests, the event aimed to encourage "open government and open data" by inviting teams to "mashup, reuse, and remix government data" at meetings held across the country. Unsurprisingly, there were some wonderful results. The theme of open data and reuse resonated strongly at the Fieldschool this week as we practised finding, extracting and manipulating not just government statistics but any open and available data. I had thought that the technical skills needed to remix data from the web were out of my reach because I wasn't a programmer, but happily the Fieldschool proved me totally wrong. How did this happen?

1. Finding data is surprisingly simple.

Many organisations give away data in formats that are easy to interpret

Many governments (including the US, UK, and New Zealand) provide giant datasets for people to reuse. Meanwhile, an increasing number of museums, galleries, and online repositories are opening their data doors too. Often, all it takes is roaming around a website to find the 'download data' option. On top of this, data is often provided in formats that people can easily understand: a CSV file is no more complex than a spreadsheet. I would hazard a guess that simply knowing usable data exists, and that it can often be easily understood, dismantles the first significant barrier to reuse.
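
To show just how un-scary the format is, here is a tiny, made-up CSV: the first line names the columns and every following line is one row, exactly as it would appear in a spreadsheet. The sites and values below are invented purely for illustration.

```csv
name,country,latitude,longitude,year_listed
"Example Heritage Site A",Kenya,-1.29,36.82,1997
"Example Heritage Site B",Ghana,5.10,-1.25,1979
```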

APIs are incredible

Learning about APIs felt like being given the keys to the castle, because they allow you to reuse data on the fly. To my non-programmer mind, APIs took a while to understand: you can only really 'get' how they work on their own terms (culprit #2: JavaScript functions), and the process of requesting data dynamically is more complex than downloading it once-off. APIs come in many flavors too, so you aren't assured the same request and response format every time. But this week we learnt the basic recipe, and despite the increased complexity I would never hesitate to use an API: I at least feel confident that I can figure it out.
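
The "basic recipe" boils down to: build a request URL, send it, and parse the JSON that comes back. Here is a minimal sketch using the browser's fetch function (in 2013 we would more likely have used XMLHttpRequest or jQuery); the endpoint, parameters, and response shape are invented for illustration, since every API defines its own.

```js
// Hypothetical endpoint and query parameters; real APIs document their own.
var url = "https://api.example-museum.org/v1/objects?q=carved+mask&format=json";

fetch(url)
  .then(function (response) {
    if (!response.ok) { throw new Error("Request failed: " + response.status); }
    return response.json();   // parse the JSON body
  })
  .then(function (data) {
    // Assumes the response contains a 'results' array, purely for illustration.
    console.log("Records returned:", data.results.length);
  })
  .catch(function (err) {
    console.error(err);
  });
```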

You may be able to scrape it

Scraping is the process of extracting unstructured data from an HTML document (i.e. a webpage) and structuring it so that it can be manipulated and visualised. Our technique was so straightforward that all we needed was a Google spreadsheet and tabular data from Wikipedia. I did learn that it is not a fail-proof technique: my spreadsheet went a little bit haywire when I tried to scrape this table later on. (Bonus points for anyone who can figure out why it didn't work.)
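
For reference, the Google Sheets trick is a single built-in formula, IMPORTHTML(url, query, index), which pulls the nth table or list from a page into your spreadsheet. The URL below is a placeholder, not the table that broke for me.

```
=IMPORTHTML("https://en.wikipedia.org/wiki/Example_article", "table", 1)
```

When it misbehaves, it is often because the page contains several tables (so the index points at the wrong one) or because merged and nested cells don't flatten cleanly into rows.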

2. Cleaning data is surprisingly fulfilling.

While we learnt that cleaning data is crucial to successfully reusing it, opinions vary on how enjoyable the process is. Using a powerful tool like OpenRefine, I was surprised at how enjoyable I found it. If you like meticulous activities such as jigsaw puzzles or knitting, then take my word for it: cleaning data is genuinely absorbing.

3. Meanwhile: Data licensing is incredibly important

One important point we learnt this week (if not from the Fieldschool, then from the media) is that data is not neutral or free-floating. When remixing, you have to be aware of the limitations on use placed by whoever provides the data.  But, even then, licensing is not the impenetrable brick wall you might think it is. Navigating licensing can be as simple as familiarising yourself with Creative Commons. A handy tip for the remainder of the Fieldschool: visualisations count as derivative copies.

4. Knowing what you want to do with data becomes wonderfully obvious

The last exciting discovery of the week is that I actually have ideas about what I'd like to make. I thought I'd have 'hacker's block' about what to do with data, but I'm relieved to discover that's definitely not the case. As soon as we learnt about various sources and techniques for extracting data, a thousand ideas appeared from nowhere. It evidently just took learning what was possible for my mind to leap into action.

Essentially, while I can't step out and immediately build the world's best data-driven app, this week has proved that many of the barriers to remixing data I'd anticipated are roughly a day's worth of (hard) concentration away from being dismantled.