Crowdsourcing Knowledge by Simon Lindgren

I recently published a paper in Culture Unbound about scholarly references to Wikipedia.

Crowdsourcing Knowledge: Interdiscursive Flows from Wikipedia into Scholarly Research

Information increasingly flows from smart online knowledge systems, based on ‘collective intelligence’, into the more traditional forms of knowledge production that take place within academia. Looking specifically at the case of Wikipedia, and at how it is employed in scholarly research, this study contributes new knowledge about the potential role of user-generated information in science and innovation. This is done using a dataset collected from the Scopus research database, which is processed with a combination of bibliometric techniques and qualitative analysis.

Results show that there has been a significant increase in the use of Wikipedia as a reference within all areas of science and scholarship. Wikipedia is used to a larger extent within areas like Computer Science, Mathematics, Social Sciences and Arts and Humanities than in Natural Sciences, Medicine and Psychology. Wikipedia is used as a source for a variety of knowledge and information, as a replacement for traditional reference works. A thematic qualitative analysis showed that Wikipedia knowledge is recontextualised in different ways when it is incorporated into scholarly discourse. In general, one can identify two forms of framing, where one is unmodalised and the other is modalised. The unmodalised uses include referring to Wikipedia as a complement or example, as a repository, and as an unproblematic source of information. The modalised use is characterised by the invocation of various markers that emphasise – in different ways – that Wikipedia cannot be automatically trusted. It has not yet achieved full legitimacy as a source.

View Article

Symbolic interactionism, social networks and social media by Simon Lindgren

The most recent volume (43) of Studies in Symbolic Interaction includes a chapter by me and Annette Markham. 

From Object to Flow: Network Sensibility, Symbolic Interactionism, and Social Media

This article discusses how certain sensibilities and techniques from a network perspective can facilitate different levels of thinking about symbolic interaction in mediated contexts. The concept of network implies emergent structures that shift along with the people whose connections construct these webs of significance. A network sensibility resonates with contemporary social media contexts in that it focuses less on discrete objects and more on the entanglements among elements that may create meaning. From a methodological stance, this involves greater sensitivity to movement and connection, both in the phenomenon and in the researcher’s relationship to this flow. The goal is to embody the perspective of moving with and through the data, rather than standing outside it as if it can be observed, captured, isolated, and scrutinized outside the flow. Rather than reducing the scope, the practice of moving through and analyzing various elements of networks generates more data, more directions, and more layers of meaning. We describe various ways a network sensibility might engender more creative and ethically grounded approaches to studying contemporary cultures of information flow.

Targeting transparency by Simon Lindgren

This new paper by myself and Luke Justin Heemsbergen was recently published in the Australian Journal of International Affairs.

The power of precision air strikes and social media feeds in the 2012 Israel–Hamas conflict: ‘targeting transparency’

This article analyses the evolving uses of social media during wartime through the IDF (Israel Defense Forces) Spokesperson Facebook and Twitter accounts. The conflict between Israel and Hamas-affiliated groups in November 2012 has generated interesting data about social media use by a sovereign power in wartime and the resultant networked discourse. Facebook data is examined for effective patterns of dissemination through both content analysis and discourse analysis. Twitter data is explored through connected concept analysis to map the construction of meaning in social media texts shared by the IDF. The systematic examination of this social media data allows the authors’ analysis to comment on the evolving modes, methods and expectations for state public diplomacy, propaganda and transparency during wartime.

Extracting hashtag co-occurrences in a timeframe with Python by Simon Lindgren

Been working hard the last few days on my programming skills. We are doing a paper about the #idlenomore hashtag, analysing co-occurrences between that and other hashtags in activist tweets. To incorporate a time dimension in the network analysis, I wanted to extract those co-occurrences that happen within a certain timeframe. I finally came up with this code, using Python Pandas, to keep only pairs of hashtags where the first hashtag has been present in the dataset for less than time T at the moment of co-occurrence.

import pandas

## Set column names
colnames = ['Date', 'Item1', 'Item2']

## Read csv adding column names
## The csv must be formatted like:
## date;item1;item2
data = pandas.read_csv('/path/file.csv', sep=';', names=colnames, parse_dates=['Date'])


## Create a dataframe with info 
## on dates for first column
pubdates = data[['Date', 'Item1']]

## Sort the dataframe by Date and
## keep only the earliest occurrence of a value
## drop_duplicates considers the column 'Item1'
## and keeps only the first occurrence
pubdates = pubdates.sort_values('Date').drop_duplicates(subset=['Item1'])

## Create a dataframe with co-occurrence pairs 
## and the pubdates of Item1 in each pair
timematrix = pandas.merge(left=data, right=pubdates, left_on='Item1', right_on='Item1')

## Rename the date columns for clarity
timematrix = timematrix.rename(columns={'Date_x': 'Coocdate', 'Date_y': 'Item1-pubdate'})

## Sort them
timematrix = timematrix.sort_values(['Coocdate', 'Item1-pubdate'], ascending=False)

## Add a column calculating the "age" 
## of Item 1 on the occasion of each co-occurrence
timematrix['Item1-age'] = timematrix['Coocdate'] - timematrix['Item1-pubdate']


## Set a timeframe (here: one day)
timeframe = pandas.Timedelta(days=1)

## Extract only the rows where the
## "age" of Item 1 is less than or
## equal to the user defined timeframe
mask = (timematrix['Item1-age'] <= timeframe)
keptpairs = timematrix.loc[mask]

## Output kept pairs to a file
dataset = keptpairs[['Item1', 'Item2']]
dataset.to_csv('/path/dataset.csv', sep='\t', encoding='utf-8', index=False, header=False)
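To see the logic in one piece, here is a minimal self-contained sketch of the same pipeline, run on made-up in-memory data instead of a csv file (the hashtags and dates are invented for the example):

```python
import pandas as pd

## Invented example data: three co-occurrence pairs
data = pd.DataFrame({
    'Date': pd.to_datetime(['2013-01-01', '2013-01-01', '2013-01-05']),
    'Item1': ['#idlenomore', '#idlenomore', '#idlenomore'],
    'Item2': ['#ottawa', '#sovereignty', '#cdnpoli'],
})

## Earliest appearance of each Item1 value
pubdates = data[['Date', 'Item1']].sort_values('Date').drop_duplicates(subset=['Item1'])

## Attach each pair's Item1 publication date
timematrix = pd.merge(data, pubdates, on='Item1', suffixes=('_cooc', '_pub'))

## Keep only pairs where Item1 is at most one day old
timematrix['age'] = timematrix['Date_cooc'] - timematrix['Date_pub']
keptpairs = timematrix[timematrix['age'] <= pd.Timedelta(days=1)]
```

Here the two January 1 pairs survive, while the #cdnpoli pair is dropped because #idlenomore is four days old by the time that co-occurrence happens.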

Insert something between something using Regex by Simon Lindgren

When using Regex for text processing, for example cleaning up or otherwise preparing full text data for various forms of content analysis, this is a great trick. Let's say you want to find any places where a certain type of thing comes before some other type of thing and you want to insert something between them. Your data may look like this:

1976 dog whatever whatever
1981 rat something something
1995 gorilla hello hello

For some reason, let's pretend you want to insert a tab between the years and the animal words. Now, if you used Regex to search for any string of digits followed by a space and a lowercase word

[0-9]+ [a-z]+

you would match what you're looking for, but how do you paste the thing you found back into the data with something added? If you told your editor to find [0-9]+ [a-z]+ and replace it with \t (the tab you want), your data would look like this:

whatever whatever
something something
hello hello

Your pasted tabs would replace the entire match. 

The solution
Find this:

([0-9]+)( [a-z]+)

And replace it with this:

$1\t$2

The $1 will bring back the match within your first set of parentheses, and the $2 will bring back what matched the regex in the second set of parentheses. In this example, we entered a \t (for a tab) between them.
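The same trick works in code as well. Here is a small sketch in Python's re module, where backreferences in the replacement string are written \1 and \2 instead of $1 and $2 (the data line is taken from the example above):

```python
import re

line = "1976 dog whatever whatever"

## Capture the digits and the space-plus-word separately,
## then paste them back with a tab inserted between the groups
result = re.sub(r"([0-9]+)( [a-z]+)", r"\1\t\2", line)
# result == "1976\t dog whatever whatever"
```

Note that the second group starts with the space, so the original space survives after the tab. If you want the tab to replace the space instead, put the space outside the groups: find ([0-9]+) ([a-z]+) and replace with \1\t\2.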

Python code for extracting dates from files in a folder by Simon Lindgren

# Import necessary libraries
import os
import glob
import datetime

# Set the path
path = '/Users/Simon/Downloads/Baltic/'

# Loop through the files to process (extension selected by wildcard)
for file in glob.glob(os.path.join(path, '*.jpg')):

    # Get the filename
    filename = os.path.basename(file)

    # Use os.stat to get a bunch of info about the file
    statinfo = os.stat(file)

    # Extract only the info on modification time (st_mtime)
    timestamp = statinfo.st_mtime

    # Convert the timestamp to a readable date
    date = datetime.datetime.fromtimestamp(timestamp)

    # Print name and date, tab separated
    print(filename, date, sep='\t')