Targeting transparency by Simon Lindgren

This new paper by Luke Justin Heemsbergen and me was recently published in the Australian Journal of International Affairs.

The power of precision air strikes and social media feeds in the 2012 Israel–Hamas conflict: ‘targeting transparency’

This article analyses the evolving uses of social media during wartime through the IDF (Israel Defense Forces) Spokesperson Facebook and Twitter accounts. The conflict between Israel and Hamas-affiliated groups in November 2012 has generated interesting data about social media use by a sovereign power in wartime and the resultant networked discourse. Facebook data is examined for effective patterns of dissemination through both content analysis and discourse analysis. Twitter data is explored through connected concept analysis to map the construction of meaning in social media texts shared by the IDF. The systematic examination of this social media data allows the authors’ analysis to comment on the evolving modes, methods and expectations for state public diplomacy, propaganda and transparency during wartime.

http://www.tandfonline.com/eprint/E7898NsUIbDgUcrw9eVN/full#.U8WEJhZ4HTY

Extracting hashtag co-occurrences in a timeframe with Python by Simon Lindgren

I have been working hard on my programming skills over the last few days. We are writing a paper about the #idlenomore hashtag, analysing co-occurrences between it and other hashtags in activist tweets. To incorporate a time dimension in the network analysis, I wanted to extract only those co-occurrences that happen within a certain timeframe. I finally came up with this code, using Python and pandas, which keeps only those pairs of hashtags where the first hashtag in the pair had been present in the dataset for no more than time T when the co-occurrence happened.

import pandas

## Set column names
colnames = ['Date', 'Item1', 'Item2']

## Read csv adding column names
## The csv must be formatted like:
## date;item1;item2
## sep=';' matches that layout, and parse_dates turns
## the Date column into datetimes so ages can be computed
data = pandas.read_csv('/path/file.csv', sep=';', names=colnames, parse_dates=['Date'])

# STEP 1: GET "PUBLICATIONDATES" OF THE TAGS

## Create a dataframe with info
## on dates for first column
pubdates = data[['Date', 'Item1']]

## Sort the dataframe by Date and
## keep only the earliest occurrence of a value
## drop_duplicates considers the column 'Item1'
## and keeps only the first occurrence
pubdates = pubdates.sort_values('Date').drop_duplicates(subset=['Item1'])

## Create a dataframe with co-occurrence pairs
## and the pubdates of Item1 in each pair
## (the two Date columns get the suffixes _x and _y)
timematrix = pandas.merge(left=data, right=pubdates, on='Item1')

## Rename some of the columns for clarity
timematrix = timematrix.rename(columns={'Date_x': 'Coocdate', 'Date_y': 'Item1-pubdate'})

## Sort them
timematrix = timematrix.sort_values(['Coocdate', 'Item1-pubdate'], ascending=False)

## Add a column calculating the "age"
## of Item 1 on the occasion of each co-occurrence
timematrix['Item1-age'] = timematrix['Coocdate'] - timematrix['Item1-pubdate']

# STEP 2: KEEP ONLY COOCS THAT HAPPEN IN TIME T AFTER ITEM1 PUBDATE

## Set a timeframe
## (assumed here to mean one day; change the unit to fit your data)
timeframe = pandas.Timedelta(days=1)

## Extract only the rows where the
## "age" of Item 1 is less than or
## equal to the user defined timeframe
mask = timematrix['Item1-age'] <= timeframe
keptpairs = timematrix.loc[mask]

## Output kept pairs to a file
dataset = keptpairs[['Item1', 'Item2']]
dataset.to_csv('/path/dataset.csv', sep='\t', encoding='utf-8', index=False, header=False)
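
To see what the filter does on a tiny example, here is a minimal sketch of the same two steps using only the Python standard library instead of pandas. The sample rows, the function name pairs_within_timeframe and the date format are my own assumptions for illustration, not part of the original dataset.

```python
import csv
import io
from datetime import datetime, timedelta

# Hypothetical sample rows in the date;item1;item2 layout
SAMPLE_CSV = (
    "2013-01-01;idlenomore;cdnpoli\n"
    "2013-01-01;idlenomore;j11\n"
    "2013-01-05;idlenomore;ottawa\n"
    "2013-01-05;j11;ottawa\n"
)


def pairs_within_timeframe(csv_text, timeframe, date_format="%Y-%m-%d"):
    """Keep (item1, item2) pairs occurring within `timeframe` of item1's first date."""
    rows = [
        (datetime.strptime(date, date_format), item1, item2)
        for date, item1, item2 in csv.reader(io.StringIO(csv_text), delimiter=";")
    ]

    # Step 1: earliest date on which each tag appears in the Item1 column
    pubdates = {}
    for date, item1, _ in rows:
        if item1 not in pubdates or date < pubdates[item1]:
            pubdates[item1] = date

    # Step 2: keep pairs where the "age" of Item1 is within the timeframe
    return [
        (item1, item2)
        for date, item1, item2 in rows
        if date - pubdates[item1] <= timeframe
    ]


pairs = pairs_within_timeframe(SAMPLE_CSV, timedelta(days=1))
print(pairs)
```

With a one-day timeframe, the third row is dropped because idlenomore is already four days old at that point. Note that, as in the pandas code, a tag's "publication date" is taken from the Item1 column only, so j11 counts as new on 2013-01-05 even though it appeared as Item2 earlier.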

Insert something between something using Regex by Simon Lindgren

When using Regex for text processing, for example cleaning up or otherwise preparing full text data for various forms of content analysis, this is a great trick. Let's say you want to find any places where a certain type of thing comes before some other type of thing and you want to insert something between them. Your data may look like this:

1976 dog whatever whatever
1981 rat something something
1995 gorilla hello hello

For some reason, let's pretend you want to insert a tab between the years and the animal words. Now, if you used Regex to search for any string of digits followed by a space and a lowercase word

[0-9]+ [a-z]+

you would match what you're looking for, but how do you paste the thing you found back into the data with something added? If you told your editor to find [0-9]+ [a-z]+ and replace it with \t (the tab you want), your data would look like this:

whatever whatever
something something
hello hello

Your pasted tabs would replace the entire match. 

The solution
Find this:

([0-9]+)( [a-z]+)

And replace it with this:

$1\t$2

The $1 will bring back the match within your first set of parentheses, and the $2 will bring back what matched the regex in the second set of parentheses. In this example, we entered a \t (for a tab) between them.
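
The same trick works outside a text editor. Here is a minimal sketch in Python, where the re module writes the backreferences as \1 and \2 rather than $1 and $2. The function name insert_tab_after_year is just something I made up for the example.

```python
import re

# Two capture groups: the digits, and the space plus the lowercase word
PATTERN = re.compile(r'([0-9]+)( [a-z]+)')


def insert_tab_after_year(text):
    """Put a tab between the digits and the word that follows them."""
    # \1 and \2 bring back what each group matched; \t is the inserted tab
    return PATTERN.sub(r'\1\t\2', text)


lines = [
    "1976 dog whatever whatever",
    "1981 rat something something",
    "1995 gorilla hello hello",
]
for line in lines:
    print(insert_tab_after_year(line))
```

Because the second group includes the leading space, that space is kept after the tab. Move the space outside the parentheses, as in ([0-9]+) ([a-z]+), if you want the tab to replace it.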

Python code for extracting dates from files in a folder by Simon Lindgren

# Import necessary libraries
import os
import glob
import datetime

# Set the path
path = '/Users/Simon/Downloads/Baltic/'

# Loop through the files to process (extension selected by wildcard)
for file in glob.glob(os.path.join(path, '*.jpg')):

    # Get the filename
    filename = os.path.basename(file)

    # Use statinfo to get a bunch of info about the file
    statinfo = os.stat(file)

    # Extract only the info on modification time (st_mtime)
    timestamp = statinfo.st_mtime

    # Convert the timestamp to readable date
    date = datetime.datetime.fromtimestamp(timestamp)

    # Print name and date, tab separated
    print(filename, date, sep='\t')
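
If you want to reuse the dates rather than just print them, the loop can be wrapped in a function. This is a minimal sketch; the function name modification_dates and its pattern parameter are my own additions for illustration.

```python
import glob
import os
from datetime import datetime


def modification_dates(folder, pattern='*.jpg'):
    """Return (filename, modification datetime) pairs for matching files, sorted by path."""
    results = []
    for filepath in sorted(glob.glob(os.path.join(folder, pattern))):
        # st_mtime is the modification time as seconds since the epoch
        timestamp = os.stat(filepath).st_mtime
        results.append((os.path.basename(filepath), datetime.fromtimestamp(timestamp)))
    return results
```

Calling modification_dates(path) with the path set above gives a list that can be sorted by date or written to a file.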