
First, I read the files containing the information into my notebook with the function read.csv(). After this, I used the summary() function to analyze the word count information; its output answered my question about the minimum, median and maximum word count of the interviews. Next, I extracted the columns I would need from the two data sets I had read into my notebook and renamed them. To be able to use these two separate variables, birth year and word count, together, I used the merge() function to combine the data. Where information could not be found in an interview, the data had NA in its place, so I used another function to remove every row that contained an NA. With the final data set, I used the correlation function, cor(), to obtain the numerical relationship between my two variables. The correlation value was approximately -0.0068. A value this close to zero means the relationship is negligible; the negative sign only indicates that, if there is any relationship at all, it is an inverse one: as birth year increases (the younger the individual), word count decreases very slightly. Of course, there is not enough information to confirm this theory, but that is the current result of my research task. (“A Report Has Come Here”), (“Collections As Data: Conditions Of Possibility.”), (“Searching For Black Girls.”)
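R's cor() computes Pearson's correlation coefficient by default, which measures only the strength of a linear relationship and always falls between -1 and 1. The sketch below is a plain-Python illustration of that formula, not the notebook's actual code; the function name `pearson` and the sample values are hypothetical and are not the project's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient (the default method of R's cor())."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of the products of each pair's deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the square roots of the summed squared deviations
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical (birth year, word count) pairs for illustration only
birth_years = [1905, 1912, 1920, 1931, 1944]
word_counts = [5200, 6100, 4800, 5900, 5000]
print(round(pearson(birth_years, word_counts), 4))
```

To put the reported value in perspective: squaring -0.0068 gives about 0.000046, so under a linear model birth year would account for well under 0.01% of the variation in word count.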

Link to notebook PDF version:  file:///home/chronos/u-7ae5fd1db5891f4dc70ac5740406b524cb353f30/Downloads/Chenemi%20Maji%20Final%20(1).pdf

Ethical concerns surrounding data

While my classmates and I were extracting data from the interviews and cleaning it, there were concerns about the type of information we were extracting, for example the home addresses of the interviewees. We acknowledged that releasing the interviewees' addresses might put their safety at risk, and this was one of the ethical concerns that we had. While I was cleaning my data, especially when merging data tables and removing columns that were not going to contribute to my research task, I took some time to reflect on whether, by reducing the amount of information I published, I would also be publishing incomplete or biased data. My classmates and I addressed this concern by acknowledging the possibility of bias and discussing it alongside the release of our research tasks. (“Eviction Lab Misses The Mark — Shelterforce.”), (Tatman 2018), (“Noble [Searching For Black Girls].Pdf.”)

Context of data used

The data used in this research task came from recorded interviews with people who had previously worked in some of the manufacturing industries in Lewiston and Auburn, Maine. In these interviews, the interviewees retell how they came to work here, what the working environment was like and anything else they wanted to share. There were three major industries (shoe, mill and brick), and within these industries were various businesses and workplaces. At that time, this workforce was a blend of migrants from within the United States and immigrants from other countries. Each interviewee has their own stories about their workplace and profession, and our final research task looks into the subtle, implicit information and trends that the interviews reveal. (“A Report Has Come Here”), (Museum L-A)

How the data was produced

The interviews were initially in plain text format. Members of the class anonymously and randomly selected interviews to read through and collect information from. The information we collected included items such as birth year, current age, workplace, and whether or not injuries were sustained at work. This more specific extracted information was entered first into a survey and then into various documents in CSV format, which were shared with the class. The information was grouped into different documents according to their main focus: some were about the interview itself, others about the person being interviewed and their family background. The information I needed for this research task was present in this collated data, but in two separate files. After reading the files into my notebook, I extracted only the columns I would need from the two data sets and renamed them. Once this was done, I merged the two filtered data sets into one table. Where information could not be found in an interview, the data had NA in its place. To further clean my data, I removed the rows lacking information, and the resulting data set is the one I used for my research task. The data cleaning took the most effort to complete, but once it was done, all that was left was to find the correlation between my two variables: the birth year of the interviewee and the word count of the interview. (“Eviction Lab Misses The Mark — Shelterforce”), (“Noble [Searching For Black Girls].Pdf”), (“Collections As Data: Conditions Of Possibility”)
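The cleaning steps described above (select and rename columns, merge the two tables, drop rows with missing values) can be sketched in standard-library Python as a rough analogue of the R workflow. The report does not name the files, the columns, or the function used to remove NA rows, so every name and value below (`interview_id`, `word_count`, `year_of_birth`, the helper functions, the inline CSV text) is hypothetical. The merge mimics R's merge() with its default `all = FALSE`, which keeps only rows whose key appears in both tables; in R, na.omit() is one function that drops incomplete rows.

```python
import csv
import io

# Hypothetical stand-ins for the two shared CSV files
INTERVIEWS_CSV = """interview_id,title,word_count
A01,Interview A,5200
A02,Interview B,6100
A03,Interview C,
"""

PEOPLE_CSV = """interview_id,name,year_of_birth
A01,Person 1,1912
A02,Person 2,NA
A03,Person 3,1931
A04,Person 4,1944
"""


def read_selected(text, keep):
    """Read CSV text, keeping only the columns in `keep` under their new names."""
    return [{new: row[old] for old, new in keep.items()}
            for row in csv.DictReader(io.StringIO(text))]


def inner_merge(left, right, key):
    """Keep rows whose key appears in both tables (assumes one row per key in `right`)."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def drop_missing(rows, missing=("", "NA")):
    """Remove any row containing an empty or NA field."""
    return [row for row in rows if all(v not in missing for v in row.values())]


interviews = read_selected(INTERVIEWS_CSV, {"interview_id": "id", "word_count": "words"})
people = read_selected(PEOPLE_CSV, {"interview_id": "id", "year_of_birth": "birth_year"})
merged = inner_merge(interviews, people, "id")
clean = drop_missing(merged)
print(clean)
```

Here A04 is lost in the merge because it has no matching interview row, and A02 and A03 are dropped for missing values, which illustrates how cleaning can shrink a data set and why the bias concern raised earlier matters.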

Questions I sought to answer about my data

1.)  How many of the interviewees worked in each of the three industries (brick, mill and shoe factory)

2.) The minimum, median and maximum word count of each of the interviews 

3.) Whether or not the age of the individual affected how much they had to say about their work experience
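Questions 1 and 2 reduce to a count per category and three order statistics. In R, summary() on a numeric vector reports the minimum, quartiles, median, mean and maximum. The Python sketch below shows the equivalent calculations on hypothetical records (not the project's data) and assumes each interviewee is recorded under a single industry; question 3 is the correlation described earlier.

```python
from collections import Counter
from statistics import median

# Hypothetical (industry, word_count) records for illustration only
records = [
    ("shoe", 5200), ("mill", 6100), ("mill", 4800),
    ("brick", 5900), ("shoe", 5000), ("mill", 7300),
]

# Question 1: number of interviewees in each industry
per_industry = Counter(industry for industry, _ in records)

# Question 2: minimum, median and maximum word count
counts = [words for _, words in records]
stats = {"min": min(counts), "median": median(counts), "max": max(counts)}

print(per_industry)
print(stats)
```

With an even number of interviews, the median is the mean of the two middle word counts, here (5200 + 5900) / 2 = 5550.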
