Monday, August 19, 2013

Programming Project: Text Analysis


Hello readers :)

Anyone who knows anything about me knows I read addictively. Aside from the countless articles I read daily, I also read a large number of books (sometimes a few at a time, depending on the type of book). All the books I read are non-fiction, by the way (I REALLY don't like reading fiction, sci-fi excluded).

Presently, one of the books I am reading is called The Secret Life of Pronouns. Now, because I have a project page, because I love programming (particularly in C), because I am hyperanalytical, and because I positively LOVE deducing and/or inferring what people are saying from how they speak, I decided to use this text as a launching point for a program that I will write to analyze medium/large text fragments.

I might add that when I went over in my own mind EXACTLY how I would write the program based SOLELY upon the aforementioned book, I found there REALLY isn't enough to go on! For instance, much is written on the over/under use of words like I/me/my v. we/our/ours. However, no specific fraction is mentioned. This instance ALONE makes it clear that I have a lot of googling/research ahead of me to clarify this point! If I fail to clarify it, I will simply output the ratio for each "cluster," so that if I should EVER come across the actual numbers in the future, I can add a simple if/then/else check which goes off of those ratios.
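Here is a minimal sketch, in C, of what "output the ratio for each cluster" could look like. Everything named in it is an assumption made for illustration: the word lists I_WORDS and WE_WORDS, the struct cluster_counts, and the functions count_clusters and cluster_ratio are hypothetical, and the book supplies neither word lists nor thresholds. The sketch only counts and divides; it draws no conclusions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical clusters: the book gives no exact word lists. */
static const char *const I_WORDS[]  = { "i", "me", "my", "mine", "myself" };
static const char *const WE_WORDS[] = { "we", "us", "our", "ours", "ourselves" };

struct cluster_counts {
    size_t total_words; /* every run of letters counts as one word */
    size_t i_words;     /* hits in I_WORDS */
    size_t we_words;    /* hits in WE_WORDS */
};

/* Return 1 if the lowercase word appears in the cluster, else 0. */
static int in_cluster(const char *word, const char *const *cluster, size_t n)
{
    for (size_t k = 0; k < n; k++)
        if (strcmp(word, cluster[k]) == 0)
            return 1;
    return 0;
}

/* Split text on non-letters, lowercase each word, and tally the clusters.
 * Limitation: apostrophes split words, so "I'm" counts as "i" plus "m". */
struct cluster_counts count_clusters(const char *text)
{
    struct cluster_counts c = { 0, 0, 0 };
    char word[64];
    size_t len = 0;

    assert(text != NULL);

    for (const char *p = text; ; p++) {
        unsigned char ch = (unsigned char)*p;
        if (isalpha(ch)) {
            if (len < sizeof word - 1)   /* overlong words are truncated */
                word[len++] = (char)tolower(ch);
            continue;
        }
        if (len > 0) {                   /* a word just ended */
            word[len] = '\0';
            c.total_words++;
            if (in_cluster(word, I_WORDS, sizeof I_WORDS / sizeof I_WORDS[0]))
                c.i_words++;
            else if (in_cluster(word, WE_WORDS, sizeof WE_WORDS / sizeof WE_WORDS[0]))
                c.we_words++;
            len = 0;
        }
        if (ch == '\0')
            break;
    }
    return c;
}

/* Fraction of all words that fall in a cluster; 0.0 for empty text. */
double cluster_ratio(size_t hits, size_t total)
{
    return total == 0 ? 0.0 : (double)hits / (double)total;
}
```

To use it, call count_clusters on a text fragment and pass the counts to cluster_ratio, e.g. cluster_ratio(c.i_words, c.total_words). Once real thresholds turn up, the if/then/else check can compare against these ratios without touching the counting code.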

My plan of attack is to go about this in passes:

 First, I will do simple things like pronoun use (I say that now-- referring to my using the word "simple"!  : /  ). 
            Pronoun use can relay information about your gender and about whether 
            or not you are depressed, emotionally distant, or authoritative.

Then, I am thinking I will have to create emotive clusters (which incorporate synonyms, for instance).
           I WILL say on this that using words like sad, frustrated, etc. does NOT 
           have to mean that you are an unhappy person.

Perhaps my third pass will incorporate parts of speech.
         This can predict gender, age, and education.
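For the second pass, an emotive cluster could start out as nothing more than a lookup table that maps each word to the cluster it belongs to. The following is a rough sketch in C under that assumption. The enum emotion, the table EMO_TABLE, its handful of sample words, and the functions emotion_of and tally_emotions are all hypothetical placeholders, not lists taken from the book. In line with the caveat above, it only tallies words and says nothing about whether the writer is actually unhappy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical clusters; EMO_NONE catches every word not in the table. */
enum emotion { EMO_NONE = 0, EMO_SAD, EMO_ANGRY, EMO_HAPPY, EMO_COUNT };

struct emo_entry {
    const char *word;      /* lowercase word */
    enum emotion cluster;  /* cluster it is assigned to */
};

/* Sample entries only. A real table would be far longer and could hold
 * non-synonymous words that still belong in a cluster. */
static const struct emo_entry EMO_TABLE[] = {
    { "sad", EMO_SAD },          { "unhappy", EMO_SAD },  { "miserable", EMO_SAD },
    { "frustrated", EMO_ANGRY }, { "angry", EMO_ANGRY },  { "annoyed", EMO_ANGRY },
    { "happy", EMO_HAPPY },      { "glad", EMO_HAPPY },   { "joyful", EMO_HAPPY },
};

/* Look up one lowercase word; linear search is fine for a small table. */
enum emotion emotion_of(const char *lower_word)
{
    assert(lower_word != NULL);
    for (size_t k = 0; k < sizeof EMO_TABLE / sizeof EMO_TABLE[0]; k++)
        if (strcmp(lower_word, EMO_TABLE[k].word) == 0)
            return EMO_TABLE[k].cluster;
    return EMO_NONE;
}

/* Fill counts[] with the number of words falling in each cluster. */
void tally_emotions(const char *const *words, size_t n, size_t counts[EMO_COUNT])
{
    for (size_t k = 0; k < EMO_COUNT; k++)
        counts[k] = 0;
    for (size_t k = 0; k < n; k++)
        counts[emotion_of(words[k])]++;
}
```

Keeping the words in a table rather than in a chain of if statements means a new synonym is a one-line change, which matters if the lists end up being typed in by hand from a thesaurus.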


I consider this to be a project which takes an infinite amount of time, for one can ALWAYS refine such a program. However, I would expect my proposed schedule to run something like this:

Stage 1: one month (I have some online research to do, for one thing).
Stage 2: two months (I can go to thesaurus.com to look up synonyms, but programming all of them will take time. There are also non-synonymous words which belong in these clusters).
Stage 3: one month (I have some online research to do, for one thing).

There are --POSSIBLY-- insurmountable problems. For instance, how does one code the program to recognize sarcasm?

I do plan on offering to give people updates of the program if they ask for it (after I state that it is available, of course).

Wish me luck! I might need it... : /     :D
_______________________________


So, I had thought of deleting this entry, for the project has been shelved till further notice. Why, though? Did I give up?! Pssshh! I would NEVER!

What happened is that, upon going back over the book which inspired me, I realized that nothing concrete was given: all findings were written too "generally" and I had no baseline to go off of. Sounds about right, considering the author is a psychologist. So, till THEN, my friends, this project will be nothing more than a fantasy ;-) Unless someone would like to send me some information?!
 ;-)

