
The Desiderata of Automata: Exploring open research tools for knowledge discovery

There are still in social science some incredible optimists who really believe we are on the verge of breaking through into a wonderful land of mechanical research, in which highly significant findings are ground out of little whirring models set in motion by precise instructions.

That has been about to happen for far too many years…

James David Barber, 1930-2004

This paper explores the methodological feasibility of open source research tools for presidential studies to measure the power of the President. The intent is to promote the utility of data visualization and natural language processing in the social sciences, because there is a chasm between the methods practiced in political science and everyday life experience. The gap between theory and practice is often stark between those who study political science and those who witness politics. A democratic society is predicated on an informed electorate. The fostering and renewal of this functional requirement is among the constituent purposes of political science research: to further knowledge, to resolve inconsistencies in our understanding, and to improve everyday life. Bridging that gap, informing students and participants of the political process simultaneously, requires a departure from the present trajectory with an eye toward the titans of information outside academia. This paper is motivated by a disconnection between the outputs of democratic society and the esoteric outputs of those who study it; it hopes to create feedback between voters and the political scientists who study them. The paper finds that this work is being done by a few talented individuals who are willing to reach broadly across disciplines to connect the insights of specialists.

Science 2.0

The current peer-reviewed paradigm of selling freely contributed, user-generated content is an abomination to a free and open information society. Citizens pay for research once when it is subsidized by the government, and most cannot benefit from the findings without paying for them again. The Economist reports that both the UK and the US have recently passed legislation requiring journals to make publicly funded findings available to the public for free. Scientists consider themselves fortunate to work for free and have their work published. The same Web 2.0 technologies that facilitate dynamic content allow for a great deal more dynamism in scientific collaboration, and scientific achievement in the very near future will require technological fluency in these development tools. Ben Shneiderman coined the term “Science 2.0” about the same time that he, his student Dr. Adam Perer[1], and journalist Chris Wilson were working on the visualization software SocialAction (Perer 2008; 2010).

This paper rejects the premise that insights can only be made by establishing a research question[2]. Like Charles Jones, “I was not fully certain what I was looking for,” and “nor was I very clear about the questions” (2005, 249). Political science is stymied in retrospection, where the newest knowledge has already sat for months under review before publication. Yet a tsunami of data is committed to history in that time; put more eloquently by Aedhmar Hynes, “the very point of looking to Big Data is ‘to identify patterns that create answers to questions you didn’t even know to ask’” (Bollier and Firestone 2010, 36). In addition to addressing the “file drawer effect” created by shelved null findings (Iyengar and Greenhouse 1988; Rosenthal 1979), open research cross-pollinates disciplines with hybrid vigor (see Evans and Foster 2011; Wastell, McMaster and Kawalek 2007). Figuratively, political science departments “need three to breed”: to reproduce institutional memory in new PhD students, a specialty must be represented by three chairs. A lack of interdisciplinary collegiality poses the same risks as academic inbreeding: “a reduced flow of external knowledge… ossified and less responsive… to a fast evolving… knowledge based society” (Horta, Veloso and Grediaga 2010, 17).

This paper provides some of the rationale for social scientists to take their first steps outside their comfort zones as digital immigrants and to communicate meaningfully with the natives (Prensky 2001). Despite the assumptions of the digital native metaphor, natives are not imbued with technological prowess at birth; they are simply immersed in the user experience. Many immigrants can remember early markup word processors, their relationship to HTML, and the limitations of .cgi scripting that gave rise to existing Web 2.0 technologies. Natives are not inherently better developers than immigrants; natives have very little contextual background to differentiate developing a webpage from editing user-generated content on Facebook; both are “automagical.” Political science is at risk of outsourcing its own methodological innovation to external communities at a point in history where these skills will become fundamentally more difficult to reproduce.

Measurement Concerns

There have only been 44 Presidents of the United States. This “small n” creates a unique problem for presidential researchers: collecting data from a few individuals over a long period of time that can be generalized into the future. Observations which can be generalized across presidencies tend to overlook the executive as an individual; in contrast, focusing on the individual tends to understate the effects of the system (see Barber 1992). The ongoing debate between qualitative and quantitative studies has become a rhetorical false option, a false dichotomy. Qualitative studies, such as Neustadt’s 1990 Presidential Power, provide the reader with a rich context at the price of seeming speculative and insufficiently scientific. Often such qualitative work has provoked quantitative studies that answer with scrutiny and rigor but a diminished worldview. Existing quantitative studies of presidential power struggle to operationalize a proxy for agency with sufficient face validity, principally because they assume the President’s influence can be measured by “affairs transpiring in other parts of the federal government” (Howell 2003, 175).

Operationalization

Presidential agency is deliberately diluted many times over by the separation of powers, and the separated system complicates the means by which presidential power can be measured. When the President is the intended subject of study, the federal budget (see Berry, Burden and Howell 2010; Canes-Wrone 2001) and the President’s span of control (see Dickinson and Lebo 2007) are remote measures. Dickinson and Lebo operationalize growth in the Executive Office of the President (EOP) as a measure of the institutional presidency (2007, 209), a model which explains 30% of the variance in EOP growth (217). Knowing that the executive branch grows gives us some insight into the growth of government, but less about the President’s power. Despite the value of these insights, their specificity raises questions of multifinality, equifinality and spuriousness. Equifinality means that an outcome may result from more than one possible cause: what explains the remaining 70% of variance? Researchers may fail to measure a potential cause, or may incorrectly attribute the effect to a spurious event. Multifinality is a breakdown in causation and scientific determinism which accepts that a single event may diverge into many possible outcomes, in both the physical (see Hawking 1999) and the social sciences.

The Present Methods

The qualitative and quantitative repartee which continues in presidential studies is an aftershock of the impact rational choice, or “economic imperialism,” had on political science (Lehtinen and Kuorikoski 2007, 115). In international studies, it is framed as the positivist’s “explaining” against the post-positivist’s “understanding” (see Wendt 1998). Students can easily perceive the volleys between the two camps: “rational choice strips presidential narratives of much that makes them so compelling” (Howell 2003, 25). King, Keohane and Verba (KKV) and Gerring suggest both methods are imperfect. They confess the naked truth that “we can never hope to know a causal effect for certain” (KKV 1994, 79). The mechanisms of abstraction in qualitative analysis lie barer, and offend the principles of decency of some quantitative scholars. The complication is that “investigators often take down the scaffolding after putting up their intellectual buildings, leaving little trace of the agony and uncertainty of construction” (KKV 1994, 13).

By focusing on aspects of qualitative and quantitative analysis such as their shared use of the scientific method and inference, KKV buttress their initial assumption that the two are equals in the eyes of science. This is not the conventional wisdom, however. Most students of the social sciences will recognize the often-repeated mantra “correlation does not imply causation” as an extension of the logical fallacy “post hoc ergo propter hoc,” literally “after this, therefore because of this.” Unfortunately, the mechanisms of causation are so murky in political science that a dependent event occurring before its independent variable is sometimes as significant as it is prescient (see Hedge and Johnson 2002, 349). Hedge and Johnson claimed that their dependent variable was triggered by an independent event that happened after its effect: the public responded in anticipation.

Quantitative analysis is frequently championed here for its concreteness and rigor, which is of course built on the unsteady bedrock of abstraction (see KKV 1994, 3). Before variables can be measured, they must be conceptualized narrowly and defined in the arbitrary and normative terms of language. Gerring and KKV take a more inclusive approach (see Wendt 1998, 104) by suggesting that these mechanisms are essentially the same, and that both yield imperfect correlational or covariational inferences: “All empirical evidence of causal relationships is covariational in nature” (Gerring 2004, 342).

The Error in Y = a + bX + e

A regression model requires that the error term be distinct from the predicting variable. The scaffolding just mentioned includes two types of abstraction which have the potential to conflate the error with the cause. For quantitative analysis, error is shrouded in a probabilistic model that does two things: first, the model serves as an abstraction which distances the viewer from the untidy estimations and definitions of terms; second, it insulates a quantitative hypothesis from refutation. Flyvbjerg elucidates this second point in his evaluation of case studies by explaining that “if predictive laws would exist in the social sciences” as they do in the natural sciences, a “black swan” would be sufficient to disprove the hypothesis “all swans are white” (Flyvbjerg 2006, 227). Quantitative studies are much more apt to imply, and to do so with a high degree of confidence, that “all swans are white.” Error is controlled for; error happens. Within the error tolerance of this model, particularly in the social sciences, a black swan does nothing to fault the tectonic bedrock of truth. If “[a] concept is defined by the indicators used to measure it,” then the creature found fails as being either black or swan (Mahoney and Goertz 2006, 244). This misattribution occurs repeatedly in presidential studies. The predictive power of Krehbiel’s theory of pivotal politics is not falsifiable: a false prediction does not undermine it, since the congressmen may be voting either sincerely or strategically. The predictive value of Krehbiel’s theory is significantly diminished by acknowledging the limitations of probability versus certainty. We should not be tempted to jettison the signal because of the ratio of noise; Krehbiel’s theory should instead be expanded to account for the possibility of strategic voting.
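To make the abstraction concrete, the sketch below is a toy simulation (not any study’s model): it fits Y = a + bX + e by ordinary least squares and shows how a single “black swan” observation is quietly absorbed into the error term rather than refuting the fitted line.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate Y = a + b*X + e with known parameters.
a_true, b_true = 2.0, 0.5
X = rng.uniform(0, 10, size=100)
e = rng.normal(0, 1.0, size=100)          # error: everything the model omits
Y = a_true + b_true * X + e

# Ordinary least squares via the design matrix [1, X].
A = np.column_stack([np.ones_like(X), X])
(a_hat, b_hat), *_ = np.linalg.lstsq(A, Y, rcond=None)
print(a_hat, b_hat)                        # close to 2.0 and 0.5

# Add one "black swan" observation far off the line.
X2, Y2 = np.append(X, 5.0), np.append(Y, 20.0)
A2 = np.column_stack([np.ones_like(X2), X2])
(a_hat2, b_hat2), *_ = np.linalg.lstsq(A2, Y2, rcond=None)
print(a_hat2, b_hat2)                      # estimates move only slightly; the swan lives in e
```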

Parsimony

The parsimony employed to distill presidential agency imparts to readers an effect of myopia similar to looking away from a book after wearing strong reading glasses. Broad proxies for measuring presidential power lend the reader a level of confidence and acuity which presumes a very specific and particular context. The reader is then charmed with findings which, not surprisingly, comport with the model’s initial assumptions. Ridiculous assumptions should yield ridiculous conclusions, and yet the logic itself may be flawless.

Voting for the President is irrational only when we make assumptions which disregard notions of altruism and duty. Rational choice frequently makes additional assumptions about the availability and symmetry of information; such assumptions are implicit in the Dickinson and Lebo study of budget maximization cited above. For the President even more than for bureaucrats, there may be “empirically more tractable reasons why budget maximization is not a convincing operationalization of self-interest” (Lehtinen and Kuorikoski 2007, 122), let alone of presidential power, where pay is fixed. One reasonable yet arduous response is to generate mixed-methods studies utilizing both qualitative and quantitative designs; traditionally these have paired statistics with case studies (for an exemplar, see Hickey 2011a). Where reductionist approaches to presidential power claim, for example, to explain only 30 percent of a concept while measuring as broadly as the federal budget, we are forced to embrace complexity.

Industry

Another approach to generating scientific knowledge is to marshal the austere logic of machine learning with experiential human intuition. The challenge of understanding the President is not unlike that facing many businesses. Political scientists want to know what the President believes and to estimate the effects of his actions; Target wants to know what shoppers want and to have an estimable effect on their actions. Rational choice theory allows political scientists to make assumptions about beliefs so that they may make deductions about behavior. Data mining allowed Target to identify pregnant shoppers, often before their in-laws could (Duhigg 2012). Google can “predict flu outbreaks and unemployment trends” based on the geospatial information of internet searches (Bollier and Firestone 2010, 1; also Gentry et al. 2009). What is political science waiting for? When the Huffington Post created an Application Programming Interface (API) for presidential polling data, they stated that their intent was to help not only journalists but “researchers and policy analysts to better understand current opinions and trends” (Scheinkman and Blumenthal 2012). The public and many academics[3] in other disciplines expect political scientists and the trope of “policy makers” to be able to address their concerns. Unless some response is made, those concerns will fall on deaf ears.
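As a sketch of how low the barrier to such data is, the request below pulls chart metadata from the Pollster API. The URL pattern and JSON field names shown here are assumptions drawn from the announcement, not verified documentation, so the published API docs should be checked before relying on them.

```python
import json
import urllib.request

# Assumed endpoint and field names; verify against the HuffPost Pollster docs.
URL = ("http://elections.huffingtonpost.com/pollster/api/charts.json"
       "?topic=obama-job-approval")

with urllib.request.urlopen(URL) as response:
    charts = json.load(response)   # assumed: a JSON list of chart objects

for chart in charts:
    print(chart.get("title"), chart.get("last_updated"))
```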

Data Visualization

This is the epistemological crux facing political science: the monopoly of academia on the creation of knowledge is illusory. Chris Wilson is a data journalist for Yahoo! News and an active developer of open source software. In a project with Dr. Adam Perer of the University of Maryland Human-Computer Interaction Lab, Wilson developed an interactive visualization that allows the user to adjust the percentage of senators displayed who voted the same way on the same bill. Using the web application is very easy; the project is open source, however, implementing it locally is by no means “open sesame.” This author was able to install the clients and servers necessary to run the application[4], but unable to expand the data set to include additional nodes during the semester. The congressional data was mined collaboratively using “scrapers,” typically Python[5] scripts used to extract key words from one or sometimes several websites (a minimal example follows). Wilson’s visualization demonstrates the Senate half of Krehbiel’s theory in a social network analysis; a robust implementation of the theory would also include the members of the House and the President. Using this visualization, we can see pivotal voters begin to separate from the parties. To test Krehbiel’s theory, this visualization might help to determine whether pivotal players really are median voters.
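For readers unfamiliar with the term, a scraper can be only a dozen lines. The sketch below uses the third-party requests and BeautifulSoup libraries against a hypothetical roll-call page; the URL and the markup classes are invented for illustration, not taken from any real site.

```python
import requests
from bs4 import BeautifulSoup  # third-party: pip install requests beautifulsoup4

# Hypothetical roll-call page and markup, for illustration only.
url = "https://example.com/senate/roll-call/vote-113-1"
html = requests.get(url, timeout=30).text
soup = BeautifulSoup(html, "html.parser")

# Extract each senator's name and vote from assumed <tr class="vote-row"> rows.
votes = {}
for row in soup.find_all("tr", class_="vote-row"):
    name = row.find("td", class_="name").get_text(strip=True)
    vote = row.find("td", class_="vote").get_text(strip=True)
    votes[name] = vote

print(votes)
```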

Social Network Analysis

Clark, Pelika and Rigby used social network analysis in 2010 to test Krehbiel’s theory in the health reform debate. Clark et al. found that in addition to “proximity to the median voter” and acting as “party leader, chairman[s] or member[s]”, “occupational expertise” in health care was also important (2010, 1). Cho and Fowler built congressional network maps based on co-sponsorship of legislation. Others considered the role of “constituency influence and member vulnerability” in addition to Krehbiel’s original model (Hickey 2011b). After a few moments with Wilson’s web visualization, users can witness a clearer demonstration of contemporary party cohesion than offered by Fleisher and Bond’s 1996 or 2004 work; the Democrats in the 113th Congress have shown a great deal of party unity. The balance of actors in Krehbiel’s theory can be examined in an open source data set, hosted on GitHub, covering 1789 to the present. While JSON data files are “human readable,” they underutilize the bandwidth of the human eye in comparison to a well-designed visualization (Perer 2010). While not yet a reality, a robust implementation of Krehbiel’s theory is feasible. One small addition to the theory found in this data set is the record of vice presidential tie-breaking votes; the Vice President is ostensibly the de facto proxy for presidential power as President of the Senate.
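To make the data format concrete, a minimal co-voting network can be sketched with the third-party networkx library. The toy vote data below is an invented stand-in for the GitHub data set’s actual schema, and the threshold plays the role of Wilson’s percentage slider: an edge is drawn only when two senators agree on at least that share of shared votes.

```python
from itertools import combinations
import networkx as nx  # third-party: pip install networkx

# Invented toy data: bill -> {senator: vote}. A real run would load the JSON vote files.
votes = {
    "S.1": {"A": "yea", "B": "yea", "C": "nay"},
    "S.2": {"A": "yea", "B": "nay", "C": "nay"},
    "S.3": {"A": "nay", "B": "nay", "C": "nay"},
}

THRESHOLD = 0.6  # the "slider": share of shared votes required for an edge

G = nx.Graph()
senators = sorted({s for roll in votes.values() for s in roll})
for s1, s2 in combinations(senators, 2):
    shared = [b for b in votes if s1 in votes[b] and s2 in votes[b]]
    agreement = sum(votes[b][s1] == votes[b][s2] for b in shared) / len(shared)
    if agreement >= THRESHOLD:
        G.add_edge(s1, s2, weight=agreement)

print(G.edges(data=True))  # connected clusters correspond to voting blocs
```

Raising the threshold prunes weak ties, which is exactly how pivotal voters visually separate from their party clusters in Wilson’s visualization.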

Natural Language

Natural Language Processing[6] (NLP) may provide political science with a research alternative which is both scientifically reproducible and informed by contextual information. This context is encoded in the semantics of the corpora, the collections of writing, under study. Political science has historically looked to metrics such as word counts as a measure of legislative agency: longer directives leave less room for discretion, while short statutes give agents room to fill in the details (see Ringquist, Worsham and Eisner 2003). The executive orders issued by the President are perhaps some of the best evidence of presidential agency (see Howell 2003); executive orders are not encumbered by consensus and are issued directly by the President.
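The word-count metric itself is simple to compute once documents are in plain text; a short sketch over a hypothetical directory of executive order text files (the directory and file layout are assumptions):

```python
from pathlib import Path

# Hypothetical layout: one executive order per .txt file in ./executive_orders
for path in sorted(Path("executive_orders").glob("*.txt")):
    n_words = len(path.read_text(encoding="utf-8").split())
    print(f"{path.name}: {n_words} words")  # longer text = less agent discretion
```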

As a trial run, I examined Executive Order 13636, “Improving Critical Infrastructure Cybersecurity,” which Obama issued in 2013. This text was compared, using the open source plagiarism detection software WCopyFind[7], to the computer crime laws collected for the 50 United States and 54 additional countries. Surprisingly, no evidence of policy transfer was found, which suggests that the executive order’s language is novel to computer crime policy. This also raises concerns about the extent to which such documents reflect the voice of the President or of something issued from principal to agent. More recently, President George W. Bush’s email was hacked, revealing texts in the President’s own voice as well as information about President George H. W. Bush, according to thesmokinggun.com (2013). A corpus of presidential writing could be very valuable in identifying documents the President actually authored, but comes with unfortunate ethical concerns.
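WCopyFind’s core operation, finding runs of identical words shared by two documents, can be approximated in a few lines. The sketch below is not WCopyFind’s actual algorithm: the file names are hypothetical and the six-word window is an assumption roughly matching its default “shortest phrase to match.”

```python
import re

def word_ngrams(text, n=6):
    """Return the set of lowercase word n-grams in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

# Hypothetical plain-text files holding the documents under comparison.
order_text = open("eo_13636.txt", encoding="utf-8").read()
statute_text = open("state_statute.txt", encoding="utf-8").read()

shared = word_ngrams(order_text) & word_ngrams(statute_text)
print(len(shared), "shared six-word phrases")  # zero suggests novel language
```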

Classified

Classified documents present another potential data source for presidential studies. President Clinton issued Executive Order 12958 to increase government transparency by declassifying over 800 million pages of text (Atallah 2001). With present technologies, the process of redacting, or downgrading, information on this scale is complicated, exceedingly expensive and time consuming. Modifications made by George W. Bush in Executive Order 13292 extended presidential powers over classified documents to the Vice President.

The task of “automatic” declassification of the existing backlog during the Obama presidency was simply too great and resulted in Obama’s EO 13526 (Atallah 2001). This recursive example describes a rolling 25-year window of declassification which still predates most of the information revolution, circa Bush (I). However, public papers are available beginning with Reagan, and this existing body of text is very amenable to analysis because the conversion from analogue to digital should already have occurred. Experience in electronic document classification could have direct applications for political science graduates in the public sector. This is a valid concern for PhD programs, whose return on investment over Master’s programs has greatly diminished: to as little as 2% more on average, and 0% for the social sciences, in the United Kingdom (Casey 2009, 221).

William Howell created a database of 83 court cases involving congressional bills issued in response to executive orders from 1945 to 1998 (Howell 2003, xvii; for the complete list, 198). David Mayhew identifies 267 “important enactments” between 1947 and 1990 to study gridlock (Jones 2005, 241); Charles Jones selects 28 of these laws (Jones 249). Instead of combing through thousands of cases, NLP could be used to create a training dataset from those 83 cases and apply it to bills from 1998 to the present. The analysis of executive orders could yield valuable insights for the field, and training sets from existing studies could be used to compare the internal validity of similar studies. A Stanford study has pioneered the use of text mining as a proxy to measure agency jurisdictional activity in ocean waters (Ekstrom et al. 2009). While dissimilar on its face, this is not unlike the document classification task just proposed. Political science has been slow to adopt these emergent technologies.
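A hedged sketch of that proposed pipeline, using NLTK’s naive Bayes classifier, follows. The two toy “bills,” their labels and the keyword features are invented purely to show the shape of a training set built from coded cases like Howell’s; a real study would engineer its features far more carefully.

```python
import nltk

def features(text):
    """Binary presence features for a few indicative terms (illustrative only)."""
    words = set(text.lower().split())
    return {w: (w in words) for w in ["veto", "order", "executive", "appropriations"]}

# Toy stand-ins for Howell's coded cases: (bill text, label).
labeled_bills = [
    ("a bill to overturn the executive order on steel seizure", "response-to-EO"),
    ("a bill making appropriations for the department of the interior", "other"),
]

train_set = [(features(text), label) for text, label in labeled_bills]
classifier = nltk.NaiveBayesClassifier.train(train_set)

# Classify a hypothetical post-1998 bill, then inspect the decisive features.
print(classifier.classify(features("a joint resolution disapproving the executive order")))
classifier.show_most_informative_features(3)
```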

The open source NLP package used for this project is the Natural Language Toolkit (NLTK), a Python library that includes tools for document classification and clustering. The toolkit happens to include a corpus of inaugural addresses, which may speak to its intended audience. This corpus will be used to establish that the software suite is functioning properly and to test the feasibility of verifying empirical claims about the President.

To facilitate installation and avoid maintaining a separate machine, Oracle VirtualBox virtual machine software was used to emulate a UNIX computer for the Python interpreter. The dispersion plot (figure 1) was created from the words “America” and “terror” within text4, the corpus of presidential inaugural addresses, by following the example given in the NLTK text. This toy lexical dispersion plot demonstrates some of the potential of this package environment: it establishes the frequency distributions of words and locates them spatially in relation to their position in the text and in time, with older texts from the beginning of the file represented toward the left of the output. Metrics such as frequency and dispersion are necessary to determine the most important themes of a text. These dimensions can be used to generate signature vectors which allow the computer to discriminate a training sample from a work sample.
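Figure 1 follows the NLTK book’s own opening example; once the book corpora are downloaded (nltk.download('book')) and matplotlib is installed, it reduces to two lines:

```python
from nltk.book import text4  # text4 is the Inaugural Address Corpus

# Plot where "America" and "terror" occur across two centuries of addresses.
text4.dispersion_plot(["America", "terror"])
```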

In 1964, Mosteller and Wallace were able to determine the authorship of the 12 unattributed Federalist Papers using lexical analysis. Fifty years later, the Federalist Papers example is a comparatively trivial problem and is available as a tutorial for SAS Enterprise. Other commercially available text analysis packages, such as Linguistic Inquiry and Word Count, are intended to identify the personality traits of authors (Pennebaker, Booth and Francis 2007). Studies in Romania have utilized similar tools (AnaDiP-2011) to identify personality traits in presidential television campaigns (Gîfu and Cristea[8] 2011).
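The Mosteller and Wallace approach rests on the rates of common function words, which authors use without conscious control. A sketch of that feature extraction follows; the word list and file name are illustrative assumptions, not their actual discriminators.

```python
import re
from collections import Counter

FUNCTION_WORDS = ["upon", "while", "whilst", "by", "to", "also"]  # illustrative list

def rates_per_thousand(text):
    """Rate of each function word per 1,000 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {w: 1000 * counts[w] / len(words) for w in FUNCTION_WORDS}

# Compare a disputed paper's profile against known Hamilton and Madison samples.
with open("federalist_51.txt", encoding="utf-8") as f:  # hypothetical file
    print(rates_per_thousand(f.read()))
```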

NLTK-generated gibberish from the corpus of inaugural addresses:

Fellow – Citizens of the danger of depreciation in the wars between other powers , designates the objects and the constant effort of the law as a necessary adjunct . The object sought was not then , nor should the public exigencies require a recurrence to them . Already something has been recommended to the upbraidings of all men , let us clearly understand the wants and be found in the divine oracle which declares that ” in favor of honest political belief . In the swift rush of great emergency in the enlarged views , the mere material value

NLTK, in contrast to commercial packages such as SAS, is “free as in beer.” To demonstrate the linguistic prowess of natural language processing, “text4.generate()” yielded the kind of political gibberish that might be heard during a filibuster. It is almost intelligible, but more importantly it demonstrates the package’s ability to conform to the syntax of sentence structure learned from existing text. This means it was able to model the sequence of words in the original text before recombining them; punctuation is treated in the same way as a word. The semantics of the paragraph as a whole are still wanting and have not yet arrived at Orwell’s Minitrue novel-writing machines.
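For completeness, the gibberish above came from a single call. Note that text4.generate() printed n-gram-based pseudo-text in the NLTK 2.x releases current at this writing; it was removed in NLTK 3.0 and later reinstated with different behavior, so output varies by version.

```python
from nltk.book import text4

# Emits pseudo-text from a language model of the inaugural corpus;
# behavior varies by NLTK version (removed in 3.0, reinstated later).
text4.generate()
```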


Conclusion

Regrettably, this paper was not able to implement computer-based visualizations for document clustering, nor to demonstrate any improvement upon Chris Wilson’s existing Senate social network visualization toward a closer implementation of Krehbiel’s pivotal politics theory of lawmaking. It was, however, able to explore and implement component features of both and to demonstrate feasible means for researchers in the social sciences to do so. The technologies which underpin network analysis and document clustering are not tangential; both seek to quantify relationships between nodes, or vertices. Krehbiel’s theory is sometimes criticized for being too one-dimensional, yet this is a valuable feature of a reductionist explanation of a complex system: as a framework, it allows the user of the theory to arrange the members of the House and Senate and the President in the mind’s eye, or on paper, amid this intellectual scaffolding. One of the advantages of computers for this application is the ability to remember the minutiae of many dimensions and to calculate the abstraction layer at the final moment of use. The added complexity does not detract from the parsimony needed for explanation; visualizations provide more easily grasped insights, with fewer oversights.

An unexpected virtue of this open source project was the level of interaction with key participants. Dr. Perer, Chris Wilson and Chris Anderson, all innovators in their fields and cited below, took time to offer direction for this project, and I thank them very much. John Line, a private contractor and developer, also gave valuable help toward my understanding of this project; thank you. This author is accustomed to being on the receiving end of requests for technical support, which made it difficult to ask for help. This same spirit of charitable assistance abounds in increasingly thorough step-by-step guides for the use and installation of UNIX packages, designed to get both novice and advanced users working quickly.

This paper concludes that open research technologies are sufficiently mature and accessible for social scientists, particularly those researchers with more time than money. The technologies are free and do not impose unreasonable expenses on students and researchers. In contrast to expensive commercial packages, these tools empower social scientists to do research, such as the authorship analysis of the Federalist Papers, which only a generation ago required the backing of elite institutions. In light of present austerity measures and a focus on Science, Technology, Engineering and Math largely not met by political science departments, these methodologies may provide financial security to educators competing for scarce funds, and the same technological literacies will be valuable to students seeking employment. Open source technologies can serve to fill the gap between a democratic state’s expectation of a return on its educational investment and the struggle for relevance facing political scientists today.


References

Anderson, Chris. 2008. “The end of theory.” Wired Magazine 16.

Atallah, M. J., McDonough, C. J., Raskin, V., & Nirenburg, S. 2001. Natural language processing for information assurance and security: an overview and implementations. In Proceedings of the 2000 workshop on New security paradigms (pp. 51-65). ACM.

Barber, J. D. (1992). The presidential character (4th ed., pp. 141-160). Englewood Cliffs, NJ: Prentice-Hall.

Berry, C. R., Burden, B. C., & Howell, W. G. 2010. The president and the distribution of federal spending. American Political Science Review, 104(04), 783-799.

Bird, Steven, Ewan Klein, and Edward Loper 2009. “Natural Language Processing with Python Analyzing Text with the Natural Language Toolkit.”  http://nltk.org/book/ch01.html

Bollier, David, and Charles M. Firestone. The promise and peril of big data. Aspen Institute, Communications and Society Program, 2010.

Bond, Jon R. and Richard Fleisher. 1980. “The Limits of Presidential Popularity as a Source of Influence in the U.S. House.” Legislative Studies Quarterly 5.1: 69 – 78.

Casey, Bernard H. 2009. “The economic contribution of PhDs.” Journal of Higher Education Policy and Management 31.3 : 219-227.

Clark, Jennifer Hayes, Stacey Pelika and Elizabeth Rigby. 2010. Identifying Central Actors: A Network Analysis of the 2009-2010 Health Reform Debate. The Selected Works of Jennifer Hayes Clark Available at: http://works.bepress.com/jennifer_clark/11

Duhigg, Charles. 2012. “How Companies Learn Your Secrets.” New York Times. http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html

Evans, J. A., and J. G. Foster. “Metaknowledge.” Science (New York, NY) 331, no. 6018 (2011): 721.

Ekstrom, Julia A., Lau, Gloria T., Cheng, Jack C.P., Spiteri, Daniel J., and Law, Kincho H.. 2009. “Gauging Agency Involvement in Environmental Management Using Text Analysis of Laws and Regulations.”  I/S: A Journal of Law and Policy for the Information Society.

Flyvbjerg, Bent. 2006. “Five Misunderstandings about Case Study Research.” Qualitative Inquiry 12(2): 219-245.

Gentry, Ryan, et al. 2009. “Searching for better flu surveillance? A brief communication arising from Ginsberg et al. Nature 457, 1012-1014 (2009).” Nature 457 1012-1014.

Gerring, John. 2004. “What is a Case Study? What is It Good For?” American Political Science Review. 98(2):341-54.

Gerring, John. 2010. “Causal Mechanisms: Yes, But. . .” Comparative Political Studies. http://people.bu.edu/jgerring/documents/CausalMechanisms.pdf

Gîfu, Daniela and Dan Cristea. 2011. “Computational Techniques in Political Language Processing: AnaDiP-2011.” In James J. Park, Laurence T. Yang and Changhoon Lee (eds.), Future Information Technology, Communications in Computer and Information Science. Berlin Heidelberg: Springer. http://dx.doi.org/10.1007/978-3-642-22309-9_23

Hawking, Stephen. W.. 1999. “Does God play Dice?” http://www.hawking.org.uk/does-god-play-dice.html

Hedge, D. and R. J. Johnson. 2002. “The Plot That Failed.” Journal of Public Administration Research and Theory 12(3): 333-351.

Hickey, P. 2011a. “Beyond Pivotal Politics: An Analysis of Vote Switching on Veto Override Votes.” In APSA 2011 Annual Meeting Paper.

Hickey, P. 2011b. “Beyond Pivotal Politics: Constituency, Electoral Vulnerability and Challenged Vetoes”

Horta, Hugo, Francisco M. Veloso, and Rócio Grediaga. “Navel gazing: Academic inbreeding and scientific productivity.” Management Science 56.3 (2010): 414-429.

Iyengar, Satish & Joel Greenhouse. 1988. Selection Models and the File Drawer Problem.  Statistical Science, Vol. 3, No. 1, 109-135.

Dickinson, M.J. and M.J. Lebo. 2007. “Reexamining the Growth of the Institutional Presidency, 1940-2000.” Journal of Politics 69: 206-219.

King, Gary, Robert O. Keohane, and Sidney Verba. 1994. Designing Social Inquiry: Scientific Inference in Qualitative Research. Princeton: Princeton University Press.

Krehbiel, Keith and Douglas Rivers. 1988. “The Analysis of Committee Power: An Application to Senate Voting on the Minimum Wage.” American Journal of Political Science 32(4): 1151-1174. http://www.jstor.org/stable/2111204

Lehtinen, Aki, and Jaakko Kuorikoski. “Unrealistic assumptions in rational choice theory.” Philosophy of the Social Sciences 37.2 (2007): 115-138.

Mahoney, James and Gary Goertz. 2006. “A Tale of Two Cultures: Contrasting Quantitative and Qualitative Research.” Political Analysis 14(2): 227-249.

Mosteller, F. and D. L. Wallace. 1964. Inference and Disputed Authorship: The Federalist. Series in Behavioral Science: Quantitative Methods Edition. Massachusetts: Addison-Wesley.

Osiński, Stanisław, and Dawid Weiss. 2005. “Carrot2: Design of a flexible and efficient web information retrieval framework.” Advances in Web Intelligence. Springer Berlin Heidelberg, 2005. 439-444.

Pennebaker, James W., Booth, Roger J., and Francis, Martha E.. 2007. Operator’s Manual. Linguistic Inquiry and Word Count LIWC2007. http://homepage.psy.utexas.edu/homepage/faculty/pennebaker/reprints/LIWC2007_OperatorManual.pdf

Perer, Adam. 2010. Finding Beautiful Insights in the Chaos of Social Network Visualizations. In Beautiful Visualization. O’Reilly Press. http://perer.org/papers/pererBV_ch10.pdf

Perer, Adam and Ben Shneiderman. 2006. “Balancing Systematic and Flexible Exploration of Social Networks.” IEEE Transactions on Visualization and Computer Graphics (InfoVis 2006) 12(5): 693-700. http://hcil2.cs.umd.edu/trs/2007-26/2007-26.pdf

Prensky, Marc. “Digital natives, digital immigrants part 1.” On the horizon 9.5 (2001): 1-6.

Ringquist, Evan J., Worsham, Jeff and Eisner, Marc Allen. 2003.  “Salience, Complexity, and the Legislative Direction of Regulatory Bureaucracies” J Public Administration Research Theory. (2003) 13(2): 141-164 doi:10.1093/jpart/mug013

Rosenthal, R. 1979.  The File Drawer Problem and Tolerance for Null Results. Psychological Bulletin, 86, 638-641.

Scheinkman, Andrei and Blumenthal, Mark. 2012. HuffPost Pollster API Enables Open Access to Polling Data. http://www.huffingtonpost.com/2012/07/02/polling-data-api-pollster_n_1643556.html

Shneiderman, Ben. 2008. “Science 2.0.” http://www.cs.umd.edu/hcil/science20/Science%202%200-AAAS-3-7-2008.pdf

Thesmokinggun.com. 2013. “Audacious Hack Exposes Bush Family Pix, E-Mail: Hacker breached AOL account of ex-president’s kin.”  http://www.thesmokinggun.com/documents/bush-family-hacked-589132

Wastell, David G., Tom McMaster, and Peter Kawalek. 2007. “The rise of the phoenix: methodological innovation as a discourse of renewal.” Journal of Information Technology 22(1): 59-68.

Wendt, Alexander. 1998. “On Constitution and Causation in International Relations.” Review of International Studies 24: 101-118.

Wilson, Chris. 2013. Vote Studies. Github.  https://github.com/wilson428/vote_studies

Wilson, Chris. 2013. Congress-Legislators. Github. https://github.com/unitedstates/congress-legislators

Wilson, Chris. 2013. The Senate Social Network. Yahoo! News. http://news.yahoo.com/senate-social-network-diagram-mcconnell-mean-girls-000513361.html



[1] Perer now works with IBM Research’s Healthcare Analytics Team.

[2] However, data surrounding the President does not present such “Big Data” as to negate theory.

[3] Countless papers encountered during this writing in Education, Computer Science, Information Technology, and Public Health among others proffered their findings explicitly for the benefit of “policy makers” and “analysts,” a population ostensibly trained and educated by political science departments.

[4] With help from developer John Line, MBA.

[5] Clark et al. use Perl (2010, 12). Other important dynamic scripting languages include PHP, Ruby and JavaScript.

[6] Also referred to as computational linguistics

[7] WCopyFind Copyright 1997-2011 © Louis A. Bloomfield, University of Virginia, All Rights Reserved

[8] Even a cursory text analysis of their paper suggests that English is not the authors’ first language; for example, the wildcard character ‘*’ “plays the role of the universal jolly-joker” (189).