Ben Mazzotta's Weblog

Ben Mazzotta is a postdoc at the Center for Emerging Market Enterprises (CEME).

Posts Tagged ‘open source’

R starter resources

I’m hardly the first person you would want to talk to about learning statistics in R. But if you’re bent on teaching yourself R, and you’ve ended up at my blog, here are some resources I found useful. (No opinions here about whether R is better or worse than Excel, Minitab, Matlab, Octave, SPSS, Stata, SAS, or others.)

R Project is the mothership.

RStudio is an IDE for R that provides a better GUI for some basic tasks. It offers most of what you’d expect from a modern IDE: syntax highlighting, GUI commands for loading and saving data and setting the working directory, and separate panes for help files.

The UCLA tutorials are a well-written introduction to basic data entry, functions, and graphics in R. The same site has similar tutorials for Stata and other packages.

Quick-R is a website and a book written by a statistician for people switching to R from SPSS and Stata. The website is excellent and concise, covering all of the basics: data entry, functions, plots, and how to think about all of the above.

The R-help mailing list and its archives are a way to ask questions of experienced users. You’ll get excellent help here, but it’s important to respect the etiquette. Basically, (1) read the package manual, (2) work up a minimal example that illustrates your question, and (3) be extremely precise about the data you have and the data you want, rather than only describing the way you’re trying to solve the problem. This will become clearer if you read a few discussions in the archives.

StackExchange is a glorified bulletin board for programmers exchanging help and (frequently great) advice. Search the archives before posting a new question: the guys who hang out here hate duplicate postings. But it’s easier to navigate than the R-help archives.

Written by Ben Mazzotta

June 8, 2012 at 1:29 pm

Posted in statistics, technology

Tagged with , ,

World Bank opens access to WDI

The World Bank has opened access to a flagship dataset that had been mostly closed. Until recently, only a fraction of the thousand-plus data series that make up the World Development Indicators (WDI) were available free of charge to users in advanced economies. As of April 10, 2010, the complete dataset is freely available.

Written by Ben Mazzotta

June 11, 2010 at 4:06 pm

Fuzzy Thinking on African Botnets

I call “bull.” African botnets are not WMD, and the solution to African botnets is not to prosecute the lucky few who have computers there. Franz-Stefan Gady is completely out of touch with the realities of IT in Africa. The last thing African governments need is to shunt scarce resources into prosecuting cyber criminals, particularly within their own borders. Please do something more useful with whatever resources you have: support export industries, build infrastructure, build a call center or an export processing zone, make jobs, and provide education and health care.

Honestly. Beefed up law enforcement? Where does Gady think most infections in Africa originate? Why would he presume that the botnets are home-grown?

Governments should find ways to make legitimate software available at prices users can afford. That means not taxing software imports, encouraging the use of free and open source software, and ensuring broadband access. Yes, more bandwidth, not less, is crucial to safer computing. Bandwidth gives end users access to security updates and current virus databases, which are prohibitively difficult to download over slow connections.

Written by Ben Mazzotta

March 25, 2010 at 7:16 am

Equations in your dissertation

What do you use to edit equations for your dissertation? OpenOffice has an equation editor plugin that takes LaTeX input. You enter an equation in LaTeX, then choose the resolution and file format of the graphic that gets inserted into your paper. Fantastic! Even better, its name is OOoLatex. Who can resist enjoying that name?
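
If you haven’t written LaTeX before, the input is ordinary plain-text markup that the plugin renders as an image. As an illustration (a generic example, not taken from the plugin’s documentation), the formula for a sample mean could be typed like this; the exact delimiters or mode settings the plugin expects may differ:

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```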

What is the current MS Word solution to this problem? I’d be interested to know how others manage.

UPDATE (3/24/10): Please check out OpenOffice Math, which makes the LaTeX plugin largely obsolete. Most users will find everything they need there, from calculus to Greek letters to set operations to summations.

Written by Ben Mazzotta

November 19, 2009 at 3:19 pm

Posted in technology

Tagged with , ,

Coding Qualitative Data: Web Solution

Professor Stuart Shulman of the University of Massachusetts (formerly of the University of Pittsburgh) designed a web server that provides qualitative data analysis (QDA) of social science datasets over the Web. The project is called QDAP, and it is currently housed at UMass and also at Pitt.

Bravo! Free, multi-user qualitative data analysis for anyone with a web browser. The user agreement includes clearly stated privacy disclosures for the data warehouse, and there is a tutorial for new users.

Thank you, Dr. Shulman.

From the About Us page:

The original QDAP lab was founded in the fall of 2005 by Dr. Stuart Shulman at the University of Pittsburgh. QDAP-UMass, founded in September of 2008 when Dr. Shulman moved to the Department of Political Science at UMass Amherst, trains and employs personnel able to code text from a wide variety of sources. Original material for content analysis might include in-depth interviews, open-ended survey answers, field notes, transcripts from focus groups or Web logs (blogs), e-mails, Web site content, results from database searches (such as LexisNexis™), congressional testimony or other historical texts, and a host of other unstructured but digitized text data sets. QDAP-UMass employs both UMass Amherst and University of Pittsburgh students, as well as professional staff trained in using ATLAS.ti (www.atlasti.com) as well as the Coding Analysis Toolkit, invented by Dr. Shulman. QDAP-UMass will continue to develop and make available online tools to improve the accuracy, reliability, and validity of coding projects.

Written by Ben Mazzotta

November 17, 2009 at 7:22 pm

Skype sound configuration under Ubuntu 9.10

Disclaimer: The settings under Ubuntu 9.10 are markedly different from those in my earlier post. I’ll try to post an update here with new instructions. The naming conventions in the sound mixer of Ubuntu 9.10 (Karmic Koala) appear to be far more straightforward.

Manual gain settings still worked better than the defaults.

–UPDATE

Written by Ben Mazzotta

November 10, 2009 at 8:11 am

Posted in politics

Tagged with ,

Transcriber for DIY interview transcription

Some day we’ll all have grants big enough to outsource our transcription needs.

I have to say I was pleased with the performance of the free and open source Transcriber software. There is no need for new hardware (read: foot pedals) or mouse clicks while transcribing. The software loads an audio clip and provides simple keystrokes for all major functions:

  • dividing the audio track into chunks of text,
  • marking the points where the speaker changes, and
  • identifying the new speaker.

James Drisko’s excellent site at Smith College gives a fantastic overview of the choices you’ll make regarding software, solutions, and methodology.

Honorable mention: F4 transcription software.

Written by Ben Mazzotta

August 27, 2009 at 8:03 am

Coding Qualitative Data

A friend of mine recently pointed me towards MAXQDA for coding and parsing qualitative research. Too bad I just wrote a post on how garden-variety relational databases could be hornswoggled for the task. I was so proud of my handwritten beta, too….

A couple of quick web searches turned up NVivo and XSight by QSR, QDA Miner by Provalis, and Atlas.ti. TAMS for Mac OS X may be the most honestly titled: Text Analysis Markup System.

And sure enough, someone has been on the free and open source (FOSS) track: Weft QDA, Dexter, and Transana.

  • UPDATE: The U.S. Centers for Disease Control and Prevention (CDC) publishes AnSWR at no cost.

And a review site or two for multi-methods CAQDAS research tools. Clearly I have some reading to do.

  • UPDATE: There is a multitude of review sites, often hosted by university social science departments (e.g., sociology, ethnography, psychology). There are too many to list here, and I’m not sure how to categorize them.

Please comment if you have worked with these packages and can recommend a way of organizing them by functionality and quality. There does not seem to be a single standard for what the packages ought to do, or for how to do it well.

Written by Ben Mazzotta

August 10, 2009 at 8:00 am

Skype sound configuration in Linux

As my readers know, I have recently been looking at Skype configuration for recording interviews. [Ed.: What readers?] Skype on Linux has the great advantage of a high-quality, free call recorder, Skype Call Recorder, of uncertain provenance. (Read: install at your own risk. The website doesn’t say much about the authors, and it’s not based in the US.) Once installed, this package provides crisp stereo recordings of your Skype calls in both MP3 and WAV formats.

For some reason, ALSA on Ubuntu Linux did a relatively poor job of configuring sound automatically for my headset. If you’re using Skype on Linux and experiencing any of the following symptoms, read on for solutions.

Symptoms

  1. Skype test calls are terminated due to “Problem with Audio Capture.”
  2. Skype test calls fail to pick up your microphone’s audio.
  3. Skype test calls work fine for the laptop’s integrated mic, but won’t pick up the headset/mic you plugged in.

Written by Ben Mazzotta

August 9, 2009 at 8:00 am

Posted in technology

Tagged with ,

Tracking research interactions

How much information should the researcher keep about each site? Each interview? The answer, of course, is “all of it.” This can be an enormously time-consuming task, depending on the richness of the information the interviewer needs to collect about the site, the subject, and the interview instrument to be used.

Relational databases are purpose-built for this sort of task. In a relational database, the user enters all the relevant information about each entity once, and only once. Whenever it is needed later, a database query pulls together the relevant pieces of information from as many tables as the task at hand requires. There are many commercial vendors (Access, FileMaker, SAS, Oracle), but some options are free and open source (MySQL, SugarCRM), and some do not require years of study to use competently (OpenOffice).

To reiterate: you don’t need any money, and you don’t need a computer science degree, to track your interviews in a relational database, and doing so can save you a ton of time.
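
To make the “enter it once” idea concrete, here is a minimal sketch in Python using SQLite (the standard-library sqlite3 module), a stand-in chosen for this example rather than one of the packages named above. Every table, column, and record is a hypothetical placeholder. Each site and each subject occupies exactly one row; the report query reassembles the full picture with joins, so a correction made once to a site shows up in every interview linked to it.

```python
import sqlite3

# Hypothetical schema for illustration: each site and each subject is
# stored once; interviews point to subjects by id instead of repeating details.
SCHEMA = """
CREATE TABLE site (
    site_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    city    TEXT
);
CREATE TABLE subject (
    subject_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    site_id    INTEGER NOT NULL REFERENCES site(site_id)
);
CREATE TABLE interview (
    interview_id INTEGER PRIMARY KEY,
    subject_id   INTEGER NOT NULL REFERENCES subject(subject_id),
    held_on      TEXT NOT NULL,  -- ISO date string, so text order = date order
    instrument   TEXT NOT NULL   -- which interview protocol was used
);
"""

# Joins the three tables to rebuild the full record for each interview.
REPORT = """
SELECT i.held_on, s.name, st.name, st.city, i.instrument
FROM interview AS i
JOIN subject AS s  ON s.subject_id = i.subject_id
JOIN site    AS st ON st.site_id   = s.site_id
ORDER BY i.held_on
"""


def interview_report(conn):
    """Return one tuple per interview: (date, subject, site, city, instrument)."""
    return conn.execute(REPORT).fetchall()


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript(SCHEMA)

# Placeholder records, not real data.
conn.executemany("INSERT INTO site VALUES (?, ?, ?)",
                 [(1, "Site A", "Town X"), (2, "Site B", "Town Y")])
conn.executemany("INSERT INTO subject VALUES (?, ?, ?)",
                 [(1, "Respondent 1", 1), (2, "Respondent 2", 2)])
conn.executemany("INSERT INTO interview VALUES (?, ?, ?, ?)",
                 [(1, 1, "2009-08-01", "Protocol v1"),
                  (2, 1, "2009-08-15", "Protocol v2"),
                  (3, 2, "2009-08-03", "Protocol v1")])

before = interview_report(conn)

# Correcting a site's details is a single update in a single place; every
# interview with a subject at that site picks up the change on the next query.
conn.execute("UPDATE site SET city = ? WHERE site_id = ?", ("Town Z", 1))
after = interview_report(conn)

for row in after:
    print(row)
```

The same layout should carry over to the other databases mentioned above: one table per kind of entity, with interviews pointing to subjects by id rather than repeating their details.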

Written by Ben Mazzotta

August 8, 2009 at 8:00 am