Tag Archives: statistics

Cheat Sheets for Stata

Cheat sheets for programming languages were commonplace before the internet became widely available with powerful search engines. Nowadays, I believe that cheat sheets are still very useful because search engines become simply too powerful and provide more answers than required by the question. Some of the most popular cheat sheets for Stata were prepared by Tim Essam and Laura Hughes from the US Agency for International Trade and Development. Follow them on twitter: @StataRGIS and @flaneuseks. Here are some of their cheat sheets:


Commands for Data Analysis

Programming

Commands and functions for data processing

Commands and functions for data transformation

Free book about basic statistics with examples in R

An interesting book for those that are starting to learn basic statistics for data analysis is Practical Statistics for Data Scientists by Peter Bruce and Andrew Bruce. It can be downloaded for free here. This book covers the basics of data exploratory analysis and frequentist statistical theory. Additionally, its final chapters cover classification, statistical machine learning, and unsupervised learning. I do like these last chapters as they are carefully written and easy to understand. The examples using R are very useful. In sum, this is a book worth reading for beginners. The last chapters can also be very helpful for those readers with more experience in data analysys.

A comprehensive road map to learn R

Learning a new software or even a new programming language is always an interesting journey. Most of the times the tutorials and books we find are never exactly what we need. A useful resource that I found is the Big Book of R. It is one of the most comprehensive repositories of tutorials and general information about R. You can find suggestions according to your needs or even according to your background, for instance for Journalism, Social Sciences, or Life Sciences.  It has very good sections on Machine Learning and on R programming.

An Introduction to Statistical Learning with Applications in Python

I came across this very interesting Github repository by Qiuping X., in which she posted the codes she prepared in Python for the book “An Introduction to Statistical Learning with Applications in R”  by Gareth James, Daniela Witten,  Trevor Hastie, and Robert Tibshirani.  This is very useful for those that are learning Python and certainly facilitates the migration from R to Python too.

Statistical Learning using R

I recently came across this book titled “An Introduction to Statistical Learning, with Applications in R“.

It can be downloaded for free at the authors webpage, which also contain the R codes, data sets, errata, slides and videos for Statistical Learning MOOC, and other valuable information.

That said, I think this is a very useful book for those interested in Statistical Learning. It is very accessible to most people, since it does not require a strong mathematical background.

For those interested in gaining a deeper understanding of these topics, I strongly suggest the book “The Elements of Statistical Learning“, which is also available for download at no cost.

Transferring IPEADATA series to Stata

A common issue that arises when converting time series data from IPEADATA to Stata format is dealing appropriately with the time variable. For instance, for monthly series the date format will be YYYY.MM. Stata usually interprets this format as numeric.

Suppose you already downloaded a monthly series from IPEADATA and transferred it to Stata. It is very likely that the date variable (let’s call it date) has been automatically handled as a numeric variable. The first thing to pay attention is that the numeric format disregards zeroes on the right-hand side of the decimal point. This means that October of 1940 is coded as 1940.10 by IPEADATA and interpreted as 1940.1 by Stata. To recover the missing zero, the first step is to convert this variable to string format. This can be done with the string() function.

generate sdate=string(date)

To add back the missing zeroes, we can do the following:

replace sdate=sdate+”0″ if length(sdate)<7

Now, we just need to tell Stata to interpret sdate as a monthly date variable. This can be accomplished with the command numdate. This is not a standard Stata command and needs to be installed in your computer (ssc install numdate).

numdate mo newdate = sdate, pattern(YM)

The above line can be interpreted as create a new date variable named newdate from variable sdate that is in the YYYY.MM format.

The numdate ado file can deal with very flexible date specifications, and its help file is very comprehensive. Two other useful commands are convdate and extrdate. They are used to convert or extract parts of dates from variables that are already in the Stata date format.

A final recommendation is to take a look at Stata documentation on dates that is available at http://www.stata.com/manuals13/ddatetime.pdf.

A few tips for programming in Stata

Stata is a very powerful and useful statistical software. Just like any sophisticated tool, it takes time to learn about it. And you need to invest some time to master it. Programming is one of those skills that knowing a little bit can be very beneficial. Below you will find four videos. The first video goes over the functionalities of the Stata Program Editor. The second video covers some basics of Stata commands. The third video talks about loops, which are an essential tool for programmers. Finally, the fourth video is about macros, which together with loops are very useful to handle repetitive tasks.

How to use the Stata Program editor:

Basics of Stata:

Quick guide to loops:

More about macros:

Exporting Stata’s correlation tables to a document file

I came across a very useful ado file for Stata named asdoc that facilitates the creation of neat tables.

To install asdoc, just type ssc install asdoc .

Here is an example of exporting a correlation table to a document named table.doc.

sysuse auto

asdoc correl price mpg headroom trunk weight, save(table.doc) dec(3)

 

Note that dec(3) means to export the correlations with 3 decimal places.

Asdoc has tons of other applications. Its help file is very comprehensive. And you can have a glimpse of its capabilities in the following videos:

A nice tutorial for those interested in learning the basics of Python and its applications to Finance

The website called FinaceandPython has a very good tutorial on the basics of Python and its applications to Finance, Statistics, and Economics. This tutorial is organized in lessons that are carefully designed for a step-by-step learning experience. It also has several problem sets that will allow students to practice the concepts developed in the lessons. A key differential of this website are the examples and applications of Python coding to Finance problems. In sum, this is a fine tutorial for those interested in learning Python.

A good tutorial for learning the basics of Python for data analysis

I founds this interesting tutorial for Python. In my opinion, Python is a very simple and intuitive language, and at the same time it is very powerful. This link leads to a straightforward tutorial of Python that focuses on the basic knowledge needed to use Python for data analysis.

Importing data into R

Transferring data from one file format to another is almost as fun as working on concordance tables for the different versions of the HS codes. That said, you can find several packages in R that allow for file conversions. The link below talks about the rio package and provides a very useful summary of its capabilities.
R-project rio package

In the event you have to convert a long list of files, perhaps the following link may be of help:
Convert several txt files to csv using R.