Category Archives: Statistics

A Suggestion for Teaching Econometrics and R Together

Besides choosing a useful textbook, an interesting aspect of teaching Applied Econometrics is the choice of software package. As a student, I had a hard time with textbooks that were not integrated with the software used in the homework problems, because I had to spend a lot of time finding other sources to learn the software. In hindsight, that was very inefficient. I came across a very nice applied econometrics textbook that is well integrated with R and includes data and examples: “Applied Econometrics with R” by Christian Kleiber and Achim Zeileis. It also has a companion website with slides, handouts, and R code that I find extremely useful.

New evidence on WTO membership after the Uruguay Round: an analysis at the sectoral level.

Magnus dos Reis (Unisinos), André Filipe Zago de Azevedo (CNPq), and I have a forthcoming paper at Open Economies Review that examines the effects of WTO membership on trade flows, with a special focus on sectoral trade flows. The full paper is available upon request.

Abstract

The creation of the World Trade Organization in 1995 brought several changes to the world trade system, including more stringent accession commitments, separate agreements for agricultural products and for textiles and garment. This study examines the effects of WTO membership on disaggregated sectoral trade flows and their extensive and intensive margins by means of a gravity model estimated by Poisson pseudo-maximum likelihood. We employ a panel dataset on bilateral imports for agriculture, textile, and manufacturing sectors for the 1995–2017 period. Our estimates suggest that WTO membership has succeeded in expanding trade flows for new members. Nevertheless, this growth occurred asymmetrically between developed and developing countries, and among the different types of products. In the period under review, developing countries benefited most from this WTO-promoted increase in world trade, in stark contrast to the findings of the extant literature for 1950–2000. The largest trade growth occurred in the agriculture sector, which is also at odds with earlier findings of growth in manufacturing products only. Furthermore, our results show that the increase in trade due to WTO liberalization took place exclusively in the extensive margin of trade, most of which also happened in the agricultural sector.

Data repositories

Open-ended questions are usually difficult to answer. One example is where to find data sources for economic indicators. The following list suggests good places to start your quest for data.

Data Sources: A Compendium by the Econbrowser website

Data Sources at the American Economic Association website

Economic Data freely available online by the Economics Network

IMF Data by the International Monetary Fund

Public Use Data Archive by the NBER

OECD Statistics

Good luck in your data hunting!

Free book about basic statistics with examples in R

An interesting book for those who are starting to learn basic statistics for data analysis is Practical Statistics for Data Scientists by Peter Bruce and Andrew Bruce. It can be downloaded for free here. This book covers the basics of exploratory data analysis and frequentist statistical theory. Additionally, its final chapters cover classification, statistical machine learning, and unsupervised learning. I do like these last chapters, as they are carefully written and easy to understand. The examples using R are very useful. In sum, this is a book worth reading for beginners. The last chapters can also be very helpful for readers with more experience in data analysis.

An introductory book on econometrics using R

A major issue with introductory econometrics textbooks is that they are either too theoretical or too practical. While the former do not motivate students, the latter provide training and not an education. It is very difficult to reach a compromise between these two types. Nevertheless, there is a book that strikes a good balance between theory and applications using R. It is titled “Introduction to Econometrics with R” and written by Christoph Hanck, Martin Arnold, Alexander Gerber, and Martin Schmelzer. Most importantly, the examples and applications presented in the book are carefully chosen to illustrate the theoretical aspects being discussed. For those who already know basic econometrics, this book is still useful for learning a lot about R.

A comprehensive road map to learn R

Learning new software or even a new programming language is always an interesting journey. Most of the time, the tutorials and books we find are not exactly what we need. A useful resource that I found is the Big Book of R. It is one of the most comprehensive repositories of tutorials and general information about R. You can find suggestions according to your needs or even according to your background, for instance Journalism, Social Sciences, or Life Sciences. It has very good sections on Machine Learning and on R programming.

Nesting local macros in Stata

Local macros are a very useful feature of Stata. Here is a simple example.

local macro1 "Hello!"
local macro2 = "How are you?"
local macro3 2+2
local macro4 = 2+2
di " `macro1'`macro2' "
di "Here's some math: `macro3' = `macro4' "

A few comments are in order here. Writing `macro1' tells Stata to replace this expression with the contents of the local macro1. The reference opens with a backtick (`) and closes with an apostrophe ('), and there must be no spaces between these delimiters and the macro name. Also, note how macro3 and macro4 lead to different outcomes: macro3 holds the text 2+2, while macro4 holds the number 4. You need to use the equal sign to make Stata evaluate the expression; otherwise it will treat it as a string.

Now, Stata applies the “parentheses rule” when replacing local macros with their contents. That is, it first replaces the innermost local macro, then the second innermost local macro, and so on. Here is an example.

local macro5 "nest"
local nestmacro "Local macros can be nested!"
di " ``macro5'macro' "

Let’s add another layer to the nesting of the previous example and play with it a little bit more:

local macro5 "nest"
local macro6 "macro"
local macro7 5
local nestmacro "Local macros can be nested!"
di " ``macro`macro7''`macro6'' "
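
One place where nesting pays off in practice is looping over a family of numbered macros. The sketch below is only an illustration: the macro names label1, label2, and label3 and their contents are made up for this example. Inside the loop, Stata expands the loop index first and only then looks up the macro whose name results from that expansion.

```stata
* Hypothetical numbered macros, invented for this illustration
local label1 "agriculture"
local label2 "textiles"
local label3 "manufacturing"

forvalues i = 1/3 {
    /* The index i is expanded first, so the outer reference
       resolves to label1, label2, and label3 in turn */
    display "Sector `i': `label`i''"
}
```

On the first pass this should display Sector 1: agriculture, and similarly for the other two macros.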

I guess you have a pretty good idea of how this works now. Good luck in your Stata coding.

An Introduction to Statistical Learning with Applications in Python

I came across this very interesting GitHub repository by Qiuping X., in which she posted the Python code she prepared for the book “An Introduction to Statistical Learning with Applications in R” by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. This is very useful for those who are learning Python, and it certainly eases the migration from R to Python too.

A little bit more about loops and macros in Stata

Loops and lists are important tools for Stata programmers. Although these tools may not be the most computationally efficient technique, they can be of great help during the early stages of code development. By the way, it is always good to keep in mind that a good programming practice is to always keep definitions local to the do-file you are coding whenever possible.

The first example uses a counter. You can paste it into the Do-file Editor and then run it.

local fruits apple banana pear
local counter 0
foreach d of local fruits {
    local counter = `counter' + 1
    display "`counter' - `d'"
}

After running it, the output should be something like this:

1 - apple
2 - banana
3 - pear

Notice that I used the equal sign “=” to build the counter. It is required here: without it, Stata stores the text of the expression instead of evaluating it. Just for fun, see what happens when you replace “local counter = `counter' + 1” with “local counter `counter'+1”.
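
To make the difference concrete, here is a small sketch that runs both versions one after the other. The macro names textcounter and numcounter are made up for this illustration. The second loop also uses the increment shorthand local ++numcounter, which has the same effect as the assignment with the equal sign.

```stata
local fruits apple banana pear

* Without the equal sign: the expression is stored as text
local textcounter 0
foreach d of local fruits {
    local textcounter `textcounter'+1
    display "`textcounter' - `d'"
}

* With the increment shorthand: the macro is evaluated and increased by one
local numcounter 0
foreach d of local fruits {
    local ++numcounter
    display "`numcounter' - `d'"
}
```

The first loop displays 0+1, 0+1+1, and 0+1+1+1 in front of the fruit names, because the macro only accumulates text. The second loop displays 1, 2, and 3.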

Now suppose we have an ordered list of 4-digit produce codes (PLU codes) and we would like to display the PLU code next to each fruit name.

local fruits apple banana pear
local plucode 3009 4011 4414
/* The first step is to find out the number of fruits */
local fruitnumber: word count `fruits'
di "Fruit - PLU code"
/* The second step is to build a loop to pair each fruit with its PLU code */
forvalues j = 1/`fruitnumber' {
    /* The third step is to pick the j-th element of each list to have them paired */
    local pair1: word `j' of `fruits'
    local pair2: word `j' of `plucode'
    display "`pair1' - `pair2'"
}

Output:

. local fruits apple banana pear

. local plucode 3009 4011 4414

. /* The first step is to find out the number of fruits */
. local fruitnumber: word count `fruits'

. di "Fruit - PLU code"
Fruit - PLU code

.
. /* The second step is to build a loop to pair each fruit with its PLU code */
. forvalues j = 1/`fruitnumber' {

/* The third step is to pick the j-th element of each list to have them paired */

local pair1: word `j' of `fruits'
local pair2: word `j' of `plucode'
display "`pair1' - `pair2'"
}

apple - 3009
banana - 4011
pear - 4414
.
end of do-file

So, the “word count” extended function returns the number of elements (words) in the macro, and “word `j' of `fruits'” returns the j-th element of the macro fruits. You can find out more about these extended macro functions in Stata by typing “help extended_fcn”.
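
The same pairing can also be sketched with foreach instead of forvalues, keeping a running position in a counter. This is just a variation on the example above, not a better method. The macro names nfruits, ncodes, and code are made up for this illustration, and the length check at the top is an extra safeguard that the original example does not have.

```stata
local fruits apple banana pear
local plucode 3009 4011 4414

* Safeguard: stop if the two lists do not have the same number of words
local nfruits : word count `fruits'
local ncodes : word count `plucode'
if `nfruits' != `ncodes' {
    display as error "fruits and plucode have different lengths"
    exit 198
}

* Walk through the fruits and pick the PLU code in the same position
local j 0
foreach f of local fruits {
    local ++j
    local code : word `j' of `plucode'
    display "`f' - `code'"
}
```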

Statistical Learning using R

I recently came across this book titled “An Introduction to Statistical Learning, with Applications in R”.

It can be downloaded for free at the authors' webpage, which also contains the R code, data sets, errata, slides and videos for the Statistical Learning MOOC, and other valuable information.

That said, I think this is a very useful book for those interested in Statistical Learning. It is very accessible to most people, since it does not require a strong mathematical background.

For those interested in gaining a deeper understanding of these topics, I strongly suggest the book “The Elements of Statistical Learning”, which is also available for download at no cost.

Stata tip: creating a local containing all (or almost all) variables of the data set

Locals containing a list of variables can be very useful when using Stata. A common need is a local containing all variables of a data set. This local can be created by means of the ds command.

Here is an example using the lifeexp.dta data file.

. webuse lifeexp, clear
(Life expectancy, 1998)

Now, let’s create a local named allvar that will contain all variables of this data set.

. ds
region country popgrowth lexp gnppc safewater

. local allvar `r(varlist)'

. di "`allvar'"
region country popgrowth lexp gnppc safewater

We can see that ds stored the variable list in r(varlist). One interesting variation is the creation of a local containing all variables except region. You need to specify the variables to be excluded right after ds, and add the option not after a comma.

. ds region, not
country popgrowth lexp gnppc safewater

. local othervar `r(varlist)'

. di "`othervar'"
country popgrowth lexp gnppc safewater

The command ds has several other useful applications that will be discussed later in this blog.
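
As one example of those other applications, ds can filter variables by storage type through its has() option. The sketch below assumes the same lifeexp data. The local names numvars and strvars are made up for this illustration.

```stata
webuse lifeexp, clear

* Keep only the numeric variables
ds, has(type numeric)
local numvars `r(varlist)'

* Keep only the string variables
ds, has(type string)
local strvars `r(varlist)'

* Summarize each numeric variable in turn
foreach v of local numvars {
    summarize `v'
}
```

A list built this way is handy whenever a command only makes sense for one type of variable, as summarize does for numeric ones.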

Some resources for using R with spatial data

Spatial data models and visualizations are important tools in the design of public policies. I provide below some links to tutorials about spatial modelling using R that I have found useful.

The first stop should be the Spatial Data Science with R website. It contains a wealth of information organized in a straightforward way. My next suggestion is the An Introduction to Spatial Econometrics in R website. A different version of the previous website is also available at R-Bloggers: An Introduction to Spatial Econometrics in R. There is also a tutorial by Luc Anselin: An Introduction to Spatial Regression Analysis in R.

Finally, there is a video made by econometricsacademy about Spatial Econometrics in R.

Transferring IPEADATA series to Stata

A common issue that arises when converting time series data from IPEADATA to Stata format is dealing appropriately with the time variable. For instance, for monthly series the date format will be YYYY.MM. Stata usually interprets this format as numeric.

Suppose you have already downloaded a monthly series from IPEADATA and transferred it to Stata. It is very likely that the date variable (let’s call it date) has been automatically handled as a numeric variable. The first thing to pay attention to is that the numeric format drops trailing zeroes after the decimal point. This means that October of 1940 is coded as 1940.10 by IPEADATA and interpreted as 1940.1 by Stata. To recover the missing zero, the first step is to convert this variable to string format. This can be done with the string() function.

generate sdate=string(date)

To add back the missing zeroes, we can do the following:

replace sdate = sdate + "0" if length(sdate) < 7

Now, we just need to tell Stata to interpret sdate as a monthly date variable. This can be accomplished with the command numdate. This is not a standard Stata command and needs to be installed on your computer (ssc install numdate).

numdate mo newdate = sdate, pattern(YM)

The above line can be read as follows: create a new monthly date variable named newdate from the variable sdate, which is in the YYYY.MM format.

The numdate ado file can deal with very flexible date specifications, and its help file is very comprehensive. Two other useful commands are convdate and extrdate. They are used to convert or extract parts of dates from variables that are already in the Stata date format.
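
If you prefer not to install anything, the same conversion can be sketched with built-in functions only. This is an alternative route, not part of numdate, and it assumes that date is still the numeric YYYY.MM variable described above. The variable names year, month, and mdate are made up for this illustration.

```stata
* Assumes a numeric variable named date with values such as 1940.1 for October 1940
generate year = floor(date)

* The fractional part times 100 is the month; rounding absorbs floating-point noise
generate month = round((date - year) * 100)

* Stop if any month falls outside the range 1 to 12
assert inrange(month, 1, 12)

* Build the Stata monthly date and give it a monthly display format
generate mdate = ym(year, month)
format mdate %tm
```

With the %tm format, October of 1940 is displayed as 1940m10. Because the month is computed arithmetically, the missing trailing zero never becomes an issue in this version.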

A final recommendation is to take a look at Stata documentation on dates that is available at http://www.stata.com/manuals13/ddatetime.pdf.

A few tips for programming in Stata

Stata is a very powerful and useful statistical package. Just like any sophisticated tool, it takes time to learn, and you need to invest some time to master it. Programming is one of those skills where knowing even a little can be very beneficial. Below you will find four videos. The first video goes over the functionalities of the Stata Program Editor. The second video covers some basics of Stata commands. The third video talks about loops, which are an essential tool for programmers. Finally, the fourth video is about macros, which together with loops are very useful for handling repetitive tasks.

How to use the Stata Program editor:

Basics of Stata:

Quick guide to loops:

More about macros:

Exporting Stata’s correlation tables to a document file

I came across a very useful ado file for Stata named asdoc that facilitates the creation of neat tables.

To install asdoc, just type ssc install asdoc .

Here is an example of exporting a correlation table to a document named table.doc.

sysuse auto

asdoc correl price mpg headroom trunk weight, save(table.doc) dec(3)

Note that dec(3) means to export the correlations with 3 decimal places.

Asdoc has tons of other applications. Its help file is very comprehensive. And you can have a glimpse of its capabilities in the following videos:

A nice tutorial for those interested in learning the basics of Python and its applications to Finance

The website called FinanceandPython has a very good tutorial on the basics of Python and its applications to Finance, Statistics, and Economics. This tutorial is organized in lessons that are carefully designed for a step-by-step learning experience. It also has several problem sets that allow students to practice the concepts developed in the lessons. A key strength of this website is its examples and applications of Python coding to Finance problems. In sum, this is a fine tutorial for those interested in learning Python.