Category Archives: Data Science

New evidence on WTO membership after the Uruguay Round: an analysis at the sectoral level.

Magnus dos Reis (Unisinos), André Filipe Zago de Azevedo (CNPq), and I have a forthcoming paper in Open Economies Review that examines the effects of WTO membership on trade flows, with a special focus on sectoral trade flows. The full paper is available upon request.

Abstract

The creation of the World Trade Organization in 1995 brought several changes to the world trade system, including more stringent accession commitments, separate agreements for agricultural products and for textiles and garment. This study examines the effects of WTO membership on disaggregated sectoral trade flows and their extensive and intensive margins by means of a gravity model estimated by Poisson pseudo-maximum likelihood. We employ a panel dataset on bilateral imports for agriculture, textile, and manufacturing sectors for the 1995–2017 period. Our estimates suggest that WTO membership has succeeded in expanding trade flows for new members. Nevertheless, this growth occurred asymmetrically between developed and developing countries, and among the different types of products. In the period under review, developing countries benefited most from this WTO-promoted increase in world trade, in stark contrast to the findings of the extant literature for 1950–2000. The largest trade growth occurred in the agriculture sector, which is also at odds with earlier findings of growth in manufacturing products only. Furthermore, our results show that the increase in trade due to WTO liberalization took place exclusively in the extensive margin of trade, most of which also happened in the agricultural sector.

 

Data repositories

Open-ended questions, such as where to find data sources for economic indicators, are usually difficult to answer. The following list suggests good places to start your quest for data.

Data Sources: A Compendium by the Econbrowser website

Data Sources at the American Economic Association website

Economic Data freely available online by the Economics Network

IMF Data by the International Monetary Fund

Public Use Data Archive by the NBER

OECD Statistics

Good luck in your data hunting!

Cheat Sheets for Stata

Cheat sheets for programming languages were commonplace before the internet, with its powerful search engines, became widely available. I believe cheat sheets are still very useful today, because search engines have become so powerful that they return far more answers than a question requires. Some of the most popular cheat sheets for Stata were prepared by Tim Essam and Laura Hughes of the US Agency for International Development. Follow them on Twitter: @StataRGIS and @flaneuseks. Here are some of their cheat sheets:


Commands for Data Analysis

Programming

Commands and functions for data processing

Commands and functions for data transformation

Learning about Forecasting (and using R)

As I was designing an introductory course in forecasting for International Business students, I came across a very interesting book available for download on the web. The book is “Forecasting: Principles and Practice” by Rob J Hyndman and George Athanasopoulos. Among its several nice features, I would stress that this book is aimed at (undergraduate and MBA) business students with little formal training in statistics, and it presents plenty of examples using R. The third edition uses the tsibble and fable packages, while the second edition uses the forecast package. I really like chapters 2, 3, and 4, where the authors go over exploratory analysis of time series.

Free book about basic statistics with examples in R

An interesting book for those who are starting to learn basic statistics for data analysis is Practical Statistics for Data Scientists by Peter Bruce and Andrew Bruce. It can be downloaded for free here. This book covers the basics of exploratory data analysis and frequentist statistical theory. Additionally, its final chapters cover classification, statistical machine learning, and unsupervised learning. I do like these last chapters, as they are carefully written and easy to understand. The examples using R are very useful. In sum, this is a book worth reading for beginners. The last chapters can also be very helpful for readers with more experience in data analysis.

Learning the basics of Unix

Teaching computer skills to students has always been a challenge in Economics. The first difficulty is the limited amount of time for them to learn the basics of any software. A second difficulty is when the software is only available for Unix. Recently, I found a very simple yet comprehensive tutorial for the basics of Unix. Most of my students that used it were able to quickly gain some proficiency and write better code, especially when using bash.

An introductory book on econometrics using R

A major issue with introductory econometric textbooks is that they are either too theoretical or too practical. While the former does not motivate students, the latter provides training rather than an education. It is very difficult to strike a compromise between these two types. Nevertheless, there is a book that achieves a good balance between theory and applications using R. It is titled “Introduction to Econometrics with R” and written by Christoph Hanck, Martin Arnold, Alexander Gerber, and Martin Schmelzer. Most importantly, the examples and applications presented in the book are carefully chosen to illustrate the theoretical aspects being discussed. For those who already know basic econometrics, this book is still useful for learning a lot about R.

A comprehensive road map to learn R

Learning new software or even a new programming language is always an interesting journey. Most of the time, the tutorials and books we find are not exactly what we need. A useful resource that I found is the Big Book of R. It is one of the most comprehensive repositories of tutorials and general information about R. You can find suggestions according to your needs or even according to your background, for instance Journalism, Social Sciences, or Life Sciences. It also has very good sections on Machine Learning and on R programming.

Nesting local macros in Stata

Local macros are a very useful feature of Stata. Here is a simple example.

local macro1 "Hello!"

local macro2 = "How are you?"

local macro3 2+2

local macro4 = 2+2

di " `macro1'`macro2' "

di "Here's some math: `macro3' = `macro4' "

 

A few comments are in order here. `macro1' tells Stata to replace this expression with the contents of the local macro1. Make sure to leave no spaces inside the `'. Also, note how macro3 and macro4 lead to different outcomes. You need to use the equal sign to make Stata evaluate the expression; otherwise, it will treat it as a string.

Now, Stata applies the “parentheses rule” when replacing local macros with their contents. That is, it first replaces the innermost local macro, then the second innermost, and so on. Here is an example.

local macro5 "nest"

local nestmacro "Local macros can be nested!"

di " ``macro5'macro' "

Let’s add another layer to the nesting of the previous example and play with it a little bit more:

local macro5 "nest"

local macro6 "macro"

local macro7 5

local nestmacro "Local macros can be nested!"

di " ``macro`macro7''`macro6'' "

I guess you have a pretty good idea of how this works now. Good luck in your Stata coding.


An Introduction to Statistical Learning with Applications in Python

I came across this very interesting Github repository by Qiuping X., in which she posted the codes she prepared in Python for the book “An Introduction to Statistical Learning with Applications in R”  by Gareth James, Daniela Witten,  Trevor Hastie, and Robert Tibshirani.  This is very useful for those that are learning Python and certainly facilitates the migration from R to Python too.

A little bit more about loops and macros in Stata

Loops and lists are important tools for Stata programmers. Although they may not be the most computationally efficient techniques, they can be of great help during the early stages of code development. Keep in mind that it is good programming practice to keep definitions local to the do-file you are coding whenever possible.

The first example is about using a counter. You can paste it into the do-file editor and then run it.

local fruits apple banana pear
local counter 0
foreach d of local fruits {
    local counter = `counter' + 1
    display "`counter' - `d'"
}

After running it, the output should be something like this:

1 - apple
2 - banana
3 - pear

Notice that I used the equal sign "=" to build the counter. In short, you have to use it. Just for fun, see what happens when you replace "local counter = `counter'+1" with "local counter `counter'+1".
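The distinction between evaluating an expression and copying its text has a direct analogy in other languages. Here is a hypothetical Python sketch of the two behaviors, with variable names of my own choosing:

```python
fruits = ["apple", "banana", "pear"]

counter_eval = 0    # arithmetic, like: local counter = `counter'+1
counter_text = "0"  # text substitution, like: local counter `counter'+1

for fruit in fruits:
    counter_eval = counter_eval + 1       # the expression is evaluated
    counter_text = counter_text + "+1"    # the text just keeps growing

print(counter_eval)  # 3
print(counter_text)  # 0+1+1+1
```

Without the equal sign, Stata stores the ever-longer text "0+1+1+1" instead of the number 3, which is exactly what the string version above illustrates.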

Now suppose we have an ordered list of 4-digit produce codes and we would like to display the PLU code next to each fruit name.

local fruits apple banana pear
local plucode 3009 4011 4414
/* The first step is to find out the number of fruits */
local fruitnumber: word count `fruits'
di "Fruit - PLU code"
/* The second step is to build a loop to pair each fruit with its PLU code */
forvalues j = 1/`fruitnumber' {
    /* The third step is to pick the j-th element of each list to have them paired */
    local pair1: word `j' of `fruits'
    local pair2: word `j' of `plucode'
    display "`pair1' - `pair2'"
}

Output:

. local fruits apple banana pear

. local plucode 3009 4011 4414

. /* The first step is to find out the number of fruits */
. local fruitnumber: word count `fruits'

. di "Fruit - PLU code"
Fruit - PLU code

.
. /* The second step is to build a loop to pair each fruit with its PLU code */
. forvalues j = 1/`fruitnumber' {

/* The third step is to pick the j-th element of each list to have them paired */

local pair1: word `j' of `fruits'
local pair2: word `j' of `plucode'
display "`pair1' - `pair2'"
}

apple - 3009
banana - 4011
pear - 4414
.
end of do-file

 

So, the "word count" function returns the number of elements of the macro, and "word `j' of `fruits'" returns the j-th element of the macro fruits. You can find more about these extended macro functions in Stata by typing "help extended_fcn".
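For comparison, the same fruit and PLU-code pairing can be sketched in Python, where the built-in zip plays the role of the word-count loop (the list contents are the same as in the Stata example above):

```python
fruits = ["apple", "banana", "pear"]
plucodes = ["3009", "4011", "4414"]

# Pair the j-th element of each list, as the forvalues loop does in Stata
pairs = [f"{fruit} - {plu}" for fruit, plu in zip(fruits, plucodes)]
for line in pairs:
    print(line)
```

zip stops at the shorter list, so unlike the Stata version there is no need to count the elements first.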

Statistical Learning using R

I recently came across this book titled “An Introduction to Statistical Learning, with Applications in R“.

It can be downloaded for free at the authors' webpage, which also contains the R code, data sets, errata, slides and videos for the Statistical Learning MOOC, and other valuable information.

I think this is a very useful book for those interested in Statistical Learning. It is accessible to most people, since it does not require a strong mathematical background.

For those interested in gaining a deeper understanding of these topics, I strongly suggest the book “The Elements of Statistical Learning“, which is also available for download at no cost.

Stata tip: creating a local containing all (or almost all) variables of the data set

Locals containing a list of variables can be very useful when using Stata. A common need is a local containing all variables of a data set. This local can be created by means of the ds command.

Here is an example using the lifeexp.dta data file.

. webuse lifeexp, clear
(Life expectancy, 1998)

Now, let’s create a local named allvar that will contain all variables of this data set.

. ds
region country popgrowth lexp gnppc safewater

. local allvar `r(varlist)'

. di "`allvar'"
region country popgrowth lexp gnppc safewater

 

We can see that ds stored the variable list in r(varlist). One interesting variation is the creation of a local containing all variables except region. You will need to specify the variables to be excluded right after ds, and add the option not after a comma.

. ds region, not
country popgrowth lexp gnppc safewater

. local othervar `r(varlist)'

. di "`othervar'"
country popgrowth lexp gnppc safewater

The command ds has several other useful applications that will be covered later in this blog.
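As a rough pandas analogy for readers who work outside Stata, the same "all variables" and "all but one" lists can be built from DataFrame.columns. The data frame below is a made-up two-row stand-in for lifeexp.dta, not the actual data:

```python
import pandas as pd

# Toy stand-in for the lifeexp data set (values are illustrative)
df = pd.DataFrame({
    "region": [1, 2],
    "country": ["Albania", "Argentina"],
    "popgrowth": [1.2, 1.3],
    "lexp": [72, 73],
    "gnppc": [810, 8970],
    "safewater": [76, 71],
})

allvar = list(df.columns)                             # like: ds followed by r(varlist)
othervar = [c for c in df.columns if c != "region"]   # like: ds region, not
print(allvar)
print(othervar)
```

The list comprehension is the pandas counterpart of the "not" option: it keeps every column except the ones you name.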

 

Some resources for using R with spatial data

Spatial data models and visualizations are important tools in the design of public policies. I provide below some links to tutorials about spatial modelling using R that I have found useful.

The first stop should be at the Spatial Data Science with R website. It contains a wealth of information organized in a straightforward way. My next suggestion is the An Introduction to Spatial Econometrics in R website. A different version of the previous website is also available at R-Bloggers: An Introduction to Spatial Econometrics in R. There is also a tutorial by Luc Anselin: An Introduction to Spatial Regression Analysis in R.

Finally, there is a video made by econometricsacademy about Spatial Econometrics in R.

Transferring IPEADATA series to Stata

A common issue that arises when converting time series data from IPEADATA to Stata format is dealing appropriately with the time variable. For instance, for monthly series the date format will be YYYY.MM. Stata usually interprets this format as numeric.

Suppose you have already downloaded a monthly series from IPEADATA and transferred it to Stata. It is very likely that the date variable (let's call it date) has been automatically treated as numeric. The first thing to pay attention to is that the numeric format drops trailing zeroes after the decimal point. This means that October 1940, coded as 1940.10 by IPEADATA, is read as 1940.1 by Stata. To recover the missing zero, the first step is to convert this variable to string format. This can be done with the string() function.

generate sdate=string(date)

To add back the missing zeroes, we can do the following:

replace sdate = sdate + "0" if length(sdate) < 7

Now, we just need to tell Stata to interpret sdate as a monthly date variable. This can be accomplished with the command numdate. This is not a standard Stata command and needs to be installed on your computer (ssc install numdate).

numdate mo newdate = sdate, pattern(YM)

The above line can be read as: create a new date variable named newdate from the variable sdate, which is in the YYYY.MM format.

The numdate ado file can deal with very flexible date specifications, and its help file is very comprehensive. Two other useful commands are convdate and extrdate. They are used to convert or extract parts of dates from variables that are already in the Stata date format.

A final recommendation is to take a look at Stata documentation on dates that is available at http://www.stata.com/manuals13/ddatetime.pdf.
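The same repair can be sketched in Python with pandas, in case the series is processed outside Stata. The numbers below are hypothetical IPEADATA-style monthly dates, with 1940.10 already truncated to 1940.1 by the numeric import:

```python
import pandas as pd

# Hypothetical monthly dates read as numbers: 1940.10 became 1940.1
date = pd.Series([1940.1, 1940.11, 1940.12, 1941.01])

# Pad back the trailing zero by formatting with two decimal places
sdate = date.map(lambda d: f"{d:.2f}")

# Interpret the padded strings as monthly dates
newdate = pd.to_datetime(sdate, format="%Y.%m")
print(newdate.dt.strftime("%Y-%m").tolist())
```

Formatting with two decimal places plays the role of the length-based replace in the Stata snippet: both restore the "10" that the numeric representation silently dropped.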

A few tips for programming in Stata

Stata is a very powerful and useful statistical package. Just like any sophisticated tool, it takes time to master. Programming is one of those skills where knowing even a little can be very beneficial. Below you will find four videos. The first video goes over the functionalities of the Stata Program Editor. The second video covers some basics of Stata commands. The third video talks about loops, which are an essential tool for programmers. Finally, the fourth video is about macros, which together with loops are very useful for handling repetitive tasks.

How to use the Stata Program editor:

Basics of Stata:

Quick guide to loops:

More about macros:

How to plot curves with different domains in Mathematica

Mathematica (or the Wolfram language) is a very useful tool for plotting functions. Suppose you are interested in plotting two curves in the same diagram, say y = 1/x and z = 2/x. Both functions have the same domain, with x belonging to the [0,10] interval. This is easily accomplished by typing:

Plot[{1/x, 2/x}, {x, 0, 10}]

 

Now suppose you would like to plot both functions in the same diagram, but with the domain of z changed to x belonging to the [2,11] interval. How could you do this? It is not difficult: you just need to use the command Show with the option PlotRange -> All. Here is a piece of code that accomplishes this:

p1 = Plot[{1/x}, {x, 0, 10}]

p2 = Plot[{2/x}, {x, 2, 11}]

Show[p1, p2, PlotRange->All]

 

Exporting Stata’s correlation tables to a document file

I came across a very useful ado file for Stata named asdoc that facilitates the creation of neat tables.

To install asdoc, just type ssc install asdoc.

Here is an example of exporting a correlation table to a document named table.doc.

sysuse auto

asdoc correl price mpg headroom trunk weight, save(table.doc) dec(3)

 

Note that dec(3) means to export the correlations with 3 decimal places.
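For a rough sense of what asdoc produces, here is the analogous computation in Python with pandas. The numbers are a small made-up sample in the spirit of Stata's auto data set, not the actual values:

```python
import pandas as pd

# Made-up sample in the spirit of Stata's auto data set
cars = pd.DataFrame({
    "price": [4099, 4749, 3799, 4816, 7827],
    "mpg": [22, 17, 22, 20, 15],
    "weight": [2930, 3350, 2640, 3250, 4080],
})

# Correlation table rounded to 3 decimal places, like asdoc's dec(3) option
corr = cars.corr().round(3)
print(corr)
# corr.to_csv("table.csv")  # export the table, loosely analogous to save(table.doc)
```

The rounding step is the counterpart of dec(3); the export line is commented out because asdoc writes a formatted Word table, which a plain CSV only approximates.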

Asdoc has tons of other applications, and its help file is very comprehensive. You can get a glimpse of its capabilities in the following videos:

A nice tutorial for those interested in learning the basics of Python and its applications to Finance

The website FinanceandPython has a very good tutorial on the basics of Python and its applications to Finance, Statistics, and Economics. The tutorial is organized in lessons that are carefully designed for a step-by-step learning experience. It also has several problem sets that allow students to practice the concepts developed in the lessons. A key differentiator of this website is the examples and applications of Python coding to Finance problems. In sum, this is a fine tutorial for those interested in learning Python.

Reading DATASUS and ANS .dbc files using R

Those interested in conducting research using Brazilian data often come across some unusual file formats used by Brazilian government agencies. One example is the .dbc format used for health data produced by DATASUS and by ANS. These .dbc files are not related to the .dbc files used by FoxPro, for instance. In fact, they are just compressed .dbf files.

There is an R package, read.dbc, that can convert these .dbc files into regular .dbf files. Once you have generated the .dbf files, you can use the package foreign and its function read.dbf to import the data into R. This same package also allows you to save the data in Stata's .dta format by means of the function write.dta.

The package maptools also has a function to read dbf files named dbf.read, though I have not tried it yet.

 

Machine Learning using Matlab

Economists in general do not consider Matlab their preferred choice of software, in part due to Matlab's peculiar licensing scheme for its toolboxes. That said, for some applications it is still a very useful piece of software.

Those interested in Machine Learning may want to look at this new Matlab e-book, a basic tutorial on ML using Matlab. To download it, you need to go to this website and fill out a form, which will probably reduce your interest in using Matlab.

Book on Linear Algebra with applications in Julia

I came across this very interesting book titled “Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares” by Stephen Boyd and Lieven Vandenberghe, published by Cambridge University Press.

On the book’s website you can find a pdf file of the book for download. You can also purchase it at a (virtual) bookstore.

There you can also find additional exercises and a Julia language companion.

I think this is a great book and the companion materials are very helpful. I strongly recommend those interested in learning Julia to go over the companion material.

Julia: a powerful computer language

Julia is a programming language that should be of interest to those dealing with big data. Many observers note that it was developed with high performance as a primary goal. And it is free for everybody to use. You can find its source code on GitHub.

You can download Julia and find its official documentation at the Julia Programming Language webpage.

Tutorials for learning Tableau

Some students asked me about a tutorial to learn the basics of Tableau.

Although there are many tutorials available, I found the following videos to be the most straightforward.

More specifically, for those wanting to learn how to create a dashboard in Tableau, I recommend the following tutorial: