APHEO - 10 Stata Resources

To advance and promote the discipline and professional practice of epidemiology in Ontario public health units

Introductory Books on Stata

A Gentle Introduction to Stata, Revised Third Edition (2012, 401 pages)
Author: Alan C. Acock
ISBN: 978-1-59718-109-9
To order: http://www.stata.com/bookstore/gentle-introduction-to-stata/

Comment from the Stata technical group (verbatim from their website, linked)
Alan C. Acock’s A Gentle Introduction to Stata, Revised Third Edition is aimed at new Stata users who want to become proficient in Stata. After reading this introductory text, new users not only will be able to use Stata well but also will learn new aspects of Stata easily.
Acock assumes that the user is not familiar with any statistical software. This assumption of a blank slate is central to the structure and contents of the book. Acock starts with the basics; for example, the portion of the book that deals with data management begins with a careful and detailed example of turning survey data on paper into a Stata-ready dataset on the computer. When explaining how to go about basic exploratory statistical procedures, Acock includes notes that will help the reader develop good work habits. This mixture of explaining good Stata habits and good statistical habits continues throughout the book.
Acock is quite careful to teach the reader all aspects of using Stata. He covers data management, good work habits (including the use of basic do-files), basic exploratory statistics (including graphical displays), and analyses using the standard array of basic statistical tools (correlation, linear and logistic regression, and parametric and nonparametric tests of location and dispersion). Acock teaches Stata commands by using the menus and dialog boxes while still stressing the value of do-files. In this way, he ensures that all types of users can build good work habits. Each chapter has exercises that the motivated reader can use to reinforce the material.
The tone of the book is friendly and conversational without ever being glib or condescending. Important asides and notes about terminology are set off in boxes, which makes the text easy to read without any convoluted twists or forward-referencing. Rather than splitting topics by their Stata implementation, Acock arranges the topics as they would appear in a basic statistics textbook; graphics and postestimation are woven into the material in a natural fashion. Real datasets, such as the General Social Surveys from 2002 and 2006, are used throughout the book.
The focus of the book is especially helpful for those in psychology and the social sciences, because the presentation of basic statistical modeling is supplemented with discussions of effect sizes and standardized coefficients. Various selection criteria, such as semipartial correlations, are discussed for model selection.
The revised third edition of the book has been updated to reflect the new features available in Stata 12 and Stata 11. The ANOVA chapter has been revised to incorporate the pwmean command, to do mean comparisons, and the marginsplot command, which simplifies the construction of graphs showing interaction effects. Menus and screenshots have also been updated. As in the third edition, an entire chapter is devoted to the analysis of missing data and the use of multiple-imputation methods. Factor-variable notation is introduced as an alternative to the manual creation of interaction terms. The new Variables Manager and revamped Data Editor are featured in the discussion of data management.
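
As a minimal sketch of the commands named in the comment above, the following uses the auto dataset that ships with Stata rather than any dataset from the book; the choice of variables is purely illustrative. Note that the pairwise-comparison command is spelled pwmean in Stata 12 and later.

```stata
* Illustrative only: auto.dta ships with Stata; it is not one of the book's datasets
sysuse auto, clear

* Manual approach: build the interaction term by hand
generate for_wt = foreign * weight
regress mpg foreign weight for_wt

* Same model with factor-variable notation: ## adds main effects and the interaction
regress mpg i.foreign##c.weight

* Predicted mileage by origin across a range of weights, then plot the interaction
margins foreign, at(weight = (2000(500)4500))
marginsplot

* Pairwise comparisons of mean mileage across repair-record categories
pwmean mpg, over(rep78) mcompare(tukey) effects
```

The two regress commands fit the same model; the factor-variable version also lets margins and marginsplot recognise the interaction, which the hand-built for_wt variable does not.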

A Short Introduction to Stata for Biostatistics (Updated to Stata 12) (2012, 181 pages)
Authors: Michael Hills and Bianca L. De Stavola
ISBN: 978-0-9571708-0-3
To order: http://stata.com/bookstore/short-intro-stata-biostatistics/

Comment from the Stata technical group (verbatim from their website, linked)
A Short Introduction to Stata for Biostatistics bridges the information gap between Stata's Getting Started manual and Reference manuals by providing a more detailed introduction to the most often used analytic methods in biomedical research. Although the book is written specifically for biostatisticians, epidemiologists, and health professionals new to Stata, it is also useful for more-experienced users wanting more in-depth knowledge of both Stata commands and biostatistical issues. The book is hands on, intended to be used while working with Stata, and includes a CD-ROM containing the datasets and several author-written programs.
The first four chapters provide an overview of data entry and management commands, including those used to create, label, and drop variables and those used to sort observations. The next two chapters cover graphics. Then comes the bulk of the book, which details methods used in data description and analysis. Beginning with commands used to create frequency tables and summary statistics, the authors proceed to describe commands used for univariate and multivariate analyses, including linear regression, Poisson regression, logistic regression, survival data analysis (proportional hazards models and competing-risks models), and meta analysis. Included among the final chapters is a useful tutorial on report generation.

A Visual Guide to Stata Graphics, Third Edition (2012, 499 pages)
Author: Michael N. Mitchell
ISBN: 978-1-59718-106-8
To order: http://www.stata.com/bookstore/visual-guide-to-stata-graphics/

Comment from the Stata technical group (verbatim from their website, linked)

In its third edition, Michael Mitchell’s A Visual Guide to Stata Graphics remains the essential introduction and reference for Stata graphics. The third edition retains all the features that made the first two editions so useful:
A complete guide to Stata’s graph command and Graph Editor
Exhaustive examples of customized graphs using both command options and the Graph Editor
Visual indexing of features—just look for a picture that matches what you want to do
New in this edition are treatments of contour plots, margins plots, and font handling. Mitchell dedicates a new subsection to contour plots, showing you how to control the number of levels, how to change the colors used, and how to produce effective legends. Over 30 graphs are used to demonstrate what you can accomplish with the new marginsplot command—graphs of estimated means and marginal means (with confidence intervals), interaction graphs, comparisons of groups, and more. Mitchell also adds a section that shows you how to get bold text, italic text, subscripts, superscripts, and Greek letters into your titles, axes, labels, and other text.
The book retains its visual style, presenting the reader with a color-coded, visual table of contents that runs along the right edge of every page and shows readers exactly where they are in the book. You can see the color-coded chapter tabs without opening the book, providing quick visual access to each chapter.
The heart of each chapter is a series of entries that are typically formatted three to a page. Each entry shows a graph command (with the emphasized portion of the command highlighted in red), the resulting graph, a description of what is being done, the dataset and scheme used, and a section showing how to produce the result by using the Graph Editor. Because every feature, option, and edit is demonstrated with a graph or screen capture, you can often flip through a section of the book to find exactly the effect you are seeking.
The first chapter details how to use the book, the types of Stata graphs, how to use schemes to control the overall appearance of graphs, and how to use options to make specific modifications. It also outlines a process for building graphs with the graph command.
The second chapter is a complete overview of the Graph Editor. It includes over 120 color graphics and screen captures to show exactly how things are done and how they look on the graph. With pictures and words, Mitchell shows how to change the color, size, or placement of any titles, markers, annotations, or other objects on your graph by using just a few mouse clicks. More subtly, he shows how to change things such as the number of ticks and labels on your axes, the number of columns in your legends, the label on an individual point, and more. He even shows how to convert, for example, a scatterplot to a line plot and how to rotate or pivot bar charts. Mitchell also covers advanced topics such as how to draw lines and arrows on graphs so that they continue to reference your objects of interest even if you resize the graph, combine it with other graphs, or change the scale or range of the axes. In short, he exposes all the Graph Editor’s tools, from the simplest to the most powerful. Mitchell does not stop there; almost every example in the book shows you how to accomplish the desired graph or effect not only by using a command or command-line option but also by using the Graph Editor.
Of the Graph Editor, Mitchell writes,
[...] You need to use the Graph Editor for only a short amount of time to see what a smart and powerful tool it is. Whereas commands offer the power of repeatability, the Graph Editor provides a nimble interface that permits you to tangibly modify graphs like a potter directly handling clay.
In the third chapter, Mitchell discusses twoway graphs such as scatterplots, line plots, area plots, bar plots, range plots, contour plots, regression fits, and smooths. He shows how to create each of these types of graphs and how to use options (and the Graph Editor) to control how the graph looks. He also introduces graphing across groups of data, and options for adding and controlling titles, notes, legends, and so forth. Beyond the basics, he shows how to easily overlay plots to obtain graphs such as regression fits with error contours and observed data scatters, local polynomial smooths with scatters of their underlying data, stock market–style graphs of open and closed values with quantities traded as a bar chart at the bottom, histograms with density smooths, and more. Because Stata’s graph command will let you customize any aspect of the graph, Mitchell spends ample time showing you the most valuable options for obtaining the look you want. If you are in a hurry to discover one special option, you can skim the chapter until you see the effect you want, and then glance at the command to see what is highlighted in red.
In the succeeding five chapters, Mitchell covers scatterplot matrices, bar graphs, box plots, dot plots, and pie charts. As with twoway graphs, he shows you how to create each of these graphs and how to adjust every aspect of the graph to your taste (or to a publisher’s required form).
In chapters 9 and 10, Mitchell undertakes an in-depth presentation of the options that are available across almost all graph types—options that add and change the look of titles, notes, and such; control the number of ticks on axes; control the content and appearance of the numbers and labels on axes; control legends; add and change the look of annotations; graph over subgroups; change the look of markers and their labels; apply schemes to control the look of the graph; change the look of graph regions; size graphs and their elements; and more. Again, he shows how to make these changes both by using options and by using the Graph Editor.
To complete the graphical journey, Mitchell discusses and demonstrates the 12 styles that unite and control the appearance of the myriad graph objects. These styles are angles, colors, clock positions, compass directions, connecting points, line patterns, line widths, margins, marker sizes, orientations, marker symbols, and text sizes.
That completes the main body of the Visual Guide, but don’t skip the appendix. There, Mitchell first gives a quick overview of the dozens of statistical graph commands that are not strictly the subject of the book. Even so, these commands use the graph command as an engine to draw their graphs, and therefore almost all that Mitchell has discussed applies to them. To make this clear, he shows explicitly how to apply common options and common Graph Editor tools to statistical graphs. Then, Mitchell takes you on a tour of the new marginsplot command. After that, he addresses combining graphs—showing you how to create complex and multipart images from previously created graphs.
In a crucial section entitled “Putting it all together”, Mitchell shows us how to do just that. We learn more about overlaying twoway plots, and we learn how to combine data management and graphics to create plots such as bar charts of rates with capped confidence intervals, scatterplots with range-finder confidence intervals in both dimensions, and population pyramids.
Mitchell then warns us about mistakes that can be made when typing graph commands and how to correct them. In the appendix, he even shows us how to create our own scheme files. Scheme files allow you to control every aspect of how your graphs look without having to specify options. They are the answer to department or journal standards or if you just want all your graphs to have a common appearance that is not one of the schemes shipped with Stata. As with the rest of the book, this section includes cross-references to the Stata Graphics Reference Manual to provide more depth on the subject. Finally, Mitchell reviews all datasets, schemes, and other online supplements available for the book.
The third edition of A Visual Guide to Stata Graphics is a complete guide to Stata’s graph command and the associated Graph Editor. Whether you want to tame the Stata graph command, quickly find out how to produce a graphical effect, master the Stata Graph Editor, or learn approaches that can be used to construct custom graphs, this is the book to read.
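
A minimal sketch of the kind of overlaid twoway graph described in the comment above, assuming the auto dataset that ships with Stata (not one of the book's datasets). The titles and option choices are illustrative, and the /// line continuations assume the commands are run from a do-file.

```stata
* Illustrative only: auto.dta ships with Stata
sysuse auto, clear

* Overlay a linear fit with its confidence band and the observed data;
* {bf:...} makes part of the title bold
twoway (lfitci mpg weight) (scatter mpg weight), ///
    title("{bf:Mileage} versus weight")           ///
    ytitle("Miles per gallon")                     ///
    legend(cols(1))

* Same graph drawn with a different scheme shipped with Stata
twoway (lfitci mpg weight) (scatter mpg weight), scheme(s1mono)
```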

Applied Survey Data Analysis (2010, 462 pages)
Authors: Steven G. Heeringa, Brady T. West, and Patricia A. Berglund
ISBN: 978-1-4200-8066-7
To order: http://www.stata.com/bookstore/applied-survey-data-analysis/index.html

This is an excellent resource on the analysis of complex survey data, including the theory behind it, and has many useful examples of code for Stata and other programs, especially on the accompanying website.
It also addresses higher-level conceptual issues, such as the idea of a 'superpopulation'.
Comment from the Stata technical group (verbatim from their website, linked)

Applied Survey Data Analysis is an intermediate-level, example-driven treatment of current methods for complex survey data. It will appeal to researchers of all disciplines who work with survey data and have basic knowledge of applied statistical methodology for standard (nonsurvey) data.
The authors begin with some history and by discussing some widely used survey datasets, such as the National Health and Nutrition Examination Survey (NHANES). They then follow with the basic concepts of survey data: sampling plans, weights, clustering, prestratification and poststratification, design effects, and multistage samples. Discussion then turns to the types of variance estimators: Taylor linearization, jackknife, bootstrap, and balanced and repeated replication.
The middle sections of the text provide in-depth coverage of the types of analyses that can be performed with survey data, including means and proportions, correlations, tables, linear regression, regression with limited dependent variables (including logit and Poisson), and survival analysis (including Cox regression). Two final chapters are devoted to advanced topics, such as multiple imputation, Bayesian analysis, and multilevel models. The appendix provides overviews of popular statistical software, including Stata.
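
A short sketch of how a complex design is declared and then used in Stata. It is not taken from the book: it assumes internet access and the NHANES II example files that StataCorp distributes through webuse, and the dataset and variable names (nhanes2f, nhanes2d, psuid, stratid, finalwgt, zinc, highbp) are as recalled from the Stata survey manual examples, so confirm them with describe before relying on them.

```stata
* Assumes internet access: webuse downloads StataCorp example data
webuse nhanes2f, clear

* Declare the design: primary sampling units, sampling weights, and strata
svyset psuid [pweight = finalwgt], strata(stratid)

* Design-based mean (linearized variance by default), then design effects
svy: mean zinc
estat effects

* A second example file that carries its survey settings with it
webuse nhanes2d, clear
svyset                                  // display the stored design
svy: logistic highbp height weight age female
```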

Data Management Using Stata: A Practical Handbook (2010, 387 pages)
Author: Michael N. Mitchell
ISBN: 978-1-59718-076-4
To order: http://www.stata.com/bookstore/data-management-using-stata/index.html

Comment from the Stata technical group (verbatim from their website, linked)
Michael N. Mitchell’s Data Management Using Stata comprehensively covers data-management tasks, from those a beginning statistician would need to those hard-to-verbalize tasks that can confound an experienced user. Mitchell does this all in simple language with illustrative examples.
The book is modular in structure, with modules based on data-management tasks rather than on clusters of commands. This format is helpful because it allows readers to find and read just what they need to solve a problem at hand. To complement this format, the book is in a style that will teach even sporadic readers good habits in data management, even if the reader chooses to read chapters out of order.
Throughout the book, Mitchell subtly emphasizes the absolute necessity of reproducibility and an audit trail. Instead of stressing programming esoterica, Mitchell reinforces simple habits and points out the time-savings gained by being careful. Mitchell’s experience in UCLA’s Academic Technology Services clearly drives much of his advice.
Mitchell includes advice for those who would like to learn to write their own data-management Stata commands. Even experienced users will learn new tricks and new ways to approach data-management problems.
This is a great book—thoroughly recommended for anyone interested in data management using Stata.
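
The emphasis on reproducibility and an audit trail can be illustrated with a small do-file skeleton. This is a sketch, not an example from the book: the file names cleaning.log and auto_clean.dta are hypothetical, the auto dataset that ships with Stata stands in for real data, and the version line assumes Stata 12 or later.

```stata
* Hypothetical do-file skeleton; file names are placeholders
version 12
capture log close
log using cleaning.log, replace text

sysuse auto, clear

* Record each change in the do-file instead of editing data by hand
label variable mpg "Mileage (miles per gallon)"
replace rep78 = .a if missing(rep78)   // flag unknown repair records explicitly
note rep78: .a = repair record not reported

* Stop with an error if an assumption about the data is violated
assert inrange(mpg, 5, 60)
isid make

save auto_clean, replace
log close
```

Rerunning the do-file from the original data reproduces the cleaned file, and the log keeps a record of what was done.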

Statistics with Stata (2009, 504 pages)
Author: Lawrence Hamilton
ISBN: 978-0-495-55786-9
To order: http://www.stata.com/bookstore/statistics-with-stata/

Comment from the Stata technical group (verbatim from their website, linked)

Statistics with Stata (Updated for Version 10) is the latest edition in Professor Lawrence C. Hamilton’s popular Statistics with Stata series. Intended to bridge the gap between statistical texts and Stata’s own documentation, Statistics with Stata demonstrates how to use Stata to perform a variety of tasks. This text is ideal as a self-study course for those new to statistics or those migrating from other statistical software to Stata and as a valuable reference for experienced Stata users wishing to explore Stata’s capabilities in fields new to them.
Hamilton covers topics including getting started in Stata, data manipulation, graphics, summary statistics and tables, ANOVA, linear regression (and diagnostics), curve fitting, robust methods, regression models for limited dependent variables, panel (longitudinal) data and mixed models, survey data, survival analysis, factor analysis, cluster analysis, time series, and an introduction to programming.
Notable changes to Statistics with Stata (Updated for Version 10) include a new chapter on survey data analysis using Stata’s svy: prefix command and a chapter on the multilevel and mixed model commands introduced in Stata 10. Chapter 3, covering graphics, has been updated to include a section demonstrating Stata’s Graph Editor. The entire book has also been updated to reflect changes in output, syntax, and features.

Websites on Stata

Introduction to Stata
Authors: Phil Bardsley & Dan Blanchette, Carolina Population Center of the University of North Carolina at Chapel Hill
URL: http://www.cpc.unc.edu/research/tools/data_analysis/statatutorial/index.html

An introduction to Stata with many task examples.

Notepad++ and Stata, a better do-file editor
Author: Howard Chang
URL: http://opensourceeconomics.wordpress.com/2009/10/13/notepad-and-stata-a-better-do-file-editor/

Stata
Author: Princeton Data and Statistical Services
URL: http://dss.princeton.edu/online_help/stats_packages/stata/

Presents modules such as working with time series data, similar to the UCLA site (below).

Stata Programming Essentials
Author: Social Science Computing Cooperative, University of Wisconsin-Madison
URL: http://www.ssc.wisc.edu/sscc/pubs/stata_prog1.htm

Detailed programming examples in Stata.

Stata Tutorial
Author: Germán Rodríguez, Princeton University
URL: http://data.princeton.edu/stata/default.html

Comprehensive resource including an introduction to Stata, and information about data management, graphing and programming.

Statalist Archives
Author: Stata
URL: http://www.stata.com/statalist/archive/

This is the searchable Statalist archive. If you register, read the FAQ before posting questions to the list.

UCLA Academic Technology Services: Statistical Computing
Author: UCLA Academic Technology Services
URL: http://www.ats.ucla.edu/stat/

A collection of web-based statistics tutorials, with associated examples, for a variety of statistical packages, including Stata.
It also includes web books about each of the major packages, with annotated output and detailed examples that reproduce several analyses from published papers, linking the analysis work with the finished product.
These tutorials, examples, output and broken-down analyses are available for Stata users, SAS users and SPSS users.
One page in particular, http://www.ats.ucla.edu/stat/stata/faq/spss_command_to_stata.htm, maps SPSS commands to their Stata equivalents.

Handy Stata Syntax

Weighting RRFSS Data
Once you have your data trimmed down to the waves you require, you can run the following to get appropriate weights:

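. * summarize leaves the total of nadult in r(sum)
. sum nadult

. * Scale nadult so that the weights add up to the number of records (_N)
. gen weights=nadult/`r(sum)'*_N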
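
As a quick check of the two commands above, the following sketch applies them to a toy dataset. The five records and the smoker variable are made up for illustration; only nadult and weights come from the syntax above, and the toy data contain no missing values of nadult.

```stata
* Toy data, invented for illustration
clear
input nadult smoker
1 0
2 1
2 0
3 1
4 0
end

* Same two steps as above
quietly summarize nadult
generate weights = nadult / r(sum) * _N

* Check: the weights should average about 1 and sum to about the number of records
summarize weights
display "sum of weights = " r(sum) "   records = " _N

* Weighted estimate for the hypothetical smoker variable
mean smoker [pweight = weights]
```

If nadult is missing for some records, those records get a missing weight and the remaining weights no longer sum to _N, so it is worth running the check on real data as well.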

Dropping Empty Variables: Submitted by Cam McDermaid, Aug 22, 2014

Place the ado-file linked below in the directory where your personal ado-files live. To find the directory, enter sysdir in the Stata Command window. The directory flagged as PERSONAL is probably the best place to put the ado-file.
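
The lookup described above can be sketched as follows. c(sysdir_personal) is a standard Stata system value; dropempty is the command name used on this page, and the which check will simply report that the command is not found until the file is in place.

```stata
* List Stata's system directories, including the one flagged PERSONAL
sysdir

* Show only the PERSONAL directory
display c(sysdir_personal)

* After copying the file there, check whether Stata can find the command
capture noisily which dropempty
```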

With a dataset open, enter dropempty in the Command window. The command will drop any variable for which all values are missing.

Link to dropempty.do
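
The linked file is not reproduced here. As a rough idea of what such a command might look like, here is a hypothetical sketch (named dropempty_sketch so that it cannot be confused with the real file) that loops over all variables and drops those with no nonmissing values. It is an assumption about the approach, not Cam McDermaid's code.

```stata
* Hypothetical sketch; not the contents of the linked file
capture program drop dropempty_sketch
program define dropempty_sketch
    foreach v of varlist _all {
        * missing() works for both numeric and string variables
        quietly count if !missing(`v')
        if r(N) == 0 {
            display as text "dropping `v'"
            drop `v'
        }
    }
end

* Toy data: y and note are empty, x is not
clear
input x y
1 .
2 .
end
generate str1 note = ""

dropempty_sketch
describe
```

Note that in a dataset with zero observations every variable counts as empty, so this sketch would drop them all.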

** There is also a 'dropmiss' command, written by Nick Cox, that you can install through Stata. It can be used to drop variables that are entirely empty, or to drop observations. Simply typing "dropmiss all" removes any offending empty variables from your file.
- Virginia McFarland, Nov 28, 2014

Last Edited Date:	Last Edited By:	Changes Made:
September 6, 2013	Virginia McFarland	Replaced weight syntax with solution provided by Cam McDermaid via APHEOLIST 6 September 2013
