How To Describe A Dataset

The new student has visited 41 states. Select the variables you wish to define and move them to the right column using the arrow button. Then we'll learn how to describe a dataset. The function returns a dictionary of properties for all shapefiles in a folder or datasets in a file geodatabase. The measures of spread tell us how extreme the values in the dataset are. Dataset (data_vars=None, coords=None, attrs=None, compat=None) ¶. describe() Alternatively, you may use this template to get the descriptive statistics for the entire DataFrame: df. Before we look at correlation methods, let’s define a dataset we can use to test the methods. Let I(t) represent the number of individuals that are/were infected at t. Descriptive statistics generally characterizes or describes a set of data elements by graphically displaying the information or describing its central tendancies and how it is distributed. Structured data in the documentation will also allow the dataset to be found more easily. Compare and contrast at least three data sets used in electronic health records. These entities can have attributes that define its properties. Articulate the assumptions of the statistical test. The feature classes that are part of the feature dataset are listed as children data elements of the feature dataset. C) Use describe and codebook to characterize your variables, and labelbook to characterize your data labels The describe command gives information about how the variable is stored in Stata, while the codebook provides diverse information, including the type of variable, range, frequent values, amount of missing, etc. For example, suppose I subtract 1 at one end, and add 1 at the other, to get 2, 3, 3, , 3, 3, 4 The median won't change. This dataset consists of four frames that show the winter and summer effects of both El Nino and La Nina. Learn vocabulary, terms, and more with flashcards, games, and other study tools. From the latter, I concluded: “…the ChestXray14 dataset, as it exists now, is not fit for training medical AI systems to do diagnostic work. Together with simple graphics analysis, they form the basis of virtually every quantitative analysis of data. How to describe a dataset. Male lions are distinguishable for their impressive manes,. 2 Responses to “Identifying Outliers in a Data Set” Nicholas Pindar Dibal Says: April 2, 2011 at 3:42 pm | Reply. Filled meteorological data fields (which are flagged in the data file) may also be useful for certain renewable energy applications. The Microsoft Azure Machine Learning Studio Algorithm Cheat Sheet helps you choose the right machine learning algorithm for your predictive analytics solutions from the Azure Machine Learning Studio library of algorithms. There are four measures of spread, and we'll talk about each one of them. Couple dressed up for Halloween-themed wedding. Code, data, and usage examples are available at this https URL. However, the provision at 45 CFR 164. Click Scatter plot and select with Fit to show the fit, with Fit + CI to show the fit with confidence interval for the fit, with Fit + PI to show the fit with prediction interval,. Describe how you approach a new problem/dataset. In this lesson you will learn how to determine the number of observations in a set of data by looking at histograms and line plots. By default, the tool outputs a table layer containing calculated field statistics and a JSON string outlining geometry and time settings for the input layer. How to create a 3D Terrain with Google Maps and height maps in Photoshop - 3D Map Generator Terrain - Duration: 20:32. “A dataset (or data set) is a collection of data, usually presented in tabular form. Math Expression Renderer, Plots, Unit Converter, Equation Solver, Complex Numbers, Calculation History. Display data in a DataGrid / data relationship between two tables. How do I describe the use of pre-existing data in the application materials? Initial Application Risk/Benefit Assessment for adults and/or children: Evaluate the risk to the participants within the data set. These are referred to as outliers, which are out of line with the normal data set. Dataset: GeneAtlas U133A, gcrma The tissue-specific pattern of mRNA expression can indicate important clues about gene function. NET DataSet is a memory-resident representation of data that provides a consistent relational programming model regardless of the source of the data it contains. An Introduction to SAS Data Steps. It i the smallest value in the data set and corresponds to the value of the left-most end of the. The present review is the first in an ongoing guide to medical statistics, using specific examples from intensive care. It is about making your research data retraceable for others for verification purposes and for making them reusable beyond the original purpose for which they were collected. How to Find the Range of a Data Set. As you can see, we are going to use both categorical and. Often referred to as the electronic medical record (EMR), the EMR is generally not available to outside researchers. This dataset is of the same type as the colon cancer dataset and can therefore be used for the same kind of experiments. To look at the contents of a SAS data set without displaying the observations, use the CONTENTS statement in the DATASETS procedure. A record is defined to be a series of bytes which are to be read or written together. Half of all values are smaller than the median value and half are larger. These element definitions are followed by a definition of the feature dataset itself. According to Shamoo and Resnik (2003) various analytic procedures “provide a way of drawing inductive inferences from data and distinguishing the signal (the phenomenon of interest. An alternative function (statsBy) returns a list of means, n, and standard deviations for each group. Use one kind of chart (any kind) to describe the data. The dataset is collected from English examinations in China, which are designed for middle school and high school students. Create an LAS dataset before creating contour lines from the lidar data. For Awards Made After. Our focus is primarily on medical research and on practical relevance for the medical investigator. The idea of storytelling is fascinating; to take an idea or an incident, and turn it into a story. Positive trend. You'll use statistics to describe the wines in the data set and derive some insights for yourself. Understand:. org are unblocked. Reports the "Flow" style summary rollups on an instance of H2OFrame. Our focus is primarily on medical research and on practical relevance for the medical investigator. Structured data in the documentation will also allow the dataset to be found more easily. Interoperability issues Valeria Pesce Global Forum on Agricultural Research 2. Dataset size is in bytes (original format). Interview question for Quantitative Analyst in Mountain View, CA. Describing Data: Purpose: This started out as a lecture on descriptive statistics, but it is more than that. Its goal is to create the cornerstone for a useful and productive way to perform geographic data analysis, data conversion, data management, and map automation with Python. Select a dataset that you find interesting. The dataset is derived from the Patent Application Publication Full-Text and Patent Grant Full Text files, available. Instead of drawing a million features, draw a sample. Describe the Sun Coast data using the descriptive statistics tools discussed in the unit lesson. Often, a data set or collection contains a large number of files, perhaps organized into a number of directories or database tables. Dataset: GeneAtlas U133A, gcrma The tissue-specific pattern of mRNA expression can indicate important clues about gene function. It is normal to analyze a dataset in a paper. New Project 2. How to describe line graphs. Washington, DC 20590. The latter improvement could not only lead to a faster generation of experimental fingerprints, but could also supply a large dataset to improve the prediction of gel run behaviors of fragments based on the sequence composition. A dataset is a collection of data, generally represented in tabular form, with columns signifying different variables and rows signify different members of the set. The public datasets are datasets that BigQuery hosts for you to access and integrate into your applications. It is important to take your time to learn about your data when starting on a new machine learning problem. In this assignment, you will be expected to utilize a data set to analyze and interpret information. Each method is briefly described and includes a recipe in R that you can run yourself or copy and adapt to your own needs. DataSet objects, a key part of data access in the Microsoft. Show the measurements by making a line plot, where the horizontal scale is marked off in whole-number units. describe your research question(s) indicate the data set you will use, and; describe your hypothesis(es) and the variables you intend to examine in evaluating your hypothesis(es) Please see me in office hours or by appointment to discuss ideas if you are unsure about your plans! Do not do any analysis until your plan has been approved by me. As we noted in the earlier section, a key consideration in what distribution to use to describe the data is the likelihood of extreme values for the data, relative to the middle value. We will describe the shape of the distribution of a data set using the following basic categories: symmetric, bell-shaped, skewed right, and skewed left. Dataset size is in bytes (original format). Through data visualization you can easily: Visualise data (make sense of data, especially big data) Classify and categorise data. Datasets are objects that contain data tables where you can temporarily store the data for use in your application. If you find this dataset useful, please cite the following publication: Scene Parsing through ADE20K Dataset. Now split the dataset into a training set and a test set. ProPara aims to promote the research in natural language understanding in the context of procedural text. • Other statistical packages are SPSS, SAS and Stata. PU/DSS/OTR id. Each of the states listed in the table is an element or member of the sample. By describing and documenting this organization, files and data can be easily located and used. Many studies with amazing results are not acted on or fall to the wayside because of its confusing report. The mean is a descriptive statistic that measures the center of balance of the data. I also show the mean of data with and without outliers. You can now use your dataset in 2 ways. There are two modes in this data set. There are 4 facets statisticians use when describing a frequency distribution or data set: the skew, measures of central tendency, spread, and kurtosis. The data is homoskedastic , meaning the variance in the residuals (the difference in the real and predicted values) is more or less constant. Next, let's create an external entity. For instance, if describing a. Specify Group as the data set to describe, give Read access to the Group data set, and create the output data set Grpout. is this a feasible research topic?. Describe each data set element, who developed the data set, and compare the similarities and differences of each data set to the others for the following 3 data sets: Demonstrate your ability to apply ethics to your future career in a health care setting by creating a statement and story of your professional ethics and the story behind your. The preparatory to research provision permits covered entities to use or disclose protected health information for purposes preparatory to research, such as to aid study recruitment. So in order to create a more accurate analysis of the GRAIN dataset we have narrowed down the categories used to describe the sorts of land use to which transacted land is put. Pandas describe() is used to view some basic statistical details like percentile, mean. Learn vocabulary, terms, and more with flashcards, games, and other study tools. Journal of Environmental Quality Dataset Papers - Author and Reviewer Instructions GUIDELINES FOR AUTHORS OF DATASET PAPERS. To create new DFD, select Diagram > New from the toolbar. To learn more about the data set we can type the command describe in the Command window. Data > Create or change data > Other variable-transformation commands > Make data set of frequencies. pandas has several methods that allow you to quickly analyze a dataset and get an idea of the type and amount of data you are dealing with along with some important statistics. Next, let's create an external entity. This will allow for the document to be embedded as part of an App; And no doubt this will evolve over time with better requirements. The Windows Forms DataGrid control provides a user interface to ADO. These element definitions are followed by a definition of the feature dataset itself. The five numbers are the minimum, the first quartile(Q1) value, the median, the third quartile(Q3) value, and the maximum. However, there is a big catch in this. All other numbers occur one time only. 5MB) Geographies available. 4 Measures of Variation 413 5. Review of mean, median and mode Exposition - discussion strategy (15 minutes) Teacher presents the question: A test was scored out of 20 (only whole marks were given) and 12 pupils. One method is based on determining to what degree the values in a data set differ from the mean of that data set. Here are the most powerful and flexible approaches, which include using Excel arrays. IMLS Discretionary Grant and Cooperative Agreement Awards. In total, 793 different samples across the different omics datasets define. The Patent Claims Research Dataset contain detailed information on claims from U. In this example the answer may, or may not, be obvious, depending what we are trying to measure about air quality. How would you rate its importance. This can skew your results. open-data-security description format is a simple JSON format to describe dataset released as open data by security researchers, security vendors or CSIRTs. Describe the sample you will use. It is normal to analyze a dataset in a paper. Below is one possible approach to Lesson 3 Practice Exercise D. A dataset resembles an in-memory representation of a NetCDF file, and consists of variables, coordinates and attributes which together form a self describing dataset. Google's approach to dataset discovery makes use of schema. Explain how to retrieve a data frame cell value with the square bracket operator. Understand attribute values with summarized field statistics. Labels indicate which of two variants of leukemia is present in the sample (AML, 25 samples, or ALL, 47 samples). Either way, the more data a company has, the. Spooky Season. The data set shouldn't have too many rows or columns, so it's easy to work with. Dataset Search works in multiple languages with support for additional languages coming soon. I also show the mean of data with and without outliers. values, 'id' is the identifier for each observation. 5 (Note that N is used here rather than N-1 because the true mean is known). Table 1: Table 1. Programming is done in Python and the results can easily be integrated into cloud-based applications. One way to minimize this fear is to remember that only three things can be done with statistics describe, compare and relate. While measures of central tendency are used to estimate "normal" values of a dataset, measures of dispersion are important for describing the spread of the data, or its variation around a central value. To compute the range of a data set, we subtract the minimum from the maximum: range = maximum – minimum. Washington, DC 20590. axis int or None, optional. Regression equation calculation depends on the slope and y-intercept. Since the title is what people will see first in a search of the Clearinghouse, you'll want to make sure it is both informative and concise. First, this is how adults describe a database in an encyclopedia: A database is stored as a file or a set of files on magnetic disk or tape, optical disk, or some other secondary storage device. An element could be an item, a state, a person, and so forth. How does the data set describe geographic features? Who produced the data set? Who are the originators of the data set? Who also contributed to the data set? To whom should users address questions about the data? Why was the data set created? How was the data set created? Where did the data come from? What changes have been made?. The median of a set of data values is the middle value of the data set when it has been arranged in ascending order. Feature datasets are grouped under the DatasetDefinitions tag in an XML document and defined by a DataElement tag of type esri:DEFeatureDataset. Time series analysis is generally used when there are 50 or more data points in a series. Code, data, and usage examples are available at this https URL. It i the smallest value in the data set and corresponds to the value of the left-most end of the. ANA Recognized Minimum Data Sets. dataset, Stata has a command that gives a good in-depth description of the structure, contents, and values of the variables: the codebook command. describe using always shows the full names of the variables, so fullnames may not be specified with describe using. We can use the data to determine if a linear relationship between the variables exists. How to sample a smaller data set from a very large data set ? I have a very large dataset with dimension given by (203000 x 150), with some of the columns (attributes) have a very large number of. The Describe Dataset task provides an overview of your big data. A quick tour of ArcPy ArcPy is a site package that builds on (and is a successor to) the successful arcgisscripting module. You'll use statistics to describe the wines in the data set and derive some insights for yourself. Communicating your study’s results to your co-workers, managers, and clients in a way that is both professional and easy to understand is a crucial skill for any researcher. Couple dressed up for Halloween-themed wedding. Datasets are usually accompanied by some documentation that functions like an owner's manual. Histogram: Calculate descriptive statistics. This lesson is designed to provide students with a review of these concepts and then an in-depth look at how the overall composition of the data set can impact the calculation of both the mean and median. These guidelines include salient information about datasets: who created the dataset, when it was published, how the data was collected, what the terms. Each column represents a particular variable. 5 33 67 1 -20. Map Viewer analysis tools. News sites that release their data publicly can be great places to find data sets for data visualization. Dataset timestamps are Python timedate in UTC (converted from original to Python type). Given a sample dataset with missing values, apply an appropriate technique to deal with them. However, there is a big catch in this. describe¶ scipy. The extent to which the median and mean are good representatives of the values in the original dataset depends upon the variability or dispersion in the original data. Taking samples of information can help ease these costs because it is often impractical to collect all the data. Regression equation calculation depends on the slope and y-intercept. Examples of variables include quantities such as: gender, investment type, test scores, and. The TRI preliminary dataset is available each July through September so the public can access the most recent TRI information prior to the annual publication of the TRI National Analysis in January. (b) Display the information in Table 1. Thus it is always a good idea to check a dataset for missing data, and to think about how the missing data may influence our analyses. What we’re going to do is display the thumbnails of the latest 16 photos, which will link to the medium-sized display of the image. by(), you might find this helpful. There are (at least) three different ways to describe the middle value: mean, median and mode. If the data set is 1, 2, 3, 3, then 2 and 3 are the middle numbers and must be averaged to obtain the median of 2. Pandas describe() is used to view some basic statistical details like percentile, mean, std etc. SageMaker is Amazon’s solution for developers who want to deploy predictive machine learning models into a production environment. The codebook also indicates how the data are stored and organized for use by the computer. A histogram divides the range of values into various groups (or buckets), and then shows the frequency — how many. open-data-security description format is a simple JSON format to describe dataset released as open data by security researchers, security vendors or CSIRTs. Input data. The measures of spread tell us how extreme the values in the dataset are. To compute the range of a data set, we subtract the minimum from the maximum: range = maximum – minimum. Now it’s time to dig in, start sorting, and analyze the data. NET 3-tier application is a program which is organized into three major disjunctive tiers on layers. Length 150 0 35 1 5. Time series analysis is generally used when there are 50 or more data points in a series. Regression examples · Baseball batting averages · Beer sales vs. You just hover your mouse on your DataSet variable and you will see a tiny little magnifying glass icon as below :. I am not sure when to use present/past tense to describe how you analyze the data and what your results are. This will allow for the document to be embedded as part of an App; And no doubt this will evolve over time with better requirements. Bar Graphs and Dot Plots L E S S O N 1. A two-way table presents categorical data by counting the number of observations that fall into each group for two variables, one divided into rows and the other divided into columns. dataset, Stata has a command that gives a good in-depth description of the structure, contents, and values of the variables: the codebook command. For example, the test. Descriptive statistics of your dataset The psych package provides several functions to help you visualize your data. In some data sets there may be a point or two that can be out of context with the bulk of the data. The first step in any analysis is to describe and summarize the data. It lists values for each of the variables, such as height and weight of an object. This means that the first decile is the 10th percentile, the second decile is the 20th percentile, etc. A data set (or dataset) is a collection of data. These guidelines include salient information about datasets: who created the dataset, when it was published, how the data was collected, what the terms. Let I(t) represent the number of individuals that are/were infected at t. Introduction to HDF5. There are different measures of central tendency. Think of describe, replace as separate from but related to describe without the replace option. First, this is how adults describe a database in an encyclopedia: A database is stored as a file or a set of files on magnetic disk or tape, optical disk, or some other secondary storage device. Hint: To find the standard deviation, think Empirical Rule covered in Week 1. Interview question for Quantitative Analyst in Mountain View, CA. As such, the shape of a histogram is its most evident and informative characteristic: it allows you to easily see where a relatively large amount of the data is situated and where there is very little data to be found (Verzani 2004). This can skew your results. Or copy & paste this link into an email or IM:. Data is limitless and present everywhere in the universe. If you do not like knowing about the kurtosis of your data, you could read up on the options of describe. Describing and Interpreting Data. 9 Describe, Analyze, and/or Summarize a Data Set 6. If you have the jobs that reference the data sets, you can open each of the data set descriptor files using the Data Set Management tool and then view and delete the data set. The option EXPECTED will cause expected cell frequencies (under the hypothesis of independence) to be printed. There are two points that need to be raised here, 1) cross-checks within the data collection process and 2) data quality being as much an observation-level issue as it is a complete data set issue. Q&A for users of Wolfram Mathematica. In some data sets there may be a point or two that can be out of context with the bulk of the data. Describe each data set element, who developed the data set, and compare the similarities and differences of each data set to the others for the following 3 data sets: Uniform Hospital Discharge Data Set (UHDDS) for inpatients. They typically clean the data for you, and they often already have charts they've made that you can learn from, replicate, or improve. to find the center of the data set. datasetsearch), a search engine over dataset metadata that we built with an open ecosystem at its core: data publishers, large and small, use Schema. It is a nightmare for the students who are facing the issue of writer’s block. A dataset is a collection of data, generally represented in tabular form, with columns signifying different variables and rows signify different members of the set. The data dictionary is a crucial component of any relational database. The Amazon Resource Name (ARN) of the dataset to describe. The median value is the middle value in a set of values. The DataSet is a memory-resident representation of data that provides a consistent relational programming model regardless of the data source. The DESCRIBE command enables you to describe objects recursively to the depth level set in the SET DESCRIBE command. dataset is written and maintained by Friedrich Lindenberg, Gregor Aisch and Stefan Wehrmeyer. For example, you may want to extract all cases in a dataset beginning at the 5th row, or extract the first 30 cases in a dataset, or extract rows 20 through 30 of a dataset. Either type codebook in the Command window and press Enter or navigate the menus to Data > Describe data > Describe data contents (codebook) and click on OK. Whereas a mean of 3. Using a build-in data set sample as example, discuss the topics of data frame columns and rows. read_csv('train. Description The tutorial will cover a wide spectrum of multicamera systems from micro to macro. Describe a population proportion that could be estimated using the GSS. Describe how you would handle uncertainties in large dataset. A variable can have one or several values (information for one or several cases). org standard to describe their datasets, the variety and coverage of datasets that users will find in Dataset Search, will continue to grow. Get the amenities of each business in the data set. Race is a large-scale reading comprehension dataset with more than 28,000 passages and nearly 100,000 questions. We begin by learning some common terms used in examining data. We can do this using a combination of three useful features of the spreadsheet:. How the EHR would be used by the organization is clear, appropriate, and grounded in scholarly evidence. pandas has several methods that allow you to quickly analyze a dataset and get an idea of the type and amount of data you are dealing with along with some important statistics. Understand:. Data Set Information: This data set was generated to model psychological experimental results. To modify the scatter plot: If the Linear regression dialog box is not visible click Edit on the Analyse-it tab/toolbar. summary(A) prints a summary of a dataset array and the variables that it contains. Input data. Correlation can have a value: 1 is a perfect positive correlation; 0 is no correlation (the values don't seem linked at all)-1 is a perfect negative correlation; The value shows how good the correlation is (not how steep the line is), and if it is positive or negative. 6, 7, 13, 15, 18, 21, 21, and 25 will be the data set that we use to find the mean. Using the test. If this is the case, complementing a measure of location with a measure of spread or variation is an appropriate choice. Couple dressed up for Halloween-themed wedding. To describe the data I preferred to show the number (%) of outliers and the mean of the outliers in dataset. The infomation given in the table above is a data set. VGGFace2 is a large-scale face recognition dataset. This lesson helps students understand complex math concepts in an accessible way. The Flickr JSON is a little confusing, and it doesn’t provide a direct link to the thumbnail version of our photos, so we’ll have to use some trickery on our end to get to it, which we’ll cover in just a moment. Five-Number Summary. You may cite your previous description if the same data set is used from a previous assignment. The complete dataset is provided as an additional file, see Additional file 1: [data. Why is this dataset important. In the dataset with the odd number of observations, notice how the number 12 has six values above it and six below it. Recognize that a measure of center for a numerical data set summarizes all of its values with a single number, while a measure of variation describes how its values vary with a single number. The variability of data relates to how different or “spread out” the scores are from each other. NET DataSet is a memory-resident representation of data that provides a consistent relational programming model regardless of the source of the data it contains. So the mean and median will be the same. Minimum Data Set (MDS) for long-term care. In the New Diagram window, select Data Flow Diagram and click Next. So, the scatter plot follows no trend. describe (a, axis=0, ddof=1, bias=True, nan_policy='propagate') [source] ¶ Compute several descriptive statistics of the passed array. The CDC WONDER online databases, as well other public health data collections hosted on CDC WONDER, are listed by name below, in two sections, CDC WONDER online databases and other public health data collections hosted on CDC WONDER. Width 150 0 23. median, the middle number, and the mode, the most common number, to compare the center of each data set in a different way. Notes on linear regression analysis (pdf file) Introduction to linear regression analysis. A dataset contains information about "individuals". And the middle value is going to be a 3. You have treated missing values and outliers as the main problems of data analysis which you presented ways of handling them separately, I am thinking of imputing missing values in a data set which is noted to have serious outliers. A dataset contains information about "individuals". As such, the shape of a histogram is its most evident and informative characteristic: it allows you to easily see where a relatively large amount of the data is situated and where there is very little data to be found (Verzani 2004). Example - Consider a dataset of people id - Long, Unique Identifier Age - Integer, The age of the individual. The line connecting the points in the above graph is simply the conversion equation between the two measurement scales. Get the amenities of each business in the data set. 1 Derived Timepoint Rows A row could be derived from other rows within the current parameter for the purpose of creating an analysis value,. Merging Qualitative and Quantitative Data in Mixed Methods Research: How To and Why Not. Identifying individuals, variables and categorical variables in a data set If you're seeing this message, it means we're having trouble loading external resources on our website. Width 150 0 23. One type of solution involves altering sensitive attributes in a data set to offset the outcome. Minimum Data Set (MDS) for long-term care. While other tools, particularly Stata, have similar capabilities and are easier to learn, most SAS experts have seen little reason to switch. Map Viewer analysis tools. See also information and knowledge. Variance of the means. The Polity IV dataset covers all major, independent states in the global system over the period 1800-2017 (i. The average is the addition of all the numbers in the data set and then having those numbers divided by the number of numbers within that set. Measures of Central Tendency The measures of central tendency describe a distribution in terms of its most “frequent”, “typi-cal” or “average” data value. The median of the lower half is called Q1 and the median of the upper half is called Q3. But there are different ways of representing or expressing the idea of “typicality”. The Contents of the GROUP Data Set 2 The DATASETS Procedure Alphabetic List of Variables and Attributes # Variable Type Len Format Informat Label 11 HPHONE Char 12 1 IDNUM Char 4 7 JOBCODE Char 4 2 LNAME Char 15 8 SALARY Num 8 COMMA8. 20 Describe the type of data that the organization collects and propose a simple informatics system (data set) to manage this type of data. Think of describe, replace as separate from but related to describe without the replace option. Each post expands upon one item listed in the master Tableau Performance Checklist. Here I share with you a data preparation approach and SAS coding technique that will significantly simplify, unclutter and streamline your SAS programming life by using data templates. Spooky Season. Time series analysis is generally used when there are 50 or more data points in a series. " to "street, road, etcetera"). These are referred to as outliers, which are out of line with the normal data set. † When conducting a statistical analysis, you often want to use values of one variable to predict or explain the values of another variable. Describing Data: Purpose: This started out as a lecture on descriptive statistics, but it is more than that. Describe how you would handle uncertainties in large dataset.