CRENC Learn

How to Create a Data Analysis Plan: A Detailed Guide

by Barche Blaise | Aug 12, 2020 | Writing


If a good research question equates to a story, then a roadmap is vital for good storytelling. We advise every student and researcher to personally write their data analysis plan before seeking any advice. In this blog article, we will explore how to create a data analysis plan: its content and structure.

The data analysis plan serves as a roadmap for how the data collected will be organised and analysed. It includes the following aspects:

  • Clearly state the research objectives and hypotheses
  • Identify the dataset to be used
  • Define the inclusion and exclusion criteria
  • Clearly state the research variables
  • State the statistical test hypotheses and the software for statistical analysis
  • Create shell tables

1. Stating research question(s), objectives and hypotheses:

All research objectives or goals must be clearly stated. They must be Specific, Measurable, Attainable, Realistic and Time-bound (SMART). Hypotheses are theories derived from personal experience or previous literature, and they lay a foundation for the statistical methods that will be applied to extrapolate results to the entire population.

2. The dataset:

The dataset that will be used for statistical analysis must be described and its important aspects outlined. These include: the owner of the dataset, how to get access to it, how it was checked for quality control, and in which program it is stored (Excel, Epi Info, SQL, Microsoft Access, etc.).

3. The inclusion and exclusion criteria:

They guide the aspects of the dataset that will be used for data analysis. These criteria will also guide the choice of variables included in the main analysis.

4. Variables:

Every variable collected in the study should be clearly stated. Variables should be presented based on the level of measurement (ordinal/nominal or ratio/interval levels) or on the role the variable plays in the study (independent/predictor or dependent/outcome variables). The variable types should also be outlined. The variable type, in conjunction with the research hypothesis, forms the basis for selecting the appropriate statistical tests for inferential statistics. A good data analysis plan should summarize the variables as demonstrated in Figure 1 below.

Figure 1: Presentation of variables in a data analysis plan

5. Statistical software

There are tons of software packages for data analysis; some common examples are SPSS, Epi Info, SAS, STATA and Microsoft Excel. Include the version number, year of release and author/manufacturer. Beginners tend to try different software packages and end up mastering none. It is better to select one and master it, because almost all statistical software packages offer similar performance for the basic and most of the advanced analyses needed for a student thesis. This is what we recommend to all our students at CRENC before they begin writing their results section.

6. Selecting the appropriate statistical method to test hypotheses

Depending on the research question, hypothesis and type of variable, several statistical methods can be used to answer the research question appropriately. This aspect of the data analysis plan clearly outlines why each statistical method will be used to test hypotheses. The level of statistical significance, often but not always p < 0.05, should also be stated. Presented in Figures 2a and 2b are decision trees for some common statistical tests based on the variable type and research question.
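
To make the decision concrete, here is a minimal sketch in Python (not from the article; it assumes the scipy library and uses made-up measurements) of one branch of such a decision tree: checking the normality assumption and then choosing between a parametric and a nonparametric two-group test.

```python
# A minimal sketch of one branch of a test-selection decision tree:
# comparing a numeric outcome between two independent groups.
# The data below are invented for illustration.
from scipy import stats

group_a = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2, 4.7]
group_b = [5.9, 6.1, 5.8, 6.3, 5.7, 6.0, 6.2, 5.6]

alpha = 0.05  # pre-specified level of statistical significance

# Check the normality assumption in each group (Shapiro-Wilk)
normal_a = stats.shapiro(group_a).pvalue > alpha
normal_b = stats.shapiro(group_b).pvalue > alpha

if normal_a and normal_b:
    # Parametric branch: independent-samples t test
    result = stats.ttest_ind(group_a, group_b)
    test_name = "Independent-samples t test"
else:
    # Nonparametric branch: Mann-Whitney U test
    result = stats.mannwhitneyu(group_a, group_b)
    test_name = "Mann-Whitney U test"

print(f"{test_name}: p = {result.pvalue:.4f}")
print("Statistically significant" if result.pvalue < alpha else "Not significant")
```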

A good analysis plan should also clearly describe how missing data will be handled.
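
For instance, here is a minimal sketch, assuming pandas and invented study data, of how the extent of missing data might be quantified before choosing and documenting a handling strategy:

```python
# A minimal sketch of quantifying missingness per variable (invented data)
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "age": [34, 51, np.nan, 29, 44],
    "sex": ["F", "M", "F", np.nan, "M"],
    "outcome": [1, 0, 1, 1, np.nan],
})

missing = df.isna().sum()
percent = (missing / len(df) * 100).round(1)
print(pd.DataFrame({"n_missing": missing, "percent (%)": percent}))

# One common (but not universal) strategy: exclude records missing the
# outcome, and document that decision in the analysis plan.
complete = df.dropna(subset=["outcome"])
print(f"{len(complete)} of {len(df)} records retained")
```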

Figure 2: How to choose a statistical method to determine association between variables

7. Creating shell tables

Data analysis involves three levels of analysis: univariable, bivariable and multivariable analysis, in increasing order of complexity. Shell tables should be created in anticipation of the results that will be obtained from these different levels of analysis. Read our blog article on how to present tables and figures for more details. Suppose you carry out a study to investigate the prevalence and associated factors of a certain disease “X” in a population; the shell tables can then be represented as in Tables 1, 2 and 3 below (a code sketch for filling a univariate shell table follows the table captions).

Table 1: Example of a shell table from univariate analysis


Table 2: Example of a shell table from bivariate analysis


Table 3: Example of a shell table from multivariate analysis


aOR = adjusted odds ratio
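
As an illustration, here is a minimal sketch, assuming pandas and invented variable names, of filling a univariate shell table like Table 1 with frequencies and percentages:

```python
# A minimal sketch of a univariate shell table (invented data and names)
import pandas as pd

df = pd.DataFrame({
    "sex": ["M", "F", "F", "M", "F", "F", "M", "F"],
    "disease_x": ["Yes", "No", "No", "Yes", "No", "Yes", "No", "No"],
})

for var in ["sex", "disease_x"]:
    counts = df[var].value_counts()
    percents = (counts / len(df) * 100).round(1)
    table = pd.DataFrame({"Frequency": counts, "Percentage (%)": percents})
    print(f"\n{var}")
    print(table)
```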

Now that you have learned how to create a data analysis plan, these are the takeaway points. It should clearly state the:

  • Research question, objectives, and hypotheses
  • Dataset to be used
  • Variable types and their role
  • Statistical software and statistical methods
  • Shell tables for univariate, bivariate and multivariate analysis

Further reading

Creating a Data Analysis Plan: What to Consider When Choosing Statistics for a Study https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4552232/pdf/cjhp-68-311.pdf

Creating an Analysis Plan: https://www.cdc.gov/globalhealth/healthprotection/fetp/training_modules/9/creating-analysis-plan_pw_final_09242013.pdf

Data Analysis Plan: https://www.statisticssolutions.com/dissertation-consulting-services/data-analysis-plan-2/

Photo created by freepik – www.freepik.com

Barche Blaise

Dr Barche is a physician and holds a Masters in Public Health. He is a senior fellow at CRENC with interests in Data Science and Data Analysis.


16 comments.

Ewane Edwin, MD

Thanks. Quite informative.

James Tony

Educative write-up. Thanks.

Mabou Gabriel

Easy to understand. Thanks Dr

Amabo Miranda N.

Very explicit Dr. Thanks

Dongmo Roosvelt, MD

I will always remember how you help me conceptualize and understand data science in a simple way. I can only hope that someday I’ll be in a position to repay you, my dear friend.

Menda Blondelle

Analysis plan

Marc Lionel Ngamani

This is interesting, Thanks

Nkai

Very understandable and informative. Thank you.

Ndzeshang

Love the figures.

Selemani C Ngwira

Nice, and informative

MONICA NAYEBARE

This is so much educative and good for beginners, I would love to recommend that you create and share a video because some people are able to grasp when there is an instructor. Lots of love

Kwasseu

Thank you Doctor very helpful.

Mbapah L. Tasha

Educative and clearly written. Thanks

Philomena Balera

Well said doctor, thank you. But when do you present in tables, bars, pie charts, etc.?

Rasheda

Very informative guide!



Data Analysis in Research: Types & Methods


Content Index

  • What is data analysis in research?
  • Why analyze data in research?
  • Types of data in research
  • Finding patterns in the qualitative data
  • Methods used for data analysis in qualitative research
  • Preparing data for analysis
  • Methods used for data analysis in quantitative research
  • Considerations in research data analysis

What is data analysis in research?

Definition of data analysis in research: According to LeCompte and Schensul, research data analysis is a process used by researchers to reduce data to a story and interpret it to derive insights. The data analysis process helps reduce a large chunk of data into smaller fragments that make sense.

Three essential things occur during the data analysis process. The first is data organization. The second is summarization and categorization, which together reduce the data and help find patterns and themes for easy identification and linking. The third and last is the analysis itself, which researchers perform in both top-down and bottom-up fashion.


On the other hand, Marshall and Rossman describe data analysis as a messy, ambiguous, and time-consuming but creative and fascinating process through which a mass of collected data is brought to order, structure and meaning.

We can say that “data analysis and data interpretation is a process representing the application of deductive and inductive logic to the research data.”

Why analyze data in research?

Researchers rely heavily on data, as they have a story to tell or research problems to solve. It starts with a question, and data is nothing but an answer to that question. But what if there is no question to ask? Well, it is possible to explore data even without a problem – we call it ‘Data Mining’, which often reveals some interesting patterns within the data that are worth exploring.

Regardless of the type of data researchers explore, their mission and their audience’s vision guide them to find the patterns that shape the story they want to tell. One of the essential things expected from researchers while analyzing data is to stay open and remain unbiased toward unexpected patterns, expressions, and results. Remember, sometimes data analysis tells the most unforeseen yet exciting stories that were not expected when the analysis was initiated. Therefore, rely on the data you have at hand and enjoy the journey of exploratory research.


Types of data in research

Every kind of data describes something once a specific value is assigned to it. For analysis, you need to organize these values and process and present them in a given context to make them useful. Data can be in different forms; here are the primary data types.

  • Qualitative data: When the data presented has words and descriptions, we call it qualitative data. Although you can observe this data, it is subjective and harder to analyze, especially for comparison. Example: anything describing taste, experience, texture, or an opinion is qualitative data. This type of data is usually collected through focus groups, personal qualitative interviews, qualitative observation, or open-ended questions in surveys.
  • Quantitative data: Any data expressed in numbers or numerical figures is called quantitative data. This type of data can be distinguished into categories, grouped, measured, calculated, or ranked. Example: age, rank, cost, length, weight, scores, etc. all come under this type of data. You can present such data in graphical formats or charts, or apply statistical analysis methods to it. The OMS (Outcomes Measurement Systems) questionnaires in surveys are a significant source of numeric data.
  • Categorical data: This is data presented in groups. However, an item included in categorical data cannot belong to more than one group. Example: a person responding to a survey by indicating their living style, marital status, smoking habit, or drinking habit provides categorical data. A chi-square test is a standard method used to analyze this data (see the sketch below).
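
Below is a minimal sketch of such a chi-square test, assuming scipy and using made-up counts for two hypothetical categorical variables:

```python
# A minimal sketch of a chi-square test of independence (invented counts)
from scipy.stats import chi2_contingency

# Rows: smoker / non-smoker; columns: married / single (hypothetical)
table = [[30, 20],
         [25, 45]]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```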


Data analysis in qualitative research

Qualitative data analysis works a little differently from numerical data analysis, as qualitative data is made up of words, descriptions, images, objects, and sometimes symbols. Getting insight from such complicated information is a complicated process; hence, it is typically used for exploratory research and data analysis.

Finding patterns in the qualitative data

Although there are several ways to find patterns in textual information, a word-based method is the most relied-upon and widely used technique for research and data analysis. Notably, the data analysis process in qualitative research is largely manual: the researchers usually read the available data and find repetitive or commonly used words.

For example, while studying data collected from African countries to understand the most pressing issues people face, researchers might find “food” and “hunger” are the most commonly used words and will highlight them for further analysis.
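
A minimal sketch of this word-count approach, using only the Python standard library and a few invented responses:

```python
# A minimal word-frequency sketch over invented open-ended responses
import re
from collections import Counter

responses = [
    "Food prices keep rising and hunger is a daily worry",
    "We need help with food and clean water",
    "Hunger affects the children most",
]

stopwords = {"the", "and", "is", "a", "we", "with", "most"}
words = []
for text in responses:
    words += [w for w in re.findall(r"[a-z']+", text.lower()) if w not in stopwords]

# Highlight the most commonly used words for further analysis
for word, count in Counter(words).most_common(5):
    print(word, count)
```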


The keyword context is another widely used word-based technique. In this method, the researcher tries to understand the concept by analyzing the context in which the participants use a particular keyword.  

For example , researchers conducting research and data analysis for studying the concept of ‘diabetes’ amongst respondents might analyze the context of when and how the respondent has used or referred to the word ‘diabetes.’
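
A minimal keyword-in-context sketch, using invented transcript lines, that simply pulls out every line mentioning the keyword for closer reading:

```python
# A minimal keyword-in-context sketch (invented transcript lines)
keyword = "diabetes"
transcript = [
    "My mother was diagnosed with diabetes last year.",
    "I worry that diabetes will limit what I can eat.",
    "Exercise helps me manage my weight.",
]

# Keep only the lines where the keyword appears, for closer reading
for line in transcript:
    if keyword in line.lower():
        print(line)
```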

The scrutiny-based technique is also one of the highly recommended text analysis methods used to identify patterns in quality data. Compare and contrast is the most widely used method under this technique, differentiating how one piece of text is similar to or different from another.

For example: to find out the importance of a resident doctor in a company, the collected data is divided into people who think it is necessary to hire a resident doctor and those who think it is unnecessary. Compare and contrast is the best method for analyzing polls with single-answer question types.

Metaphors can be used to reduce the data pile and find patterns in it so that it becomes easier to connect data with theory.

Variable partitioning is another technique used to split variables so that researchers can find more coherent descriptions and explanations in enormous amounts of data.


Methods used for data analysis in qualitative research

There are several techniques for analyzing data in qualitative research, but here are some commonly used methods:

  • Content Analysis: This is the most widely accepted and most frequently employed technique for data analysis in research methodology. It can be used to analyze documented information from text, images, and sometimes physical items. When and where to use this method depends on the research questions.
  • Narrative Analysis: This method is used to analyze content gathered from various sources, such as personal interviews, field observation, and surveys. Most of the time, the stories or opinions shared by people are focused on finding answers to the research questions.
  • Discourse Analysis: Similar to narrative analysis, discourse analysis is used to analyze interactions with people. However, this particular method considers the social context within which the communication between the researcher and respondent takes place. Discourse analysis also focuses on the lifestyle and day-to-day environment while deriving any conclusion.
  • Grounded Theory: When you want to explain why a particular phenomenon happened, using grounded theory to analyze quality data is the best resort. Grounded theory is applied to study data about a host of similar cases occurring in different settings. When researchers use this method, they might alter explanations or produce new ones until they arrive at some conclusion.


Data analysis in quantitative research

Preparing data for analysis

The first stage in research and data analysis is to prepare the data for analysis so that nominal data can be converted into something meaningful. Data preparation consists of the phases below.

Phase I: Data Validation

Data validation is done to understand whether the collected data sample meets the pre-set standards or is a biased sample. It is divided into four stages (a code sketch of the completeness check follows this list):

  • Fraud: To ensure an actual human being records each response to the survey or the questionnaire
  • Screening: To make sure each participant or respondent is selected or chosen in compliance with the research criteria
  • Procedure: To ensure ethical standards were maintained while collecting the data sample
  • Completeness: To ensure that the respondent answered all the questions in an online survey or, in an interview, that the interviewer asked all the questions devised in the questionnaire.
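
A minimal sketch of the completeness check above, assuming pandas and invented column names:

```python
# A minimal completeness-validation sketch (invented survey data)
import pandas as pd

responses = pd.DataFrame({
    "respondent_id": [1, 2, 3, 4],
    "q1_age": [34, 51, None, 29],
    "q2_rating": [4, None, 5, 3],
})

# Flag respondents who skipped at least one question
incomplete = responses[responses.isna().any(axis=1)]
print(f"{len(incomplete)} of {len(responses)} responses are incomplete")
print(incomplete["respondent_id"].tolist())
```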

Phase II: Data Editing

More often than not, an extensive research data sample comes loaded with errors. Respondents sometimes fill in some fields incorrectly or skip them accidentally. Data editing is a process wherein researchers confirm that the provided data is free of such errors. They need to conduct the necessary checks, including outlier checks, to edit the raw data and make it ready for analysis.

Phase III: Data Coding

Out of all three, this is the most critical phase of data preparation, associated with grouping and assigning values to the survey responses. If a survey is completed with a sample size of 1,000, the researcher will create age brackets to distinguish the respondents by age. It thus becomes easier to analyze small data buckets rather than deal with the massive data pile.
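
A minimal sketch of this coding step, assuming pandas and made-up ages; the bracket boundaries are arbitrary:

```python
# A minimal data-coding sketch: binning ages into brackets (invented ages)
import pandas as pd

ages = pd.Series([19, 23, 35, 41, 52, 60, 68, 74])

brackets = pd.cut(
    ages,
    bins=[18, 30, 45, 60, 120],
    labels=["18-30", "31-45", "46-60", "60+"],
)
print(brackets.value_counts().sort_index())
```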


Methods used for data analysis in quantitative research

After the data is prepared for analysis, researchers are open to using different research and data analysis methods to derive meaningful insights. Statistical analysis is by far the most favored approach for numerical data. In statistical analysis, distinguishing between categorical data and numerical data is essential, as categorical data involves distinct categories or labels, while numerical data consists of measurable quantities. The methods fall into two groups: descriptive statistics, used to describe the data, and inferential statistics, which help compare the data and draw conclusions.

Descriptive statistics

This method is used to describe the basic features of versatile types of data in research. It presents the data in such a meaningful way that patterns in the data start making sense. Nevertheless, descriptive analysis does not support conclusions beyond the data at hand; any conclusions are based on the hypotheses researchers have formulated so far. Here are a few major types of descriptive analysis methods, with a short code sketch after the lists.

Measures of Frequency

  • Count, Percent, Frequency
  • It is used to denote how often a particular event occurs.
  • Researchers use it when they want to showcase how often a response is given.

Measures of Central Tendency

  • Mean, Median, Mode
  • The method is widely used to summarize a distribution by its central point.
  • Researchers use this method when they want to showcase the most common or average response.

Measures of Dispersion or Variation

  • Range, Variance, Standard deviation
  • The range is the difference between the highest and lowest scores.
  • The variance and standard deviation describe how far observed scores deviate from the mean.
  • These measures identify the spread of scores by stating intervals.
  • Researchers use this method to show how spread out the data is, and how much that spread affects the mean.

Measures of Position

  • Percentile ranks, Quartile ranks
  • It relies on standardized scores, helping researchers to identify the relationship between different scores.
  • It is often used when researchers want to compare scores against a standard or the average.
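
The following minimal sketch, using numpy and made-up scores, computes one example from each family of measures listed above:

```python
# A minimal sketch of the four families of descriptive measures (invented scores)
import numpy as np

scores = np.array([55, 60, 62, 62, 70, 75, 80, 85, 90, 95])

# Measures of frequency
values, counts = np.unique(scores, return_counts=True)

# Measures of central tendency
mean, median = scores.mean(), np.median(scores)
mode = values[counts.argmax()]

# Measures of dispersion or variation
rng = scores.max() - scores.min()
variance, std = scores.var(ddof=1), scores.std(ddof=1)

# Measures of position
q1, q3 = np.percentile(scores, [25, 75])

print(f"mean={mean:.1f}, median={median}, mode={mode}")
print(f"range={rng}, variance={variance:.1f}, sd={std:.1f}")
print(f"25th percentile={q1}, 75th percentile={q3}")
```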

For quantitative research, descriptive analysis often gives absolute numbers, but on its own it is never sufficient to demonstrate the rationale behind those numbers. Nevertheless, it is necessary to think of the best method for research and data analysis suiting your survey questionnaire and the story researchers want to tell. For example, the mean is the best way to demonstrate students’ average scores in schools. It is better to rely on descriptive statistics when the researchers intend to keep the research or outcome limited to the provided sample without generalizing it: for example, when you want to compare the average voting done in two different cities, descriptive statistics are enough.

Descriptive analysis is also called a ‘univariate analysis’ since it is commonly used to analyze a single variable.

Inferential statistics

Inferential statistics are used to make predictions about a larger population after research and data analysis of a sample representing that population. For example, you can ask some 100 audience members at a movie theater whether they like the movie they are watching. Researchers then use inferential statistics on the collected sample to reason that about 80–90% of people like the movie.
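
A minimal sketch of the movie-theater example, using only the standard library and a normal-approximation confidence interval (the count of 85 is made up):

```python
# A minimal inference sketch: estimating a population proportion from a sample
import math

n = 100     # sample size
liked = 85  # viewers who said they liked the movie (invented count)

p_hat = liked / n
# 95% confidence interval, normal approximation
margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
print(f"Sample proportion: {p_hat:.2f}")
print(f"95% CI: {p_hat - margin:.2f} to {p_hat + margin:.2f}")
```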

Here are two significant areas of inferential statistics.

  • Estimating parameters: It takes statistics from the sample research data and demonstrates something about the population parameter.
  • Hypothesis test: It’s about sampling research data to answer the survey research questions. For example, researchers might be interested to understand whether the new shade of lipstick recently launched is good or not, or whether multivitamin capsules help children perform better at games.

These are sophisticated analysis methods used to showcase the relationship between different variables instead of describing a single variable. It is often used when researchers want something beyond absolute numbers to understand the relationship between variables.

Here are some of the commonly used methods for data analysis in research.

  • Correlation: When researchers are not conducting experimental or quasi-experimental research but are interested in understanding the relationship between two or more variables, they opt for correlational research methods.
  • Cross-tabulation: Also called contingency tables, cross-tabulation is used to analyze the relationship between multiple variables. Suppose the provided data has age and gender categories presented in rows and columns. A two-dimensional cross-tabulation helps for seamless research and data analysis by showing the number of males and females in each age category (see the sketch after this list).
  • Regression analysis: To understand the strength of the relationship between two variables, researchers rarely look beyond the primary and commonly used regression analysis method, which is also a type of predictive analysis. In this method, you have an essential factor called the dependent variable, along with multiple independent variables. You undertake efforts to find out the impact of the independent variables on the dependent variable. The values of both independent and dependent variables are assumed to be ascertained in an error-free, random manner.
  • Frequency tables: Frequency tables record how often each value of a variable occurs; they are often the starting point for comparing the distribution of responses between groups.
  • Analysis of variance: This statistical procedure is used to test the degree to which two or more groups vary or differ in an experiment. A considerable degree of variation means the research findings were significant. In many contexts, ANOVA testing and variance analysis are similar.
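
A minimal sketch of a two-dimensional cross-tabulation, assuming pandas and made-up respondents:

```python
# A minimal cross-tabulation sketch (invented respondents)
import pandas as pd

df = pd.DataFrame({
    "gender": ["M", "F", "F", "M", "F", "M", "F", "M"],
    "age_group": ["18-30", "18-30", "31-45", "31-45",
                  "18-30", "46+", "46+", "18-30"],
})

# Number of males and females in each age category
print(pd.crosstab(df["age_group"], df["gender"]))
```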
Considerations in research data analysis

  • Researchers must have the necessary research skills to analyze and manipulate the data, and be trained to demonstrate a high standard of research practice. Ideally, researchers should possess more than a basic understanding of the rationale for selecting one statistical method over another to obtain better data insights.
  • Usually, research and data analytics projects differ by scientific discipline; therefore, getting statistical advice at the beginning of the analysis helps design the survey questionnaire, select data collection methods, and choose samples.


  • The primary aim of research data analysis is to derive ultimate insights that are unbiased. Any mistake in collecting data, selecting an analysis method, or choosing an audience sample with a biased mind will lead to a biased inference.
  • No degree of sophistication in research data analysis can rectify poorly defined objective outcome measurements. Whether the design is at fault or the intentions are not clear, a lack of clarity can mislead readers, so avoid the practice.
  • The motive behind data analysis in research is to present accurate and reliable data. As far as possible, avoid statistical errors, and find ways to deal with everyday challenges like outliers, missing data, data alteration, data mining, and graphical representation.

The sheer amount of data generated daily is frightening, especially now that data analysis has taken center stage. In 2018, the total data supply amounted to 2.8 trillion gigabytes. Hence, it is clear that enterprises willing to survive in the hypercompetitive world must possess an excellent capability to analyze complex research data, derive actionable insights, and adapt to new market needs.


QuestionPro is an online survey platform that empowers organizations in data analysis and research and provides them with a medium to collect data by creating appealing surveys.


Can J Hosp Pharm. 2015;68(4)

Creating a Data Analysis Plan: What to Consider When Choosing Statistics for a Study

There are three kinds of lies: lies, damned lies, and statistics. – Mark Twain 1

INTRODUCTION

Statistics represent an essential part of a study because, regardless of the study design, investigators need to summarize the collected information for interpretation and presentation to others. It is therefore important for us to heed Mr Twain’s concern when creating the data analysis plan. In fact, even before data collection begins, we need to have a clear analysis plan that will guide us from the initial stages of summarizing and describing the data through to testing our hypotheses.

The purpose of this article is to help you create a data analysis plan for a quantitative study. For those interested in conducting qualitative research, previous articles in this Research Primer series have provided information on the design and analysis of such studies. 2 , 3 Information in the current article is divided into 3 main sections: an overview of terms and concepts used in data analysis, a review of common methods used to summarize study data, and a process to help identify relevant statistical tests. My intention here is to introduce the main elements of data analysis and provide a place for you to start when planning this part of your study. Biostatistical experts, textbooks, statistical software packages, and other resources can certainly add more breadth and depth to this topic when you need additional information and advice.

TERMS AND CONCEPTS USED IN DATA ANALYSIS

When analyzing information from a quantitative study, we are often dealing with numbers; therefore, it is important to begin with an understanding of the source of the numbers. Let us start with the term variable , which defines a specific item of information collected in a study. Examples of variables include age, sex or gender, ethnicity, exercise frequency, weight, treatment group, and blood glucose. Each variable will have a group of categories, which are referred to as values , to help describe the characteristic of an individual study participant. For example, the variable “sex” would have values of “male” and “female”.

Although variables can be defined or grouped in various ways, I will focus on 2 methods at this introductory stage. First, variables can be defined according to the level of measurement. The categories in a nominal variable are names, for example, male and female for the variable “sex”; white, Aboriginal, black, Latin American, South Asian, and East Asian for the variable “ethnicity”; and intervention and control for the variable “treatment group”. Nominal variables with only 2 categories are also referred to as dichotomous variables because the study group can be divided into 2 subgroups based on information in the variable. For example, a study sample can be split into 2 groups (patients receiving the intervention and controls) using the dichotomous variable “treatment group”. An ordinal variable implies that the categories can be placed in a meaningful order, as would be the case for exercise frequency (never, sometimes, often, or always). Nominal-level and ordinal-level variables are also referred to as categorical variables, because each category in the variable can be completely separated from the others. The categories for an interval variable can be placed in a meaningful order, with the interval between consecutive categories also having meaning. Age, weight, and blood glucose can be considered as interval variables, but also as ratio variables, because the ratio between values has meaning (e.g., a 15-year-old is half the age of a 30-year-old). Interval-level and ratio-level variables are also referred to as continuous variables because of the underlying continuity among categories.

As we progress through the levels of measurement from nominal to ratio variables, we gather more information about the study participant. The amount of information that a variable provides will become important in the analysis stage, because we lose information when variables are reduced or aggregated—a common practice that is not recommended. 4 For example, if age is reduced from a ratio-level variable (measured in years) to an ordinal variable (categories of < 65 and ≥ 65 years) we lose the ability to make comparisons across the entire age range and introduce error into the data analysis. 4

A second method of defining variables is to consider them as either dependent or independent. As the terms imply, the value of a dependent variable depends on the value of other variables, whereas the value of an independent variable does not rely on other variables. In addition, an investigator can influence the value of an independent variable, such as treatment-group assignment. Independent variables are also referred to as predictors because we can use information from these variables to predict the value of a dependent variable. Building on the group of variables listed in the first paragraph of this section, blood glucose could be considered a dependent variable, because its value may depend on values of the independent variables age, sex, ethnicity, exercise frequency, weight, and treatment group.

Statistics are mathematical formulae that are used to organize and interpret the information that is collected through variables. There are 2 general categories of statistics, descriptive and inferential. Descriptive statistics are used to describe the collected information, such as the range of values, their average, and the most common category. Knowledge gained from descriptive statistics helps investigators learn more about the study sample. Inferential statistics are used to make comparisons and draw conclusions from the study data. Knowledge gained from inferential statistics allows investigators to make inferences and generalize beyond their study sample to other groups.

Before we move on to specific descriptive and inferential statistics, there are 2 more definitions to review. Parametric statistics are generally used when values in an interval-level or ratio-level variable are normally distributed (i.e., the entire group of values has a bell-shaped curve when plotted by frequency). These statistics are used because we can define parameters of the data, such as the centre and width of the normally distributed curve. In contrast, interval-level and ratio-level variables with values that are not normally distributed, as well as nominal-level and ordinal-level variables, are generally analyzed using nonparametric statistics.

METHODS FOR SUMMARIZING STUDY DATA: DESCRIPTIVE STATISTICS

The first step in a data analysis plan is to describe the data collected in the study. This can be done using figures to give a visual presentation of the data and statistics to generate numeric descriptions of the data.

Selection of an appropriate figure to represent a particular set of data depends on the measurement level of the variable. Data for nominal-level and ordinal-level variables may be interpreted using a pie graph or bar graph . Both options allow us to examine the relative number of participants within each category (by reporting the percentages within each category), whereas a bar graph can also be used to examine absolute numbers. For example, we could create a pie graph to illustrate the proportions of men and women in a study sample and a bar graph to illustrate the number of people who report exercising at each level of frequency (never, sometimes, often, or always).

Interval-level and ratio-level variables may also be interpreted using a pie graph or bar graph; however, these types of variables often have too many categories for such graphs to provide meaningful information. Instead, these variables may be better interpreted using a histogram . Unlike a bar graph, which displays the frequency for each distinct category, a histogram displays the frequency within a range of continuous categories. Information from this type of figure allows us to determine whether the data are normally distributed. In addition to pie graphs, bar graphs, and histograms, many other types of figures are available for the visual representation of data. Interested readers can find additional types of figures in the books recommended in the “Further Readings” section.

Figures are also useful for visualizing comparisons between variables or between subgroups within a variable (for example, the distribution of blood glucose according to sex). Box plots are useful for summarizing information for a variable that does not follow a normal distribution. The lower and upper limits of the box identify the interquartile range (or 25th and 75th percentiles), while the midline indicates the median value (or 50th percentile). Scatter plots provide information on how the categories for one continuous variable relate to categories in a second variable; they are often helpful in the analysis of correlations.
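
A minimal sketch, assuming matplotlib and simulated values, of the histogram and box-plot comparisons described above:

```python
# A minimal figure sketch: histogram plus box plots by subgroup (simulated data)
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
glucose = rng.normal(5.5, 0.8, 200)   # simulated blood glucose, mmol/L
sex = rng.choice(["Male", "Female"], 200)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Histogram: is the variable approximately normally distributed?
ax1.hist(glucose, bins=20)
ax1.set_title("Histogram of blood glucose")

# Box plots: compare the distribution between subgroups
ax2.boxplot([glucose[sex == "Male"], glucose[sex == "Female"]],
            labels=["Male", "Female"])
ax2.set_title("Blood glucose by sex")

plt.tight_layout()
plt.show()
```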

In addition to using figures to present a visual description of the data, investigators can use statistics to provide a numeric description. Regardless of the measurement level, we can find the mode by identifying the most frequent category within a variable. When summarizing nominal-level and ordinal-level variables, the simplest method is to report the proportion of participants within each category.

The choice of the most appropriate descriptive statistic for interval-level and ratio-level variables will depend on how the values are distributed. If the values are normally distributed, we can summarize the information using the parametric statistics of mean and standard deviation. The mean is the arithmetic average of all values within the variable, and the standard deviation tells us how widely the values are dispersed around the mean. When values of interval-level and ratio-level variables are not normally distributed, or we are summarizing information from an ordinal-level variable, it may be more appropriate to use the nonparametric statistics of median and range. The first step in identifying these descriptive statistics is to arrange study participants according to the variable categories from lowest value to highest value. The range is used to report the lowest and highest values. The median or 50th percentile is located by dividing the number of participants into 2 groups, such that half (50%) of the participants have values above the median and the other half (50%) have values below the median. Similarly, the 25th percentile is the value with 25% of the participants having values below and 75% of the participants having values above, and the 75th percentile is the value with 75% of participants having values below and 25% of participants having values above. Together, the 25th and 75th percentiles define the interquartile range .

PROCESS TO IDENTIFY RELEVANT STATISTICAL TESTS: INFERENTIAL STATISTICS

One caveat about the information provided in this section: selecting the most appropriate inferential statistic for a specific study should be a combination of following these suggestions, seeking advice from experts, and discussing with your co-investigators. My intention here is to give you a place to start a conversation with your colleagues about the options available as you develop your data analysis plan.

There are 3 key questions to consider when selecting an appropriate inferential statistic for a study: What is the research question? What is the study design? and What is the level of measurement? It is important for investigators to carefully consider these questions when developing the study protocol and creating the analysis plan. The figures that accompany these questions show decision trees that will help you to narrow down the list of inferential statistics that would be relevant to a particular study. Appendix 1 provides brief definitions of the inferential statistics named in these figures. Additional information, such as the formulae for various inferential statistics, can be obtained from textbooks, statistical software packages, and biostatisticians.

What Is the Research Question?

The first step in identifying relevant inferential statistics for a study is to consider the type of research question being asked. You can find more details about the different types of research questions in a previous article in this Research Primer series that covered questions and hypotheses. 5 A relational question seeks information about the relationship among variables; in this situation, investigators will be interested in determining whether there is an association ( Figure 1 ). A causal question seeks information about the effect of an intervention on an outcome; in this situation, the investigator will be interested in determining whether there is a difference ( Figure 2 ).

Figure 1. Decision tree to identify inferential statistics for an association.

Figure 2. Decision tree to identify inferential statistics for measuring a difference.

What Is the Study Design?

When considering a question of association, investigators will be interested in measuring the relationship between variables ( Figure 1 ). A study designed to determine whether there is consensus among different raters will be measuring agreement. For example, an investigator may be interested in determining whether 2 raters, using the same assessment tool, arrive at the same score. Correlation analyses examine the strength of a relationship or connection between 2 variables, like age and blood glucose. Regression analyses also examine the strength of a relationship or connection; however, in this type of analysis, one variable is considered an outcome (or dependent variable) and the other variable is considered a predictor (or independent variable). Regression analyses often consider the influence of multiple predictors on an outcome at the same time. For example, an investigator may be interested in examining the association between a treatment and blood glucose, while also considering other factors, like age, sex, ethnicity, exercise frequency, and weight.

When considering a question of difference, investigators must first determine how many groups they will be comparing. In some cases, investigators may be interested in comparing the characteristic of one group with that of an external reference group. For example, is the mean age of study participants similar to the mean age of all people in the target group? If more than one group is involved, then investigators must also determine whether there is an underlying connection between the sets of values (or samples ) to be compared. Samples are considered independent or unpaired when the information is taken from different groups. For example, we could use an unpaired t test to compare the mean age between 2 independent samples, such as the intervention and control groups in a study. Samples are considered related or paired if the information is taken from the same group of people, for example, measurement of blood glucose at the beginning and end of a study. Because blood glucose is measured in the same people at both time points, we could use a paired t test to determine whether there has been a significant change in blood glucose.
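
A minimal sketch of both comparisons, assuming scipy and made-up measurements:

```python
# A minimal sketch of unpaired vs paired comparisons (invented data)
from scipy import stats

# Unpaired: mean age in two independent samples (intervention vs control)
age_intervention = [54, 61, 58, 49, 65, 57]
age_control = [52, 59, 63, 55, 60, 51]
unpaired = stats.ttest_ind(age_intervention, age_control)

# Paired: blood glucose in the same people at the start and end of a study
glucose_start = [7.2, 6.8, 8.1, 7.5, 6.9, 7.8]
glucose_end = [6.5, 6.4, 7.3, 7.0, 6.6, 7.1]
paired = stats.ttest_rel(glucose_start, glucose_end)

print(f"Unpaired t test: p = {unpaired.pvalue:.3f}")
print(f"Paired t test:   p = {paired.pvalue:.3f}")
```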

What Is the Level of Measurement?

As described in the first section of this article, variables can be grouped according to the level of measurement (nominal, ordinal, or interval). In most cases, the independent variable in an inferential statistic will be nominal; therefore, investigators need to know the level of measurement for the dependent variable before they can select the relevant inferential statistic. Two exceptions to this consideration are correlation analyses and regression analyses ( Figure 1 ). Because a correlation analysis measures the strength of association between 2 variables, we need to consider the level of measurement for both variables. Regression analyses can consider multiple independent variables, often with a variety of measurement levels. However, for these analyses, investigators still need to consider the level of measurement for the dependent variable.

Selection of inferential statistics to test interval-level variables must include consideration of how the data are distributed. An underlying assumption for parametric tests is that the data approximate a normal distribution. When the data are not normally distributed, information derived from a parametric test may be wrong. 6 When the assumption of normality is violated (for example, when the data are skewed), then investigators should use a nonparametric test. If the data are normally distributed, then investigators can use a parametric test.

ADDITIONAL CONSIDERATIONS

What Is the Level of Significance?

An inferential statistic is used to calculate a p value, the probability of obtaining the observed data by chance. Investigators can then compare this p value against a prespecified level of significance, which is often chosen to be 0.05. This level of significance represents a 1 in 20 chance that the observation is wrong, which is considered an acceptable level of error.

What Are the Most Commonly Used Statistics?

In 1983, Emerson and Colditz 7 reported the first review of statistics used in original research articles published in the New England Journal of Medicine . This review of statistics used in the journal was updated in 1989 and 2005, 8 and this type of analysis has been replicated in many other journals. 9 – 13 Collectively, these reviews have identified 2 important observations. First, the overall sophistication of statistical methodology used and reported in studies has grown over time, with survival analyses and multivariable regression analyses becoming much more common. The second observation is that, despite this trend, 1 in 4 articles describe no statistical methods or report only simple descriptive statistics. When inferential statistics are used, the most common are t tests, contingency table tests (for example, χ 2 test and Fisher exact test), and simple correlation and regression analyses. This information is important for educators, investigators, reviewers, and readers because it suggests that a good foundational knowledge of descriptive statistics and common inferential statistics will enable us to correctly evaluate the majority of research articles. 11 – 13 However, to fully take advantage of all research published in high-impact journals, we need to become acquainted with some of the more complex methods, such as multivariable regression analyses. 8 , 13

What Are Some Additional Resources?

As an investigator and Associate Editor with CJHP , I have often relied on the advice of colleagues to help create my own analysis plans and review the plans of others. Biostatisticians have a wealth of knowledge in the field of statistical analysis and can provide advice on the correct selection, application, and interpretation of these methods. Colleagues who have “been there and done that” with their own data analysis plans are also valuable sources of information. Identify these individuals and consult with them early and often as you develop your analysis plan.

Another important resource to consider when creating your analysis plan is textbooks. Numerous statistical textbooks are available, differing in levels of complexity and scope. The titles listed in the “Further Reading” section are just a few suggestions. I encourage interested readers to look through these and other books to find resources that best fit their needs. However, one crucial book that I highly recommend to anyone wanting to be an investigator or peer reviewer is Lang and Secic’s How to Report Statistics in Medicine (see “Further Reading”). As the title implies, this book covers a wide range of statistics used in medical research and provides numerous examples of how to correctly report the results.

CONCLUSIONS

When it comes to creating an analysis plan for your project, I recommend following the sage advice of Douglas Adams in The Hitchhiker’s Guide to the Galaxy : Don’t panic! 14 Begin with simple methods to summarize and visualize your data, then use the key questions and decision trees provided in this article to identify relevant statistical tests. Information in this article will give you and your co-investigators a place to start discussing the elements necessary for developing an analysis plan. But do not stop there! Use advice from biostatisticians and more experienced colleagues, as well as information in textbooks, to help create your analysis plan and choose the most appropriate statistics for your study. Making careful, informed decisions about the statistics to use in your study should reduce the risk of confirming Mr Twain’s concern.

Appendix 1. Glossary of statistical terms (part 1 of 2)

  • 1-way ANOVA: Uses 1 variable to define the groups for comparing means. This is similar to the Student t test when comparing the means of 2 groups.
  • Kruskal–Wallis 1-way ANOVA: Nonparametric alternative for the 1-way ANOVA. Used to determine the difference in medians between 3 or more groups.
  • n -way ANOVA: Uses 2 or more variables to define groups when comparing means. Also called a “between-subjects factorial ANOVA”.
  • Repeated-measures ANOVA: A method for analyzing whether the means of 3 or more measures from the same group of participants are different.
  • Friedman ANOVA: Nonparametric alternative for the repeated-measures ANOVA. It is often used to compare rankings and preferences that are measured 3 or more times.
  • Fisher exact: Variation of chi-square that accounts for cell counts < 5.
  • McNemar: Variation of chi-square that tests statistical significance of changes in 2 paired measurements of dichotomous variables.
  • Cochran Q: An extension of the McNemar test that provides a method for testing for differences between 3 or more matched sets of frequencies or proportions. Often used as a measure of heterogeneity in meta-analyses.
  • 1-sample t test: Used to determine whether the mean of a sample is significantly different from a known or hypothesized value.
  • Independent-samples t test (also referred to as the Student t test): Used when the independent variable is a nominal-level variable that identifies 2 groups and the dependent variable is an interval-level variable.
  • Paired t test: Used to compare 2 pairs of scores between 2 groups (e.g., baseline and follow-up blood pressure in the intervention and control groups).


This article is the 12th in the CJHP Research Primer Series, an initiative of the CJHP Editorial Board and the CSHP Research Committee. The planned 2-year series is intended to appeal to relatively inexperienced researchers, with the goal of building research capacity among practising pharmacists. The articles, presenting simple but rigorous guidance to encourage and support novice researchers, are being solicited from authors with appropriate expertise.

Previous articles in this series:

  • Bond CM. The research jigsaw: how to get started. Can J Hosp Pharm. 2014;67(1):28–30.
  • Tully MP. Research: articulating questions, generating hypotheses, and choosing study designs. Can J Hosp Pharm. 2014;67(1):31–4.
  • Loewen P. Ethical issues in pharmacy practice research: an introductory guide. Can J Hosp Pharm. 2014;67(2):133–7.
  • Tsuyuki RT. Designing pharmacy practice research trials. Can J Hosp Pharm. 2014;67(3):226–9.
  • Bresee LC. An introduction to developing surveys for pharmacy practice research. Can J Hosp Pharm. 2014;67(4):286–91.
  • Gamble JM. An introduction to the fundamentals of cohort and case–control studies. Can J Hosp Pharm. 2014;67(5):366–72.
  • Austin Z, Sutton J. Qualitative research: getting started. Can J Hosp Pharm. 2014;67(6):436–40.
  • Houle S. An introduction to the fundamentals of randomized controlled trials in pharmacy research. Can J Hosp Pharm. 2014;68(1):28–32.
  • Charrois TL. Systematic reviews: What do you need to know to get started? Can J Hosp Pharm. 2014;68(2):144–8.
  • Sutton J, Austin Z. Qualitative research: data collection, analysis, and management. Can J Hosp Pharm. 2014;68(3):226–31.
  • Cadarette SM, Wong L. An introduction to health care administrative data. Can J Hosp Pharm. 2014;68(3):232–7.

Competing interests: None declared.

Further Reading

  • Devor J, Peck R. Statistics: the exploration and analysis of data. 7th ed. Boston (MA): Brooks/Cole Cengage Learning; 2012.
  • Lang TA, Secic M. How to report statistics in medicine: annotated guidelines for authors, editors, and reviewers. 2nd ed. Philadelphia (PA): American College of Physicians; 2006.
  • Mendenhall W, Beaver RJ, Beaver BM. Introduction to probability and statistics. 13th ed. Belmont (CA): Brooks/Cole Cengage Learning; 2009.
  • Norman GR, Streiner DL. PDQ statistics. 3rd ed. Hamilton (ON): B.C. Decker; 2003.
  • Plichta SB, Kelvin E. Munro’s statistical methods for health care research. 6th ed. Philadelphia (PA): Wolters Kluwer Health/Lippincott, Williams & Wilkins; 2013.

Data Analysis Plan: Ultimate Guide and Examples



Once you get survey feedback , you might think that the job is done. The next step, however, is to analyze those results. Creating a data analysis plan will help guide you through how to analyze the data and come to logical conclusions.

So, how do you create a data analysis plan? It starts with the goals you set for your survey in the first place. This guide will help you create a data analysis plan that will effectively utilize the data your respondents provided.

What can a data analysis plan do?

Think of data analysis plans as a guide to your organization and analysis, which will help you accomplish your ultimate survey goals. A good plan will make sure that you get answers to your top questions, such as “how do customers feel about this new product?” through specific survey questions. It will also separate respondents to see how opinions among various demographics may differ.

Creating a data analysis plan

Follow these steps to create your own data analysis plan.

Review your goals

When you plan a survey, you typically have specific goals in mind. That might be measuring customer sentiment, answering an academic question, or achieving another purpose.

If you’re beta testing a new product, your survey goal might be “find out how potential customers feel about the new product.” You probably came up with several topics you wanted to address, such as:

  • What is the typical experience with the product?
  • Which demographics are responding most positively? How well does this match with our idea of the target market?
  • Are there any specific pain points that need to be corrected before the product launches?
  • Are there any features that should be added before the product launches?

Use these objectives to organize your survey data.

Evaluate the results for your top questions

Your survey questions probably included at least one or two questions that directly relate to your primary goals. For example, in the beta testing example above, your top two questions might be:

  • How would you rate your overall satisfaction with the product?
  • Would you consider purchasing this product?

Those questions offer a general overview of how your customers feel. Whether their sentiments are generally positive, negative, or neutral, this is the main data your company needs. The next goal is to determine why the beta testers feel the way they do.

Assign questions to specific goals

Next, you’ll organize your survey questions and responses by which research question they answer. For example, you might assign questions to the “overall satisfaction” section, like:

  • How would you describe your experience with the product?
  • Did you encounter any problems while using the product?
  • What were your favorite/least favorite features?
  • How useful was the product in achieving your goals?

Under demographics, you’d include responses to questions like:

  • Age
  • Location
  • Education level
  • Income bracket
  • Industry

This helps you determine which questions and answers will answer larger questions, such as “which demographics are most likely to have had a positive experience?”

Pay special attention to demographics

Demographics are particularly important to a data analysis plan. Of course you’ll want to know what kind of experience your product testers are having with the product—but you also want to know who your target market should be. Separating responses based on demographics can be especially illuminating.

For example, you might find that users aged 25 to 45 find the product easier to use, but people over 65 find it too difficult. If you want to target the over-65 demographic, you can use that group’s survey data to refine the product before it launches.

Other demographic segmentation can be helpful, too. You might find that your product is popular with people from the tech industry, who have an easier time with a user interface, while those from other industries, like education, struggle to use the tool effectively. If you’re targeting the tech industry, you may not need to make adjustments—but if it’s a technological tool designed primarily for educators, you’ll want to make appropriate changes.

Similarly, factors like location, education level, income bracket, and other demographics can help you compare experiences between the groups. Depending on your ultimate survey goals, you may want to compare multiple demographic types to get accurate insight into your results.
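To make these comparisons concrete, here is a minimal sketch of how you might segment responses by demographic fields using Python's pandas library. The column names (age_group, industry, satisfaction) are hypothetical stand-ins for whatever your survey export actually contains.

```python
import pandas as pd

# Hypothetical survey export: one row per respondent.
# All column names here are illustrative, not from any specific tool.
responses = pd.DataFrame({
    "age_group":    ["25-45", "25-45", "46-65", "65+", "65+"],
    "industry":     ["tech", "education", "tech", "education", "tech"],
    "satisfaction": [5, 4, 4, 2, 3],   # 1-5 rating question
})

# Average satisfaction per demographic segment
print(responses.groupby("age_group")["satisfaction"].mean())

# Compare across two demographic dimensions at once
print(responses.groupby(["age_group", "industry"])["satisfaction"].mean())
```

The same groupby pattern extends to any demographic column you collected, such as location or income bracket.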

Consider correlation vs. causation

When creating your data analysis plan, remember to consider the difference between correlation and causation. For instance, being over 65 might correlate with a difficult user experience, but the cause of the experience might be something else entirely. You may find that your respondents over 65 are primarily from a specific educational background, or have issues reading the text in your user interface. It’s important to consider all the different data points, and how they might have an effect on the overall results.

Moving on to analysis

Once you’ve assigned survey questions to the overall research questions they’re designed to answer, you can move on to the actual data analysis. Depending on your survey tool, you may already have software that can perform quantitative and/or qualitative analysis. Choose the analysis types that suit your questions and goals, then use your analytic software to evaluate the data and create graphs or reports with your survey results.

At the end of the process, you should be able to answer your major research questions.

Power your data analysis with Voiceform

Once you have established your survey goals, Voiceform can power your data collection and analysis. Our feature-rich survey platform offers an easy-to-use interface, multi-channel survey tools, multimedia question types, and powerful analytics. We can help you create and work through a data analysis plan. Find out more about the product, and book a free demo today!



Writing the Data Analysis Plan

  • First Online: 01 January 2010


  • A. T. Panter


You and your project statistician have one major goal for your data analysis plan: You need to convince all the reviewers reading your proposal that you would know what to do with your data once your project is funded and your data are in hand. The data analytic plan is a signal to the reviewers about your ability to score, describe, and thoughtfully synthesize a large number of variables into appropriately selected quantitative models once the data are collected. Reviewers respond very well to plans with a clear elucidation of the data analysis steps – in an appropriate order, with an appropriate level of detail, with reference to relevant literatures, and with statistical models and methods that map well onto your proposed aims. A successful data analysis plan produces reviews that either include no comments about the data analysis plan or, better yet, compliment it for being comprehensive and logical given your aims. This chapter offers practical advice about developing and writing a compelling, “bullet-proof” data analytic plan for your grant application.




Author information

Authors and Affiliations

L. L. Thurstone Psychometric Laboratory, Department of Psychology, University of North Carolina, Chapel Hill, NC, USA

A. T. Panter


Editor information

Editors and Affiliations

National Institute of Mental Health, 6001 Executive Blvd., Bethesda, MD 20892-9641, USA

Willo Pequegnat

Ellen Stover

1413 Delafield Place N.W., Washington, DC 20011, USA

Cheryl Anne Boyce


Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Panter, A.T. (2010). Writing the Data Analysis Plan. In: Pequegnat, W., Stover, E., Boyce, C. (eds) How to Write a Successful Research Grant Application. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-1454-5_22


DOI: https://doi.org/10.1007/978-1-4419-1454-5_22

Published: 20 August 2010

Publisher Name: Springer, Boston, MA

Print ISBN: 978-1-4419-1453-8

Online ISBN: 978-1-4419-1454-5

eBook Packages: Medicine (R0)



Qualitative Data Analysis Methods 101:

The “big 6” methods + examples.

By: Kerryn Warren (PhD) | Reviewed By: Eunice Rautenbach (D.Tech) | May 2020 (Updated April 2023)

Qualitative data analysis methods. Wow, that’s a mouthful. 

If you’re new to the world of research, qualitative data analysis can look rather intimidating. So much bulky terminology and so many abstract, fluffy concepts. It certainly can be a minefield!

Don’t worry – in this post, we’ll unpack the most popular analysis methods , one at a time, so that you can approach your analysis with confidence and competence – whether that’s for a dissertation, thesis or really any kind of research project.


What (exactly) is qualitative data analysis?

To understand qualitative data analysis, we need to first understand qualitative data – so let’s step back and ask the question, “what exactly is qualitative data?”.

Qualitative data refers to pretty much any data that’s “not numbers”. In other words, it’s not the stuff you measure using a fixed scale or complex equipment, nor do you analyse it using complex statistics or mathematics.

So, if it’s not numbers, what is it?

Words, you guessed? Well… sometimes, yes. Qualitative data can, and often does, take the form of interview transcripts, documents and open-ended survey responses – but it can also involve the interpretation of images and videos. In other words, qualitative isn’t just limited to text-based data.

So, how’s that different from quantitative data, you ask?

Simply put, qualitative research focuses on words, descriptions, concepts or ideas – while quantitative research focuses on numbers and statistics. Qualitative research investigates the “softer side” of things to explore and describe, while quantitative research focuses on the “hard numbers”, to measure differences between variables and the relationships between them. If you’re keen to learn more about the differences between qual and quant, we’ve got a detailed post over here.


So, qualitative analysis is easier than quantitative, right?

Not quite. In many ways, qualitative data can be challenging and time-consuming to analyse and interpret. At the end of your data collection phase (which itself takes a lot of time), you’ll likely have many pages of text-based data or hours upon hours of audio to work through. You might also have subtle nuances of interactions or discussions that have danced around in your mind, or that you scribbled down in messy field notes. All of this needs to work its way into your analysis.

Making sense of all of this is no small task and you shouldn’t underestimate it. Long story short – qualitative analysis can be a lot of work! Of course, quantitative analysis is no piece of cake either, but it’s important to recognise that qualitative analysis still requires a significant investment in terms of time and effort.


In this post, we’ll explore qualitative data analysis by looking at some of the most common analysis methods we encounter. We’re not going to cover every possible qualitative method and we’re not going to go into heavy detail – we’re just going to give you the big picture. That said, we will of course include links to loads of extra resources so that you can learn more about whichever analysis method interests you.

Without further delay, let’s get into it.

The “Big 6” Qualitative Analysis Methods 

There are many different types of qualitative data analysis, all of which serve different purposes and have unique strengths and weaknesses. We’ll start by outlining the analysis methods and then we’ll dive into the details for each.

The 6 most popular methods (or at least the ones we see at Grad Coach) are:

  • Content analysis
  • Narrative analysis
  • Discourse analysis
  • Thematic analysis
  • Grounded theory (GT)
  • Interpretive phenomenological analysis (IPA)

Let’s take a look at each of them…

QDA Method #1: Qualitative Content Analysis

Content analysis is possibly the most common and straightforward QDA method. At the simplest level, content analysis is used to evaluate patterns within a piece of content (for example, words, phrases or images) or across multiple pieces of content or sources of communication. For example, a collection of newspaper articles or political speeches.

With content analysis, you could, for instance, identify the frequency with which an idea is shared or spoken about – like the number of times a Kardashian is mentioned on Twitter. Or you could identify patterns of deeper underlying interpretations – for instance, by identifying phrases or words in tourist pamphlets that highlight India as an ancient country.

Because content analysis can be used in such a wide variety of ways, it’s important to go into your analysis with a very specific question and goal, or you’ll get lost in the fog. With content analysis, you’ll group large amounts of text into codes, summarise these into categories, and possibly even tabulate the data to calculate the frequency of certain concepts or variables. Because of this, content analysis provides a small splash of quantitative thinking within a qualitative method.
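To illustrate that small splash of quantitative thinking, here is a minimal Python sketch of the tabulation step: counting how often each code appears across a set of manually coded text segments. The codes and excerpts are invented for the example.

```python
from collections import Counter

# Hypothetical coded segments: (code, excerpt) pairs produced
# during a first manual pass through the texts
coded_segments = [
    ("heritage", "described as an ancient land"),
    ("heritage", "centuries-old traditions"),
    ("nature", "unspoilt beaches"),
    ("heritage", "temples dating back millennia"),
]

# Frequency of each code across the corpus
code_counts = Counter(code for code, _ in coded_segments)
print(code_counts.most_common())  # [('heritage', 3), ('nature', 1)]
```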

Naturally, while content analysis is widely useful, it’s not without its drawbacks. One of the main issues with content analysis is that it can be very time-consuming, as it requires lots of reading and re-reading of the texts. Also, because of its multidimensional focus on both qualitative and quantitative aspects, it is sometimes accused of losing important nuances in communication.

Content analysis also tends to concentrate on a very specific timeline and doesn’t take into account what happened before or after that timeline. This isn’t necessarily a bad thing though – just something to be aware of. So, keep these factors in mind if you’re considering content analysis. Every analysis method has its limitations, so don’t be put off by these – just be aware of them! If you’re interested in learning more about content analysis, the video below provides a good starting point.

QDA Method #2: Narrative Analysis 

As the name suggests, narrative analysis is all about listening to people telling stories and analysing what that means. Since stories serve a functional purpose of helping us make sense of the world, we can gain insights into the ways that people deal with and make sense of reality by analysing their stories and the ways they’re told.

You could, for example, use narrative analysis to explore whether the way something is said is important. For instance, the narrative of a prisoner trying to justify their crime could provide insight into their view of the world and the justice system. Similarly, analysing the ways entrepreneurs talk about the struggles in their careers or cancer patients telling stories of hope could provide powerful insights into their mindsets and perspectives. Simply put, narrative analysis is about paying attention to the stories that people tell – and more importantly, the way they tell them.

Of course, the narrative approach has its weaknesses, too. Sample sizes are generally quite small due to the time-consuming process of capturing narratives. Because of this, along with the multitude of social and lifestyle factors which can influence a subject, narrative analysis can be quite difficult to reproduce in subsequent research. This means that it’s difficult to test the findings of some of this research.

Similarly, researcher bias can have a strong influence on the results here, so you need to be particularly careful about the potential biases you can bring into your analysis when using this method. Nevertheless, narrative analysis is still a very useful qualitative analysis method – just keep these limitations in mind and be careful not to draw broad conclusions. If you’re keen to learn more about narrative analysis, the video below provides a great introduction to this qualitative analysis method.


QDA Method #3: Discourse Analysis 

Discourse is simply a fancy word for written or spoken language or debate. So, discourse analysis is all about analysing language within its social context. In other words, analysing language – such as a conversation, a speech, etc – within the culture and society it takes place in. For example, you could analyse how a janitor speaks to a CEO, or how politicians speak about terrorism.

To truly understand these conversations or speeches, the culture and history of those involved in the communication are important factors to consider. For example, a janitor might speak more casually with a CEO in a company that emphasises equality among workers. Similarly, a politician might speak more about terrorism if there was a recent terrorist incident in the country.

So, as you can see, by using discourse analysis, you can identify how culture, history or power dynamics (to name a few) have an effect on the way concepts are spoken about. So, if your research aims and objectives involve understanding culture or power dynamics, discourse analysis can be a powerful method.

Because there are many social influences in terms of how we speak to each other, the potential use of discourse analysis is vast. Of course, this also means it’s important to have a very specific research question (or questions) in mind when analysing your data and looking for patterns and themes, or you might end up going down a winding rabbit hole.

Discourse analysis can also be very time-consuming as you need to sample the data to the point of saturation – in other words, until no new information and insights emerge. But this is, of course, part of what makes discourse analysis such a powerful technique. So, keep these factors in mind when considering this QDA method. Again, if you’re keen to learn more, the video below presents a good starting point.

QDA Method #4: Thematic Analysis

Thematic analysis looks at patterns of meaning in a data set – for example, a set of interviews or focus group transcripts. But what exactly does that… mean? Well, a thematic analysis takes bodies of data (which are often quite large) and groups them according to similarities – in other words, themes. These themes help us make sense of the content and derive meaning from it.

Let’s take a look at an example.

With thematic analysis, you could analyse 100 online reviews of a popular sushi restaurant to find out what patrons think about the place. By reviewing the data, you would then identify the themes that crop up repeatedly within the data – for example, “fresh ingredients” or “friendly wait staff”.
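As a rough sketch of how that tally might be computed once you've manually assigned themes to each review (the reviews and themes below are invented):

```python
# Hypothetical output of manual coding: the set of themes assigned
# to each of the reviews during close reading
review_themes = [
    {"fresh ingredients", "friendly wait staff"},
    {"friendly wait staff"},
    {"fresh ingredients", "long wait times"},
]

# Count how many reviews mention each theme
theme_counts = {}
for themes in review_themes:
    for theme in themes:
        theme_counts[theme] = theme_counts.get(theme, 0) + 1

for theme, n in sorted(theme_counts.items(), key=lambda kv: -kv[1]):
    print(f"{theme}: {n} of {len(review_themes)} reviews")
```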

So, as you can see, thematic analysis can be pretty useful for finding out about people’s experiences, views, and opinions. Therefore, if your research aims and objectives involve understanding people’s experience or view of something, thematic analysis can be a great choice.

Since thematic analysis is a bit of an exploratory process, it’s not unusual for your research questions to develop, or even change as you progress through the analysis. While this is somewhat natural in exploratory research, it can also be seen as a disadvantage as it means that data needs to be re-reviewed each time a research question is adjusted. In other words, thematic analysis can be quite time-consuming – but for a good reason. So, keep this in mind if you choose to use thematic analysis for your project and budget extra time for unexpected adjustments.

Thematic analysis takes bodies of data and groups them according to similarities (themes), which help us make sense of the content.

QDA Method #5: Grounded theory (GT) 

Grounded theory is a powerful qualitative analysis method where the intention is to create a new theory (or theories) using the data at hand, through a series of “tests” and “revisions”. Strictly speaking, GT is more a research design type than an analysis method, but we’ve included it here as it’s often referred to as a method.

What’s most important with grounded theory is that you go into the analysis with an open mind and let the data speak for itself – rather than dragging existing hypotheses or theories into your analysis. In other words, your analysis must develop from the ground up (hence the name). 

Let’s look at an example of GT in action.

Assume you’re interested in developing a theory about what factors influence students to watch a YouTube video about qualitative analysis. Using grounded theory, you’d start with this general overarching question about the given population (i.e., graduate students). First, you’d approach a small sample – for example, five graduate students in a department at a university. Ideally, this sample would be reasonably representative of the broader population. You’d interview these students to identify what factors lead them to watch the video.

After analysing the interview data, a general pattern could emerge. For example, you might notice that graduate students are more likely to read a post about qualitative methods if they are just starting on their dissertation journey, or if they have an upcoming test about research methods.

From here, you’ll look for another small sample – for example, five more graduate students in a different department – and see whether this pattern holds true for them. If not, you’ll look for commonalities and adapt your theory accordingly. As this process continues, the theory would develop. As we mentioned earlier, what’s important with grounded theory is that the theory develops from the data – not from some preconceived idea.

So, what are the drawbacks of grounded theory? Well, some argue that there’s a tricky circularity to grounded theory. For it to work, in principle, you should know as little as possible regarding the research question and population, so that you reduce the bias in your interpretation. However, in many circumstances, it’s also thought to be unwise to approach a research question without knowledge of the current literature. In other words, it’s a bit of a “chicken or the egg” situation.

Regardless, grounded theory remains a popular (and powerful) option. Naturally, it’s a very useful method when you’re researching a topic that is completely new or has very little existing research about it, as it allows you to start from scratch and work your way from the ground up.

Grounded theory is used to create a new theory (or theories) by using the data at hand, as opposed to existing theories and frameworks.

QDA Method #6:   Interpretive Phenomenological Analysis (IPA)

Interpretive. Phenomenological. Analysis. IPA. Try saying that three times fast…

Let’s just stick with IPA, okay?

IPA is designed to help you understand the personal experiences of a subject (for example, a person or group of people) concerning a major life event, an experience or a situation. This event or experience is the “phenomenon” that makes up the “P” in IPA. Such phenomena may range from relatively common events – such as motherhood, or being involved in a car accident – to those which are extremely rare – for example, someone’s personal experience in a refugee camp. So, IPA is a great choice if your research involves analysing people’s personal experiences of something that happened to them.

It’s important to remember that IPA is subject-centred. In other words, it’s focused on the experiencer. This means that, while you’ll likely use a coding system to identify commonalities, it’s important not to lose the depth of experience or meaning by trying to reduce everything to codes. Also, keep in mind that since your sample size will generally be very small with IPA, you often won’t be able to draw broad conclusions about the generalisability of your findings. But that’s okay as long as it aligns with your research aims and objectives.

Another thing to be aware of with IPA is personal bias. While researcher bias can creep into all forms of research, self-awareness is critically important with IPA, as it can have a major impact on the results. For example, a researcher who was a victim of a crime himself could insert his own feelings of frustration and anger into the way he interprets the experience of someone who was kidnapped. So, if you’re going to undertake IPA, you need to be very self-aware or you could muddy the analysis.

IPA can help you understand the personal experiences of a person or group concerning a major life event, an experience or a situation.

How to choose the right analysis method

In light of all of the qualitative analysis methods we’ve covered so far, you’re probably asking yourself the question, “How do I choose the right one?”

Much like all the other methodological decisions you’ll need to make, selecting the right qualitative analysis method largely depends on your research aims, objectives and questions. In other words, the best tool for the job depends on what you’re trying to build. For example:

  • Perhaps your research aims to analyse the use of words and what they reveal about the intention of the storyteller and the cultural context of the time.
  • Perhaps your research aims to develop an understanding of the unique personal experiences of people that have experienced a certain event, or
  • Perhaps your research aims to develop insight regarding the influence of a certain culture on its members.

As you can probably see, each of these research aims is distinctly different, and therefore different analysis methods would be suitable for each one. For example, narrative analysis would likely be a good option for the first aim, while grounded theory wouldn’t be as relevant.

It’s also important to remember that each method has its own set of strengths, weaknesses and general limitations. No single analysis method is perfect. So, depending on the nature of your research, it may make sense to adopt more than one method (this is called triangulation). Keep in mind though that this will of course be quite time-consuming.

As we’ve seen, all of the qualitative analysis methods we’ve discussed make use of coding and theme-generating techniques, but the intent and approach of each analysis method differ quite substantially. So, it’s very important to come into your research with a clear intention before you decide which analysis method (or methods) to use.

Start by reviewing your research aims, objectives and research questions to assess what exactly you’re trying to find out – then select a qualitative analysis method that fits. Never pick a method just because you like it or have experience using it – your analysis method (or methods) must align with your broader research aims and objectives.

No single analysis method is perfect, so it can often make sense to adopt more than one  method (this is called triangulation).

Let’s recap on QDA methods…

In this post, we looked at six popular qualitative data analysis methods:

  • First, we looked at content analysis, a straightforward method that blends a little bit of quant into a primarily qualitative analysis.
  • Then we looked at narrative analysis, which is about analysing how stories are told.
  • Next up was discourse analysis – which is about analysing conversations and interactions.
  • Then we moved on to thematic analysis – which is about identifying themes and patterns.
  • From there, we dug into grounded theory – which is about starting from scratch with a specific question and using the data alone to build a theory in response to that question.
  • And finally, we looked at IPA – which is about understanding people’s unique experiences of a phenomenon.

Of course, these aren’t the only options when it comes to qualitative data analysis, but they’re a great starting point if you’re dipping your toes into qualitative research for the first time.

If you’re still feeling a bit confused, consider our private coaching service, where we hold your hand through the research process to help you develop your best work.



The Beginner's Guide to Statistical Analysis | 5 Steps & Examples

Statistical analysis means investigating trends, patterns, and relationships using quantitative data. It is an important research tool used by scientists, governments, businesses, and other organizations.

To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process. You need to specify your hypotheses and make decisions about your research design, sample size, and sampling procedure.

After collecting data from your sample, you can organize and summarize the data using descriptive statistics. Then, you can use inferential statistics to formally test hypotheses and make estimates about the population. Finally, you can interpret and generalize your findings.

This article is a practical introduction to statistical analysis for students and researchers. We’ll walk you through the steps using two research examples. The first investigates a potential cause-and-effect relationship, while the second investigates a potential correlation between variables.

Table of contents

  • Step 1: Write your hypotheses and plan your research design
  • Step 2: Collect data from a sample
  • Step 3: Summarize your data with descriptive statistics
  • Step 4: Test hypotheses or make estimates with inferential statistics
  • Step 5: Interpret your results

Step 1: Write your hypotheses and plan your research design

To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design.

Writing statistical hypotheses

The goal of research is often to investigate a relationship between variables within a population. You start with a prediction, and use statistical analysis to test that prediction.

A statistical hypothesis is a formal way of writing a prediction about a population. Every research prediction is rephrased into null and alternative hypotheses that can be tested using sample data.

While the null hypothesis always predicts no effect or no relationship between variables, the alternative hypothesis states your research prediction of an effect or relationship.

  • Null hypothesis: A 5-minute meditation exercise will have no effect on math test scores in teenagers.
  • Alternative hypothesis: A 5-minute meditation exercise will improve math test scores in teenagers.
  • Null hypothesis: Parental income and GPA have no relationship with each other in college students.
  • Alternative hypothesis: Parental income and GPA are positively correlated in college students.

Planning your research design

A research design is your overall strategy for data collection and analysis. It determines the statistical tests you can use to test your hypothesis later on.

First, decide whether your research will use a descriptive, correlational, or experimental design. Experiments directly influence variables, whereas descriptive and correlational studies only measure variables.

  • In an experimental design, you can assess a cause-and-effect relationship (e.g., the effect of meditation on test scores) using statistical tests of comparison or regression.
  • In a correlational design, you can explore relationships between variables (e.g., parental income and GPA) without any assumption of causality using correlation coefficients and significance tests.
  • In a descriptive design, you can study the characteristics of a population or phenomenon (e.g., the prevalence of anxiety in U.S. college students) using statistical tests to draw inferences from sample data.

Your research design also concerns whether you’ll compare participants at the group level or individual level, or both.

  • In a between-subjects design, you compare the group-level outcomes of participants who have been exposed to different treatments (e.g., those who performed a meditation exercise vs those who didn’t).
  • In a within-subjects design, you compare repeated measures from participants who have participated in all treatments of a study (e.g., scores from before and after performing a meditation exercise).
  • In a mixed (factorial) design, one variable is altered between subjects and another is altered within subjects (e.g., pretest and posttest scores from participants who either did or didn’t do a meditation exercise).
Example: Experimental research design. First, you’ll take baseline test scores from participants. Then, your participants will undergo a 5-minute meditation exercise. Finally, you’ll record participants’ scores from a second math test. In this experiment, the independent variable is the 5-minute meditation exercise, and the dependent variable is the math test score from before and after the intervention.

Example: Correlational research design. In a correlational study, you test whether there is a relationship between parental income and GPA in graduating college students. To collect your data, you will ask participants to fill in a survey and self-report their parents’ incomes and their own GPA.

Measuring variables

When planning a research design, you should operationalize your variables and decide exactly how you will measure them.

For statistical analysis, it’s important to consider the level of measurement of your variables, which tells you what kind of data they contain:

  • Categorical data represents groupings. These may be nominal (e.g., gender) or ordinal (e.g. level of language ability).
  • Quantitative data represents amounts. These may be on an interval scale (e.g. test score) or a ratio scale (e.g. age).

Many variables can be measured at different levels of precision. For example, age data can be quantitative (8 years old) or categorical (young). If a variable is coded numerically (e.g., level of agreement from 1–5), it doesn’t automatically mean that it’s quantitative instead of categorical.

Identifying the measurement level is important for choosing appropriate statistics and hypothesis tests. For example, you can calculate a mean score with quantitative data, but not with categorical data.
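One way to keep the measurement level explicit in practice is to declare it when you prepare your data. Below is a brief, hedged sketch using Python's pandas library; the variables are invented, and storing a 1–5 agreement item as an ordered categorical is just one possible convention:

```python
import pandas as pd

# Hypothetical responses: agreement coded 1-5 (ordinal), age in years (ratio)
df = pd.DataFrame({"agreement": [1, 3, 5, 4], "age": [19, 23, 31, 27]})

# Age is genuinely quantitative, so a mean is meaningful
print(df["age"].mean())

# Agreement is ordinal: store it as an ordered categorical so order-aware
# operations still work, but arithmetic like a mean is disallowed
df["agreement"] = pd.Categorical(
    df["agreement"], categories=[1, 2, 3, 4, 5], ordered=True
)
print(df["agreement"].max())    # fine: the ordering is defined
# df["agreement"].mean()        # would raise a TypeError
```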

In a research study, along with measures of your variables of interest, you’ll often collect data on relevant participant characteristics.

Variable               Type of data
Age                    Quantitative (ratio)
Gender                 Categorical (nominal)
Race or ethnicity      Categorical (nominal)
Baseline test scores   Quantitative (interval)
Final test scores      Quantitative (interval)
Parental income        Quantitative (ratio)
GPA                    Quantitative (interval)


Step 2: Collect data from a sample

Population vs sample

In most cases, it’s too difficult or expensive to collect data from every member of the population you’re interested in studying. Instead, you’ll collect data from a sample.

Statistical analysis allows you to apply your findings beyond your own sample as long as you use appropriate sampling procedures. You should aim for a sample that is representative of the population.

Sampling for statistical analysis

There are two main approaches to selecting a sample.

  • Probability sampling: every member of the population has a chance of being selected for the study through random selection.
  • Non-probability sampling: some members of the population are more likely than others to be selected for the study because of criteria such as convenience or voluntary self-selection.

In theory, for highly generalizable findings, you should use a probability sampling method. Random selection reduces several types of research bias, like sampling bias, and ensures that data from your sample is actually typical of the population. Parametric tests can be used to make strong statistical inferences when data are collected using probability sampling.
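As a small sketch of what simple random sampling (one probability method) can look like in code, assuming you have a complete sampling frame of student IDs:

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seeded for reproducibility

# Hypothetical sampling frame: an ID for every student in the population
sampling_frame = np.arange(1, 5001)    # 5,000 students

# Simple random sample of 100 students, drawn without replacement,
# so every student has an equal chance of being selected
sample_ids = rng.choice(sampling_frame, size=100, replace=False)
print(len(sample_ids), sorted(sample_ids)[:5])
```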

But in practice, it’s rarely possible to gather the ideal sample. While non-probability samples are more likely to be at risk for biases like self-selection bias, they are much easier to recruit and collect data from. Non-parametric tests are more appropriate for non-probability samples, but they result in weaker inferences about the population.

If you want to use parametric tests for non-probability samples, you have to make the case that:

  • your sample is representative of the population you’re generalizing your findings to.
  • your sample lacks systematic bias.

Keep in mind that external validity means that you can only generalize your conclusions to others who share the characteristics of your sample. For instance, results from Western, Educated, Industrialized, Rich and Democratic samples (e.g., college students in the US) aren’t automatically applicable to all non-WEIRD populations.

If you apply parametric tests to data from non-probability samples, be sure to elaborate on the limitations of how far your results can be generalized in your discussion section.

Create an appropriate sampling procedure

Based on the resources available for your research, decide on how you’ll recruit participants.

  • Will you have resources to advertise your study widely, including outside of your university setting?
  • Will you have the means to recruit a diverse sample that represents a broad population?
  • Do you have time to contact and follow up with members of hard-to-reach groups?

Example: Sampling (experimental study). Your participants are self-selected by their schools. Although you’re using a non-probability sample, you aim for a diverse and representative sample.

Example: Sampling (correlational study). Your main population of interest is male college students in the US. Using social media advertising, you recruit senior-year male college students from a smaller subpopulation: seven universities in the Boston area.

Calculate sufficient sample size

Before recruiting participants, decide on your sample size either by looking at other studies in your field or using statistics. A sample that’s too small may be unrepresentative of the population, while a sample that’s too large will be more costly than necessary.

There are many sample size calculators online. Different formulas are used depending on whether you have subgroups or how rigorous your study should be (e.g., in clinical research). As a rule of thumb, a minimum of 30 units or more per subgroup is necessary.

To use these calculators, you have to understand and input these key components (a scripted version is sketched just after this list):

  • Significance level (alpha): the risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%.
  • Statistical power: the probability of your study detecting an effect of a certain size if there is one, usually 80% or higher.
  • Expected effect size: a standardized indication of how large the expected result of your study will be, usually based on other similar studies.
  • Population standard deviation: an estimate of the population parameter based on a previous study or a pilot study of your own.
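Here is the scripted version referenced above: a minimal sketch using the statsmodels library for a two-group comparison. The effect size, alpha, and power values are the conventional defaults just described, not recommendations for any particular study.

```python
from statsmodels.stats.power import TTestIndPower

# Inputs: standardized effect size (Cohen's d), significance level (alpha),
# and desired statistical power for an independent samples t test
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)

print(f"Required sample size per group: {n_per_group:.0f}")  # roughly 64
```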

Step 3: Summarize your data with descriptive statistics

Once you’ve collected all of your data, you can inspect them and calculate descriptive statistics that summarize them.

Inspect your data

There are various ways to inspect your data, including the following:

  • Organizing data from each variable in frequency distribution tables.
  • Displaying data from a key variable in a bar chart to view the distribution of responses.
  • Visualizing the relationship between two variables using a scatter plot.

By visualizing your data in tables and graphs, you can assess whether your data follow a skewed or normal distribution and whether there are any outliers or missing data.

A normal distribution means that your data are symmetrically distributed around a center where most values lie, with the values tapering off at the tail ends.

[Figure: Mean, median, mode, and standard deviation in a normal distribution]

In contrast, a skewed distribution is asymmetric and has more values on one end than the other. The shape of the distribution is important to keep in mind because only some descriptive statistics should be used with skewed distributions.

Extreme outliers can also produce misleading statistics, so you may need a systematic approach to dealing with these values.
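A brief sketch of these inspection steps in Python with pandas (the scores are invented):

```python
import pandas as pd

scores = pd.Series([62, 71, 68, 74, 69, 95, 70, 66, 73, 68], name="score")

# Frequency distribution (binned, since the scores are continuous)
print(pd.cut(scores, bins=4).value_counts().sort_index())

# Quick numeric checks for skew, outliers, and missing data
print("skewness:", round(scores.skew(), 2))   # > 0 suggests a right skew
print("missing: ", scores.isna().sum())
print(scores.describe())                      # min/max expose extreme values
```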

Calculate measures of central tendency

Measures of central tendency describe where most of the values in a data set lie. Three main measures of central tendency are often reported:

  • Mode: the most popular response or value in the data set.
  • Median: the value in the exact middle of the data set when ordered from low to high.
  • Mean: the sum of all values divided by the number of values.

However, depending on the shape of the distribution and level of measurement, only one or two of these measures may be appropriate. For example, many demographic characteristics can only be described using the mode or proportions, while a variable like reaction time may not have a mode at all.

Calculate measures of variability

Measures of variability tell you how spread out the values in a data set are. Four main measures of variability are often reported:

  • Range: the highest value minus the lowest value of the data set.
  • Interquartile range: the range of the middle half of the data set.
  • Standard deviation: the average distance between each value in your data set and the mean.
  • Variance: the square of the standard deviation.

Once again, the shape of the distribution and level of measurement should guide your choice of variability statistics. The interquartile range is the best measure for skewed distributions, while standard deviation and variance provide the best information for normal distributions.
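As a quick illustration, all of these summary statistics are one-liners in most tools. Here is a sketch using Python's built-in statistics module on an invented set of test scores:

```python
import statistics

scores = [62, 71, 68, 74, 69, 95, 70, 66, 73, 68]  # invented test scores

# Central tendency
print("mean:  ", statistics.mean(scores))
print("median:", statistics.median(scores))
print("mode:  ", statistics.mode(scores))

# Variability
print("range: ", max(scores) - min(scores))
q1, _, q3 = statistics.quantiles(scores, n=4)      # quartile cut points
print("IQR:   ", q3 - q1)
print("stdev: ", round(statistics.stdev(scores), 2))      # sample SD
print("variance:", round(statistics.variance(scores), 2))
```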

Example: Descriptive statistics (experimental study). Using your table, you should check whether the units of the descriptive statistics are comparable for pretest and posttest scores. For example, are the variance levels similar across the groups? Are there any extreme values? If there are, you may need to identify and remove extreme outliers in your data set or transform your data before performing a statistical test.

                     Pretest scores   Posttest scores
Mean                 68.44            75.25
Standard deviation   9.43             9.88
Variance             88.96            97.96
Range                36.25            45.12

Sample size (n): 30

From this table, we can see that the mean score increased after the meditation exercise, and the variances of the two scores are comparable. Next, we can perform a statistical test to find out if this improvement in test scores is statistically significant in the population.

Example: Descriptive statistics (correlational study). After collecting data from 653 students, you tabulate descriptive statistics for annual parental income and GPA.

It’s important to check whether you have a broad range of data points. If you don’t, your data may be skewed towards some groups more than others (e.g., high academic achievers), and only limited inferences can be made about a relationship.

                     Parental income (USD)   GPA
Mean                 62,100                  3.12
Standard deviation   15,000                  0.45
Variance             225,000,000             0.16
Range                8,000–378,000           2.64–4.00

Sample size (n): 653

Step 4: Test hypotheses or make estimates with inferential statistics

A number that describes a sample is called a statistic, while a number describing a population is called a parameter. Using inferential statistics, you can make conclusions about population parameters based on sample statistics.

Researchers often use two main methods (simultaneously) to make inferences in statistics.

  • Estimation: calculating population parameters based on sample statistics.
  • Hypothesis testing: a formal process for testing research predictions about the population using samples.

You can make two types of estimates of population parameters from sample statistics:

  • A point estimate: a value that represents your best guess of the exact parameter.
  • An interval estimate: a range of values that represent your best guess of where the parameter lies.

If your aim is to infer and report population characteristics from sample data, it’s best to use both point and interval estimates in your paper.

You can consider a sample statistic a point estimate for the population parameter when you have a representative sample (e.g., in a wide public opinion poll, the proportion of a sample that supports the current government is taken as the population proportion of government supporters).

There’s always error involved in estimation, so you should also provide a confidence interval as an interval estimate to show the variability around a point estimate.

A confidence interval uses the standard error and the z score from the standard normal distribution to convey where you’d generally expect to find the population parameter most of the time.
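As a sketch of that calculation (the scores are hypothetical; for small samples you would typically substitute a t critical value for the z score):

```python
import numpy as np
from scipy import stats

scores = np.array([72, 81, 69, 77, 74, 80, 68, 75, 79, 71])  # hypothetical sample

mean = scores.mean()        # point estimate
se = stats.sem(scores)      # standard error of the mean
z = stats.norm.ppf(0.975)   # z score for a 95% confidence level (about 1.96)
lower, upper = mean - z * se, mean + z * se
print(f"point estimate {mean:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```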

Hypothesis testing

Using data from a sample, you can test hypotheses about relationships between variables in the population. Hypothesis testing starts with the assumption that the null hypothesis is true in the population, and you use statistical tests to assess whether the null hypothesis can be rejected or not.

Statistical tests determine where your sample data would lie on an expected distribution of sample data if the null hypothesis were true. These tests give two main outputs:

  • A test statistic tells you how much your data differs from the null hypothesis of the test.
  • A p value tells you the likelihood of obtaining your results if the null hypothesis is actually true in the population.

Statistical tests come in three main varieties:

  • Comparison tests assess group differences in outcomes.
  • Regression tests assess cause-and-effect relationships between variables.
  • Correlation tests assess relationships between variables without assuming causation.

Your choice of statistical test depends on your research questions, research design, sampling method, and data characteristics.

Parametric tests

Parametric tests make powerful inferences about the population based on sample data. But to use them, some assumptions must be met, and only some types of variables can be used. If your data violate these assumptions, you can perform appropriate data transformations or use alternative non-parametric tests instead.

A regression models the extent to which changes in a predictor variable result in changes in the outcome variable(s).

  • A simple linear regression includes one predictor variable and one outcome variable.
  • A multiple linear regression includes two or more predictor variables and one outcome variable.

Comparison tests usually compare the means of groups. These may be the means of different groups within a sample (e.g., a treatment and control group), the means of one sample group taken at different times (e.g., pretest and posttest scores), or a sample mean and a population mean.

  • A t test is for exactly 1 or 2 groups when the sample is small (30 or fewer).
  • A z test is for exactly 1 or 2 groups when the sample is large.
  • An ANOVA is for 3 or more groups.

The z and t tests have subtypes based on the number and types of samples and the hypotheses:

  • If you have only one sample that you want to compare to a population mean, use a one-sample test.
  • If you have paired measurements (within-subjects design), use a dependent (paired) samples test.
  • If you have completely separate measurements from two unmatched groups (between-subjects design), use an independent (unpaired) samples test.
  • If you expect a difference between groups in a specific direction, use a one-tailed test.
  • If you don’t have any expectations for the direction of a difference between groups, use a two-tailed test.
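A minimal scipy sketch of these variants, with hypothetical scores (the alternative argument, which selects a one-tailed test, is available in recent scipy versions):

```python
from scipy import stats

pre  = [68, 70, 65, 72, 66]     # hypothetical paired pretest scores
post = [74, 75, 70, 78, 71]     # hypothetical paired posttest scores
group_a = [12, 15, 11, 14, 13]  # hypothetical unmatched groups
group_b = [16, 18, 15, 17, 19]

print(stats.ttest_1samp(pre, popmean=70))                 # one-sample test
print(stats.ttest_rel(post, pre, alternative="greater"))  # dependent (paired), one-tailed
print(stats.ttest_ind(group_a, group_b))                  # independent samples, two-tailed
```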

The only parametric correlation test is Pearson’s r. The correlation coefficient (r) tells you the strength of a linear relationship between two quantitative variables.

However, to test whether the correlation in the sample is strong enough to be important in the population, you also need to perform a significance test of the correlation coefficient, usually a t test, to obtain a p value. This test uses your sample size to calculate how much the correlation coefficient differs from zero in the population.

You use a dependent-samples, one-tailed t test to assess whether the meditation exercise significantly improved math test scores. The test gives you:

  • a t value (test statistic) of 3.00
  • a p value of 0.0028

Although Pearson’s r is a test statistic, it doesn’t tell you anything about how significant the correlation is in the population. You also need to test whether this sample correlation coefficient is large enough to demonstrate a correlation in the population.

A t test can also determine how significantly a correlation coefficient differs from zero based on sample size. Since you expect a positive correlation between parental income and GPA, you use a one-sample, one-tailed t test. The t test gives you:

  • a t value of 3.08
  • a p value of 0.001
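A hedged sketch of that two-step procedure using scipy's pearsonr, which reports r together with a p value (the numbers are illustrative, not the article's data; the one-tailed alternative argument requires a recent scipy version):

```python
from scipy import stats

# Hypothetical paired observations
income = [45_000, 52_000, 61_000, 70_000, 83_000, 95_000]
gpa    = [2.8, 3.0, 3.1, 3.2, 3.5, 3.6]

r, p = stats.pearsonr(income, gpa, alternative="greater")  # expecting positive r
print(f"r = {r:.2f}, one-tailed p = {p:.4f}")
```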


The final step of statistical analysis is interpreting your results.

Statistical significance

In hypothesis testing, statistical significance is the main criterion for forming conclusions. You compare your p value to a set significance level (usually 0.05) to decide whether your results are statistically significant or non-significant.

Statistically significant results are considered unlikely to have arisen solely due to chance. There is only a very low chance of such a result occurring if the null hypothesis is true in the population.

This means that you believe the meditation intervention, rather than random factors, directly caused the increase in test scores.

Example: Interpret your results (correlational study). You compare your p value of 0.001 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This indicates a statistically significant correlation between parental income and GPA in male college students.

Note that correlation doesn’t always mean causation, because there are often many underlying factors contributing to a complex variable like GPA. Even if one variable is related to another, this may be because of a third variable influencing both of them, or indirect links between the two variables.

Effect size

A statistically significant result doesn’t necessarily mean that a finding has important real-life applications or clinical outcomes.

In contrast, the effect size indicates the practical significance of your results. It’s important to report effect sizes along with your inferential statistics for a complete picture of your results. You should also report interval estimates of effect sizes if you’re writing an APA style paper .

With a Cohen’s d of 0.72, there’s medium to high practical significance to your finding that the meditation exercise improved test scores.

Example: Effect size (correlational study). To determine the effect size of the correlation coefficient, you compare your Pearson’s r value to Cohen’s effect size criteria.
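The article doesn't show the arithmetic, but a common pooled-standard-deviation version of Cohen's d can be sketched as below (for paired designs other conventions exist, such as dividing the mean difference by the standard deviation of the differences). Applied to the pretest/posttest summary earlier (means 68.44 and 75.25, SDs 9.43 and 9.88), this formula gives roughly d = 0.70, broadly in line with the reported 0.72:

```python
import numpy as np

def cohens_d(group1, group2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    g1, g2 = np.asarray(group1, float), np.asarray(group2, float)
    n1, n2 = len(g1), len(g2)
    pooled_sd = np.sqrt(((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1))
                        / (n1 + n2 - 2))
    return (g2.mean() - g1.mean()) / pooled_sd
```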

Decision errors

Type I and Type II errors are mistakes made in research conclusions. A Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s false.

You can aim to minimize the risk of these errors by selecting an optimal significance level and ensuring high power . However, there’s a trade-off between the two errors, so a fine balance is necessary.

Frequentist versus Bayesian statistics

Traditionally, frequentist statistics emphasizes null hypothesis significance testing and always starts with the assumption of a true null hypothesis.

However, Bayesian statistics has grown in popularity as an alternative approach in the last few decades. In this approach, you use previous research to continually update your hypotheses based on your expectations and observations.

A Bayes factor compares the relative strength of evidence for the null versus the alternative hypothesis, rather than making a conclusion about rejecting the null hypothesis or not.

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

  • Student’s t-distribution
  • Normal distribution
  • Null and Alternative Hypotheses
  • Chi square tests
  • Confidence interval

Methodology

  • Cluster sampling
  • Stratified sampling
  • Data cleansing
  • Reproducibility vs Replicability
  • Peer review
  • Likert scale

Research bias

  • Implicit bias
  • Framing effect
  • Cognitive bias
  • Placebo effect
  • Hawthorne effect
  • Hostile attribution bias
  • Affect heuristic


Amsterdam Public Health Research Lifecycle: Analysis plan

Data analysis

  • Initial data analysis
  • Post-hoc & sensitivity analyses
  • Data analysis documentation
  • Handling missing data

Aim: To promote structured, targeted data analysis.

Requirements

An analysis plan should be created and finalized prior to the data analyses.

Documentation

The analysis plan (Guidelines per study type are provided below)

Responsibilities

  • Executing researcher: To create the analysis plan prior to the data analyses, containing a description of the research question and what the various steps in the analysis are going to be. This should also be signed and dated by the PI.
  • Project leaders: To inform the executing researcher about setting up the analysis plan before analyses are undertaken.
  • Research assistant: N.a.

An analysis plan should be created and finalized (signed and dated by the PI) prior to the data analyses. The analysis plan contains a description of the research question and what the various steps in the analysis are going to be. It also contains an exploration of the literature (what is already known? What will this study add?) to make sure your research question is relevant (see Glasziou et al., Lancet 2014, on avoiding research waste). The analysis plan is intended as a starting point for the analysis. It ensures that the analysis can be undertaken in a targeted manner, and it promotes research integrity.

How much room the plan leaves for adjustment depends on the type of study:

  • If you will perform an exploratory study, you can adjust your analysis based on the data you find. This may be useful if not much is known about the research subject, but it is considered relatively low-level evidence, and your report should clearly state that the presented study is exploratory.
  • If you want to perform a hypothesis-testing study (be it interventional or using observational data), you need to pre-specify the analyses you intend to do prior to performing them, including the population, subgroups, stratifications and statistical tests. If deviations from the analysis plan are made during the study, these should be documented in the analysis plan and stated in the report (i.e., post-hoc tests).
  • If you intend to do hypothesis-free research with multiple testing, you should pre-specify your threshold for statistical significance according to the number of analyses you will perform.
  • Lastly, if you intend to perform an RCT, the analysis plan is practically set in stone (also see ICH E9 - statistical principles for clinical trials).

If needed, an exploratory analysis may be part of the analysis plan, to inform the set-up of the final analysis (see initial data analysis). For instance, you may want to know the distributions of values in order to create meaningful categories, or to determine whether data are normally distributed. The findings and decisions made during these preliminary exploratory analyses should be clearly documented, preferably in a second version of the analysis plan, and made reproducible by providing the data analysis syntax (in SPSS, SAS, STATA, R) (see the guideline Documentation of data analysis).

Within the analysis plan, the concrete research question needs to be formulated first, following the literature review; this is the question the analyses are intended to answer. Concrete research questions may be defined using the acronym PICO: Population, Intervention, Comparison, Outcomes. An example of a concrete question could be: “Does frequent bending at work lead to an elevated risk of lower back pain occurring in employees?” (Population = employees; Intervention = frequent bending; Comparison = infrequent bending; Outcome = occurrence of back pain). Concrete research questions are essential for determining the analyses required.

The analysis plan should then describe the primary and secondary outcomes, the determinants and data needed, and which statistical techniques are to be used to analyse the data. The following issues need to be considered in this process and described where applicable:

  • In case of a trial: is the trial a superiority, non-inferiority or equivalence trial?
  • Superiority: treatment A is better than the control.
  • Non-inferiority: treatment A is not worse than treatment B.
  • Equivalence: testing similarity using a tolerance range.

In other studies: what is the study design (case-control, longitudinal cohort, etc.)?

  • Which (subgroup of the) population is to be included in the analyses? Which groups will you compare?
  • What are the primary and secondary endpoints? Which data from which time point (T1, T2, etc.) will be used?
  • Which (dependent and independent) variables are to be used in the analyses, and how are the variables to be analysed (e.g., continuous or in categories)?
  • Which variables are to be investigated as potential confounders or effect modifiers (and why), and how are these variables to be analysed? There are different ways of dealing with confounders; we distinguish three (a code sketch of the second follows this list):
    1) Correct for all potential confounders, without worrying about whether each variable is a ‘real’ confounder. Confounders are usually split into small groups (demographic factors, clinical parameters, etc.), yielding corrected model 1, corrected model 2, and so on. Pay attention to collinearity and overcorrection if confounders coincide too closely with the primary determinants.
    2) If the sample size is not big enough relative to the number of potential confounders, consider correcting only for those confounders that are relevant to the association between determinant and outcome. The relevant confounders are usually selected with a forward selection procedure: add the candidate confounders to the model one by one (the most strongly associated first), check to what extent the effect of the variable of interest changes, keep the strongest confounder, and repeat until no remaining confounder has a relevant effect (<10% change in the regression coefficient). Alternatively, select the confounders that univariately change the point estimate of the association by more than 10%.
    3) Set up a Directed Acyclic Graph (DAG) to determine which confounders should be added to the model. Please see http://www.dagitty.net/ for more information.
  • How will you deal with missing values? (See the chapter on handling missing data.)
  • Which analyses are to be carried out in which order (e.g., univariable analyses, multivariable analyses, analysis of confounders, analysis of interaction effects, analysis of sub-populations, etc.)? Which sensitivity analyses will be performed?
  • Do the data meet the criteria for the specific statistical technique?
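The change-in-estimate procedure in option 2 above can be sketched as follows. This is an illustration only, assuming a pandas DataFrame and statsmodels, with a continuous outcome fitted by ordinary least squares (for a binary outcome you would substitute a logistic model); all names are hypothetical:

```python
import statsmodels.api as sm

def select_confounders(df, outcome, exposure, candidates, threshold=0.10):
    """Forward selection by change-in-estimate: repeatedly add the candidate
    confounder that shifts the exposure coefficient the most, stopping when
    no candidate shifts it by at least `threshold` (10% by default)."""
    def exposure_coef(adjust_for):
        X = sm.add_constant(df[[exposure] + adjust_for])
        return sm.OLS(df[outcome], X).fit().params[exposure]

    selected, remaining = [], list(candidates)
    coef = exposure_coef(selected)  # crude (unadjusted) estimate
    while remaining:
        change = {c: abs(exposure_coef(selected + [c]) - coef) / abs(coef)
                  for c in remaining}
        best = max(change, key=change.get)
        if change[best] < threshold:
            break  # no remaining confounder changes the estimate by >= 10%
        selected.append(best)
        remaining.remove(best)
        coef = exposure_coef(selected)  # re-anchor on the adjusted model
    return selected
```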

A statistician may need to be consulted regarding the choice of statistical techniques (also see this intranet page on the statistical analysis plan).

It is recommended to design the empty tables to be included in the article before the start of data analysis. This is often very helpful in deciding exactly which analyses are required in order to analyse the data in a targeted manner.

You may consider making your study protocol, including the (statistical) analysis plan, public, either by placing it on a publicly accessible website (concept paper/design paper) or by uploading it to an appropriate studies register (for human trials: NTR / EUDRACT / ClinicalTrials.gov ; for preclinical trials: preclinicaltrials.eu ).

Check the reporting guidelines when writing an analysis plan. These will help increase the quality of your research and guide you.

A Really Simple Guide to Quantitative Data Analysis

Peter Samuels, Birmingham City University



Data Analysis Plan Template

  • Define Research Objectives
  • Identify Data Sources
  • Plan Data Collection Method
  • Define Sample Size and Sampling Procedure
  • Approval: Research Design


Collect Data

Prepare and Clean Data

  • 1 Remove irrelevant data
  • 2 Remove redundant data
  • 3 Filter outliers

Conduct Preliminary Analysis

Identify and Address Data Quality Issues

  • 1 Inaccurate data
  • 2 Incomplete data
  • 3 Inconsistent data
  • 4 Missing data
  • 5 Erroneous data

Approval: Initial Findings


Conduct Advanced Analysis

  • 1 Regression analysis
  • 2 Predictive modeling
  • 3 Machine learning algorithms

Interpret Data Analysis Results

  • Formulate Conclusions
  • Prepare Analysis Report
  • Approval: Final Report


Present Results to Stakeholders

Approval: Presentation



What is data analysis? Examples and how to get started


Even with years of professional experience working with data, the term "data analysis" still sets off a panic button in my soul. And yes, when it comes to serious data analysis for your business, you'll eventually want data scientists on your side. But if you're just getting started, no panic attacks are required.


Quick review: What is data analysis?

Data analysis is the process of examining, filtering, adapting, and modeling data to help solve problems. Data analysis helps determine what is and isn't working, so you can make the changes needed to achieve your business goals. 

Keep in mind that data analysis includes analyzing both quantitative data (e.g., profits and sales) and qualitative data (e.g., surveys and case studies) to paint the whole picture. Here are two simple examples (of a nuanced topic) to show you what I mean.

An example of quantitative data analysis is an online jewelry store owner using inventory data to forecast and improve reordering accuracy. The owner looks at their sales from the past six months and sees that, on average, they sold 210 gold pieces and 105 silver pieces per month, but they only had 100 gold pieces and 100 silver pieces in stock. By collecting and analyzing inventory data on these SKUs, they're forecasting to improve reordering accuracy. The next time they order inventory, they order twice as many gold pieces as silver to meet customer demand.

An example of qualitative data analysis is a fitness studio owner collecting customer feedback to improve class offerings. The studio owner sends out an open-ended survey asking customers what types of exercises they enjoy the most. The owner then performs qualitative content analysis to identify the most frequently suggested exercises and incorporates these into future workout classes.

Why is data analysis important?

Here's why it's worth implementing data analysis for your business:

Understand your target audience: You might think you know how to best target your audience, but are your assumptions backed by data? Data analysis can help answer questions like, "What demographics define my target audience?" or "What is my audience motivated by?"

Inform decisions: You don't need to toss and turn over a decision when the data points clearly to the answer. For instance, a restaurant could analyze which dishes on the menu are selling the most, helping them decide which ones to keep and which ones to change.

Adjust budgets: Similarly, data analysis can highlight areas in your business that are performing well and are worth investing more in, as well as areas that aren't generating enough revenue and should be cut. For example, a B2B software company might discover their product for enterprises is thriving while their small business solution lags behind. This discovery could prompt them to allocate more budget toward the enterprise product, resulting in better resource utilization.

Identify and solve problems: Let's say a cell phone manufacturer notices data showing a lot of customers returning a certain model. When they investigate, they find that model also happens to have the highest number of crashes. Once they identify and solve the technical issue, they can reduce the number of returns.

Types of data analysis (with examples)

There are five main types of data analysis—with increasingly scary-sounding names. Each one serves a different purpose, so take a look to see which makes the most sense for your situation. It's ok if you can't pronounce the one you choose. 

They are text analysis, statistical analysis, diagnostic analysis, predictive analysis, and prescriptive analysis.

Text analysis: What is happening?

Here are a few methods used to perform text analysis, to give you a sense of how it's different from a human reading through the text: 

Word frequency identifies the most frequently used words. For example, a restaurant monitors social media mentions and measures the frequency of positive and negative keywords like "delicious" or "expensive" to determine how customers feel about their experience. 

Language detection indicates the language of text. For example, a global software company may use language detection on support tickets to connect customers with the appropriate agent. 

Keyword extraction automatically identifies the most used terms. For example, instead of sifting through thousands of reviews, a popular brand uses a keyword extractor to summarize the words or phrases that are most relevant. 
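For instance, a bare-bones word frequency count takes only a few lines of Python (the reviews here are made up):

```python
from collections import Counter
import re

reviews = [
    "The pasta was delicious but expensive.",
    "Delicious food, friendly staff!",
    "Too expensive for the portion size.",
]  # hypothetical review snippets

words = re.findall(r"[a-z']+", " ".join(reviews).lower())
print(Counter(words).most_common(3))  # e.g. [('the', 2), ('delicious', 2), ('expensive', 2)]
```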

Statistical analysis: What happened?

Statistical analysis pulls past data to identify meaningful trends. Two primary categories of statistical analysis exist: descriptive and inferential.

Descriptive analysis

Here are a few methods used to perform descriptive analysis: 

Measures of frequency identify how frequently an event occurs. For example, a popular coffee chain sends out a survey asking customers what their favorite holiday drink is and uses measures of frequency to determine how often a particular drink is selected. 

Measures of central tendency use mean, median, and mode to identify results. For example, a dating app company might use measures of central tendency to determine the average age of its users.

Measures of dispersion measure how data is distributed across a range. For example, HR may use measures of dispersion to determine what salary to offer in a given field. 

Inferential analysis

Inferential analysis uses a sample of data to draw conclusions about a much larger population. This type of analysis is used when the population you're interested in analyzing is very large. 

Here are a few methods used when performing inferential analysis: 

Hypothesis testing identifies which variables impact a particular topic. For example, a business uses hypothesis testing to determine if increased sales were the result of a specific marketing campaign. 

Regression analysis shows the effect of independent variables on a dependent variable. For example, a rental car company may use regression analysis to determine the relationship between wait times and number of bad reviews. 
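To make the rental-car example concrete, here's a sketch with invented numbers using scipy's linregress:

```python
from scipy import stats

wait_minutes = [5, 12, 20, 33, 41, 55]  # hypothetical average wait times
bad_reviews  = [1, 2, 4, 6, 9, 12]      # hypothetical bad-review counts

result = stats.linregress(wait_minutes, bad_reviews)
print(f"slope = {result.slope:.2f} extra bad reviews per minute of waiting, "
      f"p = {result.pvalue:.4f}")
```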

Diagnostic analysis: Why did it happen?

Diagnostic analysis, also referred to as root cause analysis, uncovers the causes of certain events or results. 

Here are a few methods used to perform diagnostic analysis: 

Time-series analysis analyzes data collected over a period of time. A retail store may use time-series analysis to determine that sales increase between October and December every year. 

Correlation analysis determines the strength of the relationship between variables. For example, a local ice cream shop may determine that as the temperature in the area rises, so do ice cream sales. 

Predictive analysis: What is likely to happen?

Predictive analysis aims to anticipate future developments and events. By analyzing past data, companies can predict future scenarios and make strategic decisions.  

Here are a few methods used to perform predictive analysis: 

Decision trees map out possible courses of action and outcomes. For example, a business may use a decision tree when deciding whether to downsize or expand. 

Prescriptive analysis: What action should we take?

The highest level of analysis, prescriptive analysis, aims to find the best action plan. Typically, AI tools model different outcomes to predict the best approach. While these tools serve to provide insight, they don't replace human consideration, so always use your human brain before going with the conclusion of your prescriptive analysis. Otherwise, your GPS might drive you into a lake.

Here are a few methods used to perform prescriptive analysis: 

Algorithms are used in technology to perform specific tasks. For example, banks use prescriptive algorithms to monitor customers' spending and recommend that they deactivate their credit card if fraud is suspected. 

Data analysis process: How to get started

The actual analysis is just one step in a much bigger process of using data to move your business forward. Here's a quick look at all the steps you need to take to make sure you're making informed decisions. 

The process cycles through six stages: data decision, data collection, data cleaning, data analysis, data interpretation, and data visualization.

Data decision

As with almost any project, the first step is to determine what problem you're trying to solve through data analysis. 

Make sure you get specific here. For example, a food delivery service may want to understand why customers are canceling their subscriptions. But to enable the most effective data analysis, they should pose a more targeted question, such as "How can we reduce customer churn without raising costs?" 

Data collection

Next, collect the required data from both internal and external sources. 

Internal data comes from within your business (think CRM software, internal reports, and archives), and helps you understand your business and processes.

External data originates from outside of the company (surveys, questionnaires, public data) and helps you understand your industry and your customers. 

Data cleaning

Data can be seriously misleading if it's not clean. So before you analyze, make sure you review the data you collected.  Depending on the type of data you have, cleanup will look different, but it might include: 

Removing unnecessary information 

Addressing structural errors like misspellings

Deleting duplicates

Trimming whitespace

Human checking for accuracy 

Data analysis

Now that you've compiled and cleaned the data, use one or more of the above types of data analysis to find relationships, patterns, and trends. 

Data analysis tools can speed up the data analysis process and remove the risk of inevitable human error. Here are some examples.

Spreadsheets sort, filter, analyze, and visualize data. 

Structured query language (SQL) tools manage and extract data in relational databases. 

Data interpretation

After you analyze the data, you'll need to go back to the original question you posed and draw conclusions from your findings. Here are some common pitfalls to avoid:

Correlation vs. causation: Just because two variables are associated doesn't mean they're necessarily related or dependent on one another. 

Confirmation bias: This occurs when you interpret data in a way that confirms your own preconceived notions. To avoid this, have multiple people interpret the data. 

Small sample size: If your sample size is too small or doesn't represent the demographics of your customers, you may get misleading results. If you run into this, consider widening your sample size to give you a more accurate representation. 

Data visualization

Frequently asked questions

Need a quick summary or still have a few nagging data analysis questions? I'm here for you.

What are the five types of data analysis?

The five types of data analysis are text analysis, statistical analysis, diagnostic analysis, predictive analysis, and prescriptive analysis. Each type offers a unique lens for understanding data: text analysis provides insights into text-based content, statistical analysis focuses on numerical trends, diagnostic analysis looks into problem causes, predictive analysis deals with what may happen in the future, and prescriptive analysis gives actionable recommendations.

What is the data analysis process?

The data analysis process involves data decision, collection, cleaning, analysis, interpretation, and visualization. Every stage comes together to transform raw data into meaningful insights. Decision determines what data to collect, collection gathers the relevant information, cleaning ensures accuracy, analysis uncovers patterns, interpretation assigns meaning, and visualization presents the insights.

What is the main purpose of data analysis?

In business, the main purpose of data analysis is to uncover patterns, trends, and anomalies, and then use that information to make decisions, solve problems, and reach your business goals.


This article was originally published in October 2022 and has since been updated with contributions from Cecilia Gillen. The most recent update was in September 2023.


Data Analysis Plan Templates

Statistics Solutions provides a data analysis plan template based on your selected analysis. These templates are available from within Intellectus Statistics. You can use these templates to develop the data analysis section of your dissertation or research proposal. If you do not know your analysis, you can figure it out using the decision tree function also available within Intellectus.

The templates include research questions stated in statistical language, a justification for the analysis, and the assumptions of the analysis. Simply edit the blue text to reflect your research information and you will have the data analysis plan for your dissertation or research proposal.


2.3 Data management and analysis

Learning objectives.

Learners will be able to…

  • Define and construct a data analysis plan
  • Define key quantitative data management terms—variable name, data dictionary, and observations/cases
  • Differentiate between univariate and bivariate quantitative analysis
  • Explain when we might use quantitative bivariate analysis in social work research
  • Identify how your qualitative research question, research aim, and type of data may influence your choice of analytic methods
  • Outline the steps you will take in preparation for conducting qualitative data analysis

After you have your raw data, whether this is secondary data or data you collected yourself, you will need to analyze it. While the specific steps to follow in quantitative or qualitative data analysis are beyond the scope of this chapter, we are going to address some basic concepts in this section to help you create a data analysis plan. A data analysis plan is an ordered outline that includes your research question, a description of the data you are going to use to answer it, and the exact step-by-step analyses that you plan to run to answer your research question. If you look back at Table 2.1, you will see that creating a data analysis plan is a part of the study design process. The data analysis plan flows from the research question, is integral to the study design, and should be well conceptualized prior to beginning data collection. In this section, we will walk through the basics of quantitative and qualitative data analysis to help you understand the fundamentals of creating a data analysis plan.

Quantitative Data: Management

When considering what data you might want to collect as part of your project, there are two important considerations that can create dilemmas for researchers. You might only get one chance to interact with your participants, so you must think comprehensively in your planning phase about what information you need and collect as much relevant data as possible. At the same time, though, especially when collecting sensitive information, you need to consider how onerous the data collection is for participants and whether you really need them to share that information. Just because something is interesting to us doesn’t mean it’s related enough to our research question to chase it down. Work with your research team and/or faculty early in your project to talk through these issues before you get to this point. And if you’re using secondary data, make sure you have access to all the information you need in that data before you use it.

Once you’ve collected your quantitative data, you need to make sure it is well-organized in a database in a way that’s actually usable. “Database” can be kind of a scary word, but really, it can be as simple as an Excel spreadsheet or a data file in whatever program you’re using to analyze your data.  You may want to avoid Excel and use a formal database such as Microsoft Access or MySQL if you’ve got a large or complicated data set. But if your data set is smaller and you plan to keep your analyses simple, you can definitely get away with Excel. A typical data set is organized with variables as columns and observations/cases as rows. For example, let’s say we did a survey on ice cream preferences and collected the following information in Table 2.3:

Table 2.3 Results of our ice cream survey

Name      Age   Gender   Hometown   Fav_Flav
Tom       54    0        1          Rocky Road
Jorge     18    2        0          French Vanilla
Melissa   22    1        0          Espresso
Amy       27    1        0          Black Cherry
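In Python, for example, that layout maps directly onto a pandas DataFrame. This is a sketch of the survey above; the Gender and Hometown codes are the ones defined in the data dictionary in Table 2.4 below:

```python
import pandas as pd

# Variables as columns, observations/cases as rows
df = pd.DataFrame({
    "Name":     ["Tom", "Jorge", "Melissa", "Amy"],
    "Age":      [54, 18, 22, 27],
    "Gender":   [0, 2, 1, 1],   # coded per the data dictionary
    "Hometown": [1, 0, 0, 0],   # coded per the data dictionary
    "Fav_Flav": ["Rocky Road", "French Vanilla", "Espresso", "Black Cherry"],
})
```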

There are a few key data management terms to understand:

  • Variable name : Just what it sounds like—the name of your variable. Make sure this is something useful, short and, if you’re using something other than Excel, all one word. Most statistical programs will automatically rename variables for you if they aren’t one word, but the names can be a little ridiculous and long.
  • Observations/cases : The rows in your data set. In social work, these are often your study participants (people), but can be anything from census tracts to black bears to trains. When we talk about sample size, we’re talking about the number of observations/cases. In our mini data set, each person is an observation/case.
  • Data dictionary (also called a code book or metadata) : This is the document where you list your variable names, what the variables actually measure or represent, what each of the values of the variable mean if the meaning isn’t obvious (i.e., if there are numbers assigned to gender), the level of measurement and anything special to know about the variables (for instance, the source if you mashed two data sets together). If you’re using secondary data, the researchers sharing the data should make the data dictionary available.

Let’s take that mini data set we’ve got up above and we’ll show you what your data dictionary might look like in Table 2.4.

Table 2.4 Sample data dictionary/code book

  • Name: Participant’s first name; open-ended response; Nominal. First name only. If a name appears more than once, a random number has been attached to the end of the name to distinguish.
  • Age: Participant’s age at time of survey; integer, in years; Ratio. Self-reported.
  • Gender: Participant’s self-identified gender; 0=cisgender female, 1=cisgender male, 2=non-binary, 3=transgender female, 4=transgender male, 5=another gender; Nominal. Self-reported.
  • Hometown: Participant’s hometown; 0=This town, 1=Another town; Nominal. Self-reported.
  • Fav_Flav: Participant’s favorite ice cream; open-ended response; Nominal. Self-reported.

Quantitative Data: Univariate Analysis

As part of planning for your research, you should come up with a data analysis plan. Remember, a data analysis plan is an ordered outline that includes your research question, a description of the data you are going to use to answer it, and the exact step-by-step analyses that you plan to run to answer your research question. A basic data analysis plan might look something like what you see in Table 2.5. Don’t panic if you don’t yet understand some of the statistical terms in the plan; we’re going to delve into some of them in this section, and others will be covered in more depth in your statistics courses. Note here also that this is what operationalizing your variables and moving through your research with them looks like on a basic level. We will cover operationalization in more depth in Chapter 10.

Table 2.5 A basic data analysis plan
Research question: What is the relationship between a person’s race and their likelihood to graduate from high school?

Data: Individual-level U.S. American Community Survey data for 2017, which include race/ethnicity and other demographic data (i.e., educational attainment, family income, employment status, citizenship, presence of both parents, etc.). Only individuals for whom race and educational attainment data are available are included.

Analysis plan: Univariate and descriptive statistics, including mean, median, mode, range, distribution of interval/ratio variables, and missing values for the independent, control, and dependent variables. Bivariate statistics; for instance, a Chi-square test between race and high school graduation (both nominal variables) and an ANOVA on income and race, plus correlations between interval/ratio variables. Multivariate analysis, like logistic regression, with high school graduation (yes/no) as my dependent variable, race as the independent variable, and multiple control variables I think are relevant based on my conceptual framework. Interpretation of the logistic regression results and presentation of the findings.

An important point to remember is that you should never get stuck on using a particular statistical method because you or one of your co-researchers thinks it’s cool or it’s the hot thing in your field right now. You should certainly go into your data analysis plan with ideas, but in the end, you need to let your research question guide what statistical tests you plan to use. Be prepared to be flexible if your plan doesn’t pan out because the data is behaving in unexpected ways.

You’ll notice that the first step in the quantitative data analysis plan is univariate and descriptive statistics.   Univariate data analysis is a quantitative method in which a variable is examined individually to determine its distribution , or the way the scores are distributed across the levels, or values, of that variable. When we talk about levels ,  what we are talking about are the possible values of the variable—like a participant’s age, income or gender. (Note that this is different from levels of measurement , which will be discussed in Chapter 11, but the level of measurement of your variables absolutely affects what kinds of analyses you can do with it.) Univariate analysis is non-relational , which just means that we’re not looking into how our variables relate to each other. Instead, we’re looking at variables in isolation to try to understand them better. For this reason, univariate analysis is used for descriptive research questions.

So when do you use univariate data analysis? Always! It should be the first thing you do with your quantitative data, whether you are planning to move on to more sophisticated statistical analyses or are conducting a study to describe a new phenomenon. You need to understand what the values of each variable look like—what if one of your variables has a lot of missing data because participants didn’t answer that question on your survey? What if there isn’t much variation in the gender of your sample? These are things you’ll learn through univariate analysis.
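Continuing with the hypothetical survey DataFrame df built in the earlier sketch, a first univariate pass could look like this:

```python
print(df["Age"].describe())           # mean, spread, min/max of a ratio variable
print(df["Fav_Flav"].value_counts())  # frequency of each nominal value
print(df.isna().sum())                # missing values per variable
```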

Quantitative Data: Bivariate Analysis

Did you know that ice cream causes shark attacks? It’s true! When ice cream sales go up in the summer, so does the rate of shark attacks. So you’d better put down that ice cream cone, unless you want to make yourself look more delicious to a shark.


Ok, so it’s quite obviously not true that ice cream causes shark attacks. But if you looked at these two variables and how they’re related, you’d notice that during times of the year with high ice cream sales, there are also the most shark attacks. This is a classic example of the difference between correlation and causation. Despite the fact that the conclusion we drew about causation was wrong, it’s nonetheless true that these two variables appear related, and researchers figured that out through the use of bivariate analysis.

Bivariate analysis consists of a group of statistical techniques that examine the association between two variables. We could look at how anti-depressant medications and appetite are related, whether there is a relation between having a pet and emotional well-being, or if a policy-maker’s level of education is related to how they vote on bills related to environmental issues.
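As one bivariate sketch, the pet-and-well-being question could be checked with a chi-square test of independence on a cross-tabulation. The data below are made up, and pandas and scipy are assumed:

```python
import pandas as pd
from scipy.stats import chi2_contingency

survey = pd.DataFrame({
    "has_pet":    ["yes", "yes", "no", "no", "yes", "no", "yes", "no"],
    "well_being": ["good", "good", "poor", "good", "good", "poor", "good", "poor"],
})

table = pd.crosstab(survey["has_pet"], survey["well_being"])  # 2x2 contingency table
chi2, p, dof, expected = chi2_contingency(table)
print(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```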

Bivariate analysis forms the foundation of multivariate analysis, which we don’t get to in this book. All you really need to know here is that there are steps beyond bivariate analysis, which you’ve undoubtedly seen in scholarly literature already! But before we can move forward with multivariate analysis, we need to understand the associations between the variables in our study.

Throughout your PhD program, you will learn much more about quantitative data analysis techniques, including more sophisticated multivariate analysis methods. Hopefully this section has provided you with some initial insights into how data is analyzed, and the importance of creating a data analysis plan prior to collecting data. Next, we will discuss some basic strategies for creating a qualitative data analysis plan.

Resources for Quantitative Data Analysis

While you are affiliated with a university, it is likely that you will have access to some kind of commercial statistics software. Examples in the previous section use SPSS, the package our authoring team has seen most often in social work education. Like its competitors SAS and STATA, SPSS is expensive, and your license to the software must be renewed every year (like a subscription). Even if you are able to install commercial statistics software on your computer, once your license expires, your program will no longer work. We believe that forcing students to learn software they will never use is wasteful and contributes to the (accurate, in many cases) perception from students that research class is unrelated to real-world practice. SPSS is more accessible due to its graphical user interface and does not require researchers to learn basic computer programming, but it is prohibitively costly for a student who wants to use it to measure practice data in their agency post-graduation.

Instead, we suggest getting familiar with JASP Statistics , a free and open-source alternative to SPSS developed and supported by the University of Amsterdam. It has a user interface similar to SPSS and should be similarly easy to learn. Moreover, usability upgrades over SPSS, like generating APA-formatted tables, make it a compelling option. While a great many of our students will rely on statistical analyses of their programs and practices in reports to funders, it is unlikely that any will use SPSS. Browse JASP’s how-to guide or consult the textbook Learning Statistics with JASP: A Tutorial for Psychology Students and Other Beginners , written by Danielle J. Navarro , David R. Foxcroft , and Thomas J. Faulkenberry .

Another open-source statistics software package is R (a.k.a. The R Project for Statistical Computing ). R uses a command-line interface, so you will need some coding knowledge in order to use it. Luckily, R is among the most commonly used statistics software in the world, and the community of support and guides for using R is omnipresent online. For beginning researchers, consult the textbook Learning Statistics with R: A tutorial for psychology students and other beginners by Danielle J. Navarro .

While statistics software is sometimes needed to perform advanced statistical tests, most univariate and bivariate tests can be performed in spreadsheet software like Microsoft Excel, Google Sheets, or the free and open source LibreOffice Calc . Microsoft offers the Analysis ToolPak, an add-on to Excel for more complex data analysis. For more information on using spreadsheet software to perform statistics, see the open textbook Collaborative Statistics Using Spreadsheets by Susan Dean, Irene Mary Duranczyk, Barbara Illowsky, Suzanne Loch, and Janet Stottlemyer.

Statistical analysis is performed in just about every discipline, and as a result, there are a lot of openly licensed, free resources to assist you with your data analysis. We have endeavored to provide you the basics in the past few chapters, but ultimately, you will likely need additional support in completing quantitative data analysis from an instructor, textbook, or other resource. Browse the Open Textbook Library for statistics resources or look for video tutorials from reputable instructors like this video textbook on statistics by Bryan Koenig .

Qualitative Data: Management

Qualitative research often involves human participants, and qualitative data can include recordings or transcripts of their words, photographs or images, or diaries and documents. The personal nature of qualitative data poses the challenge that sensitive information about individuals, communities, and places may remain recognizable. If you choose this methodology for your research, you should familiarize yourself with the policies, procedures, and rules that ensure safety and security of data in the documentation and dissemination process.

In any research involving primary data, a researcher is not only entrusted with the responsibility of upholding privacy of their participants but also accountable to them, making confidentiality and human subjects’ protection front and center of qualitative data management. Data such as audiotapes, videotapes, transcripts, notes, and other records should be stored and secured in locations where only authorized persons have access to them.

Sometimes in qualitative research, you will learn intimate details about people’s lives. Often, qualitative data contain personal identifiers. A helpful practice for ensuring participants’ confidentiality is to replace personal information in transcripts with pseudonyms or descriptive language (e.g., “[the participant’s sister]” instead of the sister’s name). Once audio and video recordings have been accurately transcribed and personal identifiers removed, the original recordings should be destroyed.

Qualitative Data: Analysis

There are many different types of qualitative data, including transcripts of interviews and focus groups, observational data, documents and other artifacts, and more. Your qualitative data analysis plan should be anchored in the type of data collected and the purpose of your study. Qualitative research can serve a range of purposes. Below is a brief list of general purposes we might consider when using a qualitative approach.

  • Are you trying to understand how a particular group is affected by an issue?
  • Are you trying to uncover how people arrive at a decision in a given situation?
  • Are you trying to examine different points of view on the impact of a recent event?
  • Are you trying to summarize how people understand or make sense of a condition?
  • Are you trying to describe the needs of your target population?

If you don’t see the general aim of your research question reflected in one of these areas, don’t fret! This is only a small sampling of what you might be trying to accomplish with your qualitative study. Whatever your aim, you need to have a plan for what you will do once you have collected your data.

Iterative or Linear

Some qualitative research is linear , meaning it follows more of a traditionally quantitative process: create a plan, gather data, and analyze data; each step is completed before we proceed to the next. You can think of this like how information is presented in this book. We discuss each topic, one after another.

However, many times qualitative research is iterative, or evolving in cycles. An iterative approach means that once we begin collecting data, we also begin analyzing data as it is coming in. This early and ongoing analysis of our (incomplete) data then impacts our continued planning, data gathering, and future analysis. Again, coming back to this book, while it may be written linearly, we hope that you engage with it iteratively as you design and conduct your own research. By this we mean that you will revisit previous sections so you can understand how they fit together, and you are in a continuous process of building and revising how you think about the concepts you are learning about.

As you may have guessed, there are benefits and challenges to both linear and iterative approaches. A linear approach is much more straightforward, with each step fairly well defined. However, that same definition and rigidity present certain challenges: a linear approach assumes that we know what we need to ask or look for at the very beginning of data collection, which often is not the case. Figure 2.1 contrasts the two approaches.

Figure 2.1 Comparison of linear and iterative approaches. The linear approach is a sequence of steps ("create a plan," "gather data," "analyze data"); the iterative approach is a cycle of "planning," "data gathering," and "analyzing the data."

With iterative research, we have more flexibility to adapt our approach as we learn new things. We still need to keep our approach systematic and organized, however, so that our work doesn’t become a free-for-all. As we adapt, we do not want to stray too far from the original premise of our study. It’s also important to remember with an iterative approach that we may risk ethical concerns if our work extends beyond the original boundaries of our informed consent and institutional review board agreement (IRB; see Chapter 3 for more on IRBs). If you feel that you do need to modify your original research plan in a significant way as you learn more about the topic, you can submit an addendum to modify your original application that was submitted. Make sure to keep detailed notes of the decisions that you are making and what is informing these choices. This helps to support transparency and your credibility throughout the research process.

Acquainting yourself with your data

As you begin your analysis, you need to get to know your data. This often means reading through your data prior to any attempt at breaking it apart and labeling it. You might read through it a couple of times, in fact. This helps give you a more comprehensive feel for each piece of data and for the data as a whole, again, before you start to break it down into smaller units or deconstruct it. This is especially important if others assisted in the data collection process. We often gather data as part of a team, and everyone involved in the analysis needs to be very familiar with all of the data.

Capturing your emerging understanding of the data

During your reviewing you will start to develop and evolve your understanding of what the data means. Coding is a part of the qualitative data analysis process where we begin to interpret and assign meaning to the data. It represents one of the first steps as we begin to filter the data through our own subjective lens as the researcher. This understanding of the data should be dynamic and flexible, but you want to have a way to capture this understanding as it evolves. You may include this as part of your qualitative codebook where you are tracking the main ideas that are emerging and what they mean. Table 2.6 is an example of how your thinking might change about a code and how you can go about capturing it.

Table 2.6 Example of the evolution of a code in a codebook
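One lightweight way to capture this evolution is to store dated codebook entries rather than overwriting earlier definitions. Below is a minimal sketch in Python; the code name, definitions, and quotes are hypothetical examples, not data from an actual study.

    # Python sketch: a codebook entry that preserves how a code evolves.
    from dataclasses import dataclass, field

    @dataclass
    class CodeEntry:
        date: str            # when this understanding was recorded
        definition: str      # what the code means at this point
        example_quote: str   # a data excerpt illustrating the code

    @dataclass
    class Code:
        name: str
        history: list = field(default_factory=list)  # oldest entry first

        def revise(self, entry):
            """Append a revised understanding instead of overwriting."""
            self.history.append(entry)

    support = Code(name="informal support")
    support.revise(CodeEntry("2024-01-10",
                             "Help participants receive from family members",
                             "My sister watches the kids when I work late."))
    support.revise(CodeEntry("2024-02-02",
                             "Help from any unpaid relationship, including neighbors",
                             "The lady next door checks in on me most days."))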

There are a variety of different approaches to qualitative analysis, including thematic analysis, content analysis, grounded theory, phenomenology, photovoice, and more. The specific steps you will take to code your qualitative data, and to generate themes from these codes, will vary based on the analytic strategy you are employing. In designing your qualitative study, you would identify an analytical approach as you plan out your project. The one you select would depend on the type of data you have and what you want to accomplish with it. In Chapter 19, we will go into more detail about various types of qualitative data analysis. Each qualitative approach has specific techniques and methods that take substantial study and practice to master.

Key Takeaways

  • Getting organized at the beginning of your project with a data analysis plan will help keep you on track. Data analysis plans should include your research question, a description of your data, and a step-by-step outline of what you’re going to do with it. [chapter 14.1]
  • Be flexible with your data analysis plan—sometimes data surprises us and we have to adjust the statistical tests we are using. [chapter 14.1]
  • Always make a data dictionary or, if using secondary data, get a copy of the data dictionary so you (or someone else) can understand the basics of your data. [chapter 14.1]
  • Bivariate analysis is a group of statistical techniques that examine the relationship between two variables. [chapter 15.1]
  • You need to conduct bivariate analyses before you can begin to draw conclusions from your data, including in future multivariate analyses. [chapter 15.1]
  • There are a lot of high quality and free online resources to learn and perform statistical analysis.
  • Qualitative research analysis requires preparation and careful planning. You will need to take time to familiarize yourself with the data in a general sense before you begin analyzing. [chapter 19.3]
  • The specific steps you will take to code your qualitative data and generate final themes will depend on the qualitative analytic approach you select.

TRACK 1 (IF YOU ARE CREATING A RESEARCH PROPOSAL FOR THIS CLASS)

  • Make a data analysis plan for your project. Remember this should include your research question, a description of the data you will use, and a step-by-step outline of what you’re going to do with your data once you have it, including statistical tests (non-relational and relational) that you plan to use. You can do this exercise whether you’re using quantitative or qualitative data! The same principles apply.
  • Make a data dictionary for the data you are proposing to collect as part of your study. You can use the example above as a template.

TRACK 2 (IF YOU  AREN’T CREATING A RESEARCH PROPOSAL FOR THIS CLASS)

You are researching the impact of your city’s recent harm reduction interventions for intravenous drug users (e.g., sterile injection kits, monitored use, overdose prevention, naloxone provision, etc.).

  • Make a draft quantitative data analysis plan for your project. Remember this should include your research question, a description of the data you will use, and a step-by-step outline of what you’re going to do with your data once you have it, including statistical tests (non-relational and relational) that you plan to use. It’s okay if you don’t yet have a complete idea of the types of statistical analyses you might use.

Key terms

  • Data analysis plan: An ordered outline that includes your research question, a description of the data you are going to use to answer it, and the exact analyses, step-by-step, that you plan to run to answer your research question.
  • Variable name: The name of your variable.
  • Observations/cases: The rows in your data set. In social work, these are often your study participants (people), but can be anything from census tracts to black bears to trains.
  • Data dictionary: The document where you list your variable names, what the variables actually measure or represent, and what each of the values of the variable mean if the meaning isn't obvious.
  • Operationalization: The process by which researchers spell out precisely how a concept will be measured in their study.
  • Multivariate analysis: A group of statistical techniques that examine the relationship between at least three variables.
  • Univariate data analysis: A quantitative method in which a variable is examined individually to determine its distribution.
  • Distribution: The way the scores are distributed across the levels of that variable.

Chapter Outline

  • Practical and ethical considerations (14 minute read)
  • Raw data (10 minute read)
  • Creating a data analysis plan (?? minute read)
  • Critical considerations (3 minute read)

Content warning: Examples in this chapter discuss substance use disorders, mental health disorders and therapies, obesity, poverty, gun violence, gang violence, school discipline, racism and hate groups, domestic violence, trauma and triggers, incarceration, child neglect and abuse, bullying, self-harm and suicide, racial discrimination in housing, burnout in helping professions, and sex trafficking of indigenous women.

2.1 Practical and ethical considerations

Learners will be able to...

  • Identify potential stakeholders and gatekeepers
  • Differentiate between raw data and the results of scientific studies
  • Evaluate whether you can feasibly complete your project

Pre-awareness check (Knowledge)

Similar to practice settings, research has ethical considerations that must be taken to ensure the safety of participants. What ethical considerations were relevant to your practice experience that may have impacted the delivery of services?

As a PhD student, you will have many opportunities to conduct research. You may be asked to be a part of a research team led by the faculty at your institution. You will also conduct your own research for your dissertation. As you will learn, research can take many forms. For example, you may want to focus qualitatively on individuals’ lived experiences, or perhaps you will quantitatively assess the impact of interventions on research subjects. You may work with large, already-existing datasets, or you may create your own data. Though social work research can vary widely from project to project, researchers typically follow the same general process, even if their specific research questions and methodologies differ. Table 2.1 outlines the major components of the research process covered in this textbook, and indicates the chapters where you will find more information on each subject. You will notice that your research paradigm is an organizing framework that guides each component of the research process.

Table 2.1 Components of the Research Process

The research paradigm is a guiding framework at each step. See Chapter 7 for more information on paradigms.

How does your paradigm influence the decisions you make as a researcher?

  • Problem formulation: The researcher chooses a social problem to focus on in their study. (Chapter 2)
  • Theory: After selecting a topic for study, researchers will often choose one or more theories they believe will inform the design, conduct, and interpretation of their study. (Chapter 7)
  • Conceptual framework: Researchers propose how the chosen theories, as well as the variables included within these theories, are connected to their research problem. If working quantitatively, the team will think through the causal factors and outcomes of interest for this particular study. (Chapters 7, 11)
  • Literature review: The researchers search several databases for peer-reviewed empirical articles on the chosen problem and theories. The conceptual framework will be adapted as needed based on this review. (Chapters 3, 4, 5)
  • Research question(s): Using the knowledge gained through the literature review, researchers pose specific research questions they intend to answer. (Chapters 2, 9)
  • Study design: Researchers decide if their research questions will be answered by quantitative or qualitative methods. Quantitative studies may use survey or experimental design. The research team will make decisions about sampling, design, measurement, and analysis. (Sampling: Chapter 10; Design: Chapter 13; Measurement: Chapter 11)
  • IRB approval: If the study will involve human subjects, the researchers will need to get institutional review board (IRB) approval prior to commencing the study. The team will need to think through the ethical risks and mitigation strategies for their chosen study design. (Chapters 1, 6)
  • Collect, manage, and analyze data: Once IRB approval has been obtained, the data collection process is ready to begin. Researchers will conduct their study, clean and manage data, and then analyze it. (Quantitative: Chapters 14, 15, 16; Qualitative: Chapters 17, 18, 19)
  • Publish results: At the end of the research process, the team will determine how to disseminate results. They may choose to write a research article. Such articles typically explain the study's literature review, methods, and results, and include a discussion of the implications and conclusions of the study. Future research directions may also be identified. (Chapter 24)

Feasibility

Feasibility refers to whether you can practically conduct the study you plan to do, given the resources and ethical obligations you have. In this chapter, we will review some important practical and ethical considerations researchers should start thinking about from the beginning of a research project. These considerations apply to all research, but it is important to also consider the context of research and researchers when thinking about feasibility.

For example, as a doctoral student, you likely have a unique set of circumstances that inspire and constrain your research. Some students have the ability to engage in independent studies where they can gain skills and expertise in specialized research methods to prepare them for a research-intensive career. Others may have reasons, such as a limited amount of funding or family concerns, that encourage them to complete their dissertation research as quickly as possible. These circumstances relate to the feasibility of a research project. Regardless of the potential societal importance of a 10-year longitudinal study, it’s not feasible for a student to conduct it in time to graduate! Your dissertation chair, doctoral program director, and other faculty mentors can help you navigate the many decisions you will face as a doctoral student about conducting independent research or joining research projects.

The context and role of the researcher continue to affect feasibility even after a doctoral student graduates. Many will continue in their careers to become tenure track faculty with research expectations to obtain tenure. Some funders expect faculty members to have a track record of successful projects before trusting them to lead expensive or long-term studies.  Realistically, these expectations will influence what research is feasible for a junior faculty member to conduct. Just like for doctoral students, mentorship is incredibly valuable for junior faculty to make informed decisions about what research to conduct. Senior faculty, associate deans of research, chairs, and deans can help junior faculty decide what projects to pursue to ensure they meet the expectations placed on them without losing sight of the reasons they became a researcher in the first place.

As you read about other feasibility considerations such as gaining access, consent, and collecting data, consider the ways in which context and roles also influence feasibility.

Access, consent, and ethical obligations

One of the most important feasibility issues is gaining access to your target population. For example, let's say you wanted to better understand middle-school students who engaged in self-harm behaviors. That is a topic of social importance, but what challenges might you face in accessing this population? Let's say you proposed to identify students from a local middle school and interview them about self-harm. Methodologically, that sounds great, since you are getting data from those with the most knowledge about the topic: the students themselves. But practically, that sounds challenging. Think about the ethical obligations a social work practitioner has to adolescents who are engaging in self-harm (e.g., competence, respect). In research, we are similarly concerned with the benefits and harms of what you propose to do, as well as with the openness and honesty with which you share your project publicly.


Gatekeepers

If you were the principal at your local middle school, would you allow researchers to interview kids in your school about self-harm? What if the results of the study showed that self-harm was a big problem that your school was not addressing? What if the researchers' interviews themselves caused an increase in self-harming behaviors among the children? The principal in this situation is a gatekeeper. Gatekeepers are the individuals or organizations who control access to the population you want to study. The school board would also likely need to give consent for the research to take place at their institution. Gatekeepers must weigh these ethical questions because they have a responsibility to protect the safety of the people at their organization, just as you have an ethical obligation to protect the people in your research study.

For vulnerable populations, it can be a challenge to get consent from gatekeepers to conduct your research project. As a result, researchers often conduct research projects in places where they have established trust with gatekeepers. In cases where the population (children who self-harm) is too vulnerable, researchers may collect data from people who have secondary knowledge about the topic. For example, the principal may be more willing to let you talk to teachers or staff rather than children.

Stakeholders

In some cases, researchers and gatekeepers partner on a research project. When this happens, the gatekeepers become stakeholders. Stakeholders are individuals or groups who have an interest in the outcome of the study you conduct. As you think about your project, consider whether there are formal advisory groups or boards (like a school board) or advocacy organizations who already serve or work with your target population. Approach them as experts and ask for their review of your study to see if there are any perspectives or details you missed that would make your project stronger.

There are many advantages to partnering with stakeholders to complete a research project together. Continuing with our example on self-harm in schools, in order to obtain access to interview children at a middle school, you will have to consider other stakeholders' goals. School administrators also want to help students struggling with self-harm, so they may want to use the results to form new programs. But they may also need to avoid scandal and panic if the results show high levels of self-harm. Most likely, they want to provide support to students without making the problem worse. By bringing in school administrators as stakeholders, you can better understand what the school is currently doing to address the issue and get an informed perspective on your project's questions. Negotiating the boundaries of a stakeholder relationship requires strong meso-level practice skills.

Of course, partnering with administrators probably sounds quite a bit easier than bringing on board the next group of stakeholders—parents. It's not ethical to ask children to participate in a study without their parents' consent. We will review the parameters of parental and child consent in Chapter 5. Parents may be understandably skeptical of a researcher who wants to talk to their child about self-harm, and they may fear potential harm to the child and family from your study. Would you let a researcher you didn't know interview your children about a very sensitive issue?

Social work research must often satisfy multiple stakeholders. This is especially true if a researcher receives a grant to support the project, as the funder has goals it wants to accomplish by funding the research project. Your university is also a stakeholder in your project. When you conduct research, it reflects on your school. If you discover something of great importance, your school looks good. If you harm someone, your school may be liable. Your university likely has opportunities for you to share your research with the campus community, and may have incentives or grant programs for researchers. Your school also provides you with support and access to resources like the library and data analysis software.

Target population

So far, we've talked about access in terms of gatekeepers and stakeholders. Let's assume all of those people agree that your study should proceed. But what about the people in the target population? They are the most important stakeholder of all! Think about the children in our proposed study on self-harm. How open do you think they would be to talking to you about such a sensitive issue? Would they consent to talk to you at all?

Maybe you are thinking about simply asking clients on your caseload. As we talked about before, leveraging existing relationships created through field work can help with accessing your target population. However, these relationships introduce other ethical issues for researchers. Asking clients on your caseload or at your agency to participate in your project creates a dual relationship between you and your client. What if you learn something in the research project that you want to share with your clinical team? More importantly, would your client feel comfortable declining to participate in your study? Social workers have power over clients, and any dual relationship would require strict supervision in the rare case it was allowed.

Resources and scope

Let's assume everyone consented to your project and you have adequately addressed any ethical issues with gatekeepers, stakeholders, and your target population. That means everything is ready to go, right? Not quite yet. As a researcher, you will need to carry out the study you propose to do. Depending on how big or how small your proposed project is, you’ll need a little or a lot of resources.

One thing that all projects need is raw data. Raw data can come in many forms. Very often in social science research, raw data includes the responses to a survey or transcripts of interviews and focus groups, but raw data can also include experimental results, diary entries, art, or other data points that social scientists use in analyzing the world. Primary data is data you have collected yourself. Sometimes, social work researchers do not collect raw data of their own, but instead use secondary data analysis to analyze raw data that has been shared by other researchers. Secondary data is data someone else has collected that you have permission to use in your research. For example, you could use data from a local probation program to determine if a shoplifting prevention group was reducing the rate at which people were re-offending. You would need data on who participated in the program and their criminal history six months after the end of their probation period. This is secondary data you could use to determine whether the shoplifting prevention group had any effect on an individual's likelihood of re-offending. Whether a researcher should use secondary data or collect their own raw data is an important choice which we will discuss in greater detail in section 2.2. Collecting raw data or obtaining secondary data can be time consuming or expensive, but without raw data there can be no research project.


Time is an important resource to consider when designing research projects. Make sure that your proposal won't require you to spend more time than you have to collect and analyze data. Think realistically about the timeline for your research project. If you propose to interview fifty mental health professionals in their offices in your community about your topic, make sure you can dedicate fifty hours to conduct those interviews, account for travel time, and think about how long it will take to transcribe and analyze those interviews.

  • What is reasonable for you to do in your timeframe?
  • How many hours each week can the research team dedicate to this project?

One thing that can delay a research project is receiving approval from the institutional review board (IRB), the research ethics committee at your university. If your study involves human subjects, you may have to formally propose your study to the IRB and get their approval before gathering your data. A well-prepared study is likely to gain IRB approval with minimal revisions needed, but the process can take weeks to complete and must be done before data collection can begin. We will address the ethical obligations of researchers in greater detail in Chapter 5.

Most research projects cost some amount of money. Potential expenses include wages for members of the research team, incentives for research participants, travel expenses, and licensing costs for standardized instruments. Most researchers seek grant funding to support the research. Grant applications can be time consuming to write and grant funding can be competitive to receive.

Knowledge, competence, and skills

For social work researchers, the social work value of competence is key in their research ethics.

Clearly, researchers need to be skilled in working with their target population in order to conduct ethical research.  Some research addresses this challenge by collecting data from competent practitioners or administrators who have second-hand knowledge of target populations based on professional relationships. Members of the research team delivering an intervention also need to have training and skills in the intervention. For example, if a research study examines the effectiveness of dialectical behavioral therapy (DBT) in a particular context, the person delivering the DBT must be certified in DBT.  Another idea to keep in mind is the level of data collection and analysis skills needed to complete the project.  Some assessments require training to administer. Analyses may be complex or require statistical consultation or advanced training.

In summary, here are a few questions you should ask yourself about your project to make sure it's feasible. While we present them early on in the research process (we're only in Chapter 2), these are certainly questions you should ask yourself throughout the proposal writing process. We will revisit feasibility again in Chapter 9 when we work on finalizing your research question.

  • Do you have access to the data you need or can you collect the data you need?
  • Will you be able to get consent from stakeholders, gatekeepers, and your target population?
  • Does your project pose risk to individuals through direct harm, dual relationships, or breaches in confidentiality?
  • Are you competent enough to complete the study?
  • Do you have the resources and time needed to carry out the project?
Key Takeaways

  • People will have to say “yes” to your research project. Evaluate whether your project might have gatekeepers or potential stakeholders. They may control access to data or potential participants.
  • Researchers need raw data such as survey responses, interview transcripts, or client charts. Your research project must involve more than looking at the analyses conducted by other researchers, as the literature review is only the first step of a research project.
  • Make sure you have enough resources (time, money, and knowledge) to complete your research project.

Post-awareness check (Emotion)

What factors have created your passion toward assisting your target population? How can this connection enhance your ability to receive a “yes” from potential participants? What are the anticipated challenges to receiving a “yes” from potential participants?

TRACK 1 (IF YOU ARE CREATING A RESEARCH PROPOSAL FOR THIS CLASS)

Think about how you might answer your question by collecting your own data.

  • Identify any gatekeepers and stakeholders you might need to contact.
  • How can you increase the likelihood you will get access to the people or records you need for your study?

Describe the resources you will need for your project.

  • Do you have concerns about feasibility?

TRACK 2 (IF YOU  AREN'T CREATING A RESEARCH PROPOSAL FOR THIS CLASS)

You are researching the impact of your city's recent harm reduction interventions for intravenous drug users (e.g., sterile injection kits, monitored use, overdose prevention, naloxone provision, etc.).

  • Thinking about the services related to this issue in your own city, identify any gatekeepers and stakeholders you might need to contact.
  • How might you approach these gatekeepers and stakeholders? How would you explain your study?

2.2 Raw data

  • Identify potential sources of available data
  • Weigh the challenges and benefits of collecting your own data

In our previous section, we addressed some of the challenges researchers face in collecting and analyzing raw data. Just as a reminder, raw data are unprocessed, unanalyzed data that researchers analyze using social science research methods. It is not just the statistics or qualitative themes in journal articles. It is the actual data from which those statistical outputs or themes are derived (e.g., interview transcripts or survey responses).

There are two approaches to getting raw data. First, students can analyze data that are publicly available or from agency records. Using secondary data like this can make projects more feasible, but you may not find existing data that are useful for answering your working question. For that reason, many students gather their own raw data. As we discussed in the previous section, potential harms that come from addressing sensitive topics mean that surveys and interviews of practitioners or other less-vulnerable populations may be the most feasible and ethical way to approach data collection.

Using secondary data

Within the agency setting, there are two main sources of raw data. One option is to examine client charts. For example, if you wanted to know if substance use was related to parental reunification for youth in foster care, you could look at client files and compare how long it took for families with differing levels of substance use to be reunified. You will have to negotiate with the agency the degree to which your analysis can be public. Agencies may be okay with you using client files for a class project but less comfortable with you presenting your findings at a city council meeting. When analyzing data from your agency, you will have to manage a stakeholder relationship.

Another great example of agency-based raw data comes from program evaluations. If you are working with a grant funded agency, administrators and clinicians are likely producing data for grant reporting. The agency may consent to have you look at the raw data and run your own analysis. Larger agencies may also conduct internal research—for example, surveying employees or clients about new initiatives. These, too, can be good sources of available data. Generally, if the agency has already collected the data, you can ask to use them. Again, it is important to be clear on the boundaries and expectations of the agency. And don't be angry if they say no!

Some agencies, usually government agencies, publish their data in formal reports. You could take a look at some of the websites for county or state agencies to see if there are any publicly available data relevant to your research topic. As an example, perhaps there are annual reports from the state department of education that show how seclusion and restraint are disproportionately applied to Black children with disabilities, as students found in Virginia. In another example, one student matched public data from their city's map of criminal incidents with historically redlined neighborhoods. For this project, she is using publicly available data from Mapping Inequality, which digitized historical records of redlined housing communities, and the Roanoke, VA crime mapping webpage. By matching historical data on housing redlining with current crime records, she is testing whether redlining still impacts crime to this day.

Not all public data are easily accessible, though. The student in the previous example was lucky that scholars had digitized the records of how Virginia cities were redlined by race. Sources of historical data are often located in physical archives, rather than digital archives. If your project uses historical data in an archive, it would require you to physically go to the archive in order to review the data. Unless you have a travel budget, you may be limited to the archival data in your local libraries and government offices. Similarly, government data may have to be requested from an agency, which can take time. If the data are particularly sensitive or if the department would have to dedicate a lot of time to your request, you may have to file a Freedom of Information Act request. This process can be time-consuming, and in some cases, it will add financial cost to your study.

Another source of secondary data is shared by researchers as part of the publication and review process. There is a growing trend in research to publicly share data so others can verify your results and attempt to replicate your study. In more recent articles, you may notice links to data provided by the researcher. Often, these have been de-identified by eliminating some information that could lead to violations of confidentiality. You can browse through the data repositories in Table 2.2 to find raw data to analyze. Make sure that you pick a data set with thorough and easy-to-understand documentation. You may also want to use Google's dataset search, which indexes some of the websites below as well as others in a very intuitive and easy-to-use way.

Table 2.2 Sources of publicly available data
Organization | Data available | Type
National Opinion Research Center | General Social Survey; demographic, behavioral, attitudinal, and special interest questions; national sample | Quantitative
Carolina Population Center | Add Health; longitudinal social, economic, psychological, and physical well-being of a cohort in grades 7–12 in 1994 | Quantitative
Center for Demography of Health and Aging | Wisconsin Longitudinal Study; life course study of cohorts who graduated from high school in 1957 | Quantitative
Institute for Social & Economic Research | British Household Panel Survey; longitudinal study of British lives and well-being | Quantitative
International Social Survey Programme | International data similar to the GSS | Quantitative
The Institute for Quantitative Social Science at Harvard University | Large archive of written data, audio, and video focused on many topics | Quantitative and qualitative
Institute for Research on Women and Gender | Global Feminisms Project; interview transcripts and oral histories on feminism and women’s activism | Qualitative
Oral History Office | Descriptions and links to numerous oral history archives | Qualitative
UNC Wilson Library | Digitized manuscript collection from the Southern Historical Collection | Qualitative
Qualitative Data Repository | A repository of qualitative data that can be downloaded and annotated collaboratively with other researchers | Qualitative

Ultimately, you will have to weigh the strengths and limitations of using secondary data on your own. Engel and Schutt (2016, p. 327) propose six questions to ask before using secondary data:

  • What were the agency’s or researcher’s goals in collecting the data?
  • What data were collected, and what were they intended to measure?
  • When was the information collected?
  • What methods were used for data collection? Who was responsible for data collection, and what were their qualifications? Are they available to answer questions about the data?
  • How is the information organized (by date, individual, family, event, etc.)? Are identifiers used to indicate different types of data available?
  • What is known about the success of the data collection effort? How are missing data indicated and treated? What kind of documentation is available? How consistent are the data with data available from other sources?

In this section, we've talked about data as though it is always collected by scientists and professionals. But that's definitely not the case! Think more broadly about sources of data that are already out there in the world. Perhaps you want to examine the different topics mentioned in the past 10 State of the Union addresses by the President. Or maybe you want to examine whether the websites and public information about local health and mental health agencies use gender-inclusive language. People share their experiences through blogs, social media posts, videos, performances, among countless other sources of data. When you think broadly about data, you'll be surprised how much you can answer with available data.

Collecting your own raw data

The primary benefit of collecting your own data is that it allows you to collect and analyze the specific data you are looking for, rather than relying on what other people have shared. You can make sure the right questions are asked to the right people. Your early research projects may be smaller in scope. This isn't necessarily a limitation. Early projects are often the first step in a long research trajectory in which the same topic is studied in increasing detail and sophistication over time.

Student researchers often propose to survey or interview practitioners. The focus of these projects should be the practice of social work, and the study will uncover how practitioners understand what they do. Surveys of practitioners often test whether responses to questions are related to each other. For example, you could propose to examine whether someone's length of time in practice was related to the type of therapy they use or their level of burnout. Interviews or focus groups can also illuminate areas of practice. One student proposed to conduct focus groups of individuals in different helping professions in order to understand how they viewed the process of leaving an abusive partner. She suspected that people from different disciplines would make unique assumptions about the survivor's choices.

It's worth remembering here that you need to have access to practitioners, as we discussed in the previous section. Resourceful researchers will look at publicly available databases of practitioners, draw from agency and personal contacts, or post in public forums like Facebook groups. Consent from gatekeepers is important, and as we described earlier, you and your agency may be interested in collaborating on a project. Bringing your agency on board as a stakeholder in your project may allow you access to company email lists or time at staff meetings as well as access to practitioners. One student partnered with her internship placement at a local hospital to measure the burnout that nurses experienced in their department. Her project helped the agency identify which departments may need additional support.

Another possible way you could collect data is by partnering with your agency on evaluating an existing program. Perhaps they want you to evaluate the early stage of a program to see if it's going as planned and if any changes need to be made. Maybe there is an aspect of the program they haven't measured but would like to, and you can fill that gap for them. Collaborating with agency partners in this way can be a challenge, as you must negotiate roles, get stakeholder buy-in, and manage the conflicting time schedules of field work and research work. At the same time, it allows you to make your work immediately relevant to your specific practice and client population.

In summary, many early projects fall into one of the following categories. These aren't your only options! But they may be helpful in thinking about what research projects can look like.

  • Analyzing charts or program evaluations at an agency
  • Analyzing existing data from an agency, government body, or other public source
  • Analyzing popular media or cultural artifacts
  • Surveying or interviewing practitioners, administrators, or other less-vulnerable groups
  • Conducting a program evaluation in collaboration with an agency
Key Takeaways

  • All research projects require analyzing raw data.
  • Research projects often analyze available data from agencies, government, or public sources. Doing so allows researchers to avoid the process of recruiting people to participate in their study. This makes projects more feasible but limits what you can study to the data that are already available to you.
  • Think through the potential harm of discussing sensitive topics when surveying or interviewing clients and other vulnerable populations. Since many social work topics are sensitive, researchers often collect data from less-vulnerable populations such as practitioners and administrators.

Post-awareness check (Environment)

In what environment are you most comfortable collecting data (phone calls, face-to-face recruitment, etc.)? Consider a preferred method of data collection that aligns with both your personality and your target population.

  • Describe the difference between raw data and the results of research articles.
  • Consider browsing around the data repositories in Table 2.2.
  • Identify a common type of project (e.g., surveys of practitioners) and how conducting a similar project might help you answer your working question.
  • What kind of raw data might you collect yourself for your study?

2.3 Creating a data analysis plan

  • Define and construct a data analysis plan.
  • Define key quantitative data management terms—variable name, data dictionary, primary and secondary data, observations/cases.
  • Differentiate between univariate and bivariate quantitative analysis.
  • Explain when we might use quantitative bivariate analysis in social work research.
  • Identify how your qualitative research question, research aim, and type of data may influence your choice of analytic methods.
  • Outline the steps you will take in preparation for conducting qualitative data analysis.

After you have your raw data, whether this is secondary data or data you collected yourself, you will need to analyze it. While the specific steps to follow in quantitative or qualitative data analysis are beyond the scope of this chapter, we are going to address some basic concepts in this section to help you create a data analysis plan. A data analysis plan is an ordered outline that includes your research question, a description of the data you are going to use to answer it, and the exact step-by-step analyses that you plan to run to answer your research question. If you look back at Table 2.1, you will see that creating a data analysis plan is a part of the study design process. The data analysis plan flows from the research question, is integral to the study design, and should be well conceptualized prior to beginning data collection. In this section, we will walk through the basics of quantitative and qualitative data analysis to help you understand the fundamentals of creating a data analysis plan.

When considering what data you might want to collect as part of your project, there are two important considerations that can create dilemmas for researchers. You might only get one chance to interact with your participants, so you must think comprehensively in your planning phase about what information you need and collect as much relevant data as possible. At the same time, though, especially when collecting sensitive information, you need to consider how onerous the data collection is for participants and whether you really need them to share that information. Just because something is interesting to us doesn't mean it's related enough to our research question to chase it down. Work with your research team and/or faculty early in your project to talk through these issues before you get to this point. And if you're using secondary data, make sure you have access to all the information you need in that data before you use it.

Once you've collected your quantitative data, you need to make sure it is well-organized in a database in a way that's actually usable. "Database" can be kind of a scary word, but really, it can be as simple as an Excel spreadsheet or a data file in whatever program you're using to analyze your data. You may want to avoid Excel and use a formal database such as Microsoft Access or MySQL if you've got a large or complicated data set, but if your data set is smaller and you plan to keep your analyses simple, you can definitely get away with Excel. A typical data set is organized with variables as columns and observations/cases as rows. For example, let's say we did a survey on ice cream preferences and collected the following information in Table 2.3:

Table 2.3 Results of our ice cream survey
Name      Age   Gender   Hometown   Fav_Flav
Tom       54    0        1          Rocky Road
Jorge     18    2        0          French Vanilla
Melissa   22    1        0          Espresso
Amy       27    1        0          Black Cherry
  • Variable name: Just what it sounds like—the name of your variable. Make sure this is something useful, short and, if you're using something other than Excel, all one word. Most statistical programs will automatically rename variables for you if they aren't one word, but the names can be a little ridiculous and long.
  • Observations/cases: The rows in your data set. In social work, these are often your study participants (people), but can be anything from census tracts to black bears to trains. When we talk about sample size, we're talking about the number of observations/cases. In our mini data set, each person is an observation/case.
  • Data dictionary (sometimes called a code book or metadata): This is the document where you list your variable names, what the variables actually measure or represent, what each of the values of the variable mean if the meaning isn't obvious (i.e., if there are numbers assigned to gender), the level of measurement, and anything special to know about the variables (for instance, the source if you mashed two data sets together). If you're using secondary data, the researchers sharing the data should make the data dictionary available.

Let's take that mini data set we've got up above and we'll show you what your data dictionary might look like in Table 2.4.

Table 2.4 Sample data dictionary/code book
Variable | Description | Values | Level of measurement | Notes
Name | Participant's first name | open-ended response | Nominal | First name only. If a name appears more than once, a random number has been attached to the end of the name to distinguish.
Age | Participant's age at time of survey | integer, in years | Ratio | Self-reported
Gender | Participant's self-identified gender | 0=cisgender female; 1=cisgender male; 2=non-binary; 3=transgender female; 4=transgender male; 5=another gender | Nominal | Self-reported
Hometown | Participant's hometown | 0=This town; 1=Another town | Nominal | Self-reported
Fav_Flav | Participant's favorite ice cream | open-ended response | Nominal | Self-reported
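To illustrate, here is a minimal sketch of this structure in Python with the pandas library, using the survey from Table 2.3 and the value labels from the data dictionary in Table 2.4. The label columns added at the end are our own illustration, not part of the original survey.

    # Python sketch: the ice cream survey as a data set with variables as
    # columns and observations/cases as rows.
    import pandas as pd

    survey = pd.DataFrame({
        "Name": ["Tom", "Jorge", "Melissa", "Amy"],
        "Age": [54, 18, 22, 27],
        "Gender": [0, 2, 1, 1],
        "Hometown": [1, 0, 0, 0],
        "Fav_Flav": ["Rocky Road", "French Vanilla", "Espresso", "Black Cherry"],
    })

    # Value labels from the data dictionary (Table 2.4)
    gender_labels = {0: "cisgender female", 1: "cisgender male", 2: "non-binary",
                     3: "transgender female", 4: "transgender male",
                     5: "another gender"}
    hometown_labels = {0: "This town", 1: "Another town"}

    survey["Gender_label"] = survey["Gender"].map(gender_labels)
    survey["Hometown_label"] = survey["Hometown"].map(hometown_labels)
    print(survey)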

As part of planning for your research, you should come up with a data analysis plan. Remember, a data analysis plan is an ordered outline that includes your research question, a description of the data you are going to use to answer it, and the exact step-by-step analyses that you plan to run to answer your research question. A basic data analysis plan might look something like what you see in Table 2.5. Don't panic if you don't yet understand some of the statistical terms in the plan; we're going to delve into some of them in this section, and others will be covered in more depth in your statistics courses. Note here also that this is what operationalizing your variables and moving through your research with them looks like on a basic level. We will cover operationalization in more depth in Chapter 11.

Table 2.5 A basic data analysis plan
Research question: What is the relationship between a person's race and their likelihood to graduate from high school?

Data: Individual-level U.S. American Community Survey data for 2017, which includes race/ethnicity and other demographic data (i.e., educational attainment, family income, employment status, citizenship, presence of both parents, etc.). Only individuals for whom race and educational attainment data are available are included.

Analyses:
  • Univariate and descriptive statistics, including mean, median, mode, range, distribution of interval/ratio variables, and missing values.
  • Bivariate statistics between the independent, control, and dependent variables. For instance, a chi-square test between race and high school graduation (both nominal variables), ANOVA on income and race, and correlations between interval/ratio variables.
  • Multivariate statistics, like logistic regression, with high school graduation (yes/no) as my dependent variable, race as the independent variable, and multiple control variables I think are relevant based on my conceptual framework.
  • Interpretation of logistic regression results and reporting of results.

An important point to remember is that you should never get stuck on using a particular statistical method because you or one of your co-researchers thinks it's cool or it's the hot thing in your field right now. You should certainly go into your data analysis plan with ideas, but in the end, you need to let your research question guide what statistical tests you plan to use. Be prepared to be flexible if your plan doesn't pan out because the data is behaving in unexpected ways.

You'll notice that the first step in the quantitative data analysis plan is univariate and descriptive statistics. Univariate data analysis is a quantitative method in which a variable is examined individually to determine its distribution, or the way the scores are distributed across the levels, or values, of that variable. When we talk about levels, what we are talking about are the possible values of the variable—like a participant's age, income or gender. (Note that this is different from levels of measurement, which will be discussed in Chapter 11, but the level of measurement of your variables absolutely affects what kinds of analyses you can do with it.) Univariate analysis is non-relational, which just means that we're not looking into how our variables relate to each other. Instead, we're looking at variables in isolation to try to understand them better. For this reason, univariate analysis is used for descriptive research questions.

So when do you use univariate data analysis? Always! It should be the first thing you do with your quantitative data, whether you are planning to move on to more sophisticated statistical analyses or are conducting a study to describe a new phenomenon. You need to understand what the values of each variable look like—what if one of your variables has a lot of missing data because participants didn't answer that question on your survey? What if there isn't much variation in the gender of your sample? These are things you'll learn through univariate analysis.
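As a minimal sketch (continuing with the hypothetical survey DataFrame from the earlier example), univariate analysis in Python might look like this:

    # Python sketch: examine each variable in isolation (non-relational).
    print(survey["Age"].describe())              # mean, spread, min/max
    print(survey["Age"].median())                # median age
    print(survey["Gender_label"].value_counts()) # distribution of a nominal variable
    print(survey.isna().sum())                   # missing values per variable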

Did you know that ice cream causes shark attacks? It's true! When ice cream sales go up in the summer, so does the rate of shark attacks. So you'd better put down that ice cream cone, unless you want to make yourself look more delicious to a shark.


Ok, so it's quite obviously not true that ice cream causes shark attacks. But if you looked at these two variables and how they're related, you'd notice that during times of the year with high ice cream sales, there are also the most shark attacks. Despite the fact that the conclusion we drew about the relationship was wrong, it's nonetheless true that these two variables appear related, and researchers figured that out through the use of bivariate analysis. (You will learn about correlation versus causation in Chapter 8.)
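To see how a lurking third variable can make two unrelated variables appear associated, here is an illustrative simulation (entirely made-up numbers, for intuition only) in which summer temperature drives both ice cream sales and shark encounters:

    # Python sketch: a confounder (temperature) produces a spurious correlation.
    import numpy as np

    rng = np.random.default_rng(0)
    temperature = rng.uniform(5, 35, size=365)          # daily temperature, in C
    ice_cream_sales = 100 + 12 * temperature + rng.normal(0, 40, 365)
    shark_attacks = 0.05 * temperature + rng.normal(0, 0.4, 365)

    # Positive correlation, even though neither variable causes the other.
    print(np.corrcoef(ice_cream_sales, shark_attacks)[0, 1])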

Bivariate analysis consists of a group of statistical techniques that examine the association between two variables. We could look at how anti-depressant medications and appetite are related, whether there is a relation between having a pet and emotional well-being, or if a policy-maker's level of education is related to how they vote on bills related to environmental issues.

Bivariate analysis forms the foundation of multivariate analysis, which we don't get to in this book. All you really need to know here is that there are steps beyond bivariate analysis, which you've undoubtedly seen in scholarly literature already! But before we can move forward with multivariate analysis, we need to understand the associations between the variables in our study.
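For instance, the bivariate steps in Table 2.5 could be run in Python with scipy; the cross-tabulated counts below are hypothetical placeholders, not real ACS data:

    # Python sketch: two common bivariate tests.
    import numpy as np
    from scipy import stats

    # Chi-square test of independence between two nominal variables.
    # Rows: two racial groups; columns: did not graduate, graduated (made up).
    crosstab = np.array([[30, 70],
                         [45, 55]])
    chi2, p, dof, expected = stats.chi2_contingency(crosstab)
    print(f"chi-square = {chi2:.2f}, p = {p:.3f}")

    # Pearson correlation between two interval/ratio variables (made up).
    income = [21, 35, 48, 52, 60]
    years_school = [10, 12, 14, 16, 18]
    r, p = stats.pearsonr(income, years_school)
    print(f"r = {r:.2f}, p = {p:.3f}")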

Throughout your PhD program, you will learn more about quantitative data analysis techniques. Hopefully this section has provided you with some initial insights into how data are analyzed and the importance of creating a data analysis plan prior to collecting data. Strategies for creating a qualitative data analysis plan were discussed in the qualitative data management and analysis sections earlier in this chapter.


2.4 Critical considerations

  • Critique the traditional role of researchers and identify how action research addresses these issues

So far in this chapter, we have presented the steps of research projects as follows:

  • Find a topic that is important to you and read about it.
  • Pose a question that is important to the literature and to your community.
  • Propose to use specific research methods and data analysis techniques to answer your question.
  • Carry out your project and report the results.

These were depicted in more detail in Table 2.1 earlier in this chapter. There are important limitations to this approach. This section examines those problems and how to address them.

Whose knowledge is privileged?

First, let's critically examine your role as the researcher. Following along with the steps in a research project, you start studying the literature on your topic, find a place where you can add to scientific knowledge, and conduct your study. But why are you the person who gets to decide what is important? Just as clients are the experts on their lives, members of your target population are the experts on their lives. What does it mean for a group of people to be researched on, rather than researched with? How can we better respect the knowledge and self-determination of community members?


A different way of approaching your research project is to start by talking with members of the target population and those who are knowledgeable about that community. Perhaps there is a community-led organization you can partner with on a research project. The researcher's role in this case would be more similar to a consultant, someone with specialized knowledge about research who can help communities study problems they consider to be important. The social worker is a co-investigator, and community members are equal partners in the research project. Each has a type of knowledge—scientific expertise vs. lived experience—that should inform the research process.

The community focus highlights something important about these projects: they are localized. They can dedicate themselves to issues at a single agency or within a service area. With a local scope, researchers can bring about change in their community. This is the purpose behind action research.

Action research

Action research   is research that is conducted for the purpose of creating social change. When engaging in action research, scholars collaborate with community stakeholders to conduct research that will be relevant to the community. Social workers who engage in action research don't just go it alone; instead, they collaborate with the people who are affected by the research at each stage in the process. Stakeholders, particularly those with the least power, should be consulted on the purpose of the research project, research questions, design, and reporting of results.

Action research also distinguishes itself from other research in that its purpose is to create change on an individual and community level. Kristin Esterberg puts it quite eloquently when she says, “At heart, all action researchers are concerned that research not simply contribute to knowledge but also lead to positive changes in people’s lives” (2002, p. 137). [2] Action research has multiple origins across the globe, including Kurt Lewin’s psychological experiments in the US and Paulo Freire’s literacy and education programs (Adelman, 1993; Reason, 1994). [3] Over the years, action research has become increasingly popular among scholars who wish for their work to have tangible outcomes that benefit the groups they study.

A traditional scientist might look at the literature or use their practice wisdom to formulate a question for quantitative or qualitative research, as we suggested earlier in this chapter. An action researcher, on the other hand, would consult with people in the target population and community to see what they believe the most pressing issues are and what their proposed solutions may be. In this way, action research flips traditional research on its head. Scientists are not the experts on the research topic. Instead, they are more like consultants who provide the tools and resources necessary for a target population to achieve their goals and to address social problems using social science research.

According to Healy (2001), [4] the assumptions of participatory-action research are that (a) oppression is caused by macro-level structures such as patriarchy and capitalism; (b) research should expose and confront the powerful; (c) researcher and participant relationships should be equal, with equitable distribution of research tasks and roles; and (d) research should result in consciousness-raising and collective action. Consistent with social work values, action research supports the self-determination of oppressed groups and privileges their voice and understanding through the conceptualization, design, data collection, data analysis, and dissemination processes of research. We will return to similar ideas in Part 4 of the textbook when we discuss qualitative research methods, though action research can certainly be used with quantitative research methods, as well.

  • Traditionally, researchers did not consult target populations and communities prior to formulating a research question. Action research proposes a more community-engaged model in which researchers are consultants that help communities research topics of import to them.

Post-awareness check (Knowledge)

Based on what you know of your target population, what are a few ways to receive their “buy-in” to participate in your proposed research study?

  • Apply the key concepts of action research to your project. How might you incorporate the perspectives and expertise of community members in your project?

Level of measurement: The level that describes how data for variables are recorded. The level of measurement defines the types of operations that can be conducted with your data. There are four levels: nominal, ordinal, interval, and ratio.

Non-relational analysis: Data analysis that doesn't examine how variables relate to each other.

Relational analysis: A group of statistical techniques that examines the relationship between two variables.

Linear research process: A research process where you create a plan, gather your data, and analyze your data, with each step completed before you proceed to the next.

Iterative research process: An approach in which, after planning and once we begin collecting data, we also begin analyzing data as it comes in. This early analysis of our (incomplete) data then impacts our planning, ongoing data gathering, and future analysis as the project progresses.

Coding: The part of the qualitative data analysis process where we begin to interpret and assign meaning to the data.

Codebook: A document that we use to keep track of and define the codes that we have identified (or are using) in our qualitative data analysis.

Doctoral Research Methods in Social Work Copyright © by Mavs Open Press. All Rights Reserved.



Data Analysis Plan


With a data analysis plan, you will know what to do when it is time to analyze the data you have gathered. It is one of the most essential things to have, because it guides how you go about data collection appropriately. You will want to make sure you create an effective plan: one through which the information you gather answers the questions you want answered. Having a good plan saves time. It is also a very good idea to build your plan around data that make sense for your study; otherwise, you may end up disappointed and feel that the work was wasted.

10+ Data Analysis Plan Examples

1. Data Analysis Plan Template
2. Survey Data Analysis Plan Template
3. Qualitative Data Analysis Plan Template
4. Scientific Data Analysis Plan
5. Standard Data Analysis Plan
6. Formative Data Analysis Plan
7. Observational Study Data Analysis Plan
8. Data Analysis Plan and Products
9. Summary of Data Analysis Plan
10. Professional Data Analysis Plan
11. National Data Analysis Plan

Data Analysis Plan Definition

A data analysis plan is a roadmap that tells you how to properly analyze and organize a particular set of data. It starts with three main objectives. First, you have to answer your research questions. Second, you should use more specific questions so that the data can easily be understood. Third, you should segment respondents so you can compare the opinions of different groups.

Data Analysis Methods

Data analysts usually work with a specific method suited to their data, whether qualitative or quantitative. Below are some of the methods used.

1. Regression Analysis

Regression analysis is commonly used to determine the relationship between variables: you look at how a dependent variable correlates with one or more independent variables. It aims to estimate how an independent variable may impact the dependent variable, which is essential when you are going to make predictions and forecasts.
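
To make this concrete, here is a minimal sketch using Python and scikit-learn; the figures are invented for illustration and are not from any real dataset.

```python
# Hypothetical sketch of simple linear regression with scikit-learn.
# The numbers are made up purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # independent variable
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])            # dependent variable

model = LinearRegression().fit(X, y)
print(f"slope = {model.coef_[0]:.2f}, intercept = {model.intercept_:.2f}")
print(f"forecast at X = 6: {model.predict([[6.0]])[0]:.2f}")
```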

2. Monte Carlo Simulation

Expect different outcomes whenever you make a decision. As individuals, we tend to weigh the pros and cons, but that alone does not tell us which path to take; we also have to account for all the potential risks. In a Monte Carlo simulation, you generate many potential outcomes by repeatedly sampling uncertain inputs. It is usually used in risk analysis, where it allows you to produce a better forecast of what might happen in the future.
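
A minimal sketch of the idea in Python, assuming two uncertain cost inputs; all distributions and figures here are hypothetical.

```python
# Hypothetical Monte Carlo sketch: estimating the distribution of a total
# project cost when two inputs are uncertain.
import numpy as np

rng = np.random.default_rng(seed=42)
n = 100_000
labor = rng.normal(loc=50_000, scale=8_000, size=n)       # uncertain labor cost
materials = rng.uniform(low=20_000, high=35_000, size=n)  # uncertain materials cost
total = labor + materials

print(f"mean cost: {total.mean():,.0f}")
print(f"95th percentile (risk figure): {np.percentile(total, 95):,.0f}")
```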

3. Factor Analysis

Factor analysis is a technique used to reduce a large number of observed variables to a smaller number of underlying factors. It works whenever multiple observable variables correlate with each other. It has proven useful for uncovering hidden patterns, allowing you to explore concepts that are not easy to measure directly.
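
A minimal sketch with scikit-learn's FactorAnalysis on synthetic data, just to show the mechanics of recovering a small number of factors from correlated variables.

```python
# Synthetic sketch: six observed variables driven by two hidden factors.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))                  # two hidden factors
loadings = rng.normal(size=(2, 6))                  # how factors drive items
observed = latent @ loadings + 0.1 * rng.normal(size=(200, 6))

fa = FactorAnalysis(n_components=2).fit(observed)
print(fa.components_.round(2))                      # estimated loadings
```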

4. Cohort Analysis

A cohort analysis divides your users into small groups that share a characteristic, such as when they signed up, and monitors these groups over time. Examining their behavior can lead you to identify patterns across a customer's lifecycle. This is especially useful for businesses, because it gives them an avenue to tailor their service to specific cohorts.
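
A minimal pandas sketch of the underlying bookkeeping: group users by signup month and count how many are active in each later month. The user records are made up.

```python
# Hypothetical cohort table: rows are signup cohorts, columns are the
# months in which members of each cohort were active.
import pandas as pd

events = pd.DataFrame({
    "user": ["a", "a", "b", "b", "c"],
    "signup_month": ["2024-01", "2024-01", "2024-02", "2024-02", "2024-02"],
    "active_month": ["2024-01", "2024-02", "2024-02", "2024-03", "2024-02"],
})

cohorts = (events.groupby(["signup_month", "active_month"])["user"]
                 .nunique()
                 .unstack(fill_value=0))
print(cohorts)
```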

5. Cluster Analysis

This type of method identifies structures within a set of data. Its aim is to sort data into groups, or clusters, such that items within a cluster are similar to each other and dissimilar to items in other clusters. This helps you gain insight into how your data are distributed.
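
A minimal k-means sketch with scikit-learn on two synthetic blobs of points; k-means is just one common clustering algorithm among many.

```python
# Synthetic sketch: sort points into two clusters that are similar within
# a cluster and dissimilar across clusters.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 0.5, (50, 2)),    # one blob near (0, 0)
                  rng.normal(5, 0.5, (50, 2))])   # another blob near (5, 5)

km = KMeans(n_clusters=2, n_init=10, random_state=1).fit(data)
print("cluster centers:\n", km.cluster_centers_.round(2))
```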

6. Time Series Analysis

This is a statistical method used to identify trends. It measures the same variable at regular intervals to forecast how that variable will fluctuate in the future. There are three main patterns to look for when conducting time series analysis: trend, seasonality, and cyclic patterns.
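
A minimal sketch of trend extraction with a centered rolling mean in pandas; real analyses often go further (for example, seasonal decomposition), and the monthly figures here are invented.

```python
# Hypothetical monthly series: a 3-month rolling mean smooths the data so
# the underlying trend is easier to see.
import pandas as pd

sales = pd.Series(
    [10, 12, 14, 13, 16, 18, 17, 20, 22, 21, 24, 26],
    index=pd.date_range("2024-01-01", periods=12, freq="MS"),
)
trend = sales.rolling(window=3, center=True).mean()
print(pd.DataFrame({"sales": sales, "3-month trend": trend}))
```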

7. Sentiment Analysis

There are insights you can learn from what other people write about you. Sentiment analysis lets you sort and understand text data; its goal is to interpret the emotions being conveyed. This can tell you how other people feel about your brand or service.
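
A toy, lexicon-based sketch of the idea; production systems use trained models (see the NLP question below), and the word lists here are made up for illustration only.

```python
# Toy sentiment scorer: count positive and negative words from small,
# invented lexicons. Real systems use trained models or full lexicons.
POSITIVE = {"great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "hate", "poor", "slow"}

def sentiment(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this brand, the service is excellent"))  # positive
print(sentiment("Delivery was slow and support was poor"))       # negative
```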

What do you mean by aspect-based sentiment analysis?

An aspect-based sentiment analysis determines the type of emotion a customer expresses about a specific feature of a product or campaign.

What is NLP?

NLP stands for Natural Language Processing. It is helpful in sentiment analysis because it uses systems that are trained to associate inputs with outputs.

Why is identifying demographic groupings important?

This helps you understand the significance of your data and figure out what steps you need to perform to improve.

There are many methods that can be used in a data analysis plan, but a good start is to familiarize yourself with the kind of data you have, and likewise with the insights that would be useful to the analysis. Having a good data plan can save your entire research project: think logically to avoid errors before they can happen. One more thing: keep careful track of your data collection records. The moment you lose sight of your variables and your data, your plan becomes useless.


  • Open access
  • Published: 09 September 2024

Research status and frontiers of renal denervation for hypertension: a bibliometric analysis from 2004 to 2023

  • Jiaran Li,
  • Xiaohan Zhang,
  • Yuchen Jiang,
  • Huan Wang,
  • Xiongyi Gao,
  • Yuanhui Hu

Journal of Health, Population and Nutrition, volume 43, Article number: 142 (2024)


Renal Denervation (RDN) is a novel non-pharmacological technique to treat hypertension. The technique lowers blood pressure by blocking the sympathetic nerve fibers around the renal artery, thereby decreasing systemic sympathetic nerve excitability. This study aimed to visualize and analyze research hotspots and development trends in the field of RDN for hypertension through bibliometric analysis.

In total, 1479 studies were retrieved on the Web of Science Core Collection (WoSCC) database from 2004 to 2023. Using CiteSpace (6.2.R4) and VOSviewer (1.6.18), visualization maps were generated by relevant literature in the field of RDN for hypertension to demonstrate the research status and frontiers.

The number of publications was found to be generally increasing. European countries and the United States were the first to carry out research on different RDN techniques and related clinical trials. The efficacy and safety of RDN have been repeatedly verified and have gained increasing attention. The research involves multiple disciplines, including the cardiovascular system, peripheral vascular disease, and physiological pathology, among others. Research hotspots focus on elucidating the mechanism of RDN in the treatment of hypertension and the advantages of RDN in device-based therapy. Additionally, the research frontiers include improvement of RDN instruments and techniques, as well as exploration of the therapeutic effects of RDN in diseases with increased sympathetic nerve activity.

The research hotspots and frontiers reflect the status and development trend of RDN in hypertension. In the future, it is necessary to strengthen international collaboration and cooperation, conduct long-term clinical studies with a large sample size, and continuously improve RDN technology and devices. These measures will provide new options for more patients with hypertension, thereby improving their quality of life.

Introduction

Hypertension is an extremely common chronic condition and public health concern, affecting roughly 25% of the population [1]. Poor blood pressure control may occur due to irregular adherence to pharmacological treatments and the lifestyle modifications they require. This leads to an increased risk of adverse cardiovascular events, longer hospital stays, and increased cost of treatment [2, 3]. The development of catheter-based renal denervation (RDN) is expected to address this limitation [4, 5].

A strong correlation between blood pressure levels and the excessive activity of the sympathetic nervous system has been observed [ 6 , 7 ]. Activation of efferent renal sympathetic nerves results in renal arteriolar vasoconstriction, reducing renal blood flow and the release of renin [ 8 ]. Additionally, this process results in water-sodium retention due to activation of the renin-angiotensin-aldosterone system (RAAS), thereby increasing plasma volume and blood pressure [ 9 ]. RDN employs various techniques, including the use of radiofrequency energy, freezing energy, chemical denervation, or ultrasound-guided approaches. This technique aims to disrupt and interrupt the activity of the renal sympathetic nervous system and ultimately inhibit the sympathetic system throughout the body, leading to a reduction in arterial blood pressure [ 10 ]. According to the European Society of Hypertension (ESH) guidelines [ 11 ], RDN is a class II (level of evidence B) recommendation, and has become a therapeutic option for patients with uncontrolled or resistant hypertension.

Bibliometric analysis first emerged in the early 20th century and has become a widely used analytical technique [12, 13]. This method can evaluate the productivity of countries, institutions, and authors to track the development trends and hotspots in a specific research field [14]. The visualization software packages CiteSpace (6.2.R4) and VOSviewer (1.6.18) are widely used in the field of bibliometrics. In this paper, CiteSpace was mainly used to create visual graphs of literature keywords, and VOSviewer was used to visually analyze basic information such as countries and authors. Hence, this study aimed to conduct a comprehensive summary and analysis of the field of RDN for the treatment of hypertension.

Materials and methods

Data retrieval and collection.

Research data were obtained from the Web of Science Core Collection (WoSCC) database, which covers comprehensive research fields [ 15 ]. The retrieval strategy was (TS = (Hypertension OR “high blood pressure” OR hypertonic OR HPN OR HBP OR hyperpiesis OR hyperpiesia) AND TS= (“Renal Denervation”)) AND LA=(English). The retrieval time was set from January 1, 2004, to August 1, 2023, and 2288 works of literature were initially retrieved. Of these, 383 meeting abstracts, 244 editorial materials, 128 letters, 19 proceeding papers, 11 corrections, 10 early accesses, 9 book chapters, and 5 news items were excluded. Thus, a total of 1479 papers (433 review papers and 1046 research papers) were exported in the form of plain text files and saved in “download_***.txt” format within 1 day (August 9, 2023).
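
As an illustration of this screening step (not code from the paper), the same document-type exclusions could be applied with pandas once the export is parsed into a table; the column names below are assumptions, since real WoSCC exports are plain-text files that must be parsed first.

```python
# Hypothetical sketch: filtering bibliographic records by document type,
# mirroring the exclusions described above. Column names are illustrative.
import pandas as pd

records = pd.DataFrame({
    "title": ["Paper A", "Abstract B", "Letter C", "Review D"],
    "doc_type": ["Article", "Meeting Abstract", "Letter", "Review"],
})

# Keep only research and review papers, as in the study's screening step.
keep = records[records["doc_type"].isin(["Article", "Review"])]
print(f"{len(keep)} records retained for analysis")
```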

Data analysis

We conducted a visualization analysis of countries, authors, institutions, and journals through the VOSviewer software. A lower distance between the nodes in the visualization map corresponded to a closer similarity between the two themes [ 16 ]. In addition, to observe cooperation between countries more clearly, country co-occurrence maps formed by VOSviewer were first exported in GML format. Following this, Scimago Graphica software was used for the subsequent operations, resulting in the generation of a geographical distribution map of country cooperation.

CiteSpace software performs visualization analysis of keywords to demonstrate the development process and trends in the research field. In the maps, the size of a node is related to the frequency of the keyword; the purple and red rings in the outer area of a node represent the centrality and burstiness of the keyword, respectively [17]. The centrality of a keyword reflects its important role in the field, while its burstiness indicates research hotspots.

The 1479 papers used in this study were published across 351 journals between 2004 and 2023, were written by 5359 authors from 1890 organizations in 69 countries, and were cited by 29,398 documents across 3721 journals.

Temporal distribution map of publications

Figure 1 shows the three phases in the development of this research field. The first phase, from 2004 to 2012, was characterized by an annual output of no more than 100 articles, indicating that the field of RDN for hypertension was in its infancy and did not receive much attention. The number of papers published increased from 2012 onwards and reached a peak in 2016, reflecting that RDN entered a rapid development stage between 2012 and 2017.

This was found to be related to the neutral results of the SYMPLICITY HTN-3 trial in 2014 on one hand, and the increased number of patients with hypertension on the other [18]. Research on RDN for hypertension then reached a maturation stage in 2017. According to a citation analysis of the 1479 articles, 29,398 citations were involved between 2004 and 2023, with an average of 29.55 citations per paper (h-index = 92).
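
For reference, the h-index reported here is the largest h such that h papers each have at least h citations; a minimal sketch of the computation, with made-up citation counts:

```python
# Compute an h-index from per-paper citation counts (counts are invented).
def h_index(citations: list[int]) -> int:
    ranked = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(ranked, start=1):
        if c >= i:
            h = i       # at least i papers have >= i citations
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4: four papers have >= 4 citations each
```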

Figure 1. The trend of publications from 2004 to 2023

Distribution of co-authorship: countries/regions, institutions, and scholar authors

Two countries (the USA and Germany) published more than half of all the literature, demonstrating that RDN was investigated most thoroughly in these countries (Table 1; Fig. 2A and C). The locations of and collaborations between countries in this field are shown in Fig. 2B. America ranked first in total citations and h-index, with 559 and 75 respectively. Figure 2D indicates that the USA and Germany were early starters in this field; although research in Asian countries began relatively later, they have made important contributions. In 2020, the Asia Renal Denervation Consortium fully affirmed the role of RDN and actively promoted this treatment as an initial choice for hypertension [19].

Figure 2. Distribution map of countries/regions. (A) Density visualization of countries. (B) Collaboration of all countries by Scimago. (C) Occurrence of contributing countries. (D) Time overlay of main countries

As shown in Table 2, the five scholar authors and institutions with the largest number of published research papers are listed. Figure 3A reflects work from Monash University, including a follow-up survey of patients with or without complications after the RDN procedure, confirming the safety and efficacy of RDN in clinical applications [20]. In addition, this university has been involved in many experimental studies on RDN [21]. Figure 3B shows inter-institution partnerships, with the size of the nodes representing the number of publications and the lines representing linkages, dividing the 25 institutions that met the threshold into 2 clusters. As the most representative author, Mahfoud F was involved in the SYMPLICITY HTN-2 randomized clinical trial and worked on the study of RDN for other cardiovascular diseases [22, 23].

Figure 3. Distribution map of institutions and scholar authors. (A) The occurrence of contributing institutions. (B) The occurrence of contributing scholar authors

Distribution of disciplines and journals

Figure 4A shows the top 15 subject categories in the field of RDN for hypertension. As shown in Table 3 and Fig. 4B, the journal with the largest number of publications was Hypertension (105 papers), which publishes research on the regulation, clinical treatment, and prevention of hypertension. Hypertension also had the largest number of citations (5936), followed by the European Heart Journal (2606) and the Journal of the American College of Cardiology (2498). As the journal with the maximum number of publications and citations, Hypertension is authoritative and scientific in the field of RDN for the treatment of hypertension.

Figure 4. Distribution map of subject categories and journals. (A) Top 15 subject categories in the field of RDN for hypertension. (B) Occurrence of contributing journals

Distribution of highly cited literature and co-cited references

The analysis of highly cited literature.

A greater number of citations was understood to indicate greater academic value. As of August 2023, 24 highly cited studies were retrieved, with a total of 7625 citations. Table 4 lists the top 10 most highly cited studies. The most cited study is “A Controlled Trial of Renal Denervation for Resistant Hypertension”, published in the New England Journal of Medicine in 2014 by Bhatt et al. This was a prospective, single-blind, randomized, sham-controlled trial (the SYMPLICITY HTN-3 clinical trial), which showed that RDN for the treatment of resistant hypertension failed to achieve the expected results. This publication caused a great deal of controversy at the time, but also spurred the development of more randomized controlled trials [18].

Only one review, titled “The Autonomic Nervous System and Hypertension”, describes in detail the effect of the mechanisms of adrenergic and vagal abnormalities on the characteristic structure of hypertension, leading to essential hypertension and organ damage [24]. In this study, Mancia and Grassi noted that the activation of adrenergic nerves is an unstable process and can be overdriven by the progression of hypertension [25, 26]. Moreover, previous studies have confirmed that, in the adrenergic system, sympathetic nervous system hyperactivity may be the determinant of blood pressure variability [27, 28]. This is generally considered an independent risk factor for cardiovascular diseases such as heart failure and severe arrhythmias. In addition to traditional medical therapy, invasive approaches, such as continuous carotid baroreceptor stimulation and renal denervation, have been proven to effectively reduce blood pressure levels in patients with resistant hypertension [29, 30], supplementing the mechanism of RDN for hypertension.

The analysis of co-cited references

According to the citation analysis of the publications listed in Fig. 5A, the top three co-cited references revealed the effectiveness and safety of RDN for treating resistant hypertension through randomized controlled trials. After 6 months of assessing patients' blood pressure levels, the efficacy of RDN in the treatment of resistant hypertension was confirmed [31, 32]. These studies provide an important clinical rationale for this research field. Using VOSviewer software to visualize and analyze the co-cited references, 25 references reaching the threshold were included and a co-citation map was generated. Figure 5B shows the division of the 25 references into two clusters based on the intensity of collaboration. Most of these were clinical research studies in the field of the cardiovascular system, with a few in the field of physiology.

Figure 5. Distribution map of co-cited references. (A) Top 10 local cited references in the field of RDN for hypertension. (B) Visualization of co-cited references

Distribution of keywords

Analysis of keyword co-occurrence.

Keywords represent the core of an article, and their visual analysis plays an important role in exploring the frontiers and development directions of a field. Figure 6A shows the eight clusters formed by running a k-means clustering algorithm. Figure 6B shows the co-occurrence map of the 150 keywords occurring more than 15 times, created using VOSviewer software. Additionally, the top 20 keywords by number of occurrences are listed in Table 5. Apart from renal denervation and hypertension, high-frequency keywords included resistant hypertension, blood pressure, sympathetic nervous system, and trial, which are terms frequently used in the cardiovascular field (Fig. 6C and D).

According to Fig.  6 B, keywords were mainly concentrated in the fields of cardiovascular system and physiology, and three clusters were formed. The red cluster consisted of 64 nodes, including hypertension, renin-angiotensin system, oxidative stress, resistant hypertension, and nervous system, and was focused on the mechanism of hypertension. For example, development of neurogenic hypertension is closely related to elevated angiotensin-II, inflammation, and vascular dysfunction [ 33 ].

The green cluster mainly consisted of renal denervation, controlled trial, prevalence, management, and blood-pressure reduction, and concentrated on the analysis of clinical trials of hypertension and the therapeutic effects of RDN on hypertension. An animal study illustrated the therapeutic effect of RDN by reducing factors associated with hypertension, such as ang-II or actin-binding protein [34].

The blue cluster, consisting of atrial fibrillation and pulmonary vein isolation, reflected that RDN can treat other diseases in addition to hypertension [35]. A meta-analysis conducted by Ukena indicated the effectiveness of RDN as an adjunctive treatment for patients with atrial fibrillation [36].
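
The raw computation behind co-occurrence maps like Fig. 6B is simple pair counting across papers; here is a minimal sketch (not the VOSviewer implementation) with hypothetical keyword lists.

```python
# Count how often pairs of keywords appear together in the same paper.
from itertools import combinations
from collections import Counter

papers = [
    ["renal denervation", "hypertension", "blood pressure"],
    ["renal denervation", "atrial fibrillation"],
    ["hypertension", "blood pressure", "sympathetic nervous system"],
]

pairs = Counter()
for keywords in papers:
    for a, b in combinations(sorted(set(keywords)), 2):
        pairs[(a, b)] += 1

for (a, b), n in pairs.most_common(3):
    print(f"{a} -- {b}: {n}")
```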

Figure 6. Visualization of keywords. (A) Cluster map of keywords. (B) Occurrence map of keywords. (C) Word cloud map of the top 50 keywords. (D) Density visualization of keywords

Analysis of keywords timeline map

The timeline of keyword analysis explored the development and evolution of this research field. The time slice was set to 3 years, the g-index was set to 5, and the threshold of keyword occurrences was set to 14; these settings formed the timeline map of keywords in this research field, generated using CiteSpace software (Fig. 7A). The size of a node represented the frequency of keyword occurrence, with its position related to the year. The purple and red of the outer ring of a node indicated the centrality and the strength of the keyword respectively, and the connecting lines displayed co-occurrence relationships between keywords.

Figure  7 A displays that research themes in the field of RDN for hypertension can be divided into 3 phases from 2004 to 2023. Firstly, from 2004 to 2010, research themes mainly involved theoretical exploration of RDN and research on the pathogenesis of hypertension [ 37 ], and included stimulation [ 38 , 39 ], fetal origins [ 40 ] and receptors [ 41 ].

From 2010 to 2016, keywords were heavily concentrated. During this period, the range of research was further expanded, and the number of themes increased rapidly. It can therefore be concluded that this phase focused on clinical trial research on RDN for hypertension. High-frequency keywords reflecting this stage included controlled trial [42, 43, 44], double-blind [45, 46], meta-analysis [47], and SYMPLICITY HTN-3 [48].

After 2016, the development of RDN entered a phase of relative maturity, with further evidence of its effectiveness in treating hypertension. This stage therefore mainly involved exploring the broader application of RDN in the cardiovascular field, while continuing to improve treatments for hypertension. This phase centered on pulmonary vein isolation [36], baroreflex activation therapy [33], and the rostral ventrolateral medulla [49]. Research around these themes is predicted to continue in the near future.

Analysis of keywords with high citation bursts

Figure 7B represents the commencement and termination times as well as the intensity of keyword bursts, where the blue color denotes the temporal placement of the retrieved keywords, and the red line signifies the peak intensity of keyword bursts. The examination of keyword bursts aids in investigating the frontiers of the research field.

In Fig. 7B, uncontrolled hypertension had the highest burst intensity of 22.3, lasting from 2019 to the present. The next strongest keywords were neural control and United States, both of which have already passed their period of highest burst intensity. The early appearance of sympathetic nervous system and arterial pressure confirms that the mechanism of hypertension was a research hotspot in the early days of this field. In addition to uncontrolled hypertension, keywords currently in a burst period include safety, hypertension, and cardiovascular disease, which represent the current frontiers of this research field.

Figure 7. (A) Timeline map of keywords from 2004 to 2023. (B) Top 15 keywords with the strongest citation bursts

General information

In this study, bibliometric methods were applied to examine the development and application of RDN in hypertension. Since the initial application of this technique in 2009 [50], RDN has undergone numerous clinical trials and has been validated as an effective modality to treat hypertension, in addition to pharmacologic therapy and lifestyle interventions. The technique encompasses three primary categories: radiofrequency ablation, ultrasound ablation, and chemical ablation. Radiofrequency ablation and ultrasound ablation were approved for hypertension treatment by the U.S. Food and Drug Administration (FDA) in November 2023, which represents a significant development in the field. Based on the bibliometric analysis, representative clinical studies are as follows:

Currently, radiofrequency ablation is the most advanced technique used in RDN. Large-scale prospective studies and clinical trials of radiofrequency-based RDN (rRDN) have been conducted in the United States, Europe, and Asia; among these, the most prominent are the SYMPLICITY trials. Initial clinical studies provided evidence in support of the efficacy of rRDN in lowering blood pressure. However, the SYMPLICITY HTN-3 clinical trial, which followed 535 patients from the United States for 6 months, did not confirm the efficacy of this technique in the treatment of resistant hypertension [18]. A 3-year follow-up, though, revealed a 24-hour ambulatory systolic blood pressure (SBP) change of −15.6 mmHg in the rRDN group, a significantly superior reduction compared to the sham control group (−0.3 mmHg) [51]. This trial corroborates the long-term efficacy of rRDN in patients with resistant hypertension, providing supplementary evidence for SYMPLICITY HTN-3. A total of 1,742 patients with uncontrolled hypertension were enrolled in the Global SYMPLICITY Registry at 196 active centers in 45 countries following the rRDN procedure [52]. The data from the three-year follow-up period indicated that the reductions in office blood pressure (−16.5 mmHg) and 24-hour ambulatory SBP (−8.0 mmHg) were sustained. Furthermore, the occurrence of cardiovascular mortality and major adverse events associated with rRDN was markedly reduced at the three-year follow-up. This study demonstrated the durability and safety of the procedure.

The sham-controlled RADIANCE-HTN trials, conducted to investigate the efficacy of endovascular ultrasound RDN (uRDN) in lowering blood pressure, included two cohorts: the SOLO cohort [53] and the TRIO cohort [54]. In the RADIANCE-HTN SOLO study, 146 patients with mild hypertension who had discontinued their antihypertensive medications were randomly assigned to the uRDN group or the sham-control group. In the RADIANCE-HTN TRIO trial, 136 patients with severe hypertension resistant to three antihypertensive agents were similarly randomized. The results of the two-month follow-up indicated that the uRDN groups had more pronounced reductions in 24-hour ambulatory SBP (SOLO cohort: −8.5 mmHg, TRIO cohort: −8.0 mmHg) compared to the sham-procedure groups (SOLO cohort: −2.2 mmHg, TRIO cohort: −3.0 mmHg). These findings substantiated the efficacy and safety of uRDN in the treatment of hypertension.

Previous clinical studies [55] have demonstrated the feasibility of alcohol-mediated RDN. However, these studies were limited by small sample sizes and a lack of sufficient clinical evidence. In the TARGET-BP OFF-MED study [56], 106 patients with uncontrolled hypertension were recruited, and no significant difference was observed in 24-hour ambulatory SBP reductions (−1.5 mmHg versus −4.6 mmHg). This trial confirmed the safety of alcohol-mediated RDN; however, its efficacy had yet to be demonstrated. More recently, the TARGET BP I randomized clinical trial was conducted [57], enrolling 301 patients with uncontrolled hypertension. These patients, prescribed 2–5 antihypertensive drugs, were randomly assigned to RDN or a sham control. The results demonstrated that after a three-month follow-up, a statistically significant difference in mean 24-hour ambulatory SBP was observed between the RDN group and the sham-control group (−10 ± 14.2 mmHg versus −6.8 ± 12.1 mmHg, P = 0.0487). Additionally, the occurrence of adverse events during the six-month follow-up period was minimal; this substantiates the intermediate-term safety and efficacy of alcohol-mediated RDN in patients with uncontrolled hypertension.

Hotspots and frontiers

The analysis of co-citation relationships between the literature and cited references can assist in developing a knowledge framework and identifying research hotspots. Keywords, which represent the essence of an article, can be used to ascertain the hotspot directions of a research field. In summary, the research hotspots identified through the analysis of co-citation relationships and the clustering and timeline analysis of keywords are as follows: the mechanism of RDN for the treatment of hypertension, and the advantages of RDN in device-based therapy.

Mechanism of RDN for hypertension

Research on the mechanism of hypertension is mostly concentrated on over-activation of the RAAS, sympathetic dysfunction, and the release of inflammatory factors [58, 59, 60]. The kidney is a pivotal organ for regulating blood pressure: enhanced renal efferent sympathetic nerve activity promotes activation of the beta1-adrenergic receptors of juxtaglomerular cells, which in turn influences renin secretion, glomerular filtration rate, and renal tubular sodium reabsorption [61]. The renal afferent nerve provides continuous feedback to the central autonomic nuclei and regulates the central sympathetic nervous system [62]. Therefore, the interaction between renal efferent and afferent sympathetic nerves constitutes renal sympathetic nerve activity (RSNA), which plays a pivotal role in the physiopathology of hypertension [63].

The distribution of renal efferent nerve fibers is particularly dense in the vicinity of the renal arteries and veins, and some branches are distributed around the arterial vascular segments outside the renal cortex and medulla [62]. The RDN technique reduces blood pressure by removing efferent and afferent fibers of the renal sympathetic nervous system, increasing water and sodium excretion, and decreasing RSNA and systemic sympathetic nervous system activity. In other words, the antihypertensive mechanism of RDN is comparable to a combination of antihypertensive agents, including beta-blockers, calcium channel blockers (CCBs), angiotensin receptor blockers (ARBs), angiotensin-converting enzyme inhibitors (ACEIs), and thiazide-like diuretics. Hence, RDN is a particularly suitable treatment option for patients with resistant hypertension.

Advantages of RDN in device-based therapy

At present, in addition to the RDN technology, several devices have been proposed to treat resistant hypertension. These include a central iliac arteriovenous coupler, electrical baroreflex activation therapy, and others.

The prompt decline in blood pressure following creation of the anastomosis may be attributed to the formation of a low-resistance, high-compliance venous segment in the central arterial tree [7, 64, 65]. In a randomized controlled trial (the ROX CONTROL HTN study) [66], 44 patients were randomly assigned to the arteriovenous coupler group and 39 to the medical therapy group. The six-month postoperative follow-up indicated that blood pressure in the anastomosis group (office blood pressure: −23.2 mmHg systolic and −17.7 mmHg diastolic; 24-hour ambulatory blood pressure: −13.0 mmHg systolic and −13.0 mmHg diastolic) exhibited a notable decrease compared with the control group (−1.5 [16.7] mmHg systolic and −1.1 [10.5] mmHg diastolic). While this study demonstrates the efficacy of the central iliac arteriovenous coupler for treating resistant hypertension, it is important to note that 25 adverse events occurred among the patients who underwent this surgical procedure; these included urinary retention, transient pain, anemia, and others. Twelve patients developed significant unilateral lower extremity edema and were subsequently diagnosed with iliac vein stenosis proximal to the anastomosis. To date, no reports or clinical trials have been published regarding the long-term safety of central iliac arteriovenous coupler treatment; hence, this needs to be investigated in the future.

Electrical baroreflex activation therapy (BAT) stimulates the baroreceptors in the carotid sinus, inhibiting the activity of sympathetic nerves and ultimately reducing blood pressure [ 67 ]. In a randomized, double-blind, placebo-controlled study [ 68 ], 256 patients with resistant hypertension were randomly assigned to one of two groups. Group A was administered immediate BAT, while Group B underwent BAT deferred for six months. While the second-generation BAT technique is a significant advance over existing approaches, further randomized treatment trials must be conducted to prove the durability and safety of this technique for patients with resistant hypertension. Other technologies are not yet sufficiently mature for widespread adoption, and further studies and clinical trials are needed to improve their theranostic applications. In contrast, RDN has been verified and analyzed in numerous randomized clinical trials; hence, it offers a more objective theoretical basis, and plays an invaluable role in the treatment of resistant hypertension.

By analyzing the keywords with high citation bursts, the research frontiers of RDN can be summarized as follows:

Improvement of RDN instruments and techniques

Although the RDN technique has become relatively established and has been confirmed by multiple studies, some uncertainty about the procedure persists. A study conducted by De Jong et al. [69] showed that 30% of patients did not respond when undergoing the RDN procedure with a monopolar ablation catheter; this may be due to the different proportions of sympathetic and parasympathetic tissue around the renal arteries [70]. Therefore, improvements to RDN instrumentation and techniques remain an important direction for research.

Taking rRDN as an example, in the SPYRAL HTN-OFF MED trial [ 48 ], the Symplicity Spyral multielectrode ablation catheter and the Symplicity G3 ablation radiofrequency generator were applied to patients with uncontrolled hypertension. The Symplicity Spyral catheter can ablate renal arteries with vessel diameters ranging from 3 to 8 mm. The catheter is equipped with four helical electrodes, enabling ablation of four quadrants of the renal artery trunk and at least two quadrants of the renal artery branches. The results of this trial demonstrated a significant benefit of RDN in terms of 24-hour ambulatory blood pressure reduction in the absence of antihypertensive agents, with stable levels of blood pressure throughout the day and no adverse events during the six-month follow-up period. However, this catheter has certain limitations. These include the following: the relatively fixed spiral structure of the catheter makes conformation to the morphology of the blood vessels challenging; the continuity of the ablation energy field is less effective, resulting in a longer ablation time.

A recent Netrod-HTN clinical study conducted in China investigated the efficacy and safety of the Netrod six-electrode rRDN system for the treatment of uncontrolled hypertension. The Netrod six-electrode catheter extended the range of treatable renal artery diameters to 2–12 mm and adapted intelligently to morphological changes in the blood vessels, ensuring continuity of the ablation energy field. The procedure time was shortened, and surgical efficiency increased significantly. Further prospective studies are required to verify its long-term efficacy and safety.

At present, the RDN technique still presents some limitations, such as vascular endothelial damage and severe pain. Further improvement of RDN instruments and the conduct of more multi-center, large-sample, real-world studies represent potential future research directions.

Exploration of the therapeutic effects of RDN in diseases with increased sympathetic nerve activity

Increased sympathetic nerve activity is closely linked to cardiovascular diseases and metabolic abnormalities. Christian Ukena et al. [36] randomized patients with hypertension and concomitant atrial fibrillation (AF) into an RDN combined with pulmonary vein isolation (PVI) group and a PVI-alone group. After 12 months, the odds ratio for AF recurrence in the combined group versus the PVI-alone group was 0.43, suggesting that RDN can significantly reduce the recurrence of AF.

A subcohort of the SYMPLICITY HTN-2 clinical trial showed that RDN treatment reduced blood pressure with concomitant reductions in fasting glucose (from 118 ± 3.4 to 108 ± 3.8 mg/dL), insulin levels (from 20.8 ± 3.0 to 9.3 ± 2.5 µIU/mL), and C-peptide levels (from 5.3 ± 0.6 to 3.0 ± 0.9 ng/mL) at 3-month follow-up [71]. As an important regulator of insulin resistance, excitation of the sympathetic nervous system is associated with increased risks of central obesity and diabetes. RDN can improve glucose metabolism and insulin sensitivity, which can be explained by the reduction of noradrenaline and the inhibition of renal sympathetic activity.

Furthermore, some pathophysiology studies and clinical trials have demonstrated the treatment efficacy of RDN on chronic heart failure, left ventricular hypertrophy, and autonomic nerve dysfunction [ 61 , 62 , 63 ]. Thus, future research should explore the potential therapeutic effects of RDN in diseases with increased sympathetic nerve activity.

Limitations

Research data were obtained from the WoSCC. Only 433 review papers and 1,046 research papers were included in the analysis, as other types of literature and non-English sources were excluded; hence, there was a certain degree of source bias.

Conclusions

As the first bibliometric study of RDN for hypertension, this paper provided a comprehensive and objective summary of the status of the RDN technique and its clinical trials, demonstrating its efficacy, durability, and safety. The improvement of RDN ablation catheters and the exploration of the therapeutic effects of RDN in diseases with increased sympathetic nerve activity will be priorities of future research. This will confirm the importance of RDN technology in hypertension therapy and provide more treatment options to patients with hypertension or other diseases. Our study provides references and implications for future research into RDN for the treatment of hypertension.

Data availability

No datasets were generated or analysed during the current study.

Abbreviations

AF: Atrial fibrillation
ACEI: Angiotensin-converting enzyme inhibitors
ARB: Angiotensin receptor blockers
BAT: Baroreflex activation therapy
ESH: European Society of Hypertension
CCB: Calcium channel blockers
FDA: Food and Drug Administration
RAAS: Renin-angiotensin-aldosterone system
RDN: Renal denervation
rRDN: Radiofrequency-based renal denervation
RSNA: Renal sympathetic nerve activity
SBP: Systolic blood pressure
PVI: Pulmonary vein isolation
uRDN: Ultrasound renal denervation
WoSCC: Web of Science Core Collection

Muntner P, Hardy ST, Fine LJ, Jaeger BC, Wozniak G, Levitan EB, et al. Trends in blood pressure control among US adults with hypertension, 1999–2000 to 2017–2018. JAMA. 2020;324:1190.


Azadi NA. The effect of education based on health belief model on promoting preventive behaviors of hypertensive disease in staff of the. Iran University of Medical Sciences; 2021.

Majd Z, Mohan A, Johnson ML, Essien EJ, Barner JC, Serna O, et al. Patient-reported barriers to adherence among ACEI/ARB users from a motivational interviewing telephonic intervention. Patient Prefer Adherence. 2022;16:2739–48.


Lederman D, Zheng B, Wang X, Sumkin JH, Gur D. A GMM-based breast cancer risk stratification using a resonance-frequency electrical impedance spectroscopy: a breast cancer risk stratification using an REIS. Med Phys. 2011;38:1649–59.

Kandzari DE. Mirror, mirror on the wall. JACC Cardiovasc Interv. 2021;14:2625–8.


DiBona GF. Sympathetic nervous system and hypertension. Hypertension. 2013;61:556–60.


Lauder L, Azizi M, Kirtane AJ, Böhm M, Mahfoud F. Device-based therapies for arterial hypertension. Nat Rev Cardiol. 2020;17:614–28.

Zheng H, Liu X, Katsurada K, Patel KP. Renal denervation improves sodium excretion in rats with chronic heart failure: effects on expression of renal ENaC and AQP2. Am J Physiol Heart Circ Physiol. 2019;317:H958–68.


Kawasaki T, Oyoshi T, Hirata N. Anesthesia Management of a Liver Transplant Recipient with Remimazolam. Tulgar S, editor. Case Reports in Anesthesiology. 2023;2023:1–4.

Kandzari DE, Townsend RR, Bakris G, Basile J, Bloch MJ, Cohen DL et al. Renal denervation in hypertension patients: Proceedings from an expert consensus roundtable cosponsored by SCAI and NKF. Cathet Cardio Intervent. 2021;98:416–26.

Mancia G, Kreutz R, Brunström M, Burnier M, Grassi G, Januszewicz A et al. 2023 ESH guidelines for the management of arterial hypertension the Task Force for the management of arterial hypertension of the European Society of Hypertension Endorsed by the International Society of Hypertension (ISH) and the European Renal Association (ERA). J Hypertens. 2023.

Xie Y, Shi K, Yuan Y, Gu M, Zhang S, Wang K, et al. Bibliometric analysis reveals the progress of PM2.5 in Health Research, especially in Cancer Research. Int J Environ Res Public Health. 2023;20:1271.

Gao Y, Wang F, Song Y, Liu H. The status of and trends in the pharmacology of berberine: a bibliometric review [1985–2018]. Chin Med. 2020;15:7.

Shi J, Shi S, Yuan G, Jia Q, Shi S, Zhu X, et al. Bibliometric analysis of chloride channel research (2004–2019). Channels. 2020;14:393–402.

Merigó JM, Yang J-B. A bibliometric analysis of operations research and management science. Omega. 2017;73:37–48.


van Eck NJ, Waltman L. Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics. 2010;84:523–38.

Chen C, Hu Z, Liu S, Tseng H. Emerging trends in regenerative medicine: a scientometric analysis in CiteSpace . Expert Opin Biol Ther. 2012;12:593–608.

Bhatt DL, Kandzari DE, O’Neill WW, D’Agostino R, Flack JM, Katzen BT, et al. A controlled trial of renal denervation for resistant hypertension. N Engl J Med. 2014;370:1393–401.

Kario K, Kim B-K, Aoki J, Wong AY-T, Lee Y-H, Wongpraparut N, et al. Renal denervation in Asia Consensus Statement of the Asia Renal Denervation Consortium. Hypertension. 2020;75:590–602.

Lambert GW, Hering D, Esler MD, Marusic P, Lambert EA, Tanamas SK, et al. Health-Related Quality of Life after renal denervation in patients with treatment-resistant hypertension. Hypertension. 2012;60:1479–84.

Esler MD, Boehm M, Sievert H, Rump CL, Schmieder RE, Krum H, et al. Catheter-based renal denervation for treatment of patients with treatment-resistant hypertension: 36 month results from the SYMPLICITY HTN-2 randomized clinical trial. Eur Heart J. 2014;35:1752–9.

Boehm M, Ewen S, Kindermann I, Linz D, Ukena C, Mahfoud F. Renal denervation and heart failure. Eur J Heart Fail. 2014;16:608–13.

Goetzinger F, Kunz M, Lauder L, Boehm M, Mahfoud F. Arterial Hypertension-clinical trials update 2023. Hypertens Res [Internet]. 2023 [cited 2023 Sep 23]; https://www.nature.com/articles/s41440-023-01359-y

Mancia G, Grassi G. The autonomic nervous system and hypertension. Circ Res. 2014;114:1804–14.

Smith P. Relationship between central sympathetic activity and stages of human hypertension. Am J Hypertens. 2004;17:217–22.

Grassi G, Cattaneo BM, Seravalle G, Lanfranchi A, Mancia G. Baroreflex control of sympathetic nerve activity in essential and secondary hypertension. Hypertension. 1998;31:68–72.

Grassi G. Assessment of Sympathetic Cardiovascular Drive in Human Hypertension: achievements and perspectives. Hypertension. 2009;54:690–7.

Mancia G, Grassi G, Giannattasio C, Seravalle G. Sympathetic activation in the pathogenesis of hypertension and progression of Organ damage. Hypertension. 1999;34:724–8.

Xu J, Hering D, Sata Y, Walton A, Krum H, Esler MA, et al. Renal denervation: current implications and future perspectives. Clin Sci. 2014;126:41–53.


Menne J, Jordan J, Linnenweber-Held S, Haller H. Resistant hypertension: baroreflex stimulation as a new tool. Nephrol Dial Transpl. 2013;28:288–95.

Renal sympathetic denervation in patients with treatment-resistant hypertension (the Symplicity HTN-2 trial): a randomised controlled trial. Lancet. 2010;376:1903–9.

Krum H, Schlaich M, Whitbourn R, Sobotka PA, Sadowski J, Bartus K, et al. Catheter-based renal sympathetic denervation for resistant hypertension: a multicentre safety and proof-of-principle cohort study. Lancet. 2009;373:1275–81.

Fisher JP, Paton JFR. The sympathetic nervous system and blood pressure in humans: implications for hypertension. J Hum Hypertens. 2012;26:463–75.

Ong J, Kinsman BJ, Sved AF, Rush BM, Tan RJ, Carattino MD, et al. Renal sensory nerves increase sympathetic nerve activity and blood pressure in 2-kidney 1-clip hypertensive mice. J Neurophysiol. 2019;122:358–67.

Berukstis A, Navickas R, Neverauskaite-Piliponiene G, Ryliskyte L, Misiura J, Vajauskas D, et al. Arterial destiffening starts early after renal artery denervation. Int J Hypertens. 2019;2019:3845690.

Ukena C, Becker N, Pavlicek V, Millenaar D, Ewen S, Linz D, et al. Catheter-based renal denervation as adjunct to pulmonary vein isolation for treatment of atrial fibrillation: a systematic review and meta-analysis. J Hypertens. 2020;38:783–90.

Jacob F, Clark LA, Guzman PA, Osborn JW. Role of renal nerves in development of hypertension in DOCA-salt model in rats: a telemetric approach. Am J Physiol-Heart Circul Physiol. 2005;289:H1519–29.

Girchev RA, Baecker A, Markova PP, Kramer HJ. Interaction of endothelin with renal nerves modulates kidney function in spontaneously hypertensive rats. Kidney Blood Press Res. 2006;29:126–34.

Hamza SM, Kaufman S. Splenorenal reflex modulates renal blood flow in the rat. J Physiol-London. 2004;558:277–82.

Alexander BT, Hendon AE, Ferril G, Dwyer TM. Renal denervation abolishes hypertension in low-birth-weight offspring from pregnant rats with reduced uterine perfusion. Hypertension. 2005;45:754–8.

Hendel MD, Collister JP. Renal denervation attenuates long-term hypertensive effects of angiotensin II in the rat. Clin Exp Pharmacol Physiol. 2006;33:1225–30.

Mahfoud F, Tunev S, Ewen S, Cremers B, Ruwart J, Schulz-Jander D, et al. Impact of Lesion Placement on Efficacy and Safety of Catheter-based Radiofrequency Renal Denervation. J Am Coll Cardiol. 2015;66:1766–75.

Schirmer SH, Sayed MMYA, Reil J-C, Ukena C, Linz D, Kindermann M, et al. Improvements in left ventricular hypertrophy and diastolic function following renal denervation effects beyond blood pressure and heart rate reduction. J Am Coll Cardiol. 2014;63:1916–23.

Rosa J, Widimsky P, Waldauf P, Lambert L, Zelinka T, Taborsky M, et al. Role of adding spironolactone and renal denervation in true resistant hypertension one-year outcomes of Randomized PRAGUE-15 study. Hypertension. 2016;67:397–403.

Persu A, Renkin J, Thijs L, Staessen JA. Renal denervation Ultima ratio or standard in treatment-resistant hypertension. Hypertension. 2012;60:596–.

Oliveras A, Armario P, Clara A, Sans-Atxer L, Vazquez S, Pascual J, et al. Spironolactone versus sympathetic renal denervation to treat true resistant hypertension: results from the DENERVHTA study a randomized controlled trial. J Hypertens. 2016;34:1863–71.

Ewen S, Ukena C, Linz D, Kindermann I, Cremers B, Laufs U, et al. Reduced effect of percutaneous renal denervation on blood pressure in patients with isolated systolic hypertension. Hypertension. 2015;65:193–9.

Townsend RR, Mahfoud F, Kandzari DE, Kario K, Pocock S, Weber MA, et al. Catheter-based renal denervation in patients with uncontrolled hypertension in the absence of antihypertensive medications (SPYRAL HTN-OFF MED): a randomised, sham-controlled, proof-of-concept trial. Lancet. 2017;390:2160–70.

DeLalio LJ, Sved AF, Stocker SD. Sympathetic nervous system contributions to hypertension: updates and therapeutic relevance. Can J Cardiol. 2020;36:712–20.

Schlaich MP, Sobotka PA, Krum H, Lambert E, Esler MD. Renal sympathetic-nerve ablation for uncontrolled hypertension. N Engl J Med. 2009;361:932–4.

Bhatt DL, Vaduganathan M, Kandzari DE, Leon MB, Rocha-Singh K, Townsend RR, et al. Long-term outcomes after catheter-based renal artery denervation for resistant hypertension: final follow-up of the randomised SYMPLICITY HTN-3 trial. Lancet. 2022;400:1405–16.

Mahfoud F, Böhm M, Schmieder R, Narkiewicz K, Ewen S, Ruilope L, et al. Effects of renal denervation on kidney function and long-term outcomes: 3-year follow-up from the Global SYMPLICITY Registry. Eur Heart J. 2019;40:3474–82.

Azizi M, Schmieder RE, Mahfoud F, Weber MA, Daemen J, Davies J, et al. Endovascular ultrasound renal denervation to treat hypertension (RADIANCE-HTN SOLO): a multicentre, international, single-blind, randomised, sham-controlled trial. Lancet. 2018;391:2335–45.

Azizi M, Sanghvi K, Saxena M, Gosse P, Reilly JP, Levy T, et al. Ultrasound renal denervation for hypertension resistant to a triple medication pill (RADIANCE-HTN TRIO): a randomised, multicentre, single-blind, sham-controlled trial. Lancet. 2021;397:2476–86.

Mahfoud F, Renkin J, Sievert H, Bertog S, Ewen S, Böhm M, et al. Alcohol-mediated renal denervation using the Peregrine System Infusion Catheter for Treatment of Hypertension. JACC Cardiovasc Interv. 2020;13:471–84.

Pathak A, Rudolph UM, Saxena M, Zeller T, Müller-Ehmsen J, Lipsic E, et al. Alcohol-mediated renal denervation in patients with hypertension in the absence of antihypertensive medications. EuroIntervention. 2023;19:602–11.

Kandzari DE, Weber MA, Pathak A, Zidar JP, Saxena M, David SW, et al. Effect of alcohol-mediated renal denervation on blood pressure in the Presence of Antihypertensive medications: primary results from the TARGET BP I randomized clinical trial. Circulation. 2024;149:1875–84.

Czesnikiewicz-Guzik M, Osmenda G, Siedlinski M, Nosalski R, Pelka P, Nowakowski D, et al. Causal association between periodontitis and hypertension: evidence from mendelian randomization and a randomized controlled trial of non-surgical periodontal therapy. Eur Heart J. 2019;40:3459–70.

Zhuo Z, Lin H, Liang J, Ma P, Li J, Huang L, et al. Mitophagy-related gene signature for Prediction Prognosis, Immune Scenery, Mutation, and Chemotherapy Response in Pancreatic Cancer. Front Cell Dev Biol. 2021;9:802528.

Dzau VJ, Balatbat CA. Future of hypertension. Hypertension. 2019;74:450–7.

Parati G, Esler M. The human sympathetic nervous system: its relevance in hypertension and heart failure. Eur Heart J. 2012;33:1058–66.

Sata Y, Head GA, Denton K, May CN, Schlaich MP. Role of the sympathetic nervous system and its modulation in renal hypertension. Front Med. 2018;5:82.

Ram CVS. Status of renal denervation therapy for hypertension. Circulation. 2019;139:601–3.

Kapil V, Sobotka PA, Lobo MD, Schmieder RE. Central arteriovenous anastomosis to treat resistant hypertension: current opinion in Nephrology and Hypertension. 2018;27:8–15.

Kunz M, Lauder L, Ewen S, Boehm M, Mahfoud F. The current status of devices for the treatment of resistant hypertension. Am J Hypertens. 2020;33:10–8.

Lobo MD, Sobotka PA, Stanton A, Cockcroft JR, Sulke N, Dolan E, et al. Central arteriovenous anastomosis for the treatment of patients with uncontrolled hypertension (the ROX CONTROL HTN study): a randomised controlled trial. Lancet. 2015;385:1634–41.

van Kleef MEAM, Bates MC, Spiering W. Endovascular Baroreflex amplification for resistant hypertension. Curr Hypertens Rep. 2018;20:46.

Bisognano JD, Bakris G, Nadim MK, Sanchez L, Kroon AA, Schafer J, et al. Baroreflex Activation Therapy lowers blood pressure in patients with resistant hypertension results from the Double-Blind, randomized, placebo-controlled Rheos Pivotal Trial. J Am Coll Cardiol. 2011;58:765–73.

de Jong MR, Hoogerwaard AF, Adiyaman A, Smit JJJ, Heeg J-E, van Hasselt BAAM, et al. Renal nerve stimulation identifies aorticorenal innervation and prevents inadvertent ablation of vagal nerves during renal denervation. Blood Press. 2018;27:271–9.

Huang H-C, Cheng H, Chia Y-C, Li Y, Van Minh H, Siddique S, et al. The role of renal nerve stimulation in percutaneous renal denervation for hypertension: a mini-review. J Clin Hypertens. 2022;24:1187–93.

Mahfoud F, Schlaich M, Kindermann I, Ukena C, Cremers B, Brandt MC, et al. Effect of renal sympathetic denervation on glucose metabolism in patients with resistant hypertension a pilot study. Circulation. 2011;123:1940–6.

Download references

Acknowledgements

We would like to thank Guang’anmen Hospital, China Academy of Chinese Medical Sciences, for its kind support.

This work was supported by the special project for scientific research and construction of the national Chinese medicine clinical research base, funded by the State Administration of Traditional Chinese Medicine, China (JDZX2015142).

Author information

Jiaran Li and Xiaohan Zhang contributed equally to this work and share the first authorship.

Authors and Affiliations

Department of Cardiovascular Diseases, Guang’anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China

Jiaran Li, Xiaohan Zhang, Yuchen Jiang, Huan Wang, Xiongyi Gao, Yuanhui Hu & Bai Du


Contributions

JR and XH analyzed the data and prepared the first draft of the manuscript. YC and H collected and collated the data. XY helped with visualization. B and YH conceptualized and designed the study and were involved in reviewing the manuscript. All authors contributed substantively to the manuscript.

Corresponding authors

Correspondence to Yuanhui Hu or Bai Du.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Li, J., Zhang, X., Jiang, Y. et al. Research status and frontiers of renal denervation for hypertension: a bibliometric analysis from 2004 to 2023. J Health Popul Nutr 43, 142 (2024). https://doi.org/10.1186/s41043-024-00626-z

Received: 24 June 2024

Accepted: 16 August 2024

Published: 09 September 2024

DOI: https://doi.org/10.1186/s41043-024-00626-z

Keywords

  • Renal denervation
  • Hypertension
  • Knowledge mapping


  • Open access
  • Published: 09 September 2024

The causal relationship between CSF metabolites and GBM: a two-sample mendelian randomization analysis

Haijun Bao, Yiyang Chen, Zijun Meng & Zheng Chu

BMC Cancer volume 24, Article number: 1119 (2024)

Glioblastoma multiforme (GBM) is a highly aggressive primary malignant brain tumor characterized by rapid progression, poor prognosis, and high mortality rates. Understanding the relationship between cerebrospinal fluid (CSF) metabolites and GBM is crucial for identifying potential biomarkers and pathways involved in the pathogenesis of this devastating disease.

In this study, Mendelian randomization (MR) analysis was employed to investigate the causal relationship between 338 CSF metabolites and GBM. The metabolite data were obtained from a genome-wide association study summary dataset based on 291 individuals, and the GBM data were derived from FinnGen, comprising 91 cases and 174,006 controls of European descent. The Inverse Variance Weighted (IVW) method was used to estimate the causal effects. Supplementary comprehensive assessments of causal effects between CSF metabolites and GBM were conducted using MR-Egger regression, Weighted Median, Simple Mode, and Weighted Mode methods. Additionally, tests for heterogeneity and pleiotropy were performed.

Through MR analysis, a total of 12 identified metabolites and 2 with unknown chemical properties were found to have a causal relationship with GBM. 1-palmitoyl-2-stearoyl-gpc (16:0/18:0), 7-alpha-hydroxy-3-oxo-4-cholestenoate, Alpha-tocopherol, Behenoyl sphingomyelin (d18:1/22:0), Cysteinylglycine, Maleate, Uracil, Valine, X-12,101, X-12,104 and Butyrate (4:0) are associated with an increased risk of GBM. N1-methylinosine, Stachydrine and Succinylcarnitine (c4-dc) are associated with decreased GBM risk.

In conclusion, this study sheds light on the intricate interplay between CSF metabolites and GBM, offering novel perspectives on disease mechanisms and potential treatment avenues. By elucidating the role of CSF metabolites in GBM pathogenesis, this research contributes to the advancement of diagnostic capabilities and targeted therapeutic interventions for this aggressive brain tumor. Further exploration of these findings may lead to improved management strategies and better outcomes for patients with GBM.


Introduction

Glioblastoma multiforme (GBM) is one of the most aggressive primary malignant brain tumors, characterized by rapid progression, poor prognosis, and high mortality rates [1]. According to the 2016 WHO classification, gliomas are divided into four grades, grouped into low-grade gliomas (LGG; grades I and II) and high-grade gliomas (HGG; grades III and IV) [2]. GBM, the most malignant subtype, accounts for over 50% of all gliomas and more than 15% of primary brain tumors [3]. Due to its immunosuppressive microenvironment and tendency to recur, GBM stands out as one of the most challenging tumors, imposing a significant societal burden.

Cerebrospinal fluid (CSF) envelops the cerebral and spinal regions within the meningeal cavities, playing a crucial role in maintaining brain homeostasis, providing nutrients, and clearing waste. Thanks to the presence of the blood-brain barrier, CSF maintains its distinctive environment, safeguarding the normal functioning of neurons. Due to their direct exposure to GBM, CSF metabolites hold the potential to mirror metabolic alterations linked to tumor initiation, presence, and progression [ 4 ]. GBM, as a highly invasive brain tumor, can disrupt the normal flow and composition of CSF, leading to increased intracranial pressure and hydrocephalus. Additionally, GBM cells can infiltrate the soft meninges and spread along the CSF pathways [ 5 ]. Therefore, CSF analysis, including cytology and molecular spectrum analysis, is pivotal for diagnosing and monitoring GBM, as it may contain tumor cells and biomarkers reflecting the disease state. Understanding the dynamic interplay between GBM and CSF is crucial for enhancing diagnostic and therapeutic strategies against this devastating disease.

Studying CSF presents both challenges and valuable opportunities for understanding various neurological disorders. One of the primary hurdles lies in obtaining CSF samples, which are typically acquired through invasive procedures like lumbar punctures, posing risks and discomfort to patients. Consequently, most human metabolomics research has focused on more accessible sample types such as blood or urine. Furthermore, CSF composition is dynamic, influenced by factors like age, gender, and disease status, making standardization and interpretation complex [6]. Despite these challenges, researching CSF holds immense value. CSF serves as a direct window into the central nervous system, reflecting biochemical, cellular, and molecular changes associated with neurological disorders like Alzheimer’s disease, multiple sclerosis, and brain tumors [7, 8, 9]. Analysis of CSF biomarkers provides diagnostic, prognostic, and therapeutic insights, aiding in disease detection, monitoring, and treatment response assessment. Rogachev et al. found partial differences in the correlations of plasma and CSF metabolites between HGG patients and healthy controls [4]. Im et al. demonstrated that CSF metabolites could be used to predict glioma grade and leptomeningeal metastasis [10]. By overcoming technical barriers and leveraging the rich information provided by CSF, researchers can significantly enhance our understanding and management of neurological diseases, ultimately improving patient outcomes.

In this study, we employed Mendelian randomization (MR) analysis to investigate the causal relationship between CSF metabolites and GBM. MR is a robust epidemiological method that utilizes single nucleotide polymorphisms (SNPs) as instrumental variables (IVs), less susceptible to unmeasured confounders and reverse causality [ 11 ]. This study aims to elucidate potential disease mechanisms, identify novel therapeutic avenues, and provide insights for the diagnosis, treatment, and prevention of GBM.

Materials and methods

Study design

Our study used a two-sample MR design to explore the causal relationship between CSF metabolites and GBM. In MR studies, adherence to three fundamental instrumental variable (IV) assumptions is crucial: (1) genetic variants must correlate with the exposure; (2) these variants should be independent of confounding factors; and (3) they should influence the outcome exclusively through the exposure [11]. Strict quality control measures, including tests for pleiotropy and heterogeneity, were conducted to enhance the reliability of the causal estimates.

CSF metabolites data synopsis

The exposures studied were 338 CSF metabolites, based on a metabolome-wide association study by Panyard et al. [12]. In that study, a total of 291 independent samples from individuals of European descent were retained after quality control and further data-cleaning steps. All participants were cognitively healthy when CSF was collected. Among the 338 metabolites, 38 remain chemically undefined, whereas 296 have undergone chemical validation and fall into eight primary metabolic groups: amino acid, carbohydrate, cofactor and vitamin, energy, lipid, nucleotide, peptide, and xenobiotic metabolism.

GBM data synopsis

The outcome variable under scrutiny is GBM, for which the genome-wide association study (GWAS) summary dataset was sourced from FinnGen; it included 91 cases and 174,006 controls of European descent and comprised 16,380,303 SNPs.

Selection of IVs

SNPs are commonly used genetic variants in MR analysis, reflecting DNA sequence diversity resulting from single-nucleotide changes at the genomic level. In our study, we constructed a separate set of IVs for each of the 338 CSF metabolites, so that the genetic associations specific to each metabolite were accurately represented. First, considering the limited number of SNPs reaching genome-wide significance for metabolites, we relaxed the threshold to p < 1 × 10⁻⁵ [13]. Then, linkage disequilibrium criteria were set at r² < 0.001 within a genetic distance of 10,000 kb, and highly correlated SNPs were excluded to ensure the independence of the included SNPs. Furthermore, to mitigate bias resulting from weak instruments, the F-value of each SNP was computed as F = R²(N − 2)/(1 − R²), where R² denotes the proportion of variance in the metabolite explained by the chosen SNP and N the sample size of the GWAS; SNPs with F < 10 were treated as weak instruments [14]. In addition, we used the IEU OpenGWAS database to identify and exclude any SNPs that showed significant associations with potential confounders. Finally, the SNPs related to each CSF metabolite were projected onto the GBM GWAS data and the corresponding summary statistics were extracted.
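In practice, each of these selection steps corresponds to a call in the TwoSampleMR R package named below under Statistical analysis. The following is a minimal sketch for a single metabolite, assuming its summary statistics are queryable from the IEU OpenGWAS database; the dataset id "met-ex-001" and the use of get_r_from_bsen to approximate R² are illustrative assumptions, not details from the study.

  library(TwoSampleMR)

  # Instrument selection for one CSF metabolite, using the thresholds above:
  # p < 1e-05, LD clumping at r2 < 0.001 within a 10,000 kb window.
  exposure_dat <- extract_instruments(
    outcomes = "met-ex-001",               # placeholder OpenGWAS id
    p1 = 1e-05,                            # relaxed significance threshold
    clump = TRUE, r2 = 0.001, kb = 10000   # LD clumping criteria
  )

  # Per-SNP F-statistic: F = R^2 * (N - 2) / (1 - R^2), with N = 291
  # (the sample size of the CSF metabolome GWAS).
  n <- 291
  r <- get_r_from_bsen(exposure_dat$beta.exposure, exposure_dat$se.exposure, n)
  exposure_dat$F <- r^2 * (n - 2) / (1 - r^2)
  exposure_dat <- subset(exposure_dat, F >= 10)  # drop weak instruments (F < 10)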

Statistical analysis

In this study, we employed the Inverse Variance Weighted (IVW), MR-Egger, Weighted Median, Weighted Mode, and Simple Mode methods to estimate causal effects. Given the IVW method’s superior test efficacy compared with other MR methods, we selected it as the primary method. The IVW method assumes that all genetic variants are valid instrumental variables; it calculates a causal effect estimate for each instrumental variable using the ratio method and then combines these estimates through weighted linear regression to derive the total effect. Nonetheless, unidentified confounding variables may introduce genetic pleiotropy and bias the effect estimates, so MR-Egger regression and the weighted median method were employed as supplementary approaches to corroborate the causal impact of exposure on outcome. The leave-one-out method was employed to identify any instrumental variable that could unduly influence the causal estimate. Horizontal pleiotropy was assessed using the Egger intercept, and the Mendelian randomization pleiotropy residual sum and outlier (MR-PRESSO) test was performed to identify and exclude SNPs that may be influenced by pleiotropy. Heterogeneity was evaluated with the Cochran Q test. All analyses were performed using the TwoSampleMR R package in R 4.4.1.
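A corresponding sketch of the estimation and sensitivity steps, reusing the exposure_dat object from the previous sketch; the outcome id "finn-gbm" is again a placeholder standing in for the FinnGen GBM summary dataset, not an identifier from the study.

  # Map the metabolite instruments onto the GBM GWAS and harmonise alleles.
  outcome_dat <- extract_outcome_data(snps = exposure_dat$SNP,
                                      outcomes = "finn-gbm")
  dat <- harmonise_data(exposure_dat, outcome_dat)

  # Primary (IVW) and supplementary estimators.
  res <- mr(dat, method_list = c("mr_ivw",
                                 "mr_egger_regression",
                                 "mr_weighted_median",
                                 "mr_weighted_mode",
                                 "mr_simple_mode"))

  # Sensitivity analyses described above.
  mr_pleiotropy_test(dat)                     # MR-Egger intercept
  mr_heterogeneity(dat)                       # Cochran's Q
  run_mr_presso(dat, NbDistribution = 1000)   # MR-PRESSO outlier test
  res_loo <- mr_leaveoneout(dat)              # leave-one-out estimates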

Results

After quality control of the IVs, we identified 14 CSF metabolites associated with GBM, each instrumented by 21–196 SNPs (Stachydrine had the fewest, with 21 SNPs, and Butyrate (4:0) the most, with 196 SNPs). The F-statistics for the CSF metabolites ranged from 19.37 to 78.49, all exceeding the threshold of 10, indicating that the estimates are unlikely to be affected by weak-instrument bias (Supplementary Table S1). Among the 14 CSF metabolites associated with GBM, 12 are chemically identified and 2 have unknown chemical properties. The identified metabolites fall into the lipid, vitamin, amino acid, nucleotide, energy, and xenobiotic classes (Supplementary Table S2).

The causal relationships between the risk of GBM and the 14 CSF metabolites are as follows (Fig. 1): 1-palmitoyl-2-stearoyl-gpc (16:0/18:0) (OR 1.661, 95% CI: 1.039–2.654, p = 0.034), 7-alpha-hydroxy-3-oxo-4-cholestenoate (7-HOCA) (OR 4.272, 95% CI: 1.335–13.673, p = 0.014), Alpha-tocopherol (OR 1.693, 95% CI: 1.037–2.764, p = 0.035), Behenoyl sphingomyelin (d18:1/22:0) (OR 1.515, 95% CI: 1.120–2.051, p = 0.007), Cysteinylglycine (OR 1.361, 95% CI: 1.072–1.726, p = 0.011), Maleate (OR 1.517, 95% CI: 1.077–2.137, p = 0.017), N1-methylinosine (OR 0.467, 95% CI: 0.232–0.941, p = 0.033), Stachydrine (OR 0.738, 95% CI: 0.578–0.942, p = 0.015), Succinylcarnitine (c4-dc) (OR 0.328, 95% CI: 0.110–0.978, p = 0.046), Uracil (OR 5.053, 95% CI: 1.044–24.449, p = 0.044), Valine (OR 5.284, 95% CI: 1.170–23.868, p = 0.030), X-12,101 (OR 2.219, 95% CI: 1.148–4.290, p = 0.018), X-12,104 (OR 2.943, 95% CI: 1.023–8.463, p = 0.045), and Butyrate (4:0) (OR 1.422, 95% CI: 1.063–1.904, p = 0.018).
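For readers reproducing such estimates: the odds ratios and 95% confidence intervals are the exponentiated log-odds estimates, OR = exp(b) with limits exp(b ± 1.96 × se). With the res object from the sketch above, TwoSampleMR performs this conversion in one call:

  # OR and 95% CI from log-odds: or = exp(b), limits = exp(b ± 1.96 * se)
  or_table <- generate_odds_ratios(res)
  or_table[, c("exposure", "method", "or", "or_lci95", "or_uci95", "pval")]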

Figure 1. Forest plot of Mendelian randomization estimates between CSF metabolites and GBM

In brief, the IVW-derived estimates were significant (p < 0.05), with a consistent direction and magnitude observed across the IVW, MR-Egger, Weighted Mode, Weighted Median, and Simple Mode estimates (Supplementary Table S3); the conclusions of the other four methods remain largely consistent with those of the IVW method. Both the MR-Egger intercept test and the Cochran Q test strongly support the absence of pleiotropy and heterogeneity, except for Behenoyl sphingomyelin (d18:1/22:0), whose Egger intercept indicated pleiotropy (Table 1). Scatter plots for the 14 identified CSF metabolites across the various tests are displayed in Fig. 2. The leave-one-out analysis confirmed that excluding any single SNP did not introduce bias into the MR estimation (Supplementary Figure S1). The funnel plots are presented in Supplementary Figure S2.

Figure 2. Scatter plots of 14 CSF metabolites related to the risk of GBM
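The scatter, leave-one-out, and funnel figures referenced here are standard TwoSampleMR diagnostics; assuming the dat, res, and res_loo objects from the earlier sketches, they could be drawn as follows.

  # Diagnostic plots from the harmonised data and MR results.
  mr_scatter_plot(res, dat)            # per-method fitted lines (cf. Fig. 2)
  mr_funnel_plot(mr_singlesnp(dat))    # funnel plot of single-SNP estimates
  mr_leaveoneout_plot(res_loo)         # leave-one-out sensitivity plot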

Discussion

Our study delved into the intricate relationship between CSF metabolites and GBM, shedding light on potential biomarkers and pathways implicated in the pathogenesis of this aggressive brain tumor. By investigating the relationship between CSF metabolites and GBM risk through MR analysis, we identified 14 CSF metabolites significantly associated with the risk of GBM. These metabolites encompass a variety of biochemical classes, including lipids, vitamins, amino acids, nucleotides, energy and xenobiotics, underscoring the multifactorial nature of GBM development and progression. Understanding the dynamic interplay between CSF metabolites and GBM not only enhances our diagnostic capabilities but also holds promise for the development of targeted therapeutic interventions aimed at disrupting key metabolic pathways implicated in GBM tumorigenesis and progression.

Lipids, including sterols, phospholipids, and diacyl/triacylglycerols, are components of biological membranes and have emerged as pivotal players in the landscape of GBM research. Lipids serve not only as structural components of cell membranes but also as signaling molecules involved in various cellular processes, including proliferation, migration, and apoptosis. The dysregulation of lipid metabolism has been implicated in tumor initiation, progression, and therapy resistance, making lipid metabolites attractive candidates for biomarker discovery and therapeutic targeting in GBM [15]. So far, there have been no literature reports on the roles of 1-palmitoyl-2-stearoyl-gpc (16:0/18:0) and Behenoyl sphingomyelin (d18:1/22:0) in cancer. We speculate that these two phospholipid metabolites may be associated with the enhanced lipid synthesis and metabolism typically exhibited by cancer cells: they not only provide the structural support required for cancer cell proliferation but also participate in regulating cell signaling pathways and epigenetic events, promoting the survival and metastatic ability of cancer cells. Halama et al. discovered that in colon and ovarian cancer, endothelial cells induce metabolic reprogramming by promoting the overexpression of glycerophospholipids and polyunsaturated fatty acids in cancer cells [16].

Our study is the first to report an association between 7-HOCA and GBM. 7-HOCA is the primary brain metabolite of the oxysterol 27-hydroxycholesterol, and an increase of 7-HOCA in CSF has been shown to reflect blood-brain barrier damage [17, 18]. Feng et al. found that 7-HOCA is closely related to the occurrence of lung cancer, conferring a significantly increased risk [19]. Combined with our findings, 7-HOCA may play a significant role in the development of cancer. Butyrate (4:0) is the salt form of butyric acid, a significant byproduct of fatty acid metabolism. In vivo, butyric acid can be generated through various pathways, including the metabolism of fatty acids and the fermentation of dietary fiber by the intestinal microbiota. Once produced, butyric acid actively participates in the regulation of lipid metabolism, influencing the synthesis, breakdown, and utilization of fats in the body. Tumor cells can obtain fatty acids through lipolysis to support their growth: during periods of adequate nutritional supply, fatty acids are stored in adipose tissue as triglycerides (TG), which are broken down to release fatty acids when energy levels are low. Recent studies have revealed a significant role of butyrate salts in the occurrence and progression of colorectal and lung cancers [20]; however, this is the first report on the relationship between butyrate salts and GBM. We also present, for the first time, a causal relationship between maleate in CSF and GBM. Maleate is a salt form of maleic acid, present in the human body as a metabolite or from dietary sources. Further research is needed to understand its involvement in the onset and progression of GBM.

The importance of amino acid metabolism in GBM lies in amino acids’ role as the building blocks of proteins. Amino acids not only provide essential materials for cell growth and proliferation but also participate in regulating cellular signaling, gene expression, and metabolic pathways. GBM cells typically exhibit high metabolic activity, leading to an increased demand for amino acids; abnormalities in amino acid supply and metabolism may therefore influence the development and progression of glioblastoma. In our study, valine and cysteinylglycine were found to have a positive relationship with GBM. Consistent with our results, Yao et al. found that the valine, leucine, and isoleucine biosynthesis pathway can be used for screening and diagnosis of ovarian cancer [21]. Valine, leucine, and isoleucine constitute the branched-chain amino acids (BCAAs), which can be utilized and processed by astrocytes for various functions, including serving as fuel for brain energy metabolism [22]. The diagnostic role of valine has also been observed in saliva, where it was found to play a dominant role in diagnosing multiple cancers [23]. Cysteinylglycine, produced during the breakdown of glutathione, has been proposed as a prooxidant, contributing to oxidative stress and lipid peroxidation, which are implicated in the development of human cancers. Consistent with our results, Lin et al. found that increased cysteinylglycine levels were associated with increased breast cancer risk in a high-oxidative-stress group of middle-aged and elderly women [24]. However, some studies suggest the opposite: for instance, Miranti et al. found a negative correlation between serum cysteinylglycine and esophageal adenocarcinoma [25]. In-depth investigation of amino acid metabolism is therefore crucial for understanding and intervening in the metabolic characteristics and therapeutic mechanisms of GBM.

As the fundamental building blocks of DNA, nucleotides have a metabolism closely intertwined with the initiation and progression of tumors. As shown in this study, uracil may participate in the onset and progression of tumors. UPP1, a critical enzyme in uracil metabolism, catalyzes the phosphorolysis of uridine to uracil and ribose-1-phosphate; its activation of the AKT signaling pathway can enhance tumorigenesis and resistance to anticancer drugs [26]. N1-methylinosine is a known ribonucleoside modification, formed by adding a methyl group to the N1 position of the purine ring. This modification typically occurs within RNA molecules, and such methylations are common epigenetic regulatory mechanisms that can influence RNA stability, translation, and interactions. Li et al. detected N1-methylinosine in urine samples from cancer patients, suggesting its potential as a cancer biomarker [27]. The presence of N1-methylinosine may indicate abnormalities in ribonucleoside metabolism within cells, which could be associated with the development and progression of cancer.

In our study, besides N1-methylinosine, the other two metabolites showing a negative causal relationship with GBM were Stachydrine and Succinylcarnitine (c4-dc). Stachydrine is renowned for its antioxidant and anti-inflammatory properties [28]. Consistent with our results, studies have demonstrated that Stachydrine can inhibit the proliferation of breast cancer and liver cancer cells while inducing apoptosis, autophagy, and cellular senescence; its mechanisms of action may include inhibiting survival signaling pathways such as Akt and ERK, as well as regulating the expression of cell-cycle proteins [29, 30]. Liu et al. found that Stachydrine may prevent liver damage by attenuating inflammatory responses, inhibiting the ERK and AKT pathways, and suppressing the expression of macrophage-stimulating protein [31]. Succinylcarnitine is an intermediate of the tricarboxylic acid (TCA) cycle and amino acid-based energy metabolism, linked to fatty acid β-oxidation and mitochondrial function, and serves as an energy-related metabolite [32]. An MR study identified maternal succinylcarnitine during pregnancy as a significant factor contributing to the risk of congenital heart disease in offspring [33]. Kim et al. found that a decrease in plasma succinylcarnitine levels was associated with the inhibition of prostate tumor growth by a whole-walnut diet, possibly through effects on cellular energy metabolism and the regulation of relevant gene expression [34]. However, further research is needed to elucidate the specific mechanisms of these two CSF metabolites.

Several limitations exist within our study. First, the pleiotropy test result for Behenoyl sphingomyelin (d18:1/22:0) was below 0.05, indicating the presence of pleiotropy. Further SNP screening was performed using MR-PRESSO, but no SNPs were flagged for removal. Possible reasons for this result include the sample size, insufficient statistical power, the complexity of the pleiotropy, and methodological limitations; readers should therefore interpret this finding with caution, as it does not mean pleiotropy is absent. Additionally, our study did not examine the potential correlation of metabolites at the SNP and phenotype levels. This oversight may have implications for the interpretation of our findings, as unaccounted-for correlations could influence the results of the Mendelian randomization analysis.

These findings from our study open several avenues for future research. It is essential to delve into the mechanistic roles of the identified CSF metabolites in GBM biology, exploring their interactions with cellular pathways and the tumor microenvironment. Additionally, longitudinal studies are needed to understand the dynamics of these metabolites over the course of the disease and in response to treatments. Investigating the generalizability of our findings across diverse populations and the integration of metabolomics data with other omics platforms will be critical for identifying robust biomarker panels. Furthermore, the potential of these metabolites as therapeutic targets should be rigorously tested in preclinical models to advance towards novel treatment strategies for GBM.

In conclusion, our study provides valuable insights into the intricate relationship between CSF metabolites and GBM, shedding light on potential biomarkers and pathways involved in the pathogenesis of this aggressive brain tumor. Through MR analysis, we identified 14 CSF metabolites significantly associated with GBM risk, spanning various biochemical classes such as lipids, vitamins, amino acids, nucleotides, energy, and xenobiotics. This underscores the multifactorial nature of GBM development and progression. Our findings highlight the importance of understanding the dynamic interplay between CSF metabolites and GBM, not only for enhancing diagnostic capabilities but also for potentially developing targeted therapeutic interventions aimed at disrupting key metabolic pathways implicated in GBM tumorigenesis and progression.

Data availability

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

Abbreviations

7-HOCA: 7-alpha-hydroxy-3-oxo-4-cholestenoate

CSF: Cerebrospinal fluid

GBM: Glioblastoma multiforme

GWAS: Genome-wide association study

IVs: Instrumental variables

MR: Mendelian randomization

SNPs: Single nucleotide polymorphisms

References

1. Ilkhanizadeh S, et al. Glial progenitors as targets for transformation in glioma. Adv Cancer Res. 2014;121:1–65.

2. Louis DN, et al. The 2016 World Health Organization Classification of Tumors of the Central Nervous System: a summary. Acta Neuropathol. 2016;131(6):803–20.

3. Topalian SL, et al. Safety, activity, and immune correlates of anti-PD-1 antibody in cancer. N Engl J Med. 2012;366(26):2443–54.

4. Rogachev AD, et al. Correlation of metabolic profiles of plasma and cerebrospinal fluid of high-grade glioma patients. Metabolites. 2021;11(3).

5. Lah TT, Novak M, Breznik B. Brain malignancies: glioblastoma and brain metastases. Semin Cancer Biol. 2020;60:262–73.

6. Tashjian RS, Vinters HV, Yong WH. Biobanking of cerebrospinal fluid. Methods Mol Biol. 2019;1897:107–14.

7. Simrén J, et al. Fluid biomarkers in Alzheimer’s disease. Adv Clin Chem. 2023;112:249–81.

8. Deisenhammer F, et al. The cerebrospinal fluid in multiple sclerosis. Front Immunol. 2019;10:726.

9. Kopková A, et al. MicroRNAs in cerebrospinal fluid as biomarkers in brain tumor patients. Klin Onkol. 2019;32(3):181–6.

10. Im JH, et al. Comparative cerebrospinal fluid metabolites profiling in glioma patients to predict malignant transformation and leptomeningeal metastasis with a potential for preventive personalized medicine. EPMA J. 2020;11(3):469–84.

11. Sekula P, et al. Mendelian randomization as an approach to assess causality using observational data. J Am Soc Nephrol. 2016;27(11):3253–65.

12. Panyard DJ, et al. Cerebrospinal fluid metabolomics identifies 19 brain-related phenotype associations. Commun Biol. 2021;4(1):63.

13. Yang J, et al. Assessing the causal effects of human serum metabolites on 5 major psychiatric disorders. Schizophr Bull. 2020;46(4):804–13.

14. Flatby HM, et al. Circulating levels of micronutrients and risk of infections: a mendelian randomization study. BMC Med. 2023;21(1):84.

15. Maan M, et al. Lipid metabolism and lipophagy in cancer. Biochem Biophys Res Commun. 2018;504(3):582–9.

16. Halama A, et al. Nesting of colon and ovarian cancer cells in the endothelial niche is associated with alterations in glycan and lipid metabolism. Sci Rep. 2017;7:39999.

17. Heverin M, et al. Crossing the barrier: net flux of 27-hydroxycholesterol into the human brain. J Lipid Res. 2005;46(5):1047–52.

18. Saeed A, et al. 7α-hydroxy-3-oxo-4-cholestenoic acid in cerebrospinal fluid reflects the integrity of the blood-brain barrier. J Lipid Res. 2014;55(2):313–8.

19. Feng Y, et al. Causal effects of genetically determined metabolites on cancers included lung, breast, ovarian cancer, and glioma: a mendelian randomization study. Transl Lung Cancer Res. 2022;11(7):1302–14.

20. Karim MR, et al. Butyrate’s (a short-chain fatty acid) microbial synthesis, absorption, and preventive roles against colorectal and lung cancer. Arch Microbiol. 2024;206(4):137.

21. Yao JZ, et al. Diagnostics of ovarian cancer via metabolite analysis and machine learning. Integr Biol (Camb). 2023;15.

22. Murín R, et al. Glial metabolism of valine. Neurochem Res. 2009;34(7):1195–203.

23. Bel’skaya LV, Sarf EA, Loginova AI. Diagnostic value of salivary amino acid levels in cancer. Metabolites. 2023;13(8).

24. Lin J, et al. Plasma cysteinylglycine levels and breast cancer risk in women. Cancer Res. 2007;67(23):11123–7.

25. Miranti EH, et al. Prospective study of serum cysteine and cysteinylglycine and cancer of the head and neck, esophagus, and stomach in a cohort of male smokers. Am J Clin Nutr. 2016;104(3):686–93.

26. Du W, et al. UPP1 enhances bladder cancer progression and gemcitabine resistance through AKT. Int J Biol Sci. 2024;20(4):1389–409.

27. Li HY, et al. Separation and identification of purine nucleosides in the urine of patients with malignant cancer by reverse phase liquid chromatography/electrospray tandem mass spectrometry. J Mass Spectrom. 2009;44(5):641–51.

28. Zhao L, et al. Stachydrine ameliorates isoproterenol-induced cardiac hypertrophy and fibrosis by suppressing inflammation and oxidative stress through inhibiting NF-κB and JAK/STAT signaling pathways in rats. Int Immunopharmacol. 2017;48:102–9.

29. Bao X, et al. Stachydrine hydrochloride inhibits hepatocellular carcinoma progression via LIF/AMPK axis. Phytomedicine. 2022;100:154066.

30. Wang M, et al. Stachydrine hydrochloride inhibits proliferation and induces apoptosis of breast cancer cells via inhibition of Akt and ERK pathways. Am J Transl Res. 2017;9(4):1834–44.

31. Liu FC, et al. The modulation of phospho-extracellular signal-regulated kinase and phospho-protein kinase B signaling pathways plus activity of macrophage-stimulating protein contribute to the protective effect of stachydrine on acetaminophen-induced liver injury. Int J Mol Sci. 2024;25(3).

32. Mai M, et al. Serum levels of acylcarnitines are altered in prediabetic conditions. PLoS ONE. 2013;8(12):e82459.

33. Taylor K, et al. The relationship of maternal gestational mass spectrometry-derived metabolites with offspring congenital heart disease: results from multivariable and mendelian randomization analyses. J Cardiovasc Dev Dis. 2022;9(8).

34. Kim H, Yokoyama W, Davis PA. TRAMP prostate tumor growth is slowed by walnut diets through altered IGF-1 levels, energy pathways, and cholesterol metabolism. J Med Food. 2014;17(12):1281–6.


Acknowledgements

The authors thank all the participants and researchers who contributed and collected data.

This research was supported by the Research Foundation for Talented Scholars of Xuzhou Medical University (D2021011).

Author information

Authors and Affiliations

Department of Forensic Medicine, First College for Clinical Medicine, Xuzhou Medical University, 84 West Huaihai Rd, Xuzhou, Jiangsu, 221000, China

Haijun Bao, Yiyang Chen, Zijun Meng & Zheng Chu

Jiangsu Medical Engineering Research Center of Gene Detection, Xuzhou, Jiangsu, China

Haijun Bao & Zheng Chu


Contributions

HjB: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Validation, Writing – original draft. YyC: Formal analysis, Investigation, Writing – original draft. ZjM: Conceptualization, Data curation, Investigation, Writing – review & editing. ZC: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Software, Validation, Writing – review & editing.

Corresponding author

Correspondence to Zheng Chu.

Ethics declarations

Ethical approval

All data analyzed in this study were sourced from publicly available databases. Ethical approval was secured for each cohort, and informed consent was obtained from all participants prior to their involvement. Written informed consent was provided by the patients/participants for their participation in this study.

Consent for publication

Not Applicable.

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Supplementary Material 3

Supplementary Material 4

Supplementary Material 5


About this article

Cite this article

Bao, H., Chen, Y., Meng, Z. et al. The causal relationship between CSF metabolites and GBM: a two-sample mendelian randomization analysis. BMC Cancer 24, 1119 (2024). https://doi.org/10.1186/s12885-024-12901-7

Received: 25 March 2024

Accepted: 04 September 2024

Published: 09 September 2024

DOI: https://doi.org/10.1186/s12885-024-12901-7
