# Tutorial 4: Creating a radar plot for individual characters using custom word lists in a Dutch fairytale corpus

![StoryNavigator Logo](../../doc/widgets/images/storynavigator_logo_small.png)

---

This tutorial is part of a series demonstrating the use of StoryNavigator widgets. These tutorials show how to combine StoryNavigator widgets with widgets already available in the Orange platform, and how to generate output as tables or figures. Each tutorial addresses a research question about the narrative structure and contents of a corpus of stories.

---

### Step 0: Research question

This tutorial guides you through using Orange and StoryNavigator widgets to create a radar plot for an individual character in a story, based on predefined word categories (e.g., Halliday's dimensions). The approach is adapted from the workflow presented in Andrade & Andersen (2020). We focus on the question:

- How are the different categories of verbs, as classified by Halliday's functional dimensions, distributed for a particular narrator or person in a story?

We use the following workflow:

![Workflow](../../doc/widgets/images/radarplot_individual_figure.png)

This workflow can be downloaded [here](https://github.com/navigating-stories/orange-story-navigator/tree/master/doc/widgets/workflows), and it uses a dataset of Dutch fairytales which can be found [here](https://github.com/navigating-stories/orange-story-navigator/tree/master/doc/widgets/fairytales/).

### Step 1: Load the Corpus

- Task: Import the corpus of Dutch fairytales in tabular format using the Import Documents widget.
- Outcome: You will be able to inspect the stories for visual validation.
- Widget: Import Documents
- Hint: Ensure the stories are properly loaded, with each document represented in the corpus.

### Step 2: Load the Custom Word List

- Task: Load the predefined custom word list inspired by Halliday's functional categories using the File widget.
  The file `dutch_halliday_action_list.csv` can be found [here](../../orangecontrib/storynavigation/resources).
- Outcome: The word list will be loaded and prepared for merging with the fairytales dataset.
- Widget: File → Data Table (1)

### Step 3: Extract Subject-Verb Combinations

- Task: Use the **Narrative Network widget** to extract *subject-verb combinations* from the Dutch fairytales. This lets us focus on the verbs used by specific narrators (e.g., "ik").
- Outcome: Subject-verb combinations will be prepared for further analysis.
- Widget: Narrative Network → Python Script (1)
- Hint: Clean the subject-verb combinations with the **Python Script widget** by removing the `_subj` and `_obj` suffixes. Insert the following script in the corresponding editor:

```python
from Orange.data import Table

# Assuming 'in_data' is your Orange Table object
# Extract the column indices for 'subject' and 'object'
subject_idx = in_data.domain.index('subject')
object_idx = in_data.domain.index('object')

# Loop over the rows and strip the suffixes from the 'subject' and 'object' columns
for row in in_data:
    if row[subject_idx] is not None:
        row[subject_idx] = row[subject_idx].value.replace('_subj', '')  # Remove '_subj' from 'subject'
    if row[object_idx] is not None:
        row[object_idx] = row[object_idx].value.replace('_obj', '')  # Remove '_obj' from 'object'

# Output modified data
out_data = in_data
```

The subject-verb-object combinations for each *story_id* look like:

![Subject-verb-object combinations](../../doc/widgets/images/SVO.png)

### Step 4: Select Rows with a specific character

- Task: Use the Select Rows widget to filter for rows where the subject is the narrator (e.g., "ik" or "hij").
- Outcome: After filtering, only the character's actions (verbs) remain; these are then classified according to Halliday's dimensions.
- Widget: Select Rows → Data Table (3)

![Rows for the selected character](../../doc/widgets/images/rows_character.png)

### Step 5: Merge the Custom Word List with Story Elements

- Task: Merge the Halliday categorization with the subject-verb list extracted from the fairytales using the **Merge Data widget**.
- Outcome: You will have a merged dataset containing the subject, verb, and their corresponding Halliday categories.
- Hint: Take care that the merge is done correctly. In this case, the verbs of the selected narrator determine the merge; that is, the Halliday categories are *added* to the subject-verb rows (a left join), so the merged dataset has the same number of rows as the subject-verb dataset.

### Step 6: Remove Irrelevant Columns and Inspect Data

- Task: Use the **Select Columns widget** to remove any unnecessary columns for clarity and further analysis.
- Outcome: A clean dataset with only relevant columns (e.g., subject, verb, Halliday category).
- Widget: Select Columns → Data Table (7)

### Step 7: Calculate Frequency of Categories

- Task: Group the data by the Halliday categories using the **Group By widget** to calculate the frequency of each category (i.e., count how often each category appears in the data).
- Outcome: A summary of category frequencies based on the actions of the selected narrator.
- Widget: Group By → Data Table (6)
- Hint: Ensure the grouping is by the Halliday category and that the frequency variable is aggregated using *sum* or *count*.

### Step 8: Prepare Data for Plotting

- Task: Use the **Edit Domain widget** to adjust the variable type of the Halliday category for better plotting.
- Outcome: The categories are formatted for optimal display in the radar plot.

### Step 9: Create a Radar Plot

- Task: Use the **Python Script widget** to generate a radar plot visualizing the distribution of verb categories for the selected narrator.
- Outcome: A radar plot showing the distribution of verb categories used by the narrator will be generated.
- Hint: Use the script below to generate the radar plot in the **Python Script widget**:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

out_data = in_data
out_learner = None
out_classifier = None
out_object = None

if in_data is not None:
    try:
        # Extract the 'process' and 'process - Count' columns from the input data
        categories = in_data.get_column(in_data.domain["process"])
        freq_sum = in_data.get_column(in_data.domain["process - Count"])

        # Combine categories with the same label
        df = pd.DataFrame({'categories': categories, 'freq_sum': freq_sum})
        df_grouped = df.groupby(['categories']).sum()
        categories = list(df_grouped.index)
        freq_sum = df_grouped['freq_sum']

        # Convert categories to string
        categories = [str(cat) for cat in categories]

        # Number of categories
        N = len(categories)

        # Compute the angles for the polar bars
        angles = np.linspace(0, 2 * np.pi, N, endpoint=False)

        # Initialize polar bar plot
        fig = plt.figure(figsize=(8, 8))
        ax = fig.add_subplot(111, polar=True)

        # Plot bars on the polar plot
        bars = ax.bar(angles, freq_sum, width=0.4, color='skyblue', alpha=0.7)

        # Set the category names as the x-ticks
        ax.set_xticks(angles)
        ax.set_xticklabels(categories)

        # Optional: Customize y-axis labels or grid
        ax.yaxis.grid(True)

        # Display the polar bar plot
        plt.show()
    except KeyError as e:
        # str(e) already includes quotes around the missing column name
        print(f"Error: Column {e} not found in the data. Check the column names.")
```

This produces something like:

![Radar Plot](../../doc/widgets/images/radarplot_character.png)

### Conclusion

By following these steps, you will be able to create a radar plot based on Halliday's word categorization, showing how a specific character uses different verb types across a story. The plot summarizes the character's linguistic choices as the relative frequency of each verb category.
Finally, this makes it possible to compare different characters in the story based on their verb usage (cf. Andrade & Andersen, 2020).

### References

- Andrade, S. B., & Andersen, D. (2020). Digital story grammar: a quantitative methodology for narrative analysis. International Journal of Social Research Methodology, 23(4), 405-421.
- Halliday, M. A. K. (2004). An Introduction to Functional Grammar. Routledge.
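
### Appendix: Steps 5 to 7 as a pandas sketch

For readers who want to check the logic of Steps 5 to 7 outside Orange, the merge and the category count can be sketched in pandas. This is a minimal illustration, not part of the StoryNavigator workflow: the toy rows are made up, and the column name `verb` is an assumption (`subject` and `process` appear in the scripts above, but the verb column in your tables may be named differently).

```python
import pandas as pd

# Toy stand-in for the filtered subject-verb table from Step 4 (made-up rows;
# the column name 'verb' is hypothetical).
svo = pd.DataFrame({
    "subject": ["ik", "ik", "ik", "ik", "ik"],
    "verb": ["lopen", "denken", "zeggen", "lopen", "toveren"],
})

# Toy stand-in for the Halliday word list (made-up rows; 'process' holds the category).
word_list = pd.DataFrame({
    "verb": ["lopen", "denken", "zeggen"],
    "process": ["material", "mental", "verbal"],
})

# Step 5: left join, so every subject-verb row is kept and the category is added
# wherever the verb occurs in the word list.
merged = svo.merge(word_list, on="verb", how="left")

# Step 7: count rows per category. Rows without a category (NaN) are dropped
# by groupby's default behaviour.
counts = merged.groupby("process").size()
print(counts)
```

The left join keeps the row count of the subject-verb table unchanged. Verbs that are missing from the word list (here `toveren`) receive no category and therefore drop out of the count.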