SEASR in Action: Data Analytics for Humanities Scholars
By Loretta Auvil and Boris Capitanu from NCSA
Contact Us: lauvil@illinois.edu or capitanu@ncsa.uiuc.edu
This course introduces participants to the Software Environment for the Advancement of Scholarly Research (SEASR), which provides the humanities, arts, and social science communities with a transformational cyberinfrastructure technology.
| Mon June 08 | Tues June 09 | Wed June 10 | Thu June 11 | Fri June 12 |
|---|---|---|---|---|
| SEASR Overview | Meandre Workbench | SEASR Analytics for Zotero | Installation and Development Tools | Future |
| Text Analytics | SEASR Applications | Creating Zotero Flows | Deployment of Flows | |
Session Title: SEASR Overview
Importance of the Topic
Many projects are making unprecedented volumes of information available; they are also creating a heterogeneous spectrum of disruptive tools that give researchers new and unforeseen ways of interacting with that information. This effort supports the development of a state-of-the-art software environment for data management and analysis of digital libraries, repositories, and archives, as well as educational platforms, that are expected to contribute to many fields in the humanities. We will provide an overview of, and motivation for, the SEASR project, with example applications and a brief overview of the technologies developed.
Focus of the Topic
Upon completion of this session, participants will understand:
- What SEASR is
- What SEASR can do
- Example applications that leverage SEASR
- How to execute SEASR flows from the Community Hub
Format of the Session
- Introduction of Attendees
- Attendee Expectation
- Presentation
- Demonstration
- Learning Exercise
- Discussion Questions
- Summary and Review
Attendee Expectation
- Explore tool usage during learning exercises
- Participate in discussion
- Develop a plan for how SEASR could benefit your work
- Present and discuss plan on Friday
Presentation
- Link to these Course Notes online can be found at http://dev-tools.seasr.org/confluence/display/Outreach/DHSI-SEASR

- Slides can be found at http://dev-tools.seasr.org/confluence/display/Outreach/Presentations

Demonstration
- Community Hub
- Keyword Cloud Functionality
- Tag Cloud Viewer
Learning Exercises
- Explore Community Hub's Keyword Cloud Functionality
- Open browser and go to http://seasr.org

- Click on "View Projects"
- Click on "Keyword Cloud"
- Click on "visualization" to see all the existing applications that have a tag of "visualization"
- Click on "cluster" to see all the existing applications that have a tag of "visualization" and "cluster"
- Click on the delete button to remove "cluster" from the selection
- Click on the "Tag Cloud Viewer" for more detailed information about this application
- Perform analysis using "Tag Cloud Viewer" on a hard coded web page
- Open browser and go to http://seasr.org/documentation/example-flows/tag-cloud-viewer/

- Click on the "Execute" button to launch the creation of a tag cloud view for "Emma" by Jane Austen retrieved from Project Gutenberg
- Perform analysis using "Tag Cloud Viewer" on a webpage of your choice
- Open browser and go to http://seasr.org/documentation/example-flows/tag-cloud-viewer/

- Find a web url that you are interested in analyzing
- Click on the "Custom Execute" button to launch the application where you can copy and paste a web url that you are interested in analyzing
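The tag selection in the first exercise behaves like an intersection filter: each additional tag narrows the list to applications that carry all selected tags, and deleting a tag widens it again. A minimal Python sketch of that idea follows. The catalogue contents and the `filter_by_tags` helper are assumptions made for illustration; they do not reflect the Community Hub's actual data or implementation.

```python
# Hypothetical catalogue: application name -> set of tags.
# The tag assignments are invented for illustration only.
APPLICATIONS = {
    "Tag Cloud Viewer": {"visualization", "text"},
    "Text Clustering": {"visualization", "cluster", "text"},
    "Entity Extraction": {"text", "extraction"},
}


def filter_by_tags(applications, selected_tags):
    """Return the sorted names of applications that carry every selected tag."""
    wanted = set(selected_tags)
    # `wanted <= tags` is a subset test: all selected tags must be present.
    return sorted(name for name, tags in applications.items() if wanted <= tags)


if __name__ == "__main__":
    selection = ["visualization"]
    print(filter_by_tags(APPLICATIONS, selection))

    selection.append("cluster")   # selecting a second tag narrows the result
    print(filter_by_tags(APPLICATIONS, selection))

    selection.remove("cluster")   # the delete button widens it again
    print(filter_by_tags(APPLICATIONS, selection))
```

With an empty selection the subset test is true for every application, so the full catalogue is returned.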
Discussion Questions
- What are data repositories that you utilize in your scholarly research?
- What tools or applications are being utilized against these repositories?
Summary and Review
Session Title: Text Analytics Overview
Importance of the Topic
Text analytics is important to any humanities scholar who is interested in increasing the efficiency of their efforts or in exploring new research questions that are difficult to address without technology. This session will provide an overview of text analytics, including part-of-speech tagging. We will look at example applications of clustering, frequent pattern analysis, and entity extraction. We will also look at the Meandre Server Interface.
Focus of the Topic
Upon completion of this session, participants will understand:
- What Text Analytics can do
- Example text analytics applications that leverage SEASR
- What the Meandre Server Interface is
Format of the Session
- Presentation
- Demonstration
- Learning Exercise
- Discussion Questions
- Summary and Review
Presentation
- Slides can be found at http://dev-tools.seasr.org/confluence/display/Outreach/Presentations

Demonstration
- Meandre Server Interface
- Tag Cloud Viewer, Text Clustering, Entity Extraction
Learning Exercises
- Explore Meandre Server Interface
- Open browser and point to "http://SERVER:1714/public/services/ping.html"
- Explore flows
- Execute flows
- Tune and execute flows
- Other functionality
- Execute the "Text Clustering" flow on a hard coded web page
- Click "flows"
- Under "Action", click "run"
- Execute the "Text Clustering" flow on a webpage of your choice
- Click "flows"
- Under "Action", click "tune&run"
- On the webpage, replace the "http://www.gutenberg.org/files/22925/22925.txt" with a web url of interest to you.
Discussion Questions
- Identify and discuss three other text tools that could be useful in the humanities.
- What are the obstacles to using this technology for text analysis - what will your colleagues say?
Summary and Review
Session Title: Meandre Workbench
Importance of the Topic
In the Meandre environment, flows play a central role, orchestrating the work that the various components need to perform to carry out a particular task. Typically, researchers use (and may construct) flows that allow them to extract, compute, and visualize interesting information from their data sets. The Meandre Workbench provides a drag-and-drop visual environment for creating and executing flows.
Focus of the Topic
Upon completion of this session, participants will:
- Understand the notion of "Location" as a mechanism for importing external components and flows
- Be able to add, remove, rename, connect, disconnect components in a flow, and change their properties
- Be able to construct and execute flows
- Be able to navigate and customize the Workbench interface
Format of the Session
- Presentation
- Demonstration
- Learning Exercise
- Discussion Questions
- Summary and Review
Presentation
- Slides can be found at http://dev-tools.seasr.org/confluence/display/Outreach/Presentations

- User Manual for Meandre Workbench can be found at http://seasr.org/meandre/documentation/tools-manual/meandre-workbench/

Demonstration
- We will be demonstrating the use of the Workbench for creating flows
- Use TagCloudViewer as an example and explain how it was created
Learning Exercises
- Explore the functionality of the Meandre Workbench
- Use existing components to create a data-driven flow for a basic Tag Cloud Viewer, to become familiar with the mechanics of drag-and-drop, creating connections, setting properties, saving, and executing
- Retrieve text from a url
- Count the words
- Visualize with the Tag Cloud Viewer
- Improve the Tag Cloud Flow that you created to "clean" it up a bit
- Filter HTML tags from the text
- Convert all words to lower case
- Remove stop words
- Filter to a specific number of words
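The two exercises above amount to a small text-processing pipeline. The sketch below mirrors those steps in plain Python rather than with Meandre components, so it only illustrates the logic and is not how the Workbench flow is implemented. The names `fetch_text`, `strip_html`, and `word_counts`, and the short `STOP_WORDS` list, are hypothetical choices made for this example.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.request import urlopen

# A deliberately tiny stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "was"}


class _TextExtractor(HTMLParser):
    """Collects the text between HTML tags and discards the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def fetch_text(url):
    """Retrieve the content of a URL as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def strip_html(markup):
    """Filter HTML tags from the text (script and style bodies are not removed)."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.chunks)


def word_counts(markup, top_n=50):
    """Return the top_n (word, count) pairs after cleaning the input."""
    text = strip_html(markup).lower()                 # lower-case everything
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text)   # simple ASCII tokenizer
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)                  # keep a fixed number of words
```

Calling `word_counts(fetch_text(url))` on a URL of your choice yields the word and count pairs that a tag cloud would size its words by; the visualization itself is left to the Tag Cloud Viewer.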
Discussion Questions
- What are three advantages of using a component driven environment for text analytics?
- What are the possible obstacles for humanities scholars in using an environment like the Meandre Workbench to assemble and create flows for accomplishing their research needs?
- Are there parts of the workbench that are unclear or that need extra explanation?
- Do you have any feature requests?
- Are there any tools that you would like to see componentized such that you can work with these tools in the Meandre Workbench?
Summary and Review
Session Title: SEASR Applications
Importance of the Topic
There are many tools that can be developed for analyzing text, audio, images, video and more. It is important to have a general understanding of capabilities so that research questions can be posed for reasonable analysis. This session will introduce you to several applications that can be used to spark interest and to think of other potential applications.
Focus of the Topic
Upon completion of this session, participants will:
- Experience an example of audio analysis that uses SEASR
- Observe several more examples of text analysis that use SEASR
Format of the Session
- Presentation
- Demonstration
- Learning Exercise
- Discussion Questions
- Summary and Review
Presentation
- Slides can be found at http://dev-tools.seasr.org/confluence/display/Outreach/Presentations

Demonstration
- Son of Blinkie from the NEMA Project
- MONK
- Emotion Tracking
Learning Exercises
Discussion Questions
- What part of these applications can be useful to your research?
Summary and Review
Session Title: SEASR Analytics for Zotero
Importance of the Topic
Leveraging the extensive capabilities of the Meandre engine is not limited to the Workbench and the Meandre Server Interface. Zotero is a very popular Firefox extension among humanities researchers, allowing them to collect, manage, and cite their research sources. This session will show how to leverage the capabilities of the Meandre engine from Zotero, allowing researchers to bring the power of analytics to their Zotero collections.
Focus of the Topic
Upon completion of this session, participants will:
- Be able to install and use the SEASR Analytics extension for Zotero
- Understand what happens behind the scenes: what information is pulled from Zotero and transmitted to a Meandre flow for analysis
Format of the Session
- Presentation
- Demonstration
- Learning Exercise
- Discussion Questions
- Summary and Review
Presentation
- Slides can be found at http://dev-tools.seasr.org/confluence/display/Outreach/Presentations

- Documentation page can be found at http://seasr.org/documentation/zotero/

Demonstration
- We will be demonstrating how to install and use the SEASR Analytics extension for Zotero
Learning Exercises
- Have participants run some of the Zotero-enabled flows on their own collections
Discussion Questions
- What kinds of data assets would you be creating in Zotero?
- What other analysis would you like to use against this data?
Summary and Review
Session Title: Creating Flows for Zotero
Importance of the Topic
This session will provide information on expanding the analytical capabilities provided by the Zotero extension. Researchers need to be able to create and run custom analytical algorithms on their Zotero collections in order to address meaningful research questions about their data.
Focus of the Topic
Upon completion of this session, participants will:
- Be able to adapt an existing flow (or create a new flow) that will perform some analysis on the items in a collection
- Understand how the configuration mechanism for enabling flows in the SEASR Analytics extension works
Format of the Session
- Presentation
- Demonstration
- Learning Exercise
- Discussion Questions
- Summary and Review
Presentation
- Slides can be found at http://dev-tools.seasr.org/confluence/display/Outreach/Presentations

Demonstration
- We will go through an example of what a Zotero-enabled flow looks like and what's special about it
- We will show how to modify an existing Zotero-enabled flow and how to "deploy" it so that it can be leveraged within Zotero
Learning Exercises
- Create a new flow (or adapt an existing flow) using the Meandre Workbench that performs some simple analysis and "deploy" it for access by Zotero
- We can use the flow we constructed in an earlier session as a base
- Execute this flow
- Change the configuration of SEASR plugin so that it knows how to access this flow
Discussion Questions
- What improvements would you like to see in the SEASR-Zotero integration, both in terms of functionality and user interaction experience?
Summary and Review
Session Title: Installation and Development Tools
Importance of the Topic
This session will focus on how participants can quickly get SEASR up and running on their own computers. The presentation will touch on how to obtain and install the SEASR applications, how to access the tools that are available, and how best to keep a communication channel open with the SEASR team for providing feedback and obtaining support.
Focus of the Topic
Upon completion of this session, participants will understand:
- Where to obtain the latest version of the SEASR software packages
- How to install/uninstall SEASR on various operating systems
- How to start/stop SEASR and what to do if something goes wrong
- How to communicate feedback (feature requests, bug reports, etc.) to the SEASR team
- What tools are available to help facilitate component development efforts
- Where to find documentation about anything SEASR related
Format of the Session
- Presentation
- Demonstration
- Learning Exercise
- Discussion Questions
- Summary and Review
Presentation
- Slides can be found at http://dev-tools.seasr.org/confluence/display/Outreach/Presentations

Demonstration
- Installation of Meandre
- Meandre Eclipse Plugin
- JIRA, Confluence, Bamboo - what they are and what we use them for
Learning Exercises
- Have participants download and install SEASR on their personal computers
- Have participants sign up for accounts to access the SEASR suite of Atlassian tools
- Use JIRA to log a support request
Discussion Questions
- What challenges (if any) would scholars have installing the SEASR software?
- Do you see your institution's IT department running the SEASR environment or would it be your research group?
Summary and Review
Session Title: Deployment of Flows
Importance of the Topic
Meandre's success relies on its ability to easily assemble data-intensive flows. Flows may be easily assembled by using the Meandre Workbench, an icon-based programming environment. Although we use RDF to describe flows and components, RDF is not very human readable, so the ability to script the creation of a flow is important. We will describe ZigZag, the scripting language for Meandre, and demonstrate how easy it makes the deployment of flows. ZigZag targets rapid application development by speeding up the flow construction cycle and addressing parallelization.
Focus of the Topic
Upon completion of this session, participants will:
- Understand how ZigZag can accelerate the data-intensive flow development cycle and how it can be used to create a self-contained flow task for later execution
- Understand the usage of the ZigZag scripting language for Meandre
- Know how to generate a ZigZag script from an existing flow
- Know how to indicate the level of parallelism for improving performance
- See how the ZigZag scripts are deployed for Zotero and Fedora
Format of the Session
- Presentation
- Demonstration
- Learning Exercise
- Discussion Questions
- Summary and Review
Presentation
- Slides can be found at http://dev-tools.seasr.org/confluence/display/Outreach/Presentations

- Tutorial can be found at http://dev-tools.seasr.org/confluence/display/COOK/ZigZag+Tutorial

Demonstration
- Usage of ZigZag
- Compiling and executing flows using ZigZag
- Usage of ZigZag for Zotero-enabled flows
- Usage of ZigZag for Fedora flows
Learning Exercises
- Open an existing ZigZag flow
- Convert your flow from yesterday to ZigZag
- Compile the script
- Execute the script
Discussion Questions
- Which environment would you most likely use, the Meandre Workbench or the ZigZag scripting language?
Summary and Review
Session Title: Future
Importance of the Topic
We want to describe where the SEASR team is headed in future development. The ability for researchers to find components and flows is important, and we will describe some future work to enable this discovery process. We will also highlight some additional data and computation proposals that are likely to be funded. We want to hear how SEASR could benefit your work.
Focus of the Topic
Upon completion of this session, participants will understand:
- Planned future enhancements and developments for the SEASR project including features for Meandre Server and Meandre Workbench
- How SEASR Central can enhance the discovery process
- How other scholars plan to use SEASR to aid their research
Format of the Session
- Presentation
- Discussion Questions
- Summary and Review
Presentation
- Slides can be found at http://dev-tools.seasr.org/confluence/display/Outreach/Presentations

- Participant plan presentations
Discussion Questions
- How can SEASR benefit my research?
- What does SEASR need to look like for the future of humanities research?
- What scholarly questions from my own research bear on what to do with a million books?