Session Title: Meandre Workbench
Importance of the Topic
In the context of the Meandre environment, flows play a central role in orchestrating the work that needs to be performed by the various components to carry out a particular task. Typically, researchers use (and may construct) flows that allow them to extract, compute and visualize interesting information from their data sets. The Meandre Workbench provides a drag-and-drop visual environment for creating and executing flows. In this session we will discuss the workbench and provide hands-on training using the workbench to construct and modify flows.
Focus of the Topic
Upon completion of this session, participants will:
- Understand the notion of "Location" as a mechanism for importing external components and flows
- Be able to add, remove, rename, connect, and disconnect components in a flow, and change their properties
- Be able to construct and execute flows
- Be able to navigate and customize the Workbench interface
Format of the Session
- Presentation
- Demonstration
- Learning Exercises
- Discussion Questions
- Summary and Review
Presentation
- Slides can be found at 3-Meandre-Workbench.pptx
- User Manual for Meandre Workbench can be found at http://seasr.org/meandre/documentation/tools-manual/meandre-workbench/
Demonstration
- We will be demonstrating the use of the Workbench for creating flows
- Explore the NLP flows that we ran yesterday from the SEASR Community Hub
- Use TagCloudViewer as an example and explain how it was created
Learning Exercises
- Explore the functionality of the Meandre Workbench
- Open Meandre Workbench (WB) by navigating to http://localhost:1712 (if necessary, replace "localhost" with the name of the server where the Workbench is running)
- Use existing components to create a basic, data-driven Tag Cloud Viewer flow and become familiar with the mechanics of dragging and dropping components, creating connections, setting properties, saving, and executing
- Create a new tab in the WB by clicking on the first tab (with the yellow star)
- Retrieve text from a URL
- Expand the Components section of the WB (click on the + sign)
- Find the component named "Push Text" (scroll down or use the search box) and drag it onto the workspace
- Find the component named "Universal Text Extractor" and add it to the flow, as before
- Connect the output port "text" of "Push Text" to the input port "location" of "Universal Text Extractor" (click on each port to make a connection)
- Count the words
- Find the components "OpenNLP Tokenizer" and "Token Counter" and add them to the flow, as before
- Connect the output port "text" of "Universal Text Extractor" to the input port "text" of "OpenNLP Tokenizer"
- Connect the output port "tokens" of "OpenNLP Tokenizer" to the input port "tokens" of "Token Counter"
- Visualize with the Tag Cloud Viewer
- Find the components "Tag Cloud Image Maker", "HTML Fragment Maker" and "HTML Viewer" and add them to the flow
- Change the property named "encoding" of "HTML Fragment Maker" to read "image" (no quotes)
- Select the "HTML Fragment Maker" component by clicking on it
- Double click on "encoding" in the Details -> Properties panel on the right side of the WB and change the value by typing "image" (no quotes)
- After changing the text, press ENTER to accept the new value
- Connect the output port "token_counts" of "Token Counter" to the input port of "Tag Cloud Image Maker"
- Connect the output port "raw_data" of "Tag Cloud Image Maker" to the input port of "HTML Fragment Maker"
- Connect the output port "html" of "HTML Fragment Maker" to the input port of "HTML Viewer"
- Improve the Tag Cloud Flow that you created to "clean" it up a bit
- Convert all words to lower case
- Find the component "To Lowercase" and add it to the flow, connecting it between "Universal Text Extractor" and "OpenNLP Tokenizer"
- Click the output port "text" of "Universal Text Extractor" and then click the input port "text" of "To Lowercase" (this will remove the existing connection between "Universal Text Extractor" and "OpenNLP Tokenizer")
- Connect the output port of "To Lowercase" to the appropriate port of "OpenNLP Tokenizer"
- Remove stop words
- Add another "Push Text", "Universal Text Extractor" and "OpenNLP Tokenizer" to the flow, and connect them as before
- Set the "message" property of this second "Push Text" to read "http://repository.seasr.org/Datasets/Text/common_words.txt" (no quotes)
- Find and add the component "Token Filter" between "Token Counter" and "Tag Cloud Image Maker"
- Connect the output port "token_counts" of "Token Counter" to the input port "token_counts" of "Token Filter"
- Connect the output port "token_counts" of "Token Filter" to the input port of "Tag Cloud Image Maker"
- Connect the output port "tokens" of the second "OpenNLP Tokenizer" to the input port "tokens_blacklist" of "Token Filter"
- Filter to a specific number of words
- Find and add the component "Top N Filter" between "Token Filter" and "Tag Cloud Image Maker"
- Connect the output port "token_counts" of "Token Filter" to the input port of "Top N Filter"
- Connect the output port of "Top N Filter" to the input port of "Tag Cloud Image Maker"
- Set the property "n_top_tokens" of "Top N Filter" to a number representing the number of top tokens to be displayed (ranked by token count)
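For participants who find it easier to reason about the finished flow as a sequence of data transformations, the exercise above can be sketched in plain Python. This is only a rough analogue, not Meandre code: the function names `tokenize` and `tag_cloud_counts` are hypothetical, the regular-expression tokenizer is a crude stand-in for "OpenNLP Tokenizer", and the text is passed in directly instead of being fetched from a URL by "Universal Text Extractor". The comments indicate which component each step loosely corresponds to.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into word tokens (a crude stand-in for "OpenNLP Tokenizer")."""
    return re.findall(r"[A-Za-z']+", text)


def tag_cloud_counts(text, stop_words_text, n_top_tokens):
    """Return the n_top_tokens most frequent tokens as (token, count) pairs."""
    # "To Lowercase" followed by the first "OpenNLP Tokenizer"
    tokens = tokenize(text.lower())
    # "Token Counter": count how often each token occurs
    counts = Counter(tokens)
    # Second "Push Text" / "OpenNLP Tokenizer" branch: the stop-word list
    blacklist = set(tokenize(stop_words_text))
    # "Token Filter": drop every token that appears in the blacklist
    filtered = {tok: n for tok, n in counts.items() if tok not in blacklist}
    # "Top N Filter": keep the highest counts (ties broken alphabetically)
    ranked = sorted(filtered.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n_top_tokens]


if __name__ == "__main__":
    sample = "The cat and the hat. The cat sat."
    print(tag_cloud_counts(sample, "the and", 2))  # prints [('cat', 2), ('hat', 1)]
```

The resulting list of (token, count) pairs is the kind of data that "Tag Cloud Image Maker" would turn into an image; rendering is left out of the sketch.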
Attendee Project Plan
- Review project plan
- Modify and develop the project plan over the week
Discussion Questions
- What are three advantages of using a component-driven environment for text analytics?
- What are the possible obstacles for humanities scholars in using an environment like the Meandre Workbench to assemble and create flows for accomplishing their research needs?
- Are there parts of the workbench that are unclear or that need extra explanation?
- Do you have any feature requests?
- Are there any tools that you would like to see componentized such that you can work with these tools in the Meandre Workbench?