
A Filtering Engine For Large Conceptual Schemas

A PhD Thesis by Antonio Villegas
by Antonio Villegas on 11 March 2013

Transcript of A Filtering Engine For Large Conceptual Schemas

A Filtering Engine for Large Conceptual Schemas
Antonio Villegas, Antoni Olivé, Maria-Ribera Sancho

Outline
  • Conceptual Modeling in the Large (Research Problem)
  • Introduction (Motivation)
  • Filtering Engine for Large Conceptual Schemas (Research Contribution)
  • Filtering Method for Large Conceptual Schemas (Research Contribution)
  • Adaptation to HL7 V3 (Research Contribution)
  • Conclusions & Further Work

Conclusions

Publications
Thesis parts: Introduction; Conceptual Modeling in the Large; Filtering Method for Large Conceptual Schemas; Filtering Engine for Large Conceptual Schemas; Adaptation to HL7 V3
  • General Filtering Method for Large Conceptual Schemas and Catalog of Specific Filtering Requests for Large Conceptual Schemas: ER 2010 (Conference), ER 2012 (Conference)
  • Filtering Engine for Large Conceptual Schemas: ER 2011 (Demo), ER 2012 (Demo)
  • Relevance Metrics for Large Conceptual Schemas: ER 2008 Workshops; Conceptual Modelling and Its Theoretical Foundations 2012 (Book Chapter); J.UCS 2010 (Journal)
  • Adaptation of the Filtering Methodology to HL7 V3 schemas: IEEE Services 2010 (Conference)

Co-Direction of Degree Final Projects
  • Visualization of UML schemas with HTML5 (2012): Jose Maria Gomez, Maria-Ribera Sancho, Antonio Villegas
  • Transformation of standard healthcare models from HL7 to UML (2011): David Ortiz, Antoni Olivé, Antonio Villegas

Future Research Directions
  • Extend the filtering catalog with additional filtering requests
  • Extend the relevance metrics taking into account instances
  • Validation with real users
  • Combine the filtering methodology with existing
approaches from the literature

Filtering Methodology for Large Conceptual Schemas
[Diagram labels: Problem; Extraction of Relevant Knowledge from Large Conceptual Schemas; Conceptual Modeling in the Large; Usability of Large Conceptual Schemas; Information Retrieval; Contribution; Problem Relevance; Design as an Artifact; Catalog of Filtering Requests; Web-based Filtering Engine; Relevance Metrics; Adaptation to HL7 V3; Research Contributions; Research Community; Research Rigor; Communication of Research; State of the art (Conceptual Modeling, Clustering, Summarization, Relevance, Visualization); Case Studies (Real World Large Conceptual Schemas, Industrial Standard from Healthcare); Design Evaluation; Design as a Search Process]

Software Engineering
"More and more, individuals and society rely on advanced software systems" "We need to be able to produce reliable and trustworthy systems economically and quickly" Sommerville, I. Software Engineering. 9th Edition. Addison-Wesley (2010)

Conceptual Modeling
An essential software engineering activity.

Conceptual Schema
The conceptual schema is the general knowledge that an information system needs to know in order to perform its functions.

Large Conceptual Schemas
Organizations require the management and maintenance of large amounts of knowledge from their domains of interest. The growth in the amount of knowledge to deal with also has an impact on the size of the conceptual schemas of information systems, making them larger. The sheer size of those schemas transforms them into very useful artifacts for the communities and organizations for which they are developed. However, the size of the schemas and their overall structure and organization make it difficult to manually extract knowledge from them, to understand their characteristics, and to change them.

Filtering
The aim of information filtering is to expose users to only information that is relevant to them in accordance with their needs, and to help them use their limited time and information processing capacity most optimally.

Thesis Goal
Provide a filtering engine for very large conceptual schemas to help users easily extract from them the most relevant knowledge for a particular purpose.

The Challenge
The development and management of (very) large conceptual schemas poses specific problems that are not encountered when dealing with small conceptual schemas. Olivé A., Cabot J. A Research Agenda for Conceptual Schema-Centric Development. Conceptual modelling in information systems engineering. Springer (2007)
The conceptual schema of a large organization may contain thousands of entity types, relationship types, constraints, and so on. We need methods, techniques and tools to support conceptual modellers and users in the development, reuse, evolution and understanding of large schemas.

Filtering Large Conceptual Schemas
A filtering methodology for
large conceptual schemas
A filtering engine for large conceptual schemas

[Diagram labels: Design Science Research; State of the art (Conceptual Modeling in the Large / Relevance Metrics; Metrics over UML/OCL conceptual schemas); Case Studies (Real World Large Conceptual Schemas; Industrial Standard from Healthcare); Major Contributions; three arrows labelled "contributes"]

Filtering Large Conceptual Schemas
Users: Domain Expert, Member of Maintenance Team, Software Tester, Database Designer, Conceptual Modeler

Assistance for Exploring & Understanding Large Conceptual Schemas in the existing literature

Clustering Methods
Classify elements in groups (clusters) according to a similarity function.
  • Hierarchical Clustering: static tree structure; bottom-up approach; manual construction; applied to ER diagrams. Example: Teorey et al. (1989). Literature: Feldman and Miller (1986) ··· Teorey et al. (1989) ··· Moody (1999, 2000) ··· Shoval et al. (2002, 2004)
  • Plain Clustering: clustering based on metrics over the structure of the schema; static structure; semi-automatic construction; applied to ER diagrams. Example: Tavana et al. (2007). Literature: Francalanci and Pernici (1994) ··· Akoka and Comyn-Wattiau (1996) ··· Campbell et al. (1996) ··· Tavana et al. (2007)

Relevance Methods
Apply a ranking function to the elements in order to obtain an ordered list of them according to their general relevance score.
  • Link-Analysis Methods: relevance based on the links connecting elements. Use relationships as links. The relevance of an element is the sum of a proportional share of the relevance of the elements that link to it. Literature: Geerts et al. (2004) ··· Kleinberg (1999) ··· Brin and Page (1998) ··· Tzitzikas et al. (120, 121) ··· Hsi et al. (2003)
  • Occurrence Counting Methods: relevance based on the number of properties of the element and the links in which it participates. Example: Castano et al. (1998). Literature: Castano et al. (1998) ··· Tzitzikas et al. (121) ··· Hsi et al. (2003)

Summarization Methods
Compute a summary of the large schema with only the elements that are more relevant or general.
  • Coverage Methods: combination of relevance and coverage metrics using a link-analysis approach; applied to database schemas. Example: Yu and Jagadish (2006). Literature: Yu and Jagadish (2006) ··· Yang et al. (2009)
  • Rule-based Abstraction: apply abstraction rules to a set of model elements in order to replace them by a less complex, more abstract element. Example: Egyed (2002). Literature: Egyed (2002)

Filtering
Expose to users only the elements that are relevant to them according to their needs. The main idea is to extract a reduced and self-contained view from the large schema, that is, a filtered conceptual schema with the knowledge of interest to the user.

Filtering Approach Comparison

                      Clustering          Relevance           Summarization       Filtering
Input                 Schema              Schema              Schema              Schema and user request
Output                Clustered Schema    Ranking             Schema Summary      Filtered Schema
Retrieved Knowledge   General             General             General             Specific (depends on user)
User Interaction      Static Exploration  Static Exploration  Static Exploration  Dynamic Request / Response Cycle

A relevant challenge
open to new contributions

Conceptual Modeling in the Large
  • Lack of contributions to deal with UML/OCL conceptual schemas. Most of the contributions are applied to ER diagrams or database schemas. The number of methods to reduce or restructure conceptual schemas defined using UML/OCL is very small.
  • Only part of the knowledge of conceptual schemas is taken into account. Concepts like constraints, derivation rules, and specification of events or operations are commonly avoided. The more knowledge used, the more complete the results will be.

Filtering Engine
  • Web Architecture: any user interested in making use of our tool only requires a web browser to access it
  • Service-Oriented Architecture: SOAP + WSDL
  • Development Technologies: Eclipse Web Tools Platform + Axis 2 + Tomcat

There are several ways of designing and coding a filtering system following our catalog of filtering requests. Our proposal provides a minimum working application. Our service-oriented approach provides ways to easily extend the functionalities of the filtering engine:
  • Implement a new filtering request
  • Design a new client view
Public API

[Section titles: Catalog of Filtering Requests; Application of the Filtering Methodology; Filtering Method for Large Conceptual Schemas; Metrics for Large Conceptual Schemas; Filtering Large Conceptual Schemas; Fundamentals; HL7 Version 3 Standard; Filtering Methodology for HL7 V3; Experimentation]

Catalog of Filtering Requests
Different kinds of elements can be part of the focus set.
  • F1 Filtering Request for Entity and Relationship Types: the user focuses on a small set of entity and relationship types
  • F2 Filtering Request for Schema Rules
  • F3 Filtering Request for Event Types: the user focuses on a small set of event types
  • F4 Filtering Request for a Conceptual Schema: the user focuses on a fragment of the large schema
  • F5 Filtering Request for Context Behavior of Entity Types: the user focuses on a small set of entity types
  • F6 Filtering Request for Contextualized Types: the user focuses on a small set of entity and event types
and provides a contextualization function over them

Focus on Filtering Large UML/OCL Conceptual Schemas
  • UML/OCL are the de facto standard modeling languages
  • Take into account more knowledge over the schema than existing methods in the literature

User-directed Filtering Approach
  • Align filtering results to the user's needs
  • Provide an interactive and iterative request/response filtering cycle

A User-centered View of Relevance
Interest is a linear combination of Importance and Closeness.

General Importance
  • Occurrence Counting Methods: number of attributes; number of relationships in which the element participates; number of generalizations in which it participates
  • Link-Analysis Methods: number of attributes; links through relationships in which the element participates; links through generalizations in which it participates
  • Added to the existing methods: number of OCL navigations in which the element participates; links through OCL navigations in which it participates

Schema Rules (OCL Expressions)
Cardinality constraints, derivation rules, invariants, pre- and postconditions, event effects. Commonly ignored. We use them!

OCL Navigations
context E1 inv: self.e2.e3 -> ... (navigation through E1, E2, E3)
A lot of them in large schemas. We can extend the existing methods without changing their approaches.

By using Importance, all the users will obtain the same results regardless of their specific knowledge requirements.

The Closeness of an element is inversely proportional to the distance of that element to the elements in the focus set through relationship types and generalizations.

Interest: combine importance with user focus
If E2, E3, and E4 all have the same importance, the most interesting is the closest one. Alternatively, Interest as a combination of Importance and Closeness.
[Figure labels: E1; E2, E3, E4; distance to A: 1, 2, 3]

Filtering Method for Large Conceptual Schemas
  • Stage 1: Metrics Processing
  • Stage 2: Entity and Event Types Processing
  • Stage 3: Relationship Types Processing
  • Stage 4: Generalizations Processing
  • Stage 5: Schema Rules Processing
  • Stage 6: Data Types Processing
  • Stage 7: Presentation

Input of a filtering request
  • Focus set: includes the schema elements the user wants to focus on
  • Rejection set: includes the schema elements the user denotes as not interesting for her knowledge request
  • Size threshold: denotes how many schema elements the user wants to obtain as output
  • Importance method: indicates which importance method needs to be used

The knowledge contained in the filtered schema
is a subset of the knowledge from the large schema

Magento Conceptual Schema (www.magentocommerce.com)
  • 218 entity types
  • 187 event types
  • 983 attributes
  • 165 generalizations
  • 319 relationship types
  • 893 general constraints
  • 69 pre- and postconditions

Running example request: Focus Set = { LogIn, LogOut, Customer }; Rejection Set = { }; Size Threshold = 6; Importance method = Link-Analysis

Select Elements from the Interest Ranking until reaching the Size Threshold
Size Threshold = 6
Filtered Entity Types = { Customer, StoreView, Website, Product }
Filtered Event Types = { LogIn, LogOut }

Referentially-Complete Relationship: a relationship type whose participants are all included in the filtered conceptual schema.
Referentially-Partial Relationship: a relationship type with some participants inside the filtered schema and others outside, where all of those outside have descendants that are included in the filtered schema.
Projection of Referentially-Partial Relationship
In our running example, the method obtained 16 referentially-complete relationship types, including 4 association classes. In addition, our filtering method obtains a referentially-partial relationship type that must be projected.

Process to filter generalization relationships

Select the data types that are referenced or used by the elements previously included in the resulting filtered conceptual schema.

Referentially-Complete Schema Rules: a schema rule is referentially complete if all its participants belong to the filtered schema.
Referentially-Incomplete Schema Rules: a schema rule with participants outside the filtered schema.
Select the Schema Rules whose Context is an element
that belongs to the Filtered Schema
  • Referentially-incomplete rule: include the rule name as a reference in the filtered schema
  • Referentially-complete rule: include the rule in the filtered schema
Running example: 5 Referentially-Complete Schema Rules; 50 References to Referentially-Incomplete Schema Rules

5 enumeration types + 5 data types
  • Enumeration types: WeekDay, Status, BackOrderPolicy, ProductStatus
  • Data types: PhoneNumber, DateTime, Date, Time, Address

Experimental Evaluation

Case Studies

                          Magento                  osCommerce          UML Metaschema  EU-Rent
Entity types              218                      84                  293             65
Event types               187                      262                 -               120
Attributes                983                      209                 93              85
Generalizations           165                      393                 355             207
Relationship types        319                      183                 377             152
General constraints       893                      204                 107             107
Pre- and postconditions   69                       220                 9               166
Source / description      www.magentocommerce.com  www.oscommerce.com  www.uml.org     Car Rental System

Effectiveness and Efficiency
A good method does not only need to be useful,
but it also needs to obtain the results in an acceptable time according to the user's expectations

Efficiency
Record the time lapse between the selection of a focus set and the presentation of the filtered schema. Response time (in milliseconds) obtained by an Intel Core 2 Duo 3GHz processor with 4GB of DDR2 RAM.
Results
  • Mean value of the response time less than 45 milliseconds
  • The largest observations are below 150 milliseconds
  • The resulting times are short enough for our purpose

Effectiveness: Filtering Utility Factor
The bigger the size reduction our method obtains, the better. We compare the final size of the filtered schema with the size of the contextual schema.
Contextual Schema: the portion of the large schema the user needs to manually explore in order to cover the elements referenced by the filtered schema, starting from the elements within the focus set. The user explores:
  • complete generalization paths
  • relationship types without projections
We define the filtering utility factor between the filtered conceptual schema FCS and the contextual conceptual schema CCS as follows:

\[ \text{filtering utility factor}(FCS, CCS) = 1 - \frac{\text{size}(FCS)}{\text{size}(CCS)} \]

where the size(CS) of a conceptual schema CS counts:
  • the number of entity and event types in CS
  • the number of relationship types in CS
  • the number of attributes in CS
  • the number of generalization relationships in CS
Results
  • Mean value of the filtering utility factor greater than 0.6 in most cases
  • Size reduction greater than 60% using filtered schemas instead of exploring the whole schema manually
[Figure labels: 660 entity types, 1031 relationship types; 1453 schema rules; 569 event types; 569 event types; 1453 schema rules; 5166 event types; 367 entity types]

The e-Commerce Case Study
Perform a filtering-driven conceptual schema comparison between osCommerce and Magento in order to support the process of selecting one of them. Show that our filtering methodology can assist users through the comparison between two large conceptual schemas.
There are 40 event types (15% of osCommerce's total and 21% of Magento's)
and 36 entity types (43% of osCommerce's total and 16% of Magento's) that are specified in both conceptual schemas with different characteristics but sharing the same name.
[Slide labels: Purpose; Filtering Scenario; osCommerce and Magento filtered schemas for F1 - Filtering Request for Entity and Relationship Types and F3 - Filtering Request for Event Types]

Lessons Learned
From the Comparison
  • The Magento framework provides a more detailed approach to the e-commerce business than the one proposed by osCommerce
  • The configuration of the osCommerce environment is easier than Magento's
  • Small-to-Medium Organizations: osCommerce. Medium-to-Large Organizations: Magento
From the Use of Filtering
  • New Users; Experienced Users
  • F1 - Filtering Request for Entity and Relationship Types; F5 - Filtering Request for Context Behavior of Entity Types; F3 - Filtering Request for Event Types; F2 - Filtering Request for Schema Rules

The UML Metaschema Case Study
Compare the contents of the OMG's formal specification document of the UML with the results of applying a set of filtering requests to its conceptual schema. Show that our filtering methodology can also be useful even when there is good-enough documentation with the formal specification of a given large conceptual schema.
[Slide labels: Purpose; Filtering Scenario; UML Metaschema (www.uml.org)]
The document with the UML metaschema definition includes more than 700 pages with the technical aspects of this formal specification. Although extensive cross-references are provided, users may lose their context through navigation.
[Slide labels: UML Metaschema Specification Document; P.127; P.32; F2 - Filtering Request for Schema Rules]

Lessons Learned
  • Having good documentation for a large conceptual schema is useful to inexperienced users
  • There are situations where a user is interested in several aspects of a large schema that are defined in different places of the documentation
  • A more dynamic exploration approach saves time and reduces the searching effort a user must dedicate
[Slide labels: From the Comparison; Filtering Methodology; From the Use of Filtering; Users; Formal Specification Document; F1 - Filtering Request for Entity and Relationship Types; F2 - Filtering Request for Schema Rules]

Time Analysis
Record the time lapse between
the selection of a focus set and the presentation of the filtered schema.
It is expected that as we increase the size of the focus set, the time will increase linearly. Our method computes the distances from each element in the focus set to all the other elements (closeness measure). This computation requires the same time (on average) for each element in the focus set.
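
The distance computation described above can be sketched as one breadth-first traversal per focus element, which is why the cost is expected to grow linearly with the focus set. This is a minimal sketch under assumptions, not the thesis implementation: the graph `SCHEMA_GRAPH`, the element names E1 to E4, the function names, and the concrete closeness formula (size of the focus set divided by the sum of distances) are all hypothetical. The transcript only states that closeness is inversely proportional to distance.

```python
from collections import deque

# Hypothetical schema graph: E1..E4 are placeholder element names and the
# edges stand for relationship types or generalizations between them.
SCHEMA_GRAPH = {
    "E1": ["E2"],
    "E2": ["E1", "E3"],
    "E3": ["E2", "E4"],
    "E4": ["E3"],
}


def distances_from(start, graph):
    """Breadth-first search: number of hops from `start` to each reachable element."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in graph.get(current, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return dist


def closeness(focus_set, graph):
    """Closeness of every non-focus element to the focus set.

    Assumed formula: |focus set| / (sum of distances to the focus elements).
    Elements that cannot reach some focus element get closeness 0.0.
    """
    if not focus_set:
        raise ValueError("the focus set must contain at least one element")
    # One traversal per focus element: this is the part whose cost grows
    # linearly with the size of the focus set.
    per_focus = [distances_from(f, graph) for f in focus_set]
    scores = {}
    for element in graph:
        if element in focus_set:
            continue
        if any(element not in dist for dist in per_focus):
            scores[element] = 0.0
            continue
        total = sum(dist[element] for dist in per_focus)
        scores[element] = len(per_focus) / total
    return scores
```

Calling `closeness({"E1"}, SCHEMA_GRAPH)` ranks E2 above E3 and E3 above E4, matching the idea that, at equal importance, the closest element is the most interesting one.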

Therefore, the more elements we have in a focus set, the more time our method spends computing distances. This is the reason for the expected linear increase.

According to the expected use of our method, having a focus set of 40 elements is not a common situation (although possible). Focus sets of up to 10 elements are more realistic, in which case the average time does not exceed one second.

Precision Analysis
The percentage of relevant knowledge presented to the user. We use the concept of precision applied to HL7 V3 domain models:

\[ \text{Precision} = \frac{\#\,\text{relevant elements retrieved}}{\#\,\text{elements retrieved}} \]

Each domain model contains a main element, which is the central point of knowledge for the users interested in that domain. A common situation for a user is to focus on the main element of a domain and to navigate through the domain model to understand its related knowledge. We simulate the generation of a domain model from its main element.

Initialization
  • Focus Set = main element of the original domain model
  • Size Threshold = number of elements of the original domain model
This way, we obtain a filtered schema with the same number of elements as the original domain model.

Following Iterations
  • Focus Set = main element of the original domain model
  • Size Threshold = number of elements of the original domain model
  • Rejection Set = non-relevant elements retrieved in the previous iteration

The test reveals that to reach more than 80% of the relevant elements of a domain model, only three iterations are required.

Transformation of HL7 V3 into UML
Adaptation of the Filtering Methodology
Support a large conceptual schema as
a large set of small-to-medium schemas
  • Easier to automatically extract knowledge from HL7 V3 and explore its schemas
  • HL7 V3 schemas can be used with a wide range of existing UML tools
  • Conceptual modelers from the software engineering area can easily contribute to the development of HL7 V3
  • Improve the Quality of the HL7 V3 Standard

HL7 Version 3
Health Level Seven International (HL7) is the global authority on standards for interoperability of health information technology with members in over 55 countries. The HL7 Version 3 messaging standard defines a series of electronic messages (called interactions) to support all healthcare workflows.
  • 1 Reference Information Model
  • 22 Domain Models
  • 870 Refined Models
  • 573 Interactions

Issues of HL7 V3
  • HL7 V3 models contain special constructions specified in a non-UML-compliant modeling language
  • The complexity of exploring, understanding and implementing the HL7 V3 models within the standard is high

[Slide labels: Filtering Methodology; ATL-based Model-to-Model Transformation; Patient Billing Account Model (FIAB_RM010000UV02) in HL7 V3 and in UML; HL7 V3 models]
Example request: { Patient, Appointment } { } 12 Link-Analysis
[Slide labels: Pre-Filtering; Filtering Method; 17% of the total amount of HL7 V3; Tool Support; Understandability; Analyzability]
prezi.com/user/avillegasn

Hard to navigate if a user is interested in elements located within different clusters
Relevance Methods from the Literature

First Stage: Pre-Filtering
  • Select Conceptual Schemas that contain elements of the focus set
  • Reduce the effort of computing distances for the Closeness metric
Contextualization
  • Select default values in enumeration attributes
  • Redefine multiplicities of attributes and relationship ends

M. Gogolla et al.: USE: A UML-based specification environment for validating UML and OCL. Sci. Comput. Program. 69(1-3): 27-34 (2007)
A. Hevner et al.: Design Science in Information Systems Research. MIS Quarterly 28(1): 75-105 (2004)

The user focuses on a small set of schema rules
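
The two evaluation measures defined earlier in this transcript, the filtering utility factor and precision, can be written down directly. The following is a minimal sketch, assuming hypothetical names (`SchemaCounts`, `filtering_utility_factor`, `precision`). Only the formulas and the four counts that make up the size of a schema come from the slides, and any numbers passed in are up to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaCounts:
    """Element counts of a conceptual schema (hypothetical container type)."""
    entity_and_event_types: int
    relationship_types: int
    attributes: int
    generalizations: int

    def size(self) -> int:
        # size(CS) as defined in the transcript: the sum of these four counts.
        return (self.entity_and_event_types + self.relationship_types
                + self.attributes + self.generalizations)


def filtering_utility_factor(fcs: SchemaCounts, ccs: SchemaCounts) -> float:
    """1 - size(FCS) / size(CCS); higher means a larger size reduction."""
    if ccs.size() == 0:
        raise ValueError("the contextual schema must not be empty")
    return 1 - fcs.size() / ccs.size()


def precision(retrieved: set, relevant: set) -> float:
    """Share of the retrieved elements that are relevant."""
    if not retrieved:
        return 0.0  # assumption: nothing retrieved counts as precision 0
    return len(retrieved & relevant) / len(retrieved)
```

A filtered schema of size 20 against a contextual schema of size 50 gives a utility factor of about 0.6, the level the slides report as typical.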