Enhancing the Information Systems
Undergraduate Curriculum with NPACI Technology
Alexis Koster, Professor
IDS Department
- Introduction: Enhancing Database Systems
- Enhancing the undergraduate Data Management Systems course: IDS 480
- Enhancing the undergraduate Database Systems course: a proposal for an advanced Database Course
Introduction: Enhancing Database Systems in
the Undergraduate Information Systems Curriculum
- Only One Course Exists
  - Covers the Basic Concepts
  - Little Room for New Material
  - Computational Science Used to Improve Learning
- Need for an Advanced Course
  - Physical Database Structures
  - Database Administration
  - Data Warehouses
  - Computational Science Topics to Be Included
  - Computational Science Tools to Improve Learning
THE CURRENT DATABASE COURSE, IDS 480: An Overview
- Introduction to database management systems
- Database Design
- Introduction to ORACLE and SQL
- Transaction issues
- Physical data structures used in database management systems
- New environments for DBMS and advanced topics
THE CURRENT DATABASE COURSE, IDS 480
- Introduction to database management systems
  - Traditional file systems
  - Elements of database systems and the database environment
  - Concepts of relational database systems
- Database Design
  - Phases in database design
  - Data modeling using the Entity-Relationship approach
  - Mapping an Entity-Relationship design to a relational schema
  - Normal forms
- Introduction to ORACLE and SQL
  - Accessing ORACLE
  - Creating tables
  - Retrieval of data from single tables
  - Retrieval of data from multiple tables: joins
  - Commands to modify data: UPDATE, DELETE, INSERT
  - Views
  - GRANT commands
- Transaction issues
  - Data integrity, security, and privacy
  - Recovery from failure
  - Concurrency control
- Physical data structures used in database management systems
  - Characteristics of storage devices: access time and capacity
  - Trees, networks, linked lists, and inverted structures
  - Organization of indexes: B-trees
- New environments for DBMS and advanced topics
  - Client/server
  - Distributed database systems
  - Data warehousing
  - Internet
  - Database machines
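The transaction topics above (data integrity, recovery from failure) can be demonstrated without an ORACLE account. The sketch below uses Python's built-in sqlite3 module as a stand-in for ORACLE, an assumption made only so the example runs anywhere; the `account` table and the `transfer` and `balances` helpers are hypothetical. It shows atomicity: when the second UPDATE violates a CHECK constraint, the first UPDATE is rolled back as well. Concurrency control is not shown.

```python
import sqlite3

# In-memory database; sqlite3 stands in for ORACLE here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Hypothetical helper: move `amount` from src to dst as one transaction.

    Returns True if the transaction committed, False if it was rolled back.
    """
    try:
        # The connection context manager commits on success and rolls
        # back the whole transaction if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            # Violates the CHECK constraint if src would go negative.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()


ok = transfer(conn, 1, 2, 30)        # succeeds: both rows change
failed = transfer(conn, 1, 2, 500)   # the credit is undone when the debit fails
print(ok, failed, balances(conn))    # True False [(1, 70), (2, 80)]
```

The same exercise in ORACLE would use explicit COMMIT and ROLLBACK statements; the point for students is that the database, not the application, guarantees that either both updates happen or neither does.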
THE CURRENT DATABASE COURSE: IDS 480
TOPICS COVERED IN DEPTH
- Introduction to database management systems
- Database Design
- Introduction to ORACLE and SQL
TOPICS COVERED LIGHTLY
- Transaction issues
- Physical data structures used in database management systems
TOPICS RARELY COVERED
- New environments for DBMS and advanced topics
AN IMPROVED DATABASE COURSE: IDS 480
Improving the learning of SQL commands
- Most difficult topics:
  - joins
  - the COUNT function
  - views
- Proposed teaching improvement
  - Use computational science techniques:
    - Data visualization
    - Animation
    - Demonstrations
  - Join of two tables:
    http://www.edcenter.sdsu.edu/zli/LeeLou/Viewlet/tables_join_viewlet.html
    http://www.edcenter.sdsu.edu/zli/LeeLou/Table_Viewlet/table_join_viewlet.html
  - Normalization versus "de-normalization": use the same animation to show the performance problem of highly normalized tables, which require more joins to answer a query
- Introducing NPACI applications
  - Storage for large databases: robotic tape silos
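A small runnable demonstration can accompany the animations for the three difficult topics (joins, COUNT, views). This sketch again uses Python's sqlite3 module as a stand-in for ORACLE (an assumption; the SQL shown is standard and reads the same in ORACLE), and the `department` and `employee` tables and their rows are hypothetical. It also shows a classic COUNT pitfall: with an outer join, COUNT(*) counts the NULL-padded row for an empty department, while COUNT(column) does not.

```python
import sqlite3

# Small hypothetical schema; sqlite3 stands in for ORACLE (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER REFERENCES department(dept_id)
);
INSERT INTO department VALUES (1, 'Sales'), (2, 'Research'), (3, 'Legal');
INSERT INTO employee VALUES (10, 'Ana', 1), (11, 'Ben', 1), (12, 'Chen', 2);
""")

# Inner join: each employee row is paired with its matching department row.
joined = conn.execute("""
    SELECT e.name, d.name
    FROM employee e JOIN department d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()

# COUNT(column) ignores NULLs, so a department with no employees gets 0;
# COUNT(*) counts the NULL-padded outer-join row and reports 1 instead.
per_dept = conn.execute("""
    SELECT d.name, COUNT(e.emp_id), COUNT(*)
    FROM department d LEFT JOIN employee e ON e.dept_id = d.dept_id
    GROUP BY d.dept_id, d.name
    ORDER BY d.dept_id
""").fetchall()

# A view stores the query; it is then queried like a table.
conn.execute("""
    CREATE VIEW dept_size AS
    SELECT d.name AS dept, COUNT(e.emp_id) AS headcount
    FROM department d LEFT JOIN employee e ON e.dept_id = d.dept_id
    GROUP BY d.dept_id, d.name
""")
staffed = conn.execute(
    "SELECT dept, headcount FROM dept_size WHERE headcount > 0 ORDER BY dept"
).fetchall()

print(joined)    # [('Ana', 'Sales'), ('Ben', 'Sales'), ('Chen', 'Research')]
print(per_dept)  # [('Sales', 2, 2), ('Research', 1, 1), ('Legal', 0, 1)]
print(staffed)   # [('Research', 1), ('Sales', 2)]
```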
A Proposal for an Advanced Database Course
- Data Storage
- Physical Data Structures
- Recovery from Failure
  - ORACLE techniques
- Concurrency Control
- Data Warehouses and OLAP
- Internet and Java Database Connectivity
- Object-Oriented Databases
  - The object-relational model (ORACLE)
- Multimedia Databases and Metadata
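For the Physical Data Structures topic in this proposal, a back-of-the-envelope estimate shows why indexes such as B-trees pay off. All figures are hypothetical (1,000,000 rows, 20 rows per data block, 100 keys per B-tree node), and the model is simplified: an index lookup reads one block per tree level plus one data block, while a search without an index may have to read every block. The function names are illustrative only.

```python
def btree_levels(n_keys, fanout):
    """Smallest number of levels h such that fanout ** h >= n_keys."""
    levels, reach = 1, fanout
    while reach < n_keys:
        reach *= fanout
        levels += 1
    return levels


def block_reads(n_rows, rows_per_block, fanout):
    """Worst-case block reads: full table scan vs. B-tree index lookup."""
    full_scan = -(-n_rows // rows_per_block)         # ceiling division
    index_lookup = btree_levels(n_rows, fanout) + 1  # tree levels + data block
    return full_scan, index_lookup


print(block_reads(1_000_000, 20, 100))  # (50000, 4)
```

Because the tree height grows with the logarithm of the table size, multiplying the row count by 100 adds only one level, while the scan cost grows a hundredfold.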
COMPUTATIONAL SCIENCE AND NPACI IN THE ADVANCED DATABASE COURSE
- Data Storage
  - IBM HPSS
  - Near-online storage:
    - Digitized tapes
    - Robotic silos of digitized tapes
    http://www.npaci.edu/Resources/Systems/cgi-bin/compute_arc.cgi
- Physical Data Structures
  - Animation techniques used to show the improvement brought by physical data structures
- Data Warehouses
  - Use of parallel techniques for performance improvement
  - Comparison of database machines (NCR) and NPACI
- Multimedia Databases
  - Storage (digitized tapes and robotic silos)
  - Timing questions: use of caches
  - Physical data structures
  - Metadata
  - Queries
- Java Database Connectivity
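The "timing questions: use of caches" item can be made concrete by simulating a disk cache placed in front of tape, in the spirit of a hierarchical storage system. At SDSC the disk cache is small relative to the archive (250 GB against 60 TB of tape, roughly 0.4%), so which objects stay on disk matters. This is a hedged sketch: the least-recently-used (LRU) policy, the 3-object cache, the request trace, and the class name `TapeBackedCache` are hypothetical and do not describe how HPSS actually manages its cache. Each miss stands for a slow tape retrieval; each hit for a fast disk read.

```python
from collections import OrderedDict


class TapeBackedCache:
    """Hypothetical LRU disk cache in front of a simulated tape archive."""

    def __init__(self, capacity):
        self.capacity = capacity     # number of objects the disk cache holds
        self.cache = OrderedDict()   # object id -> data, least recent first
        self.hits = 0
        self.tape_reads = 0          # each miss costs one (slow) tape retrieval

    def read_from_tape(self, obj_id):
        self.tape_reads += 1
        return f"data-{obj_id}"

    def get(self, obj_id):
        if obj_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(obj_id)   # mark as most recently used
            return self.cache[obj_id]
        data = self.read_from_tape(obj_id)
        self.cache[obj_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used
        return data


cache = TapeBackedCache(capacity=3)
for obj in ["a", "b", "a", "c", "a", "d", "b", "a"]:
    cache.get(obj)
print(cache.hits, cache.tape_reads)  # 3 5
```

Students can vary the cache size or the request trace and watch the number of tape retrievals change, which is the quantity that dominates response time for multimedia objects kept on tape.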
Mass Data Storage
At the San Diego Supercomputer Center
http://www.npaci.edu/Resources/Systems/cgi-bin/compute_arc.cgi
- NPACI Archival System Resources at SDSC: HPSS
  - Available tape storage: 60 TB
  - Available disk cache: 250 GB
- Why HPSS?
  www.sdsc.edu/hpss/hpss1.html
  - A hierarchical storage system designed to deal with:
    - the massive amounts of data in modern computing systems
    - the imbalance between computational and I/O capabilities
- Driving Trends
  - Rapid improvements in applications and algorithms
  - Users' need for a high-level view of information, not a low-level storage view
  - Shift from centralized to distributed computing
    - Need to integrate distributed information repositories
  - Shift from proprietary to open systems
  - Rapid improvements in local and wide area networking technology
  - Rapid improvements in compute capabilities, memory sizes, data collection devices, multimedia capabilities, and integration of enterprise data
    - producing multigigabyte-to-terabyte datasets
    - need for scalability to petabyte and exabyte stores
  - Significantly slower improvement in storage device performance than in other system components: 20% vs. 50%
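The last trend compounds over time. Assuming (this is not stated on the slide) that the 50% and 20% figures are compound annual improvement rates, 50% for other system components and 20% for storage devices, the relative gap after n years is (1.5 / 1.2)^n = 1.25^n. The helper `performance_gap` is hypothetical.

```python
def performance_gap(years, fast_rate=0.50, slow_rate=0.20):
    """Ratio of cumulative improvement: other components vs. storage devices.

    Assumes both rates are compound annual improvement rates (hypothetical).
    """
    return ((1 + fast_rate) / (1 + slow_rate)) ** years


for n in (1, 5, 10):
    print(n, round(performance_gap(n), 2))
# 1 1.25
# 5 3.05
# 10 9.31
```

Under these assumptions, storage would fall roughly nine times further behind the rest of the system within a decade, which is the imbalance a hierarchical storage system is meant to hide.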
SURVEY OF MASS STORAGE HARDWARE
- Removable Storage Media
- Tape Drives
- Archival Systems
- High Performance Connections
- RAID systems
TAPE STORAGE
Examples of small and large devices (digitized square tapes)
FastStor
- Drives: DLT4000, DLT7000, DLT8000
- 7 cartridge positions
- Capacity: up to 560 GB
- Transfer rate: up to 43 GB/hour
FastStor 22
- Drives: DLT4000, DLT7000, DLT8000
- 22 cartridge positions
- Capacity: up to 1.76 TB
- Transfer rate: up to 43 GB/hour
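The FastStor figures can be cross-checked with simple arithmetic. This sketch uses only the slot counts, capacities, and the 43 GB/hour rate listed for the two libraries; it assumes decimal units (1.76 TB = 1,760 GB) and a single drive streaming at its maximum rate, ignoring mount, positioning, and compression effects. The helper name `library_figures` is hypothetical.

```python
def library_figures(cartridges, capacity_gb, rate_gb_per_hour):
    """Per-cartridge capacity and hours to stream the full library once."""
    per_cartridge_gb = capacity_gb / cartridges
    hours_for_full_library = capacity_gb / rate_gb_per_hour
    return per_cartridge_gb, hours_for_full_library


for name, carts, cap_gb in [("FastStor", 7, 560), ("FastStor 22", 22, 1760)]:
    per_cart, hours = library_figures(carts, cap_gb, 43)
    print(f"{name}: {per_cart:.0f} GB/cartridge, {hours:.1f} h to stream full capacity")
# FastStor: 80 GB/cartridge, 13.0 h to stream full capacity
# FastStor 22: 80 GB/cartridge, 40.9 h to stream full capacity
```

Both models imply the same 80 GB per cartridge, so the larger library differs only in slot count; with one drive at the same rate, reading it end to end takes about three times longer.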
Commonly used storage media
Name              Form factor        Capacity       Recording format
IBM 3490E         1/2" cartridge     0.8 GB         serpentine
IBM 3590          1/2" cartridge     10 GB          serpentine
SD-3              19 mm cartridge    10/25/50 GB    helical
DLT 7000          1/2" cartridge     35 GB          serpentine
Exabyte Mammoth   8 mm cartridge     20 GB          helical
Sony AIT          8 mm cartridge     25 GB          helical
Tape Drives
- Storage need: explosive growth
  - computer networks
  - databases
  - scientific
  - multimedia
  - video/audio
- TECHNOLOGY CHANGES
ARCHIVAL SYSTEMS
TAPE ROBOT/SILO SYSTEMS
StorageTek
One silo = up to 6000 1/2" tapes
Connect up to 256 silos
Maximum capacity: 1.2 petabytes to 123 petabytes
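The StorageTek figures imply a per-tape capacity that can be checked against the media table. This sketch uses only the numbers on the slide (6,000 tapes per silo, up to 256 silos, 1.2 to 123 PB) and assumes decimal units (1 PB = 1,000,000 GB); the constant names are illustrative.

```python
TAPES_PER_SILO = 6000
MAX_SILOS = 256
GB_PER_PB = 1_000_000  # decimal units assumed

total_tapes = TAPES_PER_SILO * MAX_SILOS

for capacity_pb in (1.2, 123):
    per_tape_gb = capacity_pb * GB_PER_PB / total_tapes
    print(f"{capacity_pb} PB over {total_tapes:,} tapes -> {per_tape_gb:.2f} GB per tape")
# 1.2 PB over 1,536,000 tapes -> 0.78 GB per tape
# 123 PB over 1,536,000 tapes -> 80.08 GB per tape
```

The low end corresponds to roughly 0.8 GB per tape, close to the IBM 3490E in the media table; the high end implies about 80 GB per tape, and the table lists no medium of that size, so the basis for that figure is not stated here.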
ADIC
500 to 10000 tapes
up to 50 tape drives
http://www.adic.com/US/English/Products/Hardware/MixedMedia/AML2/index.html
http://www.storagetek.com/StorageTek/about/about_vid.html
RAID SYSTEMS
Redundant Arrays of Inexpensive Disks
Technologies
- Striping
- Mirroring
- Duplexing
6 RAID levels (0 to 5)
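Striping with parity can be shown in a few lines. The sketch below is a toy, RAID-5-style layout: block-level striping across n disks with one XOR parity block per stripe, rotated across the disks. The 4-byte block size, the rotation order, and all function names are illustrative assumptions; real controllers use much larger blocks and one of several rotation layouts. Because each parity block is the XOR of the data blocks in its stripe, any single lost block equals the XOR of the surviving blocks. Mirroring (RAID 1) would instead keep a full copy of each disk.

```python
from functools import reduce

BLOCK_SIZE = 4  # bytes per block; tiny on purpose (illustrative assumption)


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(stripe_no, n_disks):
    """Disk holding parity for this stripe; rotates from the last disk down."""
    return (n_disks - 1 - stripe_no) % n_disks


def write_stripes(data, n_disks):
    """Stripe data over n_disks with one rotating parity block per stripe."""
    stripe_bytes = BLOCK_SIZE * (n_disks - 1)
    if len(data) % stripe_bytes:                      # pad the final stripe
        data += b"\x00" * (stripe_bytes - len(data) % stripe_bytes)
    disks = [[] for _ in range(n_disks)]
    for s, off in enumerate(range(0, len(data), stripe_bytes)):
        chunk = data[off:off + stripe_bytes]
        blocks = [chunk[i:i + BLOCK_SIZE] for i in range(0, stripe_bytes, BLOCK_SIZE)]
        p = parity_disk(s, n_disks)
        data_iter = iter(blocks)
        for d in range(n_disks):
            disks[d].append(xor_blocks(blocks) if d == p else next(data_iter))
    return disks


def rebuild_disk(disks, failed):
    """Recreate every block of a failed disk from the surviving disks."""
    survivors = [blocks for i, blocks in enumerate(disks) if i != failed]
    return [xor_blocks(stripe) for stripe in zip(*survivors)]


def read_data(disks):
    """Reassemble the (padded) data by skipping each stripe's parity block."""
    n = len(disks)
    out = bytearray()
    for s, stripe in enumerate(zip(*disks)):
        p = parity_disk(s, n)
        for d, block in enumerate(stripe):
            if d != p:
                out += block
    return bytes(out)


disks = write_stripes(b"REDUNDANT ARRAYS OF DISKS", 4)
lost = disks[1]
disks[1] = rebuild_disk(disks, failed=1)   # simulate replacing a failed disk
assert disks[1] == lost
print(read_data(disks).rstrip(b"\x00"))    # b'REDUNDANT ARRAYS OF DISKS'
```

Rotating the parity block spreads parity writes over all disks instead of concentrating them on one, which is the main difference between RAID 5 and a dedicated-parity layout such as RAID 4.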
CONCLUSION
Computational Science in Database Courses
- Tools to improve learning
  - Visualization
  - Animation
- Content
  - Multimedia/video/audio
  - Data warehouses
  - Mass storage
  - Physical data structures
  - Metadata
  - Internet access (Java Database Connectivity)