Solaris internals : core kernel components
Mauro, Jim.
Bibliographic Information
Solaris internals : core kernel components
Author :
Mauro, Jim.
Publisher :
Sun Microsystems, Inc.,
Pub. Year :
2001
Subjects :
Operating systems (Computers)
Call Number :
QA 76 .76 .O63 .M37195 2001
Table of Contents
Solaris Internals
(1)
Core Kernel Components
(1)
Solaris Internals
(3)
Core Kernel Components
(3)
Jim Mauro and Richard McDougall
(3)
Sun Microsystems Press
(4)
A Prentice Hall Title
(4)
Acknowledgements
(7)
Preface
(11)
About This Book
(11)
Intended Audience
(12)
How This Book Is Organized
(13)
Solaris Source Code
(14)
Updates and Related Material
(14)
Notational Conventions
(15)
Typeface or Symbol
(15)
Meaning
(15)
Example
(15)
Shell
(15)
Prompt
(15)
A Note from the Authors
(16)
Contents
(17)
Part One 1
(17)
Introduction to Solaris Internals
(17)
1. An Introduction to Solaris 3
(17)
2. Kernel Services 27
(18)
3. Kernel Synchronization Primitives 59
(19)
4. Kernel Bootstrap and Initialization 103
(19)
Part Two 123
(20)
The Solaris Memory System
(20)
5. Solaris Memory Architecture 125
(20)
6. Kernel Memory 205
(21)
7. Memory Monitoring 235
(22)
Part Three 259
(23)
Threads, Processes, and IPC
(23)
8. The Solaris Multithreaded Process Architecture 261
(23)
9. The Solaris Kernel Dispatcher 349
(24)
10. Interprocess Communication 429
(24)
Part Four 479
(25)
Files and File Systems
(25)
11. Solaris Files and File I/O 481
(25)
12. File System Overview 523
(26)
13. File System Framework 541
(27)
14. The Unix File System 577
(27)
15. Solaris File System Cache 601
(28)
Appendix A
(29)
Kernel Tunables, Switches, and Limits 621
(29)
Appendix B
(29)
Kernel Virtual Address Maps 633
(29)
Appendix C
(29)
A Sample Procfs Utility 641
(29)
List of Figures
(31)
List of Tables
(37)
List of Header Files
(41)
Part One
(43)
Introduction to Solaris Internals
(43)
• An Introduction to Solaris
(43)
• Kernel Services
(43)
• Kernel Synchronization Primitives
(43)
• Kernel Bootstrap and Initialization
(43)
An Introduction to Solaris
(45)
1.1 A Brief History
(46)
1.2 Key Differentiators
(50)
1.3 Kernel Overview
(52)
1.3.1 Solaris Kernel Architecture
(53)
Figure 1.1 Solaris Kernel Components
(54)
1.3.2 Modular Implementation
(54)
Figure 1.2 Core Kernel and Loadable Modules
(55)
1.4 Processes, Threads, and Scheduling
(56)
Figure 1.3 Kernel Threads, Processes, and Lightweight Processes
(57)
1.4.1 Two-Level Thread Model
(57)
Figure 1.4 Two-Level Thread Model
(57)
1.4.2 Global Process Priorities and Scheduling
(58)
Figure 1.5 Global Thread Priorities
(58)
1.5 Interprocess Communication
(59)
1.5.1 Traditional UNIX IPC
(59)
1.5.2 System V IPC
(60)
1.5.3 POSIX IPC
(60)
1.5.4 Advanced Solaris IPC
(60)
1.6 Signals
(61)
1.7 Memory Management
(61)
Figure 1.6 Address Spaces, Segments, and Pages
(62)
1.7.1 Global Memory Allocation
(62)
1.7.2 Kernel Memory Management
(63)
1.8 Files and File Systems
(63)
Figure 1.7 Files Organized in a Hierarchy of Directories
(64)
1.8.1 File Descriptors and File System Calls
(64)
1.8.2 The Virtual File System Framework
(65)
Figure 1.8 VFS/Vnode Architecture
(66)
1.9 I/O Architecture
(67)
Figure 1.9 The Solaris Device Tree
(67)
Kernel Services
(69)
2.1 Access to Kernel Services
(69)
Figure 2.1 Switching into Kernel Mode via System Calls
(70)
2.2 Entering Kernel Mode
(70)
2.2.1 Context
(71)
2.2.1.1 Execution Context
(71)
2.2.1.2 Virtual Memory Context
(71)
2.2.2 Threads in Kernel and Interrupt Context
(72)
Figure 2.2 Process, Interrupt, and Kernel Threads
(73)
2.2.3 UltraSPARC I & II Traps
(73)
2.2.3.1 UltraSPARC I & II Trap Types
(74)
2.2.3.2 UltraSPARC I & II Trap Priority Levels
(75)
2.2.3.3 UltraSPARC I & II Trap Levels
(76)
2.2.3.4 UltraSPARC I & II Trap Table Layout
(76)
Figure 2.3 UltraSPARC I & II Trap Table Layout
(77)
2.2.3.5 Software Traps
(77)
2.2.3.6 A Utility for Trap Analysis
(78)
2.3 Interrupts
(80)
2.3.1 Interrupt Priorities
(80)
Figure 2.4 Solaris Interrupt Priority Levels
(80)
2.3.1.1 Interrupts as Threads
(81)
Figure 2.5 Handling Interrupts with Threads
(82)
2.3.1.2 Interrupt Thread Priorities
(83)
Figure 2.6 Interrupt Thread Global Priorities
(83)
2.3.1.3 High-Priority Interrupts
(83)
2.3.1.4 UltraSPARC Interrupts
(84)
Figure 2.7 Interrupt Table on sun4u Architectures
(84)
2.3.2 Interrupt Monitoring
(84)
2.3.3 Interprocessor Interrupts and Cross-Calls
(85)
2.4 System Calls
(86)
2.4.1 Regular System Calls
(86)
Figure 2.8 The Kernel System Call Entry (sysent) Table
(86)
Figure 2.9 System Call Execution
(87)
2.4.2 Fast Trap System Calls
(88)
2.5 The Kernel Callout Table
(89)
2.5.1 Solaris 2.6 and 7 Callout Tables
(89)
Figure 2.10 Solaris 2.6 and Solaris 7 Callout Tables
(90)
2.5.2 Solaris 2.5.1 Callout Tables
(93)
Figure 2.11 Solaris 2.5.1 Callout Tables
(94)
2.6 The System Clock
(96)
2.6.1 Process Execution Time Statistics
(97)
2.6.2 High-Resolution Clock Interrupts
(98)
2.6.3 High-Resolution Timer
(99)
2.6.4 Time-of-Day Clock
(99)
Figure 2.12 Time-of-Day Clock on SPARC Systems
(100)
Kernel Synchronization Primitives
(101)
3.1 Synchronization
(101)
3.2 Parallel Systems Architectures
(102)
Figure 3.1 Parallel Systems Architectures
(104)
3.3 Hardware Considerations for Locks and Synchronization
(105)
Figure 3.2 Atomic Instructions for Locks on SPARC
(107)
Figure 3.3 Hardware Data Hierarchy
(108)
3.4 Introduction to Synchronization Objects
(110)
3.4.1 Synchronization Process
(111)
Figure 3.4 Solaris Locks — The Big Picture
(112)
3.4.2 Synchronization Object Operations Vector
(112)
3.5 Mutex Locks
(113)
3.5.1 Overview
(114)
3.5.2 Solaris 7 Mutex Lock Implementation
(116)
Figure 3.5 Solaris 7 Adaptive and Spin Mutex
(116)
3.5.2.1 Solaris 2.6 Mutex Implementation Differences
(120)
Figure 3.6 Solaris 2.6 Mutex
(121)
3.5.2.2 Solaris 2.5.1 Mutex Implementation Differences
(121)
Figure 3.7 Solaris 2.5.1 Adaptive Mutex
(122)
Figure 3.8 Solaris 2.5.1 Mutex Operations Vectoring
(122)
3.5.2.3 Why the Mutex Changes in Solaris 7
(123)
3.6 Reader/Writer Locks
(124)
3.6.1 Solaris 7 Reader/Writer Locks
(125)
Figure 3.9 Solaris 7 Reader/Writer Lock
(125)
3.6.2 Solaris 2.6 RW Lock Differences
(128)
Figure 3.10 Solaris 2.6 Reader/Writer Lock
(128)
3.6.3 Solaris 2.5.1 RW Lock Differences
(128)
Figure 3.11 Solaris 2.5.1 RW Lock Structure
(128)
3.7 Turnstiles and Priority Inheritance
(131)
3.7.1 Solaris 7 Turnstiles
(132)
Figure 3.12 Solaris 7 Turnstiles
(133)
3.7.2 Solaris 2.5.1 and 2.6 Turnstiles
(135)
Figure 3.13 Solaris 2.5.1 and Solaris 2.6 Turnstiles
(135)
Figure 3.14 Solaris 2.5.1 and 2.6 Turnstiles
(136)
3.8 Dispatcher Locks
(139)
3.9 Kernel Semaphores
(141)
Figure 3.15 Kernel Semaphore
(141)
Figure 3.16 Sleep Queues in Solaris 2.5.1, 2.6, and 7
(143)
Kernel Bootstrap and Initialization
(145)
4.1 Kernel Directory Hierarchy
(145)
Figure 4.1 Core Kernel Directory Hierarchy
(147)
4.2 Kernel Bootstrap and Initialization
(149)
1. The boot(1M) command reads and loads the bootblock into memory.
(149)
2. The bootblock locates and loads the secondary boot program, ufsboot, into memory and passes co...
(149)
3. ufsboot locates and loads the core kernel images and the required kernel runtime linker into m...
(149)
4. The core kernel locates and loads mandatory kernel modules from the root disk directory tree a...
(149)
5. The kernel startup code executes, creating and initializing kernel structures, resources, and ...
(149)
6. The system executes shell scripts from system directories, bringing the system up to the init ...
(149)
4.2.1 Loading the Bootblock
(149)
Figure 4.2 Bootblock on a UFS-Based System Disk
(150)
4.2.2 Loading ufsboot
(150)
4.2.3 Locating Core Kernel Images and Linker
(151)
4.2.4 Loading Kernel Modules
(151)
Figure 4.3 Boot Process
(152)
4.2.5 Creating Kernel Structures, Resources, and Components
(152)
4.2.6 Completing the Boot Process
(156)
4.2.7 During the Boot Process: Creating System Kernel Threads
(157)
4.3 Kernel Module Loading and Linking
(158)
1. Load the module (a binary object file) into memory.
(158)
2. Establish kernel address space mappings for module segments.
(158)
3. Link the module’s segments into the kernel.
(158)
4. Perform the module-type-specific install function.
(158)
Figure 4.4 Loading a Kernel Module
(159)
Figure 4.5 Module Control Structures
(160)
1. Create and allocate a modctl structure. First, search the linked list of modctl structures, lo...
(160)
2. Enter the kernel runtime linker, krtld, to create address space segments and bindings, and loa...
(160)
a) Allocate module structure().
(161)
b) Allocate space for the module’s symbols in the kernel’s kobj_map resource map.
(161)
c) Loop through the segments of the module being loaded, and allocate and map space for text and ...
(161)
d) Load kernel object into memory, linking the object’s segments into the appropriate kernel addr...
(161)
3. Set the mod_loaded bit in the module’s modctl structure, and increment the mod_loadcnt.
(161)
4. Create a link to the module’s mod_linkage structure.
(161)
5. Execute the module’s mod_install function indirectly by looking up the module _init() routine ...
(161)
Figure 4.6 Module Operations Function Vectoring
(163)
Part Two
(165)
The Solaris Memory System
(165)
• Solaris Memory Architecture
(165)
• Kernel Memory
(165)
• Memory Monitoring
(165)
Solaris Memory Architecture
(167)
5.1 Why Have a Virtual Memory System?
(167)
Figure 5.1 Solaris Virtual-to-Physical Memory Management
(169)
5.2 Modular Implementation
(170)
Figure 5.2 Solaris Virtual Memory Layers
(172)
5.3 Virtual Address Spaces
(172)
Figure 5.3 Process Virtual Address Space
(173)
5.3.1 Sharing of Executables and Libraries
(174)
5.3.2 SPARC Address Spaces
(174)
Figure 5.4 SPARC 32-Bit Shared Kernel/Process Address Space
(175)
Figure 5.5 SPARC sun4u 32- and 64-Bit Process Address Space
(176)
5.3.3 Intel Address Space Layout
(176)
Figure 5.6 Intel x86 Process Address Space
(177)
5.3.4 Process Memory Allocation
(176)
5.3.5 The Stack
(178)
5.3.6 Address Space Management
(179)
Figure 5.7 The Address Space
(179)
5.3.7 Virtual Memory Protection Modes
(182)
5.3.8 Page Faults in Address Spaces
(182)
Figure 5.8 Virtual Address Space Page Fault Example
(184)
1. A reference is made to a memory address that does not map to a physical page of memory. In thi...
(184)
2. When the process accesses the address with no physical memory behind it, the MMU detects the i...
(184)
3. The address space as_fault() routine compares the address of the fault with the addresses mapp...
(184)
4. The segment driver allocates and maps a page of memory by calling into the HAT layer and then co...
(185)
5. The segment driver then reads the page in from the backing store by calling the getpage() func...
(185)
6. The backing store for this segment is the swap device, so the swap device getpage() function i...
(185)
5.4 Memory Segments
(185)
Figure 5.9 Segment Interface
(186)
5.4.1 The vnode Segment: seg_vn
(189)
5.4.1.1 Memory Mapped Files
(189)
Figure 5.10 The seg_vn Segment Driver Vnode Relationship
(191)
5.4.1.2 Shared Mapped Files
(192)
Figure 5.11 Shared Mapped Files
(193)
5.4.2 Copy-on-Write
(194)
5.4.3 Page Protection and Advice
(194)
5.5 Anonymous Memory
(195)
Figure 5.12 Anonymous Memory Data Structures
(196)
5.5.1 The Anonymous Memory Layer
(197)
5.5.2 The swapfs Layer
(198)
5.5.2.1 Swap Allocation
(199)
5.5.2.2 swapfs Implementation
(201)
Figure 5.13 Anon Slot Initialized to Virtual Swap Before Page-out
(202)
Figure 5.14 Physical Swap After a Page-out Occurs
(203)
5.5.3 Anonymous Memory Accounting
(203)
Figure 5.15 Swap Allocation States
(205)
5.6 Virtual Memory Watchpoints
(206)
Figure 5.16 Watchpoint Data Structures
(209)
5.7 Global Page Management
(209)
5.7.1 Pages—The Basic Unit of Solaris Memory
(209)
Figure 5.17 The Page Structure
(210)
5.7.2 The Page Hash List
(210)
Figure 5.18 Locating Pages by Their Vnode/Offset Identity
(211)
1. It calculates the slot in the page_hash array containing a list of potential pages by using th...
(211)
2. It uses the PAGE_HASH_SEARCH macro, shown below, to search the list referenced by the slot for...
(211)
5.7.3 MMU-Specific Page Structures
(211)
Figure 5.19 Machine-Specific Page Structures: sun4u Example
(212)
5.7.4 Physical Page Lists
(212)
Figure 5.20 Contiguous Physical Memory Segments
(213)
5.7.4.1 Free List and Cache List
(213)
5.7.5 The Page-Level Interfaces
(214)
5.7.6 The Page Throttle
(215)
5.7.7 Page Sizes
(215)
5.7.8 Page Coloring
(216)
Figure 5.21 Physical Page Mapping into a 64-Kbyte Physical Cache
(217)
5.8 The Page Scanner
(220)
5.8.1 Page Scanner Operation
(221)
Figure 5.22 Two-Handed Clock Algorithm
(222)
5.8.2 Page-out Algorithm and Parameters
(222)
5.8.2.1 Scan Rate Parameters (Assuming No Priority Paging)
(222)
Figure 5.23 Page Scanner Rate, Interpolated by Number of Free Pages
(223)
5.8.2.2 Not Recently Used Time
(224)
5.8.3 Shared Library Optimizations
(225)
5.8.4 The Priority Paging Algorithm
(225)
Figure 5.24 Scan Rate Interpolation with the Priority Paging Algorithm
(227)
5.8.4.1 Page Scanner CPU Utilization Clamp
(227)
5.8.4.2 Parameters That Limit Pages Paged Out
(228)
5.8.4.3 Summary of Page Scanner Parameters
(228)
5.8.5 Page Scanner Implementation
(229)
Figure 5.25 Page Scanner Architecture
(230)
5.8.6 The Memory Scheduler
(231)
5.8.6.1 Soft Swapping
(231)
5.8.6.2 Hard Swapping
(232)
5.9 The Hardware Address Translation Layer
(232)
Figure 5.26 Role of the HAT Layer in Virtual-to-Physical Translation
(232)
5.9.1 Virtual Memory Contexts and Address Spaces
(234)
5.9.1.1 Hardware Translation Acceleration
(235)
5.9.2 The UltraSPARC-I and -II HAT
(235)
Figure 5.27 UltraSPARC-I and -II MMUs
(236)
Figure 5.28 Virtual-to-Physical Translation
(237)
Figure 5.29 UltraSPARC-I and -II Translation Table Entry (TTE)
(238)
Figure 5.30 Relationship of TLBs, TSBs, and TTEs
(239)
1. The MMU first looks in the TLB for a valid TTE for the requested virtual address.
(239)
2. If a valid TTE is not found, then the MMU automatically generates a pointer for the location o...
(239)
3. The trap handler reads the hardware-constructed pointer, retrieves the entry from the TSB, and...
(239)
4. If the TTE is not found in the TSB, then the TLB miss handler jumps to a more complex, but slo...
(239)
5.9.3 Address Space Identifiers
(240)
5.9.3.1 UltraSPARC-I and -II Watchpoint Implementation
(241)
5.9.3.2 UltraSPARC-I and -II Protection Modes
(241)
5.9.3.3 UltraSPARC-I and -II MMU-Generated Traps
(242)
5.9.4 Large Pages
(242)
5.9.4.1 TLB Performance and Large Pages
(243)
5.9.4.2 Solaris Support for Large Pages
(244)
Kernel Memory
(247)
6.1 Kernel Virtual Memory Layout
(247)
6.1.1 Kernel Address Space
(248)
Figure 6.1 Solaris 7 64-Bit Kernel Virtual Address Space
(249)
6.1.2 The Kernel Text and Data Segments
(250)
1. The time spent in TLB miss handlers for kernel code was reduced to almost zero.
(250)
2. The number of TLB entries used by the kernel was dramatically reduced, leaving more TLB entrie...
(250)
6.1.3 Virtual Memory Data Structures
(250)
6.1.4 The SPARC V8 and V9 Kernel Nucleus
(251)
6.1.5 Loadable Kernel Module Text and Data
(251)
6.1.6 The Kernel Address Space and Segments
(253)
Figure 6.2 Kernel Address Space
(253)
6.2 Kernel Memory Allocation
(254)
Figure 6.3 Different Levels of Memory Allocation
(255)
6.2.1 The Kernel Map
(255)
6.2.2 The Resource Map Allocator
(256)
6.2.3 The Kernel Memory Segment Driver
(256)
6.2.4 The Kernel Memory Slab Allocator
(259)
6.2.4.1 Slab Allocator Overview
(259)
Figure 6.4 Objects, Caches, Slabs, and Pages of Memory
(261)
6.2.4.2 Object Caching
(262)
6.2.4.3 General-Purpose Allocations
(265)
6.2.4.4 Slab Allocator Implementation
(265)
Figure 6.5 Slab Allocator Internal Implementation
(266)
6.2.4.5 The CPU Layer
(267)
6.2.4.6 The Depot Layer
(267)
6.2.4.7 The Global (Slab) Layer
(268)
6.2.4.8 Slab Cache Parameters
(269)
6.2.4.9 Slab Allocator Statistics
(271)
6.2.4.10 Slab Allocator Tracing
(273)
Memory Monitoring
(277)
7.1 A Quick Introduction to Memory Monitoring
(277)
7.1.1 Total Physical Memory
(278)
7.1.2 Kernel Memory
(278)
7.1.3 Free Memory
(278)
7.1.4 File System Caching Memory
(278)
7.1.5 Memory Shortage Detection
(279)
7.1.6 Swap Space
(280)
7.1.6.1 Virtual Swap Space
(280)
7.1.6.2 Physical Swap Space
(280)
7.2 Memory Monitoring Tools
(281)
7.3 The vmstat Command
(282)
7.3.1 Free Memory
(283)
7.3.2 Swap Space
(283)
7.3.3 Paging Counters
(284)
7.3.4 Process Memory Usage, ps, and the pmap Command
(284)
Figure 7.1 Process Private and Shared Mappings (/bin/sh Example)
(286)
7.4 MemTool: Unbundled Memory Tools
(287)
7.4.1 MemTool Utilities
(288)
7.4.2 Command-Line Tools
(288)
7.4.2.1 System Memory Summary: prtmem
(288)
7.4.2.2 File System Cache Memory: memps -m
(289)
7.4.2.3 The prtswap Utility
(290)
7.4.3 The MemTool GUI
(290)
7.4.3.1 File System Cache Memory
(291)
Figure 7.2 MemTool GUI: File System Cache Memory
(291)
7.4.3.2 Process Memory
(292)
Figure 7.3 MemTool GUI: Process Memory
(293)
7.4.3.3 Process Matrix
(294)
Figure 7.4 MemTool GUI: Process/File Matrix
(295)
7.5 Other Memory Tools
(295)
7.5.1 The Workspace Monitor Utility: WSM
(295)
7.5.2 An Extended vmstat Command: memstat
(296)
Part Three
(301)
Threads, Processes, and IPC
(301)
• The Solaris Multithreaded Process Architecture
(301)
• The Solaris Kernel Dispatcher
(301)
• Interprocess Communication
(301)
The Solaris Multithreaded Process Architecture
(303)
8.1 Introduction to Solaris Processes
(303)
8.1.1 Architecture of a Process
(304)
Figure 8.1 Process Execution Environment
(305)
Figure 8.2 The Multithreaded Process Model
(308)
8.1.2 Process Image
(309)
Figure 8.3 ELF Object Views
(310)
Figure 8.4 Conceptual View of a Process
(311)
8.2 Process Structures
(311)
8.2.1 The Process Structure
(311)
Figure 8.5 The Process Structure and Associated Data Structures
(312)
Figure 8.6 Process Virtual Address Space
(313)
Figure 8.7 Process State Diagram
(317)
Figure 8.8 Process Lineage Pointers
(319)
Figure 8.9 PID Structure
(320)
8.2.2 The User Area
(323)
Figure 8.10 Process Open File Support Structures
(326)
8.2.3 The Lightweight Process (LWP)
(327)
8.2.4 The Kernel Thread (kthread)
(329)
Figure 8.11 The Process, LWP, and Kernel Thread Structure Linkage
(332)
8.3 The Kernel Process Table
(332)
8.3.1 Process Limits
(333)
8.3.2 LWP Limits
(335)
8.4 Process Creation
(335)
Figure 8.12 Process Creation
(336)
Figure 8.13 exec Flow
(341)
Figure 8.14 exec Flow to Object-Specific Routine
(342)
Figure 8.15 Initial Process Stack Frame
(343)
8.5 Process Termination
(344)
8.5.1 The LWP/kthread Model
(346)
1. The kernel loops through the list of LWP/kthreads in the process, setting the t_astflag in the...
(346)
2. Inside the trap handler, which is entered as a result of the cross-call, the kernel tests the ...
(346)
3. The trap handler tests the process HOLDFORK flag and if it is set in p_flags (which it will be...
(346)
4. During an exit, with EXITLWPS set in p_flags, the lwp_exit() function is called to terminate t...
(346)
8.5.2 Deathrow
(347)
8.6 Procfs — The Process File System
(348)
8.6.1 Procfs Implementation
(351)
Figure 8.16 procfs Kernel Process Directory Entries
(352)
Figure 8.17 procfs Directory Hierarchy
(353)
Figure 8.18 procfs Data Structures
(354)
Figure 8.19 procfs File Open
(355)
Figure 8.20 procfs Interface Layers
(357)
8.6.2 Process Resource Usage
(360)
8.6.3 Microstate Accounting
(362)
8.7 Signals
(366)
Figure 8.21 Signal Representation in k_sigset_t Data Type
(372)
8.7.1 Signal Implementation
(372)
Figure 8.22 Signal-Related Structures
(374)
Figure 8.23 High-Level Signal Flow
(381)
8.7.1.1 Synchronous Signals
(381)
8.7.1.2 Asynchronous Signals
(382)
8.7.2 SIGWAITING: A Special Signal
(384)
8.8 Sessions and Process Groups
(384)
Figure 8.24 Process Group Links
(386)
Figure 8.25 Process and Session Structure Links
(388)
The Solaris Kernel Dispatcher
(391)
9.1 Overview
(392)
Figure 9.1 Global Priority Scheme and Scheduling Classes
(393)
9.1.1 Scheduling Classes
(394)
Figure 9.2 Solaris Scheduling Classes and Priorities
(396)
Figure 9.3 Scheduling Class Data Structures
(404)
9.1.2 Dispatch Tables
(404)
9.2 The Kernel Dispatcher
(410)
Figure 9.4 tsproc Structure Lists
(413)
9.2.1 Dispatch Queues
(413)
Figure 9.5 Solaris Dispatch Queues
(416)
9.2.2 Thread Priorities
(417)
Figure 9.6 Setting RT Priorities
(419)
1. After the creation and initialization of the LWP/kthread (fork()), a fork return function is c...
(420)
2. It creates a new ts_cpupri value that is based on the ts_tqexp value in the indexed table loca...
(420)
3. The ts_timeleft value is set on the basis of the allotted time quantum for the priority level.
(420)
4. ts_dispwait is set to 0. A user mode priority is calculated, setting the ts_umdpri value, whic...
(421)
5. t_pri is determined by the ts_globpri (global priority) value in the indexed table location (5).
(421)
Figure 9.7 Setting a Thread’s Priority Following fork()
(421)
if (kthread->tsproc.ts_dispwait > ts_dptbl[ts_umdpri].ts_maxwait)
(425)
Figure 9.8 Priority Adjustment with ts_slpret
(426)
9.2.3 Dispatcher Functions
(430)
9.2.3.1 Dispatcher Queue Insertion
(430)
Figure 9.9 Kernel Thread Queue Insertion
(431)
9.2.3.2 Thread Preemption
(436)
Figure 9.10 Thread Preemption Flow
(442)
9.2.3.3 The Heart of the Dispatcher: swtch()
(442)
9.3 The Kernel Sleep/Wakeup Facility
(446)
9.3.1 Condition Variables
(447)
Figure 9.11 Condition Variable
(447)
Figure 9.12 Sleep/Wake Flow Diagram
(449)
9.3.2 Sleep Queues
(449)
Figure 9.13 Solaris 2.5.1 and Solaris 2.6 Sleep Queues
(450)
Figure 9.14 Solaris 7 Sleep Queues
(451)
9.3.3 The Sleep Process
(452)
Figure 9.15 Setting a Thread’s Priority in ts_sleep()
(454)
9.3.4 The Wakeup Mechanism
(455)
9.4 Scheduler Activations
(457)
Figure 9.16 Two-Level Threads Model
(457)
9.4.1 User Thread Activation
(458)
9.4.2 LWP Pool Activation
(459)
9.5 Kernel Processor Control and Processor Sets
(461)
Figure 9.17 CPU Structure and Major Links
(463)
9.5.1 Processor Control
(464)
9.5.2 Processor Sets
(467)
Figure 9.18 Processor Partition (Processor Set) Structures and Links
(469)
Interprocess Communication
(471)
10.1 Generic System V IPC Support
(472)
10.1.1 Module Creation
(472)
10.1.2 Resource Maps
(475)
10.2 System V Shared Memory
(475)
10.2.1 Shared Memory Kernel Implementation
(480)
10.2.2 Intimate Shared Memory (ISM)
(482)
Figure 10.1 Shared Memory: ISM versus Non-ISM
(483)
10.3 System V Semaphores
(486)
10.3.1 Semaphore Kernel Resources
(487)
10.3.2 Kernel Implementation of System V Semaphores
(490)
10.3.3 Semaphore Operations Inside Solaris
(492)
10.4 System V Message Queues
(493)
10.4.1 Kernel Resources for Message Queues
(494)
Figure 10.2 System V Message Queue Structures
(498)
10.4.2 Kernel Implementation of Message Queues
(499)
10.5 POSIX IPC
(501)
Figure 10.3 Process Address Space with mmap(2)
(503)
10.5.1 POSIX Shared Memory
(503)
10.5.2 POSIX Semaphores
(504)
Figure 10.4 POSIX Named Semaphores
(505)
10.5.3 POSIX Message Queues
(507)
Figure 10.5 POSIX Message Queue Structures
(508)
10.6 Solaris Doors
(511)
10.6.1 Doors Overview
(512)
Figure 10.6 Solaris Doors
(512)
10.6.2 Doors Implementation
(513)
Figure 10.7 Solaris Doors Structures
(513)
Figure 10.8 door_call() Flow with Shuttle Switching
(518)
Part Four
(521)
Files and File Systems
(521)
• Files and File I/O
(521)
• File System Overview
(521)
• File System Framework
(521)
• The UFS File System
(521)
• File System Caching
(521)
Solaris Files and File I/O
(523)
11.1 Files in Solaris
(523)
Figure 11.1 File-Related Structures
(526)
11.1.1 Kernel File Structures
(528)
11.2 File Application Programming Interfaces (APIs)
(530)
Figure 11.2 Kernel File I/O Interface Relationships
(531)
11.2.1 Standard I/O (stdio)
(531)
11.2.2 C Runtime File Handles
(534)
11.2.3 Standard I/O Buffer Sizes
(535)
11.3 System File I/O
(535)
11.3.1 File I/O System Calls
(535)
11.3.1.1 The open() and close() System Calls
(536)
11.3.1.2 The read() and write() System Calls
(536)
11.3.2 File Open Modes and File Descriptor Flags
(537)
11.3.2.1 Nonblocking I/O
(538)
11.3.2.2 Exclusive open
(538)
11.3.2.3 File Append Flag
(539)
11.3.2.4 Data Integrity and Synchronization Flags
(540)
11.3.2.5 Other File Flags
(541)
11.3.2.6 The dup System Call
(541)
11.3.2.7 The pread and pwrite System Calls
(543)
11.3.2.8 The readv and writev System Calls
(544)
11.4 Asynchronous I/O
(544)
11.4.1 File System Asynchronous I/O
(545)
11.4.2 Kernel Asynchronous I/O
(546)
11.5 Memory Mapped File I/O
(551)
Figure 11.3 File Read with read(2)
(551)
Figure 11.4 Memory Mapped File I/O
(552)
11.5.1 Mapping Options
(553)
11.5.1.1 Mapping Files into Two or More Processes
(554)
11.5.1.2 Permission Options
(554)
11.5.2 Providing Advice to the Memory System
(555)
11.5.2.1 The MADV_DONTNEED Flag
(555)
11.5.2.2 The MADV_WILLNEED Flag
(557)
11.5.2.3 The MADV_SEQUENTIAL Flag
(557)
11.5.2.4 The MADV_RANDOM Flag
(558)
11.6 64-bit Files in Solaris
(559)
11.6.1 64-bit Device Support in Solaris 2.0
(560)
11.6.2 64-bit File Application Programming Interfaces in Solaris 2.5.1
(560)
11.6.3 Solaris 2.6: The Large-File OS
(561)
11.6.3.1 The Large-File Summit
(562)
11.6.3.2 Large-File Compilation Environments
(562)
11.6.4 File System Support for Large Files
(564)
File System Overview
(565)
12.1 Why Have a File System?
(565)
12.2 Support for Multiple File System Types
(566)
12.3 Regular (On-Disk) File Systems
(567)
12.3.1 Allocation and Storage Strategy
(568)
12.3.1.1 Block-Based Allocation
(568)
12.3.1.2 Extent-Based Allocation
(569)
Figure 12.1 Block- and Extent-Based Allocation
(569)
12.3.1.3 Extentlike Performance from Block Clustering
(570)
12.3.2 File System Capacity
(571)
12.3.3 Variable Block Size Support
(572)
12.3.4 Access Control Lists
(573)
Figure 12.2 Traditional File Access Scheme
(573)
12.3.5 File Systems Logging (Journaling)
(574)
12.3.5.1 Metadata Logging
(576)
Figure 12.3 File System Metadata Logging
(577)
12.3.5.2 Data and Metadata Logging
(577)
12.3.5.3 Log-Structured File Systems
(578)
12.3.6 Expanding and Shrinking File Systems
(578)
12.3.7 Direct I/O
(579)
12.3.7.1 Sparse Files
(580)
12.3.7.2 Integrated Volume Management
(580)
12.3.7.3 Summary of File System Features
(580)
File System Framework
(583)
13.1 Solaris File System Framework
(583)
13.1.1 Unified File System Interface
(584)
Figure 13.1 Solaris File System Framework
(584)
13.1.2 File System Framework Facilities
(585)
13.2 The vnode
(585)
Figure 13.2 The Vnode Object
(586)
13.2.1 vnode Types
(587)
13.2.2 Vnode Methods
(588)
13.2.3 vnode Reference Count
(590)
13.2.4 Interfaces for Paging vnode Cache
(590)
13.2.5 Block I/O on vnode Pages
(592)
13.3 The vfs Object
(592)
Figure 13.3 The vfs Object
(593)
13.3.1 The File System Switch Table
(594)
13.3.2 The Mounted vfs List
(596)
Figure 13.4 The Mounted vfs List
(597)
13.4 File System I/O
(600)
Figure 13.5 The read()/write() vs. mmap() Methods for File I/O
(600)
13.4.1 Memory Mapped I/O
(601)
13.4.2 read() and write() System Calls
(602)
13.4.3 The seg_map Segment
(603)
13.5 Path-Name Management
(607)
13.5.1 The lookupname() and lookuppn() Methods
(608)
13.5.2 The vop_lookup() Method
(608)
13.5.3 The vop_readdir() Method
(608)
13.5.4 Path-Name Traversal Functions
(610)
13.5.5 The Directory Name Lookup Cache (DNLC)
(610)
13.5.5.1 DNLC Operation
(611)
Figure 13.6 Solaris 2.3 Name Cache
(612)
13.5.5.2 The New Solaris DNLC Algorithm
(613)
Figure 13.7 Solaris 2.4 DNLC
(614)
13.5.5.3 DNLC Support Functions
(614)
13.5.6 File System Modules
(615)
13.5.7 Mounting and Unmounting
(615)
13.6 The File System Flush Daemon
(618)
Figure 0.1 Default File Allocation in 16-Mbyte Groups
(628)
The Unix File System
(619)
14.1 UFS Development History
(619)
14.2 UFS On-Disk Format
(621)
14.2.1 UFS Inodes
(621)
14.2.2 UFS Directories
(621)
Figure 14.1 UFS Directory Entry Format
(622)
Figure 14.2 Unix Directory Hierarchy
(622)
14.2.3 UFS Hard Links
(623)
Figure 14.3 UFS Links
(623)
14.2.4 UFS Layout
(623)
Figure 14.4 UFS Layout
(624)
14.2.4.1 The Boot Block
(624)
14.2.4.2 The Superblock
(625)
14.2.5 Disk Block Location
(626)
Figure 14.5 The UFS inode Format
(626)
14.2.6 UFS Block Allocation
(627)
14.2.7 UFS Allocation and Parameters
(628)
14.3 UFS Implementation
(632)
Figure 14.6 The UFS File System
(633)
14.3.1 Mapping of Files to Disk Blocks
(634)
14.3.1.1 Reading and Writing UFS Blocks
(634)
14.3.1.2 Buffering Block Metadata
(635)
14.3.2 Methods to Read and Write UFS Files
(635)
14.3.2.1 ufs_read()
(635)
Figure 14.7 ufs_read()
(636)
14.3.2.2 ufs_write()
(637)
Figure 14.8 ufs_write()
(638)
14.3.3 In-Core UFS Inodes
(639)
Figure 14.9 The UFS inode
(639)
14.3.3.1 Freeing inodes—the Inode Idle List
(640)
14.3.3.2 Caching Inodes—the Inode Idle List
(640)
Figure 14.10 UFS Idle Queue
(641)
14.3.4 UFS Directories and Path Names
(642)
14.3.4.1 ufs_lookup()
(642)
14.3.4.2 ufs_readdir()
(642)
Solaris File System Cache
(643)
15.1 Introduction to File Caching
(643)
Figure 15.1 The Old-Style Buffer Cache
(644)
15.1.1 Solaris Page Cache
(644)
Figure 15.2 The Solaris Page Cache
(645)
15.1.2 Block Buffer Cache
(646)
15.2 Page Cache and Virtual Memory System
(647)
15.2.1 File System Paging Optimizations
(649)
15.3 Is All That Paging Bad for My System?
(650)
15.4 Paging Parameters That Affect File System Performance
(653)
Figure 15.3 VM Parameters That Affect File Systems
(655)
15.5 Bypassing the Page Cache with Direct I/O
(656)
15.5.1 UFS Direct I/O
(656)
15.5.2 Direct I/O with Veritas VxFS
(657)
15.6 Directory Name Cache
(657)
15.7 Inode Caches
(659)
15.7.1 UFS Inode Cache Size
(659)
Figure 15.4 In-Memory Inodes (Referred to as the “Inode Cache”)
(660)
15.7.2 VxFS Inode Cache
(662)
Kernel Tunables, Switches, and Limits
(663)
A.1 Setting Kernel Parameters
(663)
A.2 System V IPC - Shared Memory Parameters
(664)
A.3 Virtual Memory Parameters
(666)
A.4 File System Parameters
(668)
A.5 Miscellaneous Parameters
(670)
A.6 Process and Dispatcher (Scheduler) Parameters
(672)
A.7 STREAMS Parameters
(672)
Kernel Virtual Address Maps
(675)
Figure B.1 Kernel Address Space and Segments
(675)
Figure B.2 Solaris 7 sun4u 64-Bit Kernel Address Space
(678)
Figure B.3 Solaris 7 sun4u 32-Bit Kernel Address Space
(679)
Figure B.4 Solaris 7 sun4d 32-Bit Kernel Address Space
(680)
Figure B.5 Solaris 7 sun4m 32-Bit Kernel Address Space
(681)
Figure B.6 Solaris 7 x86 32-Bit Kernel Address Space
(682)
A Sample Procfs Utility
(683)
Bibliography
(689)
Index
(693)
Symbols
(693)
Numerics
(693)
A
(693)
B
(694)
C
(694)
D
(695)
E
(696)
F
(696)
G
(697)
H
(697)
I
(698)
J
(698)
K
(698)
L
(700)
M
(701)
N
(702)
O
(702)
P
(702)
Q
(705)
R
(705)
S
(705)
T
(709)
U
(710)
V
(711)
W
(712)
X
(712)
Z
(712)