mirror of
				https://github.com/notepad-plus-plus/notepad-plus-plus.git
				synced 2025-10-31 03:24:04 +01:00 
			
		
		
		
	
		
			
				
	
	
		
			205 lines
		
	
	
		
			13 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
			
		
		
	
	
			205 lines
		
	
	
		
			13 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
| <?xml version="1.0"?>
 | |
| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 | |
|         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 | |
| <html xmlns="http://www.w3.org/1999/xhtml">
 | |
|         <head>
 | |
|                 <meta name="generator" content="HTML Tidy, see www.w3.org" />
 | |
|                 <meta name="generator" content="SciTE" />
 | |
|                 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
 | |
|                 <meta name="viewport" content="width=device-width, initial-scale=1" />
 | |
|                 <title>
 | |
|                         Scintilla Style Metadata
 | |
|                 </title>
 | |
|                 <style type="text/css">
 | |
| <!--
 | |
| /*<![CDATA[*/
 | |
| 	CODE { font-weight: bold; font-family: Menlo,Consolas,Bitstream Vera Sans Mono,Courier New,monospace; }
 | |
| /*]]>*/
 | |
| -->
 | |
|                 </style>
 | |
|         </head>
 | |
| <body bgcolor="#FFFFFF" text="#000000">
 | |
|         <table bgcolor="#000000" width="100%" cellspacing="0" cellpadding="0" border="0">
 | |
|                 <tr>
 | |
|                 <td>
 | |
|                 <img src="SciTEIco.png" border="3" height="64" width="64" alt="Scintilla icon" />
 | |
|                 </td>
 | |
|                 <td>
 | |
|                 <a href="index.html" style="color:white;text-decoration:none"><font size="5">Scintilla</font></a>
 | |
|                 </td>
 | |
|                 </tr>
 | |
|         </table>
 | |
|         <h2>
 | |
|                 Language Types
 | |
|         </h2>
 | |
|         <p>
 | |
|                 Scintilla contains lexers for various types of languages:
 | |
|                 <ul>
 | |
|                         <li>Programming languages like C++, Java, and Python.</li>
 | |
|                         <li>Assembler languages are low-level programming languages which may additionally include instructions and registers.</li>
 | |
|                         <li>Markup languages like HTML, TeX, and Markdown.</li>
 | |
|                         <li>Data languages like EDIFACT and YAML.</li>
 | |
|                 </ul>
 | |
|         </p>
 | |
|         <p>
 | |
|                 Some languages can be used in different ways. JavaScript is a programming language but also
 | |
|                 the basis of JSON data files. Similarly,
 | |
|                 <a href="https://en.wikipedia.org/wiki/S-expression">Lisp s expressions</a> can be used for both source code and data.
 | |
|         </p>
 | |
|         <p>
 | |
|                 Each language type has common elements such as identifiers in programming languages.
 | |
|                 These common elements should be identified so that languages can be displayed with common
 | |
|                 styles for these elements.
 | |
|                 Style tags are used for this purpose in Scintilla.
 | |
|         </p>
 | |
|         <h2>
 | |
|                 Style Tags
 | |
|         </h2>
 | |
|         <p>
 | |
|                 Every style has a list of tags where a tag is a lower-case word containing only the common ASCII letters 'a'-'z'
 | |
|                 such as "comment" or "operator".
 | |
|         </p>
 | |
|         <p>
 | |
|                 Tags are ordered from most important to least important.
 | |
|         </p>
 | |
|         <p>
 | |
|                 While applications may assign visual attributes for tag lists in many different ways, one reasonable technique is to
 | |
|                 apply tag-specific attributes in reverse order so that earlier and more important tags override less important tags.
 | |
|                 For example, the tag list <code>"error comment documentation keyword"</code> with
 | |
|                 a set of tag attributes <br />
 | |
|                 <code>{ comment=fore:green,back:very-light-green,font:Serif documentation=fore:light-green error=strikethrough keyword=bold }</code><br />
 | |
|                 could be rendered as <br />
 | |
|                 <code>bold,fore:light-green,back:very-light-green,font:Serif,strikethrough</code>.
 | |
|         </p>
 | |
|         <p>
 | |
|                 Alternative renderings could check for multi-tag combinations like
 | |
|                 <code>{ comment.documentation=fore:light-green comment.line=dark-green comment=green }.</code>
 | |
|         </p>
 | |
|         <p>
 | |
|                 Commonly, a tag list will contain an optional embedded language; optional statuses; a base type; and a set of type modifiers:<br />
 | |
|                 <code>embedded-language? status* base-type modifiers*</code>
 | |
|         </p>
 | |
|         <h3>Embedded language</h3>
 | |
|         <p>
 | |
|                 The embedded language may be a source <code>(client | server)</code> followed by a language name
 | |
|                 <code>(javascript | php | python | basic)</code>.
 | |
|                 This may be extended in the future with other programming languages and style-definition languages like CSS.
 | |
|         </p>
 | |
|         <h3>Status</h3>
 | |
|         <p>
 | |
|                 The statuses may be <code>(error | unused | predefined | inactive)</code>.<br />
 | |
|                 The <code>error</code> status is used for lexical statuses that indicate errors in the source code such as unterminated quoted strings.<br />
 | |
|                 The <code>unused</code> status may indicate a gap in the lexical states, possibly because an old lexical class is no longer used or an upcoming lexical class may fill that position.<br />
 | |
|                 The <code>predefined</code> status indicates a style in the range 32.39 that is used for non-lexical purposes in Scintilla.<br />
 | |
|                 The <code>inactive</code> status is used for text that is not currently interpreted such as C++ code that is contained within a '#if 0' preprocessor block.
 | |
|         </p>
 | |
|         <h3>Basic Types</h3>
 | |
|         <p>
 | |
|                 The basic types for programming languages are <code>(default | operator | keyword | identifier | literal | comment | preprocessor | label)</code>.<br />
 | |
|                 The <code>default</code> type is commonly used for spaces and tabs between tokens although it may cover other characters in some languages.
 | |
|         </p>
 | |
|         <p>
 | |
|                 Assembler languages add <code>(instruction | register)</code>. to the basic types from programming languages.<br />
 | |
|         </p>
 | |
|         <p>
 | |
|                 The basic types for markup languages are <code>(default | tag | attribute | comment | preprocessor)</code>.<br />
 | |
|         </p>
 | |
|         <p>
 | |
|                 The basic types for data languages are <code>(default | key | data | comment)</code>.<br />
 | |
|         </p>
 | |
|         <h3>Comments</h3>
 | |
|         <p>
 | |
|                 Programming languages may differentiate between line and stream comments and treat documentation comments as distinct from other comments.
 | |
|                 Documentation comments may be marked up with documentation keywords.<br />
 | |
|                 The additional attributes commonly used are <code>(line | documentation | keyword | taskmarker)</code>.
 | |
|         </p>
 | |
|         <h3>Literals</h3>
 | |
|         <p>
 | |
|                 Programming and assembler languages contain a rich set of literals including numbers like <code>7</code> and <code>3.89e23</code>; <code>"string\n"</code>; and <code>nullptr</code>
 | |
|                 and differentiating between these is often wanted.<br />
 | |
|                 The common literal types are <code>(numeric | boolean | string | regex | date | time | uuid | nil | compound)</code>.<br />
 | |
|                 Numeric literal types are subdivided into <code>(integer | real)</code>.<br />
 | |
|                 String literal types may add (perhaps multiple) further attributes from <code> (heredoc | character | escapesequence | interpolated | multiline | raw)</code>.<br />
 | |
|         </p>
 | |
|         <p>
 | |
|                 An escape sequence within an interpolated heredoc may thus be <code>literal string heredoc escapesequence</code>.
 | |
|         </p>
 | |
|         <h3>
 | |
|                 List of known tags
 | |
|         </h3>
 | |
|         <table>
 | |
|                 <tr><td><code>attribute</code></td><td>Markup attribute</td></tr>
 | |
|                 <tr><td><code>basic</code></td><td>Embedded Basic</td></tr>
 | |
|                 <tr><td><code>boolean</code></td><td>True or false literal</td></tr>
 | |
|                 <tr><td><code>character</code></td><td>Single character literal as opposed to a string literal</td></tr>
 | |
|                 <tr><td><code>client</code></td><td>Script executed on client</td></tr>
 | |
|                 <tr><td><code>comment</code></td><td>The standard comment type in a language: may be stream or line</td></tr>
 | |
|                 <tr><td><code>compound</code></td><td>Literal containing multiple subliterals such as a tuple or complex number</td></tr>
 | |
|                 <tr><td><code>data</code></td><td>A value in a data file</td></tr>
 | |
|                 <tr><td><code>date</code></td><td>Literal representing a data such as '19/November/1975'</td></tr>
 | |
|                 <tr><td><code>default</code></td><td>Starting state commonly also used for white space</td></tr>
 | |
|                 <tr><td><code>documentation</code></td><td>Comment that can be extracted into documentation</td></tr>
 | |
|                 <tr><td><code>error</code></td><td>State indicating an invalid or erroneous element</td></tr>
 | |
|                 <tr><td><code>escapesequence</code></td><td>Parts of a string that are not literal such as '\t' for tab in C</td></tr>
 | |
|                 <tr><td><code>heredoc</code></td><td>Lengthy text literal marked by a word at both ends</td></tr>
 | |
|                 <tr><td><code>identifier</code></td><td>Name that identifies an object or class of object</td></tr>
 | |
|                 <tr><td><code>inactive</code></td><td>Code that is not currently interpreted</td></tr>
 | |
|                 <tr><td><code>instruction</code></td><td>Mnemonic in assembler languages like 'addc'</td></tr>
 | |
|                 <tr><td><code>integer</code></td><td>Numeric literal with no fraction or exponent like '738'</td></tr>
 | |
|                 <tr><td><code>interpolated</code></td><td>String that can contain expressions</td></tr>
 | |
|                 <tr><td><code>javascript</code></td><td>Embedded Javascript</td></tr>
 | |
|                 <tr><td><code>key</code></td><td>Element which allows finding associated data</td></tr>
 | |
|                 <tr><td><code>keyword</code></td><td>Reserved word with special meaning like 'while'</td></tr>
 | |
|                 <tr><td><code>label</code></td><td>Destination for jumps in programming and assembler languages</td></tr>
 | |
|                 <tr><td><code>line</code></td><td>Differentiates between stream comments and line comments in languages that have both</td></tr>
 | |
|                 <tr><td><code>literal</code></td><td>Fixed value in source code</td></tr>
 | |
|                 <tr><td><code>multiline</code></td><td>Differentiates between single line and multiline elements, commonly strings</td></tr>
 | |
|                 <tr><td><code>nil</code></td><td>Literal for the null pointer such as nullptr in C++ or NULL in C</td></tr>
 | |
|                 <tr><td><code>numeric</code></td><td>Literal number like '16'</td></tr>
 | |
|                 <tr><td><code>operator</code></td><td>Punctuation character such as '&' or '['</td></tr>
 | |
|                 <tr><td><code>php</code></td><td>Embedded PHP</td></tr>
 | |
|                 <tr><td><code>predefined</code></td><td>Style in the range 32.39 that is used for non-lexical purposes</td></tr>
 | |
|                 <tr><td><code>preprocessor</code></td><td>Element that is recognized in an early stage of translation</td></tr>
 | |
|                 <tr><td><code>python</code></td><td>Embedded Python</td></tr>
 | |
|                 <tr><td><code>raw</code></td><td>String type that avoids interpretation: may be used for regular expressions in languages without a specific regex type</td></tr>
 | |
|                 <tr><td><code>real</code></td><td>Numeric literal which may have a fraction or exponent like '3.84e-15'</td></tr>
 | |
|                 <tr><td><code>regex</code></td><td>Regular expression literal like '^[a-z]+'</td></tr>
 | |
|                 <tr><td><code>register</code></td><td>CPU register in assembler languages</td></tr>
 | |
|                 <tr><td><code>server</code></td><td>Script executed on server</td></tr>
 | |
|                 <tr><td><code>string</code></td><td>Sequence of characters</td></tr>
 | |
|                 <tr><td><code>tag</code></td><td>Markup tag like '<br />'</td></tr>
 | |
|                 <tr><td><code>taskmarker</code></td><td>Word in comment that marks future work like 'FIXME'</td></tr>
 | |
|                 <tr><td><code>time</code></td><td>Literal representing a time such as '9:34:31'</td></tr>
 | |
|                 <tr><td><code>unused</code></td><td>Style that is not currently used</td></tr>
 | |
|                 <tr><td><code>uuid</code></td><td>Universally unique identifier often used in interface definition files which may look like '{098f2470-bae0-11cd-b579-08002b30bfeb}'</td></tr>
 | |
|         </table>
 | |
|         <h2>
 | |
|                 Extension
 | |
|         </h2>
 | |
|         <p>
 | |
|                 Each element in this scheme may be extended in the future. This may be done by revising this document to provide a common approach to new features.
 | |
|                 Individual lexers may also choose to expose unique language features through new tags.
 | |
|         </p>
 | |
|         <h2>
 | |
|                 Translation
 | |
|         </h2>
 | |
|         <p>
 | |
|                 Tags could be exposed directly in user interfaces or configuration languages.
 | |
|                 However, an application may also translate these to match its naming schema.
 | |
|                 Capitalization and punctuation could be different (like <code>Here-Doc</code> instead of <code>heredoc</code>),
 | |
|                 terminology changed ("constant" instead of "literal"),
 | |
|                 or human language changed from English to Chinese or Spanish.
 | |
|         </p>
 | |
|         <p>
 | |
|                 Starting from a common set of tags makes these modifications tractable.
 | |
|         </p>
 | |
|         <h2>
 | |
|                 Open issues
 | |
|         </h2>
 | |
|         <p>
 | |
|                 The C++ lexer (for example) has inactive states and dynamically allocated substyles.
 | |
|                 These should be exposed through the metadata mechanism but are not currently.
 | |
|         </p>
 | |
| </body>
 | |
| </html>
 |