tag:blogger.com,1999:blog-80768262008-04-22T21:03:34.205+02:00tommy.blogTommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comBlogger116125tag:blogger.com,1999:blog-8076826.post-8410847892062482852008-04-21T15:24:00.001+02:002008-04-22T21:03:34.214+02:00Making your password TextBox more secure<p>If you use a Windows Forms <a href="http://msdn2.microsoft.com/en-us/library/system.windows.forms.textbox.aspx">TextBox</a> to let your users enter a password, you should know it's not very secure: external applications can get the password from the <a href="http://msdn2.microsoft.com/en-us/library/system.windows.forms.textbox.aspx">TextBox</a> by sending it the <a href="http://msdn2.microsoft.com/en-us/library/ms632627(VS.85).aspx">WM_GETTEXT</a> message. There are even applications written specifically to do this. If you want to prevent this, you can use the following control that is derived from <a href="http://msdn2.microsoft.com/en-us/library/system.windows.forms.textbox.aspx">System.Windows.Forms.TextBox</a> and that prevents external applications from getting the password via <a href="http://msdn2.microsoft.com/en-us/library/ms632627(VS.85).aspx">WM_GETTEXT</a>. Just use it instead of the regular <a href="http://msdn2.microsoft.com/en-us/library/system.windows.forms.textbox.aspx">TextBox</a> for password fields.</p><pre>using System;
using System.ComponentModel;
using System.Drawing;
using System.Windows.Forms;
namespace TC.WinForms
{
/// <summary>Represents a text box control for entering passwords.</summary>
[ToolboxItem(true), ToolboxBitmap(typeof(TextBox))]
public class PasswordTextBox : TextBox
{
/// <summary>Initializes a new instance of the <see cref="T:PasswordTextBox" /> class.</summary>
public PasswordTextBox()
{
base.UseSystemPasswordChar = true;
}
bool fAccessText;
/// <summary>Gets or sets the current text in the <see cref="T:TextBox"/>.</summary>
/// <returns>The text displayed in the control.</returns>
public override string Text
{
get
{
fAccessText = true;
try { return base.Text; }
finally { fAccessText = false; }
}
set
{
fAccessText = true;
try { base.Text = value; }
finally { fAccessText = false; }
}
}
/// <summary>Gets the length of text in the control.</summary>
/// <returns>The number of characters contained in the text of the control.</returns>
public override int TextLength
{
get
{
fAccessText = true;
try { return base.TextLength; }
finally { fAccessText = false; }
}
}
/// <summary>Processes Windows messages.</summary>
/// <param name="m">The Windows <see cref="T:Message" /> to process.</param>
protected override void WndProc(ref Message m)
{
switch (m.Msg)
{
case WM_GETTEXT:
case WM_GETTEXTLENGTH:
if (!fAccessText)
{
m.Result = IntPtr.Zero;
return;
}
else break;
case EM_SETPASSWORDCHAR: return;
}
base.WndProc(ref m);
}
const int WM_GETTEXT = 0x000D, WM_GETTEXTLENGTH = 0x000E, EM_SETPASSWORDCHAR = 0x00CC;
}
}</pre>
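<p>To make the guard-flag idea easier to see in isolation, here is a minimal, framework-free sketch. It is not the control itself: <code>GuardedText</code> and <code>HandleMessage</code> are hypothetical names that stand in for the text box and its window procedure, and the sketch only models the rule "answer WM_GETTEXT only while our own property getter is running".</p>

```csharp
using System;

// Hypothetical, framework-free model of the guard-flag technique (not a real control).
class GuardedText
{
    const int WM_GETTEXT = 0x000D;
    string stored = "";
    bool accessAllowed; // true only while our own Text getter is running

    // Stand-in for the window procedure: every read request passes through here.
    public string HandleMessage(int msg)
    {
        if (msg == WM_GETTEXT)
            return accessAllowed ? stored : ""; // outside callers get an empty answer
        return null; // message not handled by this sketch
    }

    public string Text
    {
        get
        {
            accessAllowed = true;
            try { return HandleMessage(WM_GETTEXT); }
            finally { accessAllowed = false; } // always reset, even on exceptions
        }
        set { stored = value ?? ""; }
    }
}

static class GuardDemo
{
    static void Main()
    {
        var box = new GuardedText();
        box.Text = "s3cret";
        Console.WriteLine(box.Text.Length);                  // read through the property: 6
        Console.WriteLine(box.HandleMessage(0x000D).Length); // simulated external WM_GETTEXT: 0
    }
}
```

<p>The <code>try</code>/<code>finally</code> matters here for the same reason as in the control: if the read throws, the flag must still be reset, otherwise later external requests would be answered.</p>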
<p><small><b>EDIT</b>: I added code to also ignore <a href="http://msdn2.microsoft.com/en-us/library/bb761653(VS.85).aspx">EM_SETPASSWORDCHAR</a> which can also be used to make the password visible.<br><b>EDIT 2</b> (2008-04-22): I fixed the documentation tags in the code.<br><b>EDIT 3</b> (2008-04-22): I fixed some bugs that denied access to the password from within your program.</small></p>
<div class="tags">Technorati Tags: <a href="http://technorati.com/tags/Windows+Forms" rel="tag">Windows Forms</a>, <a href="http://technorati.com/tags/WinForms" rel="tag">WinForms</a>, <a href="http://technorati.com/tags/password" rel="tag">password</a>, <a href="http://technorati.com/tags/TextBox" rel="tag">TextBox</a>, <a href="http://technorati.com/tags/security" rel="tag">security</a>, <a href="http://technorati.com/tags/CSharp" rel="tag">C#</a>.</div> Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-10201777187465449432008-04-01T08:01:00.001+02:002008-04-01T08:01:22.453+02:00Silverlight Rehab<p>I previously linked to <a href="http://tommycarlier.blogspot.com/2008/02/life-at-microsoft.html">a video made by some Microsoft employees</a>. Now they created another funny one, this time about <a href="http://www.on10.net/blogs/tina/Silverlight-Rehab/">Silverlight addicts</a>.</p> <div class="tags">Technorati Tags: <a href="http://technorati.com/tags/Microsoft" rel="tag">Microsoft</a>, <a href="http://technorati.com/tags/Channel+10" rel="tag">Channel 10</a>, <a href="http://technorati.com/tags/comedy" rel="tag">comedy</a>, <a href="http://technorati.com/tags/Silverlight" rel="tag">Silverlight</a>.</div> Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-57087007700666961082008-03-29T18:57:00.001+01:002008-03-29T18:57:52.876+01:00Opera Widget tip: zooming in<p>I just discovered something cool: you can make your <a href="http://widgets.opera.com/">Opera Widgets</a> larger by zooming in. I just opened my favorite widget (<a href="http://widgets.opera.com/widget/3903/">touchTheSky</a>: a weather forecast widget), and the text was a bit small. Usually, when I'm using Opera and I'm reading a web site with a small font, I press Ctrl and use the mouse wheel to zoom in. 
I just noticed that I unconsciously did the same with the touchTheSky widget and to my surprise it worked. Cool feature.</p> <div class="tags">Technorati Tags: <a href="http://technorati.com/tags/Opera" rel="tag">Opera</a>, <a href="http://technorati.com/tags/Widgets" rel="tag">Widgets</a>, <a href="http://technorati.com/tags/Opera+Widgets" rel="tag">Opera Widgets</a>, <a href="http://technorati.com/tags/touchTheSky" rel="tag">touchTheSky</a>.</div> Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-18777004961324357162008-02-13T07:53:00.001+01:002008-02-13T07:53:40.778+01:00Life at Microsoft<p>The guys over at <a href="http://on10.net/">Channel 10</a> made an awesome video to show what <a href="http://on10.net/blogs/tina/Life-At-Microsoft/">life at Microsoft</a> is <em>really</em> like. Very funny. <a href="http://on10.net/blogs/tina/Life-At-Microsoft/">Go watch it</a>.</p> <div class="tags">Technorati Tags: <a href="http://technorati.com/tags/Microsoft" rel="tag">Microsoft</a>, <a href="http://technorati.com/tags/Channel+10" rel="tag">Channel 10</a>, <a href="http://technorati.com/tags/comedy" rel="tag">comedy</a>.</div> Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-13179883818373603582008-02-07T07:35:00.001+01:002008-02-07T07:35:10.894+01:00Video: 12 years of Opera<p><a href="http://operawatch.com/">Daniel Goldman</a> has posted a <a href="http://operawatch.com/news/2008/02/video-opera-browser-12-year-history.html">cool little video of Opera's 12-year history</a>. 
In just 2 minutes it shows the long road <a href="http://www.opera.com/">Opera</a> has walked, the many versions of its desktop browser, mobile browser, mini browser and browser for devices.</p> <div class="tags">Technorati Tags: <a href="http://technorati.com/tags/Opera" rel="tag">Opera</a>, <a href="http://technorati.com/tags/Web+Browser" rel="tag">Web Browser</a>, <a href="http://technorati.com/tags/Opera+Mobile" rel="tag">Opera Mobile</a>, <a href="http://technorati.com/tags/Opera+Mini" rel="tag">Opera Mini</a>, <a href="http://technorati.com/tags/video" rel="tag">video</a>, <a href="http://technorati.com/tags/history" rel="tag">history</a>.</div> Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-73812393921175604192008-02-01T13:10:00.000+01:002008-02-01T13:17:14.450+01:00PowerShell + Speech APII just read <a href="http://blogs.msdn.com/marcelolr/archive/2008/01/31/say-you-say-me-say-task-complete.aspx">a blog post about using the Speech API in command-line scripts</a> on <a href="http://blogs.msdn.com/marcelolr/default.aspx">Marcelo's Weblog</a>. So I had to try it but instead of using JScript, I wrote it in <a href="http://www.microsoft.com/windowsserver2003/technologies/management/powershell/default.mspx">PowerShell</a>. It's so easy and fun:<PRE>$v = New-Object -ComObject SAPI.SpVoice
$v.Speak("Who let the dogs out?")</PRE>
<div class="tags">Technorati Tags: <a href="http://technorati.com/tag/PowerShell" rel="tag">PowerShell</a>, <a href="http://technorati.com/tag/Speech" rel="tag">Speech</a>, <a href="http://technorati.com/tag/Speech+API" rel="tag">Speech API</a>.</div>Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-62766266974076280422008-01-27T12:46:00.001+01:002008-01-27T12:46:38.818+01:00Garden Gnome Carnage<p>I just watched a video about a game called <a href="http://www.remar.se/daniel/ggc.php">Garden Gnome Carnage</a>. I haven't downloaded it yet, but it looks crazy: <a href="http://www.youtube.com/watch?v=o-o1nVjKejg">the video that explains how to play GGC</a> is pretty weird. Don't worry if you experience some <acronym title="What the Fuck?">WTF</acronym>-moments while watching it.</p> <div class="tags">Technorati Tags: <a href="http://technorati.com/tags/Garden+Gnome+Carnage" rel="tag">Garden Gnome Carnage</a>, <a href="http://technorati.com/tags/GGC" rel="tag">GGC</a>, <a href="http://technorati.com/tags/game" rel="tag">game</a>, <a href="http://technorati.com/tags/video" rel="tag">video</a>, <a href="http://technorati.com/tags/indy+games" rel="tag">indy games</a>.</div> Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-14503084857415211652007-11-11T11:46:00.001+01:002007-11-11T11:46:35.787+01:00The Making of Peggle<p><a href="http://www.popcap.com/">PopCap</a> released an article, complete with pictures and videos, about how they created their awesome game Peggle (<a title="Download Peggle" href="http://www.popcap.com/games/peggle">download</a> or <a title="Play Peggle online" href="http://www.popcap.com/games/free/peggle">play online</a>). This cool article clearly explains the entire development process of a great game. 
Go read <a href="http://www.popcap.com/extras/makingpeggle/big_idea.php">The Making of Peggle</a>.</p> <div class="tags">Technorati Tags: <a href="http://technorati.com/tags/PopCap" rel="tag">PopCap</a>, <a href="http://technorati.com/tags/Peggle" rel="tag">Peggle</a>, <a href="http://technorati.com/tags/game+development" rel="tag">game development</a>, <a href="http://technorati.com/tags/casual+gaming" rel="tag">casual gaming</a>.</div> Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-43223854515641304692007-11-08T07:45:00.001+01:002007-11-08T07:45:31.085+01:00Opera Mini 4 released<p>Opera has released <a href="http://www.operamini.com/">Opera Mini 4</a>. New features include <a href="http://link.opera.com/">Opera Link</a> (a way of synchronizing the bookmarks on your mobile phone with the bookmarks on your desktop PC), a virtual mouse cursor, enhanced Small Screen Rendering and <a title="Opera Mini features" href="http://www.operamini.com/features/">much more</a>.</p> <div class="tags">Technorati Tags: <a href="http://technorati.com/tags/Opera" rel="tag">Opera</a>, <a href="http://technorati.com/tags/Opera+Mini" rel="tag">Opera Mini</a>, <a href="http://technorati.com/tags/Web+Browser" rel="tag">Web Browser</a>, <a href="http://technorati.com/tags/Java" rel="tag">Java</a>, <a href="http://technorati.com/tags/Mobile+Phone" rel="tag">Mobile Phone</a>.</div> Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-23187031887294358162007-11-01T12:09:00.001+01:002007-11-01T12:11:26.199+01:00Cool new Opera skin<p><a title="Screenshot of new Opera skin: Vista Black and Black" href="http://files.myopera.com/TommyCarlier/albums/122489/NewOperaSkin.png"><img src="http://files.myopera.com/TommyCarlier/albums/122489/NewOperaSkin_thumb.png" align="right" border="0"></a> I just downloaded this cool new <a 
href="http://www.opera.com">Opera</a> <a href="http://my.opera.com/community/customize/skins">skin</a> called <a href="http://my.opera.com/community/customize/skins/info/?id=7451">Vista Black and Black</a>. It combines elements from Vista, IE7 and Office 2007. There are a lot of skins that emulate the Vista-look or IE7 or Office 2007, but none of them really looked fine for me. But I think this one looks awesome.</p> <div class="tags">Technorati Tags: <a href="http://technorati.com/tags/Opera" rel="tag">Opera</a>, <a href="http://technorati.com/tags/skin" rel="tag">skin</a>, <a href="http://technorati.com/tags/Vista" rel="tag">Vista</a>, <a href="http://technorati.com/tags/IE7" rel="tag">IE7</a>, <a href="http://technorati.com/tags/Office+2007" rel="tag">Office 2007</a>.</div> Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-8096804785444878132007-11-01T11:00:00.001+01:002007-11-01T11:00:08.853+01:00TC Animation Library<p>I just released the source code of a Windows Forms animation library I wrote last weekend. I first didn't want to release the source code because the animations looked terrible.</p> <p>I knew the code I wrote was good and should work, but it ran horribly. The UI would just freeze when 1 animation ran; running multiple animations simultaneously was impossible. This morning I woke up and had a mad idea: to slow down the animation. The reason the UI would freeze was because all the animation steps were performed on the UI-thread and apparently the animation system flooded the UI-thread with work to do, so the normal UI-activities (painting controls, handling input, ...) were severely delayed. 
By adding 1 line of code (<code>Thread.Sleep(5);</code>) to the animation loop, I managed to make the animations run fluently while keeping the UI responsive.</p> <p>You can find the source code in the Channel 9 Sandbox, <a title="Download the TC Animation Library source code" href="http://channel9.msdn.com/ShowPost.aspx?PostID=352665">here</a>.</p> <div class="tags">Technorati Tags: <a href="http://technorati.com/tags/Windows+Forms" rel="tag">Windows Forms</a>, <a href="http://technorati.com/tags/WinForms" rel="tag">WinForms</a>, <a href="http://technorati.com/tags/animation" rel="tag">animation</a>, <a href="http://technorati.com/tags/source+code" rel="tag">source code</a>, <a href="http://technorati.com/tags/CSharp" rel="tag">C#</a>, <a href="http://technorati.com/tags/open+source" rel="tag">open source</a>.</div> Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-17726426722256614222007-10-03T22:34:00.001+02:002007-10-03T22:34:43.106+02:00.NET Framework source code released<p>I wasn't going to blog about this, but I just find it remarkable how many people did. 
The last few hours I received an unusual amount of newsfeed messages about it: <div><a title=".NET Framework source code released" href="http://files.myopera.com/TommyCarlier/albums/122489/DotNetSourceCode.png"><img height="480" alt=".NET Framework source code released" src="http://files.myopera.com/TommyCarlier/albums/122489/DotNetSourceCode.png" width="589" border="0"></a></div> <p></p> <div class="tags">Technorati Tags: <a href="http://technorati.com/tags/.NET+Framework" rel="tag">.NET Framework</a>, <a href="http://technorati.com/tags/source+code" rel="tag">source code</a>.</div> Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-14602907727719585342007-09-03T18:55:00.001+02:002007-09-03T18:55:19.340+02:00Opera 9.5: preview of new features<p><a title="Exclusive Opera 9.5 Features & Video" href="http://cybernetnews.com/2007/09/03/cybernotes-exclusive-opera-95-features-video">CyberNet has just published an article with a video</a> that shows some of the new features of <a href="http://www.opera.com">Opera 9.5</a>. Apparently they made the rendering engine even faster than it already is, while consuming even less memory and at the same time adding a lot of CSS, JavaScript, AJAX and compatibility features. 
Wow.</p> <p>Functional new features include an easy way to open the current page in a different browser, restoring closed windows, synchronizing bookmarks with <a href="http://my.opera.com">My Opera</a>, full history search, major improvements to the mail client, etc...</p> <div class="tags">Technorati Tags: <a href="http://technorati.com/tags/Opera" rel="tag">Opera</a>, <a href="http://technorati.com/tags/Web+Browser" rel="tag">Web Browser</a>.</div>Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-43812381195577190702007-08-18T11:27:00.001+02:002007-08-18T11:27:10.557+02:00Transceiver<p>For the past 3 years, our company has been developing a new technology for secure file delivery. This week we officially released the technology called <a href="http://www.transceiver.biz/">Transceiver</a>. It works a bit like e-mail, but for transmitting files instead of mail messages. Here are some of the features:</p> <ul> <li>Designed to be secure: end-to-end security is the primary goal. <li>Designed to be private: network sniffing will not reveal who is sending files to who. <li>Uses a white-list system to avoid the problem of spam: you will only receive files from people you add to your list of <em>Trusted Addresses</em>. <li>No file size limitation (of course it's limited to the available disk space). <li>If you send a file to someone, you can track the entire transmission: you can see exactly where the file is, and when it moves from machine to machine. <li>We're developing different client applications for different scenarios. 
All these client applications can be downloaded for free: <ul> <li><img style="margin: 2ex" alt="Transceiver Communicator logo" src="http://www.transceiver.biz/images/TC.gif" align="right"><strong><a href="http://www.transceiver.biz/en-uk/products/index-Communicator.html">Transceiver Communicator</a></strong>: an interactive Windows application that works like a regular e-mail client. On top of Communicator, we've also developed add-ins for Word 2003, Word 2007, Excel 2003 and Excel 2007 that talk to Communicator. We'll also release a .NET 2.0 library for other developers to integrate with Communicator. <li><img style="margin: 2ex" alt="Transceiver Automator logo" src="http://www.transceiver.biz/images/TA.gif" align="right"><strong>Transceiver Automator</strong>: an application that runs as a Windows service and that can transmit and receive files automatically. It can be configured to handle incoming files or transmitting files to predefined destinations. <li><img style="margin: 2ex" alt="Transceiver Client Library logo" src="http://www.transceiver.biz/images/TL.gif" align="right"><strong>Transceiver Client Library</strong>: a .NET 2.0 library that acts as a Transceiver client and that can be used to communicate with a <em>Transceiver Server</em> to transmit and receive files.</li></ul></li></ul> <p>Like I said above, the clients are for free. What we will be selling, is the server software. We encourage people to try out the Transceiver technology for free: we've deployed a test server and a website where you can <a href="http://www.transceiver.biz/">create a temporary test account</a> (just select <em>Step 1</em> on the left-hand side). The only client application that can already be downloaded is <a href="http://www.transceiver.biz/en-uk/products/index-Communicator.html">Transceiver Communicator</a> (together with the Office add-ins). 
The other client applications will follow by the end of the year.</p> <div class="tags">Technorati Tags: <a href="http://technorati.com/tags/Transceiver" rel="tag">Transceiver</a>, <a href="http://technorati.com/tags/Transceiver+Communicator" rel="tag">Transceiver Communicator</a>, <a href="http://technorati.com/tags/Transceiver+Automator" rel="tag">Transceiver Automator</a>, <a href="http://technorati.com/tags/Transceiver+Client+Library" rel="tag">Transceiver Client Library</a>, <a href="http://technorati.com/tags/security" rel="tag">security</a>, <a href="http://technorati.com/tags/file+delivery" rel="tag">file delivery</a>, <a href="http://technorati.com/tags/.NET" rel="tag">.NET</a>.</div>Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-84963908790417612182007-07-01T10:38:00.001+02:002007-07-01T10:38:52.040+02:00Calctor 2.1.1<p>This is a small update to Calctor. No additional functionality has been added. I just made some modifications so you can install it on Windows Vista.</p> <p>You can download it <a title="Calctor 2.1.1 download" href="http://my.opera.com/TommyCarlier/blog/downloads">here</a>.</p> <div class="tags">Technorati Tags: <a href="http://technorati.com/tags/Calctor" rel="tag">Calctor</a>, <a href="http://technorati.com/tags/Calculator" rel="tag">Calculator</a>, <a href="http://technorati.com/tags/Freeware" rel="tag">Freeware</a>, <a href="http://technorati.com/tags/Windows+Vista" rel="tag">Windows Vista</a>.</div>Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-40056759803131397672007-06-03T12:50:00.001+02:002007-06-03T12:50:37.527+02:00New version of Windows Live Writer<p>I just installed the new version of <a href="http://windowslivewriter.spaces.live.com/">Windows Live Writer</a>. It's still beta, but it's a lot better than the previous beta version. 
It seemed for a while that development of WLW had stopped (no updates or news on the WLW blog since November from last year), but I'm glad they're still working on it. This has to be one of the coolest apps Microsoft is currently developing.</p> <div class="tags">Technorati tags: <a href="http://technorati.com/tags/Windows+Live+Writer" rel="tag">Windows Live Writer</a>, <a href="http://technorati.com/tags/blogging" rel="tag">blogging</a>.</div>Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-14077113639022680772007-05-18T16:18:00.001+02:002007-05-18T16:18:10.815+02:00Buzzword screencastI just watched a <a title="Buzzword demo" href="http://www.peapodcast.com/danbcast/buzzworddemo/">screencast of Buzzword</a> (recorded by <a href="http://danbricklin.com/log">Dan Bricklin</a>), <a href="http://www.virtub.com/">Virtual Ubiquity</a>'s not-yet-released word processor for the web that looks very promising. It's a <a href="http://www.adobe.com/products/flash/">Flash</a>/<a href="http://www.adobe.com/products/flex/">Flex</a> application that looks so much better than any web word processor I've seen before. I'll be keeping an eye on this. 
<div class="tags">Technorati tags: <a href="http://technorati.com/tags/Buzzword" rel="tag">Buzzword</a>, <a href="http://technorati.com/tags/Virtual+Ubiquity" rel="tag">Virtual Ubiquity</a>, <a href="http://technorati.com/tags/word+processor" rel="tag">word processor</a>, <a href="http://technorati.com/tags/Flash" rel="tag">Flash</a>, <a href="http://technorati.com/tags/Flex" rel="tag">Flex</a>.</div>Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-25523520853134234832007-05-06T12:42:00.001+02:002007-05-06T12:42:25.316+02:00slimCODE against diabetesI just read on <a title="slimCODE blog" href="http://www.slimcode.com:80/cs/blogs/default.aspx">Martin Plante's blog</a> that he's <a href="http://www.slimcode.com:80/cs/blogs/martin/archive/2007/05/06/help-fight-diabetes-buy-slimkeys.aspx">donating this week's earnings of his slimKEYS application</a> to <a href="http://www.hanselman.com/blog/TeamHanselmanAndDiabetesWalk2007.aspx">Team Hanselman</a> to fight diabetes. slimKEYS is a cool product, and this is a great initiative. 
<div class="tags">Technorati tags: <a href="http://technorati.com/tags/slimCODE" rel="tag">slimCODE</a>, <a href="http://technorati.com/tags/slimKEYS" rel="tag">slimKEYS</a>, <a href="http://technorati.com/tags/diabetes" rel="tag">diabetes</a>, <a href="http://technorati.com/tags/Team+Hanselman" rel="tag">Team Hanselman</a>.</div>Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-59350098329469369652007-05-03T22:11:00.001+02:002007-05-03T22:11:06.345+02:00Writing a parser: overviewHere's a quick overview of the different blog posts in the series <em>Writing a parser</em>: <ul> <li><a href="http://tommycarlier.blogspot.com/2007/04/writing-parser.html">Introduction</a> <li><a href="http://tommycarlier.blogspot.com/2007/04/writing-parser-basic-terminology.html">Basic Terminology</a> <li><a href="http://tommycarlier.blogspot.com/2007/04/writing-parser-introduction-to-adl.html">Introduction to ADL</a> <li><a href="http://tommycarlier.blogspot.com/2007/04/writing-parser-base-vs2005-solution.html">Base VS2005 solution</a> <li><a href="http://tommycarlier.blogspot.com/2007/04/writing-parser-adl-tokens.html">ADL tokens</a> <li><a href="http://tommycarlier.blogspot.com/2007/04/writing-parser-adl-tokenizer.html">ADL Tokenizer</a> <li><a href="http://tommycarlier.blogspot.com/2007/04/writing-parser-adl-tokenizer-correction.html">ADL Tokenizer correction</a> <li><a href="http://tommycarlier.blogspot.com/2007/04/writing-parser-adl-parser-node-types.html">ADL parser node types</a> <li><a href="http://tommycarlier.blogspot.com/2007/05/writing-parser-adl-parser-part-1.html">ADL Parser - part 1</a> <li><a href="http://tommycarlier.blogspot.com/2007/05/writing-parser-adl-parser-part-2.html">ADL Parser - part 2</a></li></ul> <div class="tags">Technorati tags: <a href="http://technorati.com/tags/Parser" rel="tag">Parser</a>, <a href="http://technorati.com/tags/ADL" rel="tag">ADL</a>, <a 
href="http://technorati.com/tags/Acronymic+Demonstrational+Language" rel="tag">Acronymic Demonstrational Language</a>, <a href="http://technorati.com/tags/Programming+Language" rel="tag">Programming Language</a>, <a href="http://technorati.com/tags/Visual+Studio" rel="tag">Visual Studio</a>, <a href="http://technorati.com/tags/CSharp" rel="tag">C#</a>.</div>Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-42195122306750431842007-05-03T22:03:00.001+02:002007-05-03T22:03:19.095+02:00Writing a parser: ADL Parser - part 2In <a title="Writing a parser: ADL Parser - part 1" href="http://tommycarlier.blogspot.com/2007/05/writing-parser-adl-parser-part-1.html">the previous part</a> we wrote the first Parser methods; now we'll write the methods for parsing expressions. In the <a href="http://tommycarlier.blogspot.com/2007/04/writing-parser-introduction-to-adl.html">introduction to ADL</a>, I showed the order in which expressions are evaluated. To parse expressions, we'll reverse that order: we start with the operator that has the lowest precedence and treat every expression as an AND-expression: <pre>Expression ParseExpression()
{
return ParseAndExpression();
}</pre>An AND-expression consists of one or more OR-expressions, separated by <code>and</code> operators: <pre>Expression ParseAndExpression()
{
Expression lNode = ParseOrExpression();
while (!AtEndOfSource && fCurrentToken.Equals(TokenType.Word, "and"))
{
ReadNextToken(); // skip 'and'
lNode = new AndExpression(lNode, ParseOrExpression());
}
return lNode;
}</pre>An OR-expression consists of one or more comparisons, separated by <code>or</code> operators: <pre>Expression ParseOrExpression()
{
Expression lNode = ParseComparison();
while (!AtEndOfSource && fCurrentToken.Equals(TokenType.Word, "or"))
{
ReadNextToken(); // skip 'or'
lNode = new OrExpression(lNode, ParseComparison());
}
return lNode;
}</pre>A comparison is an additive expression, or 2 additive expressions separated by a comparison operator: <pre>Expression ParseComparison()
{
Expression lNode = ParseAdditiveExpression();
if (!AtEndOfSource && fCurrentToken.Type == TokenType.Symbol)
{
ComparisonOperator lOperator;
switch (fCurrentToken.Value)
{
case "==": lOperator = ComparisonOperator.Equal; break;
case "<>": lOperator = ComparisonOperator.NotEquals; break;
case "<": lOperator = ComparisonOperator.LessThan; break;
case ">": lOperator = ComparisonOperator.GreaterThan; break;
case "<=": lOperator = ComparisonOperator.LessThanOrEqual; break;
case ">=": lOperator = ComparisonOperator.GreaterThanOrEqual; break;
default: return lNode;
}
ReadNextToken(); // skip comparison operator
return new Comparison(lOperator, lNode, ParseAdditiveExpression());
}
else return lNode;
}</pre>An additive expression consists of one or more multiplicative expressions, separated by <code>+</code> or <code>-</code> operators: <pre>Expression ParseAdditiveExpression()
{
Expression lNode = ParseMultiplicativeExpression();
while (!AtEndOfSource)
{
if (fCurrentToken.Equals(TokenType.Symbol, "+"))
{
ReadNextToken(); // skip '+'
lNode = new Addition(lNode, ParseMultiplicativeExpression());
}
else if (fCurrentToken.Equals(TokenType.Symbol, "-"))
{
ReadNextToken(); // skip '-'
lNode = new Subtraction(lNode, ParseMultiplicativeExpression());
}
else break;
}
return lNode;
}</pre>A multiplicative expression consists of one or more unary expressions, separated by <code>*</code> or <code>/</code> operators: <pre>Expression ParseMultiplicativeExpression()
{
Expression lNode = ParseUnaryExpression();
while (!AtEndOfSource)
{
if (fCurrentToken.Equals(TokenType.Symbol, "*"))
{
ReadNextToken(); // skip '*'
lNode = new Multiplication(lNode, ParseUnaryExpression());
}
else if (fCurrentToken.Equals(TokenType.Symbol, "/"))
{
ReadNextToken(); // skip '/'
lNode = new Division(lNode, ParseUnaryExpression());
}
else break;
}
return lNode;
}</pre>A unary expression is a base expression, optionally prefixed by a <code>-</code>, <code>+</code> or <code>not</code> operator: <pre>Expression ParseUnaryExpression()
{
CheckForUnexpectedEndOfSource();
if (fCurrentToken.Equals(TokenType.Symbol, "-"))
{
ReadNextToken(); // skip '-'
return new Negation(ParseBaseExpression());
}
else if (fCurrentToken.Equals(TokenType.Word, "not"))
{
ReadNextToken(); // skip 'not'
return new NotExpression(ParseBaseExpression());
}
else if (fCurrentToken.Equals(TokenType.Symbol, "+"))
ReadNextToken(); // skip '+'
return ParseBaseExpression();
}</pre>A base expression is either an integer constant, a string constant, a variable, a function call or a group expression: <pre>Expression ParseBaseExpression()
{
CheckForUnexpectedEndOfSource();
switch(fCurrentToken.Type)
{
case TokenType.Integer: return ParseIntegerConstant();
case TokenType.String: return ParseStringConstant();
case TokenType.Word: return ParseVariableOrFunctionCall();
default: // TokenType.Symbol
if (fCurrentToken.Value == "(")
return ParseGroupExpression();
else throw new ParserException("Expected an expression.");
}
}</pre>A group expression is an expression between parentheses: <pre>Expression ParseGroupExpression()
{
ReadNextToken(); // skip '('
Expression lExpression = ParseExpression();
SkipExpected(TokenType.Symbol, ")"); // skip ')'
return lExpression;
}</pre>A variable and a function call both start with an identifier. To disambiguate, we'll have to read the next token: <pre>Expression ParseVariableOrFunctionCall()
{
string lName = fCurrentToken.Value;
ReadNextToken();
if (!AtEndOfSource && fCurrentToken.Equals(TokenType.Symbol, "("))
return ParseFunctionCall(lName);
else return new Variable(lName);
}</pre>A string constant is simply the value of the current string token: <pre>Expression ParseStringConstant()
{
string lValue = fCurrentToken.Value;
ReadNextToken(); // skip string constant
return new StringConstant(lValue);
}</pre>An integer constant has to be parsed from the string value: <pre>Expression ParseIntegerConstant()
{
int lValue;
if (int.TryParse(fCurrentToken.Value, out lValue))
{
ReadNextToken(); // skip integer constant
return new IntegerConstant(lValue);
}
else throw new ParserException("Invalid integer constant " + fCurrentToken.Value);
}</pre>A function call consists of an identifier, followed by zero or more arguments between parentheses: <pre>FunctionCall ParseFunctionCall(string name)
{
ReadNextToken(); // skip '('
CheckForUnexpectedEndOfSource();
List<Expression> lArguments = new List<Expression>();
if (!fCurrentToken.Equals(TokenType.Symbol, ")"))
{
lArguments.Add(ParseExpression());
CheckForUnexpectedEndOfSource();
while (fCurrentToken.Equals(TokenType.Symbol, ","))
{
ReadNextToken(); // skip ','
lArguments.Add(ParseExpression());
CheckForUnexpectedEndOfSource();
}
if (!fCurrentToken.Equals(TokenType.Symbol, ")"))
throw new ParserException("Expected ')'.");
}
ReadNextToken(); // skip ')'
return new FunctionCall(name, lArguments.ToArray());
}</pre>Now our parser is ready. To test the parser, I've created a test application that reads code from a TextBox, parses it and shows the parse tree in a TreeView. I'm not going to show the source code of the test application in this post, but you can download the complete Visual Studio 2005 solution with both projects (TC.Adl and TC.Adl.Test) <a href="http://my.opera.com/TommyCarlier/blog/writing-a-parser-downloads">here</a>.
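<p>To see how the unary, base and group rules call each other, here is a minimal, self-contained sketch. It is not the TC.Adl parser: the class <code>UnarySketch</code>, its pre-split token array and the <code>int</code> result are assumptions made for illustration, and it only knows integers, parentheses and the prefixes <code>-</code> and <code>+</code>.</p>

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the parser: it walks a pre-split token list
// and computes an int instead of building Expression nodes.
static class UnarySketch
{
    static List<string> fTokens;
    static int fPosition;

    // The current token, or null at the end of the source.
    static string Current
    {
        get { return fPosition < fTokens.Count ? fTokens[fPosition] : null; }
    }

    public static int Evaluate(params string[] tokens)
    {
        fTokens = new List<string>(tokens);
        fPosition = 0;
        int lValue = ParseUnary();
        if (Current != null)
            throw new FormatException("Unexpected token '" + Current + "'.");
        return lValue;
    }

    // unary := [ '-' | '+' ] base
    static int ParseUnary()
    {
        if (Current == "-") { fPosition++; return -ParseBase(); }
        if (Current == "+") fPosition++; // a leading '+' changes nothing
        return ParseBase();
    }

    // base := integer | '(' unary ')'
    static int ParseBase()
    {
        if (Current == null)
            throw new FormatException("Unexpected end of source.");
        if (Current == "(")
        {
            fPosition++; // skip '('
            int lInner = ParseUnary();
            if (Current != ")") throw new FormatException("Expected ')'.");
            fPosition++; // skip ')'
            return lInner;
        }
        int lValue;
        if (!int.TryParse(Current, out lValue))
            throw new FormatException("Expected an expression.");
        fPosition++; // skip integer constant
        return lValue;
    }
}
```

<p>For example, <code>UnarySketch.Evaluate("-", "(", "+", "42", ")")</code> walks unary, base, group, unary, base. As in the grammar of this post, a prefix operator applies to a base expression, so the sketch rejects two prefixes in a row such as <code>- - 1</code>.</p>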
<div class="tags">Technorati tags: <a href="http://technorati.com/tags/Parser" rel="tag">Parser</a>, <a href="http://technorati.com/tags/ADL" rel="tag">ADL</a>, <a href="http://technorati.com/tags/Acronymic+Demonstrational+Language" rel="tag">Acronymic Demonstrational Language</a>, <a href="http://technorati.com/tags/Programming+Language" rel="tag">Programming Language</a>, <a href="http://technorati.com/tags/Visual+Studio" rel="tag">Visual Studio</a>, <a href="http://technorati.com/tags/CSharp" rel="tag">C#</a>.</div>Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-30315600263658707552007-05-03T21:00:00.001+02:002007-05-03T21:00:31.491+02:00Writing a parser: ADL Parser - part 1We'll now write the Parser class: <pre>using System;
using System.Collections.Generic;
using System.Text;
using System.IO;
using TC.Adl.ParserNodes;
namespace TC.Adl
{
public class Parser
{
Tokenizer fTokenizer;
Token fCurrentToken;
}
}</pre>Our Parser class has only 2 fields:
<dl>
<dt><code>fTokenizer</code>
<dd>The Tokenizer to read tokens from.
<dt><code>fCurrentToken</code>
<dd>The current token (most recently read). </dd></dl>The constructor of the Parser class will accept a TextReader argument, create a Tokenizer that uses that TextReader, store it in <code>fTokenizer</code> and read the first token: <pre>public Parser(TextReader source)
{
if (source == null) throw new ArgumentNullException("source");
fTokenizer = new Tokenizer(source);
ReadNextToken();
}</pre>
<p>Now we'll add some private helper methods.</p>Reading a token is simple: just call <code>Tokenizer.ReadNextToken()</code>, which returns a <code>Token</code>, or <code>null</code> at the end of the source code. <pre>void ReadNextToken() { fCurrentToken = fTokenizer.ReadNextToken(); }</pre>To determine if we're at the end of the source, we just have to check the current token for <code>null</code>: <pre>bool AtEndOfSource { get { return fCurrentToken == null; } }</pre>We'll need a method that throws an exception when the end of the source has been reached unexpectedly: <pre>void CheckForUnexpectedEndOfSource()
{
if (AtEndOfSource)
throw new ParserException("Unexpected end of source.");
}</pre>We'll also need a method that verifies the current token and skips it: <pre>void SkipExpected(TokenType type, string value)
{
CheckForUnexpectedEndOfSource();
if (!fCurrentToken.Equals(type, value))
throw new ParserException("Expected '" + value + "'.");
ReadNextToken();
}</pre>Now that we've written the private helper methods, we can write the only public method: the <code>ReadNextStatement</code> method. This method reads a statement and returns it. If we've reached the end of the source, we'll return null; otherwise we'll check the first token to determine the <a href="http://tommycarlier.blogspot.com/2007/04/writing-parser-adl-parser-node-types.html">type of statement</a>:
<ul>
<li>If the current token is the word <code>if</code>, it's an if-statement.
<li>If the current token is the word <code>while</code>, it's a while-statement.
<li>If the current token is the word <code>for</code>, it's a for-statement.
<li>If it's any other word, we assume it's an assignment or a function call. </li></ul><pre>public Statement ReadNextStatement()
{
if (AtEndOfSource)
return null;
// all the statements start with a word
if (fCurrentToken.Type != TokenType.Word)
throw new ParserException("Expected a statement.");
if (fCurrentToken.Value == "if")
return ParseIfStatement();
if (fCurrentToken.Value == "while")
return ParseWhileStatement();
if (fCurrentToken.Value == "for")
return ParseForStatement();
return ParseAssignmentOrFunctionCallStatement();
}</pre>An if-statement starts with the word <code>if</code>, followed by a condition, the word <code>then</code>, a block of statements, an optional block of statements prefixed with the word <code>else</code> and the words <code>end if</code>: <pre>IfStatement ParseIfStatement()
{
ReadNextToken(); // skip 'if'
Expression lCondition = ParseExpression();
SkipExpected(TokenType.Word, "then"); // skip 'then'
List<Statement> lTrueStatements = new List<Statement>();
List<Statement> lFalseStatements = new List<Statement>();
List<Statement> lStatements = lTrueStatements;
Statement lStatement;
CheckForUnexpectedEndOfSource();
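// fCurrentToken is null at the end of the source: keep looping in that case,
// so that ReadNextStatement() returns null and the exception below is thrown
while (AtEndOfSource || !fCurrentToken.Equals(TokenType.Word, "end"))
{
if (!AtEndOfSource && fCurrentToken.Equals(TokenType.Word, "else"))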
{
ReadNextToken(); // skip 'else'
CheckForUnexpectedEndOfSource();
lStatements = lFalseStatements;
}
if ((lStatement = ReadNextStatement()) != null)
lStatements.Add(lStatement);
else throw new ParserException("Unexpected end of source.");
}
ReadNextToken(); // skip 'end'
SkipExpected(TokenType.Word, "if"); // skip 'if'
return new IfStatement(lCondition
, new StatementCollection(lTrueStatements)
, new StatementCollection(lFalseStatements));
}</pre>A while-statement starts with the word <code>while</code>, followed by a condition, the word <code>do</code>, a block of statements and the words <code>end while</code>: <pre>WhileStatement ParseWhileStatement()
{
ReadNextToken(); // skip 'while'
Expression lCondition = ParseExpression();
SkipExpected(TokenType.Word, "do"); // skip 'do'
List<Statement> lStatements = new List<Statement>();
Statement lStatement;
CheckForUnexpectedEndOfSource();
// fCurrentToken is null at the end of the source; let the loop body report it
while (AtEndOfSource || !fCurrentToken.Equals(TokenType.Word, "end"))
{
if ((lStatement = ReadNextStatement()) != null)
lStatements.Add(lStatement);
else throw new ParserException("Unexpected end of source.");
}
ReadNextToken(); // skip 'end'
SkipExpected(TokenType.Word, "while"); // skip 'while'
return new WhileStatement(lCondition, new StatementCollection(lStatements));
}</pre>A for-statement starts with the word <code>for</code>, followed by a variable, the symbol <code>:=</code>, a start-value, the word <code>to</code>, an end-value, optionally the word <code>by</code> with a step-size, the word <code>do</code>, a block of statements and the words <code>end for</code>: <pre>ForStatement ParseForStatement()
{
ReadNextToken(); // skip 'for'
CheckForUnexpectedEndOfSource();
if (fCurrentToken.Type != TokenType.Word)
throw new ParserException("Expected a variable.");
Variable lVariable = new Variable(fCurrentToken.Value);
ReadNextToken();
SkipExpected(TokenType.Symbol, ":="); // skip ':='
Expression lStartValue = ParseExpression();
SkipExpected(TokenType.Word, "to"); // skip 'to'
Expression lEndValue = ParseExpression();
CheckForUnexpectedEndOfSource();
Expression lStepSize;
if (fCurrentToken.Equals(TokenType.Word, "by"))
{
ReadNextToken(); // skip 'by'
lStepSize = ParseExpression();
}
else lStepSize = new IntegerConstant(1);
SkipExpected(TokenType.Word, "do");
List<Statement> lStatements = new List<Statement>();
Statement lStatement;
CheckForUnexpectedEndOfSource();
// fCurrentToken is null at the end of the source; let the loop body report it
while (AtEndOfSource || !fCurrentToken.Equals(TokenType.Word, "end"))
{
if ((lStatement = ReadNextStatement()) != null)
lStatements.Add(lStatement);
else throw new ParserException("Unexpected end of source.");
}
ReadNextToken(); // skip 'end'
SkipExpected(TokenType.Word, "for"); // skip 'for'
return new ForStatement(lVariable, lStartValue, lEndValue, lStepSize, new StatementCollection(lStatements));
}</pre>An assignment and a function call statement both start with an identifier, so we'll have to read the next token to determine if it's an assignment or a function call statement: <pre>Statement ParseAssignmentOrFunctionCallStatement()
{
Token lToken = fCurrentToken;
ReadNextToken();
CheckForUnexpectedEndOfSource();
if (fCurrentToken.Equals(TokenType.Symbol, ":="))
return ParseAssignment(new Variable(lToken.Value));
if (fCurrentToken.Equals(TokenType.Symbol, "("))
return new FunctionCallStatement(ParseFunctionCall(lToken.Value));
throw new ParserException("Expected a statement.");
}</pre>An assignment just has an expression after the <code>:=</code>: <pre>Assignment ParseAssignment(Variable variable)
{
ReadNextToken(); // skip ':='
return new Assignment(variable, ParseExpression());
}</pre>In the next post, we'll write the methods for parsing expressions.
<div class="tags">Technorati tags: <a href="http://technorati.com/tags/Parser" rel="tag">Parser</a>, <a href="http://technorati.com/tags/ADL" rel="tag">ADL</a>, <a href="http://technorati.com/tags/Acronymic+Demonstrational+Language" rel="tag">Acronymic Demonstrational Language</a>, <a href="http://technorati.com/tags/Programming+Language" rel="tag">Programming Language</a>, <a href="http://technorati.com/tags/Visual+Studio" rel="tag">Visual Studio</a>, <a href="http://technorati.com/tags/CSharp" rel="tag">C#</a>.</div>Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-10940847851717326982007-04-28T18:28:00.001+02:002007-04-28T18:28:16.225+02:00Writing a parser: ADL parser node types<p>In this post, I'll show you the different types of parser nodes. If you remember from <a title="Writing a parser: basic terminology" href="http://tommycarlier.blogspot.com/2007/04/writing-parser-basic-terminology.html">one of the first posts I made in this series</a>, the parser analyzes the tokens it gets from the tokenizer and turns them into a structured tree. The nodes in this tree represent the different constructs (additions, multiplications, comparisons, ...).</p> <p><a href="http://files.myopera.com/TommyCarlier/albums/248390/ParserNodes.png"><img height="117" alt="The different types of parser tree nodes" src="http://files.myopera.com/TommyCarlier/albums/248390/ParserNodes.png" width="594" border="0"></a></p> <ul> <li><em>ParserNode</em> <ul> <li><em>Statement</em> <ul> <li>Assignment: <code>x := ...</code> <li>IfStatement: <code>if ... then ... else ... end if</code> <li>WhileStatement: <code>while ... do ... end while</code> <li>ForStatement: <code>for i := ... to ... by ... do ... 
end for</code> <li>FunctionCallStatement: <code>f(...)</code></li></ul> <li><em>Expression</em> <ul> <li><em>UnaryExpression</em> <ul> <li>Negation: <code>-x</code> <li>NotExpression: <code>not x</code></li></ul> <li><em>BinaryExpression</em> <ul> <li>Addition: <code>x + y</code> <li>Subtraction: <code>x - y</code> <li>Multiplication: <code>x * y</code> <li>Division: <code>x / y</code> <li>AndExpression: <code>x and y</code> <li>OrExpression: <code>x or y</code> <li>Comparison: <code>x < y</code> or <code>x > y</code> or <code>x == y</code> or ...</li></ul> <li>Variable: <code>x</code> <li>IntegerConstant: <code>123</code> <li>StringConstant: <code>"abc"</code> <li>FunctionCall: <code>f(...)</code></li></ul></li></ul></li></ul> <p>A <strong>statement</strong> is an instruction that does not return a value. An <strong>expression</strong> is an instruction that does return a value. A <strong>unary expression</strong> has an operator and a single child expression. A <strong>binary expression</strong> has an operator and 2 child expressions.</p> <p>You might have noticed that I defined both a FunctionCall and a FunctionCallStatement. The difference is that a FunctionCall is an expression that returns the value of the function call, and a FunctionCallStatement is a statement that ignores the value of the function call.</p>The basic types look like this: <pre><span class="keyword">public abstract class</span> ParserNode
{
}
<span class="keyword">public abstract class</span> Statement : ParserNode
{
}
<span class="keyword">public abstract class</span> Expression : ParserNode
{
}</pre>As you can see, they don't contain any code. Because I'm just explaining how parsers work, they don't have to contain code. You should add code if you use these classes to build an interpreter or a compiler. For an interpreter, the <em>Statement</em> class could have a method <em>Execute</em> to execute the statement, and the <em>Expression</em> class could have a method <em>Evaluate</em> to evaluate the expression and return the result.
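<p>As a rough idea of what that could look like for an interpreter, here is a small sketch of an <em>Evaluate</em> method on a few expression nodes. The method, the constructors and the <code>int</code> return type are assumptions for this sketch, not the downloadable classes, and the hierarchy is flattened: it leaves out <em>ParserNode</em>, <em>UnaryExpression</em> and <em>BinaryExpression</em> to stay short.</p>

```csharp
using System;

// Hypothetical interpreter-flavoured variants of the node classes.
// Evaluate() and the constructors are assumptions made for this sketch.
public abstract class Expression
{
    // Computes the value of this expression (integers only in this sketch).
    public abstract int Evaluate();
}

public sealed class IntegerConstant : Expression
{
    readonly int fValue;
    public IntegerConstant(int value) { fValue = value; }
    public override int Evaluate() { return fValue; }
}

public sealed class Negation : Expression
{
    readonly Expression fOperand;
    public Negation(Expression operand) { fOperand = operand; }
    // Evaluates the child expression first, then negates the result.
    public override int Evaluate() { return -fOperand.Evaluate(); }
}

public sealed class Addition : Expression
{
    readonly Expression fLeft, fRight;
    public Addition(Expression left, Expression right) { fLeft = left; fRight = right; }
    // Evaluates both child expressions and adds the results.
    public override int Evaluate() { return fLeft.Evaluate() + fRight.Evaluate(); }
}
```

<p>Evaluating a tree is then a single call on its root: each node asks its children for their values and combines them, so the tree for <code>2 + -5</code> is built as <code>new Addition(new IntegerConstant(2), new Negation(new IntegerConstant(5)))</code>.</p>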
<p>There's no point in showing the source code in this blog post for all the classes, so you can <a href="http://my.opera.com/TommyCarlier/blog/writing-a-parser-downloads">download them here</a>. Just create a new folder <em>ParserNodes</em> in the <em>TC.Adl</em> project and include the downloaded C# files.</p>
<p>Next time, we'll write the actual parser and a test application.</p>
<div class="tags">Technorati tags: <a href="http://technorati.com/tags/Parser" rel="tag">Parser</a>, <a href="http://technorati.com/tags/ADL" rel="tag">ADL</a>, <a href="http://technorati.com/tags/Acronymic+Demonstrational+Language" rel="tag">Acronymic Demonstrational Language</a>, <a href="http://technorati.com/tags/Programming+Language" rel="tag">Programming Language</a>, <a href="http://technorati.com/tags/Visual+Studio" rel="tag">Visual Studio</a>, <a href="http://technorati.com/tags/CSharp" rel="tag">C#</a>.</div>Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-25322005878768979962007-04-25T07:33:00.001+02:002007-04-25T07:33:13.502+02:00Writing a parser: ADL Tokenizer correction<a href="http://www.blogger.com/profile/11160553154275818207">Sushovan</a> made a great remark on <a title="Writing a parser: ADL Tokenizer" href="http://tommycarlier.blogspot.com/2007/04/writing-parser-adl-tokenizer.html">my last post</a>. He asked me why I didn't use the method <code>char.IsWhiteSpace()</code> to test for white-space. The reason I didn't was ignorance: I had totally forgotten about that method. So it's probably a good idea to change the method <code>SkipWhitespace</code> to this: <pre><span class="keyword">void</span> SkipWhitespace()
{
<span class="keyword">while</span> (<span class="keyword">char</span>.IsWhiteSpace(fCurrentChar))
ReadNextChar();
}</pre>Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-6383905424221439282007-04-24T21:30:00.001+02:002007-04-24T21:30:12.379+02:00Writing a parser: ADL Tokenizer<p>We'll now start writing the tokenizer.</p> <p>One thing I haven't explained yet is how we will deal with white-space. The easiest solution is to just skip the white-space between tokens, and that's what we'll do. Usually white-space is defined as one of the following characters: space (<code>' '</code>), tab (<code>'\t'</code>), new-line (<code>'\n'</code>) or carriage-return (<code>'\r'</code>). But I've noticed that any character below space (<code>' '</code>) is never used as text. So we can just consider any character <= <code>' '</code> to be white-space, which makes it easier to test a character.</p> <p>Because we want our tokenizer to read characters from strings as well as streams, we'll use a <a href="http://msdn2.microsoft.com/en-us/library/system.io.textreader.aspx">TextReader</a>. TextReader has 2 derived classes: <a href="http://msdn2.microsoft.com/en-us/library/system.io.stringreader.aspx">StringReader</a> (to read text from a string) and <a href="http://msdn2.microsoft.com/en-us/library/system.io.streamreader.aspx">StreamReader</a> (to read text from a stream). The only method in TextReader we will use is <a href="http://msdn2.microsoft.com/en-us/library/0w3csw16.aspx">Read()</a>, which returns the next character or -1 if the end of the stream or string has been reached.</p> <p>To extract a token, we read the first character and decide what type of token it will be. For each token, we will have to store the characters it contains. We'll use a <a href="http://msdn2.microsoft.com/en-us/library/system.text.stringbuilder.aspx">StringBuilder</a> for this.</p><pre><span class="keyword">using</span> System;
<span class="keyword">using</span> System.Collections.Generic;
<span class="keyword">using</span> System.Text;
<span class="keyword">using</span> System.IO;
<span class="keyword">namespace</span> TC.Adl
{
<span class="keyword">public class</span> Tokenizer
{
TextReader fSource;
<span class="keyword">char</span> fCurrentChar;
StringBuilder fTokenValueBuffer;
}
}</pre>Our Tokenizer class has only 3 fields:
<dl>
<dt><code>fSource</code>
<dd>The TextReader to read characters from.
<dt><code>fCurrentChar</code>
<dd>The current character (most recently read).
<dt><code>fTokenValueBuffer</code>
<dd>The StringBuilder to store the characters of the current token.</dd></dl>The constructor of the Tokenizer class will accept a TextReader argument, store it in <code>fSource</code>, initialize <code>fTokenValueBuffer</code> and read the first character: <pre><span class="keyword">public</span> Tokenizer(TextReader source)
{
<span class="keyword">if</span> (source == <span class="keyword">null</span>) <span class="keyword">throw new</span> ArgumentNullException(<span class="string">"source"</span>);
fSource = source;
fTokenValueBuffer = <span class="keyword">new</span> StringBuilder();
ReadNextChar();
}</pre>
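<p>Because the constructor only asks for a TextReader, the same reading code works for strings and for streams. The sketch below illustrates just that idea: the class <code>TextReaderSketch</code> and its method names are made up for this example and are not part of TC.Adl.</p>

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class that shows TextReader.Read() in isolation.
static class TextReaderSketch
{
    // Reads every character from any TextReader, one at a time,
    // and stops when Read() returns -1 (the end of the text).
    public static string ReadAll(TextReader source)
    {
        StringBuilder lBuffer = new StringBuilder();
        int lChar;
        while ((lChar = source.Read()) != -1)
            lBuffer.Append((char)lChar);
        return lBuffer.ToString();
    }

    // Source code held in a string: wrap it in a StringReader.
    public static string FromString(string text)
    {
        using (StringReader lReader = new StringReader(text))
            return ReadAll(lReader);
    }

    // Source code in a file or other stream: wrap it in a StreamReader.
    // UTF-8 is an assumption here; pick the encoding your files use.
    public static string FromStream(Stream stream)
    {
        using (StreamReader lReader = new StreamReader(stream, Encoding.UTF8))
            return ReadAll(lReader);
    }
}
```

<p>A Tokenizer can be handed either kind of reader in the same way, for example <code>new Tokenizer(new StringReader("x := 1"))</code>.</p>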
<p>Now we'll add some private helper methods.</p>Reading a character is simple. Just call <code>TextReader.Read()</code>, which returns an <code>int</code>. Because a <code>char</code> cannot be -1 (which is returned by <code>TextReader.Read()</code> at the end of the stream), we will use character <code>'\0'</code> as the end-of-stream character. <pre><span class="keyword">void</span> ReadNextChar()
{
<span class="keyword">int</span> lChar = fSource.Read();
<span class="keyword">if</span> (lChar > 0)
fCurrentChar = (<span class="keyword">char</span>)lChar;
<span class="keyword">else</span> fCurrentChar = <span class="string">'\0'</span>;
}</pre>Skipping white-space is also easy: just keep reading characters until we get a character that is not white-space: <pre><span class="keyword">void</span> SkipWhitespace()
{
<span class="keyword">while</span> (fCurrentChar > <span class="string">'\0'</span> && fCurrentChar <= <span class="string">' '</span>)
ReadNextChar();
}</pre>To determine if we're at the end of the source, we just have to check the current character for <code>'\0'</code>: <pre><span class="keyword">bool</span> AtEndOfSource { <span class="keyword">get</span> { <span class="keyword">return</span> fCurrentChar == <span class="string">'\0'</span>; } }</pre>We'll also write a simple method to store the current character and read the next: <pre><span class="keyword">void</span> StoreCurrentCharAndReadNext()
{
fTokenValueBuffer.Append(fCurrentChar);
ReadNextChar();
}</pre>And a method to extract the stored characters and clear the buffer: <pre><span class="keyword">string</span> ExtractStoredChars()
{
<span class="keyword">string</span> lValue = fTokenValueBuffer.ToString();
fTokenValueBuffer.Length = 0;
<span class="keyword">return</span> lValue;
}</pre>Because the source code can have errors, we'll need some methods that throw exceptions: <pre><span class="keyword">void</span> CheckForUnexpectedEndOfSource()
{
<span class="keyword">if</span> (AtEndOfSource)
<span class="keyword">throw new</span> ParserException(<span class="string">"Unexpected end of source."</span>);
}
<span class="keyword">void</span> ThrowInvalidCharException()
{
<span class="keyword">if</span> (fTokenValueBuffer.Length == 0)
<span class="keyword">throw new</span> ParserException(<span class="string">"Invalid character '"</span> + fCurrentChar.ToString() + <span class="string">"'."</span>);
<span class="keyword">else</span>
{
<span class="keyword">throw new</span> ParserException(<span class="string">"Invalid character '"</span>
+ fCurrentChar.ToString() + <span class="string">"' after '"</span>
+ fTokenValueBuffer.ToString() + <span class="string">"'."</span>);
}
}</pre>This introduces our ParserException class (a standard exception class): <pre><span class="keyword">using</span> System;
<span class="keyword">using</span> System.Collections.Generic;
<span class="keyword">using</span> System.Text;
<span class="keyword">using</span> System.Runtime.Serialization;
<span class="keyword">namespace</span> TC.Adl
{
[Serializable]
<span class="keyword">public class</span> ParserException : Exception
{
<span class="keyword">public</span> ParserException() { }
<span class="keyword">public</span> ParserException(<span class="keyword">string</span> message) : <span class="keyword">base</span>(message) { }
<span class="keyword">public</span> ParserException(<span class="keyword">string</span> message, Exception innerException) : <span class="keyword">base</span>(message, innerException) { }
<span class="keyword">protected</span> ParserException(SerializationInfo info, StreamingContext context) : <span class="keyword">base</span>(info, context) { }
}
}</pre>Now that we've written the private helper methods, we can write the only public method: the <code>ReadNextToken</code> method. This method reads a token and returns it. First we skip the initial white-space. If we've reached the end of the source, we'll return null, else we'll check the first character to determine the <a href="http://tommycarlier.blogspot.com/2007/04/writing-parser-adl-tokens.html">type of token</a>:
<ul>
<li>If the token starts with a letter, it's a word.
<li>If the token starts with a digit, it's an integer constant.
<li>If the token starts with a quote, it's a string constant.
<li>If it's any other character, we assume it's a symbol. </li></ul><pre><span class="keyword">public</span> Token ReadNextToken()
{
SkipWhitespace();
<span class="keyword">if</span> (AtEndOfSource)
<span class="keyword">return null</span>;
<span class="keyword">if</span> (<span class="keyword">char</span>.IsLetter(fCurrentChar))
<span class="keyword">return</span> ReadWord();
<span class="keyword">if</span> (<span class="keyword">char</span>.IsDigit(fCurrentChar))
<span class="keyword">return</span> ReadIntegerConstant();
<span class="keyword">if</span> (fCurrentChar == <span class="string">'"'</span>)
<span class="keyword">return</span> ReadStringConstant();
<span class="keyword">return</span> ReadSymbol();
}</pre>A word starts with a letter (already tested in <code>ReadNextToken()</code>) followed by zero or more letters or digits. So all we have to do is keep reading until we've reached a character that is not a letter or digit: <pre>Token ReadWord()
{
<span class="keyword">do</span>
{
StoreCurrentCharAndReadNext();
}
<span class="keyword">while</span> (<span class="keyword">char</span>.IsLetterOrDigit(fCurrentChar));
<span class="keyword">return new</span> Token(TokenType.Word, ExtractStoredChars());
}</pre>An integer constant just contains digits: <pre>Token ReadIntegerConstant()
{
<span class="keyword">do</span>
{
StoreCurrentCharAndReadNext();
}
<span class="keyword">while</span> (<span class="keyword">char</span>.IsDigit(fCurrentChar));
<span class="keyword">return new</span> Token(TokenType.Integer, ExtractStoredChars());
}</pre>A string constant contains a sequence of characters, enclosed in quotes. Because the quote character is used as a delimiter, the characters in between cannot be quotes. But all other characters are allowed. If the end of the source is reached before the closing quote, we throw an exception. We don't want the quotes to be included in the value of the token, so we'll skip them with ReadNextChar. <pre>Token ReadStringConstant()
{
ReadNextChar();
<span class="keyword">while</span> (!AtEndOfSource && fCurrentChar != <span class="string">'"'</span>)
{
StoreCurrentCharAndReadNext();
}
CheckForUnexpectedEndOfSource();
ReadNextChar();
<span class="keyword">return new</span> Token(TokenType.String, ExtractStoredChars());
}</pre>Reading a symbol is more complicated. A symbol can be one or two characters, so we'll have to treat each case individually: <pre>Token ReadSymbol()
{
<span class="keyword">switch</span> (fCurrentChar)
{
<span class="comment">// the symbols + - * / ( ) ,</span>
<span class="keyword">case</span> <span class="string">'+'</span>:
<span class="keyword">case</span> <span class="string">'-'</span>:
<span class="keyword">case</span> <span class="string">'*'</span>:
<span class="keyword">case</span> <span class="string">'/'</span>:
<span class="keyword">case</span> <span class="string">'('</span>:
<span class="keyword">case</span> <span class="string">')'</span>:
<span class="keyword">case</span> <span class="string">','</span>:
StoreCurrentCharAndReadNext();
<span class="keyword">return new</span> Token(TokenType.Symbol, ExtractStoredChars());
<span class="comment">// the symbols := ==</span>
<span class="keyword">case</span> <span class="string">':'</span>:
<span class="keyword">case</span> <span class="string">'='</span>:
StoreCurrentCharAndReadNext();
<span class="keyword">if</span> (fCurrentChar == <span class="string">'='</span>)
{
StoreCurrentCharAndReadNext();
<span class="keyword">return new</span> Token(TokenType.Symbol, ExtractStoredChars());
}
CheckForUnexpectedEndOfSource();
ThrowInvalidCharException();
<span class="keyword">break</span>;
<span class="comment">// the symbols < <> <=</span>
<span class="keyword">case</span> <span class="string">'<'</span>:
StoreCurrentCharAndReadNext();
<span class="keyword">if</span> (fCurrentChar == <span class="string">'>'</span> || fCurrentChar == <span class="string">'='</span>)
{
StoreCurrentCharAndReadNext();
}
<span class="keyword">return new</span> Token(TokenType.Symbol, ExtractStoredChars());
<span class="comment">// the symbols > >=</span>
<span class="keyword">case</span> <span class="string">'>'</span>:
StoreCurrentCharAndReadNext();
<span class="keyword">if</span> (fCurrentChar == <span class="string">'='</span>)
{
StoreCurrentCharAndReadNext();
}
<span class="keyword">return new</span> Token(TokenType.Symbol, ExtractStoredChars());
<span class="keyword">default</span>:
CheckForUnexpectedEndOfSource();
ThrowInvalidCharException();
<span class="keyword">break</span>;
}
<span class="keyword">return null</span>;
}</pre>That's it, our Tokenizer class is done. To test it, we'll write a small application. Add a new Form to the project TC.Adl.Test, named TokenizerTest. It will have a multiline TextBox where we can type in source code (TextBoxSource), a Button to tokenize the source code (ButtonTokenize), and a ListBox that will display the tokens (ListBoxTokens). It should look like this (follow link to enlarge):
<p><a href="http://files.myopera.com/TommyCarlier/albums/248390/TokenizerTest.png"><img height="252" alt="TokenizerTest designer screenshot" src="http://files.myopera.com/TommyCarlier/albums/248390/TokenizerTest.png" width="533" border="0"></a></p>The Click event handler of the button creates a new Tokenizer, reads all the tokens, and adds them to the ListBox: <pre><span class="keyword">private void</span> ButtonTokenize_Click(<span class="keyword">object</span> sender, EventArgs e)
{
ListBoxTokens.Items.Clear();
ListBoxTokens.BeginUpdate();
<span class="keyword">try</span>
{
<span class="keyword">using</span> (StringReader lSource = <span class="keyword">new</span> StringReader(TextBoxSource.Text))
{
Tokenizer lTokenizer = <span class="keyword">new</span> Tokenizer(lSource);
Token lToken = lTokenizer.ReadNextToken();
<span class="keyword">while</span> (lToken != <span class="keyword">null</span>)
{
ListBoxTokens.Items.Add(lToken.Type.ToString() + <span class="string">":\t"</span> + lToken.Value);
lToken = lTokenizer.ReadNextToken();
}
}
}
<span class="keyword">catch</span> (ParserException lException)
{
MessageBox.Show(<span class="keyword">this</span>, lException.Message, <span class="keyword">this</span>.Text, MessageBoxButtons.OK, MessageBoxIcon.Error);
}
<span class="keyword">finally</span> { ListBoxTokens.EndUpdate(); }
}</pre>Now you can start the application, run the TokenizerTest and try out some code to see what tokens are generated. Try out this sample: <pre><span class="keyword">for</span> i := 1 <span class="keyword">to</span> 100 <span class="keyword">do</span>
<span class="keyword">if</span> i < 50 <span class="keyword">then</span>
print(i, <span class="string">" < 50"</span>)
<span class="keyword">else</span>
print(i, <span class="string">" >= 50"</span>)
<span class="keyword">end if</span>
<span class="keyword">end for</span></pre>As you can see, converting characters to tokens is not very complicated. You just have to get used to it.
<div class="tags">Technorati tags: <a href="http://technorati.com/tags/Parser" rel="tag">Parser</a>, <a href="http://technorati.com/tags/Tokenizer" rel="tag">Tokenizer</a>, <a href="http://technorati.com/tags/ADL" rel="tag">ADL</a>, <a href="http://technorati.com/tags/Acronymic+Demonstrational+Language" rel="tag">Acronymic Demonstrational Language</a>.</div>Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.comtag:blogger.com,1999:blog-8076826.post-50719254123023934962007-04-22T18:29:00.001+02:002007-04-22T18:29:20.762+02:00Writing a parser: ADL tokens<acronym title="Acronymic Demonstrational Language">ADL</acronym> has only 4 types of tokens: <dl> <dt>Word <dd>A <b>word</b> starts with a letter, followed by zero or more letters or digits. <dd>Examples: <code>x</code>, <code>abc</code>, <code>f2</code>, <code>else</code>. <dt>Integer <dd>An <b>integer</b> is a sequence of one or more digits. <dd>Examples: <code>1</code>, <code>42</code>, <code>3141592654</code>. <dt>String <dd>A <b>string</b> is a sequence of characters enclosed in quotes. <dd>Examples: <code>"x"</code>, <code>"abc"</code>, <code>"Quid pro quo."</code>. <dt>Symbol <dd>A <b>symbol</b> is one of the following sequences: <dd><code>+ - * / ( ) , := == < > <> <= >=</code> </dd></dl>To identify tokens, we'll use an enum called <code>TokenType</code>: <pre><span class="keyword">namespace</span> TC.Adl
{
<span class="keyword">public enum</span> TokenType
{
None = 0,
Word,
Integer,
String,
Symbol
}
}</pre><small>(The value <code>None</code> is the default value and should not occur.)</small>
<p>A token has a type (of type <code>TokenType</code>) and a value (of type <code>string</code>). The value is the sequence of characters that represents the token.</p><pre><span class="keyword">using</span> System;
<span class="keyword">using</span> System.Collections.Generic;
<span class="keyword">using</span> System.Text;
<span class="keyword">namespace</span> TC.Adl
{
<span class="keyword">public class</span> Token
{
<span class="keyword">public</span> Token(TokenType type, <span class="keyword">string</span> value)
{
fType = type;
fValue = value;
}
<span class="keyword">readonly</span> TokenType fType;
<span class="keyword">public</span> TokenType Type { <span class="keyword">get</span> { <span class="keyword">return</span> fType; } }
<span class="keyword">readonly string</span> fValue;
<span class="keyword">public string</span> Value { <span class="keyword">get</span> { <span class="keyword">return</span> fValue; } }
<span class="keyword">public bool</span> Equals(TokenType type, <span class="keyword">string</span> value)
{
<span class="keyword">return</span> fType == type && fValue == value;
}
}
}</pre>
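<p>The two-argument <code>Equals</code> method is what later code can use to recognise keywords and symbols. As a small illustration, the sketch below repeats the <code>TokenType</code> and <code>Token</code> definitions from this post so that it compiles on its own; the helper class <code>TokenSketch</code> and its method <code>IsIfKeyword</code> are hypothetical names added for the example.</p>

```csharp
using System;

public enum TokenType { None = 0, Word, Integer, String, Symbol }

// Same shape as the Token class shown above, repeated so the sketch stands alone.
public class Token
{
    public Token(TokenType type, string value)
    {
        fType = type;
        fValue = value;
    }
    readonly TokenType fType;
    public TokenType Type { get { return fType; } }
    readonly string fValue;
    public string Value { get { return fValue; } }
    public bool Equals(TokenType type, string value)
    {
        return fType == type && fValue == value;
    }
}

// Hypothetical helper, not part of TC.Adl.
static class TokenSketch
{
    // True only for the word 'if'; a null token (end of source) is never a keyword.
    public static bool IsIfKeyword(Token token)
    {
        return token != null && token.Equals(TokenType.Word, "if");
    }
}
```

<p>Comparing the type as well as the value matters: the string constant <code>"if"</code> has the same value as the keyword, but its type is <code>String</code>, so it is not mistaken for the start of an if-statement.</p>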
<p>You may have noticed that the code contains no comments or argument validation. This is just to make the code simpler and easier to understand at first sight. The code I'm writing in Visual Studio is fully commented and has all the necessary argument validation code. I'll release the entire library afterwards.</p>
<p>Next time, we'll write the tokenizer.</p>Tommy Carlierhttp://www.blogger.com/profile/00487440070088105656noreply@blogger.com