Commit f55b37d

- Initial simple msg test case.
- Update TODO, stripping out all the redundant ole stuff.
- Fix property set guids to use the new Ole::Types::Clsid type.
- Add block form Msg.open
- Fix file requires for running tests individually.
- Start to flesh out move to mapi
- Update RangesIO subclasses for changes in ruby-ole

git-svn-id: https://ruby-msg.googlecode.com/svn/trunk@113 c30d66de-b626-0410-988f-81f6512a6d81
1 parent 16ffde1 commit f55b37d

11 files changed: +154 -111 lines changed

Rakefile

Lines changed: 2 additions & 2 deletions
@@ -17,7 +17,7 @@ task :default => [:test]
 
 Rake::TestTask.new(:test) do |t|
   t.test_files = FileList["test/test_*.rb"]
-  t.warning = true
+  t.warning = false
   t.verbose = true
 end
 
@@ -51,7 +51,7 @@ spec = Gem::Specification.new do |s|
 
   s.autorequire = 'msg'
 
-  s.add_dependency 'ruby-ole', '>=1.2.1'
+  s.add_dependency 'ruby-ole', '>=1.2.3'
 end
 
 Rake::GemPackageTask.new(spec) do |p|

TODO

Lines changed: 7 additions & 45 deletions
@@ -1,48 +1,10 @@
-= Ole
-
-* support for shrinking the bats. sbat and bbat can grow to accomodate more data,
-  but they can't shrink yet. at least they should be cropped so that all trailing
-  AVAIL are removed at save time, then re-padded to block boundary.
-  (solved by truncated_table ???)
-* similarly, at close/save time, we should truncate the parent io file.
-* wipe-out with 0's on truncation time not yet done (save on junk).
-* handle including bat information in the bat itself (increasingly important)
-  (partial support)
-* how to handle a small block file getting bigger than threshold, or big block
-  file getting smaller than it?? when you start writing to an empty file, it
-  should be sbat by rights, but then it gets past a certain point, and we have
-  to transparently map that 4096 bytes to a bbat file and keep going??? seems
-  incredibly messy, and will cause a lot of churn in the allocation tables.
-  neater solution???
-  maybe handle it at close time somehow? all new chains are bbat, and then get
-  migrated to sbat at close time? might be less expensive.
-  "solution" for now: buffering. IO.copy buffer is 4096. increase this, to
-  8192, then its bigger than default migration threshold (4096), so first
-  write to unallocated file moves it directly to bbat with no sbat involvement.
-  if you do smaller writes, you will end up claiming a lot of sbat blocks,
-  and move to bbat later. buffering would help too, but i'll rely on IO.copy
-  for now.
-  (bat migration implemented)
-* when all the above is implemented, we'll allow cheaper edits that our current
-  complete-rewrite approach, but they'll be messier. so, support a repack function.
-  it opens a temporary file, completely copies @io to it, opens it as a Storage
-  object, truncates @io, resets things like initialize would (clean header etc).
-  then it does a Dirent#copy on the 2 roots to provide a clean version.
-  thus:
-
-    Ole::Storage.open 'myfile.doc', 'r+', &:repack
-
-  would be a simple ole file cleaner.
-
-* then test & cleanup.
-
 = Newer Msg
 
 * fix nameid handling for sub Properties objects. needs to inherit top-level @nameid.
 
 * better handling for "Untitled Attachments", or attachments with no filename at all.
   maybe better handling in general for attached messages. maybe re-write filename with
-  subject.
+  subject.
 
 * do the above 2, then go for a new release.
   for this release, i want it to be pretty robust, and have better packaging.
@@ -52,16 +14,16 @@
   submit to the usual gem repositories etc, announce project as usable.
 
 * better handling for other types of embedded ole attachments. try to recognise .wmf to
-  start with. etc. 2 goals, form .eml, and for .rtf output.
+  start with. etc. 2 goals, form .eml, and for .rtf output.
 
 * rtf to text support, initially. just simple strip all the crap out approach. maybe html
-  later on. sometimes the only body is rtf, which is problematic. is rtf an allowed body
-  type?
+  later on. sometimes the only body is rtf, which is problematic. is rtf an allowed body
+  type?
 
 * better encoding support (i'm thinking, check some things like m-dash in the non-wide
-  string type. is it windows codepage?. how to know code page in general? change
-  ENCODER[] to convert these to utf8 i think). I'm trying to just move everything to utf8.
-  then i need to make sure the mime side of that holds up.
+  string type. is it windows codepage?. how to know code page in general? change
+  ENCODER[] to convert these to utf8 i think). I'm trying to just move everything to utf8.
+  then i need to make sure the mime side of that holds up.
 
 * certain parsing still not correct, especially small properties directory,
   things like bools, multiavlue shorts etc.

lib/mapi.rb

Lines changed: 42 additions & 0 deletions
@@ -1,5 +1,47 @@
 require 'msg'
 
+=begin
+
+Mapi::PropertyStore
+Mapi::Item
+Mapi::Attachment
+Mapi::Recipient
+
+Pst::PropertyStore < Mapi::PropertyStore
+Msg::PropertyStore < Mapi::PropertyStore
+
+# simple hash like api. can be lazily loaded. however if you
+# use something like #keys, or #values, then you'd probably
+# just cache things. maybe this would be the raw interface,
+# which is wrapped by something which provides named access.
+class Mapi::PropertyStore
+  def initialize symbolic=true
+    @symbolic = symbolic
+  end
+
+  def raw
+    # gives access to the real property store object. the top level
+    # wrapped one uses symbolic naming.
+    self.class.new false
+  end
+
+  def each symbolic=@symbolic
+    # iterate through the key value pairs. keys are
+    # Mapi::PropertyStore::Key instances, which allows for named
+    # properties. values are of the appropriate class (which is
+    # what?).
+  end
+
+  def []
+    # get a value based on a key
+  end
+
+  def to_h symbolic=@symbolic
+  end
+end
+
+=end
+
 module Mapi
   module Types
     #

lib/mime.rb

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+# need RecursivelyEnumerable
+require 'rubygems'
+require 'ole/support'
+
 #
 # = Introduction
 #

lib/msg.rb

Lines changed: 8 additions & 20 deletions
@@ -1,7 +1,3 @@
-#! /usr/bin/ruby
-
-$: << File.dirname(__FILE__)
-
 require 'yaml'
 require 'base64'
 
@@ -34,10 +30,15 @@ class Msg
   # Alternate constructor, to create an +Msg+ directly from +arg+ and +mode+, passed
   # directly to Ole::Storage (ie either filename or seekable IO object).
   def self.open arg, mode=nil
-    msg = Msg.new Ole::Storage.open(arg, mode).root
+    msg = new Ole::Storage.open(arg, mode).root
     # we will close the ole when we are #closed
     msg.close_parent = true
-    msg
+    if block_given?
+      begin   yield msg
+      ensure; msg.close
+      end
+    else msg
+    end
   end
 
   # Create an Msg from +root+, an <tt>Ole::Storage::Dirent</tt> object
@@ -162,7 +163,7 @@ def populate_headers
     time = nil unless Date === time
     # can employ other methods for getting a time. heres one in a similar vein to msgconvert.pl,
     # ie taking the time from an ole object
-    time ||= @root.ole.dirents.map(&:time).compact.sort.last
+    time ||= @root.ole.dirents.map { |dirent| dirent.modify_time || dirent.create_time }.compact.sort.last
 
     # now convert and store
     # this is a little funky. not sure about time zone stuff either?
@@ -510,16 +511,3 @@ def inspect
   end
 end
 
-if $0 == __FILE__
-  quiet = if ARGV[0] == '-q'
-    ARGV.shift
-    true
-  end
-  # just shut up and convert a message to eml
-  Msg::Log.level = Logger::WARN
-  Msg::Log.level = Logger::FATAL if quiet
-  msg = Msg.open ARGV[0]
-  puts msg.to_mime.to_s
-  msg.close
-end
-
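
The block form added to Msg.open above follows the same convention as File.open: with a block, the object is yielded and closed afterwards; without one, the caller gets the open object back and is responsible for closing it. A minimal sketch of that pattern, using a hypothetical Handle class as a stand-in (it is not the real Msg class and does not touch Ole::Storage):

```ruby
# Hypothetical stand-in for a closable resource such as Msg.
class Handle
  attr_reader :closed

  def initialize
    @closed = false
  end

  def close
    @closed = true
  end

  # With a block: yield the handle, always close it (even if the block
  # raises), and return the block's value.
  # Without a block: return the open handle; the caller must close it.
  def self.open
    handle = new
    return handle unless block_given?
    begin
      yield handle
    ensure
      handle.close
    end
  end
end

seen = nil
result = Handle.open { |h| seen = h; :done }
puts result.inspect # value of the block
puts seen.closed    # the ensure clause has already closed the handle
```

The ensure clause is what makes the block form worth having: the handle is closed even when the block raises.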

lib/msg/properties.rb

Lines changed: 46 additions & 24 deletions
@@ -107,30 +107,46 @@ class Properties
       :default => proc { |obj| obj.open }
     }
 
-    # these won't be strings for much longer.
-    # maybe later, the Key#inspect could automatically show symbolic guid names if they
-    # are part of this builtin list.
-    # FIXME. hey, nice that my fake string is the same length though :)
-    PS_MAPI = '{not-really-sure-what-this-should-say}'
-    PS_PUBLIC_STRINGS = '{00020329-0000-0000-c000-000000000046}'
-    # string properties in this namespace automatically get added to the internet headers
-    PS_INTERNET_HEADERS = '{00020386-0000-0000-c000-000000000046}'
-    # theres are bunch of outlook ones i think
-    # http://blogs.msdn.com/stephen_griffin/archive/2006/05/10/outlook-2007-beta-documentation-notification-based-indexing-support.aspx
-    # IPM.Appointment
-    PSETID_Appointment = '{00062002-0000-0000-c000-000000000046}'
-    # IPM.Task
-    PSETID_Task = '{00062003-0000-0000-c000-000000000046}'
-    # used for IPM.Contact
-    PSETID_Address = '{00062004-0000-0000-c000-000000000046}'
-    PSETID_Common = '{00062008-0000-0000-c000-000000000046}'
-    # didn't find a source for this name. it is for IPM.StickyNote
-    PSETID_Note = '{0006200e-0000-0000-c000-000000000046}'
-    # for IPM.Activity. also called the journal?
-    PSETID_Log = '{0006200a-0000-0000-c000-000000000046}'
-
     SUBSTG_RX = /__substg1\.0_([0-9A-F]{4})([0-9A-F]{4})(?:-([0-9A-F]{8}))?/
 
+    # the property set guid constants
+    # these guids are all defined with the macro DEFINE_OLEGUID in mapiguid.h.
+    # see http://doc.ddart.net/msdn/header/include/mapiguid.h.html
+    oleguid = proc do |prefix|
+      Ole::Types::Clsid.parse "{#{prefix}-0000-0000-c000-000000000046}"
+    end
+
+    NAMES = {
+      oleguid['00020328'] => 'PS_MAPI',
+      oleguid['00020329'] => 'PS_PUBLIC_STRINGS',
+      oleguid['00020380'] => 'PS_ROUTING_EMAIL_ADDRESSES',
+      oleguid['00020381'] => 'PS_ROUTING_ADDRTYPE',
+      oleguid['00020382'] => 'PS_ROUTING_DISPLAY_NAME',
+      oleguid['00020383'] => 'PS_ROUTING_ENTRYID',
+      oleguid['00020384'] => 'PS_ROUTING_SEARCH_KEY',
+      # string properties in this namespace automatically get added to the internet headers
+      oleguid['00020386'] => 'PS_INTERNET_HEADERS',
+      # theres are bunch of outlook ones i think
+      # http://blogs.msdn.com/stephen_griffin/archive/2006/05/10/outlook-2007-beta-documentation-notification-based-indexing-support.aspx
+      # IPM.Appointment
+      oleguid['00062002'] => 'PSETID_Appointment',
+      # IPM.Task
+      oleguid['00062003'] => 'PSETID_Task',
+      # used for IPM.Contact
+      oleguid['00062004'] => 'PSETID_Address',
+      oleguid['00062008'] => 'PSETID_Common',
+      # didn't find a source for this name. it is for IPM.StickyNote
+      oleguid['0006200e'] => 'PSETID_Note',
+      # for IPM.Activity. also called the journal?
+      oleguid['0006200a'] => 'PSETID_Log',
+    }
+
+    module Constants
+      NAMES.each { |guid, name| const_set name, guid }
+    end
+
+    include Constants
+
     # access the underlying raw property hash
     attr_reader :raw
     # unused (non-property) objects after parsing an +Dirent+.
@@ -453,6 +469,8 @@ def body_html
     #
     # Also contains the code that maps keys to symbolic names.
     class Key
+      include Constants
+
       attr_reader :code, :guid
       def initialize code, guid=PS_MAPI
         @code, @guid = code, guid
@@ -502,17 +520,21 @@ def == other
       alias eql? :==
 
       def inspect
+        # maybe the way to do this, would be to be able to register guids
+        # in a global lookup, which are used by Clsid#inspect itself, to
+        # provide symbolic names...
+        guid_str = NAMES[guid] || "{#{guid.format}}"
         if Integer === code
           hex = '0x%04x' % code
           if guid == PS_MAPI
             # just display as plain hex number
             hex
           else
-            "#<Key #{guid}/#{hex}>"
+            "#<Key #{guid_str}/#{hex}>"
           end
         else
           # display full guid and code
-          "#<Key #{guid}/#{code.inspect}>"
+          "#<Key #{guid_str}/#{code.inspect}>"
         end
       end
     end
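
The change above replaces hand-written GUID constants with a single GUID-to-name table, then generates the constants from it with const_set so the same table can drive Key#inspect. A reduced sketch of that idea follows. It assumes plain strings in place of Ole::Types::Clsid values so it runs without ruby-ole, keeps only two entries, and is not the library's actual code.

```ruby
# Assumption: strings stand in for Ole::Types::Clsid objects.
oleguid = proc { |prefix| "{#{prefix}-0000-0000-c000-000000000046}" }

# One table, keyed by guid, used both for lookup and to generate constants.
NAMES = {
  oleguid['00020328'] => 'PS_MAPI',
  oleguid['00020329'] => 'PS_PUBLIC_STRINGS',
}

module Constants
  # Defines Constants::PS_MAPI, Constants::PS_PUBLIC_STRINGS, ...
  NAMES.each { |guid, name| const_set name, guid }
end

class Key
  include Constants

  attr_reader :code, :guid

  def initialize(code, guid = PS_MAPI)
    @code, @guid = code, guid
  end

  def inspect
    hex = '0x%04x' % code
    # Fall back to the raw guid when it has no symbolic name.
    guid == PS_MAPI ? hex : "#<Key #{NAMES[guid] || guid}/#{hex}>"
  end
end

puts Key.new(0x1a).inspect
puts Key.new(0x1a, Constants::PS_PUBLIC_STRINGS).inspect
```

Keeping the table as the single source means a new property set only needs one added line to get both a constant and a readable inspect string.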

lib/pst.rb

Lines changed: 12 additions & 9 deletions
@@ -7,12 +7,14 @@
 #
 # = TODO
 #
-# 0. massive refactoring and cleaning up, now that the main stuff is supported.
-# 1. xattribs
-# 2. solve recipient table problem (test4).
-# 3. generalise the Mapi stuff better
-# 4. refactor index load
-# 5. msg serialization?
+# 1. solve recipient table problem (test4).
+#    this is done. turns out it was due to id2 clashes. find better solution
+# 2. check parse consistency. an initial conversion of a 30M file to pst, shows
+#    a number of messages conveting badly. compare with libpst too.
+# 3. xattribs
+# 4. generalise the Mapi stuff better
+# 5. refactor index load
+# 6. msg serialization?
 #
 
 =begin
@@ -238,8 +240,9 @@ def self.encrypt decrypted
   end
 
   class RangesIOEncryptable < RangesIO
-    def initialize io, ranges, opts={}
-      @decrypt = !!opts[:decrypt]
+    def initialize io, mode='r', params={}
+      mode, params = 'r', mode if Hash === mode
+      @decrypt = !!params[:decrypt]
       super
     end
 
@@ -880,7 +883,7 @@ def initialize pst, idx_head
       decrypt = decrypts.first
       # convert idxs to ranges
       ranges = @idxs.map { |idx| [idx.offset, idx.size] }
-      super pst.io, :ranges => ranges, :decrypt => decrypt
+      super pst.io, :ranges => ranges, :decrypt => decrypt
     end
   end
 
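
The new RangesIOEncryptable#initialize accepts either (io, mode, params) or (io, params): when the second argument is a Hash, it is treated as params and mode falls back to 'r'. A sketch of that argument-shifting idiom, with a hypothetical BaseIO standing in for ruby-ole's RangesIO (whose real constructor is not shown in this diff):

```ruby
# Hypothetical parent class; stands in for RangesIO.
class BaseIO
  attr_reader :io, :mode, :params

  def initialize(io, mode = 'r', params = {})
    # Allow the mode to be omitted: BaseIO.new(io, :ranges => ...)
    mode, params = 'r', mode if Hash === mode
    @io, @mode, @params = io, mode, params
  end
end

class EncryptableIO < BaseIO
  attr_reader :decrypt

  def initialize(io, mode = 'r', params = {})
    mode, params = 'r', mode if Hash === mode
    @decrypt = !!params[:decrypt]
    # Bare super forwards the current values of io, mode and params.
    super
  end
end

a = EncryptableIO.new(:fake_io, :ranges => [[0, 4]], :decrypt => true)
b = EncryptableIO.new(:fake_io, 'r+')
puts a.mode, a.decrypt, b.mode, b.decrypt
```

Because bare super passes the reassigned locals, the parent sees a normalised (io, mode, params) triple whichever call shape was used.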

lib/rtf.rb

Lines changed: 0 additions & 9 deletions
@@ -1,5 +1,3 @@
-#! /usr/bin/ruby -w
-
 require 'stringio'
 
 # this file is pretty crap, its just to ensure there is always something readable if
@@ -109,10 +107,3 @@ def self.rtf2text str, format=:text
   end
 end
 
-if $0 == __FILE__
-  #str = File.read('test.rtf')
-  str = YAML.load(open('rtfs.yaml'))[2]
-  #puts str
-  puts text
-end
-

test/test_Blammo.msg

15.5 KB
Binary file not shown.

test/test_mime.rb

Lines changed: 4 additions & 2 deletions
@@ -14,9 +14,11 @@ def test_parsing_no_multipart
     assert_equal 'Body text.', mime.body
     assert_equal false, mime.multipart?
     assert_equal nil, mime.parts
-    # we get round trip conversion. this is mostly fluke, as orderedhash hasn't been
-    # added yet
     assert_equal "Header1: Value1\r\nHeader2: Value2\r\n\r\nBody text.", mime.to_s
   end
+
+  def test_boundaries
+    assert_match(/^----_=_NextPart_001_/, Mime.make_boundary(1))
+  end
 end
 
0 commit comments

Comments
 (0)